Re: [U2] General guidelines on indexing

jpb-u2ug Wed, 08 Jul 2009 09:46:42 -0700

It's been a long time since I worked with hardware, so I may be way off on
this, but it could be possible because once the program issues the WRITESEQ
it's basically up to the disk subsystem to actually do the writing. The
written sequential record gets put in a queue and the subsystem sends back a
signal that it's done and to continue. However, that doesn't necessarily
mean the write has completed. By the time the process finishes, the
sequential file may not have finished writing all of the records in the
queue, but the process will think it has. I have noticed this happen when
writing to a networked file system, and it makes the file unusable until
the writing has completed.
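
As a rough illustration of the gap between "the write call returned" and
"the data is on disk", here is a minimal sketch in Python rather than
UniBasic. The function name write_records and the durable flag are made up
for the example; flush() and os.fsync() are the standard calls for pushing
buffered data down to the device. How a networked file system honours the
sync request varies, so treat this as a sketch, not a guarantee.

```python
import os
import tempfile


def write_records(path, records, durable=False):
    """Append records to a sequential file, one per line.

    With durable=True the function does not return until the OS has been
    asked to push the data to the storage device.
    """
    with open(path, "a", encoding="utf-8") as f:
        for rec in records:
            f.write(rec + "\n")      # lands in a user-space buffer first
        if durable:
            f.flush()                # hand Python's buffer to the OS
            os.fsync(f.fileno())     # ask the OS to write its cache to disk
    return len(records)


def demo():
    """Write three IDs durably to a temp file and read them back."""
    fd, path = tempfile.mkstemp(suffix=".seq")
    os.close(fd)
    try:
        n = write_records(path, ["ID1", "ID2", "ID3"], durable=True)
        with open(path, encoding="utf-8") as f:
            lines = f.read().splitlines()
        return n, lines
    finally:
        os.remove(path)
```

Without the durable step, the writer can finish while records are still
sitting in a queue or cache, which is the situation described above.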

Jerry Banker

-----Original Message-----
From: [email protected]
[mailto:[email protected]] On Behalf Of Marco Manyevere
Sent: Wednesday, July 08, 2009 10:44 AM
To: U2 Users List
Subject: Re: [U2] General guidelines on indexing

In one test I did a couple of months back, I found that appending IDs to the
end of a dynamic array performed _much_ _much_ slower than a WRITESEQ to the
end of a disk file, and the dynamic array wasn't even 100,000 records long.
We were able to reduce the time required to produce a report from over 30
minutes to less than 2 minutes by removing the dynamic array operations and
replacing them with WRITESEQs. We tried all the different syntaxes for
appending to the array, with no noticeable difference in the poor
performance once the array got large.

----- Original Message ----
From: "Baakkonen, Rodney A (Rod) 46K" <[email protected]>
To: U2 Users List <[email protected]>
Sent: Wednesday, 8 July, 2009 17:15:57
Subject: Re: [U2] General guidelines on indexing

In theory, I would have to agree with you. Who knows how all this stuff
really works under the hood? You have Unidata shared memory management
and the shared basic code server. We also have a huge SAN with lots of
cache. Measuring performance, and influencing it, is different these days.
There are so many layers and events that finding the bottleneck can be
tricky.

All I can tell you is that we did extensive testing and we would not be
doing what we are unless it worked. Maybe on your site, on your server
and your database, these techniques don't enhance performance. I'm
telling you what we found; your mileage may vary.

We have been told to use concatenate with a @AM to build our arrays, as
Unidata has a pointer to the end of a variable. We have been told to use
REMOVE when pulling data out of the array. We know that. We still had
performance gains that went from days to hours when we changed to a well
sized work file.

There are many reasons why we use the work file. Typically it is used
in the selection part of the process. The sorting and reporting come
after. Sometimes the work file contains more than keys (vs. having
multiple dynamic arrays). Sometimes multiple reports are generated from
the same work file.

-----Original Message-----
From: [email protected]
[mailto:[email protected]] On Behalf Of Edward Brown
Sent: Wednesday, July 08, 2009 6:47 AM
To: U2 Users List
Subject: Re: [U2] General guidelines on indexing

I don't agree. Disk access is inherently slower than RAM access.
Therefore a process that makes efficient use of RAM will be faster than
an equivalent algorithm making efficient use of disk.

In your case, it's just a matter of scale:

50 million records at (let's say) 14 bytes per ID, plus the multivalue
marker needed to build up the dynamic array:

15 * 50,000,000 = 750,000,000 bytes.

That's 732,422 KB, or about 715 MB.

If your process is running on a modern server then this kind of op
becomes practical.

Assumptions:
- That the dynamic array isn't using Unicode. If it is, then memory
requirements double.
- That you select every record - normally (presumably) it would be just
a fraction?
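
The estimate above can be reproduced with a few lines of Python. The
function and parameter names are made up for the example; the 14-byte ID,
the 1-byte mark and the "Unicode doubles it" factor are the assumptions
stated in this message, not measured values.

```python
def dynamic_array_bytes(record_count, id_bytes=14, mark_bytes=1, unicode=False):
    """Estimate the size of a dynamic array holding one ID per record."""
    per_id = id_bytes + mark_bytes          # ID plus its delimiter
    total = record_count * per_id
    if unicode:
        total *= 2                          # assumption: two bytes per character
    return total


def to_kb_mb(byte_count):
    """Convert a byte count to (KB, MB) using 1024-based units."""
    kb = byte_count / 1024
    mb = kb / 1024
    return kb, mb


total = dynamic_array_bytes(50_000_000)
kb, mb = to_kb_mb(total)
print(f"{total:,} bytes = {kb:,.0f} KB = {mb:,.0f} MB")
```

Changing record_count to the fraction of records actually selected shows
how quickly the figure drops for a typical report.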

In fact isn't all of this theoretical? Using the index select / readfwd
/ own tests method, there's no need to build workfiles or dynamic arrays
at all - simply do the tests as each record is retrieved with readfwd
and then create the report / do the processing all within the same loop?
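
A minimal sketch of that single-loop idea, in Python rather than UniBasic.
The index is modelled as a sorted list of (key, record ID) pairs and the
file as a dict; report_from_index, passes_tests and emit are hypothetical
names for the example and do not correspond to any Unidata API. The point
is only that selection, testing and reporting happen in one pass with no
work file or intermediate array.

```python
from bisect import bisect_left


def report_from_index(index, records, start_key, stop_key, passes_tests,
                      emit=print):
    """Walk a sorted index from start_key up to (not including) stop_key,
    test each record as it is read, and emit a report line immediately.

    Returns the number of lines emitted.
    """
    # Position on the first index entry at or after start_key.
    pos = bisect_left(index, (start_key,))
    count = 0
    while pos < len(index):
        key, rec_id = index[pos]
        if key >= stop_key:
            break                       # past the range of interest
        record = records[rec_id]
        if passes_tests(record):        # the non-indexed "own tests"
            emit(f"{rec_id} {key} {record['amount']}")
            count += 1
        pos += 1                        # read forward to the next entry
    return count
```

For example, with an index keyed on date and a test of amount > 100, only
the matching records inside the date range produce report lines.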

Ed

-----Original Message-----
From: [email protected]
[mailto:[email protected]] On Behalf Of Baakkonen,
Rodney A (Rod) 46K
Sent: 08 July 2009 12:30
To: U2 Users List
Subject: Re: [U2] General guidelines on indexing

When you have a file with 50 million records, it does not matter how
you build or parse the dynamic array. A well sized work file will
run circles around the dynamic array.

-----Original Message-----
From: [email protected]
[mailto:[email protected]] On Behalf Of Edward Brown
Sent: Wednesday, July 08, 2009 6:12 AM
To: U2 Users List
Subject: Re: [U2] General guidelines on indexing

> After indexing, we made a lot more use of the SETINDEX and READFWD
> logic in our programs.

I find this curious / disappointing - is it really the case that Unidata
can't take the mix of indexed / unindexed dictionary items and do just
as efficient a job as the code you're writing?

Also, the performance of dynamic arrays need not be as much of an issue as
you've found. If they're built up with -1 rather than a counter, then the
speed penalty of adding an item to a very large list is much the same as
for a tiny one. Then, when you loop through them for further processing,
use the REMOVE (or FORMLIST) command rather than a counter.

The only real issue with dynamic arrays is if the machine does not have
the physical memory available to hold the variable.
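
To show why the two ways of building the list could differ, here is a
small cost model in Python. It is an assumption-laden sketch, not a
description of Unidata internals: the dynamic array is modelled as a
string delimited by the attribute mark, CHAR(254), the counter version is
charged for scanning the whole existing string to find position N, and
the append version is charged nothing because it goes straight to the
end. The function names are made up for the example.

```python
AM = "\xfe"  # attribute mark, CHAR(254)


def build_by_counter(ids):
    """Model of ARRAY<N> = ID: locate field N by scanning from the start.

    Returns (array, characters_scanned).
    """
    array = ""
    scanned = 0
    for n, rec_id in enumerate(ids, start=1):
        scanned += len(array)           # cost grows with the array so far
        array = rec_id if n == 1 else array + AM + rec_id
    return array, scanned


def build_by_append(ids):
    """Model of ARRAY<-1> = ID with a pointer to the end: no scanning.

    Returns (array, characters_scanned).
    """
    parts = []
    for rec_id in ids:
        parts.append(rec_id)            # constant work per ID in this model
    return AM.join(parts), 0


print(build_by_counter(["A1", "B2", "C3"])[1])   # scan cost, counter version
print(build_by_append(["A1", "B2", "C3"])[1])    # scan cost, append version
```

In this model the counter version's scan cost grows roughly with the
square of the number of IDs, while the append version stays flat. Whether
a given release actually behaves this way is something only a test on
that system can show.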

Ed

_______________________________________________
U2-Users mailing list
[email protected]
http://listserver.u2ug.org/mailman/listinfo/u2-users
