Re: [U2] General guidelines on indexing

Edward Brown Wed, 08 Jul 2009 10:14:05 -0700

Martin

Your test program on UniData 7.1 takes 1137 milliseconds, about 1.1 seconds.

I changed it to use system(12), which is a higher-resolution clock on
UniData than TIME().

Interesting commentary on chunking. I believe (and I might be talking
out of my ar*e here) that chunking done with (system memory) page-sized
blocks could be made to appear contiguous to software sitting above the
operating system by taking advantage of the hardware vm / memory
controller. I would not be surprised if unidata benefitted from
something like this.

Ed





-----Original Message-----
From: u2-users-boun...@listserver.u2ug.org
[mailto:u2-users-boun...@listserver.u2ug.org] On Behalf Of Martin
Phillips
Sent: 08 July 2009 17:59
To: U2 Users List
Subject: Re: [U2] General guidelines on indexing

Hi all,

>> I don't agree. Disk access is inherently slower than RAM access.

I think that this discussion started for Unidata and then got UniVerse
involved too but it might have been the other way around. Sadly, there is no
internals training material for Unidata so we have to guess what goes on.

Different multivalue products approach string management in varying ways. In
UniVerse, strings are stored as contiguous memory. If I write a statement
such as
   X<-1> = 'ABC'
the run machine has to work out how big the new string will be, allocate
memory, copy the old value of X to the new area appending ABC to it, and
then release the original memory used by X.

As you append successive fields, the string to be moved gets longer and
longer. We tend to think of computers as being blindingly fast but copying a
big string is still a slow process. If I have a string that starts empty and
I add a million fields, each of 3 bytes plus the delimiter, I will end up
copying a total of 1,999,998,000,000 bytes - hardly an insignificant task.
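
The figure above can be checked with a short calculation. The sketch below
is in Python rather than BASIC, purely for illustration. It assumes, as the
paragraph does, that every field costs 4 bytes (3 data bytes plus the
delimiter) and that each append copies the whole existing string. The
function names are made up for this example.

```python
def bytes_copied(fields, field_size):
    """Total bytes copied when every append first copies the old string.

    Assumes a contiguous-string model: before append k (1-based) the string
    already holds (k - 1) * field_size bytes, and all of them are copied.
    """
    return sum((k - 1) * field_size for k in range(1, fields + 1))


def bytes_copied_closed_form(fields, field_size):
    """Same total via the arithmetic series: size * n * (n - 1) / 2."""
    return field_size * fields * (fields - 1) // 2


total = bytes_copied_closed_form(1_000_000, 4)
print(f"{total:,}")  # 1,999,998,000,000
```

The cost grows with the square of the number of fields, so doubling the
field count roughly quadruples the bytes copied.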

From my own experiments some time ago, I believe that Unidata also uses
contiguous strings but I have no direct proof of this. The alternative
(adopted by our QM product, by PI/open, Information and perhaps others) is
to use "chunked strings" where a string is stored as a series of chunks. In
this model, appending a field requires only addition of a new chunk or, for
better performance, replacement of the final chunk.

Of course, the performance gain of chunked strings in this example may be
offset by their decreased performance for things like substring extraction
which is now more complex than a simple indexing operation.
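
To make the trade-off concrete, here is a toy chunked string in Python. It
is only a sketch of the general idea, not how QM, PI/open or any other
product actually implements it. The class name, the method names and the
tiny chunk size are all assumptions made for illustration.

```python
CHUNK_SIZE = 16  # hypothetical chunk capacity, kept tiny for illustration


class ChunkedString:
    """Toy chunked string: a list of chunks, each at most CHUNK_SIZE long."""

    def __init__(self):
        self.chunks = []
        self.length = 0

    def append(self, text):
        """Top up the final chunk, then add new chunks for the remainder.

        Only the final chunk is ever rebuilt; earlier chunks are untouched,
        so the cost does not depend on the total length of the string.
        """
        self.length += len(text)
        if self.chunks and len(self.chunks[-1]) < CHUNK_SIZE:
            room = CHUNK_SIZE - len(self.chunks[-1])
            self.chunks[-1] += text[:room]
            text = text[room:]
        for i in range(0, len(text), CHUNK_SIZE):
            self.chunks.append(text[i:i + CHUNK_SIZE])

    def substring(self, start, count):
        """Return count characters from 0-based offset start.

        Unlike a contiguous string, this has to walk the chunks to find the
        starting point and then stitch the pieces together.
        """
        pieces = []
        pos = 0
        end = start + count
        for chunk in self.chunks:
            chunk_end = pos + len(chunk)
            if chunk_end > start and pos < end:
                pieces.append(chunk[max(start - pos, 0):end - pos])
            pos = chunk_end
            if pos >= end:
                break
        return "".join(pieces)


FM = "\xfe"  # field mark, character 254
s = ChunkedString()
for field in ("ABC", "DEF", "GHI"):
    s.append(field + FM)
print(s.substring(4, 3))  # DEF
```

The append touches at most one existing chunk, while the substring call has
to scan chunk by chunk, which is the extra complexity mentioned above.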

By way of a simple example, I just tried the following program...
   s = ''
   z = str('*', 1000)
   t1 = time()
   for i = 1 to 100000
      s<-1> = z
   next i
   t2 = time()
   crt t2 - t1

This took six seconds on QM but 32 minutes on UniVerse. I do not have a
Unidata system available at the moment to try. To be fair, I am sure that I
could construct an example that reversed the performance difference.

Writing to a sequential file is somewhat similar to the chunked string model
as it buffers data until it has a good sized chunk and then writes it out,
continuing with an empty buffer.
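
The buffering behaviour described above can be sketched in a few lines of
Python. This is an illustration of the general technique only, not the
actual sequential file code of any multivalue product. The class name, the
flush counter and the chunk size are assumptions.

```python
import io


class ChunkingWriter:
    """Toy writer that collects small writes and emits them in chunks."""

    def __init__(self, sink, chunk_size=8192):  # chunk_size is an assumed value
        self.sink = sink
        self.chunk_size = chunk_size
        self.buffer = []
        self.buffered = 0
        self.flushes = 0  # counts writes that actually reach the sink

    def write(self, text):
        """Buffer the text; write it out once a good sized chunk has built up."""
        self.buffer.append(text)
        self.buffered += len(text)
        if self.buffered >= self.chunk_size:
            self.flush()

    def flush(self):
        """Write any buffered data to the sink and continue with an empty buffer."""
        if self.buffer:
            self.sink.write("".join(self.buffer))
            self.flushes += 1
        self.buffer = []
        self.buffered = 0


sink = io.StringIO()
writer = ChunkingWriter(sink, chunk_size=10)
for _ in range(5):
    writer.write("abcd")
writer.flush()  # push out whatever is left at the end
print(writer.flushes, len(sink.getvalue()))  # 2 20
```

Five small writes reach the sink as two larger ones, and no write ever has
to copy data that was already flushed.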


Martin Phillips
Ladybridge Systems Ltd
17b Coldstream Lane, Hardingstone, Northampton, NN4 6DB
+44-(0)1604-709200 

_______________________________________________
U2-Users mailing list
U2-Users@listserver.u2ug.org
http://listserver.u2ug.org/mailman/listinfo/u2-users

