Will

> I don't understand what's wrong with indexing, can you clarify this point,
> and I'll whip out a fix in three days :)

Well, for a start, I didn't say there was anything wrong; I said it could be
improved - not the same thing!

But as to specifics, take the following scenario (UniVerse specific):

- Grab a transaction file of, say, 10 million records.
- Assume a reasonable key length, say 10 chars.
- Add a field with two states (open/closed, male/female, that kind of
thing).
- Index it, and watch what happens to the performance on that file.
- Better still, don't use an existing file! Create a new file and a program
to copy or build the content in BASIC, showing a counter every 1,000 records.
At the start it will be quick. After about 500,000 you can grab a beer in
between the counters.
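To make the arithmetic concrete, here is a quick back-of-envelope sketch (the figures are the hypothetical ones from the scenario above, not anything measured from UniVerse):

```python
# Back-of-envelope arithmetic for the scenario above; all figures are
# the hypothetical ones from the example, not measurements.
RECORDS = 10_000_000        # transaction records
KEY_LEN = 10                # chars per primary key
ENTRY_BYTES = KEY_LEN + 1   # key plus a one-byte separator
FLAG_VALUES = 2             # open/closed style field

entries_per_list = RECORDS // FLAG_VALUES         # primary keys per index key
bytes_per_list = entries_per_list * ENTRY_BYTES   # flat posting-list size

print(f"{entries_per_list:,} entries, ~{bytes_per_list / 1e6:.0f} MB per list")
# → 5,000,000 entries, ~55 MB per list
```

That ~55 MB flat list is what gets touched on every update, which is why the copy program slows to a crawl as the lists grow.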

The problem is that a UniVerse index is very good at managing the index
keys themselves: they are held in a nice B+tree format with a decent level
of fan-out.

But when it comes to the list of primary keys indexed against each index
key, that list is really just treated as a flat block of data.

If you have a good ratio - a lot of index keys (date*type*something_else),
each of which yields a relatively short list of primary keys - you can get
very good indexing performance. But it isn't very clever when you have a
small number of index keys, each mapping to a large list of primary keys.
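A rough illustration of that ratio point (the counts here are invented for the example, not taken from any real file):

```python
RECORDS = 10_000_000

# Composite index key (date*type*something_else): many index keys,
# so each posting list stays short. Counts are invented for illustration.
index_keys_good = 2_000 * 5 * 10          # e.g. 2,000 dates x 5 types x 10 codes
per_list_good = RECORDS // index_keys_good

# Two-state flag: only two index keys, so each posting list is enormous.
index_keys_bad = 2
per_list_bad = RECORDS // index_keys_bad

print(per_list_good, per_list_bad)  # → 100 5000000
```

Same file, same record count; the only difference is how many index keys the indexed expression produces.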

So every time you changed the flag value in the file above, it would have to
load up the two lists (one for the old value, one for the new), locate and
delete the entry from the old list, and locate/fail/append to the new, each
list averaging 11 bytes * 5 million entries. And then write them back to a
succession of oversize blocks in the index file.
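In pseudocode terms, a flag change against flat sorted posting lists looks something like this (a hedged sketch of the shape of the work, not UniVerse internals; the dictionary layout and names are invented):

```python
import bisect

# Each index key maps to one flat, sorted posting list of primary keys.
# This mirrors the shape of the problem, not UniVerse's actual structures.
index = {"open": [], "closed": []}

def add(flag, rec_key):
    bisect.insort(index[flag], rec_key)       # O(n) shift within a flat list

def change_flag(old, new, rec_key):
    lst = index[old]
    i = bisect.bisect_left(lst, rec_key)      # locate in the old list
    if i < len(lst) and lst[i] == rec_key:
        del lst[i]                            # O(n) delete (shifts the tail)
    bisect.insort(index[new], rec_key)        # locate/insert in the new list

add("open", "REC0000001")
add("open", "REC0000002")
change_flag("open", "closed", "REC0000002")
print(index["closed"])   # → ['REC0000002']
```

With 5 million entries per list, both the locate/delete and the locate/insert are shifting megabytes of data for a one-record update.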

Now you might say - well, you wouldn't index a transaction file like that.
And you would be right - because of the design of the index. But it's a
perfectly legitimate and reasonable thing to want to do.

How better to manage a large index list is, of course, the question. Since
it is a large ordered list into which elements are inserted and deleted, the
list itself could be made into a set of B+tree pages over a certain size
threshold, reducing the cost of location/insertion and location/deletion.
Other databases use techniques such as key deltas and compression to
alleviate this. And I'm sure there are better options if I could be bothered
to research them.
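The paged-list idea can be pictured like this - a toy sketch, not UniVerse's actual structure; the page size and class name are invented, and a real implementation would add a fan-out level above the pages rather than scanning them:

```python
import bisect

PAGE = 4  # tiny page size for illustration; a real index would use KB-sized pages

class PagedList:
    """Sorted list split into bounded pages, so an insert or delete only
    shifts entries within one page instead of the whole list. A sketch of
    the 'B+tree of pages' idea (no fan-out levels here)."""
    def __init__(self):
        self.pages = [[]]

    def _find(self, key):
        # Pick the page whose key range should hold this key.
        for p in self.pages:
            if not p or key <= p[-1]:
                return p
        return self.pages[-1]

    def insert(self, key):
        p = self._find(key)
        bisect.insort(p, key)                 # shift within one page only
        if len(p) > PAGE:                     # split an oversize page in two
            i = self.pages.index(p)
            half = len(p) // 2
            self.pages[i:i + 1] = [p[:half], p[half:]]

    def delete(self, key):
        p = self._find(key)
        i = bisect.bisect_left(p, key)
        if i < len(p) and p[i] == key:
            del p[i]

    def items(self):
        return [k for p in self.pages for k in p]

pl = PagedList()
for k in ["K07", "K03", "K01", "K09", "K05", "K02"]:
    pl.insert(k)
pl.delete("K05")
print(pl.items())   # → ['K01', 'K02', 'K03', 'K07', 'K09']
```

The point is the cost model: each update rewrites one bounded page rather than a succession of oversize blocks holding the whole 5-million-entry list.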

So there you go, Will. Your job for the weekend. Redesign the UniVerse
indexing so it works for large lists, and get Rocket to adopt it.

:)

Brian 

-----Original Message-----
From: u2-users-boun...@listserver.u2ug.org
[mailto:u2-users-boun...@listserver.u2ug.org] On Behalf Of Wjhonson
Sent: 04 October 2012 16:43
To: u2-users@listserver.u2ug.org
Subject: Re: [U2] Consuming Web Services

_______________________________________________
U2-Users mailing list
U2-Users@listserver.u2ug.org
http://listserver.u2ug.org/mailman/listinfo/u2-users
