[U2] [UD] Indexing 'Best Practices'

David Wolverton Tue, 01 Aug 2006 19:06:19 -0700

I have a scenario to describe and would like comments as to what others see
as a 'Best Practice' for Indexing. We're on UniData, currentish Windows
version, and have indexes in place on files that are starting to grow at
larger client sites and are heavily modified.


Here is a common index issue we face on some files, and I'd like thoughts
and comments...

A file contains about 500,000 records, growing 100,000 records a year, (but
I need to think about 5,000,000 growing 1,000,000 a year as well) and we
have indexed the record 'status' so that we can quickly grab the records
that are in varying stages of completion.  This status field occupies a
'real' attribute and contains entries like  OK, PEND, MAIL, WAIT, ENTRY, and
so on. As well, every record at some point and time is finally COMPLETE.
These statuses change ofen - so a given record may travel through 5 to 15
status changes before it becomes COMPLETE. 

So we have tens, and hundreds of records (and have to think about thousands)
in the different status 'buckets' and can grab those instantly via a SELECT
- and that is great and works like a champ -- except that we are also
carrying an index for COMPLETE which is just enormous (and essentially
useless) - every record in the file except for the currently active is
stamped COMPLETE.

We would never have a reason to 'look' at all complete records via this
index, as to do so would be almost the entire file. So if we did not have
this index, I don't see that it would be a problem.

SO... The choice:  Considering Time/Resources/Better Living... Would we be
better off to create a "Virtual" field definition to index that carries the
'real' code into the index, unless it is COMPLETE, in which case we would
make the virtual field pass a NULL and then index this new virtual field
using NoNulls?  Doing so would remove the COMPLETEs from the index - Would
the computation and resources be better served by getting rid of the index
for COMPLETE?  Or would be time spent 'deriving' the Indexes be more
'expensive' in the long run considering these statuses change a lot, which
means the computation would have to be run each time with the only 'net'
saving is the write to the COMPLETE index?

>From my view, the 'cost' to remove COMPLETE via a Virtual Field would only
save a disk operation in the life of the record - that is, the actual
'write' of the COMPLETE index record. That and some space on the disk. But
it does not appear to me that the 'size' of the index created has
significant operational cost - possibly not as much the expense of doing all
those Virtual Field computations along the way .  The way B+Tree works
(according to WikiPedia!), the system would just keep adding layers, and an
insert that triggers a B+Tree 'split' would cause at worst 3 or so extra
disk writes it appears - not a complete cascade. Plus, if we ever DO need
it, COMPLETE is indexed for selection.

Or are disk writes evil incarnate and the Virtual Field removal of that
operations is good riddance?

I see arguments both ways -- Thoughts?  I'm sure others have a similar
issue, and I'm wondering what has been found to be the 'best practice' here.

David Wolverton
-------
u2-users mailing list
[email protected]
To unsubscribe please visit http://listserver.u2ug.org/

[U2] [UD] Indexing 'Best Practices'

Reply via email to