I have a scenario to describe and would like comments as to what others see as a 'Best Practice' for Indexing. We're on UniData, currentish Windows version, and have indexes in place on files that are starting to grow at larger client sites and are heavily modified.
Here is a common index issue we face on some files, and I'd like thoughts and comments... A file contains about 500,000 records, growing 100,000 records a year, (but I need to think about 5,000,000 growing 1,000,000 a year as well) and we have indexed the record 'status' so that we can quickly grab the records that are in varying stages of completion. This status field occupies a 'real' attribute and contains entries like OK, PEND, MAIL, WAIT, ENTRY, and so on. As well, every record at some point and time is finally COMPLETE. These statuses change ofen - so a given record may travel through 5 to 15 status changes before it becomes COMPLETE. So we have tens, and hundreds of records (and have to think about thousands) in the different status 'buckets' and can grab those instantly via a SELECT - and that is great and works like a champ -- except that we are also carrying an index for COMPLETE which is just enormous (and essentially useless) - every record in the file except for the currently active is stamped COMPLETE. We would never have a reason to 'look' at all complete records via this index, as to do so would be almost the entire file. So if we did not have this index, I don't see that it would be a problem. SO... The choice: Considering Time/Resources/Better Living... Would we be better off to create a "Virtual" field definition to index that carries the 'real' code into the index, unless it is COMPLETE, in which case we would make the virtual field pass a NULL and then index this new virtual field using NoNulls? Doing so would remove the COMPLETEs from the index - Would the computation and resources be better served by getting rid of the index for COMPLETE? Or would be time spent 'deriving' the Indexes be more 'expensive' in the long run considering these statuses change a lot, which means the computation would have to be run each time with the only 'net' saving is the write to the COMPLETE index? >From my view, the 'cost' to remove COMPLETE via a Virtual Field would only save a disk operation in the life of the record - that is, the actual 'write' of the COMPLETE index record. That and some space on the disk. But it does not appear to me that the 'size' of the index created has significant operational cost - possibly not as much the expense of doing all those Virtual Field computations along the way . The way B+Tree works (according to WikiPedia!), the system would just keep adding layers, and an insert that triggers a B+Tree 'split' would cause at worst 3 or so extra disk writes it appears - not a complete cascade. Plus, if we ever DO need it, COMPLETE is indexed for selection. Or are disk writes evil incarnate and the Virtual Field removal of that operations is good riddance? I see arguments both ways -- Thoughts? I'm sure others have a similar issue, and I'm wondering what has been found to be the 'best practice' here. David Wolverton ------- u2-users mailing list [email protected] To unsubscribe please visit http://listserver.u2ug.org/
