>From: Zeugswetter Andreas DAZ SD <[EMAIL PROTECTED]> >Sent: Sep 29, 2005 9:28 AM >Subject: RE: [HACKERS] [PERFORM] A Better External Sort? > >>In my original example, a sequential scan of the 1TB of 2KB >>or 4KB records, => 250M or 500M records of data, being sorted >>on a binary value key will take ~1000x more time than reading >>in the ~1GB Btree I described that used a Key+RID (plus node >>pointers) representation of the data. > >Imho you seem to ignore the final step your algorithm needs of >collecting the data rows. After you sorted the keys the collect >step will effectively access the tuples in random order (given a >sufficiently large key range). > "Collecting" the data rows can be done for each RAM buffer full of of data in one pass through RAM after we've built the Btree. Then if desired those data rows can be read out to HD in sorted order in essentially one streaming burst. This combination of index build + RAM buffer rearrangement + write results to HD can be repeat as often as needed until we end up with an overall Btree index and a set of sorted sublists on HD. Overall HD IO for the process is only two effectively sequential passes through the data.
Subsequent retrieval of the sorted information from HD can be done at full HD streaming speed and whatever we've decided to save to HD can be reused later if we desire. Hope this helps, Ron ---------------------------(end of broadcast)--------------------------- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly