>From: "Jeffrey W. Baker" <[EMAIL PROTECTED]> >Sent: Sep 29, 2005 12:27 AM >To: Ron Peacetree <[EMAIL PROTECTED]> >Cc: email@example.com, firstname.lastname@example.org >Subject: Re: [HACKERS] [PERFORM] A Better External Sort? > >You are engaging in a length and verbose exercise in mental >masturbation, because you have not yet given a concrete example of a >query where this stuff would come in handy. A common, general-purpose >case would be the best. > ??? I posted =3= specific classes of common, general-purpose query operations where OES and the OES Btrees look like they should be superior to current methods: 1= when splitting sorting or other operations across multiple CPUs 2= when doing joins of different tables by doing the join on these Btrees rather than the original tables. 3= when the opportunity arises to reuse OES Btree results of previous sorts for different keys in the same table. Now we can combine the existing Btrees to obtain the new order based on the composite key without ever manipulating the original, much larger, table.
In what way are these examples not "concrete"? >We can all see that the method you describe might be a good way to sort >a very large dataset with some known properties, which would be fine if >you are trying to break the terasort benchmark. But that's not what >we're doing here. We are designing and operating relational databases. >So please explain the application. > This is a GENERAL method. It's based on CPU cache efficient Btrees that use variable length prefix keys and RIDs. It assumes NOTHING about the data or the system in order to work. I gave some concrete examples for the sake of easing explanation, NOT as an indication of assumptions or limitations of the method. I've even gone out of my way to prove that no such assumptions or limitations exist. Where in the world are you getting such impressions? >Your main example seems to focus on a large table where a key column has >constrained values. This case is interesting in proportion to the >number of possible values. If I have billions of rows, each having one >of only two values, I can think of a trivial and very fast method of >returning the table "sorted" by that key: make two sequential passes, >returning the first value on the first pass and the second value on the >second pass. This will be faster than the method you propose. > 1= No that was not my main example. It was the simplest example used to frame the later more complicated examples. Please don't get hung up on it. 2= You are incorrect. Since IO is the most expensive operation we can do, any method that makes two passes through the data at top scanning speed will take at least 2x as long as any method that only takes one such pass. >I think an important aspect you have failed to address is how much of >the heap you must visit after the sort is complete. If you are >returning every tuple in the heap then the optimal plan will be very >different from the case when you needn't. > Hmmm. Not sure which "heap" you are referring to, but the OES Btree index is provably the lowest (in terms of tree height) and smallest possible CPU cache efficient data structure that one can make and still have all of the traditional benefits associated with a Btree representation of a data set. Nonetheless, returning a RID, or all RIDs with(out) the same Key, or all RIDs (not) within a range of Keys, or simply all RIDs in sorted order is efficient. Just as should be for a Btree (actually it's a B+ tree variant to use Knuth's nomenclature). I'm sure someone posting from acm.org recognizes how each of these Btree operations maps to various SQL features... I haven't been talking about query plans because they are orthogonal to the issue under discussion? If we use a layered model for PostgreSQL's architecture, this functionality is more primal than that of a query planner. ALL query plans that currently involve sorts will benefit from a more efficient way to do, or avoid, sorts. >PS: Whatever mailer you use doesn't understand or respect threading nor >attribution. Out of respect for the list's readers, please try a mailer >that supports these 30-year-old fundamentals of electronic mail. > That is an issue of infrastructure on the recieving side, not on the sending (my) side since even my web mailer seems appropriately RFC conformant. Everything seems to be going in the correct places and being properly organized on archival.postgres.org ... Ron ---------------------------(end of broadcast)--------------------------- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match