Markus Schaber wrote:
Ron wrote:
...and of course if you know enough about the data to be sorted so as to
constrain it appropriately, one should use a non comparison based O(N)
sorting algorithm rather than any of the general comparison based
O(NlgN) methods.

Sounds interesting, could you give us some pointers (names, URLs,
papers) to such algorithms?

Most of these techniques boil down to good ol' "bucket sort".  A simple example: suppose 
you have a column of integer percentages, range zero to 100.  You know there are only 101 distinct 
values.  So create 101 "buckets" (e.g. linked lists), make a single pass through your 
data and drop each row's ID into the right bucket, then make a second pass through the buckets, and 
write the row ID's out in bucket order.  This is an O(N) sort technique.

Any time you have a restricted data range, you can do this.  Say you have 100 
million rows of scientific results known to be good to only three digits -- it 
can have at most 1,000 distinct values (regardless of the magnitude of the 
values), so you can do this with 1,000 buckets and just two passes through the 
data.

You can also use this trick when the optimizer is asked for "fastest first result." Say you have a cursor on a column of numbers with good distribution. If you do a bucket sort on the first two or three digits only, you know the first "page" of results will be in the first bucket. So you only need to apply qsort to that first bucket (which is very fast, since it's small), and you can deliver the first page of data to the application. This can be particularly effective in interactive situations, where the user typically looks at a few pages of data and then abandons the search.
I doubt this is very relevant to Postgres.  A relational database has to be general 
purpose, and it's hard to give it "hints" that would tell it when to use this 
particular optimization.

Craig

---------------------------(end of broadcast)---------------------------
TIP 9: In versions below 8.0, the planner will ignore your desire to
      choose an index scan if your joining column's datatypes do not
      match

Reply via email to