On 13 April 2012 16:03, Greg Stark <st...@mit.edu> wrote: > Fwiw it also only holds for comparison sorts. If you can sort your > items without actually comparing each item with the others then you > aren't bound by it at all. Notably algorithms like radix sort and > direct sort aren't bound by it and are O(n). I'm hoping to look at > trying some of these for integer sorts when they apply using the new > sort specialization code Peter added.
Actually, though a later revision of the patch did nominally allow for user-defined sorting functions through the SortSupport infrastructure (I didn't intend that it would be complete/useful enough to be really worth documenting), the version that was finally committed did not. However, there is a fairly obvious entry point for a radix sort, which is here: /* * We were able to accumulate all the tuples within the allowed * amount of memory. Just qsort 'em and we're done. */ if (state->memtupcount > 1) { /* Can we use the single-key sort function? */ if (state->onlyKey != NULL) qsort_ssup(state->memtuples, state->memtupcount, state->onlyKey); else qsort_tuple(state->memtuples, state->memtupcount, state->comparetup, state); } The only specialisation that occurs here is between tuple sorts of a single key and multiple (type-specific specialisations were ultimately not judged to be worth the increase in binary size relative to their diminishing benefits). You'd probably set-up a type-specific positional notation machinery within the state's SortSupport struct (the type-specific abstraction through which we elide the SQL function call machinery for types that support it). One insight that I had at the time was that text comparisons where the c locale isn't used are really rather expensive, and I doubt that there is too much that can be done to address that directly. However, if we were to support timsort, that could substantially cut down on the number of comparisons for text columns, which could be a real win. Maybe that would happen through some explicit mechanism, or maybe the planner could actually infer that it was likely to be optimal to use timsort based on a high statistical correlation between physical row ordering and logical ordering of a key column's values. -- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers