Re: [HACKERS] Memory usage during sorting

Peter Geoghegan Fri, 13 Apr 2012 09:43:29 -0700

On 13 April 2012 16:03, Greg Stark <st...@mit.edu> wrote:
> Fwiw it also only holds for comparison sorts. If you can sort your
> items without actually comparing each item with the others then you
> aren't bound by it at all. Notably algorithms like radix sort and
> direct sort aren't bound by it and are O(n). I'm hoping to look at
> trying some of these for integer sorts when they apply using the new
> sort specialization code Peter added.


Actually, though a later revision of the patch did nominally allow for
user-defined sorting functions through the SortSupport infrastructure
(I didn't intend that it would be complete/useful enough to be really
worth documenting), the version that was finally committed did not.
However, there is a fairly obvious entry point for a radix sort, which
is here:

                        /*
                         * We were able to accumulate all the tuples within the 
allowed
                         * amount of memory.  Just qsort 'em and we're done.
                         */
                        if (state->memtupcount > 1)
                        {
                                /* Can we use the single-key sort function? */
                                if (state->onlyKey != NULL)
                                        qsort_ssup(state->memtuples, 
state->memtupcount,
                                                           state->onlyKey);
                                else
                                        qsort_tuple(state->memtuples,
                                                                
state->memtupcount,
                                                                
state->comparetup,
                                                                state);
                        }

The only specialisation that occurs here is between tuple sorts of a
single key and multiple (type-specific specialisations were ultimately
not judged to be worth the increase in binary size relative to their
diminishing benefits).  You'd probably set-up a type-specific
positional notation machinery within the state's SortSupport struct
(the type-specific abstraction through which we elide the SQL function
call machinery for types that support it).

One insight that I had at the time was that text comparisons where the
c locale isn't used are really rather expensive, and I doubt that
there is too much that can be done to address that directly.  However,
if we were to support timsort, that could substantially cut down on
the number of comparisons for text columns, which could be a real win.
Maybe that would happen through some explicit mechanism, or maybe the
planner could actually infer that it was likely to be optimal to use
timsort based on a high statistical correlation between physical row
ordering and logical ordering of a key column's values.

-- 
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Memory usage during sorting

Reply via email to