Re: [HACKERS] Progress on fast path sorting, btree index creation time

Robert Haas Thu, 05 Jan 2012 12:23:44 -0800

On Thu, Dec 29, 2011 at 9:03 PM, Peter Geoghegan <pe...@2ndquadrant.com> wrote:
> 3. Resolve two anticipated controversies that are, respectively,
> somewhat orthogonal and completely orthogonal to the binary bloat
> controversy. The first (controversy A) is that I have added a new
> piece of infrastructure, pg_always_inline, which, as the name
> suggests, is a portable way of insisting that a function should be
> invariably inlined. Portable libraries like libc have been doing this
> for a long time now, and I actually use the libc macro (that expands
> to __attribute__((always_inline)) ) where possible.


I don't have a problem with the idea of a pg_always_inline, but I'm
wondering what sort of fallback mechanism you propose.  It seems to me
that if the performance improvement is conditioned on inlining be done
and we're not confident that we can force the compiler to inline, we
ought to fall back to not making the specialization available, rather
than to making something available that may not work well.  Of course
if it's only a question of a smaller performance gain, rather than an
actual loss, then that's not as much of an issue...

> The second
> possible point of contention (controversy B) is that I have jettisoned
> various protections against bad qsort implementations that I believe
> are a legacy of when we used the system qsort pre-2006, that can no
> longer be justified. For example, quick sort performing badly in the
> face of lots of duplicates is a well understood problem today
> (partitioning should stop on equal keys), and ISTM that protecting
> against that outside the qsort implementation (but only for index
> sorts) is wrong-headed.

Assuming you mean that qsort_arg() will protect against this stuff so
that callers don't need to do it separately, +1.

> I had another idea when writing this patch that I haven't developed at
> all but I'll share anyway. That is, it might be a good idea to use a
> balance quicksort:
>
> http://xlinux.nist.gov/dads//HTML/balancedqsrt.html

I can't find any description of how this actually works... obviously,
we'd like to find a close-to-median element, but how do we actually do
that cheaply?

> It might be possible to get a reasonable approximation of the actual
> median value of a given column with existing statistics, which could
> be hinted to qsort_arg.

I doubt that this would be worthwhile -- the pivot that we pick at the
toplevel doesn't really matter much; even if it's bad, we can recover
just fine if the pivot we pick one level down is good, or the level
after that.  We only really have a problem if we keep systematically
picking bad pivots every time.  And if we do have that problem,
improving the toplevel choice of pivot is only going to help modestly,
because we'll still be O(n^2) on each partition.

> Can I get a straw poll on how much of a problem worst-case performance
> of qsort is believed to be?

I'd be reluctant to remove any of the protections we currently have,
because I bet they were all put in to fix problems that people hit in
the field.  But I don't know that we need more.  Of course,
consolidating them into qsort() itself rather than its callers
probably makes sense, as noted above.

> In a perfect world, if it were somehow possible to know the perfect
> pivots ahead of time from a histogram or something, we'd have a quick
> sort variant with worst-case performance of O(n log(n)). That, or the
> limited approximation that I've sketched would perhaps be worthwhile,
> even if it was subject to a number of caveats. Wikipedia claims that
> the worst case for quick sort is O(n log(n)) with the refinements
> recommended by Sedgewick's 1978 paper, but this seems like nonsense to
> me - the med3 technique is a good heuristic in practice, but it's
> perfectly possible in theory for it's sampling to always get things
> completely wrong (this is not an unfamiliar problem).

I agree.  After reading
http://nanxkurniawan.wordpress.com/2010/05/31/quicksort/ I am inclined
to think that things are still O(n lg n) even if we always partition
so that any fixed percentage of the data -- is on one side of the
partition element and the remainder is on the other side.  So for
example if all of our partition elements are greater than 1% of the
elements in their partition and less than the other 99%, we'll still
be O(n lg n), though with a considerably higher constant factor, I
think.  However, if our partition elements are always a fixed NUMBER
of elements from the end - whether it's 1 or 100 - then we'll be
O(n^2), though again the constant factor will depend on how many
elements you knock off each time.  In practice I'm not sure whether an
algorithmic analysis matters much, because in real life the constant
factors are probably pretty important, hence the median-of-medians
approach with n > 40.

> While I'm thinking out loud, here's another idea: Have a GUC which,
> when enabled (perhaps potentially at various different granularities),
> makes Timsort the in-memory sorting algorithm, as it now is by default
> for Python, Java SE7 (for arrays) and the Android platform; for
> certain applications, this could be a real win, and I don't think that
> it would have to be invasive: it could be dropped in.

I gather from a quick read that this is supposed to be especially good
when the data is already somewhat sorted.  Would there be any merit in
trying to guess when that's likely to be the case (e.g. based on the
correlation statistic)?   That seems like a stretch, but OTOH a GUC
doesn't feel great either: what is the poor user to do with a query
that does 2 sorts, one of which is faster with QS and the other of
which is faster with Timsort?  Ugh.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Progress on fast path sorting, btree index creation time

Reply via email to