On Thu, Dec 29, 2011 at 9:03 PM, Peter Geoghegan <pe...@2ndquadrant.com> wrote:
> 3. Resolve two anticipated controversies that are, respectively,
> somewhat orthogonal and completely orthogonal to the binary bloat
> controversy. The first (controversy A) is that I have added a new
> piece of infrastructure, pg_always_inline, which, as the name
> suggests, is a portable way of insisting that a function should be
> invariably inlined. Portable libraries like libc have been doing this
> for a long time now, and I actually use the libc macro (that expands
> to __attribute__((always_inline)) ) where possible.
I don't have a problem with the idea of a pg_always_inline, but I'm
wondering what sort of fallback mechanism you propose. It seems to me
that if the performance improvement is conditioned on inlining being
done, and we're not confident that we can force the compiler to
inline, we ought to fall back to not making the specialization
available, rather than making something available that may not work
well. Of course, if it's only a question of a smaller performance
gain, rather than an actual loss, that's not as much of an issue...

> The second possible point of contention (controversy B) is that I
> have jettisoned various protections against bad qsort
> implementations that I believe are a legacy of when we used the
> system qsort pre-2006, and that can no longer be justified. For
> example, quicksort performing badly in the face of lots of
> duplicates is a well understood problem today (partitioning should
> stop on equal keys), and ISTM that protecting against that outside
> the qsort implementation (but only for index sorts) is wrong-headed.

Assuming you mean that qsort_arg() will protect against this stuff so
that callers don't need to do it separately, +1.

> I had another idea when writing this patch that I haven't developed
> at all, but I'll share anyway. That is, it might be a good idea to
> use a balanced quicksort:
>
> http://xlinux.nist.gov/dads//HTML/balancedqsrt.html

I can't find any description of how this actually works... obviously,
we'd like to find a close-to-median element, but how do we actually do
that cheaply?

> It might be possible to get a reasonable approximation of the actual
> median value of a given column with existing statistics, which could
> be hinted to qsort_arg.

I doubt that this would be worthwhile -- the pivot that we pick at the
top level doesn't really matter much; even if it's bad, we can recover
just fine if the pivot we pick one level down is good, or the level
after that.
We only really have a problem if we keep systematically picking bad
pivots every time. And if we do have that problem, improving the
top-level choice of pivot will only help modestly, because we'll still
be O(n^2) on each partition.

> Can I get a straw poll on how much of a problem worst-case
> performance of qsort is believed to be?

I'd be reluctant to remove any of the protections we currently have,
because I bet they were all put in to fix problems that people hit in
the field. But I don't know that we need more. Of course,
consolidating them into qsort() itself rather than its callers
probably makes sense, as noted above.

> In a perfect world, if it were somehow possible to know the perfect
> pivots ahead of time from a histogram or something, we'd have a
> quicksort variant with worst-case performance of O(n log(n)). That,
> or the limited approximation that I've sketched, would perhaps be
> worthwhile, even if it were subject to a number of caveats.
> Wikipedia claims that the worst case for quicksort is O(n log(n))
> with the refinements recommended by Sedgewick's 1978 paper, but this
> seems like nonsense to me - the med3 technique is a good heuristic
> in practice, but it's perfectly possible in theory for its sampling
> to always get things completely wrong (this is not an unfamiliar
> problem).

I agree. After reading
http://nanxkurniawan.wordpress.com/2010/05/31/quicksort/ I am inclined
to think that things are still O(n lg n) even if we always partition
so that a fixed percentage of the data is on one side of the partition
element and the remainder is on the other side. So, for example, if
all of our partition elements are greater than 1% of the elements in
their partition and less than the other 99%, we'll still be O(n lg n),
though with a considerably higher constant factor, I think.
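[The 1%/99% claim can be checked with the standard recurrence
argument -- a sketch added here for clarity, not from the thread:]

```latex
% If every pivot splits off at least a fixed fraction (say 1%) of its
% partition, the recursion tree has depth at most \log_{100/99} n,
% with O(n) total work per level of the tree, so
\[
  T(n) \le T(n/100) + T(99n/100) + cn
  \;\Longrightarrow\;
  T(n) = O\!\left(\frac{n \ln n}{\ln(100/99)}\right) = O(n \log n),
\]
% i.e. still $O(n \log n)$, but with a constant factor roughly
% $1/\ln(100/99) \approx 100$ times larger than an even split's.
```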
However, if our partition elements are always a fixed NUMBER of
elements from the end - whether it's 1 or 100 - then we'll be O(n^2),
though again the constant factor will depend on how many elements you
knock off each time. In practice I'm not sure whether the algorithmic
analysis matters much, because in real life the constant factors are
probably pretty important - hence the median-of-medians approach with
n > 40.

> While I'm thinking out loud, here's another idea: have a GUC which,
> when enabled (perhaps potentially at various different
> granularities), makes Timsort the in-memory sorting algorithm, as it
> now is by default for Python, Java SE7 (for arrays) and the Android
> platform; for certain applications, this could be a real win, and I
> don't think that it would have to be invasive: it could be dropped
> in.

I gather from a quick read that this is supposed to be especially good
when the data is already somewhat sorted. Would there be any merit in
trying to guess when that's likely to be the case (e.g. based on the
correlation statistic)? That seems like a stretch, but OTOH a GUC
doesn't feel great either: what is the poor user to do with a query
that does two sorts, one of which is faster with quicksort and the
other of which is faster with Timsort? Ugh.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (firstname.lastname@example.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers