On Mon, Jul 1, 2013 at 3:54 PM, Claudio Freire <klaussfre...@gmail.com> wrote: > On Mon, Jul 1, 2013 at 4:32 PM, Robert Haas <robertmh...@gmail.com> wrote: >>> This shouldn't be too complex, and should give us a fixed nlogn complexity >>> even for wild data sets, without affecting existing normal data sets that >>> are present in every day transactions. I even believe that those data sets >>> will also benefit from the above optimisation. >> >> The only method of selecting a pivot for quicksort that obtain O(n lg >> n) run time with 100% certainty is have a magical oracle inside the >> computer that tells you in fixed time and with perfect accuracy which >> pivot you should select. > > Doesn't a linear median algorithm (say median of medians) get you O(n lg n)? > > Granted, with a huge constant (I think 4)... but it should still be O(n lg n).

No. Thinking about this a little more, I believe the way it works out is that any algorithm for picking the median that guarantees that a certain *percentage* of the tuples will be in the smaller partition will have O(n lg n) complexity, but any algorithm that only guarantees that a fixed *number* of tuples in the smaller partition is still quadratic in complexity. In the case of a median algorithm, you're only guaranteed to have 2 elements in the smaller partition, which is a constant. If you take a median of medians, you're guaranteed to have 8 elements in the smaller partition, which is bigger, but still a constant. The reason why this doesn't matter much in practice is because the data distribution that causes quadratic behavior for median-of-medians is not one which is likely to occur in the real world and will probably only come up if chosen by an adversary who is attempting to make your life miserable. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers