On Wed, Aug 10, 2016 at 11:59 AM, Robert Haas <robertmh...@gmail.com> wrote: > I think that last part is a very important property; my intuition is > that dividing up the work between cooperating processes in a way that > should come out equal will often fail to do so, either due to the > operating system scheduler or due to some data being cached and other > data not being cached or due to the comparator running faster on some > data than other data or due to NUMA effects that make some processes > run faster than others or due to any number of other causes. So I > think that algorithms that allocate the work dynamically are going to > greatly outperform those that use a division of labor which is fixed > at the beginning of a computation phase.
I agree that dynamic sampling has big advantages. Our Quicksort implementation does dynamic sampling, of course. You need to be strict about partition boundaries: they may not be drawn at a point in the key space that is not precisely defined, and in general there can be no ambiguity about what bucket a tuple can end up in ahead of time. In other words, you cannot carelessly allow equal tuples to go on either side of an equal boundary key. The reason for this restriction is that otherwise, stuff breaks when you later attempt to "align" boundaries across sort operations that are performed in parallel. I don't think you can introduce an artificial B-Tree style tie-breaker condition to avoid the problem, because that will slow things right down (B&M Quicksort does really well with many equal keys). When you have one really common value, load balancing for partitioning just isn't going to do very well. My point is that there will be a somewhat unpleasant worst case that will need to be accepted. It's not practical to go to the trouble of preventing it entirely. So, the comparison with quicksort works on a couple of levels. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (firstname.lastname@example.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers