On Tue, Nov 24, 2015 at 5:42 PM, Greg Stark <st...@mit.edu> wrote:
> Actually I kind of agree. What I would like to see is a series of
> numbers for increasing sizes of sorts plotted against the same series
> for the existing algorithm. Specifically with the sort size varying to
> significantly more than the physical memory on the machine. For
> example on a 16GB machine sorting data ranging from 1GB to 128GB.
There already was a test case involving a 1TB/16 billion tuple sort (well, a 1TB gensort Postgres table). Granted, I don't have a large number of similar test cases across a variety of scales, but there are only so many hours in the day. Disappointingly, the results at that scale were merely good, not great, but there were probably various flaws in how representative the hardware used was.

> There's a lot more information in a series of numbers than individual
> numbers. We'll be able to see whether all our pontificating about the
> rates of growth of costs of different algorithms or which costs
> dominate at which scales are actually borne out in reality.

You yourself said that 1GB is sufficient to get a single-pass merge phase for a sort of about 4TB - 8TB, so I think the discussion of the growth in costs tells us plenty about what can happen at the high end. My approach might help less overall, but it certainly won't falter. See the 1TB test case -- output from trace_sort is all there.

> And see
> where the break points are where I/O overtakes memory costs. And it'll
> be clearer where to look for problematic cases where the new algorithm
> might not dominate the old one.

I/O doesn't really overtake memory cost -- if it does, then it should be worthwhile to throw more sequential I/O bandwidth at the problem, which is a realistic, economical solution with a mature implementation (unlike buying more memory bandwidth). I didn't do that with the 1TB test case.
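To make the single-pass merge figure concrete, here is my back-of-the-envelope arithmetic (not from the thread; the 256KB per-tape read buffer is an assumption -- the real figure depends on how workMem is divided among tapes):

```python
# Rough single-pass merge capacity estimate for an external sort.
# Assumptions: each quicksorted run is about work_mem in size, and the
# merge keeps one read buffer per run in memory at once.

GiB = 1024 ** 3
KiB = 1024

work_mem = 1 * GiB          # memory available to the sort
tape_buffer = 256 * KiB     # assumed per-run read buffer during the merge

run_size = work_mem                      # one quicksorted run per memory load
merge_order = work_mem // tape_buffer    # runs mergeable in a single pass
single_pass_capacity = merge_order * run_size

print(f"merge order: {merge_order} runs")
print(f"single-pass capacity: {single_pass_capacity // 1024**4} TiB")
```

With a 256KB buffer that works out to 4096 runs, or 4TiB in one pass; halving the buffer to 128KB doubles it to 8TiB, which brackets the 4TB - 8TB range mentioned above.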
If you assume, as cost_sort() does, that it takes N log2(N) comparisons to sort some tuples, then it breaks down like this:

10 items require 33 comparisons, ratio 3.32192809489
100 items require 664 comparisons, ratio 6.64385618977
1,000 items require 9,965 comparisons, ratio 9.96578428466
1,000,000 items require 19,931,568 comparisons, ratio 19.9315685693
1,000,000,000 items require 29,897,352,853 comparisons, ratio 29.897352854
16,000,000,000 items require 542,357,645,663 comparisons, ratio 33.897352854

The cost of writing out and reading runs should be more or less in linear proportion to their size, which is a totally different story. That's the main reason why "quicksort with spillover" is aimed at relatively small sorts, of which we expect more overall.

I think the big issue is that a non-parallel sort is significantly under-powered when you go to sort 16 billion tuples. It's probably not very sensible to do so if you have a choice of parallelizing the sort. There is no plausible way to do replacement selection in parallel, since you cannot know ahead of time with any accuracy where to partition work among workers, as runs can end up arbitrarily larger than memory with presorted inputs. That might be the single best argument for what I propose to do here.

This is what Corey's case showed for the final run with 30GB maintenance_work_mem:

LOG: starting quicksort of run 40: CPU 1815.99s/19339.80u sec elapsed 24910.38 sec
LOG: finished quicksorting run 40: CPU 1820.09s/19565.94u sec elapsed 25140.69 sec
LOG: finished writing run 40 to tape 39: CPU 1833.76s/19642.11u sec elapsed 25234.44 sec

(Note that the time taken to copy tuples comprising the final run is not displayed or accounted for.)

This is the second-to-last run, run 40, so it uses the full 30GB of maintenance_work_mem. We spent 00:01:33.75 writing the run, but 00:03:50.31 just sorting it. That's roughly the same ratio that I see on my laptop with far smaller runs.
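For anyone who wants to reproduce or extend the comparison-count table, it's just N * log2(N), truncated -- a quick sketch of mine, not code from cost_sort() itself:

```python
import math

# N * log2(N) comparison estimates, as cost_sort() assumes.
for n in (10, 100, 1_000, 1_000_000, 1_000_000_000, 16_000_000_000):
    ratio = math.log2(n)          # comparisons per item
    comparisons = int(n * ratio)  # truncated, matching the table above
    print(f"{n:,} items require {comparisons:,} comparisons, ratio {ratio}")
```

Doubling the input a little more than doubles the comparisons, while the run write/read cost grows only linearly -- that's the asymmetry the table is illustrating.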
I think the difference isn't wider because the server is quite I/O bound -- but we could fix that by adding more disks.

http://www.postgresql.org/message-id/CAM3SWZQtdd=q+ef1xszayg1cioyqj7szfcl08gyqchpjtgn...@mail.gmail.com

https://github.com/petergeoghegan/gensort

--
Peter Geoghegan