On Fri, Apr 14, 2017 at 5:57 AM, Robert Haas <robertmh...@gmail.com> wrote:
> I don't think there's any one fixed answer, because increasing the
> number of tapes reduces I/O by adding CPU cost, and vice versa.

Sort of, but if you have to merge hundreds of runs (a situation that should be quite rare), then you should be concerned about being CPU bound first, as Knuth was. Besides, on modern hardware, read-ahead can be more effective if you have more merge passes, up to a point, which might also make extra passes worth it -- using hundreds of tapes results in plenty of *random* I/O. Plus, most of the time you only do a second pass over a subset of the initial quicksorted runs -- not all of them.

Probably the main complicating factor that Knuth doesn't care about is the time to return the first tuple -- startup cost. That was a big advantage of commit df700e6 that I should have mentioned.

I'm not seriously suggesting that we should prefer multiple passes in the vast majority of real world cases, nor am I suggesting that we should go out of our way to help cases that need to do that. I just find all this interesting.

--
Peter Geoghegan

VMware vCenter Server
https://www.vmware.com/
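
To put rough numbers on the tapes-versus-passes trade-off discussed above, here is a minimal back-of-the-envelope sketch in Python. It is not how tuplesort.c works: it assumes a simple balanced merge in which every pass reads and writes all of the data (so it ignores the point that a second pass often covers only a subset of the runs), and it assumes a binary heap costing about log2(k) comparisons per tuple for a k-way merge. The names merge_passes and comparisons_per_tuple are hypothetical helpers made up for this illustration.

```python
import math


def merge_passes(num_runs: int, merge_order: int) -> int:
    """Passes needed to reduce num_runs sorted runs to one run.

    Assumes a balanced merge: each pass merges groups of up to
    merge_order runs, and every pass touches all of the data.
    """
    if num_runs < 1:
        raise ValueError("num_runs must be at least 1")
    if merge_order < 2:
        raise ValueError("merge_order must be at least 2")
    passes = 0
    while num_runs > 1:
        # Each group of up to merge_order runs becomes one longer run.
        num_runs = math.ceil(num_runs / merge_order)
        passes += 1
    return passes


def comparisons_per_tuple(num_runs: int, merge_order: int) -> float:
    """Rough heap comparisons per tuple, summed over all passes.

    Assumes about log2(k) comparisons per tuple for a k-way heap
    merge; real costs also depend on cache behaviour.
    """
    return merge_passes(num_runs, merge_order) * math.log2(merge_order)


if __name__ == "__main__":
    runs = 500
    for order in (7, 23, 500):
        print(order, merge_passes(runs, order),
              round(comparisons_per_tuple(runs, order), 1))
```

Under these assumptions, a wider merge buys fewer passes over the data (less total I/O) at the price of a larger heap and more scattered reads per pass, which is the tension described in the message.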