On Sun, Sep 11, 2016 at 9:01 AM, Tom Lane <t...@sss.pgh.pa.us> wrote:
> Peter Geoghegan <p...@heroku.com> writes:
>> I think that we *can* refine this guess, and should, because random
>> I/O is really quite unlikely to be a large cost these days (I/O in
>> general often isn't a large cost, actually). More fundamentally, I
>> think it's a problem that cost_sort() thinks that external sorts are
>> far more expensive than internal sorts in general. There is good
>> reason to think that that does not reflect the reality. I think we can
>> expect external sorts to be *faster* than internal sorts with
>> increasing regularity in Postgres 10.
>
> TBH, if that's true, haven't you broken something?

It's possible for external sorts to be faster some of the time because
the memory access patterns can be more cache efficient: smaller runs
are better when accessing tuples in sorted order, scattered across
memory. More importantly, the sort can start returning tuples earlier
in the common case where a final on-the-fly merge can be performed. In
principle, you could adapt internal sorts to have the same advantages,
but that hasn't happened and probably won't.

Finally, the external sort I/O costs grow linearly, whereas the CPU
costs grow in a linearithmic fashion, so the CPU costs will eventually
come to dominate. We can hide the latency of the I/O pretty well, too,
with asynchronous I/O.

I'm not arguing that cost_sort() should think that external sorts are
cheaper under any circumstances, since all of this is very hard to
model. I only mention this because it illustrates nicely that
cost_sort() has the wrong idea.

--
Peter Geoghegan
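
A toy sketch of the linear-versus-linearithmic point above. Everything
in it is an assumption made for illustration: the constants, the
function names, and the single merge pass are hypothetical, and none of
it is taken from cost_sort(). It only shows that if I/O grows as n and
comparisons grow as n log n, the I/O share of the total shrinks as the
input grows.

```python
import math

# Hypothetical per-unit costs, chosen only for illustration. They are
# NOT the values or the formula used by PostgreSQL's cost_sort().
CPU_COST_PER_COMPARISON = 0.0025
IO_COST_PER_PAGE = 1.0
TUPLES_PER_PAGE = 100


def cpu_cost(ntuples):
    """Comparison cost, growing as n * log2(n) (linearithmic)."""
    if ntuples < 2:
        return 0.0
    return CPU_COST_PER_COMPARISON * ntuples * math.log2(ntuples)


def io_cost(ntuples):
    """I/O cost, growing linearly in n.

    Assumes a single merge pass: every page is written once and read
    back once.
    """
    npages = math.ceil(ntuples / TUPLES_PER_PAGE)
    return IO_COST_PER_PAGE * 2 * npages


def io_share(ntuples):
    """Fraction of the modelled total cost that is I/O."""
    io = io_cost(ntuples)
    return io / (io + cpu_cost(ntuples))


if __name__ == "__main__":
    for n in (10**4, 10**6, 10**8):
        print(f"n={n:>11,}  I/O share of total: {io_share(n):.3f}")
```

Under these assumed constants the ratio of I/O to CPU cost is
proportional to 1 / log2(n), so it keeps falling as n grows. A real
model would also need to account for multiple merge passes and for
I/O that overlaps with CPU work.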