On Sun, Sep 11, 2016 at 9:01 AM, Tom Lane <t...@sss.pgh.pa.us> wrote:
> Peter Geoghegan <p...@heroku.com> writes:
>> I think that we *can* refine this guess, and should, because random
>> I/O is really quite unlikely to be a large cost these days (I/O in
>> general often isn't a large cost, actually). More fundamentally, I
>> think it's a problem that cost_sort() thinks that external sorts are
>> far more expensive than internal sorts in general. There is good
>> reason to think that that does not reflect the reality. I think we can
>> expect external sorts to be *faster* than internal sorts with
>> increasing regularity in Postgres 10.
> TBH, if that's true, haven't you broken something?

It's possible for external sorts to be faster some of the time because
the memory access patterns can be more cache efficient: smaller runs
are better when accessing tuples in sorted order, scattered across
memory. More importantly, the sort can start returning tuples earlier
in the common case where a final on-the-fly merge can be performed. In
principle, you could adopt internal sorts to have the same advantages,
but that hasn't and probably won't happen. Finally, the external sort
I/O costs grow linearly, whereas the CPU costs grow in a linearithmic
fashion, which will eventually come to dominate. We can hide the
latency of those costs pretty well, too, with asynchronous I/O.

I'm not arguing that cost_sort() should think that external sorts are
cheaper under any circumstances, since all of this is very hard to
model. I only mention this because it illustrates nicely that
cost_sort() has the wrong idea.

Peter Geoghegan

Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:

Reply via email to