On Sun, Sep 11, 2016 at 12:13 PM, Peter Geoghegan <p...@heroku.com> wrote: > On Sun, Sep 11, 2016 at 9:01 AM, Tom Lane <t...@sss.pgh.pa.us> wrote: >> Peter Geoghegan <p...@heroku.com> writes: >>> I think that we *can* refine this guess, and should, because random >>> I/O is really quite unlikely to be a large cost these days (I/O in >>> general often isn't a large cost, actually). More fundamentally, I >>> think it's a problem that cost_sort() thinks that external sorts are >>> far more expensive than internal sorts in general. There is good >>> reason to think that that does not reflect the reality. I think we can >>> expect external sorts to be *faster* than internal sorts with >>> increasing regularity in Postgres 10. >> >> TBH, if that's true, haven't you broken something? > > It's possible for external sorts to be faster some of the time because > the memory access patterns can be more cache efficient: smaller runs > are better when accessing tuples in sorted order, scattered across > memory. More importantly, the sort can start returning tuples earlier > in the common case where a final on-the-fly merge can be performed. In > principle, you could adopt internal sorts to have the same advantages, > but that hasn't and probably won't happen. Finally, the external sort > I/O costs grow linearly, whereas the CPU costs grow in a linearithmic > fashion, which will eventually come to dominate. We can hide the > latency of those costs pretty well, too, with asynchronous I/O. > > I'm not arguing that cost_sort() should think that external sorts are > cheaper under any circumstances, since all of this is very hard to > model. I only mention this because it illustrates nicely that > cost_sort() has the wrong idea.
I think that the only real way to judge whether cost_sort() is any good is to see whether it causes the wrong plan to be chosen in some cases. For example, it could cause merge joins to be picked too frequently or too infrequently, or it could cause a wrong decision between hash and group aggregates. Now, there is bound to be an argument about whether any test cases that we might pick are representative and the results will further differ between hardware platforms, but I think changing the formula based on an abstract idea about what's happening is probably not a good idea. Because it's necessary to set work_mem so conservatively to avoid running out of memory, most supposedly-external sorts aren't really doing any I/O at all, and therefore arguing about whether we should be charging seq_page_cost or random_page_cost for I/O is kinda missing the point. Those numbers may just be reflecting other work that the sort is doing that isn't being properly accounted for elsewhere, or it's possible that the current costing of sorts is fairly far away from reality. How would we know? The only thing that makes me think they probably aren't *too* bad is that in a somewhat-systematic study of query performance problems (https://sites.google.com/site/robertmhaas/query-performance) done several years ago I found no evidence of a problem with sort costing, and my experience outside of that study bears that out. But that's pretty back of the envelope. Of course, the issue of supposed I/O actually being reads of cached data is not unique to sorting. Sequential scans have the same problem. We've previously discussed the idea of adding a cached_page_cost, but this has fallen down over the issue that you also need to know what percentage of the table is likely to be read from cache, and you don't want that estimate to be too unstable or your plans will bounce all over the place. One idea I had for dealing with that is to just base it on the size of the table: assume small tables are probably cached, and large tables probably aren't. The latter is certainly true at least in the case where "large" means "bigger than physical memory", and the former isn't a bad guess either. So basically where this would go is to assume that scanning a small table is cheaper per page than scanning a big one, which I suspect is probably better than the uniform estimate we're using today. However, I haven't got a single test case to prove it, and in fact I'm not even sure how one could go about figuring out whether this makes things better or worse in general. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (firstname.lastname@example.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers