Postgres 9.6 significantly altered the performance characteristics of sorting. While almost every external sort will see a benefit without any change in configuration, there is new scope for DBAs to tune the system to take better advantage of the improved implementation of external sorting. They should get guidance from us on this, to make it more likely that that actually happens.
I'd like to discuss where this guidance should be added, but first, a little background.

Firstly, DBAs may get a significant, appreciable benefit from making sure that temp_tablespaces puts temp files on a fast filesystem; this would probably have been far less helpful prior to 9.6. It seems reasonable to suppose that explicitly setting temp_tablespaces does not happen all that often on production installations today, since external sorts were stalled on memory access most of the time with the old replacement selection approach. While memory stalls remain a big cost in 9.6, the situation is much improved. External sorts may be significantly I/O bound far more frequently in 9.6, especially during merging (which is also optimized in 9.6), and especially during merging for CREATE INDEX in particular. In the latter case, the system likely writes index tuples and WAL out as it reads runs in, with everything on the same filesystem. Modern filesystems are really good at read-ahead and write-behind. It is not at all unexpected for sequential I/O with fast *disks* to be faster than random *memory* access, so I really doubt that there is any question that I/O will now matter more. I expect that blocking on sequential I/O will become much more of a problem when parallel sort is added, but it's still going to be a problem with serial sorts on busy production systems servicing many clients. (With only a single client, this is much less true, but that's seldom the case.)

Secondly, with 9.6 it's roughly true that the more memory you can spare for maintenance_work_mem and work_mem, the better. There are some caveats that I won't rehash right now; they are rather limited, and DBAs can probably just pretend that the common, simple intuition "more memory is better, until you OOM" is at last true. I don't imagine that more information (the caveats) would be of much practical use.
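As a concrete sketch of both points (the tablespace name, filesystem path, table/column names, and memory sizes below are all hypothetical, not recommendations derived from measurements):

```sql
-- Put temp files on a fast filesystem ('/mnt/fast_ssd/pgtemp' is made up):
CREATE TABLESPACE fast_temp LOCATION '/mnt/fast_ssd/pgtemp';
SET temp_tablespaces = 'fast_temp';

-- With 9.6's quicksort-based external sorts, more memory means longer
-- runs and fewer merge passes, so be generous where possible:
SET maintenance_work_mem = '1GB';

-- trace_sort (available when the server is built with TRACE_SORT,
-- which is defined by default) logs run counts and merge activity,
-- which is useful when comparing settings:
SET trace_sort = on;
SET client_min_messages = log;
CREATE INDEX ON big_table (some_col);  -- big_table/some_col are hypothetical
```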
A duality
=========

Adding faster disks to get more sequential write bandwidth is a practical measure available to many DBAs, and one that matters more for sorting now. While not every user can afford to do so, it's nice that some have the *option* of upgrading to get better performance. It need not be expensive; we get plenty of benefit from cheaper SATA HDDs, since, with care, only sequential I/O performance really matters. This was not really the case in prior releases. At the same time, if you can afford the memory, you should probably just increase work_mem/maintenance_work_mem to get another performance benefit. These two benefits will often be complementary.

There may be a feedback loop that looks a bit like this: quicksort is cache oblivious, which implies that we can effectively use more working memory, which implies longer runs, which implies fewer merge passes, which implies that sequential I/O performance is mostly where I/O costs are paid, which implies that you can manage I/O costs by adding (consumer grade?) SATA HDDs configured in RAID0 for temp files, which implies that sorts finish sooner, which implies that aggregate memory use is lower, which implies that you can afford to set work_mem/maintenance_work_mem higher still, which implies ... .

This may seem facile, but consider the huge difference in costs I'm focussing on. There is a huge difference between the cost of random I/O and sequential I/O (when the FS cache is not able to "amortize" the random I/O), just as there is a huge difference between the cost of an L1 cache access and a main memory access (a cache miss). So, if I can be forgiven for using such a hand-wavey term, there is an interesting duality here: we should avoid "the expensive variety" of I/O (random I/O) wherever possible, and avoid "the expensive variety" of memory access (involving a last-level CPU cache miss) as much as possible.
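To make the "more memory implies fewer merge passes" step concrete, here is a toy model (my own sketch, not tuplesort.c's actual cost model; the merge order of 100 is an arbitrary assumption):

```python
import math

def merge_passes(input_mb, work_mem_mb, merge_order):
    """Rough model: quicksorting in work_mem-sized chunks yields about
    input/work_mem initial runs, and each merge pass combines up to
    merge_order runs at a time."""
    runs = math.ceil(input_mb / work_mem_mb)
    if runs <= 1:
        return 0  # fits in memory; no merging needed
    return math.ceil(math.log(runs, merge_order))

# Raising work_mem shrinks the number of initial runs, which can
# eliminate an entire merge pass over the data for a large sort:
print(merge_passes(100_000, 64, 100))    # 100GB sort, 64MB work_mem -> 2
print(merge_passes(100_000, 1024, 100))  # same sort, 1GB work_mem   -> 1
```

Eliminating a whole pass means reading and writing the entire temp dataset one fewer time, which is where the sequential I/O savings come from.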
Certainly, the economics of modern hardware strongly support increasing locality of access at every level. We're talking about differences of perhaps several orders of magnitude.

I/O bound sorts in earlier releases
-----------------------------------

Perhaps someone has personally seen a DBA add a new temp_tablespace on a separate, faster filesystem, and perhaps that clearly improved performance. I don't think that undermines my argument, though. This *does* sometimes happen on prior versions of Postgres, but my guess is that it's almost accidental:

* Increasing work_mem/maintenance_work_mem perversely makes external sorts *slower* before 9.6. Iterative tuning of these settings on busy production systems will therefore tend to guide the DBA to decrease work_mem/maintenance_work_mem settings (maybe external sorts got so slow that OOMs happened). Whether or not the DBA realizes it, the counter-intuitive slowdown occurs because replacement selection is sensitive to CPU cache size.

* As those memory settings are pushed down, runs become smaller, and more merge steps are required for any larger external sorts performed. Merge steps increase the amount of I/O, particularly when a few passes are necessary; *random* I/O suddenly spikes. Of course, this could have been avoided if work_mem/maintenance_work_mem were higher, but that's already been ruled out.

Documentation
=============

I think we can expand "21.6. Tablespaces" to describe the implications of these new performance characteristics. I'd like to hear opinions on how to approach that before proposing a patch, though. The basic guidance should, IMV, be:

* A temp tablespace on cheaper disks with good sequential I/O performance can speed up external sorts quite a lot. It's probably not a great idea to have many temp tablespaces; use RAID0 instead, because that performs better, and because it doesn't matter that temp files are not recoverable if a disk is faulty.
* More memory for sorting and hashing is often better in PostgreSQL 9.6. Notably, the performance of hash joins that spill will tend to degrade less predictably than the performance of sorts that spill as less memory is made available. (Perhaps mention the number of external sort passes?)

* Increasing work_mem/maintenance_work_mem may fail to improve performance only because sorts then become more I/O bound. When in doubt, testing is advised. A balance may need to be found, if only to avoid wasting memory.

Thoughts?

https://queue.acm.org/detail.cfm?id=1563874
--
Peter Geoghegan

--
Sent via pgsql-hackers mailing list (email@example.com)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers