On Fri, Feb 3, 2017 at 5:04 AM, Thomas Munro <thomas.mu...@enterprisedb.com> wrote:
> 1. 'asc' = pre-sorted data (w = 0 shows time in seconds, other columns
> show speed-up relative to that time):
>
>  mem | w = 0  | w = 1 | w = 2 | w = 3 | w = 4 | w = 5 | w = 6 | w = 7 | w = 8
> -----+--------+-------+-------+-------+-------+-------+-------+-------+-------
>    1 | 119.97 | 4.61x | 4.83x | 5.32x | 5.61x | 5.88x | 6.10x | 6.18x | 6.09x
>   64 |  19.42 | 1.18x | 1.10x | 1.23x | 1.23x | 1.16x | 1.19x | 1.20x | 1.21x
>  256 |  18.35 | 1.02x | 0.92x | 0.98x | 1.02x | 1.06x | 1.07x | 1.08x | 1.10x
>  512 |  17.75 | 1.01x | 0.89x | 0.95x | 0.99x | 1.02x | 1.05x | 1.06x | 1.07x
I think that this presorted case doesn't improve much because the sorting itself is so cheap, as explained in my last mail. However, the improvement as workers are added is still smaller than expected. I think that this indicates that there isn't enough I/O capacity available here to truly show the full potential of the patch -- I've certainly seen better scalability for cases like this when there is a lot of I/O bandwidth available, and I/O parallelism is there to be taken advantage of. Say, when using a system with a large RAID array (I used a RAID0 array with 12 HDDs for my own tests).

Another issue is that you probably don't have enough data here to really show off the patch. I don't want to dismiss the benchmark, which is still quite informative, but it's worth pointing out that the feature is going to be most compelling for very large indexes, which will take at least several minutes to build under any circumstances. (Having a reproducible case is also important, which is something your benchmark has going for it, on the other hand.)

I suspect that this system isn't particularly well balanced for the task of benchmarking the patch. You would probably see notably better scalability than any you've shown in any test if you could add additional sequential I/O bandwidth, which is probably an economical, practical choice for many users. I suspect that you aren't actually saturating available CPUs to the greatest extent that the implementation makes possible.

Another thing I want to point out is that with 1MB of maintenance_work_mem, the patch appears to do very well, but that isn't terribly meaningful. I would suggest that we avoid testing this patch with such a low amount of memory -- it doesn't seem important. That result is skewed by the fact that you're using replacement selection in the serial case only. I think what this actually demonstrates is that replacement selection is very slow, even in its putative best case.
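For anyone following along who hasn't internalized how run formation works: here is a minimal Python sketch of textbook replacement selection (a heap-based run-formation scheme), not PostgreSQL's actual C implementation in tuplesort.c. Its "putative best case" is presorted input, where it produces a single run -- yet it still pays a heap operation per tuple, which is part of why quicksort-based run formation tends to win in practice. The function name and workspace parameter are illustrative only.

```python
import heapq

def replacement_selection_runs(tuples, workspace_size):
    """Form sorted runs via replacement selection.

    Keep a min-heap of up to workspace_size tuples, keyed by (run, value).
    Repeatedly emit the smallest heap entry belonging to the current run.
    An incoming tuple that sorts before the tuple just emitted cannot join
    the current run, so it is tagged with the next run number instead.
    """
    it = iter(tuples)
    heap = []
    for _ in range(workspace_size):
        try:
            heap.append((0, next(it)))  # everything starts in run 0
        except StopIteration:
            break
    heapq.heapify(heap)

    runs, current_run, out = [], 0, []
    while heap:
        run, val = heapq.heappop(heap)
        if run != current_run:
            runs.append(out)            # current run is exhausted
            out, current_run = [], run
        out.append(val)
        try:
            nxt = next(it)
            # Too small to extend this run? Defer it to the next run.
            heapq.heappush(heap, (run + 1 if nxt < val else run, nxt))
        except StopIteration:
            pass
    if out:
        runs.append(out)
    return runs
```

On presorted input, `replacement_selection_runs(range(100), 7)` yields exactly one run, so no merge pass is needed at all -- and yet every one of the 100 tuples went through the heap. On random input, runs average about twice the workspace size, which was the algorithm's original selling point back when memory was scarce.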
I believe that commit 2459833 was the final nail in the coffin of replacement selection. I certainly don't want to relitigate the discussion on replacement_sort_tuples, and am not going to push too hard, but ISTM that we should fully remove replacement selection from tuplesort.c and be done with it.

--
Peter Geoghegan

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers