On Sat, Feb 4, 2017 at 11:58 AM, Peter Geoghegan <p...@bowt.ie> wrote:
> On Fri, Feb 3, 2017 at 5:04 AM, Thomas Munro
> <thomas.mu...@enterprisedb.com> wrote:
>> 1. 'asc' = pre-sorted data (mem = maintenance_work_mem in MB; the
>> w = 0 column shows time in seconds, the other columns show speed-up
>> relative to that time):
>>
>>  mem | w = 0  | w = 1 | w = 2 | w = 3 | w = 4 | w = 5 | w = 6 | w = 7 | w = 8
>> -----+--------+-------+-------+-------+-------+-------+-------+-------+-------
>>    1 | 119.97 | 4.61x | 4.83x | 5.32x | 5.61x | 5.88x | 6.10x | 6.18x | 6.09x
>>   64 |  19.42 | 1.18x | 1.10x | 1.23x | 1.23x | 1.16x | 1.19x | 1.20x | 1.21x
>>  256 |  18.35 | 1.02x | 0.92x | 0.98x | 1.02x | 1.06x | 1.07x | 1.08x | 1.10x
>>  512 |  17.75 | 1.01x | 0.89x | 0.95x | 0.99x | 1.02x | 1.05x | 1.06x | 1.07x
>
> I think that this presorted case doesn't improve much because the
> sorting itself is so cheap, as explained in my last mail. However,
> the improvement as workers are added is still smaller than expected.
> I think that this indicates that there isn't enough I/O capacity
> available here to truly show the full potential of the patch -- I've
> certainly seen better scalability for cases like this when there is
> a lot of I/O bandwidth available, and I/O parallelism is there to be
> taken advantage of -- say, when using a system with a large RAID
> array (I used a RAID0 array with 12 HDDs for my own tests). Another
> issue is that you probably don't have enough data here to really
> show off the patch. I don't want to dismiss the benchmark, which is
> still quite informative, but it's worth pointing out that the
> feature is going to be most compelling for very large indexes, which
> will take at least several minutes to build under any circumstances.
> (On the other hand, having a reproducible case is also important,
> and that is something your benchmark has going for it.)
Right. My main reason for starting smallish was to allow me to search
a space with several variables without waiting eons. Next I would
like to run a small subset of those tests with, say, 10, 20 or even
100 times more data loaded, so that the tables would be ~20GB, ~40GB
or ~200GB.

About read bandwidth: the builds shouldn't have been touching the
disk at all for reads. I did a dummy run of each index build before
the measured runs, so a 2GB table being sorted in ~2 minutes would
certainly have been read entirely from the OS page cache, since the
machine has oodles of RAM.

About write bandwidth: the WAL, the index and the temp files all went
to an SSD array, though I don't have its performance characteristics
to hand. I should also be able to test on a multi-spindle HDD array.
I doubt either can touch your 12-way RAID0 array, but I will look
into that.

> I suspect that this system isn't particularly well balanced for the
> task of benchmarking the patch. You would probably see notably
> better scalability than any you've shown in any test if you could
> add additional sequential I/O bandwidth, which is probably an
> economical, practical choice for many users. I suspect that you
> aren't actually saturating available CPUs to the greatest extent
> that the implementation makes possible.

I will look into what I/O options I can access before running larger
tests. I will also run the tests with both cold and warm caches (i.e.
dropping the page cache with "echo 1 > /proc/sys/vm/drop_caches"
between runs) so that read bandwidth enters the picture.

> Another thing I want to point out is that with 1MB of
> maintenance_work_mem, the patch appears to do very well, but that
> isn't terribly meaningful. I would suggest that we avoid testing
> this patch with such a low amount of memory -- it doesn't seem
> important. This is skewed by the fact that you're using replacement
> selection in the serial case only. I think what this actually
> demonstrates is that replacement selection is very slow, even in its
> putative best case. I believe that commit 2459833 was the final nail
> in the coffin of replacement selection. I certainly don't want to
> relitigate the discussion on replacement_sort_tuples, and am not
> going to push too hard, but ISTM that we should fully remove
> replacement selection from tuplesort.c and be done with it.

Interesting. I haven't grokked this yet, but I will go and read about
it. (If I've understood correctly, I should also be able to rerun the
1MB case with replacement selection disabled; see the first
postscript below.) Based on your earlier comments about banana-skin
effects, I wonder whether it would be worth adding a couple more heap
distributions to the test set that are almost completely sorted,
except for a few entries out of order (a sketch is in the second
postscript below).

--
Thomas Munro
http://www.enterprisedb.com
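PS: If I've understood the replacement selection discussion
correctly, I can put the serial w = 0 column on an equal footing by
disabling replacement selection with the replacement_sort_tuples GUC,
along these lines (the table and index names here are just
placeholders from my test schema):

    SET replacement_sort_tuples = 0;   -- never use replacement selection
    SET maintenance_work_mem = '1MB';
    CREATE INDEX t_a_idx ON t (a);     -- serial build, quicksorted runs only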
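PPS: By "almost completely sorted" I mean something like the
following, where the ten-million row count and the 0.1% displacement
rate are arbitrary placeholders:

    CREATE TABLE almost_sorted AS
    SELECT CASE WHEN random() < 0.001
                THEN (random() * 10000000)::int  -- ~0.1% of keys land at random
                ELSE i                           -- the rest stay in order
           END AS a
    FROM generate_series(1, 10000000) AS s(i);

    CREATE INDEX almost_sorted_a_idx ON almost_sorted (a);

A few stray keys should be enough to knock the sort off any
fully-presorted fast path while otherwise staying close to the 'asc'
case.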