Re: [HACKERS] Re: Doc patch making firm recommendation for setting the value of commit_delay
On Fri, Mar 15, 2013 at 05:47:30PM -0400, Tom Lane wrote:
> Noah Misch n...@leadboat.com writes:
> > I'm marking this patch Ready for Committer, qualified with a
> > recommendation to adopt only the wal.sgml changes.
>
> I've committed this along with some further wordsmithing.  I kept
> Peter's change to pg_test_fsync's default -s value; I've always felt
> that 2 seconds was laughably small.  It might be all right for very
> quick-and-dirty tests, but as a default value, it seems like a poor
> choice, because it's at the very bottom of the credible range of
> choices.

Agreed, 2 seconds was at the bottom.  The old behavior was very slow, so I went low.  Now that we are using it, 5 secs makes sense.

--
  Bruce Momjian  br...@momjian.us        http://momjian.us
  EnterpriseDB                           http://enterprisedb.com

  + It's impossible for everything to be true. +

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: Doc patch making firm recommendation for setting the value of commit_delay
Noah Misch n...@leadboat.com writes:
> I'm marking this patch Ready for Committer, qualified with a
> recommendation to adopt only the wal.sgml changes.

I've committed this along with some further wordsmithing.  I kept Peter's change to pg_test_fsync's default -s value; I've always felt that 2 seconds was laughably small.  It might be all right for very quick-and-dirty tests, but as a default value, it seems like a poor choice, because it's at the very bottom of the credible range of choices.

			regards, tom lane
Re: [HACKERS] Re: Doc patch making firm recommendation for setting the value of commit_delay
On Mon, Jan 28, 2013 at 04:48:56AM +0000, Peter Geoghegan wrote:
> On 28 January 2013 03:34, Noah Misch n...@leadboat.com wrote:
> > Would you commit to the same git repository the pgbench-tools data
> > for the graphs appearing in that blog post?  I couldn't readily tell
> > what was happening below 16 clients due to the graphed data points
> > blending together.
>
> I'm afraid that I no longer have that data.  Of course, I could fairly
> easily recreate it, but I don't think I'll have time tomorrow.  Is it
> important?  Are you interested in both the insert and tpc-b cases?

No need, then.
Re: [HACKERS] Re: Doc patch making firm recommendation for setting the value of commit_delay
On Mon, Jan 28, 2013 at 04:29:12AM +0000, Peter Geoghegan wrote:
> On 28 January 2013 03:34, Noah Misch n...@leadboat.com wrote:
> > On the EBS configuration with volatile fsync timings, the variability
> > didn't go away with 15s runs.  On systems with stable fsync times,
> > 15s was no better than 2s.  Absent some particular reason to believe
> > 5s is better than 2s, I would leave it alone.
>
> I'm not recommending doing so because I thought you'd be likely to get
> better numbers on EBS; obviously the variability you saw there likely
> had a lot to do with the fact that the underlying physical machines
> have multiple tenants.  It has just been my observation that more
> consistent figures can be obtained (on my laptop) by using a
> pg_test_fsync --secs-per-test of about 5.  That being the case, why
> take the chance with 2 seconds?

I can't get too excited about it either way.

> It isn't as if people run pg_test_fsync every day, or as if they
> cannot set --secs-per-test to whatever they like themselves.  On the
> other hand, the cost of setting it too low could be quite high now,
> because the absolute values (and not just how different
> wal_sync_methods compare) are now important.

True.  You'd actually want to run the tool with a short interval to select a wal_sync_method, then test the chosen method for a longer period to get an accurate reading for commit_delay.
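The two-step workflow described above (a short pg_test_fsync run to pick a wal_sync_method, then a longer run of the chosen method to price commit_delay) can be sketched in a few lines.  The function names and the sample ops/sec figures below are hypothetical; the half-of-flush-time starting point is the one used elsewhere in this thread (half of fsync time gave commit_delay = 35000 on the Mac).

```python
# Sketch (not part of any patch): turn a pg_test_fsync ops/sec figure
# into a per-flush latency and a candidate commit_delay.  The "half of
# flush time" starting point follows this thread; function names and
# sample numbers are hypothetical.

def flush_latency_us(ops_per_sec: float) -> float:
    """Per-flush latency in microseconds, from a pg_test_fsync ops/sec figure."""
    return 1_000_000 / ops_per_sec

def candidate_commit_delay(ops_per_sec: float) -> int:
    """Half of the single-flush latency, truncated to whole microseconds."""
    return int(flush_latency_us(ops_per_sec) / 2)

print(candidate_commit_delay(14))    # laptop-class disk, ~14 flushes/sec -> 35714
print(candidate_commit_delay(2500))  # BBWC/SSD-class storage -> 200
```

The point of the longer second run is that the absolute latency figure, not just the relative ranking of methods, feeds directly into this arithmetic.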
Re: [HACKERS] Re: Doc patch making firm recommendation for setting the value of commit_delay
Hi Noah,

On 27 January 2013 02:31, Noah Misch n...@leadboat.com wrote:
> I did a few more benchmarks along the spectrum.

So that's a nice 27-53% improvement, fairly similar to the pattern for your laptop pgbench numbers.  I presume that this applies to a tpc-b benchmark (the pgbench default).  Note that the really compelling numbers that I reported in that blog post (where there is an increase of over 80% in transaction throughput at lower client counts) occur with an insert-based benchmark (i.e. a maximally commit-bound workload).

> Next, based on your comment about the possible value for cloud-hosted
> applications
>
> -clients-  -tps@commit_delay=0-  -tps@commit_delay=500-
>  32        1224,1391,1584        1175,1229,1394
>  64        1553,1647,1673        1544,1546,1632
> 128        1717,1833,1900        1621,1720,1951
> 256        1664,1717,1918        1734,1832,1918
>
> The numbers are all over the place, but there's more loss than gain.

I suspected that the latency of cloud storage might be relatively poor.  Since that is evidently not actually the case with Amazon EBS, it makes sense that commit_delay isn't compelling there.  I am not disputing whether or not Amazon EBS should be considered representative of such systems in general - I'm sure that it should be.

> There was no appreciable performance advantage from setting
> commit_delay=0 as opposed to relying on commit_siblings to suppress
> the delay.

That's good news.  Thank you for doing that research; I had myself verified that the fastpath in MinimumActiveBackends() works well, but it's useful to have my findings confirmed.

> On the GNU/Linux VM, pg_sleep() achieves precision on the order of
> 10us.  However, the sleep was consistently around 70us longer than
> requested.  A 300us request yielded a 370us sleep, and a 3000us
> request gave a 3080us sleep.  Mac OS X was similarly precise for short
> sleeps, but it could oversleep a full 1000us on a 35000us sleep.

Ugh.

> The beginning of this paragraph still says commit_delay causes a delay
> "just before a synchronous commit attempts to flush WAL to disk".
> Since it now applies to every WAL flush, that should be updated.

Agreed.

> There's a similar problem at the beginning of this paragraph; it says
> specifically, "The commit_delay parameter defines for how many
> microseconds the server process will sleep after writing a commit
> record to the log with LogInsert but before performing a LogFlush."

Right.

> As a side note, if we're ever going to recommend a fire-and-forget
> method for setting commit_delay, it may be worth detecting whether the
> host sleep granularity is limited like this.  Setting commit_delay =
> 20 for your SSD and silently getting commit_delay = 10000 would make
> for an unpleasant surprise.

Yes, it would.  Note on possible oversleeping added.

> ! <para>
> !  Since the purpose of <varname>commit_delay</varname> is to allow
> !  the cost of each flush operation to be more effectively amortized
> !  across concurrently committing transactions (potentially at the
> !  expense of transaction latency), it is necessary to quantify that
> !  cost when altering the setting.  The higher that cost is, the more
> !  effective <varname>commit_delay</varname> is expected to be in
> !  increasing transaction throughput.
>
> That's true for spinning disks, but I suspect it does not hold for
> storage with internal parallelism, notably virtualized storage.
> Consider an iSCSI configuration with high bandwidth and high latency.
> When network latency is the limiting factor, will sending larger
> requests less often still help?

Well, I don't like to speculate about things like that, because it's just too easy to be wrong.  That said, it doesn't immediately occur to me why the statement that you've highlighted wouldn't be true of virtualised storage that has the characteristics you describe.  Any kind of latency at flush time means that clients idle, which means that the CPU is potentially not kept fully busy for a greater amount of wall time, where it might otherwise be kept more busy.

> One would be foolish to run a performance-sensitive workload like
> those in question, including the choice to have synchronous_commit=on,
> on spinning disks with no battery-backed write cache.  A cloud
> environment is more credible, but my benchmark showed no gain there.

In an everyday sense you are correct.  It would typically be fairly senseless to run an application that was severely limited by transaction throughput like this, when a battery-backed cache could be used at the cost of a couple of hundred dollars.  However, it's quite possible to imagine a scenario in which the economics favoured using commit_delay instead.  For example, I am aware that at Facebook, a similar Facebook-flavoured-MySQL setting (sync_binlog_timeout_usecs) is used.  Furthermore, it might not be obvious that fsync speed is an issue in practice.  Setting commit_delay to 4,000 has seemingly no downside on my laptop - it *positively* affects both average and worst-case transaction latency - so with spinning disks, it probably would actually be sensible to set it and forget it, regardless of workload.
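The fire-and-forget granularity check mooted above could look something like the following: request a few short sleeps and measure how badly the host oversleeps.  This is only a Python illustration of the idea (PostgreSQL's pg_usleep is C-level), and the figures in the comments are the ones Noah reported for his VM, not guaranteed outputs.

```python
# Illustration of detecting limited host sleep granularity / constant
# oversleeping, as suggested above.  A Python stand-in for what would be
# a C-level check; results vary by kernel and timer configuration.
import time

def median_oversleep_us(request_us: int, trials: int = 20) -> float:
    """Median amount, in microseconds, by which actual sleeps exceed the request."""
    overshoots = []
    for _ in range(trials):
        start = time.perf_counter()
        time.sleep(request_us / 1_000_000)
        elapsed_us = (time.perf_counter() - start) * 1_000_000
        overshoots.append(elapsed_us - request_us)
    overshoots.sort()
    return overshoots[len(overshoots) // 2]

# On the GNU/Linux VM discussed above, a 300us request slept ~370us and
# a 3000us request ~3080us, i.e. a roughly constant ~70us overshoot; on
# such a host commit_delay = 20 would really mean ~90us of delay.
if __name__ == "__main__":
    for req in (300, 3000):
        print(req, round(median_oversleep_us(req)))
```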
Re: [HACKERS] Re: Doc patch making firm recommendation for setting the value of commit_delay
On Mon, Jan 28, 2013 at 12:16:24AM +0000, Peter Geoghegan wrote:
> On 27 January 2013 02:31, Noah Misch n...@leadboat.com wrote:
> > I did a few more benchmarks along the spectrum.
>
> So that's a nice 27-53% improvement, fairly similar to the pattern for
> your laptop pgbench numbers.  I presume that this applies to a tpc-b
> benchmark (the pgbench default).  Note that the really compelling
> numbers that I reported in that blog post (where there is an increase
> of over 80% in transaction throughput at lower client counts) occur
> with an insert-based benchmark (i.e. a maximally commit-bound
> workload).

Correct.  The pgbench default workload is already rather friendly toward commit_delay, so I wanted to stay away from even-friendlier tests.

Would you commit to the same git repository the pgbench-tools data for the graphs appearing in that blog post?  I couldn't readily tell what was happening below 16 clients due to the graphed data points blending together.

> > ! <para>
> > !  Since the purpose of <varname>commit_delay</varname> is to allow
> > !  the cost of each flush operation to be more effectively amortized
> > !  across concurrently committing transactions (potentially at the
> > !  expense of transaction latency), it is necessary to quantify that
> > !  cost when altering the setting.  The higher that cost is, the more
> > !  effective <varname>commit_delay</varname> is expected to be in
> > !  increasing transaction throughput.
> >
> > That's true for spinning disks, but I suspect it does not hold for
> > storage with internal parallelism, notably virtualized storage.
> > Consider an iSCSI configuration with high bandwidth and high latency.
> > When network latency is the limiting factor, will sending larger
> > requests less often still help?
>
> Well, I don't like to speculate about things like that, because it's
> just too easy to be wrong.  That said, it doesn't immediately occur to
> me why the statement that you've highlighted wouldn't be true of
> virtualised storage that has the characteristics you describe.  Any
> kind of latency at flush time means that clients idle, which means
> that the CPU is potentially not kept fully busy for a greater amount
> of wall time, where it might otherwise be kept more busy.

On further reflection, I retract the comment.  Regardless of internal parallelism of the storage, PostgreSQL issues WAL fsyncs serially.

> > One would be foolish to run a performance-sensitive workload like
> > those in question, including the choice to have
> > synchronous_commit=on, on spinning disks with no battery-backed
> > write cache.  A cloud environment is more credible, but my benchmark
> > showed no gain there.
>
> In an everyday sense you are correct.  It would typically be fairly
> senseless to run an application that was severely limited by
> transaction throughput like this, when a battery-backed cache could be
> used at the cost of a couple of hundred dollars.  However, it's quite
> possible to imagine a scenario in which the economics favoured using
> commit_delay instead.  For example, I am aware that at Facebook, a
> similar Facebook-flavoured-MySQL setting (sync_binlog_timeout_usecs)
> is used.  Furthermore, it might not be obvious that fsync speed is an
> issue in practice.  Setting commit_delay to 4,000 has seemingly no
> downside on my laptop - it *positively* affects both average and
> worst-case transaction latency - so with spinning disks, it probably
> would actually be sensible to set it and forget it, regardless of
> workload.

I agree that commit_delay is looking like a safe bet for spinning disks.

> I attach a revision that I think addresses your concerns.  I've
> polished it a bit further too - in particular, my elaborations about
> commit_delay have been concentrated at the end of wal.sgml, where they
> belong.  I've also removed the reference to XLogInsert, because, since
> all XLogFlush call sites are now covered by commit_delay, XLogInsert
> isn't particularly relevant.

I'm happy with this formulation.

> I have also increased the default time that pg_test_fsync runs - I
> think that the kind of variability commonly seen in its output, that
> you yourself have reported, justifies doing so in passing.

On the EBS configuration with volatile fsync timings, the variability didn't go away with 15s runs.  On systems with stable fsync times, 15s was no better than 2s.  Absent some particular reason to believe 5s is better than 2s, I would leave it alone.

I'm marking this patch Ready for Committer, qualified with a recommendation to adopt only the wal.sgml changes.

Thanks,
nm
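Noah's observation that PostgreSQL issues WAL fsyncs serially is the crux of the amortization argument, and a toy model makes it concrete: with a serial flush of cost F microseconds covering N commit records, commit throughput is bounded by N/F rather than 1/F.  The function and numbers below are illustrative only, not benchmark results.

```python
# Toy model of the amortization argument settled above: WAL flushes are
# issued serially, so if each flush costs flush_us microseconds and
# covers group_size commit records, commits/sec is bounded by
# group_size / flush_us.  Illustrative numbers, not measurements.

def max_commits_per_sec(flush_us: float, group_size: int) -> float:
    """Upper bound on commit throughput when each flush covers group_size commits."""
    return group_size * 1_000_000 / flush_us

print(max_commits_per_sec(70_000, 1))   # spinning disk, no grouping: ~14
print(max_commits_per_sec(70_000, 32))  # 32 commits share each flush: ~457
print(max_commits_per_sec(400, 1))      # fast BBWC flush: ~2500 even ungrouped
```

This is also why the gain shrinks as flush cost falls: when the ungrouped ceiling is already thousands of commits per second, delaying to batch more commits buys comparatively little.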
Re: [HACKERS] Re: Doc patch making firm recommendation for setting the value of commit_delay
On 28 January 2013 03:34, Noah Misch n...@leadboat.com wrote:
> On the EBS configuration with volatile fsync timings, the variability
> didn't go away with 15s runs.  On systems with stable fsync times, 15s
> was no better than 2s.  Absent some particular reason to believe 5s is
> better than 2s, I would leave it alone.

I'm not recommending doing so because I thought you'd be likely to get better numbers on EBS; obviously the variability you saw there likely had a lot to do with the fact that the underlying physical machines have multiple tenants.  It has just been my observation that more consistent figures can be obtained (on my laptop) by using a pg_test_fsync --secs-per-test of about 5.  That being the case, why take the chance with 2 seconds?

It isn't as if people run pg_test_fsync every day, or as if they cannot set --secs-per-test to whatever they like themselves.  On the other hand, the cost of setting it too low could be quite high now, because the absolute values (and not just how different wal_sync_methods compare) are now important.

--
Regards,
Peter Geoghegan
Re: [HACKERS] Re: Doc patch making firm recommendation for setting the value of commit_delay
On 28 January 2013 03:34, Noah Misch n...@leadboat.com wrote:
> Would you commit to the same git repository the pgbench-tools data for
> the graphs appearing in that blog post?  I couldn't readily tell what
> was happening below 16 clients due to the graphed data points blending
> together.

I'm afraid that I no longer have that data.  Of course, I could fairly easily recreate it, but I don't think I'll have time tomorrow.  Is it important?  Are you interested in both the insert and tpc-b cases?

--
Regards,
Peter Geoghegan
[HACKERS] Re: Doc patch making firm recommendation for setting the value of commit_delay
Hi Peter,

I took a look at this patch and the benchmarks you've furnished:

https://github.com/petergeoghegan/commit_delay_benchmarks
http://pgeoghegan.blogspot.com/2012/06/towards-14000-write-transactions-on-my.html

On Wed, Nov 14, 2012 at 08:44:26PM +0000, Peter Geoghegan wrote:
> Attached is a doc-patch that makes recommendations that are consistent
> with my observations about what works best here.  I'd like to see us
> making *some* recommendation - for sympathetic cases, setting
> commit_delay appropriately can make a very large difference to
> transaction throughput.  Such sympathetic cases - many small write
> transactions - are something that tends to be seen relatively
> frequently with web applications, which disproportionately use cloud
> hosting.  It isn't at all uncommon for these cases to be highly bound
> by their commit rate, and so it is compelling to try to amortize the
> cost of a flush as effectively as possible there.  It would be
> unfortunate if no one was even aware that commit_delay is now useful
> for these cases, since the setting allows cloud hosting providers to
> help these cases quite a bit, without having to do something like
> compromise durability, which in general isn't acceptable.

Your fast-fsync (SSD, BBWC) benchmarks show a small loss up to 8 clients and a 10-20% improvement at 32 clients.  That's on a 4-core/8-thread CPU, assuming HT was left enabled.  Your slow-fsync (laptop) benchmarks show a 40-100% improvement in the 16-64 client range.

I did a few more benchmarks along the spectrum.  First, I used a Mac, also 4-core/8-thread, with fsync_writethrough; half of fsync time gave commit_delay = 35000.  I used pgbench, scale factor 100, 4-minute runs, three trials each:

-clients-  -tps@commit_delay=0-  -tps@commit_delay=35000-
  8         51,55,63              82,84,86
 16         98,100,107           130,134,143
 32        137,148,157           192,200,201
 64        199,201,214           249,256,258

So that's a nice 27-53% improvement, fairly similar to the pattern for your laptop pgbench numbers.

Next, based on your comment about the possible value for cloud-hosted applications, I tried a cc2.8xlarge (16-core, 32-thread) GNU/Linux EC2 instance with a data directory on a standard EBS volume, ext4 filesystem.  Several 15s pg_test_fsync runs could not agree on an fsync time; I saw results from 694us to 1904us.  Ultimately I settled on trying commit_delay=500, scale factor 300:

-clients-  -tps@commit_delay=0-  -tps@commit_delay=500-
 32        1224,1391,1584        1175,1229,1394
 64        1553,1647,1673        1544,1546,1632
128        1717,1833,1900        1621,1720,1951
256        1664,1717,1918        1734,1832,1918

The numbers are all over the place, but there's more loss than gain.  Amit Kapila also measured small losses in tps at -c16:

http://www.postgresql.org/message-id/000701cd6ff0$013a6210$03af2630$@kap...@huawei.com

I was curious about the cost of the MinimumActiveBackends() call when relying on commit_siblings to skip the delay.  I ran a similar test with an extra 500 idle backends, clients=8, commit_siblings=20 (so the delay would never be used), and either a zero or nonzero commit_delay.  There was no appreciable performance advantage from setting commit_delay=0 as opposed to relying on commit_siblings to suppress the delay.

On the GNU/Linux VM, pg_sleep() achieves precision on the order of 10us.  However, the sleep was consistently around 70us longer than requested.  A 300us request yielded a 370us sleep, and a 3000us request gave a 3080us sleep.  Mac OS X was similarly precise for short sleeps, but it could oversleep a full 1000us on a 35000us sleep.

diff doc/src/sgml/wal.sgml
index fc5c3b2..92619dd
*** a/doc/src/sgml/wal.sgml
--- b/doc/src/sgml/wal.sgml
***************
*** 375,382 ****
      just before a synchronous commit attempts to flush
      <acronym>WAL</acronym> to disk, in the hope that a single flush
      executed by one such transaction can also serve other transactions
!     committing at about the same time.  Setting <varname>commit_delay</varname>
!     can only help when there are many concurrently committing transactions.
    </para>
  </sect1>
--- 375,397 ----

The beginning of this paragraph still says commit_delay causes a delay "just before a synchronous commit attempts to flush WAL to disk".  Since it now applies to every WAL flush, that should be updated.

***************
*** 560,570 ****
      is not enabled, or if fewer than <xref linkend="guc-commit-siblings">
      other sessions are currently in active transactions; this avoids
      sleeping when it's unlikely that any other session will commit soon.
!     Note that on most platforms, the resolution of a sleep request is
      ten milliseconds, so that any nonzero <varname>commit_delay</varname>
      setting between 1 and 10000 microseconds would have the same effect.
!     Good values for these parameters are not yet clear;
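The suppression rule quoted in the hunk above (skip the sleep when fewer than commit_siblings other sessions are in active transactions) reduces to a small decision function.  The Python rendering below is a sketch of the logic only; the real check is the C fastpath via MinimumActiveBackends(), and the full condition also considers whether fsync is enabled.  Parameter names mirror the GUCs; the inputs are hypothetical.

```python
# Sketch of the delay-suppression logic quoted in the hunk above.  The
# real implementation is the C fastpath via MinimumActiveBackends();
# parameter names here mirror the GUCs, inputs are hypothetical.

def should_delay(commit_delay_us: int, commit_siblings: int,
                 other_active_xacts: int) -> bool:
    """Sleep before flushing only if a delay is configured and enough
    other sessions are in active transactions to plausibly commit soon."""
    if commit_delay_us <= 0:
        return False
    return other_active_xacts >= commit_siblings

# The idle-backend test above: 500 idle backends, commit_siblings=20,
# no concurrent writers -> the delay is always suppressed, because idle
# backends are not in active transactions.
print(should_delay(500, 20, 0))   # False
print(should_delay(500, 20, 25))  # True
```

This is why 500 idle backends cost nothing in that test: the sibling count, not the backend count, gates the sleep.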
[HACKERS] Re: Doc patch making firm recommendation for setting the value of commit_delay
On Mon, Jan 21, 2013 at 12:23:21AM +0000, Peter Geoghegan wrote:
> On 19 January 2013 20:38, Noah Misch n...@leadboat.com wrote:
> > staticloud.com seems to be gone.  Would you repost these?
>
> I've pushed these to a git repo, hosted on github.
>
> https://github.com/petergeoghegan/commit_delay_benchmarks
>
> I'm sorry that I didn't take the time to make the html benchmarks
> easily viewable within a browser on this occasion.

That's plenty convenient; thanks.

What filesystem did you use for testing?  Would you also provide /proc/cpuinfo or a rough description of the system's CPUs?
[HACKERS] Re: Doc patch making firm recommendation for setting the value of commit_delay
On Wed, Nov 14, 2012 at 08:44:26PM +0000, Peter Geoghegan wrote:
> http://commit-delay-results-ssd-insert.staticloud.com
> http://commit-delay-stripe-insert.staticloud.com
> http://commit-delay-results-stripe-tpcb.staticloud.com
> http://commit-delay-results-ssd-insert.staticloud.com/19/pg_settings.txt

staticloud.com seems to be gone.  Would you repost these?