On Tue, Jan 3, 2017 at 8:59 AM, Simon Riggs <si...@2ndquadrant.com> wrote:
> On 3 January 2017 at 13:45, Amit Kapila <amit.kapil...@gmail.com> wrote:
>> On Tue, Jan 3, 2017 at 6:41 PM, Simon Riggs <si...@2ndquadrant.com> wrote:
>>> On 2 January 2017 at 21:23, Jim Nasby <jim.na...@bluetreble.com> wrote:
>>>
>>>> It's not clear from the thread that there is consensus that this
>>>> feature is desired. In particular, the performance aspects of changing
>>>> segment size from a C constant to a variable are in question. Someone
>>>> with access to large hardware should test that. Andres[1] and Robert[2]
>>>> did suggest that the option could be changed to a bitshift, which IMHO
>>>> would also solve some sanity-checking issues.
>>>
>>> Overall, Robert has made a good case. The only discussion now is about
>>> the knock-on effects it causes.
>>>
>>> One concern that has only barely been discussed is the effect of
>>> zeroing new WAL files. That is a linear effect and will adversely
>>> affect performance as WAL segment size increases.
>>>
>>
>> Sorry, but I am not able to understand why this is a problem. The
>> bigger the WAL segment, the smaller the number of files. So IIUC,
>> it can only hurt if zeroing two 16MB files is cheaper than
>> zeroing one 32MB file. Is that your theory, or do you have something
>> else in mind?
>
> The issue I see is that at present no backend needs to do more than
> 16MB of zeroing at one time, so the impact on response time is
> reduced. If we start doing zeroing in larger chunks, then the impact on
> response times will increase. So instead of regular blips we have one
> large blip, less often. I think the latter will be worse, but welcome
> measurements that show that performance is smooth and regular with
> large file sizes.
Yeah. I don't think there's any way to get around the fact that there will be bigger latency spikes in some cases with larger WAL files. I think the question is whether they'll be common enough or serious enough to worry about. For example, in a quick test on my laptop, zero-filling a 16 megabyte file using "dd if=/dev/zero of=x bs=8k count=2048" takes about 11 milliseconds, and zero-filling a 64 megabyte file with a count of 8192 increases the time to almost 50 milliseconds. That's something, but I wouldn't rate it as concerning. There are a lot of things that can cause latency changes multiple orders of magnitude larger than that, so worrying about this one in particular seems to me fairly pointless. However, that's also a measurement on an unloaded system with an SSD; the impact may be a lot larger on a big system with lots of concurrent activity, and if the process that does the write also has to do an fsync, that will increase the cost considerably, too.

But the flip side is that it's wrong to imagine that there's no harm in leaving the situation as it is. Even my MacBook Pro can crank out about 2.7 WAL segments/second on "pgbench -c 16 -j 16 -T 60". I think a decent server with a few more CPU cores than my laptop could do 4-5 times that. So we shouldn't imagine that the costs of spewing out a bajillion segment files are being paid only at the very high end. Even somebody running PostgreSQL on a low-end virtual machine might find it difficult to write an archive_command that can keep up if the system is under continuous load. Of course, as Stephen pointed out, there are toolkits that can do it and you should probably be using one of those anyway for other reasons, but nevertheless spitting out almost 3 WAL segments per second even on a laptop gives a whole new meaning to the term "continuous archiving".

Another point to consider is that a bigger WAL segment size can actually *improve* latency, because every segment switch triggers an immediate fsync, and every backend in the system ends up waiting for it to finish. We should probably eventually try to push those flushes into the background, and the zeroing work as well. My impression (possibly incorrect?) is that we expect to settle into a routine where zeroing new segments is relatively uncommon because we reuse old segment files, but the forced end-of-segment flushes never go away. So it's possible we might actually come out ahead on latency with this change, at least sometimes.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
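P.S. For anyone who wants to repeat the zero-fill timing above without dd, here is a rough standalone sketch. It is not PostgreSQL's actual XLogFileInit() code, just an illustration of the cost being discussed: it writes a segment's worth of zeros in 8 kB blocks, fsyncs, and reports the elapsed time. The file names and sizes are placeholders.

    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <sys/time.h>
    #include <unistd.h>

    #define BLCKSZ 8192            /* 8 kB write size, matching bs=8k above */

    /* Zero-fill a file of the given size, fsync it, return elapsed ms. */
    static double
    zero_fill(const char *path, size_t seg_size)
    {
        char        buf[BLCKSZ];
        struct timeval start, end;
        size_t      written;
        int         fd;

        memset(buf, 0, sizeof(buf));
        gettimeofday(&start, NULL);

        fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
        if (fd < 0)
        {
            perror("open");
            exit(1);
        }
        for (written = 0; written < seg_size; written += BLCKSZ)
        {
            if (write(fd, buf, BLCKSZ) != BLCKSZ)
            {
                perror("write");
                exit(1);
            }
        }
        if (fsync(fd) != 0)
        {
            perror("fsync");
            exit(1);
        }
        close(fd);

        gettimeofday(&end, NULL);
        return (end.tv_sec - start.tv_sec) * 1000.0 +
               (end.tv_usec - start.tv_usec) / 1000.0;
    }

    int
    main(void)
    {
        /* compare the current 16 MB segment size against a 64 MB one */
        printf("16 MB: %.1f ms\n", zero_fill("seg16", 16 * 1024 * 1024));
        printf("64 MB: %.1f ms\n", zero_fill("seg64", 64 * 1024 * 1024));
        return 0;
    }

For scale, at the current 16 MB segment size, 2.7 segments/second works out to roughly 43 MB of WAL per second that archive_command has to keep up with. A larger segment size doesn't reduce that volume, it just means the command fires proportionally less often.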