Re: [HACKERS] Checkpointer split has broken things dramatically (was Re: DELETE vs TRUNCATE explanation)
- Quote from David Fetter (da...@fetter.org), on 23.07.2012 at 15:41 -

>> I'm not sure how you automate testing a pull-the-plug scenario.
>
> I have a dim memory of how the FreeBSD project was alleged to have
> done it, namely by rigging a serial port (yes, it was that long ago)
> to the power supply of another machine and randomly cycling the power.

These days most server-class hardware can be power-cycled with an IPMI
command.

Best regards
--
Luben Karavelov
Re: [HACKERS] Checkpointer split has broken things dramatically (was Re: DELETE vs TRUNCATE explanation)
On Mon, Jul 23, 2012 at 5:41 AM, David Fetter wrote:
> On Mon, Jul 23, 2012 at 08:29:16AM -0400, Andrew Dunstan wrote:
>>
>> I'm not sure how you automate testing a pull-the-plug scenario.
>
> I have a dim memory of how the FreeBSD project was alleged to have
> done it, namely by rigging a serial port (yes, it was that long ago)
> to the power supply of another machine and randomly cycling the power.

On Linux,

echo b > /proc/sysrq-trigger

is supposed to take it down instantly, with no flushing of dirty buffers.

Cheers,

Jeff

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Checkpointer split has broken things dramatically (was Re: DELETE vs TRUNCATE explanation)
On 07/23/2012 09:47 PM, Andrew Dunstan wrote:
> On 07/23/2012 09:04 AM, Craig Ringer wrote:
>> On 07/23/2012 08:29 PM, Andrew Dunstan wrote:
>>> I'm not sure how you automate testing a pull-the-plug scenario.
>>
>> fire up kvm or qemu instances, then kill 'em.
>
> Yeah, maybe. Knowing just when to kill them might be an interesting
> question. I'm also unsure how much nice cleanup the host supervisor
> does in such cases. VMs are wonderful things, but they aren't always
> the answer. I'm not saying they aren't here, just wondering.

I've done some testing with this, and what it boils down to is that any
data that made it to the virtual disk is persistent after a VM kill.
Anything in dirty buffers on the VM guest is lost. It's a very close
match for real hardware. I haven't tried to examine the details of the
handling of virtualised disk hardware write caches, but disks should be
in write-through mode anyway. A `kill -9` will clear 'em for sure,
anyway, as the guest has no chance to do any cleanup.

One of the great things about kvm and qemu for this sort of testing is
that it's just another program. There's very little magic, and it's
quite easy to test and trace.

I have a qemu/kvm test harness I've been using for another project that
I need to update and clean up as it'd be handy for this. It's just a
matter of making the time, as it's been a busy few days.

--
Craig Ringer
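[Editor's sketch] The "start it, then kill -9 it" approach described above can be illustrated with a small POSIX helper. This is only a minimal sketch, not Craig's harness: the name `spawn_and_kill` and its parameters are hypothetical, and a real test would pass whatever command line launches the guest (not shown here, since the thread does not give one) and would pick the kill time at random.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: run argv[0] as a child process, let it run for
 * run_seconds, then send SIGKILL so it gets no chance to clean up.
 *
 * Returns the number of the signal that terminated the child, 0 if the
 * child exited on its own before the kill took effect, or -1 on error.
 */
int
spawn_and_kill(char *const argv[], unsigned int run_seconds)
{
    pid_t pid = fork();
    int   status;

    if (pid < 0)
        return -1;

    if (pid == 0)
    {
        /* Child: become the program under test. */
        execvp(argv[0], argv);
        _exit(127);             /* only reached if exec failed */
    }

    /* Parent: let the child do some work, then pull the plug. */
    sleep(run_seconds);
    if (kill(pid, SIGKILL) != 0)
        return -1;

    /* Reap the child and report how it ended. */
    if (waitpid(pid, &status, 0) < 0)
        return -1;

    return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}
```

Calling it with a long-running command such as `sleep 30` and a short delay should return `SIGKILL`. After the kill, the test would inspect the virtual disk image to see which writes survived.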
Re: [HACKERS] Checkpointer split has broken things dramatically (was Re: DELETE vs TRUNCATE explanation)
On 07/23/2012 09:04 AM, Craig Ringer wrote:
> On 07/23/2012 08:29 PM, Andrew Dunstan wrote:
>> I'm not sure how you automate testing a pull-the-plug scenario.
>
> fire up kvm or qemu instances, then kill 'em.

Yeah, maybe. Knowing just when to kill them might be an interesting
question. I'm also unsure how much nice cleanup the host supervisor
does in such cases. VMs are wonderful things, but they aren't always
the answer. I'm not saying they aren't here, just wondering.

cheers

andrew
Re: [HACKERS] Checkpointer split has broken things dramatically (was Re: DELETE vs TRUNCATE explanation)
On 07/23/2012 08:29 PM, Andrew Dunstan wrote:
> I'm not sure how you automate testing a pull-the-plug scenario.

fire up kvm or qemu instances, then kill 'em.

--
Craig Ringer
Re: [HACKERS] Checkpointer split has broken things dramatically (was Re: DELETE vs TRUNCATE explanation)
On 07/23/2012 08:41 AM, David Fetter wrote:
>> The buildfarm is not at all designed to test performance. That's why
>> we want a performance farm.
>
> Right. Apart from hardware, what are we stalled on?

Software :-)

I am trying to find some cycles to get something going.

cheers

andrew
Re: [HACKERS] Checkpointer split has broken things dramatically (was Re: DELETE vs TRUNCATE explanation)
On Mon, Jul 23, 2012 at 08:29:16AM -0400, Andrew Dunstan wrote:
>
> On 07/23/2012 12:37 AM, David Fetter wrote:
> >On Tue, Jul 17, 2012 at 06:56:50PM -0400, Tom Lane wrote:
> >>Robert Haas writes:
> >>>On Mon, Jul 16, 2012 at 3:18 PM, Tom Lane wrote:
> >>>>BTW, while we are on the subject: hasn't this split completely
> >>>>broken the statistics about backend-initiated writes?
> >>>Yes, it seems to have done just that.
> >>This implies that nobody has done pull-the-plug testing on either
> >>HEAD or 9.2 since the checkpointer split went in (2011-11-01),
> >>because even a modicum of such testing would surely have shown that
> >>we're failing to fsync a significant fraction of our write traffic.
> >>
> >>Furthermore, I would say that any performance testing done since
> >>then, if it wasn't looking at purely read-only scenarios, isn't
> >>worth the electrons it's written on. In particular, any performance
> >>gain that anybody might have attributed to the checkpointer splitup
> >>is very probably hogwash.
> >>
> >>This is not giving me a warm feeling about our testing practices.
> >Is there any part of this that the buildfarm, or some other automation
> >framework, might be able to handle?
> >
>
> I'm not sure how you automate testing a pull-the-plug scenario.

I have a dim memory of how the FreeBSD project was alleged to have
done it, namely by rigging a serial port (yes, it was that long ago)
to the power supply of another machine and randomly cycling the power.

> The buildfarm is not at all designed to test performance. That's why
> we want a performance farm.

Right. Apart from hardware, what are we stalled on?

Cheers,
David.
--
David Fetter http://fetter.org/
Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter
Skype: davidfetter XMPP: david.fet...@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate
Re: [HACKERS] Checkpointer split has broken things dramatically (was Re: DELETE vs TRUNCATE explanation)
On 07/23/2012 12:37 AM, David Fetter wrote:
> On Tue, Jul 17, 2012 at 06:56:50PM -0400, Tom Lane wrote:
>> Robert Haas writes:
>>> On Mon, Jul 16, 2012 at 3:18 PM, Tom Lane wrote:
>>>> BTW, while we are on the subject: hasn't this split completely
>>>> broken the statistics about backend-initiated writes?
>>> Yes, it seems to have done just that.
>> This implies that nobody has done pull-the-plug testing on either
>> HEAD or 9.2 since the checkpointer split went in (2011-11-01),
>> because even a modicum of such testing would surely have shown that
>> we're failing to fsync a significant fraction of our write traffic.
>>
>> Furthermore, I would say that any performance testing done since
>> then, if it wasn't looking at purely read-only scenarios, isn't
>> worth the electrons it's written on. In particular, any performance
>> gain that anybody might have attributed to the checkpointer splitup
>> is very probably hogwash.
>>
>> This is not giving me a warm feeling about our testing practices.
> Is there any part of this that the buildfarm, or some other automation
> framework, might be able to handle?

I'm not sure how you automate testing a pull-the-plug scenario.

The buildfarm is not at all designed to test performance. That's why
we want a performance farm.

cheers

andrew
Re: [HACKERS] Checkpointer split has broken things dramatically (was Re: DELETE vs TRUNCATE explanation)
On Tue, Jul 17, 2012 at 06:56:50PM -0400, Tom Lane wrote:
> Robert Haas writes:
> > On Mon, Jul 16, 2012 at 3:18 PM, Tom Lane wrote:
> >> BTW, while we are on the subject: hasn't this split completely
> >> broken the statistics about backend-initiated writes?
>
> > Yes, it seems to have done just that.
>
> This implies that nobody has done pull-the-plug testing on either
> HEAD or 9.2 since the checkpointer split went in (2011-11-01),
> because even a modicum of such testing would surely have shown that
> we're failing to fsync a significant fraction of our write traffic.
>
> Furthermore, I would say that any performance testing done since
> then, if it wasn't looking at purely read-only scenarios, isn't
> worth the electrons it's written on. In particular, any performance
> gain that anybody might have attributed to the checkpointer splitup
> is very probably hogwash.
>
> This is not giving me a warm feeling about our testing practices.

Is there any part of this that the buildfarm, or some other automation
framework, might be able to handle?

Cheers,
David.
--
David Fetter http://fetter.org/
Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter
Skype: davidfetter XMPP: david.fet...@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate
Re: [HACKERS] Checkpointer split has broken things dramatically (was Re: DELETE vs TRUNCATE explanation)
On 17 July 2012 23:56, Tom Lane wrote:
> Robert Haas writes:
>> On Mon, Jul 16, 2012 at 3:18 PM, Tom Lane wrote:
>>> BTW, while we are on the subject: hasn't this split completely broken
>>> the statistics about backend-initiated writes?
>
>> Yes, it seems to have done just that.
>
> So I went to fix this in the obvious way (attached), but while testing
> it I found that the number of buffers_backend events reported during
> a regression test run barely changed; which surprised the heck out of
> me, so I dug deeper. The cause turns out to be extremely scary:
> ForwardFsyncRequest isn't getting called at all in the bgwriter process,
> because the bgwriter process has a pendingOpsTable. So it just queues
> its fsync requests locally, and then never acts on them, since it never
> runs any checkpoints anymore.
>
> This implies that nobody has done pull-the-plug testing on either HEAD
> or 9.2 since the checkpointer split went in (2011-11-01), because even
> a modicum of such testing would surely have shown that we're failing to
> fsync a significant fraction of our write traffic.

That problem was reported to me on list some time ago, and I made a note
to fix that after the last CF.

I added a note to 9.2 open items about it myself, but it appears my fix
was too simple and fixed only the reported problem, not the underlying
issue. Reading your patch gave me strong deja vu, so not sure what
happened there.

Not very good from me. Feel free to thwack me to fix such things if I
seem not to respond quickly enough.

I'm now looking at the other open items in my area.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
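[Editor's sketch] The failure mode quoted above (a process that owns a local pending-fsync table queues requests there and never forwards them) can be shown with a toy model. Everything here is hypothetical: the `Toy*` types and `toy_*` functions are illustrative stand-ins and do not reproduce PostgreSQL's actual pendingOpsTable or ForwardFsyncRequest code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_TABLE_SIZE 16

/* Toy stand-in for a table of pending fsync requests (segment numbers). */
typedef struct
{
    int     segments[TOY_TABLE_SIZE];
    size_t  count;
} ToyFsyncTable;

/* Toy process; has_local_table mimics owning a pendingOpsTable. */
typedef struct
{
    bool          has_local_table;
    ToyFsyncTable local_table;
} ToyProcess;

/* Append a request; returns false if the table is full. */
static bool
toy_table_add(ToyFsyncTable *table, int segment)
{
    if (table->count >= TOY_TABLE_SIZE)
        return false;
    table->segments[table->count++] = segment;
    return true;
}

/*
 * Remember that a segment needs an fsync. A process with a local table
 * queues the request locally; any other process forwards it to the shared
 * queue that the checkpointer reads.
 */
bool
toy_register_dirty_segment(ToyProcess *proc, ToyFsyncTable *shared_queue,
                           int segment)
{
    if (proc->has_local_table)
        return toy_table_add(&proc->local_table, segment);
    return toy_table_add(shared_queue, segment);
}

/*
 * Checkpoint: absorb the shared queue into the checkpointer's own table,
 * then "fsync" everything in it. Returns the number of segments synced.
 */
size_t
toy_checkpoint(ToyProcess *checkpointer, ToyFsyncTable *shared_queue)
{
    size_t synced;
    size_t i;

    for (i = 0; i < shared_queue->count; i++)
        toy_table_add(&checkpointer->local_table, shared_queue->segments[i]);
    shared_queue->count = 0;

    /* A real system would issue the fsync calls here. */
    synced = checkpointer->local_table.count;
    checkpointer->local_table.count = 0;
    return synced;
}
```

In this model, a background writer created with `has_local_table = true` accumulates requests that no checkpoint ever sees, while one created with `has_local_table = false` has its writes synced at the next `toy_checkpoint` call.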
Re: [HACKERS] Checkpointer split has broken things dramatically (was Re: DELETE vs TRUNCATE explanation)
On 18.07.2012 02:48, Peter Geoghegan wrote:
> On 17 July 2012 23:56, Tom Lane wrote:
>> This implies that nobody has done pull-the-plug testing on either HEAD
>> or 9.2 since the checkpointer split went in (2011-11-01), because even
>> a modicum of such testing would surely have shown that we're failing to
>> fsync a significant fraction of our write traffic.
>>
>> Furthermore, I would say that any performance testing done since then,
>> if it wasn't looking at purely read-only scenarios, isn't worth the
>> electrons it's written on. In particular, any performance gain that
>> anybody might have attributed to the checkpointer splitup is very
>> probably hogwash.
>>
>> This is not giving me a warm feeling about our testing practices.
>
> The checkpointer split-up was not justified as a performance
> optimisation so much as a re-factoring effort that might have some
> concomitant performance benefits.

Agreed, but it means that we need to re-run the tests that were done to
make sure the extra fsync-request traffic is not causing a performance
regression,
http://archives.postgresql.org/pgsql-hackers/2011-10/msg01321.php.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] Checkpointer split has broken things dramatically (was Re: DELETE vs TRUNCATE explanation)
Greg Smith writes:
> On 07/17/2012 06:56 PM, Tom Lane wrote:
>> Furthermore, I would say that any performance testing done since then,
>> if it wasn't looking at purely read-only scenarios, isn't worth the
>> electrons it's written on. In particular, any performance gain that
>> anybody might have attributed to the checkpointer splitup is very
>> probably hogwash.

> There hasn't been any performance testing that suggested the
> checkpointer splitup was justified. The stuff I did showed it being
> flat out negative for a subset of pgbench oriented cases, which didn't
> seem real-world enough to disprove it as the right thing to do though.

Just to clarify, I'm not saying that this means we should revert the
checkpointer split. What I *am* worried about is that we may have been
hacking other things on the basis of faulty performance tests.

regards, tom lane
Re: [HACKERS] Checkpointer split has broken things dramatically (was Re: DELETE vs TRUNCATE explanation)
On 07/18/2012 12:00 PM, Greg Smith wrote:
> The second justification for the split was that it seems easier to get
> a low power result from, which I believe was the angle Peter Geoghegan
> was working when this popped up originally. The checkpointer has to
> run sometimes, but only at a 50% duty cycle as it's tuned out of the
> box. It seems nice to be able to approach that in a way that's power
> efficient without coupling it to whatever heartbeat the BGW is running
> at. I could even see people changing the frequencies for each
> independently depending on expected system load. Tune for lower power
> when you don't expect many users, that sort of thing.

Yeah - I'm already seeing benefits from that on my laptop, with much
less need to stop Pg when I'm not using it.

--
Craig Ringer
Re: [HACKERS] Checkpointer split has broken things dramatically (was Re: DELETE vs TRUNCATE explanation)
On 07/17/2012 06:56 PM, Tom Lane wrote:
> So I went to fix this in the obvious way (attached), but while testing
> it I found that the number of buffers_backend events reported during
> a regression test run barely changed; which surprised the heck out of
> me, so I dug deeper. The cause turns out to be extremely scary:
> ForwardFsyncRequest isn't getting called at all in the bgwriter process,
> because the bgwriter process has a pendingOpsTable.

When I did my testing early this year to look at checkpointer
performance (among other 9.2 write changes like group commit), I did see
some cases where buffers_backend was dramatically different on 9.2 vs.
9.1. There were plenty of cases where the totals across a 10 minute
pgbench were almost identical though, so this issue didn't stick out
then. That's a very different workload than the regression tests though.

> This implies that nobody has done pull-the-plug testing on either HEAD
> or 9.2 since the checkpointer split went in (2011-11-01), because even
> a modicum of such testing would surely have shown that we're failing to
> fsync a significant fraction of our write traffic.

Ugh. Most of my pull the plug testing the last six months has been
focused on SSD tests with older versions. I want to duplicate this (and
any potential fix) now that you've highlighted it.

> Furthermore, I would say that any performance testing done since then,
> if it wasn't looking at purely read-only scenarios, isn't worth the
> electrons it's written on. In particular, any performance gain that
> anybody might have attributed to the checkpointer splitup is very
> probably hogwash.

There hasn't been any performance testing that suggested the
checkpointer splitup was justified. The stuff I did showed it being flat
out negative for a subset of pgbench oriented cases, which didn't seem
real-world enough to disprove it as the right thing to do though.

I thought there were two valid justifications for the checkpointer split
(which is not a feature I have any corporate attachment to--I'm as
isolated from how it was developed as you are). The first is that it
seems like the right architecture to allow reworking checkpoints and
background writes for future write path optimization. A good chunk of
the time when I've tried to improve one of those (like my spread sync
stuff from last year), the code was complicated by the background writer
needing to follow the drum of checkpoint timing, and vice-versa. Being
able to hack on those independently got a sigh of relief from me. And
while this adds some code duplication in things like the process setup,
I thought the result would be cleaner for people reading the code to
follow too. This problem is terrible, but I think part of how it crept
in is that the single checkpoint+background writer process was doing way
too many things to even follow all of them some days.

The second justification for the split was that it seems easier to get a
low power result from, which I believe was the angle Peter Geoghegan was
working when this popped up originally. The checkpointer has to run
sometimes, but only at a 50% duty cycle as it's tuned out of the box. It
seems nice to be able to approach that in a way that's power efficient
without coupling it to whatever heartbeat the BGW is running at. I could
even see people changing the frequencies for each independently
depending on expected system load. Tune for lower power when you don't
expect many users, that sort of thing.

--
Greg Smith 2ndQuadrant US g...@2ndquadrant.com Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com
Re: [HACKERS] Checkpointer split has broken things dramatically (was Re: DELETE vs TRUNCATE explanation)
On 17 July 2012 23:56, Tom Lane wrote:
> This implies that nobody has done pull-the-plug testing on either HEAD
> or 9.2 since the checkpointer split went in (2011-11-01), because even
> a modicum of such testing would surely have shown that we're failing to
> fsync a significant fraction of our write traffic.
>
> Furthermore, I would say that any performance testing done since then,
> if it wasn't looking at purely read-only scenarios, isn't worth the
> electrons it's written on. In particular, any performance gain that
> anybody might have attributed to the checkpointer splitup is very
> probably hogwash.
>
> This is not giving me a warm feeling about our testing practices.

The checkpointer split-up was not justified as a performance
optimisation so much as a re-factoring effort that might have some
concomitant performance benefits. While I agree that it is regrettable
that this was allowed to go undetected for so long, I do not find it
especially surprising that some performance testing results post-split
didn't strike somebody as fool's gold. Much of the theory surrounding
checkpoint tuning, if followed, results in relatively little work being
done during the sync phase of a checkpoint, especially if an I/O
scheduler like deadline is used.

--
Peter Geoghegan http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services
[HACKERS] Checkpointer split has broken things dramatically (was Re: DELETE vs TRUNCATE explanation)
Robert Haas writes:
> On Mon, Jul 16, 2012 at 3:18 PM, Tom Lane wrote:
>> BTW, while we are on the subject: hasn't this split completely broken
>> the statistics about backend-initiated writes?

> Yes, it seems to have done just that.

So I went to fix this in the obvious way (attached), but while testing
it I found that the number of buffers_backend events reported during
a regression test run barely changed; which surprised the heck out of
me, so I dug deeper. The cause turns out to be extremely scary:
ForwardFsyncRequest isn't getting called at all in the bgwriter process,
because the bgwriter process has a pendingOpsTable. So it just queues
its fsync requests locally, and then never acts on them, since it never
runs any checkpoints anymore.

This implies that nobody has done pull-the-plug testing on either HEAD
or 9.2 since the checkpointer split went in (2011-11-01), because even
a modicum of such testing would surely have shown that we're failing to
fsync a significant fraction of our write traffic.

Furthermore, I would say that any performance testing done since then,
if it wasn't looking at purely read-only scenarios, isn't worth the
electrons it's written on. In particular, any performance gain that
anybody might have attributed to the checkpointer splitup is very
probably hogwash.

This is not giving me a warm feeling about our testing practices.

As far as fixing the bug is concerned, the reason for the foulup is that
mdinit() looks to IsBootstrapProcessingMode() to decide whether to
create a pendingOpsTable. That probably was all right when it was coded,
but what it means today is that *any* process started via
AuxiliaryProcessMain will have one; thus not only do bgwriters have one,
but so do walwriter and walreceiver processes; which might not represent
a bug today but it's pretty scary anyway. I think we need to fix that so
it's more directly dependent on the auxiliary process type. We can't use
flags set by the respective FooMain() functions, such as am_bg_writer,
because mdinit is called from BaseInit() which happens before reaching
those functions. My suggestion is that bootstrap.c ought to make the
process's AuxProcType value available and then mdinit should consult
that to decide what to do. (Having done that, we might consider getting
rid of the "retail" process-type flags am_bg_writer etc.)

regards, tom lane

diff --git a/src/backend/postmaster/bgwriter.c b/src/backend/postmaster/bgwriter.c
index 5f93fccbfab1bbb8306f5de4ad228f3cb48b0862..41a7b2be4f680db08556d948eaaa002ed50119c5 100644
*** a/src/backend/postmaster/bgwriter.c
--- b/src/backend/postmaster/bgwriter.c
*** BackgroundWriterMain(void)
*** 341,346
--- 341,357
  }


+ /*
+  * IsBackgroundWriterProcess
+  *    Return true if running in background writer process.
+  */
+ bool
+ IsBackgroundWriterProcess(void)
+ {
+     return am_bg_writer;
+ }
+
+
  /*
   * signal handler routines
   *
diff --git a/src/backend/postmaster/checkpointer.c b/src/backend/postmaster/checkpointer.c
index 92fd4276cd1b3be81d1ac741f9f6ea09d241ea52..bd1db4811d661d75e616bc77157c4a9e9b7e92fc 100644
*** a/src/backend/postmaster/checkpointer.c
--- b/src/backend/postmaster/checkpointer.c
*** bool
*** 1124,1129
--- 1124,1130
  ForwardFsyncRequest(RelFileNode rnode, ForkNumber forknum, BlockNumber segno)
  {
      CheckpointerRequest *request;
+     bool am_bg_writer;
      bool too_full;

      if (!IsUnderPostmaster)
*** ForwardFsyncRequest(RelFileNode rnode, F
*** 1131,1141
      if (am_checkpointer)
          elog(ERROR, "ForwardFsyncRequest must not be called in checkpointer");

      LWLockAcquire(CheckpointerCommLock, LW_EXCLUSIVE);

      /* Count all backend writes regardless of if they fit in the queue */
!     CheckpointerShmem->num_backend_writes++;

      /*
       * If the checkpointer isn't running or the request queue is full, the
--- 1132,1144
      if (am_checkpointer)
          elog(ERROR, "ForwardFsyncRequest must not be called in checkpointer");

+     am_bg_writer = IsBackgroundWriterProcess();
      LWLockAcquire(CheckpointerCommLock, LW_EXCLUSIVE);

      /* Count all backend writes regardless of if they fit in the queue */
!     if (!am_bg_writer)
!         CheckpointerShmem->num_backend_writes++;

      /*
       * If the checkpointer isn't running or the request queue is full, the
*** ForwardFsyncRequest(RelFileNode rnode, F
*** 1150,1156
           * Count the subset of writes where backends have to do their own
           * fsync
           */
!         CheckpointerShmem->num_backend_fsync++;
          LWLockRelease(CheckpointerCommLock);
          return false;
      }
--- 1153,1160
           * Count the subset of writes where backends have to do their own
           * fsync
           */
!         if (!am_bg_writer)
!             CheckpointerShmem->num_backend_fsync++;
          LWLockRelease(CheckpointerCommLock);
          return false;
      }
diff --git a/src/include/postmas
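[Editor's sketch] The suggestion in the message above, that mdinit should consult the auxiliary process type rather than IsBootstrapProcessingMode(), might look roughly like the following. This is a sketch under stated assumptions, not the committed fix: the `ToyAuxProcType` enum and both predicate functions are hypothetical names, and the choice of which process types should own a table (here only the checkpointer, plus any process not running under the postmaster) is an assumption. The message itself establishes only that the bgwriter must not have one; a real fix may well need to include other process types.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical process types, loosely modelled on the ones named above. */
typedef enum
{
    TOY_NOT_AUX_PROCESS,        /* an ordinary backend */
    TOY_BOOTSTRAP_PROCESS,
    TOY_BGWRITER_PROCESS,
    TOY_CHECKPOINTER_PROCESS,
    TOY_WALWRITER_PROCESS,
    TOY_WALRECEIVER_PROCESS
} ToyAuxProcType;

/*
 * Behaviour as described in the message: every process started through
 * the auxiliary-process entry point ends up with a local table.
 */
bool
toy_old_rule_creates_table(bool under_postmaster, ToyAuxProcType type)
{
    return !under_postmaster || type != TOY_NOT_AUX_PROCESS;
}

/*
 * Suggested direction: decide from the process type itself.
 * ASSUMPTION: only the checkpointer (which performs the fsyncs) and
 * processes running without a postmaster keep a local table.
 */
bool
toy_new_rule_creates_table(bool under_postmaster, ToyAuxProcType type)
{
    if (!under_postmaster)
        return true;            /* nobody to forward requests to */

    switch (type)
    {
        case TOY_CHECKPOINTER_PROCESS:
            return true;
        default:
            return false;       /* forward requests instead */
    }
}
```

Under the old rule the background writer, walwriter and walreceiver all get a table; under the sketched rule they do not, so their fsync requests would have to be forwarded to the checkpointer.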