[HACKERS] Re: Checkpointer split has broken things dramatically (was Re: DELETE vs TRUNCATE explanation)

2012-07-24 Thread Greg Stark
On Wed, Jul 18, 2012 at 1:13 AM, Craig Ringer ring...@ringerc.id.au wrote:

 That makes me wonder whether, on top of the buildfarm, we need to extend
 some buildfarm machines into a crashfarm:

 - Keep kvm instances with copy-on-write snapshot disks and the build env
 on them
 - Fire up the VM, do a build, and start the server
 - From outside the vm have the test controller connect to the server and
 start a test run
 - Hard-kill the OS instance at a random point in time.


For what it's worth, you don't need to hard-kill the VM and start over
repeatedly to crash at different points. You could take a snapshot of
the disk storage and keep running, and you could take many snapshots
from a single run. Each snapshot represents the storage that would exist
if the machine had crashed at the moment the snapshot was taken.

You do want the snapshots to be taken by something outside the virtual
machine: either the kvm storage layer or LVM on the host, but not LVM
inside the guest.

And yes, the hard part that always stopped me from looking at this was
having any way to test the correctness of the data.

-- 
greg


Re: [HACKERS] Re: Checkpointer split has broken things dramatically (was Re: DELETE vs TRUNCATE explanation)

2012-07-18 Thread Tom Lane
Craig Ringer ring...@ringerc.id.au writes:
 On 07/18/2012 08:31 AM, Tom Lane wrote:
 Not sure if we need a whole farm, but certainly having at least one
 machine testing this sort of stuff on a regular basis would make me feel
 a lot better.

 OK. That's something I can actually be useful for.

 My current qemu/kvm test harness control code is in Python since that's 
 what all the other tooling for the project I was using it for is in. Is 
 it likely to be useful for me to adapt that code for use for a Pg 
 crash-test harness, or will you need a particular tool/language to be 
 used? If so, which/what? I'll do pretty much anything except Perl. I'll 
 have a result for you more quickly working in Python, though I'm happy 
 enough to write it in C (or Java, but I'm guessing that won't get any 
 enthusiasm around here).

If we were talking about code that was going to end up in the PG
distribution, I'd kind of want it to be in C or Perl, just to keep down
the number of languages we're depending on.  However, it's not obvious
that a tool like this would ever go into our distribution.  I'd suggest
working with what you're comfortable with, and we can worry about
translation when and if there's a reason to.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Re: Checkpointer split has broken things dramatically (was Re: DELETE vs TRUNCATE explanation)

2012-07-18 Thread Robert Haas
On Tue, Jul 17, 2012 at 6:56 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 So I went to fix this in the obvious way (attached), but while testing
 it I found that the number of buffers_backend events reported during
 a regression test run barely changed; which surprised the heck out of
 me, so I dug deeper.  The cause turns out to be extremely scary:
 ForwardFsyncRequest isn't getting called at all in the bgwriter process,
 because the bgwriter process has a pendingOpsTable.  So it just queues
 its fsync requests locally, and then never acts on them, since it never
 runs any checkpoints anymore.

:-(

 This implies that nobody has done pull-the-plug testing on either HEAD
 or 9.2 since the checkpointer split went in (2011-11-01), because even
 a modicum of such testing would surely have shown that we're failing to
 fsync a significant fraction of our write traffic.

 Furthermore, I would say that any performance testing done since then,
 if it wasn't looking at purely read-only scenarios, isn't worth the
 electrons it's written on.  In particular, any performance gain that
 anybody might have attributed to the checkpointer splitup is very
 probably hogwash.

I don't think anybody thought that was going to result in a direct
performance gain, but I agree the performance testing needs to be
redone.  I suspect that the impact on my testing is limited, because I
do mostly pgbench testing, and the lost fsync requests were probably
duplicated by non-lost fsync requests from backend writes.  But I
agree that it needs to be redone once this is fixed.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



[HACKERS] Re: Checkpointer split has broken things dramatically (was Re: DELETE vs TRUNCATE explanation)

2012-07-17 Thread Craig Ringer

On 07/18/2012 06:56 AM, Tom Lane wrote:

Robert Haas robertmh...@gmail.com writes:

On Mon, Jul 16, 2012 at 3:18 PM, Tom Lane t...@sss.pgh.pa.us wrote:

BTW, while we are on the subject: hasn't this split completely broken
the statistics about backend-initiated writes?

Yes, it seems to have done just that.

So I went to fix this in the obvious way (attached), but while testing
it I found that the number of buffers_backend events reported during
a regression test run barely changed; which surprised the heck out of
me, so I dug deeper.  The cause turns out to be extremely scary:
ForwardFsyncRequest isn't getting called at all in the bgwriter process,
because the bgwriter process has a pendingOpsTable.  So it just queues
its fsync requests locally, and then never acts on them, since it never
runs any checkpoints anymore.

This implies that nobody has done pull-the-plug testing on either HEAD
or 9.2 since the checkpointer split went in (2011-11-01)


That makes me wonder whether, on top of the buildfarm, we need to extend
some buildfarm machines into a crashfarm:


- Keep kvm instances with copy-on-write snapshot disks and the build env 
on them

- Fire up the VM, do a build, and start the server
- From outside the vm have the test controller connect to the server and 
start a test run

- Hard-kill the OS instance at a random point in time.
- Start the OS instance back up
- Start Pg back up and connect to it again
- From the test controller, test the Pg install for possible corruption 
by reading the indexes and tables, doing some test UPDATEs, etc.
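
The steps above could be driven by a small controller script running outside
the VM. A minimal Python sketch, where the image path and the workload and
verification helpers (start_test_run, wait_for_postgres, check_database) are
hypothetical placeholders; kvm is launched as an ordinary subprocess, so
SIGKILL models pulling the plug.

```python
# Sketch of one crash-test cycle, run from OUTSIDE the VM.
import random
import signal
import subprocess
import time

def kvm_cmd(image: str, mem_mb: int = 1024) -> list[str]:
    """qemu-kvm invocation: headless, with cache=writethrough so that
    whatever the guest believes is fsync'd really reached the backing file."""
    return ["qemu-kvm", "-m", str(mem_mb), "-nographic",
            "-drive", f"file={image},cache=writethrough"]

def start_test_run(): pass     # hypothetical: connect to Pg, begin workload
def wait_for_postgres(): pass  # hypothetical: poll until Pg accepts connections
def check_database(): pass     # hypothetical: scan indexes/heaps, test UPDATEs

def one_crash_cycle(image: str) -> None:
    vm = subprocess.Popen(kvm_cmd(image))
    start_test_run()
    time.sleep(random.uniform(5, 300))    # run for a random interval
    vm.send_signal(signal.SIGKILL)        # hard-kill: no guest shutdown path
    vm.wait()
    vm = subprocess.Popen(kvm_cmd(image)) # boot the same (dirty) disk again
    wait_for_postgres()                   # Pg performs crash recovery here
    check_database()
```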


The main challenge would be coming up with suitable tests to run, ones 
that could then be checked to make sure nothing was broken. The test 
controller would know which test it was running and how far it had got 
before the OS was killed, so it could check for the expected data if 
provided with appropriate test metadata. Use of the enable_* planner 
flags should permit forcing scans of both indexes and table heaps.


What else should be checked? The main thing that comes to mind for me is 
something I've worried about for a while: that Pg might not always 
handle out-of-disk-space anywhere near as gracefully as it's often 
claimed to. There's no automated testing for that, so it's hard to 
really know. A harnessed VM could be used to test that. Instead of 
virtual plug-pull tests, it could generate a virtual disk of constrained 
random size, run its tests until out-of-disk caused a failure, stop Pg, 
expand the disk, restart Pg, and run its checks.
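
That loop could look something like the sketch below. The workload and
guest-control helpers are hypothetical placeholders; the qemu-img create and
resize commands are standard.

```python
# Sketch of the out-of-disk variant: a constrained random-size qcow2 disk,
# run until ENOSPC, grow the disk, re-check.
import random
import subprocess

def create_disk_cmd(path: str, size_mb: int) -> list[str]:
    return ["qemu-img", "create", "-f", "qcow2", path, f"{size_mb}M"]

def grow_disk_cmd(path: str, extra_mb: int) -> list[str]:
    # qemu-img can grow an offline image in place.
    return ["qemu-img", "resize", path, f"+{extra_mb}M"]

def run_until_enospc(): pass  # hypothetical: drive Pg until writes start failing
def stop_guest(): pass        # hypothetical: stop Pg and the VM
def start_guest(): pass       # hypothetical: boot the VM, start Pg
def check_database(): pass    # hypothetical: scan indexes/heaps, test UPDATEs

def out_of_space_cycle(path: str) -> None:
    size_mb = random.randint(256, 2048)   # constrained random size
    subprocess.run(create_disk_cmd(path, size_mb), check=True)
    start_guest()
    run_until_enospc()
    stop_guest()
    subprocess.run(grow_disk_cmd(path, 512), check=True)  # expand the disk
    start_guest()
    check_database()
```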


Variants where WAL was on a separate disk and only WAL or only the main 
non-WAL disk run out of space would also make sense and be easy to 
produce with such a harness.


I've written some automated kvm test harnesses, so I could have a play 
with this idea. I would probably need some help with the test design, 
though, and the guest OS would be Linux, Linux, or Linux at least to 
start with.


Opinions?

--
Craig Ringer



Re: [HACKERS] Re: Checkpointer split has broken things dramatically (was Re: DELETE vs TRUNCATE explanation)

2012-07-17 Thread Tom Lane
Craig Ringer ring...@ringerc.id.au writes:
 On 07/18/2012 06:56 AM, Tom Lane wrote:
 This implies that nobody has done pull-the-plug testing on either HEAD
 or 9.2 since the checkpointer split went in (2011-11-01)

 That makes me wonder whether, on top of the buildfarm, we need to extend 
 some buildfarm machines into a crashfarm:

Not sure if we need a whole farm, but certainly having at least one
machine testing this sort of stuff on a regular basis would make me feel
a lot better.

 The main challenge would be coming up with suitable tests to run, ones 
 that could then be checked to make sure nothing was broken.

One fairly simple test scenario could go like this:

* run the regression tests
* pg_dump the regression database
* run the regression tests again
* hard-kill immediately upon completion
* restart database, allow it to perform recovery
* pg_dump the regression database
* diff previous and new dumps; should be the same

The main thing this wouldn't cover is discrepancies in user indexes,
since pg_dump doesn't do anything that's likely to result in indexscans
on user tables.  It ought to be enough to detect the sort of system-wide
problem we're talking about here, though.
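
The steps above could be sketched as a small driver. How the regression tests
and the hard kill are invoked depends on the harness, so those helpers are
hypothetical placeholders; pg_dump and the final diff are the real checks.

```python
# Sketch: run tests, dump, run tests again, hard-kill, recover, dump, compare.
import filecmp
import subprocess

def dump_cmd(port: int, outfile: str) -> list[str]:
    return ["pg_dump", "-p", str(port), "-f", outfile, "regression"]

def run_regression(): pass       # hypothetical: make installcheck or similar
def hard_kill(): pass            # hypothetical: SIGKILL the postmaster / kill the VM
def restart_and_recover(): pass  # hypothetical: restart, wait for recovery to finish

def crash_recover_diff(port: int) -> bool:
    run_regression()
    subprocess.run(dump_cmd(port, "before.sql"), check=True)
    run_regression()
    hard_kill()                  # hard-kill immediately upon completion
    restart_and_recover()
    subprocess.run(dump_cmd(port, "after.sql"), check=True)
    # the dumps should be identical if recovery restored everything
    return filecmp.cmp("before.sql", "after.sql", shallow=False)
```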

In general I think the hard part is automated reproduction of an
OS-crash scenario, but your ideas about how to do that sound promising.
Once we have that going, it shouldn't be hard to come up with tests
of the form "do X, hard-crash, recover, check X still looks sane".

 What else should be checked? The main thing that comes to mind for me is 
 something I've worried about for a while: that Pg might not always 
 handle out-of-disk-space anywhere near as gracefully as it's often 
 claimed to.

+1

regards, tom lane



Re: [HACKERS] Re: Checkpointer split has broken things dramatically (was Re: DELETE vs TRUNCATE explanation)

2012-07-17 Thread Craig Ringer

On 07/18/2012 08:31 AM, Tom Lane wrote:

Not sure if we need a whole farm, but certainly having at least one
machine testing this sort of stuff on a regular basis would make me feel
a lot better.


OK. That's something I can actually be useful for.

My current qemu/kvm test harness control code is in Python since that's 
what all the other tooling for the project I was using it for is in. Is 
it likely to be useful for me to adapt that code for use for a Pg 
crash-test harness, or will you need a particular tool/language to be 
used? If so, which/what? I'll do pretty much anything except Perl. I'll 
have a result for you more quickly working in Python, though I'm happy 
enough to write it in C (or Java, but I'm guessing that won't get any 
enthusiasm around here).



One fairly simple test scenario could go like this:

* run the regression tests
* pg_dump the regression database
* run the regression tests again
* hard-kill immediately upon completion
* restart database, allow it to perform recovery
* pg_dump the regression database
* diff previous and new dumps; should be the same

The main thing this wouldn't cover is discrepancies in user indexes,
since pg_dump doesn't do anything that's likely to result in indexscans
on user tables.  It ought to be enough to detect the sort of system-wide
problem we're talking about here, though.


It also won't detect issues that only occur during certain points in 
execution, under concurrent load, etc. Still, a start, and I could look 
at extending it into some kind of crash fuzzing once the basics were 
working.



In general I think the hard part is automated reproduction of an
OS-crash scenario, but your ideas about how to do that sound promising.


It's worked well for other testing I've done. Any writes that are still 
in the guest OS's memory, write queues, etc. are lost when kvm is killed, 
just as in a hard crash. Anything the kvm guest has flushed to disk is 
preserved on the host - either on the host's disks (cache=writethrough) 
or at least in dirty writeback buffers in RAM (cache=writeback).


kvm can even do a decent job of simulating a BBU-equipped write-through 
volume by allowing the host OS to do write-back caching of KVM's backing 
device/files. You don't get to set a max write-back cache size directly, 
but Linux I/O writeback settings provide some control.


My favourite thing about kvm is that it's just another command. It can 
be run headless and controlled via virtual serial console and/or its 
monitor socket. It doesn't require special privileges and can operate on 
ordinary files. It's very well suited for hooking into test harnesses.
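
Since it's just another command, wiring kvm into a harness can be as simple as
building an argv and talking to the monitor socket. A sketch, where the binary
name, image path, and socket path are assumptions for illustration (the
-monitor unix:...,server,nowait option is standard qemu):

```python
# Launch kvm headless with a monitor socket, then script the monitor.
import socket
import subprocess

def kvm_headless_cmd(image: str, mon_sock: str) -> list[str]:
    return ["qemu-kvm", "-nographic",
            "-monitor", f"unix:{mon_sock},server,nowait",
            "-drive", f"file={image}"]

def monitor_command(mon_sock: str, cmd: str) -> str:
    """Send one human-monitor command (e.g. 'info status') and read the reply."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as s:
        s.connect(mon_sock)
        s.settimeout(2.0)
        s.sendall(cmd.encode("ascii") + b"\n")
        return s.recv(65536).decode(errors="replace")
```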


The only challenge with using kvm/qemu is that there have been some 
breaking changes and a couple of annoying bugs that mean I won't be able 
to support anything except pretty much the latest versions initially. 
kvm is easy to compile and has limited dependencies, so I don't expect 
that to be an issue, but thought it was worth raising.


--
Craig Ringer
