Re: [HACKERS] silent data loss with ext4 / all current versions

Tomas Vondra Fri, 22 Jan 2016 18:40:39 -0800

On 01/23/2016 02:35 AM, Michael Paquier wrote:

On Fri, Jan 22, 2016 at 9:41 PM, Greg Stark <[email protected]> wrote:

On Fri, Jan 22, 2016 at 8:26 AM, Tomas Vondra
<[email protected]> wrote:

On 01/22/2016 06:45 AM, Michael Paquier wrote:

So, I have been playing with a Linux VM with VMware Fusion and on
ext4 with data=ordered the renames are getting lost if the parent
directory is not fsynced. By kill -9'ing the VM I am able to reproduce
that really easily.



Yep. Same experience here (with qemu-kvm VMs).
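
For reference, the usual way to keep a rename from getting lost is to
fsync the file, rename it, and then fsync the directory that holds the
new name. Below is a minimal sketch of that sequence in Python, not the
server's C code; the helper names fsync_dir and rename_durably are made
up for illustration, and opening a directory for fsync assumes
Linux/POSIX behaviour.

```python
import os


def fsync_dir(path):
    # Open the directory itself and fsync it, so that changes to its
    # entries (such as a rename) are flushed to stable storage.
    fd = os.open(path, os.O_RDONLY)
    try:
        os.fsync(fd)
    finally:
        os.close(fd)


def rename_durably(src, dst):
    # Hypothetical helper: fsync file, rename, fsync parent directory.
    # 1. Flush the file contents before giving the file its final name.
    fd = os.open(src, os.O_RDWR)
    try:
        os.fsync(fd)
    finally:
        os.close(fd)

    # 2. The rename is atomic, but not durable on its own.
    os.rename(src, dst)

    # 3. fsync the directory holding the new name; skipping this step
    #    is what allows the rename to vanish after a crash.
    dst_dir = os.path.dirname(os.path.abspath(dst))
    fsync_dir(dst_dir)

    # 4. If the old name lived in another directory, flush that too.
    src_dir = os.path.dirname(os.path.abspath(src))
    if src_dir != dst_dir:
        fsync_dir(src_dir)
```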


I still think a better approach for this is to run the database on an
LVM volume and take lots of snapshots. No VM needed, though it doesn't
hurt. LVM volumes are below the level of the filesystem and a snapshot
captures the state of the raw blocks the filesystem has written to the
block layer. The block layer does no caching, though the drive may, but
neither the VM solution nor LVM would capture that.

LVM snapshots would have the advantage that you can keep running the
database and you can take lots of snapshots with relatively little
overhead. Having dozens or hundreds of snapshots would be an unacceptable
performance drain in production but for testing it should be practical
and they take relatively little space -- just the blocks changed since
the snapshot was taken.


Another idea: hardcode a PANIC just after rename() with
restart_after_crash = off (this needs IsBootstrapProcess() checks).
Once server crashes, kill-9 the VM. Then restart the VM and the
Postgres instance with a new binary that does not have the PANIC, and
see how things are moving on. There is a window of up to several
seconds after the rename() call, so I guess that this would work.

I don't see how that would improve anything, as the PANIC has no impact
on the I/O requests already issued to the system. What you need is some
sort of coordination between the database and the script that kills the
VM (or takes an LVM snapshot).

That can be done by simply emitting a particular log message, and the
"kill script" may simply watch the file (for example over SSH). This has
the benefit that you can also watch for additional conditions that are
difficult to check from that particular part of the code (and only kill
the VM when all of them trigger - for example only on the third
checkpoint since start, and such).
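
A rough sketch of such a "kill script" follows, assuming the log lines
arrive as any iterable of strings (for example the stdout of a
tail -F run over SSH). The function name wait_for_trigger, the marker
string, and the on_trigger callback are all assumptions made for
illustration; the callback is where the VM kill or the LVM snapshot
would go.

```python
def wait_for_trigger(lines, marker, nth, on_trigger):
    """Call on_trigger() when `marker` shows up for the nth time.

    `lines` is any iterable of log lines. Returns True if the trigger
    fired, False if the input ended before the nth occurrence.
    """
    seen = 0
    for line in lines:
        if marker in line:
            seen += 1
            if seen == nth:
                on_trigger()
                return True
    return False


if __name__ == "__main__":
    import sys

    # Example: read the server log from stdin and fire on the third
    # line containing the (assumed) marker text. Replace the lambda
    # with whatever actually kills the VM or takes the snapshot.
    wait_for_trigger(sys.stdin, "checkpoint complete", 3,
                     lambda: print("trigger: kill the VM now"))
```

Counting occurrences in the script rather than in the server keeps the
extra conditions out of the code path being tested.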

The reason why I was not particularly thrilled about the LVM snapshot
idea is that to identify this particular data loss issue, you need to be
able to reason about the expected state of the database (what
transactions are committed, how many segments are there). And my
understanding was that Greg's idea was merely "try to start the DB on a
snapshot and see if it starts / is not corrupted," which would not work
with this particular issue, as the database seemed just fine - the data
loss is silent. Adding the "last XLOG segment" into pg_controldata would
make it easier to detect without having to track details about which
transactions got committed.
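
To illustrate the kind of check I mean, here is a small sketch that
compares the segments the test harness expected to exist against what is
actually present after the crash. It assumes the harness recorded the
expected segment names before killing the VM (how it does that is left
open), and missing_segments is a made-up name. WAL segment file names
are 24 hexadecimal characters; anything else in the directory is ignored.

```python
import re

# WAL segment file names consist of 24 hexadecimal characters.
WAL_NAME = re.compile(r"^[0-9A-Fa-f]{24}$")


def missing_segments(expected, found):
    """Return expected segment names that are absent from `found`.

    expected: segment names recorded by the test harness before the crash
    found:    file names listed in the WAL directory after the restart
    """
    present = {name.upper() for name in found if WAL_NAME.match(name)}
    return sorted(name for name in expected if name.upper() not in present)
```

A non-empty result means a segment was lost even though the database
started up without complaint.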


regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

