On 5/9/13 5:18 PM, Jeff Davis wrote:
On Thu, 2013-05-09 at 14:28 -0500, Jim Nasby wrote:
What about moving some critical data from the beginning of the WAL
record to the end? That would make it easier to detect that we don't
have a complete record. It wouldn't necessarily replace the CRC
though, so maybe that's not good enough.
Actually, what if we *duplicated* some of the same WAL header
info at the end of the record? Given a reasonable amount of data that
would damn-near ensure that a torn record was detected, because the
odds of having the exact same sequence of random bytes would be so
low. Potentially even just duplicating the LSN would suffice.
I think both of these ideas have some false positives and false
negatives.
If the corruption happens at the record boundary, and wipes out the
special information at the end of the record, then you might think it
was not fully flushed, and we're in the same position as today.
If the WAL record is large, and somehow the beginning and the end get
written to disk but not the middle, then it will look like corruption;
but really the WAL was just not completely flushed. This seems pretty
unlikely, but not impossible.
That being said, I like the idea of introducing some extra checks if a
perfect solution is not possible.
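
To make the two failure modes above concrete, here is a minimal sketch of the "duplicate the LSN at the end" check. The record layout (LSN, length, CRC up front; LSN repeated as a trailer) and the names encode_record and classify are made up for illustration and are not PostgreSQL's actual WAL format or API.

```python
import struct
import zlib

# Hypothetical layout, NOT the real WAL record format:
#   header  = lsn (8 bytes), payload length (4 bytes), crc32 of payload (4 bytes)
#   payload = arbitrary bytes
#   trailer = lsn again (8 bytes)
HEADER = struct.Struct("<QII")
TRAILER = struct.Struct("<Q")


def encode_record(lsn: int, payload: bytes) -> bytes:
    """Build a record whose LSN is duplicated after the payload."""
    crc = zlib.crc32(payload) & 0xFFFFFFFF
    return HEADER.pack(lsn, len(payload), crc) + payload + TRAILER.pack(lsn)


def classify(buf: bytes) -> str:
    """Return 'ok', 'incomplete' (looks unflushed) or 'corrupt'."""
    if len(buf) < HEADER.size:
        return "incomplete"
    lsn, length, crc = HEADER.unpack_from(buf, 0)
    end = HEADER.size + length
    if len(buf) < end + TRAILER.size:
        return "incomplete"
    (tail_lsn,) = TRAILER.unpack_from(buf, end)
    if tail_lsn != lsn:
        # Trailer does not match the header: the tail of the record
        # apparently never reached disk.
        return "incomplete"
    if zlib.crc32(buf[HEADER.size:end]) & 0xFFFFFFFF != crc:
        # Both ends are present but the payload does not verify.
        return "corrupt"
    return "ok"
```

Both concerns show up directly in this sketch: damage that lands on the trailer is reported as "incomplete" even though the record was fully written, and a record whose two ends reached disk without the middle is reported as "corrupt" even though it was only partially flushed.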
Yeah, I don't think a perfect solution is possible, short of attempting to tie
directly into the filesystem (i.e., on a journaling FS, have some way to
essentially treat the FS journal as WAL).
One additional step we might be able to take would be to scan forward looking
for a record that would tell us when an fsync must have occurred (heck, maybe
we should add an fsync WAL record...). If we find a corrupt WAL record followed
by an fsync we know that we've now lost data. That closes some of the holes.
Actually, that might handle all the holes...
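
A rough sketch of that scan-forward check, under some big assumptions: that we can still find record boundaries after a bad record, and that there is some record type (the hypothetical "fsync record") proving a flush happened. Each record is represented here as a (status, is_fsync_marker) pair; none of these names exist in PostgreSQL.

```python
from typing import Iterable, Tuple

# Each item is (status, is_fsync_marker), where status is "ok" for a record
# that verified and anything else ("incomplete", "corrupt") for one that
# did not. Both fields are hypothetical, for illustration only.
Record = Tuple[str, bool]


def bad_record_precedes_fsync(records: Iterable[Record]) -> bool:
    """True if a bad record is followed by a valid fsync marker.

    In that case the bad record cannot simply be the unflushed tail of
    the WAL, so data that was once durable has been lost.
    """
    seen_bad = False
    for status, is_fsync_marker in records:
        if status != "ok":
            seen_bad = True
        elif seen_bad and is_fsync_marker:
            return True
    return False
```

If this returns False, the bad record may still just be the end of WAL, which is the case recovery already handles today.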
On the separate write idea, if that could be controlled by a GUC I
think it'd be worth doing. Anyone that needs to worry about this
corner case probably has hardware that would support that.
It sounds pretty easy to do that naively. I'm just worried that the
performance will be so bad for so many users that it's not a very
reasonable choice.
Today, it would probably make more sense to just use sync rep. If the
master's WAL is corrupt, and it starts up too early, then that should be
obvious when you try to reconnect streaming replication. I haven't tried
it, but I'm assuming that it gives a useful error message.
I wonder if there are data warehouse (DW) environments that are too large to
keep a streaming replication (SR) copy but would be able to afford the
double-write overhead.
BTW, isn't performance what killed the double-buffer idea?
--
Jim C. Nasby, Data Architect j...@nasby.net
512.569.9461 (cell) http://jim.nasby.net