Andres Freund <and...@2ndquadrant.com> writes:
> On 2014-01-21 21:42:19 -0500, Tom Lane wrote:
>> Uh, what? The behavior I'm talking about is *exactly the same*
>> as what happens now. The only change is that the data sent to the
>> WAL file is laid out a bit differently, and the replay logic has
>> to work harder to reassemble it before it can apply the commit or
>> abort action. If anything outside replay can detect a difference
>> at all, that would be a bug.
>> Once again: the replayer is not supposed to act immediately on the
>> subsidiary records. It's just supposed to remember their contents
>> so it can reattach them to the eventual commit or abort record,
>> and then do what it does today to replay the commit or abort.
> I (think) I get what you want to do, but splitting the record like that
> nonetheless opens up behaviour that previously wasn't there.
Obviously we are not on the same page yet.
In my vision, the WAL writer is dumping the same data it would have
dumped, though in a different layout, and it's working from process-local
state same as it does now. The WAL replayer is taking the same actions at
the same time using the same data as it does now. There is no "behavior
that wasn't there", unless you're claiming that there are *existing* race
conditions in commit/abort WAL processing.
The only thing that seems mildly squishy about this is that it's not clear
how long the WAL replayer ought to hang onto subsidiary records for a
commit or abort it hasn't seen yet. In the case where we change our minds
and abort a transaction after already having written some subsidiary
records for the commit, it's not really a problem; the replayer can throw
away any saved data related to the commit of xid N as soon as it sees an
abort for xid N. However, what if the session crashes and never writes
either a final commit or abort record? I think we can deal with this
fairly easily, though, because that case should end with a crash recovery
cycle writing a shutdown checkpoint to the log (we do do that, no?).
So the rule can be "discard any unmatched subsidiary records if you see a
shutdown checkpoint". This makes sense on its own terms since there are
surely no active transactions at that point in the log.
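A minimal sketch of the replay-side bookkeeping described above, in C. Every name in it (Xid, Pending, replay_subsidiary, replay_final, replay_shutdown_checkpoint, pending_count, last_applied) is hypothetical and is not PostgreSQL's actual xact or WAL replay API. Payloads are toy strings, and "applying" a commit or abort only records the reassembled payload. The sketch shows three things: subsidiary records are remembered per xid and not acted on; the final commit or abort record reattaches the matching pieces; and unmatched pieces are dropped either when their xid aborts or when a shutdown checkpoint is replayed.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a transaction id. */
typedef unsigned int Xid;

/* Whether a subsidiary record belongs to an eventual commit or abort. */
typedef enum { PENDING_COMMIT, PENDING_ABORT } PendingKind;

/* One remembered subsidiary record, kept in WAL order. */
typedef struct Pending {
    Xid xid;
    PendingKind kind;
    char data[64];
    struct Pending *next;
} Pending;

static Pending *pending_head = NULL;

/* Reassembled payload of the most recent final record (toy "apply"). */
static char last_applied[256];

/* Subsidiary record: just remember its contents; take no action yet. */
void replay_subsidiary(Xid xid, PendingKind kind, const char *data)
{
    assert(data != NULL);
    Pending *p = malloc(sizeof *p);
    if (p == NULL)
        abort();
    p->xid = xid;
    p->kind = kind;
    snprintf(p->data, sizeof p->data, "%s", data);
    p->next = NULL;

    Pending **tail = &pending_head;   /* append to keep WAL order */
    while (*tail != NULL)
        tail = &(*tail)->next;
    *tail = p;
}

/* Append pieces of `want` kind for xid to out, freeing ALL pieces for xid
 * (so an abort also discards any saved commit pieces for that xid). */
static void collect_and_free(Xid xid, PendingKind want, char *out, size_t outlen)
{
    Pending **link = &pending_head;
    while (*link != NULL) {
        Pending *p = *link;
        if (p->xid != xid) {
            link = &p->next;
            continue;
        }
        if (p->kind == want) {
            size_t used = strlen(out);
            snprintf(out + used, outlen - used, "%s", p->data);
        }
        *link = p->next;
        free(p);
    }
}

/* Final commit or abort record: reattach saved pieces, then apply. */
void replay_final(Xid xid, PendingKind kind, const char *final_data)
{
    last_applied[0] = '\0';
    collect_and_free(xid, kind, last_applied, sizeof last_applied);
    size_t used = strlen(last_applied);
    snprintf(last_applied + used, sizeof last_applied - used, "%s", final_data);
    /* A real replayer would now run today's commit/abort replay logic. */
}

/* Shutdown checkpoint: no transaction can still be active, so drop all. */
void replay_shutdown_checkpoint(void)
{
    while (pending_head != NULL) {
        Pending *p = pending_head;
        pending_head = p->next;
        free(p);
    }
}

size_t pending_count(void)
{
    size_t n = 0;
    for (Pending *p = pending_head; p != NULL; p = p->next)
        n++;
    return n;
}
```

For example, subsidiary pieces "A" and "B" for xid 7 followed by a final commit carrying "C" leave last_applied as "ABC", while pieces for an xid whose session crashed before writing any final record simply sit in the list until replay_shutdown_checkpoint() clears them. A linked list keeps the sketch short; a real implementation would presumably want a hash table keyed by xid.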
regards, tom lane
Sent via pgsql-hackers mailing list (email@example.com)