Robert Haas <robertmh...@gmail.com> writes:
> 2010/6/30 Tom Lane <t...@sss.pgh.pa.us>:
>> Surely you'd have to roll back, not commit, in that situation.  You have
>> no excuse for assuming that you've replayed all effects of the
>> transaction.

> Hmm, good point.  But you could make it work either way, I think.  If
> you flush WAL stream A, write commit record to WAL stream B, flush WAL
> stream B, write commit record to WAL stream A, then commit is correct.

I don't think so.  "I flushed this" is not equivalent to "it is certain
that it will be possible to read this again".  In particular, corruption
of WAL stream A leaves you in trouble if you take the commit on B as a
certificate for stream A being complete.

(thinks for a bit...)  Maybe if the commit record on B included a
minimum stopping point for stream A, it'd be all right.  This wouldn't
be exactly the expected LSN of the A commit record, mind you, because
you don't want to block insertions into the A stream while you're
flushing B.  But it would say that all non-commit records for the xact
on stream A are known to be before that point.  If you've replayed A
that far then you can take the transaction as being committable.
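
To make the "minimum stopping point" idea concrete, here is a minimal sketch of the replay-side check. The names (CommitOnB, min_stop_lsn_a, committable) are invented for illustration and are not PostgreSQL structures, and LSNs are modelled as plain integers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CommitOnB:
    """Hypothetical commit record written to stream B."""
    xid: int
    # Assumption: every non-commit record of this xact on stream A
    # lies before this LSN. It need not equal the LSN of A's own
    # commit record, since A keeps accepting insertions while B flushes.
    min_stop_lsn_a: int


def committable(commit: CommitOnB, replayed_upto_a: int) -> bool:
    """True once replay of stream A has reached the stopping point
    that the commit record on B vouches for."""
    return replayed_upto_a >= commit.min_stop_lsn_a
```

If replay of A stops short of min_stop_lsn_a (say, because A is corrupt past that point), the commit on B is not enough to treat the transaction as committed.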

(thinks some more...)  No, you still lose, because a commit record isn't
just a single bit.  What about subtransactions for example?  I guess
maybe the commit record written/flushed first is the real commit record
with all the auxiliary data, and the one written second isn't so much
a commit record as a fencepoint record to prevent advancing beyond that
point in stream A before you've processed the relevant commit from B.

(thinks some more...)  Maybe you don't even need the fencepoint record
per se.  I think all it's doing for you is making sure you don't process
commit records on different streams out-of-order.  There might be some
other, more direct way to do that.
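
As a rough sketch of what the fencepoint buys you: replay of stream A simply refuses to advance past a fence until the matching commit from B has been processed. The record layout (kind, xid) and the function name replay_a are assumptions made up for this example.

```python
def replay_a(records_a, commits_processed_on_b):
    """Replay stream A in order, stopping at the first fence whose
    real commit record (on stream B) has not been processed yet.

    records_a: list of (kind, xid) tuples, kind is "data" or "fence".
    commits_processed_on_b: set of xids whose commit on B was replayed.
    Returns the records that were applied.
    """
    applied = []
    for kind, xid in records_a:
        if kind == "fence" and xid not in commits_processed_on_b:
            break  # must not advance beyond this point in A yet
        applied.append((kind, xid))
    return applied
```

Any other mechanism that keeps commit records on different streams from being processed out of order would serve the same purpose.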

(thinks yet more...)  Actually the weak point in this scheme is that it
wouldn't serialize transactions that occur in different databases and
don't touch any shared catalogs.  It'd be entirely possible for T1 in
DB1 to be reported committed, then T2 in DB2 to be reported committed,
then a crash occurs after which T2 is seen committed and T1 not.  While
this would be all right if the clients for T1 and T2 can't communicate,
that isn't the real world.
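
A toy model of that failure, assuming one WAL stream per database and assuming the tail of DB1's stream turns out to be unreadable after the crash. Everything here (recovered_commits, the stream names, the record tuples) is hypothetical.

```python
def recovered_commits(streams, readable_len):
    """Return the xids seen as committed after crash recovery.

    streams: dict of stream name -> list of (kind, xid) records.
    readable_len: dict of stream name -> number of leading records
                  that can still be read back after the crash.
    """
    committed = set()
    for name, records in streams.items():
        for kind, xid in records[:readable_len[name]]:
            if kind == "commit":
                committed.add(xid)
    return committed


# T1 (in DB1) is reported committed first, then T2 (in DB2).
streams = {
    "db1": [("data", "T1"), ("commit", "T1")],
    "db2": [("data", "T2"), ("commit", "T2")],
}
# After the crash, only the first record of db1's stream is readable.
after_crash = recovered_commits(streams, {"db1": 1, "db2": 2})
```

Here after_crash contains T2 but not T1, even though T1 was acknowledged first; nothing in the independent streams ties the two commits together.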

                        regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
