>> On 2 Mar 2017, at 11:00, Craig Ringer <cr...@2ndquadrant.com> wrote:
>> 
>> BTW, I've been reviewing the patch in more detail. Other than a bunch
>> of copy-and-paste that I'm cleaning up, the main issue I've found is
>> that in DecodePrepare, you call:
>> 
>>   SnapBuildCommitTxn(ctx->snapshot_builder, buf->origptr, xid,
>>                      parsed->nsubxacts, parsed->subxacts);
>> 
>> but I am not convinced it is correct to call it at PREPARE TRANSACTION
>> time, only at COMMIT PREPARED time. We want to see the 2pc prepared
>> xact's state when decoding it, but there might be later commits that
>> cannot yet see that state and shouldn't have it visible in their
>> snapshots. 
> 
> Agree, that is problem. That allows to decode this PREPARE, but after that
> it is better to mark this transaction as running in snapshot or perform
> prepare decoding with some kind of copied-and-edited snapshot. I’ll have a
> look at this.
> 

While working on this I’ve spotted quite a nasty corner case with an aborted
prepared transaction. I have some not-so-great ideas for how to fix it, but
maybe my view is blurred and I’ve missed something, so I want to ask here first.

Suppose we create a table, then alter it in a 2PC transaction, and then abort
that transaction. pg_class will then contain something like this:

xmin | xmax | relname
100  | 200  | mytable
200  | 0    | mytable

After the abort, tuple (100,200,mytable) becomes visible again, and if we
alter the table again, the xmax of the first tuple will be set to the current
xid, resulting in the following table:

xmin | xmax | relname
100  | 300  | mytable
200  | 0    | mytable
300  | 0    | mytable

At that moment we’ve lost the information that the first tuple was deleted by
our prepared tx. From the POV of the historic snapshot that will be
constructed to decode the prepare, the first tuple is visible, but actually
the second tuple should be used. Moreover, such a snapshot could see both
tuples, violating oid uniqueness, but the heap scan stops after finding the
first one.
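
To make the scenario concrete, here is a toy model of it. This is only a
sketch and not PostgreSQL code: every name in it (ToyTuple, ToySnapshot,
toy_tuple_visible and so on) is made up for illustration, the snapshot is
reduced to a plain list of xids it treats as committed, and xid wraparound is
ignored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model only: none of these names exist in PostgreSQL. */
typedef uint32_t ToyXid;            /* 0 stands for "no xmax" */

typedef struct ToyTuple {
    ToyXid xmin;
    ToyXid xmax;
    const char *relname;
} ToyTuple;

/* A "historic snapshot" reduced to the xids it treats as committed. */
typedef struct ToySnapshot {
    const ToyXid *committed;
    size_t ncommitted;
} ToySnapshot;

static bool toy_xid_committed(const ToySnapshot *snap, ToyXid xid)
{
    for (size_t i = 0; i < snap->ncommitted; i++)
        if (snap->committed[i] == xid)
            return true;
    return false;
}

/* Visible if the inserter is committed and the deleter (if any) is not. */
static bool toy_tuple_visible(const ToyTuple *tup, const ToySnapshot *snap)
{
    assert(tup != NULL && snap != NULL);
    if (!toy_xid_committed(snap, tup->xmin))
        return false;
    return tup->xmax == 0 || !toy_xid_committed(snap, tup->xmax);
}

/* Index of the first visible tuple, or -1; mimics a scan that stops early. */
static int toy_first_visible(const ToyTuple *tups, size_t n,
                             const ToySnapshot *snap)
{
    for (size_t i = 0; i < n; i++)
        if (toy_tuple_visible(&tups[i], snap))
            return (int) i;
    return -1;
}

static size_t toy_count_visible(const ToyTuple *tups, size_t n,
                                const ToySnapshot *snap)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++)
        if (toy_tuple_visible(&tups[i], snap))
            count++;
    return count;
}

/* Snapshot used to decode prepared xid 200: it sees 100 and 200, not 300. */
static const ToyXid toy_decode_xids[] = {100, 200};
static const ToySnapshot toy_decode_snap = {toy_decode_xids, 2};

/* pg_class right after the prepared tx, and after the later ALTER by 300. */
static const ToyTuple toy_before[] = {
    {100, 200, "mytable"}, {200, 0, "mytable"},
};
static const ToyTuple toy_after[] = {
    {100, 300, "mytable"}, {200, 0, "mytable"}, {300, 0, "mytable"},
};
```

With toy_before the decoding snapshot sees exactly one tuple, the one written
by xid 200. With toy_after it sees two, and a scan that stops at the first
match returns the tuple written by xid 100.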

I see here two possible workarounds:

* First try to scan the catalog while filtering out tuples with xmax bigger
than snapshot->xmax, as they were possibly deleted by our tx. Then, if nothing
is found, scan in the usual way.

* Do not decode such a transaction at all. If by the time we decode the
prepare record we already know that it is aborted, then such decoding doesn’t
make much sense. IMO the intended usage of logical 2PC decoding is to decide
about commit/abort based on answers from logical subscribers/replicas. So
there will be a barrier between prepare and commit/abort, and such situations
shouldn’t happen.
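
A rough sketch of what the first workaround could look like, again as a toy
model rather than real catalog-scan code. All names are hypothetical, the
snapshot is a plain list of committed xids plus an xmax field, the value 201
for that xmax is an assumption for the example, and xids are compared as plain
integers without wraparound handling.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model only: none of these names exist in PostgreSQL. */
typedef uint32_t ToyXid;            /* 0 stands for "no xmax" */

typedef struct ToyTuple {
    ToyXid xmin;
    ToyXid xmax;
} ToyTuple;

typedef struct ToySnapshot {
    const ToyXid *committed;        /* xids the snapshot treats as committed */
    size_t ncommitted;
    ToyXid xmax;                    /* xids above this are "in the future" */
} ToySnapshot;

static bool toy_xid_committed(const ToySnapshot *snap, ToyXid xid)
{
    for (size_t i = 0; i < snap->ncommitted; i++)
        if (snap->committed[i] == xid)
            return true;
    return false;
}

static bool toy_tuple_visible(const ToyTuple *tup, const ToySnapshot *snap)
{
    if (!toy_xid_committed(snap, tup->xmin))
        return false;
    return tup->xmax == 0 || !toy_xid_committed(snap, tup->xmax);
}

/*
 * Pass 1 skips tuples whose xmax is bigger than snap->xmax, since their
 * original deleter may have been overwritten. Pass 2 is the usual scan and
 * runs only if pass 1 found nothing. Returns a tuple index, or -1.
 */
static int toy_scan_two_pass(const ToyTuple *tups, size_t n,
                             const ToySnapshot *snap)
{
    assert(tups != NULL && snap != NULL);

    for (size_t i = 0; i < n; i++)
    {
        if (tups[i].xmax != 0 && tups[i].xmax > snap->xmax)
            continue;               /* possibly deleted by our tx */
        if (toy_tuple_visible(&tups[i], snap))
            return (int) i;
    }

    for (size_t i = 0; i < n; i++)
        if (toy_tuple_visible(&tups[i], snap))
            return (int) i;

    return -1;
}

/* Snapshot for decoding prepared xid 200; xmax = 201 is assumed. */
static const ToyXid toy_decode_xids[] = {100, 200};
static const ToySnapshot toy_decode_snap = {toy_decode_xids, 2, 201};

/* The problematic state: aborted 200 altered the table, then 300 did. */
static const ToyTuple toy_after[] = {{100, 300}, {200, 0}, {300, 0}};

/* A table only altered by a later xid 300: pass 1 finds nothing. */
static const ToyTuple toy_later_only[] = {{100, 300}, {300, 0}};
```

In this model the first pass picks the tuple written by xid 200 for toy_after,
and the fallback pass still returns the old tuple for toy_later_only.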

--
Stas Kelvich
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
