On 17/03/17 03:34, Craig Ringer wrote:
> On 17 March 2017 at 08:10, Stas Kelvich <s.kelv...@postgrespro.ru> wrote:
>> While working on this I've spotted quite a nasty corner case with an
>> aborted prepared transaction. I have some not-so-great ideas for how to
>> fix it, but maybe my view is blurred and I missed something, so I want
>> to ask here first.
>> Suppose we create a table, then alter it in a 2PC tx, and after that
>> abort the tx.
>> So pg_class will have something like this:
>> xmin | xmax | relname
>> 100  | 200  | mytable
>> 200  | 0    | mytable
>> After the previous abort, the tuple (100,200,mytable) becomes visible
>> again, and if we alter the table again then xmax of the first tuple will
>> be set to the current xid, resulting in the following table:
>> xmin | xmax | relname
>> 100  | 300  | mytable
>> 200  | 0    | mytable
>> 300  | 0    | mytable
>> At that moment we've lost the information that the first tuple was
>> deleted by our prepared tx.
> Right. And while the prepared xact has aborted, we don't control when
> it aborts and when those overwrites can start happening. We can and
> should check if a 2pc xact is aborted before we start decoding it so
> we can skip decoding it if it's already aborted, but it could be
> aborted *while* we're decoding it, then have data needed for its
> snapshot clobbered.
> This hasn't mattered in the past because prepared xacts (and
> especially aborted 2pc xacts) have never needed snapshots, we've never
> needed to do something from the perspective of a prepared xact.
> I think we'll probably need to lock the 2PC xact so it cannot be
> aborted or committed while we're decoding it, until we finish decoding
> it. So we lock it, then check if it's already aborted/already
> committed/in progress. If it's aborted, treat it like any normal
> aborted xact. If it's committed, treat it like any normal committed
> xact. If it's in progress, keep the lock and decode it.
> People using logical decoding for 2PC will presumably want to control
> 2PC via logical decoding, so they're not so likely to mind such a
> lock.
>> * Try at first to scan the catalog filtering out tuples with xmax bigger
>> than snapshot->xmax, as they were possibly deleted by our tx. Then, if
>> nothing is found, scan in the usual way.
> I don't think that'll be at all viable with the syscache/relcache
> machinery. Way too intrusive.

I think only genam would need changes to do a two-phase scan for this, as
the catalog scans should ultimately go there. It's going to slow things
down, but we could limit the impact by doing the two-phase scan only
when a historical snapshot is in use and the tx being decoded changed
catalogs (we already have global knowledge of the first one, and it
would be trivial to add the second one as we have local knowledge of
that as well).

What I think is a better strategy than filtering out by xmax would be
filtering "in" by xmin, though. Meaning that the first scan would return
only tuples modified by the current tx which are visible in the snapshot,
and the second scan would return the other visible tuples. That way
whatever the decoded tx saw should always win.

  Petr Jelinek                  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services
