Hi,

On 2017-04-15 05:18:49 +0200, Petr Jelinek wrote:
> +     /*
> +      * c) we already seen the xl_running_xacts and tried to do the above.
> +      * However because of race condition in LogStandbySnapshot() there might
> +      * have been transaction reported as running but in reality has already
> +      * written commit record before the xl_running_xacts so decoding has
> +      * missed it. We now see xl_running_xacts that suggests all transactions
> +      * from the original one were closed but the consistent state wasn't
> +      * reached which means the race condition has indeed happened.
> +      *
> +      * Start tracking again as if this was the first xl_running_xacts we've
> +      * seen, with the advantage that because decoding was already running,
> +      * any transactions committed before the xl_running_xacts record will be
> +      * known to us so we won't hit with the same issue again.
> +      */

Unfortunately I don't think that's true, as coded.  You're using
information about committed transactions:

> +     else if (TransactionIdFollows(running->oldestRunningXid,
> +                                                               
> builder->running.xmax))
> +     {
> +             int off;
> +
> +             SnapBuildStartXactTracking(builder, running);
> +
>               /*
> +              * Nark any transactions that are known to have committed 
> before the
> +              * xl_running_xacts as finished to avoid the race condition in
> +              * LogStandbySnapshot().
>                *
> +              * We can use SnapBuildEndTxn directly as it only does the
> +              * transaction running check and handling without any additional
> +              * side effects.
>                */
> +             for (off = 0; off < builder->committed.xcnt; off++)
> +                     SnapBuildEndTxn(builder, lsn, 
> builder->committed.xip[off]);

but a transaction might just have *aborted* before the new snapshot, no?
Since we don't keep track of those, I don't think this guarantees anything?

ISTM, we need a xip_status array in SnapBuild->running.  Then, whenever
a xl_running_xacts is encountered while builder->running.xcnt > 0, loop
for i in 0 .. xcnt_space, and end SnapBuildEndTxn() if xip_status[i] but
the xid is not xl_running_xacts?  Does that make sense?

- Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Reply via email to