On 20 September 2016 at 22:46, Robert Haas <robertmh...@gmail.com> wrote:

> I did some cleanup of 0001 (see attached) and was all set to commit it
> when I realized what I think is a problem: holding XidGenLock doesn't
> seem to help with the race condition between this function and CLOG
> truncation, because vac_truncate_clog() updates the shared memory
> limits AFTER performing the truncation.

Thanks ... and drat.

> If the order of those
> operations were reversed, we'd be fine, because it would get stuck
> trying to update the shared memory limits and wouldn't be able to
> truncate until it did - and conversely, if it updated the shared
> memory limits before we examined them, that would be OK, too, because
> we'd be sure not to consult the pages that are about to be truncated.

I'm hesitant to mess with something that fundamental for what I was
hoping was a low-impact feature, albeit one that seems to be trying
hard not to be at every turn.  It looks pretty reasonable to update
oldestXid before clog truncation but I don't want to be wrong and
create some obscure race or crash recovery issue related to wraparound
and clog truncation. It could well be a problem if we're very close to

So far nothing has had any reason to care about this, since there's no
way to attempt to access an older xid in a normally functioning
system. The commit timestamp lookup code doesn't care whether the xid
is still in clog or not and nothing else does lookups based on xids
supplied by the user. If anything else did or does in future it will
have the same problem as txid_status.

We've already ruled out releasing the slru LWLock in SlruReportIOError
then PG_TRY / PG_CATCH()ing clog access errors in txid_status() per my
original approach to this issue.

So I see a few options now:

* Do nothing. Permit this race to exist, document the error, and
expect apps to cope. I'm pretty tempted to go for exactly this since
it pushes the cost onto users of the feature and doesn't make a mess
elsewhere. People who use this will typically be invoking it as a
standalone toplevel function anyway, so it's mostly just a bit of
noise in the error logs if you lose the race - and we have plenty of
other sources of that already.

* Rather than calling TransactionIdDidCommit / TransactionIdDidAbort,
call clog.c's TransactionIdGetStatus with a new missing_ok flag. That
sets a bool* missing  param added to SimpleLruReadPage_ReadOnly(...)
and in turn to SimpleLruReadPage(...) that's set instead of calling
SlruReportIOError(...). This seems rather intrusive and will add
little-used params and paths to fairly hot slru.c code so I'm not

* Take CLogControlLock LW_SHARED in txid_status() to prevent
truncation before reading oldestXid. We'd need a way to pass an
"already locked" state through TransactionIdGetStatus(...) to
SimpleLruReadPage_ReadOnly(...), which isn't great since again it's
pretty hot code.

* Don't release the slru LWLock in SlruReportIOError; instead release
CLogControlLock from txid_status on clog access failure. As before
means PG_TRY / PG_CATCH without PG_RE_THROW, but it means the only
place it affects is callers of txid_status(...). For added safety,
restrict txid_status() to being called in a toplevel virtual xact so
we know we'll finish up promptly. It's a horrible layering violation
having txid_status(...) release the clog control lock though, and
seems risky.

The only non-horrible one of those is IMO to just let the caller see
an error if they lose the race. It's a function that's intended for
use when you're dealing with indeterminate transaction state after a
server or application error anyway, so it's part of an error path
already. So I say we just document the behaviour. Because slru.c
doesn't release its LWLock on error we also need to ensure
txid_status(...) is also only called from a toplevel xact so the user
doesn't attempt to wrap it in plpgsql BEGIN ... EXCEPTION block and it
causes the xact to abort.

Will follow up with just that.

 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:

Reply via email to