On 27 September 2016 at 09:23, Craig Ringer <cr...@2ndquadrant.com> wrote: > On 21 September 2016 at 22:16, Robert Haas <robertmh...@gmail.com> wrote: >> >> It might not be too hard to add a second copy of oldestXid in shared >> memory that is updated before truncation rather than afterward... but >> yeah, like you, I'm not finding this nearly as straightforward as >> might have been hoped. > > Yeah. > > I suspect that'll be the way to go, to add another copy that's updated > before clog truncation. It just seems ... unclean. Like it shouldn't > be necessary, something else isn't right. But it's probably the lowest > pain option.
OK, so summarizing the problem: slru.c and clog.c have no soft-failure entrypoints where we can look it up and fail gracefully if the xid isn't in the clog. vac_truncate_clog() calls TruncateCLOG to chop SLRU pages from the clog before it takes XidGenLock to advance oldestXid. So we cannot use oldestXid protected by XidGenLock as an interlock against concurrent clog truncation to prevent SLRU access errors looking up an xid, and there's no other candidate lock. This hasn't been an issue before because nobody looks up arbitrary user-supplied XIDs in clog, we only look up things that're protected by datfrozenxid etc. But the whole point of txid_status is to look up user-supplied xids. We can't just take ClogControlLock from txid_status() to block truncation because clog.c expects to own that lock, and takes it (via slru.c) in TransactionIdGetStatus, with no way to pass an already_locked state. We'd self-deadlock. Adding a second copy of oldestXid - say pendingOldestXid - won't actually help us unless we also take some suitable LWLock before updating it, otherwise truncation can continue after we look at pendingOldestXid but before we do the clog lookup. That means an extra LWLock for each clog truncation, but compared to the I/O done during clog truncation and the cost of the SlruScanDirectory() pre-check done by TruncateCLOG it's nothing. We could take XidGenLock twice, once to update this new pendingOldestXid field and once to update oldestXid, but that's a highly contested lock I'd rather not mess with even on a path that's not hit much. Instead, I've added a new LWLock, ClogTruncationLock, for that purpose. vac_truncate_clog() takes it if it decides to attempt clog truncation. This lock is held throughout the whole process of clog truncation and oldestXid advance, so there's no need for a new pendingOldestXid field in ShmemVariableCache. We just take the lock then look at oldestXid, knowing that it's guaranteed to correspond to an existing clog page that won't go away while we're looking. ClogTruncationLock is utterly uncontested so it's going to have no meaningful impact compared to all the work we do scanning the clog to decide whether we're even going to try truncating it, etc. (BTW, it seems like a pity that lwlocknames.txt doesn't have comments on each lock. We could have src/backend/storage/lmgr/generate-lwlocknames.pl transform the # comments into /* comments */ on the generated header. Thoughts?) -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
From 0fcfc84226afa4901277a43ee78ec34a30322e8f Mon Sep 17 00:00:00 2001 From: Craig Ringer <cr...@2ndquadrant.com> Date: Wed, 21 Dec 2016 15:37:29 +0800 Subject: [PATCH 1/3] Introduce txid_status(bigint) to get status of an xact If an application loses its connection while a COMMIT request is in flight, the backend crashes mid-commit, etc, then an application may not be sure whether or not a commit completed successfully or was rolled back. While two-phase commit solves this it does so at a considerable overhead, so introduce a lighter alternative. txid_status(bigint) lets an application determine the status of a a commit based on an xid-with-epoch as returned by txid_current() or similar. Status may be committed, aborted, in-progress (including prepared xacts) or null if the xact is too old for its commit status to still be retained because it has passed the wrap-around epoch boundary. Applications must call txid_current() in their transactions to make much use of this since PostgreSQL does not automatically report an xid to the client when one is assigned. Introduces TransactionIdInRecentPast(...) for the use of other functions that need similar logic in future. There was previously no way to look up an arbitrary xid without running the risk of having clog truncated out from under you. This hasn't been a problem because anything looking up xids in clog knows they're protected by datminxid, but that's not the case for arbitrary user-supplied XIDs. clog is truncated before we advance oldestXid so taking XidGenLock is insufficient, and there's no way to look up a SLRU with soft-failure. So we introduce ClogTruncationLock to guard against concurrent clog truncation. --- doc/src/sgml/func.sgml | 31 +++++++ src/backend/access/transam/clog.c | 2 + src/backend/access/transam/commit_ts.c | 2 + src/backend/access/transam/multixact.c | 1 + src/backend/commands/vacuum.c | 13 +++ src/backend/storage/lmgr/lwlocknames.txt | 1 + src/backend/utils/adt/txid.c | 141 +++++++++++++++++++++++++++++++ src/include/catalog/pg_proc.h | 2 + src/include/utils/builtins.h | 1 + src/test/regress/expected/txid.out | 68 +++++++++++++++ src/test/regress/sql/txid.sql | 38 +++++++++ 11 files changed, 300 insertions(+) diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml index 47fcb30..3123232 100644 --- a/doc/src/sgml/func.sgml +++ b/doc/src/sgml/func.sgml @@ -17143,6 +17143,10 @@ SELECT collation for ('foo' COLLATE "de_DE"); <primary>txid_visible_in_snapshot</primary> </indexterm> + <indexterm> + <primary>txid_status</primary> + </indexterm> + <para> The functions shown in <xref linkend="functions-txid-snapshot"> provide server transaction information in an exportable form. The main @@ -17193,6 +17197,11 @@ SELECT collation for ('foo' COLLATE "de_DE"); <entry><type>boolean</type></entry> <entry>is transaction ID visible in snapshot? (do not use with subtransaction ids)</entry> </row> + <row> + <entry><literal><function>txid_status(<parameter>bigint</parameter>)</function></literal></entry> + <entry><type>txid_status</type></entry> + <entry>report the status of the given xact - <literal>committed</literal>, <literal>aborted</literal>, <literal>in progress</literal>, or NULL if the xid is too old</entry> + </row> </tbody> </tgroup> </table> @@ -17263,6 +17272,28 @@ SELECT collation for ('foo' COLLATE "de_DE"); </para> <para> + <function>txid_status(bigint)</> reports the commit status of a recent + transaction. Applications may use it to determine whether a transaction + committed or aborted when the application and/or database server crashed or + lost connection while a <literal>COMMIT</literal> command was in progress. + The status of a transaction will be reported as one of: + <itemizedlist> + <listitem><para><literal>'in progress'</></></> + <listitem><para><literal>'committed'</></></> + <listitem><para><literal>'aborted'</></></> + <listitem><para><literal>NULL</> if xid too old</></> + </itemizedlist> + PostgreSQL discards the commit status transactions after no references to + the transaction survive in other active transactions, tables, replication + slots, etc. This means that the status of older transactions cannot be + determined. <function>txid_status(bigint)</> returns <literal>NULL</> if a + transaction is too old to look up. Prepared transactions are reported as + <literal>in progress</>; applications must check <link + linkend="view-pg-prepared-xacts"><literal>pg_prepared_xacts</></> if they + need to determine if the xid is a prepared transaction. + </para> + + <para> The functions shown in <xref linkend="functions-commit-timestamp"> provide information about transactions that have been already committed. These functions mainly provide information about when the transactions diff --git a/src/backend/access/transam/clog.c b/src/backend/access/transam/clog.c index 2634476..b35438e 100644 --- a/src/backend/access/transam/clog.c +++ b/src/backend/access/transam/clog.c @@ -644,6 +644,8 @@ TruncateCLOG(TransactionId oldestXact) { int cutoffPage; + Assert(LWLockHeldByMe(ClogTruncationLock)); + /* * The cutoff point is the start of the segment containing oldestXact. We * pass the *page* containing oldestXact to SimpleLruTruncate. diff --git a/src/backend/access/transam/commit_ts.c b/src/backend/access/transam/commit_ts.c index a8d275f..6d71a15 100644 --- a/src/backend/access/transam/commit_ts.c +++ b/src/backend/access/transam/commit_ts.c @@ -805,6 +805,8 @@ TruncateCommitTs(TransactionId oldestXact) { int cutoffPage; + Assert(LWLockHeldByMe(ClogTruncationLock)); + /* * The cutoff point is the start of the segment containing oldestXact. We * pass the *page* containing oldestXact to SimpleLruTruncate. diff --git a/src/backend/access/transam/multixact.c b/src/backend/access/transam/multixact.c index e9588a7..a2875e6 100644 --- a/src/backend/access/transam/multixact.c +++ b/src/backend/access/transam/multixact.c @@ -2937,6 +2937,7 @@ TruncateMultiXact(MultiXactId newOldestMulti, Oid newOldestMultiDB) Assert(!RecoveryInProgress()); Assert(MultiXactState->finishedStartup); + Assert(LWLockHeldByMe(ClogTruncationLock)); /* * We can only allow one truncation to happen at once. Otherwise parts of diff --git a/src/backend/commands/vacuum.c b/src/backend/commands/vacuum.c index 58bbf55..ddc486b 100644 --- a/src/backend/commands/vacuum.c +++ b/src/backend/commands/vacuum.c @@ -1153,6 +1153,17 @@ vac_truncate_clog(TransactionId frozenXID, return; /* + * Because we advance oldestXid only after truncating the clog, holding + * XidGenLock is insufficient to be sure that a given xid will still exist + * in clog when we go to look up its state. Code that may look up arbitrary + * and possibly old xids should take ClogTruncationLock in shared mode to + * guard against concurrent clog truncation. If such code also takes + * XidGenLock it must be taken after ClogTruncationLog to guard against + * deadlocks. + */ + LWLockAcquire(ClogTruncationLock, LW_EXCLUSIVE); + + /* * Truncate CLOG, multixact and CommitTs to the oldest computed value. */ TruncateCLOG(frozenXID); @@ -1168,6 +1179,8 @@ vac_truncate_clog(TransactionId frozenXID, SetTransactionIdLimit(frozenXID, oldestxid_datoid); SetMultiXactIdLimit(minMulti, minmulti_datoid); AdvanceOldestCommitTsXid(frozenXID); + + LWLockRelease(ClogTruncationLock); } diff --git a/src/backend/storage/lmgr/lwlocknames.txt b/src/backend/storage/lmgr/lwlocknames.txt index f8996cd..7aebeeb 100644 --- a/src/backend/storage/lmgr/lwlocknames.txt +++ b/src/backend/storage/lmgr/lwlocknames.txt @@ -47,3 +47,4 @@ CommitTsLock 39 ReplicationOriginLock 40 MultiXactTruncationLock 41 OldSnapshotTimeMapLock 42 +ClogTruncationLock 43 diff --git a/src/backend/utils/adt/txid.c b/src/backend/utils/adt/txid.c index 276075e..7af0abb 100644 --- a/src/backend/utils/adt/txid.c +++ b/src/backend/utils/adt/txid.c @@ -21,6 +21,7 @@ #include "postgres.h" +#include "access/clog.h" #include "access/transam.h" #include "access/xact.h" #include "access/xlog.h" @@ -28,6 +29,7 @@ #include "miscadmin.h" #include "libpq/pqformat.h" #include "postmaster/postmaster.h" +#include "storage/lwlock.h" #include "utils/builtins.h" #include "utils/memutils.h" #include "utils/snapmgr.h" @@ -93,6 +95,77 @@ load_xid_epoch(TxidEpoch *state) } /* + * Helper to get a TransactionId from a 64-bit txid with wraparound + * detection. + * + * ERRORs if the txid is in the future. Returns true if the xid is + * within the xid wraparound threshold and clog truncation threshold + * or if the xid is a permanent xid. Returns false if the xid is past + * the clog truncation threshold or wraparound threshold. Sets + * extracted_xid to the 32-bit xid. + * + * It's only safe to use the extracted_xid for most purposes if the + * function returns true, otherwise the clog could be truncated away + * or it might be an xid from a prior epoch. + * + * XIDs older than ShmemVariableCache->oldestXid are treated as too + * old to look up because the clog could've been truncated away - even + * if they're still far from the xid wraparound theshold. The caller + * should have at least a share lock on XidGenLock to prevent + * oldestXid from advancing between our oldestXid check and subsequent + * lookups of transaction status using the returned xid. Failure to do + * so risks ERRORs on clog access but nothing worse. + */ +static bool +TransactionIdInRecentPast(uint64 xid_with_epoch, TransactionId *extracted_xid) +{ + uint32 xid_epoch = (uint32) (xid_with_epoch >> 32); + TransactionId xid = (TransactionId) xid_with_epoch; + uint32 now_epoch; + TransactionId now_epoch_last_xid; + bool result; + + Assert(LWLockHeldByMe(ClogTruncationLock)); + + GetNextXidAndEpoch(&now_epoch_last_xid, &now_epoch); + + result = true; + + if (!TransactionIdIsNormal(xid)) + { + /* must be a permanent XID, ignore the epoch and return unchanged */ + } + else if (xid_epoch > now_epoch + || (xid_epoch == now_epoch && xid > now_epoch_last_xid)) + { + ereport(ERROR, + (errcode(ERRCODE_INVALID_PARAMETER_VALUE), + errmsg("transaction ID "UINT64_FORMAT" is in the future", + xid_with_epoch))); + } + else if (xid_epoch + 1 < now_epoch + || (xid_epoch + 1 == now_epoch && xid < now_epoch_last_xid)) + { + /* xid is wrapped, too far in the past */ + result = false; + } + else if (TransactionIdPrecedes(xid, ShmemVariableCache->oldestXid)) + { + /* xid isn't wrapped, but clog could've been truncated away */ + result = false; + } + else + { + Assert(TransactionIdPrecedesOrEquals(xid, now_epoch_last_xid)); + } + + if (extracted_xid != NULL) + *extracted_xid = xid; + + return result; +} + +/* * do a TransactionId -> txid conversion for an XID near the given epoch */ static txid @@ -354,6 +427,9 @@ bad_format: * * Return the current toplevel transaction ID as TXID * If the current transaction does not have one, one is assigned. + * + * This value has the epoch as the high 32 bits and the 32-bit xid + * as the low 32 bits. */ Datum txid_current(PG_FUNCTION_ARGS) @@ -658,3 +734,68 @@ txid_snapshot_xip(PG_FUNCTION_ARGS) SRF_RETURN_DONE(fctx); } } + +/* + * Report the status of a recent transaction ID, or null for wrapped, + * truncated away or otherwise too old XIDs. + */ +Datum +txid_status(PG_FUNCTION_ARGS) +{ + const char *status; + uint64 xid_with_epoch = PG_GETARG_INT64(0); + TransactionId xid; + + /* + * We use oldestXid to determine whether the xid we're examining still has + * commit information retained. We must ensure the clog for the XIDs we're + * examining doesn't get truncated away while we're looking, otherwise we'll + * fail with confusing SLRU access errors. See vac_truncate_clog(..). + */ + LWLockAcquire(ClogTruncationLock, LW_SHARED); + /* + * We should also hold XidGenLock to prevent oldestXid advancing, which can + * happen whether or not actual clog truncation is attempted. We don't want + * to try to look up state for an xid that becomes in-the-future after + * we check if it's in the recent past. + */ + LWLockAcquire(XidGenLock, LW_SHARED); + if (TransactionIdInRecentPast(xid_with_epoch, &xid)) + { + if (!TransactionIdIsValid(xid)) + { + LWLockRelease(XidGenLock); + LWLockRelease(ClogTruncationLock); + ereport(ERROR, + (errcode(ERRCODE_INVALID_PARAMETER_VALUE), + errmsg("transaction ID "UINT64_FORMAT" is an invalid xid", + xid_with_epoch))); + } + + if (TransactionIdIsCurrentTransactionId(xid)) + status = gettext_noop("in progress"); + else if (TransactionIdDidCommit(xid)) + status = gettext_noop("committed"); + else if (TransactionIdDidAbort(xid)) + status = gettext_noop("aborted"); + else + /* + * can't test TransactionIdIsInProgress here or we race + * with concurrent commit/abort. There's no point anyway, + * since it might then commit/abort just after we check. + */ + status = gettext_noop("in progress"); + } + else + { + status = NULL; + } + + LWLockRelease(XidGenLock); + LWLockRelease(ClogTruncationLock); + + if (status == NULL) + PG_RETURN_NULL(); + else + PG_RETURN_TEXT_P(cstring_to_text(status)); +} diff --git a/src/include/catalog/pg_proc.h b/src/include/catalog/pg_proc.h index e2d08ba..0ad870c 100644 --- a/src/include/catalog/pg_proc.h +++ b/src/include/catalog/pg_proc.h @@ -4928,6 +4928,8 @@ DATA(insert OID = 2947 ( txid_snapshot_xip PGNSP PGUID 12 1 50 0 0 f f f f t DESCR("get set of in-progress txids in snapshot"); DATA(insert OID = 2948 ( txid_visible_in_snapshot PGNSP PGUID 12 1 0 0 0 f f f f t f i s 2 0 16 "20 2970" _null_ _null_ _null_ _null_ _null_ txid_visible_in_snapshot _null_ _null_ _null_ )); DESCR("is txid visible in snapshot?"); +DATA(insert OID = 3346 ( txid_status PGNSP PGUID 12 1 0 0 0 f f f f t f v s 1 0 25 "20" _null_ _null_ _null_ _null_ _null_ txid_status _null_ _null_ _null_ )); +DESCR("commit status of transaction"); /* record comparison using normal comparison rules */ DATA(insert OID = 2981 ( record_eq PGNSP PGUID 12 1 0 0 0 f f f f t f i s 2 0 16 "2249 2249" _null_ _null_ _null_ _null_ _null_ record_eq _null_ _null_ _null_ )); diff --git a/src/include/utils/builtins.h b/src/include/utils/builtins.h index 2ae212a..baffa38 100644 --- a/src/include/utils/builtins.h +++ b/src/include/utils/builtins.h @@ -1227,6 +1227,7 @@ extern Datum txid_snapshot_xmin(PG_FUNCTION_ARGS); extern Datum txid_snapshot_xmax(PG_FUNCTION_ARGS); extern Datum txid_snapshot_xip(PG_FUNCTION_ARGS); extern Datum txid_visible_in_snapshot(PG_FUNCTION_ARGS); +extern Datum txid_status(PG_FUNCTION_ARGS); /* uuid.c */ extern Datum uuid_in(PG_FUNCTION_ARGS); diff --git a/src/test/regress/expected/txid.out b/src/test/regress/expected/txid.out index 802ccb9..015dae3 100644 --- a/src/test/regress/expected/txid.out +++ b/src/test/regress/expected/txid.out @@ -254,3 +254,71 @@ SELECT txid_current_if_assigned() IS NOT DISTINCT FROM BIGINT :'txid_current'; (1 row) COMMIT; +-- test xid status functions +BEGIN; +SELECT txid_current() AS committed \gset +COMMIT; +BEGIN; +SELECT txid_current() AS rolledback \gset +ROLLBACK; +BEGIN; +SELECT txid_current() AS inprogress \gset +SELECT txid_status(:committed) AS committed; + committed +----------- + committed +(1 row) + +SELECT txid_status(:rolledback) AS rolledback; + rolledback +------------ + aborted +(1 row) + +SELECT txid_status(:inprogress) AS inprogress; + inprogress +------------- + in progress +(1 row) + +SELECT txid_status(1); -- BootstrapTransactionId is always committed + txid_status +------------- + committed +(1 row) + +SELECT txid_status(2); -- FrozenTransactionId is always committed + txid_status +------------- + committed +(1 row) + +SELECT txid_status(3); -- in regress testing FirstNormalTransactionId will always be behind oldestXmin + txid_status +------------- + +(1 row) + +COMMIT; +BEGIN; +CREATE FUNCTION test_future_xid_status(bigint) +RETURNS void +LANGUAGE plpgsql +AS +$$ +BEGIN + PERFORM txid_status($1); + RAISE EXCEPTION 'didn''t ERROR at xid in the future as expected'; +EXCEPTION + WHEN invalid_parameter_value THEN + RAISE NOTICE 'Got expected error for xid in the future'; +END; +$$; +SELECT test_future_xid_status(:inprogress + 10000); +NOTICE: Got expected error for xid in the future + test_future_xid_status +------------------------ + +(1 row) + +ROLLBACK; diff --git a/src/test/regress/sql/txid.sql b/src/test/regress/sql/txid.sql index 4aefd9e..bd6decf 100644 --- a/src/test/regress/sql/txid.sql +++ b/src/test/regress/sql/txid.sql @@ -59,3 +59,41 @@ SELECT txid_current_if_assigned() IS NULL; SELECT txid_current() \gset SELECT txid_current_if_assigned() IS NOT DISTINCT FROM BIGINT :'txid_current'; COMMIT; + +-- test xid status functions +BEGIN; +SELECT txid_current() AS committed \gset +COMMIT; + +BEGIN; +SELECT txid_current() AS rolledback \gset +ROLLBACK; + +BEGIN; +SELECT txid_current() AS inprogress \gset + +SELECT txid_status(:committed) AS committed; +SELECT txid_status(:rolledback) AS rolledback; +SELECT txid_status(:inprogress) AS inprogress; +SELECT txid_status(1); -- BootstrapTransactionId is always committed +SELECT txid_status(2); -- FrozenTransactionId is always committed +SELECT txid_status(3); -- in regress testing FirstNormalTransactionId will always be behind oldestXmin + +COMMIT; + +BEGIN; +CREATE FUNCTION test_future_xid_status(bigint) +RETURNS void +LANGUAGE plpgsql +AS +$$ +BEGIN + PERFORM txid_status($1); + RAISE EXCEPTION 'didn''t ERROR at xid in the future as expected'; +EXCEPTION + WHEN invalid_parameter_value THEN + RAISE NOTICE 'Got expected error for xid in the future'; +END; +$$; +SELECT test_future_xid_status(:inprogress + 10000); +ROLLBACK; -- 2.5.5
-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers