Hello

> This is safe because replay is frozen at this
> point: the only ways out of the pause are promotion and shutdown, so no
> transaction's commit status can change afterwards, and any transaction a
> query finds committed in CLOG necessarily committed before that query's
> snapshot.

But if I look at the documentation, after shutdown it allows a restart
with a later recovery target:

> The intended use of the pause setting is to allow queries to be executed
> against the database to check if this recovery target is the most desirable
> point for recovery. The paused state can be resumed by using
> pg_wal_replay_resume() (see Table 9.81), which then causes recovery to end.
> If this recovery target is not the desired stopping point, then shut down
> the server, change the recovery target settings to a later target and
> restart to continue recovery.

"so no transaction's commit status can change afterwards" is true
within the lifetime of the paused instance, but what happens if I shut
down and restart the server with a later recovery target?

Even a read-only query can mark a tuple with HEAP_XMIN_INVALID if
HeapTupleSatisfiesMVCC decides that a transaction aborted or crashed.
And then in bufmgr.c:MarkSharedBufferDirtyHint, we can see the
following condition, whose early return prevents this change from
being flushed:

if (XLogHintBitIsNeeded() && (lockstate & BM_PERMANENT))
{
    /*
     * If we must not write WAL, due to a relfilelocator-specific
     * condition or being in recovery, don't dirty the page.  We can
     * set the hint, just not dirty the page as a result so the hint
     * is lost when we evict the page or shutdown.
     *
     * See src/backend/storage/page/README for longer discussion.
     */
    if (RecoveryInProgress() ||
        RelFileLocatorSkippingWAL(BufTagGetRelFileLocator(&bufHdr->tag)))
        return;
    ...

Where

#define XLogHintBitIsNeeded() (wal_log_hints || DataChecksumsNeedWrite())

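So if we turn off both wal_log_hints and data checksums,
XLogHintBitIsNeeded() is false and that early return is never reached,
so the hint is no longer guaranteed to be lost. With the patch, just a
SELECT in the paused state can then cause data corruption once recovery
is continued with a later target.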
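
To make the condition easier to follow, here is a minimal standalone
sketch of just the quoted branch. It is not the PostgreSQL code: the
HintState struct and hint_dirties_page() are hypothetical names I made
up for illustration, and it models only the early return shown above,
not whatever follows the elided "..." in MarkSharedBufferDirtyHint.

```c
#include <stdbool.h>

/*
 * Hypothetical stand-in for the state consulted by the quoted check.
 * None of these field names exist in PostgreSQL; they only mirror the
 * GUCs and predicates mentioned in the snippet above.
 */
typedef struct HintState
{
    bool wal_log_hints;         /* wal_log_hints GUC */
    bool data_checksums;        /* stands in for DataChecksumsNeedWrite() */
    bool buffer_permanent;      /* stands in for (lockstate & BM_PERMANENT) */
    bool recovery_in_progress;  /* stands in for RecoveryInProgress() */
    bool relation_skipping_wal; /* stands in for RelFileLocatorSkippingWAL() */
} HintState;

/*
 * Returns true if, in this simplified model, setting a hint bit would
 * go on to dirty the page, i.e. the early return is not taken.
 */
static bool
hint_dirties_page(const HintState *s)
{
    /* Mirrors the XLogHintBitIsNeeded() macro quoted above. */
    bool hint_bit_is_needed = s->wal_log_hints || s->data_checksums;

    if (hint_bit_is_needed && s->buffer_permanent)
    {
        /* The early return: hint is set in memory but page stays clean. */
        if (s->recovery_in_progress || s->relation_skipping_wal)
            return false;
    }

    /* Outer condition false, or no reason to skip: page gets dirtied. */
    return true;
}
```

With recovery_in_progress = true and buffer_permanent = true, this
returns false as long as either wal_log_hints or data_checksums is
set, and true when both are off, which is the case described above.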

See the attached tap test that showcases the problem.

Attachment: subxid_corruption.pl
