On Wed, Feb 13, 2019 at 11:26:23PM -0800, Noah Misch wrote:
On Sat, Feb 02, 2019 at 03:38:22AM -0500, Noah Misch wrote:
The main consequence is the false alarm.

That conclusion was incorrect.  On further study, I was able to reproduce data
loss via either of two weaknesses in the "apparent wraparound" test:

1. The result of the test is valid only until we release the SLRU ControlLock,
   which we do before SlruScanDirCbDeleteCutoff() uses the cutoff to evaluate
   segments for deletion.  Once we release that lock, latest_page_number can
   advance.  This creates a TOCTOU race condition, allowing excess deletion
   (first sketch after these two points):

  [local] test=# table trunc_clog_concurrency ;
  ERROR:  could not access status of transaction 2149484247
  DETAIL:  Could not open file "pg_xact/0801": No such file or directory.

2. By the time the "apparent wraparound" test fires, we've already WAL-logged
   the truncation.  During replay, clog_redo() suppresses the "apparent
   wraparound" test, then deletes too much (second sketch after these two
   points).  Startup then fails:

  881997 2019-02-10 02:53:32.105 GMT FATAL:  could not access status of transaction 708112327
  881997 2019-02-10 02:53:32.105 GMT DETAIL:  Could not open file "pg_xact/02A3": No such file or directory.
  881855 2019-02-10 02:53:32.107 GMT LOG:  startup process (PID 881997) exited with exit code 1
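
To make (1) concrete, here is a condensed paraphrase, not a verbatim quote, of
the relevant flow in src/backend/access/transam/slru.c; buffer handling and
error paths are elided:

    void
    SimpleLruTruncate(SlruCtl ctl, int cutoffPage)
    {
        SlruShared  shared = ctl->shared;

        /* The cutoff is rounded down to a segment boundary (relevant to (a)). */
        cutoffPage -= cutoffPage % SLRU_PAGES_PER_SEGMENT;

        LWLockAcquire(shared->ControlLock, LW_EXCLUSIVE);

        /* The "apparent wraparound" test; trustworthy only while locked. */
        if (ctl->PagePrecedes(shared->latest_page_number, cutoffPage))
        {
            LWLockRelease(shared->ControlLock);
            ereport(LOG,
                    (errmsg("could not truncate directory \"%s\": apparent wraparound",
                            ctl->Dir)));
            return;
        }

        /* ... flush or evict any buffered pages in the truncation range ... */

        LWLockRelease(shared->ControlLock);

        /*
         * TOCTOU window: latest_page_number may advance past cutoffPage right
         * here, yet the deletions below still rely on the stale check above.
         */
        (void) SlruScanDirectory(ctl, SlruScanDirCbDeleteCutoff, &cutoffPage);
    }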
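
For (2), the redo side: TruncateCLOG() writes the WAL record before
SimpleLruTruncate() runs its safety test, and clog_redo() then installs the
logged cutoff as latest_page_number, so the test cannot fire during replay.
Again paraphrased rather than verbatim, this time from
src/backend/access/transam/clog.c:

    else if (info == CLOG_TRUNCATE)
    {
        xl_clog_truncate xlrec;

        memcpy(&xlrec, XLogRecGetData(record), sizeof(xl_clog_truncate));

        /*
         * Replay trusts the record: forcing latest_page_number to the
         * logged cutoff defeats the "apparent wraparound" test, so an
         * over-aggressive truncation is replayed verbatim.
         */
        ClogCtl->shared->latest_page_number = xlrec.pageno;

        SimpleLruTruncate(ClogCtl, xlrec.pageno);
    }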


Fixes are available:

a. Fix the rounding in SimpleLruTruncate(); the first sketch after this list
   illustrates the hazard.  (The patch I posted upthread is wrong; I will
   correct it in a separate message.)

b. Arrange so only one backend runs vac_truncate_clog() at a time (second
   sketch after this list).  Other than AsyncCtl, every SLRU truncation
   appears in vac_truncate_clog(), in a checkpoint, or in the startup
   process.  Hence, also arrange for only one backend to call
   SimpleLruTruncate(AsyncCtl) at a time.

c. Test "apparent wraparound" before writing WAL, and don't WAL-log
  truncations that "apparent wraparound" forces us to skip.

d. Hold the ControlLock for the entirety of SimpleLruTruncate().  This removes
  the TOCTOU race condition, but TransactionIdDidCommit() and other key
  operations would be waiting behind filesystem I/O.

e. Have the SLRU track a "low cutoff" for an ongoing truncation.  Initially,
  the low cutoff is the page furthest in the past relative to cutoffPage (the
  "high cutoff").  If SimpleLruZeroPage() wishes to use a page in the
  truncation range, it would acquire an LWLock and increment the low cutoff.
  Before unlinking any segment, SlruScanDirCbDeleteCutoff() would take the
  same LWLock and recheck the segment against the latest low cutoff.
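
To illustrate the hazard behind (a): the identifiers below are real, but the
commentary and the flip scenario are my reading of the code, so treat this as
a sketch of the problem rather than of any particular fix:

    /* SimpleLruTruncate() rounds the cutoff down to a segment boundary: */
    cutoffPage -= cutoffPage % SLRU_PAGES_PER_SEGMENT;

    /*
     * PagePrecedes() compares page numbers with wraparound (modulo 2^31)
     * arithmetic: "the past" is the half of the page space behind the
     * cutoff.  Rounding moves the cutoff back by up to
     * SLRU_PAGES_PER_SEGMENT - 1 pages, which shifts the past/future
     * boundary on the far side of the ring by the same amount, so a
     * segment that was barely "future" can flip to "past" and be unlinked
     * by SlruScanDirCbDeleteCutoff():
     */
    if (ctl->PagePrecedes(segpage, cutoffPage))
        SlruInternalDeleteSegment(ctl, filename);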
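
One shape (b) could take; this is a sketch only, TruncateClogLock is a
hypothetical lock name, and none of this is the actual patch:

    /* In vac_truncate_clog(), before computing any cutoffs: */
    if (!LWLockConditionalAcquire(TruncateClogLock, LW_EXCLUSIVE))
        return;                 /* another backend is already truncating */

    /*
     * ... existing body: find the minimum datfrozenxid/datminmxid across
     * databases, then TruncateCLOG(), TruncateCommitTs(),
     * TruncateMultiXact() ...
     */

    LWLockRelease(TruncateClogLock);

Since LWLocks are released automatically during error recovery, the early
return is the only extra control flow this needs.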

With both (a) and (b), the only way I'd know to reach the "apparent
wraparound" message is to restart in single-user mode and burn XIDs to the
point of bona fide wraparound.  Hence, I propose to back-patch (a) and (b),
and I propose (c) for HEAD only.  I don't want (d), which threatens
performance too much.  I would rather not have (e), because I expect it's more
complex than (b) yet fixes strictly less than (b) does.

Can you see a way to improve on that plan?  Can you see other bugs of this
nature that this plan does not fix?


Seems reasonable, although I wonder how much more expensive just doing (d)
would be.  It seems by far the least complex solution, and it "just" moves
the SlruScanDirectory() call before the lock release.  It's true that adds
I/O requests while the lock is held, but it's only unlink() without fsync(),
and I'd expect the number of files to be relatively low.  Plus we already
call SimpleLruWaitIO() and SlruInternalWritePage() in that loop.

BTW, isn't it an issue that SlruInternalDeleteSegment() does not do any fsync
calls after unlinking the segments?  If the system crashes/reboots before the
unlinks become persistent (i.e. some of the segments reappear), won't that
cause a problem?
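
If that is a problem, the textbook remedy would be to make the unlink durable
by fsyncing the containing directory.  A sketch only, reusing the existing
fsync_fname() helper from src/backend/storage/file/fd.c, with the segment
name layout per SlruFileName():

    char        path[MAXPGPATH];

    snprintf(path, MAXPGPATH, "%s/%04X", ctl->Dir, segno);
    if (unlink(path) != 0)
        ereport(LOG,
                (errcode_for_file_access(),
                 errmsg("could not remove file \"%s\": %m", path)));

    /* persist the removal of the directory entry */
    fsync_fname(ctl->Dir, true);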


It's a bit unfortunate that a patch for a data corruption / loss issue
(even if a low-probability one) fell through multiple commitfests.


regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
