On Wed, Feb 16, 2022 at 08:44:42AM -0800, Nathan Bossart wrote:
> On Tue, Feb 15, 2022 at 10:57:32PM -0800, Nathan Bossart wrote:
>> On Tue, Feb 15, 2022 at 10:14:04PM -0800, Nathan Bossart wrote:
>>> It looks like register_unlink_segment() is called prior to the checkpoint,
>>> but the checkpointer is not calling RememberSyncRequest() until after
>>> SyncPreCheckpoint().  This means that the requests are registered with the
>>> next checkpoint cycle count, so they aren't processed until the next
>>> checkpoint.
>> 
>> Calling AbsorbSyncRequests() before advancing the checkpoint cycle counter
>> seems to fix the issue.  However, this requires moving SyncPreCheckpoint()
>> out of the critical section in CreateCheckPoint().  Patch attached.
> 
> An alternative fix might be to call AbsorbSyncRequests() after increasing
> the ckpt_started counter in CheckpointerMain().  AFAICT there is a window
> just before checkpointing where new requests are registered for the
> checkpoint following the one about to begin.

Here's a patch that calls AbsorbSyncRequests() from CheckpointerMain(), right
after the checkpoint start is signaled, instead of from SyncPreCheckpoint().
I've also figured out a way to reproduce the issue without the pre-allocation
patches applied:

1. In checkpointer.c, add a 30-second sleep before acquiring ckpt_lck to
   increment ckpt_started.
2. In session 1, run the following commands:
        a. CREATE TABLESPACE test LOCATION '/path/to/dir';
        b. CREATE TABLE test TABLESPACE test AS SELECT 1;
3. In session 2, start a checkpoint.
4. In session 1, run these commands:
        a. ALTER TABLE test SET TABLESPACE pg_default;
        b. DROP TABLESPACE test;  -- fails
        c. DROP TABLESPACE test;  -- succeeds

With the attached patch applied, the first attempt at dropping the
tablespace no longer fails.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
From e9707dfde25eaa9c447032c4b5a61e3011141dc9 Mon Sep 17 00:00:00 2001
From: Nathan Bossart <nathandboss...@gmail.com>
Date: Wed, 16 Feb 2022 09:26:08 -0800
Subject: [PATCH v2 1/1] call AbsorbSyncRequests() after indicating checkpoint
 start

---
 src/backend/postmaster/checkpointer.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/src/backend/postmaster/checkpointer.c b/src/backend/postmaster/checkpointer.c
index 4488e3a443..e93d34b71f 100644
--- a/src/backend/postmaster/checkpointer.c
+++ b/src/backend/postmaster/checkpointer.c
@@ -401,6 +401,18 @@ CheckpointerMain(void)
 
 			ConditionVariableBroadcast(&CheckpointerShmem->start_cv);
 
+			/*
+			 * Pick up any last minute requests.  DROP TABLESPACE schedules a
+			 * checkpoint to clean up any lingering files that are scheduled for
+			 * deletion.  If we don't absorb those requests now, they might not
+			 * be absorbed until after incrementing the checkpoint cycle
+			 * counter, so the files won't be deleted until the following
+			 * checkpoint.  By absorbing requests after indicating the
+			 * checkpoint has started, operations like DROP TABLESPACE can be
+			 * sure that the next checkpoint will clean up any such files.
+			 */
+			AbsorbSyncRequests();
+
 			/*
 			 * The end-of-recovery checkpoint is a real checkpoint that's
 			 * performed while we're still in recovery.
-- 
2.25.1
