On Thu, May 27, 2021 at 7:11 PM Julien Rouhaud <rjuju...@gmail.com> wrote:
>
> On Tue, May 25, 2021 at 4:39 PM Paul Guo <paul...@gmail.com> wrote:
> >
> > Hi hackers,
> >
> > I found this when reading the related code. Here is the scenario:
> >
> > bool
> > RegisterSyncRequest(const FileTag *ftag, SyncRequestType type,
> >                     bool retryOnError)
> >
> > For the case retryOnError is true, the function would in loop call
> > ForwardSyncRequest() until it succeeds, but in ForwardSyncRequest(),
> > we can see if we run into the below branch, RegisterSyncRequest() will
> > need to loop until the checkpointer absorbs the existing requests so
> > ForwardSyncRequest() might hang for some time until a checkpoint
> > request is triggered. This scenario seems to be possible in theory
> > though the chance is not high.
>
> It seems like a really unlikely scenario, but maybe possible if you
> use a lot of unlogged tables maybe (as you could eventually
> dirty/evict more than NBuffers buffers without triggering enough WALs
> activity to trigger a checkpoint with any sane checkpoint
> configuration).

RegisterSyncRequest() handles SYNC_UNLINK_REQUEST and
SYNC_FORGET_REQUEST scenarios, besides the usual SYNC_REQUEST type for
buffer sync.

> > ForwardSyncRequest():
> >
> >     if (CheckpointerShmem->checkpointer_pid == 0 ||
> >         (CheckpointerShmem->num_requests >= CheckpointerShmem->max_requests 
> > &&
> >          !CompactCheckpointerRequestQueue()))
> >     {
> >         /*
> >          * Count the subset of writes where backends have to do their own
> >          * fsync
> >          */
> >         if (!AmBackgroundWriterProcess())
> >             CheckpointerShmem->num_backend_fsync++;
> >         LWLockRelease(CheckpointerCommLock);
> >         return false;
> >     }
> >
> > One fix is to add below similar code in RegisterSyncRequest(), trigger
> > a checkpoint for the scenario.
> >
> > // checkpointer_triggered: variable for one trigger only.
> > if (!ret && retryOnError && ProcGlobal->checkpointerLatch &&
> > !checkpointer_triggered)
> >         SetLatch(ProcGlobal->checkpointerLatch);
> >
> > Any comments?
>
> It looks like you intended to set the checkpointer_triggered var but

Yes this is just pseduo code.

> didn't.  Also this will wake up the checkpointer but won't force a
> checkpoint (unlike RequestCheckpoint()).  It may be a good thing

I do not expect an immediate checkpoint. AbsorbSyncRequests()
is enough since after that RegisterSyncRequest() could finish.

> though as it would only absorb the requests and go back to sleep if no
> other threshold is reachrf.  Apart from the implementation details it
> seems like it could help in this unlikely event.



-- 
Paul Guo (Vmware)


Reply via email to