On Thu, May 27, 2021 at 7:11 PM Julien Rouhaud <rjuju...@gmail.com> wrote: > > On Tue, May 25, 2021 at 4:39 PM Paul Guo <paul...@gmail.com> wrote: > > > > Hi hackers, > > > > I found this when reading the related code. Here is the scenario: > > > > bool > > RegisterSyncRequest(const FileTag *ftag, SyncRequestType type, > > bool retryOnError) > > > > For the case retryOnError is true, the function would in loop call > > ForwardSyncRequest() until it succeeds, but in ForwardSyncRequest(), > > we can see if we run into the below branch, RegisterSyncRequest() will > > need to loop until the checkpointer absorbs the existing requests so > > ForwardSyncRequest() might hang for some time until a checkpoint > > request is triggered. This scenario seems to be possible in theory > > though the chance is not high. > > It seems like a really unlikely scenario, but maybe possible if you > use a lot of unlogged tables maybe (as you could eventually > dirty/evict more than NBuffers buffers without triggering enough WALs > activity to trigger a checkpoint with any sane checkpoint > configuration).
RegisterSyncRequest() handles SYNC_UNLINK_REQUEST and SYNC_FORGET_REQUEST scenarios, besides the usual SYNC_REQUEST type for buffer sync. > > ForwardSyncRequest(): > > > > if (CheckpointerShmem->checkpointer_pid == 0 || > > (CheckpointerShmem->num_requests >= CheckpointerShmem->max_requests > > && > > !CompactCheckpointerRequestQueue())) > > { > > /* > > * Count the subset of writes where backends have to do their own > > * fsync > > */ > > if (!AmBackgroundWriterProcess()) > > CheckpointerShmem->num_backend_fsync++; > > LWLockRelease(CheckpointerCommLock); > > return false; > > } > > > > One fix is to add below similar code in RegisterSyncRequest(), trigger > > a checkpoint for the scenario. > > > > // checkpointer_triggered: variable for one trigger only. > > if (!ret && retryOnError && ProcGlobal->checkpointerLatch && > > !checkpointer_triggered) > > SetLatch(ProcGlobal->checkpointerLatch); > > > > Any comments? > > It looks like you intended to set the checkpointer_triggered var but Yes this is just pseduo code. > didn't. Also this will wake up the checkpointer but won't force a > checkpoint (unlike RequestCheckpoint()). It may be a good thing I do not expect an immediate checkpoint. AbsorbSyncRequests() is enough since after that RegisterSyncRequest() could finish. > though as it would only absorb the requests and go back to sleep if no > other threshold is reachrf. Apart from the implementation details it > seems like it could help in this unlikely event. -- Paul Guo (Vmware)