Re: Race conditions with checkpointer and shutdown

Andres Freund Tue, 16 Apr 2019 17:22:18 -0700

Hi,

On 2019-04-16 17:05:36 -0700, Andres Freund wrote:
> On 2019-04-16 18:59:37 -0400, Robert Haas wrote:
> > On Tue, Apr 16, 2019 at 6:45 PM Tom Lane <t...@sss.pgh.pa.us> wrote:
> > > Do we need to think harder about establishing rules for multiplexed
> > > use of the process latch?  I'm imagining some rule like "if you are
> > > not the outermost event loop of a process, you do not get to
> > > summarily clear MyLatch.  Make sure to leave it set after waiting,
> > > if there was any possibility that it was set by something other than
> > > the specific event you're concerned with".
> >
> > Hmm, yeah.  If the latch is left set, then the outer loop will just go
> > through an extra and unnecessary iteration, which seems fine.  If the
> > latch is left clear, then the outer loop might miss a wakeup intended
> > for it and hang forever.
> 
> Arguably that's a sign that the latch using code in the outer loop(s) isn't
> written correctly? If you do it as:
> 
> while (true)
> {
>     CHECK_FOR_INTERRUPTS();
> 
>     ResetLatch(MyLatch);
> 
>     if (work_needed)
>     {
>         Plenty();
>         Code();
>         Using(MyLatch);
>     }
>     else
>     {
>         WaitLatch(MyLatch);
>     }
> }
> 
> I think that's not a danger? I think the problem really is that we
> suggest doing that WaitLatch() unconditionally:
> 
>  * The correct pattern to wait for event(s) is:
>  *
>  * for (;;)
>  * {
>  *       ResetLatch();
>  *       if (work to do)
>  *               Do Stuff();
>  *       WaitLatch();
>  * }
>  *
>  * It's important to reset the latch *before* checking if there's work to
>  * do. Otherwise, if someone sets the latch between the check and the
>  * ResetLatch call, you will miss it and Wait will incorrectly block.
>  *
>  * Another valid coding pattern looks like:
>  *
>  * for (;;)
>  * {
>  *       if (work to do)
>  *               Do Stuff(); // in particular, exit loop if some condition 
> satisfied
>  *       WaitLatch();
>  *       ResetLatch();
>  * }
> 
> Obviously there's the issue that a lot of latch using code isn't written
> that way - but I also don't think there's that many latch using code
> that then also uses latch. Seems like we could fix that.  While it has
> obviously dangers of not being followed, so does the
> 'always-set-latch-unless-outermost-loop' approach.
> 
> I'm not sure I like the idea of incurring another unnecessary SetLatch()
> call for most latch using places.
> 
> I guess there's a bit bigger danger of taking longer to notice
> postmaster-death. But I'm not sure I can quite see that being
> problematic - seems like all we should incur is another cycle through
> the loop, as the latch shouldn't be set anymore.


I think we should thus change our latch documentation to say:

something like:

diff --git a/src/include/storage/latch.h b/src/include/storage/latch.h
index fc995819d35..dc46dd94c5b 100644
--- a/src/include/storage/latch.h
+++ b/src/include/storage/latch.h
@@ -44,22 +44,31 @@
  * {
  *    ResetLatch();
  *    if (work to do)
- *        Do Stuff();
- *    WaitLatch();
+ *        DoStuff();
+ *    else
+ *        WaitLatch();
  * }
  *
  * It's important to reset the latch *before* checking if there's work to
  * do. Otherwise, if someone sets the latch between the check and the
  * ResetLatch call, you will miss it and Wait will incorrectly block.
  *
+ * The reason to only wait on the latch in case there is nothing to do is that
+ * code inside DoStuff() might use the same latch, and leave it reset, even
+ * though a SetLatch() aimed for the outer loop arrived. Which again could
+ * lead to incorrectly blocking in Wait.
+ *
  * Another valid coding pattern looks like:
  *
  * for (;;)
  * {
  *    if (work to do)
- *        Do Stuff(); // in particular, exit loop if some condition satisfied
- *    WaitLatch();
- *    ResetLatch();
+ *        DoStuff(); // in particular, exit loop if some condition satisfied
+ *     else
+ *     {
+ *        WaitLatch();
+ *        ResetLatch();
+ *     }
  * }
  *
  * This is useful to reduce latch traffic if it's expected that the loop's

and adapt code to match (at least in the outer loops).

Greetings,

Andres Freund

Re: Race conditions with checkpointer and shutdown

Reply via email to