Terry Chong of Salesforce pointed me at an apparent oversight in the Unix
version of WaitLatchOrSocket.  Assuming that it is called with a timeout
specification, the only place where we detect timeout is if poll() (or
select()) returns 0.  Now suppose that for some reason we have gotten into
a state where poll() always fails with EINTR or returns a positive value
because it's detected some event on the FDs being checked, but that event
is not one that would cause us to exit the loop.  In that case we have an
infinite busy-loop, persisting even after the timeout has elapsed.

I don't have a super plausible explanation for *how* this might happen,
unless perhaps there's some persistent error condition on the self-pipe or
the postmaster-death pipe.  But in any case, it seems like the logic is
unnecessarily fragile.  As it stands, at the bottom of the loop we
recompute the remaining timeout like this:

        /* If we're not done, update cur_timeout for next iteration */
        if (result == 0 && cur_timeout >= 0)
            INSTR_TIME_SUBTRACT(cur_time, start_time);
            cur_timeout = timeout - (long) INSTR_TIME_GET_MILLISEC(cur_time);
            if (cur_timeout < 0)
                cur_timeout = 0;

Now, if we compute a negative cur_timeout here, that means that the
timeout has elapsed.  So rather than clamping it back to zero and assuming
that the next iteration of the loop will detect the timeout, why don't
we change those last two lines to be like

            if (cur_timeout <= 0)
                /* Timeout has expired, no need to continue looping */
                result |= WL_TIMEOUT;

which will force the loop to exit?  At least then any busy-waiting will
terminate when the timeout expires, rather than persisting forever.

                        regards, tom lane

Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:

Reply via email to