n 18/09/17 18:42, Tom Lane wrote:
> Amit Kapila <amit.kapil...@gmail.com> writes:
>> On Mon, Sep 18, 2017 at 7:46 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:
>>> The subscriber log includes
>>> 2017-09-18 08:43:08.240 UTC [15672] WARNING:  out of background worker slots
>>> Maybe that's harmless, but I'm suspicious that it's a smoking gun.
>> I have noticed while fixing the issue you are talking that this path
>> is also susceptible to such problems.  In
>> WaitForReplicationWorkerAttach(), it relies on
>> GetBackgroundWorkerPid() to know the status of the worker which won't
>> give the correct status in case of fork failure.  The patch I have
>> posted has a fix for the issue,
> Your GetBackgroundWorkerPid fix looks good as far as it goes, but
> I feel that WaitForReplicationWorkerAttach's problems go way deeper
> than that --- in fact, that function should not exist at all IMO.
> The way that launcher.c is set up, it's relying on either the calling
> process or the called process to free the logicalrep slot when done.
> The called process might never exist at all, so the second half of
> that is obviously unreliable; but WaitForReplicationWorkerAttach
> can hardly be relied on either, seeing it's got a big fat exit-on-
> SIGINT in it.  I thought about putting a PG_TRY, or more likely
> PG_ENSURE_ERROR_CLEANUP, into it, but that doesn't fix the basic
> problem: you can't assume that the worker isn't ever going to start,
> just because you got a signal that means you shouldn't wait anymore.
> Now, this rickety business might be fine if it were only about the
> states of the caller and called processes, but we've got long-lived
> shared data involved (ie the worker slots); failing to clean it up
> is not an acceptable outcome.

We'll clean it up eventually if somebody requests creation of new
logical replication worker and that slot is needed. See the "garbage
collection" part in logicalrep_worker_launch(). I know it's not ideal
solution, but it's the best one I could think of given the bellow.

> So, frankly, I think we would be best off losing the "logical rep
> worker slot" business altogether, and making do with just bgworker
> slots.  The alternative is getting the postmaster involved in cleaning
> up lrep slots as well as bgworker slots, and I'm going to resist
> any such idea strongly.  That way madness lies, or at least an
> unreliable postmaster --- if we need lrep slots, what other forty-seven
> data structures are we going to need the postmaster to clean up
> in the future?
> I haven't studied the code enough to know why it grew lrep worker
> slots in the first place.  Maybe someone could explain?

I am not quite sure I understand this question, we need to store
additional info about workers in shared memory so we need slots for that.

If you are asking why they are not identified by the
BackgroundWorkerHandle, then it's because it's private struct and can't
be shared with other processes so there is no way to link the logical
worker info with bgworker directly. I mentioned that I am not happy
about this during the original development but it didn't gather any
attention which I took to mean that nobody minds.

  Petr Jelinek                  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services

Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:

Reply via email to