On Wed, Apr 5, 2017 at 12:35 PM, Kuntal Ghosh
<kuntalghosh.2...@gmail.com> wrote:
> On Tue, Apr 4, 2017 at 11:22 PM, Tomas Vondra
> <tomas.von...@2ndquadrant.com> wrote:
>> On 04/04/2017 06:52 PM, Robert Haas wrote:
>>> On Mon, Apr 3, 2017 at 6:08 AM, Kuntal Ghosh <kuntalghosh.2...@gmail.com>
>>> wrote:
>>>> On Fri, Mar 31, 2017 at 6:50 PM, Robert Haas <robertmh...@gmail.com>
>>>> wrote:
>>>>> On Thu, Mar 30, 2017 at 4:35 PM, Kuntal Ghosh
>>>>> <kuntalghosh.2...@gmail.com> wrote:
>>>>>> 2. the server restarts automatically, initializes
>>>>>> BackgroundWorkerData->parallel_register_count and
>>>>>> BackgroundWorkerData->parallel_terminate_count in the shared memory.
>>>>>> After that, it calls ForgetBackgroundWorker and it increments
>>>>>> parallel_terminate_count.
>>>>> Hmm.  So this seems like the root of the problem.  Presumably those
>>>>> things need to be reset AFTER forgetting any background workers from
>>>>> before the crash.
>>>> IMHO, the fix would be not to increase the terminated parallel worker
>>>> count whenever ForgetBackgroundWorker is called due to a bgworker
>>>> crash. I've attached a patch for the same. PFA.
>>> While I'm not opposed to that approach, I don't think this is a good
>>> way to implement it.  If you want to pass an explicit flag to
>>> ForgetBackgroundWorker telling it whether or not it should perform
>>> the increment, fine.  But with what you've got here, you're
>>> essentially relying on "spooky action at a distance".  It would be
>>> easy for future code changes to break this, not realizing that
>>> somebody's got a hard dependency on 0 having a specific meaning.
>> I'm probably missing something, but I don't quite understand how these
>> values actually survive the crash. I mean, what I observed is OOM followed
>> by a restart, so shouldn't BackgroundWorkerShmemInit() simply reset the
>> values back to 0? Or do we call ForgetBackgroundWorker() after the crash for
>> some reason?
> AFAICU, during crash recovery, we wait for all non-syslogger children
> to exit, then reset shmem (call BackgroundWorkerShmemInit) and perform
> StartupDataBase. While starting the startup process, we check whether
> any bgworker is scheduled for a restart.

In general, your theory appears right, but can you check how it
behaves on a standby server?  The startup process behaves differently
on a master and on a standby: on a master it exits once recovery
completes, whereas on a standby it keeps running to receive WAL.
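
To make the ordering problem discussed upthread concrete, here is a
minimal standalone sketch of it.  It is not the bgworker.c code: the
struct and the model_* functions are hypothetical stand-ins, and it
assumes the two counters are unsigned 32-bit integers whose difference
gives the number of active parallel workers.  The boolean argument
models the explicit flag Robert suggested passing to
ForgetBackgroundWorker.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the two shared-memory counters. */
typedef struct
{
    uint32_t parallel_register_count;
    uint32_t parallel_terminate_count;
} ModelWorkerData;

/* Models the shared-memory reset done during restart after a crash. */
static void
model_shmem_init(ModelWorkerData *data)
{
    data->parallel_register_count = 0;
    data->parallel_terminate_count = 0;
}

/*
 * Models forgetting a worker left over from before the crash.  The
 * flag (an assumption, following the suggestion upthread) says whether
 * the termination should be counted.
 */
static void
model_forget_worker(ModelWorkerData *data, bool count_termination)
{
    if (count_termination)
        data->parallel_terminate_count++;
}

/* Active workers as an unsigned difference; wraps if terminate > register. */
static uint32_t
model_active_workers(const ModelWorkerData *data)
{
    return data->parallel_register_count - data->parallel_terminate_count;
}

/* Replays the sequence: reset the counters first, then forget the worker. */
uint32_t
model_restart_after_crash(bool count_termination)
{
    ModelWorkerData data;

    model_shmem_init(&data);
    model_forget_worker(&data, count_termination);
    return model_active_workers(&data);
}
```

In this model, model_restart_after_crash(true) wraps around to
UINT32_MAX, so any "active workers >= limit" check would refuse new
parallel workers from then on, while model_restart_after_crash(false)
leaves the count at 0.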

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)