On 04/05/2017 12:36 PM, Kuntal Ghosh wrote:
On Wed, Apr 5, 2017 at 3:07 PM, Tomas Vondra
<tomas.von...@2ndquadrant.com> wrote:

On 04/05/2017 09:05 AM, Kuntal Ghosh wrote:

AFAICU, during crash recovery, we wait for all non-syslogger children
to exit, then reset shmem (call BackgroundWorkerShmemInit) and perform
StartupDataBase. While starting the startup process, we check whether any
bgworker is scheduled for a restart. If a bgworker has crashed and is not
meant to be restarted (a parallel worker in our case), we call
ForgetBackgroundWorker() to discard it.

OK, so essentially we reset the counters, getting

     parallel_register_count = 0
     parallel_terminate_count = 0

and then ForgetBackgroundWorker() runs and increments the terminate_count,
breaking the logic?

And you're using (parallel_register_count > 0) to detect whether we're still
in the init phase or not. Hmmm.

Yes. But, as Robert suggested upthread, we should not use
(parallel_register_count == 0) as a proxy for detecting a bgworker
crash. Hence, I've added an argument named 'wasCrashed' to
ForgetBackgroundWorker to indicate a bgworker crash.
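
The sequence described above can be modeled with a small toy, assuming the flag is meant to skip the terminate_count increment for a worker whose registration was already wiped by the shmem reset. Everything here (ToyWorkerData, toy_reset, toy_forget_worker, toy_active_workers) is a hypothetical stand-in, not the actual PostgreSQL code or the patch:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy stand-in for the shared counters; not the real PostgreSQL struct. */
typedef struct
{
    uint32_t parallel_register_count;
    uint32_t parallel_terminate_count;
} ToyWorkerData;

/* Models the shared-memory reset performed during crash recovery. */
static void
toy_reset(ToyWorkerData *d)
{
    d->parallel_register_count = 0;
    d->parallel_terminate_count = 0;
}

/*
 * Hypothetical "forget" step: a crashed worker was already dropped from
 * the counters by the reset, so it must not be counted a second time.
 */
static void
toy_forget_worker(ToyWorkerData *d, bool was_crashed)
{
    if (!was_crashed)
        d->parallel_terminate_count++;
}

/* Number of workers believed to be active (modular subtraction). */
static uint32_t
toy_active_workers(const ToyWorkerData *d)
{
    return d->parallel_register_count - d->parallel_terminate_count;
}
```

In this toy, calling toy_forget_worker(&d, false) right after toy_reset(&d) leaves the active count at UINT32_MAX rather than 0, which is the broken state discussed here; passing true keeps it at 0.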

Sure, and I agree that having an explicit flag is the right solution. I'm just trying to make sure I understand what's happening.

The comment says that the counters are allowed to overflow, i.e. after a
long uptime you might get these values

     parallel_register_count = UINT_MAX + 1 = 0
     parallel_terminate_count = UINT_MAX

which is fine, because C handles the overflow during subtraction and so

     parallel_register_count - parallel_terminate_count = 1

But the assert is not doing subtraction, it's comparing the values directly:

     Assert(parallel_register_count >= parallel_terminate_count);

and the (perfectly valid) values trivially violate this comparison.
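
A minimal sketch of the two checks, assuming 32-bit unsigned counters (the helper names counter_diff and direct_compare are made up for illustration):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Difference of two wrapping counters; well defined modulo 2^32. */
static uint32_t
counter_diff(uint32_t registered, uint32_t terminated)
{
    return registered - terminated;
}

/* The direct comparison performed by the assert under discussion. */
static bool
direct_compare(uint32_t registered, uint32_t terminated)
{
    return registered >= terminated;
}
```

When terminated is UINT32_MAX and registered is one increment ahead of it, counter_diff() returns 1 while direct_compare() returns false.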

Thanks for the explanation. So, we can't use the above assert
statement. Even the following assert statement will not be helpful:
Assert(parallel_register_count - parallel_terminate_count >= 0);
because it fails to catch the case where parallel_register_count
has not overflowed but is still less than parallel_terminate_count. :(

Actually, that assert would work, because C does handle overflows on uint values during subtraction just fine. That is,

    (UINT_MAX+x) - UINT_MAX = x

Also, the comment about overflows before BackgroundWorkerArray claims this is the case.
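
A small sketch of the identity above, assuming uint32 counters. One caveat worth keeping in mind: the difference of two unsigned values is itself unsigned, so a `>= 0` test on it holds for every input. A check that is supposed to notice the terminate count running ahead would therefore need an upper bound instead; the `limit` parameter below is a hypothetical bound chosen for illustration, not something taken from the PostgreSQL source.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* (UINT32_MAX + x) - UINT32_MAX, evaluated in wrapping uint32_t arithmetic. */
static uint32_t
wrapped_diff(uint32_t x)
{
    uint32_t registered = UINT32_MAX + x;   /* wraps around modulo 2^32 */
    uint32_t terminated = UINT32_MAX;

    return registered - terminated;         /* yields x again */
}

/* Hypothetical sanity check: the active count stays within some limit. */
static bool
diff_within_limit(uint32_t registered, uint32_t terminated, uint32_t limit)
{
    return (uint32_t) (registered - terminated) <= limit;
}
```

With a bound in place, a terminate count that is ahead of the register count shows up as a huge difference and fails the check, while a legitimately wrapped pair still passes.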


Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)