Re: [HACKERS] strange parallel query behavior after OOM crashes

2017-04-11 Thread Robert Haas
On Tue, Apr 11, 2017 at 12:15 PM, Robert Haas  wrote:
> On Mon, Apr 10, 2017 at 7:17 PM, Tomas Vondra
>  wrote:
>> At first I was like 'WTF? Why do we need a new GUC just because of an
>> assert?' but you're actually not adding a new GUC parameter, you're adding a
>> constant which is then used as a max value for the two existing
>> parallel GUCs.
>>
>> I think this is fine.
>
> I think it is pretty odd-looking.  As written, it computes an unsigned
> -- and therefore necessarily non-negative -- value into a signed --
> and thus possibly negative -- value only to pass it back to abs() to
> make sure it's not negative:
>
> +   Assert(!parallel ||
> +          abs((int)(BackgroundWorkerData->parallel_register_count -
> +                    BackgroundWorkerData->parallel_terminate_count)) <=
> +          MAX_PARALLEL_WORKER_LIMIT);
>
> I think we can just say
>
> Assert(!parallel || BackgroundWorkerData->parallel_register_count -
> BackgroundWorkerData->parallel_terminate_count <= MAX_PARALLEL_WORKER_LIMIT);

Actually, there's an even simpler way: stick it inside the if () block
where we return false if we're outside the limit.  Then we don't need
to test the "parallel" bool either, because it's already known to be
true.  Committed that way.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] strange parallel query behavior after OOM crashes

2017-04-11 Thread Robert Haas
On Mon, Apr 10, 2017 at 7:17 PM, Tomas Vondra
 wrote:
> At first I was like 'WTF? Why do we need a new GUC just because of an
> assert?' but you're actually not adding a new GUC parameter, you're adding a
> constant which is then used as a max value for the two existing
> parallel GUCs.
>
> I think this is fine.

I think it is pretty odd-looking.  As written, it computes an unsigned
-- and therefore necessarily non-negative -- value into a signed --
and thus possibly negative -- value only to pass it back to abs() to
make sure it's not negative:

+   Assert(!parallel ||
+          abs((int)(BackgroundWorkerData->parallel_register_count -
+                    BackgroundWorkerData->parallel_terminate_count)) <=
+          MAX_PARALLEL_WORKER_LIMIT);

I think we can just say

Assert(!parallel || BackgroundWorkerData->parallel_register_count -
BackgroundWorkerData->parallel_terminate_count <= MAX_PARALLEL_WORKER_LIMIT);

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] strange parallel query behavior after OOM crashes

2017-04-11 Thread Kuntal Ghosh
On Tue, Apr 11, 2017 at 2:36 AM, Robert Haas  wrote:
> On Mon, Apr 10, 2017 at 2:32 PM, Neha Khatri  wrote:
>> On Tue, Apr 11, 2017 at 1:16 AM, Robert Haas  wrote:
>>> 1. Forget BGW_NEVER_RESTART workers in
>>> ResetBackgroundWorkerCrashTimes() rather than leaving them around to
>>> be cleaned up after the conclusion of the restart, so that they go
>>> away before rather than after shared memory is reset.
>>
+1 for the solution.

>> Now with this, would it still be required to forget BGW_NEVER_RESTART
>> workers in maybe_start_bgworker():
>>
>> if (rw->rw_crashed_at != 0)
>> {
>> if (rw->rw_worker.bgw_restart_time == BGW_NEVER_RESTART)
>> {
>> ForgetBackgroundWorker();
>> continue;
>> }
>>  ..
>> }
>
> Well, as noted before, not every background worker failure leads to a
> crash-and-restart; for example, a non-shmem-connected worker or one
> that exits with status 1 will set rw_crashed_at but will not cause a
> crash-and-restart cycle.   It's not 100% clear to me whether the code
> you're talking about can be reached in those cases, but I wouldn't bet
> against it.
I think that for the above-mentioned background workers, we follow a
different path to reach ForgetBackgroundWorker():
CleanupBackgroundWorker()
  -> ReportBackgroundWorkerExit()
     -> ForgetBackgroundWorker()
But I'm not sure about the other cases.

Should we include the assert statement as well?

-- 
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] strange parallel query behavior after OOM crashes

2017-04-10 Thread Tomas Vondra

On 04/10/2017 01:39 PM, Kuntal Ghosh wrote:

On Thu, Apr 6, 2017 at 6:50 AM, Robert Haas  wrote:

On Wed, Apr 5, 2017 at 8:17 PM, Neha Khatri  wrote:

The problem here seem to be the change in the max_parallel_workers value
while the parallel workers are still under execution. So this poses two
questions:

1. From a use-case point of view, why could there be a need to tweak the
max_parallel_workers exactly at the time when the parallel workers are at
play.
2. Could there be a restriction on tweaking of max_parallel_workers while
the parallel workers are at play? At least do not allow setting the
max_parallel_workers less than the current # of active parallel workers.


Well, that would be letting the tail wag the dog.  The maximum value
of max_parallel_workers is only 1024, and what we're really worried
about here is seeing a value near PG_UINT32_MAX, which leaves a lot of
daylight.  How about just creating a #define that's used by guc.c as
the maximum for the GUC, and here we assert that we're <= that value?


I've added a GUC parameter named MAX_PARALLEL_WORKER_LIMIT and
asserted that the absolute difference between the two counts <= that
value, i.e.,
abs((int)(register_count - terminate_count)) <= MAX_PARALLEL_WORKER_LIMIT;

Because it's possible that the register count has overflowed but the
terminate count has not yet overflowed. As a result, the difference in
unsigned integers will be near UINT32_MAX. Hence, we need the absolute
difference after typecasting it to a signed integer. This value should be
less than the max_parallel_workers upper limit.

I've attached both the patches for better accessibility. PFA.



At first I was like 'WTF? Why do we need a new GUC just because of an
assert?' but you're actually not adding a new GUC parameter, you're
adding a constant which is then used as a max value for the two
existing parallel GUCs.


I think this is fine.

regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] strange parallel query behavior after OOM crashes

2017-04-10 Thread Robert Haas
On Mon, Apr 10, 2017 at 2:32 PM, Neha Khatri  wrote:
> On Tue, Apr 11, 2017 at 1:16 AM, Robert Haas  wrote:
>> 1. Forget BGW_NEVER_RESTART workers in
>> ResetBackgroundWorkerCrashTimes() rather than leaving them around to
>> be cleaned up after the conclusion of the restart, so that they go
>> away before rather than after shared memory is reset.
>
> Now with this, would it still be required to forget BGW_NEVER_RESTART
> workers in maybe_start_bgworker():
>
> if (rw->rw_crashed_at != 0)
> {
> if (rw->rw_worker.bgw_restart_time == BGW_NEVER_RESTART)
> {
> ForgetBackgroundWorker();
> continue;
> }
>  ..
> }

Well, as noted before, not every background worker failure leads to a
crash-and-restart; for example, a non-shmem-connected worker or one
that exits with status 1 will set rw_crashed_at but will not cause a
crash-and-restart cycle.   It's not 100% clear to me whether the code
you're talking about can be reached in those cases, but I wouldn't bet
against it.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] strange parallel query behavior after OOM crashes

2017-04-10 Thread Neha Khatri
On Tue, Apr 11, 2017 at 1:16 AM, Robert Haas  wrote:

>
> 1. Forget BGW_NEVER_RESTART workers in
> ResetBackgroundWorkerCrashTimes() rather than leaving them around to
> be cleaned up after the conclusion of the restart, so that they go
> away before rather than after shared memory is reset.


Now with this, would it still be required to forget BGW_NEVER_RESTART
workers in maybe_start_bgworker():

if (rw->rw_crashed_at != 0)
{
if (rw->rw_worker.bgw_restart_time == BGW_NEVER_RESTART)
{
ForgetBackgroundWorker();
continue;
}
 ..
}

Regards,
Neha


Re: [HACKERS] strange parallel query behavior after OOM crashes

2017-04-10 Thread Robert Haas
[ Adding Julien, whose patch this was. ]

On Thu, Apr 6, 2017 at 5:34 AM, Kuntal Ghosh  wrote:
> While performing StartupDatabase, both master and standby server
> behave in similar way till postmaster spawns startup process.
> In master, startup process completes its job and dies. As a result,
> reaper is called which in turn calls maybe_start_bgworker().
> In standby, after getting a valid snapshot, startup process sends
> postmaster a signal to enable connections. Signal handler in
> postmaster calls maybe_start_bgworker().
> In maybe_start_bgworker(), if we find a crashed bgworker (crashed_at !=
> 0) with a NEVER RESTART flag, we call ForgetBackgroundWorker() to
> forget the bgworker process.
>
> I've attached the patch for adding an argument in
> ForgetBackgroundWorker() to indicate a crashed situation. Based on
> that, we can take the necessary actions. I've not included the Assert
> statement in this patch.

This doesn't seem right, because this will potentially pass wasCrashed
= true even if there has been no crash-and-restart cycle.  The place
where you're passing "true" only knows that rw->rw_crashed_at != 0 &&
rw->rw_worker.bgw_restart_time == BGW_NEVER_RESTART, but that doesn't
prove much of anything.  Some *other* background worker could have
crashed, rather than the one at issue.  Even if there's only one worker
involved, the test for whether to call HandleChildCrash() involves
other factors:

if ((rw->rw_worker.bgw_flags & BGWORKER_SHMEM_ACCESS) != 0)
{
if (!EXIT_STATUS_0(exitstatus) && !EXIT_STATUS_1(exitstatus))
{
HandleChildCrash(pid, exitstatus, namebuf);
return true;
}
}

if (!ReleasePostmasterChildSlot(rw->rw_child_slot) &&
(rw->rw_worker.bgw_flags & BGWORKER_SHMEM_ACCESS) != 0)
{
HandleChildCrash(pid, exitstatus, namebuf);
return true;
}

After some study, I think the solution here is as follows:

1. Forget BGW_NEVER_RESTART workers in
ResetBackgroundWorkerCrashTimes() rather than leaving them around to
be cleaned up after the conclusion of the restart, so that they go
away before rather than after shared memory is reset.

2. Don't allow BGWORKER_CLASS_PARALLEL workers to be anything other
than BGW_NEVER_RESTART, so that (1) suffices to guarantee that, after
recovering from a crash, 0 is the correct starting value for
parallel_register_count.  (Alternatively, we could iterate through the
worker list and compute the correct starting value for
parallel_register_count, but that seems silly since we only ever
expect BGW_NEVER_RESTART here anyway.)
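
For illustration, a minimal sketch of those two pieces, using only names
already quoted in this thread; it is a reconstruction of the idea, not the
attached patch itself:

/* (1) In ResetBackgroundWorkerCrashTimes(), before shared memory is reset */
slist_mutable_iter iter;

slist_foreach_modify(iter, &BackgroundWorkerList)
{
    RegisteredBgWorker *rw;

    rw = slist_container(RegisteredBgWorker, rw_lnode, iter.cur);

    if (rw->rw_worker.bgw_restart_time == BGW_NEVER_RESTART)
        ForgetBackgroundWorker(&iter);  /* drop it now, not after shmem reinit */
    else
        rw->rw_crashed_at = 0;          /* allow it to be restarted as before */
}

/* (2) At registration time, reject restartable parallel workers */
if ((worker->bgw_flags & BGWORKER_CLASS_PARALLEL) != 0 &&
    worker->bgw_restart_time != BGW_NEVER_RESTART)
    return false;   /* parallel workers must be BGW_NEVER_RESTART */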

The attached patch seems to fix the problem for me.  I'll commit it
tomorrow unless somebody sees a problem with it; if that happens, I'll
provide some other kind of update tomorrow.  [For clarity, for
official status update purposes, I am setting a next-update date of
tomorrow, 2017-04-11.]

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


forget-before-crash-v1.patch
Description: Binary data



Re: [HACKERS] strange parallel query behavior after OOM crashes

2017-04-10 Thread Kuntal Ghosh
On Thu, Apr 6, 2017 at 6:50 AM, Robert Haas  wrote:
> On Wed, Apr 5, 2017 at 8:17 PM, Neha Khatri  wrote:
>> The problem here seem to be the change in the max_parallel_workers value
>> while the parallel workers are still under execution. So this poses two
>> questions:
>>
>> 1. From a use-case point of view, why could there be a need to tweak the
>> max_parallel_workers exactly at the time when the parallel workers are at
>> play.
>> 2. Could there be a restriction on tweaking of max_parallel_workers while
>> the parallel workers are at play? At least do not allow setting the
>> max_parallel_workers less than the current # of active parallel workers.
>
> Well, that would be letting the tail wag the dog.  The maximum value
> of max_parallel_workers is only 1024, and what we're really worried
> about here is seeing a value near PG_UINT32_MAX, which leaves a lot of
> daylight.  How about just creating a #define that's used by guc.c as
> the maximum for the GUC, and here we assert that we're <= that value?
>
I've added a GUC parameter named MAX_PARALLEL_WORKER_LIMIT and
asserted that the absolute difference between the two counts <= that
value, i.e.,
abs((int)(register_count - terminate_count)) <= MAX_PARALLEL_WORKER_LIMIT;

Because it's possible that the register count has overflowed but the
terminate count has not yet overflowed. As a result, the difference in
unsigned integers will be near UINT32_MAX. Hence, we need the absolute
difference after typecasting it to a signed integer. This value should be
less than the max_parallel_workers upper limit.

I've attached both the patches for better accessibility. PFA.

-- 
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com


0001-Fix-parallel-worker-counts-after-a-crash.patch
Description: binary/octet-stream


0002-Add-GUC-for-max_parallel_workers-upper-limit.patch
Description: binary/octet-stream



Re: [HACKERS] strange parallel query behavior after OOM crashes

2017-04-09 Thread Noah Misch
On Thu, Apr 06, 2017 at 03:04:13PM +0530, Kuntal Ghosh wrote:
> On Wed, Apr 5, 2017 at 6:49 PM, Amit Kapila  wrote:
> > On Wed, Apr 5, 2017 at 12:35 PM, Kuntal Ghosh
> >  wrote:
> >> On Tue, Apr 4, 2017 at 11:22 PM, Tomas Vondra
> >>> I'm probably missing something, but I don't quite understand how these
> >>> values actually survive the crash. I mean, what I observed is OOM followed
> >>> by a restart, so shouldn't BackgroundWorkerShmemInit() simply reset the
> >>> values back to 0? Or do we call ForgetBackgroundWorker() after the crash 
> >>> for
> >>> some reason?
> >> AFAICU, during crash recovery, we wait for all non-syslogger children
> >> to exit, then reset shmem(call BackgroundWorkerShmemInit) and perform
> >> StartupDataBase. While starting the startup process we check if any
> >> bgworker is scheduled for a restart.
> >>
> >
> > In general, your theory appears right, but can you check how it
> > behaves in standby server because there is a difference in how the
> > startup process behaves during master and standby startup?  In master,
> > it stops after recovery whereas in standby it will keep on running to
> > receive WAL.
> >
> While performing StartupDatabase, both master and standby server
> behave in similar way till postmaster spawns startup process.
> In master, startup process completes its job and dies. As a result,
> reaper is called which in turn calls maybe_start_bgworker().
> In standby, after getting a valid snapshot, startup process sends
> postmaster a signal to enable connections. Signal handler in
> postmaster calls maybe_start_bgworker().
> In maybe_start_bgworker(), if we find a crashed bgworker (crashed_at !=
> 0) with a NEVER RESTART flag, we call ForgetBackgroundWorker() to
> forget the bgworker process.
> 
> I've attached the patch for adding an argument in
> ForgetBackgroundWorker() to indicate a crashed situation. Based on
> that, we can take the necessary actions. I've not included the Assert
> statement in this patch.

[Action required within three days.  This is a generic notification.]

The above-described topic is currently a PostgreSQL 10 open item.  Robert,
since you committed the patch believed to have created it, you own this open
item.  If some other commit is more relevant or if this does not belong as a
v10 open item, please let us know.  Otherwise, please observe the policy on
open item ownership[1] and send a status update within three calendar days of
this message.  Include a date for your subsequent status update.  Testers may
discover new open items at any time, and I want to plan to get them all fixed
well in advance of shipping v10.  Consequently, I will appreciate your efforts
toward speedy resolution.  Thanks.

[1] 
https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com




Re: [HACKERS] strange parallel query behavior after OOM crashes

2017-04-06 Thread Kuntal Ghosh
On Wed, Apr 5, 2017 at 6:49 PM, Amit Kapila  wrote:
> On Wed, Apr 5, 2017 at 12:35 PM, Kuntal Ghosh
>  wrote:
>> On Tue, Apr 4, 2017 at 11:22 PM, Tomas Vondra
>>> I'm probably missing something, but I don't quite understand how these
>>> values actually survive the crash. I mean, what I observed is OOM followed
>>> by a restart, so shouldn't BackgroundWorkerShmemInit() simply reset the
>>> values back to 0? Or do we call ForgetBackgroundWorker() after the crash for
>>> some reason?
>> AFAICU, during crash recovery, we wait for all non-syslogger children
>> to exit, then reset shmem(call BackgroundWorkerShmemInit) and perform
>> StartupDataBase. While starting the startup process we check if any
>> bgworker is scheduled for a restart.
>>
>
> In general, your theory appears right, but can you check how it
> behaves in standby server because there is a difference in how the
> startup process behaves during master and standby startup?  In master,
> it stops after recovery whereas in standby it will keep on running to
> receive WAL.
>
While performing StartupDatabase, both master and standby server
behave in similar way till postmaster spawns startup process.
In master, startup process completes its job and dies. As a result,
reaper is called which in turn calls maybe_start_bgworker().
In standby, after getting a valid snapshot, startup process sends
postmaster a signal to enable connections. Signal handler in
postmaster calls maybe_start_bgworker().
In maybe_start_bgworker(), if we find a crashed bgworker (crashed_at !=
0) with a NEVER RESTART flag, we call ForgetBackgroundWorker() to
forget the bgworker process.

I've attached the patch for adding an argument in
ForgetBackgroundWorker() to indicate a crashed situation. Based on
that, we can take the necessary actions. I've not included the Assert
statement in this patch.


-- 
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com


0001-Fix-parallel-worker-counts-after-a-crash_v1.patch
Description: binary/octet-stream



Re: [HACKERS] strange parallel query behavior after OOM crashes

2017-04-05 Thread Robert Haas
On Wed, Apr 5, 2017 at 8:17 PM, Neha Khatri  wrote:
> The problem here seem to be the change in the max_parallel_workers value
> while the parallel workers are still under execution. So this poses two
> questions:
>
> 1. From a use-case point of view, why could there be a need to tweak the
> max_parallel_workers exactly at the time when the parallel workers are at
> play.
> 2. Could there be a restriction on tweaking of max_parallel_workers while
> the parallel workers are at play? At least do not allow setting the
> max_parallel_workers less than the current # of active parallel workers.

Well, that would be letting the tail wag the dog.  The maximum value
of max_parallel_workers is only 1024, and what we're really worried
about here is seeing a value near PG_UINT32_MAX, which leaves a lot of
daylight.  How about just creating a #define that's used by guc.c as
the maximum for the GUC, and here we assert that we're <= that value?
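
For concreteness, a sketch of that suggestion; the constant name matches what
the thread settles on, the 1024 ceiling comes from the paragraph above, and
the guc.c entry shown is abridged and approximate rather than quoted:

/* e.g. in a bgworker header */
#define MAX_PARALLEL_WORKER_LIMIT   1024

/* guc.c: use the constant as the upper bound of the existing GUC */
{
    {"max_parallel_workers", PGC_USERSET, RESOURCES_ASYNCHRONOUS,
        gettext_noop("Sets the maximum number of parallel workers that can be active at one time."),
        NULL
    },
    &max_parallel_workers,
    8, 0, MAX_PARALLEL_WORKER_LIMIT,
    NULL, NULL, NULL
},

/* and where parallel workers are registered, assert against the same constant */
Assert(BackgroundWorkerData->parallel_register_count -
       BackgroundWorkerData->parallel_terminate_count <=
       MAX_PARALLEL_WORKER_LIMIT);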

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] strange parallel query behavior after OOM crashes

2017-04-05 Thread Neha Khatri
On Wed, Apr 5, 2017 at 5:34 PM, Kuntal Ghosh 
 wrote:

> On Tue, Apr 4, 2017 at 12:16 PM, Neha Khatri 
> wrote:
>
> > I feel there should be an assert if
> > (BackgroundWorkerData->parallel_register_count -
> > BackgroundWorkerData->parallel_terminate_count) > max_parallel_workers)
> >
> Backend 1 > SET max_parallel_worker = 8;
> Backend 1 > Execute a long running parallel query q1 with number of
> parallel worker spawned is say, 4.
> Backend 2> SET max_parallel_worker = 3;
> Backend 2 > Execute a long running parallel query q2 with number of
> parallel worker spawned > 0.
>
> The above assert statement will bring down the server unnecessarily
> while executing q2.


Right, with multiple backends trying to fiddle with max_parallel_workers,
that might bring the server down with the said assert:
Assert(parallel_register_count - parallel_terminate_count <=
max_parallel_workers)

The problem here seem to be the change in the max_parallel_workers value while
the parallel workers are still under execution. So this poses two questions:

1. From a use-case point of view, why could there be a need to tweak the
max_parallel_workers exactly at the time when the parallel workers are at
play.
2. Could there be a restriction on tweaking of max_parallel_workers while
the parallel workers are at play? At least do not allow setting the
max_parallel_workers less than the current # of active parallel workers.

Regards,
Neha


Re: [HACKERS] strange parallel query behavior after OOM crashes

2017-04-05 Thread Tomas Vondra

On 04/05/2017 04:26 PM, Kuntal Ghosh wrote:

On Wed, Apr 5, 2017 at 7:45 PM, Robert Haas  wrote:

On Wed, Apr 5, 2017 at 10:09 AM, Kuntal Ghosh
 wrote:

Did you intend to attach that patch to this email?


Actually, I'm confused about how we should ensure the (register_count >
terminate_count) invariant. I think there can be a system crash like what
Tomas has suggested up in the thread.

Assert(parallel_register_count - parallel_terminate_count <=
max_parallel_workers);
Backend 1 > SET max_parallel_worker = 8;
Backend 1 > Execute a long running parallel query q1 with number of
parallel worker spawned is say, 4.


At this point, parallel_register_count should be equal to
parallel_terminate_count.  4 workers were started, and 4 have
terminated.


Actually, I'm referring to the case when q1 is still running. In that
case, parallel_register_count = 4 and parallel_terminate_count = 0.


Backend 2> SET max_parallel_worker = 3;

Now, parallel_register_count - parallel_terminate_count = 4 >
max_parallel_worker.


Backend 2 > Try to execute any parallel query q2 with number of
parallel worker spawned > 0.



Hence, the assert will fail here.



Actually, you probably don't even need two sessions to trigger the 
assert. All you need is to tweak the max_parallel_workers and then 
reload the config while the parallel query is running. Then 
ForgetBackgroundWorker() will see the new value.


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] strange parallel query behavior after OOM crashes

2017-04-05 Thread Kuntal Ghosh
On Wed, Apr 5, 2017 at 7:45 PM, Robert Haas  wrote:
> On Wed, Apr 5, 2017 at 10:09 AM, Kuntal Ghosh
>  wrote:
>>> Did you intend to attach that patch to this email?
>>>
>> Actually, I'm confused about how we should ensure the (register_count >
>> terminate_count) invariant. I think there can be a system crash like what
>> Tomas has suggested up in the thread.
>>
>> Assert(parallel_register_count - parallel_terminate_count <=
>> max_parallel_workers);
>> Backend 1 > SET max_parallel_worker = 8;
>> Backend 1 > Execute a long running parallel query q1 with number of
>> parallel worker spawned is say, 4.
>
> At this point, parallel_register_count should be equal to
> parallel_terminate_count.  4 workers were started, and 4 have
> terminated.
>
Actually, I'm referring to the case when q1 is still running. In that
case, parallel_register_count = 4 and parallel_terminate_count = 0.

>> Backend 2> SET max_parallel_worker = 3;
Now, parallel_register_count - parallel_terminate_count = 4 >
max_parallel_worker.

>> Backend 2 > Try to execute any parallel query q2 with number of
>> parallel worker spawned > 0.
>
Hence, the assert will fail here.


-- 
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] strange parallel query behavior after OOM crashes

2017-04-05 Thread Tomas Vondra

On 04/05/2017 04:15 PM, Robert Haas wrote:

On Wed, Apr 5, 2017 at 10:09 AM, Kuntal Ghosh
 wrote:

Did you intend to attach that patch to this email?


Actually, I'm confused about how we should ensure the (register_count >
terminate_count) invariant. I think there can be a system crash like what
Tomas has suggested up in the thread.

Assert(parallel_register_count - parallel_terminate_count <=
max_parallel_workers);
Backend 1 > SET max_parallel_worker = 8;
Backend 1 > Execute a long running parallel query q1 with number of
parallel worker spawned is say, 4.


At this point, parallel_register_count should be equal to
parallel_terminate_count.  4 workers were started, and 4 have
terminated.



No, the workers were started, but are still running, so

register_count = 4
terminate_count = 0


Backend 2> SET max_parallel_worker = 3;
Backend 2 > Try to execute any parallel query q2 with number of
parallel worker spawned > 0.


Now here parallel_register_count should get bumped up to 4+(# of
workers now launched) at the beginning and then
parallel_terminate_count at the end.  No problem.

What's going wrong, here?



Well, assuming the other workers are still running, you get

   register_count = 4
   terminate_count = 0

and the difference is greater than max_parallel_workers = 3. Which 
violates the assert.


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] strange parallel query behavior after OOM crashes

2017-04-05 Thread Tomas Vondra

On 04/05/2017 04:09 PM, Kuntal Ghosh wrote:

On Wed, Apr 5, 2017 at 7:31 PM, Robert Haas  wrote:

On Wed, Apr 5, 2017 at 6:36 AM, Kuntal Ghosh  wrote:

Yes. But, as Robert suggested up in the thread, we should not use
(parallel_register_count = 0) as an alternative to define a bgworker
crash. Hence, I've added an argument named 'wasCrashed' in
ForgetBackgroundWorker to indicate a bgworker crash.


Did you intend to attach that patch to this email?


Actually, I'm confused about how we should ensure the (register_count >
terminate_count) invariant. I think there can be a system crash like what
Tomas has suggested up in the thread.

Assert(parallel_register_count - parallel_terminate_count <=
max_parallel_workers);
Backend 1 > SET max_parallel_worker = 8;
Backend 1 > Execute a long running parallel query q1 with number of
parallel worker spawned is say, 4.
Backend 2> SET max_parallel_worker = 3;
Backend 2 > Try to execute any parallel query q2 with number of
parallel worker spawned > 0.

The above assert statement will bring down the server unnecessarily
while executing q2. If the assert statement was not there, it could
have gone ahead without launching any workers.



Ah, right. I forgot max_parallel_workers may be changed in session. I 
think we can use max_worker_processes instead:


  Assert(parallel_register_count - parallel_terminate_count <= 
max_worker_processes);


The whole point is that if parallel_terminate_count exceeds 
parallel_register_count, the subtraction wraps around to a value close 
to UINT_MAX. All we need is a maximum possible delta between the two
values to detect this, and max_worker_processes seems fine. We could 
also use UINT_MAX/2 for example.
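
As a minimal sketch of that check, with the counter names used elsewhere in
the thread: if parallel_terminate_count ever exceeded
parallel_register_count, the unsigned subtraction would wrap to a value near
PG_UINT32_MAX and trip the comparison.

Assert(BackgroundWorkerData->parallel_register_count -
       BackgroundWorkerData->parallel_terminate_count <=
       max_worker_processes);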


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] strange parallel query behavior after OOM crashes

2017-04-05 Thread Robert Haas
On Wed, Apr 5, 2017 at 10:09 AM, Kuntal Ghosh
 wrote:
>> Did you intend to attach that patch to this email?
>>
> Actually, I'm confused about how we should ensure the (register_count >
> terminate_count) invariant. I think there can be a system crash like what
> Tomas has suggested up in the thread.
>
> Assert(parallel_register_count - parallel_terminate_count <=
> max_parallel_workers);
> Backend 1 > SET max_parallel_worker = 8;
> Backend 1 > Execute a long running parallel query q1 with number of
> parallel worker spawned is say, 4.

At this point, parallel_register_count should be equal to
parallel_terminate_count.  4 workers were started, and 4 have
terminated.

> Backend 2> SET max_parallel_worker = 3;
> Backend 2 > Try to execute any parallel query q2 with number of
> parallel worker spawned > 0.

Now here parallel_register_count should get bumped up to 4+(# of
workers now launched) at the beginning and then
parallel_terminate_count at the end.  No problem.

What's going wrong, here?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] strange parallel query behavior after OOM crashes

2017-04-05 Thread Kuntal Ghosh
On Wed, Apr 5, 2017 at 7:31 PM, Robert Haas  wrote:
> On Wed, Apr 5, 2017 at 6:36 AM, Kuntal Ghosh  
> wrote:
>> Yes. But, as Robert suggested up in the thread, we should not use
>> (parallel_register_count = 0) as an alternative to define a bgworker
>> crash. Hence, I've added an argument named 'wasCrashed' in
>> ForgetBackgroundWorker to indicate a bgworker crash.
>
> Did you intend to attach that patch to this email?
>
Actually, I'm confused about how we should ensure the (register_count >
terminate_count) invariant. I think there can be a system crash like what
Tomas has suggested up in the thread.

Assert(parallel_register_count - parallel_terminate_count <=
max_parallel_workers);
Backend 1 > SET max_parallel_worker = 8;
Backend 1 > Execute a long running parallel query q1 with number of
parallel worker spawned is say, 4.
Backend 2> SET max_parallel_worker = 3;
Backend 2 > Try to execute any parallel query q2 with number of
parallel worker spawned > 0.

The above assert statement will bring down the server unnecessarily
while executing q2. If the assert statement was not there, it could
have gone ahead without launching any workers.


-- 
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] strange parallel query behavior after OOM crashes

2017-04-05 Thread Robert Haas
On Wed, Apr 5, 2017 at 6:36 AM, Kuntal Ghosh  wrote:
> Yes. But, as Robert suggested up in the thread, we should not use
> (parallel_register_count = 0) as an alternative to define a bgworker
> crash. Hence, I've added an argument named 'wasCrashed' in
> ForgetBackgroundWorker to indicate a bgworker crash.

Did you intend to attach that patch to this email?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] strange parallel query behavior after OOM crashes

2017-04-05 Thread Robert Haas
On Tue, Apr 4, 2017 at 1:52 PM, Tomas Vondra
 wrote:
> In any case, the comment right before BackgroundWorkerArray says this:
>
>  * These counters can of course overflow, but it's not important here
>  * since the subtraction will still give the right number.
>
> which means that this assert
>
> +   Assert(BackgroundWorkerData->parallel_register_count >=
> +   BackgroundWorkerData->parallel_terminate_count);
>
> is outright broken, just like any other attempts to rely on simple
> comparisons of these two counters, no?

Yeah, that's busted.  Good catch.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] strange parallel query behavior after OOM crashes

2017-04-05 Thread Amit Kapila
On Wed, Apr 5, 2017 at 12:35 PM, Kuntal Ghosh
 wrote:
> On Tue, Apr 4, 2017 at 11:22 PM, Tomas Vondra
>  wrote:
>> On 04/04/2017 06:52 PM, Robert Haas wrote:
>>>
>>> On Mon, Apr 3, 2017 at 6:08 AM, Kuntal Ghosh 
>>> wrote:

 On Fri, Mar 31, 2017 at 6:50 PM, Robert Haas 
 wrote:
>
> On Thu, Mar 30, 2017 at 4:35 PM, Kuntal Ghosh
>  wrote:
>>
>> 2. the server restarts automatically, initialize
>> BackgroundWorkerData->parallel_register_count and
>> BackgroundWorkerData->parallel_terminate_count in the shared memory.
>> After that, it calls ForgetBackgroundWorker and it increments
>> parallel_terminate_count.
>
>
> Hmm.  So this seems like the root of the problem.  Presumably those
> things need to be reset AFTER forgetting any background workers from
> before the crash.
>
 IMHO, the fix would be not to increase the terminated parallel worker
 count whenever ForgetBackgroundWorker is called due to a bgworker
 crash. I've attached a patch for the same. PFA.
>>>
>>>
>>> While I'm not opposed to that approach, I don't think this is a good
>>> way to implement it.  If you want to pass an explicit flag to
>>> ForgetBackgroundWorker telling it whether or not it should perform
>>> the increment, fine.  But with what you've got here, you're
>>> essentially relying on "spooky action at a distance".  It would be
>>> easy for future code changes to break this, not realizing that
>>> somebody's got a hard dependency on 0 having a specific meaning.
>>>
>>
>> I'm probably missing something, but I don't quite understand how these
>> values actually survive the crash. I mean, what I observed is OOM followed
>> by a restart, so shouldn't BackgroundWorkerShmemInit() simply reset the
>> values back to 0? Or do we call ForgetBackgroundWorker() after the crash for
>> some reason?
> AFAICU, during crash recovery, we wait for all non-syslogger children
> to exit, then reset shmem(call BackgroundWorkerShmemInit) and perform
> StartupDataBase. While starting the startup process we check if any
> bgworker is scheduled for a restart.
>

In general, your theory appears right, but can you check how it
behaves in standby server because there is a difference in how the
startup process behaves during master and standby startup?  In master,
it stops after recovery whereas in standby it will keep on running to
receive WAL.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] strange parallel query behavior after OOM crashes

2017-04-05 Thread Tomas Vondra



On 04/05/2017 01:09 PM, Kuntal Ghosh wrote:

On Wed, Apr 5, 2017 at 4:13 PM, Tomas Vondra
 wrote:


The comment says that the counters are allowed to overflow, i.e. after a
long uptime you might get these values

  parallel_register_count = UINT_MAX + 1 = 1
  parallel_terminate_count = UINT_MAX

which is fine, because C handles the overflow during subtraction and
so

  parallel_register_count - parallel_terminate_count = 1

But the assert is not doing subtraction, it's comparing the values
directly:

  Assert(parallel_register_count >= parallel_terminate_count);

and the (perfectly valid) values trivially violate this comparison.


Thanks for the explanation. So, we can't use the above assert
statement. Even the following assert statement will not be helpful:
Assert(parallel_register_count - parallel_terminate_count >= 0);
Because it'll fail to catch the scenario where parallel_register_count
has not overflowed but is still less than parallel_terminate_count. :(



Actually, that assert would work, because C does handle overflows on uint
values during subtraction just fine. That is,

 (UINT_MAX+x) - UINT_MAX = x

Also, the comment about overflows before BackgroundWorkerArray claims this
is the case.


Agreed on the overflowed case. But, my concern is when an overflow has
not yet happened:

Suppose,
uint parallel_register_count = 1; /* Didn't overflow */
uint parallel_terminate_count = 2; /* Didn't overflow */

Assert(parallel_register_count - parallel_terminate_count >= 0);
We want the assert statement to fail here, but I think it won't since
-1 has a valid representation in unsigned int and it is greater than
0, no?


I don't follow. How exactly do you get into this situation, assuming you 
actually enforce the (register_count > terminate_count) invariant 
consistently? In the modulo arithmetic of course.


But now that I'm thinking about it, the subtraction actually happens in 
unsigned ints, so the result will be unsigned int too, i.e. always >=0. 
Whether it makes sense depends on the invariant being true.


But I think this would work:

Assert(parallel_register_count - parallel_terminate_count <= 
max_parallel_workers);


If the difference gets 'negative' and wraps around, it'll be close to 
UINT_MAX.


But man, this unsigned int arithmetic makes my brain hurt ...

regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] strange parallel query behavior after OOM crashes

2017-04-05 Thread Kuntal Ghosh
On Wed, Apr 5, 2017 at 4:13 PM, Tomas Vondra
 wrote:
>>>
>>> The comment says that the counters are allowed to overflow, i.e. after a
>>> long uptime you might get these values
>>>
>>>  parallel_register_count = UINT_MAX + 1 = 1
>>>  parallel_terminate_count = UINT_MAX
>>>
>>> which is fine, because C handles the overflow during subtraction and
>>> so
>>>
>>>  parallel_register_count - parallel_terminate_count = 1
>>>
>>> But the assert is not doing subtraction, it's comparing the values
>>> directly:
>>>
>>>  Assert(parallel_register_count >= parallel_terminate_count);
>>>
>>> and the (perfectly valid) values trivially violate this comparison.
>>>
>> Thanks for the explanation. So, we can't use the above assert
>> statement. Even the following assert statement will not be helpful:
>> Assert(parallel_register_count - parallel_terminate_count >= 0);
>> Because it'll fail to catch the scenario where parallel_register_count
>> has not overflowed but is still less than parallel_terminate_count. :(
>>
>
> Actually, that assert would work, because C does handle overflows on uint
> values during subtraction just fine. That is,
>
> (UINT_MAX+x) - UINT_MAX = x
>
> Also, the comment about overflows before BackgroundWorkerArray claims this
> is the case.
>
Agreed on the overflowed case. But, my concern is when an overflow has
not yet happened:

Suppose,
uint parallel_register_count = 1; /* Didn't overflow */
uint parallel_terminate_count = 2; /* Didn't overflow */

Assert(parallel_register_count - parallel_terminate_count >= 0);
We want the assert statement to fail here, but I think it won't since
-1 has a valid representation in unsigned int and it is greater than
0, no?



-- 
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] strange parallel query behavior after OOM crashes

2017-04-05 Thread Tomas Vondra



On 04/05/2017 12:36 PM, Kuntal Ghosh wrote:

On Wed, Apr 5, 2017 at 3:07 PM, Tomas Vondra
 wrote:



On 04/05/2017 09:05 AM, Kuntal Ghosh wrote:


AFAICU, during crash recovery, we wait for all non-syslogger children
to exit, then reset shmem(call BackgroundWorkerShmemInit) and perform
StartupDataBase. While starting the startup process we check if any
bgworker is scheduled for a restart. If a bgworker is crashed and not
meant for a restart(parallel worker in our case), we call
ForgetBackgroundWorker() to discard it.



OK, so essentially we reset the counters, getting

 parallel_register_count = 0
 parallel_terminate_count = 0

and then ForgetBackgroundWorker() runs and increments the terminate_count,
breaking the logic?

And you're using (parallel_register_count > 0) to detect whether we're still
in the init phase or not. Hmmm.


Yes. But, as Robert suggested up in the thread, we should not use
(parallel_register_count = 0) as an alternative to define a bgworker
crash. Hence, I've added an argument named 'wasCrashed' in
ForgetBackgroundWorker to indicate a bgworker crash.



Sure, and I agree that having an explicit flag is the right solution. 
I'm just trying to make sure I understand what's happening.




The comment says that the counters are allowed to overflow, i.e. after a
long uptime you might get these values

 parallel_register_count = UINT_MAX + 1 = 1
 parallel_terminate_count = UINT_MAX

which is fine, because C handles the overflow during subtraction and so

 parallel_register_count - parallel_terminate_count = 1

But the assert is not doing subtraction, it's comparing the values directly:

 Assert(parallel_register_count >= parallel_terminate_count);

and the (perfectly valid) values trivially violate this comparison.


Thanks for the explanation. So, we can't use the above assert
statement. Even the following assert statement will not be helpful:
Assert(parallel_register_count - parallel_terminate_count >= 0);
Because it'll fail to catch the scenario where parallel_register_count
has not overflowed but is still less than parallel_terminate_count. :(



Actually, that assert would work, because C does handle overflows on 
uint values during subtraction just fine. That is,


(UINT_MAX+x) - UINT_MAX = x

Also, the comment about overflows before BackgroundWorkerArray claims 
this is the case.



regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] strange parallel query behavior after OOM crashes

2017-04-05 Thread Kuntal Ghosh
On Wed, Apr 5, 2017 at 3:07 PM, Tomas Vondra
 wrote:
>
>
> On 04/05/2017 09:05 AM, Kuntal Ghosh wrote:
>>
>> AFAICU, during crash recovery, we wait for all non-syslogger children
>> to exit, then reset shmem(call BackgroundWorkerShmemInit) and perform
>> StartupDataBase. While starting the startup process we check if any
>> bgworker is scheduled for a restart. If a bgworker is crashed and not
>> meant for a restart(parallel worker in our case), we call
>> ForgetBackgroundWorker() to discard it.
>>
>
> OK, so essentially we reset the counters, getting
>
> parallel_register_count = 0
> parallel_terminate_count = 0
>
> and then ForgetBackgroundWorker() runs and increments the terminate_count,
> breaking the logic?
>
> And you're using (parallel_register_count > 0) to detect whether we're still
> in the init phase or not. Hmmm.
>
Yes. But, as Robert suggested up in the thread, we should not use
(parallel_register_count = 0) as an alternative to define a bgworker
crash. Hence, I've added an argument named 'wasCrashed' in
ForgetBackgroundWorker to indicate a bgworker crash.

>>>
>>> In any case, the comment right before BackgroundWorkerArray says this:
>>>
>>>   * These counters can of course overflow, but it's not important here
>>>   * since the subtraction will still give the right number.
>>>
>>> which means that this assert
>>>
>>> +   Assert(BackgroundWorkerData->parallel_register_count >=
>>> +   BackgroundWorkerData->parallel_terminate_count);
>>>
>>> is outright broken, just like any other attempts to rely on simple
>>> comparisons of these two counters, no?
>>>
>> IIUC, the assert statements ensures that register count should always
>> be greater than or equal to the terminate count.
>> Whereas, the comment refers to the fact that register count and
>> terminate count indicate the total number of parallel workers spawned
>> and terminated respectively since the server has been started. At
>> every session, we're not resetting the counts, hence they can
>> overflow. But their subtraction should give you the number of active
>> parallel workers at any instant.
>> So, I'm not able to see how the assert is broken w.r.t the aforesaid
>> comment. Am I missing something here?
>>
>
> The comment says that the counters are allowed to overflow, i.e. after a
> long uptime you might get these values
>
> parallel_register_count = UINT_MAX + 1 = 1
> parallel_terminate_count = UINT_MAX
>
> which is fine, because C handles the overflow during subtraction and so
>
> parallel_register_count - parallel_terminate_count = 1
>
> But the assert is not doing subtraction, it's comparing the values directly:
>
> Assert(parallel_register_count >= parallel_terminate_count);
>
> and the (perfectly valid) values trivially violate this comparison.
>
Thanks for the explanation. So, we can't use the above assert
statement. Even the following assert statement will not be helpful:
Assert(parallel_register_count - parallel_terminate_count >= 0);
Because it'll fail to catch the scenario where parallel_register_count
has not overflowed but is still less than parallel_terminate_count. :(

It seems to me the only thing we can make sure is
(parallel_register_count == parallel_terminate_count == 0) in
ForgetBackgroundWorker in case of a crash.

-- 
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] strange parallel query behavior after OOM crashes

2017-04-05 Thread Tomas Vondra



On 04/05/2017 09:05 AM, Kuntal Ghosh wrote:

On Tue, Apr 4, 2017 at 11:22 PM, Tomas Vondra
 wrote:

On 04/04/2017 06:52 PM, Robert Haas wrote:


On Mon, Apr 3, 2017 at 6:08 AM, Kuntal Ghosh 
wrote:


On Fri, Mar 31, 2017 at 6:50 PM, Robert Haas 
wrote:


On Thu, Mar 30, 2017 at 4:35 PM, Kuntal Ghosh
 wrote:


2. the server restarts automatically, initialize
BackgroundWorkerData->parallel_register_count and
BackgroundWorkerData->parallel_terminate_count in the shared memory.
After that, it calls ForgetBackgroundWorker and it increments
parallel_terminate_count.



Hmm.  So this seems like the root of the problem.  Presumably those
things need to be reset AFTER forgetting any background workers from
before the crash.


IMHO, the fix would be not to increase the terminated parallel worker
count whenever ForgetBackgroundWorker is called due to a bgworker
crash. I've attached a patch for the same. PFA.



While I'm not opposed to that approach, I don't think this is a good
way to implement it.  If you want to pass an explicit flag to
ForgetBackgroundWorker telling it whether or not it should perform
the increment, fine.  But with what you've got here, you're
essentially relying on "spooky action at a distance".  It would be
easy for future code changes to break this, not realizing that
somebody's got a hard dependency on 0 having a specific meaning.



I'm probably missing something, but I don't quite understand how these
values actually survive the crash. I mean, what I observed is OOM followed
by a restart, so shouldn't BackgroundWorkerShmemInit() simply reset the
values back to 0? Or do we call ForgetBackgroundWorker() after the crash for
some reason?

AFAICU, during crash recovery, we wait for all non-syslogger children
to exit, then reset shmem(call BackgroundWorkerShmemInit) and perform
StartupDataBase. While starting the startup process we check if any
bgworker is scheduled for a restart. If a bgworker is crashed and not
meant for a restart(parallel worker in our case), we call
ForgetBackgroundWorker() to discard it.



OK, so essentially we reset the counters, getting

parallel_register_count = 0
parallel_terminate_count = 0

and then ForgetBackgroundWorker() runs and increments the
terminate_count, breaking the logic?


And you're using (parallel_register_count > 0) to detect whether we're 
still in the init phase or not. Hmmm.




In any case, the comment right before BackgroundWorkerArray says this:

  * These counters can of course overflow, but it's not important here
  * since the subtraction will still give the right number.

which means that this assert

+   Assert(BackgroundWorkerData->parallel_register_count >=
+   BackgroundWorkerData->parallel_terminate_count);

is outright broken, just like any other attempts to rely on simple
comparisons of these two counters, no?


IIUC, the assert statements ensures that register count should always
be greater than or equal to the terminate count.
Whereas, the comment refers to the fact that register count and
terminate count indicate the total number of parallel workers spawned
and terminated respectively since the server has been started. At
every session, we're not resetting the counts, hence they can
overflow. But their subtraction should give you the number of active
parallel workers at any instant.
So, I'm not able to see how the assert is broken w.r.t the aforesaid
comment. Am I missing something here?



The comment says that the counters are allowed to overflow, i.e. after a 
long uptime you might get these values


parallel_register_count = UINT_MAX + 1 = 1
parallel_terminate_count = UINT_MAX

which is fine, because C handles the overflow during subtraction and so

parallel_register_count - parallel_terminate_count = 1

But the assert is not doing subtraction, it's comparing the values 
directly:


Assert(parallel_register_count >= parallel_terminate_count);

and the (perfectly valid) values trivially violate this comparison.

regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] strange parallel query behavior after OOM crashes

2017-04-05 Thread Kuntal Ghosh
On Tue, Apr 4, 2017 at 12:16 PM, Neha Khatri  wrote:

> I feel there should be an assert if
> (BackgroundWorkerData->parallel_register_count -
> BackgroundWorkerData->parallel_terminate_count) > max_parallel_workers)
>
Backend 1 > SET max_parallel_worker = 8;
Backend 1 > Execute a long running parallel query q1 with number of
parallel worker spawned is say, 4.
Backend 2> SET max_parallel_worker = 3;
Backend 2 > Execute a long running parallel query q2 with number of
parallel worker spawned > 0.

The above assert statement will bring down the server unnecessarily
while executing q2.

-- 
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] strange parallel query behavior after OOM crashes

2017-04-05 Thread Kuntal Ghosh
On Tue, Apr 4, 2017 at 11:22 PM, Tomas Vondra
 wrote:
> On 04/04/2017 06:52 PM, Robert Haas wrote:
>>
>> On Mon, Apr 3, 2017 at 6:08 AM, Kuntal Ghosh 
>> wrote:
>>>
>>> On Fri, Mar 31, 2017 at 6:50 PM, Robert Haas 
>>> wrote:

 On Thu, Mar 30, 2017 at 4:35 PM, Kuntal Ghosh
  wrote:
>
> 2. the server restarts automatically, initialize
> BackgroundWorkerData->parallel_register_count and
> BackgroundWorkerData->parallel_terminate_count in the shared memory.
> After that, it calls ForgetBackgroundWorker and it increments
> parallel_terminate_count.


 Hmm.  So this seems like the root of the problem.  Presumably those
 things need to be reset AFTER forgetting any background workers from
 before the crash.

>>> IMHO, the fix would be not to increase the terminated parallel worker
>>> count whenever ForgetBackgroundWorker is called due to a bgworker
>>> crash. I've attached a patch for the same. PFA.
>>
>>
>> While I'm not opposed to that approach, I don't think this is a good
>> way to implement it.  If you want to pass an explicit flag to
>> ForgetBackgroundWorker telling it whether or not it should perform
>> the increment, fine.  But with what you've got here, you're
>> essentially relying on "spooky action at a distance".  It would be
>> easy for future code changes to break this, not realizing that
>> somebody's got a hard dependency on 0 having a specific meaning.
>>
>
> I'm probably missing something, but I don't quite understand how these
> values actually survive the crash. I mean, what I observed is OOM followed
> by a restart, so shouldn't BackgroundWorkerShmemInit() simply reset the
> values back to 0? Or do we call ForgetBackgroundWorker() after the crash for
> some reason?
AFAICU, during crash recovery, we wait for all non-syslogger children
to exit, then reset shmem(call BackgroundWorkerShmemInit) and perform
StartupDataBase. While starting the startup process we check if any
bgworker is scheduled for a restart. If a bgworker is crashed and not
meant for a restart(parallel worker in our case), we call
ForgetBackgroundWorker() to discard it.

>
> In any case, the comment right before BackgroundWorkerArray says this:
>
>  * These counters can of course overflow, but it's not important here
>  * since the subtraction will still give the right number.
>
> which means that this assert
>
> +   Assert(BackgroundWorkerData->parallel_register_count >=
> +   BackgroundWorkerData->parallel_terminate_count);
>
> is outright broken, just like any other attempts to rely on simple
> comparisons of these two counters, no?
>
IIUC, the assert statements ensures that register count should always
be greater than or equal to the terminate count.
Whereas, the comment refers to the fact that register count and
terminate count indicate the total number of parallel workers spawned
and terminated respectively since the server has been started. At
every session, we're not resetting the counts, hence they can
overflow. But their subtraction should give you the number of active
parallel workers at any instant.
So, I'm not able to see how the assert is broken w.r.t the aforesaid
comment. Am I missing something here?

-- 
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] strange parallel query behavior after OOM crashes

2017-04-04 Thread Tomas Vondra

On 04/04/2017 06:52 PM, Robert Haas wrote:

On Mon, Apr 3, 2017 at 6:08 AM, Kuntal Ghosh  wrote:

On Fri, Mar 31, 2017 at 6:50 PM, Robert Haas  wrote:

On Thu, Mar 30, 2017 at 4:35 PM, Kuntal Ghosh
 wrote:

2. the server restarts automatically, initialize
BackgroundWorkerData->parallel_register_count and
BackgroundWorkerData->parallel_terminate_count in the shared memory.
After that, it calls ForgetBackgroundWorker and it increments
parallel_terminate_count.


Hmm.  So this seems like the root of the problem.  Presumably those
things need to be reset AFTER forgetting any background workers from
before the crash.


IMHO, the fix would be not to increase the terminated parallel worker
count whenever ForgetBackgroundWorker is called due to a bgworker
crash. I've attached a patch for the same. PFA.


While I'm not opposed to that approach, I don't think this is a good
way to implement it.  If you want to pass an explicit flag to
ForgetBackgroundWorker telling it whether or not it should perform
the increment, fine.  But with what you've got here, you're
essentially relying on "spooky action at a distance".  It would be
easy for future code changes to break this, not realizing that
somebody's got a hard dependency on 0 having a specific meaning.



I'm probably missing something, but I don't quite understand how these 
values actually survive the crash. I mean, what I observed is OOM 
followed by a restart, so shouldn't BackgroundWorkerShmemInit() simply 
reset the values back to 0? Or do we call ForgetBackgroundWorker() after 
the crash for some reason?



In any case, the comment right before BackgroundWorkerArray says this:

 * These counters can of course overflow, but it's not important here
 * since the subtraction will still give the right number.

which means that this assert

+   Assert(BackgroundWorkerData->parallel_register_count >=
+   BackgroundWorkerData->parallel_terminate_count);

is outright broken, just like any other attempts to rely on simple 
comparisons of these two counters, no?
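
For illustration only, here's a standalone snippet (plain C, made-up counter
values) for the case where parallel_register_count has wrapped around while
parallel_terminate_count has not:

    #include <stdio.h>
    #include <limits.h>

    int main(void)
    {
        /* made-up values: logically 2^32 + 5 workers were registered and
         * 2^32 - 3 were terminated, i.e. 8 workers are genuinely active */
        unsigned int parallel_register_count = 5;              /* wrapped */
        unsigned int parallel_terminate_count = UINT_MAX - 2;  /* not wrapped */

        /* the modular subtraction is still exact ... */
        printf("active = %u\n",
               parallel_register_count - parallel_terminate_count);  /* 8 */

        /* ... but a direct comparison of the raw counters is not */
        printf("register >= terminate: %s\n",
               parallel_register_count >= parallel_terminate_count
               ? "true" : "false");                                  /* false */
        return 0;
    }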



BTW, if this isn't on the open items list, it should be.  It's
presumably the fault of b460f5d6693103076dc554aa7cbb96e1e53074f9.



Unrelated, but perhaps we should also add this to open items:

https://www.postgresql.org/message-id/flat/57bbc52c-5d40-bb80-2992-7e1027d0b67c%402ndquadrant.com

(although that's more 9.6 material).

regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] strange parallel query behavior after OOM crashes

2017-04-04 Thread Robert Haas
On Mon, Apr 3, 2017 at 6:08 AM, Kuntal Ghosh  wrote:
> On Fri, Mar 31, 2017 at 6:50 PM, Robert Haas  wrote:
>> On Thu, Mar 30, 2017 at 4:35 PM, Kuntal Ghosh
>>  wrote:
>>> 2. the server restarts automatically, initialize
>>> BackgroundWorkerData->parallel_register_count and
>>> BackgroundWorkerData->parallel_terminate_count in the shared memory.
>>> After that, it calls ForgetBackgroundWorker and it increments
>>> parallel_terminate_count.
>>
>> Hmm.  So this seems like the root of the problem.  Presumably those
>> things need to be reset AFTER forgetting any background workers from
>> before the crash.
>>
> IMHO, the fix would be not to increase the terminated parallel worker
> count whenever ForgetBackgroundWorker is called due to a bgworker
> crash. I've attached a patch for the same. PFA.

While I'm not opposed to that approach, I don't think this is a good
way to implement it.  If you want to pass an explicit flag to
ForgetBackgroundWorker telling it whether or not it should perform
the increment, fine.  But with what you've got here, you're
essentially relying on "spooky action at a distance".  It would be
easy for future code changes to break this, not realizing that
somebody's got a hard dependency on 0 having a specific meaning.
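
(Purely as a sketch of that idea, not an actual patch: the extra parameter
and its name are my guesses, and the signature and the parallel-class check
are from memory rather than from the tree, so treat it as approximate.)

    static void
    ForgetBackgroundWorker(slist_mutable_iter *cur, bool adjust_parallel_count)
    {
        /* ... existing bookkeeping unchanged ... */

        /* only count the termination when the caller asks for it, e.g. not
         * when discarding workers that died in a crash-and-restart cycle */
        if (adjust_parallel_count &&
            (rw->rw_worker.bgw_flags & BGWORKER_CLASS_PARALLEL) != 0)
            BackgroundWorkerData->parallel_terminate_count++;

        /* ... existing cleanup unchanged ... */
    }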

BTW, if this isn't on the open items list, it should be.  It's
presumably the fault of b460f5d6693103076dc554aa7cbb96e1e53074f9.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] strange parallel query behavior after OOM crashes

2017-04-04 Thread Neha Khatri
Looking further in this context, the number of active parallel workers is:
parallel_register_count - parallel_terminate_count

Can active workers ever be greater than max_parallel_workers? I think not.
Then why should there be a greater-than check in the following condition:

if (parallel && (BackgroundWorkerData->parallel_register_count -
BackgroundWorkerData->parallel_terminate_count) >= max_parallel_workers)

I feel there should be an assert if
(BackgroundWorkerData->parallel_register_count -
BackgroundWorkerData->parallel_terminate_count) > max_parallel_workers

And the check could be
if (parallel && (active_parallel_workers == max_parallel_workers))
then do not register new parallel workers and return.

There should be no tolerance for the case when active_parallel_workers >
max_parallel_workers. After all, that is the purpose of max_parallel_workers.
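
A rough sketch of how that could look (just to illustrate the proposal;
active_parallel_workers is a hypothetical local variable introduced here for
readability, and uint32 is assumed to match the counters' type):

    uint32 active_parallel_workers =
        BackgroundWorkerData->parallel_register_count -
        BackgroundWorkerData->parallel_terminate_count;

    /* if the bookkeeping is correct, we can never exceed the limit */
    Assert(!parallel || active_parallel_workers <= max_parallel_workers);

    if (parallel && active_parallel_workers == max_parallel_workers)
        return false;       /* at the limit, do not register another worker */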

Is it that multiple backends may try to register parallel workers at the
same time, which is why the greater-than check is present?

Thoughts?

Regards,
Neha


Re: [HACKERS] strange parallel query behavior after OOM crashes

2017-04-03 Thread Kuntal Ghosh
On Fri, Mar 31, 2017 at 6:50 PM, Robert Haas  wrote:
> On Thu, Mar 30, 2017 at 4:35 PM, Kuntal Ghosh
>  wrote:
>> 2. the server restarts automatically, initialize
>> BackgroundWorkerData->parallel_register_count and
>> BackgroundWorkerData->parallel_terminate_count in the shared memory.
>> After that, it calls ForgetBackgroundWorker and it increments
>> parallel_terminate_count.
>
> Hmm.  So this seems like the root of the problem.  Presumably those
> things need to be reset AFTER forgetting any background workers from
> before the crash.
>
IMHO, the fix would be not to increase the terminated parallel worker
count whenever ForgetBackgroundWorker is called due to a bgworker
crash. I've attached a patch for the same. PFA.



-- 
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com


0001-Fix-parallel-worker-counts-after-a-crash.patch
Description: binary/octet-stream



Re: [HACKERS] strange parallel query behavior after OOM crashes

2017-03-31 Thread Robert Haas
On Thu, Mar 30, 2017 at 4:35 PM, Kuntal Ghosh
 wrote:
> 2. the server restarts automatically, initialize
> BackgroundWorkerData->parallel_register_count and
> BackgroundWorkerData->parallel_terminate_count in the shared memory.
> After that, it calls ForgetBackgroundWorker and it increments
> parallel_terminate_count.

Hmm.  So this seems like the root of the problem.  Presumably those
things need to be reset AFTER forgetting any background workers from
before the crash.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] strange parallel query behavior after OOM crashes

2017-03-30 Thread Kuntal Ghosh
On Fri, Mar 31, 2017 at 5:43 AM, Neha Khatri  wrote:
>
> On Fri, Mar 31, 2017 at 8:29 AM, Kuntal Ghosh 
> wrote:
>>
>> On Fri, Mar 31, 2017 at 2:05 AM, Kuntal Ghosh
>>  wrote:
>> >
>> > 1. Put an Assert(0) in ParallelQueryMain(), start server and execute
>> > any parallel query.
>> >  In LaunchParallelWorkers, you can see
>> >nworkers = n nworkers_launched = n (n>0)
>> > But, all the workers will crash because of the assert statement.
>> > 2. the server restarts automatically, initialize
>> > BackgroundWorkerData->parallel_register_count and
>> > BackgroundWorkerData->parallel_terminate_count in the shared memory.
>> > After that, it calls ForgetBackgroundWorker and it increments
>> > parallel_terminate_count. In LaunchParallelWorkers, we have the
>> > following condition:
>> > if ((BackgroundWorkerData->parallel_register_count -
>> >  BackgroundWorkerData->parallel_terminate_count) >=
>> > max_parallel_workers)
>> > DO NOT launch any parallel worker.
>> > Hence, nworkers = n nworkers_launched = 0.
>> parallel_register_count and parallel_terminate_count are both unsigned
>> integers. So, whenever the difference would be negative, the result wraps
>> around to a well-defined unsigned integer that is certainly much larger
>> than max_parallel_workers. Hence, no workers will be launched. I've
>> attached a patch to fix this.
>
>
> The current explanation of active number of parallel workers is:
>
>  * The active
>  * number of parallel workers is the number of registered workers minus the
>  * terminated ones.
>
> In situations like the one you mentioned above, this formula can give a
> negative number for active parallel workers. However, a negative number of
> active parallel workers does not make any sense.
Agreed.

> I feel it would be better to explain in the code in what situations the
> formula can generate a negative result and what that means.
I think that we need to find a fix so that it never generates a
negative result. The last patch I submitted generates the negative value
correctly, but surely that's not enough.



-- 
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] strange parallel query behavior after OOM crashes

2017-03-30 Thread Neha Khatri
On Fri, Mar 31, 2017 at 8:29 AM, Kuntal Ghosh 
 wrote:

> On Fri, Mar 31, 2017 at 2:05 AM, Kuntal Ghosh
>  wrote:
> >
> > 1. Put an Assert(0) in ParallelQueryMain(), start server and execute
> > any parallel query.
> >  In LaunchParallelWorkers, you can see
> >nworkers = n nworkers_launched = n (n>0)
> > But, all the workers will crash because of the assert statement.
> > 2. the server restarts automatically, initialize
> > BackgroundWorkerData->parallel_register_count and
> > BackgroundWorkerData->parallel_terminate_count in the shared memory.
> > After that, it calls ForgetBackgroundWorker and it increments
> > parallel_terminate_count. In LaunchParallelWorkers, we have the
> > following condition:
> > if ((BackgroundWorkerData->parallel_register_count -
> >  BackgroundWorkerData->parallel_terminate_count) >=
> > max_parallel_workers)
> > DO NOT launch any parallel worker.
> > Hence, nworkers = n nworkers_launched = 0.
> parallel_register_count and parallel_terminate_count are both unsigned
> integers. So, whenever the difference would be negative, the result wraps
> around to a well-defined unsigned integer that is certainly much larger
> than max_parallel_workers. Hence, no workers will be launched. I've
> attached a patch to fix this.


The current explanation of active number of parallel workers is:

 * The active
 * number of parallel workers is the number of registered workers minus the
 * terminated ones.

In situations like the one you mentioned above, this formula can give a
negative number for active parallel workers. However, a negative number of
active parallel workers does not make any sense.

I feel it would be better to explain in the code in what situations the
formula can generate a negative result and what that means.

Regards,
Neha


Re: [HACKERS] strange parallel query behavior after OOM crashes

2017-03-30 Thread Kuntal Ghosh
On Fri, Mar 31, 2017 at 2:05 AM, Kuntal Ghosh
 wrote:
>
> 1. Put an Assert(0) in ParallelQueryMain(), start server and execute
> any parallel query.
>  In LaunchParallelWorkers, you can see
>nworkers = n nworkers_launched = n (n>0)
> But, all the workers will crash because of the assert statement.
> 2. the server restarts automatically, initialize
> BackgroundWorkerData->parallel_register_count and
> BackgroundWorkerData->parallel_terminate_count in the shared memory.
> After that, it calls ForgetBackgroundWorker and it increments
> parallel_terminate_count. In LaunchParallelWorkers, we have the
> following condition:
> if ((BackgroundWorkerData->parallel_register_count -
>  BackgroundWorkerData->parallel_terminate_count) >=
> max_parallel_workers)
> DO NOT launch any parallel worker.
> Hence, nworkers = n nworkers_launched = 0.
parallel_register_count and parallel_terminate_count are both unsigned
integers. So, whenever the difference would be negative, the result wraps
around to a well-defined unsigned integer that is certainly much larger
than max_parallel_workers. Hence, no workers will be launched. I've
attached a patch to fix this.



-- 
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com


fix-workers-launched.patch
Description: binary/octet-stream



Re: [HACKERS] strange parallel query behavior after OOM crashes

2017-03-30 Thread Kuntal Ghosh
On Fri, Mar 31, 2017 at 12:32 AM, Thomas Munro
 wrote:
> On Fri, Mar 31, 2017 at 7:38 AM, Tomas Vondra
>  wrote:
>> Hi,
>>
>> While doing some benchmarking, I've run into a fairly strange issue with OOM
>> breaking LaunchParallelWorkers() after the restart. What I see happening is
>> this:
>>
>> 1) a query is executed, and at the end of LaunchParallelWorkers we get
>>
>> nworkers=8 nworkers_launched=8
>>
>> 2) the query does a Hash Aggregate, but ends up eating much more memory due
>> to n_distinct underestimate (see [1] from 2015 for details), and gets killed
>> by OOM
>>
>> 3) the server restarts, the query is executed again, but this time we get in
>> LaunchParallelWorkers
>>
>> nworkers=8 nworkers_launched=0
>>
>> There's nothing else running on the server, and there definitely should be
>> free parallel workers.
>>
>> 4) The query gets killed again, and on the next execution we get
>>
>> nworkers=8 nworkers_launched=8
>>
>> again, although not always. I wonder whether the exact impact depends on OOM
>> killing the leader or worker, for example.
>
> I don't know what's going on but I think I have seen this once or
> twice myself while hacking on test code that crashed.  I wonder if the
> DSM_CREATE_NULL_IF_MAXSEGMENTS case could be being triggered because
> the DSM control is somehow confused?
>
I think I've run into the same problem while working on parallelizing
plans containing InitPlans. You can reproduce that scenario with the
following steps:

1. Put an Assert(0) in ParallelQueryMain(), start server and execute
any parallel query.
 In LaunchParallelWorkers, you can see
   nworkers = n nworkers_launched = n (n>0)
But, all the workers will crash because of the assert statement.
2. the server restarts automatically, initialize
BackgroundWorkerData->parallel_register_count and
BackgroundWorkerData->parallel_terminate_count in the shared memory.
After that, it calls ForgetBackgroundWorker and it increments
parallel_terminate_count. In LaunchParallelWorkers, we have the
following condition:
if ((BackgroundWorkerData->parallel_register_count -
 BackgroundWorkerData->parallel_terminate_count) >=
max_parallel_workers)
DO NOT launch any parallel worker.
Hence, nworkers = n nworkers_launched = 0.

I thought the parallel workers were crashing because of my own mistake,
so this was supposed to happen. Sorry for that.

-- 
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] strange parallel query behavior after OOM crashes

2017-03-30 Thread Thomas Munro
On Fri, Mar 31, 2017 at 7:38 AM, Tomas Vondra
 wrote:
> Hi,
>
> While doing some benchmarking, I've run into a fairly strange issue with OOM
> breaking LaunchParallelWorkers() after the restart. What I see happening is
> this:
>
> 1) a query is executed, and at the end of LaunchParallelWorkers we get
>
> nworkers=8 nworkers_launched=8
>
> 2) the query does a Hash Aggregate, but ends up eating much more memory due
> to n_distinct underestimate (see [1] from 2015 for details), and gets killed
> by OOM
>
> 3) the server restarts, the query is executed again, but this time we get in
> LaunchParallelWorkers
>
> nworkers=8 nworkers_launched=0
>
> There's nothing else running on the server, and there definitely should be
> free parallel workers.
>
> 4) The query gets killed again, and on the next execution we get
>
> nworkers=8 nworkers_launched=8
>
> again, although not always. I wonder whether the exact impact depends on OOM
> killing the leader or worker, for example.

I don't know what's going on but I think I have seen this once or
twice myself while hacking on test code that crashed.  I wonder if the
DSM_CREATE_NULL_IF_MAXSEGMENTS case could be being triggered because
the DSM control is somehow confused?

-- 
Thomas Munro
http://www.enterprisedb.com

