Re: [HACKERS] strange parallel query behavior after OOM crashes

2017-04-11 Thread Robert Haas
On Tue, Apr 11, 2017 at 12:15 PM, Robert Haas wrote: > On Mon, Apr 10, 2017 at 7:17 PM, Tomas Vondra > wrote: >> At first I was like 'WTF? Why do we need a new GUC just because of an >> assert?' but you're actually not adding a new GUC

Re: [HACKERS] strange parallel query behavior after OOM crashes

2017-04-11 Thread Robert Haas
On Mon, Apr 10, 2017 at 7:17 PM, Tomas Vondra wrote: > At first I was like 'WTF? Why do we need a new GUC just because of an > assert?' but you're actually not adding a new GUC parameter, you're adding a > constant which is then used as a max value for max for the two

Re: [HACKERS] strange parallel query behavior after OOM crashes

2017-04-11 Thread Kuntal Ghosh
On Tue, Apr 11, 2017 at 2:36 AM, Robert Haas wrote: > On Mon, Apr 10, 2017 at 2:32 PM, Neha Khatri wrote: >> On Tue, Apr 11, 2017 at 1:16 AM, Robert Haas wrote: >>> 1. Forget BGW_NEVER_RESTART workers in >>>

Re: [HACKERS] strange parallel query behavior after OOM crashes

2017-04-10 Thread Tomas Vondra
On 04/10/2017 01:39 PM, Kuntal Ghosh wrote: On Thu, Apr 6, 2017 at 6:50 AM, Robert Haas wrote: On Wed, Apr 5, 2017 at 8:17 PM, Neha Khatri wrote: The problem here seems to be the change in the max_parallel_workers value while the parallel workers

Re: [HACKERS] strange parallel query behavior after OOM crashes

2017-04-10 Thread Robert Haas
On Mon, Apr 10, 2017 at 2:32 PM, Neha Khatri wrote: > On Tue, Apr 11, 2017 at 1:16 AM, Robert Haas wrote: >> 1. Forget BGW_NEVER_RESTART workers in >> ResetBackgroundWorkerCrashTimes() rather than leaving them around to >> be cleaned up after the

Re: [HACKERS] strange parallel query behavior after OOM crashes

2017-04-10 Thread Neha Khatri
On Tue, Apr 11, 2017 at 1:16 AM, Robert Haas wrote: > > 1. Forget BGW_NEVER_RESTART workers in > ResetBackgroundWorkerCrashTimes() rather than leaving them around to > be cleaned up after the conclusion of the restart, so that they go > away before rather than after shared

Re: [HACKERS] strange parallel query behavior after OOM crashes

2017-04-10 Thread Robert Haas
[ Adding Julien, whose patch this was. ] On Thu, Apr 6, 2017 at 5:34 AM, Kuntal Ghosh wrote: > While performing StartupDatabase, both master and standby server > behave in similar way till postmaster spawns startup process. > In master, startup process completes its

Re: [HACKERS] strange parallel query behavior after OOM crashes

2017-04-10 Thread Kuntal Ghosh
On Thu, Apr 6, 2017 at 6:50 AM, Robert Haas wrote: > On Wed, Apr 5, 2017 at 8:17 PM, Neha Khatri wrote: >> The problem here seems to be the change in the max_parallel_workers value >> while the parallel workers are still under execution. So this poses

Re: [HACKERS] strange parallel query behavior after OOM crashes

2017-04-09 Thread Noah Misch
On Thu, Apr 06, 2017 at 03:04:13PM +0530, Kuntal Ghosh wrote: > On Wed, Apr 5, 2017 at 6:49 PM, Amit Kapila wrote: > > On Wed, Apr 5, 2017 at 12:35 PM, Kuntal Ghosh > > wrote: > >> On Tue, Apr 4, 2017 at 11:22 PM, Tomas Vondra > >>> I'm

Re: [HACKERS] strange parallel query behavior after OOM crashes

2017-04-06 Thread Kuntal Ghosh
On Wed, Apr 5, 2017 at 6:49 PM, Amit Kapila wrote: > On Wed, Apr 5, 2017 at 12:35 PM, Kuntal Ghosh > wrote: >> On Tue, Apr 4, 2017 at 11:22 PM, Tomas Vondra >>> I'm probably missing something, but I don't quite understand how these >>> values

Re: [HACKERS] strange parallel query behavior after OOM crashes

2017-04-05 Thread Robert Haas
On Wed, Apr 5, 2017 at 8:17 PM, Neha Khatri wrote: > The problem here seems to be the change in the max_parallel_workers value > while the parallel workers are still under execution. So this poses two > questions: > > 1. From usecase point of view, why could there be a need

Re: [HACKERS] strange parallel query behavior after OOM crashes

2017-04-05 Thread Neha Khatri
On Wed, Apr 5, 2017 at 5:34 PM, Kuntal Ghosh wrote: > On Tue, Apr 4, 2017 at 12:16 PM, Neha Khatri > wrote: > > > I feel there should be an assert if > > (BackgroundWorkerData->parallel_register_count - > >

Re: [HACKERS] strange parallel query behavior after OOM crashes

2017-04-05 Thread Tomas Vondra
On 04/05/2017 04:26 PM, Kuntal Ghosh wrote: On Wed, Apr 5, 2017 at 7:45 PM, Robert Haas wrote: On Wed, Apr 5, 2017 at 10:09 AM, Kuntal Ghosh wrote: Did you intend to attach that patch to this email? Actually, I'm confused how we should

Re: [HACKERS] strange parallel query behavior after OOM crashes

2017-04-05 Thread Kuntal Ghosh
On Wed, Apr 5, 2017 at 7:45 PM, Robert Haas wrote: > On Wed, Apr 5, 2017 at 10:09 AM, Kuntal Ghosh > wrote: >>> Did you intend to attach that patch to this email? >>> >> Actually, I'm confused how we should ensure (register_count > >>

Re: [HACKERS] strange parallel query behavior after OOM crashes

2017-04-05 Thread Tomas Vondra
On 04/05/2017 04:15 PM, Robert Haas wrote: On Wed, Apr 5, 2017 at 10:09 AM, Kuntal Ghosh wrote: Did you intend to attach that patch to this email? Actually, I'm confused how we should ensure (register_count > terminate_count) invariant. I think there can be a

Re: [HACKERS] strange parallel query behavior after OOM crashes

2017-04-05 Thread Tomas Vondra
On 04/05/2017 04:09 PM, Kuntal Ghosh wrote: On Wed, Apr 5, 2017 at 7:31 PM, Robert Haas wrote: On Wed, Apr 5, 2017 at 6:36 AM, Kuntal Ghosh wrote: Yes. But, as Robert suggested up in the thread, we should not use (parallel_register_count =

Re: [HACKERS] strange parallel query behavior after OOM crashes

2017-04-05 Thread Robert Haas
On Wed, Apr 5, 2017 at 10:09 AM, Kuntal Ghosh wrote: >> Did you intend to attach that patch to this email? >> > Actually, I'm confused how we should ensure (register_count > > terminate_count) invariant. I think there can be a system crash what > Tomas has suggested up

Re: [HACKERS] strange parallel query behavior after OOM crashes

2017-04-05 Thread Kuntal Ghosh
On Wed, Apr 5, 2017 at 7:31 PM, Robert Haas wrote: > On Wed, Apr 5, 2017 at 6:36 AM, Kuntal Ghosh > wrote: >> Yes. But, as Robert suggested up in the thread, we should not use >> (parallel_register_count = 0) as an alternative to define a

Re: [HACKERS] strange parallel query behavior after OOM crashes

2017-04-05 Thread Robert Haas
On Wed, Apr 5, 2017 at 6:36 AM, Kuntal Ghosh wrote: > Yes. But, as Robert suggested up in the thread, we should not use > (parallel_register_count = 0) as an alternative to define a bgworker > crash. Hence, I've added an argument named 'wasCrashed' in >

Re: [HACKERS] strange parallel query behavior after OOM crashes

2017-04-05 Thread Robert Haas
On Tue, Apr 4, 2017 at 1:52 PM, Tomas Vondra wrote: > In any case, the comment right before BackgroundWorkerArray says this: > > * These counters can of course overflow, but it's not important here > * since the subtraction will still give the right number. > >
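
The quoted comment's claim, that the subtraction still gives the right number after the counters overflow, follows from unsigned arithmetic being modular in C. Below is a minimal sketch of that property, not the PostgreSQL source: the struct `WorkerCounters` and the functions `active_workers`, `can_register`, and `wraparound_example` are hypothetical names invented for illustration, and 32-bit unsigned counters are an assumption.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the two counters discussed in the thread.
 * Both only ever increase; 32-bit width is assumed for illustration. */
typedef struct WorkerCounters
{
    uint32_t register_count;   /* workers registered so far */
    uint32_t terminate_count;  /* workers terminated so far */
} WorkerCounters;

/* Number of currently active workers. Unsigned subtraction is done
 * modulo 2^32, so the result stays correct even after register_count
 * has wrapped past UINT32_MAX, provided fewer than 2^32 workers are
 * active at once. */
uint32_t active_workers(const WorkerCounters *c)
{
    return c->register_count - c->terminate_count;
}

/* Illustrative limit check: a new worker is allowed only while the
 * active count is below the limit. Comparing with "<" (rather than
 * "!=") also behaves sensibly if the limit is lowered while workers
 * are still running. */
int can_register(const WorkerCounters *c, uint32_t max_workers)
{
    return active_workers(c) < max_workers;
}

/* Walk the counters across the wraparound boundary. */
void wraparound_example(void)
{
    WorkerCounters c = { UINT32_MAX, UINT32_MAX - 2 };

    assert(active_workers(&c) == 2);  /* two workers running */

    c.register_count += 2;            /* wraps around to 1 */
    assert(c.register_count == 1);
    assert(active_workers(&c) == 4);  /* still the true active count */
}
```

Note that this property only holds while both counters advance consistently; it says nothing about the state after a crash, which is the case the rest of the thread is concerned with.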

Re: [HACKERS] strange parallel query behavior after OOM crashes

2017-04-05 Thread Amit Kapila
On Wed, Apr 5, 2017 at 12:35 PM, Kuntal Ghosh wrote: > On Tue, Apr 4, 2017 at 11:22 PM, Tomas Vondra > wrote: >> On 04/04/2017 06:52 PM, Robert Haas wrote: >>> >>> On Mon, Apr 3, 2017 at 6:08 AM, Kuntal Ghosh

Re: [HACKERS] strange parallel query behavior after OOM crashes

2017-04-05 Thread Tomas Vondra
On 04/05/2017 01:09 PM, Kuntal Ghosh wrote: On Wed, Apr 5, 2017 at 4:13 PM, Tomas Vondra wrote: The comment says that the counters are allowed to overflow, i.e. after a long uptime you might get these values parallel_register_count = UINT_MAX + 1 = 1

Re: [HACKERS] strange parallel query behavior after OOM crashes

2017-04-05 Thread Kuntal Ghosh
On Wed, Apr 5, 2017 at 4:13 PM, Tomas Vondra wrote: >>> >>> The comment says that the counters are allowed to overflow, i.e. after a >>> long uptime you might get these values >>> >>> parallel_register_count = UINT_MAX + 1 = 1 >>> parallel_terminate_count =

Re: [HACKERS] strange parallel query behavior after OOM crashes

2017-04-05 Thread Tomas Vondra
On 04/05/2017 12:36 PM, Kuntal Ghosh wrote: On Wed, Apr 5, 2017 at 3:07 PM, Tomas Vondra wrote: On 04/05/2017 09:05 AM, Kuntal Ghosh wrote: AFAICU, during crash recovery, we wait for all non-syslogger children to exit, then reset shmem(call

Re: [HACKERS] strange parallel query behavior after OOM crashes

2017-04-05 Thread Kuntal Ghosh
On Wed, Apr 5, 2017 at 3:07 PM, Tomas Vondra wrote: > > > On 04/05/2017 09:05 AM, Kuntal Ghosh wrote: >> >> AFAICU, during crash recovery, we wait for all non-syslogger children >> to exit, then reset shmem(call BackgroundWorkerShmemInit) and perform >>

Re: [HACKERS] strange parallel query behavior after OOM crashes

2017-04-05 Thread Tomas Vondra
On 04/05/2017 09:05 AM, Kuntal Ghosh wrote: On Tue, Apr 4, 2017 at 11:22 PM, Tomas Vondra wrote: On 04/04/2017 06:52 PM, Robert Haas wrote: On Mon, Apr 3, 2017 at 6:08 AM, Kuntal Ghosh wrote: On Fri, Mar 31, 2017 at 6:50 PM,

Re: [HACKERS] strange parallel query behavior after OOM crashes

2017-04-05 Thread Kuntal Ghosh
On Tue, Apr 4, 2017 at 12:16 PM, Neha Khatri wrote: > I feel there should be an assert if > (BackgroundWorkerData->parallel_register_count - > BackgroundWorkerData->parallel_terminate_count) > max_parallel_workers) > Backend 1 > SET max_parallel_worker = 8; Backend 1 >

Re: [HACKERS] strange parallel query behavior after OOM crashes

2017-04-05 Thread Kuntal Ghosh
On Tue, Apr 4, 2017 at 11:22 PM, Tomas Vondra wrote: > On 04/04/2017 06:52 PM, Robert Haas wrote: >> >> On Mon, Apr 3, 2017 at 6:08 AM, Kuntal Ghosh >> wrote: >>> >>> On Fri, Mar 31, 2017 at 6:50 PM, Robert Haas

Re: [HACKERS] strange parallel query behavior after OOM crashes

2017-04-04 Thread Tomas Vondra
On 04/04/2017 06:52 PM, Robert Haas wrote: On Mon, Apr 3, 2017 at 6:08 AM, Kuntal Ghosh wrote: On Fri, Mar 31, 2017 at 6:50 PM, Robert Haas wrote: On Thu, Mar 30, 2017 at 4:35 PM, Kuntal Ghosh wrote: 2. the

Re: [HACKERS] strange parallel query behavior after OOM crashes

2017-04-04 Thread Robert Haas
On Mon, Apr 3, 2017 at 6:08 AM, Kuntal Ghosh wrote: > On Fri, Mar 31, 2017 at 6:50 PM, Robert Haas wrote: >> On Thu, Mar 30, 2017 at 4:35 PM, Kuntal Ghosh >> wrote: >>> 2. the server restarts automatically,

Re: [HACKERS] strange parallel query behavior after OOM crashes

2017-04-04 Thread Neha Khatri
Looking further in this context, number of active parallel workers is: parallel_register_count - parallel_terminate_count Can active workers ever be greater than max_parallel_workers, I think no. Then why should there be greater than check in the following condition: if (parallel &&

Re: [HACKERS] strange parallel query behavior after OOM crashes

2017-04-03 Thread Kuntal Ghosh
On Fri, Mar 31, 2017 at 6:50 PM, Robert Haas wrote: > On Thu, Mar 30, 2017 at 4:35 PM, Kuntal Ghosh > wrote: >> 2. the server restarts automatically, initialize >> BackgroundWorkerData->parallel_register_count and >>

Re: [HACKERS] strange parallel query behavior after OOM crashes

2017-03-31 Thread Robert Haas
On Thu, Mar 30, 2017 at 4:35 PM, Kuntal Ghosh wrote: > 2. the server restarts automatically, initialize > BackgroundWorkerData->parallel_register_count and > BackgroundWorkerData->parallel_terminate_count in the shared memory. > After that, it calls

Re: [HACKERS] strange parallel query behavior after OOM crashes

2017-03-30 Thread Kuntal Ghosh
On Fri, Mar 31, 2017 at 5:43 AM, Neha Khatri wrote: > > On Fri, Mar 31, 2017 at 8:29 AM, Kuntal Ghosh > wrote: >> >> On Fri, Mar 31, 2017 at 2:05 AM, Kuntal Ghosh >> wrote: >> > >> > 1. Put an Assert(0) in

Re: [HACKERS] strange parallel query behavior after OOM crashes

2017-03-30 Thread Neha Khatri
On Fri, Mar 31, 2017 at 8:29 AM, Kuntal Ghosh wrote: > On Fri, Mar 31, 2017 at 2:05 AM, Kuntal Ghosh > wrote: > > > > 1. Put an Assert(0) in ParallelQueryMain(), start server and execute > > any parallel query. > > In

Re: [HACKERS] strange parallel query behavior after OOM crashes

2017-03-30 Thread Kuntal Ghosh
On Fri, Mar 31, 2017 at 2:05 AM, Kuntal Ghosh wrote: > > 1. Put an Assert(0) in ParallelQueryMain(), start server and execute > any parallel query. > In LaunchParallelWorkers, you can see >nworkers = n nworkers_launched = n (n>0) > But, all the workers will

Re: [HACKERS] strange parallel query behavior after OOM crashes

2017-03-30 Thread Kuntal Ghosh
On Fri, Mar 31, 2017 at 12:32 AM, Thomas Munro wrote: > On Fri, Mar 31, 2017 at 7:38 AM, Tomas Vondra > wrote: >> Hi, >> >> While doing some benchmarking, I've run into a fairly strange issue with OOM >> breaking

Re: [HACKERS] strange parallel query behavior after OOM crashes

2017-03-30 Thread Thomas Munro
On Fri, Mar 31, 2017 at 7:38 AM, Tomas Vondra wrote: > Hi, > > While doing some benchmarking, I've run into a fairly strange issue with OOM > breaking LaunchParallelWorkers() after the restart. What I see happening is > this: > > 1) a query is executed, and at the