Re: [HACKERS] strange parallel query behavior after OOM crashes

2017-04-11 Thread Robert Haas
On Tue, Apr 11, 2017 at 12:15 PM, Robert Haas wrote: > On Mon, Apr 10, 2017 at 7:17 PM, Tomas Vondra > wrote: >> At first I was like 'WTF? Why do we need a new GUC just because of an >> assert?' but you're actually not adding a new GUC parameter, you're adding a >> constant which is then used as a

Re: [HACKERS] strange parallel query behavior after OOM crashes

2017-04-11 Thread Robert Haas
On Mon, Apr 10, 2017 at 7:17 PM, Tomas Vondra wrote: > At first I was like 'WTF? Why do we need a new GUC just because of an > assert?' but you're actually not adding a new GUC parameter, you're adding a > constant which is then used as a max value for max for the two existing > parallel GUCs. > >

Re: [HACKERS] strange parallel query behavior after OOM crashes

2017-04-11 Thread Kuntal Ghosh
On Tue, Apr 11, 2017 at 2:36 AM, Robert Haas wrote: > On Mon, Apr 10, 2017 at 2:32 PM, Neha Khatri wrote: >> On Tue, Apr 11, 2017 at 1:16 AM, Robert Haas wrote: >>> 1. Forget BGW_NEVER_RESTART workers in >>> ResetBackgroundWorkerCrashTimes() rather than leaving them around to >>> be cleaned up a

Re: [HACKERS] strange parallel query behavior after OOM crashes

2017-04-10 Thread Tomas Vondra
On 04/10/2017 01:39 PM, Kuntal Ghosh wrote: On Thu, Apr 6, 2017 at 6:50 AM, Robert Haas wrote: On Wed, Apr 5, 2017 at 8:17 PM, Neha Khatri wrote: The problem here seem to be the change in the max_parallel_workers value while the parallel workers are still under execution. So this poses two qu

Re: [HACKERS] strange parallel query behavior after OOM crashes

2017-04-10 Thread Robert Haas
On Mon, Apr 10, 2017 at 2:32 PM, Neha Khatri wrote: > On Tue, Apr 11, 2017 at 1:16 AM, Robert Haas wrote: >> 1. Forget BGW_NEVER_RESTART workers in >> ResetBackgroundWorkerCrashTimes() rather than leaving them around to >> be cleaned up after the conclusion of the restart, so that they go >> away

Re: [HACKERS] strange parallel query behavior after OOM crashes

2017-04-10 Thread Neha Khatri
On Tue, Apr 11, 2017 at 1:16 AM, Robert Haas wrote: > > 1. Forget BGW_NEVER_RESTART workers in > ResetBackgroundWorkerCrashTimes() rather than leaving them around to > be cleaned up after the conclusion of the restart, so that they go > away before rather than after shared memory is reset. Now

Re: [HACKERS] strange parallel query behavior after OOM crashes

2017-04-10 Thread Robert Haas
[ Adding Julien, whose patch this was. ] On Thu, Apr 6, 2017 at 5:34 AM, Kuntal Ghosh wrote: > While performing StartupDatabase, both master and standby server > behave in similar way till postmaster spawns startup process. > In master, startup process completes its job and dies. As a result, > r

Re: [HACKERS] strange parallel query behavior after OOM crashes

2017-04-10 Thread Kuntal Ghosh
On Thu, Apr 6, 2017 at 6:50 AM, Robert Haas wrote: > On Wed, Apr 5, 2017 at 8:17 PM, Neha Khatri wrote: >> The problem here seem to be the change in the max_parallel_workers value >> while the parallel workers are still under execution. So this poses two >> questions: >> >> 1. From usecase point

Re: [HACKERS] strange parallel query behavior after OOM crashes

2017-04-09 Thread Noah Misch
On Thu, Apr 06, 2017 at 03:04:13PM +0530, Kuntal Ghosh wrote: > On Wed, Apr 5, 2017 at 6:49 PM, Amit Kapila wrote: > > On Wed, Apr 5, 2017 at 12:35 PM, Kuntal Ghosh > > wrote: > >> On Tue, Apr 4, 2017 at 11:22 PM, Tomas Vondra > >>> I'm probably missing something, but I don't quite understand how

Re: [HACKERS] strange parallel query behavior after OOM crashes

2017-04-06 Thread Kuntal Ghosh
On Wed, Apr 5, 2017 at 6:49 PM, Amit Kapila wrote: > On Wed, Apr 5, 2017 at 12:35 PM, Kuntal Ghosh > wrote: >> On Tue, Apr 4, 2017 at 11:22 PM, Tomas Vondra >>> I'm probably missing something, but I don't quite understand how these >>> values actually survive the crash. I mean, what I observed is

Re: [HACKERS] strange parallel query behavior after OOM crashes

2017-04-05 Thread Robert Haas
On Wed, Apr 5, 2017 at 8:17 PM, Neha Khatri wrote: > The problem here seem to be the change in the max_parallel_workers value > while the parallel workers are still under execution. So this poses two > questions: > > 1. From usecase point of view, why could there be a need to tweak the > max_paral

Re: [HACKERS] strange parallel query behavior after OOM crashes

2017-04-05 Thread Neha Khatri
On Wed, Apr 5, 2017 at 5:34 PM, Kuntal Ghosh wrote: > On Tue, Apr 4, 2017 at 12:16 PM, Neha Khatri > wrote: > > > I feel there should be an assert if > > (BackgroundWorkerData->parallel_register_count - > > BackgroundWorkerData->parallel_terminate_count) > max_parallel_workers) > > > Backen

Re: [HACKERS] strange parallel query behavior after OOM crashes

2017-04-05 Thread Tomas Vondra
On 04/05/2017 04:26 PM, Kuntal Ghosh wrote: On Wed, Apr 5, 2017 at 7:45 PM, Robert Haas wrote: On Wed, Apr 5, 2017 at 10:09 AM, Kuntal Ghosh wrote: Did you intend to attach that patch to this email? Actually, I'm confused how we should ensure (register_count > terminate_count) invariant. I

Re: [HACKERS] strange parallel query behavior after OOM crashes

2017-04-05 Thread Kuntal Ghosh
On Wed, Apr 5, 2017 at 7:45 PM, Robert Haas wrote: > On Wed, Apr 5, 2017 at 10:09 AM, Kuntal Ghosh > wrote: >>> Did you intend to attach that patch to this email? >>> >> Actually, I'm confused how we should ensure (register_count > >> terminate_count) invariant. I think there can be a system cras

Re: [HACKERS] strange parallel query behavior after OOM crashes

2017-04-05 Thread Tomas Vondra
On 04/05/2017 04:15 PM, Robert Haas wrote: On Wed, Apr 5, 2017 at 10:09 AM, Kuntal Ghosh wrote: Did you intend to attach that patch to this email? Actually, I'm confused how we should ensure (register_count > terminate_count) invariant. I think there can be a system crash what Tomas has sugge

Re: [HACKERS] strange parallel query behavior after OOM crashes

2017-04-05 Thread Tomas Vondra
On 04/05/2017 04:09 PM, Kuntal Ghosh wrote: On Wed, Apr 5, 2017 at 7:31 PM, Robert Haas wrote: On Wed, Apr 5, 2017 at 6:36 AM, Kuntal Ghosh wrote: Yes. But, as Robert suggested up in the thread, we should not use (parallel_register_count = 0) as an alternative to define a bgworker crash. Henc

Re: [HACKERS] strange parallel query behavior after OOM crashes

2017-04-05 Thread Robert Haas
On Wed, Apr 5, 2017 at 10:09 AM, Kuntal Ghosh wrote: >> Did you intend to attach that patch to this email? >> > Actually, I'm confused how we should ensure (register_count > > terminate_count) invariant. I think there can be a system crash what > Tomas has suggested up in the thread. > > Assert(pa

Re: [HACKERS] strange parallel query behavior after OOM crashes

2017-04-05 Thread Kuntal Ghosh
On Wed, Apr 5, 2017 at 7:31 PM, Robert Haas wrote: > On Wed, Apr 5, 2017 at 6:36 AM, Kuntal Ghosh > wrote: >> Yes. But, as Robert suggested up in the thread, we should not use >> (parallel_register_count = 0) as an alternative to define a bgworker >> crash. Hence, I've added an argument named 'w

Re: [HACKERS] strange parallel query behavior after OOM crashes

2017-04-05 Thread Robert Haas
On Wed, Apr 5, 2017 at 6:36 AM, Kuntal Ghosh wrote: > Yes. But, as Robert suggested up in the thread, we should not use > (parallel_register_count = 0) as an alternative to define a bgworker > crash. Hence, I've added an argument named 'wasCrashed' in > ForgetBackgroundWorker to indicate a bgworke

Re: [HACKERS] strange parallel query behavior after OOM crashes

2017-04-05 Thread Robert Haas
On Tue, Apr 4, 2017 at 1:52 PM, Tomas Vondra wrote: > In any case, the comment right before BackgroundWorkerArray says this: > > * These counters can of course overflow, but it's not important here > * since the subtraction will still give the right number. > > which means that this assert > > +
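
The comment quoted above says the two counters may overflow because the subtraction still gives the right number. A minimal sketch of why that holds for unsigned 32-bit counters follows. It is not taken from the PostgreSQL source: the names active_workers and check_wraparound are hypothetical, and the 32-bit width is an assumption made for illustration.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper (not PostgreSQL code): number of currently active
 * workers, derived from two free-running unsigned counters.
 */
uint32_t
active_workers(uint32_t register_count, uint32_t terminate_count)
{
    /*
     * The result is reduced modulo 2^32, so a wraparound in
     * register_count cancels out as long as it is never "behind"
     * terminate_count.
     */
    return register_count - terminate_count;
}

/* Exercises the wraparound case and the broken-invariant case. */
void
check_wraparound(void)
{
    uint32_t terminated = UINT32_MAX - 1;
    uint32_t registered = terminated + 3;   /* wraps around to 1 */

    assert(registered == 1);
    assert(active_workers(registered, terminated) == 3);

    /*
     * If a termination is counted without a matching registration, the
     * difference wraps to a very large value instead of going negative.
     */
    assert(active_workers(0, 1) == UINT32_MAX);
}
```

The last assertion shows the failure mode that a plain `register_count >= terminate_count` check cannot express once overflow is allowed: a legitimately wrapped counter and a broken invariant can both have register_count numerically smaller than terminate_count.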

Re: [HACKERS] strange parallel query behavior after OOM crashes

2017-04-05 Thread Amit Kapila
On Wed, Apr 5, 2017 at 12:35 PM, Kuntal Ghosh wrote: > On Tue, Apr 4, 2017 at 11:22 PM, Tomas Vondra > wrote: >> On 04/04/2017 06:52 PM, Robert Haas wrote: >>> >>> On Mon, Apr 3, 2017 at 6:08 AM, Kuntal Ghosh >>> wrote: On Fri, Mar 31, 2017 at 6:50 PM, Robert Haas wrote: > >>

Re: [HACKERS] strange parallel query behavior after OOM crashes

2017-04-05 Thread Tomas Vondra
On 04/05/2017 01:09 PM, Kuntal Ghosh wrote: On Wed, Apr 5, 2017 at 4:13 PM, Tomas Vondra wrote: The comment says that the counters are allowed to overflow, i.e. after a long uptime you might get these values parallel_register_count = UINT_MAX + 1 = 1 parallel_terminate_count = U

Re: [HACKERS] strange parallel query behavior after OOM crashes

2017-04-05 Thread Kuntal Ghosh
On Wed, Apr 5, 2017 at 4:13 PM, Tomas Vondra wrote: >>> >>> The comment says that the counters are allowed to overflow, i.e. after a >>> long uptime you might get these values >>> >>> parallel_register_count = UINT_MAX + 1 = 1 >>> parallel_terminate_count = UINT_MAX >>> >>> which is fine

Re: [HACKERS] strange parallel query behavior after OOM crashes

2017-04-05 Thread Tomas Vondra
On 04/05/2017 12:36 PM, Kuntal Ghosh wrote: On Wed, Apr 5, 2017 at 3:07 PM, Tomas Vondra wrote: On 04/05/2017 09:05 AM, Kuntal Ghosh wrote: AFAICU, during crash recovery, we wait for all non-syslogger children to exit, then reset shmem(call BackgroundWorkerShmemInit) and perform StartupDa

Re: [HACKERS] strange parallel query behavior after OOM crashes

2017-04-05 Thread Kuntal Ghosh
On Wed, Apr 5, 2017 at 3:07 PM, Tomas Vondra wrote: > > > On 04/05/2017 09:05 AM, Kuntal Ghosh wrote: >> >> AFAICU, during crash recovery, we wait for all non-syslogger children >> to exit, then reset shmem(call BackgroundWorkerShmemInit) and perform >> StartupDataBase. While starting the startup

Re: [HACKERS] strange parallel query behavior after OOM crashes

2017-04-05 Thread Tomas Vondra
On 04/05/2017 09:05 AM, Kuntal Ghosh wrote: On Tue, Apr 4, 2017 at 11:22 PM, Tomas Vondra wrote: On 04/04/2017 06:52 PM, Robert Haas wrote: On Mon, Apr 3, 2017 at 6:08 AM, Kuntal Ghosh wrote: On Fri, Mar 31, 2017 at 6:50 PM, Robert Haas wrote: On Thu, Mar 30, 2017 at 4:35 PM, Kuntal G

Re: [HACKERS] strange parallel query behavior after OOM crashes

2017-04-05 Thread Kuntal Ghosh
On Tue, Apr 4, 2017 at 12:16 PM, Neha Khatri wrote: > I feel there should be an assert if > (BackgroundWorkerData->parallel_register_count - > BackgroundWorkerData->parallel_terminate_count) > max_parallel_workers) > Backend 1 > SET max_parallel_worker = 8; Backend 1 > Execute a long running

Re: [HACKERS] strange parallel query behavior after OOM crashes

2017-04-05 Thread Kuntal Ghosh
On Tue, Apr 4, 2017 at 11:22 PM, Tomas Vondra wrote: > On 04/04/2017 06:52 PM, Robert Haas wrote: >> >> On Mon, Apr 3, 2017 at 6:08 AM, Kuntal Ghosh >> wrote: >>> >>> On Fri, Mar 31, 2017 at 6:50 PM, Robert Haas >>> wrote: On Thu, Mar 30, 2017 at 4:35 PM, Kuntal Ghosh wrote:

Re: [HACKERS] strange parallel query behavior after OOM crashes

2017-04-04 Thread Tomas Vondra
On 04/04/2017 06:52 PM, Robert Haas wrote: On Mon, Apr 3, 2017 at 6:08 AM, Kuntal Ghosh wrote: On Fri, Mar 31, 2017 at 6:50 PM, Robert Haas wrote: On Thu, Mar 30, 2017 at 4:35 PM, Kuntal Ghosh wrote: 2. the server restarts automatically, initialize BackgroundWorkerData->parallel_register_co

Re: [HACKERS] strange parallel query behavior after OOM crashes

2017-04-04 Thread Robert Haas
On Mon, Apr 3, 2017 at 6:08 AM, Kuntal Ghosh wrote: > On Fri, Mar 31, 2017 at 6:50 PM, Robert Haas wrote: >> On Thu, Mar 30, 2017 at 4:35 PM, Kuntal Ghosh >> wrote: >>> 2. the server restarts automatically, initialize >>> BackgroundWorkerData->parallel_register_count and >>> BackgroundWorkerData

Re: [HACKERS] strange parallel query behavior after OOM crashes

2017-04-03 Thread Neha Khatri
Looking further in this context, number of active parallel workers is: parallel_register_count - parallel_terminate_count Can active workers ever be greater than max_parallel_workers, I think no. Then why should there be greater than check in the following condition: if (parallel && (Back

Re: [HACKERS] strange parallel query behavior after OOM crashes

2017-04-03 Thread Kuntal Ghosh
On Fri, Mar 31, 2017 at 6:50 PM, Robert Haas wrote: > On Thu, Mar 30, 2017 at 4:35 PM, Kuntal Ghosh > wrote: >> 2. the server restarts automatically, initialize >> BackgroundWorkerData->parallel_register_count and >> BackgroundWorkerData->parallel_terminate_count in the shared memory. >> After th

Re: [HACKERS] strange parallel query behavior after OOM crashes

2017-03-31 Thread Robert Haas
On Thu, Mar 30, 2017 at 4:35 PM, Kuntal Ghosh wrote: > 2. the server restarts automatically, initialize > BackgroundWorkerData->parallel_register_count and > BackgroundWorkerData->parallel_terminate_count in the shared memory. > After that, it calls ForgetBackgroundWorker and it increments > paral

Re: [HACKERS] strange parallel query behavior after OOM crashes

2017-03-30 Thread Kuntal Ghosh
On Fri, Mar 31, 2017 at 5:43 AM, Neha Khatri wrote: > > On Fri, Mar 31, 2017 at 8:29 AM, Kuntal Ghosh > wrote: >> >> On Fri, Mar 31, 2017 at 2:05 AM, Kuntal Ghosh >> wrote: >> > >> > 1. Put an Assert(0) in ParallelQueryMain(), start server and execute >> > any parallel query. >> > In LaunchPara

Re: [HACKERS] strange parallel query behavior after OOM crashes

2017-03-30 Thread Neha Khatri
On Fri, Mar 31, 2017 at 8:29 AM, Kuntal Ghosh wrote: > On Fri, Mar 31, 2017 at 2:05 AM, Kuntal Ghosh > wrote: > > > > 1. Put an Assert(0) in ParallelQueryMain(), start server and execute > > any parallel query. > > In LaunchParallelWorkers, you can see > >nworkers = n nworkers_launched

Re: [HACKERS] strange parallel query behavior after OOM crashes

2017-03-30 Thread Kuntal Ghosh
On Fri, Mar 31, 2017 at 2:05 AM, Kuntal Ghosh wrote: > > 1. Put an Assert(0) in ParallelQueryMain(), start server and execute > any parallel query. > In LaunchParallelWorkers, you can see >nworkers = n nworkers_launched = n (n>0) > But, all the workers will crash because of the assert sta

Re: [HACKERS] strange parallel query behavior after OOM crashes

2017-03-30 Thread Kuntal Ghosh
On Fri, Mar 31, 2017 at 12:32 AM, Thomas Munro wrote: > On Fri, Mar 31, 2017 at 7:38 AM, Tomas Vondra > wrote: >> Hi, >> >> While doing some benchmarking, I've ran into a fairly strange issue with OOM >> breaking LaunchParallelWorkers() after the restart. What I see happening is >> this: >> >> 1)

Re: [HACKERS] strange parallel query behavior after OOM crashes

2017-03-30 Thread Thomas Munro
On Fri, Mar 31, 2017 at 7:38 AM, Tomas Vondra wrote: > Hi, > > While doing some benchmarking, I've ran into a fairly strange issue with OOM > breaking LaunchParallelWorkers() after the restart. What I see happening is > this: > > 1) a query is executed, and at the end of LaunchParallelWorkers we g

[HACKERS] strange parallel query behavior after OOM crashes

2017-03-30 Thread Tomas Vondra
Hi, While doing some benchmarking, I've ran into a fairly strange issue with OOM breaking LaunchParallelWorkers() after the restart. What I see happening is this: 1) a query is executed, and at the end of LaunchParallelWorkers we get nworkers=8 nworkers_launched=8 2) the query does a Ha