Re: [HACKERS] crashes due to setting max_parallel_workers=0

2017-03-31 Thread Robert Haas
On Tue, Mar 28, 2017 at 11:08 PM, Tomas Vondra wrote:
> Maybe. It depends on how valuable it is to keep Gather and GatherMerge
> similar - having nreaders in one and not the other seems a bit weird. But
> maybe the refactoring would remove it from both nodes?

Yeah, it appears to be the case that both Gather and GatherMerge
inevitably end up with nworkers_launched == nreaders, which seems
dumb.  If we're going to clean this up, I think we should make them
both match.

> Also, it does not really solve the issue that we're using 'nreaders' or
> 'nworkers_launched' to access an array with one extra element.

I'm not clear that there's any fundamental solution to that problem
other than trying to clarify it through comments.

Meantime, I think it's not good to leave this crashing, so I pushed
Rushabh's v2 patch for the actual crash for now.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] crashes due to setting max_parallel_workers=0

2017-03-30 Thread Rushabh Lathia
On Wed, Mar 29, 2017 at 8:38 AM, Tomas Vondra wrote:

> Hi,
>
> On 03/28/2017 11:07 AM, Rushabh Lathia wrote:
>
>> ...
>> I think we all agree that we should get rid of nreaders from the
>> GatherMergeState and need to do some code refactoring. But if I
>> understood correctly, Robert's concern was to do that refactoring
>> as a separate commit.
>>
>>
> Maybe. It depends on how valuable it is to keep Gather and GatherMerge
> similar - having nreaders in one and not the other seems a bit weird. But
> maybe the refactoring would remove it from both nodes?
>
> Also, it does not really solve the issue that we're using 'nreaders' or
> 'nworkers_launched' to access an array with one extra element.
>

With the latest patches, in gather_merge_init() there is one loop where
we initialize the tuple array (which holds the tuples that came from
each worker) and the slot (which holds the current tuple).

We hold a tuple array for each worker because when we read tuples from
workers, it's a good idea to read several at once for efficiency when
possible: this minimizes context-switching overhead.

For the leader we don't initialize the slot, because for the leader we
directly call ExecProcNode(), which returns the slot. Also, there is no
point in holding a tuple array for the leader.
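
To make that layout concrete, here's a small self-contained C sketch of
the convention (simplified stand-in types, not the actual PostgreSQL
structures):

#include <stdio.h>
#include <stdlib.h>

typedef struct TupleTableSlot TupleTableSlot;   /* opaque stand-in */

/* Simplified stand-in for GatherMergeState; illustration only. */
typedef struct
{
    int              nworkers_launched;
    TupleTableSlot **gm_slots;      /* nworkers_launched + 1 entries */
} GmSketch;

static void
sketch_init(GmSketch *gm)
{
    int     nslots = gm->nworkers_launched + 1;

    /*
     * Indexes 0 .. nworkers_launched - 1 are for workers; in the real
     * code each would be set up with ExecInitExtraTupleSlot() and paired
     * with a tuple-array buffer, so several tuples can be pulled from a
     * worker's queue at once to reduce context switches.
     *
     * The final index is reserved for the leader and intentionally stays
     * NULL: the leader's tuples arrive as the return value of
     * ExecProcNode(), so no slot (and no tuple array) is allocated here.
     */
    gm->gm_slots = calloc(nslots, sizeof(TupleTableSlot *));
}

int
main(void)
{
    GmSketch    gm = {2, NULL};

    sketch_init(&gm);
    printf("%d slots; the last is the leader's and starts out NULL\n",
           gm.nworkers_launched + 1);
    free(gm.gm_slots);
    return 0;
}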

With the latest patches, gather_merge_clear_slots() will only clean the
tuple table slots and the tuple array buffers for the workers
(nworkers_launched).

The only time we loop through "nworkers_launched + 1" is while reading a
tuple from each gather merge input. Personally I don't see any problem
with that, but I am very much open to suggestions.



>
>
> regards
>
> --
> Tomas Vondra  http://www.2ndQuadrant.com
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>



-- 
Rushabh Lathia


Re: [HACKERS] crashes due to setting max_parallel_workers=0

2017-03-28 Thread Tomas Vondra

Hi,

On 03/28/2017 11:07 AM, Rushabh Lathia wrote:
> ...
>
> I think we all agree that we should get rid of nreaders from the
> GatherMergeState and need to do some code refactoring. But if I
> understood correctly, Robert's concern was to do that refactoring as
> a separate commit.

Maybe. It depends on how valuable it is to keep Gather and GatherMerge
similar - having nreaders in one and not the other seems a bit weird.
But maybe the refactoring would remove it from both nodes?

Also, it does not really solve the issue that we're using 'nreaders' or
'nworkers_launched' to access an array with one extra element.



regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] crashes due to setting max_parallel_workers=0

2017-03-28 Thread Rushabh Lathia
On Tue, Mar 28, 2017 at 10:00 AM, Tomas Vondra  wrote:

>
>
> On 03/27/2017 01:40 PM, Rushabh Lathia wrote:
>
>>
>> ...
>> I was doing more testing with the patch and I found one more server
>> crash with the patch around the same area, when we forced the gather
>> merge for a scan having zero rows.
>>
>> create table dept (deptno numeric, dname varchar(20));
>> set parallel_tuple_cost = 0;
>> set parallel_setup_cost = 0;
>> set min_parallel_table_scan_size = 0;
>> set min_parallel_index_scan_size = 0;
>> set force_parallel_mode = regress;
>> explain analyze select * from dept order by deptno;
>>
>> This is because for the leader we don't initialize the slot in gm_slots.
>> So in the case where zero workers are launched and the table has zero
>> rows, we end up having a NULL slot in the gm_slots array.
>>
>> Currently gather_merge_clear_slots() clears out the tuple table slots
>> for each gather merge input and returns a cleared slot. In the patch I
>> modified gather_merge_clear_slots() to just clear out the tuple table
>> slots and always return NULL when all the queues and the heap are
>> exhausted.
>>
>>
> Isn't that just another sign the code might be a bit too confusing? I see
> two main issues in the code:
>
> 1) allocating 'slots' as 'nreaders+1' elements, which seems like a good
> way to cause off-by-one errors
>
> 2) mixing objects with different life spans (slots for readers vs. slot
> for the leader) seems like a bad idea too
>
> I wonder how much we gain by reusing the slot from the leader (I'd be
> surprised if it was at all measurable). David posted a patch reworking
> this, and significantly simplifying the GatherMerge node. Why not accept
> that?
>
>
>
I think we all agree that we should get rid of nreaders from the
GatherMergeState and need to do some code refactoring. But if I
understood correctly, Robert's concern was to do that refactoring as a
separate commit.

I picked up David's patch and started reviewing the changes. I applied
that patch on top of my v2 patch (which does the refactoring of
gather_merge_clear_slots()).

In David's patch, in gather_merge_init(), the loop where the tuple array
is allocated needs to run only up to nworkers_launched, because we don't
hold a tuple array for the leader. I changed that and made some other
simple changes based on my v2 patch. I also performed manual testing
with the changes.

Please find attached the refactoring patch, on top of the v2 patch
submitted for the server crash fix. (Attaching both patches here again,
for ease of access.)

Thanks,


-- 
Rushabh Lathia
www.EnterpriseDB.com
diff --git a/src/backend/executor/nodeGatherMerge.c b/src/backend/executor/nodeGatherMerge.c
index 62c399e..2d7eb71 100644
--- a/src/backend/executor/nodeGatherMerge.c
+++ b/src/backend/executor/nodeGatherMerge.c
@@ -195,9 +195,9 @@ ExecGatherMerge(GatherMergeState *node)
 			/* Set up tuple queue readers to read the results. */
 			if (pcxt->nworkers_launched > 0)
 			{
-				node->nreaders = 0;
-				node->reader = palloc(pcxt->nworkers_launched *
-									  sizeof(TupleQueueReader *));
+				node->reader = (TupleQueueReader **)
+					palloc(pcxt->nworkers_launched *
+						   sizeof(TupleQueueReader *));
 
 				Assert(gm->numCols);
 
@@ -205,7 +205,7 @@ ExecGatherMerge(GatherMergeState *node)
 				{
 					shm_mq_set_handle(node->pei->tqueue[i],
 									  pcxt->worker[i].bgwhandle);
-					node->reader[node->nreaders++] =
+					node->reader[i] =
 						CreateTupleQueueReader(node->pei->tqueue[i],
 											   node->tupDesc);
 				}
@@ -298,7 +298,7 @@ ExecShutdownGatherMergeWorkers(GatherMergeState *node)
 	{
 		int			i;
 
-		for (i = 0; i < node->nreaders; ++i)
+		for (i = 0; i < node->nworkers_launched; ++i)
 			if (node->reader[i])
 				DestroyTupleQueueReader(node->reader[i]);
 
@@ -344,28 +344,26 @@ ExecReScanGatherMerge(GatherMergeState *node)
 static void
 gather_merge_init(GatherMergeState *gm_state)
 {
-	int			nreaders = gm_state->nreaders;
+	int			nslots = gm_state->nworkers_launched + 1;
 	bool		initialize = true;
 	int			i;
 
 	/*
 	 * Allocate gm_slots for the number of worker + one more slot for leader.
-	 * Last slot is always for leader. Leader always calls ExecProcNode() to
-	 * read the tuple which will return the TupleTableSlot. Later it will
-	 * directly get assigned to gm_slot. So just initialize leader gm_slot
-	 * with NULL. For other slots below code will call
-	 * ExecInitExtraTupleSlot() which will do the initialization of worker
-	 * slots.
+	 * The final slot in the array is reserved for the leader process. This
+	 * slot is always populated via ExecProcNode(). This can be set to NULL
+	 * for now.  The remaining slots we'll initialize with a call to
+	 * ExecInitExtraTupleSlot().
 	 */
-	gm_state->gm_slots =
-		palloc((gm_state->nreaders + 1) * sizeof(TupleTableSlot *));
-	gm_state->gm_slots[gm_state->nreaders] = NULL;
+	gm_state->gm_slots = (TupleTableSlot **)
+		palloc(nslots * sizeof(TupleTableSlot

Re: [HACKERS] crashes due to setting max_parallel_workers=0

2017-03-27 Thread Tomas Vondra



On 03/27/2017 01:40 PM, Rushabh Lathia wrote:


> ...
> I was doing more testing with the patch and I found one more server
> crash with the patch around the same area, when we forced the gather
> merge for a scan having zero rows.
>
> create table dept (deptno numeric, dname varchar(20));
> set parallel_tuple_cost = 0;
> set parallel_setup_cost = 0;
> set min_parallel_table_scan_size = 0;
> set min_parallel_index_scan_size = 0;
> set force_parallel_mode = regress;
> explain analyze select * from dept order by deptno;
>
> This is because for the leader we don't initialize the slot in gm_slots.
> So in the case where zero workers are launched and the table has zero
> rows, we end up having a NULL slot in the gm_slots array.
>
> Currently gather_merge_clear_slots() clears out the tuple table slots
> for each gather merge input and returns a cleared slot. In the patch I
> modified gather_merge_clear_slots() to just clear out the tuple table
> slots and always return NULL when all the queues and the heap are
> exhausted.



Isn't that just another sign the code might be a bit too confusing? I
see two main issues in the code:

1) allocating 'slots' as 'nreaders+1' elements, which seems like a good
way to cause off-by-one errors

2) mixing objects with different life spans (slots for readers vs. slot
for the leader) seems like a bad idea too

I wonder how much we gain by reusing the slot from the leader (I'd be
surprised if it was at all measurable). David posted a patch reworking
this, and significantly simplifying the GatherMerge node. Why not accept
that?


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] crashes due to setting max_parallel_workers=0

2017-03-27 Thread Tomas Vondra



On 03/27/2017 05:51 PM, Robert Haas wrote:
> On Mon, Mar 27, 2017 at 9:54 AM, Tom Lane wrote:
>> Robert Haas writes:
>>> On Mon, Mar 27, 2017 at 1:29 AM, Rushabh Lathia wrote:
>>>> But it seems a bit futile to produce the parallel plan in the first
>>>> place, because with max_parallel_workers=0 we can't possibly get any
>>>> parallel workers ever. I wonder why compute_parallel_worker() only
>>>> looks at max_parallel_workers_per_gather, i.e. why shouldn't it do:
>>>> parallel_workers = Min(parallel_workers, max_parallel_workers);
>>>> Perhaps this was discussed and is actually intentional, though.
>>>
>>> It was intentional.  See the last paragraph of
>>> https://www.postgresql.org/message-id/ca%2btgmoamsn6a1780vutfsarcu0lcr%3dco2yi4vluo-jqbn4y...@mail.gmail.com
>>
>> Since this has now come up twice, I suggest adding a comment there
>> that explains why we're intentionally ignoring max_parallel_workers.
>
> Hey, imagine if the comments explained the logic behind the code!
>
> Good idea.  How about the attached?



Certainly an improvement. But perhaps we should also mention this at 
compute_parallel_worker, i.e. that not looking at max_parallel_workers 
is intentional.


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] crashes due to setting max_parallel_workers=0

2017-03-27 Thread Robert Haas
On Mon, Mar 27, 2017 at 12:36 PM, Rushabh Lathia wrote:
> Hmm, I agree that it's a good idea, and I will work on that as a separate patch.

Maybe you want to start with what David already posted?

>> Possibly we
>> should fix the crash bug first, though, and then do that afterwards.
>> What bugs me a little about Rushabh's fix is that it looks like magic.
>> You have to know that we're looping over two things and freeing them
>> up, but there's one more of one thing than the other thing.  I think
>> that at least needs some comments or something.
>>
> So in my second version of the patch I changed gather_merge_clear_slots()
> to just clear the slots for the workers, plus some other cleanup. I also
> made gather_merge_getnext() return NULL when all the queues and the heap
> are exhausted - where earlier gather_merge_clear_slots() was returning a
> cleared slot. This way we make sure that we don't run over freeing the
> slot for the leader, and gather_merge_getnext() doesn't need to depend on
> that cleared slot.

Ah, I missed that.  That does seem cleaner.  Anybody see a problem
with that approach?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] crashes due to setting max_parallel_workers=0

2017-03-27 Thread Rushabh Lathia
On Mon, Mar 27, 2017 at 9:52 PM, Robert Haas  wrote:

> On Mon, Mar 27, 2017 at 12:11 PM, David Rowley wrote:
> > On 28 March 2017 at 04:57, Robert Haas  wrote:
> >> On Sat, Mar 25, 2017 at 12:18 PM, Rushabh Lathia wrote:
> >>> About the original issue reported by Tomas, I did more debugging and
> >>> found that the problem was that gather_merge_clear_slots() was not
> >>> returning a cleared slot when nreaders is zero (i.e. nworkers_launched
> >>> = 0). Because of that, the scan continued even after all the tuples
> >>> were exhausted, and we ended up with a server crash in
> >>> gather_merge_getnext(). In the patch I also added an Assert in
> >>> gather_merge_getnext() that the index should be less than nreaders + 1
> >>> (leader).
> >>
> >> Well, you and David Rowley seem to disagree on what the fix is here.
> >> His patches posted upthread do A, and yours do B, and from a quick
> >> look those things are not just different ways of spelling the same
> >> underlying fix, but actually directly conflicting ideas about what the
> >> fix should be.  Any chance you can review his patches, and maybe he
> >> can review yours, and we could try to agree on a consensus position?
> >> :-)
> >
> > When comparing Rushabh's to my minimal patch, both change
> > gather_merge_clear_slots() to clear the leader's slot. My fix
> > mistakenly changes things so it does ExecInitExtraTupleSlot() on the
> > leader's slot, but it seems that's not required since
> > gather_merge_readnext() sets the leader's slot to the output of
> > ExecProcNode(outerPlan). I'd ignore my minimal fix because of that
> > mistake. Rushabh's patch sidesteps this by adding a conditional
> > pfree() to not free a slot that we didn't allocate in the first place.
> >
> > I do think the code could be improved a bit. I don't like the way
> > GatherMergeState's nreaders and nworkers_launched are always the same.
> > I think this all threw me off a bit and may have been the reason for
> > the bug in the first place.
>
> Yeah, if those fields are always the same, then I think that they
> should be merged.  That seems undeniably a good idea.


Hmm, I agree that it's a good idea, and I will work on that as a separate patch.


> Possibly we
> should fix the crash bug first, though, and then do that afterwards.
> What bugs me a little about Rushabh's fix is that it looks like magic.
> You have to know that we're looping over two things and freeing them
> up, but there's one more of one thing than the other thing.  I think
> that at least needs some comments or something.
>
>
So in my second version of the patch I changed gather_merge_clear_slots()
to just clear the slots for the workers, plus some other cleanup. I also
made gather_merge_getnext() return NULL when all the queues and the heap
are exhausted - where earlier gather_merge_clear_slots() was returning a
cleared slot. This way we make sure that we don't run over freeing the
slot for the leader, and gather_merge_getnext() doesn't need to depend on
that cleared slot.
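
Schematically, the new control flow looks like this (a simplified sketch
with stub helpers, not the actual patch):

#include <stdbool.h>
#include <stddef.h>

typedef struct TupleTableSlot TupleTableSlot;   /* opaque stand-in */

/* Stub stand-ins for the real executor pieces; illustration only. */
static bool heap_is_empty(void *gm_state) { (void) gm_state; return true; }
static void clear_worker_slots(void *gm_state) { (void) gm_state; }
static TupleTableSlot *slot_at_heap_top(void *gm_state) { (void) gm_state; return NULL; }

TupleTableSlot *
getnext_sketch(void *gm_state)
{
    if (heap_is_empty(gm_state))
    {
        /*
         * All queues and the heap are exhausted: release the worker
         * slots and report end-of-scan with NULL, rather than returning
         * a cleared slot for the caller to test, as the old code did.
         */
        clear_worker_slots(gm_state);
        return NULL;
    }

    return slot_at_heap_top(gm_state);
}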


Thanks,
Rushabh Lathia
www.EnterpriseDB.com


Re: [HACKERS] crashes due to setting max_parallel_workers=0

2017-03-27 Thread Robert Haas
On Mon, Mar 27, 2017 at 12:26 PM, Tom Lane  wrote:
> Robert Haas  writes:
>> On Mon, Mar 27, 2017 at 9:54 AM, Tom Lane  wrote:
>>> Since this has now come up twice, I suggest adding a comment there
>>> that explains why we're intentionally ignoring max_parallel_workers.
>
>> Good idea.  How about the attached?
>
> WFM ... but seems like there should be some flavor of this statement
> in the user-facing docs too (ie, "max_parallel_workers_per_gather >
> max_parallel_workers is a bad idea unless you're trying to test what
> happens when a plan can't get all the workers it planned for").  The
> existing text makes some vague allusions suggesting that the two
> GUCs might be interrelated, but I think it could be improved.

Do you have a more specific idea?  I mean, this seems like a
degenerate case of what the documentation for
max_parallel_workers_per_gather says already. Even if
max_parallel_workers_per_gather <= Min(max_worker_processes,
max_parallel_workers), it's quite possible that you'll regularly be
generating plans that can't obtain the budgeted number of workers.
The only thing that is really special about the case where
max_parallel_workers_per_gather > Min(max_worker_processes,
max_parallel_workers) is that this can happen even on an
otherwise-idle system.  I'm not quite sure how to emphasize that
without seeming to state the obvious.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] crashes due to setting max_parallel_workers=0

2017-03-27 Thread Tom Lane
Robert Haas  writes:
> On Mon, Mar 27, 2017 at 9:54 AM, Tom Lane  wrote:
>> Since this has now come up twice, I suggest adding a comment there
>> that explains why we're intentionally ignoring max_parallel_workers.

> Good idea.  How about the attached?

WFM ... but seems like there should be some flavor of this statement
in the user-facing docs too (ie, "max_parallel_workers_per_gather >
max_parallel_workers is a bad idea unless you're trying to test what
happens when a plan can't get all the workers it planned for").  The
existing text makes some vague allusions suggesting that the two
GUCs might be interrelated, but I think it could be improved.

regards, tom lane




Re: [HACKERS] crashes due to setting max_parallel_workers=0

2017-03-27 Thread Robert Haas
On Mon, Mar 27, 2017 at 12:11 PM, David Rowley wrote:
> On 28 March 2017 at 04:57, Robert Haas  wrote:
>> On Sat, Mar 25, 2017 at 12:18 PM, Rushabh Lathia wrote:
>>> About the original issue reported by Tomas, I did more debugging and
>>> found that the problem was that gather_merge_clear_slots() was not
>>> returning a cleared slot when nreaders is zero (i.e. nworkers_launched
>>> = 0). Because of that, the scan continued even after all the tuples
>>> were exhausted, and we ended up with a server crash in
>>> gather_merge_getnext(). In the patch I also added an Assert in
>>> gather_merge_getnext() that the index should be less than nreaders + 1
>>> (leader).
>>
>> Well, you and David Rowley seem to disagree on what the fix is here.
>> His patches posted upthread do A, and yours do B, and from a quick
>> look those things are not just different ways of spelling the same
>> underlying fix, but actually directly conflicting ideas about what the
>> fix should be.  Any chance you can review his patches, and maybe he
>> can review yours, and we could try to agree on a consensus position?
>> :-)
>
> When comparing Rushabh's to my minimal patch, both change
> gather_merge_clear_slots() to clear the leader's slot. My fix
> mistakenly changes things so it does ExecInitExtraTupleSlot() on the
> leader's slot, but it seems that's not required since
> gather_merge_readnext() sets the leader's slot to the output of
> ExecProcNode(outerPlan). I'd ignore my minimal fix because of that
> mistake. Rushabh's patch sidesteps this by adding a conditional
> pfree() to not free a slot that we didn't allocate in the first place.
>
> I do think the code could be improved a bit. I don't like the way
> GatherMergeState's nreaders and nworkers_launched are always the same.
> I think this all threw me off a bit and may have been the reason for
> the bug in the first place.

Yeah, if those fields are always the same, then I think that they
should be merged.  That seems undeniably a good idea.  Possibly we
should fix the crash bug first, though, and then do that afterwards.
What bugs me a little about Rushabh's fix is that it looks like magic.
You have to know that we're looping over two things and freeing them
up, but there's one more of one thing than the other thing.  I think
that at least needs some comments or something.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] crashes due to setting max_parallel_workers=0

2017-03-27 Thread David Rowley
On 28 March 2017 at 04:57, Robert Haas  wrote:
> On Sat, Mar 25, 2017 at 12:18 PM, Rushabh Lathia wrote:
>> About the original issue reported by Tomas, I did more debugging and
>> found that the problem was that gather_merge_clear_slots() was not
>> returning a cleared slot when nreaders is zero (i.e. nworkers_launched
>> = 0). Because of that, the scan continued even after all the tuples
>> were exhausted, and we ended up with a server crash in
>> gather_merge_getnext(). In the patch I also added an Assert in
>> gather_merge_getnext() that the index should be less than nreaders + 1
>> (leader).
>
> Well, you and David Rowley seem to disagree on what the fix is here.
> His patches posted upthread do A, and yours do B, and from a quick
> look those things are not just different ways of spelling the same
> underlying fix, but actually directly conflicting ideas about what the
> fix should be.  Any chance you can review his patches, and maybe he
> can review yours, and we could try to agree on a consensus position?
> :-)

When comparing Rushabh's to my minimal patch, both change
gather_merge_clear_slots() to clear the leader's slot. My fix
mistakenly changes things so it does ExecInitExtraTupleSlot() on the
leader's slot, but it seems that's not required since
gather_merge_readnext() sets the leader's slot to the output of
ExecProcNode(outerPlan). I'd ignore my minimal fix because of that
mistake. Rushabh's patch sidesteps this by adding a conditional
pfree() to not free a slot that we didn't allocate in the first place.
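
For reference, here is a minimal sketch of that behaviour with stub
types (illustrative only, not the real gather_merge_readnext()):

#include <stdbool.h>
#include <stddef.h>

typedef struct TupleTableSlot TupleTableSlot;   /* opaque stand-in */
typedef struct PlanState PlanState;             /* opaque stand-in */

/* Stub stand-in for ExecProcNode(outerPlan); illustration only. */
static TupleTableSlot *
exec_proc_node_stub(PlanState *outerPlan)
{
    (void) outerPlan;
    return NULL;                /* pretend the leader's scan is exhausted */
}

/* In this sketch, reader == nworkers_launched denotes the leader. */
bool
readnext_sketch(TupleTableSlot **gm_slots, int reader,
                int nworkers_launched, PlanState *outerPlan)
{
    if (reader == nworkers_launched)
    {
        /*
         * Leader: the slot is simply whatever ExecProcNode() returns.
         * It is owned by the child node, so it needs no
         * ExecInitExtraTupleSlot() up front and must not be pfree'd
         * during cleanup.
         */
        gm_slots[reader] = exec_proc_node_stub(outerPlan);
        return gm_slots[reader] != NULL;
    }

    /* Workers would read from their tuple queues here. */
    return false;
}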

I do think the code could be improved a bit. I don't like the way
GatherMergeState's nreaders and nworkers_launched are always the same.
I think this all threw me off a bit and may have been the reason for
the bug in the first place.


-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] crashes due to setting max_parallel_workers=0

2017-03-27 Thread Robert Haas
On Sat, Mar 25, 2017 at 12:18 PM, Rushabh Lathia wrote:
> About the original issue reported by Tomas, I did more debugging and
> found that the problem was that gather_merge_clear_slots() was not
> returning a cleared slot when nreaders is zero (i.e. nworkers_launched
> = 0). Because of that, the scan continued even after all the tuples
> were exhausted, and we ended up with a server crash in
> gather_merge_getnext(). In the patch I also added an Assert in
> gather_merge_getnext() that the index should be less than nreaders + 1
> (leader).

Well, you and David Rowley seem to disagree on what the fix is here.
His patches posted upthread do A, and yours do B, and from a quick
look those things are not just different ways of spelling the same
underlying fix, but actually directly conflicting ideas about what the
fix should be.  Any chance you can review his patches, and maybe he
can review yours, and we could try to agree on a consensus position?
:-)

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] crashes due to setting max_parallel_workers=0

2017-03-27 Thread Robert Haas
On Mon, Mar 27, 2017 at 9:54 AM, Tom Lane  wrote:
> Robert Haas  writes:
>> On Mon, Mar 27, 2017 at 1:29 AM, Rushabh Lathia wrote:
>>> But it seems a bit futile to produce the parallel plan in the first place,
>>> because with max_parallel_workers=0 we can't possibly get any parallel
>>> workers ever. I wonder why compute_parallel_worker() only looks at
>>> max_parallel_workers_per_gather, i.e. why shouldn't it do:
>>> parallel_workers = Min(parallel_workers, max_parallel_workers);
>>> Perhaps this was discussed and is actually intentional, though.
>
>> It was intentional.  See the last paragraph of
>> https://www.postgresql.org/message-id/ca%2btgmoamsn6a1780vutfsarcu0lcr%3dco2yi4vluo-jqbn4y...@mail.gmail.com
>
> Since this has now come up twice, I suggest adding a comment there
> that explains why we're intentionally ignoring max_parallel_workers.

Hey, imagine if the comments explained the logic behind the code!

Good idea.  How about the attached?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


add-worker-comment.patch
Description: Binary data



Re: [HACKERS] crashes due to setting max_parallel_workers=0

2017-03-27 Thread Tom Lane
Robert Haas  writes:
> On Mon, Mar 27, 2017 at 1:29 AM, Rushabh Lathia wrote:
>> But it seems a bit futile to produce the parallel plan in the first place,
>> because with max_parallel_workers=0 we can't possibly get any parallel
>> workers ever. I wonder why compute_parallel_worker() only looks at
>> max_parallel_workers_per_gather, i.e. why shouldn't it do:
>> parallel_workers = Min(parallel_workers, max_parallel_workers);
>> Perhaps this was discussed and is actually intentional, though.

> It was intentional.  See the last paragraph of
> https://www.postgresql.org/message-id/ca%2btgmoamsn6a1780vutfsarcu0lcr%3dco2yi4vluo-jqbn4y...@mail.gmail.com

Since this has now come up twice, I suggest adding a comment there
that explains why we're intentionally ignoring max_parallel_workers.

regards, tom lane




Re: [HACKERS] crashes due to setting max_parallel_workers=0

2017-03-27 Thread Robert Haas
On Mon, Mar 27, 2017 at 1:29 AM, Rushabh Lathia wrote:
>
>> But it seems a bit futile to produce the parallel plan in the first place,
>> because with max_parallel_workers=0 we can't possibly get any parallel
>> workers ever. I wonder why compute_parallel_worker() only looks at
>> max_parallel_workers_per_gather, i.e. why shouldn't it do:
>>
>>parallel_workers = Min(parallel_workers, max_parallel_workers);
>>
>> Perhaps this was discussed and is actually intentional, though.
>>
>
> Yes, I am not quite sure about this.

It was intentional.  See the last paragraph of

https://www.postgresql.org/message-id/ca%2btgmoamsn6a1780vutfsarcu0lcr%3dco2yi4vluo-jqbn4y...@mail.gmail.com

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] crashes due to setting max_parallel_workers=0

2017-03-27 Thread Rushabh Lathia
On Mon, Mar 27, 2017 at 10:59 AM, Rushabh Lathia wrote:

>
>
> On Mon, Mar 27, 2017 at 3:43 AM, Tomas Vondra <tomas.von...@2ndquadrant.com> wrote:
>
>> On 03/25/2017 05:18 PM, Rushabh Lathia wrote:
>>
>>>
>>>
>>> On Sat, Mar 25, 2017 at 7:01 PM, Peter Eisentraut wrote:
>>>
>>> On 3/25/17 09:01, David Rowley wrote:
>>> > On 25 March 2017 at 23:09, Rushabh Lathia wrote:
>>> >> Also another point which I think we should fix is: when someone sets
>>> >> max_parallel_workers = 0, we should also set
>>> >> max_parallel_workers_per_gather to zero. That way we can avoid
>>> >> generating a gather path with max_parallel_workers = 0.
>>> > I see that it was actually quite useful that it works the way it does.
>>> > If it had worked the same as max_parallel_workers_per_gather, then
>>> > likely Tomas would never have found this bug.
>>>
>>> Another problem is that the GUC system doesn't really support cases
>>> where the validity of one setting depends on the current value of
>>> another setting.  So each individual setting needs to be robust against
>>> cases of related settings being nonsensical.
>>>
>>>
>>> Okay.
>>>
>>> About the original issue reported by Tomas, I did more debugging and
>>> found that the problem was that gather_merge_clear_slots() was not
>>> returning a cleared slot when nreaders is zero (i.e. nworkers_launched
>>> = 0). Because of that, the scan continued even after all the tuples
>>> were exhausted, and we ended up with a server crash in
>>> gather_merge_getnext(). In the patch I also added an Assert in
>>> gather_merge_getnext() that the index should be less than nreaders + 1
>>> (leader).
>>>
>>> PFA simple patch to fix the problem.
>>>
>>>
>> I think there are two issues at play here - the first one is that we
>> still produce parallel plans even with max_parallel_workers=0, and the
>> second one is the crash in GatherMerge when nworkers=0.
>>
>> Your patch fixes the latter (thanks for looking into it), which is
>> obviously a good thing - getting 0 workers on a busy system is quite
>> possible, because all the parallel workers can be already chewing on some
>> other query.
>>
>>
> Thanks.
>
>

I was doing more testing with the patch and I found one more server
crash with the patch around the same area, when we forced the gather
merge for a scan having zero rows.

create table dept (deptno numeric, dname varchar(20));
set parallel_tuple_cost = 0;
set parallel_setup_cost = 0;
set min_parallel_table_scan_size = 0;
set min_parallel_index_scan_size = 0;
set force_parallel_mode = regress;
explain analyze select * from dept order by deptno;

This is because for the leader we don't initialize the slot in gm_slots.
So in the case where zero workers are launched and the table has zero
rows, we end up having a NULL slot in the gm_slots array.

Currently gather_merge_clear_slots() clears out the tuple table slots
for each gather merge input and returns a cleared slot. In the patch I
modified gather_merge_clear_slots() to just clear out the tuple table
slots and always return NULL when all the queues and the heap are
exhausted.
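
In outline, the reworked cleanup looks roughly like this (a hedged
sketch with stand-in types, not the patch itself):

#include <stdlib.h>

typedef struct TupleTableSlot TupleTableSlot;   /* opaque stand-in */

typedef struct
{
    void   *tuple;              /* stand-in for the buffered tuple data */
} TupleBufferSketch;

/*
 * Clear only the worker entries: the leader's slot (the extra array
 * element) was never allocated by gather_merge_init(), so it is left
 * alone, and end-of-scan is now signalled by returning NULL from the
 * getnext step instead of by handing back a cleared slot.
 */
void
clear_slots_sketch(TupleTableSlot **gm_slots, TupleBufferSketch *bufs,
                   int nworkers_launched)
{
    int     i;

    for (i = 0; i < nworkers_launched; i++)
    {
        free(bufs[i].tuple);    /* stands in for pfree() */
        gm_slots[i] = NULL;     /* stands in for ExecClearTuple() */
    }
}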



>> But it seems a bit futile to produce the parallel plan in the first place,
>> because with max_parallel_workers=0 we can't possibly get any parallel
>> workers ever. I wonder why compute_parallel_worker() only looks at
>> max_parallel_workers_per_gather, i.e. why shouldn't it do:
>>
>>    parallel_workers = Min(parallel_workers, max_parallel_workers);
>>
>>
> I agree with you here. Producing the parallel plan when
> max_parallel_workers = 0 is wrong. But rather than your suggested fix, I
> think that we should do something like:
>
> /*
>  * In no case use more than max_parallel_workers_per_gather or
>  * max_parallel_workers.
>  */
> parallel_workers = Min(parallel_workers, Min(max_parallel_workers,
> max_parallel_workers_per_gather));
>
>
>
>> Perhaps this was discussed and is actually intentional, though.
>>
>>
> Yes, I am not quite sure about this.
>
>> Regarding handling this at the GUC level - I agree with Peter that that's
>> not a good idea. I suppose we could deal with checking the values in the
>> GUC check/assign hooks, but what we don't have is a way to undo the changes
>> in all the GUCs. That is, if I do
>>
>>    SET max_parallel_workers = 0;
>>    SET max_parallel_workers = 16;
>>
>> I expect to end up with just max_parallel_workers GUC changed and nothing
>> else.
>>
>> regards
>>
>> --
>> Tomas Vondra  http://www.2ndQuadrant.com
>> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>>
>
>
>
> --
> Rushabh Lathia
>



Regards,
Rushabh Lathia
www.EnterpriseDB.com
diff --git a/src/backend/executor/nodeGatherMerge.c b/src/backend/executor/nodeGatherMerge.c
index 

Re: [HACKERS] crashes due to setting max_parallel_workers=0

2017-03-26 Thread Rushabh Lathia
On Mon, Mar 27, 2017 at 3:43 AM, Tomas Vondra wrote:

> On 03/25/2017 05:18 PM, Rushabh Lathia wrote:
>
>>
>>
>> On Sat, Mar 25, 2017 at 7:01 PM, Peter Eisentraut wrote:
>>
>> On 3/25/17 09:01, David Rowley wrote:
>> > On 25 March 2017 at 23:09, Rushabh Lathia wrote:
>> >> Also another point which I think we should fix is: when someone sets
>> >> max_parallel_workers = 0, we should also set
>> >> max_parallel_workers_per_gather to zero. That way we can avoid
>> >> generating a gather path with max_parallel_workers = 0.
>> > I see that it was actually quite useful that it works the way it does.
>> > If it had worked the same as max_parallel_workers_per_gather, then
>> > likely Tomas would never have found this bug.
>>
>> Another problem is that the GUC system doesn't really support cases
>> where the validity of one setting depends on the current value of
>> another setting.  So each individual setting needs to be robust against
>> cases of related settings being nonsensical.
>>
>>
>> Okay.
>>
>> About the original issue reported by Tomas, I did more debugging and
>> found that the problem was that gather_merge_clear_slots() was not
>> returning a cleared slot when nreaders is zero (i.e. nworkers_launched
>> = 0). Because of that, the scan continued even after all the tuples
>> were exhausted, and we ended up with a server crash in
>> gather_merge_getnext(). In the patch I also added an Assert in
>> gather_merge_getnext() that the index should be less than nreaders + 1
>> (leader).
>>
>> PFA simple patch to fix the problem.
>>
>>
> I think there are two issues at play here - the first one is that we
> still produce parallel plans even with max_parallel_workers=0, and the
> second one is the crash in GatherMerge when nworkers=0.
>
> Your patch fixes the latter (thanks for looking into it), which is
> obviously a good thing - getting 0 workers on a busy system is quite
> possible, because all the parallel workers can be already chewing on some
> other query.
>
>
Thanks.


> But it seems a bit futile to produce the parallel plan in the first place,
> because with max_parallel_workers=0 we can't possibly get any parallel
> workers ever. I wonder why compute_parallel_worker() only looks at
> max_parallel_workers_per_gather, i.e. why shouldn't it do:
>
>    parallel_workers = Min(parallel_workers, max_parallel_workers);
>
>
I agree with you here. Producing the parallel plan when
max_parallel_workers = 0 is wrong. But rather than your suggested fix, I
think that we should do something like:

/*
 * In no case use more than max_parallel_workers_per_gather or
 * max_parallel_workers.
 */
parallel_workers = Min(parallel_workers, Min(max_parallel_workers,
max_parallel_workers_per_gather));



> Perhaps this was discussed and is actually intentional, though.
>
>
Yes, I am not quite sure about this.

> Regarding handling this at the GUC level - I agree with Peter that that's
> not a good idea. I suppose we could deal with checking the values in the
> GUC check/assign hooks, but what we don't have is a way to undo the changes
> in all the GUCs. That is, if I do
>
>    SET max_parallel_workers = 0;
>    SET max_parallel_workers = 16;
>
> I expect to end up with just max_parallel_workers GUC changed and nothing
> else.
>
> regards
>
> --
> Tomas Vondra  http://www.2ndQuadrant.com
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>



-- 
Rushabh Lathia


Re: [HACKERS] crashes due to setting max_parallel_workers=0

2017-03-26 Thread David Rowley
On 27 March 2017 at 10:23, Tomas Vondra  wrote:
> I'm not sure we need to invent a new magic value, though. Can we simply look
> at force_parallel_mode, and if it's 'regress' then treat 0 differently?

see standard_planner():

	if (force_parallel_mode != FORCE_PARALLEL_OFF && best_path->parallel_safe)
	{
		Gather	   *gather = makeNode(Gather);


Probably force_parallel_mode is good for testing the tuple queue code,
and some code in Gather, but I'm not quite sure what else it's good
for. Certainly not GatherMerge.



-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] crashes due to setting max_parallel_workers=0

2017-03-26 Thread Tomas Vondra

On 03/25/2017 02:01 PM, David Rowley wrote:
> I wondered if there's anything we can do here to better test cases
> when no workers are available, to try to ensure the parallel nodes
> work correctly, but the more I think about it, the more I don't see
> anything wrong with just using SET max_parallel_workers = 0;

It's demonstrably a valid way to disable parallel queries (i.e. there's
nothing wrong with it), because the docs say this:

   Setting this value to 0 disables parallel query execution.

> My vote would be to leave the GUC behaviour as is and add some tests
> to select_parallel.sql which exploit setting max_parallel_workers to 0
> for running some tests.
>
> If that's not going to fly, then unless we add something else to allow
> us to reliably not get any workers, we're going to close the door on
> being able to write automatic tests to capture this sort of bug.
>
> ... thinks for a bit
>
> perhaps some magic value like -1 could be used for this... unsure of
> how that would be documented though...

I agree it'd be very useful to have a mode where we generate parallel
plans but then prohibit starting any workers. That would detect this and
similar issues, I think.

I'm not sure we need to invent a new magic value, though. Can we simply
look at force_parallel_mode, and if it's 'regress' then treat 0
differently?
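
Sketched out, that idea might look roughly like the following (purely
hypothetical; compute_parallel_worker()'s real signature and the GUC
plumbing differ):

/* Hypothetical illustration only -- not actual PostgreSQL code. */
typedef enum
{
    FORCE_PARALLEL_OFF,
    FORCE_PARALLEL_ON,
    FORCE_PARALLEL_REGRESS
} ForceParallelMode;

int
clamp_parallel_workers(int parallel_workers, int max_parallel_workers,
                       ForceParallelMode force_parallel_mode)
{
    /*
     * Under 'regress', ignore max_parallel_workers = 0 so parallel plans
     * are still generated and the zero-workers-launched code paths stay
     * testable; otherwise clamp as suggested upthread.
     */
    if (force_parallel_mode == FORCE_PARALLEL_REGRESS &&
        max_parallel_workers == 0)
        return parallel_workers;

    return (parallel_workers < max_parallel_workers) ?
        parallel_workers : max_parallel_workers;
}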


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] crashes due to setting max_parallel_workers=0

2017-03-26 Thread Tomas Vondra

On 03/25/2017 05:18 PM, Rushabh Lathia wrote:
>
> On Sat, Mar 25, 2017 at 7:01 PM, Peter Eisentraut wrote:
>
>> On 3/25/17 09:01, David Rowley wrote:
>>> On 25 March 2017 at 23:09, Rushabh Lathia wrote:
>>>> Also another point which I think we should fix is: when someone sets
>>>> max_parallel_workers = 0, we should also set
>>>> max_parallel_workers_per_gather to zero. That way we can avoid
>>>> generating a gather path with max_parallel_workers = 0.
>>> I see that it was actually quite useful that it works the way it does.
>>> If it had worked the same as max_parallel_workers_per_gather, then
>>> likely Tomas would never have found this bug.
>>
>> Another problem is that the GUC system doesn't really support cases
>> where the validity of one setting depends on the current value of
>> another setting.  So each individual setting needs to be robust against
>> cases of related settings being nonsensical.
>
> Okay.
>
> About the original issue reported by Tomas, I did more debugging and
> found that the problem was that gather_merge_clear_slots() was not
> returning a cleared slot when nreaders is zero (i.e. nworkers_launched
> = 0). Because of that, the scan continued even after all the tuples
> were exhausted, and we ended up with a server crash in
> gather_merge_getnext(). In the patch I also added an Assert in
> gather_merge_getnext() that the index should be less than nreaders + 1
> (leader).
>
> PFA simple patch to fix the problem.



I think there are two issues at play here - the first one is that we
still produce parallel plans even with max_parallel_workers=0, and the 
second one is the crash in GatherMerge when nworkers=0.


Your patch fixes the latter (thanks for looking into it), which is 
obviously a good thing - getting 0 workers on a busy system is quite 
possible, because all the parallel workers can be already chewing on 
some other query.


But it seems a bit futile to produce the parallel plan in the first 
place, because with max_parallel_workers=0 we can't possibly get any 
parallel workers ever. I wonder why compute_parallel_worker() only looks 
at max_parallel_workers_per_gather, i.e. why shouldn't it do:


   parallel_workers = Min(parallel_workers, max_parallel_workers);

Perhaps this was discussed and is actually intentional, though.

Regarding handling this at the GUC level - I agree with Peter that 
that's not a good idea. I suppose we could deal with checking the values 
in the GUC check/assign hooks, but what we don't have is a way to undo 
the changes in all the GUCs. That is, if I do


   SET max_parallel_workers = 0;
   SET max_parallel_workers = 16;

I expect to end up with just max_parallel_workers GUC changed and 
nothing else.


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] crashes due to setting max_parallel_workers=0

2017-03-25 Thread Rushabh Lathia
On Sat, Mar 25, 2017 at 7:01 PM, Peter Eisentraut <peter.eisentr...@2ndquadrant.com> wrote:

> On 3/25/17 09:01, David Rowley wrote:
> > On 25 March 2017 at 23:09, Rushabh Lathia wrote:
> >> Also another point which I think we should fix is: when someone sets
> >> max_parallel_workers = 0, we should also set
> >> max_parallel_workers_per_gather to zero. That way we can avoid
> >> generating a gather path with max_parallel_workers = 0.
> > I see that it was actually quite useful that it works the way it does.
> > If it had worked the same as max_parallel_workers_per_gather, then
> > likely Tomas would never have found this bug.
>
> Another problem is that the GUC system doesn't really support cases
> where the validity of one setting depends on the current value of
> another setting.  So each individual setting needs to be robust against
> cases of related settings being nonsensical.
>

Okay.

About the original issue reported by Tomas, I did more debugging and
found that the problem was that gather_merge_clear_slots() was not
returning a cleared slot when nreaders is zero (i.e. nworkers_launched
= 0). Because of that, the scan continued even after all the tuples
were exhausted, and we ended up with a server crash in
gather_merge_getnext(). In the patch I also added an Assert in
gather_merge_getnext() that the index should be less than nreaders + 1
(leader).

PFA simple patch to fix the problem.

Thanks,
Rushabh Lathia
www.Enterprisedb.com
diff --git a/src/backend/executor/nodeGatherMerge.c b/src/backend/executor/nodeGatherMerge.c
index 72f30ab..e19bace 100644
--- a/src/backend/executor/nodeGatherMerge.c
+++ b/src/backend/executor/nodeGatherMerge.c
@@ -431,9 +431,14 @@ gather_merge_clear_slots(GatherMergeState *gm_state)
 {
 	int			i;
 
-	for (i = 0; i < gm_state->nreaders; i++)
+	for (i = 0; i < gm_state->nreaders + 1; i++)
 	{
-		pfree(gm_state->gm_tuple_buffers[i].tuple);
+		/*
+		 * Gather merge always allows the leader to participate, and we don't
+		 * allocate a tuple for it, so no need to free a tuple for the leader.
+		 */
+		if (i != gm_state->nreaders)
+			pfree(gm_state->gm_tuple_buffers[i].tuple);
 		gm_state->gm_slots[i] = ExecClearTuple(gm_state->gm_slots[i]);
 	}
 
@@ -474,6 +479,8 @@ gather_merge_getnext(GatherMergeState *gm_state)
 		 */
 		i = DatumGetInt32(binaryheap_first(gm_state->gm_heap));
 
+		Assert(i < gm_state->nreaders + 1);
+
 		if (gather_merge_readnext(gm_state, i, false))
 			binaryheap_replace_first(gm_state->gm_heap, Int32GetDatum(i));
 		else



Re: [HACKERS] crashes due to setting max_parallel_workers=0

2017-03-25 Thread Peter Eisentraut
On 3/25/17 09:01, David Rowley wrote:
> On 25 March 2017 at 23:09, Rushabh Lathia  wrote:
>> Also another point which I think we should fix is: when someone sets
>> max_parallel_workers = 0, we should also set
>> max_parallel_workers_per_gather to zero. That way we can avoid
>> generating a gather path with max_parallel_workers = 0.
> I see that it was actually quite useful that it works the way it does.
> If it had worked the same as max_parallel_workers_per_gather, then
> likely Tomas would never have found this bug.

Another problem is that the GUC system doesn't really support cases
where the validity of one setting depends on the current value of
another setting.  So each individual setting needs to be robust against
cases of related settings being nonsensical.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] crashes due to setting max_parallel_workers=0

2017-03-25 Thread David Rowley
On 26 March 2017 at 00:17, Amit Kapila  wrote:
> On Sat, Mar 25, 2017 at 6:31 PM, David Rowley wrote:
>> On 25 March 2017 at 23:09, Rushabh Lathia  wrote:
>>> Also another point which I think we should fix is: when someone sets
>>> max_parallel_workers = 0, we should also set
>>> max_parallel_workers_per_gather to zero. That way we can avoid
>>> generating a gather path with max_parallel_workers = 0.
>>
>> I see that it was actually quite useful that it works the way it does.
>> If it had worked the same as max_parallel_workers_per_gather, then
>> likely Tomas would never have found this bug.
>>
>> I wondered if there's anything we can do here to better test cases
>> when no workers are available, to try to ensure the parallel nodes
>> work correctly, but the more I think about it, the more I don't see
>> anything wrong with just using SET max_parallel_workers = 0;
>>
>> My vote would be to leave the GUC behaviour as is and add some tests
>> to select_parallel.sql which exploit setting max_parallel_workers to 0
>> for running some tests.
>>
>
> I think force_parallel_mode=regress should test the same thing.

Not really. That flag's days are surely numbered. It creates a Gather
node; the problem was with GatherMerge.


-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] crashes due to setting max_parallel_workers=0

2017-03-25 Thread Amit Kapila
On Sat, Mar 25, 2017 at 6:31 PM, David Rowley wrote:
> On 25 March 2017 at 23:09, Rushabh Lathia  wrote:
>> Also another point which I think we should fix is: when someone sets
>> max_parallel_workers = 0, we should also set
>> max_parallel_workers_per_gather to zero. That way we can avoid
>> generating a gather path with max_parallel_workers = 0.
>
> I see that it was actually quite useful that it works the way it does.
> If it had worked the same as max_parallel_workers_per_gather, then
> likely Tomas would never have found this bug.
>
> I wondered if there's anything we can do here to better test cases
> when no workers are available, to try to ensure the parallel nodes
> work correctly, but the more I think about it, the more I don't see
> anything wrong with just using SET max_parallel_workers = 0;
>
> My vote would be to leave the GUC behaviour as is and add some tests
> to select_parallel.sql which exploit setting max_parallel_workers to 0
> for running some tests.
>

I think force_parallel_mode=regress should test the same thing.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] crashes due to setting max_parallel_workers=0

2017-03-25 Thread David Rowley
On 25 March 2017 at 23:09, Rushabh Lathia  wrote:
> Also another point which I think we should fix is: when someone sets
> max_parallel_workers = 0, we should also set
> max_parallel_workers_per_gather to zero. That way we can avoid
> generating a gather path with max_parallel_workers = 0.

I see that it was actually quite useful that it works the way it does.
If it had worked the same as max_parallel_workers_per_gather, then
likely Tomas would never have found this bug.

I wondered if there's anything we can do here to better test cases
when no workers are available, to try to ensure the parallel nodes
work correctly, but the more I think about it, the more I don't see
anything wrong with just using SET max_parallel_workers = 0;

My vote would be to leave the GUC behaviour as is and add some tests
to select_parallel.sql which exploit setting max_parallel_workers to 0
for running some tests.

If that's not going to fly, then unless we add something else to allow
us to reliably not get any workers, we're going to close the door on
being able to write automatic tests to capture this sort of bug.

... thinks for a bit

perhaps some magic value like -1 could be used for this... unsure of
how that would be documented though...

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] crashes due to setting max_parallel_workers=0

2017-03-25 Thread Rushabh Lathia
Thanks Tomas for reporting the issue, and thanks David for working
on this.

I can see the problem in GatherMerge, especially when nworkers_launched
is zero. I will look into this issue and will post a fix for the same.

Also another point which I think we should fix is: when someone sets
max_parallel_workers = 0, we should also set
max_parallel_workers_per_gather to zero. That way we can avoid
generating a gather path with max_parallel_workers = 0.

Thanks,


On Sat, Mar 25, 2017 at 2:21 PM, David Rowley wrote:

> On 25 March 2017 at 13:10, Tomas Vondra wrote:
> > while working on a patch I ran into some crashes that seem to be caused
> > by inconsistent handling of max_parallel_workers - queries still seem
> > to be planned with parallel plans enabled, but then crash at execution.
> >
> > The attached script reproduces the issue on a simple query, causing
> > crashes in GatherMerge, but I assume the issue is more general.
>
> I had a look at this and found a bunch of off by 1 errors in the code.
>
> I've attached a couple of patches, one is the minimal fix, and one
> more complete one.
>
> In the more complete one I ended up ditching the
> GatherMergeState->nreaders altogether. It was always the same as
> nworkers_launched anyway, so I saw no point in it.
> Here I've attempted to make the code a bit more understandable, to
> prevent further confusion about how many elements are in each array.
> Probably there's more that can be done here. I see GatherState has
> > nreaders too, but I've not looked into the details of whether it suffers
> > from the same weirdness of nreaders always matching nworkers_launched.
>
>
> --
>  David Rowley   http://www.2ndQuadrant.com/
>  PostgreSQL Development, 24x7 Support, Training & Services
>
>
>
>


-- 
Rushabh Lathia


Re: [HACKERS] crashes due to setting max_parallel_workers=0

2017-03-25 Thread David Rowley
On 25 March 2017 at 13:10, Tomas Vondra  wrote:
> while working on a patch I ran into some crashes that seem to be caused by
> inconsistent handling of max_parallel_workers - queries still seem to be
> planned with parallel plans enabled, but then crash at execution.
>
> The attached script reproduces the issue on a simple query, causing crashes
> in GatherMerge, but I assume the issue is more general.

I had a look at this and found a bunch of off by 1 errors in the code.

I've attached a couple of patches, one is the minimal fix, and one
more complete one.

In the more complete one I ended up ditching the
GatherMergeState->nreaders altogether. It was always the same as
nworkers_launched anyway, so I saw no point in it.
Here I've attempted to make the code a bit more understandable, to
prevent further confusion about how many elements are in each array.
Probably there's more that can be done here. I see GatherState has
nreaders too, but I've not looked into the details of whether it suffers
from the same weirdness of nreaders always matching nworkers_launched.


-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


gather_merge_minimal_fix.patch
Description: Binary data


gather_merge_fix.patch
Description: Binary data



Re: [HACKERS] crashes due to setting max_parallel_workers=0

2017-03-24 Thread Amit Kapila
On Sat, Mar 25, 2017 at 7:40 AM, Tomas Vondra wrote:
> Hi,
>
> while working on a patch I ran into some crashes that seem to be caused by
> inconsistent handling of max_parallel_workers - queries still seem to be
> planned with parallel plans enabled, but then crash at execution.
>
> The attached script reproduces the issue on a simple query, causing crashes
> in GatherMerge, but I assume the issue is more general.
>

I could see other parallel plans like Gather also being picked at
max_parallel_workers=0.  I suspect multiple issues here and this needs
some investigation.  I have added this to the open items list.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

