Re: [HACKERS] [BUGS] [postgresql 10 beta3] unrecognized node type: 90

2017-08-30 Thread Tom Lane
Amit Kapila  writes:
> On Tue, Aug 29, 2017 at 10:05 PM, Tom Lane  wrote:
>> If no objections, I'll do the additional legwork and push.

> No objections.

Done.  Out of curiosity, I pushed just the rescan-param patch to the
buildfarm to start with, to see if anything would fall over, and indeed
some things did:

* prairiedog has shown several instances of a parallel bitmap heap scan
test failing with too many rows being retrieved.  I think what's
happening there is that the leader's ExecReScanBitmapHeapScan call happens
late enough that the worker(s) have already retrieved some rows
using the old shared state.  We'd determined that the equivalent case
for a plain seqscan would result in no failure because the workers would
think they had nothing to do, but this evidently isn't true for a parallel
bitmap scan.

* prairiedog and loach have both shown failures with the test case from
a2b70c89c, in which the *first* scan produces too many rows and then the
later ones are fine.  This befuddled me initially, but then I remembered
that nodeNestloop.c will unconditionally do an ExecReScan call on its
inner plan before the first ExecProcNode call.  With the modified code
from 7df2c1f8d, this results in the leader's Gather node's top child
having a pending rescan on it due to a chgParam bit.  That's serviced
when we do the first ExecProcNode call on the child, after having started
the workers.  So that's another way in which a ReScan call can happen
in the leader when workers are already running, and if the workers have
already scanned some pages then those pages will get scanned again.

So I think this is all fixed up by 41b0dd987, but evidently those patches
are not nearly as independent as I first thought.

regards, tom lane




Re: [HACKERS] [BUGS] [postgresql 10 beta3] unrecognized node type: 90

2017-08-30 Thread Robert Haas
On Wed, Aug 30, 2017 at 7:39 AM, Tom Lane  wrote:
> Amit Kapila  writes:
>> On Tue, Aug 29, 2017 at 10:05 PM, Tom Lane  wrote:
>> ! /* Make sure any existing workers are gracefully shut down */
>>   ExecShutdownGatherWorkers(node);
>
>> The above call doesn't ensure the shutdown. It just ensures that we
>> receive all messages from parallel workers.  Basically, it doesn't
>> call WaitForParallelWorkersToExit.
>
> Perhaps you should submit a patch to rename ExecShutdownGatherWorkers
> to something less misleading, then.  But the previous comment there
> was even more wrong :-(

Your (Tom's) proposed comment doesn't seem wrong to me.  Shutting down
workers consists of several stages.  We destroy the tuple queues --
which will cause the workers to cease generating tuples once they notice --
then we wait for them to send us an 'X' message to indicate that
they've shut down cleanly -- then they actually exit -- then the
postmaster notices and releases their slots for reuse.  After
ExecShutdownGatherWorkers has completed, the first two of those things
have finished but the third and fourth may not be quite done yet.  I'd
say it's fair to say, at that point, that the workers are gracefully
shut down.
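
Concretely, the stages line up roughly like this (an illustrative outline
only, not the actual source):

/* Stage 1: destroy the leader's tuple queues; once the workers notice
 *          the detach they stop generating tuples. */
/* Stage 2: wait for each worker to send its 'X' (terminate) message,
 *          i.e. to report that it has shut down cleanly. */
/* ---- ExecShutdownGatherWorkers() returns at this point ---- */
/* Stage 3: the worker processes actually exit. */
/* Stage 4: the postmaster notices the exits and releases the worker slots
 *          for reuse (what WaitForParallelWorkersToExit would wait for). */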

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] [BUGS] [postgresql 10 beta3] unrecognized node type: 90

2017-08-30 Thread Tom Lane
Amit Kapila  writes:
> On Tue, Aug 29, 2017 at 10:05 PM, Tom Lane  wrote:
> ! /* Make sure any existing workers are gracefully shut down */
>   ExecShutdownGatherWorkers(node);

> The above call doesn't ensure the shutdown. It just ensures that we
> receive all messages from parallel workers.  Basically, it doesn't
> call WaitForParallelWorkersToExit.

Perhaps you should submit a patch to rename ExecShutdownGatherWorkers
to something less misleading, then.  But the previous comment there
was even more wrong :-(

regards, tom lane




Re: [HACKERS] [BUGS] [postgresql 10 beta3] unrecognized node type: 90

2017-08-30 Thread Amit Kapila
On Tue, Aug 29, 2017 at 10:05 PM, Tom Lane  wrote:
> Amit Kapila  writes:
>> On Tue, Aug 29, 2017 at 8:32 AM, Robert Haas  wrote:
>>> There's already ExecParallelReinitialize, which could be made to walk
>>> the nodes in addition to what it does already, but I don't understand
>>> exactly what else needs fixing.
>
>> Sure, but it is not advisable to reset state of all the nodes below
>> gather at that place, otherwise, it will be more or less like we are
>> forcing rescan of each node.  I think there we can reset the shared
>> parallel state of parallel-aware nodes (or anything related) and then
>> allow rescan to reset the master backend specific state for all nodes
>> beneath gather.
>
> Right, the idea is to make this happen separately from the "rescan"
> logic.  In general, it's a good idea for ExecReScanFoo to do as little
> as possible, so that you don't pay if a node is rescanned more than
> once before it's asked to do anything, or indeed if no rows are ever
> demanded from it at all.
>
> Attached is a WIP patch along this line.
>

The idea looks sane to me.

>  It's unfinished because
> I've not done the tedium of extending the FDW and CustomScan APIs
> to support this new type of per-node operation; but that part seems
> straightforward enough.  The core code is complete and survived
> light testing.
>

I have also played a bit with both of the patches together and didn't
found any problem.  In your second patch, I have a minor comment.

void
  ExecReScanGather(GatherState *node)
  {
! /* Make sure any existing workers are gracefully shut down */
  ExecShutdownGatherWorkers(node);

The above call doesn't ensure the shutdown. It just ensures that we
receive all messages from parallel workers.  Basically, it doesn't
call WaitForParallelWorkersToExit.


> I'm pretty happy with the results --- note in
> particular how we get rid of some very dubious coding in
> ExecReScanIndexScan and ExecReScanIndexOnlyScan.
>
> If you try the test case from a2b70c89c on this patch alone, you'll notice
> that instead of sometimes reporting too-small counts during the rescans,
> it pretty consistently reports too-large counts.  This is because we are
> now successfully resetting the shared state for the parallel seqscan, but
> we haven't done anything about the leader's HashAgg node deciding that it
> can re-use its old hashtable.  So on the first scan, the leader typically
> scans all or most of the table because of its startup time advantage, and
> saves those counts in its hashtable.  On later scans, the workers read all
> of the table while the leader decides it need do no scanning.  So we get
> counts that reflect all of the table (from the workers) plus whatever part
> of the table the leader read the first time.  So this by no means removes
> the need for my other patch.
>
> If no objections, I'll do the additional legwork and push.
>

No objections.

>  As before,
> I think we can probably get away without fixing 9.6, even though it's
> nominally got the same bug.
>

+1.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] [BUGS] [postgresql 10 beta3] unrecognized node type: 90

2017-08-29 Thread Tom Lane
Amit Kapila  writes:
> On Tue, Aug 29, 2017 at 8:32 AM, Robert Haas  wrote:
>> There's already ExecParallelReinitialize, which could be made to walk
>> the nodes in addition to what it does already, but I don't understand
>> exactly what else needs fixing.

> Sure, but it is not advisable to reset state of all the nodes below
> gather at that place, otherwise, it will be more or less like we are
> forcing rescan of each node.  I think there we can reset the shared
> parallel state of parallel-aware nodes (or anything related) and then
> allow rescan to reset the master backend specific state for all nodes
> beneath gather.

Right, the idea is to make this happen separately from the "rescan"
logic.  In general, it's a good idea for ExecReScanFoo to do as little
as possible, so that you don't pay if a node is rescanned more than
once before it's asked to do anything, or indeed if no rows are ever
demanded from it at all.

Attached is a WIP patch along this line.  It's unfinished because
I've not done the tedium of extending the FDW and CustomScan APIs
to support this new type of per-node operation; but that part seems
straightforward enough.  The core code is complete and survived
light testing.  I'm pretty happy with the results --- note in
particular how we get rid of some very dubious coding in
ExecReScanIndexScan and ExecReScanIndexOnlyScan.
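
To make the new per-node operation concrete, the seqscan case comes out
roughly as below (a sketch only -- the function name and signature are
assumed for illustration; it just wraps the heap_parallelscan_reinitialize
call added in the diff that follows):

/* Reset only the shared (DSM) scan state, called from the pre-worker-launch
 * tree walk rather than from ExecReScanSeqScan itself. */
void
ExecSeqScanReInitializeDSM(SeqScanState *node, ParallelContext *pcxt)
{
    HeapScanDesc scan = node->ss.ss_currentScanDesc;

    /* restart the shared block allocator at block zero */
    heap_parallelscan_reinitialize(scan->rs_parallel);
}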

If you try the test case from a2b70c89c on this patch alone, you'll notice
that instead of sometimes reporting too-small counts during the rescans,
it pretty consistently reports too-large counts.  This is because we are
now successfully resetting the shared state for the parallel seqscan, but
we haven't done anything about the leader's HashAgg node deciding that it
can re-use its old hashtable.  So on the first scan, the leader typically
scans all or most of the table because of its startup time advantage, and
saves those counts in its hashtable.  On later scans, the workers read all
of the table while the leader decides it need do no scanning.  So we get
counts that reflect all of the table (from the workers) plus whatever part
of the table the leader read the first time.  So this by no means removes
the need for my other patch.

If no objections, I'll do the additional legwork and push.  As before,
I think we can probably get away without fixing 9.6, even though it's
nominally got the same bug.

regards, tom lane

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index ff03c68..e29c5ad 100644
*** a/src/backend/access/heap/heapam.c
--- b/src/backend/access/heap/heapam.c
*** heap_rescan(HeapScanDesc scan,
*** 1531,1551 
  	 * reinitialize scan descriptor
  	 */
  	initscan(scan, key, true);
- 
- 	/*
- 	 * reset parallel scan, if present
- 	 */
- 	if (scan->rs_parallel != NULL)
- 	{
- 		ParallelHeapScanDesc parallel_scan;
- 
- 		/*
- 		 * Caller is responsible for making sure that all workers have
- 		 * finished the scan before calling this.
- 		 */
- 		parallel_scan = scan->rs_parallel;
- 		pg_atomic_write_u64(&parallel_scan->phs_nallocated, 0);
- 	}
  }
  
  /* ----------------
--- 1531,1536 
*** heap_parallelscan_initialize(ParallelHea
*** 1643,1648 
--- 1628,1646 
  }
  
  /* ----------------
+  *		heap_parallelscan_reinitialize - reset a parallel scan
+  *
+  *		Call this in the leader process.  Caller is responsible for
+  *		making sure that all workers have finished the scan beforehand.
+  * ----------------
+  */
+ void
+ heap_parallelscan_reinitialize(ParallelHeapScanDesc parallel_scan)
+ {
+ 	pg_atomic_write_u64(&parallel_scan->phs_nallocated, 0);
+ }
+ 
+ /* ----------------
   *		heap_beginscan_parallel - join a parallel scan
   *
   *		Caller must hold a suitable lock on the correct relation.
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index ce47f1d..d8cdb0e 100644
*** a/src/backend/executor/execParallel.c
--- b/src/backend/executor/execParallel.c
*** static bool ExecParallelInitializeDSM(Pl
*** 109,114 
--- 109,116 
  		  ExecParallelInitializeDSMContext *d);
  static shm_mq_handle **ExecParallelSetupTupleQueues(ParallelContext *pcxt,
  			 bool reinitialize);
+ static bool ExecParallelReInitializeDSM(PlanState *planstate,
+ 			ParallelContext *pcxt);
  static bool ExecParallelRetrieveInstrumentation(PlanState *planstate,
  	SharedExecutorInstrumentation *instrumentation);
  
*** ExecParallelSetupTupleQueues(ParallelCon
*** 365,382 
  }
  
  /*
-  * Re-initialize the parallel executor info such that it can be reused by
-  * workers.
-  */
- void
- ExecParallelReinitialize(ParallelExecutorInfo *pei)
- {
- 	ReinitializeParallelDSM(pei->pcxt);
- 	pei->tqueue = ExecParallelSetupTupleQueues(pei->pcxt, true);
- 	pei->finished = false;
- }
- 
- /*
   * Sets up the required infrastructure for backend workers to perform
   * execution and 

Re: [HACKERS] [BUGS] [postgresql 10 beta3] unrecognized node type: 90

2017-08-29 Thread Amit Kapila
On Tue, Aug 29, 2017 at 8:32 AM, Robert Haas  wrote:
> On Mon, Aug 28, 2017 at 10:17 PM, Tom Lane  wrote:
>> In the meantime, I think what we should do is commit the bug fix more or
>> less as I have it, and then work on Amit's concern about losing parallel
>> efficiency by separating the resetting of shared parallel-scan state
>> into a new plan tree traversal that's done before launching new worker
>> processes.
>>

Sounds like a reasonable plan to me.

>>  The only real alternative is to lobotomize the existing rescan
>> optimizations, and that seems like a really poor choice from here.
>
> There's already ExecParallelReinitialize, which could be made to walk
> the nodes in addition to what it does already, but I don't understand
> exactly what else needs fixing.
>

Sure, but it is not advisable to reset state of all the nodes below
gather at that place, otherwise, it will be more or less like we are
forcing rescan of each node.  I think there we can reset the shared
parallel state of parallel-aware nodes (or anything related) and then
allow rescan to reset the master backend specific state for all nodes
beneath gather.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] [BUGS] [postgresql 10 beta3] unrecognized node type: 90

2017-08-28 Thread Robert Haas
On Mon, Aug 28, 2017 at 10:17 PM, Tom Lane  wrote:
> In the meantime, I think what we should do is commit the bug fix more or
> less as I have it, and then work on Amit's concern about losing parallel
> efficiency by separating the resetting of shared parallel-scan state
> into a new plan tree traversal that's done before launching new worker
> processes.  The only real alternative is to lobotomize the existing rescan
> optimizations, and that seems like a really poor choice from here.

There's already ExecParallelReinitialize, which could be made to walk
the nodes in addition to what it does already, but I don't understand
exactly what else needs fixing.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] [BUGS] [postgresql 10 beta3] unrecognized node type: 90

2017-08-28 Thread Tom Lane
Robert Haas  writes:
> On Mon, Aug 28, 2017 at 6:35 PM, Tom Lane  wrote:
>> Maybe parallel_aware should have more than two values, depending
>> on whether the result of the node is context-dependent or not.

> It seems likely that the whole concept of parallel_aware is only a
> zero-order approximation to what we really want.

Yeah, I agree --- but it's also clear that we don't yet know what it
should be.  We'll have to work that out as we accrete more functionality.

In the meantime, I think what we should do is commit the bug fix more or
less as I have it, and then work on Amit's concern about losing parallel
efficiency by separating the resetting of shared parallel-scan state
into a new plan tree traversal that's done before launching new worker
processes.  The only real alternative is to lobotomize the existing rescan
optimizations, and that seems like a really poor choice from here.

regards, tom lane




Re: [HACKERS] [BUGS] [postgresql 10 beta3] unrecognized node type: 90

2017-08-28 Thread Robert Haas
On Mon, Aug 28, 2017 at 6:35 PM, Tom Lane  wrote:
> but probably we should think of a more arm's-length way to do it.
> Maybe parallel_aware should have more than two values, depending
> on whether the result of the node is context-dependent or not.

My original intent for the parallel_aware flag was for it to signal
whether the plan node was going to do something functionally different
when in parallel mode.  For scans, that's come to mean "partition the
input among the workers", and there doesn't seem to be any other
sensible meaning.  I don't have a good idea what it's going to mean
for non-scan nodes yet.  Parallel Hash will be the first parallel-aware
non-scan node, and there it uses it to mean that the hash table is stored
in dynamic shared memory, so that the inner side can be partial (which is
otherwise not possible).  I'm guessing that is going to be a common
meaning for nodes that store stuff - it's easy to imagine Parallel
Materialize, Parallel Sort, Parallel HashAggregate with similar
semantics.  There's also a proposed patch for Parallel Append where it
signals that DSM is being used to coordinate task scheduling and load
balancing.

It seems likely that the whole concept of parallel_aware is only a
zero-order approximation to what we really want.  This bug is, IMHO,
the first really tangible evidence of the concept creaking around the
edges, but I've kind of had a feeling for a while that it wasn't
likely to be problem-free.  I'm still not sure exactly what the right
answer will turn out to be.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] [BUGS] [postgresql 10 beta3] unrecognized node type: 90

2017-08-28 Thread Tom Lane
Robert Haas  writes:
> On Mon, Aug 28, 2017 at 3:00 PM, Tom Lane  wrote:
>> Count the "that"s (you're failing to notice the next line).

> OK, true.  But "Academic literature" -> "The academic literature" is
> just second-guessing, I think.

No, it was more to avoid reflowing the paragraph (or leaving a weirdly
short line).

> There should never be a parallel_aware node that's not beneath a
> Gather or Gather Merge; I don't know what the meaning of such a plan
> would be, so I think we're safe against such a thing appearing in the
> future.  What I'm unclear about is what happens with nodes that aren't
> directly in the chain between the Gather and the parallel-aware node.

Nothing.  The parallel-aware node(s) get marked as dependent on the rescan
parameter, and then that marking bubbles up to their ancestor nodes (up
to the Gather).  Nodes that are not ancestral to any parallel-aware node
are unchanged.

> Now consider Parallel Hash
> (not yet committed), where we might get this:

> Something
> -> Gather
>    -> Merge Join
>       -> Sort
>          -> Parallel Seq Scan on a
>       -> Hash Join
>          -> Index Scan on b
>          -> Parallel Hash
>             -> Parallel Seq Scan on c

> Now what?  We clearly still need to force a re-sort of a, but what
> about the shared hash table built on c?  If we've still got that hash
> table and it was a single-batch join, there's probably no need to
> rescan it even though both the Parallel Hash node and the Parallel Seq
> Scan beneath it are parallel-aware.  In this case all workers
> collaborated to build a shared hash table covering all rows from c; if
> we've still got that, it's still good.  In fact, even the workers can
> reuse that hash table even though, for them, it'll not really be a
> re-scan at all.

Well, I'd say that's something for the parallel hash patch to resolve.
Yes, if the contents of the hash table are expected to be the same
regardless of how many workers were involved, then we shouldn't need
to rebuild it after a rescan of the Gather.  That would mean not marking
either Parallel Hash or its descendant Parallel Seq Scan as dependent
on the Gather's rescan param.  That's not terribly hard to mechanize
within the structure of this patch: just ignore the param at/below
the ParallelHash.  Cowboy coding would be, perhaps,

if (plan->parallel_aware)
{
if (gather_param < 0)
elog(ERROR, "parallel-aware plan node is not below a Gather");
+   if (IsA(plan, Hash))
+   gather_param = -1;
+   else
context.paramids =
bms_add_member(context.paramids, gather_param);
}

but probably we should think of a more arm's-length way to do it.
Maybe parallel_aware should have more than two values, depending
on whether the result of the node is context-dependent or not.
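
One possible shape of that, purely as a strawman (the names here are
invented):

/* Hypothetical replacement for the boolean parallel_aware flag. */
typedef enum ParallelAwareness
{
    PARALLEL_UNAWARE,           /* behaves the same with or without workers */
    PARALLEL_AWARE_DEPENDENT,   /* result depends on which process runs it
                                 * (e.g. a partitioned scan); must be tied
                                 * to the Gather's rescan parameter */
    PARALLEL_AWARE_SHARED       /* builds shared state whose contents don't
                                 * depend on the worker count (e.g. a
                                 * single-batch Parallel Hash); need not be */
} ParallelAwareness;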

regards, tom lane




Re: [HACKERS] [BUGS] [postgresql 10 beta3] unrecognized node type: 90

2017-08-28 Thread Robert Haas
On Mon, Aug 28, 2017 at 3:00 PM, Tom Lane  wrote:
>> That sentence isn't wrong as written.
>
> Count the "that"s (you're failing to notice the next line).

OK, true.  But "Academic literature" -> "The academic literature" is
just second-guessing, I think.

>> I don't really understand the part about depending on a parallel-aware
>> node.  I mean, there should always be one, except in the
>> single-copy-Gather case, but why is it right to depend on that rather
>> than anything else?  What happens when the Parallel Hash patch goes in
>> and we have multiple parallel-aware scan nodes (plus a parallel-aware
>> Hash node) under the same Gather?
>
> Well, that's what I'm asking.  AFAICS we only really need the scan node(s)
> to be marked as depending on the Gather's rescan parameter.  It would not,
> however, hurt anything for nodes above them to be so marked as well ---
> and even if we didn't explicitly mark them, those nodes would end up
> depending on the parameter anyway because of the way that parameter
> dependency propagation works.  I think the question boils down to whether
> a "parallel_aware" node would ever not be underneath a related Gather.

There should never be a parallel_aware node that's not beneath a
Gather or Gather Merge; I don't know what the meaning of such a plan
would be, so I think we're safe against such a thing appearing in the
future.  What I'm unclear about is what happens with nodes that aren't
directly in the chain between the Gather and the parallel-aware node.
For instance:

Something
-> Gather
   -> Merge Join
      -> Sort
         -> Parallel Seq Scan on a
      -> Merge Join
         -> Sort
            -> Seq Scan on b
         -> Sort
            -> Seq Scan on c

If the Gather gets rescanned, is it OK to force a re-sort of a but not
of b or c?  Hmm, maybe so.  The workers are going to have to do the
sorts of b and c since any workers from a previous scan are GONE, but
if the leader's done that work, it's still good.  Similarly:

Something
-> Gather
   -> Merge Join
      -> Sort
         -> Parallel Seq Scan on a
      -> Hash Join
         -> Index Scan on b
         -> Hash
            -> Seq Scan on c

If the leader's got an existing hash table built on c, it can reuse
it.  The workers will need to build one.  Now consider Parallel Hash
(not yet committed), where we might get this:

Something
-> Gather
   -> Merge Join
      -> Sort
         -> Parallel Seq Scan on a
      -> Hash Join
         -> Index Scan on b
         -> Parallel Hash
            -> Parallel Seq Scan on c

Now what?  We clearly still need to force a re-sort of a, but what
about the shared hash table built on c?  If we've still got that hash
table and it was a single-batch join, there's probably no need to
rescan it even though both the Parallel Hash node and the Parallel Seq
Scan beneath it are parallel-aware.  In this case all workers
collaborated to build a shared hash table covering all rows from c; if
we've still got that, it's still good.  In fact, even the workers can
reuse that hash table even though, for them, it'll not really be a
re-scan at all.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] [BUGS] [postgresql 10 beta3] unrecognized node type: 90

2017-08-28 Thread Tom Lane
Robert Haas  writes:
> On Mon, Aug 28, 2017 at 10:47 AM, Tom Lane  wrote:
> - fuller description.  Academic literature on parallel query suggests that
> + fuller description.  The academic literature on parallel query suggests

> That sentence isn't wrong as written.

Count the "that"s (you're failing to notice the next line).

> I don't really understand the part about depending on a parallel-aware
> node.  I mean, there should always be one, except in the
> single-copy-Gather case, but why is it right to depend on that rather
> than anything else?  What happens when the Parallel Hash patch goes in
> and we have multiple parallel-aware scan nodes (plus a parallel-aware
> Hash node) under the same Gather?

Well, that's what I'm asking.  AFAICS we only really need the scan node(s)
to be marked as depending on the Gather's rescan parameter.  It would not,
however, hurt anything for nodes above them to be so marked as well ---
and even if we didn't explicitly mark them, those nodes would end up
depending on the parameter anyway because of the way that parameter
dependency propagation works.  I think the question boils down to whether
a "parallel_aware" node would ever not be underneath a related Gather.

regards, tom lane




Re: [HACKERS] [BUGS] [postgresql 10 beta3] unrecognized node type: 90

2017-08-28 Thread Robert Haas
On Mon, Aug 28, 2017 at 10:47 AM, Tom Lane  wrote:
> That seems like an unacceptably fragile assumption.  Even if it happens to
> be true today, we would need to fix it sooner or later.  (And I kinda
> suspect it's possible to break it today, anyway.  Treating PARAM_EXEC
> Params as parallel-restricted seems to lock out the easiest cases, but we
> have param slots that don't correspond to any Param node, eg for recursive
> union worktables.  replace_nestloop_params is also a source of PARAM_EXEC
> Params that won't be detected during is_parallel_safe() tests, because it
> happens later.)

Those particular cases are, I think, handled.  The CTE case is handled
by considering CTE scans as parallel-restricted, and the nestloop case
is handled by insisting that all partial paths must be
unparameterized.  You can join a partial path to a parameterized
non-partial path to make a new partial path, but neither the original
partial path nor the resulting one can itself be parameterized.

- fuller description.  Academic literature on parallel query suggests that
+ fuller description.  The academic literature on parallel query suggests

That sentence isn't wrong as written.

I don't really understand the part about depending on a parallel-aware
node.  I mean, there should always be one, except in the
single-copy-Gather case, but why is it right to depend on that rather
than anything else?  What happens when the Parallel Hash patch goes in
and we have multiple parallel-aware scan nodes (plus a parallel-aware
Hash node) under the same Gather?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] [BUGS] [postgresql 10 beta3] unrecognized node type: 90

2017-08-28 Thread Tom Lane
Amit Kapila  writes:
> On Mon, Aug 28, 2017 at 6:34 PM, Tom Lane  wrote:
>> If what you're complaining about is that I put back the "if
>> (outerPlan->chgParam == NULL)" test to allow postponement of the
>> recursive ExecReScan call, I'm afraid that it's mere wishful
>> thinking that omitting that test in nodeGather did anything.

> Previously, outerPlan->chgParam would be NULL, so I think rescans won't
> be postponed.

That seems like an unacceptably fragile assumption.  Even if it happens to
be true today, we would need to fix it sooner or later.  (And I kinda
suspect it's possible to break it today, anyway.  Treating PARAM_EXEC
Params as parallel-restricted seems to lock out the easiest cases, but we
have param slots that don't correspond to any Param node, eg for recursive
union worktables.  replace_nestloop_params is also a source of PARAM_EXEC
Params that won't be detected during is_parallel_safe() tests, because it
happens later.)

regards, tom lane




Re: [HACKERS] [BUGS] [postgresql 10 beta3] unrecognized node type: 90

2017-08-28 Thread Amit Kapila
On Mon, Aug 28, 2017 at 6:34 PM, Tom Lane  wrote:
> Amit Kapila  writes:
>> On Mon, Aug 28, 2017 at 6:01 PM, Tom Lane  wrote:
>>> Um, what's different about that than before?
>
>> Earlier, we perform the rescan of all the nodes before ExecProcNode,
>> so workers will always start (restart) after the scan descriptor is
>> initialized.
>
> If what you're complaining about is that I put back the "if
> (outerPlan->chgParam == NULL)" test to allow postponement of the
> recursive ExecReScan call, I'm afraid that it's mere wishful
> thinking that omitting that test in nodeGather did anything.
> The nodes underneath the Gather are likely to do the same thing,
> so that the parallel table scan node itself is going to get a
> postponed rescan call anyway.  See e.g. ExecReScanNestLoop().
>

Previously, outerPlan->chgParam would be NULL, so I think rescans won't
be postponed.  IIRC, while developing this code I debugged it and saw
that rescans were not postponed.  I don't deny that in some cases it
might get delayed, but for simple cases it was done immediately.  I
agree that in general the proposed fix is better than having nothing,
but I am not sure we need it for Gather as well, considering we have
not been able to demonstrate any such case.

> I see your point that there's inadequate interlocking between resetting
> the parallel scan's shared state and starting a fresh set of workers,
> but that's a pre-existing bug.  In practice I doubt it makes any
> difference, because according to my testing the leader will generally
> reach the table scan long before any workers do.  It'd be nice to do
> better though.
>

Agreed.  BTW, I have mentioned above that we can avoid the skip-rescan
optimization in the rescan path if we are in parallel mode.  I think that
will not be as elegant a solution as your patch, but it won't have
this problem.



-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] [BUGS] [postgresql 10 beta3] unrecognized node type: 90

2017-08-28 Thread Tom Lane
Amit Kapila  writes:
> On Mon, Aug 28, 2017 at 6:01 PM, Tom Lane  wrote:
>> Um, what's different about that than before?

> Earlier, we perform the rescan of all the nodes before ExecProcNode,
> so workers will always start (restart) after the scan descriptor is
> initialized.

If what you're complaining about is that I put back the "if
(outerPlan->chgParam == NULL)" test to allow postponement of the
recursive ExecReScan call, I'm afraid that it's mere wishful
thinking that omitting that test in nodeGather did anything.
The nodes underneath the Gather are likely to do the same thing,
so that the parallel table scan node itself is going to get a
postponed rescan call anyway.  See e.g. ExecReScanNestLoop().

I see your point that there's inadequate interlocking between resetting
the parallel scan's shared state and starting a fresh set of workers,
but that's a pre-existing bug.  In practice I doubt it makes any
difference, because according to my testing the leader will generally
reach the table scan long before any workers do.  It'd be nice to do
better though.

I'm inclined to think that what's needed is to move the reset of the
shared state into a new "ExecParallelReInitializeDSM" plan tree walk,
which would have to occur before we start the new set of workers.
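
Schematically, that walk could look about like this (a sketch only; the
per-node callback name is an assumption here):

/* Reset the shared (DSM) state of parallel-aware nodes, just before
 * launching the new set of workers for a rescan. */
static bool
ExecParallelReInitializeDSM(PlanState *planstate, ParallelContext *pcxt)
{
    if (planstate == NULL)
        return false;

    switch (nodeTag(planstate))
    {
        case T_SeqScanState:
            if (planstate->plan->parallel_aware)
                ExecSeqScanReInitializeDSM((SeqScanState *) planstate, pcxt);
            break;
            /* ... similar cases for other parallel-aware scan types ... */
        default:
            break;
    }

    return planstate_tree_walker(planstate, ExecParallelReInitializeDSM, pcxt);
}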

regards, tom lane




Re: [HACKERS] [BUGS] [postgresql 10 beta3] unrecognized node type: 90

2017-08-28 Thread Amit Kapila
On Mon, Aug 28, 2017 at 6:01 PM, Tom Lane  wrote:
> Amit Kapila  writes:
>> With this change, it is quite possible that during rescans workers
>> will not do any work.
>
> Um, what's different about that than before?
>

Earlier, we perform the rescan of all the nodes before ExecProcNode,
so workers will always start (restart) after the scan descriptor is
initialized.  Now, as is evident from the discussion in this thread, that
was not the right thing for Gather Merge, because some nodes like
Sort apply an optimization due to which rescan of the lower nodes
will never be called.  So we need to ensure in some way that we don't
skip rescanning in such nodes; one way to achieve that is what you
have done in the patch, but it seems to have a side effect.


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] [BUGS] [postgresql 10 beta3] unrecognized node type: 90

2017-08-28 Thread Tom Lane
Amit Kapila  writes:
> With this change, it is quite possible that during rescans workers
> will not do any work.

Um, what's different about that than before?

regards, tom lane




Re: [HACKERS] [BUGS] [postgresql 10 beta3] unrecognized node type: 90

2017-08-28 Thread Amit Kapila
On Mon, Aug 28, 2017 at 1:59 AM, Tom Lane  wrote:
> I wrote:
>> I think that the correct fix probably involves marking each parallel scan
>> plan node as dependent on a pseudo executor parameter, which the parent
>> Gather or GatherMerge node would flag as being changed on each rescan.
>> This would cue the plan layers in between that they cannot optimize on the
>> assumption that the leader's instance of the parallel scan will produce
>> exactly the same rows as it did last time, even when "nothing else
>> changed".  The "wtParam" pseudo parameter that's used for communication
>> between RecursiveUnion and its descendant WorkTableScan node is a good
>> model for what needs to happen.
>
> Here is a draft patch for this.

! /*
! * Set child node's chgParam to tell it that the next scan might deliver a
! * different set of rows within the leader process.  (The overall rowset
! * shouldn't change, but the leader process's subset might; hence nodes
! * between here and the parallel table scan node mustn't optimize on the
! * assumption of an unchanging rowset.)
! */
! if (gm->rescan_param >= 0)
! outerPlan->chgParam = bms_add_member(outerPlan->chgParam,
! gm->rescan_param);
!
!
! /*
! * if chgParam of subnode is not null then plan will be re-scanned by
! * first ExecProcNode.
! */
! if (outerPlan->chgParam == NULL)
! ExecReScan(outerPlan);


With this change, it is quite possible that during rescans workers
will not do any work.  I think this will allow workers to launch
before the rescan (for a sequential scan) can reset the scan descriptor in
the leader, which means that the workers will still see the old value,
assume that the scan is finished, and come out without doing any work.
Now, this won't produce wrong results because the leader will scan the
whole relation by itself in such a case, but it might be inefficient.
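
The check in question is the shared block allocator in
heap_parallelscan_nextpage(); paraphrased (not the exact source), each
participant does roughly:

/* If the leader has not yet reset phs_nallocated for the new scan, a
 * freshly launched worker sees a stale count >= rs_nblocks here and
 * concludes the scan is already complete. */
nallocated = pg_atomic_fetch_add_u64(&parallel_scan->phs_nallocated, 1);
if (nallocated >= scan->rs_nblocks)
    page = InvalidBlockNumber;      /* nothing left; worker does no work */
else
    page = (nallocated + parallel_scan->phs_startblock) % scan->rs_nblocks;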

  It's a bit different from wtParam in
> that the special parameter isn't allocated until createplan.c time,
> so that we don't eat a parameter slot if we end up choosing a non-parallel
> plan; but otherwise things are comparable.
>
> I could use some feedback on whether this is marking dependent child nodes
> sanely.  As written, any plan node that's marked parallel_aware is assumed
> to need a dependency on the parent Gather or GatherMerge's rescan param
> --- and the planner will now bitch if a parallel_aware plan node is not
> under any such Gather.  Is this reasonable?

I think so.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] [BUGS] [postgresql 10 beta3] unrecognized node type: 90

2017-08-27 Thread Tom Lane
I wrote:
> I think that the correct fix probably involves marking each parallel scan
> plan node as dependent on a pseudo executor parameter, which the parent
> Gather or GatherMerge node would flag as being changed on each rescan.
> This would cue the plan layers in between that they cannot optimize on the
> assumption that the leader's instance of the parallel scan will produce
> exactly the same rows as it did last time, even when "nothing else
> changed".  The "wtParam" pseudo parameter that's used for communication
> between RecursiveUnion and its descendant WorkTableScan node is a good
> model for what needs to happen.

Here is a draft patch for this.  It's a bit different from wtParam in
that the special parameter isn't allocated until createplan.c time,
so that we don't eat a parameter slot if we end up choosing a non-parallel
plan; but otherwise things are comparable.

I could use some feedback on whether this is marking dependent child nodes
sanely.  As written, any plan node that's marked parallel_aware is assumed
to need a dependency on the parent Gather or GatherMerge's rescan param
--- and the planner will now bitch if a parallel_aware plan node is not
under any such Gather.  Is this reasonable?  I do not see any
documentation that defines the parallel_aware field with enough clarity
to be very sure about it.

I included the heap_parallelscan_nextpage hack I'm using to make the
original failure reproducible, but that hunk is not meant for commit.
Also, the regression test case is the same as in a2b70c89c.

regards, tom lane

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index ff03c68..c478897 100644
*** a/src/backend/access/heap/heapam.c
--- b/src/backend/access/heap/heapam.c
*** heap_parallelscan_nextpage(HeapScanDesc 
*** 1763,1768 
--- 1763,1770 
  			ss_report_location(scan->rs_rd, parallel_scan->phs_startblock);
  	}
  
+ 	pg_usleep(random()/100);
+ 
  	return page;
  }
  
diff --git a/src/backend/executor/nodeGather.c b/src/backend/executor/nodeGather.c
index e8d94ee..3aa819f 100644
*** a/src/backend/executor/nodeGather.c
--- b/src/backend/executor/nodeGather.c
*** ExecShutdownGather(GatherState *node)
*** 430,435 
--- 430,438 
  void
  ExecReScanGather(GatherState *node)
  {
+ 	Gather	   *gather = (Gather *) node->ps.plan;
+ 	PlanState  *outerPlan = outerPlanState(node);
+ 
  	/*
  	 * Re-initialize the parallel workers to perform rescan of relation. We
  	 * want to gracefully shutdown all the workers so that they should be able
*** ExecReScanGather(GatherState *node)
*** 443,447 
  	if (node->pei)
  		ExecParallelReinitialize(node->pei);
  
! 	ExecReScan(node->ps.lefttree);
  }
--- 446,467 
  	if (node->pei)
  		ExecParallelReinitialize(node->pei);
  
! 	/*
! 	 * Set child node's chgParam to tell it that the next scan might deliver a
! 	 * different set of rows within the leader process.  (The overall rowset
! 	 * shouldn't change, but the leader process's subset might; hence nodes
! 	 * between here and the parallel table scan node mustn't optimize on the
! 	 * assumption of an unchanging rowset.)
! 	 */
! 	if (gather->rescan_param >= 0)
! 		outerPlan->chgParam = bms_add_member(outerPlan->chgParam,
! 			 gather->rescan_param);
! 
! 
! 	/*
! 	 * if chgParam of subnode is not null then plan will be re-scanned by
! 	 * first ExecProcNode.
! 	 */
! 	if (outerPlan->chgParam == NULL)
! 		ExecReScan(outerPlan);
  }
diff --git a/src/backend/executor/nodeGatherMerge.c b/src/backend/executor/nodeGatherMerge.c
index 64c6239..e8c70df 100644
*** a/src/backend/executor/nodeGatherMerge.c
--- b/src/backend/executor/nodeGatherMerge.c
*** ExecShutdownGatherMergeWorkers(GatherMer
*** 325,330 
--- 325,333 
  void
  ExecReScanGatherMerge(GatherMergeState *node)
  {
+ 	GatherMerge *gm = (GatherMerge *) node->ps.plan;
+ 	PlanState  *outerPlan = outerPlanState(node);
+ 
  	/*
  	 * Re-initialize the parallel workers to perform rescan of relation. We
  	 * want to gracefully shutdown all the workers so that they should be able
*** ExecReScanGatherMerge(GatherMergeState *
*** 339,345 
  	if (node->pei)
  		ExecParallelReinitialize(node->pei);
  
! 	ExecReScan(node->ps.lefttree);
  }
  
  /*
--- 342,365 
  	if (node->pei)
  		ExecParallelReinitialize(node->pei);
  
! 	/*
! 	 * Set child node's chgParam to tell it that the next scan might deliver a
! 	 * different set of rows within the leader process.  (The overall rowset
! 	 * shouldn't change, but the leader process's subset might; hence nodes
! 	 * between here and the parallel table scan node mustn't optimize on the
! 	 * assumption of an unchanging rowset.)
! 	 */
! 	if (gm->rescan_param >= 0)
! 		outerPlan->chgParam = bms_add_member(outerPlan->chgParam,
! 			 gm->rescan_param);
! 
! 
! 	/*
! 	 * if chgParam of subnode is not null then plan will be re-scanned by
! 	 * first 

Re: [HACKERS] [BUGS] [postgresql 10 beta3] unrecognized node type: 90

2017-08-27 Thread Tom Lane
I wrote:
> 4. The fact that the test succeeds on many machines implies that the
> leader process is usually doing *all* of the work.  This is in itself not
> very good.  Even on the machines where it fails, the fact that the tuple
> counts are usually a pretty large fraction of the expected values
> indicates that the leader usually did most of the work.  We need to take
> a closer look at why that is.

I've spent some time poking into this, and it seems the bottom line is
that the time needed to launch a parallel worker and get it started on
running the query plan is comparable to the time needed to scan all 10000
rows of tenk1.  Maybe this isn't surprising, I dunno; but it doesn't give
me a warm feeling about how much exercise the parallel scan machinery is
really getting in select_parallel.sql.  Not sure what to do about that ---
I don't really want to make the regression test case large enough (and
slow enough) to provide a more realistic scenario.

In the meantime, I've been able to make the failure reproducible by the
expedient of sticking "pg_usleep(1000)" into heap_parallelscan_nextpage(),
thus slowing down the leader's scan enough so that the workers can get
started.

> However, the bottom line here is that parallel scan is completely broken
> for rescans, and it's not (solely) the fault of nodeGatherMerge; rather,
> the issue is that nobody bothered to wire up parallelism to the rescan
> parameterization mechanism.  I imagine that related bugs can be
> demonstrated in 9.6 with little effort.

I've so far been unable to break it for cases involving only Gather.
The issue is triggered, basically, by having a Sort or HashAgg node
below Gather[Merge], since either of those can decide that they don't
need to rescan their child.  While it's not terribly hard to get the
planner to make such plans, you always end up with another Sort or
HashAgg above the Gather, and that masks the problem because the upper
node makes the same decision that it needn't rescan its child, protecting
the Gather from being run more than once.  The particular plan shape that
a2b70c89c used,

   ->  Finalize GroupAggregate
         ->  Gather Merge
               ->  Partial GroupAggregate
                     ->  Sort
                           ->  Parallel Seq Scan on tenk1

does exhibit the problem, but it requires GatherMerge so that no extra
sort is needed above the parallel subplan.

This may mean that we don't need to risk back-patching the fix into 9.6.
I'm not totally convinced of that yet, but I can't show that it's needed
given 9.6's limited support for parallelism.

regards, tom lane




Re: [HACKERS] [BUGS] [postgresql 10 beta3] unrecognized node type: 90

2017-08-18 Thread Amit Kapila
On Fri, Aug 18, 2017 at 3:50 AM, Tom Lane  wrote:
> I wrote:
>> Ah-hah, I see my dromedary box is one of the ones failing, so I'll
>> have a look there.  I can't reproduce it on my other machines.
>
> OK, so this is a whole lot more broken than I thought :-(.
>
> Bear in mind that the plan for this (omitting uninteresting detail) is
>
>  Nested Loop Left Join
>    ->  Values Scan on "*VALUES*"
>    ->  Finalize GroupAggregate
>          ->  Gather Merge
>                ->  Partial GroupAggregate
>                      ->  Sort
>                            ->  Parallel Seq Scan on tenk1
>
> What seems to be happening is that:
>
> 1. On the first pass, the parallel seqscan work gets doled out to several
> workers, plus the leader, as expected.
>
> 2. When the nestloop rescans its right input, ExecReScanGatherMerge
> supposes that this is good enough to handle rescanning its subnodes:
>
> ExecReScan(node->ps.lefttree);
>
> Leaving aside the question of why that doesn't look like nearly every
> other child rescan call, what happens is that that invokes ExecReScanAgg,
> which does the more usual thing:
>
> if (outerPlan->chgParam == NULL)
> ExecReScan(outerPlan);
>
> and so that invokes ExecReScanSort, and then behold what ExecReScanSort
> thinks it can optimize away:
>
>  * If subnode is to be rescanned then we forget previous sort results; we
>  * have to re-read the subplan and re-sort.  Also must re-sort if the
>  * bounded-sort parameters changed or we didn't select randomAccess.
>  *
>  * Otherwise we can just rewind and rescan the sorted output.
>
> So we never get to ExecReScanSeqScan, and thus not to heap_rescan,
> with the effect that parallel_scan->phs_nallocated never gets reset.
>
> 3. On the next pass, we fire up all the workers as expected, but they all
> perceive phs_nallocated >= rs_nblocks and conclude they have nothing to
> do.  Meanwhile, in the leader, nodeSort just re-emits the sorted data it
> had last time.  Net effect is that the GatherMerge node returns only the
> fraction of the data that was scanned by the leader in the first pass.
>
> 4. The fact that the test succeeds on many machines implies that the
> leader process is usually doing *all* of the work.  This is in itself not
> very good.  Even on the machines where it fails, the fact that the tuple
> counts are usually a pretty large fraction of the expected values
> indicates that the leader usually did most of the work.  We need to take
> a closer look at why that is.
>
> However, the bottom line here is that parallel scan is completely broken
> for rescans, and it's not (solely) the fault of nodeGatherMerge; rather,
> the issue is that nobody bothered to wire up parallelism to the rescan
> parameterization mechanism.
>

I think we don't generate parallel plans for parameterized paths, so I
am not sure whether any work is required in that area.

>  I imagine that related bugs can be
> demonstrated in 9.6 with little effort.
>
> I think that the correct fix probably involves marking each parallel scan
> plan node as dependent on a pseudo executor parameter, which the parent
> Gather or GatherMerge node would flag as being changed on each rescan.
> This would cue the plan layers in between that they cannot optimize on the
> assumption that the leader's instance of the parallel scan will produce
> exactly the same rows as it did last time, even when "nothing else
> changed".  The "wtParam" pseudo parameter that's used for communication
> between RecursiveUnion and its descendant WorkTableScan node is a good
> model for what needs to happen.
>

Yeah, that seems like a good idea.  I think another way could be to
*not* optimize rescanning when we are in parallel mode
(IsInParallelMode()); that might be more restrictive than what
you are suggesting, but it would be somewhat simpler.
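
For illustration, that simpler alternative would amount to something like
the following in ExecReScanSort (hypothetical, just to show the shape):

if (outerPlan->chgParam != NULL ||
    IsInParallelMode() ||           /* hypothetical: never reuse old results */
    node->bounded != node->bounded_Done ||
    node->bound != node->bound_Done ||
    !node->randomAccess)
{
    /* forget the previous sort result and re-read the subplan */
    node->sort_Done = false;
    ...
}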

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] [BUGS] [postgresql 10 beta3] unrecognized node type: 90

2017-08-17 Thread Tom Lane
I wrote:
> Ah-hah, I see my dromedary box is one of the ones failing, so I'll
> have a look there.  I can't reproduce it on my other machines.

OK, so this is a whole lot more broken than I thought :-(.

Bear in mind that the plan for this (omitting uninteresting detail) is

 Nested Loop Left Join
   ->  Values Scan on "*VALUES*"
   ->  Finalize GroupAggregate
         ->  Gather Merge
               ->  Partial GroupAggregate
                     ->  Sort
                           ->  Parallel Seq Scan on tenk1

What seems to be happening is that:

1. On the first pass, the parallel seqscan work gets doled out to several
workers, plus the leader, as expected.

2. When the nestloop rescans its right input, ExecReScanGatherMerge
supposes that this is good enough to handle rescanning its subnodes:

ExecReScan(node->ps.lefttree);

Leaving aside the question of why that doesn't look like nearly every
other child rescan call, what happens is that that invokes ExecReScanAgg,
which does the more usual thing:

if (outerPlan->chgParam == NULL)
ExecReScan(outerPlan);

and so that invokes ExecReScanSort, and then behold what ExecReScanSort
thinks it can optimize away:

 * If subnode is to be rescanned then we forget previous sort results; we
 * have to re-read the subplan and re-sort.  Also must re-sort if the
 * bounded-sort parameters changed or we didn't select randomAccess.
 *
 * Otherwise we can just rewind and rescan the sorted output.

So we never get to ExecReScanSeqScan, and thus not to heap_rescan,
with the effect that parallel_scan->phs_nallocated never gets reset.

3. On the next pass, we fire up all the workers as expected, but they all
perceive phs_nallocated >= rs_nblocks and conclude they have nothing to
do.  Meanwhile, in the leader, nodeSort just re-emits the sorted data it
had last time.  Net effect is that the GatherMerge node returns only the
fraction of the data that was scanned by the leader in the first pass.

4. The fact that the test succeeds on many machines implies that the
leader process is usually doing *all* of the work.  This is in itself not
very good.  Even on the machines where it fails, the fact that the tuple
counts are usually a pretty large fraction of the expected values
indicates that the leader usually did most of the work.  We need to take
a closer look at why that is.

However, the bottom line here is that parallel scan is completely broken
for rescans, and it's not (solely) the fault of nodeGatherMerge; rather,
the issue is that nobody bothered to wire up parallelism to the rescan
parameterization mechanism.  I imagine that related bugs can be
demonstrated in 9.6 with little effort.

I think that the correct fix probably involves marking each parallel scan
plan node as dependent on a pseudo executor parameter, which the parent
Gather or GatherMerge node would flag as being changed on each rescan.
This would cue the plan layers in between that they cannot optimize on the
assumption that the leader's instance of the parallel scan will produce
exactly the same rows as it did last time, even when "nothing else
changed".  The "wtParam" pseudo parameter that's used for communication
between RecursiveUnion and its descendant WorkTableScan node is a good
model for what needs to happen.
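
For reference, the wtParam dance boils down to roughly one line in
nodeRecursiveUnion.c (paraphrased, not verbatim): before starting another
iteration, the RecursiveUnion marks its inner subtree as having a changed
parameter, so the descendant WorkTableScan can't skip its rescan even
though no ordinary Param changed:

/* worktable contents changed; force the WorkTableScan to start over */
innerPlan->chgParam = bms_add_member(innerPlan->chgParam, plan->wtParam);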

A possibly-simpler fix would be to abandon the idea that the leader
should do any of the work, but I imagine that will be unpopular.

As I mentioned, I'm outta here for the next week.  I'd be willing to
work on, or review, a patch for this when I get back.

regards, tom lane

PS: while I was trying to decipher this, I found three or four other
minor bugs or near-bugs in nodeGatherMerge.c.  But none of them seem to be
triggering in this example.  I plan to do a round of code-review-ish fixes
there when I get back.




Re: [HACKERS] [BUGS] [postgresql 10 beta3] unrecognized node type: 90

2017-08-17 Thread Tom Lane
Robert Haas  writes:
> On Thu, Aug 17, 2017 at 2:06 PM, Tom Lane  wrote:
>> Nope, spoke too soon.  See buildfarm.

> Whoa, that's not good.

Ah-hah, I see my dromedary box is one of the ones failing, so I'll
have a look there.  I can't reproduce it on my other machines.

I'm a bit suspicious that it's got something to do with getting
a different number of workers during restart.  Whether that's
the issue or not, though, it sure seems like a rescan leaks an
unpleasantly large amount of memory.  I wonder if we shouldn't
refactor this so that the per-reader structs can be reused.

regards, tom lane




Re: [HACKERS] [BUGS] [postgresql 10 beta3] unrecognized node type: 90

2017-08-17 Thread Robert Haas
On Thu, Aug 17, 2017 at 2:06 PM, Tom Lane  wrote:
> I wrote:
>> Pushed, with minor tidying of the test case.  I think we can now
>> close this open item.
>
> Nope, spoke too soon.  See buildfarm.
>
> (Man, am I glad I insisted on a test case.)

Whoa, that's not good.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] [BUGS] [postgresql 10 beta3] unrecognized node type: 90

2017-08-17 Thread Tom Lane
I wrote:
> Pushed, with minor tidying of the test case.  I think we can now
> close this open item.

Nope, spoke too soon.  See buildfarm.

(Man, am I glad I insisted on a test case.)

regards, tom lane




Re: [HACKERS] [BUGS] [postgresql 10 beta3] unrecognized node type: 90

2017-08-17 Thread Tom Lane
Amit Kapila  writes:
>> This will be fixed by the patch [1] (execrescan_gathermerge_v2.patch)
>> I posted some time back.  The test case is slightly different, but maybe
>> I can repost the patch with your test case.

> I have kept the fix as it is but changed the test to match your test.

Pushed, with minor tidying of the test case.  I think we can now
close this open item.

regards, tom lane




Re: [HACKERS] [BUGS] [postgresql 10 beta3] unrecognized node type: 90

2017-08-17 Thread Tom Lane
Amit Kapila  writes:
> I think the other patch posted above to add a new GUC for
> enable_gather is still relevant and important.

FWIW, I'm pretty much -1 on that.  I don't see that it has any
real-world use; somebody who wants to turn that off would likely
really want to turn off parallelism altogether.  We have
mucho knobs in that area already.

regards, tom lane




Re: [HACKERS] [BUGS] [postgresql 10 beta3] unrecognized node type: 90

2017-08-17 Thread Amit Kapila
On Thu, Aug 17, 2017 at 8:08 PM, Amit Kapila  wrote:
> On Thu, Aug 17, 2017 at 7:49 PM, Tom Lane  wrote:
>> Amit Kapila  writes:
>>> On Tue, Aug 15, 2017 at 7:16 PM, Tom Lane  wrote:
>>>> I should think it wouldn't be that expensive to create a test
>>>> case, if you already have test cases that invoke GatherMerge.
>>>> Adding a right join against a VALUES clause with a small number of
>>>> entries, and a non-mergeable/hashable join clause, ought to do it.
>>
>>> I have done some experiments based on this idea to generate a test,
>>> but I think it is not as straightforward as it appears.
>>
>> I did this (the first 4 SETs duplicate what's already used in
>> select_parallel.sql):
>>
>> regression=# set parallel_setup_cost=0;
>> SET
>> regression=# set parallel_tuple_cost=0;
>> SET
>> regression=# set min_parallel_table_scan_size=0;
>> SET
>> regression=# set max_parallel_workers_per_gather=4;
>> SET
>> regression=# set enable_hashagg TO 0;
>> SET
>> regression=# set enable_material TO 0;
>> SET
>> regression=# explain select * from (select string4, count((unique2))
>> from tenk1 group by string4 order by string4) ss right join
>> (values(1),(2)) v(x) on true;
>> QUERY PLAN
>> --
>>  Nested Loop Left Join  (cost=524.15..1086.77 rows=8 width=76)
>>->  Values Scan on "*VALUES*"  (cost=0.00..0.03 rows=2 width=4)
>>->  Finalize GroupAggregate  (cost=524.15..543.29 rows=4 width=72)
>>  Group Key: tenk1.string4
>>  ->  Gather Merge  (cost=524.15..543.17 rows=16 width=72)
>>Workers Planned: 4
>>->  Partial GroupAggregate  (cost=524.10..542.89 rows=4 
>> width=72)
>>  Group Key: tenk1.string4
>>  ->  Sort  (cost=524.10..530.35 rows=2500 width=68)
>>Sort Key: tenk1.string4
>>->  Parallel Seq Scan on tenk1  
>> (cost=0.00..383.00 rows=2500 width=68)
>> (11 rows)
>>
>> regression=# select * from (select string4, count((unique2))
>> from tenk1 group by string4 order by string4) ss right join
>> (values(1),(2)) v(x) on true;
>> server closed the connection unexpectedly
>>
>>
>> So, not only is it not that hard to reach ExecReScanGatherMerge,
>> but there is indeed a bug to fix there somewhere.  The stack
>> trace indicates that the failure occurs in a later execution
>> of ExecGatherMerge:
>>
>
> This will be fixed by the patch [1] (execrescan_gathermerge_v2.patch)
> I posted some time back.  The test case is slightly different, but maybe
> I can repost the patch with your test case.
>

I have kept the fix as it is but changed the test to match your test.
I think the other patch posted above, which adds a new guc
enable_gather, is still relevant and important.


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


execrescan_gathermerge_v3.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [BUGS] [postgresql 10 beta3] unrecognized node type: 90

2017-08-17 Thread Amit Kapila
On Thu, Aug 17, 2017 at 7:49 PM, Tom Lane  wrote:
> Amit Kapila  writes:
>> On Tue, Aug 15, 2017 at 7:16 PM, Tom Lane  wrote:
>>> I should think it wouldn't be that expensive to create a test
>>> case, if you already have test cases that invoke GatherMerge.
>>> Adding a right join against a VALUES clause with a small number of
>>> entries, and a non-mergeable/hashable join clause, ought to do it.
>
>> I have done some experiments based on this idea to generate a test,
>> but I think it is not as straightforward as it appears.
>
> I did this (the first 4 SETs duplicate what's already used in
> select_parallel.sql):
>
> regression=# set parallel_setup_cost=0;
> SET
> regression=# set parallel_tuple_cost=0;
> SET
> regression=# set min_parallel_table_scan_size=0;
> SET
> regression=# set max_parallel_workers_per_gather=4;
> SET
> regression=# set enable_hashagg TO 0;
> SET
> regression=# set enable_material TO 0;
> SET
> regression=# explain select * from (select string4, count((unique2))
> from tenk1 group by string4 order by string4) ss right join
> (values(1),(2)) v(x) on true;
> QUERY PLAN
> --
>  Nested Loop Left Join  (cost=524.15..1086.77 rows=8 width=76)
>->  Values Scan on "*VALUES*"  (cost=0.00..0.03 rows=2 width=4)
>->  Finalize GroupAggregate  (cost=524.15..543.29 rows=4 width=72)
>  Group Key: tenk1.string4
>  ->  Gather Merge  (cost=524.15..543.17 rows=16 width=72)
>Workers Planned: 4
>->  Partial GroupAggregate  (cost=524.10..542.89 rows=4 
> width=72)
>  Group Key: tenk1.string4
>  ->  Sort  (cost=524.10..530.35 rows=2500 width=68)
>Sort Key: tenk1.string4
>->  Parallel Seq Scan on tenk1  (cost=0.00..383.00 
> rows=2500 width=68)
> (11 rows)
>
> regression=# select * from (select string4, count((unique2))
> from tenk1 group by string4 order by string4) ss right join
> (values(1),(2)) v(x) on true;
> server closed the connection unexpectedly
>
>
> So, not only is it not that hard to reach ExecReScanGatherMerge,
> but there is indeed a bug to fix there somewhere.  The stack
> trace indicates that the failure occurs in a later execution
> of ExecGatherMerge:
>

This will be fixed by the patch [1] (execrescan_gathermerge_v2.patch)
I posted some time back.  The test case is slightly different, but maybe
I can repost the patch with your test case.

[1] - 
https://www.postgresql.org/message-id/CAA4eK1JkByysFJNh9M349u_nNjqETuEnY_y1VUc_kJiU0bxtaQ%40mail.gmail.com

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [BUGS] [postgresql 10 beta3] unrecognized node type: 90

2017-08-17 Thread Tom Lane
Amit Kapila  writes:
> On Tue, Aug 15, 2017 at 7:16 PM, Tom Lane  wrote:
>> I should think it wouldn't be that expensive to create a test
>> case, if you already have test cases that invoke GatherMerge.
>> Adding a right join against a VALUES clause with a small number of
>> entries, and a non-mergeable/hashable join clause, ought to do it.

> I have done some experiments based on this idea to generate a test,
> but I think it is not as straightforward as it appears.

I did this (the first 4 SETs duplicate what's already used in
select_parallel.sql):

regression=# set parallel_setup_cost=0;
SET
regression=# set parallel_tuple_cost=0;
SET
regression=# set min_parallel_table_scan_size=0;
SET
regression=# set max_parallel_workers_per_gather=4;
SET
regression=# set enable_hashagg TO 0;
SET
regression=# set enable_material TO 0;
SET
regression=# explain select * from (select string4, count((unique2))
from tenk1 group by string4 order by string4) ss right join
(values(1),(2)) v(x) on true;
QUERY PLAN  
  
--
 Nested Loop Left Join  (cost=524.15..1086.77 rows=8 width=76)
   ->  Values Scan on "*VALUES*"  (cost=0.00..0.03 rows=2 width=4)
   ->  Finalize GroupAggregate  (cost=524.15..543.29 rows=4 width=72)
 Group Key: tenk1.string4
 ->  Gather Merge  (cost=524.15..543.17 rows=16 width=72)
   Workers Planned: 4
   ->  Partial GroupAggregate  (cost=524.10..542.89 rows=4 width=72)
 Group Key: tenk1.string4
 ->  Sort  (cost=524.10..530.35 rows=2500 width=68)
   Sort Key: tenk1.string4
   ->  Parallel Seq Scan on tenk1  (cost=0.00..383.00 
rows=2500 width=68)
(11 rows)

regression=# select * from (select string4, count((unique2))
from tenk1 group by string4 order by string4) ss right join
(values(1),(2)) v(x) on true;
server closed the connection unexpectedly


So, not only is it not that hard to reach ExecReScanGatherMerge,
but there is indeed a bug to fix there somewhere.  The stack
trace indicates that the failure occurs in a later execution
of ExecGatherMerge:

Program terminated with signal 11, Segmentation fault.
#0  0x0064b4e4 in swap_nodes (heap=0x15a9440) at binaryheap.c:223
223 heap->bh_nodes[a] = heap->bh_nodes[b];
(gdb) bt
#0  0x0064b4e4 in swap_nodes (heap=0x15a9440) at binaryheap.c:223
#1  binaryheap_remove_first (heap=0x15a9440) at binaryheap.c:189
#2  0x00634196 in gather_merge_getnext (pstate=)
at nodeGatherMerge.c:479
#3  ExecGatherMerge (pstate=) at nodeGatherMerge.c:241
#4  0x006251fe in ExecProcNode (aggstate=0x157a6d0)
at ../../../src/include/executor/executor.h:249
#5  fetch_input_tuple (aggstate=0x157a6d0) at nodeAgg.c:688
#6  0x00629264 in agg_retrieve_direct (pstate=)
at nodeAgg.c:2313
#7  ExecAgg (pstate=) at nodeAgg.c:2124
#8  0x006396ef in ExecProcNode (pstate=0x1579d98)
at ../../../src/include/executor/executor.h:249
#9  ExecNestLoop (pstate=0x1579d98) at nodeNestloop.c:160
#10 0x0061bc3f in ExecProcNode (queryDesc=0x14d5570, 
direction=, count=0, execute_once=-104 '\230')
at ../../../src/include/executor/executor.h:249
#11 ExecutePlan (queryDesc=0x14d5570, direction=, 
count=0, execute_once=-104 '\230') at execMain.c:1693
#12 standard_ExecutorRun (queryDesc=0x14d5570, 
direction=, count=0, execute_once=-104 '\230')
at execMain.c:362


regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [BUGS] [postgresql 10 beta3] unrecognized node type: 90

2017-08-17 Thread Amit Kapila
On Thu, Aug 17, 2017 at 10:07 AM, Amit Kapila  wrote:
> On Tue, Aug 15, 2017 at 7:16 PM, Tom Lane  wrote:
>>
>>> I believe that between this commit and the test-coverage commit from
>>> Andres, this open item is reasonably well addressed.  If someone
>>> thinks more needs to be done, please specify.  Thanks.
>>
>> How big a deal do we think test coverage is?  It looks like
>> ExecReScanGatherMerge is identical logic to ExecReScanGather,
>> which *is* covered according to coverage.postgresql.org, but
>> it wouldn't be too surprising if they diverge in future.
>>
>> I should think it wouldn't be that expensive to create a test
>> case, if you already have test cases that invoke GatherMerge.
>> Adding a right join against a VALUES clause with a small number of
>> entries, and a non-mergeable/hashable join clause, ought to do it.
>>
>
>
> Another way to make the test use GatherMerge is to add a new guc
> enable_gather, similar to enable_gathermerge, and then set it to off;
> the planner will prefer GatherMerge in that case.  I think it is good
> to have such a guc anyway.  I will go and do it this way unless you
> have a better idea.
>

Going by the above, I have created two separate patches: the first
introduces a new guc enable_gather, and the second tests the rescan
behavior of gather merge.  I have found a problem in the rescan path
of gather merge: it does not re-initialize the gather merge state that
is required to set up the heap for processing of tuples.  I think this
should have been caught earlier, but probably I didn't notice it
because in the previous tests the left side would not have passed
enough rows to hit this case.  I have fixed it in the attached patch
(execrescan_gathermerge_v2).
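
To illustrate the shape of that fix (only a sketch, not the attached
patch itself; flag names such as gm_initialized are assumptions), the
rescan function needs to clear the flag that marks the merge heap and
per-reader state as set up, so the next ExecGatherMerge call rebuilds
them instead of reusing stale state:

    void
    ExecReScanGatherMerge(GatherMergeState *node)
    {
        /* workers must be relaunched on the next execution */
        node->initialized = false;

        /* assumed flag: force gather_merge_init() to rebuild heap/readers */
        node->gm_initialized = false;

        /* ... existing handling of chgParam and the child plan's rescan ... */
    }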

> Note - enable_gathermerge is not present in postgresql.conf. I think
> we should add it to the postgresql.conf.sample file.
>

Thomas has already posted a patch to handle this problem.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


guc_enable_gather_v1.patch
Description: Binary data


execrescan_gathermerge_v2.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [BUGS] [postgresql 10 beta3] unrecognized node type: 90

2017-08-16 Thread Thomas Munro
On Thu, Aug 17, 2017 at 4:37 PM, Amit Kapila  wrote:
> Note - enable_gathermerge is not present in postgresql.conf. I think
> we should add it to the postgresql.conf.sample file.

+1

See also
https://www.postgresql.org/message-id/CAEepm=0b7ym9mzsviq1d-hnt4koarvejvsfayvfyknv-pvd...@mail.gmail.com

-- 
Thomas Munro
http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [BUGS] [postgresql 10 beta3] unrecognized node type: 90

2017-08-16 Thread Amit Kapila
On Tue, Aug 15, 2017 at 7:16 PM, Tom Lane  wrote:
> Robert Haas  writes:
>> On Tue, Aug 15, 2017 at 7:31 AM, Amit Kapila  wrote:
>>> The attached patch fixes the issue for me.  I have locally verified
>>> that gather merge gets executed in the rescan path.  I haven't added
>>> a test case, since having a gather or gather merge node on the inner
>>> side of a join can be time-consuming.  However, if you or others feel
>>> it is important to have a test covering this code path, I can try to
>>> produce one.
>
>> Committed.
>
>> I believe that between this commit and the test-coverage commit from
>> Andres, this open item is reasonably well addressed.  If someone
>> thinks more needs to be done, please specify.  Thanks.
>
> How big a deal do we think test coverage is?  It looks like
> ExecReScanGatherMerge is identical logic to ExecReScanGather,
> which *is* covered according to coverage.postgresql.org, but
> it wouldn't be too surprising if they diverge in future.
>
> I should think it wouldn't be that expensive to create a test
> case, if you already have test cases that invoke GatherMerge.
> Adding a right join against a VALUES clause with a small number of
> entries, and a non-mergeable/hashable join clause, ought to do it.
>

I have done some experiments based on this idea to generate a test,
but I think it is not as straightforward as it appears.  Below are
some of my experiments:

Query that uses GatherMerge in regression tests
-

regression=# explain (costs off)   select  string4, count((unique2))
from tenk1 group by string4 order by string4;
 QUERY PLAN

 Finalize GroupAggregate
   Group Key: string4
   ->  Gather Merge
 Workers Planned: 2
 ->  Partial GroupAggregate
   Group Key: string4
   ->  Sort
 Sort Key: string4
 ->  Parallel Seq Scan on tenk1
(9 rows)

Modified Query
--
regression=# explain (costs off) select  tenk1.hundred,
count((unique2)) from tenk1 right join (values (1,100), (2,200)) as t
(two, hundred) on t.two > 1 and tenk1.hundred > 1 group by tenk1.hundred
order by tenk1.hundred;
QUERY PLAN
--
 Sort
   Sort Key: tenk1.hundred
   ->  HashAggregate
 Group Key: tenk1.hundred
 ->  Nested Loop Left Join
   Join Filter: ("*VALUES*".column1 > 1)
   ->  Values Scan on "*VALUES*"
   ->  Gather
 Workers Planned: 4
 ->  Parallel Index Scan using tenk1_hundred on tenk1
   Index Cond: (hundred > 1)
(11 rows)

The cost of GatherMerge is always higher than Gather in this case,
which is not surprising, since GatherMerge has to perform some
additional work.  I am not aware of a way to push the grouping and
sorting below the parallel node (Gather/GatherMerge) here; if there
were such a possibility, the planner might prefer GatherMerge.

Another way to make the test use GatherMerge is to add a new guc
enable_gather, similar to enable_gathermerge, and then set it to off;
the planner will prefer GatherMerge in that case.  I think it is good
to have such a guc anyway.  I will go and do it this way unless you
have a better idea.

Note - enable_gathermerge is not present in postgresql.conf. I think
we should add it to the postgresql.conf.sample file.
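
For illustration, such a guc would presumably be registered next to
enable_gathermerge in guc.c's ConfigureNamesBool table, something like
the sketch below (the variable name, group, and description text are
only placeholders here, since no enable_gather guc exists yet):

    {
        {"enable_gather", PGC_USERSET, QUERY_TUNING_METHOD,
            gettext_noop("Enables the planner's use of gather plans."),
            NULL
        },
        &enable_gather,     /* hypothetical bool, declared like enable_gathermerge */
        true,
        NULL, NULL, NULL
    },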

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [BUGS] [postgresql 10 beta3] unrecognized node type: 90

2017-08-16 Thread Robert Haas
On Tue, Aug 15, 2017 at 9:46 AM, Tom Lane  wrote:
> How big a deal do we think test coverage is?  It looks like
> ExecReScanGatherMerge is identical logic to ExecReScanGather,
> which *is* covered according to coverage.postgresql.org, but
> it wouldn't be too surprising if they diverge in future.
>
> I should think it wouldn't be that expensive to create a test
> case, if you already have test cases that invoke GatherMerge.
> Adding a right join against a VALUES clause with a small number of
> entries, and a non-mergeable/hashable join clause, ought to do it.

I chatted with Amit about this -- he's planning to look into it.  I
assume we'll hear from him tomorrow about this, but for official
status update purposes I'll set a next-update date of one week from
today (August 23rd).

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [BUGS] [postgresql 10 beta3] unrecognized node type: 90

2017-08-15 Thread Tom Lane
Robert Haas  writes:
> On Tue, Aug 15, 2017 at 7:31 AM, Amit Kapila  wrote:
>> The attached patch fixes the issue for me.  I have locally verified
>> that gather merge gets executed in the rescan path.  I haven't added
>> a test case, since having a gather or gather merge node on the inner
>> side of a join can be time-consuming.  However, if you or others feel
>> it is important to have a test covering this code path, I can try to
>> produce one.

> Committed.

> I believe that between this commit and the test-coverage commit from
> Andres, this open item is reasonably well addressed.  If someone
> thinks more needs to be done, please specify.  Thanks.

How big a deal do we think test coverage is?  It looks like
ExecReScanGatherMerge is identical logic to ExecReScanGather,
which *is* covered according to coverage.postgresql.org, but
it wouldn't be too surprising if they diverge in future.

I should think it wouldn't be that expensive to create a test
case, if you already have test cases that invoke GatherMerge.
Adding a right join against a VALUES clause with a small number of
entries, and a non-mergeable/hashable join clause, ought to do it.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [BUGS] [postgresql 10 beta3] unrecognized node type: 90

2017-08-15 Thread Robert Haas
On Tue, Aug 15, 2017 at 7:31 AM, Amit Kapila  wrote:
> On Tue, Aug 15, 2017 at 8:22 AM, Noah Misch  wrote:
>> On Fri, Aug 11, 2017 at 11:59:14AM -0400, Tom Lane wrote:
>>> "Adam, Etienne (Nokia-TECH/Issy Les Moulineaux)"  
>>> writes:
>>> > ERROR:  XX000: unrecognized node type: 90
>>> > LOCATION:  ExecReScan, execAmi.c:284
>>>
>>> (gdb) p (NodeTag) 90
>>> $1 = T_GatherMergeState
>>>
>>> So, apparently somebody wrote ExecReScanGatherMerge, but never bothered
>>> to plug it into ExecReScan.
>
> The attached patch fixes the issue for me.  I have locally verified
> that gather merge gets executed in the rescan path.  I haven't added
> a test case, since having a gather or gather merge node on the inner
> side of a join can be time-consuming.  However, if you or others feel
> it is important to have a test covering this code path, I can try to
> produce one.

Committed.

I believe that between this commit and the test-coverage commit from
Andres, this open item is reasonably well addressed.  If someone
thinks more needs to be done, please specify.  Thanks.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [BUGS] [postgresql 10 beta3] unrecognized node type: 90

2017-08-15 Thread Amit Kapila
On Tue, Aug 15, 2017 at 8:22 AM, Noah Misch  wrote:
> On Fri, Aug 11, 2017 at 11:59:14AM -0400, Tom Lane wrote:
>> "Adam, Etienne (Nokia-TECH/Issy Les Moulineaux)"  
>> writes:
>> > ERROR:  XX000: unrecognized node type: 90
>> > LOCATION:  ExecReScan, execAmi.c:284
>>
>> (gdb) p (NodeTag) 90
>> $1 = T_GatherMergeState
>>
>> So, apparently somebody wrote ExecReScanGatherMerge, but never bothered
>> to plug it into ExecReScan.

The attached patch fixes the issue for me.  I have locally verified
that gather merge gets executed in the rescan path.  I haven't added a
test case, since having a gather or gather merge node on the inner side
of a join can be time-consuming.  However, if you or others feel it is
important to have a test covering this code path, I can try to produce
one.
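
For reference, the substance of the fix is just adding the missing
dispatch case to the switch in ExecReScan() in execAmi.c; roughly (a
sketch of the shape of the change, not the patch itself):

    switch (nodeTag(node))
    {
        /* ... existing cases ... */

        case T_GatherState:
            ExecReScanGather((GatherState *) node);
            break;

        case T_GatherMergeState:    /* previously missing, hence "node type: 90" */
            ExecReScanGatherMerge((GatherMergeState *) node);
            break;

        /* ... */

        default:
            elog(ERROR, "unrecognized node type: %d", (int) nodeTag(node));
            break;
    }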

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


execrescan_gathermerge_v1.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers