Re: Oddity with parallel safety test for scan/join target in grouping_planner

2019-03-12 Thread Etsuro Fujita

(2019/03/11 23:46), Tom Lane wrote:

So this is
just a plan-quality problem not a wrong-answer problem.

However, I'd still argue for back-patching into v11, on the grounds
that this is a regression from v10.  The example you just gave does
produce the desired plan in v10, and I think it's more likely that
people would complain about a regression from v10 than that they'd
be unhappy because we changed it between 11.2 and 11.3.


Agreed.  I committed the patch to v11 and HEAD.  Thanks for reviewing!

Best regards,
Etsuro Fujita




Re: Oddity with parallel safety test for scan/join target in grouping_planner

2019-03-11 Thread Tom Lane
Etsuro Fujita  writes:
> (2019/03/11 14:14), Tom Lane wrote:
>> Seems to me it's the other way around: the final target would include
>> all functions invoked in the grouping target plus maybe some more.
>> So a non-parallel-safe grouping target implies a non-parallel-safe
>> final target, but not vice versa.

> I mean the final *scan/join* target, not the final target.

Oh, of course.  Yup, I was too tired last night :-(.  So this is
just a plan-quality problem not a wrong-answer problem.

However, I'd still argue for back-patching into v11, on the grounds
that this is a regression from v10.  The example you just gave does
produce the desired plan in v10, and I think it's more likely that
people would complain about a regression from v10 than that they'd
be unhappy because we changed it between 11.2 and 11.3.

regards, tom lane



Re: Oddity with parallel safety test for scan/join target in grouping_planner

2019-03-10 Thread Etsuro Fujita

(2019/03/11 14:14), Tom Lane wrote:

Etsuro Fujita  writes:

(2019/03/11 13:06), Tom Lane wrote:

Is that the only possible outcome?  Per Robert's summary quoted above,
it seems like it might be possible for the code to decide that the final
scan/join target to be parallel safe when it is not, leading to outright
wrong answers or query failures.



Maybe I'm missing something, but I think that if the final scan/join
target is not parallel-safe, then the grouping target would not be
parallel-safe either, by the construction of the two targets, so I don't
think that we have such a correctness issue.


Seems to me it's the other way around: the final target would include
all functions invoked in the grouping target plus maybe some more.
So a non-parallel-safe grouping target implies a non-parallel-safe
final target, but not vice versa.


I mean the final *scan/join* target, not the final target.

Best regards,
Etsuro Fujita




Re: Oddity with parallel safety test for scan/join target in grouping_planner

2019-03-10 Thread Tom Lane
Etsuro Fujita  writes:
> (2019/03/11 13:06), Tom Lane wrote:
>> Is that the only possible outcome?  Per Robert's summary quoted above,
>> it seems like it might be possible for the code to decide that the final
>> scan/join target to be parallel safe when it is not, leading to outright
>> wrong answers or query failures.

> Maybe I'm missing something, but I think that if the final scan/join 
> target is not parallel-safe, then the grouping target would not be 
> parallel-safe either, by the construction of the two targets, so I don't 
> think that we have such a correctness issue.

Seems to me it's the other way around: the final target would include
all functions invoked in the grouping target plus maybe some more.
So a non-parallel-safe grouping target implies a non-parallel-safe
final target, but not vice versa.

Possibly the summary had it backwards and the actual code bug is
inferring things in the safe direction, but I'm too tired to double
check that right now ...

regards, tom lane



Re: Oddity with parallel safety test for scan/join target in grouping_planner

2019-03-10 Thread Etsuro Fujita

(2019/03/11 13:06), Tom Lane wrote:

Etsuro Fujita  writes:

The parallel safety of the final scan/join target is determined from the
grouping target, not that target, which [ is wrong ]



This would only affect plan quality a little bit, so I don't think we
need a regression test for this, either, but the fix might destabilize
existing plan choices, so we should apply it to HEAD only?


Is that the only possible outcome?  Per Robert's summary quoted above,
it seems like it might be possible for the code to decide that the final
scan/join target to be parallel safe when it is not, leading to outright
wrong answers or query failures.


Maybe I'm missing something, but I think that if the final scan/join 
target is not parallel-safe, then the grouping target would not be 
parallel-safe either, by the construction of the two targets, so I don't 
think that we have such a correctness issue.


Best regards,
Etsuro Fujita




Re: Oddity with parallel safety test for scan/join target in grouping_planner

2019-03-10 Thread Tom Lane
Etsuro Fujita  writes:
>>> The parallel safety of the final scan/join target is determined from the
>>> grouping target, not that target, which [ is wrong ]

> This would only affect plan quality a little bit, so I don't think we 
> need a regression test for this, either, but the fix might destabilize 
> existing plan choices, so we should apply it to HEAD only?

Is that the only possible outcome?  Per Robert's summary quoted above,
it seems like it might be possible for the code to decide that the final
scan/join target to be parallel safe when it is not, leading to outright
wrong answers or query failures.  If the wrong answer can only be wrong
in the safe direction, it's not very clear why.

regards, tom lane



Re: Oddity with parallel safety test for scan/join target in grouping_planner

2019-03-10 Thread Etsuro Fujita

(2019/03/09 5:36), Tom Lane wrote:

Etsuro Fujita  writes:

(2019/02/28 0:52), Robert Haas wrote:

On Tue, Feb 26, 2019 at 7:26 AM Etsuro Fujita
   wrote:

The parallel safety of the final scan/join target is determined from the
grouping target, not that target, which [ is wrong ]



Your patch looks right to me.



I think it would be better for you to take this one.  Could you?


I concur with Robert that this is a brown-paper-bag-grade bug in
960df2a97.  Please feel free to push (and don't forget to back-patch).


OK, will do.


The only really interesting question is whether it's worth adding
a regression test for.  I doubt it; the issue seems much too narrow.
Usually the point of a regression test is to prevent re-introduction
of the same/similar bug, but what class of bugs would you argue
we'd be protecting against?


Plan degradation; without the fix, we would have this on data created by 
"pgbench -i -s 10 postgres", as shown in [1]:


postgres=# set parallel_setup_cost = 10;
postgres=# set parallel_tuple_cost = 0.001;

postgres=# explain verbose select aid+bid, sum(abalance), random() from 
pgbench_accounts group by aid+bid;

 QUERY PLAN


 GroupAggregate  (cost=137403.01..159903.01 rows=100 width=20)
   Output: ((aid + bid)), sum(abalance), random()
   Group Key: ((pgbench_accounts.aid + pgbench_accounts.bid))
   ->  Sort  (cost=137403.01..139903.01 rows=100 width=8)
 Output: ((aid + bid)), abalance
 Sort Key: ((pgbench_accounts.aid + pgbench_accounts.bid))
 ->  Gather  (cost=10.00..24070.67 rows=100 width=8)
-->Output: (aid + bid), abalance
   Workers Planned: 2
   ->  Parallel Seq Scan on public.pgbench_accounts 
(cost=0.00..20560.67 rows=416667 width=12)

 Output: aid, bid, abalance
(11 rows)

The final scan/join target {(aid + bid), abalance} would be parallel 
safe, but in the plan the target is not parallelized across workers. 
The reason for that is because the parallel-safety of the target is 
assessed incorrectly using the grouping target {((aid + bid)), 
sum(abalance), random()}, which would not be parallel safe.  By the fix 
we would have this:


postgres=# explain verbose select aid+bid, sum(abalance), random() from 
pgbench_accounts group by aid+bid;

QUERY PLAN

---
 GroupAggregate  (cost=135944.68..158444.68 rows=100 width=20)
   Output: ((aid + bid)), sum(abalance), random()
   Group Key: ((pgbench_accounts.aid + pgbench_accounts.bid))
   ->  Sort  (cost=135944.68..138444.68 rows=100 width=8)
 Output: ((aid + bid)), abalance
 Sort Key: ((pgbench_accounts.aid + pgbench_accounts.bid))
 ->  Gather  (cost=10.00..22612.33 rows=100 width=8)
   Output: ((aid + bid)), abalance
   Workers Planned: 2
   ->  Parallel Seq Scan on public.pgbench_accounts 
(cost=0.00..21602.33 rows=416667 width=8)

-->  Output: (aid + bid), abalance
(11 rows)

Note that the final scan/join target is parallelized in the plan.

This would only affect plan quality a little bit, so I don't think we 
need a regression test for this, either, but the fix might destabilize 
existing plan choices, so we should apply it to HEAD only?


Best regards,
Etsuro Fujita




Re: Oddity with parallel safety test for scan/join target in grouping_planner

2019-03-08 Thread Tom Lane
Etsuro Fujita  writes:
> (2019/02/28 0:52), Robert Haas wrote:
>> On Tue, Feb 26, 2019 at 7:26 AM Etsuro Fujita
>>   wrote:
>>> The parallel safety of the final scan/join target is determined from the
>>> grouping target, not that target, which [ is wrong ]

>> Your patch looks right to me.

> I think it would be better for you to take this one.  Could you?

I concur with Robert that this is a brown-paper-bag-grade bug in
960df2a97.  Please feel free to push (and don't forget to back-patch).

The only really interesting question is whether it's worth adding
a regression test for.  I doubt it; the issue seems much too narrow.
Usually the point of a regression test is to prevent re-introduction
of the same/similar bug, but what class of bugs would you argue
we'd be protecting against?

regards, tom lane



Re: Oddity with parallel safety test for scan/join target in grouping_planner

2019-03-01 Thread Etsuro Fujita

(2019/02/28 0:52), Robert Haas wrote:

On Tue, Feb 26, 2019 at 7:26 AM Etsuro Fujita
  wrote:

The parallel safety of the final scan/join target is determined from the
grouping target, not that target, which [ is wrong ]



Your patch looks right to me.


I think it would be better for you to take this one.  Could you?

Best regards,
Etsuro Fujita




Re: Oddity with parallel safety test for scan/join target in grouping_planner

2019-02-27 Thread Robert Haas
On Tue, Feb 26, 2019 at 7:26 AM Etsuro Fujita
 wrote:
> The parallel safety of the final scan/join target is determined from the
> grouping target, not that target, which [ is wrong ]

OOPS.  That's pretty embarrassing.

Your patch looks right to me.  I will now go look for a bag to put over my head.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Oddity with parallel safety test for scan/join target in grouping_planner

2019-02-27 Thread Etsuro Fujita
(2019/02/26 21:25), Etsuro Fujita wrote:
> While working on [1], I noticed $Subject, that is:
> 
>  /*
>   * If we have grouping or aggregation to do, the topmost scan/join
>   * plan node must emit what the grouping step wants; otherwise, it
>   * should emit grouping_target.
>   */
>  have_grouping = (parse->groupClause || parse->groupingSets ||
>   parse->hasAggs || root->hasHavingQual);
>  if (have_grouping)
>  {
>  scanjoin_target = make_group_input_target(root, final_target);
> -->  scanjoin_target_parallel_safe =
>  is_parallel_safe(root, (Node *) grouping_target->exprs);
>  }
>  else
>  {
>  scanjoin_target = grouping_target;
>  scanjoin_target_parallel_safe = grouping_target_parallel_safe;
>  }
> 
> The parallel safety of the final scan/join target is determined from the
> grouping target, not that target, which would cause to generate inferior
> parallel plans as shown below:
> 
> pgbench=# explain verbose select aid+bid, sum(abalance), random() from
> pgbench_accounts group by aid+bid;
>   QUERY PLAN
> 
>   GroupAggregate  (cost=137403.01..159903.01 rows=100 width=20)
> Output: ((aid + bid)), sum(abalance), random()
> Group Key: ((pgbench_accounts.aid + pgbench_accounts.bid))
> ->   Sort  (cost=137403.01..139903.01 rows=100 width=8)
>   Output: ((aid + bid)), abalance
>   Sort Key: ((pgbench_accounts.aid + pgbench_accounts.bid))
>   ->   Gather  (cost=10.00..24070.67 rows=100 width=8)
> Output: (aid + bid), abalance
> Workers Planned: 2
> ->   Parallel Seq Scan on public.pgbench_accounts
> (cost=0.00..20560.67 rows=416667 width=12)
>   Output: aid, bid, abalance
> (11 rows)
> 
> The final scan/join target {(aid + bid), abalance} is definitely
> parallel safe, but the target evaluation isn't parallelized across
> workers, which is not good.  Attached is a patch for fixing this.

I added this to the upcoming commitfest.

Best regards,
Etsuro Fujita




Oddity with parallel safety test for scan/join target in grouping_planner

2019-02-26 Thread Etsuro Fujita
Hi hackers,

While working on [1], I noticed $Subject, that is:

/*
 * If we have grouping or aggregation to do, the topmost scan/join
 * plan node must emit what the grouping step wants; otherwise, it
 * should emit grouping_target.
 */
have_grouping = (parse->groupClause || parse->groupingSets ||
 parse->hasAggs || root->hasHavingQual);
if (have_grouping)
{
scanjoin_target = make_group_input_target(root, final_target);
--> scanjoin_target_parallel_safe =
is_parallel_safe(root, (Node *) grouping_target->exprs);
}
else
{
scanjoin_target = grouping_target;
scanjoin_target_parallel_safe = grouping_target_parallel_safe;
}

The parallel safety of the final scan/join target is determined from the
grouping target, not that target, which would cause to generate inferior
parallel plans as shown below:

pgbench=# explain verbose select aid+bid, sum(abalance), random() from
pgbench_accounts group by aid+bid;
 QUERY PLAN

 GroupAggregate  (cost=137403.01..159903.01 rows=100 width=20)
   Output: ((aid + bid)), sum(abalance), random()
   Group Key: ((pgbench_accounts.aid + pgbench_accounts.bid))
   ->  Sort  (cost=137403.01..139903.01 rows=100 width=8)
 Output: ((aid + bid)), abalance
 Sort Key: ((pgbench_accounts.aid + pgbench_accounts.bid))
 ->  Gather  (cost=10.00..24070.67 rows=100 width=8)
   Output: (aid + bid), abalance
   Workers Planned: 2
   ->  Parallel Seq Scan on public.pgbench_accounts
(cost=0.00..20560.67 rows=416667 width=12)
 Output: aid, bid, abalance
(11 rows)

The final scan/join target {(aid + bid), abalance} is definitely
parallel safe, but the target evaluation isn't parallelized across
workers, which is not good.  Attached is a patch for fixing this.

Best regards,
Etsuro Fujita

[1] https://commitfest.postgresql.org/22/1950/
*** a/src/backend/optimizer/plan/planner.c
--- b/src/backend/optimizer/plan/planner.c
***
*** 1989,1995  grouping_planner(PlannerInfo *root, bool inheritance_update,
  		{
  			scanjoin_target = make_group_input_target(root, final_target);
  			scanjoin_target_parallel_safe =
! is_parallel_safe(root, (Node *) grouping_target->exprs);
  		}
  		else
  		{
--- 1989,1995 
  		{
  			scanjoin_target = make_group_input_target(root, final_target);
  			scanjoin_target_parallel_safe =
! is_parallel_safe(root, (Node *) scanjoin_target->exprs);
  		}
  		else
  		{