Re: [HACKERS] [COMMITTERS] pgsql: Simplify plpgsql's check for simple expressions.

2017-08-16 Thread Tom Lane
Robert Haas  writes:
> You want to push something, or should I do it?

Go for it.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [COMMITTERS] pgsql: Simplify plpgsql's check for simple expressions.

2017-08-16 Thread Robert Haas
On Wed, Aug 16, 2017 at 12:49 PM, Tom Lane  wrote:
> Robert Haas  writes:
>> On Wed, Aug 16, 2017 at 11:56 AM, Tom Lane  wrote:
>>> I can get on board with that statement.  Can you draft a better wording?
>
>> Here is an attempt.  Feel free to edit.
>
> I think s/plan/query/ in the last bit would be better.  Perhaps
>
> +* However, if force_parallel_mode = on or force_parallel_mode = 
> regress,
> +* then we impose parallel mode whenever it's safe to do so, even if 
> the
> +* final plan doesn't use parallelism.  It's not safe to do so if the 
> query
> +* contains anything parallel-unsafe; parallelModeOK will be false in 
> that
> +* case.  Otherwise, everything in the query is either parallel-safe 
> or
> +* parallel-restricted, and in either case it should be OK to impose
> +* parallel-mode restrictions.  If that ends up breaking something, 
> then
> +* either some function the user included in the query is incorrectly
> +* labelled as parallel-safe or parallel-restricted when in reality 
> it's
> +* parallel-unsafe, or else the query planner itself has a bug.
>  */

Works for me.  I'm happy to phrase this in any way that makes it clear
to you, 'cuz it's already clear to me.  :-)

You want to push something, or should I do it?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [COMMITTERS] pgsql: Simplify plpgsql's check for simple expressions.

2017-08-16 Thread Tom Lane
Robert Haas  writes:
> On Wed, Aug 16, 2017 at 11:56 AM, Tom Lane  wrote:
>> I can get on board with that statement.  Can you draft a better wording?

> Here is an attempt.  Feel free to edit.

I think s/plan/query/ in the last bit would be better.  Perhaps

+* However, if force_parallel_mode = on or force_parallel_mode = 
regress,
+* then we impose parallel mode whenever it's safe to do so, even if the
+* final plan doesn't use parallelism.  It's not safe to do so if the 
query
+* contains anything parallel-unsafe; parallelModeOK will be false in 
that
+* case.  Otherwise, everything in the query is either parallel-safe or
+* parallel-restricted, and in either case it should be OK to impose
+* parallel-mode restrictions.  If that ends up breaking something, then
+* either some function the user included in the query is incorrectly
+* labelled as parallel-safe or parallel-restricted when in reality it's
+* parallel-unsafe, or else the query planner itself has a bug.
 */

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [COMMITTERS] pgsql: Simplify plpgsql's check for simple expressions.

2017-08-16 Thread Robert Haas
On Wed, Aug 16, 2017 at 11:56 AM, Tom Lane  wrote:
>> So I still don't see what's wrong here, other than that the comment is
>> evidently not half clear enough.
>
> I can get on board with that statement.  Can you draft a better wording?

Here is an attempt.  Feel free to edit.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


clarify-force-parallel.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [COMMITTERS] pgsql: Simplify plpgsql's check for simple expressions.

2017-08-16 Thread Tom Lane
Robert Haas  writes:
> On Wed, Aug 16, 2017 at 11:03 AM, Tom Lane  wrote:
>> Well, it has subplans, so formally I think it's restricted not unsafe
>> --- but the parallel_safe marking on constructed paths/plans is only
>> safe/not-safe, not a three-way.

> True, but when parallel_safe it not set, that means it's not
> parallel-safe, so either parallel-restricted or parallel-unsafe.  But
> if parallelModeOK is true, then it had better be parallel-restricted,
> not parallel-unsafe.

Ah, I see.

> So I still don't see what's wrong here, other than that the comment is
> evidently not half clear enough.

I can get on board with that statement.  Can you draft a better wording?

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [COMMITTERS] pgsql: Simplify plpgsql's check for simple expressions.

2017-08-16 Thread Robert Haas
On Wed, Aug 16, 2017 at 11:03 AM, Tom Lane  wrote:
> Robert Haas  writes:
>> On Tue, Aug 15, 2017 at 6:40 PM, Tom Lane  wrote:
>>> (In fact, a quick look shows a counterexample: if we pick a MinMaxAgg
>>> path, that's parallel unsafe, but the original query might've been
>>> completely safe.)
>
>> I'm quite confused here.  What's parallel-unsafe about a MinMaxAgg?
>> There might be some reason why it's parallel-restricted, but it
>> shouldn't be parallel-unsafe.
>
> Well, it has subplans, so formally I think it's restricted not unsafe
> --- but the parallel_safe marking on constructed paths/plans is only
> safe/not-safe, not a three-way.

True, but when parallel_safe it not set, that means it's not
parallel-safe, so either parallel-restricted or parallel-unsafe.  But
if parallelModeOK is true, then it had better be parallel-restricted,
not parallel-unsafe.  Which in turn means that it had better be
perfectly safe to run the plan under
EnterParallelMode()/ExitParallelMode().  If it's not, then  UNION ALL  would blow up.

So I still don't see what's wrong here, other than that the comment is
evidently not half clear enough.  I don't think this is the first time
we've been over all this.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [COMMITTERS] pgsql: Simplify plpgsql's check for simple expressions.

2017-08-16 Thread Tom Lane
Robert Haas  writes:
> On Tue, Aug 15, 2017 at 6:40 PM, Tom Lane  wrote:
>> (In fact, a quick look shows a counterexample: if we pick a MinMaxAgg
>> path, that's parallel unsafe, but the original query might've been
>> completely safe.)

> I'm quite confused here.  What's parallel-unsafe about a MinMaxAgg?
> There might be some reason why it's parallel-restricted, but it
> shouldn't be parallel-unsafe.

Well, it has subplans, so formally I think it's restricted not unsafe
--- but the parallel_safe marking on constructed paths/plans is only
safe/not-safe, not a three-way.  By the time we get back up to
standard_planner() and are considering whether to plaster a Gather on
top, it doesn't really matter whether that subtree is unsafe or merely
restricted; either way, you can't send it to a worker.

> More generally, there is no way for parallelModeOK to go from true to
> false after it's initially set.

Sure, and it's not necessary, because that's just a plan-time flag
indicating whether it's worth the trouble to look for parallel plans.
It is not an indicator that we will or must end up choosing a parallel
plan.  parallelModeNeeded is a different animal: it's a planner output
(which AFAICS is never consulted within the planner, so there's no need
to set it early) telling the executor whether to do
EnterParallelMode/ExitParallelMode.

> If there were, it would be a bug,
> because we might plan some subquery thinking that parallelModeOK is
> true, use a Gather node, and then later plan some other subquery that
> changes to parallelModeOK from true to false, making the plan that's
> already written in stone no longer valid.  This is exactly why we have
> to have max_parallel_hazard() walk the ENTIRE query tree, including
> all subqueries, before we get started.

> Planning can obviously introduce elements into the query that prevent
> parallelism from being used for that part of the query, and the only
> thing there is to make sure that such things never make it into a
> partial path.  But it can't just decide that parallelism is no longer
> allowed *anywhere* in the query.

These statements are true, but none of them seem at all relevant to
my point.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [COMMITTERS] pgsql: Simplify plpgsql's check for simple expressions.

2017-08-15 Thread Robert Haas
On Tue, Aug 15, 2017 at 6:40 PM, Tom Lane  wrote:
> No.  The case that I'm concerned about is where the initial estimate
> of "parallelModeOK" is true, but the planner nevertheless selects
> a parallel-unsafe plan --- unsafe for some other reason than that it
> already has a Gather in it.  That would prevent the code further down
> in standard_planner from plastering a Gather on top, but we still end up
> labeling the output plan with parallelModeNeeded = true.
>
> Now, you might argue that there should be no case where that initial
> estimate has parallelModeOK = true and yet we end up with a
> parallel-unsafe plan.  I do not think I believe that; that estimate
> is supposed to be a conservative estimate, not ironclad exactness.
> (In fact, a quick look shows a counterexample: if we pick a MinMaxAgg
> path, that's parallel unsafe, but the original query might've been
> completely safe.)

I'm quite confused here.  What's parallel-unsafe about a MinMaxAgg?
There might be some reason why it's parallel-restricted, but it
shouldn't be parallel-unsafe.

More generally, there is no way for parallelModeOK to go from true to
false after it's initially set.  If there were, it would be a bug,
because we might plan some subquery thinking that parallelModeOK is
true, use a Gather node, and then later plan some other subquery that
changes to parallelModeOK from true to false, making the plan that's
already written in stone no longer valid.  This is exactly why we have
to have max_parallel_hazard() walk the ENTIRE query tree, including
all subqueries, before we get started.

Planning can obviously introduce elements into the query that prevent
parallelism from being used for that part of the query, and the only
thing there is to make sure that such things never make it into a
partial path.  But it can't just decide that parallelism is no longer
allowed *anywhere* in the query.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [COMMITTERS] pgsql: Simplify plpgsql's check for simple expressions.

2017-08-15 Thread Tom Lane
I wrote:
> As I said, I think this code is based on fuzzy thinking.

... or, more charitably, it may just date from before standard_planner's
plaster-a-Gather-on-top logic worked the way it does today.  But in
any case I think it's wrong now.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [COMMITTERS] pgsql: Simplify plpgsql's check for simple expressions.

2017-08-15 Thread Tom Lane
Robert Haas  writes:
> On Tue, Aug 15, 2017 at 5:15 PM, Tom Lane  wrote:
>> The idea of the existing
>> code seems to be "let's exercise what happens if the executor does
>> EnterParallelMode/ExitParallelMode around any plan whatsoever, even a
>> parallel-unsafe one"; which seems to me to be bogus as heck.

> I think that is not what the current code is doing.  If the plan is
> diagnosed as parallel-unsafe, then parallelModeOK will be false and
> nothing will happen.

No.  The case that I'm concerned about is where the initial estimate
of "parallelModeOK" is true, but the planner nevertheless selects
a parallel-unsafe plan --- unsafe for some other reason than that it
already has a Gather in it.  That would prevent the code further down
in standard_planner from plastering a Gather on top, but we still end up
labeling the output plan with parallelModeNeeded = true.

Now, you might argue that there should be no case where that initial
estimate has parallelModeOK = true and yet we end up with a
parallel-unsafe plan.  I do not think I believe that; that estimate
is supposed to be a conservative estimate, not ironclad exactness.
(In fact, a quick look shows a counterexample: if we pick a MinMaxAgg
path, that's parallel unsafe, but the original query might've been
completely safe.)

> If the plan is actually parallel-unsafe but the
> planner doesn't *think* it's parallel-unsafe, then what you are
> talking about will happen, but that seems to me to be a good thing.

No, you have that backwards.  If the planner incorrectly thinks the plan
is parallel-safe then it will forcibly put a Gather on top, and we'll mark
parallelModeNeeded = true due to the existing assignment where that is
done, and then we will detect the unsafety at runtime.  The case I'm
worried about is where the planner knows (correctly) that the selected
plan is parallel-unsafe, due to some choice made after the initial
parallelModeOK = true estimate.

> It lets you find planner bugs (or functions that a user has labelled
> incorrectly).

Don't believe this argument either.  Certainly we want to be able to
detect incorrectly-labeled-safe functions by turning on
force_parallel_mode; but that will happen anyway, since both the initial
parallelModeOK estimate and the final top_plan->parallel_safe flag will
reflect the false safety labeling.  (If there's something else in the plan
that makes it parallel-unsafe overall, then the test cannot reveal the
false function labeling anyway.)

As I said, I think this code is based on fuzzy thinking.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [COMMITTERS] pgsql: Simplify plpgsql's check for simple expressions.

2017-08-15 Thread Robert Haas
On Tue, Aug 15, 2017 at 5:15 PM, Tom Lane  wrote:
> The idea of the existing
> code seems to be "let's exercise what happens if the executor does
> EnterParallelMode/ExitParallelMode around any plan whatsoever, even a
> parallel-unsafe one"; which seems to me to be bogus as heck.

I think that is not what the current code is doing.  If the plan is
diagnosed as parallel-unsafe, then parallelModeOK will be false and
nothing will happen.  If the plan is actually parallel-unsafe but the
planner doesn't *think* it's parallel-unsafe, then what you are
talking about will happen, but that seems to me to be a good thing.
It lets you find planner bugs (or functions that a user has labelled
incorrectly).

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [COMMITTERS] pgsql: Simplify plpgsql's check for simple expressions.

2017-08-15 Thread Tom Lane
I wrote:
> I like your plan (2).  It's not much code and it lends itself to having a
> run-time check, rather than just an Assert, that we found a Result node.
> That seems like a good idea now that we've found the assumption isn't
> bulletproof.  However, do we need to worry about the planner putting some
> nontrivial computation into the Gather's tlist instead of the Result?

I concluded that it'd probably be enough to have an assertion that the
Gather's tlist is trivial, so I made it work that way.

However, while investigating the behavior of force_parallel_mode along
the way to this, I found that standard_planner() contains some fuzzy
thinking about how to set parallelModeNeeded.  It's not necessary or
(IMO) approriate to force that on just because of force_parallel_mode,
so I propose the attached patch, which deletes that stanza in favor of
initializing glob->parallelModeNeeded to just plain false.  The effect of
this will be that parallelModeNeeded is only set true if there's actually
a Gather (or GatherMerge) node somewhere in the plan.  The only case where
that differs from the existing behavior is if the initial checks conclude
that parallelModeOK can be turned on, but then we end up with a
parallel-unsafe plan for some reason or other.  The idea of the existing
code seems to be "let's exercise what happens if the executor does
EnterParallelMode/ExitParallelMode around any plan whatsoever, even a
parallel-unsafe one"; which seems to me to be bogus as heck.  If it failed,
we would not conclude that that was an executor bug.

So I think we should apply and back-patch the below.

regards, tom lane

diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 407df9a..e407b34 100644
*** a/src/backend/optimizer/plan/planner.c
--- b/src/backend/optimizer/plan/planner.c
*** standard_planner(Query *parse, int curso
*** 249,254 
--- 249,255 
  	glob->lastPlanNodeId = 0;
  	glob->transientPlan = false;
  	glob->dependsOnRole = false;
+ 	glob->parallelModeNeeded = false;
  
  	/*
  	 * Assess whether it's feasible to use parallel mode for this query. We
*** standard_planner(Query *parse, int curso
*** 290,307 
  		glob->parallelModeOK = false;
  	}
  
- 	/*
- 	 * glob->parallelModeNeeded should tell us whether it's necessary to
- 	 * impose the parallel mode restrictions, but we don't actually want to
- 	 * impose them unless we choose a parallel plan, so it is normally set
- 	 * only if a parallel plan is chosen (see create_gather_plan).  That way,
- 	 * people who mislabel their functions but don't use parallelism anyway
- 	 * aren't harmed.  But when force_parallel_mode is set, we enable the
- 	 * restrictions whenever possible for testing purposes.
- 	 */
- 	glob->parallelModeNeeded = glob->parallelModeOK &&
- 		(force_parallel_mode != FORCE_PARALLEL_OFF);
- 
  	/* Determine what fraction of the plan is likely to be scanned */
  	if (cursorOptions & CURSOR_OPT_FAST_PLAN)
  	{
--- 291,296 

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [COMMITTERS] pgsql: Simplify plpgsql's check for simple expressions.

2017-08-15 Thread Tom Lane
Robert Haas  writes:
> On Tue, Aug 15, 2017 at 2:34 PM, Tom Lane  wrote:
>> I assume (haven't looked) that I could hack the plpgsql code to prevent
>> generating a parallel plan when it's decided the command is a simple
>> SELECT.  But I wonder whether that's the right place to fix it.  Does
>> it ever make sense to parallelize a plan that can't possibly benefit?
>> IOW I am not sure that we should carry force_parallel_mode this far.

> Well, I think the point of force_parallel_mode is to try running
> things in workers when it's legal but not necessarily beneficial, so
> I'd be disinclined to nerf this too much, but it might be OK to nerf
> it a little bit.

Fair enough.

> Off-hand, I'd say the obvious options are:
> (1) teach exec_simple_check_plan() to consider everything non-simple
> when force_parallel_mode is not off, or
> (2) teach exec_save_simple_expr() to see through a Gather node to the
> Result node underneath

I had been looking into (3) get plpgsql to drop the CURSOR_OPT_PARALLEL_OK
bit when it decides the query is simple.  But because (a) that bit is set
before we make the test, in current behavior, and (b) there are various
SPI interfaces in the way of changing it later, that approach is looking
pretty messy.

I like your plan (2).  It's not much code and it lends itself to having a
run-time check, rather than just an Assert, that we found a Result node.
That seems like a good idea now that we've found the assumption isn't
bulletproof.  However, do we need to worry about the planner putting some
nontrivial computation into the Gather's tlist instead of the Result?

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [COMMITTERS] pgsql: Simplify plpgsql's check for simple expressions.

2017-08-15 Thread Robert Haas
On Tue, Aug 15, 2017 at 2:34 PM, Tom Lane  wrote:
> Tom Lane  writes:
>> Simplify plpgsql's check for simple expressions.
>> ...
>> https://git.postgresql.org/pg/commitdiff/00418c61244138bd8ac2de58076a1d0dd4f539f3
>
> The buildfarm members that are running force_parallel_mode = regress
> are not happy with this.  Apparently, even a trivial SELECT 
> can be turned into a Gather plan if force_parallel_mode says so.
>
> I assume (haven't looked) that I could hack the plpgsql code to prevent
> generating a parallel plan when it's decided the command is a simple
> SELECT.  But I wonder whether that's the right place to fix it.  Does
> it ever make sense to parallelize a plan that can't possibly benefit?
> IOW I am not sure that we should carry force_parallel_mode this far.

Well, I think the point of force_parallel_mode is to try running
things in workers when it's legal but not necessarily beneficial, so
I'd be disinclined to nerf this too much, but it might be OK to nerf
it a little bit.  Off-hand, I'd say the obvious options are:

(1) teach exec_simple_check_plan() to consider everything non-simple
when force_parallel_mode is not off, or
(2) teach exec_save_simple_expr() to see through a Gather node to the
Result node underneath

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [COMMITTERS] pgsql: Simplify plpgsql's check for simple expressions.

2017-08-15 Thread Tom Lane
Tom Lane  writes:
> Simplify plpgsql's check for simple expressions.
> ...
> https://git.postgresql.org/pg/commitdiff/00418c61244138bd8ac2de58076a1d0dd4f539f3

The buildfarm members that are running force_parallel_mode = regress
are not happy with this.  Apparently, even a trivial SELECT 
can be turned into a Gather plan if force_parallel_mode says so.

I assume (haven't looked) that I could hack the plpgsql code to prevent
generating a parallel plan when it's decided the command is a simple
SELECT.  But I wonder whether that's the right place to fix it.  Does
it ever make sense to parallelize a plan that can't possibly benefit?
IOW I am not sure that we should carry force_parallel_mode this far.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers