Re: [HACKERS] parallelize queries containing subplans

2017-02-14 Thread Amit Kapila
On Wed, Feb 15, 2017 at 4:38 AM, Robert Haas  wrote:
> On Tue, Feb 14, 2017 at 4:24 AM, Amit Kapila  wrote:
>> On further evaluation, it seems this patch has one big problem which
>> is that it will allow forming parallel plans which can't be supported
>> with current infrastructure.  For ex. marking immediate level params
>> as parallel safe can generate below type of plan:
>>
>> Seq Scan on t1
>>Filter: (SubPlan 1)
>>SubPlan 1
>>  ->  Gather
>>Workers Planned: 1
>>->  Result
>>  One-Time Filter: (t1.k = 0)
>>  ->  Parallel Seq Scan on t2
>>
>>
>> In this plan, we can't evaluate one-time filter (that contains
>> correlated param) unless we have the capability to pass all kind of
>> PARAM_EXEC param to workers.   I don't want to invest too much time in
>> this patch unless somebody can see some way using current parallel
>> infrastructure to implement correlated subplans.
>
> I don't think this approach has much chance of working; it just seems
> too simplistic.  I'm not entirely sure what the right approach is.
> Unfortunately, the current query planner code seems to compute the
> sets of parameters that are set and used quite late, and really only
> on a per-subquery level.
>

 Now just for the sake of discussion consider we have list of
allParams at the path level, then also I think it might not be easy to
make it work.

>  Here we need to know whether there is
> anything that's set below the Gather node and used above it, or the
> other way around, and we need to know it much earlier, while we're
> still doing path generation.
>

Yes, that is exactly the challenge.  I am not sure if currently there
is a way by which we can identify if a Param on a particular node
refers to node below it or above it.


> (There's also possible a couple of other cases, like an initPlan that
> needs to get executed only once, and also maybe a case where a
> parameter is set below the Gather and later used above the Gather.
> Not sure if that latter one happen, or how to deal with it.)
>

I think the case for initPlan is slightly different because we can
always evaluate it at Gather (considering it is an uncorrelated
initplan) and then pass it to Workers.  We generally have a list of
all the params at each plan node, so we can identify which of these
are initPlan params and then evaluate them. Now, it can be used
irrespective of whether it is used above or below the Gather node.
For the cases, where it can be used above Gather node, it will work as
we always store the computed value of such params in estate/econtext
and for the cases when it has to be used below Gather, we need to pass
the computed value to workers.  Now, there is some exceptions like for
few cases not all the params are available at a particular node, but I
feel those can be handled easily by either traversing the planstate
tree or by actually storing them at Gather node.  Actually, in short,
this is what is done in the patch proposed for parallizing initplans
[1].


[1] - https://commitfest.postgresql.org/13/997/

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] parallelize queries containing subplans

2017-02-14 Thread Amit Kapila
On Wed, Feb 15, 2017 at 4:46 AM, Robert Haas  wrote:
> On Mon, Feb 13, 2017 at 11:07 PM, Amit Kapila  wrote:
>> I have removed the check of AlternativeSubPlan so that it can be
>> handled by expression_tree_walker.
> ...
>> Attached patch fixes all the comments raised till now.
>
> Committed after removing the reference to AlternativeSubPlan from the
> comment.  I didn't see any need to mention that.
>

Okay, I was also of two minds while adding that line in the comment.
I had kept it because there are few places in the code where both
SubPlan and AlternativeSubPlan are handled together, so I thought the
person looking at this code should not wonder why we have not handled
AlternativeSubPlan. However, I think it is okay to remove it from
comment as there exist other places where only Subplan is handled and
we rely on expression tree walker for AlternativeSubPlans.


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] parallelize queries containing subplans

2017-02-14 Thread Robert Haas
On Mon, Feb 13, 2017 at 11:07 PM, Amit Kapila  wrote:
> I have removed the check of AlternativeSubPlan so that it can be
> handled by expression_tree_walker.
...
> Attached patch fixes all the comments raised till now.

Committed after removing the reference to AlternativeSubPlan from the
comment.  I didn't see any need to mention that.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] parallelize queries containing subplans

2017-02-14 Thread Robert Haas
On Tue, Feb 14, 2017 at 4:24 AM, Amit Kapila  wrote:
> On further evaluation, it seems this patch has one big problem which
> is that it will allow forming parallel plans which can't be supported
> with current infrastructure.  For ex. marking immediate level params
> as parallel safe can generate below type of plan:
>
> Seq Scan on t1
>Filter: (SubPlan 1)
>SubPlan 1
>  ->  Gather
>Workers Planned: 1
>->  Result
>  One-Time Filter: (t1.k = 0)
>  ->  Parallel Seq Scan on t2
>
>
> In this plan, we can't evaluate one-time filter (that contains
> correlated param) unless we have the capability to pass all kind of
> PARAM_EXEC param to workers.   I don't want to invest too much time in
> this patch unless somebody can see some way using current parallel
> infrastructure to implement correlated subplans.

I don't think this approach has much chance of working; it just seems
too simplistic.  I'm not entirely sure what the right approach is.
Unfortunately, the current query planner code seems to compute the
sets of parameters that are set and used quite late, and really only
on a per-subquery level.  Here we need to know whether there is
anything that's set below the Gather node and used above it, or the
other way around, and we need to know it much earlier, while we're
still doing path generation.  There doesn't seem to be any simple way
of getting that information, but I think you need it.

What's more, I think you would still need it even if you had the
ability to pass parameter values between processes.  For example,
consider:

Gather
-> Parallel Seq Scan
  Filter: (Correlated Subplan Reference Goes Here)

Of course, the Param in the filter condition *can't* be a shared Param
across all processes.  It needs to be private to each process
participating in the parallel sequential scan -- and the params
passing data down from the Parallel Seq Scan to the correlated subplan
also need to be private.  On the other hand, in your example quoted
above, you do need to share across processes: the value for t1.k needs
to get passed down.  So it seems to me that we somehow need to
identify, for each parameter that gets used, whether it's provided by
something beneath the Gather node (in which case it should be private
to the worker) or whether it's provided from higher up (in which case
it should be passed down to the worker, or if we can't do that, then
don't use parallelism there).

(There's also possible a couple of other cases, like an initPlan that
needs to get executed only once, and also maybe a case where a
parameter is set below the Gather and later used above the Gather.
Not sure if that latter one happen, or how to deal with it.)

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] parallelize queries containing subplans

2017-02-14 Thread Amit Kapila
On Tue, Jan 3, 2017 at 4:19 PM, Amit Kapila  wrote:
> On Wed, Dec 28, 2016 at 11:47 AM, Amit Kapila  wrote:
>>
>> Now, we can further extend this to parallelize queries containing
>> correlated subplans like below:
>>
>> explain select * from t1 where t1.i in (select t2.i from t2 where t2.i=t1.i);
>>  QUERY PLAN
>> -
>>  Seq Scan on t1  (cost=0.00..831049.09 rows=8395 width=12)
>>Filter: (SubPlan 1)
>>SubPlan 1
>>  ->  Seq Scan on t2  (cost=0.00..97.73 rows=493 width=4)
>>Filter: (i = t1.i)
>> (5 rows)
>>
>> In the above query, Filter on t2 (i = t1.i) generates Param node which
>> is a parallel-restricted node, so such queries won't be able to use
>> parallelism even with the patch.  I think we can mark Params which
>> refer to same level as parallel-safe and I think we have this
>> information (node-> varlevelsup/ phlevelsup/ agglevelsup) available
>> when we replace correlation vars (SS_replace_correlation_vars).
>>
>
> I have implemented the above idea which will allow same or immediate
> outer level PARAMS as parallel_safe.  The results of above query after
> patch:
>
> postgres=# explain select * from t1 where t1.i in (select t2.i from t2
> where t2.i=t1.i);
> QUERY PLAN
> --
>  Gather  (cost=0.00..49.88 rows=8395 width=12)
>Workers Planned: 1
>->  Parallel Seq Scan on t1  (cost=0.00..49.88 rows=4938 width=12)
>  Filter: (SubPlan 1)
>  SubPlan 1
>->  Seq Scan on t2  (cost=0.00..97.73 rows=493 width=4)
>  Filter: (i = t1.i)
> (7 rows)
>

On further evaluation, it seems this patch has one big problem which
is that it will allow forming parallel plans which can't be supported
with current infrastructure.  For ex. marking immediate level params
as parallel safe can generate below type of plan:

Seq Scan on t1
   Filter: (SubPlan 1)
   SubPlan 1
 ->  Gather
   Workers Planned: 1
   ->  Result
 One-Time Filter: (t1.k = 0)
 ->  Parallel Seq Scan on t2


In this plan, we can't evaluate one-time filter (that contains
correlated param) unless we have the capability to pass all kind of
PARAM_EXEC param to workers.   I don't want to invest too much time in
this patch unless somebody can see some way using current parallel
infrastructure to implement correlated subplans.

Note that still, the other patch [1] in this thread which implements
parallelism for uncorrelated subplan holds good.


[1] - 
https://www.postgresql.org/message-id/CAA4eK1J9mDZLcp-OskkdzAf_yT8W4dBSGL9E%3DkoEiJkdpVZsEA%40mail.gmail.com

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] parallelize queries containing subplans

2017-02-13 Thread Amit Kapila
On Thu, Feb 2, 2017 at 9:23 AM, Amit Kapila  wrote:
> On Wed, Feb 1, 2017 at 10:04 PM, Robert Haas  wrote:
>> On Tue, Jan 31, 2017 at 6:01 AM, Amit Kapila  wrote:
>>> Moved this patch to next CF.
>>
>> So here is what seems to be the key hunk from this patch:
>>
>>  /*
>> - * Since we don't have the ability to push subplans down to workers at
>> - * present, we treat subplan references as parallel-restricted.  We need
>> - * not worry about examining their contents; if they are unsafe, we 
>> would
>> - * have found that out while examining the whole tree before reduction 
>> of
>> - * sublinks to subplans.  (Really we should not see SubLink during a
>> - * max_interesting == restricted scan, but if we do, return true.)
>> + * We can push the subplans only if they don't contain any 
>> parallel-aware
>> + * node as we don't support multi-level parallelism (parallel workers
>> + * invoking another set of parallel workers).
>>   */
>> -else if (IsA(node, SubLink) ||
>> - IsA(node, SubPlan) ||
>> - IsA(node, AlternativeSubPlan))
>> +else if (IsA(node, SubPlan))
>> +return !((SubPlan *) node)->parallel_safe;
>> +else if (IsA(node, AlternativeSubPlan))
>>  {
>> -if (max_parallel_hazard_test(PROPARALLEL_RESTRICTED, context))
>> -return true;
>> +AlternativeSubPlan *asplan = (AlternativeSubPlan *) node;
>> +ListCell   *lc;
>> +
>> +foreach(lc, asplan->subplans)
>> +{
>> +SubPlan*splan = (SubPlan *) lfirst(lc);
>> +
>> +Assert(IsA(splan, SubPlan));
>> +
>> +if (max_parallel_hazard_walker((Node *) splan, context))
>> +return true;
>> +}
>> +
>> +return false;
>>  }
>>
>> I don't see the reason for having this new code that makes
>> AlternativeSubPlan walk over the subplans; expression_tree_walker
>> already does that.
>>

I have removed the check of AlternativeSubPlan so that it can be
handled by expression_tree_walker.

>
> It is done this way mainly to avoid recursion/nested calls, but I
> think you prefer to handle it via expression_tree_walker so that code
> looks clean.  Another probable reason could be that there is always a
> chance that we miss handling of some expression of a particular node
> (in this case AlternativeSubPlan), but if that is the case then there
> are other places in the code which do operate on individual subplan/'s
> in AlternativeSubPlan list.
>
>>  On the flip side I don't see the reason for
>> removing the max_parallel_hazard_test() call for AlternativeSubPlan or
>> for SubLink.
>
> If we don't remove the current test of max_parallel_hazard_test() for
> AlternativeSubPlan, then AlternativeSubPlan node will be considered
> parallel restricted, so why do we want that after this patch.  For
> Sublinks, I think they would have converted to Subplans by the time we
> reach this function for the parallel restricted check.  Can you
> elaborate what you have in mind for the handling of AlternativeSubPlan
> and SubLink?
>

I have removed the changes for SubLink node.

>> +* CTE scans are not consider for parallelism (cf
>>
>>
>> considered
>>

Fixed.

>> +   select count(*)from tenk1 where (two, four) not in
>>
>> whitespace

Fixed.


Attached patch fixes all the comments raised till now.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


pq_pushdown_subplan_v5.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] parallelize queries containing subplans

2017-02-01 Thread Amit Kapila
On Wed, Feb 1, 2017 at 10:04 PM, Robert Haas  wrote:
> On Tue, Jan 31, 2017 at 6:01 AM, Amit Kapila  wrote:
>> Moved this patch to next CF.
>
> So here is what seems to be the key hunk from this patch:
>
>  /*
> - * Since we don't have the ability to push subplans down to workers at
> - * present, we treat subplan references as parallel-restricted.  We need
> - * not worry about examining their contents; if they are unsafe, we would
> - * have found that out while examining the whole tree before reduction of
> - * sublinks to subplans.  (Really we should not see SubLink during a
> - * max_interesting == restricted scan, but if we do, return true.)
> + * We can push the subplans only if they don't contain any parallel-aware
> + * node as we don't support multi-level parallelism (parallel workers
> + * invoking another set of parallel workers).
>   */
> -else if (IsA(node, SubLink) ||
> - IsA(node, SubPlan) ||
> - IsA(node, AlternativeSubPlan))
> +else if (IsA(node, SubPlan))
> +return !((SubPlan *) node)->parallel_safe;
> +else if (IsA(node, AlternativeSubPlan))
>  {
> -if (max_parallel_hazard_test(PROPARALLEL_RESTRICTED, context))
> -return true;
> +AlternativeSubPlan *asplan = (AlternativeSubPlan *) node;
> +ListCell   *lc;
> +
> +foreach(lc, asplan->subplans)
> +{
> +SubPlan*splan = (SubPlan *) lfirst(lc);
> +
> +Assert(IsA(splan, SubPlan));
> +
> +if (max_parallel_hazard_walker((Node *) splan, context))
> +return true;
> +}
> +
> +return false;
>  }
>
> I don't see the reason for having this new code that makes
> AlternativeSubPlan walk over the subplans; expression_tree_walker
> already does that.
>

It is done this way mainly to avoid recursion/nested calls, but I
think you prefer to handle it via expression_tree_walker so that code
looks clean.  Another probable reason could be that there is always a
chance that we miss handling of some expression of a particular node
(in this case AlternativeSubPlan), but if that is the case then there
are other places in the code which do operate on individual subplan/'s
in AlternativeSubPlan list.

>  On the flip side I don't see the reason for
> removing the max_parallel_hazard_test() call for AlternativeSubPlan or
> for SubLink.

If we don't remove the current test of max_parallel_hazard_test() for
AlternativeSubPlan, then AlternativeSubPlan node will be considered
parallel restricted, so why do we want that after this patch.  For
Sublinks, I think they would have converted to Subplans by the time we
reach this function for the parallel restricted check.  Can you
elaborate what you have in mind for the handling of AlternativeSubPlan
and SubLink?


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] parallelize queries containing subplans

2017-02-01 Thread Robert Haas
On Tue, Jan 31, 2017 at 6:01 AM, Amit Kapila  wrote:
> Moved this patch to next CF.

So here is what seems to be the key hunk from this patch:

 /*
- * Since we don't have the ability to push subplans down to workers at
- * present, we treat subplan references as parallel-restricted.  We need
- * not worry about examining their contents; if they are unsafe, we would
- * have found that out while examining the whole tree before reduction of
- * sublinks to subplans.  (Really we should not see SubLink during a
- * max_interesting == restricted scan, but if we do, return true.)
+ * We can push the subplans only if they don't contain any parallel-aware
+ * node as we don't support multi-level parallelism (parallel workers
+ * invoking another set of parallel workers).
  */
-else if (IsA(node, SubLink) ||
- IsA(node, SubPlan) ||
- IsA(node, AlternativeSubPlan))
+else if (IsA(node, SubPlan))
+return !((SubPlan *) node)->parallel_safe;
+else if (IsA(node, AlternativeSubPlan))
 {
-if (max_parallel_hazard_test(PROPARALLEL_RESTRICTED, context))
-return true;
+AlternativeSubPlan *asplan = (AlternativeSubPlan *) node;
+ListCell   *lc;
+
+foreach(lc, asplan->subplans)
+{
+SubPlan*splan = (SubPlan *) lfirst(lc);
+
+Assert(IsA(splan, SubPlan));
+
+if (max_parallel_hazard_walker((Node *) splan, context))
+return true;
+}
+
+return false;
 }

I don't see the reason for having this new code that makes
AlternativeSubPlan walk over the subplans; expression_tree_walker
already does that.  On the flip side I don't see the reason for
removing the max_parallel_hazard_test() call for AlternativeSubPlan or
for SubLink.  I think that the core of this change is to change the
handling for SubPlan nodes to just return the flag from the subplan,
and that seems fine, but the rest of this I'm skeptical about.

+* CTE scans are not consider for parallelism (cf


considered

+   select count(*)from tenk1 where (two, four) not in

whitespace

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] parallelize queries containing subplans

2017-01-31 Thread Amit Kapila
On Tue, Jan 24, 2017 at 10:59 AM, Amit Kapila  wrote:
> On Thu, Jan 19, 2017 at 4:51 PM, Dilip Kumar  wrote:
>> On Thu, Jan 19, 2017 at 3:05 PM, Dilip Kumar  wrote:
>>> During debugging I found that subplan created for below part of the
>>> query is parallel_unsafe, Is it a problem or there is some explanation
>>> of why it's not parallel_safe,
>>
>> Okay, so basically we don't have any mechanism to perform parallel
>> scan on CTE. And, IMHO subplan built for CTE (using SS_process_ctes)
>> must come along with CTE scan. So I think we can avoid setting below
>> code because we will never be able to test its side effect, another
>> argument can be that if we don't consider the final effect, and just
>> see this subplan then by logic it should be marked parallel-safe or
>> unsafe as per it's path and it will not have any side effect, as it
>> will finally become parallel-unsafe. So it's your call to keep it
>> either way.
>>
>
> Yeah, actually setting parallel_safety information for subplan from
> corresponding is okay.  However, in this particular case as we know
> that it might not be of any use till we enable parallelism for CTE
> Scan (and doing that is certainly not essential for this project).
> So, I have set parallel_safe as false for CTE subplans.
>

Moved this patch to next CF.


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] parallelize queries containing subplans

2017-01-24 Thread Amit Kapila
On Tue, Jan 24, 2017 at 10:59 AM, Amit Kapila  wrote:
> On Thu, Jan 19, 2017 at 4:51 PM, Dilip Kumar  wrote:
>> On Thu, Jan 19, 2017 at 3:05 PM, Dilip Kumar  wrote:
>>> During debugging I found that subplan created for below part of the
>>> query is parallel_unsafe, Is it a problem or there is some explanation
>>> of why it's not parallel_safe,
>>
>> Okay, so basically we don't have any mechanism to perform parallel
>> scan on CTE. And, IMHO subplan built for CTE (using SS_process_ctes)
>> must come along with CTE scan. So I think we can avoid setting below
>> code because we will never be able to test its side effect, another
>> argument can be that if we don't consider the final effect, and just
>> see this subplan then by logic it should be marked parallel-safe or
>> unsafe as per it's path and it will not have any side effect, as it
>> will finally become parallel-unsafe. So it's your call to keep it
>> either way.
>>
>
> Yeah, actually setting parallel_safety information for subplan from
> corresponding is okay.
>

missed the word *path* in above sentence.

/corresponding/corresponding path

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] parallelize queries containing subplans

2017-01-23 Thread Amit Kapila
On Thu, Jan 19, 2017 at 4:51 PM, Dilip Kumar  wrote:
> On Thu, Jan 19, 2017 at 3:05 PM, Dilip Kumar  wrote:
>> During debugging I found that subplan created for below part of the
>> query is parallel_unsafe, Is it a problem or there is some explanation
>> of why it's not parallel_safe,
>
> Okay, so basically we don't have any mechanism to perform parallel
> scan on CTE. And, IMHO subplan built for CTE (using SS_process_ctes)
> must come along with CTE scan. So I think we can avoid setting below
> code because we will never be able to test its side effect, another
> argument can be that if we don't consider the final effect, and just
> see this subplan then by logic it should be marked parallel-safe or
> unsafe as per it's path and it will not have any side effect, as it
> will finally become parallel-unsafe. So it's your call to keep it
> either way.
>

Yeah, actually setting parallel_safety information for subplan from
corresponding is okay.  However, in this particular case as we know
that it might not be of any use till we enable parallelism for CTE
Scan (and doing that is certainly not essential for this project).
So, I have set parallel_safe as false for CTE subplans.


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


pq_pushdown_subplan_v4.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] parallelize queries containing subplans

2017-01-19 Thread Kuntal Ghosh
On Thu, Jan 19, 2017 at 4:51 PM, Dilip Kumar  wrote:
> On Thu, Jan 19, 2017 at 3:05 PM, Dilip Kumar  wrote:
>> During debugging I found that subplan created for below part of the
>> query is parallel_unsafe, Is it a problem or there is some explanation
>> of why it's not parallel_safe,
>
> Okay, so basically we don't have any mechanism to perform parallel
> scan on CTE. And, IMHO subplan built for CTE (using SS_process_ctes)
> must come along with CTE scan. So I think we can avoid setting below
> code because we will never be able to test its side effect, another
> argument can be that if we don't consider the final effect, and just
> see this subplan then by logic it should be marked parallel-safe or
> unsafe as per it's path and it will not have any side effect, as it
> will finally become parallel-unsafe. So it's your call to keep it
> either way.
Oops, you're right. We don't consider parallelism for RTE_CTE type.

-- 
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] parallelize queries containing subplans

2017-01-19 Thread Dilip Kumar
On Thu, Jan 19, 2017 at 3:05 PM, Dilip Kumar  wrote:
> During debugging I found that subplan created for below part of the
> query is parallel_unsafe, Is it a problem or there is some explanation
> of why it's not parallel_safe,

Okay, so basically we don't have any mechanism to perform parallel
scan on CTE. And, IMHO subplan built for CTE (using SS_process_ctes)
must come along with CTE scan. So I think we can avoid setting below
code because we will never be able to test its side effect, another
argument can be that if we don't consider the final effect, and just
see this subplan then by logic it should be marked parallel-safe or
unsafe as per it's path and it will not have any side effect, as it
will finally become parallel-unsafe. So it's your call to keep it
either way.


@@ -1213,6 +1216,7 @@ SS_process_ctes(PlannerInfo *root)
>firstColCollation);
  splan->useHashTable = false;
  splan->unknownEqFalse = false;
+ splan->parallel_safe = best_path->parallel_safe;

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] parallelize queries containing subplans

2017-01-19 Thread Kuntal Ghosh
On Thu, Jan 19, 2017 at 3:19 PM, Kuntal Ghosh
 wrote:
> Since Param is not parallel_safe till now, the SubPlan is also not
> parallel_safe. This is why CTE subplans will not be pushed under
> Gather.
Specifically, Params which are generated using generate_new_param()
are not parallel_safe. In the patch, it is marked as parallel-unsafe.


-- 
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] parallelize queries containing subplans

2017-01-19 Thread Kuntal Ghosh
On Thu, Jan 19, 2017 at 3:05 PM, Dilip Kumar  wrote:
> @@ -1213,6 +1216,7 @@ SS_process_ctes(PlannerInfo *root)
> >firstColCollation);
>   splan->useHashTable = false;
>   splan->unknownEqFalse = false;
> + splan->parallel_safe = best_path->parallel_safe;
>
> I noticed that if path for CTE is parallel safe then we are marking
> CTE subplan as parallel safe, In particular, I don't have any problem
> with that, but can you add some test case which can cover this path, I
> mean to say where CTE subplan are pushed.
>
> 
> I have tried to test the subplan with CTE below is my test.
> create table t1(a int , b varchar);
> create table t (n int, b varchar);
>
> Query:
> explain verbose select * from t where t.n not in (WITH RECURSIVE t(n) AS (
> VALUES (1)
>   UNION ALL
> SELECT a+1 FROM t1 WHERE a < 100
> )
> SELECT sum(n) FROM t);
>
> During debugging I found that subplan created for below part of the
> query is parallel_unsafe, Is it a problem or there is some explanation
> of why it's not parallel_safe,
>
> (WITH RECURSIVE t(n) AS (
> VALUES (1)
>   UNION ALL
> SELECT a+1 FROM t1 WHERE a < 100
> )
> SELECT sum(n) FROM t);
> --
The corresponding plan for the query you have specified is:

QUERY PLAN
--
 Seq Scan on public.t  (cost=40.73..20894.74 rows=500480 width=35)
   Output: t.n, t.b
   Filter: (NOT (hashed SubPlan 2))
   SubPlan 2
 ->  Aggregate  (cost=40.72..40.73 rows=1 width=8)
   Output: sum(t_1.n)
   CTE t
 ->  Append  (cost=0.00..31.18 rows=424 width=4)
   ->  Result  (cost=0.00..0.01 rows=1 width=4)
 Output: 1
   ->  Seq Scan on public.t1  (cost=0.00..26.93
rows=423 width=4)
 Output: (t1.a + 1)
 Filter: (t1.a < 100)
   ->  CTE Scan on t t_1  (cost=0.00..8.48 rows=424 width=4)
 Output: t_1.n
(15 rows)

Now, the plan for CTE is parallel_safe. But, a CTE plan is converted
to an InitPlan and returns a Param which is used in the CTE Scan.
Since Param is not parallel_safe till now, the SubPlan is also not
parallel_safe. This is why CTE subplans will not be pushed under
Gather.

-- 
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] parallelize queries containing subplans

2017-01-19 Thread Dilip Kumar
On Mon, Jan 16, 2017 at 9:13 PM, Amit Kapila  wrote:
>
> After commit-ab1f0c8, this patch needs a rebase.  Attached find
> rebased version of the patch.
>
> Thanks to Kuntal for informing me offlist that this patch needs rebase.

In this patch, I have observed some changes while creating subplan for
CTE. So I have reviewed this from that perspective and also tried to
perform some test.

Here is my finding/ comments.

@@ -1213,6 +1216,7 @@ SS_process_ctes(PlannerInfo *root)
>firstColCollation);
  splan->useHashTable = false;
  splan->unknownEqFalse = false;
+ splan->parallel_safe = best_path->parallel_safe;

I noticed that if path for CTE is parallel safe then we are marking
CTE subplan as parallel safe, In particular, I don't have any problem
with that, but can you add some test case which can cover this path, I
mean to say where CTE subplan are pushed.


I have tried to test the subplan with CTE below is my test.
create table t1(a int , b varchar);
create table t (n int, b varchar);

Query:
explain verbose select * from t where t.n not in (WITH RECURSIVE t(n) AS (
VALUES (1)
  UNION ALL
SELECT a+1 FROM t1 WHERE a < 100
)
SELECT sum(n) FROM t);

During debugging I found that subplan created for below part of the
query is parallel_unsafe, Is it a problem or there is some explanation
of why it's not parallel_safe,

(WITH RECURSIVE t(n) AS (
VALUES (1)
  UNION ALL
SELECT a+1 FROM t1 WHERE a < 100
)
SELECT sum(n) FROM t);
--


-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] parallelize queries containing subplans

2017-01-16 Thread Amit Kapila
On Thu, Jan 12, 2017 at 7:56 PM, Amit Kapila  wrote:
> On Thu, Jan 12, 2017 at 8:51 AM, Robert Haas  wrote:
>> On Wed, Jan 11, 2017 at 9:58 PM, Amit Kapila  wrote:
>>> The other alternative is to remember this information in SubPlan.  We
>>> can retrieve parallel_safe information from best_path and can use it
>>> while generating SubPlan.  The main reason for storing it in the plan
>>> was to avoid explicitly passing parallel_safe information while
>>> generating SubPlan as plan was already available at that time.
>>> However, it seems there are only two places in code (refer
>>> build_subplan) where this information needs to be propagated.  Let me
>>> know if you prefer to remember the parallel_safe information in
>>> SubPlan instead of in Plan or if you have something else in mind?
>>
>> I think we should try doing it in the SubPlan, at least, and see if
>> that comes out more elegant than what you have at the moment.
>>
>
> Okay, done that way.  I have fixed the review comments raised by Dilip
> as well and added the test case in attached patch.
>

After commit-ab1f0c8, this patch needs a rebase.  Attached find
rebased version of the patch.

Thanks to Kuntal for informing me offlist that this patch needs rebase.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


pq_pushdown_subplan_v3.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] parallelize queries containing subplans

2017-01-14 Thread Amit Kapila
On Fri, Jan 13, 2017 at 7:13 PM, Tom Lane  wrote:
> Dilip Kumar  writes:
>> ERROR:  did not find '}' at end of input node at character 762
>

I could reproduce this error with simple query like:
SELECT * FROM information_schema.role_usage_grants WHERE object_type
LIKE 'FOREIGN%' AND object_name IN ('s6', 'foo') ORDER BY 1, 2, 3, 4,
5;

The reason is that the stored rules have the old structure of Param.  See below:
{RELABELTYPE :arg {PARAM :paramkind 2 :paramid 1 :paramtype 11975
:paramtypmod -1 :paramcollid 100 :location -1}

The new variable parallel_safe is not present in above node.  If you
recreate the fresh database you won't see the above problem.  I have
observed a problem in equal function which I have fixed in the
attached patch.  I have also added a test, so that we can catch any
problem similar to what you have reported.

> I've not looked at the patches, but just seeing this error message,
> this looks like somebody's fat-fingered the correspondence between
> outfuncs.c and readfuncs.c processing.
>

I think what we need is catversion bump as Param is part of stored rules.


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


pq_pushdown_correl_subplan_v2.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] parallelize queries containing subplans

2017-01-13 Thread Tom Lane
Dilip Kumar  writes:
> ERROR:  did not find '}' at end of input node at character 762

I've not looked at the patches, but just seeing this error message,
this looks like somebody's fat-fingered the correspondence between
outfuncs.c and readfuncs.c processing.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] parallelize queries containing subplans

2017-01-13 Thread Dilip Kumar
On Thu, Jan 12, 2017 at 7:56 PM, Amit Kapila  wrote:
> Okay, done that way.  I have fixed the review comments raised by Dilip
> as well and added the test case in attached patch.

I have tested with pq_pushdown_subplan_v2.patch +
pq_pushdown_correl_subplan_v1.patch

sqlsmith. is reporting below error, I have attached the query to
reproduce the issue. This issue is only coming when both the patch is
applied, only with pq_pushdown_subplan_v2.patch there is no problem.
So I think the problem is because of
pq_pushdown_correl_subplan_v1.patch.

ERROR:  did not find '}' at end of input node at character 762

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
select  
  subq_2.c1 as c0, 
  25 as c1
from 
  (select  
pg_catalog.random() as c0, 
31 as c1, 
ref_3.prsend as c2, 
subq_1.c2 as c3
  from 
(select  
(select page from pg_catalog.pg_locks limit 1 offset 10)
 as c0, 
9 as c1, 
sample_1.opfname as c2, 
sample_1.opfnamespace as c3
  from 
pg_catalog.pg_opfamily as sample_1 tablesample 
bernoulli (5.4) ,
lateral (select  
  57 as c0, 
  7 as c1
from 
  pg_catalog.pg_amop as sample_2 tablesample system 
(8.7) 
where 54 > (select character_maximum_length from 
information_schema.element_types limit 1 offset 28)

limit 8) as subq_0
  where (((pg_catalog.pg_postmaster_start_time() < 
pg_catalog.pg_stat_get_snapshot_timestamp()) 
and (subq_0.c0 is not NULL)) 
  and (pg_catalog.pg_backend_pid() <> 30)) 
or (subq_0.c0 >= (select active_pid from 
pg_catalog.pg_replication_slots limit 1 offset 9)
)
  limit 167) as subq_1
  inner join pg_catalog.pg_ts_parser as ref_3
  on (subq_1.c2 = ref_3.prsname )
  where subq_1.c1 <= subq_1.c0
  limit 56) as subq_2,
  lateral (select  
subq_2.c0 as c0
  from 
pg_catalog.pg_statistic as sample_3 tablesample bernoulli (2.3) 
  where (EXISTS (
  select  
  pg_catalog.pg_trigger_depth() as c0, 
  sample_4.comments as c1
from 
  information_schema.sql_sizing_profiles as sample_4 
tablesample bernoulli (9.9) 
where (pg_catalog.statement_timestamp() is not NULL) 
  and (((select character_maximum_length from 
information_schema.domains limit 1 offset 40)
 < 7) 
or ((select ordinal_position from 
information_schema.parameters limit 1 offset 21)
 = sample_4.sizing_id) 
or (sample_4.sizing_id <> 
sample_4.required_value)) 
  or (sample_4.sizing_id <= 
pg_catalog.pg_stat_get_buf_alloc())) 
and 
((pg_catalog.pg_stat_get_bgwriter_buf_written_checkpoints() <> 
pg_catalog.gin_cmp_tslexeme(
  cast(pg_catalog.timeofday() as text),
  cast(null as text))) 
  and (sample_4.profile_id is NULL))) 
  or ((select ordinal_position from 
information_schema.attributes limit 1 offset 31)
 < sample_4.sizing_id))) 
  and (sample_4.sizing_id <= sample_4.sizing_id)) 
or (sample_4.sizing_id <> 40)) 
  and ((sample_4.required_value <> sample_4.sizing_id) 
and (false))) 
or (((pg_catalog.int8range_canonical(
cast(null as int8range)) is not NULL) 
and (sample_4.required_value >= 58)) 
  and (((select objsubid from pg_catalog.pg_seclabel 
limit 1 offset 1)
 is NULL) 
and (((select typtypmod from pg_catalog.pg_type 
limit 1 offset 32)
 <= (select character_maximum_length from 
information_schema.element_types limit 1 offset 33)
) 
  or (13 <= 9)
limit 67)) 
and ((27 <= (select sourceline from pg_catalog.pg_settings 
limit 1 offset 6)
) 
  or (sample_3.staattnum <= subq_2.c1))
  limit 52) as subq_3
where true;

-- 
Sent via pgsql-hackers mailing list 

Re: [HACKERS] parallelize queries containing subplans

2017-01-12 Thread Amit Kapila
On Thu, Jan 12, 2017 at 8:51 AM, Robert Haas  wrote:
> On Wed, Jan 11, 2017 at 9:58 PM, Amit Kapila  wrote:
>> The other alternative is to remember this information in SubPlan.  We
>> can retrieve parallel_safe information from best_path and can use it
>> while generating SubPlan.  The main reason for storing it in the plan
>> was to avoid explicitly passing parallel_safe information while
>> generating SubPlan as plan was already available at that time.
>> However, it seems there are only two places in code (refer
>> build_subplan) where this information needs to be propagated.  Let me
>> know if you prefer to remember the parallel_safe information in
>> SubPlan instead of in Plan or if you have something else in mind?
>
> I think we should try doing it in the SubPlan, at least, and see if
> that comes out more elegant than what you have at the moment.
>

Okay, done that way.  I have fixed the review comments raised by Dilip
as well and added the test case in attached patch.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


pq_pushdown_subplan_v2.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] parallelize queries containing subplans

2017-01-11 Thread Dilip Kumar
On Thu, Jan 12, 2017 at 9:22 AM, Amit Kapila  wrote:
> Valid point, but I think we can avoid that by returning false after
> foreach(..) loop.  I think one improvement could be that instead of
> manually checking the parallel safety of each subplan, we can
> recursively call max_parallel_hazard_walker for each subplan.

I agree that this way we can avoid the problem what I mentioned.
>
>
>> But, more than that it will be cleaner to not
>> handle AlternativeSubPlan here unless there is some strong reason?
>>
>
> Yeah, the reason is to avoid unnecessary recursion.  Also, in all
> other places in the code, we do handle both of them separately, refer
> cost_qual_eval_walker for somewhat similar usage.

ok.



-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] parallelize queries containing subplans

2017-01-11 Thread Amit Kapila
On Tue, Jan 10, 2017 at 11:55 AM, Dilip Kumar  wrote:
> On Wed, Dec 28, 2016 at 11:47 AM, Amit Kapila  wrote:
>> Attached patch implements the above idea.  This will enable
>> parallelism for queries containing un-correlated subplans, an example
>> of which is as follows:
>
> I have reviewed the patch (pq_pushdown_subplan_v1.patch), Mostly patch
> looks clean to me.
> I have also done some basic functional testing and it's working fine.
>
> I have one comment.
>
> + else if (IsA(node, AlternativeSubPlan))
> + {
> + AlternativeSubPlan *asplan = (AlternativeSubPlan *) node;
> + ListCell   *lc;
> +
> + Assert(context->root);
> +
> + foreach(lc, asplan->subplans)
> + {
> + SubPlan*splan = (SubPlan *) lfirst(lc);
> + Plan   *plan;
> +
> + Assert(IsA(splan, SubPlan));
> +
> + plan = planner_subplan_get_plan(context->root, splan);
> + if (!plan->parallel_safe)
> + return true;
> + }
>   }
>
> I see no reason why we need to process the subplan list of
> AlternativeSubPlan here, Anyway expression_tree_walker will take care
> of processing each subplan of AlternativeSubPlan. Now in the case
> where all the subplans in AlternativeSubPlan are parallel_safe we will
> process this list twice.
>

Valid point, but I think we can avoid that by returning false after
foreach(..) loop.  I think one improvement could be that instead of
manually checking the parallel safety of each subplan, we can
recursively call max_parallel_hazard_walker for each subplan.


> But, more than that it will be cleaner to not
> handle AlternativeSubPlan here unless there is some strong reason?
>

Yeah, the reason is to avoid unnecessary recursion.  Also, in all
other places in the code, we do handle both of them separately, refer
cost_qual_eval_walker for somewhat similar usage.


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] parallelize queries containing subplans

2017-01-11 Thread Robert Haas
On Wed, Jan 11, 2017 at 9:58 PM, Amit Kapila  wrote:
> The other alternative is to remember this information in SubPlan.  We
> can retrieve parallel_safe information from best_path and can use it
> while generating SubPlan.  The main reason for storing it in the plan
> was to avoid explicitly passing parallel_safe information while
> generating SubPlan as plan was already available at that time.
> However, it seems there are only two places in code (refer
> build_subplan) where this information needs to be propagated.  Let me
> know if you prefer to remember the parallel_safe information in
> SubPlan instead of in Plan or if you have something else in mind?

I think we should try doing it in the SubPlan, at least, and see if
that comes out more elegant than what you have at the moment.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] parallelize queries containing subplans

2017-01-11 Thread Amit Kapila
On Tue, Jan 10, 2017 at 10:55 PM, Robert Haas  wrote:
> On Wed, Dec 28, 2016 at 1:17 AM, Amit Kapila  wrote:
>> Currently, queries that have references to SubPlans or
>> AlternativeSubPlans are considered parallel-restricted.  I think we
>> can lift this restriction in many cases especially when SubPlans are
>> parallel-safe.  To make this work, we need to propagate the
>> parallel-safety information from path node to plan node and the same
>> could be easily done while creating a plan.  Another option could be
>> that instead of propagating parallel-safety information from path to
>> plan, we can find out from the plan if it is parallel-safe (doesn't
>> contain any parallel-aware node) by traversing whole plan tree, but I
>> think it is a waste of cycles.  Once we have parallel-safety
>> information in the plan, we can use that for detection of
>> parallel-safe expressions in max_parallel_hazard_walker().  Finally,
>> we can pass all the subplans to workers during plan serialization in
>> ExecSerializePlan().  This will enable workers to execute subplans
>> that are referred in parallel part of the plan.  Now, we might be able
>> to optimize it such that we pass only subplans that are referred in
>> parallel portion of plan, but I am not sure if it is worth the trouble
>> because it is one-time cost and much lesser than other things we do
>> (like creating
>> dsm, launching workers).
>
> It seems unfortunate to have to add a parallel_safe flag to the
> finished plan; the whole reason we have the Path-Plan distinction is
> so that we can throw away information that won't be needed at
> execution time.  The parallel_safe flag is, in fact, not needed at
> execution time, but just for further planning.  Isn't there some way
> that we can remember, at the time when a sublink is converted to a
> subplan, whether or not the subplan was created from a parallel-safe
> path?
>

The other alternative is to remember this information in SubPlan.  We
can retrieve parallel_safe information from best_path and can use it
while generating SubPlan.  The main reason for storing it in the plan
was to avoid explicitly passing parallel_safe information while
generating SubPlan as plan was already available at that time.
However, it seems there are only two places in code (refer
build_subplan) where this information needs to be propagated.  Let me
know if you prefer to remember the parallel_safe information in
SubPlan instead of in Plan or if you have something else in mind?


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] parallelize queries containing subplans

2017-01-10 Thread Robert Haas
On Wed, Dec 28, 2016 at 1:17 AM, Amit Kapila  wrote:
> Currently, queries that have references to SubPlans or
> AlternativeSubPlans are considered parallel-restricted.  I think we
> can lift this restriction in many cases especially when SubPlans are
> parallel-safe.  To make this work, we need to propagate the
> parallel-safety information from path node to plan node and the same
> could be easily done while creating a plan.  Another option could be
> that instead of propagating parallel-safety information from path to
> plan, we can find out from the plan if it is parallel-safe (doesn't
> contain any parallel-aware node) by traversing whole plan tree, but I
> think it is a waste of cycles.  Once we have parallel-safety
> information in the plan, we can use that for detection of
> parallel-safe expressions in max_parallel_hazard_walker().  Finally,
> we can pass all the subplans to workers during plan serialization in
> ExecSerializePlan().  This will enable workers to execute subplans
> that are referred in parallel part of the plan.  Now, we might be able
> to optimize it such that we pass only subplans that are referred in
> parallel portion of plan, but I am not sure if it is worth the trouble
> because it is one-time cost and much lesser than other things we do
> (like creating
> dsm, launching workers).

It seems unfortunate to have to add a parallel_safe flag to the
finished plan; the whole reason we have the Path-Plan distinction is
so that we can throw away information that won't be needed at
execution time.  The parallel_safe flag is, in fact, not needed at
execution time, but just for further planning.  Isn't there some way
that we can remember, at the time when a sublink is converted to a
subplan, whether or not the subplan was created from a parallel-safe
path?  That seems like it would be cleaner than maintaining this flag
for all plans.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] parallelize queries containing subplans

2017-01-09 Thread Dilip Kumar
On Wed, Dec 28, 2016 at 11:47 AM, Amit Kapila  wrote:
> Attached patch implements the above idea.  This will enable
> parallelism for queries containing un-correlated subplans, an example
> of which is as follows:

I have reviewed the patch (pq_pushdown_subplan_v1.patch), Mostly patch
looks clean to me.
I have also done some basic functional testing and it's working fine.

I have one comment.

+ else if (IsA(node, AlternativeSubPlan))
+ {
+ AlternativeSubPlan *asplan = (AlternativeSubPlan *) node;
+ ListCell   *lc;
+
+ Assert(context->root);
+
+ foreach(lc, asplan->subplans)
+ {
+ SubPlan*splan = (SubPlan *) lfirst(lc);
+ Plan   *plan;
+
+ Assert(IsA(splan, SubPlan));
+
+ plan = planner_subplan_get_plan(context->root, splan);
+ if (!plan->parallel_safe)
+ return true;
+ }
  }

I see no reason why we need to process the subplan list of
AlternativeSubPlan here, Anyway expression_tree_walker will take care
of processing each subplan of AlternativeSubPlan. Now in the case
where all the subplans in AlternativeSubPlan are parallel_safe we will
process this list twice. But, more than that it will be cleaner to not
handle AlternativeSubPlan here unless there is some strong reason?

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] parallelize queries containing subplans

2017-01-03 Thread Amit Kapila
On Wed, Dec 28, 2016 at 11:47 AM, Amit Kapila  wrote:
>
> Now, we can further extend this to parallelize queries containing
> correlated subplans like below:
>
> explain select * from t1 where t1.i in (select t2.i from t2 where t2.i=t1.i);
>  QUERY PLAN
> -
>  Seq Scan on t1  (cost=0.00..831049.09 rows=8395 width=12)
>Filter: (SubPlan 1)
>SubPlan 1
>  ->  Seq Scan on t2  (cost=0.00..97.73 rows=493 width=4)
>Filter: (i = t1.i)
> (5 rows)
>
> In the above query, Filter on t2 (i = t1.i) generates Param node which
> is a parallel-restricted node, so such queries won't be able to use
> parallelism even with the patch.  I think we can mark Params which
> refer to same level as parallel-safe and I think we have this
> information (node-> varlevelsup/ phlevelsup/ agglevelsup) available
> when we replace correlation vars (SS_replace_correlation_vars).
>

I have implemented the above idea which will allow same or immediate
outer level PARAMS as parallel_safe.  The results of above query after
patch:

postgres=# explain select * from t1 where t1.i in (select t2.i from t2
where t2.i=t1.i);
QUERY PLAN
--
 Gather  (cost=0.00..49.88 rows=8395 width=12)
   Workers Planned: 1
   ->  Parallel Seq Scan on t1  (cost=0.00..49.88 rows=4938 width=12)
 Filter: (SubPlan 1)
 SubPlan 1
   ->  Seq Scan on t2  (cost=0.00..97.73 rows=493 width=4)
 Filter: (i = t1.i)
(7 rows)

Note - This patch can be applied on top of
pq_pushdown_subplan_v1.patch posted upthread [1]

[1] - 
https://www.postgresql.org/message-id/CAA4eK1%2Be8Z45D2n%2BrnDMDYsVEb5iW7jqaCH_tvPMYau%3D1Rru9w%40mail.gmail.com

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


pq_pushdown_correl_subplan_v1.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] parallelize queries containing subplans

2016-12-28 Thread Rafia Sabih
On Wed, Dec 28, 2016 at 11:47 AM, Amit Kapila  wrote:
>
> Currently, queries that have references to SubPlans or
> AlternativeSubPlans are considered parallel-restricted.  I think we
> can lift this restriction in many cases especially when SubPlans are
> parallel-safe.  To make this work, we need to propagate the
> parallel-safety information from path node to plan node and the same
> could be easily done while creating a plan.  Another option could be
> that instead of propagating parallel-safety information from path to
> plan, we can find out from the plan if it is parallel-safe (doesn't
> contain any parallel-aware node) by traversing whole plan tree, but I
> think it is a waste of cycles.  Once we have parallel-safety
> information in the plan, we can use that for detection of
> parallel-safe expressions in max_parallel_hazard_walker().  Finally,
> we can pass all the subplans to workers during plan serialization in
> ExecSerializePlan().  This will enable workers to execute subplans
> that are referred in parallel part of the plan.  Now, we might be able
> to optimize it such that we pass only subplans that are referred in
> parallel portion of plan, but I am not sure if it is worth the trouble
> because it is one-time cost and much lesser than other things we do
> (like creating
> dsm, launching workers).
>
> Attached patch implements the above idea.  This will enable
> parallelism for queries containing un-correlated subplans, an example
> of which is as follows:
>
> set parallel_tuple_cost=0;
> set parallel_setup_cost=0;
> set min_parallel_relation_size=50;
>
> create table t1 (i int, j int, k int);
> create table t2 (i int, j int, k int);
>
> insert into t1 values (generate_series(1,10)*random(),
> generate_series(5,50)*random(),
> generate_series(8,80)*random());
> insert into t2 values (generate_series(4,10)*random(),
> generate_series(5,90)*random(),
> generate_series(2,10)*random());
>
>
> Plan without Patch
> -
> postgres=# explain select * from t1 where t1.i not in (select t2.i
> from t2 where t2.i in (1,2,3));
>   QUERY PLAN
> ---
>  Seq Scan on t1  (cost=110.84..411.72 rows=8395 width=12)
>Filter: (NOT (hashed SubPlan 1))
>SubPlan 1
>  ->  Seq Scan on t2  (cost=0.00..104.50 rows=2537 width=4)
>Filter: (i = ANY ('{1,2,3}'::integer[]))
> (5 rows)
>
> Plan with Patch
> 
> postgres=# explain select * from t1 where t1.i not in (select t2.i
> from t2 where t2.i in (1,2,3));
>QUERY PLAN
> -
>  Gather  (cost=110.84..325.30 rows=8395 width=12)
>Workers Planned: 1
>->  Parallel Seq Scan on t1  (cost=110.84..325.30 rows=4938 width=12)
>  Filter: (NOT (hashed SubPlan 1))
>  SubPlan 1
>->  Seq Scan on t2  (cost=0.00..104.50 rows=2537 width=4)
>  Filter: (i = ANY ('{1,2,3}'::integer[]))
> (7 rows)
>
> We have observed that Q-16 in TPC-H have been improved with the patch
> and the analysis of same will be shared by my colleague Rafia.
>
To study the effect of uncorrelated sub-plan pushdown on TPC-H and
TPC-DS benchmark queries we performed some experiments and the
execution time results for same are summarised as follows,

Query| HEAD | Patches | scale-factor
---+-+---+-
DS-Q45 | 35   | 19 | scale-factor: 100
H-Q16   | 812 | 645   | scale-factor: 300
H-Q16   | 49   | 37 | scale-factor: 20

Execution time given in above table is in seconds. Detailed analysis
of this experimentation is as follows,
Additional patches applied in this analysis are,
Parallel index scan [1]
Parallel index-only scan [2]
Parallel merge-join [3]
The system setup used for this experiment is,

Server parameter settings:
work_mem = 500 MB,
max_parallel_workers_per_gather = 4,
random_page_cost = seq_page_cost = 0.1 = parallel_tuple_cost,
shared_buffers = 1 GB
Machine used: IBM Power, 4 socket machine, 512 GB RAM

TPC-DS scale factor = 100 (approx size of database is 150 GB)

Query 45 which takes around 35 seconds on head, completes in 19
seconds with these patches. The point to note here is that without
this patch of pushing uncorrelated sublans, hash join which is using
subplan in join filter could not be pushed to workers and hence query
was unable to use the parallelism enough, with this patch parallelism
is available till the topmost join node. The output of explain analyse
statement of this query on both head and with patches is given in
attached file ds_q45.txt.

On further evaluating these patches on TPC-H queries on different
scale factors we came across query 16, for TPC-H scale factor 20 and
aforementioned parameter settings with the change of
max_parallel_workers_per gather = 2, it took 37 seconds with the

[HACKERS] parallelize queries containing subplans

2016-12-27 Thread Amit Kapila
Currently, queries that have references to SubPlans or
AlternativeSubPlans are considered parallel-restricted.  I think we
can lift this restriction in many cases especially when SubPlans are
parallel-safe.  To make this work, we need to propagate the
parallel-safety information from path node to plan node and the same
could be easily done while creating a plan.  Another option could be
that instead of propagating parallel-safety information from path to
plan, we can find out from the plan if it is parallel-safe (doesn't
contain any parallel-aware node) by traversing whole plan tree, but I
think it is a waste of cycles.  Once we have parallel-safety
information in the plan, we can use that for detection of
parallel-safe expressions in max_parallel_hazard_walker().  Finally,
we can pass all the subplans to workers during plan serialization in
ExecSerializePlan().  This will enable workers to execute subplans
that are referred in parallel part of the plan.  Now, we might be able
to optimize it such that we pass only subplans that are referred in
parallel portion of plan, but I am not sure if it is worth the trouble
because it is one-time cost and much lesser than other things we do
(like creating
dsm, launching workers).

Attached patch implements the above idea.  This will enable
parallelism for queries containing un-correlated subplans, an example
of which is as follows:

set parallel_tuple_cost=0;
set parallel_setup_cost=0;
set min_parallel_relation_size=50;

create table t1 (i int, j int, k int);
create table t2 (i int, j int, k int);

insert into t1 values (generate_series(1,10)*random(),
generate_series(5,50)*random(),
generate_series(8,80)*random());
insert into t2 values (generate_series(4,10)*random(),
generate_series(5,90)*random(),
generate_series(2,10)*random());


Plan without Patch
-
postgres=# explain select * from t1 where t1.i not in (select t2.i
from t2 where t2.i in (1,2,3));
  QUERY PLAN
---
 Seq Scan on t1  (cost=110.84..411.72 rows=8395 width=12)
   Filter: (NOT (hashed SubPlan 1))
   SubPlan 1
 ->  Seq Scan on t2  (cost=0.00..104.50 rows=2537 width=4)
   Filter: (i = ANY ('{1,2,3}'::integer[]))
(5 rows)

Plan with Patch

postgres=# explain select * from t1 where t1.i not in (select t2.i
from t2 where t2.i in (1,2,3));
   QUERY PLAN
-
 Gather  (cost=110.84..325.30 rows=8395 width=12)
   Workers Planned: 1
   ->  Parallel Seq Scan on t1  (cost=110.84..325.30 rows=4938 width=12)
 Filter: (NOT (hashed SubPlan 1))
 SubPlan 1
   ->  Seq Scan on t2  (cost=0.00..104.50 rows=2537 width=4)
 Filter: (i = ANY ('{1,2,3}'::integer[]))
(7 rows)

We have observed that Q-16 in TPC-H have been improved with the patch
and the analysis of same will be shared by my colleague Rafia.

Now, we can further extend this to parallelize queries containing
correlated subplans like below:

explain select * from t1 where t1.i in (select t2.i from t2 where t2.i=t1.i);
 QUERY PLAN
-
 Seq Scan on t1  (cost=0.00..831049.09 rows=8395 width=12)
   Filter: (SubPlan 1)
   SubPlan 1
 ->  Seq Scan on t2  (cost=0.00..97.73 rows=493 width=4)
   Filter: (i = t1.i)
(5 rows)

In the above query, Filter on t2 (i = t1.i) generates Param node which
is a parallel-restricted node, so such queries won't be able to use
parallelism even with the patch.  I think we can mark Params which
refer to same level as parallel-safe and I think we have this
information (node-> varlevelsup/ phlevelsup/ agglevelsup) available
when we replace correlation vars (SS_replace_correlation_vars).  The
reason why it is not advisable to mark Params that don't refer to same
query level as parallel-safe is that can lead to plans like below:

Foo
-> Gather
  -> Bar
   SubPlan 1 (SubPlan refers to Foo)

Now, this problem is tricky because we need to pass all such Params
each time we invoke Gather.  That also is doable with much more
effort, but I am not sure if it is worth because all of the use cases
I have seen in TPC-H (Q-2) or TPC-DS (Q-6) always uses SubPlans that
refer to same query level.

Yet, another useful enhancement in this area could be to consider both
parallel and non-parallel paths for subplans.  As of now, we consider
the cheapest/best path and form subplan from it, but it is quite
possible that instead of choosing parallel path (in case it is
cheapest) at subplan level, the non-parallel path at subplan level
could be beneficial when upper plan can use parallelism.  I think this
will be a separate project in itself if we want to do this and based
on my study of TPC-H and TPC-DS queries, I am confident that this will
be helpful in certain queries at higher scale