Re: [HACKERS] Enabling parallelism for queries coming from SQL or other PL functions

Amit Kapila Sun, 26 Feb 2017 19:04:14 -0800

On Sun, Feb 26, 2017 at 4:14 PM, Robert Haas <robertmh...@gmail.com> wrote:
> On Sun, Feb 26, 2017 at 6:34 AM, Amit Kapila <amit.kapil...@gmail.com> wrote:
>> On Sat, Feb 25, 2017 at 9:47 PM, Dilip Kumar <dilipbal...@gmail.com> wrote:
>>> On Sat, Feb 25, 2017 at 5:12 PM, Amit Kapila <amit.kapil...@gmail.com> wrote:
>>>> Sure, but that should only happen if the function is *not* declared as
>>>> parallel safe (i.e., in parallel-safe functions, we should not generate
>>>> parallel plans).
>>>
>>> So basically we want to add a restriction that a parallel-safe function
>>> cannot use parallel query? This will work, but it seems too
>>> restrictive to me. Marking a function parallel-safe enables it to be
>>> used within an outer parallel query, which is fine. But that should not
>>> restrict the function from using parallel query when it is called from
>>> an outer query that does not have a parallel plan (or when the function
>>> is executed directly).
>>
>> I think if the user is explicitly marking a function as parallel-safe,
>> then it doesn't make much sense to allow parallel query in such
>> functions as it won't be feasible for the planner (or at least it will
>> be quite expensive) to detect the same.  By the way, if the user has
>> any such expectation from a function, then he can mark the function as
>> parallel-restricted or parallel-unsafe.
>
> However, if a query is parallel-safe, it might not end up getting run
> in parallel.  In that case, it could still benefit from parallelism
> internally.  I think we want to allow that.  For example, suppose you
> run a query like:
>
> SELECT x, sum(somewhat_expensive_function(y)) FROM sometab GROUP BY 1;
>
> If sometab isn't very big, it's probably better to use a non-parallel
> plan for this query, because then somewhat_expensive_function() can
> still use parallelism internally, which might be better. However, if
> sometab is large enough, then it might be better to parallelize the
> whole query using a Partial/FinalizeAggregate and force each call to
> somewhat_expensive_function() to run serially.
>


Is there an easy way to determine which approach is less expensive?  Even
if we find one, or simply adopt a rule that function calls are forced to
run serially whenever the outer query uses parallelism, how do we
achieve that?  In each worker, we can ensure that every individual
statement in a function runs serially (by checking IsParallelWorker()
in the Gather node), but a similar check in the master backend is
tricky, or maybe we don't want to care about that case in the master
backend.  Do you have any suggestions on how to make it work?
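
For illustration, the worker-side check described above could look
roughly like the standalone sketch below.  It is not executor code:
gather_workers_to_launch() is a hypothetical helper, and
ParallelWorkerNumber and IsParallelWorker() are redefined locally on
the assumption that they behave like the definitions in
access/parallel.h (-1 in the master backend, >= 0 in a worker).

```c
/*
 * Toy model of the worker-side check; not PostgreSQL source.
 * Assumption: ParallelWorkerNumber is -1 in the master backend and
 * >= 0 inside a parallel worker, as in access/parallel.h.
 */
int ParallelWorkerNumber = -1;

#define IsParallelWorker() (ParallelWorkerNumber >= 0)

/*
 * Hypothetical helper: decide how many workers a Gather node may
 * launch.  Inside a worker, the answer is always zero, so every
 * statement run by a function in that worker executes serially.
 */
int
gather_workers_to_launch(int planned_workers)
{
    if (IsParallelWorker())
        return 0;               /* already in a worker: run serially */

    return planned_workers;     /* master backend: use the plan as is */
}
```

In this model a call made from a worker gets zero workers regardless of
what was planned, while the master backend still launches whatever the
plan asked for.  The sketch does nothing for a function invoked from
the master backend while an outer parallel query is running, which is
the case that remains open.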

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


