On Tue, Jul 11, 2017 at 9:02 AM, Jeff Janes <jeff.ja...@gmail.com> wrote:
> If I have a slow function which is evaluated in a simple seq scan, I do not
> get parallel execution, even though it would be massively useful.  Unless
> force_parallel_mode=ON, then I get a dummy parallel plan with one worker.
> explain select aid,slow(abalance) from pgbench_accounts;

After analysing this, I see multiple reasons why the parallel plan is not getting selected:

1. The query selects every tuple. The benefit we get from parallelism
comes from dividing cpu_tuple_cost (0.01 per tuple) among the workers,
but each tuple sent from a worker to the Gather node is charged
parallel_tuple_cost, which is 0.1 per tuple.  (That overhead is much
smaller in the case of an aggregate, since far fewer tuples pass through
Gather.)  Maybe you can try selecting with some condition,

like below:
postgres=# explain select slow(abalance) from pgbench_accounts where abalance > 1;
                                    QUERY PLAN
 Gather  (cost=0.00..46602.33 rows=1 width=4)
   Workers Planned: 2
   ->  Parallel Seq Scan on pgbench_accounts  (cost=0.00..46602.33 rows=1 width=4)
         Filter: (abalance > 1)
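
To make the arithmetic in point 1 concrete, here is a rough Python sketch
of the trade-off. It is only an illustration, not the planner's actual
code: the function names are hypothetical, the row count of 1,000,000 and
the divisor of 2.4 are assumed values for a two-worker plan, and it
ignores I/O cost, parallel_setup_cost and the cost of the function
itself. Only the two per-tuple constants come from the discussion above.

```python
# Per-tuple cost constants quoted in the discussion (PostgreSQL defaults).
CPU_TUPLE_COST = 0.01
PARALLEL_TUPLE_COST = 0.1


def cpu_saving(rows_scanned, divisor):
    """CPU cost removed from the scan when per-tuple work is split.

    `divisor` is an assumed effective share count (e.g. 2.4 for two
    workers plus a partly busy leader); it is a parameter of this
    sketch, not a value taken from the original mail.
    """
    return rows_scanned * CPU_TUPLE_COST * (1 - 1 / divisor)


def gather_transfer_cost(rows_returned):
    """Cost charged for passing tuples from the workers up to Gather."""
    return rows_returned * PARALLEL_TUPLE_COST


def parallel_pays_off(rows_scanned, rows_returned, divisor):
    """True when the CPU saving outweighs the tuple-transfer overhead."""
    return cpu_saving(rows_scanned, divisor) > gather_transfer_cost(rows_returned)


if __name__ == "__main__":
    rows = 1_000_000  # assumed table size, for illustration only
    # Every row is returned: transfer overhead dominates.
    print(parallel_pays_off(rows, rows, 2.4))
    # A selective filter returns one row: the saving dominates.
    print(parallel_pays_off(rows, 1, 2.4))
```

With these assumed numbers, returning every row costs far more in
transfer than the scan saves in CPU, while a selective filter reverses
the comparison, which matches the plan shown above.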

2. The second problem I see (maybe a problem in the code) is that
the "slow" function is very costly (10000000), and in
apply_projection_to_path we account for this cost.  But I have
noticed that for the Gather node we also add this cost for all the
rows, whereas if the lower node is already doing the projection,
the Gather node just needs to send out the tuples instead of actually
applying the projection.

In the code below, we always multiply target->cost.per_tuple by
path->rows, but in the case of Gather it should multiply this with


path->startup_cost += target->cost.startup - oldcost.startup;
path->total_cost += target->cost.startup - oldcost.startup +
    (target->cost.per_tuple - oldcost.per_tuple) * path->rows;

So because of this high projection cost, the seqpath and the parallel
path both end up with fuzzily the same cost, and the seqpath wins the
tie because it is parallel safe.
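
A small Python sketch of why the two totals come out fuzzily equal. This
is an illustration under stated assumptions, not the planner's code: the
helper names are hypothetical, the base path costs (26000 and 120000) and
the row count are made-up numbers, the 1% fuzz factor is assumed, and the
per-tuple cost assumes the function cost of 10000000 is scaled by a
cpu_operator_cost of 0.0025. Startup cost is left out for brevity.

```python
FUZZ_FACTOR = 1.01  # assumed tolerance for "fuzzily the same" cost


def add_projection_cost(total_cost, new_per_tuple, old_per_tuple, rows):
    """Mirror of the total_cost adjustment quoted above (startup omitted)."""
    return total_cost + (new_per_tuple - old_per_tuple) * rows


def fuzzily_equal(a, b, fuzz=FUZZ_FACTOR):
    """True when neither cost exceeds the other by more than the fuzz factor."""
    return a <= b * fuzz and b <= a * fuzz


if __name__ == "__main__":
    rows = 1_000_000                  # assumed row count
    per_tuple = 10_000_000 * 0.0025   # assumed per-tuple cost of slow()

    seq_base, gather_base = 26_000.0, 120_000.0  # made-up base path costs

    seq_total = add_projection_cost(seq_base, per_tuple, 0.0, rows)
    gather_total = add_projection_cost(gather_base, per_tuple, 0.0, rows)

    # Before projection the paths differ clearly; afterwards the huge
    # projection term swamps the difference.
    print(fuzzily_equal(seq_base, gather_base))
    print(fuzzily_equal(seq_total, gather_total))
```

Because the same rows * per_tuple term is added to both paths, any
difference in their base costs becomes negligible next to it, and the
comparison falls through to the tie-breaking rules.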

>  RETURNS integer
>  LANGUAGE plperl
> AS $function$
>   my $thing=$_[0];
>   foreach (1..1_000_000) {
>     $thing = sqrt($thing);
>     $thing *= $thing;
>   };
>   return ($thing+0);
> $function$;
> The partial path is getting added to the list of paths, it is just not
> getting chosen, even if parallel_*_cost are set to zero.  Why not?
> If I do an aggregate, then it does use parallel workers:
> explain select sum(slow(abalance)) from pgbench_accounts;
> It doesn't use as many as I would like, because there is a limit based on
> the logarithm of the table size (I'm using -s 10 and get 3 parallel
> processes) , but at least I know how to start looking into that.
> Also, how do you debug stuff like this?  Are there some gdb tricks to make
> this easier to introspect into the plans?
> Cheers,
> Jeff

Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)