Re: [HACKERS] [DESIGN] ParallelAppend

Amit Kapila Wed, 18 Nov 2015 04:26:57 -0800

On Sat, Nov 14, 2015 at 3:39 AM, Robert Haas <[email protected]> wrote:
>
> On Thu, Nov 12, 2015 at 12:09 AM, Kouhei Kaigai <[email protected]>
wrote:
> > I'm now designing the parallel feature of Append...
> >
> > Here is one challenge. How do we determine whether each sub-plan
> > allows execution in the background worker context?
>
> I've been thinking about these questions for a bit now, and I think we
> can work on improving Append in multiple phases.  The attached patch
> shows what I have in mind for phase 1.
>


Couple of comments and questions regarding this patch:

1.
+/*
+ * add_partial_path
..
+ *  produce the same number of rows.  Neither do we need to consider
startup
+ *  costs: parallelism
is only used for plans that will be run to completion.

A.
Don't we need the startup cost incase we need to build partial paths for
joinpaths like mergepath?
Also, I think there are other cases for single relation scan where startup
cost can matter like when there are psuedoconstants in qualification
(refer cost_qual_eval_walker()) or let us say if someone has disabled
seq scan (disable_cost is considered as startup cost.)

B. I think partial path is an important concept and desrves some
explanation in src/backend/optimizer/README.
There is already a good explanation about Paths, so I think it
seems that it is better to add explanation about partial paths.

2.
+ *  costs: parallelism is only used for plans that will be run to
completion.
+ *    Therefore, this
routine is much simpler than add_path: it needs to
+ *    consider only pathkeys and total cost.

There seems to be some spacing issue in last two lines.

3.
+static void
+create_parallel_paths(PlannerInfo *root, RelOptInfo *rel)
+{
+ int parallel_threshold = 1000;
+ int parallel_degree = 1;
+
+ /*
+ * If this relation is too small to be worth a parallel scan, just return
+ * without doing anything ... unless it's an inheritance child.  In that
case,
+ * we want to generate a parallel path here anyway.  It might not be
worthwhile
+ * just for this relation, but when combined with all of its inheritance
siblings
+ * it may well pay off.
+ */
+ if (rel->pages < parallel_threshold && rel->reloptkind == RELOPT_BASEREL)
+ return;

A.
This means that for inheritance child relations for which rel pages are
less than parallel_threshold, it will always consider the cost shared
between 1 worker and leader as per below calc in cost_seqscan:
if (path->parallel_degree > 0)
run_cost = run_cost / (path->parallel_degree + 0.5);

I think this might not be the appropriate cost model for even for
non-inheritence relations which has pages more than parallel_threshold,
but it seems to be even worst for inheritance children which have
pages less than parallel_threshold

B.
Will it be possible that if none of the inheritence child rels (or very few
of them) are big enough for parallel scan, then considering Append
node for parallelism of any use or in other words, won't it be better
to generate plan as it is done now without this patch for such cases
considering current execution model of Gather node?


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Re: [HACKERS] [DESIGN] ParallelAppend

Reply via email to