I've only really skimmed over this thread and only opened the code
enough to extract the following:

+ /* Multiply the costs by partition_wise_agg_cost_factor. */
+ apath->startup_cost *= partition_wise_agg_cost_factor;
+ apath->total_cost *= partition_wise_agg_cost_factor;

I've not studied how all the path plumbing is done, but I think
instead of doing this costing magic we should really stop pretending
that Append/MergeAppend nodes are cost-free. I think something like
charging cpu_tuple_cost per row expected through Append/MergeAppend
would be a better approach to this.
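
To make that concrete, here is a rough sketch of what such a charge might look like. It is not the actual planner code: append_total_cost() is a made-up helper for illustration, and the cpu_tuple_cost constant below is just a stand-in for the GUC, set to its default of 0.01.

```c
#include <assert.h>

/* Stand-in for the cpu_tuple_cost GUC (assumption: default value 0.01). */
static const double cpu_tuple_cost = 0.01;

/*
 * Hypothetical helper, not a PostgreSQL function: cost an Append as the
 * sum of its children's total costs plus cpu_tuple_cost for every row
 * expected to pass through the Append itself.
 */
static double
append_total_cost(const double *child_total_costs, int nchildren,
                  double rows_through_append)
{
    double      total = 0.0;
    int         i;

    assert(nchildren >= 0 && rows_through_append >= 0.0);

    /* Today's behaviour: the Append only sums up its children. */
    for (i = 0; i < nchildren; i++)
        total += child_total_costs[i];

    /* Proposed addition: the Append is no longer free per row. */
    return total + cpu_tuple_cost * rows_through_append;
}
```

With made-up numbers: two partitions of 10000 rows each push 20000 rows through the Append if grouping happens above it, which adds 200 to the cost. If each partition is grouped down to 10 rows first, only 20 rows pass through and the charge is 0.2.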

If you perform grouping or partial grouping before the Append, then in
most cases the Append will receive fewer rows, and so come out cheaper
than if you perform the grouping after it. I've not studied the
partition-wise join code enough to know if this is going to affect
that too, but for everything else there should be no plan change,
since there are normally no alternative paths. I see there's even a
comment in create_append_path() which claims the zero cost is a bit

