Re: apply_scanjoin_target_to_paths and partitionwise join

Robert Haas Thu, 30 Oct 2025 08:53:12 -0700

On Wed, Oct 29, 2025 at 9:23 PM Arne Roland <[email protected]> wrote:
> The main factor of your example is, that the amount of rows handled by the 
> (Merge) Append is different.


Right. Although that's the main thing here, I am inclined to suspect
there are other ways to hit this problem, maybe ways that are more
likely to happen in the real world, because...

> My second sentence just captured the mundane observation, if the join has 
> significantly more tuples, than any base relation, the place of the (Merge) 
> Append might be more relevant. If I join everything with a generate_series(1, 
> 30000) I get more tuples to process.

...as you imply, joins that inflate the row count are somewhat
uncommon. They definitely do happen, but they're not the most typical
pattern, and there might well be other reasons why a partitionwise
join fails to win that we haven't figured out yet. These could even be
cases where, for example, a certain optimization that works in the
non-partitionwise case is not preserved in the partitionwise case. I
feel like I now understand *one* case where Ashutosh's patch can make
a demonstrable positive difference, but whether that's the only case
that exists seems quite uncertain.

> I'd like to make one more side note about this example: The planner punishes 
> the partitionwise join for having an extra node, that emits N rows (three 
> Hash joins + Append vs two Appends + Hash Join). This plan is chosen because 
> of the cpu_tuple_cost. I'm happy it picks the plan with the smaller memory 
> footprint, but in my real world experience for a timing based approach the 
> default cpu_tuple_cost tends to be too high to get a fair comparison between 
> partitionwise and non partitionwise joins.

Have you localized the problem to cpu_tuple_cost specifically, vs.
cpu_index_tuple_cost or cpu_operator_cost? I've generally found that I
need to reduce random_page_cost and seq_page_cost significantly to
avoid getting sequential scans when index scans would be more
reasonable, but that goes in the opposite direction as what you
suggest here, in that it brings the I/O and CPU costs closer together,
whereas your suggestion would push them further apart. I remember that
Kevin Grittner used to say that the default value of this parameter
was bad, too, but he recommended *raising* it:

https://www.postgresql.org/message-id/1385148245.49487.YahooMailNeo%40web162904.mail.bf1.yahoo.com
https://www.postgresql.org/message-id/[email protected]
https://www.postgresql.org/message-id/CACjxUsNp4uEx3xsunw4wVpBDVomas7o6hnv_49bSbaz-HAVdyA%40mail.gmail.com

I don't actually know what's best in terms of settings in this area. I
don't have experience tuning for partitionwise join specifically.

-- 
Robert Haas
EDB: http://www.enterprisedb.com

Re: apply_scanjoin_target_to_paths and partitionwise join

Reply via email to