Hi, > On Jun 22, 2026, at 10:59, Richard Guo <[email protected]> wrote: > > On Mon, Jun 22, 2026 at 9:39 AM David Rowley <[email protected]> wrote: >> I believe this should make the following code redundant, so shouldn't >> the patch remove it too? >> >> /* >> * Estimate the number of UNION output rows. In the case when only a >> * single UNION child remains, we can use estimate_num_groups() on >> * that child. We must be careful not to do this when that child is >> * the result of some other set operation as the targetlist will >> * contain Vars with varno==0, which estimate_num_groups() wouldn't >> * like. >> */ >> if (list_length(cheapest.subpaths) == 1 && >> first_path->parent->reloptkind != RELOPT_UPPER_REL) >> { >> dNumGroups = estimate_num_groups(root, >> first_path->pathtarget->exprs, >> first_path->rows, >> NULL, >> NULL); >> } >> >> Then you may as well pass dNumChildGroups directly to the path >> creation functions and get rid of your new "With multiple children," >> comment. >> >> Aside from that, I don't see any issues. > > Thanks for looking. You're right. build_setop_child_paths() already > computes each child's distinct estimate, so for a single surviving > child dNumChildGroups is exactly what that branch recomputed. > > (And removing it can be a slight improvement, as the old branch ran > estimate_num_groups on the subquery-scan Vars, while > build_setop_child_paths uses the child's own rowcount when it has > GROUP BY/DISTINCT/aggs.) > > Patch updated. > > - Richard > <v2-0001-Improve-UNION-s-output-row-count-estimate.patch>
Thanks for working on this. I agree this is a real problem, and estimating UNION’s row count from the per-child distinct estimates looks like the right direction to me. I reviewed the v2 patch and it looks good. I also ran the regression tests locally on my Apple Silicon machine, and they all passed. The added regression test looks reasonable to me, since it checks the plan change that motivated the patch. One small question: would it be worth adding a direct row-estimate check, perhaps with a helper like planner_est.sql’s explain_mask_costs() so that rows= stays visible while cost/width are masked? The existing plan-shape test may already be sufficient, though. -- Best regards, Chengpeng Yan
