Hi Richard, On Thu, Apr 30, 2026 at 7:47 PM Richard Guo <[email protected]> wrote:
> On Thu, Apr 30, 2026 at 12:08 PM Richard Guo <[email protected]> > wrote: > > I was about to push the v2 patch, but I just can't shake off the > > concern Wenhui Qiu raised about the repeated subtree scan. I still > > don't have a concrete real-world case where a query has a large enough > > HAVING clause for it to matter, but let's just be paranoid. > > > > I think we can fix it easily. The current walker calls > > pull_var_clause() at every collation-aware node, which re-walks the > > subtree. The fix is to flip it inside out: walk top-down, push > > inputcollids onto a LIFO stack, and at each GROUP Var check against > > the stack. This way, we only need to walk the expression tree once. > > Attached v3 does this. > > > > v3 also fixes the RowCompareExpr case. Unlike the node types covered > > by exprInputCollation(), RowCompareExpr carries per-column > > inputcollids[] rather than a single inputcollid, so we need to descend > > into each (largs[i], rargs[i]) pair with the matching collation pushed > > onto the stack. Without this, a HAVING clause like: > > > > HAVING ROW(x, 1) < ROW('ABC' COLLATE case_sensitive, 1) > > > > over a case_insensitive group would give wrong results. > > I've committed this and back-patched it to v18. I was not > back-patching further because pre-v18 branches would need a very > different and more complex fix due to the lack of the RTE_GROUP > mechanism. I think it's too risky, and doesn't seem justified given > the absence of field reports. > It appears HAVING-to-WHERE pushdown is still wrong with CASE and nondeterministic collations. The shorthand CASE expression bypasses the new collation-conflict detector, so the HAVING clause gets pushed to WHERE, filtering rows before grouping and silently changing aggregate results. Repro: CREATE COLLATION ci (provider=icu, locale='und-u-ks-level2', deterministic=false); CREATE COLLATION cs (provider=icu, locale='und', deterministic=true); CREATE TABLE t (x text COLLATE ci); INSERT INTO t VALUES ('abc'),('ABC'),('def'),('DEF'),('xyz'); CREATE COLLATION CREATE COLLATION CREATE TABLE INSERT 0 5 -- This works correctly as fixed in the patch srcdb=# EXPLAIN (COSTS OFF) SELECT x, count(*) FROM t GROUP BY x HAVING x = 'abc' COLLATE cs; QUERY PLAN ---------------------------------------- HashAggregate Group Key: x Filter: (x = 'abc'::text COLLATE cs) -> Seq Scan on t (4 rows) srcdb=# SELECT x, count(*) FROM t GROUP BY x HAVING x = 'abc' COLLATE cs; x | count -----+------- abc | 2 (1 row) -- CASE from incorrectly pushed to WHERE EXPLAIN (COSTS OFF) SELECT x, count(*) FROM t GROUP BY x HAVING CASE x WHEN 'abc' COLLATE cs THEN true ELSE false END; QUERY PLAN ----------------------------------------------------------------------------- HashAggregate Group Key: x -> Seq Scan on t Filter: CASE x WHEN 'abc'::text COLLATE cs THEN true ELSE false END (4 rows) SELECT x, count(*) FROM t GROUP BY x HAVING CASE x WHEN 'abc' COLLATE cs THEN true ELSE false END; x | count -----+------- abc | 1 (1 row) Under the case-insensitive GROUP BY collation 'ci', 'abc' and 'ABC' belong to the same group with count=2. The case-sensitive filter must run after grouping, not before. But when hidden inside CASE, it runs as a Seq Scan filter and eliminates 'ABC' before it can be counted. having_collation_conflict_walker() walks the HAVING clause top-down, maintaining a stack of ancestor inputcollids. When it reaches a GROUP Var with a nondeterministic varcollid, it reports a conflict if any ancestor pushed a different collation. The ancestor collations are gathered via exprInputCollation(), which doesn't cover CaseExpr. My understanding is shallow here, attached a draft patch that adds a CaseExpr branch to having_collation_conflict_walker() mirroring the existing RowCompareExpr special case. Patch includes the tests. Please take a look. Thanks, Satya
0001-fix-having-case-collation.patch
Description: Binary data
