Support grouping-expression references and GROUPING() in subqueries.

Until now, substitute_grouped_columns and its predecessor
check_ungrouped_columns intentionally did not cope with references
to GROUP BY expressions (anything more complex than a Var) within
subqueries of the query having GROUP BY.  Because they didn't try to
match subexpressions of subqueries to the GROUP BY list, they'd drill
down to raw Vars of the grouping level and then fail with "subquery
uses ungrouped column from outer query".  There have been remarkably
few complaints about this deficiency, so nobody ever did anything
about it.

The reason for not wanting to deal with it is that within a subquery,
Vars will have varlevelsup different from zero and will thus not be
equal() to the expressions seen in the outer query.  We recognized
this at least as far back as 96ca8ffeb, although I think the comment
I added about it then was just documenting a pre-existing deficiency.
It looks like at the time, the solutions I considered were
(1) write a version of equal() that permits an offset in varlevelsup,
or (2) dynamically apply IncrementVarSublevelsUp at each
subexpression.  (1) would require an amount of new code that seems
rather out of proportion to the benefit, while (2) would add an
exponential amount of cost to the matching process.  But rethinking
it now, what seems attractive is (3) apply IncrementVarSublevelsUp to
the groupingClause list not the subexpressions, and do so only once
per subquery depth level.  Then we can still use plain equal() to
check for matches, and we're not incurring cost proportional to some
power of the subquery's complexity.

This patch continues to use the old logic when the GROUP BY list is
all Vars.  We could discard the special comparison logic for that and
always do it the more general way, but that would be a good deal
slower.  (Micro-benchmarking just parse analysis suggests it's about
50% slower than the Vars-only path.  But we've not heard complaints
about the speed of matching within the main query, so I doubt that
applying the same matching logic within subqueries will be a problem.)
The lack of complaints suggests strongly that this is a very minority
use-case, so I don't want to make the typical case slower to fix it.

While testing that, I was surprised to discover a nearby bug:
GROUPING() within a subquery fails to match GROUP BY Vars that are
join alias Vars.  It tries to apply flatten_join_alias_vars to make
such cases work, but that fails to work inside a subquery because
varlevelsup is wrong.  Therefore, this patch invents a new entry point
flatten_join_alias_for_parser() that allows specification of a
sublevels_up offset.  (It seems cleaner to give the parser its own
entry point rather than abuse the planner's conventions even further.)

While this is pretty clearly a bug fix, I'm hesitant to take the risk
of back-patching, seeing that the existing behavior has stood for so
long with so few complaints.  Maybe we can reconsider once this patch
has baked awhile in master.

Reported-by: PALAYRET Jacques <[email protected]>
Author: Tom Lane <[email protected]>
Discussion: https://postgr.es/m/[email protected]

Branch
------
master

Details
-------
https://git.postgresql.org/pg/commitdiff/415100aa62bff1402672eec4b68deef2cd436126

Modified Files
--------------
src/backend/optimizer/util/var.c           |  46 ++++++++++---
src/backend/parser/parse_agg.c             | 103 +++++++++++++++++++++--------
src/include/optimizer/optimizer.h          |   2 +
src/test/regress/expected/aggregates.out   |  20 ++++++
src/test/regress/expected/groupingsets.out |  37 +++++++++++
src/test/regress/sql/aggregates.sql        |  10 +++
src/test/regress/sql/groupingsets.sql      |  13 ++++
7 files changed, 196 insertions(+), 35 deletions(-)

Reply via email to