maropu commented on a change in pull request #31913:
URL: https://github.com/apache/spark/pull/31913#discussion_r598368983
##########
File path:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
##########
@@ -870,8 +870,19 @@ object CollapseProject extends Rule[LogicalPlan] with
AliasHelper {
if (haveCommonNonDeterministicOutput(p.projectList,
agg.aggregateExpressions)) {
p
} else {
- agg.copy(aggregateExpressions = buildCleanedProjectList(
- p.projectList, agg.aggregateExpressions))
+ val complexGroupingExpressions =
+ ExpressionSet(agg.groupingExpressions.filter(_.children.nonEmpty))
+
+ def wrapGroupingExpression(e: Expression): Expression = e match {
+ case _: AggregateExpression => e
+ case _ if complexGroupingExpressions.contains(e) =>
GroupingExpression(e)
+ case _ => e.mapChildren(wrapGroupingExpression)
+ }
+
+ val wrappedAggregateExpressions =
+
agg.aggregateExpressions.map(wrapGroupingExpression(_).asInstanceOf[NamedExpression])
+ agg.copy(aggregateExpressions =
+ buildCleanedProjectList(p.projectList, wrappedAggregateExpressions))
Review comment:
I think this issue is not related to `CollapseProject`. For example, we
can reproduce it like this;
```
$ ./bin/spark-shell --conf
spark.sql.optimizer.excludedRules=org.apache.spark.sql.catalyst.optimizer.CollapseProject
Seq(Some(1), None).toDF("id").createOrReplaceTempView("t")
val df = sql("""
SELECT NOT(t.id IS NULL) AS X, count(*) AS c
FROM t
GROUP BY t.id IS NULL
""")
df.show()
```
The query fails because `BooleanSimplification` applys illegal expr
transformation to break the group-by constraint (that is, `group-by columns
must appear in aggregate exprs`). In the ohter DBMS-like systems (e.g.,
PostgreSQL), the transformed query fails like this;
```
postgres=# SELECT t.id IS NOT NULL AS X, count(*) AS c FROM t GROUP BY t.id
IS NULL;
ERROR: column "t.id" must appear in the GROUP BY clause or be used in an
aggregate function
LINE 1: SELECT t.id IS NOT NULL AS X, count(*) AS c FROM t GROUP BY ...
```
I'm currently not sure that this issue can happen only in
`BooleanSimplification` though, I think we need a general solution to fix this
kind of the illegal transformation instead of band-aid fixes.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]