Github user henryr commented on a diff in the pull request:
https://github.com/apache/spark/pull/20687#discussion_r174637789
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/ComplexTypes.scala ---
@@ -22,54 +22,34 @@ import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
 import org.apache.spark.sql.catalyst.rules.Rule

 /**
- * push down operations into [[CreateNamedStructLike]].
- */
-object SimplifyCreateStructOps extends Rule[LogicalPlan] {
-  override def apply(plan: LogicalPlan): LogicalPlan = {
-    plan.transformExpressionsUp {
-      // push down field extraction
+ * Simplify redundant [[CreateNamedStructLike]], [[CreateArray]] and [[CreateMap]] expressions.
+ */
+object SimplifyExtractValueOps extends Rule[LogicalPlan] {
+  override def apply(plan: LogicalPlan): LogicalPlan = plan transform {
     case p =>
+      p.transformExpressionsUp {
--- End diff --
FWIW I think this is a particular instance of a more general problem: expression
simplification can break the syntactic correspondence between a select
expression and its grouping equivalent, so the analyzer no longer recognizes
the select expression as being built from the grouping expression.
Here's a simpler example:
`SELECT (a + b) - a FROM t GROUP BY a + b`
gets me the following:
`org.apache.spark.sql.AnalysisException: expression 't.`b`' is neither
present in the group by, nor is it an aggregate function. Add to group by or
wrap in first() (or first_value) if you don't care which value you get.`
Postgres also has this problem, at least in 9.3: `ERROR: column "t.a" must
appear in the GROUP BY clause or be used in an aggregate function Position: 23`
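To make the mechanism concrete, here is a minimal toy model of the failure mode (plain Scala, not Spark's actual Catalyst classes — the `Expr`/`Attr`/`Add`/`Sub` types and the `simplify` rule below are simplified stand-ins for illustration). Before rewriting, `(a + b) - a` contains the grouping expression `a + b` as a subtree, so it can be matched against the GROUP BY list; after rewriting, it is just `b`, which matches neither the grouping expression nor an aggregate:

```scala
// Toy expression tree -- NOT Catalyst's Expression hierarchy.
sealed trait Expr
case class Attr(name: String) extends Expr
case class Add(l: Expr, r: Expr) extends Expr
case class Sub(l: Expr, r: Expr) extends Expr

object Demo {
  // A simplification rule analogous in spirit to what an optimizer might
  // apply: (x + y) - x  =>  y. (A real rule would also recurse; one level
  // is enough to show the problem.)
  def simplify(e: Expr): Expr = e match {
    case Sub(Add(l, r), x) if l == x => r
    case other                       => other
  }

  def main(args: Array[String]): Unit = {
    val a = Attr("a")
    val b = Attr("b")

    val select  = Sub(Add(a, b), a) // SELECT (a + b) - a
    val groupBy = Add(a, b)         // GROUP BY a + b

    // Before simplification the grouping expression is a subtree of the
    // select expression, so the two can be matched structurally.
    assert(select == Sub(groupBy, a))

    // After simplification the select expression is just `b`: it is no
    // longer built from the grouping expression, which is exactly the
    // "neither present in the group by, nor an aggregate" situation.
    val simplified = simplify(select)
    assert(simplified == b)
    assert(simplified != groupBy)
    println(s"simplified to: $simplified")
  }
}
```

The point is that both analyzers (Spark's and Postgres's) validate the SELECT list against the GROUP BY list syntactically, so any rewrite applied to one side but not the other can turn a valid query into an analysis error.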
---
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]