Github user henryr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20687#discussion_r174637789
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/ComplexTypes.scala ---
    @@ -22,54 +22,34 @@ import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
     import org.apache.spark.sql.catalyst.rules.Rule
     
     /**
    -* push down operations into [[CreateNamedStructLike]].
    -*/
    -object SimplifyCreateStructOps extends Rule[LogicalPlan] {
    -  override def apply(plan: LogicalPlan): LogicalPlan = {
    -    plan.transformExpressionsUp {
    -      // push down field extraction
    + * Simplify redundant [[CreateNamedStructLike]], [[CreateArray]] and [[CreateMap]] expressions.
    + */
    +object SimplifyExtractValueOps extends Rule[LogicalPlan] {
    +  override def apply(plan: LogicalPlan): LogicalPlan = plan transform { case p =>
    +    p.transformExpressionsUp {
    --- End diff --
    
    FWIW I think this is a particular example of a more general problem, where expression simplification can break the correspondence between a select expression and its grouping equivalent.
    
    Here's a simpler example:
    
    `SELECT (a + b) - a FROM t GROUP BY a + b`
    
    gets me the following:
    
    `org.apache.spark.sql.AnalysisException: expression 't.`b`' is neither present in the group by, nor is it an aggregate function. Add to group by or wrap in first() (or first_value) if you don't care which value you get.`
    
    Postgres also has this problem, at least in 9.3: `ERROR: column "t.a" must appear in the GROUP BY clause or be used in an aggregate function Position: 23`

