rednaxelafx commented on issue #23701: [SPARK-26741][SQL] Allow using aggregate expressions in ORDER BY clause
URL: https://github.com/apache/spark/pull/23701#issuecomment-459264774
 
 
   @mgaido91 Thanks!
   
   That's true, both of the examples I gave are pathological cases that are 
supposed to be rejected by the analyzer. I'm fine with addressing them in a 
separate PR.
   
   Here's the example of the other case in PG10:
   ```
   select * from ((select * from t1) union all (select * from t2)) tt order by max(tt.v)
   ```
   This fails in PG10 with:
   ```
   Query Error: error: column "tt.v" must appear in the GROUP BY clause or be used in an aggregate function
   ```
   In Spark SQL this passes analysis but fails later in codegen:
   ```
   java.lang.UnsupportedOperationException: Cannot generate code for expression: max(input[0, bigint, false])
     at org.apache.spark.sql.catalyst.expressions.Unevaluable.doGenCode(Expression.scala:291)
   ```
   
   whereas this:
   ```
   select max(tt.v) from ((select * from t1) union all (select * from t2)) tt order by 1
   ```
   passes in both PG10 and Spark SQL. In Spark SQL this results in `Aggregate` being a direct child of `Sort`, so it's fine.
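
To make the difference between the two plan shapes concrete, here is a rough toy model in Python. It is only a sketch under assumptions: `Plan`, `orders_by_aggregate`, and `rejected_sorts` are hypothetical names invented for illustration, not Catalyst classes or rules, and the plan shapes are simplified guesses at what the two queries above produce. The rule it encodes is the one described above: a `Sort` that orders by an aggregate is acceptable only when `Aggregate` is its direct child.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Plan:
    """Toy logical plan node (hypothetical; not Catalyst's LogicalPlan)."""
    name: str
    children: List["Plan"] = field(default_factory=list)
    # Assumption: True when a Sort's ordering refers to an aggregate,
    # either directly (max(tt.v)) or through an ordinal (order by 1).
    orders_by_aggregate: bool = False


def rejected_sorts(plan: Plan) -> List[Plan]:
    """Collect Sort nodes that order by an aggregate without an Aggregate directly below."""
    bad = []
    if plan.name == "Sort" and plan.orders_by_aggregate:
        if not plan.children or plan.children[0].name != "Aggregate":
            bad.append(plan)
    for child in plan.children:
        bad.extend(rejected_sorts(child))
    return bad


def union_of_t1_t2() -> Plan:
    """Simplified stand-in for ((select * from t1) union all (select * from t2)) tt."""
    return Plan("Union", [Plan("Relation t1"), Plan("Relation t2")])


# select * from (...) tt order by max(tt.v): no Aggregate under the Sort
first_query = Plan("Sort", [Plan("Project", [union_of_t1_t2()])],
                   orders_by_aggregate=True)

# select max(tt.v) from (...) tt order by 1: Aggregate is the Sort's direct child
second_query = Plan("Sort", [Plan("Aggregate", [union_of_t1_t2()])],
                    orders_by_aggregate=True)
```

With this model, `rejected_sorts(first_query)` flags the top-level `Sort`, while `rejected_sorts(second_query)` flags nothing, which mirrors the behaviour I would expect from an analyzer-side check.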
