rednaxelafx commented on issue #23701: [SPARK-26741][SQL] Allow using aggregate expressions in ORDER BY clause
URL: https://github.com/apache/spark/pull/23701#issuecomment-459264774

@mgaido91 Thanks! That's true, both of the examples I gave are pathological cases that are supposed to be rejected by the analyzer. I'm fine with addressing them in a separate PR.

Here's the example of the other case in PG10:
```
select * from ((select * from t1) union all (select * from t2)) tt order by max(tt.v)
```
this fails in PG10 with:
```
Query Error: error: column "tt.v" must appear in the GROUP BY clause or be used in an aggregate function
```
In Spark SQL this passes analysis but fails later in codegen:
```
java.lang.UnsupportedOperationException: Cannot generate code for expression: max(input[0, bigint, false])
  at org.apache.spark.sql.catalyst.expressions.Unevaluable.doGenCode(Expression.scala:291)
```
whereas this:
```
select max(tt.v) from ((select * from t1) union all (select * from t2)) tt order by 1
```
passes in both PG10 and Spark SQL. In Spark SQL this would result in `Aggregate` being a direct child of `Sort` so it's fine.
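
As a rough illustration of the plan-shape point in the comment, here is a toy model in Python. It is not Spark's Catalyst code: every class and function name is hypothetical, and the rule it encodes (a sort on an aggregate call needs an `Aggregate` directly beneath it) is only a simplified reading of the remark about `Aggregate` being a direct child of `Sort`.

```python
from dataclasses import dataclass
from typing import Tuple

# Toy stand-ins for logical plan nodes. These are NOT Spark's Catalyst classes;
# all names here are hypothetical and exist only for this sketch.
@dataclass(frozen=True)
class Relation:
    name: str

@dataclass(frozen=True)
class Union:
    children: Tuple[object, ...]

@dataclass(frozen=True)
class Aggregate:
    aggregate_exprs: Tuple[str, ...]
    child: object

@dataclass(frozen=True)
class Sort:
    order: Tuple[str, ...]
    child: object

def is_aggregate_call(expr: str) -> bool:
    # Assumption for this sketch: a handful of aggregate names, matched by prefix.
    return expr.startswith(("max(", "min(", "sum(", "count(", "avg("))

def sort_is_evaluable(sort: Sort) -> bool:
    """Simplified rule: ordering by an aggregate needs an Aggregate directly below."""
    uses_aggregate = any(is_aggregate_call(e) for e in sort.order)
    return (not uses_aggregate) or isinstance(sort.child, Aggregate)

tt = Union((Relation("t1"), Relation("t2")))

# select * from (...) tt order by max(tt.v)
bad = Sort(("max(tt.v)",), tt)

# select max(tt.v) from (...) tt order by 1
# (ordinal 1 is modeled here as the first output column, max(tt.v))
good = Sort(("max(tt.v)",), Aggregate(("max(tt.v)",), tt))
```

In this model `sort_is_evaluable(bad)` is false because the aggregate sits over a plain `Union`, while `sort_is_evaluable(good)` is true because an `Aggregate` node computes `max(tt.v)` before the sort runs.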
