GitHub user dilipbiswal commented on the issue:

https://github.com/apache/spark/pull/21049

@henryr Since SubqueryAlias is used as a correlation name, mostly for resolving attributes, my understanding is that it's not safe to apply this optimization. I will borrow @gatorsmile's example here. Please note that the alias is specified after the sort. Below is the plan after this optimization, which removes sorts under SubqueryAlias->child.

```scala
scala> Seq((1, 2, "1"), (3, 4, "3")).toDF("int", "int2", "str_sort").orderBy('int.asc).as('df1)
res0: org.apache.spark.sql.Dataset[org.apache.spark.sql.Row] = [int: int, int2: int ... 1 more field]

scala> res0.explain(true)
== Parsed Logical Plan ==
SubqueryAlias df1
+- AnalysisBarrier
   +- Sort [int#7 ASC NULLS FIRST], true
      +- Project [_1#3 AS int#7, _2#4 AS int2#8, _3#5 AS str_sort#9]
         +- LocalRelation [_1#3, _2#4, _3#5]

== Analyzed Logical Plan ==
int: int, int2: int, str_sort: string
SubqueryAlias df1
+- Sort [int#7 ASC NULLS FIRST], true
   +- Project [_1#3 AS int#7, _2#4 AS int2#8, _3#5 AS str_sort#9]
      +- LocalRelation [_1#3, _2#4, _3#5]

== Optimized Logical Plan ==
LocalRelation [int#7, int2#8, str_sort#9]

== Physical Plan ==
LocalTableScan [int#7, int2#8, str_sort#9]
```

In this case we should not be removing the top-level sort from the user's query, right?

cc @gatorsmile for his opinion.
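To make the concern concrete, here is a minimal, self-contained sketch of the problem. It uses a toy plan model, not Spark's Catalyst classes: `Plan`, `Relation`, `Sort`, `SubqueryAlias`, and `removeSortUnderAlias` are hypothetical stand-ins I made up for illustration, and the rule is only my assumption of roughly what "remove sorts under SubqueryAlias->child" does.

```scala
// Toy model of a logical plan. These case classes are hypothetical stand-ins
// for illustration only; they are not Spark's Catalyst classes.
sealed trait Plan
case class Relation(name: String) extends Plan
case class Sort(order: String, child: Plan) extends Plan
case class SubqueryAlias(alias: String, child: Plan) extends Plan

object SortUnderAliasSketch {
  // Naive rule (an assumption about the optimization under discussion):
  // drop a Sort that sits directly under a SubqueryAlias.
  def removeSortUnderAlias(plan: Plan): Plan = plan match {
    case SubqueryAlias(alias, Sort(_, child)) =>
      SubqueryAlias(alias, removeSortUnderAlias(child))
    case SubqueryAlias(alias, child) =>
      SubqueryAlias(alias, removeSortUnderAlias(child))
    case Sort(order, child) =>
      Sort(order, removeSortUnderAlias(child))
    case r: Relation => r
  }

  // True if any Sort node remains anywhere in the plan.
  def containsSort(plan: Plan): Boolean = plan match {
    case Sort(_, _)              => true
    case SubqueryAlias(_, child) => containsSort(child)
    case Relation(_)             => false
  }

  // Same shape as orderBy('int.asc).as('df1): the alias wraps the user's sort.
  val userQuery: Plan = SubqueryAlias("df1", Sort("int ASC", Relation("local")))
  val optimized: Plan = removeSortUnderAlias(userQuery)
}
```

In this sketch `containsSort(userQuery)` is true while `containsSort(optimized)` is false. The only sort the user asked for disappears because the alias was applied after it, which is the same situation as the `explain` output above.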