Github user dilipbiswal commented on the issue: https://github.com/apache/spark/pull/21049

@henryr
> Is there any reason to actually use an alias at the root of a plan like this (outside of composing with other plans, where this optimization would apply)?

I can't think of a reason :-). It's just that the API allows users to do that. How about this query?

``` SQL
scala> spark.sql("with abcd as (select * from t1 order by t1.c1) select * from abcd").explain(true)
18/04/29 23:28:45 WARN ObjectStore: Failed to get database global_temp, returning NoSuchObjectException
== Parsed Logical Plan ==
CTE [abcd]
:  +- 'SubqueryAlias abcd
:     +- 'Sort ['t1.c1 ASC NULLS FIRST], true
:        +- 'Project [*]
:           +- 'UnresolvedRelation `t1`
+- 'Project [*]
   +- 'UnresolvedRelation `abcd`

== Analyzed Logical Plan ==
c1: int, c2: int, c3: int
Project [c1#7, c2#8, c3#9]
+- SubqueryAlias abcd
   +- Sort [c1#7 ASC NULLS FIRST], true
      +- Project [c1#7, c2#8, c3#9]
         +- SubqueryAlias t1
            +- HiveTableRelation `default`.`t1`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [c1#7, c2#8, c3#9]

== Optimized Logical Plan ==
HiveTableRelation `default`.`t1`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [c1#7, c2#8, c3#9]

== Physical Plan ==
HiveTableScan [c1#7, c2#8, c3#9], HiveTableRelation `default`.`t1`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [c1#7, c2#8, c3#9]
```

IMHO, it's probably better to correctly detect the real subqueries and apply this optimization only to them, so we can be fully sure about it.

cc @gatorsmile
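To make the "detect the real subqueries" idea concrete, here is a minimal, self-contained sketch (not Spark's actual Catalyst API or the rule in this PR; all names below are illustrative) of eliminating a `Sort` directly under a subquery alias only when the alias is composed into an outer operator. A root-level alias keeps its `Sort`, matching the concern that a top-level ORDER BY inside a CTE may still be what the user expects.

```scala
object SubquerySortSketch {
  // Toy logical-plan model, NOT Spark's real TreeNode hierarchy.
  sealed trait Plan
  case class Relation(name: String) extends Plan
  case class Sort(child: Plan) extends Plan
  case class SubqueryAlias(alias: String, child: Plan) extends Plan
  case class Project(child: Plan) extends Plan

  // Drop a Sort that sits directly under a SubqueryAlias, but only when the
  // alias is not the root of the plan: a composed ("real") subquery gives no
  // ordering guarantee to its parent, so its Sort is redundant.
  def optimize(plan: Plan): Plan = {
    def strip(p: Plan, isRoot: Boolean): Plan = p match {
      case SubqueryAlias(a, Sort(c)) if !isRoot =>
        SubqueryAlias(a, strip(c, isRoot = false)) // redundant sort removed
      case SubqueryAlias(a, c) => SubqueryAlias(a, strip(c, isRoot = false))
      case Sort(c)             => Sort(strip(c, isRoot = false))
      case Project(c)          => Project(strip(c, isRoot = false))
      case r: Relation         => r
    }
    strip(plan, isRoot = true)
  }
}
```

With this sketch, `Project(SubqueryAlias("abcd", Sort(Relation("t1"))))` loses its `Sort` (mirroring the optimized plan above), while a bare `SubqueryAlias("abcd", Sort(Relation("t1")))` at the root is left untouched.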