Github user dilipbiswal commented on the issue:
https://github.com/apache/spark/pull/21049
@henryr
> Is there any reason to actually use an alias at the root of a plan like this (outside of composing with other plans, where this optimization would apply)?

I can't think of a reason :-). It's just that the API allows users to do that.
How about this query?
```scala
scala> spark.sql("with abcd as (select * from t1 order by t1.c1) select * from abcd").explain(true)
18/04/29 23:28:45 WARN ObjectStore: Failed to get database global_temp, returning NoSuchObjectException
== Parsed Logical Plan ==
CTE [abcd]
:  +- 'SubqueryAlias abcd
:     +- 'Sort ['t1.c1 ASC NULLS FIRST], true
:        +- 'Project [*]
:           +- 'UnresolvedRelation `t1`
+- 'Project [*]
   +- 'UnresolvedRelation `abcd`

== Analyzed Logical Plan ==
c1: int, c2: int, c3: int
Project [c1#7, c2#8, c3#9]
+- SubqueryAlias abcd
   +- Sort [c1#7 ASC NULLS FIRST], true
      +- Project [c1#7, c2#8, c3#9]
         +- SubqueryAlias t1
            +- HiveTableRelation `default`.`t1`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [c1#7, c2#8, c3#9]

== Optimized Logical Plan ==
HiveTableRelation `default`.`t1`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [c1#7, c2#8, c3#9]

== Physical Plan ==
HiveTableScan [c1#7, c2#8, c3#9], HiveTableRelation `default`.`t1`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [c1#7, c2#8, c3#9]
```
IMHO, it's probably better to correctly detect the real subqueries before applying this optimization, so we can be fully sure about it.
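As for the API allowing an alias at the root: a minimal spark-shell sketch of that case (the Dataset `df` and the alias name `aliased` here are just for illustration, assuming a running session):

```scala
// Sketch: Dataset.as(...) wraps the plan in a SubqueryAlias at the root,
// with no outer plan composed on top of it.
scala> val df = spark.range(3).orderBy($"id").as("aliased")

// The analyzed plan should show SubqueryAlias `aliased` as the root node,
// sitting directly above the Sort.
scala> df.queryExecution.analyzed
```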
cc @gatorsmile