Github user dongjoon-hyun commented on a diff in the pull request:
https://github.com/apache/spark/pull/12719#discussion_r62420393
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
---
@@ -617,6 +618,77 @@ object NullPropagation extends Rule[LogicalPlan] {
}
/**
+ * Propagate foldable expressions:
+ * Replace all attributes with aliases of the original foldable expressions except the followings.
+ * 1) Command and Set(UNION/INTERSECT/EXCEPT): Do not optimize.
--- End diff --
Thank you, @cloud-fan.
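* Set queries use **the same AttributeReference** in the global query and in
one of the subqueries. In principle, that makes `FoldablePropagation` produce
incorrect results, so we must prevent it. In the plan below, the aggregate
above the `Union` refers to `a#0`, which is also the alias of the literal `1`
in the first branch.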
```
scala> sql("select 1 a union select 2 a").explain
== Physical Plan ==
WholeStageCodegen
:  +- TungstenAggregate(key=[a#0], functions=[], output=[a#0])
:     +- INPUT
+- Exchange hashpartitioning(a#0, 200), None
   +- WholeStageCodegen
      :  +- TungstenAggregate(key=[a#0], functions=[], output=[a#0])
      :     +- INPUT
      +- Union
         :- WholeStageCodegen
         :  :  +- Project [1 AS a#0]
         :  :     +- INPUT
         :  +- Scan OneRowRelation[]
         +- WholeStageCodegen
            :  +- Project [2 AS a#1]
            :     +- INPUT
            +- Scan OneRowRelation[]
```
* For command queries, it seems that some commands (e.g. CTAS) raise
exceptions when they receive non-AttributeReference column outputs (here,
aliased literals). I would like to investigate that as a separate issue,
since it may need to touch other modules.
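
The set-query case in the first bullet can be illustrated with a toy model.
This is only a sketch, not Catalyst code: `Attr`, `Branch`, `unionOutput`,
`foldableMap`, and `evalNaive` are hypothetical stand-ins, and it assumes the
union exposes its first child's attribute as its own output, as the `a#0` in
the plan above suggests.

```scala
// Toy model of the problem; none of these types are Spark's own classes.
object FoldablePropagationSketch {
  // An attribute identified by name and expression ID, e.g. a#0.
  final case class Attr(name: String, id: Int)

  // One union branch, standing in for `Project [literal AS alias]`.
  final case class Branch(literal: Int, alias: Attr)

  // Assumption: the union reuses the first branch's attribute as its output.
  def unionOutput(branches: Seq[Branch]): Attr = branches.head.alias

  // The rows the union really produces: one per branch.
  def unionRows(branches: Seq[Branch]): Seq[Int] = branches.map(_.literal)

  // Naive propagation: collect every alias -> literal pair below the union.
  def foldableMap(branches: Seq[Branch]): Map[Attr, Int] =
    branches.map(b => b.alias -> b.literal).toMap

  // An operator above the union that reads `ref`. If `ref` is in the map,
  // the naive rule replaces it with the literal for every row.
  def evalNaive(ref: Attr, rows: Seq[Int], foldables: Map[Attr, Int]): Seq[Int] =
    rows.map(row => foldables.getOrElse(ref, row))

  // select 1 a union select 2 a
  val branches: Seq[Branch] = Seq(Branch(1, Attr("a", 0)), Branch(2, Attr("a", 1)))
  val ref: Attr = unionOutput(branches)
  val correct: Seq[Int] = unionRows(branches)
  val naive: Seq[Int] = evalNaive(ref, unionRows(branches), foldableMap(branches))
}
```

In this model `correct` is `Seq(1, 2)` while `naive` is `Seq(1, 1)`: because
`a#0` names both the union output and the first branch's literal, the naive
substitution overwrites the row coming from the second branch.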