[GitHub] spark pull request #21822: [SPARK-24865] Remove AnalysisBarrier

2018-07-27 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/21822


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21822: [SPARK-24865] Remove AnalysisBarrier

2018-07-26 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/21822#discussion_r205674265
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
 ---
@@ -787,6 +782,7 @@ class Analyzer(
   right
 case Some((oldRelation, newRelation)) =>
   val attributeRewrites = 
AttributeMap(oldRelation.output.zip(newRelation.output))
+  // TODO(rxin): Why do we need transformUp here?
--- End diff --

we still need to transform resolved plan here to resolve self-join. Image
```
val df = ...
df.as("a").join(df.as("b"), ...)
```
We need to look into the resolved plan to replace the conflicted attributes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21822: [SPARK-24865] Remove AnalysisBarrier

2018-07-24 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/21822#discussion_r204957474
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala
 ---
@@ -751,7 +751,8 @@ object TypeCoercion {
*/
   case class ConcatCoercion(conf: SQLConf) extends TypeCoercionRule {
 
-override protected def coerceTypes(plan: LogicalPlan): LogicalPlan = 
plan transform { case p =>
+override protected def coerceTypes(
+  plan: LogicalPlan): LogicalPlan = plan resolveOperatorsDown { case p 
=>
--- End diff --

im using a weird wrapping here to minimize the diff.



---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21822: [SPARK-24865] Remove AnalysisBarrier - WIP

2018-07-24 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/21822#discussion_r204955869
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
 ---
@@ -787,6 +782,7 @@ class Analyzer(
   right
 case Some((oldRelation, newRelation)) =>
   val attributeRewrites = 
AttributeMap(oldRelation.output.zip(newRelation.output))
+  // TODO(rxin): Why do we need transformUp here?
--- End diff --

@cloud-fan  ?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21822: [SPARK-24865] Remove AnalysisBarrier - WIP

2018-07-20 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request:

https://github.com/apache/spark/pull/21822#discussion_r204190535
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
 ---
@@ -231,10 +231,11 @@ class Analyzer(
* Substitute child plan with WindowSpecDefinitions.
*/
   object WindowsSubstitution extends Rule[LogicalPlan] {
-def apply(plan: LogicalPlan): LogicalPlan = plan.transformUp {
+def apply(plan: LogicalPlan): LogicalPlan = plan.resolveOperators {
   // Lookup WindowSpecDefinitions. This rule works with unresolved 
children.
   case WithWindowDefinition(windowDefinitions, child) =>
-child.transform {
+// TODO(rxin): Check with Herman whether the next line is OK.
--- End diff --

It is good. The earlier `resolveOperators` makes sure we don't overwrite a 
window spec, with a similarly named one defined higher up the tree. BTW I don't 
think we have a test that covers this (it is pretty rare).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21822: [SPARK-24865] Remove AnalysisBarrier - WIP

2018-07-20 Thread dilipbiswal
Github user dilipbiswal commented on a diff in the pull request:

https://github.com/apache/spark/pull/21822#discussion_r204169456
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala
 ---
@@ -533,7 +537,8 @@ trait CheckAnalysis extends PredicateHelper {
 
 // Simplify the predicates before validating any unsupported 
correlation patterns
 // in the plan.
-BooleanSimplification(sub).foreachUp {
+// TODO(rxin): Why did this need to call BooleanSimplification???
--- End diff --

@hvanhovell Yeah. I agree.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21822: [SPARK-24865] Remove AnalysisBarrier - WIP

2018-07-20 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request:

https://github.com/apache/spark/pull/21822#discussion_r204167870
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala
 ---
@@ -533,7 +537,8 @@ trait CheckAnalysis extends PredicateHelper {
 
 // Simplify the predicates before validating any unsupported 
correlation patterns
 // in the plan.
-BooleanSimplification(sub).foreachUp {
+// TODO(rxin): Why did this need to call BooleanSimplification???
--- End diff --

Well tests fail without it, so we don't really have a choice here. For a 
second I thought we could also create some utils class, but that would just 
mean moving the code in BooleanSimplification in there just for esthetics.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21822: [SPARK-24865] Remove AnalysisBarrier - WIP

2018-07-20 Thread dilipbiswal
Github user dilipbiswal commented on a diff in the pull request:

https://github.com/apache/spark/pull/21822#discussion_r204166360
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala
 ---
@@ -533,7 +537,8 @@ trait CheckAnalysis extends PredicateHelper {
 
 // Simplify the predicates before validating any unsupported 
correlation patterns
 // in the plan.
-BooleanSimplification(sub).foreachUp {
+// TODO(rxin): Why did this need to call BooleanSimplification???
--- End diff --

@hvanhovell Hi Herman, as you said, we do the actual pulling up of the 
predicates in the optimizer in PullupCorrelatedPredicates in subquery.scala. We 
are also doing a BooleanSimplication first before traversing the plan there. In 
here, we are doing the error reporting and i thought it would be better to keep 
the traversal the same way. Basically previously we did the error reporting and 
rewriting in Analyzer and now, we do the error reporting in checkAnalysis and 
rewriting in Optimizer. Just to refresh your memory so you can help to take the 
right call here :-)


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21822: [SPARK-24865] Remove AnalysisBarrier - WIP

2018-07-20 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/21822#discussion_r204163484
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/LogicalPlan.scala
 ---
@@ -33,6 +49,116 @@ abstract class LogicalPlan
   with QueryPlanConstraints
   with Logging {
 
+  private var _analyzed: Boolean = false
+
+  /**
+   * Marks this plan as already analyzed. This should only be called by 
[[CheckAnalysis]].
+   */
+  private[catalyst] def setAnalyzed(): Unit = { _analyzed = true }
+
+  /**
+   * Returns true if this node and its children have already been gone 
through analysis and
+   * verification.  Note that this is only an optimization used to avoid 
analyzing trees that
+   * have already been analyzed, and can be reset by transformations.
+   */
+  def analyzed: Boolean = _analyzed
+
+  /**
+   * Returns a copy of this node where `rule` has been recursively applied 
first to all of its
+   * children and then itself (post-order, bottom-up). When `rule` does 
not apply to a given node,
+   * it is left unchanged.  This function is similar to `transformUp`, but 
skips sub-trees that
+   * have already been marked as analyzed.
+   *
+   * @param rule the function use to transform this nodes children
+   */
+  def resolveOperators(rule: PartialFunction[LogicalPlan, LogicalPlan]): 
LogicalPlan = {
--- End diff --

todo: add unit tests


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21822: [SPARK-24865] Remove AnalysisBarrier - WIP

2018-07-20 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request:

https://github.com/apache/spark/pull/21822#discussion_r204163551
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala
 ---
@@ -533,7 +537,8 @@ trait CheckAnalysis extends PredicateHelper {
 
 // Simplify the predicates before validating any unsupported 
correlation patterns
 // in the plan.
-BooleanSimplification(sub).foreachUp {
+// TODO(rxin): Why did this need to call BooleanSimplification???
--- End diff --

Yeah, I added boolean simplification here. I didn't quite like it back 
then, and I still don't like it. I was hoping this was happening in the 
`Optimizer` now.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21822: [SPARK-24865] Remove AnalysisBarrier - WIP

2018-07-20 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/21822#discussion_r204163424
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/LogicalPlan.scala
 ---
@@ -23,8 +23,24 @@ import org.apache.spark.sql.catalyst.analysis._
 import org.apache.spark.sql.catalyst.expressions._
 import org.apache.spark.sql.catalyst.plans.QueryPlan
 import 
org.apache.spark.sql.catalyst.plans.logical.statsEstimation.LogicalPlanStats
-import org.apache.spark.sql.catalyst.trees.CurrentOrigin
+import org.apache.spark.sql.catalyst.trees.{CurrentOrigin, TreeNode}
 import org.apache.spark.sql.types.StructType
+import org.apache.spark.util.Utils
+
+
+object LogicalPlan {
+
+  private val resolveOperatorDepth = new ThreadLocal[Int] {
--- End diff --

todo: explain what this is


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21822: [SPARK-24865] Remove AnalysisBarrier - WIP

2018-07-20 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/21822#discussion_r204163328
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
 ---
@@ -2390,16 +2375,21 @@ class Analyzer(
  * scoping information for attributes and can be removed once analysis is 
complete.
  */
 object EliminateSubqueryAliases extends Rule[LogicalPlan] {
-  def apply(plan: LogicalPlan): LogicalPlan = plan transformUp {
-case SubqueryAlias(_, child) => child
+  // This is actually called in the beginning of the optimization phase, 
and as a result
+  // is using transformUp rather than resolveOperators. This is also often 
called in the
+  //
--- End diff --

note: finish comment


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21822: [SPARK-24865] Remove AnalysisBarrier - WIP

2018-07-20 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/21822#discussion_r204160853
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala
 ---
@@ -533,7 +537,8 @@ trait CheckAnalysis extends PredicateHelper {
 
 // Simplify the predicates before validating any unsupported 
correlation patterns
 // in the plan.
-BooleanSimplification(sub).foreachUp {
+// TODO(rxin): Why did this need to call BooleanSimplification???
--- End diff --

Thanks. I'm going to add it back.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21822: [SPARK-24865] Remove AnalysisBarrier - WIP

2018-07-20 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/21822#discussion_r204160150
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
 ---
@@ -787,6 +782,7 @@ class Analyzer(
   right
 case Some((oldRelation, newRelation)) =>
   val attributeRewrites = 
AttributeMap(oldRelation.output.zip(newRelation.output))
+  // TODO(rxin): Why do we need transformUp here?
--- End diff --

cc @cloud-fan why do we need transformUp here?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21822: [SPARK-24865] Remove AnalysisBarrier - WIP

2018-07-19 Thread dilipbiswal
Github user dilipbiswal commented on a diff in the pull request:

https://github.com/apache/spark/pull/21822#discussion_r203924609
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala
 ---
@@ -533,7 +537,8 @@ trait CheckAnalysis extends PredicateHelper {
 
 // Simplify the predicates before validating any unsupported 
correlation patterns
 // in the plan.
-BooleanSimplification(sub).foreachUp {
+// TODO(rxin): Why did this need to call BooleanSimplification???
--- End diff --

@rxin From what i remember Reynold, most of this logic was housed in 
Analyzer before and we moved it to optimizer. In the old code we used to walk 
the plan after simplifying the predicates. The comment used to read "Simplify 
the predicates before pulling them out.". I just retained that semantics.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21822: [SPARK-24865] Remove AnalysisBarrier - WIP

2018-07-19 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/21822#discussion_r203918981
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala
 ---
@@ -533,7 +537,8 @@ trait CheckAnalysis extends PredicateHelper {
 
 // Simplify the predicates before validating any unsupported 
correlation patterns
 // in the plan.
-BooleanSimplification(sub).foreachUp {
+// TODO(rxin): Why did this need to call BooleanSimplification???
--- End diff --

cc @dilipbiswal @cloud-fan @gatorsmile 

Why did we need BooleanSimplification here?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org