[GitHub] spark pull request #21822: [SPARK-24865] Remove AnalysisBarrier - WIP
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/21822#discussion_r204955869 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala --- @@ -787,6 +782,7 @@ class Analyzer( right case Some((oldRelation, newRelation)) => val attributeRewrites = AttributeMap(oldRelation.output.zip(newRelation.output)) + // TODO(rxin): Why do we need transformUp here? --- End diff -- @cloud-fan ? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21822: [SPARK-24865] Remove AnalysisBarrier - WIP
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/21822#discussion_r204190535 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala --- @@ -231,10 +231,11 @@ class Analyzer( * Substitute child plan with WindowSpecDefinitions. */ object WindowsSubstitution extends Rule[LogicalPlan] { -def apply(plan: LogicalPlan): LogicalPlan = plan.transformUp { +def apply(plan: LogicalPlan): LogicalPlan = plan.resolveOperators { // Lookup WindowSpecDefinitions. This rule works with unresolved children. case WithWindowDefinition(windowDefinitions, child) => -child.transform { +// TODO(rxin): Check with Herman whether the next line is OK. --- End diff -- It is good. The earlier `resolveOperators` makes sure we don't overwrite a window spec, with a similarly named one defined higher up the tree. BTW I don't think we have a test that covers this (it is pretty rare). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21822: [SPARK-24865] Remove AnalysisBarrier - WIP
Github user dilipbiswal commented on a diff in the pull request: https://github.com/apache/spark/pull/21822#discussion_r204169456 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala --- @@ -533,7 +537,8 @@ trait CheckAnalysis extends PredicateHelper { // Simplify the predicates before validating any unsupported correlation patterns // in the plan. -BooleanSimplification(sub).foreachUp { +// TODO(rxin): Why did this need to call BooleanSimplification??? --- End diff -- @hvanhovell Yeah. I agree. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21822: [SPARK-24865] Remove AnalysisBarrier - WIP
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/21822#discussion_r204167870 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala --- @@ -533,7 +537,8 @@ trait CheckAnalysis extends PredicateHelper { // Simplify the predicates before validating any unsupported correlation patterns // in the plan. -BooleanSimplification(sub).foreachUp { +// TODO(rxin): Why did this need to call BooleanSimplification??? --- End diff -- Well tests fail without it, so we don't really have a choice here. For a second I thought we could also create some utils class, but that would just mean moving the code in BooleanSimplification in there just for esthetics. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21822: [SPARK-24865] Remove AnalysisBarrier - WIP
Github user dilipbiswal commented on a diff in the pull request: https://github.com/apache/spark/pull/21822#discussion_r204166360 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala --- @@ -533,7 +537,8 @@ trait CheckAnalysis extends PredicateHelper { // Simplify the predicates before validating any unsupported correlation patterns // in the plan. -BooleanSimplification(sub).foreachUp { +// TODO(rxin): Why did this need to call BooleanSimplification??? --- End diff -- @hvanhovell Hi Herman, as you said, we do the actual pulling up of the predicates in the optimizer in PullupCorrelatedPredicates in subquery.scala. We are also doing a BooleanSimplication first before traversing the plan there. In here, we are doing the error reporting and i thought it would be better to keep the traversal the same way. Basically previously we did the error reporting and rewriting in Analyzer and now, we do the error reporting in checkAnalysis and rewriting in Optimizer. Just to refresh your memory so you can help to take the right call here :-) --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21822: [SPARK-24865] Remove AnalysisBarrier - WIP
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/21822#discussion_r204163484 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/LogicalPlan.scala --- @@ -33,6 +49,116 @@ abstract class LogicalPlan with QueryPlanConstraints with Logging { + private var _analyzed: Boolean = false + + /** + * Marks this plan as already analyzed. This should only be called by [[CheckAnalysis]]. + */ + private[catalyst] def setAnalyzed(): Unit = { _analyzed = true } + + /** + * Returns true if this node and its children have already been gone through analysis and + * verification. Note that this is only an optimization used to avoid analyzing trees that + * have already been analyzed, and can be reset by transformations. + */ + def analyzed: Boolean = _analyzed + + /** + * Returns a copy of this node where `rule` has been recursively applied first to all of its + * children and then itself (post-order, bottom-up). When `rule` does not apply to a given node, + * it is left unchanged. This function is similar to `transformUp`, but skips sub-trees that + * have already been marked as analyzed. + * + * @param rule the function use to transform this nodes children + */ + def resolveOperators(rule: PartialFunction[LogicalPlan, LogicalPlan]): LogicalPlan = { --- End diff -- todo: add unit tests --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21822: [SPARK-24865] Remove AnalysisBarrier - WIP
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/21822#discussion_r204163551 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala --- @@ -533,7 +537,8 @@ trait CheckAnalysis extends PredicateHelper { // Simplify the predicates before validating any unsupported correlation patterns // in the plan. -BooleanSimplification(sub).foreachUp { +// TODO(rxin): Why did this need to call BooleanSimplification??? --- End diff -- Yeah, I added boolean simplification here. I didn't quite like it back then, and I still don't like it. I was hoping this was happening in the `Optimizer` now. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21822: [SPARK-24865] Remove AnalysisBarrier - WIP
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/21822#discussion_r204163424 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/LogicalPlan.scala --- @@ -23,8 +23,24 @@ import org.apache.spark.sql.catalyst.analysis._ import org.apache.spark.sql.catalyst.expressions._ import org.apache.spark.sql.catalyst.plans.QueryPlan import org.apache.spark.sql.catalyst.plans.logical.statsEstimation.LogicalPlanStats -import org.apache.spark.sql.catalyst.trees.CurrentOrigin +import org.apache.spark.sql.catalyst.trees.{CurrentOrigin, TreeNode} import org.apache.spark.sql.types.StructType +import org.apache.spark.util.Utils + + +object LogicalPlan { + + private val resolveOperatorDepth = new ThreadLocal[Int] { --- End diff -- todo: explain what this is --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21822: [SPARK-24865] Remove AnalysisBarrier - WIP
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/21822#discussion_r204163328 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala --- @@ -2390,16 +2375,21 @@ class Analyzer( * scoping information for attributes and can be removed once analysis is complete. */ object EliminateSubqueryAliases extends Rule[LogicalPlan] { - def apply(plan: LogicalPlan): LogicalPlan = plan transformUp { -case SubqueryAlias(_, child) => child + // This is actually called in the beginning of the optimization phase, and as a result + // is using transformUp rather than resolveOperators. This is also often called in the + // --- End diff -- note: finish comment --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21822: [SPARK-24865] Remove AnalysisBarrier - WIP
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/21822#discussion_r204160853 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala --- @@ -533,7 +537,8 @@ trait CheckAnalysis extends PredicateHelper { // Simplify the predicates before validating any unsupported correlation patterns // in the plan. -BooleanSimplification(sub).foreachUp { +// TODO(rxin): Why did this need to call BooleanSimplification??? --- End diff -- Thanks. I'm going to add it back. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21822: [SPARK-24865] Remove AnalysisBarrier - WIP
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/21822#discussion_r204160150 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala --- @@ -787,6 +782,7 @@ class Analyzer( right case Some((oldRelation, newRelation)) => val attributeRewrites = AttributeMap(oldRelation.output.zip(newRelation.output)) + // TODO(rxin): Why do we need transformUp here? --- End diff -- cc @cloud-fan why do we need transformUp here? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21822: [SPARK-24865] Remove AnalysisBarrier - WIP
Github user dilipbiswal commented on a diff in the pull request: https://github.com/apache/spark/pull/21822#discussion_r203924609 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala --- @@ -533,7 +537,8 @@ trait CheckAnalysis extends PredicateHelper { // Simplify the predicates before validating any unsupported correlation patterns // in the plan. -BooleanSimplification(sub).foreachUp { +// TODO(rxin): Why did this need to call BooleanSimplification??? --- End diff -- @rxin From what i remember Reynold, most of this logic was housed in Analyzer before and we moved it to optimizer. In the old code we used to walk the plan after simplifying the predicates. The comment used to read "Simplify the predicates before pulling them out.". I just retained that semantics. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21822: [SPARK-24865] Remove AnalysisBarrier - WIP
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/21822#discussion_r203918981 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala --- @@ -533,7 +537,8 @@ trait CheckAnalysis extends PredicateHelper { // Simplify the predicates before validating any unsupported correlation patterns // in the plan. -BooleanSimplification(sub).foreachUp { +// TODO(rxin): Why did this need to call BooleanSimplification??? --- End diff -- cc @dilipbiswal @cloud-fan @gatorsmile Why did we need BooleanSimplification here? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org