[GitHub] spark pull request #18956: [SPARK-21726][SQL] Check for structural integrity...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/18956 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #18956: [SPARK-21726][SQL] Check for structural integrity...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/18956#discussion_r137681045 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/rules/RuleExecutor.scala --- @@ -64,6 +64,14 @@ abstract class RuleExecutor[TreeType <: TreeNode[_]] extends Logging { protected def batches: Seq[Batch] /** + * Defines a check function which checks for structural integrity of the plan after the execution --- End diff -- `which` -> `that` --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #18956: [SPARK-21726][SQL] Check for structural integrity...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/18956#discussion_r137680999 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/rules/RuleExecutor.scala --- @@ -64,6 +64,14 @@ abstract class RuleExecutor[TreeType <: TreeNode[_]] extends Logging { protected def batches: Seq[Batch] /** + * Defines a check function which checks for structural integrity of the plan after the execution + * of each rule. For example, we can check whether a plan is still resolved after each rule in + * `Optimizer`, so we can catch rules that return invalid plans. The check function will returns --- End diff -- `will returns` -> `returns` --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #18956: [SPARK-21726][SQL] Check for structural integrity...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/18956#discussion_r137678987 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/OptimizerSICheckerSuite.scala --- @@ -0,0 +1,60 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.catalyst.optimizer + +import org.apache.spark.sql.catalyst.analysis.{EmptyFunctionRegistry, UnresolvedAttribute} +import org.apache.spark.sql.catalyst.catalog.{InMemoryCatalog, SessionCatalog} +import org.apache.spark.sql.catalyst.dsl.plans._ +import org.apache.spark.sql.catalyst.errors.TreeNodeException +import org.apache.spark.sql.catalyst.expressions.{Alias, Literal} +import org.apache.spark.sql.catalyst.plans.PlanTest +import org.apache.spark.sql.catalyst.plans.logical.{LogicalPlan, OneRowRelation, Project} +import org.apache.spark.sql.catalyst.rules._ +import org.apache.spark.sql.internal.SQLConf + + +class OptimizerSICheckerkSuite extends PlanTest { --- End diff -- Ok. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #18956: [SPARK-21726][SQL] Check for structural integrity...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/18956#discussion_r137679007 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/rules/RuleExecutor.scala --- @@ -64,6 +64,14 @@ abstract class RuleExecutor[TreeType <: TreeNode[_]] extends Logging { protected def batches: Seq[Batch] /** + * Defines a check function which checks for structural integrity of the plan after the execution + * of each rule. For example, we can check whether a plan is still resolved after each rule in + * `Optimizer`, so we can catch rules that return invalid plans. The check function will returns + * `false` if the given plan doesn't pass the structural integrity check. + */ + protected def planChecker(plan: TreeType): Boolean = true --- End diff -- Looks good. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #18956: [SPARK-21726][SQL] Check for structural integrity...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/18956#discussion_r137613115 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/rules/RuleExecutor.scala --- @@ -64,6 +64,14 @@ abstract class RuleExecutor[TreeType <: TreeNode[_]] extends Logging { protected def batches: Seq[Batch] /** + * Defines a check function which checks for structural integrity of the plan after the execution + * of each rule. For example, we can check whether a plan is still resolved after each rule in + * `Optimizer`, so we can catch rules that return invalid plans. The check function will returns + * `false` if the given plan doesn't pass the structural integrity check. + */ + protected def planChecker(plan: TreeType): Boolean = true --- End diff -- `planChecker ` -> `isPlanIntegral`? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #18956: [SPARK-21726][SQL] Check for structural integrity...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/18956#discussion_r137611075 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/rules/RuleExecutor.scala --- @@ -93,6 +101,13 @@ abstract class RuleExecutor[TreeType <: TreeNode[_]] extends Logging { """.stripMargin) } +// Run the structural integrity checker against the plan after each rule. +if (!planChecker(result)) { + val message = s"After applying rule ${rule.ruleName} in batch ${batch.name}, " + +"the structural integrity of the plan is broken." + throw new TreeNodeException(result, message, null) --- End diff -- nvm. The message also has rule and batch names. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #18956: [SPARK-21726][SQL] Check for structural integrity...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/18956#discussion_r137610235 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/OptimizerSICheckerSuite.scala --- @@ -0,0 +1,60 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.catalyst.optimizer + +import org.apache.spark.sql.catalyst.analysis.{EmptyFunctionRegistry, UnresolvedAttribute} +import org.apache.spark.sql.catalyst.catalog.{InMemoryCatalog, SessionCatalog} +import org.apache.spark.sql.catalyst.dsl.plans._ +import org.apache.spark.sql.catalyst.errors.TreeNodeException +import org.apache.spark.sql.catalyst.expressions.{Alias, Literal} +import org.apache.spark.sql.catalyst.plans.PlanTest +import org.apache.spark.sql.catalyst.plans.logical.{LogicalPlan, OneRowRelation, Project} +import org.apache.spark.sql.catalyst.rules._ +import org.apache.spark.sql.internal.SQLConf + + +class OptimizerSICheckerkSuite extends PlanTest { --- End diff -- -> OptimizerStructuralIntegrityCheckerkSuite --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #18956: [SPARK-21726][SQL] Check for structural integrity...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/18956#discussion_r137609405 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/rules/RuleExecutor.scala --- @@ -93,6 +101,13 @@ abstract class RuleExecutor[TreeType <: TreeNode[_]] extends Logging { """.stripMargin) } +// Run the structural integrity checker against the plan after each rule. +if (!planChecker(result)) { + val message = s"After applying rule ${rule.ruleName} in batch ${batch.name}, " + +"the structural integrity of the plan is broken." + throw new TreeNodeException(result, message, null) --- End diff -- move the exception throwing logics into the `planChecker `? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #18956: [SPARK-21726][SQL] Check for structural integrity...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/18956#discussion_r133360995 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala --- @@ -37,6 +37,12 @@ import org.apache.spark.sql.types._ abstract class Optimizer(sessionCatalog: SessionCatalog) extends RuleExecutor[LogicalPlan] { + // Check for structural integrity of the plan in test mode. Currently we only check if a plan is + // still resolved after the execution of each rule. + override protected def planChecker: Option[LogicalPlan => Boolean] = Some( --- End diff -- Thanks. I will update it. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #18956: [SPARK-21726][SQL] Check for structural integrity...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/18956#discussion_r133360047 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala --- @@ -37,6 +37,12 @@ import org.apache.spark.sql.types._ abstract class Optimizer(sessionCatalog: SessionCatalog) extends RuleExecutor[LogicalPlan] { + // Check for structural integrity of the plan in test mode. Currently we only check if a plan is + // still resolved after the execution of each rule. + override protected def planChecker: Option[LogicalPlan => Boolean] = Some( --- End diff -- can we move the checking of whether this is a test in here, then this method simply returns boolean. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #18956: [SPARK-21726][SQL] Check for structural integrity...
GitHub user viirya opened a pull request: https://github.com/apache/spark/pull/18956 [SPARK-21726][SQL] Check for structural integrity of the plan in Optimzer in test mode. ## What changes were proposed in this pull request? We have many optimization rules now in `Optimzer`. Right now we don't have any checks in the optimizer to check for the structural integrity of the plan (e.g. resolved). When debugging, it is difficult to identify which rules return invalid plans. It would be great if in test mode, we can check whether a plan is still resolved after the execution of each rule, so we can catch rules that return invalid plans. ## How was this patch tested? Added tests. You can merge this pull request into a Git repository by running: $ git pull https://github.com/viirya/spark-1 SPARK-21726 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/18956.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #18956 commit 21d86bac80790d0b994df79b5e27a7d2d354e90f Author: Liang-Chi Hsieh Date: 2017-08-16T04:53:49Z Check for structural integrity of the plan in Optimzer in test mode. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org