[GitHub] spark pull request #18956: [SPARK-21726][SQL] Check for structural integrity...

2017-09-07 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/18956


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #18956: [SPARK-21726][SQL] Check for structural integrity...

2017-09-07 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/18956#discussion_r137681045
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/rules/RuleExecutor.scala
 ---
@@ -64,6 +64,14 @@ abstract class RuleExecutor[TreeType <: TreeNode[_]] 
extends Logging {
   protected def batches: Seq[Batch]
 
   /**
+   * Defines a check function which checks for structural integrity of the 
plan after the execution
--- End diff --

`which` -> `that`


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #18956: [SPARK-21726][SQL] Check for structural integrity...

2017-09-07 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/18956#discussion_r137680999
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/rules/RuleExecutor.scala
 ---
@@ -64,6 +64,14 @@ abstract class RuleExecutor[TreeType <: TreeNode[_]] 
extends Logging {
   protected def batches: Seq[Batch]
 
   /**
+   * Defines a check function which checks for structural integrity of the 
plan after the execution
+   * of each rule. For example, we can check whether a plan is still 
resolved after each rule in
+   * `Optimizer`, so we can catch rules that return invalid plans. The 
check function will returns
--- End diff --

`will returns` -> `returns`


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #18956: [SPARK-21726][SQL] Check for structural integrity...

2017-09-07 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/18956#discussion_r137678987
  
--- Diff: 
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/OptimizerSICheckerSuite.scala
 ---
@@ -0,0 +1,60 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.optimizer
+
+import org.apache.spark.sql.catalyst.analysis.{EmptyFunctionRegistry, 
UnresolvedAttribute}
+import org.apache.spark.sql.catalyst.catalog.{InMemoryCatalog, 
SessionCatalog}
+import org.apache.spark.sql.catalyst.dsl.plans._
+import org.apache.spark.sql.catalyst.errors.TreeNodeException
+import org.apache.spark.sql.catalyst.expressions.{Alias, Literal}
+import org.apache.spark.sql.catalyst.plans.PlanTest
+import org.apache.spark.sql.catalyst.plans.logical.{LogicalPlan, 
OneRowRelation, Project}
+import org.apache.spark.sql.catalyst.rules._
+import org.apache.spark.sql.internal.SQLConf
+
+
+class OptimizerSICheckerkSuite extends PlanTest {
--- End diff --

Ok.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #18956: [SPARK-21726][SQL] Check for structural integrity...

2017-09-07 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/18956#discussion_r137679007
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/rules/RuleExecutor.scala
 ---
@@ -64,6 +64,14 @@ abstract class RuleExecutor[TreeType <: TreeNode[_]] 
extends Logging {
   protected def batches: Seq[Batch]
 
   /**
+   * Defines a check function which checks for structural integrity of the 
plan after the execution
+   * of each rule. For example, we can check whether a plan is still 
resolved after each rule in
+   * `Optimizer`, so we can catch rules that return invalid plans. The 
check function will returns
+   * `false` if the given plan doesn't pass the structural integrity check.
+   */
+  protected def planChecker(plan: TreeType): Boolean = true
--- End diff --

Looks good.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #18956: [SPARK-21726][SQL] Check for structural integrity...

2017-09-07 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/18956#discussion_r137613115
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/rules/RuleExecutor.scala
 ---
@@ -64,6 +64,14 @@ abstract class RuleExecutor[TreeType <: TreeNode[_]] 
extends Logging {
   protected def batches: Seq[Batch]
 
   /**
+   * Defines a check function which checks for structural integrity of the 
plan after the execution
+   * of each rule. For example, we can check whether a plan is still 
resolved after each rule in
+   * `Optimizer`, so we can catch rules that return invalid plans. The 
check function will returns
+   * `false` if the given plan doesn't pass the structural integrity check.
+   */
+  protected def planChecker(plan: TreeType): Boolean = true
--- End diff --

`planChecker ` -> `isPlanIntegral`?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #18956: [SPARK-21726][SQL] Check for structural integrity...

2017-09-07 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/18956#discussion_r137611075
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/rules/RuleExecutor.scala
 ---
@@ -93,6 +101,13 @@ abstract class RuleExecutor[TreeType <: TreeNode[_]] 
extends Logging {
 """.stripMargin)
 }
 
+// Run the structural integrity checker against the plan after 
each rule.
+if (!planChecker(result)) {
+  val message = s"After applying rule ${rule.ruleName} in 
batch ${batch.name}, " +
+"the structural integrity of the plan is broken."
+  throw new TreeNodeException(result, message, null)
--- End diff --

nvm. The message also has rule and batch names.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #18956: [SPARK-21726][SQL] Check for structural integrity...

2017-09-07 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/18956#discussion_r137610235
  
--- Diff: 
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/OptimizerSICheckerSuite.scala
 ---
@@ -0,0 +1,60 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.optimizer
+
+import org.apache.spark.sql.catalyst.analysis.{EmptyFunctionRegistry, 
UnresolvedAttribute}
+import org.apache.spark.sql.catalyst.catalog.{InMemoryCatalog, 
SessionCatalog}
+import org.apache.spark.sql.catalyst.dsl.plans._
+import org.apache.spark.sql.catalyst.errors.TreeNodeException
+import org.apache.spark.sql.catalyst.expressions.{Alias, Literal}
+import org.apache.spark.sql.catalyst.plans.PlanTest
+import org.apache.spark.sql.catalyst.plans.logical.{LogicalPlan, 
OneRowRelation, Project}
+import org.apache.spark.sql.catalyst.rules._
+import org.apache.spark.sql.internal.SQLConf
+
+
+class OptimizerSICheckerkSuite extends PlanTest {
--- End diff --

-> OptimizerStructuralIntegrityCheckerkSuite


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #18956: [SPARK-21726][SQL] Check for structural integrity...

2017-09-07 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/18956#discussion_r137609405
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/rules/RuleExecutor.scala
 ---
@@ -93,6 +101,13 @@ abstract class RuleExecutor[TreeType <: TreeNode[_]] 
extends Logging {
 """.stripMargin)
 }
 
+// Run the structural integrity checker against the plan after 
each rule.
+if (!planChecker(result)) {
+  val message = s"After applying rule ${rule.ruleName} in 
batch ${batch.name}, " +
+"the structural integrity of the plan is broken."
+  throw new TreeNodeException(result, message, null)
--- End diff --

move the exception throwing logics into the `planChecker `?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #18956: [SPARK-21726][SQL] Check for structural integrity...

2017-08-15 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/18956#discussion_r133360995
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
 ---
@@ -37,6 +37,12 @@ import org.apache.spark.sql.types._
 abstract class Optimizer(sessionCatalog: SessionCatalog)
   extends RuleExecutor[LogicalPlan] {
 
+  // Check for structural integrity of the plan in test mode. Currently we 
only check if a plan is
+  // still resolved after the execution of each rule.
+  override protected def planChecker: Option[LogicalPlan => Boolean] = 
Some(
--- End diff --

Thanks. I will update it.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #18956: [SPARK-21726][SQL] Check for structural integrity...

2017-08-15 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/18956#discussion_r133360047
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
 ---
@@ -37,6 +37,12 @@ import org.apache.spark.sql.types._
 abstract class Optimizer(sessionCatalog: SessionCatalog)
   extends RuleExecutor[LogicalPlan] {
 
+  // Check for structural integrity of the plan in test mode. Currently we 
only check if a plan is
+  // still resolved after the execution of each rule.
+  override protected def planChecker: Option[LogicalPlan => Boolean] = 
Some(
--- End diff --

can we move the checking of whether this is a test in here, then this 
method simply returns boolean.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #18956: [SPARK-21726][SQL] Check for structural integrity...

2017-08-15 Thread viirya
GitHub user viirya opened a pull request:

https://github.com/apache/spark/pull/18956

[SPARK-21726][SQL] Check for structural integrity of the plan in Optimzer 
in test mode.

## What changes were proposed in this pull request?

We have many optimization rules now in `Optimzer`. Right now we don't have 
any checks in the optimizer to check for the structural integrity of the plan 
(e.g. resolved). When debugging, it is difficult to identify which rules return 
invalid plans.

It would be great if in test mode, we can check whether a plan is still 
resolved after the execution of each rule, so we can catch rules that return 
invalid plans.

## How was this patch tested?

Added tests.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/viirya/spark-1 SPARK-21726

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/18956.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #18956


commit 21d86bac80790d0b994df79b5e27a7d2d354e90f
Author: Liang-Chi Hsieh 
Date:   2017-08-16T04:53:49Z

Check for structural integrity of the plan in Optimzer in test mode.




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org