[GitHub] spark pull request: [SPARK-15425][SQL] Disallow cartesian joins by...

2016-05-20 Thread sameeragarwal
Github user sameeragarwal commented on a diff in the pull request:

https://github.com/apache/spark/pull/13209#discussion_r64112716
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -348,6 +348,11 @@ object SQLConf {
 .booleanConf
 .createWithDefault(true)
 
+  val CARTESIAN_PRODUCT_ENABLED = 
SQLConfigBuilder("spark.sql.join.cartesian.enabled")
+.doc("When false, we will throw an error if a query contains a 
cartesian product")
+.booleanConf
+.createWithDefault(false)
+
   val ORDER_BY_ORDINAL = SQLConfigBuilder("spark.sql.orderByOrdinal")
 .doc("When true, the ordinal numbers are treated as the position in 
the select list. " +
  "When false, the ordinal numbers in order/sort By clause are 
ignored.")
--- End diff --

yes, for sure


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-15425][SQL] Disallow cartesian joins by...

2016-05-19 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/13209#discussion_r63990349
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -348,6 +348,11 @@ object SQLConf {
 .booleanConf
 .createWithDefault(true)
 
+  val CARTESIAN_PRODUCT_ENABLED = 
SQLConfigBuilder("spark.sql.join.cartesian.enabled")
+.doc("When false, we will throw an error if a query contains a 
cartesian product")
+.booleanConf
+.createWithDefault(false)
+
   val ORDER_BY_ORDINAL = SQLConfigBuilder("spark.sql.orderByOrdinal")
 .doc("When true, the ordinal numbers are treated as the position in 
the select list. " +
  "When false, the ordinal numbers in order/sort By clause are 
ignored.")
--- End diff --

no it's not this pr but @sameeragarwal can you fix it while you are at it?



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-15425][SQL] Disallow cartesian joins by...

2016-05-19 Thread jaceklaskowski
Github user jaceklaskowski commented on a diff in the pull request:

https://github.com/apache/spark/pull/13209#discussion_r63987980
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -348,6 +348,11 @@ object SQLConf {
 .booleanConf
 .createWithDefault(true)
 
+  val CARTESIAN_PRODUCT_ENABLED = 
SQLConfigBuilder("spark.sql.join.cartesian.enabled")
+.doc("When false, we will throw an error if a query contains a 
cartesian product")
+.booleanConf
+.createWithDefault(false)
+
   val ORDER_BY_ORDINAL = SQLConfigBuilder("spark.sql.orderByOrdinal")
 .doc("When true, the ordinal numbers are treated as the position in 
the select list. " +
  "When false, the ordinal numbers in order/sort By clause are 
ignored.")
--- End diff --

Is there any reason for `By` uppercase?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-15425][SQL] Disallow cartesian joins by...

2016-05-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/13209#issuecomment-220505869
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/58933/
Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-15425][SQL] Disallow cartesian joins by...

2016-05-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/13209#issuecomment-220505867
  
Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-15425][SQL] Disallow cartesian joins by...

2016-05-19 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/13209#issuecomment-220505801
  
**[Test build #58933 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58933/consoleFull)**
 for PR 13209 at commit 
[`2c4d6e2`](https://github.com/apache/spark/commit/2c4d6e29e1e7921b23a0ca45b8a9882a6a4c53a3).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-15425][SQL] Disallow cartesian joins by...

2016-05-19 Thread marmbrus
Github user marmbrus commented on the pull request:

https://github.com/apache/spark/pull/13209#issuecomment-220504465
  
Yeah, thats a good idea.  They might not even know which join is getting 
planned that way otherwise.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-15425][SQL] Disallow cartesian joins by...

2016-05-19 Thread marmbrus
Github user marmbrus commented on a diff in the pull request:

https://github.com/apache/spark/pull/13209#discussion_r63984891
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/SparkStrategies.scala ---
@@ -178,7 +178,12 @@ private[sql] abstract class SparkStrategies extends 
QueryPlanner[SparkPlan] {
 
   // Pick CartesianProduct for InnerJoin
   case logical.Join(left, right, Inner, condition) =>
-joins.CartesianProductExec(planLater(left), planLater(right), 
condition) :: Nil
+if (conf.cartesianProductEnabled) {
+  joins.CartesianProductExec(planLater(left), planLater(right), 
condition) :: Nil
+} else {
+  throw new AnalysisException("Cartesian products are disabled by 
default. To explicitly " +
--- End diff --

+1


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-15425][SQL] Disallow cartesian joins by...

2016-05-19 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/13209#issuecomment-220503450
  
cc @marmbrus 
I'm wondering if it'd be better to do this check before execution, rather 
than in planning. The reason is that we would then be able to run explain on 
the plan...



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-15425][SQL] Disallow cartesian joins by...

2016-05-19 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/13209#discussion_r63984279
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/SparkStrategies.scala ---
@@ -178,7 +178,12 @@ private[sql] abstract class SparkStrategies extends 
QueryPlanner[SparkPlan] {
 
   // Pick CartesianProduct for InnerJoin
   case logical.Join(left, right, Inner, condition) =>
-joins.CartesianProductExec(planLater(left), planLater(right), 
condition) :: Nil
+if (conf.cartesianProductEnabled) {
+  joins.CartesianProductExec(planLater(left), planLater(right), 
condition) :: Nil
+} else {
+  throw new AnalysisException("Cartesian products are disabled by 
default. To explicitly " +
--- End diff --

Cartesian joins? Might be confusing for some users to hear cartesian 
product. If you say join they will know it's the join.



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-15425][SQL] Disallow cartesian joins by...

2016-05-19 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/13209#discussion_r63984257
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/SparkStrategies.scala ---
@@ -178,7 +178,12 @@ private[sql] abstract class SparkStrategies extends 
QueryPlanner[SparkPlan] {
 
   // Pick CartesianProduct for InnerJoin
   case logical.Join(left, right, Inner, condition) =>
-joins.CartesianProductExec(planLater(left), planLater(right), 
condition) :: Nil
+if (conf.cartesianProductEnabled) {
+  joins.CartesianProductExec(planLater(left), planLater(right), 
condition) :: Nil
+} else {
+  throw new AnalysisException("Cartesian products are disabled by 
default. To explicitly " +
+"enable them, please set spark.sql.join.cartesian.enabled = 
true")
--- End diff --

better use the key itself rather than hard coding it here


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-15425][SQL] Disallow cartesian joins by...

2016-05-19 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/13209#issuecomment-220500702
  
**[Test build #58933 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58933/consoleFull)**
 for PR 13209 at commit 
[`2c4d6e2`](https://github.com/apache/spark/commit/2c4d6e29e1e7921b23a0ca45b8a9882a6a4c53a3).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-15425][SQL] Disallow cartesian joins by...

2016-05-19 Thread sameeragarwal
GitHub user sameeragarwal opened a pull request:

https://github.com/apache/spark/pull/13209

[SPARK-15425][SQL] Disallow cartesian joins by default

## What changes were proposed in this pull request?

In order to prevent users from inadvertently writing queries with cartesian 
joins, this patch introduces a new conf `spark.sql.join.cartesian.enabled` (set 
to `false` by default) that if not set, results in an `AnalysisException` if 
the query contains one or more cartesian products.

## How was this patch tested?

Added a test to verify the new behavior in `JoinSuite`. Additionally, 
`SQLQuerySuite` and `SQLMetricsSuite` were modified to explicitly enable 
cartesian products.



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/sameeragarwal/spark disallow-cartesian

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/13209.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #13209


commit 2c4d6e29e1e7921b23a0ca45b8a9882a6a4c53a3
Author: Sameer Agarwal 
Date:   2016-05-20T02:04:04Z

Disallow cartesian joins by default




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org