[GitHub] spark pull request: [SPARK-15425][SQL] Disallow cartesian joins by...
Github user sameeragarwal commented on a diff in the pull request: https://github.com/apache/spark/pull/13209#discussion_r64112716 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala --- @@ -348,6 +348,11 @@ object SQLConf { .booleanConf .createWithDefault(true) + val CARTESIAN_PRODUCT_ENABLED = SQLConfigBuilder("spark.sql.join.cartesian.enabled") +.doc("When false, we will throw an error if a query contains a cartesian product") +.booleanConf +.createWithDefault(false) + val ORDER_BY_ORDINAL = SQLConfigBuilder("spark.sql.orderByOrdinal") .doc("When true, the ordinal numbers are treated as the position in the select list. " + "When false, the ordinal numbers in order/sort By clause are ignored.") --- End diff -- yes, for sure --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-15425][SQL] Disallow cartesian joins by...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/13209#discussion_r63990349 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala --- @@ -348,6 +348,11 @@ object SQLConf { .booleanConf .createWithDefault(true) + val CARTESIAN_PRODUCT_ENABLED = SQLConfigBuilder("spark.sql.join.cartesian.enabled") +.doc("When false, we will throw an error if a query contains a cartesian product") +.booleanConf +.createWithDefault(false) + val ORDER_BY_ORDINAL = SQLConfigBuilder("spark.sql.orderByOrdinal") .doc("When true, the ordinal numbers are treated as the position in the select list. " + "When false, the ordinal numbers in order/sort By clause are ignored.") --- End diff -- no it's not this pr but @sameeragarwal can you fix it while you are at it? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-15425][SQL] Disallow cartesian joins by...
Github user jaceklaskowski commented on a diff in the pull request: https://github.com/apache/spark/pull/13209#discussion_r63987980 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala --- @@ -348,6 +348,11 @@ object SQLConf { .booleanConf .createWithDefault(true) + val CARTESIAN_PRODUCT_ENABLED = SQLConfigBuilder("spark.sql.join.cartesian.enabled") +.doc("When false, we will throw an error if a query contains a cartesian product") +.booleanConf +.createWithDefault(false) + val ORDER_BY_ORDINAL = SQLConfigBuilder("spark.sql.orderByOrdinal") .doc("When true, the ordinal numbers are treated as the position in the select list. " + "When false, the ordinal numbers in order/sort By clause are ignored.") --- End diff -- Is there any reason for `By` uppercase? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-15425][SQL] Disallow cartesian joins by...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/13209#issuecomment-220505869 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/58933/ Test FAILed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-15425][SQL] Disallow cartesian joins by...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/13209#issuecomment-220505867 Merged build finished. Test FAILed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-15425][SQL] Disallow cartesian joins by...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/13209#issuecomment-220505801 **[Test build #58933 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58933/consoleFull)** for PR 13209 at commit [`2c4d6e2`](https://github.com/apache/spark/commit/2c4d6e29e1e7921b23a0ca45b8a9882a6a4c53a3). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-15425][SQL] Disallow cartesian joins by...
Github user marmbrus commented on the pull request: https://github.com/apache/spark/pull/13209#issuecomment-220504465 Yeah, thats a good idea. They might not even know which join is getting planned that way otherwise. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-15425][SQL] Disallow cartesian joins by...
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/13209#discussion_r63984891 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkStrategies.scala --- @@ -178,7 +178,12 @@ private[sql] abstract class SparkStrategies extends QueryPlanner[SparkPlan] { // Pick CartesianProduct for InnerJoin case logical.Join(left, right, Inner, condition) => -joins.CartesianProductExec(planLater(left), planLater(right), condition) :: Nil +if (conf.cartesianProductEnabled) { + joins.CartesianProductExec(planLater(left), planLater(right), condition) :: Nil +} else { + throw new AnalysisException("Cartesian products are disabled by default. To explicitly " + --- End diff -- +1 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-15425][SQL] Disallow cartesian joins by...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/13209#issuecomment-220503450 cc @marmbrus I'm wondering if it'd be better to do this check before execution, rather than in planning. The reason is that we would then be able to run explain on the plan... --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-15425][SQL] Disallow cartesian joins by...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/13209#discussion_r63984279 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkStrategies.scala --- @@ -178,7 +178,12 @@ private[sql] abstract class SparkStrategies extends QueryPlanner[SparkPlan] { // Pick CartesianProduct for InnerJoin case logical.Join(left, right, Inner, condition) => -joins.CartesianProductExec(planLater(left), planLater(right), condition) :: Nil +if (conf.cartesianProductEnabled) { + joins.CartesianProductExec(planLater(left), planLater(right), condition) :: Nil +} else { + throw new AnalysisException("Cartesian products are disabled by default. To explicitly " + --- End diff -- Cartesian joins? Might be confusing for some users to hear cartesian product. If you say join they will know it's the join. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-15425][SQL] Disallow cartesian joins by...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/13209#discussion_r63984257 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkStrategies.scala --- @@ -178,7 +178,12 @@ private[sql] abstract class SparkStrategies extends QueryPlanner[SparkPlan] { // Pick CartesianProduct for InnerJoin case logical.Join(left, right, Inner, condition) => -joins.CartesianProductExec(planLater(left), planLater(right), condition) :: Nil +if (conf.cartesianProductEnabled) { + joins.CartesianProductExec(planLater(left), planLater(right), condition) :: Nil +} else { + throw new AnalysisException("Cartesian products are disabled by default. To explicitly " + +"enable them, please set spark.sql.join.cartesian.enabled = true") --- End diff -- better use the key itself rather than hard coding it here --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-15425][SQL] Disallow cartesian joins by...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/13209#issuecomment-220500702 **[Test build #58933 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58933/consoleFull)** for PR 13209 at commit [`2c4d6e2`](https://github.com/apache/spark/commit/2c4d6e29e1e7921b23a0ca45b8a9882a6a4c53a3). --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-15425][SQL] Disallow cartesian joins by...
GitHub user sameeragarwal opened a pull request: https://github.com/apache/spark/pull/13209 [SPARK-15425][SQL] Disallow cartesian joins by default ## What changes were proposed in this pull request? In order to prevent users from inadvertently writing queries with cartesian joins, this patch introduces a new conf `spark.sql.join.cartesian.enabled` (set to `false` by default) that if not set, results in an `AnalysisException` if the query contains one or more cartesian products. ## How was this patch tested? Added a test to verify the new behavior in `JoinSuite`. Additionally, `SQLQuerySuite` and `SQLMetricsSuite` were modified to explicitly enable cartesian products. You can merge this pull request into a Git repository by running: $ git pull https://github.com/sameeragarwal/spark disallow-cartesian Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/13209.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #13209 commit 2c4d6e29e1e7921b23a0ca45b8a9882a6a4c53a3 Author: Sameer Agarwal Date: 2016-05-20T02:04:04Z Disallow cartesian joins by default --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org