[jira] [Commented] (SPARK-6459) Warn when Column API is constructing trivially true equality
[ https://issues.apache.org/jira/browse/SPARK-6459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16553330#comment-16553330 ]

Michael Armbrust commented on SPARK-6459:
-----------------------------------------

[~tenstriker] this will never happen from a SQL query. This only happens when you take already resolved attributes from different parts of a DataFrame and manually construct an equality whose two sides can't be differentiated.

> Warn when Column API is constructing trivially true equality
> ------------------------------------------------------------
>
>                 Key: SPARK-6459
>                 URL: https://issues.apache.org/jira/browse/SPARK-6459
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 1.3.0
>            Reporter: Michael Armbrust
>            Assignee: Michael Armbrust
>            Priority: Critical
>             Fix For: 1.3.1, 1.4.0
>
>
> Right now it's pretty confusing when a user constructs an equality predicate
> that is going to be used in a self join, where the optimizer cannot
> distinguish between the attributes in question (e.g., [SPARK-6231]). Since
> there is really no good reason to do this, let's print a warning.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
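
To make "already resolved attributes ... that can't be differentiated" concrete, here is a small toy model in plain Scala. It is only a sketch: `Attr`, `Frame`, and `EqualTo` are hypothetical stand-ins invented for this illustration and are not Spark's classes. The assumption it encodes is that a filter keeps the attribute ids of its input, so two filtered views of one DataFrame hand back the same attribute for the same column name.

```scala
// Toy model for illustration only. Attr, Frame and EqualTo are hypothetical
// stand-ins and are NOT Spark's classes.
object TrivialEqualityDemo {

  // A resolved attribute: a column name plus a unique id, like the y#4 seen in plans.
  final case class Attr(name: String, id: Long) {
    override def toString: String = s"$name#$id"
  }

  // An equality predicate between two resolved attributes.
  final case class EqualTo(left: Attr, right: Attr) {
    // Both sides are the very same attribute, so the predicate says nothing
    // about which side of a self join each one was meant to refer to.
    def triviallyTrue: Boolean = left == right
  }

  final case class Frame(attrs: Seq[Attr]) {
    // Look up an already resolved attribute by column name.
    def apply(name: String): Attr =
      attrs.find(_.name == name).getOrElse(throw new NoSuchElementException(name))

    // Assumption of this toy: filtering drops rows but keeps attributes and ids.
    def where(description: String): Frame = this
  }

  def main(args: Array[String]): Unit = {
    val df = Frame(Seq(Attr("x", 3), Attr("y", 4), Attr("z", 5)))
    val as = df.where("x = a")
    val bs = df.where("x = b")

    // Both lookups return the same attribute, so the equality compares y with itself.
    val predicate = EqualTo(as("y"), bs("y"))
    println(s"$predicate trivially true: ${predicate.triviallyTrue}")
  }
}
```

In this toy, the only way to get a meaningful predicate is to give the two sides distinct identities before building it, which is the role aliases play in the warning text.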
[jira] [Commented] (SPARK-6459) Warn when Column API is constructing trivially true equality
[ https://issues.apache.org/jira/browse/SPARK-6459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16550024#comment-16550024 ]

nirav patel commented on SPARK-6459:
------------------------------------

[~zero323] why should the example you gave generate a Cartesian product? I don't see why it would if it were run against any ANSI SQL engine (MySQL, Oracle). Why is this an issue with Spark? Is it because of lazy evaluation, with the query planner's optimization done at the end?

[~marmbrus] I think this "WARNING" is masking a bigger issue here. Can't the Spark SQL engine create aliases itself so that it doesn't get confused, instead of burdening the user with it? As far as the user is concerned, they are writing individual SQL statements that are syntactically and semantically correct. It's the Spark query planner that misinterprets the semantics.
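
For comparison with the SQL engines mentioned above, here is a sketch of the same kind of self join written as a SQL query in Spark. It assumes Spark 2.x or later (`SparkSession`, `createOrReplaceTempView`); the view name `t` and the aliases `l` and `r` are made up for this example. In SQL the two sides of a self join have to be named separately, so each column reference already says which side it belongs to.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object SqlSelfJoin {

  // Registers a small table and self-joins it through SQL with explicit table aliases.
  def selfJoin(spark: SparkSession): DataFrame = {
    import spark.implicits._

    Seq(("a", 1, 0.2), ("a", 2, 0.3), ("b", 2, 0.4), ("b", 3, 0.5))
      .toDF("x", "y", "z")
      .createOrReplaceTempView("t") // hypothetical view name

    // l and r are distinct names for the two sides, so l.y and r.y are unambiguous.
    spark.sql(
      """SELECT l.x, l.y, l.z, r.x AS rx, r.z AS rz
        |FROM t l JOIN t r ON l.y = r.y
        |WHERE l.x = 'a' AND r.x = 'b'""".stripMargin)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("sql-self-join")
      .getOrCreate()
    try selfJoin(spark).show()
    finally spark.stop()
  }
}
```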
[jira] [Commented] (SPARK-6459) Warn when Column API is constructing trivially true equality
[ https://issues.apache.org/jira/browse/SPARK-6459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15074268#comment-15074268 ]

Maciej Szymkiewicz commented on SPARK-6459:
-------------------------------------------

[~marmbrus] Isn't this warning obsolete in 1.5+?
[jira] [Commented] (SPARK-6459) Warn when Column API is constructing trivially true equality
[ https://issues.apache.org/jira/browse/SPARK-6459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15074322#comment-15074322 ]

Maciej Szymkiewicz commented on SPARK-6459:
-------------------------------------------

I've been trying to reproduce the problem on 1.5.2 to illustrate why there is a need for aliases, but surprisingly it worked just fine.

{code}
val df = sc.parallelize(Seq(("a", 1, 0.2), ("a", 2, 0.3), ("b", 2, 0.4), ("b", 3, 0.5))).toDF("x", "y", "z")

val as = df.where($"x" === "a")
val bs = df.where($"x" === "b")

as.join(bs, as("y") === bs("y")).collect
{code}

I get a warning but no Cartesian product.

{code}
scala> as.join(bs, as("y") === bs("y")).explain(true)
15/12/29 21:29:16 WARN Column: Constructing trivially true equals predicate, 'y#4 = y#4'. Perhaps you need to use aliases.
== Parsed Logical Plan ==
Join Inner, Some((y#4 = y#17))
 Filter (x#3 = a)
  Project [_1#0 AS x#3,_2#1 AS y#4,_3#2 AS z#5]
   LogicalRDD [_1#0,_2#1,_3#2], MapPartitionsRDD[1] at rddToDataFrameHolder at :21
 Filter (x#16 = b)
  Project [_1#0 AS x#16,_2#1 AS y#17,_3#2 AS z#18]
   LogicalRDD [_1#0,_2#1,_3#2], MapPartitionsRDD[1] at rddToDataFrameHolder at :21

== Analyzed Logical Plan ==
x: string, y: int, z: double, x: string, y: int, z: double
Join Inner, Some((y#4 = y#17))
 Filter (x#3 = a)
  Project [_1#0 AS x#3,_2#1 AS y#4,_3#2 AS z#5]
   LogicalRDD [_1#0,_2#1,_3#2], MapPartitionsRDD[1] at rddToDataFrameHolder at :21
 Filter (x#16 = b)
  Project [_1#0 AS x#16,_2#1 AS y#17,_3#2 AS z#18]
   LogicalRDD [_1#0,_2#1,_3#2], MapPartitionsRDD[1] at rddToDataFrameHolder at :21

== Optimized Logical Plan ==
Join Inner, Some((y#4 = y#17))
 Project [_1#0 AS x#3,_2#1 AS y#4,_3#2 AS z#5]
  Filter (_1#0 = a)
   LogicalRDD [_1#0,_2#1,_3#2], MapPartitionsRDD[1] at rddToDataFrameHolder at :21
 Project [_1#0 AS x#16,_2#1 AS y#17,_3#2 AS z#18]
  Filter (_1#0 = b)
   LogicalRDD [_1#0,_2#1,_3#2], MapPartitionsRDD[1] at rddToDataFrameHolder at :21

== Physical Plan ==
SortMergeJoin [y#4], [y#17]
 TungstenSort [y#4 ASC], false, 0
  TungstenExchange hashpartitioning(y#4)
   TungstenProject [_1#0 AS x#3,_2#1 AS y#4,_3#2 AS z#5]
    Filter (_1#0 = a)
     Scan PhysicalRDD[_1#0,_2#1,_3#2]
 TungstenSort [y#17 ASC], false, 0
  TungstenExchange hashpartitioning(y#17)
   TungstenProject [_1#0 AS x#16,_2#1 AS y#17,_3#2 AS z#18]
    Filter (_1#0 = b)
     Scan PhysicalRDD[_1#0,_2#1,_3#2]

Code Generation: true
{code}
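
The warning in the output above suggests using aliases. A minimal sketch of the same reproduction with aliases might look like the following. It assumes Spark 2.x or later (`SparkSession`); the alias names `l` and `r` are made up for this example. On the 1.5 shell used above, `df.as("...")` should be available in the same way, but this sketch has not been run against that version.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object AliasedSelfJoin {

  // Same data and filters as the reproduction, but each side gets its own alias
  // and the join condition refers to columns through those aliases.
  def joined(spark: SparkSession): DataFrame = {
    import spark.implicits._

    val df = Seq(("a", 1, 0.2), ("a", 2, 0.3), ("b", 2, 0.4), ("b", 3, 0.5))
      .toDF("x", "y", "z")

    val as = df.where($"x" === "a").as("l") // hypothetical alias for the left side
    val bs = df.where($"x" === "b").as("r") // hypothetical alias for the right side

    // $"l.y" and $"r.y" are resolved per side when the join is analyzed,
    // instead of both pointing at one already resolved attribute of df.
    as.join(bs, $"l.y" === $"r.y")
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("aliased-self-join")
      .getOrCreate()
    try {
      val result = joined(spark)
      result.explain(true)
      result.show()
    } finally spark.stop()
  }
}
```

Written this way, the join condition does not depend on any special-casing of `as("y") === bs("y")`, so it should keep its meaning when the predicate grows more complicated.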
[jira] [Commented] (SPARK-6459) Warn when Column API is constructing trivially true equality
[ https://issues.apache.org/jira/browse/SPARK-6459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15074342#comment-15074342 ]

Maciej Szymkiewicz commented on SPARK-6459:
-------------------------------------------

Thanks for clarification.
[jira] [Commented] (SPARK-6459) Warn when Column API is constructing trivially true equality
[ https://issues.apache.org/jira/browse/SPARK-6459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15074327#comment-15074327 ]

Michael Armbrust commented on SPARK-6459:
-----------------------------------------

We do special-case that very common problem, but I bet if you construct something more complicated (including an OR or a UDF) it will not get rewritten.
[jira] [Commented] (SPARK-6459) Warn when Column API is constructing trivially true equality
[ https://issues.apache.org/jira/browse/SPARK-6459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15074279#comment-15074279 ]

Michael Armbrust commented on SPARK-6459:
-----------------------------------------

I don't think so, why do you say that?
[jira] [Commented] (SPARK-6459) Warn when Column API is constructing trivially true equality
[ https://issues.apache.org/jira/browse/SPARK-6459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14377502#comment-14377502 ]

Apache Spark commented on SPARK-6459:
-------------------------------------

User 'marmbrus' has created a pull request for this issue:
https://github.com/apache/spark/pull/5163