[jira] [Commented] (SPARK-6459) Warn when Column API is constructing trivially true equality

2018-07-23 Thread Michael Armbrust (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-6459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16553330#comment-16553330
 ] 

Michael Armbrust commented on SPARK-6459:
-

[~tenstriker] this will never happen from a SQL query.  This only happens when 
you take already resolved attributes from different parts of a DataFrame and 
manually construct an equality that can't be differentiated.
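
The situation described above can be illustrated with a toy model. This is only a sketch in plain Scala, not Spark's code: `Attr`, `EqualTo`, `Frame` and `triviallyTrue` are hypothetical names, and the ids 4 and 17 are borrowed from the `y#4` / `y#17` attributes in the plan output quoted later in this thread. It assumes that filtering a DataFrame keeps the same resolved attributes, so columns taken from two filtered views of one DataFrame are indistinguishable.

```scala
object TrivialEqualityModel {
  // Toy stand-in for a resolved attribute: a name plus a unique expression id
  // (printed as name#id, like "y#4" in the warning message).
  final case class Attr(name: String, exprId: Long) {
    override def toString: String = s"$name#$exprId"
  }

  // Toy stand-in for an equality predicate between two attributes.
  final case class EqualTo(left: Attr, right: Attr) {
    // Both sides are the very same attribute, so the comparison cannot
    // tell the two join inputs apart.
    def triviallyTrue: Boolean = left == right
  }

  // Toy stand-in for a DataFrame: only its output attributes are modelled.
  final case class Frame(output: Seq[Attr]) {
    def apply(name: String): Attr =
      output.find(_.name == name).getOrElse(sys.error(s"no column $name"))
    // Filtering drops rows but leaves the output attributes unchanged.
    def where(predicate: String): Frame = this
  }

  def main(args: Array[String]): Unit = {
    val df = Frame(Seq(Attr("x", 3), Attr("y", 4), Attr("z", 5)))
    val as = df.where("x = a")
    val bs = df.where("x = b")

    val cond = EqualTo(as("y"), bs("y"))
    if (cond.triviallyTrue) {
      // prints: WARN: trivially true equals predicate, 'y#4 = y#4'
      println(s"WARN: trivially true equals predicate, '${cond.left} = ${cond.right}'")
    }

    // With distinct ids on each side the predicate is a real join condition.
    println(EqualTo(Attr("y", 4), Attr("y", 17)).triviallyTrue) // prints: false
  }
}
```

A SQL query does not hit this case in the model because each table reference would be resolved separately and receive its own ids.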

> Warn when Column API is constructing trivially true equality
> 
>
> Key: SPARK-6459
> URL: https://issues.apache.org/jira/browse/SPARK-6459
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 1.3.0
>Reporter: Michael Armbrust
>Assignee: Michael Armbrust
>Priority: Critical
> Fix For: 1.3.1, 1.4.0
>
>
> Right now it's pretty confusing when a user constructs an equality predicate 
> that is going to be used in a self join, where the optimizer cannot 
> distinguish between the attributes in question (e.g., [SPARK-6231]). Since 
> there is really no good reason to do this, let's print a warning.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-6459) Warn when Column API is constructing trivially true equality

2018-07-19 Thread nirav patel (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-6459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16550024#comment-16550024
 ] 

nirav patel commented on SPARK-6459:


[~zero323] why should the example you gave generate a Cartesian product? I 
don't see why it would if it were run against any ANSI SQL engine (MySQL, 
Oracle). Why is this an issue with Spark? Is it because of lazy evaluation, 
with query planner optimization done at the end?

[~marmbrus] I think this "WARNING" is masking a bigger issue here. Can't the 
Spark SQL engine create aliases itself, so that it doesn't get confused, 
instead of burdening the user with it? As far as the user is concerned, he is 
writing individual SQL statements that are syntactically and semantically 
correct. It's the Spark query planner that misinterprets the semantics.




[jira] [Commented] (SPARK-6459) Warn when Column API is constructing trivially true equality

2015-12-29 Thread Maciej Szymkiewicz (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-6459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15074268#comment-15074268
 ] 

Maciej Szymkiewicz commented on SPARK-6459:
---

[~marmbrus] Isn't this warning obsolete in 1.5+?




[jira] [Commented] (SPARK-6459) Warn when Column API is constructing trivially true equality

2015-12-29 Thread Maciej Szymkiewicz (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-6459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15074322#comment-15074322
 ] 

Maciej Szymkiewicz commented on SPARK-6459:
---

I've been trying to reproduce the problem on 1.5.2 to illustrate why there is a 
need for aliases but surprisingly it worked just fine. 

{code}
val df = sc.parallelize(Seq(
  ("a", 1, 0.2), ("a", 2, 0.3), ("b", 2, 0.4), ("b", 3, 0.5)
)).toDF("x", "y", "z")
val as = df.where($"x" === "a")
val bs = df.where($"x" === "b")
as.join(bs, as("y") === bs("y")).collect
{code}

I get a warning but no Cartesian product. 

{code}
scala> as.join(bs, as("y") === bs("y")).explain(true)
15/12/29 21:29:16 WARN Column: Constructing trivially true equals predicate, 'y#4 = y#4'. Perhaps you need to use aliases.
== Parsed Logical Plan ==
Join Inner, Some((y#4 = y#17))
 Filter (x#3 = a)
  Project [_1#0 AS x#3,_2#1 AS y#4,_3#2 AS z#5]
   LogicalRDD [_1#0,_2#1,_3#2], MapPartitionsRDD[1] at rddToDataFrameHolder at :21
 Filter (x#16 = b)
  Project [_1#0 AS x#16,_2#1 AS y#17,_3#2 AS z#18]
   LogicalRDD [_1#0,_2#1,_3#2], MapPartitionsRDD[1] at rddToDataFrameHolder at :21

== Analyzed Logical Plan ==
x: string, y: int, z: double, x: string, y: int, z: double
Join Inner, Some((y#4 = y#17))
 Filter (x#3 = a)
  Project [_1#0 AS x#3,_2#1 AS y#4,_3#2 AS z#5]
   LogicalRDD [_1#0,_2#1,_3#2], MapPartitionsRDD[1] at rddToDataFrameHolder at :21
 Filter (x#16 = b)
  Project [_1#0 AS x#16,_2#1 AS y#17,_3#2 AS z#18]
   LogicalRDD [_1#0,_2#1,_3#2], MapPartitionsRDD[1] at rddToDataFrameHolder at :21

== Optimized Logical Plan ==
Join Inner, Some((y#4 = y#17))
 Project [_1#0 AS x#3,_2#1 AS y#4,_3#2 AS z#5]
  Filter (_1#0 = a)
   LogicalRDD [_1#0,_2#1,_3#2], MapPartitionsRDD[1] at rddToDataFrameHolder at :21
 Project [_1#0 AS x#16,_2#1 AS y#17,_3#2 AS z#18]
  Filter (_1#0 = b)
   LogicalRDD [_1#0,_2#1,_3#2], MapPartitionsRDD[1] at rddToDataFrameHolder at :21

== Physical Plan ==
SortMergeJoin [y#4], [y#17]
 TungstenSort [y#4 ASC], false, 0
  TungstenExchange hashpartitioning(y#4)
   TungstenProject [_1#0 AS x#3,_2#1 AS y#4,_3#2 AS z#5]
    Filter (_1#0 = a)
     Scan PhysicalRDD[_1#0,_2#1,_3#2]
 TungstenSort [y#17 ASC], false, 0
  TungstenExchange hashpartitioning(y#17)
   TungstenProject [_1#0 AS x#16,_2#1 AS y#17,_3#2 AS z#18]
    Filter (_1#0 = b)
     Scan PhysicalRDD[_1#0,_2#1,_3#2]

Code Generation: true
{code}
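
The warning's hint ("Perhaps you need to use aliases") can be followed roughly as below. This is a minimal sketch, not taken from the thread: it assumes a spark-shell session on Spark 1.5.x where `sc` and `sqlContext.implicits._` are already in scope, and the names `left`, `right`, `l` and `r` are illustrative.

```scala
import org.apache.spark.sql.functions.col

// Same sample data as in the snippet above.
val df = sc.parallelize(Seq(
  ("a", 1, 0.2), ("a", 2, 0.3), ("b", 2, 0.4), ("b", 3, 0.5)
)).toDF("x", "y", "z")

// Give each side of the self join its own alias.
val left  = df.where($"x" === "a").as("l")
val right = df.where($"x" === "b").as("r")

// Refer to the join columns through the aliases, so the two sides are
// named unambiguously instead of both being taken from the same DataFrame.
val joined = left.join(right, col("l.y") === col("r.y"))
joined.explain(true)
```

Because the columns are referenced by qualified name rather than taken from already resolved DataFrames, the condition is resolved against each side of the join separately.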




[jira] [Commented] (SPARK-6459) Warn when Column API is constructing trivially true equality

2015-12-29 Thread Maciej Szymkiewicz (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-6459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15074342#comment-15074342
 ] 

Maciej Szymkiewicz commented on SPARK-6459:
---

Thanks for clarification.




[jira] [Commented] (SPARK-6459) Warn when Column API is constructing trivially true equality

2015-12-29 Thread Michael Armbrust (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-6459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15074327#comment-15074327
 ] 

Michael Armbrust commented on SPARK-6459:
-

We do special case that very common problem, but I bet if you construct 
something more complicated (include an OR or a UDF) it will not get rewritten.




[jira] [Commented] (SPARK-6459) Warn when Column API is constructing trivially true equality

2015-12-29 Thread Michael Armbrust (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-6459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15074279#comment-15074279
 ] 

Michael Armbrust commented on SPARK-6459:
-

I don't think so, why do you say that?




[jira] [Commented] (SPARK-6459) Warn when Column API is constructing trivially true equality

2015-03-24 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-6459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14377502#comment-14377502
 ] 

Apache Spark commented on SPARK-6459:
-

User 'marmbrus' has created a pull request for this issue:
https://github.com/apache/spark/pull/5163
