[jira] [Commented] (SPARK-11803) Dataset self join returns incorrect result

2015-11-18 Thread Xiao Li (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-11803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15011193#comment-15011193
 ] 

Xiao Li commented on SPARK-11803:
-

No problem. Actually, I have a couple of test cases. You can try them.

> Dataset self join returns incorrect result
> --
>
> Key: SPARK-11803
> URL: https://issues.apache.org/jira/browse/SPARK-11803
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Reporter: Reynold Xin
>
> See the test case in https://github.com/apache/spark/pull/9789
> {code}
>   ignore("self join") {
>     val ds = Seq("1", "2").toDS().as("a")
>     val joined = ds.joinWith(ds, lit(true))
>     checkAnswer(joined, ("1", "1"), ("1", "2"), ("2", "1"), ("2", "2"))
>   }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-11803) Dataset self join returns incorrect result

2015-11-18 Thread Wenchen Fan (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-11803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15010837#comment-15010837
 ] 

Wenchen Fan commented on SPARK-11803:
-

[~smilegator] Sorry, I hadn't noticed that you were already working on it.
Could you help review my PR and leave your thoughts there? Thanks!




[jira] [Commented] (SPARK-11803) Dataset self join returns incorrect result

2015-11-18 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-11803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15010828#comment-15010828
 ] 

Apache Spark commented on SPARK-11803:
--

User 'cloud-fan' has created a pull request for this issue:
https://github.com/apache/spark/pull/9806




[jira] [Commented] (SPARK-11803) Dataset self join returns incorrect result

2015-11-17 Thread Xiao Li (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-11803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15010267#comment-15010267
 ] 

Xiao Li commented on SPARK-11803:
-

In the joinWith function, we need to detect whether this is a self join.
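A minimal sketch of what such a check could look like, using a toy model rather than Spark's Catalyst classes. The names `Attr`, `Relation`, and `conflictingIds` are hypothetical and exist only for illustration; the assumption is that a self join shows up as both sides exposing attributes with the same expression ID.

```scala
// Toy model only: hypothetical stand-ins, not Spark's Catalyst classes.
final case class Attr(name: String, exprId: Long)
final case class Relation(output: Seq[Attr])

object SelfJoinDetection {
  // Two plans conflict when they expose at least one attribute
  // carrying the same expression ID.
  def conflictingIds(left: Relation, right: Relation): Set[Long] =
    left.output.map(_.exprId).toSet intersect right.output.map(_.exprId).toSet

  def main(args: Array[String]): Unit = {
    val ds    = Relation(Seq(Attr("value", 1L)))
    val other = Relation(Seq(Attr("value", 8L)))

    // Self join: both sides expose value#1, so the ID set is non-empty.
    println(conflictingIds(ds, ds))
    // Distinct relations: no shared IDs, so the set is empty.
    println(conflictingIds(ds, other))
  }
}
```

A non-empty result is the signal that one side needs fresh expression IDs before the join is built.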




[jira] [Commented] (SPARK-11803) Dataset self join returns incorrect result

2015-11-17 Thread Xiao Li (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-11803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15010177#comment-15010177
 ] 

Xiao Li commented on SPARK-11803:
-

I'm not sure whether this has been assigned. I can work on it tonight and
tomorrow, and will reply before the end of tomorrow.




[jira] [Commented] (SPARK-11803) Dataset self join returns incorrect result

2015-11-17 Thread Xiao Li (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-11803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15010230#comment-15010230
 ] 

Xiao Li commented on SPARK-11803:
-

We need to assign a new expression ID to the conflicting attribute in Project.
In the analyzed plan, both aliases refer to value#1 even though the right side
exposes value#8:

== Analyzed Logical Plan ==
_1: string, _2: string
Project [value#1 AS _1#6,value#1 AS _2#7]
 Join Inner, Some(true)
  LocalRelation [value#1], [[0,11,31],[0,11,32]]
  LocalRelation [value#8], [[0,11,31],[0,11,32]]







[jira] [Commented] (SPARK-11803) Dataset self join returns incorrect result

2015-11-17 Thread Xiao Li (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-11803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15010181#comment-15010181
 ] 

Xiao Li commented on SPARK-11803:
-

The optimized plan is wrong. Both output columns reference value#1, and the
right-hand relation has been pruned to empty rows:

Project [value#1 AS _1#4,value#1 AS _2#5]
 Join Inner, None
  LocalRelation [value#1], [[0,11,31],[0,11,32]]
  LocalRelation [[empty row],[empty row]]

The correct plan should look like this:

Project [value#1 AS _1#4,value#5 AS _2#5]
 Join Inner, None
  LocalRelation [value#1], [[0,11,31],[0,11,32]]
  LocalRelation [value#5], [[0,11,31],[0,11,32]]
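To illustrate why the first plan produces wrong rows, here is a toy evaluation of both plans in plain Scala. This is only a sketch: rows are modelled as maps from expression ID to value, and `crossJoin` and `project` are hypothetical helpers, not Spark operators.

```scala
object PlanByExprId {
  // A row maps an expression ID to the value stored under it.
  type Row = Map[Long, String]

  // Cross join: merge every left row with every right row.
  def crossJoin(left: Seq[Row], right: Seq[Row]): Seq[Row] =
    for (l <- left; r <- right) yield l ++ r

  // Project two output columns, each looked up by expression ID.
  def project(rows: Seq[Row], firstId: Long, secondId: Long): Seq[(String, String)] =
    rows.map(r => (r(firstId), r(secondId)))

  def main(args: Array[String]): Unit = {
    val left: Seq[Row] = Seq(Map(1L -> "1"), Map(1L -> "2"))

    // Wrong plan: the right side is pruned to empty rows and both
    // columns read value#1, so each pair repeats the left value.
    val emptyRight: Seq[Row] = Seq(Map.empty[Long, String], Map.empty[Long, String])
    println(project(crossJoin(left, emptyRight), 1L, 1L))

    // Correct plan: the right side keeps its own ID (value#5), so the
    // second column reads from the right relation.
    val right: Seq[Row] = Seq(Map(5L -> "1"), Map(5L -> "2"))
    println(project(crossJoin(left, right), 1L, 5L))
  }
}
```

In this model the wrong plan yields only ("1", "1") and ("2", "2") pairs, while the corrected plan yields the four combinations the test case expects.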




[jira] [Commented] (SPARK-11803) Dataset self join returns incorrect result

2015-11-17 Thread Xiao Li (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-11803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15010418#comment-15010418
 ] 

Xiao Li commented on SPARK-11803:
-

With the prototype implemented, the test generates the expected results. I
need to clean up my code tomorrow.




[jira] [Commented] (SPARK-11803) Dataset self join returns incorrect result

2015-11-17 Thread Xiao Li (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-11803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15010315#comment-15010315
 ] 

Xiao Li commented on SPARK-11803:
-

I believe this must be an urgent issue. This is my first time reading the
Dataset implementation. This might be a quick fix for you. I should not block
your current progress.

Just want to share my idea: detect the conflicting attributes in the joinWith
function, then assign new expression IDs to other.logicalPlan. This should be
similar to how the Analyzer handles duplicate expression IDs for self joins.
Then we can use the new `other` for joining with `this`.
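The idea can be sketched on a toy model. This is not the actual patch and does not use Spark's classes: `ToyAttr`, `ToyPlan`, and `withFreshIds` are hypothetical names, and generating fresh IDs as "largest existing ID plus one" is an assumption made to keep the example deterministic.

```scala
// Toy model only: hypothetical stand-ins, not Spark's Catalyst classes.
final case class ToyAttr(name: String, exprId: Long)
final case class ToyPlan(output: Seq[ToyAttr])

object FreshIds {
  // Returns a copy of `right` in which every attribute whose expression ID
  // also appears in `left` receives a new, unused ID.
  def withFreshIds(left: ToyPlan, right: ToyPlan): ToyPlan = {
    val leftIds = left.output.map(_.exprId).toSet
    // Assumption: start numbering just above the largest ID in use.
    val start = (left.output ++ right.output)
      .map(_.exprId)
      .foldLeft(0L)((a, b) => math.max(a, b)) + 1
    val fresh = Iterator.iterate(start)(_ + 1)
    ToyPlan(right.output.map { a =>
      if (leftIds(a.exprId)) a.copy(exprId = fresh.next()) else a
    })
  }

  def main(args: Array[String]): Unit = {
    val ds = ToyPlan(Seq(ToyAttr("value", 1L)))
    // Self join: `other` is the same plan, so value#1 conflicts and is renumbered.
    val other = withFreshIds(ds, ds)
    println(other)
  }
}
```

After the rewrite the two sides no longer share an expression ID, so a projection over the join can refer to each side unambiguously.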







[jira] [Commented] (SPARK-11803) Dataset self join returns incorrect result

2015-11-17 Thread Reynold Xin (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-11803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15010329#comment-15010329
 ] 

Reynold Xin commented on SPARK-11803:
-

Please submit a pull request if you can figure it out. Thanks.





[jira] [Commented] (SPARK-11803) Dataset self join returns incorrect result

2015-11-17 Thread Xiao Li (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-11803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15010345#comment-15010345
 ] 

Xiao Li commented on SPARK-11803:
-

Sure, I can try it, but I have to say I am unable to complete it tonight. : )
