[jira] [Commented] (SPARK-11803) Dataset self join returns incorrect result
[ https://issues.apache.org/jira/browse/SPARK-11803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15011193#comment-15011193 ]

Xiao Li commented on SPARK-11803:
---------------------------------

No problem. Actually, I have a couple of test cases. You can try them.

> Dataset self join returns incorrect result
> ------------------------------------------
>
>                 Key: SPARK-11803
>                 URL: https://issues.apache.org/jira/browse/SPARK-11803
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>            Reporter: Reynold Xin
>
> See the test case in https://github.com/apache/spark/pull/9789
> {code}
>   ignore("self join") {
>     val ds = Seq("1", "2").toDS().as("a")
>     val joined = ds.joinWith(ds, lit(true))
>     checkAnswer(joined, ("1", "1"), ("1", "2"), ("2", "1"), ("2", "2"))
>   }
> {code}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-11803) Dataset self join returns incorrect result
[ https://issues.apache.org/jira/browse/SPARK-11803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15010837#comment-15010837 ]

Wenchen Fan commented on SPARK-11803:
-------------------------------------

[~smilegator] sorry, I hadn't noticed that you were already working on it... Could you help review my PR and leave your thoughts there? Thanks!
[jira] [Commented] (SPARK-11803) Dataset self join returns incorrect result
[ https://issues.apache.org/jira/browse/SPARK-11803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15010828#comment-15010828 ]

Apache Spark commented on SPARK-11803:
--------------------------------------

User 'cloud-fan' has created a pull request for this issue:
https://github.com/apache/spark/pull/9806
[jira] [Commented] (SPARK-11803) Dataset self join returns incorrect result
[ https://issues.apache.org/jira/browse/SPARK-11803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15010267#comment-15010267 ]

Xiao Li commented on SPARK-11803:
---------------------------------

We need to detect if this is a self join in the function joinWith.
[jira] [Commented] (SPARK-11803) Dataset self join returns incorrect result
[ https://issues.apache.org/jira/browse/SPARK-11803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15010177#comment-15010177 ]

Xiao Li commented on SPARK-11803:
---------------------------------

Not sure if this has been assigned. I can try it tonight and tomorrow. Will give a reply before the end of tomorrow.
[jira] [Commented] (SPARK-11803) Dataset self join returns incorrect result
[ https://issues.apache.org/jira/browse/SPARK-11803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15010230#comment-15010230 ]

Xiao Li commented on SPARK-11803:
---------------------------------

We need to assign a new expression ID to the conflicting attribute in Project. In the analyzed plan, both projected columns reference value#1, although the right side of the join outputs value#8:

== Analyzed Logical Plan ==
_1: string, _2: string
Project [value#1 AS _1#6,value#1 AS _2#7]
 Join Inner, Some(true)
  LocalRelation [value#1], [[0,11,31],[0,11,32]]
  LocalRelation [value#8], [[0,11,31],[0,11,32]]
[jira] [Commented] (SPARK-11803) Dataset self join returns incorrect result
[ https://issues.apache.org/jira/browse/SPARK-11803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15010181#comment-15010181 ]

Xiao Li commented on SPARK-11803:
---------------------------------

The optimized plan is wrong.

Project [value#1 AS _1#4,value#1 AS _2#5]
 Join Inner, None
  LocalRelation [value#1], [[0,11,31],[0,11,32]]
  LocalRelation [[empty row],[empty row]]

The correct one should be like

Project [value#1 AS _1#4,value#5 AS _2#5]
 Join Inner, None
  LocalRelation [value#1], [[0,11,31],[0,11,32]]
  LocalRelation [value#5], [[0,11,31],[0,11,32]]
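
The difference between a projection that references value#1 twice and one that references each side's own attribute can be reproduced in a small toy model. The sketch below is only an illustration under stated assumptions: Attr, SharedIdDemo, relation, crossJoin and project are hypothetical names invented here, not Spark's Catalyst classes, and the model resolves attributes purely by expression ID.

```scala
// Toy model only: these are NOT Spark's Catalyst classes.
final case class Attr(name: String, exprId: Long)

object SharedIdDemo {
  // One joined row: (attribute, value) bindings, left side first.
  type Row = Seq[(Attr, String)]

  // A single-column relation, in the spirit of LocalRelation [value#N].
  def relation(attr: Attr, values: Seq[String]): Seq[Row] =
    values.map(v => Seq(attr -> v))

  // Cartesian product, in the spirit of an inner join with an always-true condition.
  def crossJoin(left: Seq[Row], right: Seq[Row]): Seq[Row] =
    for (l <- left; r <- right) yield l ++ r

  // Resolve each projected attribute by expression ID only.
  def project(rows: Seq[Row], wanted: Seq[Attr]): Seq[Seq[String]] =
    rows.map { row =>
      wanted.map { a =>
        row.collectFirst { case (b, v) if b.exprId == a.exprId => v }
          .getOrElse(sys.error(s"unresolved attribute ${a.name}#${a.exprId}"))
      }
    }

  def main(args: Array[String]): Unit = {
    val left  = Attr("value", 1L)
    val right = Attr("value", 8L) // right side after it received a fresh ID
    val joined =
      crossJoin(relation(left, Seq("1", "2")), relation(right, Seq("1", "2")))

    // Stale projection: both output columns still point at value#1.
    println(project(joined, Seq(left, left)))
    // Repaired projection: the second column points at the right side's ID.
    println(project(joined, Seq(left, right)))
  }
}
```

In this model the stale projection pairs every left value with itself, while the repaired projection yields the four pairs that the test case in the issue description expects.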
[jira] [Commented] (SPARK-11803) Dataset self join returns incorrect result
[ https://issues.apache.org/jira/browse/SPARK-11803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15010418#comment-15010418 ]

Xiao Li commented on SPARK-11803:
---------------------------------

I have implemented a prototype, and it generates the expected results. I need to clean up my code tomorrow.
[jira] [Commented] (SPARK-11803) Dataset self join returns incorrect result
[ https://issues.apache.org/jira/browse/SPARK-11803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15010315#comment-15010315 ]

Xiao Li commented on SPARK-11803:
---------------------------------

I believe this must be an urgent issue. This is my first time reading the Dataset implementation, and this might be a quick fix for you. I should not block your current progress; I just want to share my idea.

Detect the conflicting attributes in the joinWith function and then assign new expression IDs to them in other.logicalPlan. This should be similar to how the Analyzer handles duplicate expression IDs in self joins. Then we can use the new `other` for joining with `this`.
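
The idea of detecting conflicting attributes and giving the right side new expression IDs can be sketched in plain Scala. This is a minimal sketch under stated assumptions, not the code from either pull request: AttrRef, Relation, SelfJoinDedup, conflictingIds and dedupRight are hypothetical names, and the counter starting at 100 merely stands in for whatever allocates fresh expression IDs.

```scala
import java.util.concurrent.atomic.AtomicLong

// Toy model only: hypothetical names, not Spark's Catalyst API.
final case class AttrRef(name: String, exprId: Long)
final case class Relation(output: Seq[AttrRef])

object SelfJoinDedup {
  // Hypothetical generator of fresh expression IDs.
  private val nextId = new AtomicLong(100L)

  // Expression IDs that appear in the output of both sides.
  def conflictingIds(left: Relation, right: Relation): Set[Long] =
    left.output.map(_.exprId).toSet.intersect(right.output.map(_.exprId).toSet)

  // Give every conflicting attribute on the right side a fresh ID and return
  // the rewritten relation plus the old-ID -> new-attribute mapping, which a
  // caller would use to rewrite references to the right side.
  def dedupRight(left: Relation, right: Relation): (Relation, Map[Long, AttrRef]) = {
    val conflicts = conflictingIds(left, right)
    val mapping = right.output
      .filter(a => conflicts.contains(a.exprId))
      .map(a => a.exprId -> a.copy(exprId = nextId.getAndIncrement()))
      .toMap
    val rewritten = Relation(right.output.map(a => mapping.getOrElse(a.exprId, a)))
    (rewritten, mapping)
  }

  def main(args: Array[String]): Unit = {
    val ds = Relation(Seq(AttrRef("value", 1L)))
    // Self join: both sides are the same plan, so every attribute conflicts.
    val (other, mapping) = dedupRight(ds, ds)
    println(s"left=${ds.output} right=${other.output} mapping=$mapping")
  }
}
```

Returning the mapping alongside the rewritten relation is a design choice of this sketch: any expression that was built against the original right side still carries the old IDs and has to be rewritten through the same mapping, otherwise the projection keeps pointing at the left side.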
[jira] [Commented] (SPARK-11803) Dataset self join returns incorrect result
[ https://issues.apache.org/jira/browse/SPARK-11803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15010329#comment-15010329 ]

Reynold Xin commented on SPARK-11803:
-------------------------------------

Please submit a pull request if you can figure it out. Thanks.
[jira] [Commented] (SPARK-11803) Dataset self join returns incorrect result
[ https://issues.apache.org/jira/browse/SPARK-11803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15010345#comment-15010345 ]

Xiao Li commented on SPARK-11803:
---------------------------------

Sure, I can try it, but I have to say I am unable to complete it tonight. :)