[GitHub] spark pull request #14411: [SPARK-16804][SQL] Correlated subqueries containi...

2016-07-29 Thread nsyca
GitHub user nsyca opened a pull request: https://github.com/apache/spark/pull/14411 [SPARK-16804][SQL] Correlated subqueries containing LIMIT return incorrect results ## What changes were proposed in this pull request? (Please fill in changes proposed in this fix

[GitHub] spark issue #14411: [SPARK-16804][SQL] Correlated subqueries containing LIMI...

2016-07-29 Thread nsyca
Github user nsyca commented on the issue: https://github.com/apache/spark/pull/14411 I include two examples of "good" case in the JIRA to show that this fix only blocks cases where Spark will produce incorrect results. I need to find a place to host those "good"

[GitHub] spark issue #14411: [SPARK-16804][SQL] Correlated subqueries containing LIMI...

2016-07-29 Thread nsyca
Github user nsyca commented on the issue: https://github.com/apache/spark/pull/14411 Two good cases, which return the same result set, with and without this proposed fix: sql("select c1 from t1 where exists (select 1 from t2 where t1.c1=t2.c2) and exists (select 1 fr
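
A rough sketch of why a `LIMIT` inside a correlated subquery is order-sensitive, using plain Scala collections rather than Spark. The data, the object name, and both function names are made up for illustration. The sketch assumes the incorrect results come from applying the `LIMIT` before the correlated predicate, which the truncated messages here do not spell out.

```scala
// Toy stand-ins for tables t1(c1) and t2(c2); the values are invented.
object CorrelatedLimitSketch {
  val t1: Seq[Int] = Seq(1, 2)
  val t2: Seq[Int] = Seq(2, 1)

  // Per-row semantics of
  //   select c1 from t1 where exists (select 1 from t2 where t1.c1 = t2.c2 limit 1)
  // The correlated predicate filters t2 first, then LIMIT 1 applies to what is left.
  def perRow: Seq[Int] =
    t1.filter(c1 => t2.filter(_ == c1).take(1).nonEmpty)

  // A rewrite that applies LIMIT 1 to t2 first and checks the correlation afterwards.
  def limitFirst: Seq[Int] = {
    val limited = t2.take(1)
    t1.filter(c1 => limited.contains(c1))
  }

  def main(args: Array[String]): Unit = {
    println(perRow)     // List(1, 2)
    println(limitFirst) // List(2)
  }
}
```

The two functions disagree on the same input. A query with `EXISTS` and no `LIMIT`, like the "good" cases quoted above, has no such ordering to get wrong.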

[GitHub] spark issue #14411: [SPARK-16804][SQL] Correlated subqueries containing LIMI...

2016-07-29 Thread nsyca
Github user nsyca commented on the issue: https://github.com/apache/spark/pull/14411 @gatorsmile: thanks. I will add them in SubquerySuite. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have

[GitHub] spark issue #14411: [SPARK-16804][SQL] Correlated subqueries containing LIMI...

2016-08-01 Thread nsyca
Github user nsyca commented on the issue: https://github.com/apache/spark/pull/14411 @hvanhovell, Thank you for your comment. There are quite a few patterns being blacklisted already, such as correlation under set operators (UNION, EXCEPT, INTERSECT), correlation outside

[GitHub] spark issue #14411: [SPARK-16804][SQL] Correlated subqueries containing LIMI...

2016-08-01 Thread nsyca
Github user nsyca commented on the issue: https://github.com/apache/spark/pull/14411 @hvanhovell, First, my apologies for delaying the replies. I am travelling this week, only getting spontaneous connections. Thank you for your explanation of the implementation

[GitHub] spark issue #14411: [SPARK-16804][SQL] Correlated subqueries containing LIMI...

2016-08-05 Thread nsyca
Github user nsyca commented on the issue: https://github.com/apache/spark/pull/14411 @hvanhovell, Have you had a chance to review my last update? Is there anything I should add/change in this PR?

[GitHub] spark issue #14411: [SPARK-16804][SQL] Correlated subqueries containing non-...

2016-08-05 Thread nsyca
Github user nsyca commented on the issue: https://github.com/apache/spark/pull/14411 @hvanhovell, Code and test case for blocking `TABLESAMPLE` are in. Could you please review? Thanks.

[GitHub] spark issue #14411: [SPARK-16804][SQL] Correlated subqueries containing LIMI...

2016-08-05 Thread nsyca
Github user nsyca commented on the issue: https://github.com/apache/spark/pull/14411 @hvanhovell, Yes. I agree that we need to block this case. I was under the impression that the tablesample clause is supported only when referenced to a base table, not a derived table

[GitHub] spark issue #14411: [SPARK-16804][SQL] Correlated subqueries containing non-...

2016-08-08 Thread nsyca
Github user nsyca commented on the issue: https://github.com/apache/spark/pull/14411 @hvanhovell, Thanks for getting the PR merged and sorry for causing a few hiccups before I got it right. It's my first PR. I have opened a new JIRA, SPARK-16951, to track

[GitHub] spark issue #14411: [SPARK-16804][SQL] Correlated subqueries containing non-...

2016-08-05 Thread nsyca
Github user nsyca commented on the issue: https://github.com/apache/spark/pull/14411 Done.

[GitHub] spark issue #14411: [SPARK-16804][SQL] Correlated subqueries containing non-...

2016-08-07 Thread nsyca
Github user nsyca commented on the issue: https://github.com/apache/spark/pull/14411 Thanks, @gatorsmile. This time I ran `dev/lint-scala`. Hope it's my last attempt to get this work thru.

[GitHub] spark issue #14411: [SPARK-16804][SQL] Correlated subqueries containing LIMI...

2016-08-02 Thread nsyca
Github user nsyca commented on the issue: https://github.com/apache/spark/pull/14411 @hvanhovell, thanks for sharing the blog. I will read thru. It's nice to see the implementation of NOT IN this way. I have an idea to do it differently but let's move this to another place

[GitHub] spark issue #14580: [SPARK-16991][SQL] Fix `EliminateOuterJoin` optimizer to...

2016-08-15 Thread nsyca
Github user nsyca commented on the issue: https://github.com/apache/spark/pull/14580 This problem can be viewed in SQL language like this: ``` val a = Seq((1),(2)).toDF("a").createOrReplaceTempView("A") val b = Seq((2),(3)).toDF("a"

[GitHub] spark issue #14661: [SPARK-16991][SQL] Fix Outer Join Elimination when Filte...

2016-08-17 Thread nsyca
Github user nsyca commented on the issue: https://github.com/apache/spark/pull/14661 This code change looks good to me. The logic in `` filter.constraints.filter(_.isInstanceOf[IsNotNull]) .exists(expr => join.left.outputSet.intersect(expr.references).nonEm

[GitHub] spark pull request #16798: [SPARK-18873][SQL][TEST] New test cases for scala...

2017-02-03 Thread nsyca
GitHub user nsyca opened a pull request: https://github.com/apache/spark/pull/16798 [SPARK-18873][SQL][TEST] New test cases for scalar subquery (part 2 of 2) - scalar subquery in predicate context ## What changes were proposed in this pull request? This PR adds new test cases

[GitHub] spark issue #16798: [SPARK-18873][SQL][TEST] New test cases for scalar subqu...

2017-02-03 Thread nsyca
Github user nsyca commented on the issue: https://github.com/apache/spark/pull/16798 Below are a modified version of the test cases to run on DB2 and the result from DB2, as a second source to compare to the result from Spark. [Modified test file to run on DB2](https://github.com

[GitHub] spark pull request #16798: [SPARK-18873][SQL][TEST] New test cases for scala...

2017-02-03 Thread nsyca
Github user nsyca commented on a diff in the pull request: https://github.com/apache/spark/pull/16798#discussion_r99453763 --- Diff: sql/core/src/test/resources/sql-tests/inputs/scalar-subquery.sql --- @@ -1,20 +0,0 @@ -CREATE OR REPLACE TEMPORARY VIEW p AS VALUES (1, 1) AS T

[GitHub] spark issue #16798: [SPARK-18873][SQL][TEST] New test cases for scalar subqu...

2017-02-03 Thread nsyca
Github user nsyca commented on the issue: https://github.com/apache/spark/pull/16798 @dilipbiswal Could you please cross-check the results from both sources? @gatorsmile, @hvanhovell Could you please review?

[GitHub] spark pull request #16798: [SPARK-18873][SQL][TEST] New test cases for scala...

2017-02-05 Thread nsyca
Github user nsyca commented on a diff in the pull request: https://github.com/apache/spark/pull/16798#discussion_r99499561 --- Diff: sql/core/src/test/resources/sql-tests/inputs/subquery/scalar-subquery/scalar-subquery-predicate.sql --- @@ -0,0 +1,255 @@ +-- A test suite

[GitHub] spark issue #16760: [SPARK-18872][SQL][TESTS] New test cases for EXISTS subq...

2017-02-08 Thread nsyca
Github user nsyca commented on the issue: https://github.com/apache/spark/pull/16760 retest this please

[GitHub] spark pull request #16841: [SPARK-18871][SQL][TESTS] New test cases for IN/N...

2017-02-08 Thread nsyca
Github user nsyca commented on a diff in the pull request: https://github.com/apache/spark/pull/16841#discussion_r100077423 --- Diff: sql/core/src/test/resources/sql-tests/results/subquery/in-subquery/in-multiple-columns.sql.out --- @@ -0,0 +1,178 @@ +-- Automatically

[GitHub] spark pull request #16841: [SPARK-18871][SQL][TESTS] New test cases for IN/N...

2017-02-08 Thread nsyca
Github user nsyca commented on a diff in the pull request: https://github.com/apache/spark/pull/16841#discussion_r100077204 --- Diff: sql/core/src/test/resources/sql-tests/results/subquery/in-subquery/in-joins.sql.out --- @@ -0,0 +1,353 @@ +-- Automatically generated

[GitHub] spark pull request #16841: [SPARK-18871][SQL][TESTS] New test cases for IN/N...

2017-02-08 Thread nsyca
Github user nsyca commented on a diff in the pull request: https://github.com/apache/spark/pull/16841#discussion_r100076749 --- Diff: sql/core/src/test/resources/sql-tests/results/subquery/in-subquery/in-having.sql.out --- @@ -0,0 +1,217 @@ +-- Automatically generated

[GitHub] spark pull request #16802: [SPARK-18872][SQL][TESTS] New test cases for EXIS...

2017-02-04 Thread nsyca
Github user nsyca commented on a diff in the pull request: https://github.com/apache/spark/pull/16802#discussion_r99466405 --- Diff: sql/core/src/test/resources/sql-tests/results/subquery/exists-subquery/exists-cte.sql.out --- @@ -0,0 +1,200 @@ +-- Automatically generated

[GitHub] spark pull request #16802: [SPARK-18872][SQL][TESTS] New test cases for EXIS...

2017-02-04 Thread nsyca
Github user nsyca commented on a diff in the pull request: https://github.com/apache/spark/pull/16802#discussion_r99466460 --- Diff: sql/core/src/test/resources/sql-tests/results/subquery/exists-subquery/exists-joins-and-set-ops.sql.out --- @@ -0,0 +1,330 @@ +-- Automatically

[GitHub] spark pull request #16760: [SPARK-18872][SQL][TESTS] New test cases for EXIS...

2017-01-31 Thread nsyca
Github user nsyca commented on a diff in the pull request: https://github.com/apache/spark/pull/16760#discussion_r98794624 --- Diff: sql/core/src/test/resources/sql-tests/results/subquery/exists-subquery/exists-having.sql.out --- @@ -0,0 +1,153 @@ +-- Automatically generated

[GitHub] spark pull request #16760: [SPARK-18872][SQL][TESTS] New test cases for EXIS...

2017-01-31 Thread nsyca
Github user nsyca commented on a diff in the pull request: https://github.com/apache/spark/pull/16760#discussion_r98794661 --- Diff: sql/core/src/test/resources/sql-tests/results/subquery/exists-subquery/exists-orderby-limit.sql.out --- @@ -0,0 +1,222 @@ +-- Automatically

[GitHub] spark pull request #16760: [SPARK-18872][SQL][TESTS] New test cases for EXIS...

2017-01-31 Thread nsyca
Github user nsyca commented on a diff in the pull request: https://github.com/apache/spark/pull/16760#discussion_r98794587 --- Diff: sql/core/src/test/resources/sql-tests/results/subquery/exists-subquery/exists-aggregate.sql.out --- @@ -0,0 +1,183 @@ +-- Automatically

[GitHub] spark pull request #16572: [SPARK-18863][SQL] Output non-aggregate expressio...

2017-01-24 Thread nsyca
Github user nsyca commented on a diff in the pull request: https://github.com/apache/spark/pull/16572#discussion_r97694229 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/SQLQueryTestSuite.scala --- @@ -223,7 +228,10 @@ class SQLQueryTestSuite extends QueryTest

[GitHub] spark pull request #16710: [SPARK-18872][SQL][TESTS] New test cases for EXIS...

2017-01-26 Thread nsyca
Github user nsyca commented on a diff in the pull request: https://github.com/apache/spark/pull/16710#discussion_r98048517 --- Diff: sql/core/src/test/resources/sql-tests/results/subquery/exists-subquery/exists-basic.sql.out --- @@ -0,0 +1,201 @@ +-- Automatically generated

[GitHub] spark pull request #16710: [SPARK-18872][SQL][TESTS] New test cases for EXIS...

2017-01-26 Thread nsyca
Github user nsyca commented on a diff in the pull request: https://github.com/apache/spark/pull/16710#discussion_r98048551 --- Diff: sql/core/src/test/resources/sql-tests/results/subquery/exists-subquery/exists-within-and-or.sql.out --- @@ -0,0 +1,156 @@ +-- Automatically

[GitHub] spark issue #16712: [SPARK-18873][SQL][TEST] New test cases for scalar subqu...

2017-01-26 Thread nsyca
Github user nsyca commented on the issue: https://github.com/apache/spark/pull/16712 Attached are a slightly modified version of the submitted test file to adapt to IBM DB2 syntax, and the result of the run. [Modified version of the test file](https://github.com/apache/spark

[GitHub] spark issue #16712: [SPARK-18873][SQL][TEST] New test cases for scalar subqu...

2017-01-26 Thread nsyca
Github user nsyca commented on the issue: https://github.com/apache/spark/pull/16712 @kevinyu, @gatorsmile. Also FYI to @hvanhovell.

[GitHub] spark pull request #16712: [SPARK-18873][SQL][TEST] New test cases for scala...

2017-01-26 Thread nsyca
GitHub user nsyca opened a pull request: https://github.com/apache/spark/pull/16712 [SPARK-18873][SQL][TEST] New test cases for scalar subquery (part 1 of 2) ## What changes were proposed in this pull request? This PR adds new test cases for scalar subquery in SELECT clause

[GitHub] spark pull request #16712: [SPARK-18873][SQL][TEST] New test cases for scala...

2017-01-30 Thread nsyca
Github user nsyca commented on a diff in the pull request: https://github.com/apache/spark/pull/16712#discussion_r98453251 --- Diff: sql/core/src/test/resources/sql-tests/inputs/subquery/scalar-subquery/scalar-subquery-select.sql --- @@ -0,0 +1,139 @@ +-- A test suite

[GitHub] spark pull request #16712: [SPARK-18873][SQL][TEST] New test cases for scala...

2017-01-30 Thread nsyca
Github user nsyca commented on a diff in the pull request: https://github.com/apache/spark/pull/16712#discussion_r98452724 --- Diff: sql/core/src/test/resources/sql-tests/inputs/subquery/scalar-subquery/scalar-subquery-select.sql --- @@ -0,0 +1,139 @@ +-- A test suite

[GitHub] spark pull request #16712: [SPARK-18873][SQL][TEST] New test cases for scala...

2017-01-30 Thread nsyca
Github user nsyca commented on a diff in the pull request: https://github.com/apache/spark/pull/16712#discussion_r98452451 --- Diff: sql/core/src/test/resources/sql-tests/inputs/subquery/scalar-subquery/scalar-subquery-select.sql --- @@ -0,0 +1,139 @@ +-- A test suite

[GitHub] spark issue #16712: [SPARK-18873][SQL][TEST] New test cases for scalar subqu...

2017-01-30 Thread nsyca
Github user nsyca commented on the issue: https://github.com/apache/spark/pull/16712 Thank you, @gatorsmile, for your time reviewing this test PR. I will wait for your suggestion on the pattern of the literals in the first columns of the tables if you do need to have them changed

[GitHub] spark pull request #16712: [SPARK-18873][SQL][TEST] New test cases for scala...

2017-01-30 Thread nsyca
Github user nsyca commented on a diff in the pull request: https://github.com/apache/spark/pull/16712#discussion_r98453216 --- Diff: sql/core/src/test/resources/sql-tests/results/subquery/scalar-subquery/scalar-subquery-select.sql.out --- @@ -0,0 +1,198 @@ +-- Automatically

[GitHub] spark issue #16712: [SPARK-18873][SQL][TEST] New test cases for scalar subqu...

2017-01-30 Thread nsyca
Github user nsyca commented on the issue: https://github.com/apache/spark/pull/16712 The part-2 is for scalar subquery in predicates.

[GitHub] spark issue #16712: [SPARK-18873][SQL][TEST] New test cases for scalar subqu...

2017-01-30 Thread nsyca
Github user nsyca commented on the issue: https://github.com/apache/spark/pull/16712 These two commands should give you the delta of the changes I made to address your comments. https://github.com/apache/spark/pull/16712/commits/0db0bc3a1896c6187b42e04ac2fd11a67769007c

[GitHub] spark issue #16572: [SPARK-18863][SQL] Output non-aggregate expressions with...

2017-01-25 Thread nsyca
Github user nsyca commented on the issue: https://github.com/apache/spark/pull/16572 @hvanhovell, I agree it does look risky with this approach. There are a lot of dependencies here. I am pitching in the idea to get your initial thought. Let me do some background and I will share

[GitHub] spark issue #16572: [SPARK-18863][SQL] Output non-aggregate expressions with...

2017-01-24 Thread nsyca
Github user nsyca commented on the issue: https://github.com/apache/spark/pull/16572 Note that plans inside subqueries not being treated as part of the tree traversal is a common problem. Besides this problem, another was reported in SPARK-19093. Also the way Spark needs

[GitHub] spark pull request #16572: [SPARK-18863][SQL] Output non-aggregate expressio...

2017-01-24 Thread nsyca
Github user nsyca commented on a diff in the pull request: https://github.com/apache/spark/pull/16572#discussion_r97694210 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala --- @@ -117,66 +117,72 @@ trait CheckAnalysis extends

[GitHub] spark issue #16572: [SPARK-18863][SQL] Output non-aggregate expressions with...

2017-01-24 Thread nsyca
Github user nsyca commented on the issue: https://github.com/apache/spark/pull/16572 Thank you for your time reviewing this PR.

[GitHub] spark pull request #16954: [SPARK-18874][SQL] First phase: Deferring the cor...

2017-02-16 Thread nsyca
Github user nsyca commented on a diff in the pull request: https://github.com/apache/spark/pull/16954#discussion_r101596895 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/subquery.scala --- @@ -83,29 +95,150 @@ object RewritePredicateSubquery

[GitHub] spark pull request #16954: [SPARK-18874][SQL] First phase: Deferring the cor...

2017-02-16 Thread nsyca
Github user nsyca commented on a diff in the pull request: https://github.com/apache/spark/pull/16954#discussion_r101594561 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/subquery.scala --- @@ -40,19 +42,179 @@ abstract class PlanExpression[T

[GitHub] spark pull request #16954: [SPARK-18874][SQL] First phase: Deferring the cor...

2017-02-16 Thread nsyca
Github user nsyca commented on a diff in the pull request: https://github.com/apache/spark/pull/16954#discussion_r101590658 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/subquery.scala --- @@ -40,19 +42,179 @@ abstract class PlanExpression[T

[GitHub] spark pull request #16954: [SPARK-18874][SQL] First phase: Deferring the cor...

2017-02-16 Thread nsyca
Github user nsyca commented on a diff in the pull request: https://github.com/apache/spark/pull/16954#discussion_r101590145 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/subquery.scala --- @@ -40,19 +42,179 @@ abstract class PlanExpression[T

[GitHub] spark pull request #16954: [SPARK-18874][SQL] First phase: Deferring the cor...

2017-02-16 Thread nsyca
Github user nsyca commented on a diff in the pull request: https://github.com/apache/spark/pull/16954#discussion_r101593305 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/subquery.scala --- @@ -40,19 +42,179 @@ abstract class PlanExpression[T

[GitHub] spark pull request #16915: [SPARK-18871][SQL][TESTS] New test cases for IN/N...

2017-02-15 Thread nsyca
Github user nsyca commented on a diff in the pull request: https://github.com/apache/spark/pull/16915#discussion_r101375044 --- Diff: sql/core/src/test/resources/sql-tests/results/subquery/in-subquery/in-with-cte.sql.out --- @@ -0,0 +1,364 @@ +-- Automatically generated

[GitHub] spark issue #16915: [SPARK-18871][SQL][TESTS] New test cases for IN/NOT IN s...

2017-02-15 Thread nsyca
Github user nsyca commented on the issue: https://github.com/apache/spark/pull/16915 It's larger than the typical test PRs we submitted for the subquery JIRA, but since it's the last test PR, we wanted to avoid an additional round of administrative work.

[GitHub] spark pull request #16915: [SPARK-18871][SQL][TESTS] New test cases for IN/N...

2017-02-15 Thread nsyca
Github user nsyca commented on a diff in the pull request: https://github.com/apache/spark/pull/16915#discussion_r101374593 --- Diff: sql/core/src/test/resources/sql-tests/results/subquery/in-subquery/in-set-operations.sql.out --- @@ -0,0 +1,595 @@ +-- Automatically generated

[GitHub] spark pull request #16915: [SPARK-18871][SQL][TESTS] New test cases for IN/N...

2017-02-15 Thread nsyca
Github user nsyca commented on a diff in the pull request: https://github.com/apache/spark/pull/16915#discussion_r101375513 --- Diff: sql/core/src/test/resources/sql-tests/results/subquery/in-subquery/not-in-joins.sql.out --- @@ -0,0 +1,229 @@ +-- Automatically generated

[GitHub] spark issue #16572: [SPARK-18863][SQL] Output non-aggregate expressions with...

2017-01-16 Thread nsyca
Github user nsyca commented on the issue: https://github.com/apache/spark/pull/16572 cc @hvanhovell.

[GitHub] spark issue #16467: [SPARK-19017][SQL] NOT IN subquery with more than one co...

2017-01-16 Thread nsyca
Github user nsyca commented on the issue: https://github.com/apache/spark/pull/16467 @hvanhovell Would there be anything left that I have not addressed in this PR?

[GitHub] spark issue #14580: [SPARK-16991][SQL] Fix `EliminateOuterJoin` optimizer to...

2016-08-15 Thread nsyca
Github user nsyca commented on the issue: https://github.com/apache/spark/pull/14580 @dongjoon-hyun, My apologies for the terse comment I put in previously. There is nothing wrong with the ```full outer join``` with ```using```. What I tried to explain is the ```using

[GitHub] spark issue #14580: [SPARK-16991][SQL] Fix `EliminateOuterJoin` optimizer to...

2016-08-15 Thread nsyca
Github user nsyca commented on the issue: https://github.com/apache/spark/pull/14580 @dongjoon-hyun, could you please try this on your PR? val a = Seq((1,2),(2,3)).toDF("a","b").createOrReplaceTempView("A") val b

[GitHub] spark pull request #16954: [SPARK-18874][SQL] First phase: Deferring the cor...

2017-02-28 Thread nsyca
Github user nsyca commented on a diff in the pull request: https://github.com/apache/spark/pull/16954#discussion_r103461455 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala --- @@ -707,13 +709,85 @@ class Analyzer

[GitHub] spark issue #14899: [SPARK-17337][SQL] Incomplete algorithm for name resolut...

2016-08-31 Thread nsyca
Github user nsyca commented on the issue: https://github.com/apache/spark/pull/14899 @hvanhovell, thanks for your feedback. I thought about narrowing the scope of my PR to just the subquery alias context. That would solve the problem I used in this PR. At first, I hesitated

[GitHub] spark issue #14899: [SPARK-17337][SQL] Incomplete algorithm for name resolut...

2016-08-31 Thread nsyca
Github user nsyca commented on the issue: https://github.com/apache/spark/pull/14899 Do you think I should keep the title of the PR as generic as it is? Feel free to suggest a specific title if you still think it will more appropriately describe the problem we are trying to fix here

[GitHub] spark pull request #14899: [SPARK-17337][SQL] Incomplete algorithm for name ...

2016-08-31 Thread nsyca
GitHub user nsyca opened a pull request: https://github.com/apache/spark/pull/14899 [SPARK-17337][SQL] Incomplete algorithm for name resolution in Catalyst paser may lead to incorrect result ## What changes were proposed in this pull request? Create a new alias for each

[GitHub] spark issue #14899: [SPARK-17337][SQL] Incomplete algorithm for name resolut...

2016-08-31 Thread nsyca
Github user nsyca commented on the issue: https://github.com/apache/spark/pull/14899 Some background on my investigation of the problem: How can we do name resolution in an SQL statement? ``` scala> sql("select * from t1, t1 t2").explain(true)

[GitHub] spark issue #14899: [SPARK-17337][SQL] Incomplete algorithm for name resolut...

2016-08-31 Thread nsyca
Github user nsyca commented on the issue: https://github.com/apache/spark/pull/14899 With the fix in this PR, the Analyzed Logical Plan and Optimized Logical Plan become: ``` == Analyzed Logical Plan == c3: int Project [c3#123] +- Filter NOT predicate-subquery

[GitHub] spark issue #14912: [SPARK-17357][SQL] Fix current predicate pushdown

2016-09-13 Thread nsyca
Github user nsyca commented on the issue: https://github.com/apache/spark/pull/14912 @viirya, I agree that we need a separate set of PRs to address the general problem. On your comment: "I think the goal to simplify a predicate such as (a > 10 || b > 2) &&

[GitHub] spark issue #14912: [SPARK-17357][SQL] Fix current predicate pushdown

2016-09-12 Thread nsyca
Github user nsyca commented on the issue: https://github.com/apache/spark/pull/14912 Thanks, @gatorsmile, for mentioning me. I will try my best to comment on this thread. Disclaimer: I have not looked at the existing code manipulating predicates/expressions in Spark. Nor have I

[GitHub] spark issue #14719: [SPARK-17154][SQL] Wrong result can be returned or Analy...

2016-09-27 Thread nsyca
Github user nsyca commented on the issue: https://github.com/apache/spark/pull/14719 @sarutak -san, can your solution be extended to solve the problem I reported in [SPARK-17337](https://issues.apache.org/jira/browse/SPARK-17337)? For your convenience, here is the reproduction script

[GitHub] spark issue #14899: [SPARK-17337][SQL] Incomplete algorithm for name resolut...

2016-09-27 Thread nsyca
Github user nsyca commented on the issue: https://github.com/apache/spark/pull/14899 I am closing this PR. I will propose a general solution in SPARK-17154.

[GitHub] spark pull request #14899: [SPARK-17337][SQL] Incomplete algorithm for name ...

2016-09-27 Thread nsyca
Github user nsyca closed the pull request at: https://github.com/apache/spark/pull/14899

[GitHub] spark issue #14719: [SPARK-17154][SQL] Wrong result can be returned or Analy...

2016-09-27 Thread nsyca
Github user nsyca commented on the issue: https://github.com/apache/spark/pull/14719 @sarutak, my apologies if this will upset you. I'd like to propose an alternative solution to the problems documented in SPARK-13801, 14040, 17337 and, here, 17154. The solution borrows the idea from

[GitHub] spark issue #14719: [SPARK-17154][SQL] Wrong result can be returned or Analy...

2016-09-28 Thread nsyca
Github user nsyca commented on the issue: https://github.com/apache/spark/pull/14719 @sarutak, on the surface, the problem looks like it is in the optimization code, but in fact the root cause is that the column/ExprId C2#77 from T2 is indistinguishable between the two streams referencing
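
The point about one ExprId serving two streams can be pictured with a toy model. This is only a sketch: `Attr`, `withFreshIds`, and `distinguishable` are hypothetical names, not Catalyst classes or methods, and the id values are arbitrary apart from 77, which echoes `C2#77` above.

```scala
object SelfJoinIdSketch {
  // Hypothetical stand-in for a resolved column: a name plus an expression id.
  final case class Attr(name: String, exprId: Int)

  // Relation T2 with a single column c2#77.
  val t2: Seq[Attr] = Seq(Attr("c2", 77))

  // Two streams that reference T2 and reuse its attributes unchanged.
  val leftStream: Seq[Attr] = t2
  val rightStream: Seq[Attr] = t2

  // Give one stream fresh ids so the same column can be told apart per stream.
  def withFreshIds(attrs: Seq[Attr], start: Int): Seq[Attr] =
    attrs.zipWithIndex.map { case (a, i) => a.copy(exprId = start + i) }

  // Two streams are distinguishable here when they share no expression id.
  def distinguishable(l: Seq[Attr], r: Seq[Attr]): Boolean =
    l.map(_.exprId).intersect(r.map(_.exprId)).isEmpty
}
```

In this model `distinguishable(leftStream, rightStream)` is false, while `distinguishable(leftStream, withFreshIds(rightStream, 100))` is true, so any rule keyed on the id alone cannot say which stream a reference to `c2#77` belongs to until one side is re-identified.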

[GitHub] spark pull request #14719: [SPARK-17154][SQL] Wrong result can be returned o...

2016-09-27 Thread nsyca
Github user nsyca commented on a diff in the pull request: https://github.com/apache/spark/pull/14719#discussion_r80714835 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala --- @@ -1580,6 +1583,28 @@ class DataFrameSuite extends QueryTest

[GitHub] spark issue #14719: [SPARK-17154][SQL] Wrong result can be returned or Analy...

2016-09-28 Thread nsyca
Github user nsyca commented on the issue: https://github.com/apache/spark/pull/14719 The test case @sarutak raised here is what I consider the problem of the current code. ```df.join(df, df("key") === df("key"))``` How do we make a conclusion that

[GitHub] spark pull request #15763: [SPARK-17348][SQL] Incorrect results from subquer...

2016-11-07 Thread nsyca
Github user nsyca commented on a diff in the pull request: https://github.com/apache/spark/pull/15763#discussion_r86790230 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala --- @@ -1044,6 +1044,34 @@ class Analyzer

[GitHub] spark issue #15763: [SPARK-17348][SQL] Incorrect results from subquery trans...

2016-11-07 Thread nsyca
Github user nsyca commented on the issue: https://github.com/apache/spark/pull/15763 I think I have addressed all the pending comments on this PR. Is there anything left to do?

[GitHub] spark pull request #15763: [SPARK-17348][SQL] Incorrect results from subquer...

2016-11-07 Thread nsyca
Github user nsyca commented on a diff in the pull request: https://github.com/apache/spark/pull/15763#discussion_r86911547 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala --- @@ -1044,6 +1044,34 @@ class Analyzer

[GitHub] spark issue #15763: [SPARK-17348][SQL] Incorrect results from subquery trans...

2016-11-07 Thread nsyca
Github user nsyca commented on the issue: https://github.com/apache/spark/pull/15763 @srinathshankar It is intentional. It is impossible to do an analysis on what in-between operations we can allow and what we cannot. Correlated predicates can be placed in any arbitrary level

[GitHub] spark issue #14719: [SPARK-17154][SQL] Wrong result can be returned or Analy...

2016-11-10 Thread nsyca
Github user nsyca commented on the issue: https://github.com/apache/spark/pull/14719 @cloud-fan, I was studying the ResolveSubquery code for my work on SPARK-17348. I was at first puzzled by the code in `def rewriteSubQuery` // Make sure the inner and the outer query

[GitHub] spark pull request #15763: [SPARK-17348][SQL] Incorrect results from subquer...

2016-11-04 Thread nsyca
Github user nsyca commented on a diff in the pull request: https://github.com/apache/spark/pull/15763#discussion_r86653844 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala --- @@ -1044,6 +1044,34 @@ class Analyzer

[GitHub] spark issue #15763: [SPARK-17348][SQL] Incorrect results from subquery trans...

2016-11-04 Thread nsyca
Github user nsyca commented on the issue: https://github.com/apache/spark/pull/15763 retest this please

[GitHub] spark issue #15763: [SPARK-17348][SQL] Incorrect results from subquery trans...

2016-11-14 Thread nsyca
Github user nsyca commented on the issue: https://github.com/apache/spark/pull/15763 @hvanhovell could you please review the latest PR?

[GitHub] spark issue #15763: [SPARK-17348][SQL] Incorrect results from subquery trans...

2016-11-11 Thread nsyca
Github user nsyca commented on the issue: https://github.com/apache/spark/pull/15763 Thanks for the tip on the code. I will work on it.

[GitHub] spark issue #15763: [SPARK-17348][SQL] Incorrect results from subquery trans...

2016-11-11 Thread nsyca
Github user nsyca commented on the issue: https://github.com/apache/spark/pull/15763 @hvanhovell Then we will need to walk from the top of the operator hosting the outer reference to the operator hosting the correlation to ensure there is no Aggregate or Window operator

[GitHub] spark pull request #15763: [SPARK-17348][SQL] Incorrect results from subquer...

2016-11-14 Thread nsyca
Github user nsyca commented on a diff in the pull request: https://github.com/apache/spark/pull/15763#discussion_r87823864 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala --- @@ -1069,11 +1110,19 @@ class Analyzer( case

[GitHub] spark issue #15763: [SPARK-17348][SQL] Incorrect results from subquery trans...

2016-11-14 Thread nsyca
Github user nsyca commented on the issue: https://github.com/apache/spark/pull/15763 @hvanhovell I removed the redundant checking in `ScalarSubquery` as pointed out in your earlier comment (copied below). "I was also wondering if we still need the foll

[GitHub] spark issue #15763: [SPARK-17348][SQL] Incorrect results from subquery trans...

2016-11-14 Thread nsyca
Github user nsyca commented on the issue: https://github.com/apache/spark/pull/15763 @hvanhovell could you please review the PR again? Hopefully this is the last one.

[GitHub] spark pull request #16012: [SPARK-17251][SQL] Support `OuterReference` in pr...

2016-11-25 Thread nsyca
Github user nsyca commented on a diff in the pull request: https://github.com/apache/spark/pull/16012#discussion_r89657974 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala --- @@ -989,7 +989,7 @@ class Analyzer

[GitHub] spark pull request #16012: [SPARK-17251][SQL] Support `OuterReference` in pr...

2016-11-25 Thread nsyca
Github user nsyca commented on a diff in the pull request: https://github.com/apache/spark/pull/16012#discussion_r89664100 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala --- @@ -989,7 +989,7 @@ class Analyzer

[GitHub] spark pull request #15936: [SPARK-18504][SQL] Scalar subquery with extra gro...

2016-11-22 Thread nsyca
Github user nsyca commented on a diff in the pull request: https://github.com/apache/spark/pull/15936#discussion_r89158982 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala --- @@ -117,19 +117,36 @@ trait CheckAnalysis extends

[GitHub] spark pull request #15936: [SPARK-18504][SQL] Scalar subquery with extra gro...

2016-11-22 Thread nsyca
Github user nsyca commented on a diff in the pull request: https://github.com/apache/spark/pull/15936#discussion_r89155790 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala --- @@ -117,19 +117,36 @@ trait CheckAnalysis extends

[GitHub] spark issue #15936: [SPARK-18504][SQL] Scalar subquery with extra group by c...

2016-11-22 Thread nsyca
Github user nsyca commented on the issue: https://github.com/apache/spark/pull/15936 Should I say `retest this please` to kick off the regression test?

[GitHub] spark pull request #15936: [SPARK-18504][SQL] Scalar subquery with extra gro...

2016-11-22 Thread nsyca
Github user nsyca commented on a diff in the pull request: https://github.com/apache/spark/pull/15936#discussion_r89159852 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala --- @@ -117,19 +117,36 @@ trait CheckAnalysis extends

[GitHub] spark pull request #16026: [SPARK-18597][SQL] Do not push-down join conditio...

2016-11-28 Thread nsyca
Github user nsyca commented on a diff in the pull request: https://github.com/apache/spark/pull/16026#discussion_r89830125 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala --- @@ -932,7 +932,7 @@ object PushPredicateThroughJoin extends

[GitHub] spark issue #16044: [Spark-18614][SQL] Incorrect predicate pushdown from Exi...

2016-11-28 Thread nsyca
Github user nsyca commented on the issue: https://github.com/apache/spark/pull/16044 Hmm. `ExistenceJoin` is not exposed via `DataFrame` APIs. Let me study the way `org.apache.spark.sql.execution.joins.ExistenceJoinSuite` was written then I will add a new test case.

[GitHub] spark pull request #16044: Spark 18614

2016-11-28 Thread nsyca
GitHub user nsyca opened a pull request: https://github.com/apache/spark/pull/16044 Spark 18614 ## What changes were proposed in this pull request? ExistenceJoin should be treated the same as LeftOuter and LeftAnti, not InnerLike and LeftSemi. This is not currently exposed

[GitHub] spark issue #16044: [Spark-18614][SQL] Incorrect predicate pushdown from Exi...

2016-11-28 Thread nsyca
Github user nsyca commented on the issue: https://github.com/apache/spark/pull/16044 @hvanhovell I can add a new test case but I want to reiterate that this is not an exposed case because `ExistenceJoin` does not exist in the course of the `PushPredicateThroughJoin` rule

[GitHub] spark issue #16046: [SPARK-18582][SQL] Whitelist LogicalPlan operators allow...

2016-11-28 Thread nsyca
Github user nsyca commented on the issue: https://github.com/apache/spark/pull/16046 We are reviewing existing test cases on subqueries, especially correlated subqueries, and will open a new JIRA, as part of the umbrella JIRA SPARK-18455, to add new test cases to extend the coverage

[GitHub] spark pull request #16046: [SPARK-18582][SQL] Whitelist LogicalPlan operator...

2016-11-28 Thread nsyca
GitHub user nsyca opened a pull request: https://github.com/apache/spark/pull/16046 [SPARK-18582][SQL] Whitelist LogicalPlan operators allowed in correlated subqueries ## What changes were proposed in this pull request? This fix puts an explicit list of operators

[GitHub] spark issue #16044: [Spark-18614][SQL] Incorrect predicate pushdown from Exi...

2016-11-28 Thread nsyca
Github user nsyca commented on the issue: https://github.com/apache/spark/pull/16044 @hvanhovell Thank you for the detailed instruction on the test case. @gatorsmile Also thanks for the directive. In addition to the white box testing on the Optimized Logical Plan, I also
