GitHub user nsyca opened a pull request:
https://github.com/apache/spark/pull/14411
[SPARK-16804][SQL] Correlated subqueries containing LIMIT return incorrect
results
## What changes were proposed in this pull request?
(Please fill in changes proposed in this fix
Github user nsyca commented on the issue:
https://github.com/apache/spark/pull/14411
I included two examples of "good" cases in the JIRA to show that this fix
only blocks cases where Spark would produce incorrect results. I need to find a
place to host those "good"
Github user nsyca commented on the issue:
https://github.com/apache/spark/pull/14411
Two good cases, which return the same result set, with and without this
proposed fix:
sql("select c1 from t1 where exists (select 1 from t2 where t1.c1=t2.c2)
and exists (select 1 fr
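The archived snippet above is cut off; a self-contained sketch of the same query shape (hypothetical two-column tables t1 and t2, spark-shell implicits assumed, not the exact queries from the JIRA) would look roughly like:
```
// Register two small test tables (hypothetical data).
Seq((1, 10), (2, 20)).toDF("c1", "c2").createOrReplaceTempView("t1")
Seq((10, 1), (20, 3)).toDF("c1", "c2").createOrReplaceTempView("t2")

// A conjunction of two correlated EXISTS predicates -- the kind of query
// described above as returning the same result with and without the fix.
sql("""select c1 from t1
       where exists (select 1 from t2 where t1.c1 = t2.c2)
         and exists (select 1 from t2 where t1.c2 = t2.c1)""").show()
```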
Github user nsyca commented on the issue:
https://github.com/apache/spark/pull/14411
@gatorsmile: thanks. I will add them in SubquerySuite.
Github user nsyca commented on the issue:
https://github.com/apache/spark/pull/14411
@hvanhovell,
Thank you for your comment. There are quite a few patterns being
blacklisted already, such as correlation under set operators (UNION, EXCEPT,
INTERSECT), correlation outside
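For illustration, a hedged sketch (hypothetical tables, not taken from the PR) of one such blocked pattern, a correlated reference under a set operator inside the subquery:
```
// The outer reference t1.c1 appears inside one branch of a UNION ALL in the
// subquery, i.e. correlation under a set operator.
sql("""select c1 from t1
       where exists (select 1 from t2 where t2.c2 = t1.c1
                     union all
                     select 1 from t2 where t2.c2 > 0)""")
```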
Github user nsyca commented on the issue:
https://github.com/apache/spark/pull/14411
@hvanhovell,
First, my apologies for the delayed replies. I am travelling this week and
only getting intermittent connectivity. Thank you for your explanation of the
implementation
Github user nsyca commented on the issue:
https://github.com/apache/spark/pull/14411
@hvanhovell,
Have you had a chance to review my last update? Is there anything I should
add or change in this PR?
Github user nsyca commented on the issue:
https://github.com/apache/spark/pull/14411
@hvanhovell,
Code and test case for blocking `TABLESAMPLE` is in. Could you please
review? Thanks.
Github user nsyca commented on the issue:
https://github.com/apache/spark/pull/14411
@hvanhovell ,
Yes. I agree that we need to block this case. I was under the impression
that the tablesample clause is supported only when it references a base table,
not a derived table
Github user nsyca commented on the issue:
https://github.com/apache/spark/pull/14411
@hvanhovell,
Thanks for getting the PR merged and sorry for causing a few hiccups before
I got it right. It's my first PR.
I have opened a new JIRA, SPARK-16951, to track
Github user nsyca commented on the issue:
https://github.com/apache/spark/pull/14411
Done.
Github user nsyca commented on the issue:
https://github.com/apache/spark/pull/14411
Thanks, @gatorsmile. This time I ran `dev/lint-scala`. Hope this is my last
attempt to get this work through.
Github user nsyca commented on the issue:
https://github.com/apache/spark/pull/14411
@hvanhovell, thanks for sharing the blog. I will read through it. It's nice to
see the implementation of NOT IN this way. I have an idea to do it differently,
but let's move this to another place
Github user nsyca commented on the issue:
https://github.com/apache/spark/pull/14580
This problem can be viewed in SQL language like this:
```
val a = Seq((1),(2)).toDF("a").createOrReplaceTempView("A")
val b = Seq((2),(3)).toDF("a"
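// (The snippet above is cut off in the archive. A self-contained version of
//  the same setup -- hypothetical values, spark-shell implicits assumed, not
//  the original reproduction from this PR -- follows.)
Seq(1, 2).toDF("a").createOrReplaceTempView("A")
Seq(2, 3).toDF("a").createOrReplaceTempView("B")
sql("select * from A full outer join B using (a)").show()
```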
Github user nsyca commented on the issue:
https://github.com/apache/spark/pull/14661
This code change looks good to me. The logic in
```
filter.constraints.filter(_.isInstanceOf[IsNotNull])
  .exists(expr =>
    join.left.outputSet.intersect(expr.references).nonEmpty)
```
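As a rough illustration (hypothetical tables, not from the PR) of the situation this check handles, a null-rejecting predicate on the null-producing side of an outer join lets the optimizer treat the join as an inner join:
```
// Hypothetical tables a and b; spark-shell implicits assumed.
Seq((1, 10), (2, 20)).toDF("k", "v").createOrReplaceTempView("a")
Seq((1, 5)).toDF("k", "x").createOrReplaceTempView("b")

// The filter b.x > 0 can only be true when the right side is non-null, so the
// left outer join behaves like an inner join here.
sql("select * from a left outer join b on a.k = b.k where b.x > 0").explain(true)
```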
GitHub user nsyca opened a pull request:
https://github.com/apache/spark/pull/16798
[SPARK-18873][SQL][TEST] New test cases for scalar subquery (part 2 of 2) -
scalar subquery in predicate context
## What changes were proposed in this pull request?
This PR adds new test cases
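For reference, a minimal sketch (hypothetical tables, not one of the submitted test cases) of a scalar subquery used in predicate context, which is what this suite targets:
```
// The correlated scalar subquery returns a single value that is compared
// against in the WHERE clause.
sql("""select c1 from t1
       where c1 > (select avg(c2) from t2 where t2.c1 = t1.c1)""")
```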
Github user nsyca commented on the issue:
https://github.com/apache/spark/pull/16798
Below are a modified version of the test cases, adapted to run on DB2, and the
results from DB2, as a second source to compare against the results from Spark.
[Modified test file to run on
DB2](https://github.com
Github user nsyca commented on a diff in the pull request:
https://github.com/apache/spark/pull/16798#discussion_r99453763
--- Diff: sql/core/src/test/resources/sql-tests/inputs/scalar-subquery.sql
---
@@ -1,20 +0,0 @@
-CREATE OR REPLACE TEMPORARY VIEW p AS VALUES (1, 1) AS T
Github user nsyca commented on the issue:
https://github.com/apache/spark/pull/16798
@dilipbiswal Could you please cross-check the results from both sources?
@gatorsmile, @hvanhovell Could you please review?
Github user nsyca commented on a diff in the pull request:
https://github.com/apache/spark/pull/16798#discussion_r99499561
--- Diff:
sql/core/src/test/resources/sql-tests/inputs/subquery/scalar-subquery/scalar-subquery-predicate.sql
---
@@ -0,0 +1,255 @@
+-- A test suite
Github user nsyca commented on the issue:
https://github.com/apache/spark/pull/16760
retest this please
Github user nsyca commented on a diff in the pull request:
https://github.com/apache/spark/pull/16841#discussion_r100077423
--- Diff:
sql/core/src/test/resources/sql-tests/results/subquery/in-subquery/in-multiple-columns.sql.out
---
@@ -0,0 +1,178 @@
+-- Automatically
Github user nsyca commented on a diff in the pull request:
https://github.com/apache/spark/pull/16841#discussion_r100077204
--- Diff:
sql/core/src/test/resources/sql-tests/results/subquery/in-subquery/in-joins.sql.out
---
@@ -0,0 +1,353 @@
+-- Automatically generated
Github user nsyca commented on a diff in the pull request:
https://github.com/apache/spark/pull/16841#discussion_r100076749
--- Diff:
sql/core/src/test/resources/sql-tests/results/subquery/in-subquery/in-having.sql.out
---
@@ -0,0 +1,217 @@
+-- Automatically generated
Github user nsyca commented on a diff in the pull request:
https://github.com/apache/spark/pull/16802#discussion_r99466405
--- Diff:
sql/core/src/test/resources/sql-tests/results/subquery/exists-subquery/exists-cte.sql.out
---
@@ -0,0 +1,200 @@
+-- Automatically generated
Github user nsyca commented on a diff in the pull request:
https://github.com/apache/spark/pull/16802#discussion_r99466460
--- Diff:
sql/core/src/test/resources/sql-tests/results/subquery/exists-subquery/exists-joins-and-set-ops.sql.out
---
@@ -0,0 +1,330 @@
+-- Automatically
Github user nsyca commented on a diff in the pull request:
https://github.com/apache/spark/pull/16760#discussion_r98794624
--- Diff:
sql/core/src/test/resources/sql-tests/results/subquery/exists-subquery/exists-having.sql.out
---
@@ -0,0 +1,153 @@
+-- Automatically generated
Github user nsyca commented on a diff in the pull request:
https://github.com/apache/spark/pull/16760#discussion_r98794661
--- Diff:
sql/core/src/test/resources/sql-tests/results/subquery/exists-subquery/exists-orderby-limit.sql.out
---
@@ -0,0 +1,222 @@
+-- Automatically
Github user nsyca commented on a diff in the pull request:
https://github.com/apache/spark/pull/16760#discussion_r98794587
--- Diff:
sql/core/src/test/resources/sql-tests/results/subquery/exists-subquery/exists-aggregate.sql.out
---
@@ -0,0 +1,183 @@
+-- Automatically
Github user nsyca commented on a diff in the pull request:
https://github.com/apache/spark/pull/16572#discussion_r97694229
--- Diff:
sql/core/src/test/scala/org/apache/spark/sql/SQLQueryTestSuite.scala ---
@@ -223,7 +228,10 @@ class SQLQueryTestSuite extends QueryTest
Github user nsyca commented on a diff in the pull request:
https://github.com/apache/spark/pull/16710#discussion_r98048517
--- Diff:
sql/core/src/test/resources/sql-tests/results/subquery/exists-subquery/exists-basic.sql.out
---
@@ -0,0 +1,201 @@
+-- Automatically generated
Github user nsyca commented on a diff in the pull request:
https://github.com/apache/spark/pull/16710#discussion_r98048551
--- Diff:
sql/core/src/test/resources/sql-tests/results/subquery/exists-subquery/exists-within-and-or.sql.out
---
@@ -0,0 +1,156 @@
+-- Automatically
Github user nsyca commented on the issue:
https://github.com/apache/spark/pull/16712
Attached are a slightly modified version of the submitted test file, adapted
to IBM DB2 syntax, and the result of the run.
[Modified version of the test
file](https://github.com/apache/spark
Github user nsyca commented on the issue:
https://github.com/apache/spark/pull/16712
@kevinyu, @gatorsmile. Also FYI to @hvanhovell.
GitHub user nsyca opened a pull request:
https://github.com/apache/spark/pull/16712
[SPARK-18873][SQL][TEST] New test cases for scalar subquery (part 1 of 2)
## What changes were proposed in this pull request?
This PR adds new test cases for scalar subquery in SELECT clause
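A minimal sketch (hypothetical tables, not one of the submitted test cases) of the query shape covered here, a scalar subquery in the SELECT list:
```
// The correlated scalar subquery produces one value per outer row.
sql("""select c1,
              (select max(c2) from t2 where t2.c1 = t1.c1) as max_c2
       from t1""")
```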
Github user nsyca commented on a diff in the pull request:
https://github.com/apache/spark/pull/16712#discussion_r98453251
--- Diff:
sql/core/src/test/resources/sql-tests/inputs/subquery/scalar-subquery/scalar-subquery-select.sql
---
@@ -0,0 +1,139 @@
+-- A test suite
Github user nsyca commented on a diff in the pull request:
https://github.com/apache/spark/pull/16712#discussion_r98452724
--- Diff:
sql/core/src/test/resources/sql-tests/inputs/subquery/scalar-subquery/scalar-subquery-select.sql
---
@@ -0,0 +1,139 @@
+-- A test suite
Github user nsyca commented on a diff in the pull request:
https://github.com/apache/spark/pull/16712#discussion_r98452451
--- Diff:
sql/core/src/test/resources/sql-tests/inputs/subquery/scalar-subquery/scalar-subquery-select.sql
---
@@ -0,0 +1,139 @@
+-- A test suite
Github user nsyca commented on the issue:
https://github.com/apache/spark/pull/16712
Thank you, @gatorsmile, for your time reviewing this test PR. I will wait
for your suggestion on the pattern of the literals in the first columns of the
tables if you do need to have them changed
Github user nsyca commented on a diff in the pull request:
https://github.com/apache/spark/pull/16712#discussion_r98453216
--- Diff:
sql/core/src/test/resources/sql-tests/results/subquery/scalar-subquery/scalar-subquery-select.sql.out
---
@@ -0,0 +1,198 @@
+-- Automatically
Github user nsyca commented on the issue:
https://github.com/apache/spark/pull/16712
The part-2 is for scalar subquery in predicates.
Github user nsyca commented on the issue:
https://github.com/apache/spark/pull/16712
These two commits should give you the delta of the changes I made to
address your comments.
https://github.com/apache/spark/pull/16712/commits/0db0bc3a1896c6187b42e04ac2fd11a67769007c
Github user nsyca commented on the issue:
https://github.com/apache/spark/pull/16572
@hvanhovell, I agree it does look risky with this approach. There are a lot
of dependencies here. I am pitching the idea to get your initial thoughts.
Let me do some background work and I will share
Github user nsyca commented on the issue:
https://github.com/apache/spark/pull/16572
Note that plans inside subqueries not being treated as part of the tree
traversal is a common problem. Besides this problem, another instance was
reported in SPARK-19093. Also, the way Spark needs
Github user nsyca commented on a diff in the pull request:
https://github.com/apache/spark/pull/16572#discussion_r97694210
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala
---
@@ -117,66 +117,72 @@ trait CheckAnalysis extends
Github user nsyca commented on the issue:
https://github.com/apache/spark/pull/16572
Thank you for your time reviewing this PR.
Github user nsyca commented on a diff in the pull request:
https://github.com/apache/spark/pull/16954#discussion_r101596895
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/subquery.scala
---
@@ -83,29 +95,150 @@ object RewritePredicateSubquery
Github user nsyca commented on a diff in the pull request:
https://github.com/apache/spark/pull/16954#discussion_r101594561
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/subquery.scala
---
@@ -40,19 +42,179 @@ abstract class PlanExpression[T
Github user nsyca commented on a diff in the pull request:
https://github.com/apache/spark/pull/16954#discussion_r101590658
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/subquery.scala
---
@@ -40,19 +42,179 @@ abstract class PlanExpression[T
Github user nsyca commented on a diff in the pull request:
https://github.com/apache/spark/pull/16954#discussion_r101590145
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/subquery.scala
---
@@ -40,19 +42,179 @@ abstract class PlanExpression[T
Github user nsyca commented on a diff in the pull request:
https://github.com/apache/spark/pull/16954#discussion_r101593305
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/subquery.scala
---
@@ -40,19 +42,179 @@ abstract class PlanExpression[T
Github user nsyca commented on a diff in the pull request:
https://github.com/apache/spark/pull/16915#discussion_r101375044
--- Diff:
sql/core/src/test/resources/sql-tests/results/subquery/in-subquery/in-with-cte.sql.out
---
@@ -0,0 +1,364 @@
+-- Automatically generated
Github user nsyca commented on the issue:
https://github.com/apache/spark/pull/16915
It's larger than the typical test PRs we submitted for the subquery JIRA, but
since it's the last test PR, we wanted to avoid an additional round of
administrative work.
Github user nsyca commented on a diff in the pull request:
https://github.com/apache/spark/pull/16915#discussion_r101374593
--- Diff:
sql/core/src/test/resources/sql-tests/results/subquery/in-subquery/in-set-operations.sql.out
---
@@ -0,0 +1,595 @@
+-- Automatically generated
Github user nsyca commented on a diff in the pull request:
https://github.com/apache/spark/pull/16915#discussion_r101375513
--- Diff:
sql/core/src/test/resources/sql-tests/results/subquery/in-subquery/not-in-joins.sql.out
---
@@ -0,0 +1,229 @@
+-- Automatically generated
Github user nsyca commented on the issue:
https://github.com/apache/spark/pull/16572
cc @hvanhovell.
Github user nsyca commented on the issue:
https://github.com/apache/spark/pull/16467
@hvanhovell Would there be anything left that I have not addressed in this
PR?
Github user nsyca commented on the issue:
https://github.com/apache/spark/pull/14580
@dongjoon-hyun, my apologies for the terse comment I put in previously.
There is nothing wrong with the ```full outer join``` with ```using```. What I
tried to explain is the ```using
Github user nsyca commented on the issue:
https://github.com/apache/spark/pull/14580
@dongjoon-hyun, could you please try this on your PR?
val a = Seq((1,2),(2,3)).toDF("a","b").createOrReplaceTempView("A")
val b
Github user nsyca commented on a diff in the pull request:
https://github.com/apache/spark/pull/16954#discussion_r103461455
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
---
@@ -707,13 +709,85 @@ class Analyzer
Github user nsyca commented on the issue:
https://github.com/apache/spark/pull/14899
@hvanhovell, thanks for your feedback.
I thought about narrowing the scope of my PR to just the subquery alias
context. That would solve the problem case I used in this PR. At first, I hesitated
Github user nsyca commented on the issue:
https://github.com/apache/spark/pull/14899
Do you think I should keep the title of the PR as generic as it is? Feel
free to suggest a more specific title if you think it would more appropriately
describe the problem we are trying to fix here
GitHub user nsyca opened a pull request:
https://github.com/apache/spark/pull/14899
[SPARK-17337][SQL] Incomplete algorithm for name resolution in Catalyst
parser may lead to incorrect results
## What changes were proposed in this pull request?
Create a new alias for each
Github user nsyca commented on the issue:
https://github.com/apache/spark/pull/14899
Some background on my investigation of the problem:
How can we do name resolution in an SQL statement?
```
scala> sql("select * from t1, t1 t2").explain(true)
Github user nsyca commented on the issue:
https://github.com/apache/spark/pull/14899
With the fix in this PR, the Analyzed Logical Plan and Optimized Logical
Plan become:
```
== Analyzed Logical Plan ==
c3: int
Project [c3#123]
+- Filter NOT predicate-subquery
Github user nsyca commented on the issue:
https://github.com/apache/spark/pull/14912
@viirya, I agree that we need a separate set of PRs to address the general
problem.
On your comment: "I think the goal to simplify a predicate such as (a > 10
|| b > 2) &&
Github user nsyca commented on the issue:
https://github.com/apache/spark/pull/14912
Thanks, @gatorsmile, for mentioning me. I will try my best to comment on
this thread. Disclaimer: I have not looked at the existing code manipulating
predicates/expressions in Spark. Nor have I
Github user nsyca commented on the issue:
https://github.com/apache/spark/pull/14719
@sarutak-san, can your solution be extended to solve the problem I
reported in [SPARK-17337](https://issues.apache.org/jira/browse/SPARK-17337)?
For your convenience, here is the reproduction script
Github user nsyca commented on the issue:
https://github.com/apache/spark/pull/14899
I am closing this PR. I will propose a general solution in SPARK-17154.
Github user nsyca closed the pull request at:
https://github.com/apache/spark/pull/14899
Github user nsyca commented on the issue:
https://github.com/apache/spark/pull/14719
@sarutak, my apologies if this will upset you. I'd like to propose an
alternative solution to the problems documented in SPARK-13801, 14040, 17337
and, here, 17154. The solution borrows the idea from
Github user nsyca commented on the issue:
https://github.com/apache/spark/pull/14719
@sarutak, on the surface the problem looks like it is in the optimization code,
but in fact the root cause is that the column/ExprId C2#77 from T2 is
indistinguishable between the two streams referencing
Github user nsyca commented on a diff in the pull request:
https://github.com/apache/spark/pull/14719#discussion_r80714835
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala
---
@@ -1580,6 +1583,28 @@ class DataFrameSuite extends QueryTest
Github user nsyca commented on the issue:
https://github.com/apache/spark/pull/14719
The test case @sarutak raised here is what I consider to be the problem with
the current code.
```df.join(df, df("key") === df("key"))```
How can we conclude that
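A minimal sketch (hypothetical data, spark-shell implicits assumed) of that test case shape, showing why the condition is ambiguous:
```
// Both df("key") references resolve to the same attribute (same ExprId), so
// the join condition cannot tell the left side of the self-join from the
// right -- the ambiguity being discussed here.
val df = Seq(1, 2, 3).toDF("key")
df.join(df, df("key") === df("key")).explain(true)
```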
Github user nsyca commented on a diff in the pull request:
https://github.com/apache/spark/pull/15763#discussion_r86790230
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
---
@@ -1044,6 +1044,34 @@ class Analyzer
Github user nsyca commented on the issue:
https://github.com/apache/spark/pull/15763
I think I have addressed all the pending comments on this PR. Is there
anything left to do?
Github user nsyca commented on a diff in the pull request:
https://github.com/apache/spark/pull/15763#discussion_r86911547
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
---
@@ -1044,6 +1044,34 @@ class Analyzer
Github user nsyca commented on the issue:
https://github.com/apache/spark/pull/15763
@srinathshankar It is intentional. It is impossible to analyze which
in-between operations we can allow and which we cannot. Correlated
predicates can be placed at any arbitrary level
Github user nsyca commented on the issue:
https://github.com/apache/spark/pull/14719
@cloud-fan, I was studying the ResolveSubquery code for my work on
SPARK-17348. I was at first puzzled by the code in `def rewriteSubQuery`
// Make sure the inner and the outer query
Github user nsyca commented on a diff in the pull request:
https://github.com/apache/spark/pull/15763#discussion_r86653844
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
---
@@ -1044,6 +1044,34 @@ class Analyzer
Github user nsyca commented on the issue:
https://github.com/apache/spark/pull/15763
retest this please
Github user nsyca commented on the issue:
https://github.com/apache/spark/pull/15763
@hvanhovell could you please review the latest PR?
Github user nsyca commented on the issue:
https://github.com/apache/spark/pull/15763
Thanks for the tip on the code. I will work on it.
Github user nsyca commented on the issue:
https://github.com/apache/spark/pull/15763
@hvanhovell Then we will need to walk from the top of the operator hosting
the outer reference to the operator hosting the correlation to ensure there is
no Aggregate or Window operator
Github user nsyca commented on a diff in the pull request:
https://github.com/apache/spark/pull/15763#discussion_r87823864
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
---
@@ -1069,11 +1110,19 @@ class Analyzer(
case
Github user nsyca commented on the issue:
https://github.com/apache/spark/pull/15763
@hvanhovell
I removed the redundant check in `ScalarSubquery`, as pointed out in your
earlier comment (copied below).
"I was also wondering if we still need the foll
Github user nsyca commented on the issue:
https://github.com/apache/spark/pull/15763
@hvanhovell could you please review the PR again? Hopefully this is the
last one.
Github user nsyca commented on a diff in the pull request:
https://github.com/apache/spark/pull/16012#discussion_r89657974
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
---
@@ -989,7 +989,7 @@ class Analyzer
Github user nsyca commented on a diff in the pull request:
https://github.com/apache/spark/pull/16012#discussion_r89664100
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
---
@@ -989,7 +989,7 @@ class Analyzer
Github user nsyca commented on a diff in the pull request:
https://github.com/apache/spark/pull/15936#discussion_r89158982
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala
---
@@ -117,19 +117,36 @@ trait CheckAnalysis extends
Github user nsyca commented on a diff in the pull request:
https://github.com/apache/spark/pull/15936#discussion_r89155790
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala
---
@@ -117,19 +117,36 @@ trait CheckAnalysis extends
Github user nsyca commented on the issue:
https://github.com/apache/spark/pull/15936
Should I say `retest this please` to kick off the regression test?
Github user nsyca commented on a diff in the pull request:
https://github.com/apache/spark/pull/15936#discussion_r89159852
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala
---
@@ -117,19 +117,36 @@ trait CheckAnalysis extends
Github user nsyca commented on a diff in the pull request:
https://github.com/apache/spark/pull/16026#discussion_r89830125
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
---
@@ -932,7 +932,7 @@ object PushPredicateThroughJoin extends
Github user nsyca commented on the issue:
https://github.com/apache/spark/pull/16044
Hmm. `ExistenceJoin` is not exposed via `DataFrame` APIs. Let me study the
way `org.apache.spark.sql.execution.joins.ExistenceJoinSuite` was written,
and then I will add a new test case.
GitHub user nsyca opened a pull request:
https://github.com/apache/spark/pull/16044
Spark 18614
## What changes were proposed in this pull request?
ExistenceJoin should be treated the same as LeftOuter and LeftAnti, not
InnerLike and LeftSemi. This is not currently exposed
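For context, a hedged sketch (hypothetical tables) of a query shape that Spark plans with an `ExistenceJoin`: an EXISTS predicate under OR, which cannot be turned directly into a semi or anti join:
```
sql("""select c1 from t1
       where exists (select 1 from t2 where t2.c2 = t1.c1) or c1 > 10""").explain(true)
```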
Github user nsyca commented on the issue:
https://github.com/apache/spark/pull/16044
@hvanhovell I can add a new test case, but I want to reiterate that this is
not an exposed case because an `ExistenceJoin` never appears in the plan in the
course of the `PushPredicateThroughJoin` rule
Github user nsyca commented on the issue:
https://github.com/apache/spark/pull/16046
We are reviewing existing test cases on subqueries, especially correlated
subqueries, and will open a new JIRA, as part of the umbrella JIRA SPARK-18455,
to add new test cases to extend the coverage
GitHub user nsyca opened a pull request:
https://github.com/apache/spark/pull/16046
[SPARK-18582][SQL] Whitelist LogicalPlan operators allowed in correlated
subqueries
## What changes were proposed in this pull request?
This fix puts an explicit list of operators
Github user nsyca commented on the issue:
https://github.com/apache/spark/pull/16044
@hvanhovell Thank you for the detailed instructions on the test case.
@gatorsmile Also, thanks for the directive.
In addition to the white-box testing on the Optimized Logical Plan, I also