Github user nsyca commented on the issue:
https://github.com/apache/spark/pull/14411
@hvanhovell,
First, my apologies for the delayed replies. I am travelling this week and
only have sporadic connectivity. Thank you for explaining the
implementation and the reasoning behind it. It is
very helpful for a beginner like me.
My bad! In my previous comment, by "rewriting subqueries into joins" I
actually meant moving the correlated predicates from their original
positions to outside the scope of the subqueries. Specifically, I meant the
call to the function pullOutCorrelatedPredicates() (I hope I got it right
this time). In my opinion, this is one of the root causes of many problems.
Bear with me: I don't have a good solution yet, as I am still getting
familiar with the code. Here is an example of the problem. With the rewrite,
we cannot distinguish between the EXISTS form and the IN form of the
original SQL:
select * from t1 where exists (select 1 from t2 where t1.c1=t2.c2)
-and-
select * from t1 where t1.c1 in (select t2.c2 from t2)
are represented identically after the Analysis phase. This is not an issue
because they are semantically equivalent.
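To make the point concrete, here is a toy model in Python. It is not Catalyst code, and the `SemiJoin` class and the two rewrite functions are hypothetical. It shows what happens once the correlated predicate (EXISTS) or the membership comparison (IN) is pulled out of the subquery into a join condition. Both queries end up as the same semi-join, so the plan no longer records which form the user wrote.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SemiJoin:
    # Hypothetical, simplified plan node: a left semi-join of two tables.
    left: str
    right: str
    condition: str  # join condition as plain text, e.g. "t1.c1 = t2.c2"


def rewrite_exists(outer, inner, correlated_pred):
    # EXISTS (SELECT 1 FROM inner WHERE pred): the correlated predicate is
    # pulled out of the subquery and becomes the join condition.
    return SemiJoin(outer, inner, correlated_pred)


def rewrite_in(outer, outer_col, inner, inner_col):
    # outer_col IN (SELECT inner_col FROM inner): the membership test
    # becomes an equality join condition.
    return SemiJoin(outer, inner, f"{outer_col} = {inner_col}")


exists_plan = rewrite_exists("t1", "t2", "t1.c1 = t2.c2")
in_plan = rewrite_in("t1", "t1.c1", "t2", "t2.c2")
print(exists_plan == in_plan)  # True: the two forms are indistinguishable
```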
are semantically equivalent. However, when we add the NOT in
select * from t1 where not exists (select 1 from t2 where t1.c1=t2.c2)
-and-
select * from t1 where t1.c1 not in (select t2.c2 from t2)
are NOT semantically equivalent when T2.C2 can produce NULL values.
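Any SQL engine that follows standard three-valued logic shows the difference. The sketch below uses Python's built-in sqlite3 module rather than Spark, with made-up data in which t2.c2 contains a NULL. NOT EXISTS keeps the row with c1 = 2. NOT IN returns nothing, because `2 NOT IN (1, NULL)` evaluates to NULL rather than TRUE.

```python
import sqlite3


def run(conn, sql):
    # Return the first column of every result row, sorted for stable output.
    return sorted(row[0] for row in conn.execute(sql))


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (c1 INTEGER);
    CREATE TABLE t2 (c2 INTEGER);
    INSERT INTO t1 VALUES (1), (2);
    INSERT INTO t2 VALUES (1), (NULL);  -- t2.c2 produces a NULL
""")

# For c1 = 2, no t2 row makes "2 = c2" TRUE, so NOT EXISTS is TRUE.
not_exists = run(conn, "SELECT c1 FROM t1 WHERE NOT EXISTS "
                       "(SELECT 1 FROM t2 WHERE t1.c1 = t2.c2)")

# For c1 = 2, "2 IN (1, NULL)" is NULL, so NOT IN is also NULL and the row is dropped.
not_in = run(conn, "SELECT c1 FROM t1 WHERE t1.c1 NOT IN (SELECT t2.c2 FROM t2)")

print(not_exists)  # [2]
print(not_in)      # []
```

A rewrite of NOT IN into an anti-join therefore has to treat an unknown (NULL) comparison result as a match. A plain NOT EXISTS-style anti-join on the equality does not do that.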
Lastly, your comment on the SAMPLE operator seems right. I will give it a
shot and add it to this PR.
Thanks again for your patience.