Github user nsyca commented on the issue:

    https://github.com/apache/spark/pull/15763
  
    @srinathshankar It is intentional. It is not feasible to enumerate which 
in-between operations we can allow and which we cannot. Correlated predicates 
can be placed at an arbitrary depth inside a subquery. Spark may not support 
more than one level of correlation today, but it may in a future version. So my 
argument is: if we cannot prove that pulling a correlated predicate up through 
an operation still preserves its original semantics, then we should not do it. 
The paper @rxin mentioned in the previous note 
(http://www.btw-2015.de/res/proceedings/Hauptband/Wiss/Neumann-Unnesting_Arbitrary_Querie.pdf)
 makes this claim in the paragraph after Q2 on page 2.
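
    To make the arbitrary-depth point concrete, here is a hypothetical query 
(the tables t1, t2, t3 and their columns are made up for illustration and do 
not come from this PR) in which the correlated predicate t3.c = t1.a sits two 
levels below the outer query block:

        SELECT *
        FROM   t1
        WHERE  EXISTS (SELECT 1
                       FROM   t2
                       WHERE  t2.b IN (SELECT t3.c
                                       FROM   t3
                                       WHERE  t3.c = t1.a))  -- correlated two levels deep

    Pulling t3.c = t1.a all the way up would require reasoning about every 
operator it crosses, which is the per-operator analysis described above.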
    
    This PR is not a full solution. It is intended as a temporary stop-gap 
that closes off the incorrect-result cases Spark exposes today. It could be 
argued that your example above is a regression, but the statement can be 
rewritten to work by collapsing the two levels of subselects (sketched below). 
It is harder to implement a solution that walks the whole plan tree of a 
subquery to determine through which operations the correlated predicate may 
safely be pulled.
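
    Since the original example is not reproduced here, the following is only a 
hypothetical sketch (with made-up tables t1 and t2) of the kind of rewrite 
meant by collapsing the two levels of subselects. A correlated predicate 
underneath an extra derived-table subselect:

        SELECT *
        FROM   t1
        WHERE  t1.a IN (SELECT x.b
                        FROM   (SELECT t2.b, t2.c
                                FROM   t2) x
                        WHERE  x.c = t1.c)

    can be written equivalently with a single subselect, so the correlation 
spans only one level:

        SELECT *
        FROM   t1
        WHERE  t1.a IN (SELECT t2.b
                        FROM   t2
                        WHERE  t2.c = t1.c)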
    
    A permanent solution, as I proposed in one of my comments above, is to 
move the transformation of correlated predicates to the Optimizer phase and 
leave the Analyzer phase to just resolve the references and validate that the 
input SQL is valid. This way, the two subselects in your example would 
probably first be merged, and then the correlated-predicate pull-up would 
follow.

