Github user nsyca commented on a diff in the pull request:
https://github.com/apache/spark/pull/15763#discussion_r86653844
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
---
@@ -1044,6 +1044,34 @@ class Analyzer(
failOnOuterReference(p)
p
}
+
+ // SPARK-17348
+ // Looking for a potential incorrect result case.
+ // When a correlated predicate is a non-equality predicate
+ // it must be placed at the immediate child operator.
+ // Otherwise, the pull up of the correlated predicate
+ // will generate a plan with a different semantics
+ // which could return incorrect result.
+ var continue : Boolean = true
--- End diff --
One technique I know of for transforming correlated queries into queries
with no correlation is outlined in this 1996 IEEE Data Engineering paper:
P. Seshadri, H. Pirahesh, T. Y. C. Leung. "Complex query decorrelation."
Proceedings of the Twelfth International Conference on Data Engineering,
1996, pp. 450-458.
Distributed systems aggravate the performance impact of correlated queries
because the entire data set of the subquery has to be moved to where the data
of the outer tables resides. This processing is similar to
`BroadcastNestedLoopJoinExec` in Spark.
The idea behind the paper is to build a duplicate of the relevant portion of
the outer tables and to decorrelate the original subquery by joining that
duplicate inside the subquery. The algorithm is claimed to be generic and
applicable to all forms of correlation: shallow correlation, where the
correlated point is immediately below the operation over the outer table(s),
and deep correlation, where the correlated point is at an arbitrary level
below the operation over the outer tables.
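As a rough illustration of that idea, here is a minimal sketch using plain
Scala collections. It is neither the paper's algorithm verbatim nor Spark
code. The `Outer` and `Inner` case classes, the `Decorrelation` object and the
example query with a non-equality correlated predicate are all assumptions
made up for this illustration.

```scala
// Hypothetical schemas, for illustration only.
case class Outer(id: Int, k: Int)
case class Inner(k: Int)

object Decorrelation {
  // Correlated form: the subquery is re-evaluated for every outer row.
  //   SELECT o.id, (SELECT count(*) FROM inner i WHERE i.k < o.k) FROM outer o
  def correlated(outer: Seq[Outer], inner: Seq[Inner]): Seq[(Int, Int)] =
    outer.map(o => (o.id, inner.count(i => i.k < o.k)))

  // Decorrelated form: the subquery no longer refers to the outer row.
  def decorrelated(outer: Seq[Outer], inner: Seq[Inner]): Seq[(Int, Int)] = {
    // 1. Duplicate portion of the outer table: the distinct correlation values.
    val bindings: Seq[Int] = outer.map(_.k).distinct

    // 2. Join the duplicate inside the subquery and aggregate per binding.
    val joined: Seq[Int] = for (b <- bindings; i <- inner if i.k < b) yield b
    val matched: Map[Int, Int] =
      joined.groupBy(identity).map { case (b, bs) => b -> bs.size }

    // Bindings without any matching inner row must still yield a count of 0.
    val counts: Map[Int, Int] =
      bindings.map(b => b -> matched.getOrElse(b, 0)).toMap

    // 3. Equi-join the subquery result back to the outer table on the binding.
    outer.map(o => (o.id, counts(o.k)))
  }
}
```

In this sketch both functions return the same rows. The decorrelated form
evaluates the subquery once per distinct value of `o.k` instead of once per
outer row, and its final step is an equality join even though the original
correlated predicate is a non-equality one.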