GitHub user hvanhovell opened a pull request:
https://github.com/apache/spark/pull/12954
[SPARK-15122][SQL] Fix TPC-DS 41 - Normalize predicates before pulling them out
## What changes were proposed in this pull request?
The official TPC-DS 41 query currently fails because it contains a scalar
subquery with a disjunctive correlated predicate (the correlated predicates
are nested in ORs). This makes the `Analyzer` pull out the entire predicate,
which is wrong, and triggers the following (correct) analysis exception:
`The correlated scalar subquery can only contain equality predicates`
This PR fixes this by first simplifying (or normalizing) the correlated
predicates before pulling them out of the subquery. I have also added a small
optimizer rule that rewrites correlated scalar subqueries into predicate
subqueries if they are used in a `Filter` and are wrapped by a predicate. This
allows us to use semi joins instead of left outer joins.
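To illustrate why normalizing first helps, here is a minimal Python sketch of the idea. It is a toy model, not the Catalyst code in this PR: the `Atom`, `And`, `Or`, `factor_common` and `pull_out` names are hypothetical, and the only simplification modelled is factoring a conjunct shared by every branch of an OR, i.e. `(c AND p) OR (c AND q)` becomes `c AND (p OR q)`. The column names are merely in the style of TPC-DS 41.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Atom:
    text: str                 # e.g. "i_manufact = outer.i_manufact"
    correlated: bool = False  # True if the atom references the outer query


@dataclass(frozen=True)
class And:
    children: Tuple["Expr", ...]


@dataclass(frozen=True)
class Or:
    children: Tuple["Expr", ...]


Expr = Union[Atom, And, Or]


def conjuncts(e):
    """Flatten nested ANDs into a list of top-level conjuncts."""
    if isinstance(e, And):
        out = []
        for child in e.children:
            out.extend(conjuncts(child))
        return out
    return [e]


def make_and(parts):
    parts = list(parts)
    return parts[0] if len(parts) == 1 else And(tuple(parts))


def factor_common(e):
    """Rewrite (c AND p) OR (c AND q) as c AND (p OR q)."""
    if not isinstance(e, Or):
        return e
    branches = [conjuncts(b) for b in e.children]
    common = [c for c in branches[0] if all(c in b for b in branches[1:])]
    if not common:
        return e
    rests = [[c for c in b if c not in common] for b in branches]
    if any(not r for r in rests):
        # One branch is exactly the common part, so the OR collapses to it.
        return make_and(common)
    return make_and(common + [Or(tuple(make_and(r) for r in rests))])


def is_correlated(e):
    if isinstance(e, Atom):
        return e.correlated
    return any(is_correlated(c) for c in e.children)


def pull_out(e, normalize=True):
    """Split top-level conjuncts into (correlated, local) lists."""
    parts = conjuncts(factor_common(e) if normalize else e)
    correlated = [p for p in parts if is_correlated(p)]
    local = [p for p in parts if not is_correlated(p)]
    return correlated, local


key = Atom("i_manufact = outer.i_manufact", correlated=True)
p = Atom("i_category = 'Women'")
q = Atom("i_category = 'Men'")
predicate = Or((And((key, p)), And((key, q))))

naive_correlated, naive_local = pull_out(predicate, normalize=False)
correlated, local = pull_out(predicate)
```

Without normalization the whole OR counts as one correlated predicate and is pulled out in its entirety. With normalization only the equality `key` is pulled out, and the remaining disjunction stays inside the subquery.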
## How was this patch tested?
Manual testing on TPC-DS 41, and added a test to SubquerySuite.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/hvanhovell/spark SPARK-15122
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/12954.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #12954
----
commit f0871c921285a05602cf566c9f2c23901224d73e
Author: Herman van Hovell <[email protected]>
Date: 2016-05-06T13:39:43Z
Fix TPC-DS 41 - normalize predicates before pulling them out.
----