GitHub user wzhfy opened a pull request:
https://github.com/apache/spark/pull/17668
[SPARK-20366] [SQL] Fix recursive join reordering: inside joins are not
reordered
## What changes were proposed in this pull request?
If a plan has multi-level successive joins, e.g.:
```
        Join
        /  \
    Union   t5
     /  \
   Join  t4
   /  \
 Join  t3
 /  \
t1   t2
```
Currently we fail to reorder the inside joins, i.e. t1, t2, t3.
In join reorder, we use `OrderedJoin` to indicate that a join has been ordered,
so that when transforming down the plan, these joins don't need to be
reordered again.
But there's a problem in the definition of `OrderedJoin`:
The real join node is a constructor parameter, but not a child. This breaks the
transform procedure, because `mapChildren` applies the transform only to
parameters that are also children, so the plan under the wrapped join is never
transformed.
In this patch, we change `OrderedJoin` to a class having the same structure
as a join node.
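
The difference between the two marker shapes can be sketched with a toy tree. This is not Spark's code: `Plan`, `WrappedMarker`, `ShapedMarker`, `withChildren` and `unorderedJoins` are hypothetical names, and the simplified `transformDown` only assumes that recursion goes through a node's children.

```scala
object RecursiveReorderSketch {
  // Toy plan tree; every name here is hypothetical, not a Spark class.
  sealed trait Plan {
    def children: Seq[Plan]
    // Rebuild this node from already-transformed children.
    def withChildren(cs: Seq[Plan]): Plan
    // Pre-order transform: apply the rule here, then recurse into children only.
    def transformDown(rule: PartialFunction[Plan, Plan]): Plan = {
      val after = rule.applyOrElse(this, (p: Plan) => p)
      after.withChildren(after.children.map(_.transformDown(rule)))
    }
  }
  case class Table(name: String) extends Plan {
    def children: Seq[Plan] = Nil
    def withChildren(cs: Seq[Plan]): Plan = this
  }
  case class Join(left: Plan, right: Plan) extends Plan {
    def children: Seq[Plan] = Seq(left, right)
    def withChildren(cs: Seq[Plan]): Plan = Join(cs(0), cs(1))
  }
  case class Union(left: Plan, right: Plan) extends Plan {
    def children: Seq[Plan] = Seq(left, right)
    def withChildren(cs: Seq[Plan]): Plan = Union(cs(0), cs(1))
  }
  // Marker that wraps the join as a parameter without exposing it as a child.
  case class WrappedMarker(join: Join) extends Plan {
    def children: Seq[Plan] = Nil
    def withChildren(cs: Seq[Plan]): Plan = this
  }
  // Marker with the same structure as a join node: both sides are children.
  case class ShapedMarker(left: Plan, right: Plan) extends Plan {
    def children: Seq[Plan] = Seq(left, right)
    def withChildren(cs: Seq[Plan]): Plan = ShapedMarker(cs(0), cs(1))
  }

  // Count joins that were never replaced by a marker.
  def unorderedJoins(p: Plan): Int = p match {
    case WrappedMarker(j) => unorderedJoins(j.left) + unorderedJoins(j.right)
    case j: Join          => 1 + j.children.map(unorderedJoins).sum
    case other            => other.children.map(unorderedJoins).sum
  }

  // A join above a union above another join, as in the plan shown earlier.
  val plan: Plan =
    Join(Union(Join(Table("t1"), Table("t2")), Table("t4")), Table("t5"))

  val viaWrapper: Plan = plan.transformDown { case j: Join => WrappedMarker(j) }
  val viaShape: Plan = plan.transformDown { case Join(l, r) => ShapedMarker(l, r) }

  def main(args: Array[String]): Unit = {
    println(unorderedJoins(viaWrapper)) // 1: the join under the union was skipped
    println(unorderedJoins(viaShape))   // 0: every join was reached
  }
}
```

In this sketch the wrapper stops the traversal at the first marked join, while the join-shaped marker lets the transform continue into both sides.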
## How was this patch tested?
Added a corresponding test case.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/wzhfy/spark recursiveReorder
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/17668.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #17668
----
commit 522d2fae3d5e35cad96997da73ab4980fb816735
Author: wangzhenhua <[email protected]>
Date: 2017-04-17T13:20:01Z
support recursive reorder
----