GitHub user gatorsmile opened a pull request:
https://github.com/apache/spark/pull/10577
[SPARK-12616] [SQL] Adding a New Logical Operator Unions
The `Union` logical operator supports only two children. This PR adds a new
logical operator, `Unions`, which can have an arbitrary number of children.
The `Union` logical plan is a binary node. However, a typical use case for
union is to union a very large number of input sources (DataFrames, RDDs, or
files). It is not uncommon to union hundreds of thousands of files. In this
case, the optimizer can become very slow because the plan contains a large
number of nested binary unions. We should change the Union logical plan to
support an arbitrary number of children, and add a single rule in the optimizer
to collapse all adjacent `Union`s into a single `Unions`. Note that this
problem doesn't exist in the physical plan, because the physical Union already
supports an arbitrary number of children.
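The collapsing idea described above can be illustrated with a toy model. This is only a sketch in Python, not the Catalyst code in this PR: `Relation`, `Union`, `Unions`, and `collapse_unions` are hypothetical stand-ins that share names with the operators discussed here but do not use Spark's API. It assumes the leaves under the unions are plain relations.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Relation:
    """Hypothetical leaf node standing in for one input source."""
    name: str


@dataclass(frozen=True)
class Union:
    """Toy binary union: exactly two children."""
    left: object
    right: object


@dataclass(frozen=True)
class Unions:
    """Toy n-ary union: any number of children."""
    children: tuple


def collapse_unions(plan):
    """Flatten adjacent Union/Unions nodes into a single Unions node.

    Uses an explicit stack instead of recursion so that a chain of
    thousands of nested binary unions can be walked without relying on
    Python's recursion limit. Child order is preserved left to right.
    Nodes that are not Union/Unions are treated as opaque leaves.
    """
    if not isinstance(plan, (Union, Unions)):
        return plan
    flat = []
    stack = [plan]
    while stack:
        node = stack.pop()
        if isinstance(node, Union):
            # Push right first so that left is popped (visited) first.
            stack.append(node.right)
            stack.append(node.left)
        elif isinstance(node, Unions):
            stack.extend(reversed(node.children))
        else:
            flat.append(node)
    return Unions(tuple(flat))


# Build a left-deep chain of 999 binary unions over 1000 relations,
# similar in shape to repeatedly calling a two-input union.
plan = Relation("t0")
for i in range(1, 1000):
    plan = Union(plan, Relation(f"t{i}"))

flat = collapse_unions(plan)
print(len(flat.children))
```

In this toy model the 999 nested binary nodes become one node with 1000 children, so a later pass over the tree has a single union node to visit rather than a chain whose depth grows with the number of inputs.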
After this is merged, I will submit a separate PR that adds a new optimizer
rule: push `Unions` through `Filter` and `Project`.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/gatorsmile/spark unionAllMultiChildren
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/10577.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #10577
----
commit 73270c8aa7b7e387e7b0e75369dfcbf8c554aa5e
Author: gatorsmile <[email protected]>
Date: 2016-01-04T20:09:50Z
added a new logical operator UNIONS
commit d9811c7bb3f2c15ef9ba6fe95ec0b09f8f66b973
Author: gatorsmile <[email protected]>
Date: 2016-01-04T20:21:36Z
Merge remote-tracking branch 'upstream/master' into unionAllMultiChildren
----