GitHub user pwoody opened a pull request:
https://github.com/apache/spark/pull/14733
[SPARK-17170] [SQL] InMemoryTableScanExec driver-side partition pruning
## What changes were proposed in this pull request?
Once data has been cached, we have statistics that let us eagerly prune
entire partitions on the driver before launching a query. This change modifies
InMemoryTableScanExec to prune partitions before launching tasks, so that no
tasks are scheduled for partitions the statistics rule out.
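
To illustrate the idea, here is a minimal sketch of driver-side pruning using min/max statistics. It is not the code in this pull request and does not use Spark's classes: `PartitionStats`, the `Filter` types, and `DriverSidePruning` are hypothetical names, and the model assumes a single integer column with one min/max pair per partition.

```scala
// Hypothetical, simplified model (not Spark's actual classes):
// statistics collected for one integer column of one cached partition.
final case class PartitionStats(partitionId: Int, min: Int, max: Int, rowCount: Long)

// A tiny filter language over that single column.
sealed trait Filter
final case class EqualTo(value: Int) extends Filter
final case class GreaterThan(value: Int) extends Filter
final case class LessThan(value: Int) extends Filter

object DriverSidePruning {
  // Returns false only when the statistics prove that no row in the
  // partition can satisfy the filter; otherwise the partition must be kept.
  def mightMatch(stats: PartitionStats, filter: Filter): Boolean =
    if (stats.rowCount == 0L) false
    else filter match {
      case EqualTo(v)     => stats.min <= v && v <= stats.max
      case GreaterThan(v) => stats.max > v
      case LessThan(v)    => stats.min < v
    }

  // Runs on the driver: keep the ids of partitions that survive every filter,
  // so tasks only need to be launched for those partitions.
  def prune(all: Seq[PartitionStats], filters: Seq[Filter]): Seq[Int] =
    all.filter(s => filters.forall(f => mightMatch(s, f))).map(_.partitionId)
}
```

For example, given three partitions covering the ranges 1 to 100, 101 to 200, and 201 to 300, `DriverSidePruning.prune(stats, Seq(GreaterThan(150)))` keeps only partitions 1 and 2. The check is conservative: a partition is dropped only when its bounds exclude every possible match, so pruning never changes the query result.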
## How was this patch tested?
Ran the existing test suite, with a slight modification so that the data is
scanned once as a setup step.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/pwoody/spark feature/inMemoryPartitionPruning
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/14733.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #14733
----
commit 3ef63bd5095e9a42b0813e17b7d15fcb5603d8cc
Author: Patrick Woody <[email protected]>
Date: 2016-08-19T10:28:13Z
[SPARK-17170] [SQL] InMemoryTableScanExec driver-side partition pruning
----