GitHub user ericl opened a pull request:
https://github.com/apache/spark/pull/14241
[SPARK-16596] [SQL] Refactor DataSourceScanExec to do partition discovery
at execution instead of planning time
## What changes were proposed in this pull request?
Partition discovery is rather expensive, so we should do it at execution
time instead of during physical planning. Right now there is not much benefit,
since ListingFileCatalog scans all partitions at planning time anyway, but
this can be optimized in the future. Also, some information useful for
partition pruning might not be available at planning time.
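The idea of deferring discovery can be illustrated with a minimal sketch. This is not Spark code: `FileCatalog`, `ScanNode`, `list_files`, and the sample partition values are hypothetical names chosen for illustration, and Python is used only to keep the example short. The point it shows is that constructing the scan node (planning) does no listing, while the first execution triggers it once and caches the result.

```python
from functools import cached_property


class FileCatalog:
    """Hypothetical stand-in for a file catalog; not Spark's API."""

    def __init__(self, partitions):
        # Maps a partition value to the list of files in that partition.
        self._partitions = partitions
        self.list_calls = 0

    def list_files(self, predicate):
        # The expensive step: walk the partitions and keep the matching ones.
        self.list_calls += 1
        return [
            path
            for part, files in sorted(self._partitions.items())
            if predicate(part)
            for path in files
        ]


class ScanNode:
    """Hypothetical physical scan node that defers listing to execution."""

    def __init__(self, catalog, predicate):
        # Planning only records what to scan; nothing is listed here.
        self._catalog = catalog
        self._predicate = predicate

    @cached_property
    def selected_files(self):
        # Evaluated on first access, then cached on the instance.
        return self._catalog.list_files(self._predicate)

    def execute(self):
        return list(self.selected_files)


catalog = FileCatalog({
    "2016-07-15": ["a.parquet"],
    "2016-07-16": ["b.parquet", "c.parquet"],
})
scan = ScanNode(catalog, lambda day: day >= "2016-07-16")
calls_after_planning = catalog.list_calls  # still zero at this point
result = scan.execute()                    # listing happens here
```

Because the predicate is only applied inside `execute()`, any pruning information that becomes available after planning could, in principle, be folded in before the listing runs.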
TODO: In another PR, move DataSourceScanExec to its own file.
## How was this patch tested?
Existing tests. It might be worth adding a test that catalog.listFiles() is
deferred until execution, but that can wait until there is an actual
benefit to doing so.
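A test of that kind might look roughly like the sketch below. It is an assumption about the shape of such a test, not part of this patch: `LazyScan` and `list_files` are hypothetical stand-ins, and the catalog is a `unittest.mock.Mock` so the test can check when the listing call happens.

```python
from unittest.mock import Mock


class LazyScan:
    """Hypothetical scan node; lists files only when executed."""

    def __init__(self, catalog):
        self._catalog = catalog
        self._files = None

    def execute(self):
        # List on first execution only, then reuse the cached result.
        if self._files is None:
            self._files = self._catalog.list_files()
        return self._files


def test_listing_is_deferred():
    catalog = Mock()
    catalog.list_files.return_value = ["part=1/f0", "part=2/f1"]

    scan = LazyScan(catalog)  # stands in for planning
    catalog.list_files.assert_not_called()

    assert scan.execute() == ["part=1/f0", "part=2/f1"]  # stands in for execution
    scan.execute()
    # Two executions, but the catalog was listed exactly once.
    catalog.list_files.assert_called_once_with()


test_listing_is_deferred()
```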
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/ericl/spark refactor
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/14241.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #14241
----
commit d04636474cce217106c1b3bfb60b5da54a53f7e5
Author: Eric Liang <[email protected]>
Date: 2016-07-16T02:18:52Z
Fri Jul 15 19:18:52 PDT 2016
commit 36d6ef44051a9ac57b9d6d1681aa9b11fa16d259
Author: Eric Liang <[email protected]>
Date: 2016-07-17T21:28:47Z
Sun Jul 17 14:28:47 PDT 2016
commit 6c0eb0e05238e21c68b3e26c1efad01c2af3e5e8
Author: Eric Liang <[email protected]>
Date: 2016-07-17T21:29:46Z
Sun Jul 17 14:29:46 PDT 2016
commit 1a4660286496663f6cb3414a22460e4fb24610b1
Author: Eric Liang <[email protected]>
Date: 2016-07-17T21:36:58Z
Sun Jul 17 14:36:58 PDT 2016
commit 538233499efce05379110d7210a0cdc7e25b699e
Author: Eric Liang <[email protected]>
Date: 2016-07-17T21:42:32Z
Sun Jul 17 14:42:32 PDT 2016
commit 98d6d74dde496b2256081e5840d56e91031e4db3
Author: Eric Liang <[email protected]>
Date: 2016-07-17T21:55:13Z
Sun Jul 17 14:55:13 PDT 2016
commit 0d4642a3cef757666fbc72932d3eb78bbaeec530
Author: Eric Liang <[email protected]>
Date: 2016-07-17T22:12:24Z
Sun Jul 17 15:12:24 PDT 2016
----