GitHub user cloud-fan opened a pull request:
https://github.com/apache/spark/pull/15657
[DO NOT MERGE] Test partition
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/cloud-fan/spark test-partition
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/15657.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #15657
----
commit c2eacb7da1d2d4129b19be89a2c07e91dbff3964
Author: Michael Allman <[email protected]>
Date: 2016-08-10T19:07:34Z
[SPARK-16980][SQL] Load only catalog table partition metadata required
to answer a query
commit 1f611c4089102744242b73346d9724d248635cac
Author: Michael Allman <[email protected]>
Date: 2016-09-13T01:21:38Z
Add a new catalyst optimizer rule to SQL core for pruning unneeded
partitions' files from a table file catalog
commit 8b24eada4a0b49f39d16570ee86f52ddc1682251
Author: Michael Allman <[email protected]>
Date: 2016-10-08T00:15:11Z
Include the type of file catalog in the FileSourceScanExec metadata
commit f82f0d228141dd026b0b631e8d984961ee8b827b
Author: Michael Allman <[email protected]>
Date: 2016-10-08T00:15:54Z
TODO: Consider renaming FileCatalog to better differentiate it from
BasicFileCatalog (or vice-versa)
commit 1f0d5d88538da058e474098eabba53d387f70f53
Author: Eric Liang <[email protected]>
Date: 2016-10-11T02:54:53Z
try out parquet case insensitive fallback
commit 198dd9457fad08516f65ea1bcfa6edf4af17d948
Author: Michael Allman <[email protected]>
Date: 2016-10-11T17:53:13Z
Refactor the FileSourceScanExec.metadata val to make it prettier
commit acc84f07f53d3c87c5637636e69b1c564421484a
Author: Michael Allman <[email protected]>
Date: 2016-10-11T19:00:43Z
Refactor `TableFileCatalog.listFiles` to call `listDataLeafFiles` once
instead of once per partition
commit 59de5ca2c8b209a190dc0c6082fc6e2d2de0096b
Author: Eric Liang <[email protected]>
Date: 2016-10-11T23:03:18Z
fix and add test for input files
commit 3b51624263cfcedd3e51b71342b940592a5f6118
Author: Eric Liang <[email protected]>
Date: 2016-10-11T23:09:06Z
rename test
commit f94863dd386a8654986a1fde09e5d87ded97a6e3
Author: Eric Liang <[email protected]>
Date: 2016-10-13T01:09:02Z
fix it
commit 0958bcd8f088d5641fc78952b8265ce05232c3f9
Author: Eric Liang <[email protected]>
Date: 2016-10-12T20:20:11Z
feature flag
commit 291cee788e1bcc3ecbd7b1a4187f8eba58e134fb
Author: Eric Liang <[email protected]>
Date: 2016-10-12T22:48:03Z
add comments
commit 022d5b9873018dad8ac08646704f567176977877
Author: Eric Liang <[email protected]>
Date: 2016-10-13T01:26:23Z
more test cases
commit 8bd27be814f7721f3764364c72b33c7f67e0e9ff
Author: Eric Liang <[email protected]>
Date: 2016-10-13T01:46:41Z
also fix a bug with zero partitions selected
commit 627572e0020d313a9c1378349e2ee4ab0d0e97f1
Author: Eric Liang <[email protected]>
Date: 2016-10-13T17:30:48Z
extend and fix flakiness in test
commit 6d8e7ea9f904e33af4ca7372f5b31379aede9308
Author: Michael Allman <[email protected]>
Date: 2016-10-13T17:55:26Z
Enhance `ParquetMetastoreSuite` with mixed-case partition columns
commit 21caa932a157ec3dd394829061b06bd3d857de0f
Author: Michael Allman <[email protected]>
Date: 2016-10-13T18:29:25Z
Tidy up a little by removing some unused imports, an unused method and
moving a protected method down and making it private
commit d7795cd0f3bc517bdf278e626ca25ce08ea23bcb
Author: Michael Allman <[email protected]>
Date: 2016-10-13T18:44:15Z
Put partition count in `FileSourceScanExec.metadata` for partitioned
tables
commit 765f93ce664ef33c1c62bf80b678ff5ba2992b85
Author: Michael Allman <[email protected]>
Date: 2016-10-13T20:48:33Z
Fix some errors in my revision of `ParquetSourceSuite`
commit e1635e4570c0e4b892b93d1ac1e71d52d5a4f66b
Author: Eric Liang <[email protected]>
Date: 2016-10-14T01:24:31Z
Add metrics and cost tests for partition pruning effectiveness (#5)
* [SPARK-16980][SQL] Load only catalog table partition metadata required
to answer a query
* Add a new catalyst optimizer rule to SQL core for pruning unneeded
partitions' files from a table file catalog
* Include the type of file catalog in the FileSourceScanExec metadata
* TODO: Consider renaming FileCatalog to better differentiate it from
BasicFileCatalog (or vice-versa)
* try out parquet case insensitive fallback
* Refactor the FileSourceScanExec.metadata val to make it prettier
* fix and add test for input files
* rename test
* Refactor `TableFileCatalog.listFiles` to call `listDataLeafFiles` once
instead of once per partition
* fix it
* more test cases
* also fix a bug with zero partitions selected
* feature flag
* add comments
* extend and fix flakiness in test
* Enhance `ParquetMetastoreSuite` with mixed-case partition columns
* Tidy up a little by removing some unused imports, an unused method and
moving a protected method down and making it private
* Put partition count in `FileSourceScanExec.metadata` for partitioned
tables
* Fix some errors in my revision of `ParquetSourceSuite`
* Thu Oct 13 17:18:14 PDT 2016
* more generic
* Thu Oct 13 18:09:42 PDT 2016
* Thu Oct 13 18:09:55 PDT 2016
* Thu Oct 13 18:22:31 PDT 2016
commit 71049d130e89aedba75e8875d8fde7620d6a55e2
Author: Eric Liang <[email protected]>
Date: 2016-10-14T02:27:01Z
Actually register the hive catalog metrics, also revert broken tests (#6)
* Thu Oct 13 19:02:36 PDT 2016
* Thu Oct 13 19:03:06 PDT 2016
commit 6a63afd156d4806122b9ad0c2593de69a0ae790c
Author: Eric Liang <[email protected]>
Date: 2016-10-14T21:04:01Z
Fri Oct 14 14:04:01 PDT 2016
commit 6b02b3c36b3c1f99695262f9d60fe2aaaf25c5bc
Author: Michael Allman <[email protected]>
Date: 2016-10-14T21:35:10Z
[SPARK-16980][SQL] Load only catalog table partition metadata required
to answer a query
commit e816919fe8b4cd06cc91fb373e8e55f7c18e99b6
Author: Michael Allman <[email protected]>
Date: 2016-09-13T01:21:38Z
Add a new catalyst optimizer rule to SQL core for pruning unnecessary
partition data from a HadoopFsRelation's file catalog
commit 8cca6dc02847eb04740ec1ed5d29920b4f2f0030
Author: Michael Allman <[email protected]>
Date: 2016-10-08T00:15:11Z
Include the type of file catalog in the FileSourceScanExec metadata
commit 7acc3f1072ece6b2e5f5324ff84bbcbeae487ef2
Author: Eric Liang <[email protected]>
Date: 2016-10-11T02:54:53Z
try out parquet case insensitive fallback
commit cf7d1f15e0045cbd12c81a39138e7c3439c611d7
Author: Michael Allman <[email protected]>
Date: 2016-10-11T17:53:13Z
Refactor the FileSourceScanExec.metadata val to make it prettier
commit c75855c0615d88001a83c03a9515a9b1fff0b241
Author: Eric Liang <[email protected]>
Date: 2016-10-11T23:03:18Z
fix and add test for input files
commit 821372f2fdc09ebd882bb6958bed24a42738235c
Author: Eric Liang <[email protected]>
Date: 2016-10-11T23:09:06Z
rename test
commit d0b893ba5c45db32aad640ea6732a8803c054f07
Author: Michael Allman <[email protected]>
Date: 2016-10-11T19:00:43Z
Refactor `TableFileCatalog.listFiles` to call `listDataLeafFiles` once
instead of once per partition
----