GitHub user mbasmanova opened a pull request:
https://github.com/apache/spark/pull/18309
[SPARK-21079] [SQL] Calculate total size of a partition table as a sum of
individual partitions
## What changes were proposed in this pull request?
When calculating the total size of a partitioned table, use the storage URIs
associated with the individual partitions to identify the files that make up
the table.
CC: @wzhfy
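The idea can be illustrated with a minimal, self-contained sketch. This is not the code from this patch: it assumes local directories accessed through `java.io.File` rather than Hadoop file systems, and `PartitionInfo`, `directorySize`, and `totalSize` are hypothetical names introduced only for illustration. It also assumes, as one plausible cause of the zero size, that a partition's data can live outside the table's root directory, so scanning only the root misses it.

```scala
import java.io.File
import java.nio.file.Files

object PartitionedTableSize {
  // Hypothetical stand-in for a catalog partition: its spec plus its own storage location.
  final case class PartitionInfo(spec: Map[String, String], location: File)

  // Recursively sums the sizes of all regular files under `f`.
  // Returns 0 for a path that does not exist or cannot be listed.
  def directorySize(f: File): Long =
    if (f.isFile) f.length()
    else Option(f.listFiles()).map(_.map(directorySize).sum).getOrElse(0L)

  // Table size computed as the sum of the sizes of the individual partition locations.
  def totalSize(partitions: Seq[PartitionInfo]): Long =
    partitions.map(p => directorySize(p.location)).sum

  def main(args: Array[String]): Unit = {
    // Assumed layout: one partition under the table root, one stored elsewhere.
    val tableRoot = Files.createTempDirectory("tbl").toFile
    val elsewhere = Files.createTempDirectory("other").toFile

    val p1 = new File(tableRoot, "ds=2017-06-14")
    val p2 = new File(elsewhere, "ds=2017-06-15")
    p1.mkdirs()
    p2.mkdirs()
    Files.write(new File(p1, "part-00000").toPath, Array.fill[Byte](100)(0))
    Files.write(new File(p2, "part-00000").toPath, Array.fill[Byte](250)(0))

    val partitions = Seq(
      PartitionInfo(Map("ds" -> "2017-06-14"), p1),
      PartitionInfo(Map("ds" -> "2017-06-15"), p2)
    )

    println(directorySize(tableRoot)) // 100: scanning only the table root misses p2
    println(totalSize(partitions))    // 350: summing per-partition locations counts both
  }
}
```

In a real deployment the partition locations would come from the catalog and the sizes from the file system backing each URI; the sketch only shows why summing per-partition locations can differ from scanning a single table-level directory.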
## How was this patch tested?
Ran ANALYZE TABLE xxx COMPUTE STATISTICS on a partitioned Hive table and
verified that sizeInBytes is calculated correctly. Before this change, the size
would be zero.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/mbasmanova/spark mbasmanova-analyze-part-table
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/18309.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #18309
----
commit a1dbdd6f56e500586b399565a7f837800039bfb3
Author: Masha Basmanova <[email protected]>
Date: 2017-06-15T00:24:47Z
[SPARK-21079] [SQL] Calculate total size of a partition table as a sum of
individual partitions
----