GitHub user fjh100456 opened a pull request:
https://github.com/apache/spark/pull/22693
[SPARK-25701][SQL] Supports calculation of table statistics from
partition's catalog statistics.
## What changes were proposed in this pull request?
When determining table statistics, if the `totalSize` of the table is not
defined, we fall back to HDFS to get the table statistics when
`spark.sql.statistics.fallBackToHdfs` is `true`; otherwise the default
value (`spark.sql.defaultSizeInBytes`) is used. As a result, tables without
the `totalSize` property may not be broadcast (except Parquet tables).
Fortunately, in most cases the data is written into the table by an insertion
command, which saves the data size in the metastore, so it is possible to use
the metastore to calculate the table statistics.
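The resolution order described above can be sketched roughly as follows. This is a plain-Python model for illustration only, not the code in this patch: the function `estimate_table_size` and all of its parameters are hypothetical names, and the rule that partition sizes are summed only when every partition has one is an assumption rather than something stated in the description.

```python
from typing import Callable, Optional, Sequence


def estimate_table_size(
    table_total_size: Optional[int],
    partition_sizes: Sequence[Optional[int]],
    fall_back_to_hdfs: bool,
    hdfs_size: Callable[[], int],
    default_size_in_bytes: int,
) -> int:
    """Pick a size estimate for a table (hypothetical helper, illustration only)."""
    # 1. A table-level `totalSize` recorded in the metastore wins if present.
    if table_total_size is not None and table_total_size > 0:
        return table_total_size

    # 2. Proposed step: derive the size from per-partition metastore statistics.
    #    Assumption: only trust the sum when every partition reports a size.
    if partition_sizes and all(s is not None and s > 0 for s in partition_sizes):
        return sum(partition_sizes)

    # 3. Existing behaviour: scan the file system only when
    #    spark.sql.statistics.fallBackToHdfs is true.
    if fall_back_to_hdfs:
        return hdfs_size()

    # 4. Otherwise use the value of spark.sql.defaultSizeInBytes.
    return default_size_in_bytes


# Example: no table-level size, but both partitions carry a size in the metastore.
size = estimate_table_size(
    table_total_size=None,
    partition_sizes=[100, 250],
    fall_back_to_hdfs=False,
    hdfs_size=lambda: 999,            # placeholder for a file-system scan
    default_size_in_bytes=10**12,     # placeholder default, not Spark's real value
)
```

With the partition-level step in place, the table in the example gets a concrete estimate without any file-system scan, which is what allows the planner to consider it for broadcast.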
## How was this patch tested?
Added a test.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/fjh100456/spark StatisticCommit
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/22693.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #22693
----
commit e610477063b4f326b8261d59b55abce83cbb82e7
Author: fjh100456 <fu.jinhua6@...>
Date: 2018-10-11T06:43:52Z
[SPARK-25701][SQL] Supports calculation of table statistics from
partition's catalog statistics.
## What changes were proposed in this pull request?
When obtaining table statistics, if the `totalSize` of the table is not
defined, we fall back to HDFS to get the table statistics when
`spark.sql.statistics.fallBackToHdfs` is `true`; otherwise the default
value (`spark.sql.defaultSizeInBytes`) is used.
Fortunately, in most cases the data is written into the table by an insertion
command, which saves the data size in the metastore, so it is possible to use
the metastore to calculate the table statistics.
## How was this patch tested?
Added a test.
----