GitHub user fjh100456 opened a pull request:

    https://github.com/apache/spark/pull/22693

    [SPARK-25701][SQL] Supports calculation of table statistics from 
partition's catalog statistics.

    ## What changes were proposed in this pull request?
    
    When determining table statistics, if the table's `totalSize` is not 
defined, we fall back to HDFS to get the size when 
`spark.sql.statistics.fallBackToHdfs` is `true`; otherwise the default 
value (`spark.sql.defaultSizeInBytes`) is used. As a result, tables without a 
`totalSize` property may not be broadcast (except Parquet tables). 
    
    Fortunately, in most cases the data is written into the table by an insert 
command, which saves the data size in the metastore, so the table statistics 
can be calculated from the partition statistics stored there.
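
    The lookup order described above can be sketched as follows. This is a 
minimal illustration, not the code in this patch: `PartitionStats`, 
`TableSizeEstimator`, and `estimate` are hypothetical names, the partition 
sizes are assumed to have been read from the metastore already, and trying the 
partition sum before the HDFS fallback is an assumption.

```scala
// Hypothetical, simplified model; these are NOT Spark's actual classes.
// sizeInBytes is None when the metastore holds no size for the partition.
final case class PartitionStats(sizeInBytes: Option[BigInt])

object TableSizeEstimator {

  /** Picks a table size, trying the cheapest trustworthy source first.
    *
    * @param tableTotalSize     the table-level `totalSize`, if defined
    * @param partitions         per-partition statistics from the metastore
    * @param fallBackToHdfs     value of spark.sql.statistics.fallBackToHdfs
    * @param hdfsSize           computes the size by listing files (expensive)
    * @param defaultSizeInBytes value of spark.sql.defaultSizeInBytes
    */
  def estimate(
      tableTotalSize: Option[BigInt],
      partitions: Seq[PartitionStats],
      fallBackToHdfs: Boolean,
      hdfsSize: () => BigInt,
      defaultSizeInBytes: BigInt): BigInt = {
    // 1. Use the table-level size when it is defined and positive.
    tableTotalSize.filter(_ > 0).getOrElse {
      val sizes = partitions.map(_.sizeInBytes)
      // 2. Assumed step: sum the partition sizes, but only if every
      //    partition has one, so the total is never an underestimate.
      if (sizes.nonEmpty && sizes.forall(_.exists(_ > 0))) sizes.flatten.sum
      // 3. Otherwise fall back to HDFS when the flag is enabled.
      else if (fallBackToHdfs) hdfsSize()
      // 4. Last resort: the configured default size.
      else defaultSizeInBytes
    }
  }
}
```

    Requiring a size for every partition is a conservative choice in this 
sketch: a partial sum could make a large table look small enough to broadcast.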
    
    ## How was this patch tested?
    Add test.


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/fjh100456/spark StatisticCommit

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/22693.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #22693
    
----
commit e610477063b4f326b8261d59b55abce83cbb82e7
Author: fjh100456 <fu.jinhua6@...>
Date:   2018-10-11T06:43:52Z

    [SPARK-25701][SQL] Supports calculation of table statistics from 
partition's catalog statistics.
    
    ## What changes were proposed in this pull request?
    
    When obtaining table statistics, if the table's `totalSize` is not 
defined, we fall back to HDFS to get the size when 
`spark.sql.statistics.fallBackToHdfs` is `true`; otherwise the default 
value (`spark.sql.defaultSizeInBytes`) is used.
    
    Fortunately, in most cases the data is written into the table by an insert 
command, which saves the data size in the metastore, so the table statistics 
can be calculated from the partition statistics stored there.
    
    ## How was this patch tested?
    Add test.

----

