[GitHub] spark pull request #19880: [SPARK-22626][SQL][FOLLOWUP] improve documentatio...

wzhfy Mon, 04 Dec 2017 08:02:53 -0800

GitHub user wzhfy opened a pull request:

    https://github.com/apache/spark/pull/19880


    [SPARK-22626][SQL][FOLLOWUP] improve documentation and simplify test case

    ## What changes were proposed in this pull request?
    
    Some Hive tables have a `numRows` statistic of zero even though they
    contain data. The reason is that, in Hive, when stats gathering is disabled
    (`hive.stats.autogather=false`), `numRows` is always zero after an INSERT command:
    ```
    hive> create table src (key int, value string) stored as orc;
    hive> desc formatted src;
    Table Parameters:            
        COLUMN_STATS_ACCURATE   {\"BASIC_STATS\":\"true\"}
        numFiles                0                   
        numRows                 0                   
        rawDataSize             0                   
        totalSize               0                   
        transient_lastDdlTime   1512399590 
    
    hive> set hive.stats.autogather=false;
    hive> insert into src select 1, 'a';
    hive> desc formatted src;
    Table Parameters:            
        numFiles                1                   
        numRows                 0                   
        rawDataSize             0                   
        totalSize               275                 
        transient_lastDdlTime   1512399647 
    
    hive> insert into src select 1, 'b';
    hive> desc formatted src;
    Table Parameters:            
        numFiles                2                   
        numRows                 0                   
        rawDataSize             0                   
        totalSize               550                 
        transient_lastDdlTime   1512399687 
    ```
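    
    For illustration only, the sketch below shows one way a consumer of these
    table parameters could guard against the misleading zero. It is a minimal
    Python sketch under stated assumptions, not the code changed by this pull
    request: the function name `effective_row_count` and the rule "a zero
    `numRows` with a positive `totalSize` means unknown" are hypothetical.
    
    ```python
    from typing import Optional
    
    
    def effective_row_count(num_rows: Optional[int],
                            total_size: Optional[int]) -> Optional[int]:
        """Return a usable row count, or None when the stored value looks untrustworthy.
    
        Hypothetical helper: the name and the rule are assumptions for illustration.
        """
        if num_rows is None or num_rows < 0:
            return None
        if num_rows == 0 and total_size is not None and total_size > 0:
            # Data files exist but no rows are recorded, so the statistic was
            # most likely never gathered. Treat it as unknown.
            return None
        return num_rows
    
    
    # Table parameters copied from the three `desc formatted src` outputs above.
    snapshots = [
        {"numFiles": 0, "numRows": 0, "totalSize": 0},
        {"numFiles": 1, "numRows": 0, "totalSize": 275},
        {"numFiles": 2, "numRows": 0, "totalSize": 550},
    ]
    results = [effective_row_count(s["numRows"], s["totalSize"]) for s in snapshots]
    print(results)  # [0, None, None]
    ```
    
    Only the freshly created, empty table keeps its zero row count. After each
    INSERT the count is reported as unknown rather than zero.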
    
    ## How was this patch tested?
    
    Modified existing test.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/wzhfy/spark doc_zero_rowCount

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/19880.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #19880
    
----
commit 9be829d208f7e2d6a88b9d2008fc04eec4a4ad8e
Author: Zhenhua Wang <[email protected]>
Date:   2017-12-04T15:53:49Z

    improve doc

----


---
