GitHub user gatorsmile opened a pull request:

    https://github.com/apache/spark/pull/15136

    [SPARK-17581] [SQL] Invalidate Statistics After Some ALTER TABLE Commands

    ### What changes were proposed in this pull request?
    Recent statistics-related work has focused on how to generate and store
statistics. Once an `ANALYZE TABLE` command has run, the statistics do not
change until users run the command again. Hive behaves differently: for
example, `ALTER TABLE SET LOCATION` invalidates the statistics, including
`numRows` and `rawDataSize`.
    ```
    hive> describe formatted t2;
    ...
    Location:                   hdfs://6b68a24121f4:9000/user/hive/warehouse/t2
    Table Type:                 MANAGED_TABLE            
    Table Parameters:            
        COLUMN_STATS_ACCURATE   true                
        numFiles                4                   
        numRows                 2                   
        rawDataSize             2                   
        totalSize               4                   
        transient_lastDdlTime   1464590855          
    ...
    ```
    ```
    hive> alter table t2 set location 'hdfs://6b68a24121f4:9000/user/hive/warehouse/t1';
    OK
    Time taken: 0.113 seconds
    ```
    ```
    hive> describe formatted t2;
    ...                  
    Location:                   hdfs://6b68a24121f4:9000/user/hive/warehouse/t1
    Table Type:                 MANAGED_TABLE            
    Table Parameters:            
        COLUMN_STATS_ACCURATE   false               
        last_modified_by        root                
        last_modified_time      1474178025          
        numFiles                4                   
        numRows                 -1                  
        rawDataSize             -1                  
        totalSize               4                   
    ...
    ```
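
The difference between the two `describe formatted` outputs above can be modeled with a small sketch. This is only an illustration in Python, not Spark or Hive code: the function name `invalidate_stats` and the dictionary layout are hypothetical, and the sketch covers only the statistics-related parameters, not `last_modified_by`, `last_modified_time`, or `transient_lastDdlTime`.

```python
# Hypothetical model of the Hive behavior shown above; not Spark or Hive code.
# Row-level statistics that Hive reset to -1 in the example output.
STATS_TO_RESET = ("numRows", "rawDataSize")


def invalidate_stats(params):
    """Return a copy of the table parameters with row-level statistics invalidated."""
    updated = dict(params)
    # Mark the statistics as no longer accurate.
    updated["COLUMN_STATS_ACCURATE"] = "false"
    # Reset only the row-level statistics; numFiles and totalSize are left as-is,
    # matching the example output.
    for key in STATS_TO_RESET:
        if key in updated:
            updated[key] = "-1"
    return updated


# Table parameters as shown before ALTER TABLE ... SET LOCATION.
before = {
    "COLUMN_STATS_ACCURATE": "true",
    "numFiles": "4",
    "numRows": "2",
    "rawDataSize": "2",
    "totalSize": "4",
}
after = invalidate_stats(before)
```

In this sketch `after` carries `numRows` and `rawDataSize` set to `-1` and `COLUMN_STATS_ACCURATE` set to `false`, while `numFiles` and `totalSize` keep their earlier values, as in the second Hive output.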
    
    This PR addresses the issue by invalidating the statistics after some `ALTER TABLE` commands.
    
    ### How was this patch tested?
    Added test cases.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/gatorsmile/spark invalidateStatsAfterAlterTable

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/15136.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #15136
    
