GitHub user aokolnychyi opened a pull request:
https://github.com/apache/spark/pull/19252
[SPARK-21969][SQL] CommandUtils.updateTableStats should call refreshTable
## What changes were proposed in this pull request?
Tables in the catalog cache are not invalidated once their statistics are
updated. As a consequence, existing sessions will keep using the cached information
even though it is no longer valid. Consider the example below.
```
// step 1
spark.range(100).write.saveAsTable("tab1")
// step 2
spark.sql("analyze table tab1 compute statistics")
// step 3
spark.sql("explain cost select distinct * from tab1").show(false)
// step 4
spark.range(100).write.mode("append").saveAsTable("tab1")
// step 5
spark.sql("explain cost select distinct * from tab1").show(false)
```
After step 3, the table will be present in the catalog relation cache. Step
4 will correctly update the metadata inside the catalog but will NOT invalidate
the cache, so step 5 still reports the stale statistics.
By the way, running ``spark.sql("analyze table tab1 compute statistics")`` between
step 3 and step 4 would also solve the problem.
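For reference, the issue can already be worked around manually via the public Catalog API. The snippet below is only a sketch (not part of this patch, assuming a plain spark-shell session); it repeats the scenario but refreshes the cached relation explicitly before the final query:
```
// same scenario as above, but with an explicit refresh before the final query
spark.sql("drop table if exists tab1")
spark.range(100).write.saveAsTable("tab1")
spark.sql("analyze table tab1 compute statistics")
spark.sql("explain cost select distinct * from tab1").show(false)
spark.range(100).write.mode("append").saveAsTable("tab1")
// workaround: invalidate the cached relation so this session sees the new statistics
spark.catalog.refreshTable("tab1")
spark.sql("explain cost select distinct * from tab1").show(false)
```
The change proposed here is meant to make such an explicit refresh unnecessary: ``CommandUtils.updateTableStats`` calls ``refreshTable``, so the cached relation is invalidated whenever the statistics are updated.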
## How was this patch tested?
Existing and additional unit tests.
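As an illustration only (this is not the patch's actual test code, and it assumes a plain spark-shell session), the behaviour the new tests should cover looks roughly like this: after an append, the statistics visible to the same session should change.
```
// illustration only, not the actual test code in this patch
spark.sql("drop table if exists tab1")
spark.range(100).write.saveAsTable("tab1")
spark.sql("analyze table tab1 compute statistics")
val before = spark.sql("explain cost select distinct * from tab1").collect().head.getString(0)
spark.range(100).write.mode("append").saveAsTable("tab1")
val after = spark.sql("explain cost select distinct * from tab1").collect().head.getString(0)
// fails without the fix (the cached relation keeps the old stats), passes with it
assert(before != after, "statistics were not refreshed after the append")
```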
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/aokolnychyi/spark spark-21969
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/19252.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #19252
----
commit ba963b46cd2917315bc2bd0cf237c7d9f79e9d65
Author: aokolnychyi <[email protected]>
Date: 2017-09-16T11:57:52Z
[SPARK-21969][SQL] CommandUtils.updateTableStats should call refreshTable
----