GitHub user wangyum opened a pull request:

    https://github.com/apache/spark/pull/22743

    [WIP][SPARK-25740][SQL] Set some configuration need invalidateStatsCache

    ## What changes were proposed in this pull request?
    How to reproduce:
    ```sql
    -- new spark-sql session
    create table t1 (a int) stored as parquet;
    create table t2 (a int) stored as parquet;
    insert into table t1 values (1);
    insert into table t2 values (1);
    explain select * from t1, t2 where t1.a = t2.a;
    exit;
    ```
    ```sql
    -- new spark-sql session
    set spark.sql.statistics.fallBackToHdfs=true;
    explain select * from t1, t2 where t1.a = t2.a;
    -- The plan uses BroadcastHashJoin
    ```
    ```sql
    -- new spark-sql session
    explain select * from t1, t2 where t1.a = t2.a;
    -- SortMergeJoin
    set spark.sql.statistics.fallBackToHdfs=true;
    explain select * from t1, t2 where t1.a = t2.a;
    -- Still SortMergeJoin; it should be BroadcastHashJoin
    ```
    We need `LogicalPlanStats.invalidateStatsCache` to clear the cached stats
    when a `set spark.sql.statistics.fallBackToHdfs` command is executed, but it
    seems the only thing we can do is `invalidateAllCachedTables`.
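
To illustrate the stale-stats behavior in the third session above, here is a toy model in Python. It is only a sketch: the `Relation` class, `choose_join`, and `demo` are hypothetical names and are not Spark's API, and the sizes are made up. It assumes that a relation computes its size estimate once and caches it, so a later change to `spark.sql.statistics.fallBackToHdfs` has no effect until the cache is cleared.

```python
# Toy model only: every class and function here is hypothetical, not Spark code.
DEFAULT_SIZE_IN_BYTES = 2**63 - 1        # stand-in for "size unknown"
BROADCAST_THRESHOLD = 10 * 1024 * 1024   # 10 MB, chosen for illustration

FALLBACK_KEY = "spark.sql.statistics.fallBackToHdfs"


class Relation:
    """A table whose size estimate is computed once and then cached."""

    def __init__(self, name, hdfs_size, conf):
        self.name = name
        self.hdfs_size = hdfs_size   # made-up on-disk size in bytes
        self.conf = conf             # shared, mutable session config
        self._stats_cache = None

    def stats(self):
        # The config is read only on the first call; later calls reuse the cache.
        if self._stats_cache is None:
            if self.conf.get(FALLBACK_KEY, False):
                self._stats_cache = self.hdfs_size
            else:
                self._stats_cache = DEFAULT_SIZE_IN_BYTES
        return self._stats_cache

    def invalidate_stats_cache(self):
        # Forget the cached estimate so the next stats() call re-reads the config.
        self._stats_cache = None


def choose_join(left, right):
    """Pick a broadcast join when the smaller side fits under the threshold."""
    smaller = min(left.stats(), right.stats())
    return "BroadcastHashJoin" if smaller <= BROADCAST_THRESHOLD else "SortMergeJoin"


def demo():
    conf = {}
    t1 = Relation("t1", 500, conf)
    t2 = Relation("t2", 500, conf)

    before = choose_join(t1, t2)      # stats cached with fallback disabled
    conf[FALLBACK_KEY] = True         # the "set ...=true" step
    stale = choose_join(t1, t2)       # cached stats still used

    t1.invalidate_stats_cache()
    t2.invalidate_stats_cache()
    fresh = choose_join(t1, t2)       # stats recomputed under the new config
    return before, stale, fresh


if __name__ == "__main__":
    print(demo())
```

In this model, the join choice only changes after both relations drop their cached estimate, which is the effect the invalidation in this PR is meant to achieve.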
    ## How was this patch tested?
    
    manual tests


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/wangyum/spark SPARK-25740

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/22743.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #22743
    
----
commit cf43e225c9da4f1274c7c82b568a89b3369e3515
Author: Yuming Wang <yumwang@...>
Date:   2018-10-16T07:27:03Z

    Set some configuration need invalidateStatsCache

----

