[GitHub] spark pull request #19864: [SPARK-22673][SQL] InMemoryRelation should utiliz...

CodingCat Fri, 01 Dec 2017 15:00:06 -0800

GitHub user CodingCat opened a pull request:

    https://github.com/apache/spark/pull/19864


    [SPARK-22673][SQL] InMemoryRelation should utilize on-disk table stats whenever possible

    ## What changes were proposed in this pull request?
    
    The current implementation of InMemoryRelation always uses the most expensive execution plan when writing the cache.
    With CBO enabled, we can actually have a more accurate estimate of the underlying table size...
    
    ## How was this patch tested?
    
    Existing tests.
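
The idea in the description above can be illustrated with a small toy model. This is only a sketch under stated assumptions, not the code in the patch: `TableStats`, `estimated_size`, `DEFAULT_SIZE_IN_BYTES`, `can_broadcast`, and the threshold value are hypothetical names and numbers introduced for illustration, and none of them are Spark APIs. The sketch shows a size estimate that prefers on-disk table statistics when CBO is enabled and they are available, and otherwise falls back to a pessimistic default.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical pessimistic fallback (assumption for this sketch, not Spark's value).
DEFAULT_SIZE_IN_BYTES = 2**63 - 1


@dataclass
class TableStats:
    """Hypothetical stand-in for statistics recorded for an on-disk table."""
    size_in_bytes: int


def estimated_size(cbo_enabled: bool, catalog_stats: Optional[TableStats]) -> int:
    """Prefer on-disk table stats when CBO is on and stats exist; else fall back."""
    if cbo_enabled and catalog_stats is not None:
        return catalog_stats.size_in_bytes
    return DEFAULT_SIZE_IN_BYTES


def can_broadcast(size_in_bytes: int, threshold: int) -> bool:
    """Toy planner rule: a cheaper plan is allowed only for small enough inputs."""
    return size_in_bytes <= threshold


if __name__ == "__main__":
    stats = TableStats(size_in_bytes=1024)
    threshold = 10 * 1024 * 1024  # hypothetical threshold for this sketch
    print(can_broadcast(estimated_size(True, stats), threshold))
    print(can_broadcast(estimated_size(False, stats), threshold))
```

In this toy model, the pessimistic default always rules out the cheaper plan, while a known table size lets the planner choose it when the table is small.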


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/CodingCat/spark SPARK-22673

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/19864.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #19864
    
----
commit b2fb1d25804b7bdbe1a767306a319dc748965bce
Author: CodingCat <[email protected]>
Date:   2016-03-07T14:37:37Z

    improve the doc for "spark.memory.offHeap.size"

commit 0971900d562cb1a18af6f7de02bb8eb95637a640
Author: CodingCat <[email protected]>
Date:   2016-03-07T19:00:16Z

    fix

commit 32f7c74a9b5cf4f19e7d14357bb87064383e11b3
Author: CodingCat <[email protected]>
Date:   2017-12-01T23:05:35Z

    use cbo stats in inmemoryrelation

----


