GitHub user CodingCat opened a pull request:
https://github.com/apache/spark/pull/19864
[SPARK-22673][SQL] InMemoryRelation should utilize on-disk table stats
whenever possible
## What changes were proposed in this pull request?
The current implementation of InMemoryRelation always uses the most
expensive execution plan when writing the cache.
With CBO enabled, we can actually have a more exact estimate of the
underlying table size...
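To illustrate the idea, here is a minimal, self-contained sketch of the fallback order this kind of change suggests. It is a toy model, not the code in this patch: `TableStats`, `CacheSizeEstimate`, `estimate`, and the pessimistic default value are all hypothetical names and assumptions introduced for illustration only.

```scala
// Toy model only: none of these types are Spark's actual classes.
final case class TableStats(sizeInBytes: BigInt, rowCount: Option[BigInt] = None)

object CacheSizeEstimate {
  // Assumed pessimistic fallback for a relation whose size is unknown.
  // A very large value pushes a planner toward the most expensive plan.
  val DefaultSizeInBytes: BigInt = BigInt(Long.MaxValue)

  /** Pick a size estimate for a cached relation.
    *
    * Order of preference:
    *   1. the measured size, once the cache has been materialized;
    *   2. on-disk table stats, but only when CBO is enabled;
    *   3. the pessimistic default.
    */
  def estimate(
      cboEnabled: Boolean,
      onDiskStats: Option[TableStats],
      materializedBytes: Option[BigInt]): BigInt =
    materializedBytes
      .orElse(onDiskStats.filter(_ => cboEnabled).map(_.sizeInBytes))
      .getOrElse(DefaultSizeInBytes)
}

object Demo extends App {
  val stats = Some(TableStats(sizeInBytes = BigInt(48L * 1024 * 1024)))

  // CBO on, cache not yet built: the on-disk stats are used.
  println(CacheSizeEstimate.estimate(cboEnabled = true, stats, None))

  // CBO off, cache not yet built: falls back to the pessimistic default.
  println(CacheSizeEstimate.estimate(cboEnabled = false, stats, None))
}
```

The point of the sketch is only the ordering: a measured size beats catalog stats, and catalog stats beat the default, so the planner sees a realistic size before the cache is first populated.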
## How was this patch tested?
Existing tests.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/CodingCat/spark SPARK-22673
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/19864.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #19864
----
commit b2fb1d25804b7bdbe1a767306a319dc748965bce
Author: CodingCat <[email protected]>
Date: 2016-03-07T14:37:37Z
improve the doc for "spark.memory.offHeap.size"
commit 0971900d562cb1a18af6f7de02bb8eb95637a640
Author: CodingCat <[email protected]>
Date: 2016-03-07T19:00:16Z
fix
commit 32f7c74a9b5cf4f19e7d14357bb87064383e11b3
Author: CodingCat <[email protected]>
Date: 2017-12-01T23:05:35Z
use cbo stats in inmemoryrelation
----