Github user viirya commented on the issue:
https://github.com/apache/spark/pull/19864
Are these initial statistics important? After the columnar RDD is
materialized, we will get accurate statistics anyway, won't we?
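To sketch the question: a minimal model (not Spark's actual `InMemoryRelation` API; the class and method names below are hypothetical) of falling back to the optimizer's estimate before materialization and switching to the accurate, accumulated size afterwards:

```scala
// Hypothetical model of the stats-fallback idea discussed above.
// Before the columnar RDD is materialized we only have the planner's
// estimate; after materialization the actual byte size is known.

case class Stats(sizeInBytes: Long, rowCount: Option[Long])

class CachedRelation(plannerEstimate: Stats) {
  // Filled in once the columnar RDD has been materialized.
  private var materializedSize: Option[Long] = None

  def markMaterialized(actualBytes: Long): Unit =
    materializedSize = Some(actualBytes)

  // Accurate stats win once available; otherwise use the estimate.
  def stats: Stats =
    materializedSize
      .map(bytes => Stats(bytes, plannerEstimate.rowCount))
      .getOrElse(plannerEstimate)
}

object Demo extends App {
  val rel = new CachedRelation(Stats(sizeInBytes = 1000L, rowCount = Some(10L)))
  println(rel.stats.sizeInBytes) // planner estimate: 1000
  rel.markMaterialized(4096L)
  println(rel.stats.sizeInBytes) // accurate size: 4096
}
```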
On Dec 3, 2017 1:43 AM, "Nan Zhu" <[email protected]> wrote:
> *@CodingCat* commented on this pull request.
> ------------------------------
>
> In sql/core/src/main/scala/org/apache/spark/sql/execution/CacheManager.scala
> <https://github.com/apache/spark/pull/19864#discussion_r154501939>:
>
> > - planToCache,
> - InMemoryRelation(
> - sparkSession.sessionState.conf.useCompression,
> - sparkSession.sessionState.conf.columnBatchSize,
> - storageLevel,
> -        sparkSession.sessionState.executePlan(planToCache).executedPlan,
> - tableName)))
> + val inMemoryRelation = InMemoryRelation(
> + sparkSession.sessionState.conf.useCompression,
> + sparkSession.sessionState.conf.columnBatchSize,
> + storageLevel,
> + sparkSession.sessionState.executePlan(planToCache).executedPlan,
> + tableName)
> + if (planToCache.conf.cboEnabled && planToCache.stats.rowCount.isDefined) {
> + inMemoryRelation.setStatsFromCachedPlan(planToCache)
> + }
>
> I have to make InMemoryRelation stateful to avoid breaking APIs...
>
---
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]