Github user wangyum commented on the issue:
https://github.com/apache/spark/pull/22743
Data source tables are not cached in
[tableRelationCache](https://github.com/apache/spark/blob/01c3dfab158d40653f8ce5d96f57220297545d5b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala#L134).
For Hive tables, the issue only occurs when the table's stats are empty and
`spark.sql.hive.convertMetastoreParquet` is enabled (default value). In that
case, when we read the table, Spark will
[convertToLogicalRelation](https://github.com/apache/spark/blob/a2f502cf53b6b00af7cb80b6f38e64cf46367595/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveMetastoreCatalog.scala#L116)
and [cache
it](https://github.com/apache/spark/blob/a2f502cf53b6b00af7cb80b6f38e64cf46367595/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveMetastoreCatalog.scala#L207).
Empty stats occur in at least 2 situations:
1. The table is created as a Hive table with
`spark.sql.hive.convertMetastoreParquet` enabled (default value) and
`spark.sql.statistics.size.autoUpdate.enabled` disabled (default value), and
data is then inserted.
2. The table is managed by Hive and stats were never gathered.
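
A rough way to check situation 1 locally is sketched below. It assumes a Spark build with Hive support; the table name `stats_demo` and the object name `EmptyStatsRepro` are made up for illustration, and whether the `Statistics` row shows up may depend on the Spark version and the metastore setup.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical reproduction sketch; `stats_demo` is an illustrative table name.
object EmptyStatsRepro {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("empty-stats-repro")
      .master("local[1]")
      .enableHiveSupport() // requires Spark built with Hive support
      .getOrCreate()

    // Both settings are already the defaults; set explicitly for clarity.
    spark.conf.set("spark.sql.hive.convertMetastoreParquet", "true")
    spark.conf.set("spark.sql.statistics.size.autoUpdate.enabled", "false")

    // Create a Hive (not data source) Parquet table, then insert into it.
    spark.sql("DROP TABLE IF EXISTS stats_demo")
    spark.sql("CREATE TABLE stats_demo (id INT) STORED AS PARQUET")
    spark.sql("INSERT INTO stats_demo VALUES (1), (2)")

    // If no `Statistics` row is printed, the catalog holds no stats for the table.
    spark.sql("DESCRIBE EXTENDED stats_demo")
      .filter("col_name = 'Statistics'")
      .show(truncate = false)

    // Reading the table goes through the Hive-to-LogicalRelation conversion.
    println(spark.table("stats_demo").count())

    spark.stop()
  }
}
```

Running `ANALYZE TABLE stats_demo COMPUTE STATISTICS` before the read should populate the stats, which gives a way to compare the two cases.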