GitHub user gatorsmile commented on the issue:
https://github.com/apache/spark/pull/14971
... Very surprised about Hive... Any `ALTER TABLE SET/UNSET TBLPROPERTIES`
statement can invalidate the Hive-generated statistics...
```Scala
hiveClient.runSqlHive(s"ANALYZE TABLE $oldName COMPUTE STATISTICS")
hiveClient.runSqlHive(s"DESCRIBE FORMATTED $oldName").foreach(println)
```
```
Table Parameters:
COLUMN_STATS_ACCURATE true
numFiles 1
numRows 500
rawDataSize 5312
spark.sql.statistics.numRows 500
spark.sql.statistics.totalSize 5812
totalSize 5812
transient_lastDdlTime 1473610039
```
```Scala
hiveClient.runSqlHive(s"ALTER TABLE $oldName SET TBLPROPERTIES ('foofoo' = 'a')")
hiveClient.runSqlHive(s"DESCRIBE FORMATTED $oldName").foreach(println)
```
```
Table Parameters:
COLUMN_STATS_ACCURATE false
foofoo a
last_modified_by xiaoli
last_modified_time 1473610039
numFiles 1
numRows -1
rawDataSize -1
spark.sql.statistics.numRows 500
spark.sql.statistics.totalSize 5812
totalSize 5812
transient_lastDdlTime 1473610039
```
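The difference between the two outputs above can be checked mechanically. The following is only a rough sketch: `TableStatsCheck`, `parseParams`, `hiveStatsValid`, and `sparkNumRows` are hypothetical names, not part of Spark or Hive, and the validity rule (flag is `true` and both counters are non-negative) is an assumption drawn from the two outputs shown here. Other Hive versions may format `COLUMN_STATS_ACCURATE` differently.

```Scala
// Hypothetical helper for illustration only; not a Spark or Hive API.
object TableStatsCheck {

  // Turn "key value" lines from the "Table Parameters:" section into a map.
  // Section headers (lines ending with ':') and blank lines are skipped.
  def parseParams(lines: Seq[String]): Map[String, String] =
    lines
      .map(_.trim)
      .filter(line => line.nonEmpty && !line.endsWith(":"))
      .map(_.split("\\s+", 2))
      .collect { case Array(key, value) => key -> value.trim }
      .toMap

  // Assumption: Hive's own statistics are usable only when
  // COLUMN_STATS_ACCURATE is "true" and numRows / rawDataSize are both >= 0.
  def hiveStatsValid(params: Map[String, String]): Boolean = {
    val accurate = params.get("COLUMN_STATS_ACCURATE").contains("true")
    val counters = Seq("numRows", "rawDataSize").flatMap(params.get).map(_.toLong)
    accurate && counters.size == 2 && counters.forall(_ >= 0)
  }

  // The row count recorded under the spark.sql.statistics.* property, if any.
  def sparkNumRows(params: Map[String, String]): Option[Long] =
    params.get("spark.sql.statistics.numRows").map(_.toLong)
}
```

Fed the lines printed after the `ALTER TABLE` call, `hiveStatsValid` would return `false` while `sparkNumRows` would still return `Some(500)`, which matches the observation that only the Hive-generated values are reset.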