Github user wzhfy commented on the issue:
https://github.com/apache/spark/pull/14712
I think the problem lies in the dependence on the Hive metastore. As long as we
use the Hive metastore to persist/retrieve statistics, we need to deal with these
flags:
- If we analyze a table in Spark and want to persist the stats in the Hive
metastore (and retrieve them when we run queries), we must set
STATS_GENERATED_VIA_STATS_TASK, otherwise the stats won't be stored. This is the
top priority. See the sketch below for how the flag is passed.
- If users alter a table's properties (without setting
STATS_GENERATED_VIA_STATS_TASK) in Spark or Hive, then COLUMN_STATS_ACCURATE
will be set to false and the stats will be invalidated, so there is no need to read
the stats in Spark. This is the second priority.
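
For illustration, here is a minimal sketch (not Spark's actual implementation) of how a client could persist table-level stats through the Hive metastore while passing STATS_GENERATED_VIA_STATS_TASK, assuming a Hive client version whose HiveMetaStoreClient exposes alter_table_with_environmentContext; the database/table names and numbers below are placeholders:

```scala
import org.apache.hadoop.hive.common.StatsSetupConst
import org.apache.hadoop.hive.conf.HiveConf
import org.apache.hadoop.hive.metastore.HiveMetaStoreClient
import org.apache.hadoop.hive.metastore.api.EnvironmentContext

object PersistStatsSketch {
  def main(args: Array[String]): Unit = {
    val client = new HiveMetaStoreClient(new HiveConf())

    // Fetch the current table definition and attach the table-level stats
    // computed on the Spark side (the numbers here are placeholders).
    val table = client.getTable("default", "my_table")
    table.putToParameters(StatsSetupConst.ROW_COUNT, "12345")
    table.putToParameters(StatsSetupConst.TOTAL_SIZE, "67890")

    // Mark the alter as coming from a stats task. Without this flag the
    // metastore treats the call as a plain DDL change and flips
    // COLUMN_STATS_ACCURATE to false, invalidating the stats just written.
    val ctx = new EnvironmentContext()
    ctx.putToProperties(StatsSetupConst.STATS_GENERATED_VIA_STATS_TASK, StatsSetupConst.TRUE)

    client.alter_table_with_environmentContext("default", "my_table", table, ctx)
    client.close()
  }
}
```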