Github user wzhfy commented on the issue:
https://github.com/apache/spark/pull/14712
I think the problem lies in the dependence on the Hive metastore. As long as we
use the Hive metastore to persist/retrieve statistics, we need to deal with these
flags:
- If we analyze a table in Spark and want to persist the stats in the Hive
metastore (and retrieve them when we run queries), we must set
STATS_GENERATED_VIA_STATS_TASK, otherwise the stats won't be stored. This is the
top priority. See the sketch below for how the flag is passed.
- If users alter a table's properties (without setting
STATS_GENERATED_VIA_STATS_TASK) in Spark or Hive, then COLUMN_STATS_ACCURATE
will be set to false and the stats will be invalidated, so there is no need to read
the stats in Spark. This is the second priority.
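
For illustration, here is a minimal sketch (not Spark's actual implementation) of how a client could persist table-level stats through the Hive metastore while passing STATS_GENERATED_VIA_STATS_TASK, assuming a Hive client version whose HiveMetaStoreClient exposes alter_table_with_environmentContext; the database/table names and numbers below are placeholders:

```scala
import org.apache.hadoop.hive.common.StatsSetupConst
import org.apache.hadoop.hive.conf.HiveConf
import org.apache.hadoop.hive.metastore.HiveMetaStoreClient
import org.apache.hadoop.hive.metastore.api.EnvironmentContext

object PersistStatsSketch {
  def main(args: Array[String]): Unit = {
    val client = new HiveMetaStoreClient(new HiveConf())

    // Fetch the current table definition and attach the table-level stats
    // computed on the Spark side (the numbers here are placeholders).
    val table = client.getTable("default", "my_table")
    table.putToParameters(StatsSetupConst.ROW_COUNT, "12345")
    table.putToParameters(StatsSetupConst.TOTAL_SIZE, "67890")

    // Mark the alter as coming from a stats task. Without this flag the
    // metastore treats the call as a plain DDL change and flips
    // COLUMN_STATS_ACCURATE to false, invalidating the stats just written.
    val ctx = new EnvironmentContext()
    ctx.putToProperties(StatsSetupConst.STATS_GENERATED_VIA_STATS_TASK, StatsSetupConst.TRUE)

    client.alter_table_with_environmentContext("default", "my_table", table, ctx)
    client.close()
  }
}
```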