Xuefu Zhang created HIVE-7060:
---------------------------------

             Summary: Column stats give incorrect min and distinct_count
                 Key: HIVE-7060
                 URL: https://issues.apache.org/jira/browse/HIVE-7060
             Project: Hive
          Issue Type: Bug
          Components: Statistics
    Affects Versions: 0.13.0
            Reporter: Xuefu Zhang

It seems that column statistics are incorrect on two measures for numeric columns: min (which is always 0) and distinct count. Here is an example:

{code}
select count(distinct avgTimeOnSite), min(avgTimeOnSite) from UserVisits_web_text_none;
...
OK
9 1
Time taken: 9.747 seconds, Fetched: 1 row(s)
{code}

The statistics for the column:

{code}
PREHOOK: query: desc formatted UserVisits_web_text_none avgTimeOnSite
PREHOOK: type: DESCTABLE
PREHOOK: Input: default@uservisits_web_text_none
POSTHOOK: query: desc formatted UserVisits_web_text_none avgTimeOnSite
POSTHOOK: type: DESCTABLE
POSTHOOK: Input: default@uservisits_web_text_none
# col_name data_type min max num_nulls distinct_count avg_col_len max_col_len num_trues num_falses comment
avgTimeOnSite int 0 9 0 11 null null null
{code}

The query returns 9 distinct values and a minimum of 1, while the column statistics report a min of 0 and a distinct_count of 11.