Rajesh Balamohan created HIVE-23788:
---------------------------------------

             Summary: FilterStatsRule misestimate causes hashtable computation to rehash often
                 Key: HIVE-23788
                 URL: https://issues.apache.org/jira/browse/HIVE-23788
             Project: Hive
          Issue Type: Improvement
            Reporter: Rajesh Balamohan


Depending on the available statistics, FilterStatsRule sometimes estimates the row count as numRows/3. This lowers the keyCount projected for hashtable computation, which causes frequent rehashing.

[https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/stats/annotation/StatsRulesProcFactory.java#L952]

[https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/stats/annotation/StatsRulesProcFactory.java#L1192]

E.g. TPCDS Q74 @ 10TB: as part of evaluating the conditions "t_s_firstyear.year_total > 0, t_w_secyear.year_total / t_w_firstyear.year_total, t_s_secyear.year_total / t_s_firstyear.year_total", it projects 1/3rd of the rows, causing rehashing of the hashtable in the downstream vertex.

We may have to check whether stats can be projected correctly for these columns.
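
To illustrate why an underestimated key count matters, here is a minimal sketch of a toy doubling hashtable model. It is not Hive's actual hashtable implementation: the class name, the 0.75 load factor, the power-of-two sizing, and the key counts are all assumptions chosen for illustration. It counts how many times a table sized from the estimate has to grow before it can hold the real number of keys.

```java
// Toy model (NOT Hive's hashtable): counts the grow-and-rehash steps a
// doubling hashtable needs when its initial size comes from an estimate.
public class RehashEstimate {
    // Assumed load factor, for illustration only.
    static final double LOAD_FACTOR = 0.75;

    // Smallest power-of-two capacity that holds `keys` entries under LOAD_FACTOR.
    static long capacityFor(long keys) {
        long capacity = 1;
        while (capacity * LOAD_FACTOR < keys) {
            capacity <<= 1;
        }
        return capacity;
    }

    // Number of doublings (each one a full rehash) needed to go from the
    // capacity chosen for `estimatedKeys` to one that can hold `actualKeys`.
    static int rehashCount(long estimatedKeys, long actualKeys) {
        long capacity = capacityFor(estimatedKeys);
        int rehashes = 0;
        while (capacity * LOAD_FACTOR < actualKeys) {
            capacity <<= 1;
            rehashes++;
        }
        return rehashes;
    }

    public static void main(String[] args) {
        long actualKeys = 30_000_000L;        // hypothetical true key count
        long estimatedKeys = actualKeys / 3;  // numRows/3 style underestimate
        System.out.println("rehashes with numRows/3 estimate: "
                + rehashCount(estimatedKeys, actualKeys));
        System.out.println("rehashes with exact estimate: "
                + rehashCount(actualKeys, actualKeys));
    }
}
```

Under these assumed numbers the model needs two full rehashes when sized from the numRows/3 estimate and none when sized from the true count. Each rehash touches every key inserted so far, so the cost grows with table size.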