Stamatis Zampetakis created HIVE-23781:
------------------------------------------

             Summary: Incomplete partition column stats in CachedStore may lead to wrong aggregate stats
                 Key: HIVE-23781
                 URL: https://issues.apache.org/jira/browse/HIVE-23781
             Project: Hive
          Issue Type: Bug
            Reporter: Stamatis Zampetakis
            Assignee: Stamatis Zampetakis


Requesting aggregate stats from the Metastore ({{RawStore#get_aggr_stats_for}}) may return wrong results when the backing implementation is CachedStore and column statistics are missing from the cache.

The suspicious code is in {{CachedStore#mergeColStatsForPartitions}}, which returns an [empty object|https://github.com/apache/hive/blob/31ee14644bf6105360d6266baa8c6c8060d38ea3/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/cache/CachedStore.java#L2267] when no stats are found in the cache. The consumer treats this as a valid value, so it performs no additional lookup in the rawstore to fetch the actual values.

Moreover, when the cache holds values for some but not all of the requested partitions, the result will be wrong, assuming that the underlying rawstore has information about the requested partitions.
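
The failure mode described above can be illustrated with a minimal, self-contained sketch. It is not the Hive code: the class {{AggrStatsSketch}}, its methods, the two in-memory maps, and the use of a single additive statistic (a per-partition null count) are all hypothetical simplifications. It contrasts an aggregation that silently ignores cache misses with one possible remedy, where the cache reports a miss unless every requested partition is present and the caller then falls back to the rawstore.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical illustration only; none of these names exist in Hive.
class AggrStatsSketch {

    // Stand-in for the authoritative rawstore: partition name -> number of nulls in a column.
    static final Map<String, Long> RAW_STORE = Map.of("p=1", 10L, "p=2", 20L, "p=3", 30L);

    // Stand-in for the cache, which holds stats for only one of the three partitions.
    static final Map<String, Long> CACHE = Map.of("p=1", 10L);

    // Problematic pattern: partitions missing from the cache contribute nothing,
    // yet the caller receives a value that looks valid.
    static long aggregateIgnoringMisses(List<String> partitions) {
        long total = 0;
        for (String partition : partitions) {
            total += CACHE.getOrDefault(partition, 0L);
        }
        return total;
    }

    // Possible remedy: answer from the cache only if every requested partition is cached.
    static Optional<Long> aggregateFromCache(List<String> partitions) {
        long total = 0;
        for (String partition : partitions) {
            Long cached = CACHE.get(partition);
            if (cached == null) {
                return Optional.empty(); // explicit miss instead of a partial result
            }
            total += cached;
        }
        return Optional.of(total);
    }

    // Aggregates directly from the rawstore stand-in.
    static long aggregateFromRawStore(List<String> partitions) {
        long total = 0;
        for (String partition : partitions) {
            total += RAW_STORE.getOrDefault(partition, 0L);
        }
        return total;
    }

    // Caller-side fallback: use the cache when it is complete, otherwise ask the rawstore.
    static long aggregateWithFallback(List<String> partitions) {
        return aggregateFromCache(partitions)
                .orElseGet(() -> aggregateFromRawStore(partitions));
    }

    public static void main(String[] args) {
        List<String> requested = List.of("p=1", "p=2", "p=3");
        System.out.println(aggregateIgnoringMisses(requested)); // prints 10 (partial, wrong)
        System.out.println(aggregateWithFallback(requested));   // prints 60 (from the rawstore)
    }
}
```

Whether an actual fix should signal the miss this way or merge the cached and rawstore stats per partition is a design decision the sketch does not settle.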