Stamatis Zampetakis created HIVE-23781:
------------------------------------------

             Summary: Incomplete partition column stats in CachedStore may lead 
to wrong aggregate stats
                 Key: HIVE-23781
                 URL: https://issues.apache.org/jira/browse/HIVE-23781
             Project: Hive
          Issue Type: Bug
            Reporter: Stamatis Zampetakis
            Assignee: Stamatis Zampetakis


Requesting aggregate stats from the Metastore ({{RawStore#get_aggr_stats_for}}) 
may return wrong results when the backing implementation is CachedStore and 
column statistics are missing from the cache.
 
The suspicious code is in {{CachedStore#mergeColStatsForPartitions}}, which 
returns an [empty 
object|https://github.com/apache/hive/blob/31ee14644bf6105360d6266baa8c6c8060d38ea3/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/cache/CachedStore.java#L2267]
 when no stats are found in the cache. The consumer treats this as a valid 
value, so it performs no additional lookup in the rawstore to fetch the 
actual values.

Moreover, when the cache holds values for only some of the requested 
partitions, the result will be wrong, assuming that the underlying rawstore 
has information about the requested partitions.
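
The two failure modes can be illustrated with a minimal, self-contained sketch. 
It is not the Hive code: the class {{AggrStatsSketch}}, its methods, and the use 
of a single per-partition null count as the "column statistic" are all 
hypothetical simplifications. It only shows why an empty or partial aggregate 
that looks valid hides a cache miss, and one possible way to signal the miss so 
that the caller can fall back to the rawstore.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical, simplified model; none of these names come from Hive.
class AggrStatsSketch {
    // Stand-in for cached per-partition column stats (here: a null count).
    private final Map<String, Long> cache;
    // Stand-in for the underlying rawstore.
    private final Map<String, Long> rawStore;

    AggrStatsSketch(Map<String, Long> cache, Map<String, Long> rawStore) {
        this.cache = new HashMap<>(cache);
        this.rawStore = new HashMap<>(rawStore);
    }

    // Problematic shape: partitions missing from the cache are silently
    // skipped, so a full miss yields 0 and a partial miss yields a partial
    // sum. Both look like valid aggregates to the caller.
    long aggregateFromCacheLossy(List<String> partNames) {
        long numNulls = 0;
        for (String part : partNames) {
            Long cached = cache.get(part);
            if (cached != null) {
                numNulls += cached;
            }
        }
        return numNulls;
    }

    // Assumed alternative: report a miss unless every requested partition
    // is present in the cache.
    Optional<Long> aggregateFromCache(List<String> partNames) {
        long numNulls = 0;
        for (String part : partNames) {
            Long cached = cache.get(part);
            if (cached == null) {
                return Optional.empty();
            }
            numNulls += cached;
        }
        return Optional.of(numNulls);
    }

    // Aggregate computed directly from the rawstore stand-in.
    long aggregateFromRawStore(List<String> partNames) {
        long numNulls = 0;
        for (String part : partNames) {
            numNulls += rawStore.getOrDefault(part, 0L);
        }
        return numNulls;
    }

    // Consumer: use the cache only when it covers the whole request,
    // otherwise fall back to the rawstore.
    long aggregate(List<String> partNames) {
        return aggregateFromCache(partNames)
            .orElseGet(() -> aggregateFromRawStore(partNames));
    }
}
```

With a rawstore holding null counts 10 and 20 for two partitions and a cache 
holding only the first, the lossy variant reports 10 while the fallback variant 
reports 30. Whether the actual fix should signal the miss with an optional, a 
null, or an explicit coverage check is left open here.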



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
