[ https://issues.apache.org/jira/browse/IMPALA-6620?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jim Apple resolved IMPALA-6620.
-------------------------------
    Resolution: Duplicate

Duplicates IMPALA-5615

> Compute incremental stats for groups of partitions does not update stats correctly
> ----------------------------------------------------------------------------------
>
>                 Key: IMPALA-6620
>                 URL: https://issues.apache.org/jira/browse/IMPALA-6620
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Catalog
>    Affects Versions: Impala 2.8.0
>         Environment: Impala - v2.8.0-cdh5.11.1
> We are using the embedded Hive Metastore database (provided by Cloudera).
> It is PostgreSQL 8.4.20.
> OS: CentOS
>            Reporter: H Milyakov
>            Priority: Major
>
> Executing COMPUTE INCREMENTAL STATS `table` PARTITION (`partition clause`)
> does not compute statistics correctly (it computes 0) when `partition clause`
> matches more than one partition.
> Executing the same command when `partition clause` matches just a single
> partition results in statistics being computed correctly (neither 0 nor -1).
> The issue was observed on our production cluster for a table with 40 000
> partitions and 20 columns.
> I copied the table to a separate, isolated cluster and observed the same
> behaviour.
> We use Impala 2.8.0 in Cloudera CDH 5.11.
> The issue can be reproduced as follows:
> 1. CREATE TABLE my_test_table ( some_ints BIGINT )
>    PARTITIONED BY ( part_1 BIGINT, part_2 STRING )
>    STORED AS PARQUET;
>
> 2. Populate the only column, 'some_ints', so that there are 10 000
> different partitions (part_1, part_2).
> The total number of records in the table does not matter and can be the
> same as the number of different partitions.
>
> 3. Run COMPUTE INCREMENTAL STATS as described above to reproduce the
> issue.
> Has anybody faced a similar issue, or does anyone have more information on
> this case?

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
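
The reproduction steps quoted above could be scripted roughly as follows. This is a minimal sketch, not part of the original report: it is a small Python helper that only prints Impala SQL text. The helper name build_repro_sql, the 100 x 100 partition grid, the 'p<j>' values for part_2, the batch size, and the predicate part_1 < 10 are all assumptions chosen for illustration; only the table definition comes from the report.

```python
# Hypothetical helper: builds an Impala SQL script approximating the
# reproduction steps. Only the CREATE TABLE statement is from the report.
def build_repro_sql(n_part_1=100, n_part_2=100, batch_size=500):
    """Return a list of SQL statements (without trailing semicolons)."""
    stmts = [
        "CREATE TABLE my_test_table (some_ints BIGINT) "
        "PARTITIONED BY (part_1 BIGINT, part_2 STRING) "
        "STORED AS PARQUET"
    ]
    # One row per (part_1, part_2) pair, so the row count equals the
    # partition count. In a dynamic-partition insert the partition
    # columns come last in each row.
    rows = [
        f"({i * n_part_2 + j}, {i}, 'p{j}')"
        for i in range(n_part_1)
        for j in range(n_part_2)
    ]
    # Insert in batches so no single statement creates every partition.
    for start in range(0, len(rows), batch_size):
        batch = ", ".join(rows[start:start + batch_size])
        stmts.append(
            "INSERT INTO my_test_table PARTITION (part_1, part_2) VALUES "
            + batch
        )
    # Assumed predicate: any partition clause matching more than one partition.
    stmts.append(
        "COMPUTE INCREMENTAL STATS my_test_table PARTITION (part_1 < 10)"
    )
    # Inspect the per-partition row counts afterwards.
    stmts.append("SHOW TABLE STATS my_test_table")
    return stmts


if __name__ == "__main__":
    print(";\n".join(build_repro_sql()) + ";")
```

The printed script can be saved to a file and run with impala-shell. Comparing the SHOW TABLE STATS output after a multi-partition clause with the output after a single-partition clause (for example part_1 = 0 and part_2 = 'p0') should show whether the behaviour described in the report occurs.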