Hi! I'm using impala-kudu currently.
Impala's version is v2.7.0 and Kudu's version is 1.3.

I learned about table statistics a few days ago, so I tried computing them with the `compute stats` command. After a short time it finished with no errors on my screen, but the row counts were all -1. I searched for this and found the following issue: https://issues.apache.org/jira/browse/IMPALA-2830

Question 1. Can I set the row count manually?

I also found that the column statistics were computed with wrong values. For example, one column's actual number of distinct values is 5092153, but `show column stats ${table}` shows 5405440. (Other columns show similar differences.)

Question 2. Why does this difference happen? Also, can I set the value manually?

Finally, I am not sure whether Impala uses this information during query processing. For example, I issued `SELECT COUNT(DISTINCT ${column}) FROM ${table}`, and the `summary` command showed that Impala scanned the table from Kudu:

+--------------+--------+----------+----------+---------+------------+-----------+---------------+---------------+
| Operator     | #Hosts | Avg Time | Max Time | #Rows   | Est. #Rows | Peak Mem  | Est. Peak Mem | Detail        |
+--------------+--------+----------+----------+---------+------------+-----------+---------------+---------------+
| 06:AGGREGATE | 1      | 121.09us | 121.09us | 1       | 1          | 64.00 KB  | -1 B          | FINALIZE      |
| 05:EXCHANGE  | 1      | 61.82us  | 61.82us  | 12      | 1          | 0 B       | -1 B          | UNPARTITIONED |
| 02:AGGREGATE | 12     | 3.71ms   | 5.53ms   | 12      | 1          | 16.00 KB  | 10.00 MB      |               |
| 04:AGGREGATE | 12     | 171.00ms | 181.15ms | 5.09M   | 5.41M      | 154.58 MB | 11.57 MB      |               |
| 03:EXCHANGE  | 12     | 12.85ms  | 14.27ms  | 7.93M   | 5.41M      | 0 B       | 0 B           | HASH(c)       |
| 01:AGGREGATE | 12     | 2.72s    | 4.80s    | 7.93M   | 5.41M      | 170.08 MB | 138.88 MB     | STREAMING     |
| 00:SCAN KUDU | 12     | 991.95ms | 5.38s    | 963.00M | 963.00M    | 2.30 MB   | 0 B           |               |
+--------------+--------+----------+----------+---------+------------+-----------+---------------+---------------+

Why did Impala not use the column statistics here?

Question 3. If I can set the statistics values manually, will Impala understand and use them?
It seems that Impala does not use the computed statistics. I am still working on this, but it is hard to find more information.

Thanks! Have a nice day.
