Paul Rogers created IMPALA-8024:
-----------------------------------

             Summary: HBase table cardinality estimates are wrong
                 Key: IMPALA-8024
                 URL: https://issues.apache.org/jira/browse/IMPALA-8024
             Project: IMPALA
          Issue Type: Bug
          Components: Frontend
    Affects Versions: Impala 3.1.0
            Reporter: Paul Rogers

IMPALA-8021 added cardinality estimates to EXPLAIN plan output. Running some of our {{PlannerTest}} files revealed that our HBase cardinality estimates are very poor, even for our simple test tables. For example, for {{functional_hbase.alltypessmall}}:

{{count\(*)}} tells us that there are 100 rows:

{noformat}
select count(*) from functional_hbase.alltypessmall
+----------+
| count(*) |
+----------+
| 100      |
+----------+
{noformat}

Table stats claim that there are only 60 rows:

{noformat}
show table stats functional_hbase.alltypessmall;
+-----------------+--------------+------------+------+
| Region Location | Start RowKey | Est. #Rows | Size |
+-----------------+--------------+------------+------+
| localhost       |              | 10         | 0B   |
| localhost       | 1            | 10         | 0B   |
| localhost       | 3            | 10         | 0B   |
| localhost       | 5            | 10         | 0B   |
| localhost       | 7            | 10         | 0B   |
| localhost       | 9            | 10         | 0B   |
| Total           |              | 60         | 0B   |
+-----------------+--------------+------------+------+
{noformat}

The NDV stats show that there must be at least 100 rows:

{noformat}
show column stats functional_hbase.alltypessmall
+-----------------+-----------+------------------+--------+----------+----------+
| Column          | Type      | #Distinct Values | #Nulls | Max Size | Avg Size |
+-----------------+-----------+------------------+--------+----------+----------+
| id              | INT       | 99               | 0      | 4        | 4        |
...
| timestamp_col   | TIMESTAMP | 100              | 0      | 16       | 16       |
...
+-----------------+-----------+------------------+--------+----------+----------+
{noformat}

The planner, where the estimate matters most, assumes there are only 50 rows:

{noformat}
select *
from functional.alltypesagg
  join functional_hbase.alltypessmall using (id, int_col)

|--01:SCAN HBASE [functional_hbase.alltypessmall]
|     row-size=89B cardinality=50
{noformat}

We need a more reliable estimate.
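
To make the inconsistency concrete, here is a minimal sketch of the sanity check implied by the numbers above. It is not Impala's planner code, and the function and variable names are hypothetical. It assumes the estimate can be floored at the largest column NDV, since a table cannot have fewer rows than it has distinct values in any single column. NDV figures are themselves estimates, so this is a heuristic rather than a strict bound.

```python
def ndv_lower_bound(column_ndvs):
    """Smallest row count consistent with the per-column NDV stats.

    column_ndvs: dict mapping column name -> number of distinct values.
    Returns 0 when no column stats are available.
    """
    return max(column_ndvs.values(), default=0)


def reconcile_row_estimate(region_row_estimates, column_ndvs):
    """Hypothetical helper: sum the per-region estimates, then raise the
    total to the NDV-derived floor if the stats contradict each other."""
    stats_total = sum(region_row_estimates)
    return max(stats_total, ndv_lower_bound(column_ndvs))


# Figures copied from the SHOW TABLE STATS / SHOW COLUMN STATS output above.
region_rows = [10, 10, 10, 10, 10, 10]          # six regions, 60 rows total
ndvs = {"id": 99, "timestamp_col": 100}         # only the columns shown

estimate = reconcile_row_estimate(region_rows, ndvs)
print(estimate)  # 100
```

With the figures from this ticket, the table-stats total of 60 is raised to 100, which matches {{count\(*)}}. The planner's value of 50 falls below even the table-stats total.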