Paul Rogers created IMPALA-8024:
-----------------------------------

             Summary: HBase table cardinality estimates are wrong
                 Key: IMPALA-8024
                 URL: https://issues.apache.org/jira/browse/IMPALA-8024
             Project: IMPALA
          Issue Type: Bug
          Components: Frontend
    Affects Versions: Impala 3.1.0
            Reporter: Paul Rogers


IMPALA-8021 added cardinality estimates to EXPLAIN plan output. Running some of 
our {{PlannerTest}} files revealed that our HBase cardinality estimates are 
very poor, even for our simple test tables. For example, for 
{{functional_hbase.alltypessmall}}:

{{count\(*)}} tells us that there are 100 rows:

{noformat}
select count(*) from functional_hbase.alltypessmall
+----------+
| count(*) |
+----------+
| 100      |
+----------+
{noformat}

Table stats claim that there are only 60 rows:

{noformat}
show table stats functional_hbase.alltypessmall;
+-----------------+--------------+------------+------+
| Region Location | Start RowKey | Est. #Rows | Size |
+-----------------+--------------+------------+------+
| localhost       |              | 10         | 0B   |
| localhost       | 1            | 10         | 0B   |
| localhost       | 3            | 10         | 0B   |
| localhost       | 5            | 10         | 0B   |
| localhost       | 7            | 10         | 0B   |
| localhost       | 9            | 10         | 0B   |
| Total           |              | 60         | 0B   |
+-----------------+--------------+------------+------+
{noformat}

The column stats report an NDV (number of distinct values) of 100 for {{timestamp_col}}, so there must be at least 100 rows:

{noformat}
show column stats functional_hbase.alltypessmall
+-----------------+-----------+------------------+--------+----------+----------+
| Column          | Type      | #Distinct Values | #Nulls | Max Size | Avg Size |
+-----------------+-----------+------------------+--------+----------+----------+
| id              | INT       | 99               | 0      | 4        | 4        |
...
| timestamp_col   | TIMESTAMP | 100              | 0      | 16       | 16       |
...
+-----------------+-----------+------------------+--------+----------+----------+
{noformat}

The planner, where the estimate matters most, assumes there are only 50 rows:

{noformat}
select *
from functional.alltypesagg join functional_hbase.alltypessmall using (id, int_col)

|--01:SCAN HBASE [functional_hbase.alltypessmall]
|     row-size=89B cardinality=50
{noformat}
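
The inconsistency between these figures can be checked mechanically. The sketch below is illustrative only and is not Impala code: it copies the numbers from the outputs above, and the helper name {{ndv_lower_bound}} is hypothetical. It assumes the NDV values are trustworthy; since they are themselves estimates, the bound is approximate.

```python
# Figures copied from the outputs in this report.
actual_rows = 100        # select count(*)
table_stats_rows = 60    # "Total" row of SHOW TABLE STATS
planner_rows = 50        # cardinality shown in the EXPLAIN plan
column_ndvs = {"id": 99, "timestamp_col": 100}  # only the columns shown above


def ndv_lower_bound(ndvs):
    """Hypothetical helper: a table has at least as many rows as the
    largest number of distinct values in any one of its columns."""
    return max(ndvs.values())


lower_bound = ndv_lower_bound(column_ndvs)

for name, estimate in [("table stats", table_stats_rows),
                       ("planner", planner_rows)]:
    status = "below" if estimate < lower_bound else "consistent with"
    print(f"{name}: {estimate} rows is {status} "
          f"the NDV lower bound of {lower_bound}")
```

Both the table-stats total and the planner cardinality fall below the bound implied by the column stats, which is the inconsistency this issue describes.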

We need a more reliable estimate.



