[ 
https://issues.apache.org/jira/browse/IMPALA-10084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong updated IMPALA-10084:
-----------------------------------
    Component/s: Catalog

> Display the number of estimated rows for a table
> ------------------------------------------------
>
>                 Key: IMPALA-10084
>                 URL: https://issues.apache.org/jira/browse/IMPALA-10084
>             Project: IMPALA
>          Issue Type: Sub-task
>          Components: Catalog
>            Reporter: Sahil Takiar
>            Priority: Major
>
> AFAICT, there is no way to determine the number of rows estimated for a table 
> when row counts have been estimated via file size:
> {code:java}
> [localhost:21000] default> create table test (col int);
> [localhost:21000] default> insert into table test values (1), (2), (3), (4), 
> (5);
> [localhost:21000] default> show table stats test;
> +-------+--------+------+--------------+-------------------+--------+-------------------+--------------------------------------------+
> | #Rows | #Files | Size | Bytes Cached | Cache Replication | Format | 
> Incremental stats | Location                                   |
> +-------+--------+------+--------------+-------------------+--------+-------------------+--------------------------------------------+
> | -1    | 1      | 10B  | NOT CACHED   | NOT CACHED        | TEXT   | false   
>           | hdfs://localhost:20500/test-warehouse/test |
> +-------+--------+------+--------------+-------------------+--------+-------------------+--------------------------------------------+
> [localhost:21000] default> explain select * from test order by col limit 10;
> +------------------------------------------------------------------------------------+
> | Explain String                                                              
>        |
> +------------------------------------------------------------------------------------+
> | Max Per-Host Resource Reservation: Memory=8.00KB Threads=3                  
>        |
> | Per-Host Resource Estimates: Memory=32MB                                    
>        |
> | WARNING: The following tables are missing relevant table and/or column 
> statistics. |
> | default.test                                                                
>        |
> |                                                                             
>        |
> | PLAN-ROOT SINK                                                              
>        |
> | |                                                                           
>        |
> | 02:MERGING-EXCHANGE [UNPARTITIONED]                                         
>        |
> | |  order by: col ASC                                                        
>        |
> | |  limit: 10                                                                
>        |
> | |                                                                           
>        |
> | 01:TOP-N [LIMIT=10]                                                         
>        |
> | |  order by: col ASC                                                        
>        |
> | |  row-size=4B cardinality=3                                                
>        |
> | |                                                                           
>        |
> | 00:SCAN HDFS [default.test]                                                 
>        |
> |    HDFS partitions=1/1 files=1 size=10B                                     
>        |
> |    row-size=4B cardinality=3                                                
>        |
> +------------------------------------------------------------------------------------+
> [localhost:21000] default> set explain_level=3;
> localhost:21000] default> explain select * from test order by col limit 10;
> +--------------------------------------------------------------------------------------+
> | Explain String                                                              
>          |
> +--------------------------------------------------------------------------------------+
> | Max Per-Host Resource Reservation: Memory=8.00KB Threads=3                  
>          |
> | Per-Host Resource Estimates: Memory=32MB                                    
>          |
> | WARNING: The following tables are missing relevant table and/or column 
> statistics.   |
> | default.test                                                                
>          |
> | Analyzed query: SELECT * FROM `default`.test ORDER BY col ASC LIMIT CAST(10 
> AS       |
> | TINYINT)                                                                    
>          |
> |                                                                             
>          |
> | F01:PLAN FRAGMENT [UNPARTITIONED] hosts=1 instances=1                       
>          |
> | Per-Host Resources: mem-estimate=16.00KB mem-reservation=0B 
> thread-reservation=1     |
> |   PLAN-ROOT SINK                                                            
>          |
> |   |  output exprs: col                                                      
>          |
> |   |  mem-estimate=0B mem-reservation=0B thread-reservation=0                
>          |
> |   |                                                                         
>          |
> |   02:MERGING-EXCHANGE [UNPARTITIONED]                                       
>          |
> |      order by: col ASC                                                      
>          |
> |      limit: 10                                                              
>          |
> |      mem-estimate=16.00KB mem-reservation=0B thread-reservation=0           
>          |
> |      tuple-ids=1 row-size=4B cardinality=3                                  
>          |
> |      in pipelines: 01(GETNEXT)                                              
>          |
> |                                                                             
>          |
> | F00:PLAN FRAGMENT [RANDOM] hosts=1 instances=1                              
>          |
> | Per-Host Resources: mem-estimate=32.00MB mem-reservation=8.00KB 
> thread-reservation=2 |
> |   DATASTREAM SINK [FRAGMENT=F01, EXCHANGE=02, UNPARTITIONED]                
>          |
> |   |  mem-estimate=0B mem-reservation=0B thread-reservation=0                
>          |
> |   01:TOP-N [LIMIT=10]                                                       
>          |
> |   |  order by: col ASC                                                      
>          |
> |   |  mem-estimate=12B mem-reservation=0B thread-reservation=0               
>          |
> |   |  tuple-ids=1 row-size=4B cardinality=3                                  
>          |
> |   |  in pipelines: 01(GETNEXT), 00(OPEN)                                    
>          |
> |   |                                                                         
>          |
> |   00:SCAN HDFS [default.test, RANDOM]                                       
>          |
> |      HDFS partitions=1/1 files=1 size=10B                                   
>          |
> |      stored statistics:                                                     
>          |
> |        table: rows=unavailable size=unavailable                             
>          |
> |        columns: unavailable                                                 
>          |
> |      extrapolated-rows=disabled max-scan-range-rows=unavailable             
>          |
> |      mem-estimate=32.00MB mem-reservation=8.00KB thread-reservation=1       
>          |
> |      tuple-ids=0 row-size=4B cardinality=3                                  
>          |
> |      in pipelines: 00(GETNEXT)                                              
>          |
> +--------------------------------------------------------------------------------------+
> [localhost:21000] default> create table part_test (col int) partitioned by 
> (partcol int);
> [localhost:21000] default> alter table part_test add partition (partcol=1);
> [localhost:21000] default> insert into table part_test partition (partcol=1) 
> values (1), (2), (3), (4), (5);
> [localhost:21000] default> show table stats part_test;
> +---------+-------+--------+------+--------------+-------------------+--------+-------------------+-----------------------------------------------------------+
> | partcol | #Rows | #Files | Size | Bytes Cached | Cache Replication | Format 
> | Incremental stats | Location                                                
>   |
> +---------+-------+--------+------+--------------+-------------------+--------+-------------------+-----------------------------------------------------------+
> | 1       | -1    | 1      | 10B  | NOT CACHED   | NOT CACHED        | TEXT   
> | false             | 
> hdfs://localhost:20500/test-warehouse/part_test/partcol=1 |
> | Total   | -1    | 1      | 10B  | 0B           |                   |        
> |                   |                                                         
>   |
> +---------+-------+--------+------+--------------+-------------------+--------+-------------------+-----------------------------------------------------------+
> [localhost:21000] default> show partitions part_test;
> +---------+-------+--------+------+--------------+-------------------+--------+-------------------+-----------------------------------------------------------+
> | partcol | #Rows | #Files | Size | Bytes Cached | Cache Replication | Format 
> | Incremental stats | Location                                                
>   |
> +---------+-------+--------+------+--------------+-------------------+--------+-------------------+-----------------------------------------------------------+
> | 1       | -1    | 1      | 10B  | NOT CACHED   | NOT CACHED        | TEXT   
> | false             | 
> hdfs://localhost:20500/test-warehouse/part_test/partcol=1 |
> | Total   | -1    | 1      | 10B  | 0B           |                   |        
> |                   |                                                         
>   |
> +---------+-------+--------+------+--------------+-------------------+--------+-------------------+-----------------------------------------------------------+
> [localhost:21000] default> explain select * from part_test order by col limit 
> 10;
> +--------------------------------------------------------------------------------------+
> | Explain String                                                              
>          |
> +--------------------------------------------------------------------------------------+
> | Max Per-Host Resource Reservation: Memory=8.00KB Threads=3                  
>          |
> | Per-Host Resource Estimates: Memory=32MB                                    
>          |
> | WARNING: The following tables are missing relevant table and/or column 
> statistics.   |
> | default.part_test                                                           
>          |
> | Analyzed query: SELECT * FROM `default`.part_test ORDER BY col ASC LIMIT 
> CAST(10     |
> | AS TINYINT)                                                                 
>          |
> |                                                                             
>          |
> | F01:PLAN FRAGMENT [UNPARTITIONED] hosts=1 instances=1                       
>          |
> | Per-Host Resources: mem-estimate=16.00KB mem-reservation=0B 
> thread-reservation=1     |
> |   PLAN-ROOT SINK                                                            
>          |
> |   |  output exprs: col, partcol                                             
>          |
> |   |  mem-estimate=0B mem-reservation=0B thread-reservation=0                
>          |
> |   |                                                                         
>          |
> |   02:MERGING-EXCHANGE [UNPARTITIONED]                                       
>          |
> |      order by: col ASC                                                      
>          |
> |      limit: 10                                                              
>          |
> |      mem-estimate=16.00KB mem-reservation=0B thread-reservation=0           
>          |
> |      tuple-ids=1 row-size=8B cardinality=1                                  
>          |
> |      in pipelines: 01(GETNEXT)                                              
>          |
> |                                                                             
>          |
> | F00:PLAN FRAGMENT [RANDOM] hosts=1 instances=1                              
>          |
> | Per-Host Resources: mem-estimate=32.00MB mem-reservation=8.00KB 
> thread-reservation=2 |
> |   DATASTREAM SINK [FRAGMENT=F01, EXCHANGE=02, UNPARTITIONED]                
>          |
> |   |  mem-estimate=0B mem-reservation=0B thread-reservation=0                
>          |
> |   01:TOP-N [LIMIT=10]                                                       
>          |
> |   |  order by: col ASC                                                      
>          |
> |   |  mem-estimate=8B mem-reservation=0B thread-reservation=0                
>          |
> |   |  tuple-ids=1 row-size=8B cardinality=1                                  
>          |
> |   |  in pipelines: 01(GETNEXT), 00(OPEN)                                    
>          |
> |   |                                                                         
>          |
> |   00:SCAN HDFS [default.part_test, RANDOM]                                  
>          |
> |      HDFS partitions=1/1 files=1 size=10B                                   
>          |
> |      stored statistics:                                                     
>          |
> |        table: rows=unavailable size=unavailable                             
>          |
> |        partitions: 0/1 rows=1                                               
>          |
> |        columns missing stats: col                                           
>          |
> |      extrapolated-rows=disabled max-scan-range-rows=unavailable             
>          |
> |      mem-estimate=32.00MB mem-reservation=8.00KB thread-reservation=1       
>          |
> |      tuple-ids=0 row-size=8B cardinality=1                                  
>          |
> |      in pipelines: 00(GETNEXT)                                              
>          |
> +--------------------------------------------------------------------------------------+
> {code}
> It would be nice if there was a field in the explain plan similar to 
> "extrapolated-rows" that lists the estimated row count for the table.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org

Reply via email to