[ https://issues.apache.org/jira/browse/IMPALA-10084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Tim Armstrong updated IMPALA-10084: ----------------------------------- Component/s: Catalog > Display the number of estimated rows for a table > ------------------------------------------------ > > Key: IMPALA-10084 > URL: https://issues.apache.org/jira/browse/IMPALA-10084 > Project: IMPALA > Issue Type: Sub-task > Components: Catalog > Reporter: Sahil Takiar > Priority: Major > > AFAICT, there is no way to determine the number of rows estimated for a table > when row counts have been estimated via file size: > {code:java} > [localhost:21000] default> create table test (col int); > [localhost:21000] default> insert into table test values (1), (2), (3), (4), > (5); > [localhost:21000] default> show table stats test; > +-------+--------+------+--------------+-------------------+--------+-------------------+--------------------------------------------+ > | #Rows | #Files | Size | Bytes Cached | Cache Replication | Format | > Incremental stats | Location | > +-------+--------+------+--------------+-------------------+--------+-------------------+--------------------------------------------+ > | -1 | 1 | 10B | NOT CACHED | NOT CACHED | TEXT | false > | hdfs://localhost:20500/test-warehouse/test | > +-------+--------+------+--------------+-------------------+--------+-------------------+--------------------------------------------+ > [localhost:21000] default> explain select * from test order by col limit 10; > +------------------------------------------------------------------------------------+ > | Explain String > | > +------------------------------------------------------------------------------------+ > | Max Per-Host Resource Reservation: Memory=8.00KB Threads=3 > | > | Per-Host Resource Estimates: Memory=32MB > | > | WARNING: The following tables are missing relevant table and/or column > statistics. | > | default.test > | > | > | > | PLAN-ROOT SINK > | > | | > | > | 02:MERGING-EXCHANGE [UNPARTITIONED] > | > | | order by: col ASC > | > | | limit: 10 > | > | | > | > | 01:TOP-N [LIMIT=10] > | > | | order by: col ASC > | > | | row-size=4B cardinality=3 > | > | | > | > | 00:SCAN HDFS [default.test] > | > | HDFS partitions=1/1 files=1 size=10B > | > | row-size=4B cardinality=3 > | > +------------------------------------------------------------------------------------+ > [localhost:21000] default> set explain_level=3; > localhost:21000] default> explain select * from test order by col limit 10; > +--------------------------------------------------------------------------------------+ > | Explain String > | > +--------------------------------------------------------------------------------------+ > | Max Per-Host Resource Reservation: Memory=8.00KB Threads=3 > | > | Per-Host Resource Estimates: Memory=32MB > | > | WARNING: The following tables are missing relevant table and/or column > statistics. | > | default.test > | > | Analyzed query: SELECT * FROM `default`.test ORDER BY col ASC LIMIT CAST(10 > AS | > | TINYINT) > | > | > | > | F01:PLAN FRAGMENT [UNPARTITIONED] hosts=1 instances=1 > | > | Per-Host Resources: mem-estimate=16.00KB mem-reservation=0B > thread-reservation=1 | > | PLAN-ROOT SINK > | > | | output exprs: col > | > | | mem-estimate=0B mem-reservation=0B thread-reservation=0 > | > | | > | > | 02:MERGING-EXCHANGE [UNPARTITIONED] > | > | order by: col ASC > | > | limit: 10 > | > | mem-estimate=16.00KB mem-reservation=0B thread-reservation=0 > | > | tuple-ids=1 row-size=4B cardinality=3 > | > | in pipelines: 01(GETNEXT) > | > | > | > | F00:PLAN FRAGMENT [RANDOM] hosts=1 instances=1 > | > | Per-Host Resources: mem-estimate=32.00MB mem-reservation=8.00KB > thread-reservation=2 | > | DATASTREAM SINK [FRAGMENT=F01, EXCHANGE=02, UNPARTITIONED] > | > | | mem-estimate=0B mem-reservation=0B thread-reservation=0 > | > | 01:TOP-N [LIMIT=10] > | > | | order by: col ASC > | > | | mem-estimate=12B mem-reservation=0B thread-reservation=0 > | > | | tuple-ids=1 row-size=4B cardinality=3 > | > | | in pipelines: 01(GETNEXT), 00(OPEN) > | > | | > | > | 00:SCAN HDFS [default.test, RANDOM] > | > | HDFS partitions=1/1 files=1 size=10B > | > | stored statistics: > | > | table: rows=unavailable size=unavailable > | > | columns: unavailable > | > | extrapolated-rows=disabled max-scan-range-rows=unavailable > | > | mem-estimate=32.00MB mem-reservation=8.00KB thread-reservation=1 > | > | tuple-ids=0 row-size=4B cardinality=3 > | > | in pipelines: 00(GETNEXT) > | > +--------------------------------------------------------------------------------------+ > [localhost:21000] default> create table part_test (col int) partitioned by > (partcol int); > [localhost:21000] default> alter table part_test add partition (partcol=1); > [localhost:21000] default> insert into table part_test partition (partcol=1) > values (1), (2), (3), (4), (5); > [localhost:21000] default> show table stats part_test; > +---------+-------+--------+------+--------------+-------------------+--------+-------------------+-----------------------------------------------------------+ > | partcol | #Rows | #Files | Size | Bytes Cached | Cache Replication | Format > | Incremental stats | Location > | > +---------+-------+--------+------+--------------+-------------------+--------+-------------------+-----------------------------------------------------------+ > | 1 | -1 | 1 | 10B | NOT CACHED | NOT CACHED | TEXT > | false | > hdfs://localhost:20500/test-warehouse/part_test/partcol=1 | > | Total | -1 | 1 | 10B | 0B | | > | | > | > +---------+-------+--------+------+--------------+-------------------+--------+-------------------+-----------------------------------------------------------+ > [localhost:21000] default> show partitions part_test; > +---------+-------+--------+------+--------------+-------------------+--------+-------------------+-----------------------------------------------------------+ > | partcol | #Rows | #Files | Size | Bytes Cached | Cache Replication | Format > | Incremental stats | Location > | > +---------+-------+--------+------+--------------+-------------------+--------+-------------------+-----------------------------------------------------------+ > | 1 | -1 | 1 | 10B | NOT CACHED | NOT CACHED | TEXT > | false | > hdfs://localhost:20500/test-warehouse/part_test/partcol=1 | > | Total | -1 | 1 | 10B | 0B | | > | | > | > +---------+-------+--------+------+--------------+-------------------+--------+-------------------+-----------------------------------------------------------+ > [localhost:21000] default> explain select * from part_test order by col limit > 10; > +--------------------------------------------------------------------------------------+ > | Explain String > | > +--------------------------------------------------------------------------------------+ > | Max Per-Host Resource Reservation: Memory=8.00KB Threads=3 > | > | Per-Host Resource Estimates: Memory=32MB > | > | WARNING: The following tables are missing relevant table and/or column > statistics. | > | default.part_test > | > | Analyzed query: SELECT * FROM `default`.part_test ORDER BY col ASC LIMIT > CAST(10 | > | AS TINYINT) > | > | > | > | F01:PLAN FRAGMENT [UNPARTITIONED] hosts=1 instances=1 > | > | Per-Host Resources: mem-estimate=16.00KB mem-reservation=0B > thread-reservation=1 | > | PLAN-ROOT SINK > | > | | output exprs: col, partcol > | > | | mem-estimate=0B mem-reservation=0B thread-reservation=0 > | > | | > | > | 02:MERGING-EXCHANGE [UNPARTITIONED] > | > | order by: col ASC > | > | limit: 10 > | > | mem-estimate=16.00KB mem-reservation=0B thread-reservation=0 > | > | tuple-ids=1 row-size=8B cardinality=1 > | > | in pipelines: 01(GETNEXT) > | > | > | > | F00:PLAN FRAGMENT [RANDOM] hosts=1 instances=1 > | > | Per-Host Resources: mem-estimate=32.00MB mem-reservation=8.00KB > thread-reservation=2 | > | DATASTREAM SINK [FRAGMENT=F01, EXCHANGE=02, UNPARTITIONED] > | > | | mem-estimate=0B mem-reservation=0B thread-reservation=0 > | > | 01:TOP-N [LIMIT=10] > | > | | order by: col ASC > | > | | mem-estimate=8B mem-reservation=0B thread-reservation=0 > | > | | tuple-ids=1 row-size=8B cardinality=1 > | > | | in pipelines: 01(GETNEXT), 00(OPEN) > | > | | > | > | 00:SCAN HDFS [default.part_test, RANDOM] > | > | HDFS partitions=1/1 files=1 size=10B > | > | stored statistics: > | > | table: rows=unavailable size=unavailable > | > | partitions: 0/1 rows=1 > | > | columns missing stats: col > | > | extrapolated-rows=disabled max-scan-range-rows=unavailable > | > | mem-estimate=32.00MB mem-reservation=8.00KB thread-reservation=1 > | > | tuple-ids=0 row-size=8B cardinality=1 > | > | in pipelines: 00(GETNEXT) > | > +--------------------------------------------------------------------------------------+ > {code} > It would be nice if there was a field in the explain plan similar to > "extrapolated-rows" that lists the estimated row count for the table. -- This message was sent by Atlassian Jira (v8.3.4#803005) --------------------------------------------------------------------- To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org