Qifan Chen has posted comments on this change. ( http://gerrit.cloudera.org:8080/16723 )
Change subject: IMPALA-10314: Optimize planning time for simple limits
......................................................................


Patch Set 4:

(1 comment)

Looks nice! In addition to the empty-file concern, I wonder whether the explain output clearly shows that this optimization was applied, other than by comparing the files scanned against the total one by one. Such an indicator would be very useful for quickly ruling out a problem (if any) in this area. Sorry, I was not able to find it in the code.

http://gerrit.cloudera.org:8080/#/c/16723/4/testdata/workloads/functional-planner/queries/PlannerTest/optimize-simple-limit.test
File testdata/workloads/functional-planner/queries/PlannerTest/optimize-simple-limit.test:

http://gerrit.cloudera.org:8080/#/c/16723/4/testdata/workloads/functional-planner/queries/PlannerTest/optimize-simple-limit.test@241
PS4, Line 241: limit 1
This makes me think we should skip files with 0 rows during pruning.

In my test with a textfile-format table, I can add an empty file to the table's directory and Impala will still include it in the scan:

Query: explain select * from table_bar
+------------------------------------------------------------------------------------+
| Explain String                                                                     |
+------------------------------------------------------------------------------------+
| Max Per-Host Resource Reservation: Memory=0B Threads=2                             |
| Per-Host Resource Estimates: Memory=10MB                                           |
| WARNING: The following tables are missing relevant table and/or column statistics. |
| default.table_bar                                                                  |
|                                                                                    |
| PLAN-ROOT SINK                                                                     |
| |                                                                                  |
| 01:EXCHANGE [UNPARTITIONED]                                                        |
| |                                                                                  |
| 00:SCAN HDFS [default.table_bar]                                                   |
|    HDFS partitions=1/1 files=1 size=0B                                             |
|    row-size=4B cardinality=0                                                       |
+------------------------------------------------------------------------------------+

[09:24:31 qchen@qifan-10229: parquet] sqlci -q "select * from table_bar"
Starting Impala Shell with no authentication using Python 2.7.16
Warning: live_progress only applies to interactive shell sessions, and is being skipped for now.
Opened TCP connection to localhost:21000
Connected to localhost:21000
Server version: impalad version 4.0.0-SNAPSHOT DEBUG (build ebe72ec25f4c6daabaa27f6daddd03b887806507)
Query: select * from table_bar
Query submitted at: 2020-11-18 09:24:48 (Coordinator: http://qifan-10229:25000)
Query progress can be monitored at: http://qifan-10229:25000/query_plan?query_id=df40c6ecaeeb3a0e:11dd5cb700000000
Fetched 0 row(s) in 4.64s

Table setup used for the test:

drop table if exists table_bar purge;
create table if not exists table_bar (a int) STORED AS textfile location '/tmp/table_bar_dir';

touch empty.txt
hdfs dfs -copyFromLocal empty.txt /tmp/table_bar_dir

--
To view, visit http://gerrit.cloudera.org:8080/16723

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9d6a79263bc092e0f3e9a1d72da5618f3cc35574
Gerrit-Change-Number: 16723
Gerrit-PatchSet: 4
Gerrit-Owner: Aman Sinha
Gerrit-Reviewer: Aman Sinha
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Qifan Chen
Gerrit-Reviewer: Shant Hovsepian
Gerrit-Reviewer: Tim Armstrong
Gerrit-Comment-Date: Wed, 18 Nov 2020 14:43:18 +0000
Gerrit-HasComments: Yes
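The suggestion in this review to skip zero-row files when pruning for a simple limit can be illustrated with a small sketch. It is written in Python only for illustration and does not reflect the change under review. It assumes a hypothetical policy in which every file with a non-zero byte length holds at least one row. Under that policy, picking up to "limit" non-empty files is enough to satisfy a predicate-free "select ... limit N". FileDesc and select_files_for_simple_limit are hypothetical names, not part of Impala's planner.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class FileDesc:
    """Hypothetical stand-in for the planner's per-file metadata."""
    path: str
    length: int  # file size in bytes


def select_files_for_simple_limit(files: List[FileDesc], limit: int) -> List[FileDesc]:
    """Pick at most `limit` files for a predicate-free `select ... limit N`.

    Hypothetical policy (an assumption, not Impala's actual rule): every
    file with length > 0 holds at least one row, so `limit` non-empty files
    are enough to return `limit` rows. Zero-length files can never
    contribute a row, so they are skipped instead of being counted
    toward the limit.
    """
    if limit < 0:
        raise ValueError("limit must be non-negative")
    selected: List[FileDesc] = []
    for f in files:
        if len(selected) >= limit:
            break  # already have enough candidate files
        if f.length == 0:
            continue  # empty file (e.g. the copied empty.txt): skip it
        selected.append(f)
    return selected


if __name__ == "__main__":
    table_files = [
        FileDesc("/tmp/table_bar_dir/empty.txt", 0),
        FileDesc("/tmp/table_bar_dir/part-0.txt", 128),  # hypothetical data file
    ]
    # With "limit 1", the empty file is passed over in favor of part-0.txt.
    print(select_files_for_simple_limit(table_files, 1))
```

Byte length is only a proxy for "has rows". A Parquet file always carries a footer, so a Parquet file with zero rows is still non-empty. Where file metadata records a row count, that would be the stricter check.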
