Qifan Chen has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16723 )

Change subject: IMPALA-10314: Optimize planning time for simple limits
......................................................................


Patch Set 4:

(1 comment)

Looks nice!

In addition to the empty file concern, I wonder whether the explain output 
could clearly show that this optimization was applied, so one does not have to 
compare the files scanned against the total one by one. Such an indicator could 
be very useful for quickly ruling out a problem (if any) in this area. Sorry, I 
was not able to find it in the code.

http://gerrit.cloudera.org:8080/#/c/16723/4/testdata/workloads/functional-planner/queries/PlannerTest/optimize-simple-limit.test
File 
testdata/workloads/functional-planner/queries/PlannerTest/optimize-simple-limit.test:

http://gerrit.cloudera.org:8080/#/c/16723/4/testdata/workloads/functional-planner/queries/PlannerTest/optimize-simple-limit.test@241
PS4, Line 241:  limit 1
This makes me feel we should skip files with 0 rows during pruning. In my 
test with a text-format table, I could add an empty file to the table's 
directory and Impala still processed it.

Query: explain select * from table_bar
+------------------------------------------------------------------------------------+
| Explain String                                                                     |
+------------------------------------------------------------------------------------+
| Max Per-Host Resource Reservation: Memory=0B Threads=2                             |
| Per-Host Resource Estimates: Memory=10MB                                           |
| WARNING: The following tables are missing relevant table and/or column statistics. |
| default.table_bar                                                                  |
|                                                                                    |
| PLAN-ROOT SINK                                                                     |
| |                                                                                  |
| 01:EXCHANGE [UNPARTITIONED]                                                        |
| |                                                                                  |
| 00:SCAN HDFS [default.table_bar]                                                   |
|    HDFS partitions=1/1 files=1 size=0B                                             |
|    row-size=4B cardinality=0                                                       |
+------------------------------------------------------------------------------------+

[09:24:31 qchen@qifan-10229: parquet] sqlci -q "select * from table_bar"
Starting Impala Shell with no authentication using Python 2.7.16
Warning: live_progress only applies to interactive shell sessions, and is being skipped for now.
Opened TCP connection to localhost:21000
Connected to localhost:21000
Server version: impalad version 4.0.0-SNAPSHOT DEBUG (build ebe72ec25f4c6daabaa27f6daddd03b887806507)
Query: select * from table_bar
Query submitted at: 2020-11-18 09:24:48 (Coordinator: http://qifan-10229:25000)
Query progress can be monitored at: http://qifan-10229:25000/query_plan?query_id=df40c6ecaeeb3a0e:11dd5cb700000000
Fetched 0 row(s) in 4.64s


Setup used for this test:

drop table if exists table_bar purge;
create table if not exists table_bar (a int)
STORED AS textfile
location '/tmp/table_bar_dir';

touch empty.txt
hdfs dfs -copyFromLocal empty.txt /tmp/table_bar_dir
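
As a rough illustration of the suggestion to skip empty files during pruning, 
here is a minimal sketch. It is written in Python for brevity and is not code 
from the patch or from Impala; FileDesc, pick_files_for_limit and max_files are 
hypothetical names. It also assumes that a zero-length file is the only kind of 
"0 rows" file that can be detected cheaply from file metadata.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class FileDesc:
    """Hypothetical stand-in for a file descriptor: path and length in bytes."""
    path: str
    length: int


def pick_files_for_limit(files: Iterable[FileDesc], max_files: int) -> List[FileDesc]:
    """Keep at most max_files files, passing over zero-length ones."""
    picked: List[FileDesc] = []
    if max_files <= 0:
        return picked
    for f in files:
        if f.length == 0:
            # Assumption: a 0-byte file holds no rows, so it cannot help
            # satisfy the limit and should not count toward max_files.
            continue
        picked.append(f)
        if len(picked) >= max_files:
            break
    return picked


files = [FileDesc("empty.txt", 0), FileDesc("data.txt", 12)]
print(pick_files_for_limit(files, 1))
```

With a table directory like the one in this test plus one non-empty file, the 
sketch picks the non-empty file instead of stopping at empty.txt.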



--
To view, visit http://gerrit.cloudera.org:8080/16723
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9d6a79263bc092e0f3e9a1d72da5618f3cc35574
Gerrit-Change-Number: 16723
Gerrit-PatchSet: 4
Gerrit-Owner: Aman Sinha <[email protected]>
Gerrit-Reviewer: Aman Sinha <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Qifan Chen <[email protected]>
Gerrit-Reviewer: Shant Hovsepian <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>
Gerrit-Comment-Date: Wed, 18 Nov 2020 14:43:18 +0000
Gerrit-HasComments: Yes
