Aman Sinha has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16723 )

Change subject: IMPALA-10314: Optimize planning time for simple limits
......................................................................


Patch Set 5:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/16723/5/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java:

http://gerrit.cloudera.org:8080/#/c/16723/5/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@870
PS5, Line 870: fd.getFileLength()
> Is it possible for a Parquet data file with empty # of rows pass this test?
The getFileLength() check is used in other places too..so I borrowed that from 
generateScanRangeSpec().  For self-describing schema file formats like Parquet, 
 yes it is possible for the length to be non-zero and  num_rows zero. I think 
that handling such cases will need some major rework of this 
computeScanRangeLocation() method since right now it is agnostic to the file 
format (it does care about file system type but not so much the formats).  
Further,  I believe other changes will be needed in the metadata catalog layer 
to ensure this FileMetaData is plumbed through although I haven't looked 
closely into that. The trade-off is the size of the metadata in catalog cache 
could blow up for large number of files and we are already run into significant 
memory issues.


http://gerrit.cloudera.org:8080/#/c/16723/5/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@947
PS5, Line 947: if (isSimpleLimit && simpleLimitNumRows ==
             :           analyzer.getSimpleLimitStatus().second) {
             :         // for the simple limit case if the estimated rows has 
already reached the limit
             :         // there's no need to process more partitions
             :         break;
             :       }
> This is good.
Ack



--
To view, visit http://gerrit.cloudera.org:8080/16723
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9d6a79263bc092e0f3e9a1d72da5618f3cc35574
Gerrit-Change-Number: 16723
Gerrit-PatchSet: 5
Gerrit-Owner: Aman Sinha <amsi...@cloudera.com>
Gerrit-Reviewer: Aman Sinha <amsi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Shant Hovsepian <sh...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>
Gerrit-Comment-Date: Thu, 19 Nov 2020 16:51:29 +0000
Gerrit-HasComments: Yes

Reply via email to