Aman Sinha has posted comments on this change. ( http://gerrit.cloudera.org:8080/16723 )
Change subject: IMPALA-10314: Optimize planning time for simple limits ...................................................................... Patch Set 5: (2 comments) http://gerrit.cloudera.org:8080/#/c/16723/5/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java: http://gerrit.cloudera.org:8080/#/c/16723/5/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@870 PS5, Line 870: fd.getFileLength() > Is it possible for a Parquet data file with empty # of rows pass this test? The getFileLength() check is used in other places too..so I borrowed that from generateScanRangeSpec(). For self-describing schema file formats like Parquet, yes it is possible for the length to be non-zero and num_rows zero. I think that handling such cases will need some major rework of this computeScanRangeLocation() method since right now it is agnostic to the file format (it does care about file system type but not so much the formats). Further, I believe other changes will be needed in the metadata catalog layer to ensure this FileMetaData is plumbed through although I haven't looked closely into that. The trade-off is the size of the metadata in catalog cache could blow up for large number of files and we are already run into significant memory issues. http://gerrit.cloudera.org:8080/#/c/16723/5/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@947 PS5, Line 947: if (isSimpleLimit && simpleLimitNumRows == : analyzer.getSimpleLimitStatus().second) { : // for the simple limit case if the estimated rows has already reached the limit : // there's no need to process more partitions : break; : } > This is good. Ack -- To view, visit http://gerrit.cloudera.org:8080/16723 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I9d6a79263bc092e0f3e9a1d72da5618f3cc35574 Gerrit-Change-Number: 16723 Gerrit-PatchSet: 5 Gerrit-Owner: Aman Sinha <amsi...@cloudera.com> Gerrit-Reviewer: Aman Sinha <amsi...@cloudera.com> Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com> Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com> Gerrit-Reviewer: Shant Hovsepian <sh...@cloudera.com> Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com> Gerrit-Comment-Date: Thu, 19 Nov 2020 16:51:29 +0000 Gerrit-HasComments: Yes