Aman Sinha has posted comments on this change. ( http://gerrit.cloudera.org:8080/16723 )
Change subject: IMPALA-10314: Optimize planning time for simple limits
......................................................................


Patch Set 4:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/16723/4/testdata/workloads/functional-planner/queries/PlannerTest/optimize-simple-limit.test
File testdata/workloads/functional-planner/queries/PlannerTest/optimize-simple-limit.test:

http://gerrit.cloudera.org:8080/#/c/16723/4/testdata/workloads/functional-planner/queries/PlannerTest/optimize-simple-limit.test@241
PS4, Line 241: limit 1
> This makes me feel we should skip those files with 0 rows during pruning. I
In HdfsScanNode.computeScanRangeLocation(), we skip computing the scan range if the file is empty:

Line 912 on master:
  // Skips files that have no associated blocks.
  if (fileDesc.getNumFileBlocks() == 0) continue;

However, we populate the totalFilesPerFs_ treemap earlier, on line 891, and that is the map used to display the EXPLAIN string. So, yeah, there is some inconsistency in the display (although it is possible that showing all files, including empty ones, in the explain output is intentional).

For my patch, the pruning happens in 2 steps:
(1) In HdfsPartitionPruner, I limit the number of partitions based only on the number of file descriptors per partition, i.e., without examining each file descriptor, since that would add overhead.
(2) In HdfsScanNode, I limit the number of files, since that code already iterates over the file descriptors.

I guess I could skip empty files in step 2, even though it would mess up the calculation that was done in step 1.

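
To make the step 2 trade-off concrete, here is a minimal standalone sketch. It is not the Impala code: FileDesc, Result, selectFiles() and the skipEmptyBeforeLimit flag are hypothetical names invented for illustration, and only the getNumFileBlocks() == 0 check mirrors the snippet quoted above. It shows how a file limit applied before the empty-file check can leave nothing to scan, while checking for empty files first picks the next non-empty file.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class EmptyFileSkipSketch {
  /** Hypothetical stand-in for a file descriptor; only the block count matters here. */
  static final class FileDesc {
    final String name;
    final int numBlocks;
    FileDesc(String name, int numBlocks) {
      this.name = name;
      this.numBlocks = numBlocks;
    }
    int getNumFileBlocks() { return numBlocks; }
  }

  /** Files counted against the limit vs. files that actually get scan ranges. */
  static final class Result {
    final int filesCounted;
    final List<String> scannedFiles;
    Result(int filesCounted, List<String> scannedFiles) {
      this.filesCounted = filesCounted;
      this.scannedFiles = scannedFiles;
    }
  }

  /**
   * Applies a simple file limit while iterating over the descriptors.
   * If skipEmptyBeforeLimit is true, empty files are skipped before they
   * are counted against the limit (an assumed variant, for illustration).
   */
  static Result selectFiles(List<FileDesc> files, int maxFiles,
      boolean skipEmptyBeforeLimit) {
    int counted = 0;
    List<String> scanned = new ArrayList<>();
    for (FileDesc fd : files) {
      if (skipEmptyBeforeLimit && fd.getNumFileBlocks() == 0) continue;
      if (counted >= maxFiles) break;
      counted++;
      // Skips files that have no associated blocks.
      if (fd.getNumFileBlocks() == 0) continue;
      scanned.add(fd.name);
    }
    return new Result(counted, scanned);
  }

  public static void main(String[] args) {
    List<FileDesc> files = Arrays.asList(
        new FileDesc("empty.parq", 0),
        new FileDesc("a.parq", 1),
        new FileDesc("b.parq", 2));
    Result limitFirst = selectFiles(files, 1, false);
    Result skipFirst = selectFiles(files, 1, true);
    System.out.println("limit first: scanned " + limitFirst.scannedFiles);
    System.out.println("skip first:  scanned " + skipFirst.scannedFiles);
  }
}
```

In this sketch both variants count one file against the limit, but only the skip-first variant ends up with a file to scan. The same kind of mismatch is what I meant about step 1: a partition-level estimate based on raw file descriptor counts would not know how many of those files are empty.
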
-- 
To view, visit http://gerrit.cloudera.org:8080/16723
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9d6a79263bc092e0f3e9a1d72da5618f3cc35574
Gerrit-Change-Number: 16723
Gerrit-PatchSet: 4
Gerrit-Owner: Aman Sinha <[email protected]>
Gerrit-Reviewer: Aman Sinha <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Qifan Chen <[email protected]>
Gerrit-Reviewer: Shant Hovsepian <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>
Gerrit-Comment-Date: Wed, 18 Nov 2020 23:14:47 +0000
Gerrit-HasComments: Yes
