Aman Sinha has posted comments on this change. ( http://gerrit.cloudera.org:8080/13993 )
Change subject: IMPALA-8834: Short-circuit partition key scan ...................................................................... Patch Set 14: (5 comments) A few minor comments. Looks good overall. http://gerrit.cloudera.org:8080/#/c/13993/14//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/13993/14//COMMIT_MSG@11 PS14, Line 11: It can therefore be safely enabled by default. I suppose the enabling by default will be a separate patch ? http://gerrit.cloudera.org:8080/#/c/13993/14//COMMIT_MSG@55 PS14, Line 55: Testing: Thanks for adding the detailed tests :) http://gerrit.cloudera.org:8080/#/c/13993/14/be/src/exec/hdfs-scan-node-mt.cc File be/src/exec/hdfs-scan-node-mt.cc: http://gerrit.cloudera.org:8080/#/c/13993/14/be/src/exec/hdfs-scan-node-mt.cc@98 PS14, Line 98: row_batch->limit_capacity(1); Should this limiting be under an if(is_partition_key_scan_) check ? http://gerrit.cloudera.org:8080/#/c/13993/14/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java: http://gerrit.cloudera.org:8080/#/c/13993/14/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@1134 PS14, Line 1134: long numRangesAdjusted = numRanges * 100; I can see why the adjustment would be needed .. to ensure a distributed scan is done. Presumably, for smaller number of partitions, we would still be ok to run on a single node ? Since this is multiplying a long value, it could overflow in a pathological case, so suggest capping it with max long value. http://gerrit.cloudera.org:8080/#/c/13993/14/testdata/workloads/functional-planner/queries/PlannerTest/disable-codegen.test File testdata/workloads/functional-planner/queries/PlannerTest/disable-codegen.test: http://gerrit.cloudera.org:8080/#/c/13993/14/testdata/workloads/functional-planner/queries/PlannerTest/disable-codegen.test@21 PS14, Line 21: HDFS partitions=24/24 files=24 size=478.45KB it wasn't obvious to me why the HDFS prefix got added based on the changes in this patch since the new item added to the Explain plan is 'partition key scan'. -- To view, visit http://gerrit.cloudera.org:8080/13993 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I26c87525a4f75ffeb654267b89948653b2e1ff8c Gerrit-Change-Number: 13993 Gerrit-PatchSet: 14 Gerrit-Owner: Tim Armstrong <[email protected]> Gerrit-Reviewer: Aman Sinha <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Quanlong Huang <[email protected]> Gerrit-Comment-Date: Tue, 21 Apr 2020 00:45:57 +0000 Gerrit-HasComments: Yes
