Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/13993 )
Change subject: IMPALA-8834: Short-circuit partition key scan ...................................................................... Patch Set 14: (5 comments) http://gerrit.cloudera.org:8080/#/c/13993/14//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/13993/14//COMMIT_MSG@11 PS14, Line 11: It can therefore be safely enabled by default. > I suppose the enabling by default will be a separate patch ? Reworded to make what this actually does clearer. This is adding a weaker version of the optimisation that is always enabled. The old version is still more efficient since it's metadata only, so is still enabled behind the flag. http://gerrit.cloudera.org:8080/#/c/13993/14//COMMIT_MSG@55 PS14, Line 55: Testing: > Thanks for adding the detailed tests :) ack http://gerrit.cloudera.org:8080/#/c/13993/14/be/src/exec/hdfs-scan-node-mt.cc File be/src/exec/hdfs-scan-node-mt.cc: http://gerrit.cloudera.org:8080/#/c/13993/14/be/src/exec/hdfs-scan-node-mt.cc@98 PS14, Line 98: row_batch->limit_capacity(1); > Should this limiting be under an if(is_partition_key_scan_) check ? Yes! This was a bug causing some of the test failures. http://gerrit.cloudera.org:8080/#/c/13993/14/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java: http://gerrit.cloudera.org:8080/#/c/13993/14/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@1134 PS14, Line 1134: long numRangesAdjusted = numRanges * 100; > I can see why the adjustment would be needed .. to ensure a distributed sc I think it's not possible to have that many scan ranges (aside from anything else, it wouldn't fit in the various data structures). But I think it's a good pattern to use a safe multiply so I changed it. http://gerrit.cloudera.org:8080/#/c/13993/14/testdata/workloads/functional-planner/queries/PlannerTest/disable-codegen.test File testdata/workloads/functional-planner/queries/PlannerTest/disable-codegen.test: http://gerrit.cloudera.org:8080/#/c/13993/14/testdata/workloads/functional-planner/queries/PlannerTest/disable-codegen.test@21 PS14, Line 21: HDFS partitions=24/24 files=24 size=478.45KB > it wasn't obvious to me why the HDFS prefix got added based on the changes This was a side-effect of regenerating the planner test output. The Planner tests are rigged to ignore this part of the plan output so the test output wasn't updated when these prefixes were added. -- To view, visit http://gerrit.cloudera.org:8080/13993 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I26c87525a4f75ffeb654267b89948653b2e1ff8c Gerrit-Change-Number: 13993 Gerrit-PatchSet: 14 Gerrit-Owner: Tim Armstrong <[email protected]> Gerrit-Reviewer: Aman Sinha <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Quanlong Huang <[email protected]> Gerrit-Reviewer: Tim Armstrong <[email protected]> Gerrit-Comment-Date: Tue, 21 Apr 2020 19:26:46 +0000 Gerrit-HasComments: Yes
