Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/16880 )
Change subject: IMPALA-9687 Improve estimates for number of hosts in Kudu plans
......................................................................

Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/16880/1/fe/src/main/java/org/apache/impala/planner/KuduScanNode.java
File fe/src/main/java/org/apache/impala/planner/KuduScanNode.java:

http://gerrit.cloudera.org:8080/#/c/16880/1/fe/src/main/java/org/apache/impala/planner/KuduScanNode.java@a284
PS1, Line 284:
I think we still want to cap numNodes_ at hostIndexSet_.size() when the kudu tservers are colocated with Impala executors, because the scan ranges will always be scheduled locally.

E.g. if you have a situation with two hosts A and B, with all tablets of the table on A, and Impala executors on A and B, then all the scans will be scheduled on A. But your new logic calculates numNodes_ = 2.

I think instead of scanRangeSpecs_.getConcrete_rangesSize(), you want to add together the # of unique hosts with local scan ranges and the number of remote scan ranges to get the maximum number of hosts. HdfsScanNode does a locality check like this using ExecutorMembershipSnapshot.contains().

--
To view, visit http://gerrit.cloudera.org:8080/16880
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I72e341597e980fb6a7e3792905b942ddf5797d03
Gerrit-Change-Number: 16880
Gerrit-PatchSet: 1
Gerrit-Owner: Akos Kovacs <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>
Gerrit-Comment-Date: Tue, 15 Dec 2020 21:23:34 +0000
Gerrit-HasComments: Yes
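
For illustration, the estimate suggested in the comment on line 284 could be sketched roughly as follows. This is a standalone sketch, not the actual Impala planner code: the class NumNodesEstimate, the method estimateNumNodes, and the plain String host names are hypothetical stand-ins. The Set of executors plays the role that ExecutorMembershipSnapshot.contains() plays in the comment, and the final clamping (to the executor count, and to at least 1) is an assumption.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical standalone sketch; none of these names exist in Impala.
class NumNodesEstimate {
  /**
   * scanRangeReplicas: one entry per scan range, listing the hosts that hold a
   * replica of that range. executors: hosts that run an Impala executor.
   */
  static int estimateNumNodes(List<List<String>> scanRangeReplicas,
      Set<String> executors) {
    Set<String> localHosts = new HashSet<>();
    int numLocalRanges = 0;
    int numRemoteRanges = 0;
    for (List<String> replicas : scanRangeReplicas) {
      boolean anyLocal = false;
      for (String host : replicas) {
        // Locality check: is a replica colocated with an executor?
        if (executors.contains(host)) {
          localHosts.add(host);
          anyLocal = true;
        }
      }
      if (anyLocal) {
        numLocalRanges++;
      } else {
        numRemoteRanges++;
      }
    }
    // Local ranges can occupy at most one node per unique local host.
    int numLocalNodes = Math.min(numLocalRanges, localHosts.size());
    // Remote ranges can be spread over at most all executors (assumption).
    int numRemoteNodes = Math.min(numRemoteRanges, executors.size());
    int total = Math.min(executors.size(), numLocalNodes + numRemoteNodes);
    // Assumption: never report fewer than one node.
    return Math.max(total, 1);
  }

  public static void main(String[] args) {
    // The scenario from the comment: all tablets on A, executors on A and B.
    List<List<String>> ranges = List.of(List.of("A"), List.of("A"), List.of("A"));
    System.out.println(estimateNumNodes(ranges, Set.of("A", "B")));
  }
}
```

With the two-host scenario from the comment (all tablets on A, executors on A and B), this sketch yields 1 rather than 2, because only A holds local scan ranges.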
