Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16880 )

Change subject: IMPALA-9687 Improve estimates for number of hosts in Kudu plans
......................................................................


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/16880/1/fe/src/main/java/org/apache/impala/planner/KuduScanNode.java
File fe/src/main/java/org/apache/impala/planner/KuduScanNode.java:

http://gerrit.cloudera.org:8080/#/c/16880/1/fe/src/main/java/org/apache/impala/planner/KuduScanNode.java@a284
PS1, Line 284:
I think we still want to cap numNodes_ at hostIndexSet_.size() when the Kudu 
tservers are colocated with Impala executors, because in that case the scan 
ranges will always be scheduled locally.

E.g. if you have a situation with two hosts A and B, with all tablets of the 
table on A, and Impala executors on A and B, then all the scans will be 
scheduled on A. But your new logic calculates numNodes_ = 2.

I think instead of scanRangeSpecs_.getConcrete_rangesSize(), you want to add 
together the number of unique hosts with local scan ranges and the number of 
remote scan ranges to get the maximum number of hosts.

HdfsScanNode does a locality check like this using 
ExecutorMembershipSnapshot.contains().
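
A minimal sketch of the calculation suggested above, not the actual Impala 
code. The class KuduNumNodesSketch, the ScanRange type, and estimateNumNodes() 
are hypothetical names; a plain Set<String> of executor host names stands in 
for the ExecutorMembershipSnapshot.contains() check. Capping the result at the 
number of executors and flooring it at 1 are assumptions of this sketch.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only; none of these names exist in Impala.
final class KuduNumNodesSketch {
  /** Stand-in for a scan range: the hosts that hold a replica of its tablet. */
  static final class ScanRange {
    final List<String> replicaHosts;
    ScanRange(List<String> replicaHosts) { this.replicaHosts = replicaHosts; }
  }

  /**
   * Upper bound on the number of hosts that can run scans:
   * (unique executor hosts holding a replica) + (ranges with no local replica).
   */
  static int estimateNumNodes(List<ScanRange> ranges, Set<String> executors) {
    Set<String> localHosts = new HashSet<>();
    int remoteRanges = 0;
    for (ScanRange range : ranges) {
      boolean hasLocalReplica = false;
      for (String host : range.replicaHosts) {
        // Locality check: is an Impala executor running on this host?
        if (executors.contains(host)) {
          localHosts.add(host);
          hasLocalReplica = true;
        }
      }
      // A range without any local replica may be scheduled on any executor.
      if (!hasLocalReplica) remoteRanges++;
    }
    int maxHosts = localHosts.size() + remoteRanges;
    // Assumption: never more hosts than executors, and never fewer than one.
    return Math.max(1, Math.min(maxHosts, executors.size()));
  }
}
```

With executors on A and B and every range located on A, this returns 1 
rather than 2, which matches the example above.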



--
To view, visit http://gerrit.cloudera.org:8080/16880
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I72e341597e980fb6a7e3792905b942ddf5797d03
Gerrit-Change-Number: 16880
Gerrit-PatchSet: 1
Gerrit-Owner: Akos Kovacs <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>
Gerrit-Comment-Date: Tue, 15 Dec 2020 21:23:34 +0000
Gerrit-HasComments: Yes
