Joe McDonnell has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/12037 )
Change subject: IMPALA-7928: Consistent remote read scheduling
......................................................................

IMPALA-7928: Consistent remote read scheduling

Currently, remote reads for a particular file are not scheduled to
a consistent set of nodes. This reduces the efficiency of the HDFS
file handle cache.

This modifies the scheduling of remote reads to limit the number of
executors considered when picking an executor for a remote scan range.
The remote executor candidates are generated by hashing the
filename+offset multiple times and finding the closest nodes in a
hash ring. This is a consistent hash that is designed to limit the
number of files remapped when cluster nodes come and go. The number of
remote executor candidates is controlled by a query option
'num_remote_executor_candidates', which defaults to 3. It is capped
at 16.

Once the remote executor candidates are chosen, the algorithm for
picking a specific replica uses the same algorithm as picking a local
replica. It picks the node with the minimum number of assigned bytes
and uses 'schedule_random_replica' to determine how to break ties.

This leaves the normal algorithms in place for local files, Kudu, and
HBase. If 'num_remote_executor_candidates' is set to 0, the existing
remote scheduling algorithm is used. The existing algorithm schedules
remote scan ranges on all available executors.

Testing:
 - There is a new hash-ring-test and related tests in scheduler-test.
 - There is a utility (hash-ring-util) in experiments for hand tuning
   the hash ring.

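The candidate selection described above can be sketched roughly as follows. This is a minimal illustration, not the code in hash-ring.cc or scheduler.cc: the names ToyHashRing, GetCandidates, PickLeastLoaded, and kMaxCandidates, the FNV-1a hash, and the number of ring points per node are all assumptions made for the example. It also simplifies two things: repeated hashes that land on the same node are collapsed, so fewer than the requested number of candidates may be returned, and ties on assigned bytes go to the first candidate rather than consulting 'schedule_random_replica'.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <unordered_map>
#include <vector>

// Mirrors the cap of 16 stated in the commit message (the constant name is hypothetical).
constexpr int kMaxCandidates = 16;

// Hypothetical toy ring for illustration; this is not Impala's HashRing class.
class ToyHashRing {
 public:
  explicit ToyHashRing(int points_per_node) : points_per_node_(points_per_node) {
    assert(points_per_node > 0);
  }

  // Place the node at several points on the ring to spread load more evenly.
  void AddNode(const std::string& node) {
    for (int i = 0; i < points_per_node_; ++i) {
      ring_[Fnv1a(node + "#" + std::to_string(i))] = node;
    }
  }

  // Remove every ring point owned by the node.
  void RemoveNode(const std::string& node) {
    for (auto it = ring_.begin(); it != ring_.end();) {
      if (it->second == node) it = ring_.erase(it); else ++it;
    }
  }

  // Hash filename+offset several times; each hash selects the next node
  // clockwise on the ring. Duplicate nodes are collapsed.
  std::vector<std::string> GetCandidates(const std::string& filename,
                                         int64_t offset, int num_candidates) const {
    std::vector<std::string> out;
    if (ring_.empty()) return out;
    const int n = std::min(num_candidates, kMaxCandidates);
    const std::string key = filename + ":" + std::to_string(offset);
    for (int i = 0; i < n; ++i) {
      auto it = ring_.lower_bound(Fnv1a(key + "/" + std::to_string(i)));
      if (it == ring_.end()) it = ring_.begin();  // wrap around the ring
      if (std::find(out.begin(), out.end(), it->second) == out.end()) {
        out.push_back(it->second);
      }
    }
    return out;
  }

 private:
  // 64-bit FNV-1a, chosen here only because it is deterministic across platforms.
  static uint64_t Fnv1a(const std::string& s) {
    uint64_t h = 14695981039346656037ULL;
    for (unsigned char c : s) { h ^= c; h *= 1099511628211ULL; }
    return h;
  }

  int points_per_node_;
  std::map<uint64_t, std::string> ring_;  // ring position -> node
};

// Pick the candidate with the fewest assigned bytes. Nodes without an entry
// count as 0 bytes; ties go to the earliest candidate in this sketch.
std::string PickLeastLoaded(const std::vector<std::string>& candidates,
                            const std::unordered_map<std::string, int64_t>& assigned_bytes) {
  assert(!candidates.empty());
  auto bytes = [&](const std::string& n) {
    auto it = assigned_bytes.find(n);
    return it == assigned_bytes.end() ? int64_t{0} : it->second;
  };
  return *std::min_element(candidates.begin(), candidates.end(),
      [&](const std::string& a, const std::string& b) { return bytes(a) < bytes(b); });
}
```

In this sketch, calling GetCandidates twice with the same filename and offset returns the same list, and removing a node that is not in that list leaves the list unchanged. That is the property that keeps reads of a given file on a stable set of executors as the cluster membership changes.
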
Change-Id: Icbf74088a8bd8c285ab7285ea3a01acd1bb53a45
Reviewed-on: http://gerrit.cloudera.org:8080/12037
Reviewed-by: Joe McDonnell <[email protected]>
Tested-by: Impala Public Jenkins <[email protected]>
---
M be/src/experiments/CMakeLists.txt
A be/src/experiments/hash-ring-util.cc
M be/src/scheduling/CMakeLists.txt
M be/src/scheduling/backend-config.cc
M be/src/scheduling/backend-config.h
A be/src/scheduling/hash-ring-test.cc
A be/src/scheduling/hash-ring.cc
A be/src/scheduling/hash-ring.h
M be/src/scheduling/scheduler-test-util.cc
M be/src/scheduling/scheduler-test-util.h
M be/src/scheduling/scheduler-test.cc
M be/src/scheduling/scheduler.cc
M be/src/scheduling/scheduler.h
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaInternalService.thrift
M common/thrift/ImpalaService.thrift
17 files changed, 823 insertions(+), 16 deletions(-)

Approvals:
  Joe McDonnell: Looks good to me, approved
  Impala Public Jenkins: Verified

--
To view, visit http://gerrit.cloudera.org:8080/12037
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: Icbf74088a8bd8c285ab7285ea3a01acd1bb53a45
Gerrit-Change-Number: 12037
Gerrit-PatchSet: 10
Gerrit-Owner: Joe McDonnell <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Joe McDonnell <[email protected]>
Gerrit-Reviewer: Lars Volker <[email protected]>
Gerrit-Reviewer: Michael Ho <[email protected]>
Gerrit-Reviewer: Philip Zeyliger <[email protected]>
