Joe McDonnell has posted comments on this change. ( http://gerrit.cloudera.org:8080/12037 )
Change subject: IMPALA-7928: Consistent remote read scheduling ...................................................................... Patch Set 3: (11 comments) http://gerrit.cloudera.org:8080/#/c/12037/3/be/src/experiments/hash-ring-test.cc File be/src/experiments/hash-ring-test.cc: http://gerrit.cloudera.org:8080/#/c/12037/3/be/src/experiments/hash-ring-test.cc@48 PS3, Line 48: IpAddr HostIdxToIpAddr(int host_idx) { > Can you add a comment about that this does with host_idx 0 or 1? Due to the switch from IpAddr to TNetworkAddress, this is gone. The code now uses SchedulerTestUtil's Cluster::GetBackendAddress() for its node addresses. http://gerrit.cloudera.org:8080/#/c/12037/3/be/src/experiments/hash-ring-test.cc@101 PS3, Line 101: int main(int argc, char ** argv) { > This is in experiments/ rather than an actual test. Is something actually t Renamed this as hash-ring-util (and stays in experiments). This is for manual testing. I added hash-ring-test in the scheduling directory with several tests for the hash ring itself. That tests for collisions, max-min ratio, etc. http://gerrit.cloudera.org:8080/#/c/12037/3/be/src/scheduling/hash-ring.h File be/src/scheduling/hash-ring.h: http://gerrit.cloudera.org:8080/#/c/12037/3/be/src/scheduling/hash-ring.h@41 PS3, Line 41: /// When nodes are added or removed, only the hash space in the immediate vacinity of the > vicinity Done http://gerrit.cloudera.org:8080/#/c/12037/3/be/src/scheduling/hash-ring.h@72 PS3, Line 72: void GetDistributionMap(std::map<IpAddr, uint32_t>& distribution_map) { > Making this private with friend class for tests. This is now private with HashRingTest and HashRingUtil as friend classes. http://gerrit.cloudera.org:8080/#/c/12037/3/be/src/scheduling/hash-ring.h@115 PS3, Line 115: // collisions and is arbitrary about how it resolves them. > It will depend on the order that the nodes are added. We could make this co Resolved the consistency by switching to a map. On insert, if there is already an element for this hash value, then we use the minimum of the two TNetworkAddress's. http://gerrit.cloudera.org:8080/#/c/12037/3/be/src/scheduling/scheduler.cc File be/src/scheduling/scheduler.cc: http://gerrit.cloudera.org:8080/#/c/12037/3/be/src/scheduling/scheduler.cc@876 PS3, Line 876: executor_ips_hashring_ = make_unique<HashRing>(&executor_ips, NUM_HASHRING_REPLICAS); > I'm moving the hashring to the BackendConfig, and it will only be recompute Moved the hashring to the BackendConfig. It gets updated when nodes are added or removed. http://gerrit.cloudera.org:8080/#/c/12037/3/be/src/scheduling/scheduler.cc@919 PS3, Line 919: // Two different hashes of the filename can result in the same executor. : // The function should return distinct executors, so it may need to do more hashes : // than 'num_remote_replicas'. > I'm going to limit the maximum number of iterations. Limited the maximum iterations to 2*num_remote_replicas. http://gerrit.cloudera.org:8080/#/c/12037/3/be/src/scheduling/scheduler.cc@923 PS3, Line 923: // We need to generate multiple distinct hashes of the filename. This generates an : // initial hash from the filename and separator. It uses this as a seed to a : // linear congruential engine. It generates subsequent hash variants by drawing from : // the linear congruential engine and incorporating that into the existing hash. : // TODO: there are likely better ways to produce distinct hashes : uint32_t hash = HashUtil::Hash(hdfs_file_split->file_name.data(), : hdfs_file_split->file_name.length(), 0); > I'm moving this logic to HashUtil. This moved to HashUtil and became a generator. http://gerrit.cloudera.org:8080/#/c/12037/3/be/src/scheduling/scheduler.cc@933 PS3, Line 933: for (uint32_t i = 0; distinct_ipaddrs.size() != num_remote_replicas; i++) { > i is unused, I think. perhaps rename to "unused" and drop the "i++", to mak Limiting the maximum iterations means i is now used. http://gerrit.cloudera.org:8080/#/c/12037/3/be/src/service/query-options.cc File be/src/service/query-options.cc: http://gerrit.cloudera.org:8080/#/c/12037/3/be/src/service/query-options.cc@731 PS3, Line 731: if (result != StringParser::PARSE_SUCCESS || num_simulated_remote_replicas < 0) { > I am adding a maximum. It doesn't really make sense to set this very high. Limited this to the range [0,16] and updated the message. Verified it by hand. http://gerrit.cloudera.org:8080/#/c/12037/3/common/thrift/ImpalaService.thrift File common/thrift/ImpalaService.thrift: http://gerrit.cloudera.org:8080/#/c/12037/3/common/thrift/ImpalaService.thrift@354 PS3, Line 354: : // The number of simulated replicas to use when scheduling remote HDFS ranges. : // Simulated replicas assign remote files to a consistent set of nodes, providing a : // simulated type of locality. Restricting the number of nodes that will read a single : // file increases the efficiency of file-related caches (e.g. the HDFS file handle : // cache). It is designed to avoid changing file to node mappings when nodes come : // or go. If set to 0, simulated remote replicas are disabled and remote ranges can be : // scheduled on any node. : NUM_SIMULATED_REMOTE_REPLICAS, > This is marked advanced and doesn't show up to the user, right? I think it' Yes, when a user runs "set;", this option is not listed (even under advanced). -- To view, visit http://gerrit.cloudera.org:8080/12037 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Icbf74088a8bd8c285ab7285ea3a01acd1bb53a45 Gerrit-Change-Number: 12037 Gerrit-PatchSet: 3 Gerrit-Owner: Joe McDonnell <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Joe McDonnell <[email protected]> Gerrit-Reviewer: Lars Volker <[email protected]> Gerrit-Reviewer: Michael Ho <[email protected]> Gerrit-Reviewer: Philip Zeyliger <[email protected]> Gerrit-Comment-Date: Thu, 31 Jan 2019 04:07:41 +0000 Gerrit-HasComments: Yes
