Philip Zeyliger has posted comments on this change. ( http://gerrit.cloudera.org:8080/12037 )
Change subject: IMPALA-7928: Consistent remote read scheduling ...................................................................... Patch Set 3: (11 comments) thanks! I think the consistent hashing makes sense. Substantively, we should see if it makes sense to tie the life cycle of the hash-ring to when we add/remove nodes, because that's the only time it changes. http://gerrit.cloudera.org:8080/#/c/12037/3/be/src/experiments/hash-ring-test.cc File be/src/experiments/hash-ring-test.cc: http://gerrit.cloudera.org:8080/#/c/12037/3/be/src/experiments/hash-ring-test.cc@48 PS3, Line 48: IpAddr HostIdxToIpAddr(int host_idx) { Can you add a comment about that this does with host_idx 0 or 1? http://gerrit.cloudera.org:8080/#/c/12037/3/be/src/experiments/hash-ring-test.cc@101 PS3, Line 101: int main(int argc, char ** argv) { This is in experiments/ rather than an actual test. Is something actually testing this code in a unit test fashion? Should we check that the expected numbers here are ok? http://gerrit.cloudera.org:8080/#/c/12037/3/be/src/scheduling/hash-ring.h File be/src/scheduling/hash-ring.h: http://gerrit.cloudera.org:8080/#/c/12037/3/be/src/scheduling/hash-ring.h@41 PS3, Line 41: /// When nodes are added or removed, only the hash space in the immediate vacinity of the vicinity http://gerrit.cloudera.org:8080/#/c/12037/3/be/src/scheduling/hash-ring.h@72 PS3, Line 72: void GetDistributionMap(std::map<IpAddr, uint32_t>& distribution_map) { This is public only for tests, right? Make it private and use a friend class? http://gerrit.cloudera.org:8080/#/c/12037/3/be/src/scheduling/hash-ring.h@115 PS3, Line 115: // collisions and is arbitrary about how it resolves them. Is this consistent, though? i.e., do we always get the same schedule for the same inputs? Actually, I read: The order of the key-value pairs whose keys compare equivalent is the order of insertion and does not change. (since C++11) at https://en.cppreference.com/w/cpp/container/multimap http://gerrit.cloudera.org:8080/#/c/12037/3/be/src/scheduling/scheduler.cc File be/src/scheduling/scheduler.cc: http://gerrit.cloudera.org:8080/#/c/12037/3/be/src/scheduling/scheduler.cc@876 PS3, Line 876: executor_ips_hashring_ = make_unique<HashRing>(&executor_ips, NUM_HASHRING_REPLICAS); I'm not certain, but it seems like executor_ips_hashring_ needs to only be computed when "BackendConfig executor_config" changes, and it looks like it changes infrequently (because cluster membership changes infrequently). The work here is probably "not that much", but, nonetheless, it seems worthwhile to cache it. It looks like UpdateMembership() in this class matches the lifecycle of when we want to do this. http://gerrit.cloudera.org:8080/#/c/12037/3/be/src/scheduling/scheduler.cc@919 PS3, Line 919: // Two different hashes of the filename can result in the same executor. : // The function should return distinct executors, so it may need to do more hashes : // than 'num_remote_replicas'. We have some flexibility here. I think we're safer to limit ourselves to a limited number of hashes. There's no real harm if a block only has 1 replica. I think you've taken care of the degenerate case (the number of replicas is more than executors), but we could add at least a DCHECK to avoid infinite iterations. (Or simply bail after "i > num_remote_replicas + 3") http://gerrit.cloudera.org:8080/#/c/12037/3/be/src/scheduling/scheduler.cc@923 PS3, Line 923: // We need to generate multiple distinct hashes of the filename. This generates an : // initial hash from the filename and separator. It uses this as a seed to a : // linear congruential engine. It generates subsequent hash variants by drawing from : // the linear congruential engine and incorporating that into the existing hash. : // TODO: there are likely better ways to produce distinct hashes : uint32_t hash = HashUtil::Hash(hdfs_file_split->file_name.data(), : hdfs_file_split->file_name.length(), 0); I've seen this comment twice now, right? Should we create HashUtil::HashNTimes(data, bytes, seed, count) of some sort? I'm confused by the following, btw, in hash-util.h: /// TODO: crc32 hashes with different seeds do not result in different hash functions. http://gerrit.cloudera.org:8080/#/c/12037/3/be/src/scheduling/scheduler.cc@933 PS3, Line 933: for (uint32_t i = 0; distinct_ipaddrs.size() != num_remote_replicas; i++) { i is unused, I think. perhaps rename to "unused" and drop the "i++", to make that move obvious? http://gerrit.cloudera.org:8080/#/c/12037/3/be/src/service/query-options.cc File be/src/service/query-options.cc: http://gerrit.cloudera.org:8080/#/c/12037/3/be/src/service/query-options.cc@731 PS3, Line 731: if (result != StringParser::PARSE_SUCCESS || num_simulated_remote_replicas < 0) { Let's create a maximum here too. I think with the current implementation, settings to 2^60 will probably hang the query forever and/or run us out of memory. http://gerrit.cloudera.org:8080/#/c/12037/3/common/thrift/ImpalaService.thrift File common/thrift/ImpalaService.thrift: http://gerrit.cloudera.org:8080/#/c/12037/3/common/thrift/ImpalaService.thrift@354 PS3, Line 354: : // The number of simulated replicas to use when scheduling remote HDFS ranges. : // Simulated replicas assign remote files to a consistent set of nodes, providing a : // simulated type of locality. Restricting the number of nodes that will read a single : // file increases the efficiency of file-related caches (e.g. the HDFS file handle : // cache). It is designed to avoid changing file to node mappings when nodes come : // or go. If set to 0, simulated remote replicas are disabled and remote ranges can be : // scheduled on any node. : NUM_SIMULATED_REMOTE_REPLICAS, This is marked advanced and doesn't show up to the user, right? I think it's a bug if we end up needing to tweak this. -- To view, visit http://gerrit.cloudera.org:8080/12037 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Icbf74088a8bd8c285ab7285ea3a01acd1bb53a45 Gerrit-Change-Number: 12037 Gerrit-PatchSet: 3 Gerrit-Owner: Joe McDonnell <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Joe McDonnell <[email protected]> Gerrit-Reviewer: Lars Volker <[email protected]> Gerrit-Reviewer: Michael Ho <[email protected]> Gerrit-Reviewer: Philip Zeyliger <[email protected]> Gerrit-Comment-Date: Fri, 11 Jan 2019 19:39:14 +0000 Gerrit-HasComments: Yes
