Joe McDonnell has posted comments on this change. ( http://gerrit.cloudera.org:8080/12037 )
Change subject: IMPALA-7928: Consistent remote read scheduling ...................................................................... Patch Set 3: (6 comments) Adding some comments to indicate what changes I'm making for the next upload. http://gerrit.cloudera.org:8080/#/c/12037/3/be/src/scheduling/hash-ring.h File be/src/scheduling/hash-ring.h: http://gerrit.cloudera.org:8080/#/c/12037/3/be/src/scheduling/hash-ring.h@72 PS3, Line 72: void GetDistributionMap(std::map<IpAddr, uint32_t>& distribution_map) { > This is public only for tests, right? Make it private and use a friend clas Making this private with friend class for tests. http://gerrit.cloudera.org:8080/#/c/12037/3/be/src/scheduling/hash-ring.h@115 PS3, Line 115: // collisions and is arbitrary about how it resolves them. > Is this consistent, though? i.e., do we always get the same schedule for th It will depend on the order that the nodes are added. We could make this completely consistent by doing a set of pairs. http://gerrit.cloudera.org:8080/#/c/12037/3/be/src/scheduling/scheduler.cc File be/src/scheduling/scheduler.cc: http://gerrit.cloudera.org:8080/#/c/12037/3/be/src/scheduling/scheduler.cc@876 PS3, Line 876: executor_ips_hashring_ = make_unique<HashRing>(&executor_ips, NUM_HASHRING_REPLICAS); > I'm not certain, but it seems like executor_ips_hashring_ needs to only be I'm moving the hashring to the BackendConfig, and it will only be recomputed on changes to the nodes. BackendConfig has AddBackend() and RemoveBackend() functions that are a good place to do the hashring manipulations. http://gerrit.cloudera.org:8080/#/c/12037/3/be/src/scheduling/scheduler.cc@919 PS3, Line 919: // Two different hashes of the filename can result in the same executor. : // The function should return distinct executors, so it may need to do more hashes : // than 'num_remote_replicas'. > We have some flexibility here. I think we're safer to limit ourselves to a I'm going to limit the maximum number of iterations. http://gerrit.cloudera.org:8080/#/c/12037/3/be/src/scheduling/scheduler.cc@923 PS3, Line 923: // We need to generate multiple distinct hashes of the filename. This generates an : // initial hash from the filename and separator. It uses this as a seed to a : // linear congruential engine. It generates subsequent hash variants by drawing from : // the linear congruential engine and incorporating that into the existing hash. : // TODO: there are likely better ways to produce distinct hashes : uint32_t hash = HashUtil::Hash(hdfs_file_split->file_name.data(), : hdfs_file_split->file_name.length(), 0); > I've seen this comment twice now, right? Should we create I'm moving this logic to HashUtil. http://gerrit.cloudera.org:8080/#/c/12037/3/be/src/service/query-options.cc File be/src/service/query-options.cc: http://gerrit.cloudera.org:8080/#/c/12037/3/be/src/service/query-options.cc@731 PS3, Line 731: if (result != StringParser::PARSE_SUCCESS || num_simulated_remote_replicas < 0) { > Let's create a maximum here too. I think with the current implementation, s I am adding a maximum. It doesn't really make sense to set this very high. There are two types of replication. One is how many times we replicate the backend inside the hashring. This is a constant and won't interact with this. The other is how many times we would probe the hashring to get unique backends. That is what this tunes. -- To view, visit http://gerrit.cloudera.org:8080/12037 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Icbf74088a8bd8c285ab7285ea3a01acd1bb53a45 Gerrit-Change-Number: 12037 Gerrit-PatchSet: 3 Gerrit-Owner: Joe McDonnell <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Joe McDonnell <[email protected]> Gerrit-Reviewer: Lars Volker <[email protected]> Gerrit-Reviewer: Michael Ho <[email protected]> Gerrit-Reviewer: Philip Zeyliger <[email protected]> Gerrit-Comment-Date: Tue, 29 Jan 2019 19:11:41 +0000 Gerrit-HasComments: Yes
