Joe McDonnell has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12037 )

Change subject: IMPALA-7928: Consistent remote read scheduling
......................................................................


Patch Set 3:

(6 comments)

Adding some comments to indicate what changes I'm making for the next upload.

http://gerrit.cloudera.org:8080/#/c/12037/3/be/src/scheduling/hash-ring.h
File be/src/scheduling/hash-ring.h:

http://gerrit.cloudera.org:8080/#/c/12037/3/be/src/scheduling/hash-ring.h@72
PS3, Line 72:   void GetDistributionMap(std::map<IpAddr, uint32_t>& 
distribution_map) {
> This is public only for tests, right? Make it private and use a friend clas
Making this private with friend class for tests.


http://gerrit.cloudera.org:8080/#/c/12037/3/be/src/scheduling/hash-ring.h@115
PS3, Line 115:   // collisions and is arbitrary about how it resolves them.
> Is this consistent, though? i.e., do we always get the same schedule for th
It will depend on the order that the nodes are added. We could make this 
completely consistent by doing a set of pairs.


http://gerrit.cloudera.org:8080/#/c/12037/3/be/src/scheduling/scheduler.cc
File be/src/scheduling/scheduler.cc:

http://gerrit.cloudera.org:8080/#/c/12037/3/be/src/scheduling/scheduler.cc@876
PS3, Line 876:   executor_ips_hashring_ = make_unique<HashRing>(&executor_ips, 
NUM_HASHRING_REPLICAS);
> I'm not certain, but it seems like executor_ips_hashring_ needs to only be
I'm moving the hashring to the BackendConfig, and it will only be recomputed on 
changes to the nodes. BackendConfig has AddBackend() and RemoveBackend() 
functions that are a good place to do the hashring manipulations.


http://gerrit.cloudera.org:8080/#/c/12037/3/be/src/scheduling/scheduler.cc@919
PS3, Line 919:   // Two different hashes of the filename can result in the same 
executor.
             :   // The function should return distinct executors, so it may 
need to do more hashes
             :   // than 'num_remote_replicas'.
> We have some flexibility here. I think we're safer to limit ourselves to a
I'm going to limit the maximum number of iterations.


http://gerrit.cloudera.org:8080/#/c/12037/3/be/src/scheduling/scheduler.cc@923
PS3, Line 923:   // We need to generate multiple distinct hashes of the 
filename. This generates an
             :   // initial hash from the filename and separator. It uses this 
as a seed to a
             :   // linear congruential engine. It generates subsequent hash 
variants by drawing from
             :   // the linear congruential engine and incorporating that into 
the existing hash.
             :   // TODO: there are likely better ways to produce distinct 
hashes
             :   uint32_t hash = 
HashUtil::Hash(hdfs_file_split->file_name.data(),
             :       hdfs_file_split->file_name.length(), 0);
> I've seen this comment twice now, right? Should we create
I'm moving this logic to HashUtil.


http://gerrit.cloudera.org:8080/#/c/12037/3/be/src/service/query-options.cc
File be/src/service/query-options.cc:

http://gerrit.cloudera.org:8080/#/c/12037/3/be/src/service/query-options.cc@731
PS3, Line 731:         if (result != StringParser::PARSE_SUCCESS || 
num_simulated_remote_replicas < 0) {
> Let's create a maximum here too. I think with the current implementation, s
I am adding a maximum. It doesn't really make sense to set this very high.

There are two types of replication. One is how many times we replicate the 
backend inside the hashring. This is a constant and won't interact with this. 
The other is how many times we would probe the hashring to get unique backends. 
That is what this tunes.



--
To view, visit http://gerrit.cloudera.org:8080/12037
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Icbf74088a8bd8c285ab7285ea3a01acd1bb53a45
Gerrit-Change-Number: 12037
Gerrit-PatchSet: 3
Gerrit-Owner: Joe McDonnell <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Joe McDonnell <[email protected]>
Gerrit-Reviewer: Lars Volker <[email protected]>
Gerrit-Reviewer: Michael Ho <[email protected]>
Gerrit-Reviewer: Philip Zeyliger <[email protected]>
Gerrit-Comment-Date: Tue, 29 Jan 2019 19:11:41 +0000
Gerrit-HasComments: Yes

Reply via email to