Lars Volker has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12037 )

Change subject: IMPALA-7928: Consistent remote read scheduling
......................................................................


Patch Set 5:

(42 comments)

http://gerrit.cloudera.org:8080/#/c/12037/5//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/12037/5//COMMIT_MSG@15
PS5, Line 15: simluated
nit: typo


http://gerrit.cloudera.org:8080/#/c/12037/5/be/src/experiments/hash-ring-util.cc
File be/src/experiments/hash-ring-util.cc:

http://gerrit.cloudera.org:8080/#/c/12037/5/be/src/experiments/hash-ring-util.cc@39
PS5, Line 39: hashspace
nit: hashspace or hash space (line below)


http://gerrit.cloudera.org:8080/#/c/12037/5/be/src/experiments/hash-ring-util.cc@44
PS5, Line 44: HashRingUtil
Can you think of a more descriptive name? HashRingConsistencyCheck or similar?


http://gerrit.cloudera.org:8080/#/c/12037/5/be/src/scheduling/backend-config.h
File be/src/scheduling/backend-config.h:

http://gerrit.cloudera.org:8080/#/c/12037/5/be/src/scheduling/backend-config.h@33
PS5, Line 33: /// hostnames to IP addresses.
Maybe mention the hash ring in the class comment, too?


http://gerrit.cloudera.org:8080/#/c/12037/5/be/src/scheduling/backend-config.h@83
PS5, Line 83:   HashRing backend_ip_hashring_;
nit: hash_ring? Or Hashring?


http://gerrit.cloudera.org:8080/#/c/12037/5/be/src/scheduling/backend-config.cc
File be/src/scheduling/backend-config.cc:

http://gerrit.cloudera.org:8080/#/c/12037/5/be/src/scheduling/backend-config.cc@23
PS5, Line 23: static const uint32_t NUM_HASHRING_REPLICAS = 25;
How did you pick 25? Does this number have to have any special properties, e.g. 
not be a power of 2?


http://gerrit.cloudera.org:8080/#/c/12037/5/be/src/scheduling/hash-ring-test.cc
File be/src/scheduling/hash-ring-test.cc:

http://gerrit.cloudera.org:8080/#/c/12037/5/be/src/scheduling/hash-ring-test.cc@42
PS5, Line 42:   void VerifyTotalAllocation(const vector<TNetworkAddress>& 
addresses,
I think these methods could use a comment, some of them weren't clear to me 
from their names alone.


http://gerrit.cloudera.org:8080/#/c/12037/5/be/src/scheduling/hash-ring.h
File be/src/scheduling/hash-ring.h:

http://gerrit.cloudera.org:8080/#/c/12037/5/be/src/scheduling/hash-ring.h@31
PS5, Line 31: TNetworkAddress
For the purpose of scheduling we use IP addresses because whether a read is 
local or remote does not depend on the port. Should we use the same type here, 
too?


http://gerrit.cloudera.org:8080/#/c/12037/5/be/src/scheduling/hash-ring.h@45
PS5, Line 45: /// node is remapped. This allows for bounded disruption.
Should we quantify the bound?


http://gerrit.cloudera.org:8080/#/c/12037/5/be/src/scheduling/hash-ring.h@64
PS5, Line 64:     for (auto it = hash_ring.node_hashmap_.begin();
nit: you could use a range-based for loop, here and elsewhere.


http://gerrit.cloudera.org:8080/#/c/12037/5/be/src/scheduling/hash-ring.h@76
PS5, Line 76:   HashRing(HashRing&&) = delete;
Deleting a ctor seems uncommon enough to warrant a comment.


http://gerrit.cloudera.org:8080/#/c/12037/5/be/src/scheduling/hash-ring.h@97
PS5, Line 97:   /// this TNetworkAddress. This is useful for examining how the 
hash space is balanced
Can you include in the comment how the portion is represented in the int64_t, 
i.e. that it is sum of all range sizes?


http://gerrit.cloudera.org:8080/#/c/12037/5/be/src/scheduling/hash-ring.h@99
PS5, Line 99:   void GetDistributionMap(std::map<TNetworkAddress, uint64_t>& 
distribution_map) const;
Pass output parameter by pointer.


http://gerrit.cloudera.org:8080/#/c/12037/5/be/src/scheduling/hash-ring.h@106
PS5, Line 106:   std::set<TNetworkAddress> nodes_;
Should we do this in the backend config, possibly in a vector, and then just 
store indexes in this class? The scheduler's performance also suffers from the 
prolific use of strings and network addresses in various maps and could benefit 
from such a mapping in BackendConfig.


http://gerrit.cloudera.org:8080/#/c/12037/5/be/src/scheduling/hash-ring.h@111
PS5, Line 111:   /// of the two underlying TNetworkAddress's. Collisions should 
be rare, so this will
This looks like the birthday paradox and collisions might be more common that 
intuition suggests. Hashing 372 nodes 25 times each to 32bit would have a >1% 
probability of a collision. Please also see my comment in the implementation of 
AddNode on whether a collision of a node means that the subsequent hashes would 
also collide.


http://gerrit.cloudera.org:8080/#/c/12037/5/be/src/scheduling/hash-ring.h@113
PS5, Line 113:   std::map<uint32_t, NodeIterator> node_hashmap_;
It seems a bit confusing that a ordered map is called hash_map. Can you think 
of a different name, e.g. hash_to_node_?


http://gerrit.cloudera.org:8080/#/c/12037/5/be/src/scheduling/hash-ring.cc
File be/src/scheduling/hash-ring.cc:

http://gerrit.cloudera.org:8080/#/c/12037/5/be/src/scheduling/hash-ring.cc@57
PS5, Line 57:       node_hashmap_[hash_val] = node_it;
If there's a collision, then another node hashed to the same hash_val, right? 
Does that mean that subsequent hashes returned by the hashgenerator would also 
be the same? Does one node effectively kick out the other altogether then?


http://gerrit.cloudera.org:8080/#/c/12037/5/be/src/scheduling/hash-ring.cc@83
PS5, Line 83:   DCHECK_GT(node_hashmap_.size(), 0);
!node_hashmap.empty()


http://gerrit.cloudera.org:8080/#/c/12037/5/be/src/scheduling/hash-ring.cc@104
PS5, Line 104:     TNetworkAddress addr = *(it->second);
nit: consistent naming, here vs ip_addr below


http://gerrit.cloudera.org:8080/#/c/12037/5/be/src/scheduling/hash-ring.cc@107
PS5, Line 107:     if (distribution_map.find(addr) != distribution_map.end()) {
             :       distribution_map[addr] += range;
             :     } else {
             :       distribution_map[addr] = range;
             :     }
I think the map default-initializes when using [], so you can just do map[addr] 
+= range without the check.


http://gerrit.cloudera.org:8080/#/c/12037/5/be/src/scheduling/hash-ring.cc@115
PS5, Line 115: UINT_MAX
If you use (1<<32) here, then the two parts of the bucket that wraps from 
UINT_MAX back to 0 should add up to the correct number of elements, no?


http://gerrit.cloudera.org:8080/#/c/12037/5/be/src/scheduling/scheduler-test-util.h
File be/src/scheduling/scheduler-test-util.h:

http://gerrit.cloudera.org:8080/#/c/12037/5/be/src/scheduling/scheduler-test-util.h@271
PS5, Line 271:     query_options_.num_simulated_remote_replicas = num;
nit: move to .cc file if it doesn't fit into a single line, like 
SetReplicaPreference() above.


http://gerrit.cloudera.org:8080/#/c/12037/5/be/src/scheduling/scheduler-test-util.cc
File be/src/scheduling/scheduler-test-util.cc:

http://gerrit.cloudera.org:8080/#/c/12037/5/be/src/scheduling/scheduler-test-util.cc@531
PS5, Line 531:   return host.ip + ":" + std::to_string(host.be_port);
I'm surprised this is necessary, are the IP addresses generated by 
HostIdxToIpAddr not unique?


http://gerrit.cloudera.org:8080/#/c/12037/5/be/src/scheduling/scheduler-test.cc
File be/src/scheduling/scheduler-test.cc:

http://gerrit.cloudera.org:8080/#/c/12037/5/be/src/scheduling/scheduler-test.cc@220
PS5, Line 220:   for (int i = 0; i < num_data_nodes; ++i) 
cluster.AddHost(false, true);
There's also cluster.AddHosts which does the same as these two loops


http://gerrit.cloudera.org:8080/#/c/12037/5/be/src/scheduling/scheduler-test.cc@229
PS5, Line 229:   for (int32_t num_replicas = 1; num_replicas <= 
num_impala_nodes + 1; num_replicas++) {
nit: use int and pre-increment to stay consistent within the file.


http://gerrit.cloudera.org:8080/#/c/12037/5/be/src/scheduling/scheduler-test.cc@248
PS5, Line 248: all blocks
The file has 3 blocks, isn't it all scan ranges that are assigned to the same 
backend?


http://gerrit.cloudera.org:8080/#/c/12037/5/be/src/scheduling/scheduler-test.cc@293
PS5, Line 293:   }
you could assert that the number of backends is 1 less after the loop or that 
at least RemoveBackend was called once


http://gerrit.cloudera.org:8080/#/c/12037/5/be/src/scheduling/scheduler.h
File be/src/scheduling/scheduler.h:

http://gerrit.cloudera.org:8080/#/c/12037/5/be/src/scheduling/scheduler.h@189
PS5, Line 189: simulated_remote_replicas
When I first read this it wasn't entirely clear to me from the comment what a 
simulated replica is. Would a name like "GetRemoteReplicaCandidates" describe 
the functionality, too?


http://gerrit.cloudera.org:8080/#/c/12037/5/be/src/scheduling/scheduler.h@191
PS5, Line 191:     /// the file name from 'hdfs_file_split' multiple times and 
look up the closest
Have you considered passing the offset into the hash, too? Would that increase 
parallelism for large files?


http://gerrit.cloudera.org:8080/#/c/12037/5/be/src/scheduling/scheduler.h@198
PS5, Line 198:         int32_t num_remote_replicas, vector<IpAddr>& 
simulated_remote_replicas);
We usually pass output parameters by pointer


http://gerrit.cloudera.org:8080/#/c/12037/5/be/src/scheduling/scheduler.cc
File be/src/scheduling/scheduler.cc:

http://gerrit.cloudera.org:8080/#/c/12037/5/be/src/scheduling/scheduler.cc@708
PS5, Line 708:           num_simulated_remote_replicas,
nit: line wrapping


http://gerrit.cloudera.org:8080/#/c/12037/5/be/src/scheduling/scheduler.cc@904
PS5, Line 904:     const THdfsFileSplit* hdfs_file_split, int32_t 
num_remote_replicas,
I think we usually just use "int" instead of int32_t, here and elsewhere.


http://gerrit.cloudera.org:8080/#/c/12037/5/be/src/scheduling/scheduler.cc@907
PS5, Line 907:   DCHECK_EQ(simulated_remote_replicas.size(), 0);
You could call reserve(num_remote_replicas) on simulated_remote_replicas, 
though it probably won't matter much.


http://gerrit.cloudera.org:8080/#/c/12037/5/be/src/scheduling/scheduler.cc@914
PS5, Line 914:   set<TNetworkAddress> distinct_backends;
Does the algorithm require the set to be ordered?


http://gerrit.cloudera.org:8080/#/c/12037/5/be/src/scheduling/scheduler.cc@919
PS5, Line 919:   uint32_t max_iterations = num_remote_replicas * 2;
We usually don't use unsigned types per convention unless required by some 
interface. "int" should work here, too.


http://gerrit.cloudera.org:8080/#/c/12037/5/be/src/scheduling/scheduler.cc@920
PS5, Line 920:   for (uint32_t i = 0; i < max_iterations; i++) {
I think we usually use pre-increment.


http://gerrit.cloudera.org:8080/#/c/12037/5/be/src/scheduling/scheduler.cc@930
PS5, Line 930:   for (auto it = distinct_backends.begin(); it != 
distinct_backends.end(); it++) {
nit: use a range based loop?


http://gerrit.cloudera.org:8080/#/c/12037/5/be/src/scheduling/scheduler.cc@931
PS5, Line 931:     const TBackendDescriptor *be_desc = 
executors_config_.LookUpBackendDesc(*it);
nit: move space to right side of *


http://gerrit.cloudera.org:8080/#/c/12037/5/be/src/service/query-options.h
File be/src/service/query-options.h:

http://gerrit.cloudera.org:8080/#/c/12037/5/be/src/service/query-options.h@149
PS5, Line 149:       TQueryOptionLevel::DEVELOPMENT)\
I would prefer "advanced" since this change sets a value of 3 which will change 
the existing behavior. Getting the current behavior back would then require to 
change a development option, which sounds more fragile and unstable than it is.


http://gerrit.cloudera.org:8080/#/c/12037/5/be/src/util/hash-util.h
File be/src/util/hash-util.h:

http://gerrit.cloudera.org:8080/#/c/12037/5/be/src/util/hash-util.h@282
PS5, Line 282: This is intended to be a slight improvement on that.
Can you mention the benefits of doing that in the comment?


http://gerrit.cloudera.org:8080/#/c/12037/5/common/thrift/ImpalaService.thrift
File common/thrift/ImpalaService.thrift:

http://gerrit.cloudera.org:8080/#/c/12037/5/common/thrift/ImpalaService.thrift@355
PS5, Line 355:   // The number of simulated replicas to use when scheduling 
remote HDFS ranges.
The word "simulated" feels unintuitive to me, how about 
NUM_REMOTE_READ_CANDIDATES or NUM_REMOTE_SCANNER_CANDIDATES or something 
similar?


http://gerrit.cloudera.org:8080/#/c/12037/5/common/thrift/ImpalaService.thrift@357
PS5, Line 357: will
... that can read ...?



--
To view, visit http://gerrit.cloudera.org:8080/12037
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Icbf74088a8bd8c285ab7285ea3a01acd1bb53a45
Gerrit-Change-Number: 12037
Gerrit-PatchSet: 5
Gerrit-Owner: Joe McDonnell <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Joe McDonnell <[email protected]>
Gerrit-Reviewer: Lars Volker <[email protected]>
Gerrit-Reviewer: Michael Ho <[email protected]>
Gerrit-Reviewer: Philip Zeyliger <[email protected]>
Gerrit-Comment-Date: Fri, 01 Feb 2019 18:55:20 +0000
Gerrit-HasComments: Yes

Reply via email to