Will Berkeley has posted comments on this change. ( http://gerrit.cloudera.org:8080/11207 )
Change subject: [master] replica selection honors placement policy ...................................................................... Patch Set 3: (29 comments) Looks pretty good. I didn't get to review all of the test cases yet but I wanted to post some feedback since it's taken me a while to start reviewing this. Sorry about that. http://gerrit.cloudera.org:8080/#/c/11207/3//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/11207/3//COMMIT_MSG@7 PS3, Line 7: [master] nit: Consider making the label [location_awareness] to be consistent with this patch's parent. http://gerrit.cloudera.org:8080/#/c/11207/3//COMMIT_MSG@9 PS3, Line 9: placement policy Would you mind adding another sentence explaining the single policy that is implemented? Just enough to cover that it is meant to be location aware and to spread replicas evenly over tablet servers. You can refer people to the design doc for the details. http://gerrit.cloudera.org:8080/#/c/11207/3//COMMIT_MSG@10 PS3, Line 10: nit: One space. http://gerrit.cloudera.org:8080/#/c/11207/3//COMMIT_MSG@13 PS3, Line 13: Added nit: This patch also adds a few... http://gerrit.cloudera.org:8080/#/c/11207/3/src/kudu/master/catalog_manager.h File src/kudu/master/catalog_manager.h: http://gerrit.cloudera.org:8080/#/c/11207/3/src/kudu/master/catalog_manager.h@524 PS3, Line 524: std::string, int Please add a short comment explaining what the key and value types are. http://gerrit.cloudera.org:8080/#/c/11207/3/src/kudu/master/catalog_manager.h@880 PS3, Line 880: online nit: I think the standard term in "live". http://gerrit.cloudera.org:8080/#/c/11207/3/src/kudu/master/catalog_manager.h@880 PS3, Line 880: grouped by location : // (as specified by 'ltd') What is 'ltd'? Looks like this comment is stale. http://gerrit.cloudera.org:8080/#/c/11207/3/src/kudu/master/catalog_manager.h@887 PS3, Line 887: PlacementPolicy* nit: Naively I expected this to be a cref because something called a PlacementPolicy would have no state. If the state necessary to do placements will live in the PlacementPolicy class, I vote for a slightly different name. Perhaps a PlacementAlgorithm? http://gerrit.cloudera.org:8080/#/c/11207/3/src/kudu/master/placement_policy.h File src/kudu/master/placement_policy.h: http://gerrit.cloudera.org:8080/#/c/11207/3/src/kudu/master/placement_policy.h@39 PS3, Line 39: at proper tablet servers on tablet servers according to some rules. http://gerrit.cloudera.org:8080/#/c/11207/3/src/kudu/master/placement_policy.h@39 PS3, Line 39: placing place http://gerrit.cloudera.org:8080/#/c/11207/3/src/kudu/master/placement_policy.h@40 PS3, Line 40: class PlacementPolicy { Please add another sentence with a brief description of the specific policy. I also think it'd be a good idea to add one more sentence stating that this class implements a specific placement policy but it could eventually be a base class for subclasses that implement other placement policies. http://gerrit.cloudera.org:8080/#/c/11207/3/src/kudu/master/placement_policy.h@49 PS3, Line 49: destructed "destroyed", or say that 'rng' must outlive of the PlacementPolicy. http://gerrit.cloudera.org:8080/#/c/11207/3/src/kudu/master/placement_policy.h@82 PS3, Line 82: // TODO (aserbin): add the reference to the document once it's in the repo nit: End with . http://gerrit.cloudera.org:8080/#/c/11207/3/src/kudu/master/placement_policy.h@100 PS3, Line 100: is Remove extra word. http://gerrit.cloudera.org:8080/#/c/11207/3/src/kudu/master/placement_policy.h@108 PS3, Line 108: : it's in constructor and cached I think if we make this member const it enforces these two things. http://gerrit.cloudera.org:8080/#/c/11207/3/src/kudu/master/placement_policy.cc File src/kudu/master/placement_policy.cc: http://gerrit.cloudera.org:8080/#/c/11207/3/src/kudu/master/placement_policy.cc@51 PS3, Line 51: the http://gerrit.cloudera.org:8080/#/c/11207/3/src/kudu/master/placement_policy.cc@51 PS3, Line 51: a http://gerrit.cloudera.org:8080/#/c/11207/3/src/kudu/master/placement_policy.cc@51 PS3, Line 51: repilcas replicas http://gerrit.cloudera.org:8080/#/c/11207/3/src/kudu/master/placement_policy.cc@52 PS3, Line 52: of Remove extra 'of'. http://gerrit.cloudera.org:8080/#/c/11207/3/src/kudu/master/placement_policy.cc@55 PS3, Line 55: unordered_map<string, int> Is this a ReplicaLocationsInfo type? http://gerrit.cloudera.org:8080/#/c/11207/3/src/kudu/master/placement_policy.cc@57 PS3, Line 57: DCHECK(it != ltd.end()); : if (it == ltd.end()) { : return numeric_limits<double>::max(); : } Is there a reason we shouldn't always crash if we break the invariant that 'location' must be a key present in 'locations_info'? http://gerrit.cloudera.org:8080/#/c/11207/3/src/kudu/master/placement_policy.cc@62 PS3, Line 62: DCHECK(!ts_descriptors.empty()); Same thing here. Shouldn't every location we consider here have at least one live tablet server? Otherwise, there's a bug somewhere. http://gerrit.cloudera.org:8080/#/c/11207/3/src/kudu/master/placement_policy.cc@77 PS3, Line 77: so both location and : // replica selection code could be unified +1 http://gerrit.cloudera.org:8080/#/c/11207/3/src/kudu/master/placement_policy.cc@81 PS3, Line 81: unordered_map<string, int> This too is a ReplicaLocationsInfo, right? http://gerrit.cloudera.org:8080/#/c/11207/3/src/kudu/master/placement_policy.cc@101 PS3, Line 101: boost::optional<string> SelectLocation( : const vector<string>& all, : const set<string>& excluded, : const LocationToDescriptorsMap& ltd, : const unordered_map<string, int>& location_info, : ThreadSafeRandom* rng) { : vector<string> two_choices; : rng->ReservoirSample(all, 2, excluded, &two_choices); : DCHECK_LE(two_choices.size(), 2); : if (two_choices.size() == 1) { : return two_choices.front(); : } : if (two_choices.size() == 2) { : // Pick the better of the two. : return PickBetterLocation(two_choices, ltd, location_info, rng); : } : return boost::none; : } > We discussed this offline an it seems we can do better if selecting locatio (thumbsup) http://gerrit.cloudera.org:8080/#/c/11207/3/src/kudu/master/placement_policy.cc@167 PS3, Line 167: ts_num_ = accumulate( : ltd_.begin(), ltd_.end(), 0, : [](size_t val, const LocationToDescriptorsMap::value_type& elem) { : return val + elem.second.size(); : }); : DCHECK_EQ(ts_num_, descs.size()); Should ts_num_ be const and set in the initializer list? BuildLocationToDescriptorsMap can do the equivalent check to the DCHECK here. http://gerrit.cloudera.org:8080/#/c/11207/3/src/kudu/master/placement_policy.cc@188 PS3, Line 188: if (it == ltd_.end()) { : return Status::IllegalState( : Substitute("'$0': no info on tablet servers at location", loc)); : } This should be impossible, right? We should at least DCHECK on it, if so. http://gerrit.cloudera.org:8080/#/c/11207/3/src/kudu/master/placement_policy.cc@201 PS3, Line 201: unordered_map<string, int> Ditto on the type alias. http://gerrit.cloudera.org:8080/#/c/11207/3/src/kudu/master/placement_policy.cc@203 PS3, Line 203: "" It's a little pedantic, but could you add a comment saying it's ok to use an empty string in place of boost::none for a location because valid location strings begin with '/' and therefore are nonempty. -- To view, visit http://gerrit.cloudera.org:8080/11207 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I4169098abf17d5591d4c1675561afc15b5477fcd Gerrit-Change-Number: 11207 Gerrit-PatchSet: 3 Gerrit-Owner: Alexey Serbin <aser...@cloudera.com> Gerrit-Reviewer: Alexey Serbin <aser...@cloudera.com> Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Will Berkeley <wdberke...@gmail.com> Gerrit-Comment-Date: Thu, 16 Aug 2018 06:00:26 +0000 Gerrit-HasComments: Yes