Will Berkeley has posted comments on this change. ( http://gerrit.cloudera.org:8080/10336 )
Change subject: WIP [rebalancing] high-level rebalancing algorithm ...................................................................... Patch Set 3: (56 comments) Lots of comments but it's a lot of nits. Overall looks very nice I think. http://gerrit.cloudera.org:8080/#/c/10336/3/src/kudu/tools/rebalance-algo-test.cc File src/kudu/tools/rebalance-algo-test.cc: http://gerrit.cloudera.org:8080/#/c/10336/3/src/kudu/tools/rebalance-algo-test.cc@194 PS3, Line 194: { : { "0", "1", }, : { : { "A", { 1, 1, } }, : { "B", { 1, 2, } }, : }, : }, Missing a comment on this one. It seems unnecessary given the test case immediately below. http://gerrit.cloudera.org:8080/#/c/10336/3/src/kudu/tools/rebalance-algo-test.cc@221 PS3, Line 221: and nit: Extra word. http://gerrit.cloudera.org:8080/#/c/10336/3/src/kudu/tools/rebalance-algo-test.cc@222 PS3, Line 222: at nit: on. http://gerrit.cloudera.org:8080/#/c/10336/3/src/kudu/tools/rebalance-algo-test.cc@262 PS3, Line 262: to make nit: extra http://gerrit.cloudera.org:8080/#/c/10336/3/src/kudu/tools/rebalance-algo-test.cc@262 PS3, Line 262: nit: , http://gerrit.cloudera.org:8080/#/c/10336/3/src/kudu/tools/rebalance-algo-test.cc@263 PS3, Line 263: achive nit: achieve. http://gerrit.cloudera.org:8080/#/c/10336/3/src/kudu/tools/rebalance-algo-test.cc@263 PS3, Line 263: balanced state nit: balance. http://gerrit.cloudera.org:8080/#/c/10336/3/src/kudu/tools/rebalance-algo-test.cc@264 PS3, Line 264: Tablet nit: Table. http://gerrit.cloudera.org:8080/#/c/10336/3/src/kudu/tools/rebalance-algo-test.cc@272 PS3, Line 272: "B" It'd be better if it weren't deterministic which table it uses for the move. http://gerrit.cloudera.org:8080/#/c/10336/3/src/kudu/tools/rebalance-algo-test.cc@282 PS3, Line 282: { { "D", "1", "0" }, } Aside: we'll have to think about single-replica tables. Does 3-4-3 (1-2-1?) work in that case? 3-2-3 doesn't. http://gerrit.cloudera.org:8080/#/c/10336/3/src/kudu/tools/rebalance-algo-test.cc@321 PS3, Line 321: { : { "0", "1", }, : { : { "A", { 1, 2, } }, : { "B", { 2, 0, } }, : { "C", { 2, 1, } }, : }, : { : { "B", "0", "1" }, : } : }, 0 has 5 total replicas; 1 has 3, so it doesn't fit in this test. http://gerrit.cloudera.org:8080/#/c/10336/3/src/kudu/tools/rebalance-algo-test.cc@483 PS3, Line 483: Coverage is good but we could also use a randomized test. I could add it in a separate patch. http://gerrit.cloudera.org:8080/#/c/10336/3/src/kudu/tools/rebalance-algo.h File src/kudu/tools/rebalance-algo.h: http://gerrit.cloudera.org:8080/#/c/10336/3/src/kudu/tools/rebalance-algo.h@33 PS3, Line 33: Information on t nit: Howabout just "Balance information for a table." http://gerrit.cloudera.org:8080/#/c/10336/3/src/kudu/tools/rebalance-algo.h@34 PS3, Line 34: TableReplicasInfo nit: I like TableBalanceInfo. It's parallel with ClusterBalanceInfo below. http://gerrit.cloudera.org:8080/#/c/10336/3/src/kudu/tools/rebalance-algo.h@35 PS3, Line 35: // Table's unique identifier. nit: I think this comment isn't needed. http://gerrit.cloudera.org:8080/#/c/10336/3/src/kudu/tools/rebalance-algo.h@38 PS3, Line 38: Total number of tablet replicas of the table per tablet server. nit: It's the reverse of this relationship. http://gerrit.cloudera.org:8080/#/c/10336/3/src/kudu/tools/rebalance-algo.h@41 PS3, Line 41: size_t nit: The GSG's rule is to use signed integer types for numbers. See https://google.github.io/styleguide/cppguide.html#Integer_Types. Might be best to change since there's no prospect of the replica count overflowing int32_t and it might save us from some bad arithmetic at some point. http://gerrit.cloudera.org:8080/#/c/10336/3/src/kudu/tools/rebalance-algo.h@41 PS3, Line 41: table_replica_num_by_server This name's confusing because it's the reverse of the actual relation, e.g. imagine seeing auto range = tri.table_replica_num_by_server.equal_range(6); Howabout replica_count_to_servers or servers_by_replica_count, which is the correct direction for the relationship and captures that there could be multiples servers per replica count? http://gerrit.cloudera.org:8080/#/c/10336/3/src/kudu/tools/rebalance-algo.h@48 PS3, Line 48: replicas_info The values in this map represent tables, so I think a name like table_balance_by_skew or table_info_by_skew is best. http://gerrit.cloudera.org:8080/#/c/10336/3/src/kudu/tools/rebalance-algo.h@48 PS3, Line 48: size_t nit: Same. http://gerrit.cloudera.org:8080/#/c/10336/3/src/kudu/tools/rebalance-algo.h@50 PS3, Line 50: Total number of replicas of all tables per tablet server. nit: Same as comment on L38. http://gerrit.cloudera.org:8080/#/c/10336/3/src/kudu/tools/rebalance-algo.h@53 PS3, Line 53: size_t nit: Same. http://gerrit.cloudera.org:8080/#/c/10336/3/src/kudu/tools/rebalance-algo.h@53 PS3, Line 53: replica_num_by_server nit: Same naming nit as L41. http://gerrit.cloudera.org:8080/#/c/10336/3/src/kudu/tools/rebalance-algo.h@55 PS3, Line 55: ToString Would you mind putting a comment here explaining what the string looks like? I'm not sure from the fields what I expect the string to be. Is it multiline? Is it a table? http://gerrit.cloudera.org:8080/#/c/10336/3/src/kudu/tools/rebalance-algo.h@58 PS3, Line 58: the nit: Extra "the". http://gerrit.cloudera.org:8080/#/c/10336/3/src/kudu/tools/rebalance-algo.h@58 PS3, Line 58: information information Repeated word. http://gerrit.cloudera.org:8080/#/c/10336/3/src/kudu/tools/rebalance-algo.h@59 PS3, Line 59: The algorithm does not : // differentiate such entities as tablets After our conversation yesterday I understand what you mean here, but I think the comment is confusingly because it sounds like rebalancing doesn't care about tablets, which can't be true because a tablet is a constraint on possible moves. Instead, I suggest saying that the algorithm represents a move for a table by the source and destination tablet servers, and, IIUC, the "move engine" translates that into an actual move that respects the constraint that no tablet server can have more than one replica of a given tablet. http://gerrit.cloudera.org:8080/#/c/10336/3/src/kudu/tools/rebalance-algo.h@64 PS3, Line 64: // Unique identifier of the table. I think this comment is redundant. http://gerrit.cloudera.org:8080/#/c/10336/3/src/kudu/tools/rebalance-algo.h@65 PS3, Line 65: Unique identifier nit: I'd say UUID instead of "unique identifier" because I "UUID" always means the canonical UUID, and not saying UUID made me think for a second that we were going to use some other sort of unique identifier. http://gerrit.cloudera.org:8080/#/c/10336/3/src/kudu/tools/rebalance-algo.h@66 PS3, Line 66: Unique identifier nit: Same as above. http://gerrit.cloudera.org:8080/#/c/10336/3/src/kudu/tools/rebalance-algo.h@79 PS3, Line 79: The method puts the results into the 'moves' output parameter How would a caller know when to stop calling GetNextMoves? Is there a guarantee about what happens if the cluster is balanced? Does the algo have a notion of balance built in or is it meant to be defined by subclasses? http://gerrit.cloudera.org:8080/#/c/10336/3/src/kudu/tools/rebalance-algo.h@79 PS3, Line 79: into the 'moves' output parameter nit: Can you make it clear that 'moves' is cleared and filled with the results? I like the verb "populate" for those semantics, as opposed to "add" or "append" if the output parameter is appended to. http://gerrit.cloudera.org:8080/#/c/10336/3/src/kudu/tools/rebalance-algo.h@81 PS3, Line 81: size_t nit: Same as L41. http://gerrit.cloudera.org:8080/#/c/10336/3/src/kudu/tools/rebalance-algo.h@90 PS3, Line 90: nit: Insert "an". http://gerrit.cloudera.org:8080/#/c/10336/3/src/kudu/tools/rebalance-algo.cc File src/kudu/tools/rebalance-algo.cc: http://gerrit.cloudera.org:8080/#/c/10336/3/src/kudu/tools/rebalance-algo.cc@55 PS3, Line 55: replica_num_by_server nit: Remember to change this if/when you change the name of the struct fields. http://gerrit.cloudera.org:8080/#/c/10336/3/src/kudu/tools/rebalance-algo.cc@57 PS3, Line 57: s << " " << elem.first << ":" << elem.second; If this is for debug purposes only then it's fine however it's easiest for you to use. But it might be helpful to have a more user-friendly ToString or PrintTo method that produces a nice table showing the server -> count and skew relationships, for each table, or specific tables. I'm sure something like that will be worthwhile to show initial, final, and progress status in the tool. http://gerrit.cloudera.org:8080/#/c/10336/3/src/kudu/tools/rebalance-algo.cc@60 PS3, Line 60: replicas_info Likewise. http://gerrit.cloudera.org:8080/#/c/10336/3/src/kudu/tools/rebalance-algo.cc@77 PS3, Line 77: ClusterBalanceInfo info(cluster_info); nit: Can you add a comment explaining why we copy? IIUC, it's so we can generate the ClusterBalanceInfo for picking move n from the one for move n - 1 by apply move n - 1. http://gerrit.cloudera.org:8080/#/c/10336/3/src/kudu/tools/rebalance-algo.cc@95 PS3, Line 95: { nit: Add a comment explaining the purpose of this block. http://gerrit.cloudera.org:8080/#/c/10336/3/src/kudu/tools/rebalance-algo.cc@101 PS3, Line 101: auto it = replica_num_by_server.begin() Maybe we need to keep both the mapping ts uuid -> count and count -> ts uuid? I think we could write a small class to encapsulate keeping both mappings and keeping them synchronized, and it would simplify the code. http://gerrit.cloudera.org:8080/#/c/10336/3/src/kudu/tools/rebalance-algo.cc@115 PS3, Line 115: it = replica_num_by_server.erase(it); This effectively advances the iterator, right? http://gerrit.cloudera.org:8080/#/c/10336/3/src/kudu/tools/rebalance-algo.cc@133 PS3, Line 133: auto it = replicas_info.begin(); it != replicas_info.end() Likewise I think we could use a "two-way mapping" class to unify much of this code and the code below with the code above. http://gerrit.cloudera.org:8080/#/c/10336/3/src/kudu/tools/rebalance-algo.cc@175 PS3, Line 175: instert nit: extra 't'. http://gerrit.cloudera.org:8080/#/c/10336/3/src/kudu/tools/rebalance-algo.cc@181 PS3, Line 181: DCHECK_GE(max_count, min_count) Since table_info.table_replica_num_by_server is a multimap, isn't this just testing the STL multimap's ordering guarantee? http://gerrit.cloudera.org:8080/#/c/10336/3/src/kudu/tools/rebalance-algo.cc@190 PS3, Line 190: TwoDimensionalGreedyAlgo::GetNextMove Would you add a short comment on the different major parts of the algorithm, e.g. "Computing the intersection of the most loaded tablet servers for this table with the most loaded tablet server for the cluster, so, if possible, we can pick a move that helps balances the table and the cluster." http://gerrit.cloudera.org:8080/#/c/10336/3/src/kudu/tools/rebalance-algo.cc@204 PS3, Line 204: 2 nit: I think this could be 1. 1 is typical in tools and this should only fire a few times per round, at most, right? http://gerrit.cloudera.org:8080/#/c/10336/3/src/kudu/tools/rebalance-algo.cc@210 PS3, Line 210: 3 Could be 1 I think. http://gerrit.cloudera.org:8080/#/c/10336/3/src/kudu/tools/rebalance-algo.cc@211 PS3, Line 211: 3 Same. http://gerrit.cloudera.org:8080/#/c/10336/3/src/kudu/tools/rebalance-algo.cc@213 PS3, Line 213: // The distribution of replicas for this table is already balanced. However, : // it's necessary to check for the opportunity to balance the replicas : // cluster-wise. I think the code is right here but the comment is wrong/stale. The table is balanced iff its skew is <= 1. If it's = 1 we have a potential opportunity to balance cluster-wise with table-skew-neutral moves. If it's = 0 we don't, so we return. http://gerrit.cloudera.org:8080/#/c/10336/3/src/kudu/tools/rebalance-algo.cc@224 PS3, Line 224: DCHECK(!max_loaded.empty()); Maybe we ought to check if cnt_by_server is empty earlier. http://gerrit.cloudera.org:8080/#/c/10336/3/src/kudu/tools/rebalance-algo.cc@227 PS3, Line 227: !max_loaded.empty() This is unnecessary given the DCHECK. http://gerrit.cloudera.org:8080/#/c/10336/3/src/kudu/tools/rebalance-algo.cc@236 PS3, Line 236: sort(max_loaded.begin(), max_loaded.end()); : sort(max_loaded_cluster_wise.begin(), max_loaded_cluster_wise.end()); : set_intersection( : max_loaded.begin(), max_loaded.end(), : max_loaded_cluster_wise.begin(), max_loaded_cluster_wise.end(), : back_inserter(max_loaded_intersection)); We can probably separate this out as a function or something since it's re-used on L261 below. I think it'd make the logic here clearer overall. http://gerrit.cloudera.org:8080/#/c/10336/3/src/kudu/tools/rebalance-algo.cc@249 PS3, Line 249: DCHECK(!min_loaded.empty()); Likewise seems unnecessary if we check cnt_by_server is nonempty. http://gerrit.cloudera.org:8080/#/c/10336/3/src/kudu/tools/rebalance-algo.cc@252 PS3, Line 252: !min_loaded.empty() Redundant. http://gerrit.cloudera.org:8080/#/c/10336/3/src/kudu/tools/rebalance-algo.cc@277 PS3, Line 277: 3 1 http://gerrit.cloudera.org:8080/#/c/10336/3/src/kudu/tools/rebalance-algo.cc@280 PS3, Line 280: 3 1 -- To view, visit http://gerrit.cloudera.org:8080/10336 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I5a8050ee79117a2ae2f6f88740ed25656946cfb4 Gerrit-Change-Number: 10336 Gerrit-PatchSet: 3 Gerrit-Owner: Alexey Serbin <[email protected]> Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Will Berkeley <[email protected]> Gerrit-Comment-Date: Tue, 08 May 2018 18:42:37 +0000 Gerrit-HasComments: Yes
