Yuqi Du has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18454 )

Change subject: [master] KUDU-3061 support auto rebalance tablet leaders across 
TServers
......................................................................


Patch Set 19:

(10 comments)

Thanks your crs. Very Kind.

> Patch Set 19:
>
> (1 comment)
>
> I didn't really look into the codes because I'd like to understand the design 
> consideration and the use of algorithm first.
> Here are some questions I don't know much about:
> - Is this leader rebalance task only transfer the leadership between replicas 
> of the same tablet? Or it also copy replicas from one server to another?

Only do leader transfer, do not copy replicas.

> - Do we have to rebalance replicas in a cluster before we run this leader 
> rebalance task?

Yes, it's better enable rebalance replicas.

If enable leader rebalancer but disable auto rebalancer the algorithm work well 
but the effect is not good. The algorithm can be convergence, and the 
algorithm's target is every tserver' replicas, number of leader : number of 
follower is 1 : (replica_refactor -1).


> - Can we finally get a fully balanced cluster in all cases?

Yes, we should enable auto rebalancer and rebalancer leader rebalancer together.

>
> If you already have a design document about this issue, can you share it so 
> wen can discuss about the details.

I'm sorry, I only write an informal document in chinese. The basic and core 
idea can be described as:

For a specific table,  its replicas assigned to tservers(by create table or 
rebalance task), at a specific tserver should satisfy
number of leaders : number of followers is 1 : (replica_refactor -1).

every table should do leader rebalance.

http://gerrit.cloudera.org:8080/#/c/18454/19//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/18454/19//COMMIT_MSG@14
PS19, Line 14: Two reasons of load skew:
> Besides clarify why we need to balance leader replicas across TServers, it
ok, I' ll do this at unit test.


http://gerrit.cloudera.org:8080/#/c/18454/19/src/kudu/client/flex_partitioning_client-test.cc
File src/kudu/client/flex_partitioning_client-test.cc:

http://gerrit.cloudera.org:8080/#/c/18454/19/src/kudu/client/flex_partitioning_client-test.cc@341
PS19, Line 341:   // Using one-level hash bucketing { 3, "key" } as the custom 
hash schema
> Sorry, I didn't get the point. Is this related to this patch?
According to crs before, GetTableInfo()   is changed.  catalog_manager.h#771.


http://gerrit.cloudera.org:8080/#/c/18454/19/src/kudu/master/auto_leader_rebalancer.cc
File src/kudu/master/auto_leader_rebalancer.cc:

http://gerrit.cloudera.org:8080/#/c/18454/19/src/kudu/master/auto_leader_rebalancer.cc@68
PS19, Line 68:               10,
> nit: would it be more convenient if the gflag name and default value are in
Done


http://gerrit.cloudera.org:8080/#/c/18454/19/src/kudu/master/auto_leader_rebalancer.cc@72
PS19, Line 72:               3600,
> same
Done


http://gerrit.cloudera.org:8080/#/c/18454/19/src/kudu/master/auto_leader_rebalancer.cc@96
PS19, Line 96: "catalog-manager"
> nit: better to use the same category name, i.e. "catalog manager", as other
Done


http://gerrit.cloudera.org:8080/#/c/18454/19/src/kudu/master/auto_leader_rebalancer.cc@103
PS19, Line 103: if (thread_) {
> Is there any case thread_ is not initialized?
Done


http://gerrit.cloudera.org:8080/#/c/18454/19/src/kudu/master/auto_leader_rebalancer.cc@181
PS19, Line 181: const vector<string>& leaders = 
FindWithDefault(uuid_leaders_map, uuid,  {});
              :     int32_t leader_count = leaders.size();
> nit: move these code closer to the place where it used, means after L188.
Done


http://gerrit.cloudera.org:8080/#/c/18454/19/src/kudu/master/auto_leader_rebalancer.cc@192
PS19, Line 192:     int32_t should_transfer_count = leader_count - 
(replica_count / table_data.num_replicas() + 1);
> Could you please add some comments to describe the algorithm?
Done


http://gerrit.cloudera.org:8080/#/c/18454/19/src/kudu/master/auto_rebalancer.cc
File src/kudu/master/auto_rebalancer.cc:

http://gerrit.cloudera.org:8080/#/c/18454/19/src/kudu/master/auto_rebalancer.cc@a181
PS19, Line 181:
> why remove it?
Done


http://gerrit.cloudera.org:8080/#/c/18454/19/src/kudu/master/auto_rebalancer.cc@177
PS19, Line 177:                      [this]() { this->RunLoop(); }, &thread_);
              :
> nit: alignment and remove blank line.
Done



--
To view, visit http://gerrit.cloudera.org:8080/18454
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ibfb60d8759a93b6a19238637c27df4f6b1cac918
Gerrit-Change-Number: 18454
Gerrit-PatchSet: 19
Gerrit-Owner: Yuqi Du <[email protected]>
Gerrit-Reviewer: Alexey Serbin <[email protected]>
Gerrit-Reviewer: Andrew Wong <[email protected]>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Yifan Zhang <[email protected]>
Gerrit-Reviewer: Yingchun Lai <[email protected]>
Gerrit-Reviewer: Yuqi Du <[email protected]>
Gerrit-Comment-Date: Wed, 06 Jul 2022 07:19:37 +0000
Gerrit-HasComments: Yes

Reply via email to