Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/14177 )
Change subject: KUDU-2780: create thread for auto-rebalancing
......................................................................

Patch Set 20:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/14177/20/src/kudu/master/auto_rebalancer-test.cc
File src/kudu/master/auto_rebalancer-test.cc:

http://gerrit.cloudera.org:8080/#/c/14177/20/src/kudu/master/auto_rebalancer-test.cc@457
PS20, Line 457: // Verify that movement of replicas to meet the replication factor
: // does not count towards rebalancing, i.e. the auto-rebalancer will
: // not consider recovering replicas as candidates for replica movement.
BTW, how do we know that the absence of rebalancing attempts is not simply because the replica distribution is de facto even, so the auto-rebalancer sees that and schedules no replica movements?

I think a more reliable scenario to justify this description (if I understand it correctly) would be to have only 3 tablet servers in the beginning, when all the tablets are created. Then add a new tablet server and shut down one of the 3 original tablet servers.

Also, re-replication kicks in only after the --follower_unavailable_considered_failed_sec interval has passed since the tablet server became unavailable (the default is 300). So, if the idea was to catch a few re-replicated replicas in progress, it would be necessary to shorten that interval as well.

http://gerrit.cloudera.org:8080/#/c/14177/20/src/kudu/master/auto_rebalancer-test.cc@462
PS20, Line 462: FLAGS_tserver_unresponsive_timeout_ms
It's a separate topic (i.e. not a topic for this particular scenario), but do you know what would happen in a scenario like this if the timeout were left as is (i.e. 60 * 1000 ms)? Basically, what if the failure of a tablet server is not yet accounted for by the TSManager, and the auto-rebalancer tries to schedule a replica movement to/from the downed tablet server? The idea is to make sure that the failed replica movement attempt is handled as expected by the auto-rebalancer.
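To make the timing window in the line 462 comment concrete, here is a toy model of it. This is only a sketch and not Kudu code: `ToyTServer`, `PresumedLive`, `TryScheduleMove` and `MoveResult` are hypothetical names made up for illustration, and the real TSManager and auto-rebalancer logic may well differ. The only assumption is that the master presumes a tablet server live until the unresponsive timeout has elapsed since its last heartbeat.

```cpp
#include <cstdint>
#include <string>

// Hypothetical stand-in for the per-tablet-server state a master tracks.
struct ToyTServer {
  std::string uuid;
  int64_t last_heartbeat_ms;  // time of the last heartbeat the master saw
  bool actually_up;           // ground truth, not visible to the master
};

// The master's view: a server is presumed live until the timeout elapses.
bool PresumedLive(const ToyTServer& ts, int64_t now_ms,
                  int64_t unresponsive_timeout_ms) {
  return now_ms - ts.last_heartbeat_ms < unresponsive_timeout_ms;
}

enum class MoveResult { kOk, kFailedTargetDown, kSkippedTargetNotLive };

// A toy rebalancer step: it only filters on the master's (possibly stale)
// view, so a move to a dead-but-presumed-live server is attempted and fails.
MoveResult TryScheduleMove(const ToyTServer& dst, int64_t now_ms,
                           int64_t unresponsive_timeout_ms) {
  if (!PresumedLive(dst, now_ms, unresponsive_timeout_ms)) {
    return MoveResult::kSkippedTargetNotLive;
  }
  return dst.actually_up ? MoveResult::kOk : MoveResult::kFailedTargetDown;
}
```

In this model, a server whose last heartbeat arrived at t = 0 and which then went down is still a move target at t = 5000 ms under a 60 * 1000 ms timeout, so `TryScheduleMove` returns `kFailedTargetDown`. With a shortened timeout (say 3000 ms, an arbitrary value) the same call returns `kSkippedTargetNotLive`. The first case is the one whose handling the comment asks about.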
Do you think it's possible to add a scenario for this?

--
To view, visit http://gerrit.cloudera.org:8080/14177
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ifca25d1063c07047cf2123e6792b3c7395be20e4
Gerrit-Change-Number: 14177
Gerrit-PatchSet: 20
Gerrit-Owner: Hannah Nguyen <[email protected]>
Gerrit-Reviewer: Alexey Serbin <[email protected]>
Gerrit-Reviewer: Andrew Wong <[email protected]>
Gerrit-Reviewer: Hannah Nguyen <[email protected]>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Comment-Date: Wed, 11 Mar 2020 20:48:29 +0000
Gerrit-HasComments: Yes
