Andrew Wong has uploaded this change for review. ( http://gerrit.cloudera.org:8080/17691 )
Change subject: [master] make automatic AddMaster resilient to leadership changes
......................................................................

[master] make automatic AddMaster resilient to leadership changes

AutoAddMasterTest.TestAddWithOnGoingDdl previously failed intermittently,
with the new master shutting down with the following error:

  Remote error: Failed to perform AddMaster RPC: Illegal state: Failed
  initiating master Raft ChangeConfig request, error: unknown: Leader has
  not yet committed an operation in its own term

This patch addresses this by catching errors commonly returned by the
Raft layer while leadership is changing and retrying, as we might do
for DML operations.

Without this patch, the test failed ~10% of the time in debug mode with
--stress_cpu_threads=3. With the patch, it fails only ~1% of the time;
the remaining failures will be addressed in a later patch.

This patch also addresses some leftover feedback from
7e66534d0e62fb850bf300d52da4a0a76889f4b8 regarding verification of
masters in the presence of DNS failures. I didn't add a new test for
this because it turned out not to change the observable behavior: the
new master would still fail when attempting to resolve replicas.

Change-Id: Ie38453c6fc41ce98c59c010902e2d9fe9db62dee
---
M src/kudu/master/master.cc
M src/kudu/master/master_runner.cc
M src/kudu/master/master_service.cc
3 files changed, 22 insertions(+), 6 deletions(-)

  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/91/17691/1
--
To view, visit http://gerrit.cloudera.org:8080/17691

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Ie38453c6fc41ce98c59c010902e2d9fe9db62dee
Gerrit-Change-Number: 17691
Gerrit-PatchSet: 1
Gerrit-Owner: Andrew Wong <[email protected]>
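The retry approach described in the commit message can be illustrated with a minimal, hypothetical C++ sketch. It is not the code in this change: `Status`, `Code`, `IsRetriableRaftError`, and `RetryOnRaftErrors` are illustrative names rather than Kudu's real API, the set of errors treated as transient is an assumption, and the exponential backoff is a design choice made only for this sketch.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <string>
#include <thread>

// Hypothetical, simplified status type; Kudu's real kudu::Status is richer.
enum class Code { kOk, kIllegalState, kServiceUnavailable, kNotFound };

struct Status {
  Code code = Code::kOk;
  std::string message;
  bool ok() const { return code == Code::kOk; }
  static Status OK() { return {}; }
};

// Hypothetical classification of errors assumed to be transient while Raft
// leadership is changing, e.g. a new leader that has not yet committed an
// operation in its own term.
bool IsRetriableRaftError(const Status& s) {
  return s.code == Code::kServiceUnavailable ||
         (s.code == Code::kIllegalState &&
          s.message.find("not yet committed an operation in its own term") !=
              std::string::npos);
}

// Runs `op` until it succeeds, returns a non-retriable error, or
// `max_attempts` is exhausted. Sleeps between attempts, doubling the
// backoff each time. Returns the status of the last attempt.
Status RetryOnRaftErrors(const std::function<Status()>& op, int max_attempts,
                         std::chrono::milliseconds initial_backoff) {
  assert(max_attempts > 0);
  auto backoff = initial_backoff;
  Status s;
  for (int attempt = 1; attempt <= max_attempts; ++attempt) {
    s = op();
    if (s.ok() || !IsRetriableRaftError(s)) {
      return s;  // Success, or an error that retrying will not fix.
    }
    if (attempt < max_attempts) {
      std::this_thread::sleep_for(backoff);
      backoff *= 2;
    }
  }
  return s;
}
```

Errors that are not classified as transient (for example a `kNotFound`) are returned immediately, so only leadership-related hiccups consume retry attempts, and a persistent failure still surfaces once the attempt budget runs out.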
