Mike Percy has submitted this change and it was merged.

Change subject: consensus: KUDU-2147. Unknown leader should not be treated as valid UUID
......................................................................

consensus: KUDU-2147. Unknown leader should not be treated as valid UUID

This patch fixes a rare, long-standing issue that has existed since at
least 1.4.0, probably much earlier. It appears that the leader election
thread pool changes in 1.5.0 made this problem less rare than it
previously was.

The summary of the issue is that, prior to this fix, it was possible for
the master to believe that no leader existed for a tablet after a
configuration change when in fact a leader did exist. This could be
triggered if the cluster experienced an election storm in the middle of,
or right after, a configuration change. One workaround for this
situation is to restart the tablet server where the leader replica
currently resides. See KUDU-2147 for an example of the error messages
that appear in the logs when it happens.

In addition to a fix, this patch also includes a regression test that
attempts to exercise a code path likely to trigger the bug. After the
fix, I looped the test 200x with 4 stress threads and it succeeded 100%
of the time:

http://dist-test.cloudera.org/job?job_id=mpercy.1505872165.27701

To verify that the issue was not a regression in 1.5.0, I ran it against
the 1.4 branch and it failed 100% of the time:

http://dist-test.cloudera.org/job?job_id=mpercy.1505872429.29509

Change-Id: Ie882d05fc58e55836edc0235d14974e65125df6c
Reviewed-on: http://gerrit.cloudera.org:8080/8109
Reviewed-by: Alexey Serbin <aser...@cloudera.com>
Reviewed-by: Adar Dembo <a...@cloudera.com>
Tested-by: Kudu Jenkins
---
M src/kudu/consensus/consensus_meta-test.cc
M src/kudu/consensus/consensus_meta.cc
M src/kudu/consensus/quorum_util.cc
M src/kudu/integration-tests/CMakeLists.txt
M src/kudu/integration-tests/cluster_itest_util.cc
A src/kudu/integration-tests/raft_config_change-itest.cc
M src/kudu/master/catalog_manager.cc
M src/kudu/master/master-path-handlers.cc
M src/kudu/tools/ksck.cc
M src/kudu/tools/kudu-admin-test.cc
M src/kudu/tserver/heartbeater.cc
M src/kudu/tserver/ts_tablet_manager-test.cc
12 files changed, 216 insertions(+), 39 deletions(-)

Approvals:
  Adar Dembo: Looks good to me, approved
  Alexey Serbin: Looks good to me, approved
  Kudu Jenkins: Verified

--
To view, visit http://gerrit.cloudera.org:8080/8109

Gerrit-MessageType: merged
Gerrit-Change-Id: Ie882d05fc58e55836edc0235d14974e65125df6c
Gerrit-PatchSet: 6
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Mike Percy <mpe...@apache.org>
Gerrit-Reviewer: Adar Dembo <a...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <aser...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <davidral...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mpe...@apache.org>
Gerrit-Reviewer: Todd Lipcon <t...@apache.org>