Hello Kudu Jenkins,
I'd like you to reexamine a change. Please visit
http://gerrit.cloudera.org:8080/8109
to look at the new patch set (#2).
Change subject: consensus: KUDU-2147. Unknown leader should not be treated as
valid UUID
......................................................................
consensus: KUDU-2147. Unknown leader should not be treated as valid UUID
This patch fixes a rare, long-standing issue that has existed since at
least 1.4.0, probably much earlier. It appears that the leader election
thread pool changes in 1.5.0 made this problem less rare than it
previously was.
The summary of the issue is that, prior to this fix, it was possible for
the master to believe that no leader existed for a tablet after a
configuration change when in fact a leader did exist. This could be
triggered if the cluster experienced an election storm in the middle of,
or right after, a configuration change. One workaround for this situation is
to restart the tablet server where the leader replica currently resides.
See KUDU-2147 for an example of the error messages that appear in the
logs when it happens.
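As a rough illustration of the idea in the subject line (not the code in
this patch), the sketch below treats an unknown leader as the absence of
a value rather than as a UUID to compare against. The ConsensusView
struct, the KnownLeader and ShouldUpdateLeader helpers, and the use of an
empty string to mean "leader unknown" are all assumptions made for this
example; they are not Kudu's actual types or functions.

```cpp
#include <optional>
#include <string>

// Hypothetical stand-in for a replica's reported consensus state.
// Assumption: an empty leader_uuid means "leader unknown".
struct ConsensusView {
  std::string leader_uuid;
};

// Returns the leader UUID only when a leader is actually known.
// An empty string is never handed back as if it were a real UUID.
std::optional<std::string> KnownLeader(const ConsensusView& view) {
  if (view.leader_uuid.empty()) {
    return std::nullopt;
  }
  return view.leader_uuid;
}

// Hypothetical policy: a report replaces the stored leader only if it
// names a known leader that differs from the stored one. A report with
// an unknown leader never overwrites a previously known leader.
bool ShouldUpdateLeader(const ConsensusView& stored,
                        const ConsensusView& reported) {
  const std::optional<std::string> leader = KnownLeader(reported);
  return leader.has_value() && *leader != stored.leader_uuid;
}
```

With this shape, a caller has to handle the "no known leader" case
explicitly before it can compare or store a leader UUID.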
In addition to a fix, this patch also includes a regression test that
attempts to exercise a code path likely to trigger the bug.
After the fix, I looped the test 200x with 4 stress threads and it
succeeded 100% of the time:
http://dist-test.cloudera.org/job?job_id=mpercy.1505872165.27701
To verify that the issue was not a regression in 1.5.0, I ran the same test
against the 1.4 branch, where it failed 100% of the time:
http://dist-test.cloudera.org/job?job_id=mpercy.1505872429.29509
Change-Id: Ie882d05fc58e55836edc0235d14974e65125df6c
---
M src/kudu/consensus/consensus_meta-test.cc
M src/kudu/consensus/consensus_meta.cc
M src/kudu/consensus/quorum_util.cc
M src/kudu/integration-tests/CMakeLists.txt
M src/kudu/integration-tests/cluster_itest_util.cc
A src/kudu/integration-tests/raft_config_change-itest.cc
M src/kudu/master/catalog_manager.cc
M src/kudu/master/master-path-handlers.cc
M src/kudu/tools/ksck.cc
M src/kudu/tools/kudu-admin-test.cc
M src/kudu/tserver/heartbeater.cc
M src/kudu/tserver/ts_tablet_manager-test.cc
12 files changed, 191 insertions(+), 39 deletions(-)
git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/09/8109/2
--
To view, visit http://gerrit.cloudera.org:8080/8109
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie882d05fc58e55836edc0235d14974e65125df6c
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Mike Percy <[email protected]>
Gerrit-Reviewer: Adar Dembo <[email protected]>
Gerrit-Reviewer: Alexey Serbin <[email protected]>
Gerrit-Reviewer: David Ribeiro Alves <[email protected]>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <[email protected]>
Gerrit-Reviewer: Todd Lipcon <[email protected]>