Mike Percy has submitted this change and it was merged.

Change subject: consensus: KUDU-2147. Unknown leader should not be treated as valid UUID
......................................................................


consensus: KUDU-2147. Unknown leader should not be treated as valid UUID

This patch fixes a rare, long-standing issue that has existed since at
least 1.4.0, probably much earlier. It appears that the leader election
thread pool changes in 1.5.0 made this problem less rare than it
previously was.

In summary, prior to this fix it was possible for the master to believe
that no leader existed for a tablet after a configuration change when in
fact a leader did exist. This could be triggered if the cluster
experienced an election storm in the middle of, or right after, a
configuration change. One workaround for this situation is to restart
the tablet server where the leader replica currently resides.
See KUDU-2147 for an example of the error messages that appear in the
logs when it happens.
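
As a rough illustration of the idea in the subject line (not the code
from this patch), the sketch below treats an empty leader UUID as
"leader unknown" rather than as an identity that can be compared against
a peer. ReportedState, HasKnownLeader and IsLeader are hypothetical
names chosen for this example and are not Kudu's types or functions.

```cpp
#include <string>

// Hypothetical stand-in for a reported consensus state; not Kudu's type.
struct ReportedState {
  long term = 0;
  std::string leader_uuid;  // Assumption: empty means "leader unknown".
};

// An empty UUID carries no information about who the leader is.
inline bool HasKnownLeader(const ReportedState& state) {
  return !state.leader_uuid.empty();
}

// A naive `state.leader_uuid == peer_uuid` would match an empty
// leader_uuid against an empty peer_uuid. Checking for a known leader
// first avoids treating "unknown" as a valid UUID.
inline bool IsLeader(const ReportedState& state,
                     const std::string& peer_uuid) {
  return HasKnownLeader(state) && state.leader_uuid == peer_uuid;
}
```

With this guard, a report that carries no leader is handled as missing
information instead of being matched against any peer.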

In addition to the fix, this patch includes a regression test that
attempts to exercise a code path likely to trigger the bug.

After the fix, I looped the test 200x with 4 stress threads and it
succeeded 100% of the time:
http://dist-test.cloudera.org/job?job_id=mpercy.1505872165.27701

To verify that the issue was not a regression in 1.5.0, I ran the test
against the 1.4 branch and it failed 100% of the time:
http://dist-test.cloudera.org/job?job_id=mpercy.1505872429.29509

Change-Id: Ie882d05fc58e55836edc0235d14974e65125df6c
Reviewed-on: http://gerrit.cloudera.org:8080/8109
Reviewed-by: Alexey Serbin <aser...@cloudera.com>
Reviewed-by: Adar Dembo <a...@cloudera.com>
Tested-by: Kudu Jenkins
---
M src/kudu/consensus/consensus_meta-test.cc
M src/kudu/consensus/consensus_meta.cc
M src/kudu/consensus/quorum_util.cc
M src/kudu/integration-tests/CMakeLists.txt
M src/kudu/integration-tests/cluster_itest_util.cc
A src/kudu/integration-tests/raft_config_change-itest.cc
M src/kudu/master/catalog_manager.cc
M src/kudu/master/master-path-handlers.cc
M src/kudu/tools/ksck.cc
M src/kudu/tools/kudu-admin-test.cc
M src/kudu/tserver/heartbeater.cc
M src/kudu/tserver/ts_tablet_manager-test.cc
12 files changed, 216 insertions(+), 39 deletions(-)

Approvals:
  Adar Dembo: Looks good to me, approved
  Alexey Serbin: Looks good to me, approved
  Kudu Jenkins: Verified



-- 
To view, visit http://gerrit.cloudera.org:8080/8109

Gerrit-MessageType: merged
Gerrit-Change-Id: Ie882d05fc58e55836edc0235d14974e65125df6c
Gerrit-PatchSet: 6
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Mike Percy <mpe...@apache.org>
Gerrit-Reviewer: Adar Dembo <a...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <aser...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <davidral...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mpe...@apache.org>
Gerrit-Reviewer: Todd Lipcon <t...@apache.org>
