Hello David Ribeiro Alves, Adar Dembo, Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/6170

to look at the new patch set (#22).

Change subject: [catalog_manager] categorization of rw operation failures
......................................................................

[catalog_manager] categorization of rw operation failures

This changelist introduces the categorization of the system catalog's
read and write operation failures which happen on leader post-election
callback.  There are two categories of errors: fatal and non-fatal.

If an operation against the system catalog fails in between terms of
the catalog leadership, the error is considered non-fatal.  In case of
a non-fatal error the leader post-election task bails out: the catalog
is no longer the leader at the original term and the task should be
executed by the new leader upon ElectedAsLeaderCb.

If an operation against the system catalog fails at the same term of the
catalog leadership, the error is considered fatal and that causes the
master process to crash.  This is to avoid possible inconsistency when
working with the tables/tablets metadata, the IPKI certificate authority
information and TSKs (Token Signing Keys).

Any failure of a read or write operation against the system catalog
happened during the catalog's shutdown is ignored and the leader
post-election task bails out once detecting such failure.

The same policy applies to other (i.e. not specific to read and write
operations against the system catalog) errors which might happen while
working with the IPKI certificate authority information and TokenSigner.
The rationale is the same as for handling the system catalog operation
failures: in case of an error, the leader has no consistent information
to work with, meanwhile a non-leader does not use the information
affected by the failure at all and can safely ignore the error.

Added a test to verify that the master server does not crash if change
of leadership detected while trying to persist a newly generated TSK
(Token Signing Key).

Change-Id: I826826049e3c08a6c8345949690cbbedaea32ff8
---
M src/kudu/integration-tests/CMakeLists.txt
A src/kudu/integration-tests/catalog_manager_tsk-itest.cc
M src/kudu/master/catalog_manager.cc
M src/kudu/master/catalog_manager.h
M src/kudu/master/master-test.cc
M src/kudu/master/sys_catalog-test.cc
6 files changed, 420 insertions(+), 133 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/70/6170/22
-- 
To view, visit http://gerrit.cloudera.org:8080/6170
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I826826049e3c08a6c8345949690cbbedaea32ff8
Gerrit-PatchSet: 22
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Alexey Serbin <aser...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <a...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <aser...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <danburk...@apache.org>
Gerrit-Reviewer: David Ribeiro Alves <dral...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <t...@apache.org>

Reply via email to