David Ribeiro Alves has submitted this change and it was merged.

Change subject: [catalog_manager] categorization of rw operation failures
......................................................................


[catalog_manager] categorization of rw operation failures

This changelist categorizes the failures of the system catalog's read
and write operations that occur in the leader's post-election callback.
There are two categories of errors: fatal and non-fatal.

If an operation against the system catalog fails in between terms of
the catalog leadership, the error is considered non-fatal. In case of
a non-fatal error the leader post-election task bails out: the catalog
is no longer the leader at the original term and the task should be
executed by the new leader upon execution of the ElectedAsLeaderCb.

If an operation against the system catalog fails within the same term
of catalog leadership, the error is considered fatal and causes the
master process to crash (with the exception of writing a newly generated
TSK when the TokenSigner still has a TSK to use). This is to avoid a
possible inconsistency when working with the tables/tablets metadata,
the IPKI certificate authority information and the Token Signing Keys.

Any failure of a read or write operation against the system catalog
that happens during the catalog's shutdown is ignored, and the leader
post-election task bails out upon detecting such a failure.
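
The policy described above can be illustrated with a minimal sketch. This
is not the code from this change: the names FailureAction, FailureContext
and Categorize are hypothetical, and the order of the checks (shutdown
first, then the term comparison) is an assumption made for illustration.
The TSK exception is modelled only as "do not crash"; what the master
does instead is not spelled out here.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical names throughout; not the actual Kudu CatalogManager API.
enum class FailureAction {
  kIgnore,    // catalog is shutting down: ignore the error, bail out
  kBailOut,   // term has changed: non-fatal, the new leader runs the task
  kTolerate,  // same term, new TSK not written, but a usable TSK remains
  kCrash,     // same term: fatal, crash the master process
};

struct FailureContext {
  bool shutting_down;          // the catalog is being shut down
  int64_t start_term;          // term at which the post-election task started
  int64_t current_term;        // term observed after the failure
  bool is_tsk_write;           // failed operation was writing a new TSK
  bool signer_has_usable_tsk;  // the TokenSigner still has a TSK to use
};

// Maps a failed system catalog operation to an action (assumed check order).
FailureAction Categorize(const FailureContext& ctx) {
  if (ctx.shutting_down) {
    return FailureAction::kIgnore;
  }
  if (ctx.current_term != ctx.start_term) {
    return FailureAction::kBailOut;
  }
  if (ctx.is_tsk_write && ctx.signer_has_usable_tsk) {
    return FailureAction::kTolerate;
  }
  return FailureAction::kCrash;
}

// Usage example: one context per category.
void CheckExamples() {
  assert(Categorize({true, 3, 3, false, false}) == FailureAction::kIgnore);
  assert(Categorize({false, 3, 4, false, false}) == FailureAction::kBailOut);
  assert(Categorize({false, 3, 3, true, true}) == FailureAction::kTolerate);
  assert(Categorize({false, 3, 3, false, false}) == FailureAction::kCrash);
}
```

Keeping the decision in one pure function, separate from the code that
acts on it, would make each category easy to unit-test in isolation.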

The same policy applies to other errors (i.e. those not specific to read
and write operations against the system catalog) which might happen
while working with the IPKI certificate authority information and the
TokenSigner. The rationale is the same as for handling the system
catalog operation failures: in case of an error, the leader has no
consistent information to work with, while a non-leader does not use
the information affected by the failure at all and can safely ignore
the error.

Added a test to verify that the master server does not crash if a change
of leadership is detected while trying to persist a newly generated TSK
(Token Signing Key).

Change-Id: I826826049e3c08a6c8345949690cbbedaea32ff8
Reviewed-on: http://gerrit.cloudera.org:8080/6170
Tested-by: Kudu Jenkins
Reviewed-by: David Ribeiro Alves <[email protected]>
---
M src/kudu/integration-tests/CMakeLists.txt
A src/kudu/integration-tests/catalog_manager_tsk-itest.cc
M src/kudu/master/catalog_manager.cc
M src/kudu/master/catalog_manager.h
M src/kudu/master/master-test.cc
M src/kudu/master/master_service.cc
M src/kudu/master/sys_catalog-test.cc
7 files changed, 506 insertions(+), 149 deletions(-)

Approvals:
  David Ribeiro Alves: Looks good to me, approved
  Kudu Jenkins: Verified



-- 
To view, visit http://gerrit.cloudera.org:8080/6170
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: I826826049e3c08a6c8345949690cbbedaea32ff8
Gerrit-PatchSet: 31
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Alexey Serbin <[email protected]>
Gerrit-Reviewer: Adar Dembo <[email protected]>
Gerrit-Reviewer: Alexey Serbin <[email protected]>
Gerrit-Reviewer: Dan Burkert <[email protected]>
Gerrit-Reviewer: David Ribeiro Alves <[email protected]>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <[email protected]>
