David Ribeiro Alves has submitted this change and it was merged.

Change subject: [catalog_manager] categorization of rw operation failures
......................................................................

[catalog_manager] categorization of rw operation failures

This changelist categorizes the failures of read and write operations against the system catalog that happen in the leader post-election callback. There are two categories of errors: fatal and non-fatal.

If an operation against the system catalog fails in between terms of the catalog leadership, the error is considered non-fatal. On a non-fatal error the leader post-election task bails out: the catalog is no longer the leader at the original term, and the task should be executed by the new leader when its ElectedAsLeaderCb runs.

If an operation against the system catalog fails within the same term of catalog leadership, the error is considered fatal and causes the master process to crash. The one exception is a failure to write a newly generated TSK while the TokenSigner still has a TSK to use. Crashing avoids a possible inconsistency when working with the tables/tablets metadata, the IPKI certificate authority information and the Token Signing Keys.

Any failure of a read or write operation against the system catalog that happens during the catalog's shutdown is ignored, and the leader post-election task bails out once it detects such a failure.

The same policy applies to other errors (i.e. those not specific to read and write operations against the system catalog) that might happen while working with the IPKI certificate authority information and the TokenSigner. The rationale is the same as for system catalog operation failures: after an error the leader has no consistent information to work with, whereas a non-leader does not use the information affected by the failure at all and can safely ignore the error.

Added a test to verify that the master server does not crash if a change of leadership is detected while trying to persist a newly generated TSK (Token Signing Key).

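The policy described above can be read as a small decision function. The sketch below is illustrative only and is not taken from catalog_manager.cc: the names FailureAction, FailureContext and CategorizeFailure are hypothetical. It assumes that a change of leadership shows up as a change in the term number, that the shutdown check comes first, and that the TSK exception means the task carries on instead of crashing.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical names throughout; this is not the Kudu CatalogManager API.
enum class FailureAction {
  kIgnoreAndBailOut,  // catalog is shutting down: ignore the error, stop the task
  kBailOut,           // non-fatal: the leadership term changed under the task
  kContinue,          // TSK write failed, but the TokenSigner still has a usable TSK
  kCrash,             // fatal: failure within the same leadership term
};

struct FailureContext {
  int64_t start_term;          // term at which the post-election task started
  int64_t current_term;        // term observed after the operation failed
  bool shutting_down;          // the catalog is shutting down
  bool is_new_tsk_write;       // the failed operation was persisting a new TSK
  bool signer_has_usable_tsk;  // the TokenSigner can still sign with an existing TSK
};

// Maps a failed system catalog operation to the action the leader
// post-election task should take. The order of the checks is an assumption.
inline FailureAction CategorizeFailure(const FailureContext& ctx) {
  // Terms only move forward in this sketch.
  assert(ctx.current_term >= ctx.start_term);

  if (ctx.shutting_down) {
    return FailureAction::kIgnoreAndBailOut;
  }
  if (ctx.current_term != ctx.start_term) {
    // No longer the leader at the original term: the new leader redoes the work.
    return FailureAction::kBailOut;
  }
  if (ctx.is_new_tsk_write && ctx.signer_has_usable_tsk) {
    return FailureAction::kContinue;
  }
  return FailureAction::kCrash;
}
```

A caller would fill in a FailureContext after a failed read or write and switch on the result. For example, a failed write at an unchanged term with no shutdown in progress maps to kCrash, while the same failure after a term change maps to kBailOut.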
Change-Id: I826826049e3c08a6c8345949690cbbedaea32ff8
Reviewed-on: http://gerrit.cloudera.org:8080/6170
Tested-by: Kudu Jenkins
Reviewed-by: David Ribeiro Alves <[email protected]>
---
M src/kudu/integration-tests/CMakeLists.txt
A src/kudu/integration-tests/catalog_manager_tsk-itest.cc
M src/kudu/master/catalog_manager.cc
M src/kudu/master/catalog_manager.h
M src/kudu/master/master-test.cc
M src/kudu/master/master_service.cc
M src/kudu/master/sys_catalog-test.cc
7 files changed, 506 insertions(+), 149 deletions(-)

Approvals:
  David Ribeiro Alves: Looks good to me, approved
  Kudu Jenkins: Verified

--
To view, visit http://gerrit.cloudera.org:8080/6170
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: I826826049e3c08a6c8345949690cbbedaea32ff8
Gerrit-PatchSet: 31
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Alexey Serbin <[email protected]>
Gerrit-Reviewer: Adar Dembo <[email protected]>
Gerrit-Reviewer: Alexey Serbin <[email protected]>
Gerrit-Reviewer: Dan Burkert <[email protected]>
Gerrit-Reviewer: David Ribeiro Alves <[email protected]>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <[email protected]>
