Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/9266 )
Change subject: KUDU-2274. RaftConsensus should not access cmeta when shutdown ...................................................................... Patch Set 2: (7 comments) http://gerrit.cloudera.org:8080/#/c/9266/2/src/kudu/integration-tests/ts_tablet_manager-itest.cc File src/kudu/integration-tests/ts_tablet_manager-itest.cc: http://gerrit.cloudera.org:8080/#/c/9266/2/src/kudu/integration-tests/ts_tablet_manager-itest.cc@412 PS2, Line 412: if (!s.ok()) continue; // This replica is shut down or failed. Would it make sense to check for the specific type of the non-OK status here, and continue only in case of IsServiceUnavailable()? http://gerrit.cloudera.org:8080/#/c/9266/2/src/kudu/master/catalog_manager.cc File src/kudu/master/catalog_manager.cc: http://gerrit.cloudera.org:8080/#/c/9266/2/src/kudu/master/catalog_manager.cc@4365 PS2, Line 4365: catalog_status_ = s.CloneAndPrepend("ConsensusState is not available"); I think it's worth explicitly setting the status to ServiceUnavailable() here: there is some logic that assumes that other components would retry in that case. Or just add DCHECK(s.IsServiceUnavailable()) here. I'm just concerned about the behavior of the clients trying to connect to the master when if we return anything but ServiceUnavailable here. http://gerrit.cloudera.org:8080/#/c/9266/2/src/kudu/tserver/tablet_copy_source_session.cc File src/kudu/tserver/tablet_copy_source_session.cc: http://gerrit.cloudera.org:8080/#/c/9266/2/src/kudu/tserver/tablet_copy_source_session.cc@194 PS2, Line 194: RETURN_NOT_OK_PREPEND(consensus->ConsensusState(&cstate), Maybe, it's worth adding DCHECK_OK() here as well? Isn't a logical error to initiate tablet copy source session from a tablet which has been shutdown? Or that's acceptable during initialization of a tablet server? http://gerrit.cloudera.org:8080/#/c/9266/2/src/kudu/tserver/tablet_copy_source_session.cc@194 PS2, Line 194: RETURN_NOT_OK_PREPEND(consensus->ConsensusState(&cstate), : "Consensus state not available"); Would RETURN_NOT_OK_PREPEND(consensus->ConsensusState(&initial_cstate_), "Consensus state not available"); be acceptable here? http://gerrit.cloudera.org:8080/#/c/9266/2/src/kudu/tserver/tablet_server-test.cc File src/kudu/tserver/tablet_server-test.cc: http://gerrit.cloudera.org:8080/#/c/9266/2/src/kudu/tserver/tablet_server-test.cc@423 PS2, Line 423: replica->SetError(Status::IOError("This error will leave the replica FAILED state at shutdown")); : replica->Shutdown(); nit: maybe, separate these two cases to make sure that having just one is enough to get ServiceUnavailable error on while trying to call RaftConsensus::ConsensusState() method. http://gerrit.cloudera.org:8080/#/c/9266/2/src/kudu/tserver/tablet_service.cc File src/kudu/tserver/tablet_service.cc: http://gerrit.cloudera.org:8080/#/c/9266/2/src/kudu/tserver/tablet_service.cc@1220 PS2, Line 1220: continue; Does it make sense to add before continue DCHECK(s.IsServiceUnavailable()); ? http://gerrit.cloudera.org:8080/#/c/9266/2/src/kudu/tserver/tserver_path_handlers.cc File src/kudu/tserver/tserver_path_handlers.cc: http://gerrit.cloudera.org:8080/#/c/9266/2/src/kudu/tserver/tserver_path_handlers.cc@294 PS2, Line 294: Status s = consensus->ConsensusState(&cstate); : if (s.ok()) { : consensus_state_html = ConsensusStatePBToHtml(cstate); : } nit: maybe just if (consensus->ConsensusState(&cstate).ok()) { consensus_state_html = ConsensusStatePBToHtml(cstate); } -- To view, visit http://gerrit.cloudera.org:8080/9266 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ib3f3f75674d5739b281e7c689ddb2d433b9b7415 Gerrit-Change-Number: 9266 Gerrit-PatchSet: 2 Gerrit-Owner: Mike Percy <[email protected]> Gerrit-Reviewer: Alexey Serbin <[email protected]> Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy <[email protected]> Gerrit-Comment-Date: Sat, 10 Feb 2018 02:22:52 +0000 Gerrit-HasComments: Yes
