Mike Percy has submitted this change and it was merged.
Change subject: Fix flaky ts_recovery-itest TestChangeMaxCellSize
......................................................................
Fix flaky ts_recovery-itest TestChangeMaxCellSize
In TSTabletManager::CreateReportedTabletPB(), each replica's state is
reported and, if that state is FAILED, the error that caused the failure
is reported as well. However, replica state is transient and can, on
rare occasion, change between these two reads of the state.
This led to a DCHECK failure in ts_recovery-itest when the catalog
manager parsed a report, as any report that carries an error is expected
to have state FAILED. The failure was consistent with the following
sequence of events:
1. The tablet is opened, hits an error, and starts shutting down
2. The report notes the replica state (STOPPING)
3. The tablet finishes shutting down and switches its state (FAILED)
4. The report, seeing the replica is FAILED, notes the error that
caused the shutdown
5. The catalog manager goes through the report and finds an error
attached to a tablet whose reported state is STOPPING instead of
FAILED. This fails the DCHECK, as logged below:
BlockManagerType/TsRecoveryITest.TestChangeMaxCellSize/0:
catalog_manager.cc:2489] Check failed: report.state() == tablet::FAILED (3 vs. 2)
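
The sequence above can be replayed deterministically on a single thread.
The following is a minimal sketch, not the Kudu code: State, FakeReplica,
Report, BuildReportWithTwoReads and the other names are hypothetical
stand-ins, and the between_reads callback is an assumed substitute for
the concurrent shutdown that happens between the two reads.

```cpp
#include <cassert>
#include <functional>
#include <optional>
#include <string>

// Hypothetical stand-ins for illustration; these are not the real Kudu types.
enum class State { RUNNING, STOPPING, FAILED };

struct FakeReplica {
  State state = State::RUNNING;
  std::string error;  // describes the failure once the tablet has hit an error
};

struct Report {
  State state = State::RUNNING;
  std::optional<std::string> error;
};

// Builds a report by reading the replica state twice. `between_reads` stands
// in for whatever a concurrent thread does between the two reads.
Report BuildReportWithTwoReads(FakeReplica& replica,
                               const std::function<void()>& between_reads) {
  Report report;
  report.state = replica.state;          // step 2: first read sees STOPPING
  between_reads();                       // step 3: shutdown finishes
  if (replica.state == State::FAILED) {  // step 4: second read sees FAILED
    report.error = replica.error;
  }
  return report;
}

// True when a report carries an error but its state is not FAILED.
bool BreaksFailedOnlyExpectation(const Report& report) {
  return report.error.has_value() && report.state != State::FAILED;
}

// Replays steps 1-5 on one thread and returns the inconsistent report.
Report ReplayRace() {
  FakeReplica replica;
  replica.state = State::STOPPING;  // step 1: error hit, shutdown started
  replica.error = "injected error";
  Report report = BuildReportWithTwoReads(
      replica, [&replica] { replica.state = State::FAILED; });
  assert(BreaksFailedOnlyExpectation(report));  // step 5: the mismatch
  return report;
}
```

In this sketch the returned report has state STOPPING yet carries an
error, which is the combination the DCHECK rejected.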
This sequence may be more frequent since a9d17c0, which enforces that
the tablet stops (i.e. goes into the STOPPING state and shuts down the
replica's internals) before going into the FAILED state, instead of
going directly to the FAILED state.
Validated by injecting a pause in CreateReportedTabletPB() between the
two reads of replica state and hitting the DCHECK failure. This is fixed
by reporting the error regardless of state and removing this DCHECK.
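
As a rough illustration of the approach described above, the sketch
below attaches the error whenever one is present instead of gating it on
a second read of the state. It is an assumption-laden sketch rather than
the actual diff: FakeReplica, Report, BuildReport and HasError are
hypothetical names, an empty error string is assumed to mean "no error",
and between_reads again stands in for the injected pause.

```cpp
#include <cassert>
#include <functional>
#include <optional>
#include <string>

// Hypothetical stand-ins for illustration; these are not the real Kudu types.
enum class State { RUNNING, STOPPING, FAILED };

struct FakeReplica {
  State state = State::RUNNING;
  std::string error;  // assumption: empty means the replica has no error
};

struct Report {
  State state = State::RUNNING;
  std::optional<std::string> error;
};

// Reads the state once and attaches the error whenever one exists, so a
// state change during `between_reads` cannot affect whether it is reported.
Report BuildReport(FakeReplica& replica,
                   const std::function<void()>& between_reads) {
  Report report;
  report.state = replica.state;
  between_reads();  // stands in for the injected pause
  if (!replica.error.empty()) {
    report.error = replica.error;  // reported regardless of state
  }
  return report;
}

// The consumer looks only at the error and makes no claim about the state.
bool HasError(const Report& report) { return report.error.has_value(); }
```

With this shape, a report whose state is STOPPING but which carries an
error is an accepted outcome rather than a check failure.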
Change-Id: I6f0a6b19756777e5f4081ef6c8cb5af4ecc8a3d6
Reviewed-on: http://gerrit.cloudera.org:8080/7820
Tested-by: Kudu Jenkins
Reviewed-by: Mike Percy <[email protected]>
---
M src/kudu/master/catalog_manager.cc
M src/kudu/tserver/ts_tablet_manager.cc
2 files changed, 4 insertions(+), 5 deletions(-)
Approvals:
Mike Percy: Looks good to me, approved
Kudu Jenkins: Verified
--
To view, visit http://gerrit.cloudera.org:8080/7820
Gerrit-MessageType: merged
Gerrit-Change-Id: I6f0a6b19756777e5f4081ef6c8cb5af4ecc8a3d6
Gerrit-PatchSet: 3
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Andrew Wong <[email protected]>
Gerrit-Reviewer: Andrew Wong <[email protected]>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <[email protected]>
Gerrit-Reviewer: Todd Lipcon <[email protected]>