Andrew Wong has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/7820

Change subject: Fix flaky ts_recovery-itest TestChangeMaxCellSize
......................................................................

Fix flaky ts_recovery-itest TestChangeMaxCellSize

In TSTabletManager::CreateReportedTabletPB(), each replica's state is
reported and, if that state is FAILED, the error that caused the failure
is reported as well. However, the function retrieves the replica's state
twice: once to record it in the report and once to decide whether to
attach the error. Replica state is transient and can, on rare occasion,
change between these two retrievals.

This led to a DCHECK failure in ts_recovery-itest when the catalog
manager parsed a report, as any report carrying an error is expected to
be in the FAILED state. The failure was consistent with the following
sequence of events:
  1. The tablet is opened, hits an error, and starts shutting down
  2. The report notes the replica state (STOPPING)
  3. The tablet finishes shutting down and switches its state (FAILED)
  4. The report, seeing the replica is FAILED, notes the error that
     caused the shutdown
  5. The catalog manager goes through the report, and notes the error
     tacked onto the report and the fact that the tablet is STOPPING
     instead of FAILED. This fails the DCHECK, as logged below:

BlockManagerType/TsRecoveryITest.TestChangeMaxCellSize/0:
catalog_manager.cc:2489] Check failed: report.state() == tablet::FAILED (3 vs. 2)
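
The sequence above can be sketched as a small, deterministic model. This
is only an illustration, not the Kudu code: FakeReplica, FakeReport,
BuildReportWithTwoReads and ViolatesFailedExpectation are hypothetical
names, the callback stands in for the tablet finishing its shutdown
between the two reads, and the enum values are chosen only to match the
"(3 vs. 2)" in the log.

```cpp
#include <functional>
#include <string>

// Hypothetical stand-ins for the real types. The numeric values are
// chosen only to match the "(3 vs. 2)" printed by the failed check.
enum class State { FAILED = 2, STOPPING = 3 };

struct FakeReplica {
  State state;
  std::string error;
};

struct FakeReport {
  State state;
  bool has_error;
  std::string error;
};

// Builds a report by reading the replica's state twice. `between_reads`
// runs after the first read and models a concurrent state change.
FakeReport BuildReportWithTwoReads(FakeReplica& replica,
                                   const std::function<void()>& between_reads) {
  FakeReport report;
  report.state = replica.state;          // First read: record the state.
  report.has_error = false;
  between_reads();                       // State may change here.
  if (replica.state == State::FAILED) {  // Second read: attach the error.
    report.has_error = true;
    report.error = replica.error;
  }
  return report;
}

// The condition the check rejected: an error on a non-FAILED report.
bool ViolatesFailedExpectation(const FakeReport& report) {
  return report.has_error && report.state != State::FAILED;
}
```

If the callback switches a STOPPING replica to FAILED, the resulting
report records STOPPING (3) but still carries an error, which is the
combination the check rejected.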

This sequence may be more frequent since a9d17c0, which makes a failing
tablet stop first (i.e. enter the STOPPING state and shut down the
replica's internals) before entering the FAILED state, instead of going
directly to the FAILED state.

This was validated by injecting a pause into CreateReportedTabletPB()
between the two state retrievals and hitting the DCHECK failure. The fix
is to report the error regardless of state and to remove this DCHECK.
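
A minimal sketch of the idea behind the fix, under the same caveat: the
types and function names below are hypothetical and do not reflect the
actual Kudu signatures or the exact diff. The error is copied into the
report whenever the replica has one, so the result no longer depends on
a second state read, and the consumer checks for an error without
assuming the reported state is FAILED.

```cpp
#include <string>

// Hypothetical stand-ins, not the real Kudu types.
enum class State { FAILED = 2, STOPPING = 3 };

struct FakeReplica {
  State state;
  bool has_error;     // True once the replica has hit an error.
  std::string error;
};

struct FakeReport {
  State state;
  bool has_error;
  std::string error;
};

// Reports the error whenever one exists, regardless of the state.
FakeReport BuildReport(const FakeReplica& replica) {
  FakeReport report;
  report.state = replica.state;  // The only state read.
  report.has_error = replica.has_error;
  if (replica.has_error) {
    report.error = replica.error;
  }
  return report;
}

// Consumer side: handle the error without requiring state == FAILED.
bool ShouldHandleError(const FakeReport& report) {
  return report.has_error;
}
```

With this shape, a report taken while the replica is still STOPPING may
already carry the error, and that combination is treated as valid.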

Change-Id: I6f0a6b19756777e5f4081ef6c8cb5af4ecc8a3d6
---
M src/kudu/master/catalog_manager.cc
M src/kudu/tserver/ts_tablet_manager.cc
2 files changed, 2 insertions(+), 3 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/20/7820/1
-- 
To view, visit http://gerrit.cloudera.org:8080/7820
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I6f0a6b19756777e5f4081ef6c8cb5af4ecc8a3d6
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Andrew Wong <[email protected]>
