Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/7439 )
Change subject: tablet: introduce closed mvcc and stopped tablets ...................................................................... Patch Set 34: (13 comments) http://gerrit.cloudera.org:8080/#/c/7439/33/src/kudu/integration-tests/stop_tablet-itest.cc File src/kudu/integration-tests/stop_tablet-itest.cc: http://gerrit.cloudera.org:8080/#/c/7439/33/src/kudu/integration-tests/stop_tablet-itest.cc@137 PS33, Line 137: // Ensure that stopping a tablet will not prevent scans from completing. > I don't really understand what is being tested here. With 3 replicas, what We don't need to be guaranteed that the specific replica that has been stopped is being scanned, since the property we're trying to enforce here is that a single stopped replica won't stop scans (and we can rely on slight RNG to ensure we have coverage for stopped vs not stopped). http://gerrit.cloudera.org:8080/#/c/7439/33/src/kudu/integration-tests/stop_tablet-itest.cc@200 PS33, Line 200: a new leader will not be elected > This will likely be flaky because under load, IO blips can cause new leader I've looped this a couple thousand times with some stress in TSAN and only seen "normal" stressed tsan errors http://gerrit.cloudera.org:8080/#/c/7439/32/src/kudu/tablet/rowset_metadata.h File src/kudu/tablet/rowset_metadata.h: http://gerrit.cloudera.org:8080/#/c/7439/32/src/kudu/tablet/rowset_metadata.h@85 PS32, Line 85: Status Flush() const; > nit: it's pretty weird to mark this const, since it's technically mutating Hrm, sorry for the distraction, I was confused about why it wasn't, since it's not mutating the RowSetMetadata. I came to the conclusion that only CommitUpdate() was mutating the in-memory state by mentally marking this as const, so I thought I'd actually mark it const. Should've been in a different patch. http://gerrit.cloudera.org:8080/#/c/7439/32/src/kudu/tablet/tablet.h File src/kudu/tablet/tablet.h: http://gerrit.cloudera.org:8080/#/c/7439/32/src/kudu/tablet/tablet.h@371 PS32, Line 371: col_ids > nit: just for future reference, doing random renames in a big patch is a bi This was to appease tidybot, which I suppose isn't all too important; sorry about that. http://gerrit.cloudera.org:8080/#/c/7439/32/src/kudu/tablet/tablet.h@668 PS32, Line 668: > nits: Done http://gerrit.cloudera.org:8080/#/c/7439/31/src/kudu/tablet/tablet.h File src/kudu/tablet/tablet.h: http://gerrit.cloudera.org:8080/#/c/7439/31/src/kudu/tablet/tablet.h@457 PS31, Line 457: switch (s) { > it would be nice to assert on the state transitions in this function to enf Done http://gerrit.cloudera.org:8080/#/c/7439/31/src/kudu/tablet/tablet.cc File src/kudu/tablet/tablet.cc: http://gerrit.cloudera.org:8080/#/c/7439/31/src/kudu/tablet/tablet.cc@320 PS31, Line 320: CancelMaintenanceOps(); > Why do we need to make this call in Stop() ? Can't the ops just cancel them Hrmm, I suppose they could. I put it here since the convention seemed to be that the "owner" of the op should control the lifecycle of op. Also a single call like this makes use of the fact that ops are subclassed (vs if we had them check the tablet state, we'd put that checking code in each op's implementation). http://gerrit.cloudera.org:8080/#/c/7439/31/src/kudu/tablet/tablet.cc@1398 PS31, Line 1398: // operation to finish in Unregister(), a different one can't get re-scheduled. > If we don't want to hold the lock while unregistering the ops then we need Done. This function is only ever called from a single thread, so copying is sufficient. http://gerrit.cloudera.org:8080/#/c/7439/32/src/kudu/tablet/tablet.cc File src/kudu/tablet/tablet.cc: http://gerrit.cloudera.org:8080/#/c/7439/32/src/kudu/tablet/tablet.cc@1389 PS32, Line 1389: maintenance_ops_.swap(maintenance_ops); : } > is this sufficient protection here for this method? My worry is that, if yo It is possible that the Stop() would cancel nothing, but UnregisterOps() and RegisterOps() are externally synchronized (in bootstrapping code and replica shutdown) in such a way that this would never be called at the same time as UnregisterOps(). Will add a DFAKE_MUTEX to enforce this. And it's ok to cancel nothing here because a call to UnregisterOps() should be impending. http://gerrit.cloudera.org:8080/#/c/7439/33/src/kudu/tablet/tablet.cc File src/kudu/tablet/tablet.cc: http://gerrit.cloudera.org:8080/#/c/7439/33/src/kudu/tablet/tablet.cc@874 PS33, Line 874: } > warning: redundant boolean literal in conditional return statement [readabi Done http://gerrit.cloudera.org:8080/#/c/7439/33/src/kudu/tablet/tablet.cc@952 PS33, Line 952: } : return s; : : case RowOperationsPB::UPDATE: : case RowOperationsPB::DELETE: : s = MutateRowUnlocked(tx_state, row_op, stats); : if (s.IsNotFound()) { : return Status::OK(); : } : return s; : : default: : LOG_WITH_ > these look like good changes... but are they at all related to the theme of Hrm, somewhat, somewhat not. A part of this patch is about handling errors in the stopped state, where before we would not expect errors. This is a case where before we would never get errors, but we can conceivably have errors here, and we should handle them appropriately, since this is on the Apply path. http://gerrit.cloudera.org:8080/#/c/7439/31/src/kudu/tablet/transactions/transaction_driver.cc File src/kudu/tablet/transactions/transaction_driver.cc: http://gerrit.cloudera.org:8080/#/c/7439/31/src/kudu/tablet/transactions/transaction_driver.cc@518 PS31, Line 518: HandleFailure(s); > Doesn't HandleFailure already do this? As long as we document that it takes Done http://gerrit.cloudera.org:8080/#/c/7439/31/src/kudu/tserver/tablet_service.cc File src/kudu/tserver/tablet_service.cc: http://gerrit.cloudera.org:8080/#/c/7439/31/src/kudu/tserver/tablet_service.cc@1764 PS31, Line 1764: fail > typo: fail Done -- To view, visit http://gerrit.cloudera.org:8080/7439 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I983620f27e7226806a2cca253db7619731914d42 Gerrit-Change-Number: 7439 Gerrit-PatchSet: 34 Gerrit-Owner: Andrew Wong <[email protected]> Gerrit-Reviewer: Andrew Wong <[email protected]> Gerrit-Reviewer: David Ribeiro Alves <[email protected]> Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy <[email protected]> Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon <[email protected]> Gerrit-Comment-Date: Tue, 21 Nov 2017 07:38:57 +0000 Gerrit-HasComments: Yes
