Andrew Wong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/7439 )

Change subject: tablet: introduce closed mvcc and stopped tablets
......................................................................


Patch Set 34:

(13 comments)

http://gerrit.cloudera.org:8080/#/c/7439/33/src/kudu/integration-tests/stop_tablet-itest.cc
File src/kudu/integration-tests/stop_tablet-itest.cc:

http://gerrit.cloudera.org:8080/#/c/7439/33/src/kudu/integration-tests/stop_tablet-itest.cc@137
PS33, Line 137: // Ensure that stopping a tablet will not prevent scans from 
completing.
> I don't really understand what is being tested here. With 3 replicas, what
We don't need to be guaranteed that the specific replica that has been stopped 
is being scanned, since the property we're trying to enforce here is that a 
single stopped replica won't stop scans (and we can rely on slight RNG to 
ensure we have coverage for stopped vs not stopped).


http://gerrit.cloudera.org:8080/#/c/7439/33/src/kudu/integration-tests/stop_tablet-itest.cc@200
PS33, Line 200: a new leader will not be elected
> This will likely be flaky because under load, IO blips can cause new leader
I've looped this a couple thousand times with some stress in TSAN and only seen 
"normal" stressed tsan errors


http://gerrit.cloudera.org:8080/#/c/7439/32/src/kudu/tablet/rowset_metadata.h
File src/kudu/tablet/rowset_metadata.h:

http://gerrit.cloudera.org:8080/#/c/7439/32/src/kudu/tablet/rowset_metadata.h@85
PS32, Line 85:   Status Flush() const;
> nit: it's pretty weird to mark this const, since it's technically mutating
Hrm, sorry for the distraction, I was confused about why it wasn't, since it's 
not mutating the RowSetMetadata. I came to the conclusion that only 
CommitUpdate() was mutating the in-memory state by mentally marking this as 
const, so I thought I'd actually mark it const.

Should've been in a different patch.


http://gerrit.cloudera.org:8080/#/c/7439/32/src/kudu/tablet/tablet.h
File src/kudu/tablet/tablet.h:

http://gerrit.cloudera.org:8080/#/c/7439/32/src/kudu/tablet/tablet.h@371
PS32, Line 371: col_ids
> nit: just for future reference, doing random renames in a big patch is a bi
This was to appease tidybot, which I suppose isn't all too important; sorry 
about that.


http://gerrit.cloudera.org:8080/#/c/7439/32/src/kudu/tablet/tablet.h@668
PS32, Line 668:
> nits:
Done


http://gerrit.cloudera.org:8080/#/c/7439/31/src/kudu/tablet/tablet.h
File src/kudu/tablet/tablet.h:

http://gerrit.cloudera.org:8080/#/c/7439/31/src/kudu/tablet/tablet.h@457
PS31, Line 457:     switch (s) {
> it would be nice to assert on the state transitions in this function to enf
Done


http://gerrit.cloudera.org:8080/#/c/7439/31/src/kudu/tablet/tablet.cc
File src/kudu/tablet/tablet.cc:

http://gerrit.cloudera.org:8080/#/c/7439/31/src/kudu/tablet/tablet.cc@320
PS31, Line 320:   CancelMaintenanceOps();
> Why do we need to make this call in Stop() ? Can't the ops just cancel them
Hrmm, I suppose they could. I put it here since the convention seemed to be 
that the "owner" of the op should control the lifecycle of op.

Also a single call like this makes use of the fact that ops are subclassed (vs 
if we had them check the tablet state, we'd put that checking code in each op's 
implementation).


http://gerrit.cloudera.org:8080/#/c/7439/31/src/kudu/tablet/tablet.cc@1398
PS31, Line 1398:   // operation to finish in Unregister(), a different one 
can't get re-scheduled.
> If we don't want to hold the lock while unregistering the ops then we need
Done. This function is only ever called from a single thread, so copying is 
sufficient.


http://gerrit.cloudera.org:8080/#/c/7439/32/src/kudu/tablet/tablet.cc
File src/kudu/tablet/tablet.cc:

http://gerrit.cloudera.org:8080/#/c/7439/32/src/kudu/tablet/tablet.cc@1389
PS32, Line 1389:  maintenance_ops_.swap(maintenance_ops);
               : }
> is this sufficient protection here for this method? My worry is that, if yo
It is possible that the Stop() would cancel nothing, but UnregisterOps() and 
RegisterOps() are externally synchronized (in bootstrapping code and replica 
shutdown) in such a way that this would never be called at the same time as 
UnregisterOps(). Will add a DFAKE_MUTEX to enforce this.

And it's ok to cancel nothing here because a call to UnregisterOps() should be 
impending.


http://gerrit.cloudera.org:8080/#/c/7439/33/src/kudu/tablet/tablet.cc
File src/kudu/tablet/tablet.cc:

http://gerrit.cloudera.org:8080/#/c/7439/33/src/kudu/tablet/tablet.cc@874
PS33, Line 874: }
> warning: redundant boolean literal in conditional return statement [readabi
Done


http://gerrit.cloudera.org:8080/#/c/7439/33/src/kudu/tablet/tablet.cc@952
PS33, Line 952:       }
              :       return s;
              :
              :     case RowOperationsPB::UPDATE:
              :     case RowOperationsPB::DELETE:
              :       s = MutateRowUnlocked(tx_state, row_op, stats);
              :       if (s.IsNotFound()) {
              :         return Status::OK();
              :       }
              :       return s;
              :
              :     default:
              :       LOG_WITH_
> these look like good changes... but are they at all related to the theme of
Hrm, somewhat, somewhat not. A part of this patch is about handling errors in 
the stopped state, where before we would not expect errors.

This is a case where before we would never get errors, but we can conceivably 
have errors here, and we should handle them appropriately, since this is on the 
Apply path.


http://gerrit.cloudera.org:8080/#/c/7439/31/src/kudu/tablet/transactions/transaction_driver.cc
File src/kudu/tablet/transactions/transaction_driver.cc:

http://gerrit.cloudera.org:8080/#/c/7439/31/src/kudu/tablet/transactions/transaction_driver.cc@518
PS31, Line 518:       HandleFailure(s);
> Doesn't HandleFailure already do this? As long as we document that it takes
Done


http://gerrit.cloudera.org:8080/#/c/7439/31/src/kudu/tserver/tablet_service.cc
File src/kudu/tserver/tablet_service.cc:

http://gerrit.cloudera.org:8080/#/c/7439/31/src/kudu/tserver/tablet_service.cc@1764
PS31, Line 1764: fail
> typo: fail
Done



--
To view, visit http://gerrit.cloudera.org:8080/7439
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I983620f27e7226806a2cca253db7619731914d42
Gerrit-Change-Number: 7439
Gerrit-PatchSet: 34
Gerrit-Owner: Andrew Wong <[email protected]>
Gerrit-Reviewer: Andrew Wong <[email protected]>
Gerrit-Reviewer: David Ribeiro Alves <[email protected]>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <[email protected]>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <[email protected]>
Gerrit-Comment-Date: Tue, 21 Nov 2017 07:38:57 +0000
Gerrit-HasComments: Yes

Reply via email to