Andrew Wong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/7439 )

Change subject: mvcc: allow tablet shutdown without completing txs
......................................................................


Patch Set 21:

(26 comments)

http://gerrit.cloudera.org:8080/#/c/7439/20/src/kudu/integration-tests/ts_tablet_manager-itest.cc
File src/kudu/integration-tests/ts_tablet_manager-itest.cc:

http://gerrit.cloudera.org:8080/#/c/7439/20/src/kudu/integration-tests/ts_tablet_manager-itest.cc@77
PS20, Line 77: using itest::SimpleIntKeyKuduSchema;
> warning: using decl 'InternalMiniCluster' is unused [misc-unused-using-decl
Done


http://gerrit.cloudera.org:8080/#/c/7439/20/src/kudu/integration-tests/ts_tablet_manager-itest.cc@130
PS20, Line 130:   ASSERT_OK(cluster->Start());
> this should probably be less specific to implementation details
Done


http://gerrit.cloudera.org:8080/#/c/7439/20/src/kudu/integration-tests/ts_tablet_manager-itest.cc@179
PS20, Line 179:   FLAGS_enable_leader_failure_detection = false;
> I'm not sure this test suite is the best place for these tests, since they
I moved to stop_tablet-itest.cc


http://gerrit.cloudera.org:8080/#/c/7439/20/src/kudu/integration-tests/ts_tablet_manager-itest.cc@183
PS20, Line 183:   FLAGS_allow_unsafe_replication_factor = true;
> here you're stopping a random server, not necessarily the one that the scan
Done


http://gerrit.cloudera.org:8080/#/c/7439/20/src/kudu/integration-tests/ts_tablet_manager-itest.cc@187
PS20, Line 187:
> are the readers using fault-tolerant scans by default here?
Yeah they are (see test_workload.cc:219)


http://gerrit.cloudera.org:8080/#/c/7439/20/src/kudu/integration-tests/ts_tablet_manager-itest.cc@197
PS20, Line 197:
> Somewhere we have a utility function like AssertNoCrashes(). Otherwise mayb
Done


http://gerrit.cloudera.org:8080/#/c/7439/20/src/kudu/integration-tests/ts_tablet_manager-itest.cc@205
PS20, Line 205:   ValueDeleter deleter(&ts_map);
> in these tests it might be beneficial to configure the MM such that it cons
Done


http://gerrit.cloudera.org:8080/#/c/7439/20/src/kudu/integration-tests/ts_tablet_manager-itest.cc@235
PS20, Line 235:
> is it possible in this test to re-start the tserver that has the stopped ta
I think it'd be a bit tricky in this patch since this makes use of IMC instead 
of EMC, unless there's a ClusterVerifier equivalent for IMCs that I'm not aware 
of. Good point though, in EMC tests I've been verifying with ClusterVerifier.

Trickier still, this actually makes use of the fact that we're using IMC by 
stopping the tablet directly by reaching into the server, so I'd prefer to keep 
this test IMC.


http://gerrit.cloudera.org:8080/#/c/7439/20/src/kudu/integration-tests/ts_tablet_manager-itest.cc@238
PS20, Line 238:       // The MarkDirty() callback is on an async thread so it 
might take the
> can you combine this test with the above one, using a utility function that
Done


http://gerrit.cloudera.org:8080/#/c/7439/20/src/kudu/tablet/mvcc-test.cc
File src/kudu/tablet/mvcc-test.cc:

http://gerrit.cloudera.org:8080/#/c/7439/20/src/kudu/tablet/mvcc-test.cc@498
PS20, Line 498: waiting_thread.join();
              :   ASSERT_OK(s);
> don't you want to join before assert here?
Done


http://gerrit.cloudera.org:8080/#/c/7439/20/src/kudu/tablet/mvcc-test.cc@598
PS20, Line 598:   waiting_thread.join();
              :   ASSERT_STR_CONTAINS(s.ToString(), "closed");
              :   ASSERT_TRUE(s.IsAborted());
              :
              :   // New waiters should abort immediately.
              :   s = mgr.WaitForApplyingTransactionsToCommit();
              :
> if you reordered the join and the ASSERT then you could avoid the complexit
Done


http://gerrit.cloudera.org:8080/#/c/7439/20/src/kudu/tablet/mvcc.h
File src/kudu/tablet/mvcc.h:

http://gerrit.cloudera.org:8080/#/c/7439/20/src/kudu/tablet/mvcc.h@253
PS20, Line 253: Status::
> specify the type of status
Done


http://gerrit.cloudera.org:8080/#/c/7439/20/src/kudu/tablet/mvcc.h@254
PS20, Line 254:   Status WaitForApplyingTransactionsToCommit() const 
WARN_UNUSED_RESULT;
> worth adding a WARN_UNUSED_RESULT here
Done


http://gerrit.cloudera.org:8080/#/c/7439/20/src/kudu/tablet/mvcc.h@275
PS20, Line 275: not
> not?
Done


http://gerrit.cloudera.org:8080/#/c/7439/20/src/kudu/tablet/mvcc.h@280
PS20, Line 280: e bool i
> rename to is_closed since it's inlinable short method
Done


http://gerrit.cloudera.org:8080/#/c/7439/20/src/kudu/tablet/tablet.h
File src/kudu/tablet/tablet.h:

http://gerrit.cloudera.org:8080/#/c/7439/20/src/kudu/tablet/tablet.h@160
PS20, Line 160:   //
> mind adding a WARN_UNUSED_RESULT to these functions that you're changing fr
Done


http://gerrit.cloudera.org:8080/#/c/7439/20/src/kudu/tablet/tablet.h@405
PS20, Line 405:
> nit: "should" is kinda fuzzy. Do you mean "will"? Nicer for callers to thin
Done


http://gerrit.cloudera.org:8080/#/c/7439/20/src/kudu/tablet/tablet.h@405
PS20, Line 405:
              :   // Return handle to the metric entity of this tablet.
              :   const scoped
> Would be good to move this declaration up next to 'Shutdown()' and clarify
Done


http://gerrit.cloudera.org:8080/#/c/7439/20/src/kudu/tablet/tablet.h@465
PS20, Line 465:   // into an internal UPDATE operation and performs it.
> add doc for what bad status means. Same above with the Apply calls.
Done


http://gerrit.cloudera.org:8080/#/c/7439/20/src/kudu/tablet/tablet.h@537
PS20, Line 537:   // The 'old_ms' pointer will be set to the current MemRowSet 
set before the replacement.
              :   // If the MemRowSet is not empty it will be added to the 
'compaction' input
              :   // and the MemRowSet compaction
> can't you just use the corresponding method in mvcc?
You're right, although sometimes it's convenient to condition on a the status, 
and sometimes it's convenient to condition a bool. E.g. if we need to return an 
error in multiple places if the tablet has been Stopped(), this is more 
convenient than repeating:

 if (tablet->Stopped()) {
   return Status::Aborted("aborted message");
 }


http://gerrit.cloudera.org:8080/#/c/7439/20/src/kudu/tablet/tablet.cc
File src/kudu/tablet/tablet.cc:

http://gerrit.cloudera.org:8080/#/c/7439/20/src/kudu/tablet/tablet.cc@806
PS20, Line 806:         // In order for failures here to be safe, the tablet 
must have been
> I think it's worth a comment here explaining why we expect it to already ha
Done


http://gerrit.cloudera.org:8080/#/c/7439/20/src/kudu/tablet/tablet.cc@827
PS20, Line 827: p();
> nit: how about "Error while checking roe presence"
Done


http://gerrit.cloudera.org:8080/#/c/7439/20/src/kudu/tablet/tablet.cc@839
PS20, Line 839: w_ops_bas
> PREDICT_FALSE?
Done


http://gerrit.cloudera.org:8080/#/c/7439/20/src/kudu/tablet/tablet.cc@900
PS20, Line 900: ol present = false;
> can't you use CheckNotStopped here
Done


http://gerrit.cloudera.org:8080/#/c/7439/20/src/kudu/tablet/tablet_replica.cc
File src/kudu/tablet/tablet_replica.cc:

http://gerrit.cloudera.org:8080/#/c/7439/20/src/kudu/tablet/tablet_replica.cc@263
PS20, Line 263:
> is it ok to access tablet_ outside of state_change_lock_? can't quite recal
>From the comment around state_change_lock_, it seems it's only guarding 
>long-running lifecycle operations, which this isn't. lock_ seems necessary 
>though, since it accesses tablet_.

I think it was safe before just because we had the lock above serializing and 
deduplicating Stop() calls, but I'll add it up there.


http://gerrit.cloudera.org:8080/#/c/7439/20/src/kudu/tserver/tablet_service.cc
File src/kudu/tserver/tablet_service.cc:

http://gerrit.cloudera.org:8080/#/c/7439/20/src/kudu/tserver/tablet_service.cc@1731
PS20, Line 1731:         if (!s.ok()) {
               :           tmp_error_code = 
TabletServerErrorPB::INVALID_SNAPSHOT;
> this sentence seems incomplete, so what do we do?
Done, also moved this a bit lower and conditioned the error code on Stopped() 
instead.



--
To view, visit http://gerrit.cloudera.org:8080/7439
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I983620f27e7226806a2cca253db7619731914d42
Gerrit-Change-Number: 7439
Gerrit-PatchSet: 21
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <davidral...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mpe...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <t...@apache.org>
Gerrit-Comment-Date: Tue, 31 Oct 2017 00:38:24 +0000
Gerrit-HasComments: Yes

Reply via email to