Adar Dembo has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12462 )

Change subject: KUDU-2690: don't roll log schema on failed alter
......................................................................


Patch Set 2:

(21 comments)

http://gerrit.cloudera.org:8080/#/c/12462/2/src/kudu/integration-tests/alter_table-test.cc
File src/kudu/integration-tests/alter_table-test.cc:

http://gerrit.cloudera.org:8080/#/c/12462/2/src/kudu/integration-tests/alter_table-test.cc@2151
PS2, Line 2151:   Kudu2690Test() {}
I think you can omit this?


http://gerrit.cloudera.org:8080/#/c/12462/2/src/kudu/integration-tests/alter_table-test.cc@2167
PS2, Line 2167:   void InsertIntRows(int start_row, int num_rows) {
Can you convert the CHECK_OKs in here to ASSERT_OK?


http://gerrit.cloudera.org:8080/#/c/12462/2/src/kudu/integration-tests/alter_table-test.cc@2175
PS2, Line 2175:       gscoped_ptr<KuduInsert> insert(table->NewInsert());
unique_ptr for new code


http://gerrit.cloudera.org:8080/#/c/12462/2/src/kudu/integration-tests/alter_table-test.cc@2217
PS2, Line 2217:   scoped_refptr<TabletReplica> ts_replica = 
LookupTabletReplica();
              :   const string ts_uuid = 
cluster_->mini_tablet_server(0)->uuid();
              :   const string tablet_id = ts_replica->tablet_id();
              :   ts_replica.reset();
Maybe combine this?

  const string ts_uuid = ...;
  const string tablet_id = LookupTabletReplica()->tablet_id();


http://gerrit.cloudera.org:8080/#/c/12462/2/src/kudu/integration-tests/alter_table-test.cc@2237
PS2, Line 2237:                     
step->mutable_add_column()->mutable_schema());
Nit: indentation


http://gerrit.cloudera.org:8080/#/c/12462/2/src/kudu/integration-tests/alter_table-test.cc@2254
PS2, Line 2254:     // Manually put together an alter schema request from the 
master. Even
              :     // though we're manually putting this request together, 
because we wrote to
              :     // the SysCatalog above, it reflects what a real client 
request would lead
              :     // the master to send to tservers.
Not sure why we need to involve the master at all. Seems like we could simplify 
the test if we avoided it (maybe we could shut down the master or disable 
heartbeating or something?). I get that it's more similar to a real-world 
scenario in this way, but it also means more moving parts are involved in the 
test and makes it harder to maintain. Plus the direct manipulation of the 
catalog manager and sys catalog is decidedly inorganic.


http://gerrit.cloudera.org:8080/#/c/12462/2/src/kudu/integration-tests/alter_table-test.cc@2263
PS2, Line 2263:     l.Unlock();
It'll unlock automatically when it goes out of scope.


http://gerrit.cloudera.org:8080/#/c/12462/2/src/kudu/integration-tests/alter_table-test.cc@2270
PS2, Line 2270:   std::shared_ptr<master::TSDescriptor> ts_desc;
using


http://gerrit.cloudera.org:8080/#/c/12462/2/src/kudu/integration-tests/alter_table-test.cc@2271
PS2, Line 2271: cluster_->mini_master()->master()
Can use master-> here


http://gerrit.cloudera.org:8080/#/c/12462/2/src/kudu/integration-tests/alter_table-test.cc@2276
PS2, Line 2276:   vector<std::thread> ts;
Nit: we usually use 'ts' variables to name tservers; could you rename this?


http://gerrit.cloudera.org:8080/#/c/12462/2/src/kudu/integration-tests/alter_table-test.cc@2282
PS2, Line 2282:       WARN_NOT_OK(ts_proxy->AlterSchema(tserver_req, 
&tserver_resp, &rpc),
I think the test would be net simpler if you avoided sending RPCs altogether 
and instead called TabletReplica::SubmitAlterSchema directly. I don't see 
anything useful in TabletServiceAdminImpl::AlterSchema, apart from the schema 
version early out (do you need that?).


http://gerrit.cloudera.org:8080/#/c/12462/2/src/kudu/integration-tests/alter_table-test.cc@2295
PS2, Line 2295:   // Roll over onto a new segment and chomp off the alter 
schema request.
Where's the chomping?

Oh, I guess that's below, in RunLogGC(). The comment placement here suggested 
it happened in one of the next two lines.


http://gerrit.cloudera.org:8080/#/c/12462/2/src/kudu/integration-tests/alter_table-test.cc@2305
PS2, Line 2305:   ASSERT_EVENTUALLY([&] {
Why does this need to be wrapped in ASSERT_EVENTUALLY? I can take my answer in 
the form of a code comment.


http://gerrit.cloudera.org:8080/#/c/12462/2/src/kudu/integration-tests/alter_table-test.cc@2323
PS2, Line 2323:     LOG(INFO) << "Tablet is running";
Probably not necessary; the test will pass if the ASSERT_EQ is satisfied.


http://gerrit.cloudera.org:8080/#/c/12462/2/src/kudu/tablet/tablet_bootstrap.cc
File src/kudu/tablet/tablet_bootstrap.cc:

http://gerrit.cloudera.org:8080/#/c/12462/2/src/kudu/tablet/tablet_bootstrap.cc@1450
PS2, Line 1450:     // If we put a result in the commit message, it should be 
an error and we
              :     // don't need to replay it.
This doesn't match up with the code just below, which allows for there to be a 
result _without_ a failed status.


http://gerrit.cloudera.org:8080/#/c/12462/2/src/kudu/tablet/tablet_bootstrap.cc@1476
PS2, Line 1476:     // Note: If the alter _didn't_ complete successfully, that 
should have been
              :     // captured by 'commit_msg' above, unless it was written 
before we started
              :     // putting AlterSchema results into the commit messages.
Could you pull this info and the info on L1450-1451 into one block comment that 
describes the overall failure tri-state of "alter succeeded", "on-disk alter is 
failed one, skip", and "on-disk alter 'succeeded' but then the alter failed to 
replay"?


http://gerrit.cloudera.org:8080/#/c/12462/2/src/kudu/tablet/transactions/alter_schema_transaction.h
File src/kudu/tablet/transactions/alter_schema_transaction.h:

http://gerrit.cloudera.org:8080/#/c/12462/2/src/kudu/tablet/transactions/alter_schema_transaction.h@97
PS2, Line 97:   // Returns the 'error_' in the form of an op result. 'error_' 
must have been
            :   // set to an error upon calling this.
            :   OperationResultPB error() const;
            :
            :   // Whether the alter completed had an error in applying. An 
alter transaction
            :   // can be completed but not have the underlying alter succeed, 
e.g. if the
            :   // "new" schema has an older schema version than the current 
one.
            :   bool has_error() const;
Why not combine these two and return an optional<OperationResultPB>? 
Distinguishing between error() and has_error() has a decidedly protobuf feel to 
it, but this isn't a PB class so I'm not sure what we gain from that verbose 
style.


http://gerrit.cloudera.org:8080/#/c/12462/2/src/kudu/tablet/transactions/alter_schema_transaction.cc
File src/kudu/tablet/transactions/alter_schema_transaction.cc:

http://gerrit.cloudera.org:8080/#/c/12462/2/src/kudu/tablet/transactions/alter_schema_transaction.cc@67
PS2, Line 67:   error_ = OperationResultPB();
            :   StatusToPB(s, error_->mutable_failed_status());
This is a non-atomic "set" operation. Doesn't seem like that matters (error_'s 
getters and setter seem to be used by a single thread), but maybe it does?


http://gerrit.cloudera.org:8080/#/c/12462/2/src/kudu/tablet/transactions/alter_schema_transaction.cc@84
PS2, Line 84: string AlterSchemaTransactionState::ToString() const {
Want to include error_ here? Or does that complicate locking?


http://gerrit.cloudera.org:8080/#/c/12462/2/src/kudu/tablet/transactions/alter_schema_transaction.cc@145
PS2, Line 145:     *result->add_ops() = state_->error();
Can we std::move() the error into the op?


http://gerrit.cloudera.org:8080/#/c/12462/2/src/kudu/tablet/transactions/alter_schema_transaction.cc@153
PS2, Line 153:   // Altered tablets should be included in the next tserver 
heartbeat so that
             :   // clients waiting on IsAlterTableDone() are unblocked 
promptly.
             :   state_->tablet_replica()->MarkTabletDirty("Alter schema 
finished");
Do we need to do this if there was an error?



--
To view, visit http://gerrit.cloudera.org:8080/12462
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id761851741297e29a4666bec0c34fc4f7285f715
Gerrit-Change-Number: 12462
Gerrit-PatchSet: 2
Gerrit-Owner: Andrew Wong <[email protected]>
Gerrit-Reviewer: Adar Dembo <[email protected]>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Comment-Date: Fri, 15 Feb 2019 19:33:34 +0000
Gerrit-HasComments: Yes

Reply via email to