Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/12462 )
Change subject: KUDU-2690: don't roll log schema on failed alter ...................................................................... Patch Set 2: (21 comments) http://gerrit.cloudera.org:8080/#/c/12462/2/src/kudu/integration-tests/alter_table-test.cc File src/kudu/integration-tests/alter_table-test.cc: http://gerrit.cloudera.org:8080/#/c/12462/2/src/kudu/integration-tests/alter_table-test.cc@2151 PS2, Line 2151: Kudu2690Test() {} I think you can omit this? http://gerrit.cloudera.org:8080/#/c/12462/2/src/kudu/integration-tests/alter_table-test.cc@2167 PS2, Line 2167: void InsertIntRows(int start_row, int num_rows) { Can you convert the CHECK_OKs in here to ASSERT_OK? http://gerrit.cloudera.org:8080/#/c/12462/2/src/kudu/integration-tests/alter_table-test.cc@2175 PS2, Line 2175: gscoped_ptr<KuduInsert> insert(table->NewInsert()); unique_ptr for new code http://gerrit.cloudera.org:8080/#/c/12462/2/src/kudu/integration-tests/alter_table-test.cc@2217 PS2, Line 2217: scoped_refptr<TabletReplica> ts_replica = LookupTabletReplica(); : const string ts_uuid = cluster_->mini_tablet_server(0)->uuid(); : const string tablet_id = ts_replica->tablet_id(); : ts_replica.reset(); Maybe combine this? const string ts_uuid = ...; const string tablet_id = LookupTabletReplica()->tablet_id(); http://gerrit.cloudera.org:8080/#/c/12462/2/src/kudu/integration-tests/alter_table-test.cc@2237 PS2, Line 2237: step->mutable_add_column()->mutable_schema()); Nit: indentation http://gerrit.cloudera.org:8080/#/c/12462/2/src/kudu/integration-tests/alter_table-test.cc@2254 PS2, Line 2254: // Manually put together an alter schema request from the master. Even : // though we're manually putting this request together, because we wrote to : // the SysCatalog above, it reflects what a real client request would lead : // the master to send to tservers. Not sure why we need to involve the master at all. Seems like we could simplify the test if we avoided it (maybe we could shut down the master or disable heartbeating or something?). I get that it's more similar to a real-world scenario in this way, but it also means more moving parts are involved in the test and makes it harder to maintain. Plus the direct manipulation of the catalog manager and sys catalog is decidedly inorganic. http://gerrit.cloudera.org:8080/#/c/12462/2/src/kudu/integration-tests/alter_table-test.cc@2263 PS2, Line 2263: l.Unlock(); It'll unlock automatically when it goes out of scope. http://gerrit.cloudera.org:8080/#/c/12462/2/src/kudu/integration-tests/alter_table-test.cc@2270 PS2, Line 2270: std::shared_ptr<master::TSDescriptor> ts_desc; using http://gerrit.cloudera.org:8080/#/c/12462/2/src/kudu/integration-tests/alter_table-test.cc@2271 PS2, Line 2271: cluster_->mini_master()->master() Can use master-> here http://gerrit.cloudera.org:8080/#/c/12462/2/src/kudu/integration-tests/alter_table-test.cc@2276 PS2, Line 2276: vector<std::thread> ts; Nit: we usually use 'ts' variables to name tservers; could you rename this? http://gerrit.cloudera.org:8080/#/c/12462/2/src/kudu/integration-tests/alter_table-test.cc@2282 PS2, Line 2282: WARN_NOT_OK(ts_proxy->AlterSchema(tserver_req, &tserver_resp, &rpc), I think the test would be net simpler if you avoided sending RPCs altogether and instead called TabletReplica::SubmitAlterSchema directly. I don't see anything useful in TabletServiceAdminImpl::AlterSchema, apart from the schema version early out (do you need that?). http://gerrit.cloudera.org:8080/#/c/12462/2/src/kudu/integration-tests/alter_table-test.cc@2295 PS2, Line 2295: // Roll over onto a new segment and chomp off the alter schema request. Where's the chomping? Oh, I guess that's below, in RunLogGC(). The comment placement here suggested it happened in one of the next two lines. http://gerrit.cloudera.org:8080/#/c/12462/2/src/kudu/integration-tests/alter_table-test.cc@2305 PS2, Line 2305: ASSERT_EVENTUALLY([&] { Why does this need to be wrapped in ASSERT_EVENTUALLY? I can take my answer in the form of a code comment. http://gerrit.cloudera.org:8080/#/c/12462/2/src/kudu/integration-tests/alter_table-test.cc@2323 PS2, Line 2323: LOG(INFO) << "Tablet is running"; Probably not necessary; the test will pass if the ASSERT_EQ is satisfied. http://gerrit.cloudera.org:8080/#/c/12462/2/src/kudu/tablet/tablet_bootstrap.cc File src/kudu/tablet/tablet_bootstrap.cc: http://gerrit.cloudera.org:8080/#/c/12462/2/src/kudu/tablet/tablet_bootstrap.cc@1450 PS2, Line 1450: // If we put a result in the commit message, it should be an error and we : // don't need to replay it. This doesn't match up with the code just below, which allows for there to be a result _without_ a failed status. http://gerrit.cloudera.org:8080/#/c/12462/2/src/kudu/tablet/tablet_bootstrap.cc@1476 PS2, Line 1476: // Note: If the alter _didn't_ complete successfully, that should have been : // captured by 'commit_msg' above, unless it was written before we started : // putting AlterSchema results into the commit messages. Could you pull this info and the info on L1450-1451 into one block comment that describes the overall failure tri-state of "alter succeeded", "on-disk alter is failed one, skip", and "on-disk alter 'succeeded' but then the alter failed to replay"? http://gerrit.cloudera.org:8080/#/c/12462/2/src/kudu/tablet/transactions/alter_schema_transaction.h File src/kudu/tablet/transactions/alter_schema_transaction.h: http://gerrit.cloudera.org:8080/#/c/12462/2/src/kudu/tablet/transactions/alter_schema_transaction.h@97 PS2, Line 97: // Returns the 'error_' in the form of an op result. 'error_' must have been : // set to an error upon calling this. : OperationResultPB error() const; : : // Whether the alter completed had an error in applying. An alter transaction : // can be completed but not have the underlying alter succeed, e.g. if the : // "new" schema has an older schema version than the current one. : bool has_error() const; Why not combine these two and return an optional<OperationResultPB>? Distinguishing between error() and has_error() has a decidedly protobuf feel to it, but this isn't a PB class so I'm not sure what we gain from that verbose style. http://gerrit.cloudera.org:8080/#/c/12462/2/src/kudu/tablet/transactions/alter_schema_transaction.cc File src/kudu/tablet/transactions/alter_schema_transaction.cc: http://gerrit.cloudera.org:8080/#/c/12462/2/src/kudu/tablet/transactions/alter_schema_transaction.cc@67 PS2, Line 67: error_ = OperationResultPB(); : StatusToPB(s, error_->mutable_failed_status()); This is a non-atomic "set" operation. Doesn't seem like that matters (error_'s getters and setter seem to be used by a single thread), but maybe it does? http://gerrit.cloudera.org:8080/#/c/12462/2/src/kudu/tablet/transactions/alter_schema_transaction.cc@84 PS2, Line 84: string AlterSchemaTransactionState::ToString() const { Want to include error_ here? Or does that complicate locking? http://gerrit.cloudera.org:8080/#/c/12462/2/src/kudu/tablet/transactions/alter_schema_transaction.cc@145 PS2, Line 145: *result->add_ops() = state_->error(); Can we std::move() the error into the op? http://gerrit.cloudera.org:8080/#/c/12462/2/src/kudu/tablet/transactions/alter_schema_transaction.cc@153 PS2, Line 153: // Altered tablets should be included in the next tserver heartbeat so that : // clients waiting on IsAlterTableDone() are unblocked promptly. : state_->tablet_replica()->MarkTabletDirty("Alter schema finished"); Do we need to do this if there was an error? -- To view, visit http://gerrit.cloudera.org:8080/12462 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Id761851741297e29a4666bec0c34fc4f7285f715 Gerrit-Change-Number: 12462 Gerrit-PatchSet: 2 Gerrit-Owner: Andrew Wong <[email protected]> Gerrit-Reviewer: Adar Dembo <[email protected]> Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Tidy Bot (241) Gerrit-Comment-Date: Fri, 15 Feb 2019 19:33:34 +0000 Gerrit-HasComments: Yes
