Will Berkeley has posted comments on this change. ( http://gerrit.cloudera.org:8080/9393 )
Change subject: KUDU-2290: Tool to re-create a tablet ...................................................................... Patch Set 7: (21 comments) Are you comfortable merging this given the risk of KUDU-2376? The same segmentation fault will happen, and is captured by the disabled replace_tablet-itest, just like the test case I posted in the JIRA for alter table. http://gerrit.cloudera.org:8080/#/c/9393/7/src/kudu/client/client-internal.cc File src/kudu/client/client-internal.cc: http://gerrit.cloudera.org:8080/#/c/9393/7/src/kudu/client/client-internal.cc@97 PS7, Line 97: using master::ReplaceTabletRequestPB; > warning: using decl 'ReplaceTabletRequestPB' is unused [misc-unused-using-d Ack http://gerrit.cloudera.org:8080/#/c/9393/7/src/kudu/client/client-internal.cc@98 PS7, Line 98: using master::ReplaceTabletResponsePB; > warning: using decl 'ReplaceTabletResponsePB' is unused [misc-unused-using- Ack http://gerrit.cloudera.org:8080/#/c/9393/7/src/kudu/integration-tests/CMakeLists.txt File src/kudu/integration-tests/CMakeLists.txt: http://gerrit.cloudera.org:8080/#/c/9393/7/src/kudu/integration-tests/CMakeLists.txt@101 PS7, Line 101: ADD_KUDU_TEST(replace_tablet-itest) > Could you follow the approach Todd took in commit 1c1d3ba to decide what va B-but the test always fails and is disabled, because of KUDU-2376. Can I add a TODO here instead? http://gerrit.cloudera.org:8080/#/c/9393/7/src/kudu/integration-tests/master-stress-test.cc File src/kudu/integration-tests/master-stress-test.cc: http://gerrit.cloudera.org:8080/#/c/9393/7/src/kudu/integration-tests/master-stress-test.cc@297 PS7, Line 297: // The tablet is gone because it was already replaced or deleted. > Should we still wait in this case? Done http://gerrit.cloudera.org:8080/#/c/9393/7/src/kudu/integration-tests/master-stress-test.cc@468 PS7, Line 468: LOG(INFO) << "Tables created: " << num_tables_created_.Load(); > What's the distribution of operations before and after your change? Curious The numbers vary a lot, but it's roughly the same before and after. I see (with a debug build) ~70 tables created, ~45 deleted or altered, and about 45 tablets replaced. There's 5 master restarts or so. http://gerrit.cloudera.org:8080/#/c/9393/7/src/kudu/integration-tests/replace_tablet-itest.cc File src/kudu/integration-tests/replace_tablet-itest.cc: http://gerrit.cloudera.org:8080/#/c/9393/7/src/kudu/integration-tests/replace_tablet-itest.cc@63 PS7, Line 63: CHECK(tablet_id); > Not needed? Done http://gerrit.cloudera.org:8080/#/c/9393/7/src/kudu/integration-tests/replace_tablet-itest.cc@76 PS7, Line 76: CHECK(proxy); > Not needed? Done http://gerrit.cloudera.org:8080/#/c/9393/7/src/kudu/master/catalog_manager.cc File src/kudu/master/catalog_manager.cc: http://gerrit.cloudera.org:8080/#/c/9393/7/src/kudu/master/catalog_manager.cc@4324 PS7, Line 4324: new_tablet = scoped_refptr<TabletInfo>(new TabletInfo(table, GenerateId())); > How about: Done http://gerrit.cloudera.org:8080/#/c/9393/7/src/kudu/master/catalog_manager.cc@4346 PS7, Line 4346: // Ensure we abort the mutations if persisting the changes fails. > Don't the l_table and l_tablets destructors abort automatically? Done http://gerrit.cloudera.org:8080/#/c/9393/7/src/kudu/master/catalog_manager.cc@4358 PS7, Line 4358: actions.tablets_to_delete.push_back(old_tablet); > Hmm, this should be in tablets_to_update, no? We never actually delete entr Done http://gerrit.cloudera.org:8080/#/c/9393/7/src/kudu/master/catalog_manager.cc@4376 PS7, Line 4376: table->AddRemoveTablets({new_tablet}, {old_tablet}); : l_tablets.Commit(); : l_table.Commit(); > Are you sure these three should happen with the lock_ held? They acquire lo Does the following order work? 1. table->AddRemoveTablets 2. l_tablets.Commit() l(lock_) { 3. Insert to tablet map } 4. l_table.Commit() We need tablet + table locks when adding and removing from the table, and we want to commit the tablet state before publishing the new tablet to the tablet map so uninitialized state won't be visible. However, we need to add the tablet to the tablet map before committing the changes to the table, since if we commit table changes before altering the tablet map there's a window where a GetTableLocations would see the new tablet as part of the table but no info about the new tablet would be available from GetTabletLocations. However, I'm worried about the interval between 2 + 4 where a GetTableLocations would see the old tablet as deleted. I wanted to ensure that GetTableLocations sees either a non-deleted old tablet or a new tablet. http://gerrit.cloudera.org:8080/#/c/9393/7/src/kudu/master/catalog_manager.cc@4385 PS7, Line 4385: background_tasks_->Wake(); > This is for the newly created tablet, right? Can you update the comment jus Done http://gerrit.cloudera.org:8080/#/c/9393/7/src/kudu/tools/tool_action_common.h File src/kudu/tools/tool_action_common.h: http://gerrit.cloudera.org:8080/#/c/9393/7/src/kudu/tools/tool_action_common.h@39 PS7, Line 39: class faststring; > warning: invalid case style for class 'faststring' [readability-identifier- Ack http://gerrit.cloudera.org:8080/#/c/9393/7/src/kudu/tools/tool_action_common.h@155 PS7, Line 155: LeaderMasterProxy() = default; > Nit: indentation. Done http://gerrit.cloudera.org:8080/#/c/9393/7/src/kudu/tools/tool_action_common.cc File src/kudu/tools/tool_action_common.cc: http://gerrit.cloudera.org:8080/#/c/9393/7/src/kudu/tools/tool_action_common.cc@117 PS7, Line 117: using consensus::ConsensusServiceProxy; > warning: using decl 'ConsensusServiceProxy' is unused [misc-unused-using-de Ack http://gerrit.cloudera.org:8080/#/c/9393/7/src/kudu/tools/tool_action_common.cc@122 PS7, Line 122: using master::ListMastersRequestPB; > warning: using decl 'ListMastersRequestPB' is unused [misc-unused-using-dec Ack http://gerrit.cloudera.org:8080/#/c/9393/7/src/kudu/tools/tool_action_common.cc@123 PS7, Line 123: using master::ListMastersResponsePB; > warning: using decl 'ListMastersResponsePB' is unused [misc-unused-using-de Ack http://gerrit.cloudera.org:8080/#/c/9393/7/src/kudu/tools/tool_action_common.cc@124 PS7, Line 124: using master::ListTabletServersRequestPB; > warning: using decl 'ListTabletServersRequestPB' is unused [misc-unused-usi Ack http://gerrit.cloudera.org:8080/#/c/9393/7/src/kudu/tools/tool_action_common.cc@125 PS7, Line 125: using master::ListTabletServersResponsePB; > warning: using decl 'ListTabletServersResponsePB' is unused [misc-unused-us Ack http://gerrit.cloudera.org:8080/#/c/9393/7/src/kudu/tools/tool_action_common.cc@127 PS7, Line 127: using master::ReplaceTabletRequestPB; > warning: using decl 'ReplaceTabletRequestPB' is unused [misc-unused-using-d Ack http://gerrit.cloudera.org:8080/#/c/9393/7/src/kudu/tools/tool_action_common.cc@128 PS7, Line 128: using master::ReplaceTabletResponsePB; > warning: using decl 'ReplaceTabletResponsePB' is unused [misc-unused-using- Ack -- To view, visit http://gerrit.cloudera.org:8080/9393 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ifbefbde68e3ca724f04efe0426a3906e5c33ed3c Gerrit-Change-Number: 9393 Gerrit-PatchSet: 7 Gerrit-Owner: Will Berkeley <wdberke...@gmail.com> Gerrit-Reviewer: Adar Dembo <a...@cloudera.com> Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com> Gerrit-Reviewer: Dan Burkert <danburk...@apache.org> Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon <t...@apache.org> Gerrit-Reviewer: Will Berkeley <wdberke...@gmail.com> Gerrit-Comment-Date: Mon, 02 Apr 2018 23:58:14 +0000 Gerrit-HasComments: Yes