[kudu-CR] KUDU-2290: Tool to re-create a tablet
Will Berkeley has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/9393 ) Change subject: KUDU-2290: Tool to re-create a tablet .. KUDU-2290: Tool to re-create a tablet This adds a tool `kudu tablet unsafe_replace_tablet` that can be used to replace a tablet with another having the same partition information. This tool is useful when a tablet has permanently lost all replicas because it can repair the tablet's table. Before, the table would have to be recreated to recover from a permanently lost tablet. Due to KUDU-2376, the new test in replace_tablet-itest fails. Since this tool is meant to fix broken tables, I think it's still valuable to add this tool in the meantime. Change-Id: Ifbefbde68e3ca724f04efe0426a3906e5c33ed3c Reviewed-on: http://gerrit.cloudera.org:8080/9393 Reviewed-by: Dan Burkert Reviewed-by: Adar Dembo Tested-by: Will Berkeley --- M src/kudu/client/client-internal.cc M src/kudu/integration-tests/CMakeLists.txt M src/kudu/integration-tests/cluster_itest_util.h M src/kudu/integration-tests/master-stress-test.cc A src/kudu/integration-tests/replace_tablet-itest.cc M src/kudu/master/catalog_manager.cc M src/kudu/master/catalog_manager.h M src/kudu/master/master.proto M src/kudu/master/master_service.cc M src/kudu/master/master_service.h M src/kudu/tools/kudu-tool-test.cc M src/kudu/tools/tool_action_common.cc M src/kudu/tools/tool_action_common.h M src/kudu/tools/tool_action_tablet.cc 14 files changed, 494 insertions(+), 9 deletions(-) Approvals: Dan Burkert: Looks good to me, approved Adar Dembo: Looks good to me, approved Will Berkeley: Verified -- To view, visit http://gerrit.cloudera.org:8080/9393 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: Ifbefbde68e3ca724f04efe0426a3906e5c33ed3c Gerrit-Change-Number: 9393 Gerrit-PatchSet: 12 Gerrit-Owner: Will Berkeley Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Will Berkeley
[kudu-CR] KUDU-2290: Tool to re-create a tablet
Will Berkeley has posted comments on this change. ( http://gerrit.cloudera.org:8080/9393 ) Change subject: KUDU-2290: Tool to re-create a tablet .. Patch Set 11: Verified+1 IWYU failures are bogus (they break the build if they are applied). -- To view, visit http://gerrit.cloudera.org:8080/9393 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ifbefbde68e3ca724f04efe0426a3906e5c33ed3c Gerrit-Change-Number: 9393 Gerrit-PatchSet: 11 Gerrit-Owner: Will Berkeley Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Will Berkeley Gerrit-Comment-Date: Thu, 05 Apr 2018 21:01:29 + Gerrit-HasComments: No
[kudu-CR] KUDU-2290: Tool to re-create a tablet
Will Berkeley has removed a vote on this change. Change subject: KUDU-2290: Tool to re-create a tablet .. Removed Verified-1 by Kudu Jenkins (120) -- To view, visit http://gerrit.cloudera.org:8080/9393 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: deleteVote Gerrit-Change-Id: Ifbefbde68e3ca724f04efe0426a3906e5c33ed3c Gerrit-Change-Number: 9393 Gerrit-PatchSet: 11 Gerrit-Owner: Will Berkeley Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Will Berkeley
[kudu-CR] KUDU-2290: Tool to re-create a tablet
Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/9393 ) Change subject: KUDU-2290: Tool to re-create a tablet .. Patch Set 11: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/9393 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ifbefbde68e3ca724f04efe0426a3906e5c33ed3c Gerrit-Change-Number: 9393 Gerrit-PatchSet: 11 Gerrit-Owner: Will Berkeley Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Will Berkeley Gerrit-Comment-Date: Thu, 05 Apr 2018 19:06:56 + Gerrit-HasComments: No
[kudu-CR] KUDU-2290: Tool to re-create a tablet
Will Berkeley has posted comments on this change. ( http://gerrit.cloudera.org:8080/9393 ) Change subject: KUDU-2290: Tool to re-create a tablet .. Patch Set 10: (1 comment) http://gerrit.cloudera.org:8080/#/c/9393/10/src/kudu/tools/tool_action_tablet.cc File src/kudu/tools/tool_action_tablet.cc: http://gerrit.cloudera.org:8080/#/c/9393/10/src/kudu/tools/tool_action_tablet.cc@549 PS10, Line 549: cout << "Replaced tablet " << tablet_id > If it's not too fiddly, it might be nice to print the 'Replaced tablet ${ta Done -- To view, visit http://gerrit.cloudera.org:8080/9393 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ifbefbde68e3ca724f04efe0426a3906e5c33ed3c Gerrit-Change-Number: 9393 Gerrit-PatchSet: 10 Gerrit-Owner: Will Berkeley Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Will Berkeley Gerrit-Comment-Date: Thu, 05 Apr 2018 18:59:00 + Gerrit-HasComments: Yes
[kudu-CR] KUDU-2290: Tool to re-create a tablet
Dan Burkert has posted comments on this change. ( http://gerrit.cloudera.org:8080/9393 ) Change subject: KUDU-2290: Tool to re-create a tablet .. Patch Set 11: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/9393 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ifbefbde68e3ca724f04efe0426a3906e5c33ed3c Gerrit-Change-Number: 9393 Gerrit-PatchSet: 11 Gerrit-Owner: Will Berkeley Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Will Berkeley Gerrit-Comment-Date: Thu, 05 Apr 2018 18:59:23 + Gerrit-HasComments: No
[kudu-CR] KUDU-2290: Tool to re-create a tablet
Hello Tidy Bot, Dan Burkert, Kudu Jenkins, Andrew Wong, Adar Dembo, Todd Lipcon, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/9393 to look at the new patch set (#11). Change subject: KUDU-2290: Tool to re-create a tablet .. KUDU-2290: Tool to re-create a tablet This adds a tool `kudu tablet unsafe_replace_tablet` that can be used to replace a tablet with another having the same partition information. This tool is useful when a tablet has permanently lost all replicas because it can repair the tablet's table. Before, the table would have to be recreated to recover from a permanently lost tablet. Due to KUDU-2376, the new test in replace_tablet-itest fails. Since this tool is meant to fix broken tables, I think it's still valuable to add this tool in the meantime. Change-Id: Ifbefbde68e3ca724f04efe0426a3906e5c33ed3c --- M src/kudu/client/client-internal.cc M src/kudu/integration-tests/CMakeLists.txt M src/kudu/integration-tests/cluster_itest_util.h M src/kudu/integration-tests/master-stress-test.cc A src/kudu/integration-tests/replace_tablet-itest.cc M src/kudu/master/catalog_manager.cc M src/kudu/master/catalog_manager.h M src/kudu/master/master.proto M src/kudu/master/master_service.cc M src/kudu/master/master_service.h M src/kudu/tools/kudu-tool-test.cc M src/kudu/tools/tool_action_common.cc M src/kudu/tools/tool_action_common.h M src/kudu/tools/tool_action_tablet.cc 14 files changed, 494 insertions(+), 9 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/93/9393/11 -- To view, visit http://gerrit.cloudera.org:8080/9393 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ifbefbde68e3ca724f04efe0426a3906e5c33ed3c Gerrit-Change-Number: 9393 Gerrit-PatchSet: 11 Gerrit-Owner: Will Berkeley Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Will Berkeley
[kudu-CR] KUDU-2290: Tool to re-create a tablet
Dan Burkert has posted comments on this change. ( http://gerrit.cloudera.org:8080/9393 ) Change subject: KUDU-2290: Tool to re-create a tablet .. Patch Set 10: (1 comment) LGTM except a small nit! http://gerrit.cloudera.org:8080/#/c/9393/10/src/kudu/tools/tool_action_tablet.cc File src/kudu/tools/tool_action_tablet.cc: http://gerrit.cloudera.org:8080/#/c/9393/10/src/kudu/tools/tool_action_tablet.cc@549 PS10, Line 549: cout << "Replaced tablet " << tablet_id If it's not too fiddly, it might be nice to print the 'Replaced tablet ${tablet_id} with tablet " portion to stderr and the new tablet ID to stdout. This will make it much easier to use this in a script in the future where the new tablet ID is needed. cerr << "Replaced tablet " << tablet_id << " with tablet "; cout << resp.replacement_tablet_id() << endl; -- To view, visit http://gerrit.cloudera.org:8080/9393 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ifbefbde68e3ca724f04efe0426a3906e5c33ed3c Gerrit-Change-Number: 9393 Gerrit-PatchSet: 10 Gerrit-Owner: Will Berkeley Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Will Berkeley Gerrit-Comment-Date: Thu, 05 Apr 2018 17:18:50 + Gerrit-HasComments: Yes
[kudu-CR] KUDU-2290: Tool to re-create a tablet
Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/9393 ) Change subject: KUDU-2290: Tool to re-create a tablet .. Patch Set 10: Code-Review+1 I'd like Dan to take a look too. -- To view, visit http://gerrit.cloudera.org:8080/9393 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ifbefbde68e3ca724f04efe0426a3906e5c33ed3c Gerrit-Change-Number: 9393 Gerrit-PatchSet: 10 Gerrit-Owner: Will Berkeley Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Will Berkeley Gerrit-Comment-Date: Tue, 03 Apr 2018 21:56:32 + Gerrit-HasComments: No
[kudu-CR] KUDU-2290: Tool to re-create a tablet
Hello Tidy Bot, Dan Burkert, Kudu Jenkins, Andrew Wong, Adar Dembo, Todd Lipcon, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/9393 to look at the new patch set (#10). Change subject: KUDU-2290: Tool to re-create a tablet .. KUDU-2290: Tool to re-create a tablet This adds a tool `kudu tablet unsafe_replace_tablet` that can be used to replace a tablet with another having the same partition information. This tool is useful when a tablet has permanently lost all replicas because it can repair the tablet's table. Before, the table would have to be recreated to recover from a permanently lost tablet. Due to KUDU-2376, the new test in replace_tablet-itest fails. Since this tool is meant to fix broken tables, I think it's still valuable to add this tool in the meantime. Change-Id: Ifbefbde68e3ca724f04efe0426a3906e5c33ed3c --- M src/kudu/client/client-internal.cc M src/kudu/integration-tests/CMakeLists.txt M src/kudu/integration-tests/cluster_itest_util.h M src/kudu/integration-tests/master-stress-test.cc A src/kudu/integration-tests/replace_tablet-itest.cc M src/kudu/master/catalog_manager.cc M src/kudu/master/catalog_manager.h M src/kudu/master/master.proto M src/kudu/master/master_service.cc M src/kudu/master/master_service.h M src/kudu/tools/kudu-tool-test.cc M src/kudu/tools/tool_action_common.cc M src/kudu/tools/tool_action_common.h M src/kudu/tools/tool_action_tablet.cc 14 files changed, 493 insertions(+), 9 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/93/9393/10 -- To view, visit http://gerrit.cloudera.org:8080/9393 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ifbefbde68e3ca724f04efe0426a3906e5c33ed3c Gerrit-Change-Number: 9393 Gerrit-PatchSet: 10 Gerrit-Owner: Will Berkeley Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Will Berkeley
[kudu-CR] KUDU-2290: Tool to re-create a tablet
Will Berkeley has posted comments on this change. ( http://gerrit.cloudera.org:8080/9393 ) Change subject: KUDU-2290: Tool to re-create a tablet .. Patch Set 9: (2 comments) http://gerrit.cloudera.org:8080/#/c/9393/9/src/kudu/master/catalog_manager.cc File src/kudu/master/catalog_manager.cc: http://gerrit.cloudera.org:8080/#/c/9393/9/src/kudu/master/catalog_manager.cc@4364 PS9, Line 4364: new_tablet->metadata().ReadLock(); > Nit: use a new TabletMetadataLock for this. Done http://gerrit.cloudera.org:8080/#/c/9393/9/src/kudu/master/catalog_manager.cc@4371 PS9, Line 4371: l_table.Commit(); > Actually, you don't need a TableMetadataLock at all; I misunderstood the co Done -- To view, visit http://gerrit.cloudera.org:8080/9393 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ifbefbde68e3ca724f04efe0426a3906e5c33ed3c Gerrit-Change-Number: 9393 Gerrit-PatchSet: 9 Gerrit-Owner: Will Berkeley Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Will Berkeley Gerrit-Comment-Date: Tue, 03 Apr 2018 21:31:32 + Gerrit-HasComments: Yes
[kudu-CR] KUDU-2290: Tool to re-create a tablet
Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/9393 ) Change subject: KUDU-2290: Tool to re-create a tablet .. Patch Set 9: (3 comments) http://gerrit.cloudera.org:8080/#/c/9393/9/src/kudu/integration-tests/replace_tablet-itest.cc File src/kudu/integration-tests/replace_tablet-itest.cc: http://gerrit.cloudera.org:8080/#/c/9393/9/src/kudu/integration-tests/replace_tablet-itest.cc@90 PS9, Line 90: PROCESSOR Nit: PROCESSORS http://gerrit.cloudera.org:8080/#/c/9393/9/src/kudu/master/catalog_manager.cc File src/kudu/master/catalog_manager.cc: http://gerrit.cloudera.org:8080/#/c/9393/9/src/kudu/master/catalog_manager.cc@4364 PS9, Line 4364: new_tablet->metadata().ReadLock(); Nit: use a new TabletMetadataLock for this. http://gerrit.cloudera.org:8080/#/c/9393/9/src/kudu/master/catalog_manager.cc@4371 PS9, Line 4371: l_table.Commit(); Actually, you don't need a TableMetadataLock at all; I misunderstood the comment at the top of TableInfo::AddRemoveTablets: you need READ-level locks (or stronger) on the _tablets_, not the table. And unlike AlterTable, we're not making any metadata changes to the TableInfo (e.g. to its schema), so you can forgo its lock altogether. -- To view, visit http://gerrit.cloudera.org:8080/9393 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ifbefbde68e3ca724f04efe0426a3906e5c33ed3c Gerrit-Change-Number: 9393 Gerrit-PatchSet: 9 Gerrit-Owner: Will Berkeley Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Will Berkeley Gerrit-Comment-Date: Tue, 03 Apr 2018 17:40:31 + Gerrit-HasComments: Yes
[kudu-CR] KUDU-2290: Tool to re-create a tablet
Hello Tidy Bot, Dan Burkert, Kudu Jenkins, Andrew Wong, Adar Dembo, Todd Lipcon, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/9393 to look at the new patch set (#9). Change subject: KUDU-2290: Tool to re-create a tablet .. KUDU-2290: Tool to re-create a tablet This adds a tool `kudu tablet unsafe_replace_tablet` that can be used to replace a tablet with another having the same partition information. This tool is useful when a tablet has permanently lost all replicas because it can repair the tablet's table. Before, the table would have to be recreated to recover from a permanently lost tablet. Due to KUDU-2376, the new test in replace_tablet-itest fails. Since this tool is meant to fix broken tables, I think it's still valuable to add this tool in the meantime. Change-Id: Ifbefbde68e3ca724f04efe0426a3906e5c33ed3c --- M src/kudu/client/client-internal.cc M src/kudu/integration-tests/CMakeLists.txt M src/kudu/integration-tests/cluster_itest_util.h M src/kudu/integration-tests/master-stress-test.cc A src/kudu/integration-tests/replace_tablet-itest.cc M src/kudu/master/catalog_manager.cc M src/kudu/master/catalog_manager.h M src/kudu/master/master.proto M src/kudu/master/master_service.cc M src/kudu/master/master_service.h M src/kudu/tools/kudu-tool-test.cc M src/kudu/tools/tool_action_common.cc M src/kudu/tools/tool_action_common.h M src/kudu/tools/tool_action_tablet.cc 14 files changed, 496 insertions(+), 9 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/93/9393/9 -- To view, visit http://gerrit.cloudera.org:8080/9393 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ifbefbde68e3ca724f04efe0426a3906e5c33ed3c Gerrit-Change-Number: 9393 Gerrit-PatchSet: 9 Gerrit-Owner: Will Berkeley Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Will Berkeley
[kudu-CR] KUDU-2290: Tool to re-create a tablet
Will Berkeley has posted comments on this change. ( http://gerrit.cloudera.org:8080/9393 ) Change subject: KUDU-2290: Tool to re-create a tablet .. Patch Set 8: (2 comments) http://gerrit.cloudera.org:8080/#/c/9393/7/src/kudu/integration-tests/CMakeLists.txt File src/kudu/integration-tests/CMakeLists.txt: http://gerrit.cloudera.org:8080/#/c/9393/7/src/kudu/integration-tests/CMakeLists.txt@101 PS7, Line 101: ADD_KUDU_TEST(replace_tablet-itest) > Sorry, I missed that the disabled test was the only test in the file. Done http://gerrit.cloudera.org:8080/#/c/9393/7/src/kudu/master/catalog_manager.cc File src/kudu/master/catalog_manager.cc: http://gerrit.cloudera.org:8080/#/c/9393/7/src/kudu/master/catalog_manager.cc@4376 PS7, Line 4376: LOG(INFO) << "ReplaceTablet: tablet " << old_tablet->id() : << " deleted and replaced by tablet " << new_tablet->id(); : resp->set_replaceme > 1. l_table doesn't actually need to be in WRITE mode; READ is sufficient. T I took a look at alter table and mirrored that, since it seemed right to me and it handles a similar situation where a range partition is dropped and added in one alter. -- To view, visit http://gerrit.cloudera.org:8080/9393 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ifbefbde68e3ca724f04efe0426a3906e5c33ed3c Gerrit-Change-Number: 9393 Gerrit-PatchSet: 8 Gerrit-Owner: Will Berkeley Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Will Berkeley Gerrit-Comment-Date: Tue, 03 Apr 2018 06:44:57 + Gerrit-HasComments: Yes
[kudu-CR] KUDU-2290: Tool to re-create a tablet
Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/9393 ) Change subject: KUDU-2290: Tool to re-create a tablet .. Patch Set 8: (2 comments) http://gerrit.cloudera.org:8080/#/c/9393/7/src/kudu/integration-tests/CMakeLists.txt File src/kudu/integration-tests/CMakeLists.txt: http://gerrit.cloudera.org:8080/#/c/9393/7/src/kudu/integration-tests/CMakeLists.txt@101 PS7, Line 101: ADD_KUDU_TEST(replace_tablet-itest) > B-but the test always fails and is disabled, because of KUDU-2376. Can I ad Sorry, I missed that the disabled test was the only test in the file. Add the TODO near the test itself so it'll be more obvious when it is reenabled. http://gerrit.cloudera.org:8080/#/c/9393/7/src/kudu/master/catalog_manager.cc File src/kudu/master/catalog_manager.cc: http://gerrit.cloudera.org:8080/#/c/9393/7/src/kudu/master/catalog_manager.cc@4376 PS7, Line 4376: LOG(INFO) << "ReplaceTablet: tablet " << old_tablet->id() : << " deleted and replaced by tablet " << new_tablet->id(); : resp->set_replaceme > Does the following order work? 1. l_table doesn't actually need to be in WRITE mode; READ is sufficient. That should simplify this a bit by eliminating step #4. 2. We should be able to follow what ProcessPendingAssignments does, right? It also replaces tablets. Looks like it first commits the tablets lock, then adds/removes the tablets to the table, then finally updates the tablet map. In general I'm not sure there's a deadlock free way to do this without exposing a window where GetTableLocations and GetTabletLocations aren't totally consistent with one another. From the looks of it, ProcessPendingAssignments is vulnerable to the window you're describing. Maybe it's able to get away with it because the table is expected to be generally unusable during this time (it's just after a CreateTable after all). Though, maybe the same code path is followed during an ALTER TABLE ADD PARTITION too? -- To view, visit http://gerrit.cloudera.org:8080/9393 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ifbefbde68e3ca724f04efe0426a3906e5c33ed3c Gerrit-Change-Number: 9393 Gerrit-PatchSet: 8 Gerrit-Owner: Will Berkeley Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Will Berkeley Gerrit-Comment-Date: Tue, 03 Apr 2018 04:09:11 + Gerrit-HasComments: Yes
[kudu-CR] KUDU-2290: Tool to re-create a tablet
Hello Tidy Bot, Dan Burkert, Kudu Jenkins, Andrew Wong, Adar Dembo, Todd Lipcon, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/9393 to look at the new patch set (#8). Change subject: KUDU-2290: Tool to re-create a tablet .. KUDU-2290: Tool to re-create a tablet This adds a tool `kudu tablet unsafe_replace_tablet` that can be used to replace a tablet with another having the same partition information. This tool is useful when a tablet has permanently lost all replicas because it can repair the tablet's table. Before, the table would have to be recreated to recover from a permanently lost tablet. Due to KUDU-2376, the new test in replace_tablet-itest fails. Since this tool is meant to fix broken tables, I think it's still valuable to add this tool in the meantime. Change-Id: Ifbefbde68e3ca724f04efe0426a3906e5c33ed3c --- M src/kudu/client/client-internal.cc M src/kudu/integration-tests/CMakeLists.txt M src/kudu/integration-tests/cluster_itest_util.h M src/kudu/integration-tests/master-stress-test.cc A src/kudu/integration-tests/replace_tablet-itest.cc M src/kudu/master/catalog_manager.cc M src/kudu/master/catalog_manager.h M src/kudu/master/master.proto M src/kudu/master/master_service.cc M src/kudu/master/master_service.h M src/kudu/tools/kudu-tool-test.cc M src/kudu/tools/tool_action_common.cc M src/kudu/tools/tool_action_common.h M src/kudu/tools/tool_action_tablet.cc 14 files changed, 490 insertions(+), 9 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/93/9393/8 -- To view, visit http://gerrit.cloudera.org:8080/9393 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ifbefbde68e3ca724f04efe0426a3906e5c33ed3c Gerrit-Change-Number: 9393 Gerrit-PatchSet: 8 Gerrit-Owner: Will Berkeley Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Will Berkeley
[kudu-CR] KUDU-2290: Tool to re-create a tablet
Will Berkeley has posted comments on this change. ( http://gerrit.cloudera.org:8080/9393 ) Change subject: KUDU-2290: Tool to re-create a tablet .. Patch Set 7: (21 comments) Are you comfortable merging this given the risk of KUDU-2376? The same segmentation fault will happen, and is captured by the disabled replace_tablet-itest, just like the test case I posted in the JIRA for alter table. http://gerrit.cloudera.org:8080/#/c/9393/7/src/kudu/client/client-internal.cc File src/kudu/client/client-internal.cc: http://gerrit.cloudera.org:8080/#/c/9393/7/src/kudu/client/client-internal.cc@97 PS7, Line 97: using master::ReplaceTabletRequestPB; > warning: using decl 'ReplaceTabletRequestPB' is unused [misc-unused-using-d Ack http://gerrit.cloudera.org:8080/#/c/9393/7/src/kudu/client/client-internal.cc@98 PS7, Line 98: using master::ReplaceTabletResponsePB; > warning: using decl 'ReplaceTabletResponsePB' is unused [misc-unused-using- Ack http://gerrit.cloudera.org:8080/#/c/9393/7/src/kudu/integration-tests/CMakeLists.txt File src/kudu/integration-tests/CMakeLists.txt: http://gerrit.cloudera.org:8080/#/c/9393/7/src/kudu/integration-tests/CMakeLists.txt@101 PS7, Line 101: ADD_KUDU_TEST(replace_tablet-itest) > Could you follow the approach Todd took in commit 1c1d3ba to decide what va B-but the test always fails and is disabled, because of KUDU-2376. Can I add a TODO here instead? http://gerrit.cloudera.org:8080/#/c/9393/7/src/kudu/integration-tests/master-stress-test.cc File src/kudu/integration-tests/master-stress-test.cc: http://gerrit.cloudera.org:8080/#/c/9393/7/src/kudu/integration-tests/master-stress-test.cc@297 PS7, Line 297: // The tablet is gone because it was already replaced or deleted. > Should we still wait in this case? Done http://gerrit.cloudera.org:8080/#/c/9393/7/src/kudu/integration-tests/master-stress-test.cc@468 PS7, Line 468: LOG(INFO) << "Tables created: " << num_tables_created_.Load(); > What's the distribution of operations before and after your change? Curious The numbers vary a lot, but it's roughly the same before and after. I see (with a debug build) ~70 tables created, ~45 deleted or altered, and about 45 tablets replaced. There's 5 master restarts or so. http://gerrit.cloudera.org:8080/#/c/9393/7/src/kudu/integration-tests/replace_tablet-itest.cc File src/kudu/integration-tests/replace_tablet-itest.cc: http://gerrit.cloudera.org:8080/#/c/9393/7/src/kudu/integration-tests/replace_tablet-itest.cc@63 PS7, Line 63: CHECK(tablet_id); > Not needed? Done http://gerrit.cloudera.org:8080/#/c/9393/7/src/kudu/integration-tests/replace_tablet-itest.cc@76 PS7, Line 76: CHECK(proxy); > Not needed? Done http://gerrit.cloudera.org:8080/#/c/9393/7/src/kudu/master/catalog_manager.cc File src/kudu/master/catalog_manager.cc: http://gerrit.cloudera.org:8080/#/c/9393/7/src/kudu/master/catalog_manager.cc@4324 PS7, Line 4324: new_tablet = scoped_refptr(new TabletInfo(table, GenerateId())); > How about: Done http://gerrit.cloudera.org:8080/#/c/9393/7/src/kudu/master/catalog_manager.cc@4346 PS7, Line 4346: // Ensure we abort the mutations if persisting the changes fails. > Don't the l_table and l_tablets destructors abort automatically? Done http://gerrit.cloudera.org:8080/#/c/9393/7/src/kudu/master/catalog_manager.cc@4358 PS7, Line 4358: actions.tablets_to_delete.push_back(old_tablet); > Hmm, this should be in tablets_to_update, no? We never actually delete entr Done http://gerrit.cloudera.org:8080/#/c/9393/7/src/kudu/master/catalog_manager.cc@4376 PS7, Line 4376: table->AddRemoveTablets({new_tablet}, {old_tablet}); : l_tablets.Commit(); : l_table.Commit(); > Are you sure these three should happen with the lock_ held? They acquire lo Does the following order work? 1. table->AddRemoveTablets 2. l_tablets.Commit() l(lock_) { 3. Insert to tablet map } 4. l_table.Commit() We need tablet + table locks when adding and removing from the table, and we want to commit the tablet state before publishing the new tablet to the tablet map so uninitialized state won't be visible. However, we need to add the tablet to the tablet map before committing the changes to the table, since if we commit table changes before altering the tablet map there's a window where a GetTableLocations would see the new tablet as part of the table but no info about the new tablet would be available from GetTabletLocations. However, I'm worried about the interval between 2 + 4 where a GetTableLocations would see the old tablet as deleted. I wanted to ensure that GetTableLocations sees either a non-deleted old tablet or a new tablet. http://gerrit.cloudera.org:8080/#/c/9393/7/src/kudu/master/catalog_manager.cc@4385 PS7, Line 4385: background_tasks_->Wake(); > This is for the newly created tablet, right? Can you update the comment jus Done http://gerrit.cloud
[kudu-CR] KUDU-2290: Tool to re-create a tablet
Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/9393 ) Change subject: KUDU-2290: Tool to re-create a tablet .. Patch Set 7: (11 comments) http://gerrit.cloudera.org:8080/#/c/9393/7/src/kudu/integration-tests/CMakeLists.txt File src/kudu/integration-tests/CMakeLists.txt: http://gerrit.cloudera.org:8080/#/c/9393/7/src/kudu/integration-tests/CMakeLists.txt@101 PS7, Line 101: ADD_KUDU_TEST(replace_tablet-itest) Could you follow the approach Todd took in commit 1c1d3ba to decide what value of PROCESSORS to assign to this new itest? http://gerrit.cloudera.org:8080/#/c/9393/7/src/kudu/integration-tests/master-stress-test.cc File src/kudu/integration-tests/master-stress-test.cc: http://gerrit.cloudera.org:8080/#/c/9393/7/src/kudu/integration-tests/master-stress-test.cc@297 PS7, Line 297: // The tablet is gone because it was already replaced or deleted. Should we still wait in this case? http://gerrit.cloudera.org:8080/#/c/9393/7/src/kudu/integration-tests/master-stress-test.cc@468 PS7, Line 468: LOG(INFO) << "Tables created: " << num_tables_created_.Load(); What's the distribution of operations before and after your change? Curious how tablet replacement has affected things. http://gerrit.cloudera.org:8080/#/c/9393/7/src/kudu/integration-tests/replace_tablet-itest.cc File src/kudu/integration-tests/replace_tablet-itest.cc: http://gerrit.cloudera.org:8080/#/c/9393/7/src/kudu/integration-tests/replace_tablet-itest.cc@63 PS7, Line 63: CHECK(tablet_id); Not needed? http://gerrit.cloudera.org:8080/#/c/9393/7/src/kudu/integration-tests/replace_tablet-itest.cc@76 PS7, Line 76: CHECK(proxy); Not needed? http://gerrit.cloudera.org:8080/#/c/9393/7/src/kudu/master/catalog_manager.cc File src/kudu/master/catalog_manager.cc: http://gerrit.cloudera.org:8080/#/c/9393/7/src/kudu/master/catalog_manager.cc@4324 PS7, Line 4324: new_tablet = scoped_refptr(new TabletInfo(table, GenerateId())); How about: new_tablet.reset(new TabletInfo(...)); http://gerrit.cloudera.org:8080/#/c/9393/7/src/kudu/master/catalog_manager.cc@4346 PS7, Line 4346: // Ensure we abort the mutations if persisting the changes fails. Don't the l_table and l_tablets destructors abort automatically? http://gerrit.cloudera.org:8080/#/c/9393/7/src/kudu/master/catalog_manager.cc@4358 PS7, Line 4358: actions.tablets_to_delete.push_back(old_tablet); Hmm, this should be in tablets_to_update, no? We never actually delete entries from the catalog table. http://gerrit.cloudera.org:8080/#/c/9393/7/src/kudu/master/catalog_manager.cc@4376 PS7, Line 4376: table->AddRemoveTablets({new_tablet}, {old_tablet}); : l_tablets.Commit(); : l_table.Commit(); Are you sure these three should happen with the lock_ held? They acquire locks of their own; Commit() may also sleep waiting for locks, which we shouldn't do with a spinlock held. IIRC other CatalogManager operations only hold lock_ to update global maps, then do the other steps without it. http://gerrit.cloudera.org:8080/#/c/9393/7/src/kudu/master/catalog_manager.cc@4385 PS7, Line 4385: background_tasks_->Wake(); This is for the newly created tablet, right? Can you update the comment just above? http://gerrit.cloudera.org:8080/#/c/9393/7/src/kudu/tools/tool_action_common.h File src/kudu/tools/tool_action_common.h: http://gerrit.cloudera.org:8080/#/c/9393/7/src/kudu/tools/tool_action_common.h@155 PS7, Line 155:LeaderMasterProxy() = default; Nit: indentation. -- To view, visit http://gerrit.cloudera.org:8080/9393 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ifbefbde68e3ca724f04efe0426a3906e5c33ed3c Gerrit-Change-Number: 9393 Gerrit-PatchSet: 7 Gerrit-Owner: Will Berkeley Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Will Berkeley Gerrit-Comment-Date: Fri, 30 Mar 2018 04:37:28 + Gerrit-HasComments: Yes
[kudu-CR] KUDU-2290: Tool to re-create a tablet
Will Berkeley has posted comments on this change. ( http://gerrit.cloudera.org:8080/9393 ) Change subject: KUDU-2290: Tool to re-create a tablet .. Patch Set 7: > Build Failed Unrelated, pre-existing race. See KUDU-2058. -- To view, visit http://gerrit.cloudera.org:8080/9393 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ifbefbde68e3ca724f04efe0426a3906e5c33ed3c Gerrit-Change-Number: 9393 Gerrit-PatchSet: 7 Gerrit-Owner: Will Berkeley Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Will Berkeley Gerrit-Comment-Date: Thu, 29 Mar 2018 21:45:28 + Gerrit-HasComments: No
[kudu-CR] KUDU-2290: Tool to re-create a tablet
Hello Tidy Bot, Dan Burkert, Kudu Jenkins, Andrew Wong, Adar Dembo, Todd Lipcon, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/9393 to look at the new patch set (#7). Change subject: KUDU-2290: Tool to re-create a tablet .. KUDU-2290: Tool to re-create a tablet This adds a tool `kudu tablet unsafe_replace_tablet` that can be used to replace a tablet with another having the same partition information. This tool is useful when a tablet has permanently lost all replicas because it can repair the tablet's table. Before, the table would have to be recreated to recover from a permanently lost tablet. Due to KUDU-2376, the new test in replace_tablet-itest fails. Since this tool is meant to fix broken tables, I think it's still valuable to add this tool in the meantime. Change-Id: Ifbefbde68e3ca724f04efe0426a3906e5c33ed3c --- M src/kudu/client/client-internal.cc M src/kudu/integration-tests/CMakeLists.txt M src/kudu/integration-tests/cluster_itest_util.h M src/kudu/integration-tests/master-stress-test.cc A src/kudu/integration-tests/replace_tablet-itest.cc M src/kudu/master/catalog_manager.cc M src/kudu/master/catalog_manager.h M src/kudu/master/master.proto M src/kudu/master/master_service.cc M src/kudu/master/master_service.h M src/kudu/tools/kudu-tool-test.cc M src/kudu/tools/tool_action_common.cc M src/kudu/tools/tool_action_common.h M src/kudu/tools/tool_action_tablet.cc 14 files changed, 506 insertions(+), 9 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/93/9393/7 -- To view, visit http://gerrit.cloudera.org:8080/9393 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ifbefbde68e3ca724f04efe0426a3906e5c33ed3c Gerrit-Change-Number: 9393 Gerrit-PatchSet: 7 Gerrit-Owner: Will Berkeley Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Will Berkeley
[kudu-CR] KUDU-2290: Tool to re-create a tablet
Will Berkeley has posted comments on this change. ( http://gerrit.cloudera.org:8080/9393 ) Change subject: KUDU-2290: Tool to re-create a tablet .. Patch Set 6: (3 comments) http://gerrit.cloudera.org:8080/#/c/9393/6/src/kudu/integration-tests/replace_tablet-itest.cc File src/kudu/integration-tests/replace_tablet-itest.cc: http://gerrit.cloudera.org:8080/#/c/9393/6/src/kudu/integration-tests/replace_tablet-itest.cc@55 PS6, Line 55: ExternalMiniClusterITestBase(), > warning: initializer for base class 'kudu::ExternalMiniClusterITestBase' is Done http://gerrit.cloudera.org:8080/#/c/9393/6/src/kudu/integration-tests/replace_tablet-itest.cc@121 PS6, Line 121: while (workload.rows_inserted() < 2 * kNumRows) { > warning: either cast from 'int' to 'long' is ineffective, or there is loss Done http://gerrit.cloudera.org:8080/#/c/9393/6/src/kudu/tools/tool_action_common.cc File src/kudu/tools/tool_action_common.cc: http://gerrit.cloudera.org:8080/#/c/9393/6/src/kudu/tools/tool_action_common.cc@554 PS6, Line 554: LeaderMasterProxy::LeaderMasterProxy(const client::sp::shared_ptr& client) : > warning: pass by value and use std::move [modernize-pass-by-value] Done -- To view, visit http://gerrit.cloudera.org:8080/9393 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ifbefbde68e3ca724f04efe0426a3906e5c33ed3c Gerrit-Change-Number: 9393 Gerrit-PatchSet: 6 Gerrit-Owner: Will Berkeley Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Will Berkeley Gerrit-Comment-Date: Thu, 29 Mar 2018 16:38:34 + Gerrit-HasComments: Yes
[kudu-CR] KUDU-2290: Tool to re-create a tablet
Will Berkeley has posted comments on this change. ( http://gerrit.cloudera.org:8080/9393 ) Change subject: KUDU-2290: Tool to re-create a tablet .. Patch Set 5: (13 comments) After discussion, I decided to take out the "quarantine" part of this patch: 1. It's a significant question how it would work with authz. 2. The quarantine table is by its nature not writable, and it's a question of if it should be. 3. There's overall a lot of things to consider about how such a setup would work. So now this patch will just seek to add a way to replace a tablet with a new one, deleting the data from the old one. http://gerrit.cloudera.org:8080/#/c/9393/5//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/9393/5//COMMIT_MSG@11 PS5, Line 11: is > Nit: omit this Done http://gerrit.cloudera.org:8080/#/c/9393/5//COMMIT_MSG@19 PS5, Line 19: > Given that tablet_id is globally unique and table_name already identifies t n/a since the "quarantine" part is being removed for now. http://gerrit.cloudera.org:8080/#/c/9393/5//COMMIT_MSG@24 PS5, Line 24: its > Nit: it's n/a http://gerrit.cloudera.org:8080/#/c/9393/5//COMMIT_MSG@26 PS5, Line 26: eventually this will stop > There may be some interesting fallout from this. In particular, I believe t n/a because the "quarantine" piece is being removed. http://gerrit.cloudera.org:8080/#/c/9393/5/src/kudu/client/meta_cache.cc File src/kudu/client/meta_cache.cc: http://gerrit.cloudera.org:8080/#/c/9393/5/src/kudu/client/meta_cache.cc@912 PS5, Line 912: // Partition should not have changed. > I've developed a bad habit of using "partition" and "tablet" interchangeabl n/a since the "swapping" of the old tablet to a quarantine table is being removed. http://gerrit.cloudera.org:8080/#/c/9393/5/src/kudu/client/meta_cache.cc@919 PS5, Line 919: // In this case, 'tablets_by_key' will be empty but a RemoteTablet : // will exist in the cache. > I don't understand this case. AFAICT tablet replacement is "atomic", which n/a since the "swapping" of the old tablet to a quarantine table is being removed. http://gerrit.cloudera.org:8080/#/c/9393/5/src/kudu/master/catalog_manager.h File src/kudu/master/catalog_manager.h: http://gerrit.cloudera.org:8080/#/c/9393/5/src/kudu/master/catalog_manager.h@581 PS5, Line 581: Status ReplaceTablet(const std::string& tablet_id, master::ReplaceTabletResponsePB* resp); > It'd be nice to have testing of this new method outside of the CLI tests. B I added coverage in master-stress-test. I also added an integration test to test writing to a table as once of its tablets is replaced. This fails due to an existing bug in the client. http://gerrit.cloudera.org:8080/#/c/9393/5/src/kudu/master/catalog_manager.cc File src/kudu/master/catalog_manager.cc: http://gerrit.cloudera.org:8080/#/c/9393/5/src/kudu/master/catalog_manager.cc@4302 PS5, Line 4302: // To be safe, we'll take the global catalog manager lock for the rest of this method. > While I appreciate the motivation to block out everything else, I don't thi Done http://gerrit.cloudera.org:8080/#/c/9393/5/src/kudu/master/catalog_manager.cc@4303 PS5, Line 4303: lock > Style nit: Use l (or l_foo) for lock guards. It especially helps in this ca Done http://gerrit.cloudera.org:8080/#/c/9393/5/src/kudu/master/master.proto File src/kudu/master/master.proto: http://gerrit.cloudera.org:8080/#/c/9393/5/src/kudu/master/master.proto@707 PS5, Line 707: recovered > Nit: "manually recovered", to make it clear that any actual recovery is out n/a since the "swapping" of the old tablet to a quarantine table is being removed. http://gerrit.cloudera.org:8080/#/c/9393/5/src/kudu/master/master.proto@715 PS5, Line 715: > Just to close this out, Dan (and Todd) reminded me that for scalars, option Thanks Adar. http://gerrit.cloudera.org:8080/#/c/9393/5/src/kudu/tools/tool_action_common.cc File src/kudu/tools/tool_action_common.cc: http://gerrit.cloudera.org:8080/#/c/9393/5/src/kudu/tools/tool_action_common.cc@593 PS5, Line 593: > Nit: these were all grouped together, so either omit this empty line, or ad Done http://gerrit.cloudera.org:8080/#/c/9393/5/src/kudu/tools/tool_action_tablet.cc File src/kudu/tools/tool_action_tablet.cc: http://gerrit.cloudera.org:8080/#/c/9393/5/src/kudu/tools/tool_action_tablet.cc@537 PS5, Line 537: proxy.SyncRpc( > This returns a Status that should be checked. Done -- To view, visit http://gerrit.cloudera.org:8080/9393 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ifbefbde68e3ca724f04efe0426a3906e5c33ed3c Gerrit-Change-Number: 9393 Gerrit-PatchSet: 5 Gerrit-Owner: Will Berkeley Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd
[kudu-CR] KUDU-2290: Tool to re-create a tablet
Hello Tidy Bot, Dan Burkert, Kudu Jenkins, Andrew Wong, Adar Dembo, Todd Lipcon, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/9393 to look at the new patch set (#6). Change subject: KUDU-2290: Tool to re-create a tablet .. KUDU-2290: Tool to re-create a tablet This adds a tool `kudu tablet unsafe_replace_tablet` that can be used to replace a tablet with another having the same partition information. This tool is useful when a tablet has permanently lost all replicas because it can repair the tablet's table. Before, the table would have to be recreated to recover from a permanently lost tablet. Due to KUDU-2376, the new test in replace_tablet-itest fails. Since this tool is meant to fix broken tables, I think it's still valuable to add this tool in the meantime. Change-Id: Ifbefbde68e3ca724f04efe0426a3906e5c33ed3c --- M src/kudu/client/client-internal.cc M src/kudu/integration-tests/CMakeLists.txt M src/kudu/integration-tests/cluster_itest_util.h M src/kudu/integration-tests/master-stress-test.cc A src/kudu/integration-tests/replace_tablet-itest.cc M src/kudu/master/catalog_manager.cc M src/kudu/master/catalog_manager.h M src/kudu/master/master.proto M src/kudu/master/master_service.cc M src/kudu/master/master_service.h M src/kudu/tools/kudu-tool-test.cc M src/kudu/tools/tool_action_common.cc M src/kudu/tools/tool_action_common.h M src/kudu/tools/tool_action_tablet.cc 14 files changed, 502 insertions(+), 9 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/93/9393/6 -- To view, visit http://gerrit.cloudera.org:8080/9393 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ifbefbde68e3ca724f04efe0426a3906e5c33ed3c Gerrit-Change-Number: 9393 Gerrit-PatchSet: 6 Gerrit-Owner: Will Berkeley Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Will Berkeley
[kudu-CR] KUDU-2290: Tool to re-create a tablet
Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/9393 ) Change subject: KUDU-2290: Tool to re-create a tablet .. Patch Set 5: (2 comments) http://gerrit.cloudera.org:8080/#/c/9393/5/src/kudu/client/meta_cache.cc File src/kudu/client/meta_cache.cc: http://gerrit.cloudera.org:8080/#/c/9393/5/src/kudu/client/meta_cache.cc@912 PS5, Line 912: // Partition should not have changed. > What is a partition except a pair of bounds, in other words, what does your I've developed a bad habit of using "partition" and "tablet" interchangeably. When a tablet is replaced, its partition (or partition boundaries) stay the same, but the tablet ID changes. That's what I'm getting at here; that the ID has changed. http://gerrit.cloudera.org:8080/#/c/9393/5/src/kudu/master/master.proto File src/kudu/master/master.proto: http://gerrit.cloudera.org:8080/#/c/9393/5/src/kudu/master/master.proto@715 PS5, Line 715: > Keep it optional, please. We should never introduce new required fields. Just to close this out, Dan (and Todd) reminded me that for scalars, optional provides a reasonable default value, one that would have been a valid value to begin with. For example, the default for "bytes" in C++ is the empty string, which would have been valid to send had the record been required too. Meaning, if the empty string is invalid input, it would need to be sanitized regardless of whether the field was optional or required. The default value for embedded messages is null and thus is an exception to this rule (i.e. you can't dereference 'foo' without first validating it via 'has_foo()'), but those are less prevalent than scalars, so the burden isn't as onerous. -- To view, visit http://gerrit.cloudera.org:8080/9393 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ifbefbde68e3ca724f04efe0426a3906e5c33ed3c Gerrit-Change-Number: 9393 Gerrit-PatchSet: 5 Gerrit-Owner: Will Berkeley Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Will Berkeley Gerrit-Comment-Date: Wed, 14 Mar 2018 00:06:59 + Gerrit-HasComments: Yes
[kudu-CR] KUDU-2290: Tool to re-create a tablet
Dan Burkert has posted comments on this change. ( http://gerrit.cloudera.org:8080/9393 ) Change subject: KUDU-2290: Tool to re-create a tablet .. Patch Set 5: (3 comments) I have reservations about how this patch introduces a new 'quarantined' table. In no other circumstances do we pollute the table namespace with auto-created names, and I don't think this is a good reason to start. This will become very painful down the line when begin to integrate with thirdparty metadata systems like Hive and Sentry which have no such concept as a quarantined table. I think it would be better to simply swap in the new tablet to the existing table, and provide the option to have the master not delete the tablet, if the data is still relevant. The data could then still be accessed and operated on via the CLI tools which take the tablet ID directly. Even then there are still authorization issues that will need to be figured out eventually. http://gerrit.cloudera.org:8080/#/c/9393/5//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/9393/5//COMMIT_MSG@26 PS5, Line 26: eventually this will stop > Maybe be more specific; this behavior ends when the cached entries' TTL exp There may be some interesting fallout from this. In particular, I believe this is the first time in which the meta-cache TTL is significant for _positive_ cache locations . Right now I think the TTL only has observable effects for non-covered ranges. There's nothing actionable, but it's maybe something to think about and document. http://gerrit.cloudera.org:8080/#/c/9393/5/src/kudu/client/meta_cache.cc File src/kudu/client/meta_cache.cc: http://gerrit.cloudera.org:8080/#/c/9393/5/src/kudu/client/meta_cache.cc@912 PS5, Line 912: // Partition should not have changed. > Could you clarify this a bit: the partition's _boundaries_ should have not What is a partition except a pair of bounds, in other words, what does your suggestion clarify? http://gerrit.cloudera.org:8080/#/c/9393/5/src/kudu/master/master.proto File src/kudu/master/master.proto: http://gerrit.cloudera.org:8080/#/c/9393/5/src/kudu/master/master.proto@715 PS5, Line 715: > I consider myself to be a protobuf luddite, but I don't understand why 'opt Keep it optional, please. We should never introduce new required fields. This is a great example, since you could later go back and add a variant of tablet replacement which just deletes the old version, and doesn't move it to a quarantined state. If we lived in a perfect world with perfect foresight we could use required, but we don't and we shouldn't. -- To view, visit http://gerrit.cloudera.org:8080/9393 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ifbefbde68e3ca724f04efe0426a3906e5c33ed3c Gerrit-Change-Number: 9393 Gerrit-PatchSet: 5 Gerrit-Owner: Will Berkeley Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Will Berkeley Gerrit-Comment-Date: Tue, 13 Mar 2018 23:45:30 + Gerrit-HasComments: Yes
[kudu-CR] KUDU-2290: Tool to re-create a tablet
Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/9393 ) Change subject: KUDU-2290: Tool to re-create a tablet .. Patch Set 5: (14 comments) http://gerrit.cloudera.org:8080/#/c/9393/5//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/9393/5//COMMIT_MSG@11 PS5, Line 11: is Nit: omit this http://gerrit.cloudera.org:8080/#/c/9393/5//COMMIT_MSG@19 PS5, Line 19: Given that tablet_id is globally unique and table_name already identifies the table for humans, what value does the inclusion of table_id add? http://gerrit.cloudera.org:8080/#/c/9393/5//COMMIT_MSG@24 PS5, Line 24: its Nit: it's http://gerrit.cloudera.org:8080/#/c/9393/5//COMMIT_MSG@26 PS5, Line 26: eventually this will stop Maybe be more specific; this behavior ends when the cached entries' TTL expires? http://gerrit.cloudera.org:8080/#/c/9393/5/src/kudu/client/meta_cache.cc File src/kudu/client/meta_cache.cc: http://gerrit.cloudera.org:8080/#/c/9393/5/src/kudu/client/meta_cache.cc@912 PS5, Line 912: // Partition should not have changed. Could you clarify this a bit: the partition's _boundaries_ should have not changed, http://gerrit.cloudera.org:8080/#/c/9393/5/src/kudu/client/meta_cache.cc@919 PS5, Line 919: // In this case, 'tablets_by_key' will be empty but a RemoteTablet : // will exist in the cache. I don't understand this case. AFAICT tablet replacement is "atomic", which means the client sees one of two states for a replaced tablet: 1. Tablet with ID foo from key x to y. 2. Tablet with ID bar from key x to y. Since the boundaries of the tablet haven't changed, doesn't this mean tablet_lower_bound is a valid way to look up the cached entry regardless of which state the client is exposed to? Looking at the code some more, it seems that following tablet replacement, 'remote' is going to be null so we'll skip all of this, erase the tablet's partition from tablets_by_key (L937), set up a new RemoteTablet for the replacement, and insert the tablet's partition again (L950). At no point is the new tablet exposed in tablets_by_id_ without any corresponding partitions in tablets_by_key. I must be missing something here... http://gerrit.cloudera.org:8080/#/c/9393/5/src/kudu/master/catalog_manager.h File src/kudu/master/catalog_manager.h: http://gerrit.cloudera.org:8080/#/c/9393/5/src/kudu/master/catalog_manager.h@581 PS5, Line 581: Status ReplaceTablet(const std::string& tablet_id, master::ReplaceTabletResponsePB* resp); It'd be nice to have testing of this new method outside of the CLI tests. Both in isolation as well as in a "stress" environment (i.e. ongoing concurrent data operations to the table and/or replaced tablet). It'd also be nice to test concurrent metadata operations: what happens if I delete a table and replace one of its tablets at the same time? What happens if I alter it concurrently? Etc. http://gerrit.cloudera.org:8080/#/c/9393/5/src/kudu/master/catalog_manager.cc File src/kudu/master/catalog_manager.cc: http://gerrit.cloudera.org:8080/#/c/9393/5/src/kudu/master/catalog_manager.cc@4302 PS5, Line 4302: // To be safe, we'll take the global catalog manager lock for the rest of this method. While I appreciate the motivation to block out everything else, I don't think this is a good idea. First, the global catalog manager lock is a spinlock, which means anyone waiting on this lock will be spinning, and they'll be waiting for a long time because in ReplaceTablet the lock is held during the catalog write, which involves network and disk IO. Second, the catalog manager lock is never held while acquiring other catalog manager locks; breaking that invariant raises the possibility of deadlocks elsewhere. The only exceptions I'm aware of to this rule are during catalog manager initialization, at which point RPCs are largely rejected anyway. Can we treat this like any other DDL method and acquire the right locks at the right time? The background task does tablet replacement in CatalogManager::ProcessPendingAssignments; can that be used as a template for how to do it safely here? http://gerrit.cloudera.org:8080/#/c/9393/5/src/kudu/master/catalog_manager.cc@4303 PS5, Line 4303: lock Style nit: Use l (or l_foo) for lock guards. It especially helps in this case, since lock and lock_ are so similar. http://gerrit.cloudera.org:8080/#/c/9393/5/src/kudu/master/master.proto File src/kudu/master/master.proto: http://gerrit.cloudera.org:8080/#/c/9393/5/src/kudu/master/master.proto@707 PS5, Line 707: recovered Nit: "manually recovered", to make it clear that any actual recovery is out of scope of Kudu's normal operations? http://gerrit.cloudera.org:8080/#/c/9393/5/src/kudu/master/master.proto@715 PS5, Line 715: I consider myself to be a protobuf luddite, but I don't understand why 'optional' is appropriate for the below fields. There's the phi
[kudu-CR] KUDU-2290: Tool to re-create a tablet
Will Berkeley has posted comments on this change. ( http://gerrit.cloudera.org:8080/9393 ) Change subject: KUDU-2290: Tool to re-create a tablet .. Patch Set 4: (4 comments) http://gerrit.cloudera.org:8080/#/c/9393/4//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/9393/4//COMMIT_MSG@23 PS4, Line 23: Note that, if the : tablet recovers quickly enough after being replaced, > This procedure would amount to scanning the new table and dumping the rows Did you mean to comment on the previous sentence? If so, the answer is yup. http://gerrit.cloudera.org:8080/#/c/9393/4/src/kudu/tools/kudu-tool-test.cc File src/kudu/tools/kudu-tool-test.cc: http://gerrit.cloudera.org:8080/#/c/9393/4/src/kudu/tools/kudu-tool-test.cc@2405 PS4, Line 2405: replaced_tablet_id > nit: could we use "old"/"new" instead of "replaced" here? I'm conflicted be Done http://gerrit.cloudera.org:8080/#/c/9393/4/src/kudu/tools/kudu-tool-test.cc@2445 PS4, Line 2445: replica > nit: tablet Done http://gerrit.cloudera.org:8080/#/c/9393/4/src/kudu/tools/kudu-tool-test.cc@2445 PS4, Line 2445: plus one new one. > This is referring to the new tablet count, not the number of replicas, righ Done -- To view, visit http://gerrit.cloudera.org:8080/9393 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ifbefbde68e3ca724f04efe0426a3906e5c33ed3c Gerrit-Change-Number: 9393 Gerrit-PatchSet: 4 Gerrit-Owner: Will Berkeley Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Will Berkeley Gerrit-Comment-Date: Tue, 13 Mar 2018 06:23:22 + Gerrit-HasComments: Yes
[kudu-CR] KUDU-2290: Tool to re-create a tablet
Hello Tidy Bot, Kudu Jenkins, Andrew Wong, Todd Lipcon, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/9393 to look at the new patch set (#5). Change subject: KUDU-2290: Tool to re-create a tablet .. KUDU-2290: Tool to re-create a tablet This adds a tool `kudu tablet unsafe_replace_tablet` that can be used to replace a tablet with another having the same partition information. This tool is useful when a tablet is has permanently lost all replicas because it can repair the tablet's table. Before, the table would have to be recreated to recover from a permanently lost tablet. The original tablet is moved to a new, specially-named table, named like replaced___ so that bad replicas are "quarantined" so they can be investigated, and if a replica of the original tablet can be recovered, its data can be scanned and recovered relatively easily. Note that, if the tablet recovers quickly enough after being replaced, its possible for clients with cache entries from before the replace to read the rows from the replaced tablet, but eventually this will stop. Change-Id: Ifbefbde68e3ca724f04efe0426a3906e5c33ed3c --- M src/kudu/client/client-internal.cc M src/kudu/client/meta_cache.cc M src/kudu/integration-tests/cluster_itest_util.h M src/kudu/master/catalog_manager.cc M src/kudu/master/catalog_manager.h M src/kudu/master/master.proto M src/kudu/master/master_service.cc M src/kudu/master/master_service.h M src/kudu/tools/kudu-tool-test.cc M src/kudu/tools/tool_action_common.cc M src/kudu/tools/tool_action_tablet.cc 11 files changed, 366 insertions(+), 15 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/93/9393/5 -- To view, visit http://gerrit.cloudera.org:8080/9393 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ifbefbde68e3ca724f04efe0426a3906e5c33ed3c Gerrit-Change-Number: 9393 Gerrit-PatchSet: 5 Gerrit-Owner: Will Berkeley Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Will Berkeley
[kudu-CR] KUDU-2290: Tool to re-create a tablet
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/9393 ) Change subject: KUDU-2290: Tool to re-create a tablet .. Patch Set 4: (4 comments) http://gerrit.cloudera.org:8080/#/c/9393/4//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/9393/4//COMMIT_MSG@23 PS4, Line 23: Note that, if the : tablet recovers quickly enough after being replaced, This procedure would amount to scanning the new table and dumping the rows into the old one, right? http://gerrit.cloudera.org:8080/#/c/9393/4/src/kudu/tools/kudu-tool-test.cc File src/kudu/tools/kudu-tool-test.cc: http://gerrit.cloudera.org:8080/#/c/9393/4/src/kudu/tools/kudu-tool-test.cc@2405 PS4, Line 2405: replaced_tablet_id nit: could we use "old"/"new" instead of "replaced" here? I'm conflicted because I think it's a clearer than "replaced"/"replacement"/"to-replace", even if it's not as accurate. http://gerrit.cloudera.org:8080/#/c/9393/4/src/kudu/tools/kudu-tool-test.cc@2445 PS4, Line 2445: replica nit: tablet http://gerrit.cloudera.org:8080/#/c/9393/4/src/kudu/tools/kudu-tool-test.cc@2445 PS4, Line 2445: plus one new one. This is referring to the new tablet count, not the number of replicas, right? -- To view, visit http://gerrit.cloudera.org:8080/9393 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ifbefbde68e3ca724f04efe0426a3906e5c33ed3c Gerrit-Change-Number: 9393 Gerrit-PatchSet: 4 Gerrit-Owner: Will Berkeley Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Will Berkeley Gerrit-Comment-Date: Mon, 12 Mar 2018 19:11:46 + Gerrit-HasComments: Yes
[kudu-CR] KUDU-2290: Tool to re-create a tablet
Will Berkeley has posted comments on this change. ( http://gerrit.cloudera.org:8080/9393 ) Change subject: KUDU-2290: Tool to re-create a tablet .. Patch Set 4: > Build Failed > > http://jenkins.kudu.apache.org/job/kudu-gerrit/12378/ : FAILURE Another instance of https://issues.apache.org/jira/browse/KUDU-1736 -- To view, visit http://gerrit.cloudera.org:8080/9393 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ifbefbde68e3ca724f04efe0426a3906e5c33ed3c Gerrit-Change-Number: 9393 Gerrit-PatchSet: 4 Gerrit-Owner: Will Berkeley Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Will Berkeley Gerrit-Comment-Date: Mon, 12 Mar 2018 19:05:05 + Gerrit-HasComments: No
[kudu-CR] KUDU-2290: Tool to re-create a tablet
Hello Tidy Bot, Kudu Jenkins, Andrew Wong, Todd Lipcon, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/9393 to look at the new patch set (#4). Change subject: KUDU-2290: Tool to re-create a tablet .. KUDU-2290: Tool to re-create a tablet This adds a tool `kudu tablet unsafe_replace_tablet` that can be used to replace a tablet with another having the same partition information. This tool is useful when a tablet is has permanently lost all replicas because it can repair the tablet's table. Before, the table would have to be recreated to recover from a permanently lost tablet. The original tablet is moved to a new, specially-named table, named like replaced___ so that bad replicas are "quarantined" so they can be investigated, and if a replica of the original tablet can be recovered, its data can be scanned and recovered relatively easily. Note that, if the tablet recovers quickly enough after being replaced, its possible for clients with cache entries from before the replace to read the rows from the replaced tablet, but eventually this will stop. Change-Id: Ifbefbde68e3ca724f04efe0426a3906e5c33ed3c --- M src/kudu/client/client-internal.cc M src/kudu/client/meta_cache.cc M src/kudu/integration-tests/cluster_itest_util.h M src/kudu/master/catalog_manager.cc M src/kudu/master/catalog_manager.h M src/kudu/master/master.proto M src/kudu/master/master_service.cc M src/kudu/master/master_service.h M src/kudu/tools/kudu-tool-test.cc M src/kudu/tools/tool_action_common.cc M src/kudu/tools/tool_action_tablet.cc 11 files changed, 367 insertions(+), 15 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/93/9393/4 -- To view, visit http://gerrit.cloudera.org:8080/9393 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ifbefbde68e3ca724f04efe0426a3906e5c33ed3c Gerrit-Change-Number: 9393 Gerrit-PatchSet: 4 Gerrit-Owner: Will Berkeley Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Will Berkeley
[kudu-CR] KUDU-2290: Tool to re-create a tablet
Will Berkeley has posted comments on this change. ( http://gerrit.cloudera.org:8080/9393 ) Change subject: KUDU-2290: Tool to re-create a tablet .. Patch Set 2: (12 comments) http://gerrit.cloudera.org:8080/#/c/9393/2//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/9393/2//COMMIT_MSG@9 PS2, Line 9: replace_tablet > nit: how about unsafe_replace_tablet, given we're guaranteed "data loss" Done http://gerrit.cloudera.org:8080/#/c/9393/2//COMMIT_MSG@13 PS2, Line 13: Before, the table would : have to be recreated to recover from a permanently lost tablet > Maybe we should add a check somewhere that we don't actually have a live ta I looked into this a bit and the master doesn't have that much information about how healthy a tablet is, and isn't the authority on it, especially if the tablet is unavailable but not unrecoverable. So I think having the scary "unsafe" in front of the name suffices here; adding a --force flag would mean in practice we'd always have to specify --force and it wouldn't be meaningful. http://gerrit.cloudera.org:8080/#/c/9393/2/src/kudu/client/client-internal.cc File src/kudu/client/client-internal.cc: http://gerrit.cloudera.org:8080/#/c/9393/2/src/kudu/client/client-internal.cc@97 PS2, Line 97: using master::ReplaceTabletRequestPB; > warning: using decl 'ReplaceTabletRequestPB' is unused [misc-unused-using-d Done http://gerrit.cloudera.org:8080/#/c/9393/2/src/kudu/client/client-internal.cc@98 PS2, Line 98: using master::ReplaceTabletResponsePB; > warning: using decl 'ReplaceTabletResponsePB' is unused [misc-unused-using- Done http://gerrit.cloudera.org:8080/#/c/9393/2/src/kudu/master/catalog_manager.h File src/kudu/master/catalog_manager.h: http://gerrit.cloudera.org:8080/#/c/9393/2/src/kudu/master/catalog_manager.h@577 PS2, Line 577: , causing all of its data : // to be permanently lost. > Would it be feasible to turn the replaced tablet into a new single-tablet t Done, but it involved making an exception in the MetaCache lookup code because having a tablet switch tables violated some assumptions. http://gerrit.cloudera.org:8080/#/c/9393/2/src/kudu/master/catalog_manager.cc File src/kudu/master/catalog_manager.cc: http://gerrit.cloudera.org:8080/#/c/9393/2/src/kudu/master/catalog_manager.cc@4249 PS2, Line 4249: replaced_tablet->mutable_metadata()->mutable_dirty()->set_state(SysTabletsEntryPB::DELETED, > if we make it DELETED it will also get removed off of the tablet servers. w Done http://gerrit.cloudera.org:8080/#/c/9393/2/src/kudu/master/master.proto File src/kudu/master/master.proto: http://gerrit.cloudera.org:8080/#/c/9393/2/src/kudu/master/master.proto@712 PS2, Line 712: there an e > nit: there is an Done http://gerrit.cloudera.org:8080/#/c/9393/2/src/kudu/master/master.proto@792 PS2, Line 792: option (kudu.rpc.authz_method) = "AuthorizeService"; > we don't want AuthorizeSuperUser here? Done http://gerrit.cloudera.org:8080/#/c/9393/2/src/kudu/master/master_service.cc File src/kudu/master/master_service.cc: http://gerrit.cloudera.org:8080/#/c/9393/2/src/kudu/master/master_service.cc@517 PS2, Line 517: LOG(INFO) << "ReplaceTablet: received request to replace tablet " << req->tablet_id(); > worth logging the requestor info too Done http://gerrit.cloudera.org:8080/#/c/9393/2/src/kudu/tools/kudu-tool-test.cc File src/kudu/tools/kudu-tool-test.cc: http://gerrit.cloudera.org:8080/#/c/9393/2/src/kudu/tools/kudu-tool-test.cc@2371 PS2, Line 2371: const int kNumTservers = 3; : const int kNumTablets = 3; > Maybe test with multiple masters too? This would still work, right? It's a bit of work to change ToolTest to support that and there's nothing really different about multimaster, so I hope you don't mind if I pass on it. (Also my manual testing was with multimaster). http://gerrit.cloudera.org:8080/#/c/9393/2/src/kudu/tools/kudu-tool-test.cc@2392 PS2, Line 2392: // Replace the tablet. This isn't the use case for the tool, but it should work. > Should it? Maybe we should safeguard against users shooting themselves in t The master can think a tablet is healthy when it's busted, as we've seen, and it's not the authority on the health of a tablet, so I think we ought to trust the users to know what they are doing or be scared of the "unsafe" at the beginning of the action name. http://gerrit.cloudera.org:8080/#/c/9393/2/src/kudu/tools/kudu-tool-test.cc@2490 PS2, Line 2490: workload.StopAndJoin(); > Another thought: maybe use the cluster verifier after this to make sure thi Done -- To view, visit http://gerrit.cloudera.org:8080/9393 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ifbefbde68e3ca724f04efe0426a3906e5c33ed3c Gerrit-Change-Number: 9393 Gerrit-PatchSet: 2 Gerrit-Owner: Will Berkeley Gerrit-Revie
[kudu-CR] KUDU-2290: Tool to re-create a tablet
Hello Tidy Bot, Kudu Jenkins, Andrew Wong, Todd Lipcon, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/9393 to look at the new patch set (#3). Change subject: KUDU-2290: Tool to re-create a tablet .. KUDU-2290: Tool to re-create a tablet This adds a tool `kudu tablet unsafe_replace_tablet` that can be used to replace a tablet with another having the same partition information. This tool is useful when a tablet is has permanently lost all replicas because it can repair the tablet's table. Before, the table would have to be recreated to recover from a permanently lost tablet. The original tablet is moved to a new, specially-named table, named like replaced___ so that bad replicas are "quarantined" so they can be investigated, and if a replica of the original tablet can be recovered, its data can be scanned and recovered relatively easily. Note that, if the tablet recovers quickly enough after being replaced, its possible for clients with cache entries from before the replace to read the rows from the replaced tablet, but eventually this will stop. Change-Id: Ifbefbde68e3ca724f04efe0426a3906e5c33ed3c --- M src/kudu/client/client-internal.cc M src/kudu/client/meta_cache.cc M src/kudu/integration-tests/cluster_itest_util.h M src/kudu/master/catalog_manager.cc M src/kudu/master/catalog_manager.h M src/kudu/master/master.proto M src/kudu/master/master_service.cc M src/kudu/master/master_service.h M src/kudu/tools/kudu-tool-test.cc M src/kudu/tools/tool_action_common.cc M src/kudu/tools/tool_action_tablet.cc 11 files changed, 367 insertions(+), 15 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/93/9393/3 -- To view, visit http://gerrit.cloudera.org:8080/9393 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ifbefbde68e3ca724f04efe0426a3906e5c33ed3c Gerrit-Change-Number: 9393 Gerrit-PatchSet: 3 Gerrit-Owner: Will Berkeley Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Will Berkeley
[kudu-CR] KUDU-2290: Tool to re-create a tablet
Todd Lipcon has posted comments on this change. ( http://gerrit.cloudera.org:8080/9393 ) Change subject: KUDU-2290: Tool to re-create a tablet .. Patch Set 2: (4 comments) http://gerrit.cloudera.org:8080/#/c/9393/2/src/kudu/master/catalog_manager.h File src/kudu/master/catalog_manager.h: http://gerrit.cloudera.org:8080/#/c/9393/2/src/kudu/master/catalog_manager.h@577 PS2, Line 577: , causing all of its data : // to be permanently lost. Would it be feasible to turn the replaced tablet into a new single-tablet table, so that if you eventually were able to repair it, you could at least get the data back out of it? http://gerrit.cloudera.org:8080/#/c/9393/2/src/kudu/master/catalog_manager.cc File src/kudu/master/catalog_manager.cc: http://gerrit.cloudera.org:8080/#/c/9393/2/src/kudu/master/catalog_manager.cc@4249 PS2, Line 4249: replaced_tablet->mutable_metadata()->mutable_dirty()->set_state(SysTabletsEntryPB::DELETED, if we make it DELETED it will also get removed off of the tablet servers. we might want to keep it there for some investigation or attempts to recover the data. Is there a way we can accomplish that? http://gerrit.cloudera.org:8080/#/c/9393/2/src/kudu/master/master.proto File src/kudu/master/master.proto: http://gerrit.cloudera.org:8080/#/c/9393/2/src/kudu/master/master.proto@792 PS2, Line 792: option (kudu.rpc.authz_method) = "AuthorizeService"; we don't want AuthorizeSuperUser here? http://gerrit.cloudera.org:8080/#/c/9393/2/src/kudu/master/master_service.cc File src/kudu/master/master_service.cc: http://gerrit.cloudera.org:8080/#/c/9393/2/src/kudu/master/master_service.cc@517 PS2, Line 517: LOG(INFO) << "ReplaceTablet: received request to replace tablet " << req->tablet_id(); worth logging the requestor info too -- To view, visit http://gerrit.cloudera.org:8080/9393 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ifbefbde68e3ca724f04efe0426a3906e5c33ed3c Gerrit-Change-Number: 9393 Gerrit-PatchSet: 2 Gerrit-Owner: Will Berkeley Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Will Berkeley Gerrit-Comment-Date: Fri, 23 Feb 2018 23:19:10 + Gerrit-HasComments: Yes
[kudu-CR] KUDU-2290: Tool to re-create a tablet
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/9393 ) Change subject: KUDU-2290: Tool to re-create a tablet .. Patch Set 2: (1 comment) http://gerrit.cloudera.org:8080/#/c/9393/2/src/kudu/tools/kudu-tool-test.cc File src/kudu/tools/kudu-tool-test.cc: http://gerrit.cloudera.org:8080/#/c/9393/2/src/kudu/tools/kudu-tool-test.cc@2490 PS2, Line 2490: workload.StopAndJoin(); Another thought: maybe use the cluster verifier after this to make sure things look spick and span. I'd imagine ksck after this point would be immediately healthy, right? -- To view, visit http://gerrit.cloudera.org:8080/9393 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ifbefbde68e3ca724f04efe0426a3906e5c33ed3c Gerrit-Change-Number: 9393 Gerrit-PatchSet: 2 Gerrit-Owner: Will Berkeley Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Will Berkeley Gerrit-Comment-Date: Fri, 23 Feb 2018 17:43:29 + Gerrit-HasComments: Yes
[kudu-CR] KUDU-2290: Tool to re-create a tablet
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/9393 ) Change subject: KUDU-2290: Tool to re-create a tablet .. Patch Set 2: (5 comments) Mostly some nits and high-level points http://gerrit.cloudera.org:8080/#/c/9393/2//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/9393/2//COMMIT_MSG@9 PS2, Line 9: replace_tablet nit: how about unsafe_replace_tablet, given we're guaranteed "data loss" http://gerrit.cloudera.org:8080/#/c/9393/2//COMMIT_MSG@13 PS2, Line 13: Before, the table would : have to be recreated to recover from a permanently lost tablet Maybe we should add a check somewhere that we don't actually have a live tablet somewhere. And if we do (or think we do), block the run with a `--force` flag or somesuch. http://gerrit.cloudera.org:8080/#/c/9393/2/src/kudu/master/master.proto File src/kudu/master/master.proto: http://gerrit.cloudera.org:8080/#/c/9393/2/src/kudu/master/master.proto@712 PS2, Line 712: there an e nit: there is an http://gerrit.cloudera.org:8080/#/c/9393/2/src/kudu/tools/kudu-tool-test.cc File src/kudu/tools/kudu-tool-test.cc: http://gerrit.cloudera.org:8080/#/c/9393/2/src/kudu/tools/kudu-tool-test.cc@2371 PS2, Line 2371: const int kNumTservers = 3; : const int kNumTablets = 3; Maybe test with multiple masters too? This would still work, right? http://gerrit.cloudera.org:8080/#/c/9393/2/src/kudu/tools/kudu-tool-test.cc@2392 PS2, Line 2392: // Replace the tablet. This isn't the use case for the tool, but it should work. Should it? Maybe we should safeguard against users shooting themselves in the foot. -- To view, visit http://gerrit.cloudera.org:8080/9393 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ifbefbde68e3ca724f04efe0426a3906e5c33ed3c Gerrit-Change-Number: 9393 Gerrit-PatchSet: 2 Gerrit-Owner: Will Berkeley Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Will Berkeley Gerrit-Comment-Date: Fri, 23 Feb 2018 00:25:31 + Gerrit-HasComments: Yes
[kudu-CR] KUDU-2290: Tool to re-create a tablet
Hello Tidy Bot, Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/9393 to look at the new patch set (#2). Change subject: KUDU-2290: Tool to re-create a tablet .. KUDU-2290: Tool to re-create a tablet This adds a tool `kudu tablet replace_tablet` that can be used to replace a tablet with another having the same partition information. Any data in the replaced tablet is lost. This tool is useful when a tablet is has permanently lost all replicas because it can repair the tablet's table. Before, the table would have to be recreated to recover from a permanently lost tablet. Change-Id: Ifbefbde68e3ca724f04efe0426a3906e5c33ed3c --- M src/kudu/client/client-internal.cc M src/kudu/integration-tests/cluster_itest_util.h M src/kudu/master/catalog_manager.cc M src/kudu/master/catalog_manager.h M src/kudu/master/master.proto M src/kudu/master/master_service.cc M src/kudu/master/master_service.h M src/kudu/tools/kudu-tool-test.cc M src/kudu/tools/tool_action_common.cc M src/kudu/tools/tool_action_tablet.cc 10 files changed, 318 insertions(+), 5 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/93/9393/2 -- To view, visit http://gerrit.cloudera.org:8080/9393 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ifbefbde68e3ca724f04efe0426a3906e5c33ed3c Gerrit-Change-Number: 9393 Gerrit-PatchSet: 2 Gerrit-Owner: Will Berkeley Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Will Berkeley
[kudu-CR] KUDU-2290: Tool to re-create a tablet
Will Berkeley has posted comments on this change. ( http://gerrit.cloudera.org:8080/9393 ) Change subject: KUDU-2290: Tool to re-create a tablet .. Patch Set 1: I think tidy bot is wrong about the using decls. -- To view, visit http://gerrit.cloudera.org:8080/9393 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ifbefbde68e3ca724f04efe0426a3906e5c33ed3c Gerrit-Change-Number: 9393 Gerrit-PatchSet: 1 Gerrit-Owner: Will Berkeley Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Will Berkeley Gerrit-Comment-Date: Thu, 22 Feb 2018 19:32:31 + Gerrit-HasComments: No
[kudu-CR] KUDU-2290: Tool to re-create a tablet
Will Berkeley has uploaded this change for review. ( http://gerrit.cloudera.org:8080/9393 Change subject: KUDU-2290: Tool to re-create a tablet .. KUDU-2290: Tool to re-create a tablet This adds a tool `kudu tablet replace_tablet` that can be used to replace a tablet with another having the same partition information. Any data in the replaced tablet is lost. This tool is useful when a tablet is has permanently lost all replicas because it can repair the tablet's table. Before, the table would have to be recreated to recover from a permanently lost tablet. Change-Id: Ifbefbde68e3ca724f04efe0426a3906e5c33ed3c --- M src/kudu/client/client-internal.cc M src/kudu/integration-tests/cluster_itest_util.h M src/kudu/master/catalog_manager.cc M src/kudu/master/catalog_manager.h M src/kudu/master/master.proto M src/kudu/master/master_service.cc M src/kudu/master/master_service.h M src/kudu/tools/kudu-tool-test.cc M src/kudu/tools/tool_action_common.cc M src/kudu/tools/tool_action_tablet.cc 10 files changed, 312 insertions(+), 3 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/93/9393/1 -- To view, visit http://gerrit.cloudera.org:8080/9393 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: Ifbefbde68e3ca724f04efe0426a3906e5c33ed3c Gerrit-Change-Number: 9393 Gerrit-PatchSet: 1 Gerrit-Owner: Will Berkeley