[kudu-CR] [tools] Add a tool to recover master data from tablet servers
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/9490 ) Change subject: [tools] Add a tool to recover master data from tablet servers .. Patch Set 11: Code-Review+2 (1 comment) http://gerrit.cloudera.org:8080/#/c/9490/9/src/kudu/tools/master_rebuilder.cc File src/kudu/tools/master_rebuilder.cc: http://gerrit.cloudera.org:8080/#/c/9490/9/src/kudu/tools/master_rebuilder.cc@321 PS9, Line 321: Status s = sys_catalog.CreateNew(master.fs_manager()); > It's a good point, similar to Bankim's mention of cleaning up files. I'll l Yep, I agree -- adding TODO now and addressing this in a separate changelist is the way to go. This changelist as is brings in some value already. -- To view, visit http://gerrit.cloudera.org:8080/9490 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If29e421d466a531ebad72e281ae27e74e458f8c6 Gerrit-Change-Number: 9490 Gerrit-PatchSet: 11 Gerrit-Owner: Will Berkeley Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Attila Bukor Gerrit-Reviewer: Bankim Bhavsar Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Tidy Bot (241) Gerrit-Reviewer: Will Berkeley Gerrit-Reviewer: Yingchun Lai Gerrit-Comment-Date: Wed, 11 Aug 2021 18:26:11 + Gerrit-HasComments: Yes
[kudu-CR] [tools] Add a tool to recover master data from tablet servers
Andrew Wong has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/9490 )

Change subject: [tools] Add a tool to recover master data from tablet servers
..

[tools] Add a tool to recover master data from tablet servers

This adds a tool that attempts to recover master metadata from tablet servers using ListTablets, writing the metadata to a syscatalog table that a new master process can use to make the cluster operational. It has several limitations. See the help description for the new tool. Note that it should be possible to fix some limitations by giving more master metadata to tablet servers. The advantage of the tool as-is is that it should work on all versions of Kudu since 1.0.

Change-Id: If29e421d466a531ebad72e281ae27e74e458f8c6
Reviewed-on: http://gerrit.cloudera.org:8080/9490
Tested-by: Kudu Jenkins
Reviewed-by: Bankim Bhavsar
---
M src/kudu/common/partition.cc
M src/kudu/common/partition.h
M src/kudu/common/schema.h
M src/kudu/master/catalog_manager.h
M src/kudu/tools/CMakeLists.txt
M src/kudu/tools/kudu-admin-test.cc
A src/kudu/tools/master_rebuilder.cc
A src/kudu/tools/master_rebuilder.h
M src/kudu/tools/tool_action_common.cc
M src/kudu/tools/tool_action_common.h
M src/kudu/tools/tool_action_master.cc
M src/kudu/tools/tool_action_remote_replica.cc
12 files changed, 852 insertions(+), 14 deletions(-)

Approvals:
  Kudu Jenkins: Verified
  Bankim Bhavsar: Looks good to me, approved

--
To view, visit http://gerrit.cloudera.org:8080/9490
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: If29e421d466a531ebad72e281ae27e74e458f8c6
Gerrit-Change-Number: 9490
Gerrit-PatchSet: 11
Gerrit-Owner: Will Berkeley
Gerrit-Reviewer: Alexey Serbin
Gerrit-Reviewer: Andrew Wong
Gerrit-Reviewer: Attila Bukor
Gerrit-Reviewer: Bankim Bhavsar
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Will Berkeley
Gerrit-Reviewer: Yingchun Lai
[kudu-CR] [tools] Add a tool to recover master data from tablet servers
Bankim Bhavsar has posted comments on this change. ( http://gerrit.cloudera.org:8080/9490 ) Change subject: [tools] Add a tool to recover master data from tablet servers .. Patch Set 10: Code-Review+2 Okay with follow-up improvements. -- To view, visit http://gerrit.cloudera.org:8080/9490 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If29e421d466a531ebad72e281ae27e74e458f8c6 Gerrit-Change-Number: 9490 Gerrit-PatchSet: 10 Gerrit-Owner: Will Berkeley Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Attila Bukor Gerrit-Reviewer: Bankim Bhavsar Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Tidy Bot (241) Gerrit-Reviewer: Will Berkeley Gerrit-Reviewer: Yingchun Lai Gerrit-Comment-Date: Wed, 11 Aug 2021 17:29:01 + Gerrit-HasComments: No
[kudu-CR] [tools] Add a tool to recover master data from tablet servers
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/9490 )

Change subject: [tools] Add a tool to recover master data from tablet servers
..

Patch Set 10:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/9490/9/src/kudu/tools/kudu-admin-test.cc
File src/kudu/tools/kudu-admin-test.cc:

http://gerrit.cloudera.org:8080/#/c/9490/9/src/kudu/tools/kudu-admin-test.cc@3133
PS9, Line 3133: we c
> nit: drop
Done

http://gerrit.cloudera.org:8080/#/c/9490/9/src/kudu/tools/master_rebuilder.h
File src/kudu/tools/master_rebuilder.h:

http://gerrit.cloudera.org:8080/#/c/9490/9/src/kudu/tools/master_rebuilder.h@35
PS9, Line 35: Object for a
> nit: this is not a POD type since its non-static members are not PODs. If
Indeed. Thanks for the pointer!

http://gerrit.cloudera.org:8080/#/c/9490/9/src/kudu/tools/master_rebuilder.h@81
PS9, Line 81: const RebuildReport& GetRebuildReport() const;
> nit: make this method 'const'?
Done

http://gerrit.cloudera.org:8080/#/c/9490/9/src/kudu/tools/master_rebuilder.cc
File src/kudu/tools/master_rebuilder.cc:

http://gerrit.cloudera.org:8080/#/c/9490/9/src/kudu/tools/master_rebuilder.cc@63
PS9, Line 63: using kudu::master::SysCatalogTable;
: using kudu::master::SysTablesEntryPB;
: using kudu::master::SysTabletsEntryPB;
: using kudu::master::TableInfo;
: using kudu::master::TableMetadataLock;
: using kudu::master::TabletInfo;
: using kudu::master::TabletMetadataGroupLock;
: using kudu::master::TabletMetadataLock;
: using kudu::tserver::ListTabletsResponsePB;
: using std::string;
: using std::vector;
: using strings::Substitute;
:
: namespace kudu {
> nit: move these using declarations out from the kudu namespace
Done

http://gerrit.cloudera.org:8080/#/c/9490/9/src/kudu/tools/master_rebuilder.cc@155
PS9, Line 155: assembled all
> Would IllegalState/Incomplete/ServiceUnavailable be better choices here?
Done

http://gerrit.cloudera.org:8080/#/c/9490/9/src/kudu/tools/master_rebuilder.cc@321
PS9, Line 321: Status s = sys_catalog.CreateNew(master.fs_manager());
> As an option, would it be a safer approach to create the whole master's fil
It's a good point, similar to Bankim's mention of cleaning up files. I'll leave a TODO here. I don't expect failures to run the tool to show up too frequently, so I'd opt to defer that until we see it being an issue. I can address it in a follow up if you feel strongly about it.

--
To view, visit http://gerrit.cloudera.org:8080/9490
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If29e421d466a531ebad72e281ae27e74e458f8c6
Gerrit-Change-Number: 9490
Gerrit-PatchSet: 10
Gerrit-Owner: Will Berkeley
Gerrit-Reviewer: Alexey Serbin
Gerrit-Reviewer: Andrew Wong
Gerrit-Reviewer: Attila Bukor
Gerrit-Reviewer: Bankim Bhavsar
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Will Berkeley
Gerrit-Reviewer: Yingchun Lai
Gerrit-Comment-Date: Wed, 11 Aug 2021 06:50:37 +
Gerrit-HasComments: Yes
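Editorial sketch of the "build in a temporary directory, then move into place" idea raised on line 321. This is not the Kudu implementation, which only gained a TODO in this change. It uses plain C++17 std::filesystem rather than Kudu's FsManager/Env, and the names BuildThenPublish, populate and the ".staging" suffix are hypothetical. The staging directory sits next to the destination so that the final rename stays on one filesystem.

```cpp
#include <cassert>
#include <filesystem>
#include <fstream>
#include <functional>
#include <system_error>

namespace fs = std::filesystem;

// Hypothetical helper: populate a staging directory first and only rename it
// to 'dest' once everything succeeded, so 'dest' never holds partial output.
bool BuildThenPublish(const fs::path& dest,
                      const std::function<bool(const fs::path&)>& populate) {
  assert(populate && "populate callback must be set");
  std::error_code ec;
  if (fs::exists(dest, ec)) {
    return false;  // Refuse to clobber an existing destination.
  }
  fs::path staging = dest;
  staging += ".staging";  // Sibling of 'dest', hence the same filesystem.
  fs::remove_all(staging, ec);  // Drop leftovers from an earlier failed run.
  if (!fs::create_directories(staging, ec)) {
    return false;
  }
  if (!populate(staging)) {
    fs::remove_all(staging, ec);  // Clean up; 'dest' was never touched.
    return false;
  }
  std::error_code rename_ec;
  fs::rename(staging, dest, rename_ec);  // The short critical interval.
  if (rename_ec) {
    fs::remove_all(staging, ec);
    return false;
  }
  return true;
}
```

With this shape, a failure while writing the catalog leaves only the staging directory to delete, which is also where the file cleanup Bankim asked about would happen.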
[kudu-CR] [tools] Add a tool to recover master data from tablet servers
Andrew Wong has uploaded a new patch set (#10) to the change originally created by Will Berkeley. ( http://gerrit.cloudera.org:8080/9490 ) Change subject: [tools] Add a tool to recover master data from tablet servers .. [tools] Add a tool to recover master data from tablet servers This adds a tool that attempts to recover master metadata from tablet servers using ListTablets, writing the metadata to a syscatalog table that a new master process can use to make the cluster operational. It has several limitations. See the help description for the new tool. Note that it should be possible to fix some limitations by giving more master metadata to tablet servers. The advantage of the tool as-is is that it should work on all versions of Kudu since 1.0. Change-Id: If29e421d466a531ebad72e281ae27e74e458f8c6 --- M src/kudu/common/partition.cc M src/kudu/common/partition.h M src/kudu/common/schema.h M src/kudu/master/catalog_manager.h M src/kudu/tools/CMakeLists.txt M src/kudu/tools/kudu-admin-test.cc A src/kudu/tools/master_rebuilder.cc A src/kudu/tools/master_rebuilder.h M src/kudu/tools/tool_action_common.cc M src/kudu/tools/tool_action_common.h M src/kudu/tools/tool_action_master.cc M src/kudu/tools/tool_action_remote_replica.cc 12 files changed, 852 insertions(+), 14 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/90/9490/10 -- To view, visit http://gerrit.cloudera.org:8080/9490 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: If29e421d466a531ebad72e281ae27e74e458f8c6 Gerrit-Change-Number: 9490 Gerrit-PatchSet: 10 Gerrit-Owner: Will Berkeley Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Attila Bukor Gerrit-Reviewer: Bankim Bhavsar Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Tidy Bot (241) Gerrit-Reviewer: Will Berkeley Gerrit-Reviewer: Yingchun Lai
[kudu-CR] [tools] Add a tool to recover master data from tablet servers
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/9490 )

Change subject: [tools] Add a tool to recover master data from tablet servers
..

Patch Set 9:

(7 comments)

Just a quick initial glance.

http://gerrit.cloudera.org:8080/#/c/9490/9/src/kudu/tools/kudu-admin-test.cc
File src/kudu/tools/kudu-admin-test.cc:

http://gerrit.cloudera.org:8080/#/c/9490/9/src/kudu/tools/kudu-admin-test.cc@3020
PS9, Line 3020: shouldnt
nit: shouldn't or should not

http://gerrit.cloudera.org:8080/#/c/9490/9/src/kudu/tools/kudu-admin-test.cc@3133
PS9, Line 3133: from
nit: drop

http://gerrit.cloudera.org:8080/#/c/9490/9/src/kudu/tools/master_rebuilder.h
File src/kudu/tools/master_rebuilder.h:

http://gerrit.cloudera.org:8080/#/c/9490/9/src/kudu/tools/master_rebuilder.h@35
PS9, Line 35: POD object f
nit: this is not a POD type since its non-static members are not PODs. If in doubt, you can check that using the template struct std::is_pod trait: https://en.cppreference.com/w/cpp/types/is_pod

http://gerrit.cloudera.org:8080/#/c/9490/9/src/kudu/tools/master_rebuilder.h@81
PS9, Line 81: const RebuildReport& GetRebuildReport();
nit: make this method 'const'?

http://gerrit.cloudera.org:8080/#/c/9490/9/src/kudu/tools/master_rebuilder.cc
File src/kudu/tools/master_rebuilder.cc:

http://gerrit.cloudera.org:8080/#/c/9490/9/src/kudu/tools/master_rebuilder.cc@63
PS9, Line 63: using master::Master;
: using master::MasterOptions;
: using master::SysCatalogTable;
: using master::SysTablesEntryPB;
: using master::SysTabletsEntryPB;
: using master::TableInfo;
: using master::TableMetadataLock;
: using master::TabletInfo;
: using master::TabletMetadataGroupLock;
: using master::TabletMetadataLock;
: using std::string;
: using std::vector;
: using strings::Substitute;
: using tserver::ListTabletsResponsePB;
nit: move these using declarations out from the kudu namespace

http://gerrit.cloudera.org:8080/#/c/9490/9/src/kudu/tools/master_rebuilder.cc@155
PS9, Line 155: InvalidArgument
Would IllegalState/Incomplete/ServiceUnavailable be better choices here?

http://gerrit.cloudera.org:8080/#/c/9490/9/src/kudu/tools/master_rebuilder.cc@321
PS9, Line 321: Status s = sys_catalog.CreateNew(master.fs_manager());
As an option, would it be a safer approach to create the whole master's filesystem data directory structure in some temporary directory, perform all the necessary steps to populate the catalog with the metadata, and only in the very end move the directory into the specified destination? That way the critical interval of placing the data into the destination might be shortened.

--
To view, visit http://gerrit.cloudera.org:8080/9490
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If29e421d466a531ebad72e281ae27e74e458f8c6
Gerrit-Change-Number: 9490
Gerrit-PatchSet: 9
Gerrit-Owner: Will Berkeley
Gerrit-Reviewer: Alexey Serbin
Gerrit-Reviewer: Andrew Wong
Gerrit-Reviewer: Attila Bukor
Gerrit-Reviewer: Bankim Bhavsar
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Will Berkeley
Gerrit-Reviewer: Yingchun Lai
Gerrit-Comment-Date: Tue, 10 Aug 2021 21:28:48 +
Gerrit-HasComments: Yes
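Editorial sketch of the POD check suggested on line 35 of master_rebuilder.h. The real RebuildReport members are not shown in this thread, so PlainCounters and ReportLike below are hypothetical stand-ins. The sketch uses std::is_trivial together with std::is_standard_layout, which is the condition std::is_pod tests and avoids the std::is_pod deprecation in C++20.

```cpp
#include <cassert>
#include <string>
#include <type_traits>
#include <vector>

// Hypothetical struct with only scalar members: trivial and standard-layout.
struct PlainCounters {
  int tablets_seen;
  int replicas_seen;
};

// Hypothetical struct in the spirit of a report object: the std::vector
// member has non-trivial constructors and destructor, so it is not a POD.
struct ReportLike {
  std::vector<std::string> table_names;
  int replicas_seen;
};

// A type is POD exactly when it is both trivial and standard-layout.
template <typename T>
constexpr bool IsPodLike() {
  return std::is_trivial<T>::value && std::is_standard_layout<T>::value;
}

static_assert(IsPodLike<PlainCounters>(), "scalar-only struct is a POD");
static_assert(!IsPodLike<ReportLike>(), "struct holding a vector is not a POD");
```

Because the checks are static_asserts, a comment that wrongly calls a type "POD" can be caught at compile time instead of in review.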
[kudu-CR] [tools] Add a tool to recover master data from tablet servers
Andrew Wong has uploaded a new patch set (#9) to the change originally created by Will Berkeley. ( http://gerrit.cloudera.org:8080/9490 ) Change subject: [tools] Add a tool to recover master data from tablet servers .. [tools] Add a tool to recover master data from tablet servers This adds a tool that attempts to recover master metadata from tablet servers using ListTablets, writing the metadata to a syscatalog table that a new master process can use to make the cluster operational. It has several limitations. See the help description for the new tool. Note that it should be possible to fix some limitations by giving more master metadata to tablet servers. The advantage of the tool as-is is that it should work on all versions of Kudu since 1.0. Change-Id: If29e421d466a531ebad72e281ae27e74e458f8c6 --- M src/kudu/common/partition.cc M src/kudu/common/partition.h M src/kudu/common/schema.h M src/kudu/master/catalog_manager.h M src/kudu/tools/CMakeLists.txt M src/kudu/tools/kudu-admin-test.cc A src/kudu/tools/master_rebuilder.cc A src/kudu/tools/master_rebuilder.h M src/kudu/tools/tool_action_common.cc M src/kudu/tools/tool_action_common.h M src/kudu/tools/tool_action_master.cc M src/kudu/tools/tool_action_remote_replica.cc 12 files changed, 852 insertions(+), 14 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/90/9490/9 -- To view, visit http://gerrit.cloudera.org:8080/9490 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: If29e421d466a531ebad72e281ae27e74e458f8c6 Gerrit-Change-Number: 9490 Gerrit-PatchSet: 9 Gerrit-Owner: Will Berkeley Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Attila Bukor Gerrit-Reviewer: Bankim Bhavsar Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Tidy Bot (241) Gerrit-Reviewer: Will Berkeley Gerrit-Reviewer: Yingchun Lai
[kudu-CR] [tools] Add a tool to recover master data from tablet servers
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/9490 ) Change subject: [tools] Add a tool to recover master data from tablet servers .. Patch Set 8: (1 comment) http://gerrit.cloudera.org:8080/#/c/9490/6/src/kudu/tools/master_rebuilder.cc File src/kudu/tools/master_rebuilder.cc: http://gerrit.cloudera.org:8080/#/c/9490/6/src/kudu/tools/master_rebuilder.cc@302 PS6, Line 302: metadata.partition().DebugString()); > Missed this. I'll update the patch with this. This ended up being pretty tricky to implement, and not worth the complexity IMO, given the infrequency I expect this to be used. Added a test for running the tool on non-empty directories though. -- To view, visit http://gerrit.cloudera.org:8080/9490 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If29e421d466a531ebad72e281ae27e74e458f8c6 Gerrit-Change-Number: 9490 Gerrit-PatchSet: 8 Gerrit-Owner: Will Berkeley Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Attila Bukor Gerrit-Reviewer: Bankim Bhavsar Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Tidy Bot (241) Gerrit-Reviewer: Will Berkeley Gerrit-Reviewer: Yingchun Lai Gerrit-Comment-Date: Mon, 09 Aug 2021 23:44:22 + Gerrit-HasComments: Yes
[kudu-CR] [tools] Add a tool to recover master data from tablet servers
Andrew Wong has uploaded a new patch set (#8) to the change originally created by Will Berkeley. ( http://gerrit.cloudera.org:8080/9490 ) Change subject: [tools] Add a tool to recover master data from tablet servers .. [tools] Add a tool to recover master data from tablet servers This adds a tool that attempts to recover master metadata from tablet servers using ListTablets, writing the metadata to a syscatalog table that a new master process can use to make the cluster operational. It has several limitations. See the help description for the new tool. Note that it should be possible to fix some limitations by giving more master metadata to tablet servers. The advantage of the tool as-is is that it should work on all versions of Kudu since 1.0. Change-Id: If29e421d466a531ebad72e281ae27e74e458f8c6 --- M src/kudu/common/partition.cc M src/kudu/common/partition.h M src/kudu/common/schema.h M src/kudu/master/catalog_manager.h M src/kudu/tools/CMakeLists.txt M src/kudu/tools/kudu-admin-test.cc A src/kudu/tools/master_rebuilder.cc A src/kudu/tools/master_rebuilder.h M src/kudu/tools/tool_action_common.cc M src/kudu/tools/tool_action_common.h M src/kudu/tools/tool_action_master.cc M src/kudu/tools/tool_action_remote_replica.cc 12 files changed, 850 insertions(+), 14 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/90/9490/8 -- To view, visit http://gerrit.cloudera.org:8080/9490 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: If29e421d466a531ebad72e281ae27e74e458f8c6 Gerrit-Change-Number: 9490 Gerrit-PatchSet: 8 Gerrit-Owner: Will Berkeley Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Attila Bukor Gerrit-Reviewer: Bankim Bhavsar Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Tidy Bot (241) Gerrit-Reviewer: Will Berkeley Gerrit-Reviewer: Yingchun Lai
[kudu-CR] [tools] Add a tool to recover master data from tablet servers
Bankim Bhavsar has posted comments on this change. ( http://gerrit.cloudera.org:8080/9490 ) Change subject: [tools] Add a tool to recover master data from tablet servers .. Patch Set 7: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/9490 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If29e421d466a531ebad72e281ae27e74e458f8c6 Gerrit-Change-Number: 9490 Gerrit-PatchSet: 7 Gerrit-Owner: Will Berkeley Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Attila Bukor Gerrit-Reviewer: Bankim Bhavsar Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Tidy Bot (241) Gerrit-Reviewer: Will Berkeley Gerrit-Reviewer: Yingchun Lai Gerrit-Comment-Date: Mon, 09 Aug 2021 19:42:15 + Gerrit-HasComments: No
[kudu-CR] [tools] Add a tool to recover master data from tablet servers
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/9490 )

Change subject: [tools] Add a tool to recover master data from tablet servers
..

Patch Set 7:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/9490/5/src/kudu/tools/kudu-admin-test.cc
File src/kudu/tools/kudu-admin-test.cc:

http://gerrit.cloudera.org:8080/#/c/9490/5/src/kudu/tools/kudu-admin-test.cc@3001
PS5, Line 3001: NO_FATALS(MakeTestTable(kTable, /*num_rows*/10, /*num_repl
> Can we also run the cluster verifier before deleting the recreated table?
Done

http://gerrit.cloudera.org:8080/#/c/9490/6/src/kudu/tools/master_rebuilder.cc
File src/kudu/tools/master_rebuilder.cc:

http://gerrit.cloudera.org:8080/#/c/9490/6/src/kudu/tools/master_rebuilder.cc@115
PS6, Line 115: state_pb = ta
> Nit: It'd be good to give a name that describes the action like CreateOrChe
Done

http://gerrit.cloudera.org:8080/#/c/9490/6/src/kudu/tools/master_rebuilder.cc@209
PS6, Line 209:
> Nit:Not this change but this looks like Java instead of using operator over
Done

http://gerrit.cloudera.org:8080/#/c/9490/6/src/kudu/tools/master_rebuilder.cc@302
PS6, Line 302: return Status::Corruption("inconsistent replica: partition mismatch");
> Should we also try to clean up the created tables in case of any errors whi
Missed this. I'll update the patch with this.

http://gerrit.cloudera.org:8080/#/c/9490/5/src/kudu/tools/tool_action_master.cc
File src/kudu/tools/tool_action_master.cc:

http://gerrit.cloudera.org:8080/#/c/9490/5/src/kudu/tools/tool_action_master.cc@927
PS5, Line 927: " table to --default_num_replicas instead.\n"
> I'd add a note about missing cryptographic keys stored in the catalog table
Done

http://gerrit.cloudera.org:8080/#/c/9490/5/src/kudu/tools/tool_action_master.cc@931
PS5, Line 931: " - Table metadata like comments, owners, and configurations are not stored on\n"
> I know we have a way to enforce permissions in the RPC layer. Could we rest
If fine-grained authorization is enforced, only the admin can run this, since it relies on ListTablets. AuthorizeListTablets() only permits super-users.

--
To view, visit http://gerrit.cloudera.org:8080/9490
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If29e421d466a531ebad72e281ae27e74e458f8c6
Gerrit-Change-Number: 9490
Gerrit-PatchSet: 7
Gerrit-Owner: Will Berkeley
Gerrit-Reviewer: Andrew Wong
Gerrit-Reviewer: Attila Bukor
Gerrit-Reviewer: Bankim Bhavsar
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Will Berkeley
Gerrit-Reviewer: Yingchun Lai
Gerrit-Comment-Date: Mon, 09 Aug 2021 18:55:41 +
Gerrit-HasComments: Yes
[kudu-CR] [tools] Add a tool to recover master data from tablet servers
Andrew Wong has uploaded a new patch set (#7) to the change originally created by Will Berkeley. ( http://gerrit.cloudera.org:8080/9490 ) Change subject: [tools] Add a tool to recover master data from tablet servers .. [tools] Add a tool to recover master data from tablet servers This adds a tool that attempts to recover master metadata from tablet servers using ListTablets, writing the metadata to a syscatalog table that a new master process can use to make the cluster operational. It has several limitations. See the help description for the new tool. Note that it should be possible to fix some limitations by giving more master metadata to tablet servers. The advantage of the tool as-is is that it should work on all versions of Kudu since 1.0. Change-Id: If29e421d466a531ebad72e281ae27e74e458f8c6 --- M src/kudu/common/partition.cc M src/kudu/common/partition.h M src/kudu/common/schema.h M src/kudu/master/catalog_manager.h M src/kudu/tools/CMakeLists.txt M src/kudu/tools/kudu-admin-test.cc A src/kudu/tools/master_rebuilder.cc A src/kudu/tools/master_rebuilder.h M src/kudu/tools/tool_action_common.cc M src/kudu/tools/tool_action_common.h M src/kudu/tools/tool_action_master.cc M src/kudu/tools/tool_action_remote_replica.cc 12 files changed, 820 insertions(+), 14 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/90/9490/7 -- To view, visit http://gerrit.cloudera.org:8080/9490 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: If29e421d466a531ebad72e281ae27e74e458f8c6 Gerrit-Change-Number: 9490 Gerrit-PatchSet: 7 Gerrit-Owner: Will Berkeley Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Attila Bukor Gerrit-Reviewer: Bankim Bhavsar Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Tidy Bot (241) Gerrit-Reviewer: Will Berkeley Gerrit-Reviewer: Yingchun Lai
[kudu-CR] [tools] Add a tool to recover master data from tablet servers
Attila Bukor has posted comments on this change. ( http://gerrit.cloudera.org:8080/9490 ) Change subject: [tools] Add a tool to recover master data from tablet servers .. Patch Set 6: (1 comment) http://gerrit.cloudera.org:8080/#/c/9490/5/src/kudu/tools/tool_action_master.cc File src/kudu/tools/tool_action_master.cc: http://gerrit.cloudera.org:8080/#/c/9490/5/src/kudu/tools/tool_action_master.cc@927 PS5, Line 927: " a very large number.\n" I'd add a note about missing cryptographic keys stored in the catalog table (IPKI, TSK), and that they'll need to restart all tservers and clients if it's a secure environment. -- To view, visit http://gerrit.cloudera.org:8080/9490 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If29e421d466a531ebad72e281ae27e74e458f8c6 Gerrit-Change-Number: 9490 Gerrit-PatchSet: 6 Gerrit-Owner: Will Berkeley Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Attila Bukor Gerrit-Reviewer: Bankim Bhavsar Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Tidy Bot (241) Gerrit-Reviewer: Will Berkeley Gerrit-Reviewer: Yingchun Lai Gerrit-Comment-Date: Tue, 03 Aug 2021 12:52:49 + Gerrit-HasComments: Yes
[kudu-CR] [tools] Add a tool to recover master data from tablet servers
Bankim Bhavsar has posted comments on this change. ( http://gerrit.cloudera.org:8080/9490 )

Change subject: [tools] Add a tool to recover master data from tablet servers
..

Patch Set 6: Code-Review+1

(5 comments)

LGTM. Minor comments.

http://gerrit.cloudera.org:8080/#/c/9490/5/src/kudu/tools/kudu-admin-test.cc
File src/kudu/tools/kudu-admin-test.cc:

http://gerrit.cloudera.org:8080/#/c/9490/5/src/kudu/tools/kudu-admin-test.cc@3001
PS5, Line 3001: NO_FATALS(ClusterVerifier(cluster_.get()).CheckCluster());
Can we also run the cluster verifier before deleting the recreated table?

http://gerrit.cloudera.org:8080/#/c/9490/6/src/kudu/tools/master_rebuilder.cc
File src/kudu/tools/master_rebuilder.cc:

http://gerrit.cloudera.org:8080/#/c/9490/6/src/kudu/tools/master_rebuilder.cc@115
PS6, Line 115: ProcessReplica
Nit: It'd be good to give a name that describes the action like CreateOrCheckTablet()

http://gerrit.cloudera.org:8080/#/c/9490/6/src/kudu/tools/master_rebuilder.cc@209
PS6, Line 209: .Equals
Nit:Not this change but this looks like Java instead of using operator overloading with == in C++.

http://gerrit.cloudera.org:8080/#/c/9490/6/src/kudu/tools/master_rebuilder.cc@302
PS6, Line 302: master.Shutdown();
Should we also try to clean up the created tables in case of any errors while writing to sys catalog?

http://gerrit.cloudera.org:8080/#/c/9490/5/src/kudu/tools/tool_action_master.cc
File src/kudu/tools/tool_action_master.cc:

http://gerrit.cloudera.org:8080/#/c/9490/5/src/kudu/tools/tool_action_master.cc@931
PS5, Line 931: "possibility of recovering the original masters, and you know what you\n"
I know we have a way to enforce permissions in the RPC layer. Could we restrict this CLI to admin users, if possible?

--
To view, visit http://gerrit.cloudera.org:8080/9490
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If29e421d466a531ebad72e281ae27e74e458f8c6
Gerrit-Change-Number: 9490
Gerrit-PatchSet: 6
Gerrit-Owner: Will Berkeley
Gerrit-Reviewer: Andrew Wong
Gerrit-Reviewer: Attila Bukor
Gerrit-Reviewer: Bankim Bhavsar
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Will Berkeley
Gerrit-Reviewer: Yingchun Lai
Gerrit-Comment-Date: Mon, 02 Aug 2021 23:17:12 +
Gerrit-HasComments: Yes
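Editorial sketch of the operator-overloading nit on line 209 (".Equals" versus "=="). ToyPartitionSchema is a hypothetical two-field type used only for illustration; it is not Kudu's PartitionSchema, whose members are not shown in this thread.

```cpp
#include <cassert>
#include <string>
#include <tuple>
#include <vector>

// Hypothetical stand-in type; the real class has different members.
struct ToyPartitionSchema {
  std::vector<std::string> hash_columns;
  int num_buckets;
};

// Member-wise equality. std::tie keeps the comparison in one expression and
// makes it harder to forget a field when one is added later.
inline bool operator==(const ToyPartitionSchema& a,
                       const ToyPartitionSchema& b) {
  return std::tie(a.hash_columns, a.num_buckets) ==
         std::tie(b.hash_columns, b.num_buckets);
}

// Defined in terms of operator== so the two can never disagree.
inline bool operator!=(const ToyPartitionSchema& a,
                       const ToyPartitionSchema& b) {
  return !(a == b);
}
```

A call site then reads `if (reported != expected)` instead of `if (!reported.Equals(expected))`, which is the less surprising syntax the review was after.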
[kudu-CR] [tools] Add a tool to recover master data from tablet servers
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/9490 ) Change subject: [tools] Add a tool to recover master data from tablet servers .. Patch Set 6: (1 comment) http://gerrit.cloudera.org:8080/#/c/9490/5/src/kudu/tools/master_rebuilder.cc File src/kudu/tools/master_rebuilder.cc: http://gerrit.cloudera.org:8080/#/c/9490/5/src/kudu/tools/master_rebuilder.cc@175 PS5, Line 175: > Wouldn't this brick single-replica tables? It would think it's under-replic Ah that's fair. It should actually be max(FLAGS_min_num_replicas, odd_reported_replicas). Still, let's do that in a follow-up. -- To view, visit http://gerrit.cloudera.org:8080/9490 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If29e421d466a531ebad72e281ae27e74e458f8c6 Gerrit-Change-Number: 9490 Gerrit-PatchSet: 6 Gerrit-Owner: Will Berkeley Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Attila Bukor Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Tidy Bot (241) Gerrit-Reviewer: Will Berkeley Gerrit-Reviewer: Yingchun Lai Gerrit-Comment-Date: Fri, 30 Jul 2021 17:05:35 + Gerrit-HasComments: Yes
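Editorial sketch of the replication-factor heuristic discussed on line 175, deferred to a follow-up in the thread. The function names and the floor_rf parameter are hypothetical; floor_rf stands in for whichever flag is chosen (the thread mentions both FLAGS_default_num_replicas and FLAGS_min_num_replicas), and no flag defaults are assumed here.

```cpp
#include <algorithm>
#include <cassert>

// Round the number of replicas reported by tablet servers up to the next odd
// number, so the chosen replication factor is never even.
int OddReportedReplicas(int reported_replicas) {
  assert(reported_replicas >= 0);
  return (reported_replicas % 2 == 0) ? reported_replicas + 1
                                      : reported_replicas;
}

// new_rf = max(floor_rf, odd_reported_replicas): the result is never below
// the number of replicas that actually exist, so no replica gets deleted.
int ChooseReplicationFactor(int reported_replicas, int floor_rf) {
  return std::max(floor_rf, OddReportedReplicas(reported_replicas));
}
```

With a floor of 3, a single-replica table comes out as RF 3, which is the under-replication case Attila raised. With a floor of 1 it stays at RF 1, matching the corrected formula using FLAGS_min_num_replicas.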
[kudu-CR] [tools] Add a tool to recover master data from tablet servers
Attila Bukor has posted comments on this change. ( http://gerrit.cloudera.org:8080/9490 ) Change subject: [tools] Add a tool to recover master data from tablet servers .. Patch Set 6: (1 comment) http://gerrit.cloudera.org:8080/#/c/9490/5/src/kudu/tools/master_rebuilder.cc File src/kudu/tools/master_rebuilder.cc: http://gerrit.cloudera.org:8080/#/c/9490/5/src/kudu/tools/master_rebuilder.cc@175 PS5, Line 175: > It's good, tables in the cluster may have different RFs, FLAGS_default_num_ Wouldn't this brick single-replica tables? It would think it's under-replicated with no majority, so it couldn't re-replicate to 3 automatically, or am I missing something? -- To view, visit http://gerrit.cloudera.org:8080/9490 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If29e421d466a531ebad72e281ae27e74e458f8c6 Gerrit-Change-Number: 9490 Gerrit-PatchSet: 6 Gerrit-Owner: Will Berkeley Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Attila Bukor Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Tidy Bot (241) Gerrit-Reviewer: Will Berkeley Gerrit-Reviewer: Yingchun Lai Gerrit-Comment-Date: Fri, 30 Jul 2021 13:13:59 + Gerrit-HasComments: Yes
[kudu-CR] [tools] Add a tool to recover master data from tablet servers
Yingchun Lai has posted comments on this change. ( http://gerrit.cloudera.org:8080/9490 )

Change subject: [tools] Add a tool to recover master data from tablet servers
..

Patch Set 6: Code-Review+1

(3 comments)

http://gerrit.cloudera.org:8080/#/c/9490/5/src/kudu/tools/master_rebuilder.cc
File src/kudu/tools/master_rebuilder.cc:

http://gerrit.cloudera.org:8080/#/c/9490/5/src/kudu/tools/master_rebuilder.cc@98
PS5, Line 98: for (const auto& tserver_addr : tserver_addrs_) {
> This port is only used if none is provided in 'tserver_addr'.
Ack

http://gerrit.cloudera.org:8080/#/c/9490/5/src/kudu/tools/master_rebuilder.cc@175
PS5, Line 175:
> Good point. I suppose there's also room for a heuristic like:
It's good. Tables in the cluster may have different RFs, so FLAGS_default_num_replicas may not be right for all tables. We can implement it in the future.

http://gerrit.cloudera.org:8080/#/c/9490/5/src/kudu/tools/master_rebuilder.cc@219
PS5, Line 219: RETURN_NOT_OK(PartitionSchema::FromPB(metadata.partition_schema(),
> It was removed in this patch https://gerrit.cloudera.org/c/17558/
Ack

--
To view, visit http://gerrit.cloudera.org:8080/9490
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If29e421d466a531ebad72e281ae27e74e458f8c6
Gerrit-Change-Number: 9490
Gerrit-PatchSet: 6
Gerrit-Owner: Will Berkeley
Gerrit-Reviewer: Andrew Wong
Gerrit-Reviewer: Attila Bukor
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Will Berkeley
Gerrit-Reviewer: Yingchun Lai
Gerrit-Comment-Date: Thu, 29 Jul 2021 03:44:54 +
Gerrit-HasComments: Yes
[kudu-CR] [tools] Add a tool to recover master data from tablet servers
Andrew Wong has removed Adar Lieber-Dembo from this change. ( http://gerrit.cloudera.org:8080/9490 ) Change subject: [tools] Add a tool to recover master data from tablet servers .. Removed reviewer Adar Lieber-Dembo. -- To view, visit http://gerrit.cloudera.org:8080/9490 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: deleteReviewer Gerrit-Change-Id: If29e421d466a531ebad72e281ae27e74e458f8c6 Gerrit-Change-Number: 9490 Gerrit-PatchSet: 6 Gerrit-Owner: Will Berkeley Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Attila Bukor Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Tidy Bot (241) Gerrit-Reviewer: Will Berkeley Gerrit-Reviewer: Yingchun Lai
[kudu-CR] [tools] Add a tool to recover master data from tablet servers
Andrew Wong has removed a vote on this change. Change subject: [tools] Add a tool to recover master data from tablet servers .. Removed Verified-1 by Kudu Jenkins (120) -- To view, visit http://gerrit.cloudera.org:8080/9490 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: deleteVote Gerrit-Change-Id: If29e421d466a531ebad72e281ae27e74e458f8c6 Gerrit-Change-Number: 9490 Gerrit-PatchSet: 6 Gerrit-Owner: Will Berkeley Gerrit-Reviewer: Adar Lieber-Dembo Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Attila Bukor Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Tidy Bot (241) Gerrit-Reviewer: Will Berkeley Gerrit-Reviewer: Yingchun Lai
[kudu-CR] [tools] Add a tool to recover master data from tablet servers
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/9490 ) Change subject: [tools] Add a tool to recover master data from tablet servers .. Patch Set 6: Verified+1 (6 comments) http://gerrit.cloudera.org:8080/#/c/9490/5/src/kudu/tools/master_rebuilder.cc File src/kudu/tools/master_rebuilder.cc: http://gerrit.cloudera.org:8080/#/c/9490/5/src/kudu/tools/master_rebuilder.cc@87 PS5, Line 87: } > Use CHECK_EQ? Done http://gerrit.cloudera.org:8080/#/c/9490/5/src/kudu/tools/master_rebuilder.cc@92 PS5, Line 92: } > CHECK_EQ Done http://gerrit.cloudera.org:8080/#/c/9490/5/src/kudu/tools/master_rebuilder.cc@98 PS5, Line 98: for (const auto& tserver_addr : tserver_addrs_) { > Should use a user specified port. This port is only used if none is provided in 'tserver_addr'. http://gerrit.cloudera.org:8080/#/c/9490/5/src/kudu/tools/master_rebuilder.cc@175 PS5, Line 175: > Use FLAGS_default_num_replicas instead? Good point. I suppose there's also room for a heuristic like: let odd_reported_replicas = the number of replicas reported by tablet servers, + 1 if the number is even let new_rf = max(FLAGS_default_num_replicas, odd_reported_replicas) That way, there's no possibility that the rebuilding of the masters would result in the deletion of any replicas. Even odd_reported_replicas would suffice here. I'll use FLAGS_default_num_replicas for now, but I'm curious what you think of the heuristic. http://gerrit.cloudera.org:8080/#/c/9490/5/src/kudu/tools/master_rebuilder.cc@219 PS5, Line 219: RETURN_NOT_OK(PartitionSchema::FromPB(metadata.partition_schema(), > PartitionSchema has a function Equals too, use it instead? It was removed in this patch https://gerrit.cloudera.org/c/17558/ I'll add a != operator though so this is less surprising syntax. http://gerrit.cloudera.org:8080/#/c/9490/5/src/kudu/tools/master_rebuilder.cc@275 PS5, Line 275: // We do not check the schemas and partition schemas match because they are > Use Equals? 
Ack -- To view, visit http://gerrit.cloudera.org:8080/9490 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If29e421d466a531ebad72e281ae27e74e458f8c6 Gerrit-Change-Number: 9490 Gerrit-PatchSet: 6 Gerrit-Owner: Will Berkeley Gerrit-Reviewer: Adar Lieber-Dembo Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Attila Bukor Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Tidy Bot (241) Gerrit-Reviewer: Will Berkeley Gerrit-Reviewer: Yingchun Lai Gerrit-Comment-Date: Thu, 29 Jul 2021 02:56:23 + Gerrit-HasComments: Yes
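The replication-factor heuristic floated in the Line 175 reply above can be written out as a small function. This is only a sketch of the arithmetic as described in that comment. `ChooseReplicationFactor` is a hypothetical name, and `default_num_replicas` stands in for `FLAGS_default_num_replicas`.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical helper (not from the patch). 'reported_replicas' is the number
// of replicas of a tablet reported by tablet servers; 'default_num_replicas'
// stands in for FLAGS_default_num_replicas.
int ChooseReplicationFactor(int reported_replicas, int default_num_replicas) {
  assert(reported_replicas >= 0 && default_num_replicas > 0);
  // "+ 1 if the number is even": round up to the next odd replica count.
  const int odd_reported_replicas =
      (reported_replicas % 2 == 0) ? reported_replicas + 1 : reported_replicas;
  // Never pick an RF below what the tablet servers already hold, so the
  // rebuilt master has no reason to delete replicas.
  return std::max(default_num_replicas, odd_reported_replicas);
}
```

With a default of 3, four reported replicas give an RF of 5, and one reported replica gives 3. With a default of 1, a single-replica table keeps RF 1, which is the single-tserver deployment case raised in the review.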
[kudu-CR] [tools] Add a tool to recover master data from tablet servers
Andrew Wong has uploaded a new patch set (#6) to the change originally created by Will Berkeley. ( http://gerrit.cloudera.org:8080/9490 ) Change subject: [tools] Add a tool to recover master data from tablet servers .. [tools] Add a tool to recover master data from tablet servers This adds a tool that attempts to recover master metadata from tablet servers using ListTablets, writing the metadata to a syscatalog table that a new master process can use to make the cluster operational. It has several limitations. See the help description for the new tool. Note that it should be possible to fix some limitations by giving more master metadata to tablet servers. The advantage of the tool as-is is that it should work on all versions of Kudu since 1.0. Change-Id: If29e421d466a531ebad72e281ae27e74e458f8c6 --- M src/kudu/common/partition.h M src/kudu/master/catalog_manager.h M src/kudu/tools/CMakeLists.txt M src/kudu/tools/kudu-admin-test.cc A src/kudu/tools/master_rebuilder.cc A src/kudu/tools/master_rebuilder.h M src/kudu/tools/tool_action_common.cc M src/kudu/tools/tool_action_common.h M src/kudu/tools/tool_action_master.cc M src/kudu/tools/tool_action_remote_replica.cc 10 files changed, 668 insertions(+), 2 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/90/9490/6 -- To view, visit http://gerrit.cloudera.org:8080/9490 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: If29e421d466a531ebad72e281ae27e74e458f8c6 Gerrit-Change-Number: 9490 Gerrit-PatchSet: 6 Gerrit-Owner: Will Berkeley Gerrit-Reviewer: Adar Lieber-Dembo Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Attila Bukor Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Tidy Bot (241) Gerrit-Reviewer: Will Berkeley Gerrit-Reviewer: Yingchun Lai
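As a rough illustration of the approach in the commit message above (collect per-replica reports from tablet servers, then assemble table and tablet metadata from them), the grouping step might look like the sketch below. `ReplicaReport`, `RebuiltCatalog`, and `GroupReports` are all hypothetical and heavily simplified. A real ListTablets response carries much more, such as schemas and partitions, and the actual tool writes its result to the syscatalog rather than to an in-memory map.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical, simplified view of one replica as reported by a tablet server.
struct ReplicaReport {
  std::string tserver_addr;
  std::string table_name;
  std::string tablet_id;
};

// table name -> tablet id -> addresses of tablet servers reporting a replica.
using RebuiltCatalog =
    std::map<std::string, std::map<std::string, std::set<std::string>>>;

// Group the reports gathered from all tablet servers by table, then by tablet.
RebuiltCatalog GroupReports(const std::vector<ReplicaReport>& reports) {
  RebuiltCatalog catalog;
  for (const auto& r : reports) {
    assert(!r.table_name.empty() && !r.tablet_id.empty());
    catalog[r.table_name][r.tablet_id].insert(r.tserver_addr);
  }
  return catalog;
}
```

In this sketch, the number of addresses recorded per tablet is the "replicas reported by tablet servers" count that the replication-factor discussion in this thread refers to.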
[kudu-CR] [tools] Add a tool to recover master data from tablet servers
Yingchun Lai has posted comments on this change. ( http://gerrit.cloudera.org:8080/9490 ) Change subject: [tools] Add a tool to recover master data from tablet servers .. Patch Set 5: This tool must be run on a Kudu master; it would be better to add a check for that. -- To view, visit http://gerrit.cloudera.org:8080/9490 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If29e421d466a531ebad72e281ae27e74e458f8c6 Gerrit-Change-Number: 9490 Gerrit-PatchSet: 5 Gerrit-Owner: Will Berkeley Gerrit-Reviewer: Adar Lieber-Dembo Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Attila Bukor Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Tidy Bot (241) Gerrit-Reviewer: Will Berkeley Gerrit-Reviewer: Yingchun Lai <405403...@qq.com> Gerrit-Comment-Date: Sat, 17 Jul 2021 07:30:00 + Gerrit-HasComments: No
[kudu-CR] [tools] Add a tool to recover master data from tablet servers
Yingchun Lai has posted comments on this change. ( http://gerrit.cloudera.org:8080/9490 ) Change subject: [tools] Add a tool to recover master data from tablet servers .. Patch Set 5: > Patch Set 5: > > Just did a quick rebase here since I need the tool for a case and seems like > some older thirdparty dependencies aren't downloading properly. I'll probably > try to merge this in the near future. I need this tool too; I hope it can be merged quickly. -- To view, visit http://gerrit.cloudera.org:8080/9490 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If29e421d466a531ebad72e281ae27e74e458f8c6 Gerrit-Change-Number: 9490 Gerrit-PatchSet: 5 Gerrit-Owner: Will Berkeley Gerrit-Reviewer: Adar Lieber-Dembo Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Attila Bukor Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Tidy Bot (241) Gerrit-Reviewer: Will Berkeley Gerrit-Reviewer: Yingchun Lai <405403...@qq.com> Gerrit-Comment-Date: Sat, 17 Jul 2021 07:24:37 + Gerrit-HasComments: No
[kudu-CR] [tools] Add a tool to recover master data from tablet servers
Yingchun Lai has posted comments on this change. ( http://gerrit.cloudera.org:8080/9490 ) Change subject: [tools] Add a tool to recover master data from tablet servers .. Patch Set 5: (6 comments) http://gerrit.cloudera.org:8080/#/c/9490/5/src/kudu/tools/master_rebuilder.cc File src/kudu/tools/master_rebuilder.cc: http://gerrit.cloudera.org:8080/#/c/9490/5/src/kudu/tools/master_rebuilder.cc@87 PS5, Line 87: CHECK(state_ == State::DONE); Use CHECK_EQ? http://gerrit.cloudera.org:8080/#/c/9490/5/src/kudu/tools/master_rebuilder.cc@92 PS5, Line 92: CHECK(state_ == State::NOT_DONE); CHECK_EQ http://gerrit.cloudera.org:8080/#/c/9490/5/src/kudu/tools/master_rebuilder.cc@98 PS5, Line 98: Status s = BuildProxy(tserver_addr, tserver::TabletServer::kDefaultPort, ).AndThen([&]() { Should use a user-specified port. http://gerrit.cloudera.org:8080/#/c/9490/5/src/kudu/tools/master_rebuilder.cc@175 PS5, Line 175: table_metadata->set_num_replicas(3); Use FLAGS_default_num_replicas instead? We have some deployments that have only 1 tserver and a num_replicas of 1; a rebuilt table can't be healthy if RF is 3. http://gerrit.cloudera.org:8080/#/c/9490/5/src/kudu/tools/master_rebuilder.cc@219 PS5, Line 219: if (!(pschema_from_table == pschema_from_replica)) { PartitionSchema has a function Equals too, use it instead? http://gerrit.cloudera.org:8080/#/c/9490/5/src/kudu/tools/master_rebuilder.cc@275 PS5, Line 275: if (!(partition_from_tablet == partition_from_replica)) { Use Equals? 
-- To view, visit http://gerrit.cloudera.org:8080/9490 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If29e421d466a531ebad72e281ae27e74e458f8c6 Gerrit-Change-Number: 9490 Gerrit-PatchSet: 5 Gerrit-Owner: Will Berkeley Gerrit-Reviewer: Adar Lieber-Dembo Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Attila Bukor Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Tidy Bot (241) Gerrit-Reviewer: Will Berkeley Gerrit-Reviewer: Yingchun Lai <405403...@qq.com> Gerrit-Comment-Date: Sat, 17 Jul 2021 07:18:43 + Gerrit-HasComments: Yes
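The `!(a == b)` comparisons flagged at Lines 219 and 275 above read more directly when the type also provides `operator!=`. Below is a minimal sketch of that shape. `FakePartitionSchema` is a hypothetical stand-in invented for illustration and does not reflect the fields or API of Kudu's `PartitionSchema`.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in type (not Kudu's PartitionSchema).
struct FakePartitionSchema {
  std::vector<std::string> hash_columns;
  int num_buckets = 0;

  bool operator==(const FakePartitionSchema& rhs) const {
    return hash_columns == rhs.hash_columns && num_buckets == rhs.num_buckets;
  }
  // Defined in terms of operator== so the two can never disagree.
  bool operator!=(const FakePartitionSchema& rhs) const {
    return !(*this == rhs);
  }
};
```

A call site can then say `if (from_table != from_replica) { ... }` instead of negating an equality expression.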
[kudu-CR] [tools] Add a tool to recover master data from tablet servers
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/9490 ) Change subject: [tools] Add a tool to recover master data from tablet servers .. Patch Set 5: Just did a quick rebase here since I need the tool for a case and seems like some older thirdparty dependencies aren't downloading properly. I'll probably try to merge this in the near future. -- To view, visit http://gerrit.cloudera.org:8080/9490 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If29e421d466a531ebad72e281ae27e74e458f8c6 Gerrit-Change-Number: 9490 Gerrit-PatchSet: 5 Gerrit-Owner: Will Berkeley Gerrit-Reviewer: Adar Lieber-Dembo Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Attila Bukor Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Tidy Bot (241) Gerrit-Reviewer: Will Berkeley Gerrit-Comment-Date: Wed, 14 Jul 2021 06:21:13 + Gerrit-HasComments: No
[kudu-CR] [tools] Add a tool to recover master data from tablet servers
Andrew Wong has uploaded a new patch set (#5) to the change originally created by Will Berkeley. ( http://gerrit.cloudera.org:8080/9490 ) Change subject: [tools] Add a tool to recover master data from tablet servers .. [tools] Add a tool to recover master data from tablet servers This adds a tool that attempts to recover master metadata from tablet servers using ListTablets, writing the metadata to a syscatalog table that a new master process can use to make the cluster operational. It has several limitations. See the help description for the new tool. Note that it should be possible to fix some limitations by giving more master metadata to tablet servers. The advantage of the tool as-is is that it should work on all versions of Kudu since 1.0. Change-Id: If29e421d466a531ebad72e281ae27e74e458f8c6 --- M src/kudu/master/catalog_manager.h M src/kudu/tools/CMakeLists.txt M src/kudu/tools/kudu-admin-test.cc A src/kudu/tools/master_rebuilder.cc A src/kudu/tools/master_rebuilder.h M src/kudu/tools/tool_action_common.cc M src/kudu/tools/tool_action_common.h M src/kudu/tools/tool_action_master.cc M src/kudu/tools/tool_action_remote_replica.cc 9 files changed, 651 insertions(+), 2 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/90/9490/5 -- To view, visit http://gerrit.cloudera.org:8080/9490 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: If29e421d466a531ebad72e281ae27e74e458f8c6 Gerrit-Change-Number: 9490 Gerrit-PatchSet: 5 Gerrit-Owner: Will Berkeley Gerrit-Reviewer: Adar Lieber-Dembo Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Attila Bukor Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Tidy Bot (241) Gerrit-Reviewer: Will Berkeley
[kudu-CR] [tools] Add a tool to recover master data from tablet servers
Will Berkeley has posted comments on this change. ( http://gerrit.cloudera.org:8080/9490 ) Change subject: [tools] Add a tool to recover master data from tablet servers .. Patch Set 4: Oh also this needs to be tested against a Kerberized cluster. I think it would work fine (with a full cluster restart, which is required for the default level of security anyway) but I'm not sure. -- To view, visit http://gerrit.cloudera.org:8080/9490 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If29e421d466a531ebad72e281ae27e74e458f8c6 Gerrit-Change-Number: 9490 Gerrit-PatchSet: 4 Gerrit-Owner: Will Berkeley Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Attila Bukor Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Tidy Bot (241) Gerrit-Reviewer: Will Berkeley Gerrit-Comment-Date: Tue, 27 Nov 2018 23:26:40 + Gerrit-HasComments: No
[kudu-CR] [tools] Add a tool to recover master data from tablet servers
Will Berkeley has posted comments on this change. ( http://gerrit.cloudera.org:8080/9490 ) Change subject: [tools] Add a tool to recover master data from tablet servers .. Patch Set 4: I'm looking at this again because I actually ran into a case where this was useful. Let me jot down my thoughts right now about Adar's original objections to how this tool works: 1) By calling Master::Init(), do we bind to some ports? Will we start responding to incoming RPCs? A CLI tool shouldn't do either. Yes, we do. This should be fixed. 2) By calling SysCatalogTable::CreateNew, you end up baking particulars about _this_ process into the on-disk data. For example (maybe the only example), a cmeta file will be generated with the first RPC address of _this process_ (well, of Master::Init I guess) in it. That seems like a bad idea since the CLI tool and the actual master are likely to run differently (i.e. different UNIX users, maybe different machines too if the only goal here is to generate some on-disk data). This is fine. The idea of the tool is that it makes the data, and it definitely will require additional recovery work if e.g. recovering from a failure of all masters of a multi-master setup. The tool should be run as the kudu user otherwise the files produced will have to be modified for a regular master process to use them anyway. It shouldn't be run remotely because it uses the wal and data dir structure that the new master will have. It should in-place reassemble a starting point for a new master. Experience has shown it does work fine this way in a real cluster. So here are some avenues to explore: 1) Continuing the thread of "reconstruction via generating physical on-disk data directly", could we instantiate a Tablet, load the master's schema, perform tablet writes directly to the Tablet, then Flush() at the end? Then there's no TabletReplica, no cmeta, no WAL, etc. TabletHarness is a test-only class that you may be able to reuse. 
The big question is whether a master could load such a tablet afterwards. For one, we have to create at least a WAL for the tablet replica to start later. I think this is a lot of work for no gain compared to modifying the current approach not to bind to ports or potentially process RPCs. 2) Or, if we go in the other direction, could we start an empty master and "import" reconstructed metadata into it via RPC? This is what this tool is doing, basically. The master process just lives inside the tool. The benefit is we don't need to introduce new RPC endpoints to the master to shovel the data tservers -> tool -> master. -- To view, visit http://gerrit.cloudera.org:8080/9490 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If29e421d466a531ebad72e281ae27e74e458f8c6 Gerrit-Change-Number: 9490 Gerrit-PatchSet: 4 Gerrit-Owner: Will Berkeley Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Attila Bukor Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Tidy Bot (241) Gerrit-Reviewer: Will Berkeley Gerrit-Comment-Date: Tue, 27 Nov 2018 23:25:30 + Gerrit-HasComments: No
[kudu-CR] [tools] Add a tool to recover master data from tablet servers
Will Berkeley has posted comments on this change. ( http://gerrit.cloudera.org:8080/9490 ) Change subject: [tools] Add a tool to recover master data from tablet servers .. Patch Set 4: Quick update on this: I'm planning on writing up a quick comparison between different methods so we can discuss them and figure out the best way forward. -- To view, visit http://gerrit.cloudera.org:8080/9490 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If29e421d466a531ebad72e281ae27e74e458f8c6 Gerrit-Change-Number: 9490 Gerrit-PatchSet: 4 Gerrit-Owner: Will Berkeley Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Will Berkeley Gerrit-Comment-Date: Thu, 29 Mar 2018 21:38:29 + Gerrit-HasComments: No
[kudu-CR] [tools] Add a tool to recover master data from tablet servers
Will Berkeley has posted comments on this change. ( http://gerrit.cloudera.org:8080/9490 ) Change subject: [tools] Add a tool to recover master data from tablet servers .. Patch Set 4: (1 comment) http://gerrit.cloudera.org:8080/#/c/9490/4/src/kudu/tools/master_rebuilder.cc File src/kudu/tools/master_rebuilder.cc: http://gerrit.cloudera.org:8080/#/c/9490/4/src/kudu/tools/master_rebuilder.cc@289 PS4, Line 289: // Start up the master and syscatalog. > I think it's worth exploring both, but don't bother reimplementing anything Yes, the metadata import/export is how I'm thinking about it as well. For the master tablet, I think a protobuf container would work well for the export format if we were operating at a DDL layer. Another thing to throw in the discussion is how we import/export authentication and authorization information. -- To view, visit http://gerrit.cloudera.org:8080/9490 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If29e421d466a531ebad72e281ae27e74e458f8c6 Gerrit-Change-Number: 9490 Gerrit-PatchSet: 4 Gerrit-Owner: Will Berkeley Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Will Berkeley Gerrit-Comment-Date: Fri, 23 Mar 2018 19:10:37 + Gerrit-HasComments: Yes
[kudu-CR] [tools] Add a tool to recover master data from tablet servers
Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/9490 ) Change subject: [tools] Add a tool to recover master data from tablet servers .. Patch Set 4: (1 comment) > Patch Set 4: > > (1 comment) http://gerrit.cloudera.org:8080/#/c/9490/4/src/kudu/tools/master_rebuilder.cc File src/kudu/tools/master_rebuilder.cc: http://gerrit.cloudera.org:8080/#/c/9490/4/src/kudu/tools/master_rebuilder.cc@289 PS4, Line 289: // Start up the master and syscatalog. > > 1) By calling Master::Init(), do we bind to some ports? I think it's worth exploring both, but don't bother reimplementing anything yet. I'm also curious to hear what approaches other people might have. One other thing I like about #2 is that it could dovetail nicely with backup/restore metadata functionality. You can imagine master endpoints that import/export the raw metadata into a file format of some kind, and then the rebuilder is really just about "use info from the tservers to build a master metadata dump". But, I don't know whether the master's import/export should operate at the logical (DDL) level, or whether it's just a generic tablet import/export but for the singleton master tablet. -- To view, visit http://gerrit.cloudera.org:8080/9490 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If29e421d466a531ebad72e281ae27e74e458f8c6 Gerrit-Change-Number: 9490 Gerrit-PatchSet: 4 Gerrit-Owner: Will Berkeley Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Will Berkeley Gerrit-Comment-Date: Fri, 23 Mar 2018 18:24:29 + Gerrit-HasComments: Yes
[kudu-CR] [tools] Add a tool to recover master data from tablet servers
Will Berkeley has posted comments on this change. ( http://gerrit.cloudera.org:8080/9490 ) Change subject: [tools] Add a tool to recover master data from tablet servers .. Patch Set 4: (1 comment) http://gerrit.cloudera.org:8080/#/c/9490/4/src/kudu/tools/master_rebuilder.cc File src/kudu/tools/master_rebuilder.cc: http://gerrit.cloudera.org:8080/#/c/9490/4/src/kudu/tools/master_rebuilder.cc@289 PS4, Line 289: // Start up the master and syscatalog. > 1) By calling Master::Init(), do we bind to some ports? Yes. Master::Init() calls KuduServer::Init() which calls ServerBase::Init(), which builds the messenger and binds to ports. We also start the webserver though that can be disabled with -webserver_enabled=no. > Will we start responding to incoming RPCs? I don't think so. Doesn't look like it from test logs. It might respond to some base RPCs like Ping? > A CLI tool shouldn't do either. Agreed. > 2) By calling SysCatalogTable::CreateNew, you end up baking > particulars about _this_ process into the on-disk data. I imagined that as par for the course. In my head, this was used to recreate master data in situ so then a real master process could be started on top of it. Part of running the tool correctly would be running it as the correct user, as it is for some other tools. To get back a distributed master, one would migrate from the reconstructed single master to multimaster. Alternatively, one could use a remote reconstructed master to bootstrap "real" masters by migrating to a distributed master and then dropping the reconstructed master. > 1) Continuing the thread of "reconstruction via generating physical on-disk > data directly", could we instantiate a Tablet, load the master's schema, > perform tablet writes directly to the Tablet, then Flush() at the end? 
I'm going to evaluate how feasible this is more carefully, but my first impression is that while this would be the ideal way to do things, doing it with a half-cocked pop-up process gets us what we need with relatively little work, while all the extra work earns a cleaner implementation with no more capability. Also, don't we require WAL segments whenever we find data? I think if a Kudu process found just a tablet and nothing else it would not function right, and it'd take some rejiggering to make it work, or fool it into working. > 2) Or, if we go in the other direction, could we start an empty > master and "import" reconstructed metadata into it via RPC? This could work if we added a couple of RPCs, like "AdoptTable" and "AdoptTablet" that accepted table and tablet metadata and wrote it to the syscatalog. The basic operation of the tool would be the same except it would collect the data into the PBs and then send it via RPC to be written, rather than controlling a pop-up master and writing to its syscatalog. Overall I think 1 is more work than it is worth compared to a quicker and dirtier solution, but 2 might be nice. Would it satisfy you if I tried an implementation in the style of 2? -- To view, visit http://gerrit.cloudera.org:8080/9490 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If29e421d466a531ebad72e281ae27e74e458f8c6 Gerrit-Change-Number: 9490 Gerrit-PatchSet: 4 Gerrit-Owner: Will Berkeley Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Will Berkeley Gerrit-Comment-Date: Fri, 23 Mar 2018 05:53:31 + Gerrit-HasComments: Yes
[kudu-CR] [tools] Add a tool to recover master data from tablet servers
Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/9490 ) Change subject: [tools] Add a tool to recover master data from tablet servers .. Patch Set 4: (1 comment) http://gerrit.cloudera.org:8080/#/c/9490/4/src/kudu/tools/master_rebuilder.cc File src/kudu/tools/master_rebuilder.cc: http://gerrit.cloudera.org:8080/#/c/9490/4/src/kudu/tools/master_rebuilder.cc@289 PS4, Line 289: // Start up the master and syscatalog. When thinking about the layers involved, I agree that recreating the master metadata via SysCatalogTable::Write seems preferable. Any lower and you'd need to start your own TabletReplica and manage the master schema yourself. Any higher and you end up starting a full master process. But, that's actually what you're doing here, and it's concerning for a few reasons: 1) By calling Master::Init(), do we bind to some ports? Will we start responding to incoming RPCs? A CLI tool shouldn't do either. 2) By calling SysCatalogTable::CreateNew, you end up baking particulars about _this_ process into the on-disk data. For example (maybe the only example), a cmeta file will be generated with the first RPC address of _this process_ (well, of Master::Init I guess) in it. That seems like a bad idea since the CLI tool and the actual master are likely to run differently (i.e. different UNIX users, maybe different machines too if the only goal here is to generate some on-disk data). So here are some avenues to explore: 1) Continuing the thread of "reconstruction via generating physical on-disk data directly", could we instantiate a Tablet, load the master's schema, perform tablet writes directly to the Tablet, then Flush() at the end? Then there's no TabletReplica, no cmeta, no WAL, etc. TabletHarness is a test-only class that you may be able to reuse. The big question is whether a master could load such a tablet afterwards. 
2) Or, if we go in the other direction, could we start an empty master and "import" reconstructed metadata into it via RPC? -- To view, visit http://gerrit.cloudera.org:8080/9490 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If29e421d466a531ebad72e281ae27e74e458f8c6 Gerrit-Change-Number: 9490 Gerrit-PatchSet: 4 Gerrit-Owner: Will Berkeley Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Comment-Date: Thu, 22 Mar 2018 23:08:56 + Gerrit-HasComments: Yes
[kudu-CR] [tools] Add a tool to recover master data from tablet servers
Hello Tidy Bot, Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/9490 to look at the new patch set (#4). Change subject: [tools] Add a tool to recover master data from tablet servers .. [tools] Add a tool to recover master data from tablet servers This adds a tool that attempts to recover master metadata from tablet servers using ListTablets, writing the metadata to a syscatalog table that a new master process can use to make the cluster operational. It has several limitations. See the comment on MasterRebuilder. Note that it should be possible to fix some limitations by giving more master metadata to tablet servers. The advantage of the tool as-is is that it should work on all versions of Kudu since 1.0. Change-Id: If29e421d466a531ebad72e281ae27e74e458f8c6 --- M src/kudu/master/catalog_manager.h M src/kudu/tools/CMakeLists.txt M src/kudu/tools/kudu-admin-test.cc A src/kudu/tools/master_rebuilder.cc A src/kudu/tools/master_rebuilder.h M src/kudu/tools/tool_action_common.cc M src/kudu/tools/tool_action_common.h M src/kudu/tools/tool_action_master.cc M src/kudu/tools/tool_action_remote_replica.cc 9 files changed, 653 insertions(+), 20 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/90/9490/4 -- To view, visit http://gerrit.cloudera.org:8080/9490 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: If29e421d466a531ebad72e281ae27e74e458f8c6 Gerrit-Change-Number: 9490 Gerrit-PatchSet: 4 Gerrit-Owner: Will Berkeley Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot