Yan-Daojiang has posted comments on this change. ( http://gerrit.cloudera.org:8080/23844 )
Change subject: [tools] Add --preserve_table_ids flag to unsafe_rebuild ...................................................................... Patch Set 6: (3 comments) > Patch Set 5: > > (3 comments) > > Overall looks ok to me. > Just a couple of comments. Thanks a lot for your time and for the insightful feedback. Looking forward to your thoughts when you get a chance to review this again. http://gerrit.cloudera.org:8080/#/c/23844/5/src/kudu/tools/kudu-admin-test.cc File src/kudu/tools/kudu-admin-test.cc: http://gerrit.cloudera.org:8080/#/c/23844/5/src/kudu/tools/kudu-admin-test.cc@3868 PS5, Line 3868: NO_FATALS(BuildAndStart > Here and elsewhere: Any reason replication factor is 1? Can't it be default Done. Changed replication factor from 1 to 3 (default value) in all four tests. http://gerrit.cloudera.org:8080/#/c/23844/5/src/kudu/tools/kudu-admin-test.cc@4140 PS5, Line 4140: << "Scan should fail due to table_id mismatch, but it succeeded"; : ASSERT_STR_CONTAINS(scan_status.ToString(), "Not authorized") > Is Timeout status for eventual failure of scan and remote error for interme You're absolutely right. The test result confirms your first scenario: - **Intermediate retries**: Each scan RPC returns `Remote error: Not authorized` - **Final status**: `Timed out` (after 20332 retry attempts!) I agree that `NotAuthorized` should be a non-retriable error. The current behavior where the scanner keeps retrying an authorization failure seems suboptimal. This could be a potential improvement for the scanner's error handling logic?but it's not strongly relevant to the functionality of this patch. However, regardless of that, this does confirm that there is indeed such an issue — the original rebuild behavior has scan-related problems when tserver_enforce_access_control=true. Here is the log captured during the test: """ W20260119 16:45:53.305439 125286 scanner-internal.cc:459] Time spent opening tablet: real 15.001s user 2.321s sys 0.138s I20260119 16:45:53.305508 125286 kudu-admin-test.cc:4147] Scan status: Timed out: exceeded configured scan timeout of 15.000s: after 20332 scan attempts: Scan RPC to 127.122.89.129:33847 timed out after -0.000s (SENT): Remote error: Not authorized: authorization token is for the wrong table ID I20260119 16:45:53.305528 125286 kudu-admin-test.cc:4153] As expected, scan failed after rebuild without --preserve_table_ids: Timed out: exceeded configured scan timeout of 15.000s: after 20332 scan attempts: Scan RPC to 127.122.89.129:33847 timed out after -0.000s (SENT): Remote error: Not authorized: authorization token is for the wrong table ID """ http://gerrit.cloudera.org:8080/#/c/23844/5/src/kudu/tools/master_rebuilder.cc File src/kudu/tools/master_rebuilder.cc: http://gerrit.cloudera.org:8080/#/c/23844/5/src/kudu/tools/master_rebuilder.cc@64 PS5, Line 64: tem validates table IDs. It is also useful for maintaining compatibility " : "with external systems (e.g., Impala, HMS) that reference tables > IIRC, if master rebuild happens and a new table ID is generated for a table Thanks for the thorough review! Here are my responses: 1. **Impala impact**: I performed manual testing and discovered a finding: When `--tserver_enforce_access_control` is enabled, `INVALIDATE METADATA` CANNOT fix the access issue after a master rebuild without `--preserve_table_ids`! The reason is that the authz token signature is based on the table ID. After rebuild: - Master has a new table ID - TServer tablets still have the old table ID in their metadata - TServer rejects the token because it expects the old ID This means without `--preserve_table_ids`, even impala fetch the latest ID from kudu cluster, the scanning problem still persists. 2. **Flag description**: Done. Updated the description to clarify this flag is specifically for the `kudu master unsafe_rebuild` command. 3. **Other operations that change table IDs**: Confirmed. I verified that table IDs are only generated in two places: - `CatalogManager::CreateTableInfo()` - for normal table creation (unaffected) - `MasterRebuilder::CreateTable()` - for master rebuild (affected by flag) The `--preserve_table_ids` flag is defined and only used in master_rebuilder.cc, so it has no impact on normal table creation or any other operations. 4. **Manual Impala test**: Completed (see point 1). Will also test the positive case (with `--preserve_table_ids`) and document in a Google Doc [1]. [1] https://docs.google.com/document/d/143errNNqzOTbFRQ_c84olbzbLg9wobIRWvmf5hPo5Lw/edit?usp=sharing -- To view, visit http://gerrit.cloudera.org:8080/23844 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I6ae4353564922312d646f0323271d804e32e3b0d Gerrit-Change-Number: 23844 Gerrit-PatchSet: 6 Gerrit-Owner: Yan-Daojiang <[email protected]> Gerrit-Reviewer: Alexey Serbin <[email protected]> Gerrit-Reviewer: Ashwani Raina <[email protected]> Gerrit-Reviewer: Gabriella Lotz <[email protected]> Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Yan-Daojiang <[email protected]> Gerrit-Comment-Date: Fri, 23 Jan 2026 09:22:38 +0000 Gerrit-HasComments: Yes
