Yan-Daojiang has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/23844 )

Change subject: [tools] Add --preserve_table_ids flag to unsafe_rebuild
......................................................................


Patch Set 6:

(3 comments)

> Patch Set 5:
>
> (3 comments)
>
> Overall looks ok to me.
> Just a couple of comments.

Thanks a lot for your time and for the insightful feedback.
Looking forward to your thoughts when you get a chance to review this again.

http://gerrit.cloudera.org:8080/#/c/23844/5/src/kudu/tools/kudu-admin-test.cc
File src/kudu/tools/kudu-admin-test.cc:

http://gerrit.cloudera.org:8080/#/c/23844/5/src/kudu/tools/kudu-admin-test.cc@3868
PS5, Line 3868: NO_FATALS(BuildAndStart
> Here and elsewhere: Any reason replication factor is 1? Can't it be default
Done. Changed replication factor from 1 to 3 (default value) in all four tests.


http://gerrit.cloudera.org:8080/#/c/23844/5/src/kudu/tools/kudu-admin-test.cc@4140
PS5, Line 4140:       << "Scan should fail due to table_id mismatch, but it 
succeeded";
              :   ASSERT_STR_CONTAINS(scan_status.ToString(), "Not authorized")
> Is Timeout status for eventual failure of scan and remote error for interme
You're absolutely right. The test result confirms your first scenario:

- **Intermediate retries**: Each scan RPC returns `Remote error: Not authorized`
- **Final status**: `Timed out` (after 20332 retry attempts!)

I agree that `NotAuthorized` should be a non-retriable error. The current 
behavior where the scanner keeps retrying an authorization failure seems 
suboptimal.
This could be a potential improvement for the scanner's error handling 
logic?but it's not strongly relevant to the functionality of this patch.

However, regardless of that, this does confirm that there is indeed such an 
issue — the original rebuild behavior has scan-related problems when 
tserver_enforce_access_control=true.

Here is the log captured during the test:
"""
W20260119 16:45:53.305439 125286 scanner-internal.cc:459] Time spent opening 
tablet: real 15.001s       user 2.321s     sys 0.138s
I20260119 16:45:53.305508 125286 kudu-admin-test.cc:4147] Scan status: Timed 
out: exceeded configured scan timeout of 15.000s: after 20332 scan attempts: 
Scan RPC to 127.122.89.129:33847 timed out after -0.000s (SENT): Remote error: 
Not authorized: authorization token is for the wrong table ID
I20260119 16:45:53.305528 125286 kudu-admin-test.cc:4153] As expected, scan 
failed after rebuild without --preserve_table_ids: Timed out: exceeded 
configured scan timeout of 15.000s: after 20332 scan attempts: Scan RPC to 
127.122.89.129:33847 timed out after -0.000s (SENT): Remote error: Not 
authorized: authorization token is for the wrong table ID
"""


http://gerrit.cloudera.org:8080/#/c/23844/5/src/kudu/tools/master_rebuilder.cc
File src/kudu/tools/master_rebuilder.cc:

http://gerrit.cloudera.org:8080/#/c/23844/5/src/kudu/tools/master_rebuilder.cc@64
PS5, Line 64: tem validates table IDs. It is also useful for maintaining 
compatibility "
            :             "with external systems (e.g., Impala, HMS) that 
reference tables
> IIRC, if master rebuild happens and a new table ID is generated for a table
Thanks for the thorough review! Here are my responses:

1. **Impala impact**: I performed manual testing and discovered a finding:

   When `--tserver_enforce_access_control` is enabled, `INVALIDATE METADATA`
   CANNOT fix the access issue after a master rebuild without 
`--preserve_table_ids`!

   The reason is that the authz token signature is based on the table ID. After 
rebuild:
   - Master has a new table ID
   - TServer tablets still have the old table ID in their metadata
   - TServer rejects the token because it expects the old ID

   This means without `--preserve_table_ids`, even impala fetch the latest ID 
from kudu cluster, the scanning problem still persists.

2. **Flag description**: Done. Updated the description to clarify this flag is 
specifically
   for the `kudu master unsafe_rebuild` command.

3. **Other operations that change table IDs**: Confirmed. I verified that table 
IDs are
   only generated in two places:
   - `CatalogManager::CreateTableInfo()` - for normal table creation 
(unaffected)
   - `MasterRebuilder::CreateTable()` - for master rebuild (affected by flag)

   The `--preserve_table_ids` flag is defined and only used in 
master_rebuilder.cc, so it
   has no impact on normal table creation or any other operations.

4. **Manual Impala test**: Completed (see point 1). Will also test the positive 
case
   (with `--preserve_table_ids`) and document in a Google Doc [1].


[1] 
https://docs.google.com/document/d/143errNNqzOTbFRQ_c84olbzbLg9wobIRWvmf5hPo5Lw/edit?usp=sharing



--
To view, visit http://gerrit.cloudera.org:8080/23844
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I6ae4353564922312d646f0323271d804e32e3b0d
Gerrit-Change-Number: 23844
Gerrit-PatchSet: 6
Gerrit-Owner: Yan-Daojiang <[email protected]>
Gerrit-Reviewer: Alexey Serbin <[email protected]>
Gerrit-Reviewer: Ashwani Raina <[email protected]>
Gerrit-Reviewer: Gabriella Lotz <[email protected]>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Yan-Daojiang <[email protected]>
Gerrit-Comment-Date: Fri, 23 Jan 2026 09:22:38 +0000
Gerrit-HasComments: Yes

Reply via email to