CRZbulabula opened a new pull request, #17908:
URL: https://github.com/apache/iotdb/pull/17908

   ## Problem
   
   When the ConfigNode leader is **gracefully** stopped — or otherwise loses 
leadership — while `AddRegionPeerProcedure` is in `DO_ADD_REGION_PEER` waiting 
for the coordinator DataNode's AddRegionPeer task to finish, `MIGRATE REGION` 
could be reported as **successful while the destination replica had not 
actually received the snapshot**, and the source replica was already removed. 
Queries against the region returned incorrect results during the gap.
   
   ### Reproduction (field case)
   
   `MIGRATE REGION 1 FROM 3 TO 4`, then stop the current ConfigNode leader 
right after the source replica started transmitting the snapshot (an 8.85 GB 
region over IoTConsensus). Observed:
   
   - ConfigNode printed `[MigrateRegion] success` and `show migrations` was 
already empty;
   - `show regions` showed the destination replica as Running/Leader;
   - but the destination DataNode only actually loaded the snapshot and became 
active **~17.5 minutes later**.
   
   During that window the source replica was already deleted while the 
destination was still `Adding`, so query results were wrong.
   
   ### Root cause
   
   `RegionMaintainHandler.waitTaskFinish()` returns `PROCESSING` **only** when 
its polling loop is interrupted by an `InterruptedException` (ConfigNode 
shutdown / leadership loss). A user `CANCEL` or a coordinator disconnection 
both go through the `FAIL` branch instead, so `PROCESSING` unambiguously means 
"interrupted by shutdown/leader switch".
   
   The old `DO_ADD_REGION_PEER` code treated `PROCESSING` as a terminal no-op 
(`return Flow.NO_MORE_STATE`), silently ending `AddRegionPeerProcedure` with 
neither success nor rollback. The parent `RegionMigrateProcedure` had already 
persisted at `CHECK_ADD_REGION_PEER`, so the new leader resumed there directly. 
That check uses `isDataNodeContainsRegion()`, which only inspects the partition 
table's **location list** — and the location is written back at 
`CREATE_NEW_REGION_PEER`, long before the snapshot finishes transferring. So 
the check passed, the source replica was removed, and the migration was 
declared complete prematurely.
   
   ## Fix
   
   In the `DO_ADD_REGION_PEER` `PROCESSING` branch, **stay at 
`DO_ADD_REGION_PEER` and persist it** (`HAS_MORE_STATE`) instead of ending. 
After recovery, the new leader re-enters `DO_ADD_REGION_PEER` and re-polls the 
coordinator task until it truly reaches `SUCCESS` or `FAIL`, so the parent only 
advances to `REMOVE_REGION_PEER` once the destination replica is genuinely 
added.
   
   The re-poll is **idempotent**, so the AddRegionPeer task is never submitted 
twice:
   
   - After a restart / leader switch, the `isStateDeserialized()` guard skips 
re-submitting the task and only re-polls;
   - Even in a same-process immediate re-execute, the coordinator DataNode 
dedups by `taskId` (`taskResultMap.putIfAbsent`), so a duplicate submit is a 
no-op;
   - If the coordinator crashed and lost its task table, the poll returns 
`TASK_NOT_EXIST`, which falls through to the existing `FAIL` / rollback path 
(safe degradation).
   
   The `waitTaskFinish() returns PROCESSING ...` log message (en + zh) is 
updated to reflect the new behavior.
   
   ## Tests
   
   Adds `cnLeaderSwitchDuringDoAddPeerTest` for each consensus protocol — 
IoTConsensus, IoTConsensusV2 (batch & stream), and Ratis.
   
   The existing daily ConfigNode-crash ITs all use `stopForcibly()` (SIGKILL), 
which kills the process before it can run the `PROCESSING` branch, so this path 
was never covered. The new test introduces a graceful `stop()` (SIGTERM) action 
and stops the **leader** among 3 ConfigNodes, so the interrupted `PROCESSING` 
path is actually exercised across a real leader switch, and asserts the 
migration still completes correctly (destination Running, source removed).
   
   > Note: the new tests are temporarily tagged `@Category({DailyIT.class, 
ClusterIT.class})` so they also run on the per-PR `Cluster IT - 1C3D` job for 
validation. This will be reverted to `DailyIT`-only before merge.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to