CRZbulabula opened a new pull request, #17908:
URL: https://github.com/apache/iotdb/pull/17908
## Problem
When the ConfigNode leader is **gracefully** stopped — or otherwise loses
leadership — while `AddRegionPeerProcedure` is in `DO_ADD_REGION_PEER` waiting
for the coordinator DataNode's AddRegionPeer task to finish, `MIGRATE REGION`
could be reported as **successful while the destination replica had not
actually received the snapshot**, and the source replica was already removed.
Queries against the region returned incorrect results during the gap.
### Reproduction (field case)
`MIGRATE REGION 1 FROM 3 TO 4`, then stop the current ConfigNode leader
right after the source replica started transmitting the snapshot (an 8.85 GB
region over IoTConsensus). Observed:
- ConfigNode printed `[MigrateRegion] success` and `show migrations` was
already empty;
- `show regions` showed the destination replica as Running/Leader;
- but the destination DataNode only actually loaded the snapshot and became
active **~17.5 minutes later**.
During that window the source replica was already deleted while the
destination was still `Adding`, so query results were wrong.
### Root cause
`RegionMaintainHandler.waitTaskFinish()` returns `PROCESSING` **only** when
its polling loop is interrupted by an `InterruptedException` (ConfigNode
shutdown / leadership loss). A user `CANCEL` or a coordinator disconnection
both go through the `FAIL` branch instead, so `PROCESSING` unambiguously means
"interrupted by shutdown/leader switch".
The old `DO_ADD_REGION_PEER` code treated `PROCESSING` as a terminal no-op
(`return Flow.NO_MORE_STATE`), silently ending `AddRegionPeerProcedure` with
neither success nor rollback. The parent `RegionMigrateProcedure` had already
persisted at `CHECK_ADD_REGION_PEER`, so the new leader resumed there directly.
That check uses `isDataNodeContainsRegion()`, which only inspects the partition
table's **location list** — and the location is written back at
`CREATE_NEW_REGION_PEER`, long before the snapshot finishes transferring. So
the check passed, the source replica was removed, and the migration was
declared complete prematurely.
## Fix
In the `DO_ADD_REGION_PEER` `PROCESSING` branch, **stay at
`DO_ADD_REGION_PEER` and persist it** (`HAS_MORE_STATE`) instead of ending.
After recovery, the new leader re-enters `DO_ADD_REGION_PEER` and re-polls the
coordinator task until it truly reaches `SUCCESS` or `FAIL`, so the parent only
advances to `REMOVE_REGION_PEER` once the destination replica is genuinely
added.
The re-poll is **idempotent**, so the AddRegionPeer task is never submitted
twice:
- After a restart / leader switch, the `isStateDeserialized()` guard skips
re-submitting the task and only re-polls;
- Even in a same-process immediate re-execute, the coordinator DataNode
dedups by `taskId` (`taskResultMap.putIfAbsent`), so a duplicate submit is a
no-op;
- If the coordinator crashed and lost its task table, the poll returns
`TASK_NOT_EXIST`, which falls through to the existing `FAIL` / rollback path
(safe degradation).
The `waitTaskFinish() returns PROCESSING ...` log message (en + zh) is
updated to reflect the new behavior.
## Tests
Adds `cnLeaderSwitchDuringDoAddPeerTest` for each consensus protocol —
IoTConsensus, IoTConsensusV2 (batch & stream), and Ratis.
The existing daily ConfigNode-crash ITs all use `stopForcibly()` (SIGKILL),
which kills the process before it can run the `PROCESSING` branch, so this path
was never covered. The new test introduces a graceful `stop()` (SIGTERM) action
and stops the **leader** among 3 ConfigNodes, so the interrupted `PROCESSING`
path is actually exercised across a real leader switch, and asserts the
migration still completes correctly (destination Running, source removed).
> Note: the new tests are temporarily tagged `@Category({DailyIT.class,
ClusterIT.class})` so they also run on the per-PR `Cluster IT - 1C3D` job for
validation. This will be reverted to `DailyIT`-only before merge.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]