CRZbulabula commented on PR #17821:
URL: https://github.com/apache/iotdb/pull/17821#issuecomment-4658808161

   @Caideyipi Good catch — thanks. Fixed in e07433767a by reworking the 
leader-services lifecycle so this race can no longer happen.
   
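   The root cause was that `submitIfLeaderServicesEpochCurrent()` only checked 
the epoch *before* `task.run()`, and those submitted tasks were not serialized 
against `notifyNotLeader()`'s cleanup. A task could pass the check, the node 
could then resign and run its cleanup, and the task would still start its 
service afterwards. I removed that helper entirely. The new design: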
   
   1. **All transitions are serialized on a single-thread executor.** 
`notifyLeaderReady` (become-leader) and `notifyNotLeader` / `notifyLeaderChanged` 
(resign) all submit their work to the same single-thread 
`leaderServicesTransitionExecutor`. Because it has exactly one worker, a 
become-leader orchestration and a resign cleanup can never run concurrently: one 
runs to completion before the other starts. So `startCQScheduler()` / 
`startPipeMetaSync()` / `startPipeHeartbeat()` / `startSubscriptionMetaSync()` 
can no longer interleave with cleanup.
   
   2. **The epoch is bumped eagerly on resign, before cleanup is even queued.** 
`notifyNotLeader` calls `invalidateLeaderServices()` synchronously on the 
consensus thread, so the epoch advances the instant we lose leadership. An 
in-flight `becomeLeader` re-checks `isCurrentLeaderServicesEpoch(epoch)` after 
the parallel startups join and again before it sets 
`leaderServicesReady = true`. An orchestration holding a stale epoch therefore 
bails out and never re-enables services after cleanup.
   
   3. **`leaderServicesReady` is only set inside `leaderServicesLock` with the 
epoch re-checked**, so the "set ready" step is atomic with respect to the epoch.
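The three guarantees can be modeled in a small, self-contained Java sketch. It is a simplified illustration, not the code in e07433767a. The `Runnable` start/stop hooks, the constructor, `isLeaderServicesReady()`, `shutdown()`, and capturing the epoch with a plain read in `notifyLeaderReady` are all assumptions. It also assumes the consensus callbacks arrive on a single thread. Only the names `leaderServicesTransitionExecutor`, `leaderServicesLock`, `leaderServicesReady`, `invalidateLeaderServices` and `isCurrentLeaderServicesEpoch` come from the description above.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical, simplified model of the leader-services lifecycle (not the IoTDB source).
// Assumes notifyLeaderReady / notifyNotLeader are called from a single consensus thread.
final class LeaderServicesLifecycle {
  // Exactly one worker: a become-leader task and a resign cleanup never overlap.
  private final ExecutorService leaderServicesTransitionExecutor =
      Executors.newSingleThreadExecutor();
  private final AtomicLong leaderServicesEpoch = new AtomicLong();
  private final Object leaderServicesLock = new Object();
  private volatile boolean leaderServicesReady = false;

  private final Runnable startServices; // hypothetical hook: CQ scheduler, pipe sync, ...
  private final Runnable stopServices; // hypothetical hook: resign cleanup

  LeaderServicesLifecycle(Runnable startServices, Runnable stopServices) {
    this.startServices = startServices;
    this.stopServices = stopServices;
  }

  // Become-leader: capture the current epoch, then queue the orchestration.
  Future<?> notifyLeaderReady() {
    final long epoch = leaderServicesEpoch.get();
    return leaderServicesTransitionExecutor.submit(() -> becomeLeader(epoch));
  }

  // Resign: advance the epoch on the caller's thread first, then queue the cleanup.
  Future<?> notifyNotLeader() {
    invalidateLeaderServices();
    return leaderServicesTransitionExecutor.submit(stopServices);
  }

  private void becomeLeader(long epoch) {
    startServices.run();
    // Re-check after startup: a resign may have advanced the epoch meanwhile.
    if (!isCurrentLeaderServicesEpoch(epoch)) {
      return; // the resign's cleanup is queued behind this task and stops what was started
    }
    synchronized (leaderServicesLock) {
      // Re-check under the lock so "set ready" is atomic with respect to the epoch.
      if (isCurrentLeaderServicesEpoch(epoch)) {
        leaderServicesReady = true;
      }
    }
  }

  private void invalidateLeaderServices() {
    synchronized (leaderServicesLock) {
      leaderServicesEpoch.incrementAndGet();
      leaderServicesReady = false;
    }
  }

  private boolean isCurrentLeaderServicesEpoch(long epoch) {
    return leaderServicesEpoch.get() == epoch;
  }

  boolean isLeaderServicesReady() {
    return leaderServicesReady;
  }

  void shutdown() {
    leaderServicesTransitionExecutor.shutdown();
  }

  public static void main(String[] args) throws Exception {
    LeaderServicesLifecycle lifecycle =
        new LeaderServicesLifecycle(
            () -> System.out.println("start"), () -> System.out.println("stop"));
    lifecycle.notifyLeaderReady().get();
    System.out.println("ready=" + lifecycle.isLeaderServicesReady());
    lifecycle.notifyNotLeader().get();
    System.out.println("ready=" + lifecycle.isLeaderServicesReady());
    lifecycle.shutdown();
  }
}
```

Suppose a resign arrives while `becomeLeader` is still inside `startServices`. The epoch check then fails and `leaderServicesReady` stays `false`. The resign's cleanup task runs later on the same worker and stops whatever was started. In this sketch, `notifyLeaderChanged` would simply delegate to the same resign path as `notifyNotLeader`.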
   
   Within a single become-leader epoch, load services still start first (for 
warm-up). The remaining independent services then start in parallel on a cached 
thread pool and are joined before the epoch is marked ready. So the 
check-then-run gap you pointed out is closed both by the single-thread 
serialization and by the epoch re-check inside the lock.
   