CRZbulabula commented on PR #17821: URL: https://github.com/apache/iotdb/pull/17821#issuecomment-4658808161
@Caideyipi Good catch, thanks. Fixed in e07433767a by reworking the leader-services lifecycle so this race can no longer happen.

The root cause was that `submitIfLeaderServicesEpochCurrent()` only checked the epoch *before* `task.run()`, and those submitted tasks were not serialized against `notifyNotLeader()`'s cleanup. I removed that helper entirely. The new design:

1. **All transitions are serialized on a single-thread executor.** `notifyLeaderReady` (become-leader) and `notifyNotLeader` / `notifyLeaderChanged` (resign) all submit to one single-thread `leaderServicesTransitionExecutor`. Because it has exactly one worker, a become-leader orchestration and a resign cleanup can never run concurrently: one runs to completion before the other starts. So `startCQScheduler()` / `startPipeMetaSync()` / `startPipeHeartbeat()` / `startSubscriptionMetaSync()` can no longer interleave with cleanup.
2. **The epoch is bumped eagerly on resign, before cleanup is even queued.** `notifyNotLeader` calls `invalidateLeaderServices()` synchronously on the consensus thread, so the epoch advances the instant we lose leadership. An in-flight `becomeLeader` re-checks `isCurrentLeaderServicesEpoch(epoch)` after the parallel startups join and again before it sets `leaderServicesReady = true`, so a stale epoch bails out and never re-enables services after cleanup.
3. **`leaderServicesReady` is only set inside `leaderServicesLock` with the epoch re-checked**, so the "set ready" step is atomic with respect to the epoch.

Within a single become-leader epoch, load services still start first (for warm-up), then the remaining independent services start in parallel on a cached pool and are joined before the epoch is marked ready.

So the check-then-run gap you pointed out is closed both by the single-thread serialization and by the epoch re-check inside the lock.
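For readers following along without the diff open, here is a minimal, self-contained sketch of the pattern described above. It is not the code from e07433767a: the class `LeaderServicesSketch`, the `running` set that stands in for the real services, and the helpers `awaitTransitions()`, `runningCount()`, and `shutdown()` are all hypothetical, and capturing the epoch at submit time in `notifyLeaderReady` is an assumption about how the epoch is handed to the transition.

```java
import java.util.Set;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical illustration only; names and structure are assumptions, not IoTDB code.
class LeaderServicesSketch {
  // One worker thread: become-leader and resign transitions run strictly one after another.
  private final ExecutorService transitionExecutor = Executors.newSingleThreadExecutor();
  // Cached pool used only for the parallel startup of independent services.
  private final ExecutorService startupPool = Executors.newCachedThreadPool();
  private final AtomicLong epoch = new AtomicLong();
  private final Object leaderServicesLock = new Object();
  // Stand-in for "which services are currently started".
  private final Set<String> running = ConcurrentHashMap.newKeySet();
  private volatile boolean leaderServicesReady = false;

  // Become-leader: capture the current epoch (assumption) and queue the orchestration.
  void notifyLeaderReady() {
    final long myEpoch = epoch.get();
    transitionExecutor.submit(() -> becomeLeader(myEpoch));
  }

  // Resign: bump the epoch on the caller's thread first, then queue the cleanup.
  void notifyNotLeader() {
    epoch.incrementAndGet();
    transitionExecutor.submit(this::cleanup);
  }

  // Package-private so a stale epoch can be exercised directly in a test.
  void becomeLeader(long myEpoch) {
    if (epoch.get() != myEpoch) {
      return; // leadership was already lost before this transition started
    }
    running.add("load"); // load services start first, for warm-up
    CompletableFuture.allOf(
            CompletableFuture.runAsync(() -> running.add("cqScheduler"), startupPool),
            CompletableFuture.runAsync(() -> running.add("pipeMetaSync"), startupPool),
            CompletableFuture.runAsync(() -> running.add("pipeHeartbeat"), startupPool),
            CompletableFuture.runAsync(() -> running.add("subscriptionMetaSync"), startupPool))
        .join(); // wait for every parallel startup before going further
    if (epoch.get() != myEpoch) {
      return; // resigned during startup; the queued cleanup will stop what was started
    }
    synchronized (leaderServicesLock) {
      // Re-check inside the lock so "set ready" cannot race with an epoch bump.
      if (epoch.get() == myEpoch) {
        leaderServicesReady = true;
      }
    }
  }

  private void cleanup() {
    synchronized (leaderServicesLock) {
      leaderServicesReady = false;
    }
    running.clear();
  }

  boolean isReady() {
    return leaderServicesReady;
  }

  int runningCount() {
    return running.size();
  }

  // Blocks until every transition queued so far has finished (test helper).
  void awaitTransitions() {
    try {
      transitionExecutor.submit(() -> {}).get();
    } catch (Exception e) {
      throw new IllegalStateException(e);
    }
  }

  void shutdown() {
    transitionExecutor.shutdown();
    startupPool.shutdown();
  }
}
```

In this sketch, calling `notifyNotLeader()` while a `becomeLeader` is still joining its startups makes both later epoch checks fail, so the ready flag stays `false` and the cleanup queued behind it on the same worker stops whatever was started.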
