Branch: refs/heads/main
  Home:   https://github.com/WebKit/WebKit
  Commit: 629632da96e15c297e5260d60f185ba93ff649c3
      
https://github.com/WebKit/WebKit/commit/629632da96e15c297e5260d60f185ba93ff649c3
  Author: Yijia Huang <[email protected]>
  Date:   2026-02-09 (Mon, 09 Feb 2026)

  Changed paths:
    M Source/JavaScriptCore/runtime/VMManager.cpp
    M Source/JavaScriptCore/runtime/VMManager.h
    M Source/JavaScriptCore/runtime/VMTraps.h
    M Source/JavaScriptCore/wasm/debugger/WasmDebugServerUtilities.h
    M Source/JavaScriptCore/wasm/debugger/WasmExecutionHandler.cpp
    M Source/JavaScriptCore/wasm/debugger/WasmExecutionHandler.h
    M Source/JavaScriptCore/wasm/debugger/tests/ExecutionHandlerIdleStopTest.cpp
    M Source/JavaScriptCore/wasm/debugger/tests/ExecutionHandlerTest.cpp
    M Source/JavaScriptCore/wasm/debugger/tests/ExecutionHandlerTestSupport.h

  Log Message:
  -----------
  [JSC][WASM][Debugger] Fix debugger races with idle and active VMs in stop-the-world
https://bugs.webkit.org/show_bug.cgi?id=307288
rdar://169930077

Reviewed by Mark Lam.

Fix multiple race conditions in VMManager's stop-the-world mechanism that
caused the WASM debugger to fail when handling idle VMs (not executing code)
and active VMs (executing code) concurrently. Investigation revealed three
interconnected issues that needed fixing together.

The Story:

We started investigating flaky WASM debugger behavior where interrupting
idle VMs would sometimes fail or cause assertion failures. Tracing through
the code revealed a cascade of architectural issues:

Issue 1: VM Debug State Management Race
========================================

The WasmExecutionHandler's debugger callback (wasmDebuggerOnStop) runs on
the LAST VM that stops. The old code called markVMStates() in this callback,
which iterated through ALL VMs and marked them as stopped. However, this
created a race condition with idle VMs:

- Active VMs (executing code) stop immediately when they check traps
- Idle VMs (not executing code) stop via RunLoop dispatch callbacks
- The dispatch task might not have been triggered/executed yet when the
  debugger callback runs
- But markVMStates() marks the idle VM as "stopped" even though its
  dispatch callback hasn't run yet and it hasn't actually stopped
- This creates inconsistent state where the debugger thinks a VM is stopped
  but the VM is still running/idle

Example race sequence:
T1 (Main): VMManager::requestStopAll() sets traps and dispatches stop handlers
T2 (Active VM): Checks traps immediately and calls notifyVMStop()
T3 (Active VM): Enters debugger callback as last active VM
T3 (Active VM): Callback calls markVMStates() -> marks idle VM as "stopped"
T4 (Idle VM): Dispatch callback hasn't run yet -> still processing RunLoop events
Result: State inconsistency - debugger thinks idle VM stopped, but it hasn't

Fix: Move debug state management into VMManager::notifyVMStop() so that EACH
VM sets its own state (setStopped) when it ACTUALLY stops, rather than having
one VM mark all the others from the callback. Each VM clears its state
(clearStop) when it exits notifyVMStop(). This keeps the state transitions
synchronized with the actual stop coordination.

Issue 2: RunLoop Dispatch Complexity
====================================

For idle VMs (not executing, not checking traps), we use RunLoop dispatch
to deliver stop requests. The old implementation (handleStopViaDispatch)
manually orchestrated the entire stop sequence:
- Test-and-clear the trap flag
- Manually increment active VM count
- Call notifyVMStop
- Manually decrement active VM count

This was error-prone and duplicated logic from VM entry services.

Fix: Simplify to just use VMEntryScope which naturally triggers the normal
VM entry path, letting entry services and notifyVMStop handle everything
correctly.

Issue 3: Active VM Counting Race
=================================

While fixing Issue 2, we discovered a fundamental race in active VM counting.
Entry services check the NeedStopTheWorld flag without holding m_worldLock
(VM.cpp:1698-1699):

  if (hasEntryScopeServiceRequest(ConcurrentEntryScopeService::NeedStopTheWorld)) [[unlikely]]
      VMManager::singleton().notifyVMActivation(*this);

hasEntryScopeServiceRequest() uses atomic operations without holding any lock.
Between the unlocked flag check and the lock acquisition in notifyVMActivation(),
the world can resume to RunAll mode, corrupting the m_hasBeenCountedAsActive flag:

Race sequence:
T1 (VM thread): EntryScope checks NeedStopTheWorld flag (atomic, no lock) - flag SET
T2 (Debugger): World resumes to RunAll mode
T2 (Debugger): Clears NeedStopTheWorld flag and all m_hasBeenCountedAsActive flags
T1 (VM thread): (delayed) Acquires lock and calls notifyVMActivation()
T1 (VM thread): notifyVMActivation() calls incrementActiveVMs(), which corrupts
                m_numberOfActiveVMs (increments from the invalidNumberOfActiveVMs
                sentinel) and sets m_hasBeenCountedAsActive=true
T1 (VM thread): notifyVMActivation() sees RunAll mode, so needsStopping=false
T1 (VM thread): Does NOT call notifyVMStop() - just returns
T1 (VM thread): VM continues executing with m_hasBeenCountedAsActive erroneously true
T2 (Debugger): Later, a new requestStopAll() from RunAll mode resets
               m_numberOfActiveVMs=0 but does NOT clear the
               m_hasBeenCountedAsActive flags
T2 (Debugger): requestStopAll() iterates VMs and calls incrementActiveVMs() for
               entered VMs
T2 (Debugger): incrementActiveVMs() sees flag=true and skips incrementing (the
               guard prevents double-counting, but the flag shouldn't be true here)
T1 (VM thread): Trap handler fires and calls notifyVMStop()
T1 (VM thread): VM is in notifyVMStop() but was never counted (skipped above)
Result: Assertion failure numberOfStoppedVMs > numberOfActiveVMs

Fix: Replace eager active VM counting (at construction/activation points)
with defensive counting at the actual stop point (notifyVMStop). This makes
counting idempotent and race-safe:
- If entry services succeeded: VM already counted, defensive increment is no-op
- If entry services raced: defensive increment catches the missed count
- Add RunAll mode guard in incrementActiveVMs() to prevent counter corruption

The defensive counting approach is sound because notifyVMStop() is only
called when worldMode is Stopping/Stopped/RunOne, never RunAll, so the
counter is always valid when we increment.

Implementation:

VMManager.cpp/h:
* Moved VM debug state management (setStopped/clearStop) into notifyVMStop()
  so each VM sets its own state when it actually stops, rather than having
  one VM mark all the others from the debugger callback
* Added defensive incrementActiveVMs() at the start of notifyVMStop() to
  handle race conditions where entry services miss the count
* Added RunAll mode skip in incrementActiveVMs() - counter is invalid in
  RunAll mode (set to sentinel value invalidNumberOfActiveVMs)
* Added RunAll mode assertion in decrementActiveVMs() - verify that
  m_hasBeenCountedAsActive flag is never true in RunAll mode
* Removed incrementActiveVMs() from notifyVMConstruction() and
  notifyVMActivation() - defensive counting handles both cases now
* Simplified dispatchStopHandler() by removing handleStopViaDispatch() and
  using VMEntryScope directly, which naturally triggers the stop mechanism
* Removed handleStopViaDispatch() function - no longer needed
* Added m_hasDispatchedStopHandler atomic flag to prevent duplicate RunLoop
  task dispatches for idle VMs during a single stop cycle

WasmExecutionHandler.cpp/h:
* Removed manual VM debug state management from stopCode() - VMManager now
  handles this per-VM in notifyVMStop()
* Removed markVMStates() and clearOtherVMStopData() functions - no longer
  needed since each VM manages its own state in notifyVMStop()
* Added defensive isStopped() checks when enumerating VMs in
  selectDebuggeeIfNeeded(), findVM(), and collectAllStoppedThreads() -
  not all VMs may be stopped during iteration (especially idle VMs whose
  dispatch callbacks may not have run yet)
* Removed redundant stopped check in interrupt() - VMManager handles this

VMTraps.h:
* Added atomic m_hasDispatchedStopHandler flag to prevent duplicate
  dispatches to the same VM during a single stop cycle

WasmDebugServerUtilities.h:
* Removed setRunning() method - VMManager now manages all state transitions

Test Improvements:

ExecutionHandlerIdleStopTest.cpp:
* Added testIdleVMWithActiveVM() to test 2 idle + 3 active VMs scenario,
  specifically stressing RunLoop dispatch mechanism for idle VMs that don't
  check traps - this test would have caught Issue 1
* Improved error reporting by using CHECK instead of warnings
* Added explicit world resume before cleanup to release waiting VMs

ExecutionHandlerTest.cpp:
* Enhanced validateStop() and resume() to verify ALL VMs are in expected state,
  not just the debuggee VM - catches state management bugs like Issue 1
* Improved synchronization using getReplyCount() for deterministic waits
* Updated setupScriptAndWaitForVMs() to wait for VMs to be fully entered and
  actively running before tests start - eliminates VM lifecycle races
* Added final cleanup check to ensure all VMs are destroyed after tests
* Better error detection with CHECK failures instead of warnings

Canonical link: https://commits.webkit.org/307138@main


