Public bug reported:

    == Summary ==
    A soft lockup (system freeze) occurs in the DPMAIF TX kernel thread
    when system suspend (s2idle) is triggered while the thread is active.

    == Environment ==
    Kernel:   6.17.0-1017-oem (Ubuntu OEM)
    Device:   MediaTek MT6880 (FM350-GL) 0000:56:00.0
    ASPM:     L1 Enabled on endpoint (LnkCtl: ASPM L1 Enabled; ClockPM+)
    Sleep:    s2idle

    == Reproduction ==
    Trigger: SIM registered + ASPM L1 enabled + repeated suspend/resume
    (hundreds of cycles)

    Reproduction script (60s cycle):
      while true; do
        systemctl suspend
        sleep 60
      done

    == Kernel Log ==
    watchdog: BUG: soft lockup - CPU#10 stuck for 26s! [dpmaif_tx_hw_pu:625]
    RIP: 0010:ktime_get_mono_fast_ns+0x67/0xd0
    ...
    watchdog: BUG: soft lockup - CPU#10 stuck for 52s! [dpmaif_tx_hw_pu:625]
    RIP: 0010:_raw_spin_unlock_irqrestore+0x3d/0x60
    Call Trace:
      __pm_runtime_resume+0x5b/0x80
      t7xx_dpmaif_tx_hw_push_thread+0xc4/0x4e0 [mtk_t7xx]
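
When running hundreds of suspend/resume cycles unattended, it may help to check the kernel log for this signature after each resume. The following is a minimal sketch, not part of the original report: `has_dpmaif_lockup` is a hypothetical helper name, and the pattern is taken only from the two watchdog lines quoted above, so it may need adjusting if the thread name is logged differently on another kernel.

```shell
# Hypothetical helper: succeeds if the text on stdin contains a soft-lockup
# report for the DPMAIF TX kthread (pattern based on the log lines above).
has_dpmaif_lockup() {
    grep -q 'soft lockup.*\[dpmaif_tx_hw_pu'
}

# Demo on a sample line copied from the kernel log in this report.
sample='watchdog: BUG: soft lockup - CPU#10 stuck for 26s! [dpmaif_tx_hw_pu:625]'
if printf '%s\n' "$sample" | has_dpmaif_lockup; then
    echo "lockup detected"
fi
```

In a reproduction loop, one possible use is `dmesg | has_dpmaif_lockup && break` after each resume. Reading `dmesg` may require root, depending on system configuration.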

    == Root Cause ==
    t7xx_dpmaif_suspend() stops the TX work queue via t7xx_dpmaif_tx_stop()
    but does NOT signal the TX kthread or update dpmaif_ctrl->state.
    The kthread can pass the state guard and call pm_runtime_resume_and_get()
    concurrently with the system PM suspend path, causing a spinlock deadlock.

    == Fix ==
    See attached patch. Three changes:
    1. t7xx_dpmaif_suspend(): set state=PWROFF + wake_up() to signal kthread
    2. t7xx_dpmaif_resume(): restore state=PWRON (symmetric)
    3. t7xx_dpmaif_tx_hw_push_thread(): add state guard before pm_runtime call

    == Workaround ==
    Disable ASPM L1 on endpoint:
      LNKCTL=$(setpci -s 56:00.0 CAP_EXP+10.w)
      setpci -s 56:00.0 CAP_EXP+10.w=$(printf "%04x" $((16#${LNKCTL} & ~0x2)))
    (reduces probability but does not fully prevent the race)
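
The mask arithmetic in the workaround can be checked without touching hardware. This is a minimal sketch, not part of the original report: `clear_aspm_l1` is a hypothetical helper name, and `0142` is only a sample register value, not one read from the reported device. It uses a `0x` prefix instead of the bash-only `16#` form so that it also runs under a POSIX `sh`.

```shell
# Hypothetical helper: given the Link Control register as hex digits
# (as printed by setpci), print the value with bit 1 cleared.
# Bit 1 of the ASPM Control field is the L1 entry enable.
clear_aspm_l1() {
    printf '%04x\n' $(( 0x$1 & ~0x2 ))
}

# Sample value only: L1 enabled (0x2) plus other bits set.
clear_aspm_l1 0142
```

The printed value is what would be passed to `setpci -s 56:00.0 CAP_EXP+10.w=...`. All bits other than bit 1 are left unchanged.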

    == Testing ==
    Patched module installed to /lib/modules/.../extra/mtk_t7xx.ko
    Running the suspend/resume loop with a SIM registered: no lockup observed
    (testing in progress; over 4000 cycles completed without a lockup).

** Affects: linux (Ubuntu)
     Importance: Undecided
         Status: New


** Tags: kernel patch pcie regression wwan

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/2152281

Title:
  net: wwan: t7xx: soft lockup in dpmaif_tx_hw_push_thread during
  system suspend
