On Wed, May 13, 2026 at 05:23:12PM -0700, Alison Schofield wrote:
> BTT lanes serialize access to per-lane metadata and workspace state
> during BTT I/O. The btt-check unit test reports data mismatches during
> BTT writes due to a race in lane acquisition that can lead to silent
> data corruption.
> 
> The existing lane model uses a spinlock together with a per-CPU
> recursion count. That recursion model stopped being valid after BTT
> lanes became preemptible: another task can run on the same CPU,
> observe a non-zero recursion count, bypass locking, and use the same
> lane concurrently.
> 
> BTT lanes are also held across metadata and data updates that can
> reach nvdimm_flush(). Some provider flush callbacks can sleep, making
> a spinlock the wrong primitive for the lane lifetime. That issue
> predates this fix, but becomes more visible now that BTT lanes are
> preemptible.
> 
> Replace the spinlock with a per-lane mutex, remove the per-CPU
> recursion fast path, and take the lane lock unconditionally.
> 
> Add might_sleep() to catch any future atomic-context caller.
> 
> Found with the ndctl unit test btt-check.sh.
> 
> Fixes: 36c75ce3bd29 ("nd_btt: Make BTT lanes preemptible")
> Assisted-by: Claude Sonnet 4.5
> Signed-off-by: Alison Schofield <[email protected]>
> ---

Sashiko review offered applicable feedback. With the recursion count
removed, the lanes are really just a lock pool indexed by lane number,
so the per-cpu allocation no longer makes sense.

Working on a v4 where per-CPU lane storage gets replaced with a dynamically
allocated per-lane mutex array.

https://sashiko.dev/#/patchset/20260514002314.65024-1-alison.schofield%40intel.com
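
For anyone following along, here is a rough userspace sketch of the
"lock pool indexed by lane number" shape described above. It is not the
v4 patch: pthread mutexes stand in for the kernel's struct mutex, and
every name here (lane_pool, lane_acquire, the hint argument) is
hypothetical. The hint is assumed to be something like a CPU number that
gets folded onto the available lanes.

```c
/* Userspace analogue only; not kernel code and not the actual patch. */
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

struct lane_pool {			/* hypothetical name */
	unsigned int num_lanes;
	pthread_mutex_t *locks;		/* one mutex per lane, allocated at init */
};

/* Allocate and initialize one mutex per lane. Returns 0 on success. */
static int lane_pool_init(struct lane_pool *pool, unsigned int num_lanes)
{
	unsigned int i;

	if (num_lanes == 0)
		return -1;
	pool->locks = calloc(num_lanes, sizeof(*pool->locks));
	if (!pool->locks)
		return -1;
	for (i = 0; i < num_lanes; i++)
		pthread_mutex_init(&pool->locks[i], NULL);
	pool->num_lanes = num_lanes;
	return 0;
}

static void lane_pool_destroy(struct lane_pool *pool)
{
	unsigned int i;

	for (i = 0; i < pool->num_lanes; i++)
		pthread_mutex_destroy(&pool->locks[i]);
	free(pool->locks);
	pool->locks = NULL;
	pool->num_lanes = 0;
}

/*
 * Take the lane lock unconditionally: no recursion count and no
 * lock-free fast path. The caller may block here, so it must be in a
 * context that is allowed to sleep.
 */
static unsigned int lane_acquire(struct lane_pool *pool, unsigned int hint)
{
	unsigned int lane = hint % pool->num_lanes;

	pthread_mutex_lock(&pool->locks[lane]);
	return lane;
}

static void lane_release(struct lane_pool *pool, unsigned int lane)
{
	assert(lane < pool->num_lanes);
	pthread_mutex_unlock(&pool->locks[lane]);
}
```

Two callers whose hints map to the same lane simply serialize on that
lane's mutex, which is the property the per-CPU recursion count could no
longer guarantee once lanes became preemptible.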

snip

