On Thu, 2026-05-14 at 18:47 -0700, Alison Schofield wrote:
> BTT lanes serialize access to per-lane metadata and workspace state
> during BTT I/O. The btt-check unit test reports data mismatches during
> BTT writes due to a race in lane acquisition that can lead to silent
> data corruption.
> 
> The existing lane model uses a spinlock together with a per-CPU
> recursion count. That recursion model stopped being valid after BTT
> lanes became preemptible: another task can run on the same CPU,
> observe a non-zero recursion count, bypass locking, and use the same
> lane concurrently.
> 
> BTT lanes are also held across arena_write_bytes() calls. That path
> reaches nsio_rw_bytes(), which flushes writes with nvdimm_flush().
> Some provider flush callbacks can sleep, making a spinlock the wrong
> primitive for the lane lifetime.
> 
> Replace the spinlock-based recursion model with a dynamically
> allocated per-lane mutex array and take the lane lock
> unconditionally.
> 
> Add might_sleep() to catch any future atomic-context caller.
> 
> Found with the ndctl unit test btt-check.sh.
> 
> Fixes: 36c75ce3bd29 ("nd_btt: Make BTT lanes preemptible")
> Assisted-by: Claude Sonnet 4.5
> Signed-off-by: Alison Schofield <[email protected]>

Couple of minor nits below, but other than those this looks good to me.

Reviewed-by: Vishal Verma <[email protected]>

> ---
> 
> 
> Changes in v5:
> - Align lane mutex entries to cachelines in SMP builds (Sashiko AI)
> - Add sparse lock annotations for lane mutexes (DaveJ)
> - s/spinlock/mutexes in the driver-api doc btt.rst
> 
> Changes in v4:
> - Replace per-CPU lane storage w dynamically allocated mutex array (Sashiko 
> AI)
> - Remove the recursion fast path and take the lane lock unconditionally
> - Update commit log
> 
> Changes in v3:
> Replace spinlock with a per-lane mutex (Arboorva)
> 
> Changes in v2:
> Use spin_(un)lock_bh() (Sashiko AI)
> Update commit log per softirq re-enty and spinlock change
> 
> A new unit test to stress this is under review here:
> https://lore.kernel.org/nvdimm/[email protected]/
> 
> 
>  Documentation/driver-api/nvdimm/btt.rst |  4 +-
>  drivers/nvdimm/nd.h                     |  7 ++-
>  drivers/nvdimm/region_devs.c            | 64 ++++++++-----------------
>  3 files changed, 25 insertions(+), 50 deletions(-)
> 
> diff --git a/Documentation/driver-api/nvdimm/btt.rst 
> b/Documentation/driver-api/nvdimm/btt.rst
> index 2d8269f834bd..e3218863ec96 100644
> --- a/Documentation/driver-api/nvdimm/btt.rst
> +++ b/Documentation/driver-api/nvdimm/btt.rst
> @@ -162,8 +162,8 @@ process::
>  
>  A lane number is obtained at the start of any IO, and is used for indexing 
> into
>  all the on-disk and in-memory data structures for the duration of the IO. If
> -there are more CPUs than the max number of available lanes, than lanes are
> -protected by spinlocks.
> +there are more CPUs than the max number of available lanes, then lanes are
> +protected by mutexes.

This is unconditional now, right? i.e. a mutex is used regardless of
number of CPUs. Maybe reword to something like just "Lanes are
protected by mutexes."

>  
> 
>  d. In-memory data structure: Read Tracking Table (RTT)
> diff --git a/drivers/nvdimm/nd.h b/drivers/nvdimm/nd.h
> index b199eea3260e..263b7dde0f87 100644
> --- a/drivers/nvdimm/nd.h
> +++ b/drivers/nvdimm/nd.h
> @@ -366,9 +366,8 @@ unsigned sizeof_namespace_label(struct nvdimm_drvdata 
> *ndd);
>                       res; res = next, next = next ? next->sibling : NULL)
>  
>  struct nd_percpu_lane {
> -     int count;
> -     spinlock_t lock;
> -};
> +     struct mutex lock; /* serialize lane access */
> +} ____cacheline_aligned_in_smp;

Does this also obsolete the nd_percpu_lane naming? Maybe rename the
struct to nd_lane too to avoid any future confusion?

Reply via email to