Re: [PATCH v5] nvdimm/btt: Handle preemption in BTT lane acquisition

Alison Schofield Fri, 15 May 2026 13:15:46 -0700

On Thu, May 14, 2026 at 06:47:26PM -0700, Alison Schofield wrote:

Sashiko AI had a 3 new things to point out with this v5. I'll address
them inline below and expect to only make changes for one of them,
destroying the mutex lock array.


I'm going to wait on some human feedback before creating a v6.
Thanks to my humans so far - AboorvaD and DaveJ.

> BTT lanes serialize access to per-lane metadata and workspace state
> during BTT I/O. The btt-check unit test reports data mismatches during
> BTT writes due to a race in lane acquisition that can lead to silent
> data corruption.
> 
> The existing lane model uses a spinlock together with a per-CPU
> recursion count. That recursion model stopped being valid after BTT
> lanes became preemptible: another task can run on the same CPU,
> observe a non-zero recursion count, bypass locking, and use the same
> lane concurrently.
> 
> BTT lanes are also held across arena_write_bytes() calls. That path
> reaches nsio_rw_bytes(), which flushes writes with nvdimm_flush().
> Some provider flush callbacks can sleep, making a spinlock the wrong
> primitive for the lane lifetime.
> 
> Replace the spinlock-based recursion model with a dynamically
> allocated per-lane mutex array and take the lane lock
> unconditionally.
> 
> Add might_sleep() to catch any future atomic-context caller.
> 
> Found with the ndctl unit test btt-check.sh.
> 
> Fixes: 36c75ce3bd29 ("nd_btt: Make BTT lanes preemptible")
> Assisted-by: Claude Sonnet 4.5
> Signed-off-by: Alison Schofield <[email protected]>
> ---
> 
> 
> Changes in v5:
> - Align lane mutex entries to cachelines in SMP builds (Sashiko AI)
> - Add sparse lock annotations for lane mutexes (DaveJ)
> - s/spinlock/mutexes in the driver-api doc btt.rst
> 
> Changes in v4:
> - Replace per-CPU lane storage w dynamically allocated mutex array (Sashiko 
> AI)
> - Remove the recursion fast path and take the lane lock unconditionally
> - Update commit log
> 
> Changes in v3:
> Replace spinlock with a per-lane mutex (Arboorva)
> 
> Changes in v2:
> Use spin_(un)lock_bh() (Sashiko AI)
> Update commit log per softirq re-enty and spinlock change
> 
> A new unit test to stress this is under review here:
> https://lore.kernel.org/nvdimm/[email protected]/
> 
>

snip


> diff --git a/drivers/nvdimm/region_devs.c b/drivers/nvdimm/region_devs.c
> index e35c2e18518f..9f5a34181cf5 100644
> --- a/drivers/nvdimm/region_devs.c
> +++ b/drivers/nvdimm/region_devs.c
> @@ -192,7 +192,7 @@ static void nd_region_release(struct device *dev)
>  
>               put_device(&nvdimm->dev);
>       }
> -     free_percpu(nd_region->lane);
> +     kfree(nd_region->lane);


1) Sashiko pointed out that since this is now a dynamically alloc'd mutex
array, it is bad hygiene to skip the mutex destroy.
Will fix up in next rev.


snip

>   */
>  unsigned int nd_region_acquire_lane(struct nd_region *nd_region)
> +     __acquires(&nd_region->lane[lane].lock)
>  {
> -     unsigned int cpu, lane;
> +     unsigned int lane;
>  

2) Sashiko suspected the syntax above could cause an undefined
indentifier error when building w sparse.
Not an issue. Sparse does not warn on the annotations as written.


3) And, Sashiko's 3rd point was asking "Since the lane lock is now a
mutex and can sleep, does this allow us to fix the lock drop race
in btt_write_pg()?"

It may indeed be possible to close that window entirely. I view that
as a follow-on cleanup, not part of this conversion. Closing the window
in btt_write_pg() changes BTT write sequencing around freelist recovery.
That needs it's own review and test coverage.
So - no plan to add here.

>

Re: [PATCH v5] nvdimm/btt: Handle preemption in BTT lane acquisition

Reply via email to