On Fri, Jun 03, 2022 at 01:14:40PM -0600, Jonathan Derrick wrote:
> When metadata is present in the namespace and deallocates are issued, the
> first deallocate could fail to zero the block range, resulting in another
> deallocation to be issued. Normally after the deallocation completes and the
> range is checked for zeroes, a deallocation is then issued for the metadata
> space. In the failure case where the range is not zeroed, deallocation is
> reissued for the block range (and followed with metadata deallocation), but
> the original range deallocation task will also issue a metadata deallocation:
>
> nvme_dsm_cb()
>   *range deallocation*
>   nvme_dsm_md_cb()
>     if (nvme_block_status_all()) (range deallocation failure)
>       nvme_dsm_cb()
>       *range deallocation*
>         nvme_dsm_md_cb()
>           if (nvme_block_status_all()) (no failure)
>           *metadata deallocation*
>     *metadata deallocation*
>
> This sequence results in reentry of nvme_dsm_cb() before the metadata has been
> deallocated. During reentry, the metadata is deallocated in the reentrant
> task. nvme_dsm_bh() is called which deletes and sets iocb->bh to NULL. When
> reentry returns from nvme_dsm_cb(), metadata deallocation takes place again,
> and results in a null pointer dereference on the iocb->bh:

Nice, thank you for the detailed analysis. Patch looks good.

Reviewed-by: Keith Busch <kbu...@kernel.org>