Re: [Xen-devel] [PATCH 26/34] mm/gup_benchmark.c: convert put_page() to put_user_page*()

2019-08-02 Thread Keith Busch
On Thu, Aug 01, 2019 at 07:19:57PM -0700, john.hubb...@gmail.com wrote:
> From: John Hubbard 
> 
> For pages that were retained via get_user_pages*(), release those pages
> via the new put_user_page*() routines, instead of via put_page() or
> release_pages().
> 
> This is part of a tree-wide conversion, as described in commit fc1d8e7cca2d
> ("mm: introduce put_user_page*(), placeholder versions").
> 
> Cc: Dan Carpenter 
> Cc: Greg Kroah-Hartman 
> Cc: Keith Busch 
> Cc: Kirill A. Shutemov 
> Cc: Michael S. Tsirkin 
> Cc: YueHaibing 
> Signed-off-by: John Hubbard 

Looks fine.

Reviewed-by: Keith Busch 

>  mm/gup_benchmark.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/mm/gup_benchmark.c b/mm/gup_benchmark.c
> index 7dd602d7f8db..515ac8eeb6ee 100644
> --- a/mm/gup_benchmark.c
> +++ b/mm/gup_benchmark.c
> @@ -79,7 +79,7 @@ static int __gup_benchmark_ioctl(unsigned int cmd,
>   for (i = 0; i < nr_pages; i++) {
>   if (!pages[i])
>   break;
> - put_page(pages[i]);
> + put_user_page(pages[i]);
>   }
>   end_time = ktime_get();
>   gup->put_delta_usec = ktime_us_delta(end_time, start_time);
> -- 


Re: [PATCH 24/27] block: remove QUEUE_FLAG_DISCARD

2022-04-11 Thread Keith Busch
On Sat, Apr 09, 2022 at 06:50:40AM +0200, Christoph Hellwig wrote:
> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
> index efb85c6d8e2d5..7e07dd69262a7 100644
> --- a/drivers/nvme/host/core.c
> +++ b/drivers/nvme/host/core.c
> @@ -1607,10 +1607,8 @@ static void nvme_config_discard(struct gendisk *disk, struct nvme_ns *ns)
>   struct request_queue *queue = disk->queue;
>   u32 size = queue_logical_block_size(queue);
>  
> - if (ctrl->max_discard_sectors == 0) {
> - blk_queue_flag_clear(QUEUE_FLAG_DISCARD, queue);
> + if (ctrl->max_discard_sectors == 0)
>   return;
> - }

I think we need to update the queue limit in this condition. While unlikely,
the flag was cleared here in case the device changed support for discard from
the previous reset.
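A minimal sketch of what I mean, assuming blk_queue_max_discard_sectors() is
still how the limit gets updated at this point in the series:

	if (ctrl->max_discard_sectors == 0) {
		/* device dropped discard support since the previous reset,
		 * so zero the limit instead of just returning */
		blk_queue_max_discard_sectors(queue, 0);
		return;
	}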



Re: [PATCH v4 1/4] PCI: Introduce pci_dev_for_each_resource()

2023-03-10 Thread Keith Busch
On Fri, Mar 10, 2023 at 07:14:13PM +0200, Andy Shevchenko wrote:
> +#define __pci_dev_for_each_resource(dev, res, __i, vartype)  \
> + for (vartype __i = 0;   \
> + res = &(dev)->resource[__i], __i < PCI_NUM_RESOURCES;   \
> + __i++)

...

> +#define pci_dev_for_each_resource_p(dev, res)	\
> + __pci_dev_for_each_resource(dev, res, i, unsigned int)

It looks dangerous to have a macro declare a variable when starting a new
scope. How do you know the name 'i' won't clash with something defined above?
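For example, with a hypothetical caller (pdev and the counting logic are made
up, just to illustrate the clash):

	int i = 0;			/* caller's own counter */
	struct resource *res;

	pci_dev_for_each_resource_p(pdev, res) {
		if (resource_size(res) == 0)
			continue;
		i++;	/* bumps the macro's hidden loop index, not the counter
			 * declared above, so resources get silently skipped */
	}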



Re: [PATCH 26/26] block: move the bounce flag into the features field

2024-06-17 Thread Keith Busch
On Mon, Jun 17, 2024 at 08:04:53AM +0200, Christoph Hellwig wrote:
> @@ -352,7 +355,6 @@ enum blk_bounce {

No more users of "enum blk_bounce" after this, so you can delete that
too.

>  struct queue_limits {
>   unsigned intfeatures;
>   unsigned intflags;
> - enum blk_bounce bounce;



Re: [PATCH 14/26] block: move the nonrot flag to queue_limits

2024-06-24 Thread Keith Busch
On Mon, Jun 17, 2024 at 08:04:41AM +0200, Christoph Hellwig wrote:
> -#define blk_queue_nonrot(q)  test_bit(QUEUE_FLAG_NONROT, &(q)->queue_flags)
> +#define blk_queue_nonrot(q)  ((q)->limits.features & BLK_FEAT_ROTATIONAL)

This is inverted. Should be:

 #define blk_queue_nonrot(q)	(!((q)->limits.features & BLK_FEAT_ROTATIONAL))



Re: [PATCH 01/32] block: Provide blkdev_get_handle_* functions

2023-07-04 Thread Keith Busch
On Tue, Jul 04, 2023 at 02:21:28PM +0200, Jan Kara wrote:
> +struct bdev_handle *blkdev_get_handle_by_dev(dev_t dev, blk_mode_t mode,
> + void *holder, const struct blk_holder_ops *hops)
> +{
> + struct bdev_handle *handle = kmalloc(sizeof(struct bdev_handle),
> +  GFP_KERNEL);

I believe 'sizeof(*handle)' is the preferred style.

> + struct block_device *bdev;
> +
> + if (!handle)
> + return ERR_PTR(-ENOMEM);
> + bdev = blkdev_get_by_dev(dev, mode, holder, hops);
> + if (IS_ERR(bdev))
> + return ERR_CAST(bdev);

Need a 'kfree(handle)' before the error return. Or would it be simpler
to get the bdev first so you can check the mode settings against a
read-only bdev prior to the kmalloc?
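A rough sketch of the first option:

	bdev = blkdev_get_by_dev(dev, mode, holder, hops);
	if (IS_ERR(bdev)) {
		kfree(handle);
		return ERR_CAST(bdev);
	}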

> + handle->bdev = bdev;
> + handle->holder = holder;
> + return handle;
> +}
> +EXPORT_SYMBOL(blkdev_get_handle_by_dev);
> +
>  /**
>   * blkdev_get_by_path - open a block device by name
>   * @path: path to the block device to open
> @@ -884,6 +902,28 @@ struct block_device *blkdev_get_by_path(const char *path, blk_mode_t mode,
>  }
>  EXPORT_SYMBOL(blkdev_get_by_path);
>  
> +struct bdev_handle *blkdev_get_handle_by_path(const char *path, blk_mode_t mode,
> + void *holder, const struct blk_holder_ops *hops)
> +{
> + struct bdev_handle *handle;
> + dev_t dev;
> + int error;
> +
> + error = lookup_bdev(path, &dev);
> + if (error)
> + return ERR_PTR(error);
> +
> + handle = blkdev_get_handle_by_dev(dev, mode, holder, hops);
> + if (!IS_ERR(handle) && (mode & BLK_OPEN_WRITE) &&
> + bdev_read_only(handle->bdev)) {
> + blkdev_handle_put(handle);
> + return ERR_PTR(-EACCES);
> + }
> +
> + return handle;
> +}
> +EXPORT_SYMBOL(blkdev_get_handle_by_path);



Re: [PATCH] swiotlb-xen: provide the "max_mapping_size" method

2023-11-06 Thread Keith Busch
On Mon, Nov 06, 2023 at 03:59:40PM +0100, Mikulas Patocka wrote:
> There's a bug that when using the XEN hypervisor with dm-crypt on NVMe, 
> the kernel deadlocks [1].
> 
> The deadlocks are caused by inability to map a large bio vector -
> dma_map_sgtable always returns an error, this gets propagated to the block
> layer as BLK_STS_RESOURCE and the block layer retries the request
> indefinitely.
> 
> XEN uses the swiotlb framework to map discontiguous pages into contiguous
> runs that are submitted to the PCIe device. The swiotlb framework has a
> limitation on the length of a mapping - this needs to be announced with
> the max_mapping_size method to make sure that the hardware drivers do not
> create larger mappings.
> 
> Without max_mapping_size, the NVMe block driver would create large
> mappings that overrun the maximum mapping size.
> 
> [1] https://lore.kernel.org/stable/ZTNH0qtmint%2FzLJZ@mail-itl/

This should be a "Link:" tag.

> Signed-off-by: Mikulas Patocka 
> Reported-by: Marek Marczykowski-Górecki 
> Tested-by: Marek Marczykowski-Górecki 
> Suggested-by: Keith Busch 

I was about to send the same thing. I did a little more than suggest
this: it's the very patch I wrote for testing, minus the redundant
nvme bits! But since you already have a commit message for it...

Acked-by: Keith Busch 

> Suggested-by: Christoph Hellwig 
> Cc: sta...@vger.kernel.org
> 
> ---
>  drivers/xen/swiotlb-xen.c |1 +
>  1 file changed, 1 insertion(+)
> 
> Index: linux-stable/drivers/xen/swiotlb-xen.c
> ===
> --- linux-stable.orig/drivers/xen/swiotlb-xen.c	2023-11-03 17:57:18.0 +0100
> +++ linux-stable/drivers/xen/swiotlb-xen.c	2023-11-06 15:30:59.0 +0100
> @@ -405,4 +405,5 @@ const struct dma_map_ops xen_swiotlb_dma
>   .get_sgtable = dma_common_get_sgtable,
>   .alloc_pages = dma_common_alloc_pages,
>   .free_pages = dma_common_free_pages,
> + .max_mapping_size = swiotlb_max_mapping_size,
>  };




Re: [PATCH 03/10] nvme-multipath: add error handling support for add_disk()

2021-08-27 Thread Keith Busch
On Fri, Aug 27, 2021 at 12:18:02PM -0700, Luis Chamberlain wrote:
> @@ -479,13 +479,17 @@ int nvme_mpath_alloc_disk(struct nvme_ctrl *ctrl, struct nvme_ns_head *head)
>  static void nvme_mpath_set_live(struct nvme_ns *ns)
>  {
>   struct nvme_ns_head *head = ns->head;
> + int rc;
>  
>   if (!head->disk)
>   return;
>  
> - if (!test_and_set_bit(NVME_NSHEAD_DISK_LIVE, &head->flags)) {
> - device_add_disk(&head->subsys->dev, head->disk,
> - nvme_ns_id_attr_groups);
> + if (!test_bit(NVME_NSHEAD_DISK_LIVE, &head->flags)) {

This should still be test_and_set_bit() because it is protecting against
two nvme paths simultaneously calling device_add_disk() on the same
namespace head.
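Roughly like this (the clear_bit() on failure is my assumption about how the
error path should unwind):

	if (!test_and_set_bit(NVME_NSHEAD_DISK_LIVE, &head->flags)) {
		rc = device_add_disk(&head->subsys->dev, head->disk,
				     nvme_ns_id_attr_groups);
		if (rc) {
			clear_bit(NVME_NSHEAD_DISK_LIVE, &head->flags);
			return;
		}
		nvme_add_ns_head_cdev(head);
	}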

> + rc = device_add_disk(&head->subsys->dev, head->disk,
> +  nvme_ns_id_attr_groups);
> + if (rc)
> + return;
> + set_bit(NVME_NSHEAD_DISK_LIVE, &head->flags);
>   nvme_add_ns_head_cdev(head);
>   }



Re: [PATCH v2 03/10] nvme-multipath: add error handling support for add_disk()

2021-09-27 Thread Keith Busch
On Mon, Sep 27, 2021 at 03:00:32PM -0700, Luis Chamberlain wrote:
> + /*
> +  * test_and_set_bit() is used because it is protecting against two nvme
> +  * paths simultaneously calling device_add_disk() on the same namespace
> +  * head.
> +  */
>   if (!test_and_set_bit(NVME_NSHEAD_DISK_LIVE, &head->flags)) {
> - device_add_disk(&head->subsys->dev, head->disk,
> - nvme_ns_id_attr_groups);
> + rc = device_add_disk(&head->subsys->dev, head->disk,
> +  nvme_ns_id_attr_groups);
> + if (rc)
> + return;
> + set_bit(NVME_NSHEAD_DISK_LIVE, &head->flags);

No need to set_bit() here since the test_and_set_bit() already took care
of that.



Re: [PATCH 2/3] vmd: disable MSI remapping bypass under Xen

2025-01-13 Thread Keith Busch
On Mon, Jan 13, 2025 at 11:03:58AM +0100, Roger Pau Monné wrote:
> 
> Hm, OK, but isn't the limit 80 columns according to the kernel coding
> style (Documentation/process/coding-style.rst)?

That's the coding style. The commit message style is described in a
different doc:

  
https://docs.kernel.org/process/submitting-patches.html#the-canonical-patch-format



Re: [PATCH 2/3] vmd: disable MSI remapping bypass under Xen

2025-01-13 Thread Keith Busch
On Mon, Jan 13, 2025 at 05:45:20PM +0100, Roger Pau Monné wrote:
> On Mon, Jan 13, 2025 at 08:11:19AM -0700, Keith Busch wrote:
> > On Mon, Jan 13, 2025 at 11:03:58AM +0100, Roger Pau Monné wrote:
> > > 
> > > Hm, OK, but isn't the limit 80 columns according to the kernel coding
> > > style (Documentation/process/coding-style.rst)?
> > 
> > That's the coding style. The commit message style is described in a
> > different doc:
> > 
> >   
> > https://docs.kernel.org/process/submitting-patches.html#the-canonical-patch-format
> 
> It's quite confusing to have different line length for code vs commit
> messages, but my fault for not reading those. Will adjust to 75
> columns then.

The output of 'git log' prepends spaces to the subject and body of the
listed commits. The lower limit for commit messages vs. code makes the
change log look readable in an 80-char terminal.



Re: [PATCH v4 15/16] block-dma: properly take MMIO path

2025-08-19 Thread Keith Busch
On Tue, Aug 19, 2025 at 08:36:59PM +0300, Leon Romanovsky wrote:
> From: Leon Romanovsky 
> 
> Make sure that CPU is not synced and IOMMU is configured to take
> MMIO path by providing newly introduced DMA_ATTR_MMIO attribute.

We may have a minor patch conflict here with my unmerged dma metadata
series, but not a big deal.

Looks good.

Reviewed-by: Keith Busch 



Re: [PATCH v4 11/16] dma-mapping: export new dma_*map_phys() interface

2025-08-19 Thread Keith Busch
On Tue, Aug 19, 2025 at 08:36:55PM +0300, Leon Romanovsky wrote:
> From: Leon Romanovsky 
> 
> Introduce new DMA mapping functions dma_map_phys() and dma_unmap_phys()
> that operate directly on physical addresses instead of page+offset
> parameters. This provides a more efficient interface for drivers that
> already have physical addresses available.
> 
> The new functions are implemented as the primary mapping layer, with
> the existing dma_map_page_attrs()/dma_map_resource() and
> dma_unmap_page_attrs()/dma_unmap_resource() functions converted to simple
> wrappers around the phys-based implementations.
> 
> In the case of dma_map_page_attrs(), the struct page is converted to a
> physical address with the help of page_to_phys(), while dma_map_resource()
> passes the physical address through as-is, together with the addition of
> the DMA_ATTR_MMIO attribute.
> 
> The old page-based API is preserved in mapping.c to ensure that existing
> code won't be affected by changing EXPORT_SYMBOL to EXPORT_SYMBOL_GPL
> variant for dma_*map_phys().

Looks good.

Reviewed-by: Keith Busch 



Re: [PATCH v4 14/16] block-dma: migrate to dma_map_phys instead of map_page

2025-08-19 Thread Keith Busch
On Tue, Aug 19, 2025 at 08:36:58PM +0300, Leon Romanovsky wrote:
>  static bool blk_dma_map_direct(struct request *req, struct device *dma_dev,
>   struct blk_dma_iter *iter, struct phys_vec *vec)
>  {
> - iter->addr = dma_map_page(dma_dev, phys_to_page(vec->paddr),
> - offset_in_page(vec->paddr), vec->len, rq_dma_dir(req));
> + iter->addr = dma_map_phys(dma_dev, vec->paddr, vec->len,
> + rq_dma_dir(req), 0);

Looks good.

Reviewed-by: Keith Busch 

Just a random thought when I had to double back to check what the "0"
means: many dma_* APIs have a default macro without an "attrs" argument,
then an _attrs() version for when you need it. Not sure if you want to
strictly follow that pattern, but merely a suggestion.
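Something along these lines, mirroring the existing dma_map_page() vs
dma_map_page_attrs() split (the *_attrs names below are only illustrative,
not part of this series):

	#define dma_map_phys(dev, phys, size, dir) \
		dma_map_phys_attrs(dev, phys, size, dir, 0)
	#define dma_unmap_phys(dev, addr, size, dir) \
		dma_unmap_phys_attrs(dev, addr, size, dir, 0)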



Re: [PATCH v4 16/16] nvme-pci: unmap MMIO pages with appropriate interface

2025-08-19 Thread Keith Busch
On Tue, Aug 19, 2025 at 08:37:00PM +0300, Leon Romanovsky wrote:
> From: Leon Romanovsky 
> 
> Block layer maps MMIO memory through dma_map_phys() interface
> with help of DMA_ATTR_MMIO attribute. There is a need to unmap
> that memory with the appropriate unmap function, something which
> wasn't possible before adding new REQ attribute to block layer in
> previous patch.

Looks good.

Reviewed-by: Keith Busch 



Re: [PATCH v4 14/16] block-dma: migrate to dma_map_phys instead of map_page

2025-09-03 Thread Keith Busch
On Tue, Sep 02, 2025 at 10:49:48PM +0200, Marek Szyprowski wrote:
> On 19.08.2025 19:36, Leon Romanovsky wrote:
> > @@ -87,8 +87,8 @@ static bool blk_dma_map_bus(struct blk_dma_iter *iter, struct phys_vec *vec)
> >   static bool blk_dma_map_direct(struct request *req, struct device *dma_dev,
> > struct blk_dma_iter *iter, struct phys_vec *vec)
> >   {
> > -   iter->addr = dma_map_page(dma_dev, phys_to_page(vec->paddr),
> > -   offset_in_page(vec->paddr), vec->len, rq_dma_dir(req));
> > +   iter->addr = dma_map_phys(dma_dev, vec->paddr, vec->len,
> > +   rq_dma_dir(req), 0);
> > if (dma_mapping_error(dma_dev, iter->addr)) {
> > iter->status = BLK_STS_RESOURCE;
> > return false;
> 
> I wonder where is the corresponding dma_unmap_page() call and its change 
> to dma_unmap_phys()...

You can't do that in the generic layer, so it's up to the caller. The
dma addrs that blk_dma_iter yields are stored in a caller-specific
structure. For NVMe, for example, they go into an NVMe PRP. The generic
layer doesn't know what that is, so the driver has to provide the
unmapping.
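As a rough illustration of the caller side (not the actual nvme code; the
names are simplified):

	int i;

	/* the driver walks its own descriptor (a PRP list here) to undo
	 * the mappings the generic iterator handed out */
	for (i = 0; i < nr_entries; i++)
		dma_unmap_phys(dma_dev, le64_to_cpu(prp_list[i]),
			       NVME_CTRL_PAGE_SIZE, rq_dma_dir(req), 0);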



Re: [PATCH v4 15/16] block-dma: properly take MMIO path

2025-08-28 Thread Keith Busch
On Tue, Aug 19, 2025 at 08:36:59PM +0300, Leon Romanovsky wrote:
> diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
> index 09b99d52fd36..283058bcb5b1 100644
> --- a/include/linux/blk_types.h
> +++ b/include/linux/blk_types.h
> @@ -387,6 +387,7 @@ enum req_flag_bits {
>   __REQ_FS_PRIVATE,   /* for file system (submitter) use */
>   __REQ_ATOMIC,   /* for atomic write operations */
>   __REQ_P2PDMA,   /* contains P2P DMA pages */
> + __REQ_MMIO, /* contains MMIO memory */
>   /*
>* Command specific flags, keep last:
>*/
> @@ -420,6 +421,7 @@ enum req_flag_bits {
>  #define REQ_FS_PRIVATE   (__force blk_opf_t)(1ULL << __REQ_FS_PRIVATE)
>  #define REQ_ATOMIC   (__force blk_opf_t)(1ULL << __REQ_ATOMIC)
>  #define REQ_P2PDMA   (__force blk_opf_t)(1ULL << __REQ_P2PDMA)
> +#define REQ_MMIO (__force blk_opf_t)(1ULL << __REQ_MMIO)

Now that my integrity metadata DMA series is staged, I don't think we
can use REQ flags like this because data and metadata may have different
mapping types. I think we should add a flags field to the dma_iova_state
instead.
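Something like this is what I'm picturing (member layout from memory of
dma-mapping.h; the flags field and its use are just a sketch):

	struct dma_iova_state {
		dma_addr_t addr;
		u64 __size;
		unsigned long flags;	/* e.g. a per-mapping MMIO/P2P indicator */
	};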



Re: [PATCH v4 15/16] block-dma: properly take MMIO path

2025-08-28 Thread Keith Busch
On Thu, Aug 28, 2025 at 07:54:27PM +0300, Leon Romanovsky wrote:
> On Thu, Aug 28, 2025 at 09:19:20AM -0600, Keith Busch wrote:
> > On Tue, Aug 19, 2025 at 08:36:59PM +0300, Leon Romanovsky wrote:
> > > diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
> > > index 09b99d52fd36..283058bcb5b1 100644
> > > --- a/include/linux/blk_types.h
> > > +++ b/include/linux/blk_types.h
> > > @@ -387,6 +387,7 @@ enum req_flag_bits {
> > >   __REQ_FS_PRIVATE,   /* for file system (submitter) use */
> > >   __REQ_ATOMIC,   /* for atomic write operations */
> > >   __REQ_P2PDMA,   /* contains P2P DMA pages */
> > > + __REQ_MMIO, /* contains MMIO memory */
> > >   /*
> > >* Command specific flags, keep last:
> > >*/
> > > @@ -420,6 +421,7 @@ enum req_flag_bits {
> > >  #define REQ_FS_PRIVATE   (__force blk_opf_t)(1ULL << __REQ_FS_PRIVATE)
> > >  #define REQ_ATOMIC   (__force blk_opf_t)(1ULL << __REQ_ATOMIC)
> > >  #define REQ_P2PDMA   (__force blk_opf_t)(1ULL << __REQ_P2PDMA)
> > > +#define REQ_MMIO (__force blk_opf_t)(1ULL << __REQ_MMIO)
> > 
> > Now that my integrity metadata DMA series is staged, I don't think we
> > can use REQ flags like this because data and metadata may have different
> > mapping types. I think we should add a flags field to the dma_iova_state
> > instead.
> 
> Before the integrity metadata code was merged, the assumption was that a
> request is only one type, either p2p or host. Does that still hold now?

I don't think that was ever the case. Metadata is allocated
independently of the data payload, usually by the kernel in
bio_integrity_prep() just before dispatching the request. The bio may
have a p2p data payload, but the integrity metadata is just a kmalloc
buf in that path.

> And we can't store in dma_iova_state() as HMM/RDMA code works in page-based
> granularity and one dma_iova_state() can mix different types.

I see.



Re: [PATCH v4 15/16] block-dma: properly take MMIO path

2025-08-28 Thread Keith Busch
On Thu, Aug 28, 2025 at 03:41:15PM -0300, Jason Gunthorpe wrote:
> On Thu, Aug 28, 2025 at 11:15:20AM -0600, Keith Busch wrote:
> > 
> > I don't think that was ever the case. Metadata is allocated
> > independently of the data payload, usually by the kernel in
> > bio_integrity_prep() just before dispatching the request. The bio may
> > have a p2p data payload, but the integrity metadata is just a kmalloc
> > buf in that path.
> 
> Then you should do two dma mapping operations today, that is how the
> API was built. You shouldn't mix P2P and non P2P within a single
> operation right now..

Data and metadata are mapped as separate operations. They're just
different parts of one blk-mq request.



Re: [PATCH v4 15/16] block-dma: properly take MMIO path

2025-08-28 Thread Keith Busch
On Thu, Aug 28, 2025 at 04:18:20PM -0300, Jason Gunthorpe wrote:
> On Thu, Aug 28, 2025 at 01:10:32PM -0600, Keith Busch wrote:
> > 
> > Data and metadata are mapped as separate operations. They're just
> > different parts of one blk-mq request.
> 
> In that case the new bit leon proposes should only be used for the
> unmap of the data pages and the metadata unmap should always be
> unmapped as CPU?

The common path uses host allocated memory to attach integrity metadata,
but that isn't the only path. A user can attach their own metadata with
nvme passthrough or the recent io_uring application metadata, and that
could have been allocated from anywhere.

In truth though, I hadn't tried p2p metadata before today, and it looks
like bio_integrity_map_user() is missing the P2P extraction flags to
make that work. Just added this patch below, now I can set p2p or host
memory independently for data and integrity payloads:

---
diff --git a/block/bio-integrity.c b/block/bio-integrity.c
index 6b077ca937f6b..cf45603e378d5 100644
--- a/block/bio-integrity.c
+++ b/block/bio-integrity.c
@@ -265,6 +265,7 @@ int bio_integrity_map_user(struct bio *bio, struct iov_iter *iter)
unsigned int align = blk_lim_dma_alignment_and_pad(&q->limits);
struct page *stack_pages[UIO_FASTIOV], **pages = stack_pages;
struct bio_vec stack_vec[UIO_FASTIOV], *bvec = stack_vec;
+   iov_iter_extraction_t extraction_flags = 0;
size_t offset, bytes = iter->count;
unsigned int nr_bvecs;
int ret, nr_vecs;
@@ -286,7 +287,12 @@ int bio_integrity_map_user(struct bio *bio, struct iov_iter *iter)
}
 
copy = !iov_iter_is_aligned(iter, align, align);
-   ret = iov_iter_extract_pages(iter, &pages, bytes, nr_vecs, 0, &offset);
+
+   if (blk_queue_pci_p2pdma(q))
+   extraction_flags |= ITER_ALLOW_P2PDMA;
+
+   ret = iov_iter_extract_pages(iter, &pages, bytes, nr_vecs,
+   extraction_flags, &offset);
if (unlikely(ret < 0))
goto free_bvec;
 
--



Re: [PATCH v4 15/16] block-dma: properly take MMIO path

2025-08-29 Thread Keith Busch
On Thu, Aug 28, 2025 at 08:45:42PM -0300, Jason Gunthorpe wrote:
> On Thu, Aug 28, 2025 at 02:54:35PM -0600, Keith Busch wrote:
> 
> > In truth though, I hadn't tried p2p metadata before today, and it looks
> > like bio_integrity_map_user() is missing the P2P extraction flags to
> > make that work. Just added this patch below, now I can set p2p or host
> > memory independently for data and integrity payloads:
> 
> I think it is a bit more than that, you have to make sure all the meta
> data is the same, either all p2p or all cpu and then record this
> somehow so the DMA mapping knows what kind it is.

Sure, I can get all that added in for the real patch.
 
> Once that is all done then the above should still be OK, the dma unmap
> of the data can follow Leon's new flag and the dma unmap of the
> integrity can follow however integrity kept track (in the
> bio_integrity_payload perhaps?) ??

We have available bits in the bio_integrity_payload bip_flags, so that
sounds doable. I think we'll need to rearrange some things so we can
reuse the important code for data and metadata mapping/unmapping, but it
doesn't look too bad. I'll get started on that.
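Roughly what I have in mind (BIP_P2P_DMA is a made-up name for illustration):

	#define BIP_P2P_DMA	(1 << 7)	/* hypothetical: metadata buffer is P2P/MMIO */

	if (bip->bip_flags & BIP_P2P_DMA)
		attrs |= DMA_ATTR_MMIO;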



Re: [PATCH v6 00/16] dma-mapping: migrate to physical address-based API

2025-09-20 Thread Keith Busch
On Sat, Sep 20, 2025 at 06:53:52PM +0300, Leon Romanovsky wrote:
> On Fri, Sep 19, 2025 at 10:08:21AM -0600, Keith Busch wrote:
> > On Fri, Sep 12, 2025 at 12:03:27PM +0300, Leon Romanovsky wrote:
> > > On Fri, Sep 12, 2025 at 12:25:38AM +0200, Marek Szyprowski wrote:
> > > > >
> > > > > This series does the core code and modern flows. A followup series
> > > > > will give the same treatment to the legacy dma_ops implementation.
> > > > 
> > > > Applied patches 1-13 into dma-mapping-for-next branch. Let's check if 
> > > > it 
> > > > works fine in linux-next.
> > > 
> > > Thanks a lot.
> > 
> > Just fyi, when dma debug is enabled, we're seeing this new warning
> > below. I have not had a chance to look into it yet, so I'm just
> > reporting the observation.
> 
> Did you apply all patches or only Marek's branch?
> I don't get this warning when I run my NVMe tests on current dmabuf-vfio 
> branch.

This was the snapshot of linux-next from the 20250918 tag. It doesn't
have the full patchset applied.

One other thing to note, this was running on an arm64 platform using an SMMU
configured with 64k pages. If your iommu granule is 4k instead, we
wouldn't use the blk_dma_map_direct path.



Re: [PATCH v6 00/16] dma-mapping: migrate to physical address-based API

2025-09-19 Thread Keith Busch
On Fri, Sep 12, 2025 at 12:03:27PM +0300, Leon Romanovsky wrote:
> On Fri, Sep 12, 2025 at 12:25:38AM +0200, Marek Szyprowski wrote:
> > >
> > > This series does the core code and modern flows. A followup series
> > > will give the same treatment to the legacy dma_ops implementation.
> > 
> > Applied patches 1-13 into dma-mapping-for-next branch. Let's check if it 
> > works fine in linux-next.
> 
> Thanks a lot.

Just fyi, when dma debug is enabled, we're seeing this new warning
below. I have not had a chance to look into it yet, so I'm just
reporting the observation.

 DMA-API: nvme 0006:01:00.0: cacheline tracking EEXIST, overlapping mappings aren't supported
 WARNING: kernel/dma/debug.c:598 at add_dma_entry+0x26c/0x328, CPU#1: (udev-worker)/773
 Modules linked in: acpi_power_meter(E) loop(E) efivarfs(E) autofs4(E)
 CPU: 1 UID: 0 PID: 773 Comm: (udev-worker) Tainted: GEN  
6.17.0-rc6-next-20250918-debug #6 PREEMPT(none)
 Tainted: [E]=UNSIGNED_MODULE, [N]=TEST
 pstate: 6349 (nZCv daif +PAN -UAO +TCO +DIT -SSBS BTYPE=--)
 pc : add_dma_entry+0x26c/0x328
 lr : add_dma_entry+0x26c/0x328
 sp : 80009fe0f460
 x29: 80009fe0f470 x28: 0001 x27: 0001
 x26: 8000835d7f38 x25: 8000835d7000 x24: 8000835d7e60
 x23:  x22: 06e2cc00 x21: 
 x20: 800082e8f218 x19: a908ff80 x18: 
 x17: 8000801972a0 x16: 800080197054 x15: 
 x14:  x13: 0004 x12: 00020006
 x11: 30e4ef9f x10: 800083443358 x9 : 80008019499c
 x8 : fffe x7 : 800083443358 x6 : 
 x5 : 000bfff4 x4 :  x3 : bb005ac0
 x2 :  x1 :  x0 : bb005ac0
 Call trace:
  add_dma_entry+0x26c/0x328 (P)
  debug_dma_map_phys+0xc4/0xf0
  dma_map_phys+0xe0/0x410
  dma_map_page_attrs+0x94/0xf8
  blk_dma_map_direct.isra.0+0x64/0xb8
  blk_rq_dma_map_iter_next+0x6c/0xc8
  nvme_prep_rq+0x894/0xa98
  nvme_queue_rqs+0xb0/0x1a0
  blk_mq_dispatch_queue_requests+0x268/0x3b8
  blk_mq_flush_plug_list+0x90/0x188
  __blk_flush_plug+0x104/0x170
  blk_finish_plug+0x38/0x50
  read_pages+0x1a4/0x3b8
  page_cache_ra_unbounded+0x1a0/0x400
  force_page_cache_ra+0xa8/0xd8
  page_cache_sync_ra+0xa0/0x3f8
  filemap_get_pages+0x104/0x950
  filemap_read+0xf4/0x498
  blkdev_read_iter+0x88/0x180
  vfs_read+0x214/0x310
  ksys_read+0x70/0x110
  __arm64_sys_read+0x20/0x30
  invoke_syscall+0x4c/0x118
  el0_svc_common.constprop.0+0xc4/0xf0
  do_el0_svc+0x24/0x38
  el0_svc+0x1a0/0x340
  el0t_64_sync_handler+0x98/0xe0
  el0t_64_sync+0x17c/0x180
 ---[ end trace  ]---




Re: [PATCH v6 00/16] dma-mapping: migrate to physical address-based API

2025-09-23 Thread Keith Busch
On Tue, Sep 23, 2025 at 02:09:36PM -0300, Jason Gunthorpe wrote:
> On Sat, Sep 20, 2025 at 06:47:27PM -0600, Keith Busch wrote:
> > 
> > One other thing to note, this was runing on arm64 platform using smmu
> > configured with 64k pages. If your iommu granule is 4k instead, we
> > wouldn't use the blk_dma_map_direct path.
> 
> I spent some time looking to see if I could guess what this is and
> came up empty. It seems most likely we are leaking a dma mapping
> tracking somehow? The DMA API side is pretty simple here though..

Yeah, nothing stood out to me here either.
 
> Not sure the 64k/4k itself is a cause, but triggering the non-iova
> flow is probably the issue.
> 
> Can you check the output of this debugfs:

I don't have a system in this state at the moment, so we checked
previous logs on machines running older kernels. It's extremely
uncommon, but this error was happening prior to this series, so I don't
think this introduced any new problem here. I'll keep looking, but I
don't think we'll make much progress if I can't find a more reliable
reproducer.

Thanks!