RE: [PATCH v5 1/5] RDMA/umem: Support importing dma-buf as user memory region

2020-10-18 Thread Xiong, Jianxin
> -----Original Message-----
> From: Jason Gunthorpe 
> Sent: Friday, October 16, 2020 6:05 PM
> To: Xiong, Jianxin 
> Cc: linux-r...@vger.kernel.org; dri-devel@lists.freedesktop.org; Doug Ledford 
> ; Leon Romanovsky
> ; Sumit Semwal ; Christian Koenig 
> ; Vetter, Daniel
> 
> Subject: Re: [PATCH v5 1/5] RDMA/umem: Support importing dma-buf as user 
> memory region
> 
> On Sat, Oct 17, 2020 at 12:57:21AM +0000, Xiong, Jianxin wrote:
> > > From: Jason Gunthorpe 
> > > Sent: Friday, October 16, 2020 5:28 PM
> > > To: Xiong, Jianxin 
> > > Cc: linux-r...@vger.kernel.org; dri-devel@lists.freedesktop.org;
> > > Doug Ledford ; Leon Romanovsky
> > > ; Sumit Semwal ; Christian
> > > Koenig ; Vetter, Daniel
> > > 
> > > Subject: Re: [PATCH v5 1/5] RDMA/umem: Support importing dma-buf as
> > > user memory region
> > >
> > > On Thu, Oct 15, 2020 at 03:02:45PM -0700, Jianxin Xiong wrote:
> > > > +struct ib_umem *ib_umem_dmabuf_get(struct ib_device *device,
> > > > +  unsigned long addr, size_t size,
> > > > +  int dmabuf_fd, int access,
> > > > +  const struct ib_umem_dmabuf_ops *ops)
> > > > +{
> > > > +   struct dma_buf *dmabuf;
> > > > +   struct ib_umem_dmabuf *umem_dmabuf;
> > > > +   struct ib_umem *umem;
> > > > +   unsigned long end;
> > > > +   long ret;
> > > > +
> > > > +   if (check_add_overflow(addr, (unsigned long)size, &end))
> > > > +   return ERR_PTR(-EINVAL);
> > > > +
> > > > +   if (unlikely(PAGE_ALIGN(end) < PAGE_SIZE))
> > > > +   return ERR_PTR(-EINVAL);
> > > > +
> > > > +   if (unlikely(!ops || !ops->invalidate || !ops->update))
> > > > +   return ERR_PTR(-EINVAL);
> > > > +
> > > > +   umem_dmabuf = kzalloc(sizeof(*umem_dmabuf), GFP_KERNEL);
> > > > +   if (!umem_dmabuf)
> > > > +   return ERR_PTR(-ENOMEM);
> > > > +
> > > > +   umem_dmabuf->ops = ops;
> > > > +   INIT_WORK(&umem_dmabuf->work, ib_umem_dmabuf_work);
> > > > +
> > > > +   umem = &umem_dmabuf->umem;
> > > > +   umem->ibdev = device;
> > > > +   umem->length = size;
> > > > +   umem->address = addr;
> > >
> > > addr here is offset within the dma buf, but this code does nothing with 
> > > it.
> > >
> > The current code assumes 0 offset, and 'addr' is the nominal starting
> > address of the buffer. If this is to be changed to offset, then yes,
> > some more handling is needed as you mentioned below.
> 
> There is no such thing as 'nominal starting address'
> 
> If the user is to provide any argument it can only be offset and length.
> 
> > > Also, dma_buf_map_attachment() does not do the correct dma mapping
> > > for RDMA, eg it does not use ib_dma_map(). This is not a problem for
> > > mlx5 but it is troublesome to put in the core code.
> >
> > ib_dma_map() uses dma_map_single(), GPU drivers use dma_map_resource()
> > for dma_buf_map_attachment(). They belong to the same family, but take
> > different address types (kernel address vs MMIO physical address).
> > Could you elaborate what the problem could be for non-mlx5 HCAs?
> 
> They use the virtual dma ops which we intend to remove

We can have a check with the dma device before attaching the dma-buf, so that
the ib_umem_dmabuf_get() call from such drivers would fail. Something like:

#ifdef CONFIG_DMA_VIRT_OPS
	if (device->dma_device->dma_ops == &dma_virt_ops)
		return ERR_PTR(-EINVAL);
#endif
 
> 
> Jason


Re: [PATCH v5 1/5] RDMA/umem: Support importing dma-buf as user memory region

2020-10-18 Thread Daniel Vetter
On Sat, Oct 17, 2020 at 9:05 PM Jason Gunthorpe  wrote:
>
> On Thu, Oct 15, 2020 at 03:02:45PM -0700, Jianxin Xiong wrote:
>
> > +static void ib_umem_dmabuf_invalidate_cb(struct dma_buf_attachment *attach)
> > +{
> > + struct ib_umem_dmabuf *umem_dmabuf = attach->importer_priv;
> > +
> > + dma_resv_assert_held(umem_dmabuf->attach->dmabuf->resv);
> > +
> > + ib_umem_dmabuf_unmap_pages(&umem_dmabuf->umem, true);
> > + queue_work(ib_wq, &umem_dmabuf->work);
>
> Do we really want to queue remapping or should it wait until there is
> a page fault?
>
> What do GPUs do?

Atm there are no gpu drivers in upstream that use buffer-based memory
management and support page faults in the hw. So we have to pull the entire
thing in anyway and use the dma_fence stuff to track what's busy.

For faulting hardware I'd wait until the first page fault and then map
in the entire range again (you get the entire thing anyway). Since the
move_notify happened because the buffer is moving, you'll end up
stalling anyway. Plus if you prefault right away you need some
thrashing limiter to not do that when you get immediate move_notify
again. As a first thing I'd do the same thing you do for mmu notifier
ranges, since it's kinda similarish.
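
A rough sketch of that fault-driven flow on the importer side (a sketch only;
ib_umem_dmabuf_fault() and the sgt field are names assumed here for
illustration, not taken from this patch):

/* Sketch: map lazily on the first page fault instead of remapping
 * eagerly from move_notify. A dynamic importer must hold the
 * reservation lock around dma_buf_map_attachment().
 */
static int ib_umem_dmabuf_fault(struct ib_umem_dmabuf *umem_dmabuf)
{
	struct dma_buf *dmabuf = umem_dmabuf->attach->dmabuf;
	struct sg_table *sgt;
	int ret = 0;

	dma_resv_lock(dmabuf->resv, NULL);
	if (!umem_dmabuf->sgt) {
		/* pull the entire range back in; the move that triggered
		 * move_notify stalls us anyway */
		sgt = dma_buf_map_attachment(umem_dmabuf->attach,
					     DMA_BIDIRECTIONAL);
		if (IS_ERR(sgt))
			ret = PTR_ERR(sgt);
		else
			umem_dmabuf->sgt = sgt;
	}
	dma_resv_unlock(dmabuf->resv);
	return ret;
}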
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch


Re: [PATCH v5 1/5] RDMA/umem: Support importing dma-buf as user memory region

2020-10-17 Thread Jason Gunthorpe
On Sat, Oct 17, 2020 at 12:57:21AM +0000, Xiong, Jianxin wrote:
> > From: Jason Gunthorpe 
> > Sent: Friday, October 16, 2020 5:28 PM
> > To: Xiong, Jianxin 
> > Cc: linux-r...@vger.kernel.org; dri-devel@lists.freedesktop.org; Doug 
> > Ledford ; Leon Romanovsky
> > ; Sumit Semwal ; Christian Koenig 
> > ; Vetter, Daniel
> > 
> > Subject: Re: [PATCH v5 1/5] RDMA/umem: Support importing dma-buf as user 
> > memory region
> > 
> > On Thu, Oct 15, 2020 at 03:02:45PM -0700, Jianxin Xiong wrote:
> > > +struct ib_umem *ib_umem_dmabuf_get(struct ib_device *device,
> > > +unsigned long addr, size_t size,
> > > +int dmabuf_fd, int access,
> > > +const struct ib_umem_dmabuf_ops *ops)
> > > +{
> > > + struct dma_buf *dmabuf;
> > > + struct ib_umem_dmabuf *umem_dmabuf;
> > > + struct ib_umem *umem;
> > > + unsigned long end;
> > > + long ret;
> > > +
> > > + if (check_add_overflow(addr, (unsigned long)size, &end))
> > > + return ERR_PTR(-EINVAL);
> > > +
> > > + if (unlikely(PAGE_ALIGN(end) < PAGE_SIZE))
> > > + return ERR_PTR(-EINVAL);
> > > +
> > > + if (unlikely(!ops || !ops->invalidate || !ops->update))
> > > + return ERR_PTR(-EINVAL);
> > > +
> > > + umem_dmabuf = kzalloc(sizeof(*umem_dmabuf), GFP_KERNEL);
> > > + if (!umem_dmabuf)
> > > + return ERR_PTR(-ENOMEM);
> > > +
> > > + umem_dmabuf->ops = ops;
> > > + INIT_WORK(&umem_dmabuf->work, ib_umem_dmabuf_work);
> > > +
> > > + umem = &umem_dmabuf->umem;
> > > + umem->ibdev = device;
> > > + umem->length = size;
> > > + umem->address = addr;
> > 
> > addr here is offset within the dma buf, but this code does nothing with it.
> > 
> The current code assumes 0 offset, and 'addr' is the nominal starting
> address of the buffer. If this is to be changed to offset, then yes, some
> more handling is needed as you mentioned below.

There is no such thing as 'nominal starting address'

If the user is to provide any argument it can only be offset and length.
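
Something along these lines, i.e. the fd plus a window into the dma-buf (a
hypothetical signature, just to illustrate the point):

struct ib_umem *ib_umem_dmabuf_get(struct ib_device *device,
				   int dmabuf_fd, u64 offset, u64 length,
				   int access,
				   const struct ib_umem_dmabuf_ops *ops);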

> > Also, dma_buf_map_attachment() does not do the correct dma mapping
> > for RDMA, eg it does not use ib_dma_map(). This is not a problem
> > for mlx5 but it is troublesome to put in the core code.
> 
> ib_dma_map() uses dma_map_single(), GPU drivers use dma_map_resource() for
> dma_buf_map_attachment(). They belong to the same family, but take different
> address types (kernel address vs MMIO physical address). Could you elaborate
> what the problem could be for non-mlx5 HCAs?

They use the virtual dma ops which we intend to remove

Jason


Re: [PATCH v5 1/5] RDMA/umem: Support importing dma-buf as user memory region

2020-10-17 Thread Jason Gunthorpe
On Thu, Oct 15, 2020 at 03:02:45PM -0700, Jianxin Xiong wrote:

> +static void ib_umem_dmabuf_invalidate_cb(struct dma_buf_attachment *attach)
> +{
> + struct ib_umem_dmabuf *umem_dmabuf = attach->importer_priv;
> +
> + dma_resv_assert_held(umem_dmabuf->attach->dmabuf->resv);
> +
> + ib_umem_dmabuf_unmap_pages(&umem_dmabuf->umem, true);
> + queue_work(ib_wq, &umem_dmabuf->work);

Do we really want to queue remapping or should it wait until there is
a page fault?

What do GPUs do?

Jason


Re: [PATCH v5 1/5] RDMA/umem: Support importing dma-buf as user memory region

2020-10-17 Thread Jason Gunthorpe
On Thu, Oct 15, 2020 at 03:02:45PM -0700, Jianxin Xiong wrote:
> +struct ib_umem *ib_umem_dmabuf_get(struct ib_device *device,
> +unsigned long addr, size_t size,
> +int dmabuf_fd, int access,
> +const struct ib_umem_dmabuf_ops *ops)
> +{
> + struct dma_buf *dmabuf;
> + struct ib_umem_dmabuf *umem_dmabuf;
> + struct ib_umem *umem;
> + unsigned long end;
> + long ret;
> +
> + if (check_add_overflow(addr, (unsigned long)size, &end))
> + return ERR_PTR(-EINVAL);
> +
> + if (unlikely(PAGE_ALIGN(end) < PAGE_SIZE))
> + return ERR_PTR(-EINVAL);
> +
> + if (unlikely(!ops || !ops->invalidate || !ops->update))
> + return ERR_PTR(-EINVAL);
> +
> + umem_dmabuf = kzalloc(sizeof(*umem_dmabuf), GFP_KERNEL);
> + if (!umem_dmabuf)
> + return ERR_PTR(-ENOMEM);
> +
> + umem_dmabuf->ops = ops;
> + INIT_WORK(&umem_dmabuf->work, ib_umem_dmabuf_work);
> +
> + umem = &umem_dmabuf->umem;
> + umem->ibdev = device;
> + umem->length = size;
> + umem->address = addr;

addr here is offset within the dma buf, but this code does nothing
with it.

dma_buf_map_attachment gives a complete SGL for the entire DMA buf,
but offset/length select a subset.

You need to edit the sgls to make them properly span the sub-range and
follow the peculiar rules for how SGLs in ib_umem's have to be
constructed.

Who validates that the total dma length of the SGL is exactly equal to
length? That is really important too.
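
For illustration, the clipping could look roughly like this (a sketch under
the assumption that the whole buffer is mapped first; the helper name is made
up and the special ib_umem SGL construction rules are not shown):

/* Clip a mapped sg_table to [offset, offset + length) of the dma-buf
 * and check that the clipped dma length adds up exactly. Entries that
 * fall entirely outside the window would also have to be dropped,
 * which is omitted here.
 */
static int ib_umem_dmabuf_clip_sgl(struct sg_table *sgt, u64 offset,
				   u64 length)
{
	u64 end = offset + length, cur = 0, total = 0;
	struct scatterlist *sg;
	int i;

	for_each_sgtable_dma_sg(sgt, sg, i) {
		u64 len = sg_dma_len(sg);

		if (cur + len > offset && cur < end) {
			u64 head = cur < offset ? offset - cur : 0;
			u64 tail = cur + len > end ? cur + len - end : 0;

			/* trim the partial first/last entries */
			sg_dma_address(sg) += head;
			sg_dma_len(sg) -= head + tail;
			total += sg_dma_len(sg);
		}
		cur += len;
	}
	/* the total dma length must equal the MR length exactly */
	return total == length ? 0 : -EINVAL;
}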

Also, dma_buf_map_attachment() does not do the correct dma mapping for
RDMA, eg it does not use ib_dma_map(). This is not a problem for mlx5
but it is troublesome to put in the core code.

Jason


RE: [PATCH v5 1/5] RDMA/umem: Support importing dma-buf as user memory region

2020-10-16 Thread Xiong, Jianxin
> -----Original Message-----
> From: Jason Gunthorpe 
> Sent: Friday, October 16, 2020 5:28 PM
> To: Xiong, Jianxin 
> Cc: linux-r...@vger.kernel.org; dri-devel@lists.freedesktop.org; Doug Ledford 
> ; Leon Romanovsky
> ; Sumit Semwal ; Christian Koenig 
> ; Vetter, Daniel
> 
> Subject: Re: [PATCH v5 1/5] RDMA/umem: Support importing dma-buf as user 
> memory region
> 
> On Thu, Oct 15, 2020 at 03:02:45PM -0700, Jianxin Xiong wrote:
> > +struct ib_umem *ib_umem_dmabuf_get(struct ib_device *device,
> > +  unsigned long addr, size_t size,
> > +  int dmabuf_fd, int access,
> > +  const struct ib_umem_dmabuf_ops *ops)
> > +{
> > +   struct dma_buf *dmabuf;
> > +   struct ib_umem_dmabuf *umem_dmabuf;
> > +   struct ib_umem *umem;
> > +   unsigned long end;
> > +   long ret;
> > +
> > +   if (check_add_overflow(addr, (unsigned long)size, &end))
> > +   return ERR_PTR(-EINVAL);
> > +
> > +   if (unlikely(PAGE_ALIGN(end) < PAGE_SIZE))
> > +   return ERR_PTR(-EINVAL);
> > +
> > +   if (unlikely(!ops || !ops->invalidate || !ops->update))
> > +   return ERR_PTR(-EINVAL);
> > +
> > +   umem_dmabuf = kzalloc(sizeof(*umem_dmabuf), GFP_KERNEL);
> > +   if (!umem_dmabuf)
> > +   return ERR_PTR(-ENOMEM);
> > +
> > +   umem_dmabuf->ops = ops;
> > +   INIT_WORK(&umem_dmabuf->work, ib_umem_dmabuf_work);
> > +
> > +   umem = &umem_dmabuf->umem;
> > +   umem->ibdev = device;
> > +   umem->length = size;
> > +   umem->address = addr;
> 
> addr here is offset within the dma buf, but this code does nothing with it.
> 
The current code assumes 0 offset, and 'addr' is the nominal starting address
of the buffer. If this is to be changed to offset, then yes, some more
handling is needed as you mentioned below.

> dma_buf_map_attachment gives a complete SGL for the entire DMA buf, but 
> offset/length select a subset.
> 
> You need to edit the sgls to make them properly span the sub-range and follow 
> the peculiar rules for how SGLs in ib_umem's have to be
> constructed.
> 
> Who validates that the total dma length of the SGL is exactly equal to 
> length? That is really important too.
> 
> Also, dma_buf_map_attachment() does not do the correct dma mapping for RDMA, 
> eg it does not use ib_dma_map(). This is not a problem
> for mlx5 but it is troublesome to put in the core code.

ib_dma_map() uses dma_map_single(), GPU drivers use dma_map_resource() for
dma_buf_map_attachment(). They belong to the same family, but take different
address types (kernel address vs MMIO physical address). Could you elaborate
what the problem could be for non-mlx5 HCAs?
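
To illustrate the distinction (not code from the patch, just the two mapping
primitives side by side):

/* Illustrative only: the two mapping primitives side by side */
static void show_mapping_difference(struct device *dev, void *kaddr,
				    phys_addr_t mmio, size_t size)
{
	dma_addr_t a, b;

	/* ib_dma_map_single() boils down to dma_map_single(), which
	 * expects a kernel virtual address backed by a struct page */
	a = dma_map_single(dev, kaddr, size, DMA_BIDIRECTIONAL);

	/* GPU exporters map device memory with dma_map_resource(),
	 * which takes a raw MMIO physical address, no struct page */
	b = dma_map_resource(dev, mmio, size, DMA_BIDIRECTIONAL, 0);

	dma_unmap_single(dev, a, size, DMA_BIDIRECTIONAL);
	dma_unmap_resource(dev, b, size, DMA_BIDIRECTIONAL, 0);
}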
 
> 
> Jason


RE: [PATCH v5 1/5] RDMA/umem: Support importing dma-buf as user memory region

2020-10-16 Thread Xiong, Jianxin
> -----Original Message-----
> From: Jason Gunthorpe 
> Sent: Friday, October 16, 2020 12:00 PM
> To: Xiong, Jianxin 
> Cc: linux-r...@vger.kernel.org; dri-devel@lists.freedesktop.org; Doug Ledford 
> ; Leon Romanovsky
> ; Sumit Semwal ; Christian Koenig 
> ; Vetter, Daniel
> 
> Subject: Re: [PATCH v5 1/5] RDMA/umem: Support importing dma-buf as user 
> memory region
> 
> On Thu, Oct 15, 2020 at 03:02:45PM -0700, Jianxin Xiong wrote:
> 
> > +static void ib_umem_dmabuf_invalidate_cb(struct dma_buf_attachment *attach)
> > +{
> > +   struct ib_umem_dmabuf *umem_dmabuf = attach->importer_priv;
> > +
> > +   dma_resv_assert_held(umem_dmabuf->attach->dmabuf->resv);
> > +
> > +   ib_umem_dmabuf_unmap_pages(&umem_dmabuf->umem, true);
> > +   queue_work(ib_wq, &umem_dmabuf->work);
> 
> Do we really want to queue remapping or should it wait until there is a page 
> fault?

Queuing the remapping here has a performance advantage because it reduces the
chance of getting a page fault.
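
For illustration, the queued work would do roughly the following (a sketch;
ib_umem_dmabuf_map_pages() and the locking details here are assumptions, not
the patch's actual ib_umem_dmabuf_work()):

static void ib_umem_dmabuf_work(struct work_struct *work)
{
	struct ib_umem_dmabuf *umem_dmabuf =
		container_of(work, struct ib_umem_dmabuf, work);
	struct dma_buf *dmabuf = umem_dmabuf->attach->dmabuf;

	/* remap eagerly so the next device access finds a valid
	 * mapping instead of taking a page fault */
	dma_resv_lock(dmabuf->resv, NULL);
	ib_umem_dmabuf_map_pages(&umem_dmabuf->umem);
	dma_resv_unlock(dmabuf->resv);
}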

> 
> What do GPUs do?
> 
> Jason