RE: [PATCH v5 1/5] RDMA/umem: Support importing dma-buf as user memory region
> -----Original Message-----
> From: Jason Gunthorpe
> Sent: Friday, October 16, 2020 6:05 PM
> To: Xiong, Jianxin
> Cc: linux-r...@vger.kernel.org; dri-devel@lists.freedesktop.org;
> Doug Ledford; Leon Romanovsky; Sumit Semwal; Christian Koenig;
> Vetter, Daniel
> Subject: Re: [PATCH v5 1/5] RDMA/umem: Support importing dma-buf as user
> memory region
>
> On Sat, Oct 17, 2020 at 12:57:21AM +0000, Xiong, Jianxin wrote:
> > > From: Jason Gunthorpe
> > > Sent: Friday, October 16, 2020 5:28 PM
> > > To: Xiong, Jianxin
> > > Cc: linux-r...@vger.kernel.org; dri-devel@lists.freedesktop.org;
> > > Doug Ledford; Leon Romanovsky; Sumit Semwal; Christian Koenig;
> > > Vetter, Daniel
> > > Subject: Re: [PATCH v5 1/5] RDMA/umem: Support importing dma-buf as
> > > user memory region
> > >
> > > On Thu, Oct 15, 2020 at 03:02:45PM -0700, Jianxin Xiong wrote:
> > > > +struct ib_umem *ib_umem_dmabuf_get(struct ib_device *device,
> > > > +				   unsigned long addr, size_t size,
> > > > +				   int dmabuf_fd, int access,
> > > > +				   const struct ib_umem_dmabuf_ops *ops)
> > > > +{
> > > > +	struct dma_buf *dmabuf;
> > > > +	struct ib_umem_dmabuf *umem_dmabuf;
> > > > +	struct ib_umem *umem;
> > > > +	unsigned long end;
> > > > +	long ret;
> > > > +
> > > > +	if (check_add_overflow(addr, (unsigned long)size, &end))
> > > > +		return ERR_PTR(-EINVAL);
> > > > +
> > > > +	if (unlikely(PAGE_ALIGN(end) < PAGE_SIZE))
> > > > +		return ERR_PTR(-EINVAL);
> > > > +
> > > > +	if (unlikely(!ops || !ops->invalidate || !ops->update))
> > > > +		return ERR_PTR(-EINVAL);
> > > > +
> > > > +	umem_dmabuf = kzalloc(sizeof(*umem_dmabuf), GFP_KERNEL);
> > > > +	if (!umem_dmabuf)
> > > > +		return ERR_PTR(-ENOMEM);
> > > > +
> > > > +	umem_dmabuf->ops = ops;
> > > > +	INIT_WORK(&umem_dmabuf->work, ib_umem_dmabuf_work);
> > > > +
> > > > +	umem = &umem_dmabuf->umem;
> > > > +	umem->ibdev = device;
> > > > +	umem->length = size;
> > > > +	umem->address = addr;
> > >
> > > addr here is offset within the dma buf, but this code does nothing
> > > with it.
> >
> > The current code assumes 0 offset, and 'addr' is the nominal starting
> > address of the buffer. If this is to be changed to offset, then yes,
> > some more handling is needed as you mentioned below.
>
> There is no such thing as 'nominal starting address'
>
> If the user is to provide any argument it can only be offset and length.
>
> > > Also, dma_buf_map_attachment() does not do the correct dma mapping
> > > for RDMA, eg it does not use ib_dma_map(). This is not a problem for
> > > mlx5 but it is troublesome to put in the core code.
> >
> > ib_dma_map() uses dma_map_single(), GPU drivers use dma_map_resource()
> > for dma_buf_map_attachment(). They belong to the same family, but take
> > different address type (kernel address vs MMIO physical address).
> > Could you elaborate what the problem could be for non-mlx5 HCAs?
>
> They use the virtual dma ops which we intend to remove

We can have a check with the dma device before attaching the dma-buf and
thus the ib_umem_dmabuf_get() call from such drivers would fail. Something
like:

#ifdef CONFIG_DMA_VIRT_OPS
	if (device->dma_device->dma_ops == &dma_virt_ops)
		return ERR_PTR(-EINVAL);
#endif

>
> Jason

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel
Re: [PATCH v5 1/5] RDMA/umem: Support importing dma-buf as user memory region
On Sat, Oct 17, 2020 at 9:05 PM Jason Gunthorpe wrote:
> On Thu, Oct 15, 2020 at 03:02:45PM -0700, Jianxin Xiong wrote:
>
> > +static void ib_umem_dmabuf_invalidate_cb(struct dma_buf_attachment *attach)
> > +{
> > +	struct ib_umem_dmabuf *umem_dmabuf = attach->importer_priv;
> > +
> > +	dma_resv_assert_held(umem_dmabuf->attach->dmabuf->resv);
> > +
> > +	ib_umem_dmabuf_unmap_pages(&umem_dmabuf->umem, true);
> > +	queue_work(ib_wq, &umem_dmabuf->work);
>
> Do we really want to queue remapping or should it wait until there is
> a page fault?
>
> What do GPUs do?

Atm there are no gpu drivers in upstream that use buffer-based memory
management and support page faults in the hw. So we have to pull the
entire thing in anyway and use the dma_fence stuff to track what's busy.

For faulting hardware I'd wait until the first page fault and then map in
the entire range again (you get the entire thing anyway). Since the
move_notify happened because the buffer is moving, you'll end up stalling
anyway. Plus if you prefault right away you need some thrashing limiter to
not do that when you get an immediate move_notify again.

As a first thing I'd do the same thing you do for mmu notifier ranges,
since it's kinda similar.

-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
Re: [PATCH v5 1/5] RDMA/umem: Support importing dma-buf as user memory region
On Sat, Oct 17, 2020 at 12:57:21AM +0000, Xiong, Jianxin wrote:
> > From: Jason Gunthorpe
> > Sent: Friday, October 16, 2020 5:28 PM
> > To: Xiong, Jianxin
> > Cc: linux-r...@vger.kernel.org; dri-devel@lists.freedesktop.org;
> > Doug Ledford; Leon Romanovsky; Sumit Semwal; Christian Koenig;
> > Vetter, Daniel
> > Subject: Re: [PATCH v5 1/5] RDMA/umem: Support importing dma-buf as user
> > memory region
> >
> > On Thu, Oct 15, 2020 at 03:02:45PM -0700, Jianxin Xiong wrote:
> > > +struct ib_umem *ib_umem_dmabuf_get(struct ib_device *device,
> > > +				   unsigned long addr, size_t size,
> > > +				   int dmabuf_fd, int access,
> > > +				   const struct ib_umem_dmabuf_ops *ops)
> > > +{
> > > +	struct dma_buf *dmabuf;
> > > +	struct ib_umem_dmabuf *umem_dmabuf;
> > > +	struct ib_umem *umem;
> > > +	unsigned long end;
> > > +	long ret;
> > > +
> > > +	if (check_add_overflow(addr, (unsigned long)size, &end))
> > > +		return ERR_PTR(-EINVAL);
> > > +
> > > +	if (unlikely(PAGE_ALIGN(end) < PAGE_SIZE))
> > > +		return ERR_PTR(-EINVAL);
> > > +
> > > +	if (unlikely(!ops || !ops->invalidate || !ops->update))
> > > +		return ERR_PTR(-EINVAL);
> > > +
> > > +	umem_dmabuf = kzalloc(sizeof(*umem_dmabuf), GFP_KERNEL);
> > > +	if (!umem_dmabuf)
> > > +		return ERR_PTR(-ENOMEM);
> > > +
> > > +	umem_dmabuf->ops = ops;
> > > +	INIT_WORK(&umem_dmabuf->work, ib_umem_dmabuf_work);
> > > +
> > > +	umem = &umem_dmabuf->umem;
> > > +	umem->ibdev = device;
> > > +	umem->length = size;
> > > +	umem->address = addr;
> >
> > addr here is offset within the dma buf, but this code does nothing with it.
>
> The current code assumes 0 offset, and 'addr' is the nominal starting
> address of the buffer. If this is to be changed to offset, then yes,
> some more handling is needed as you mentioned below.

There is no such thing as 'nominal starting address'

If the user is to provide any argument it can only be offset and length.

> > Also, dma_buf_map_attachment() does not do the correct dma mapping
> > for RDMA, eg it does not use ib_dma_map(). This is not a problem
> > for mlx5 but it is troublesome to put in the core code.
>
> ib_dma_map() uses dma_map_single(), GPU drivers use dma_map_resource()
> for dma_buf_map_attachment(). They belong to the same family, but take
> different address type (kernel address vs MMIO physical address).
> Could you elaborate what the problem could be for non-mlx5 HCAs?

They use the virtual dma ops which we intend to remove

Jason
Re: [PATCH v5 1/5] RDMA/umem: Support importing dma-buf as user memory region
On Thu, Oct 15, 2020 at 03:02:45PM -0700, Jianxin Xiong wrote:

> +static void ib_umem_dmabuf_invalidate_cb(struct dma_buf_attachment *attach)
> +{
> +	struct ib_umem_dmabuf *umem_dmabuf = attach->importer_priv;
> +
> +	dma_resv_assert_held(umem_dmabuf->attach->dmabuf->resv);
> +
> +	ib_umem_dmabuf_unmap_pages(&umem_dmabuf->umem, true);
> +	queue_work(ib_wq, &umem_dmabuf->work);

Do we really want to queue remapping or should it wait until there is
a page fault?

What do GPUs do?

Jason
Re: [PATCH v5 1/5] RDMA/umem: Support importing dma-buf as user memory region
On Thu, Oct 15, 2020 at 03:02:45PM -0700, Jianxin Xiong wrote:
> +struct ib_umem *ib_umem_dmabuf_get(struct ib_device *device,
> +				   unsigned long addr, size_t size,
> +				   int dmabuf_fd, int access,
> +				   const struct ib_umem_dmabuf_ops *ops)
> +{
> +	struct dma_buf *dmabuf;
> +	struct ib_umem_dmabuf *umem_dmabuf;
> +	struct ib_umem *umem;
> +	unsigned long end;
> +	long ret;
> +
> +	if (check_add_overflow(addr, (unsigned long)size, &end))
> +		return ERR_PTR(-EINVAL);
> +
> +	if (unlikely(PAGE_ALIGN(end) < PAGE_SIZE))
> +		return ERR_PTR(-EINVAL);
> +
> +	if (unlikely(!ops || !ops->invalidate || !ops->update))
> +		return ERR_PTR(-EINVAL);
> +
> +	umem_dmabuf = kzalloc(sizeof(*umem_dmabuf), GFP_KERNEL);
> +	if (!umem_dmabuf)
> +		return ERR_PTR(-ENOMEM);
> +
> +	umem_dmabuf->ops = ops;
> +	INIT_WORK(&umem_dmabuf->work, ib_umem_dmabuf_work);
> +
> +	umem = &umem_dmabuf->umem;
> +	umem->ibdev = device;
> +	umem->length = size;
> +	umem->address = addr;

addr here is offset within the dma buf, but this code does nothing with it.

dma_buf_map_attachment gives a complete SGL for the entire DMA buf, but
offset/length select a subset.

You need to edit the sgls to make them properly span the sub-range and
follow the peculiar rules for how SGLs in ib_umem's have to be
constructed.

Who validates that the total dma length of the SGL is exactly equal to
length? That is really important too.

Also, dma_buf_map_attachment() does not do the correct dma mapping
for RDMA, eg it does not use ib_dma_map(). This is not a problem
for mlx5 but it is troublesome to put in the core code.

Jason
RE: [PATCH v5 1/5] RDMA/umem: Support importing dma-buf as user memory region
> -----Original Message-----
> From: Jason Gunthorpe
> Sent: Friday, October 16, 2020 5:28 PM
> To: Xiong, Jianxin
> Cc: linux-r...@vger.kernel.org; dri-devel@lists.freedesktop.org;
> Doug Ledford; Leon Romanovsky; Sumit Semwal; Christian Koenig;
> Vetter, Daniel
> Subject: Re: [PATCH v5 1/5] RDMA/umem: Support importing dma-buf as user
> memory region
>
> On Thu, Oct 15, 2020 at 03:02:45PM -0700, Jianxin Xiong wrote:
> > +struct ib_umem *ib_umem_dmabuf_get(struct ib_device *device,
> > +				   unsigned long addr, size_t size,
> > +				   int dmabuf_fd, int access,
> > +				   const struct ib_umem_dmabuf_ops *ops)
> > +{
> > +	struct dma_buf *dmabuf;
> > +	struct ib_umem_dmabuf *umem_dmabuf;
> > +	struct ib_umem *umem;
> > +	unsigned long end;
> > +	long ret;
> > +
> > +	if (check_add_overflow(addr, (unsigned long)size, &end))
> > +		return ERR_PTR(-EINVAL);
> > +
> > +	if (unlikely(PAGE_ALIGN(end) < PAGE_SIZE))
> > +		return ERR_PTR(-EINVAL);
> > +
> > +	if (unlikely(!ops || !ops->invalidate || !ops->update))
> > +		return ERR_PTR(-EINVAL);
> > +
> > +	umem_dmabuf = kzalloc(sizeof(*umem_dmabuf), GFP_KERNEL);
> > +	if (!umem_dmabuf)
> > +		return ERR_PTR(-ENOMEM);
> > +
> > +	umem_dmabuf->ops = ops;
> > +	INIT_WORK(&umem_dmabuf->work, ib_umem_dmabuf_work);
> > +
> > +	umem = &umem_dmabuf->umem;
> > +	umem->ibdev = device;
> > +	umem->length = size;
> > +	umem->address = addr;
>
> addr here is offset within the dma buf, but this code does nothing with it.

The current code assumes 0 offset, and 'addr' is the nominal starting address
of the buffer. If this is to be changed to offset, then yes, some more
handling is needed as you mentioned below.

> dma_buf_map_attachment gives a complete SGL for the entire DMA buf, but
> offset/length select a subset.
>
> You need to edit the sgls to make them properly span the sub-range and
> follow the peculiar rules for how SGLs in ib_umem's have to be
> constructed.
>
> Who validates that the total dma length of the SGL is exactly equal to
> length? That is really important too.
>
> Also, dma_buf_map_attachment() does not do the correct dma mapping for RDMA,
> eg it does not use ib_dma_map(). This is not a problem for mlx5 but it is
> troublesome to put in the core code.

ib_dma_map() uses dma_map_single(), GPU drivers use dma_map_resource() for
dma_buf_map_attachment(). They belong to the same family, but take different
address type (kernel address vs MMIO physical address). Could you elaborate
what the problem could be for non-mlx5 HCAs?

> Jason
RE: [PATCH v5 1/5] RDMA/umem: Support importing dma-buf as user memory region
> -----Original Message-----
> From: Jason Gunthorpe
> Sent: Friday, October 16, 2020 12:00 PM
> To: Xiong, Jianxin
> Cc: linux-r...@vger.kernel.org; dri-devel@lists.freedesktop.org;
> Doug Ledford; Leon Romanovsky; Sumit Semwal; Christian Koenig;
> Vetter, Daniel
> Subject: Re: [PATCH v5 1/5] RDMA/umem: Support importing dma-buf as user
> memory region
>
> On Thu, Oct 15, 2020 at 03:02:45PM -0700, Jianxin Xiong wrote:
>
> > +static void ib_umem_dmabuf_invalidate_cb(struct dma_buf_attachment *attach)
> > +{
> > +	struct ib_umem_dmabuf *umem_dmabuf = attach->importer_priv;
> > +
> > +	dma_resv_assert_held(umem_dmabuf->attach->dmabuf->resv);
> > +
> > +	ib_umem_dmabuf_unmap_pages(&umem_dmabuf->umem, true);
> > +	queue_work(ib_wq, &umem_dmabuf->work);
>
> Do we really want to queue remapping or should it wait until there is a
> page fault?

Queuing the remapping here has a performance advantage because it reduces
the chance of getting the page fault.

> What do GPUs do?
>
> Jason