Re: [RFC PATCH net-next v7 04/14] netdev: support binding dma-buf to netdevice

2024-03-28 Thread Simon Horman
On Thu, Mar 28, 2024 at 11:55:23AM -0700, Mina Almasry wrote:
> On Thu, Mar 28, 2024 at 11:28 AM Simon Horman  wrote:
> >
> > On Tue, Mar 26, 2024 at 03:50:35PM -0700, Mina Almasry wrote:
> > > Add a netdev_dmabuf_binding struct which represents the
> > > dma-buf-to-netdevice binding. The netlink API will bind the dma-buf to
> > > rx queues on the netdevice. On the binding, the dma_buf_attach
> > > & dma_buf_map_attachment will occur. The entries in the sg_table from
> > > mapping will be inserted into a genpool to make it ready
> > > for allocation.
> > >
> > > The chunks in the genpool are owned by a dmabuf_chunk_owner struct which
> > > holds the dma-buf offset of the base of the chunk and the dma_addr of
> > > the chunk. Both are needed to use allocations that come from this chunk.
> > >
> > > We create a new type that represents an allocation from the genpool:
> > > net_iov. We setup the net_iov allocation size in the
> > > genpool to PAGE_SIZE for simplicity: to match the PAGE_SIZE normally
> > > allocated by the page pool and given to the drivers.
> > >
> > > The user can unbind the dmabuf from the netdevice by closing the netlink
> > > socket that established the binding. We do this so that the binding is
> > > automatically unbound even if the userspace process crashes.
> > >
> > > The binding and unbinding leaves an indicator in struct netdev_rx_queue
> > > that the given queue is bound, but the binding doesn't take effect until
> > > the driver actually reconfigures its queues, and re-initializes its page
> > > pool.
> > >
> > > The netdev_dmabuf_binding struct is refcounted, and releases its
> > > resources only when all the refs are released.
> > >
> > > Signed-off-by: Willem de Bruijn 
> > > Signed-off-by: Kaiyuan Zhang 
> > > Signed-off-by: Mina Almasry 
> >
> > ...
> >
> > > +int net_devmem_bind_dmabuf_to_queue(struct net_device *dev, u32 rxq_idx,
> > > + struct net_devmem_dmabuf_binding 
> > > *binding)
> > > +{
> > > + struct netdev_rx_queue *rxq;
> > > + u32 xa_idx;
> > > + int err;
> > > +
> > > + if (rxq_idx >= dev->num_rx_queues)
> > > + return -ERANGE;
> > > +
> > > + rxq = __netif_get_rx_queue(dev, rxq_idx);
> > > + if (rxq->mp_params.mp_priv)
> > > + return -EEXIST;
> > > +
> > > + err = xa_alloc(&binding->bound_rxq_list, &xa_idx, rxq, xa_limit_32b,
> > > +GFP_KERNEL);
> > > + if (err)
> > > + return err;
> > > +
> > > + /* We hold the rtnl_lock while binding/unbinding dma-buf, so we 
> > > can't
> > > +  * race with another thread that is also modifying this value. 
> > > However,
> > > +  * the driver may read this config while it's creating its * 
> > > rx-queues.
> > > +  * WRITE_ONCE() here to match the READ_ONCE() in the driver.
> > > +  */
> > > + WRITE_ONCE(rxq->mp_params.mp_ops, &dmabuf_devmem_ops);
> >
> > Hi Mina,
> >
> > This causes a build failure because mabuf_devmem_ops is not added until a
> > subsequent patch in this series.
> >
> 
> My apologies. I do notice the failure in patchwork now. I'll do a
> patch by patch build for the next iteration.

Thanks, much appreciated.

> > > + WRITE_ONCE(rxq->mp_params.mp_priv, binding);
> > > +
> > > + err = net_devmem_restart_rx_queue(dev, rxq_idx);
> > > + if (err)
> > > + goto err_xa_erase;
> > > +
> > > + return 0;
> > > +
> > > +err_xa_erase:
> > > + WRITE_ONCE(rxq->mp_params.mp_ops, NULL);
> > > + WRITE_ONCE(rxq->mp_params.mp_priv, NULL);
> > > + xa_erase(&binding->bound_rxq_list, xa_idx);
> > > +
> > > + return err;
> > > +}
> >
> > ...
> 
> 
> 
> -- 
> Thanks,
> Mina
> 


Re: [RFC PATCH net-next v7 04/14] netdev: support binding dma-buf to netdevice

2024-03-28 Thread Mina Almasry
On Thu, Mar 28, 2024 at 11:28 AM Simon Horman  wrote:
>
> On Tue, Mar 26, 2024 at 03:50:35PM -0700, Mina Almasry wrote:
> > Add a netdev_dmabuf_binding struct which represents the
> > dma-buf-to-netdevice binding. The netlink API will bind the dma-buf to
> > rx queues on the netdevice. On the binding, the dma_buf_attach
> > & dma_buf_map_attachment will occur. The entries in the sg_table from
> > mapping will be inserted into a genpool to make it ready
> > for allocation.
> >
> > The chunks in the genpool are owned by a dmabuf_chunk_owner struct which
> > holds the dma-buf offset of the base of the chunk and the dma_addr of
> > the chunk. Both are needed to use allocations that come from this chunk.
> >
> > We create a new type that represents an allocation from the genpool:
> > net_iov. We setup the net_iov allocation size in the
> > genpool to PAGE_SIZE for simplicity: to match the PAGE_SIZE normally
> > allocated by the page pool and given to the drivers.
> >
> > The user can unbind the dmabuf from the netdevice by closing the netlink
> > socket that established the binding. We do this so that the binding is
> > automatically unbound even if the userspace process crashes.
> >
> > The binding and unbinding leaves an indicator in struct netdev_rx_queue
> > that the given queue is bound, but the binding doesn't take effect until
> > the driver actually reconfigures its queues, and re-initializes its page
> > pool.
> >
> > The netdev_dmabuf_binding struct is refcounted, and releases its
> > resources only when all the refs are released.
> >
> > Signed-off-by: Willem de Bruijn 
> > Signed-off-by: Kaiyuan Zhang 
> > Signed-off-by: Mina Almasry 
>
> ...
>
> > +int net_devmem_bind_dmabuf_to_queue(struct net_device *dev, u32 rxq_idx,
> > + struct net_devmem_dmabuf_binding *binding)
> > +{
> > + struct netdev_rx_queue *rxq;
> > + u32 xa_idx;
> > + int err;
> > +
> > + if (rxq_idx >= dev->num_rx_queues)
> > + return -ERANGE;
> > +
> > + rxq = __netif_get_rx_queue(dev, rxq_idx);
> > + if (rxq->mp_params.mp_priv)
> > + return -EEXIST;
> > +
> > + err = xa_alloc(&binding->bound_rxq_list, &xa_idx, rxq, xa_limit_32b,
> > +GFP_KERNEL);
> > + if (err)
> > + return err;
> > +
> > + /* We hold the rtnl_lock while binding/unbinding dma-buf, so we can't
> > +  * race with another thread that is also modifying this value. 
> > However,
> > +  * the driver may read this config while it's creating its * 
> > rx-queues.
> > +  * WRITE_ONCE() here to match the READ_ONCE() in the driver.
> > +  */
> > + WRITE_ONCE(rxq->mp_params.mp_ops, &dmabuf_devmem_ops);
>
> Hi Mina,
>
> This causes a build failure because mabuf_devmem_ops is not added until a
> subsequent patch in this series.
>

My apologies. I do notice the failure in patchwork now. I'll do a
patch by patch build for the next iteration.

> > + WRITE_ONCE(rxq->mp_params.mp_priv, binding);
> > +
> > + err = net_devmem_restart_rx_queue(dev, rxq_idx);
> > + if (err)
> > + goto err_xa_erase;
> > +
> > + return 0;
> > +
> > +err_xa_erase:
> > + WRITE_ONCE(rxq->mp_params.mp_ops, NULL);
> > + WRITE_ONCE(rxq->mp_params.mp_priv, NULL);
> > + xa_erase(&binding->bound_rxq_list, xa_idx);
> > +
> > + return err;
> > +}
>
> ...



-- 
Thanks,
Mina


Re: [RFC PATCH net-next v7 04/14] netdev: support binding dma-buf to netdevice

2024-03-28 Thread Simon Horman
On Tue, Mar 26, 2024 at 03:50:35PM -0700, Mina Almasry wrote:
> Add a netdev_dmabuf_binding struct which represents the
> dma-buf-to-netdevice binding. The netlink API will bind the dma-buf to
> rx queues on the netdevice. On the binding, the dma_buf_attach
> & dma_buf_map_attachment will occur. The entries in the sg_table from
> mapping will be inserted into a genpool to make it ready
> for allocation.
> 
> The chunks in the genpool are owned by a dmabuf_chunk_owner struct which
> holds the dma-buf offset of the base of the chunk and the dma_addr of
> the chunk. Both are needed to use allocations that come from this chunk.
> 
> We create a new type that represents an allocation from the genpool:
> net_iov. We setup the net_iov allocation size in the
> genpool to PAGE_SIZE for simplicity: to match the PAGE_SIZE normally
> allocated by the page pool and given to the drivers.
> 
> The user can unbind the dmabuf from the netdevice by closing the netlink
> socket that established the binding. We do this so that the binding is
> automatically unbound even if the userspace process crashes.
> 
> The binding and unbinding leaves an indicator in struct netdev_rx_queue
> that the given queue is bound, but the binding doesn't take effect until
> the driver actually reconfigures its queues, and re-initializes its page
> pool.
> 
> The netdev_dmabuf_binding struct is refcounted, and releases its
> resources only when all the refs are released.
> 
> Signed-off-by: Willem de Bruijn 
> Signed-off-by: Kaiyuan Zhang 
> Signed-off-by: Mina Almasry 

...

> +int net_devmem_bind_dmabuf_to_queue(struct net_device *dev, u32 rxq_idx,
> + struct net_devmem_dmabuf_binding *binding)
> +{
> + struct netdev_rx_queue *rxq;
> + u32 xa_idx;
> + int err;
> +
> + if (rxq_idx >= dev->num_rx_queues)
> + return -ERANGE;
> +
> + rxq = __netif_get_rx_queue(dev, rxq_idx);
> + if (rxq->mp_params.mp_priv)
> + return -EEXIST;
> +
> + err = xa_alloc(&binding->bound_rxq_list, &xa_idx, rxq, xa_limit_32b,
> +GFP_KERNEL);
> + if (err)
> + return err;
> +
> + /* We hold the rtnl_lock while binding/unbinding dma-buf, so we can't
> +  * race with another thread that is also modifying this value. However,
> +  * the driver may read this config while it's creating its * rx-queues.
> +  * WRITE_ONCE() here to match the READ_ONCE() in the driver.
> +  */
> + WRITE_ONCE(rxq->mp_params.mp_ops, &dmabuf_devmem_ops);

Hi Mina,

This causes a build failure because mabuf_devmem_ops is not added until a
subsequent patch in this series.

> + WRITE_ONCE(rxq->mp_params.mp_priv, binding);
> +
> + err = net_devmem_restart_rx_queue(dev, rxq_idx);
> + if (err)
> + goto err_xa_erase;
> +
> + return 0;
> +
> +err_xa_erase:
> + WRITE_ONCE(rxq->mp_params.mp_ops, NULL);
> + WRITE_ONCE(rxq->mp_params.mp_priv, NULL);
> + xa_erase(&binding->bound_rxq_list, xa_idx);
> +
> + return err;
> +}

...


[RFC PATCH net-next v7 04/14] netdev: support binding dma-buf to netdevice

2024-03-26 Thread Mina Almasry
Add a netdev_dmabuf_binding struct which represents the
dma-buf-to-netdevice binding. The netlink API will bind the dma-buf to
rx queues on the netdevice. On the binding, the dma_buf_attach
& dma_buf_map_attachment will occur. The entries in the sg_table from
mapping will be inserted into a genpool to make it ready
for allocation.

The chunks in the genpool are owned by a dmabuf_chunk_owner struct which
holds the dma-buf offset of the base of the chunk and the dma_addr of
the chunk. Both are needed to use allocations that come from this chunk.

We create a new type that represents an allocation from the genpool:
net_iov. We setup the net_iov allocation size in the
genpool to PAGE_SIZE for simplicity: to match the PAGE_SIZE normally
allocated by the page pool and given to the drivers.

The user can unbind the dmabuf from the netdevice by closing the netlink
socket that established the binding. We do this so that the binding is
automatically unbound even if the userspace process crashes.

The binding and unbinding leaves an indicator in struct netdev_rx_queue
that the given queue is bound, but the binding doesn't take effect until
the driver actually reconfigures its queues, and re-initializes its page
pool.

The netdev_dmabuf_binding struct is refcounted, and releases its
resources only when all the refs are released.

Signed-off-by: Willem de Bruijn 
Signed-off-by: Kaiyuan Zhang 
Signed-off-by: Mina Almasry 

---

v7:
- Use IS_ERR() instead of IS_ERR_OR_NULL() for the dma_buf_get() return
  value.
- Changes netdev_* naming in devmem.c to net_devmem_* (Yunsheng).
- DMA_BIDIRECTIONAL -> DMA_FROM_DEVICE (Yunsheng).
- Added a comment around recovering of the old rx queue in
  net_devmem_restart_rx_queue(), and added freeing of old_mem if the
  restart of the old queue fails. (Yunsheng).
- Use kernel-family sock-priv (Jakub).
- Put pp_memory_provider_params in netdev_rx_queue instead of the
  dma-buf specific binding (Pavel & David).
- Move queue management ops to queue_mgmt_ops instead of netdev_ops
  (Jakub).
- Remove excess whitespaces (Jakub).
- Use genlmsg_iput (Jakub).

v6:
- Validate rx queue index
- Refactor new functions into devmem.c (Pavel)

v5:
- Renamed page_pool_iov to net_iov, and moved that support to devmem.h
  or netmem.h.

v1:

- Introduce devmem.h instead of bloating netdevice.h (Jakub)
- ENOTSUPP -> EOPNOTSUPP (checkpatch.pl I think)
- Remove unneeded rcu protection for binding->list (rtnl protected)
- Removed extraneous err_binding_put: label.
- Removed dma_addr += len (Paolo).
- Don't override err on netdev_bind_dmabuf_to_queue failure.
- Rename devmem -> dmabuf (David).
- Add id to dmabuf binding (David/Stan).
- Fix missing xa_destroy bound_rq_list.
- Use queue api to reset bound RX queues (Jakub).
- Update netlink API for rx-queue type (tx/re) (Jakub).

RFC v3:
- Support multi rx-queue binding

---
 Documentation/netlink/specs/netdev.yaml |   4 +
 include/net/devmem.h| 111 +
 include/net/netdev_rx_queue.h   |   2 +
 include/net/netmem.h|  10 +
 include/net/page_pool/types.h   |   5 +
 net/core/Makefile   |   2 +-
 net/core/dev.c  |   3 +
 net/core/devmem.c   | 304 
 net/core/netdev-genl-gen.c  |   4 +
 net/core/netdev-genl-gen.h  |   4 +
 net/core/netdev-genl.c  | 105 +++-
 11 files changed, 551 insertions(+), 3 deletions(-)
 create mode 100644 include/net/devmem.h
 create mode 100644 net/core/devmem.c

diff --git a/Documentation/netlink/specs/netdev.yaml 
b/Documentation/netlink/specs/netdev.yaml
index 275d1faa87a6..bf4e58dfe9dd 100644
--- a/Documentation/netlink/specs/netdev.yaml
+++ b/Documentation/netlink/specs/netdev.yaml
@@ -550,6 +550,10 @@ operations:
 - tx-packets
 - tx-bytes
 
+kernel-family:
+  headers: [ "linux/list.h"]
+  sock-priv: struct list_head
+
 mcast-groups:
   list:
 -
diff --git a/include/net/devmem.h b/include/net/devmem.h
new file mode 100644
index ..fa03bdabdffd
--- /dev/null
+++ b/include/net/devmem.h
@@ -0,0 +1,111 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+/*
+ * Device memory TCP support
+ *
+ * Authors:Mina Almasry 
+ * Willem de Bruijn 
+ * Kaiyuan Zhang 
+ *
+ */
+#ifndef _NET_DEVMEM_H
+#define _NET_DEVMEM_H
+
+struct net_devmem_dmabuf_binding {
+   struct dma_buf *dmabuf;
+   struct dma_buf_attachment *attachment;
+   struct sg_table *sgt;
+   struct net_device *dev;
+   struct gen_pool *chunk_pool;
+
+   /* The user holds a ref (via the netlink API) for as long as they want
+* the binding to remain alive. Each page pool using this binding holds
+* a ref to keep the binding alive. Each allocated net_iov holds a
+* ref.
+*
+* The binding undos itself and unmaps the underlying dmabuf once all
+* those refs are dropp