Re: [PATCH net-next 2/2] tun: AF_XDP Rx zero-copy support

2024-01-29 Thread Jason Wang
On Mon, Jan 29, 2024 at 7:40 PM wangyunjian  wrote:
>
> > -Original Message-
> > From: Jason Wang [mailto:jasow...@redhat.com]
> > Sent: Monday, January 29, 2024 11:03 AM
> > To: wangyunjian 
> > Cc: m...@redhat.com; willemdebruijn.ker...@gmail.com; k...@kernel.org;
> > da...@davemloft.net; magnus.karls...@intel.com; net...@vger.kernel.org;
> > linux-kernel@vger.kernel.org; k...@vger.kernel.org;
> > virtualizat...@lists.linux.dev; xudingke 
> > Subject: Re: [PATCH net-next 2/2] tun: AF_XDP Rx zero-copy support
> >
> > On Thu, Jan 25, 2024 at 8:54 PM wangyunjian 
> > wrote:
> > >
> > >
> > >
> > > > -Original Message-
> > > > From: Jason Wang [mailto:jasow...@redhat.com]
> > > > Sent: Thursday, January 25, 2024 12:49 PM
> > > > To: wangyunjian 
> > > > Cc: m...@redhat.com; willemdebruijn.ker...@gmail.com;
> > > > k...@kernel.org; da...@davemloft.net; magnus.karls...@intel.com;
> > > > net...@vger.kernel.org; linux-kernel@vger.kernel.org;
> > > > k...@vger.kernel.org; virtualizat...@lists.linux.dev; xudingke
> > > > 
> > > > Subject: Re: [PATCH net-next 2/2] tun: AF_XDP Rx zero-copy support
> > > >
> > > > On Wed, Jan 24, 2024 at 5:38 PM Yunjian Wang
> > > > 
> > > > wrote:
> > > > >
> > > > > Now the zero-copy feature of AF_XDP socket is supported by some
> > > > > drivers, which can reduce CPU utilization on the xdp program.
> > > > > This patch set allows tun to support AF_XDP Rx zero-copy feature.
> > > > >
> > > > > This patch tries to address this by:
> > > > > - Use peek_len to consume a xsk->desc and get xsk->desc length.
> > > > > - When the tun support AF_XDP Rx zero-copy, the vq's array maybe 
> > > > > empty.
> > > > > So add a check for empty vq's array in vhost_net_buf_produce().
> > > > > - add XDP_SETUP_XSK_POOL and ndo_xsk_wakeup callback support
> > > > > - add tun_put_user_desc function to copy the Rx data to VM
> > > >
> > > > Code explains themselves, let's explain why you need to do this.
> > > >
> > > > 1) why you want to use peek_len
> > > > 2) for "vq's array", what does it mean?
> > > > 3) from the view of TUN/TAP tun_put_user_desc() is the TX path, so I
> > > > guess you meant TX zerocopy instead of RX (as I don't see codes for
> > > > RX?)
> > >
> > > OK, I agree and use TX zerocopy instead of RX zerocopy. I meant RX
> > > zerocopy from the view of vhost-net.
> >
> > Ok.
> >
> > >
> > > >
> > > > A big question is how could you handle GSO packets from
> > userspace/guests?
> > >
> > > Now by disabling VM's TSO and csum feature.
> >
> > Btw, how could you do that?
>
> By set network backend-specific options:
> 
>  mrg_rxbuf='off'/>
> 
> 

This is the mgmt work, but the problem is what happens if GSO is not
disabled in the guest, or is there a way to:

1) forcing the guest GSO to be off
2) a graceful fallback

Thanks

>
> Thanks
>
> >
> > Thanks
> >
>




Re: [PATCH net-next 2/2] tun: AF_XDP Rx zero-copy support

2024-01-29 Thread Jason Wang
On Mon, Jan 29, 2024 at 7:10 PM wangyunjian  wrote:
>
> > -Original Message-
> > From: Jason Wang [mailto:jasow...@redhat.com]
> > Sent: Monday, January 29, 2024 11:05 AM
> > To: wangyunjian 
> > Cc: m...@redhat.com; willemdebruijn.ker...@gmail.com; k...@kernel.org;
> > da...@davemloft.net; magnus.karls...@intel.com; net...@vger.kernel.org;
> > linux-kernel@vger.kernel.org; k...@vger.kernel.org;
> > virtualizat...@lists.linux.dev; xudingke 
> > Subject: Re: [PATCH net-next 2/2] tun: AF_XDP Rx zero-copy support
> >
> > On Sat, Jan 27, 2024 at 5:34 PM wangyunjian 
> > wrote:
> > >
> > > > > -Original Message-
> > > > > From: Jason Wang [mailto:jasow...@redhat.com]
> > > > > Sent: Thursday, January 25, 2024 12:49 PM
> > > > > To: wangyunjian 
> > > > > Cc: m...@redhat.com; willemdebruijn.ker...@gmail.com;
> > > > > k...@kernel.org; da...@davemloft.net; magnus.karls...@intel.com;
> > > > > net...@vger.kernel.org; linux-kernel@vger.kernel.org;
> > > > > k...@vger.kernel.org; virtualizat...@lists.linux.dev; xudingke
> > > > > 
> > > > > Subject: Re: [PATCH net-next 2/2] tun: AF_XDP Rx zero-copy support
> > > > >
> > > > > On Wed, Jan 24, 2024 at 5:38 PM Yunjian Wang
> > > > 
> > > > > wrote:
> > > > > >
> > > > > > Now the zero-copy feature of AF_XDP socket is supported by some
> > > > > > drivers, which can reduce CPU utilization on the xdp program.
> > > > > > This patch set allows tun to support AF_XDP Rx zero-copy feature.
> > > > > >
> > > > > > This patch tries to address this by:
> > > > > > - Use peek_len to consume a xsk->desc and get xsk->desc length.
> > > > > > - When the tun support AF_XDP Rx zero-copy, the vq's array maybe
> > empty.
> > > > > > So add a check for empty vq's array in vhost_net_buf_produce().
> > > > > > - add XDP_SETUP_XSK_POOL and ndo_xsk_wakeup callback support
> > > > > > - add tun_put_user_desc function to copy the Rx data to VM
> > > > >
> > > > > Code explains themselves, let's explain why you need to do this.
> > > > >
> > > > > 1) why you want to use peek_len
> > > > > 2) for "vq's array", what does it mean?
> > > > > 3) from the view of TUN/TAP tun_put_user_desc() is the TX path, so
> > > > > I guess you meant TX zerocopy instead of RX (as I don't see codes
> > > > > for
> > > > > RX?)
> > > >
> > > > OK, I agree and use TX zerocopy instead of RX zerocopy. I meant RX
> > > > zerocopy from the view of vhost-net.
> > > >
> > > > >
> > > > > A big question is how could you handle GSO packets from
> > userspace/guests?
> > > >
> > > > Now by disabling VM's TSO and csum feature. XDP does not support GSO
> > > > packets.
> > > > However, this feature can be added once XDP supports it in the future.
> > > >
> > > > >
> > > > > >
> > > > > > Signed-off-by: Yunjian Wang 
> > > > > > ---
> > > > > >  drivers/net/tun.c   | 165
> > > > > +++-
> > > > > >  drivers/vhost/net.c |  18 +++--
> > > > > >  2 files changed, 176 insertions(+), 7 deletions(-)
> > >
> > > [...]
> > >
> > > > > >
> > > > > >  static int peek_head_len(struct vhost_net_virtqueue *rvq,
> > > > > > struct sock
> > > > > > *sk)  {
> > > > > > +   struct socket *sock = sk->sk_socket;
> > > > > > struct sk_buff *head;
> > > > > > int len = 0;
> > > > > > unsigned long flags;
> > > > > >
> > > > > > -   if (rvq->rx_ring)
> > > > > > -   return vhost_net_buf_peek(rvq);
> > > > > > +   if (rvq->rx_ring) {
> > > > > > +   len = vhost_net_buf_peek(rvq);
> > > > > > +   if (likely(len))
> > > > > > +   return len;
> > > > > > +   }
> > > > > > +
> > > > > > +   if (sock->ops->peek_len)
> > > > > > +   return sock->ops->peek_len(sock);
> > > > >
> > > > > What prevents you from reusing the ptr_ring here? Then you don't
> > > > > need the above tricks.
> > > >
> > > > Thank you for your suggestion. I will consider how to reuse the 
> > > > ptr_ring.
> > >
> > > If ptr_ring is used to transfer xdp_descs, there is a problem: After
> > > some xdp_descs are obtained through xsk_tx_peek_desc(), the descs may
> > > fail to be added to ptr_ring. However, no API is available to
> > > implement the rollback function.
> >
> > I don't understand, this issue seems to exist in the physical NIC as well?
> >
> > We get more descriptors than the free slots in the NIC ring.
> >
> > How did other NIC solve this issue?
>
> Currently, physical NICs such as i40e, ice, ixgbe, igc, and mlx5 obtains
> available NIC descriptors and then retrieve the same number of xsk
> descriptors for processing.

Any reason we can't do the same? ptr_ring should be much simpler than
NIC ring anyhow.

Thanks

>
> Thanks
>
> >
> > Thanks
> >
> > >
> > > Thanks
> > >
> > > >
> > > > >
> > > > > Thanks
> > > > >
> > > > >
> > > > > >
> > > > > > spin_lock_irqsave(>sk_receive_queue.lock, flags);
> > > > > > head = skb_peek(>sk_receive_queue);
> > > > > > --
> > > > > > 2.33.0
> > > > > >
> > >
>




RE: [PATCH net-next 2/2] tun: AF_XDP Rx zero-copy support

2024-01-29 Thread wangyunjian
> -Original Message-
> From: Jason Wang [mailto:jasow...@redhat.com]
> Sent: Monday, January 29, 2024 11:03 AM
> To: wangyunjian 
> Cc: m...@redhat.com; willemdebruijn.ker...@gmail.com; k...@kernel.org;
> da...@davemloft.net; magnus.karls...@intel.com; net...@vger.kernel.org;
> linux-kernel@vger.kernel.org; k...@vger.kernel.org;
> virtualizat...@lists.linux.dev; xudingke 
> Subject: Re: [PATCH net-next 2/2] tun: AF_XDP Rx zero-copy support
> 
> On Thu, Jan 25, 2024 at 8:54 PM wangyunjian 
> wrote:
> >
> >
> >
> > > -Original Message-
> > > From: Jason Wang [mailto:jasow...@redhat.com]
> > > Sent: Thursday, January 25, 2024 12:49 PM
> > > To: wangyunjian 
> > > Cc: m...@redhat.com; willemdebruijn.ker...@gmail.com;
> > > k...@kernel.org; da...@davemloft.net; magnus.karls...@intel.com;
> > > net...@vger.kernel.org; linux-kernel@vger.kernel.org;
> > > k...@vger.kernel.org; virtualizat...@lists.linux.dev; xudingke
> > > 
> > > Subject: Re: [PATCH net-next 2/2] tun: AF_XDP Rx zero-copy support
> > >
> > > On Wed, Jan 24, 2024 at 5:38 PM Yunjian Wang
> > > 
> > > wrote:
> > > >
> > > > Now the zero-copy feature of AF_XDP socket is supported by some
> > > > drivers, which can reduce CPU utilization on the xdp program.
> > > > This patch set allows tun to support AF_XDP Rx zero-copy feature.
> > > >
> > > > This patch tries to address this by:
> > > > - Use peek_len to consume a xsk->desc and get xsk->desc length.
> > > > - When the tun support AF_XDP Rx zero-copy, the vq's array maybe empty.
> > > > So add a check for empty vq's array in vhost_net_buf_produce().
> > > > - add XDP_SETUP_XSK_POOL and ndo_xsk_wakeup callback support
> > > > - add tun_put_user_desc function to copy the Rx data to VM
> > >
> > > Code explains themselves, let's explain why you need to do this.
> > >
> > > 1) why you want to use peek_len
> > > 2) for "vq's array", what does it mean?
> > > 3) from the view of TUN/TAP tun_put_user_desc() is the TX path, so I
> > > guess you meant TX zerocopy instead of RX (as I don't see codes for
> > > RX?)
> >
> > OK, I agree and use TX zerocopy instead of RX zerocopy. I meant RX
> > zerocopy from the view of vhost-net.
> 
> Ok.
> 
> >
> > >
> > > A big question is how could you handle GSO packets from
> userspace/guests?
> >
> > Now by disabling VM's TSO and csum feature.
> 
> Btw, how could you do that?

By set network backend-specific options:





Thanks

> 
> Thanks
> 



RE: [PATCH net-next 2/2] tun: AF_XDP Rx zero-copy support

2024-01-29 Thread wangyunjian
> -Original Message-
> From: Jason Wang [mailto:jasow...@redhat.com]
> Sent: Monday, January 29, 2024 11:05 AM
> To: wangyunjian 
> Cc: m...@redhat.com; willemdebruijn.ker...@gmail.com; k...@kernel.org;
> da...@davemloft.net; magnus.karls...@intel.com; net...@vger.kernel.org;
> linux-kernel@vger.kernel.org; k...@vger.kernel.org;
> virtualizat...@lists.linux.dev; xudingke 
> Subject: Re: [PATCH net-next 2/2] tun: AF_XDP Rx zero-copy support
> 
> On Sat, Jan 27, 2024 at 5:34 PM wangyunjian 
> wrote:
> >
> > > > -Original Message-
> > > > From: Jason Wang [mailto:jasow...@redhat.com]
> > > > Sent: Thursday, January 25, 2024 12:49 PM
> > > > To: wangyunjian 
> > > > Cc: m...@redhat.com; willemdebruijn.ker...@gmail.com;
> > > > k...@kernel.org; da...@davemloft.net; magnus.karls...@intel.com;
> > > > net...@vger.kernel.org; linux-kernel@vger.kernel.org;
> > > > k...@vger.kernel.org; virtualizat...@lists.linux.dev; xudingke
> > > > 
> > > > Subject: Re: [PATCH net-next 2/2] tun: AF_XDP Rx zero-copy support
> > > >
> > > > On Wed, Jan 24, 2024 at 5:38 PM Yunjian Wang
> > > 
> > > > wrote:
> > > > >
> > > > > Now the zero-copy feature of AF_XDP socket is supported by some
> > > > > drivers, which can reduce CPU utilization on the xdp program.
> > > > > This patch set allows tun to support AF_XDP Rx zero-copy feature.
> > > > >
> > > > > This patch tries to address this by:
> > > > > - Use peek_len to consume a xsk->desc and get xsk->desc length.
> > > > > - When the tun support AF_XDP Rx zero-copy, the vq's array maybe
> empty.
> > > > > So add a check for empty vq's array in vhost_net_buf_produce().
> > > > > - add XDP_SETUP_XSK_POOL and ndo_xsk_wakeup callback support
> > > > > - add tun_put_user_desc function to copy the Rx data to VM
> > > >
> > > > Code explains themselves, let's explain why you need to do this.
> > > >
> > > > 1) why you want to use peek_len
> > > > 2) for "vq's array", what does it mean?
> > > > 3) from the view of TUN/TAP tun_put_user_desc() is the TX path, so
> > > > I guess you meant TX zerocopy instead of RX (as I don't see codes
> > > > for
> > > > RX?)
> > >
> > > OK, I agree and use TX zerocopy instead of RX zerocopy. I meant RX
> > > zerocopy from the view of vhost-net.
> > >
> > > >
> > > > A big question is how could you handle GSO packets from
> userspace/guests?
> > >
> > > Now by disabling VM's TSO and csum feature. XDP does not support GSO
> > > packets.
> > > However, this feature can be added once XDP supports it in the future.
> > >
> > > >
> > > > >
> > > > > Signed-off-by: Yunjian Wang 
> > > > > ---
> > > > >  drivers/net/tun.c   | 165
> > > > +++-
> > > > >  drivers/vhost/net.c |  18 +++--
> > > > >  2 files changed, 176 insertions(+), 7 deletions(-)
> >
> > [...]
> >
> > > > >
> > > > >  static int peek_head_len(struct vhost_net_virtqueue *rvq,
> > > > > struct sock
> > > > > *sk)  {
> > > > > +   struct socket *sock = sk->sk_socket;
> > > > > struct sk_buff *head;
> > > > > int len = 0;
> > > > > unsigned long flags;
> > > > >
> > > > > -   if (rvq->rx_ring)
> > > > > -   return vhost_net_buf_peek(rvq);
> > > > > +   if (rvq->rx_ring) {
> > > > > +   len = vhost_net_buf_peek(rvq);
> > > > > +   if (likely(len))
> > > > > +   return len;
> > > > > +   }
> > > > > +
> > > > > +   if (sock->ops->peek_len)
> > > > > +   return sock->ops->peek_len(sock);
> > > >
> > > > What prevents you from reusing the ptr_ring here? Then you don't
> > > > need the above tricks.
> > >
> > > Thank you for your suggestion. I will consider how to reuse the ptr_ring.
> >
> > If ptr_ring is used to transfer xdp_descs, there is a problem: After
> > some xdp_descs are obtained through xsk_tx_peek_desc(), the descs may
> > fail to be added to ptr_ring. However, no API is available to
> > implement the rollback function.
> 
> I don't understand, this issue seems to exist in the physical NIC as well?
> 
> We get more descriptors than the free slots in the NIC ring.
> 
> How did other NIC solve this issue?

Currently, physical NICs such as i40e, ice, ixgbe, igc, and mlx5 obtains
available NIC descriptors and then retrieve the same number of xsk
descriptors for processing.

Thanks

> 
> Thanks
> 
> >
> > Thanks
> >
> > >
> > > >
> > > > Thanks
> > > >
> > > >
> > > > >
> > > > > spin_lock_irqsave(>sk_receive_queue.lock, flags);
> > > > > head = skb_peek(>sk_receive_queue);
> > > > > --
> > > > > 2.33.0
> > > > >
> >



Re: [PATCH net-next 2/2] tun: AF_XDP Rx zero-copy support

2024-01-28 Thread Jason Wang
On Sat, Jan 27, 2024 at 5:34 PM wangyunjian  wrote:
>
> > > -Original Message-
> > > From: Jason Wang [mailto:jasow...@redhat.com]
> > > Sent: Thursday, January 25, 2024 12:49 PM
> > > To: wangyunjian 
> > > Cc: m...@redhat.com; willemdebruijn.ker...@gmail.com; k...@kernel.org;
> > > da...@davemloft.net; magnus.karls...@intel.com;
> > > net...@vger.kernel.org; linux-kernel@vger.kernel.org;
> > > k...@vger.kernel.org; virtualizat...@lists.linux.dev; xudingke
> > > 
> > > Subject: Re: [PATCH net-next 2/2] tun: AF_XDP Rx zero-copy support
> > >
> > > On Wed, Jan 24, 2024 at 5:38 PM Yunjian Wang
> > 
> > > wrote:
> > > >
> > > > Now the zero-copy feature of AF_XDP socket is supported by some
> > > > drivers, which can reduce CPU utilization on the xdp program.
> > > > This patch set allows tun to support AF_XDP Rx zero-copy feature.
> > > >
> > > > This patch tries to address this by:
> > > > - Use peek_len to consume a xsk->desc and get xsk->desc length.
> > > > - When the tun support AF_XDP Rx zero-copy, the vq's array maybe empty.
> > > > So add a check for empty vq's array in vhost_net_buf_produce().
> > > > - add XDP_SETUP_XSK_POOL and ndo_xsk_wakeup callback support
> > > > - add tun_put_user_desc function to copy the Rx data to VM
> > >
> > > Code explains themselves, let's explain why you need to do this.
> > >
> > > 1) why you want to use peek_len
> > > 2) for "vq's array", what does it mean?
> > > 3) from the view of TUN/TAP tun_put_user_desc() is the TX path, so I
> > > guess you meant TX zerocopy instead of RX (as I don't see codes for
> > > RX?)
> >
> > OK, I agree and use TX zerocopy instead of RX zerocopy. I meant RX zerocopy
> > from the view of vhost-net.
> >
> > >
> > > A big question is how could you handle GSO packets from userspace/guests?
> >
> > Now by disabling VM's TSO and csum feature. XDP does not support GSO
> > packets.
> > However, this feature can be added once XDP supports it in the future.
> >
> > >
> > > >
> > > > Signed-off-by: Yunjian Wang 
> > > > ---
> > > >  drivers/net/tun.c   | 165
> > > +++-
> > > >  drivers/vhost/net.c |  18 +++--
> > > >  2 files changed, 176 insertions(+), 7 deletions(-)
>
> [...]
>
> > > >
> > > >  static int peek_head_len(struct vhost_net_virtqueue *rvq, struct
> > > > sock
> > > > *sk)  {
> > > > +   struct socket *sock = sk->sk_socket;
> > > > struct sk_buff *head;
> > > > int len = 0;
> > > > unsigned long flags;
> > > >
> > > > -   if (rvq->rx_ring)
> > > > -   return vhost_net_buf_peek(rvq);
> > > > +   if (rvq->rx_ring) {
> > > > +   len = vhost_net_buf_peek(rvq);
> > > > +   if (likely(len))
> > > > +   return len;
> > > > +   }
> > > > +
> > > > +   if (sock->ops->peek_len)
> > > > +   return sock->ops->peek_len(sock);
> > >
> > > What prevents you from reusing the ptr_ring here? Then you don't need
> > > the above tricks.
> >
> > Thank you for your suggestion. I will consider how to reuse the ptr_ring.
>
> If ptr_ring is used to transfer xdp_descs, there is a problem: After some
> xdp_descs are obtained through xsk_tx_peek_desc(), the descs may fail
> to be added to ptr_ring. However, no API is available to implement the
> rollback function.

I don't understand, this issue seems to exist in the physical NIC as well?

We get more descriptors than the free slots in the NIC ring.

How did other NIC solve this issue?

Thanks

>
> Thanks
>
> >
> > >
> > > Thanks
> > >
> > >
> > > >
> > > > spin_lock_irqsave(>sk_receive_queue.lock, flags);
> > > > head = skb_peek(>sk_receive_queue);
> > > > --
> > > > 2.33.0
> > > >
>




Re: [PATCH net-next 2/2] tun: AF_XDP Rx zero-copy support

2024-01-28 Thread Jason Wang
On Thu, Jan 25, 2024 at 8:54 PM wangyunjian  wrote:
>
>
>
> > -Original Message-
> > From: Jason Wang [mailto:jasow...@redhat.com]
> > Sent: Thursday, January 25, 2024 12:49 PM
> > To: wangyunjian 
> > Cc: m...@redhat.com; willemdebruijn.ker...@gmail.com; k...@kernel.org;
> > da...@davemloft.net; magnus.karls...@intel.com; net...@vger.kernel.org;
> > linux-kernel@vger.kernel.org; k...@vger.kernel.org;
> > virtualizat...@lists.linux.dev; xudingke 
> > Subject: Re: [PATCH net-next 2/2] tun: AF_XDP Rx zero-copy support
> >
> > On Wed, Jan 24, 2024 at 5:38 PM Yunjian Wang 
> > wrote:
> > >
> > > Now the zero-copy feature of AF_XDP socket is supported by some
> > > drivers, which can reduce CPU utilization on the xdp program.
> > > This patch set allows tun to support AF_XDP Rx zero-copy feature.
> > >
> > > This patch tries to address this by:
> > > - Use peek_len to consume a xsk->desc and get xsk->desc length.
> > > - When the tun support AF_XDP Rx zero-copy, the vq's array maybe empty.
> > > So add a check for empty vq's array in vhost_net_buf_produce().
> > > - add XDP_SETUP_XSK_POOL and ndo_xsk_wakeup callback support
> > > - add tun_put_user_desc function to copy the Rx data to VM
> >
> > Code explains themselves, let's explain why you need to do this.
> >
> > 1) why you want to use peek_len
> > 2) for "vq's array", what does it mean?
> > 3) from the view of TUN/TAP tun_put_user_desc() is the TX path, so I guess 
> > you
> > meant TX zerocopy instead of RX (as I don't see codes for
> > RX?)
>
> OK, I agree and use TX zerocopy instead of RX zerocopy. I meant RX zerocopy
> from the view of vhost-net.

Ok.

>
> >
> > A big question is how could you handle GSO packets from userspace/guests?
>
> Now by disabling VM's TSO and csum feature.

Btw, how could you do that?

Thanks




Re: [PATCH net-next 2/2] tun: AF_XDP Rx zero-copy support

2024-01-28 Thread kernel test robot
Hi Yunjian,

kernel test robot noticed the following build warnings:

[auto build test WARNING on net-next/main]

url:
https://github.com/intel-lab-lkp/linux/commits/Yunjian-Wang/xsk-Remove-non-zero-dma_page-check-in-xp_assign_dev/20240124-174011
base:   net-next/main
patch link:
https://lore.kernel.org/r/1706089075-16084-1-git-send-email-wangyunjian%40huawei.com
patch subject: [PATCH net-next 2/2] tun: AF_XDP Rx zero-copy support
config: i386-randconfig-062-20240127 
(https://download.01.org/0day-ci/archive/20240128/202401281735.bx1dign9-...@intel.com/config)
compiler: gcc-12 (Debian 12.2.0-14) 12.2.0
reproduce (this is a W=1 build): 
(https://download.01.org/0day-ci/archive/20240128/202401281735.bx1dign9-...@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot 
| Closes: 
https://lore.kernel.org/oe-kbuild-all/202401281735.bx1dign9-...@intel.com/

sparse warnings: (new ones prefixed by >>)
>> drivers/net/tun.c:1298:5: sparse: sparse: symbol 'tun_xsk_pool_setup' was 
>> not declared. Should it be static?
   drivers/net/tun.c: note: in included file (through include/linux/mmzone.h, 
include/linux/gfp.h, include/linux/umh.h, include/linux/kmod.h, ...):
   include/linux/page-flags.h:242:46: sparse: sparse: self-comparison always 
evaluates to false

vim +/tun_xsk_pool_setup +1298 drivers/net/tun.c

  1297  
> 1298  int tun_xsk_pool_setup(struct net_device *dev, struct xsk_buff_pool 
> *pool,
  1299 u16 qid)
  1300  {
  1301  return pool ? tun_xsk_pool_enable(dev, pool, qid) :
  1302  tun_xsk_pool_disable(dev, qid);
  1303  }
  1304  

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki



Re: [PATCH net-next 2/2] tun: AF_XDP Rx zero-copy support

2024-01-28 Thread kernel test robot
Hi Yunjian,

kernel test robot noticed the following build warnings:

[auto build test WARNING on net-next/main]

url:
https://github.com/intel-lab-lkp/linux/commits/Yunjian-Wang/xsk-Remove-non-zero-dma_page-check-in-xp_assign_dev/20240124-174011
base:   net-next/main
patch link:
https://lore.kernel.org/r/1706089075-16084-1-git-send-email-wangyunjian%40huawei.com
patch subject: [PATCH net-next 2/2] tun: AF_XDP Rx zero-copy support
config: s390-defconfig 
(https://download.01.org/0day-ci/archive/20240128/202401281639.ybajz4sh-...@intel.com/config)
compiler: s390-linux-gcc (GCC) 13.2.0
reproduce (this is a W=1 build): 
(https://download.01.org/0day-ci/archive/20240128/202401281639.ybajz4sh-...@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot 
| Closes: 
https://lore.kernel.org/oe-kbuild-all/202401281639.ybajz4sh-...@intel.com/

All warnings (new ones prefixed by >>):

>> drivers/net/tun.c:1298:5: warning: no previous prototype for 
>> 'tun_xsk_pool_setup' [-Wmissing-prototypes]
1298 | int tun_xsk_pool_setup(struct net_device *dev, struct xsk_buff_pool 
*pool,
 | ^~


vim +/tun_xsk_pool_setup +1298 drivers/net/tun.c

  1297  
> 1298  int tun_xsk_pool_setup(struct net_device *dev, struct xsk_buff_pool 
> *pool,
  1299 u16 qid)
  1300  {
  1301  return pool ? tun_xsk_pool_enable(dev, pool, qid) :
  1302  tun_xsk_pool_disable(dev, qid);
  1303  }
  1304  

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki



Re: [PATCH net-next 2/2] tun: AF_XDP Rx zero-copy support

2024-01-27 Thread kernel test robot
Hi Yunjian,

kernel test robot noticed the following build warnings:

[auto build test WARNING on net-next/main]

url:
https://github.com/intel-lab-lkp/linux/commits/Yunjian-Wang/xsk-Remove-non-zero-dma_page-check-in-xp_assign_dev/20240124-174011
base:   net-next/main
patch link:
https://lore.kernel.org/r/1706089075-16084-1-git-send-email-wangyunjian%40huawei.com
patch subject: [PATCH net-next 2/2] tun: AF_XDP Rx zero-copy support
config: um-allmodconfig 
(https://download.01.org/0day-ci/archive/20240127/202401271943.bs1gpeeu-...@intel.com/config)
compiler: clang version 18.0.0git (https://github.com/llvm/llvm-project 
a31a60074717fc40887cfe132b77eec93bedd307)
reproduce (this is a W=1 build): 
(https://download.01.org/0day-ci/archive/20240127/202401271943.bs1gpeeu-...@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot 
| Closes: 
https://lore.kernel.org/oe-kbuild-all/202401271943.bs1gpeeu-...@intel.com/

All warnings (new ones prefixed by >>):

   In file included from drivers/net/tun.c:44:
   In file included from include/linux/skbuff.h:17:
   In file included from include/linux/bvec.h:10:
   In file included from include/linux/highmem.h:12:
   In file included from include/linux/hardirq.h:11:
   In file included from arch/um/include/asm/hardirq.h:5:
   In file included from include/asm-generic/hardirq.h:17:
   In file included from include/linux/irq.h:20:
   In file included from include/linux/io.h:13:
   In file included from arch/um/include/asm/io.h:24:
   include/asm-generic/io.h:547:31: warning: performing pointer arithmetic on a 
null pointer has undefined behavior [-Wnull-pointer-arithmetic]
 547 | val = __raw_readb(PCI_IOBASE + addr);
 |   ~~ ^
   include/asm-generic/io.h:560:61: warning: performing pointer arithmetic on a 
null pointer has undefined behavior [-Wnull-pointer-arithmetic]
 560 | val = __le16_to_cpu((__le16 __force)__raw_readw(PCI_IOBASE + 
addr));
 | ~~ ^
   include/uapi/linux/byteorder/little_endian.h:37:51: note: expanded from 
macro '__le16_to_cpu'
  37 | #define __le16_to_cpu(x) ((__force __u16)(__le16)(x))
 |   ^
   In file included from drivers/net/tun.c:44:
   In file included from include/linux/skbuff.h:17:
   In file included from include/linux/bvec.h:10:
   In file included from include/linux/highmem.h:12:
   In file included from include/linux/hardirq.h:11:
   In file included from arch/um/include/asm/hardirq.h:5:
   In file included from include/asm-generic/hardirq.h:17:
   In file included from include/linux/irq.h:20:
   In file included from include/linux/io.h:13:
   In file included from arch/um/include/asm/io.h:24:
   include/asm-generic/io.h:573:61: warning: performing pointer arithmetic on a 
null pointer has undefined behavior [-Wnull-pointer-arithmetic]
 573 | val = __le32_to_cpu((__le32 __force)__raw_readl(PCI_IOBASE + 
addr));
 | ~~ ^
   include/uapi/linux/byteorder/little_endian.h:35:51: note: expanded from 
macro '__le32_to_cpu'
  35 | #define __le32_to_cpu(x) ((__force __u32)(__le32)(x))
 |   ^
   In file included from drivers/net/tun.c:44:
   In file included from include/linux/skbuff.h:17:
   In file included from include/linux/bvec.h:10:
   In file included from include/linux/highmem.h:12:
   In file included from include/linux/hardirq.h:11:
   In file included from arch/um/include/asm/hardirq.h:5:
   In file included from include/asm-generic/hardirq.h:17:
   In file included from include/linux/irq.h:20:
   In file included from include/linux/io.h:13:
   In file included from arch/um/include/asm/io.h:24:
   include/asm-generic/io.h:584:33: warning: performing pointer arithmetic on a 
null pointer has undefined behavior [-Wnull-pointer-arithmetic]
 584 | __raw_writeb(value, PCI_IOBASE + addr);
 | ~~ ^
   include/asm-generic/io.h:594:59: warning: performing pointer arithmetic on a 
null pointer has undefined behavior [-Wnull-pointer-arithmetic]
 594 | __raw_writew((u16 __force)cpu_to_le16(value), PCI_IOBASE + 
addr);
 |   ~~ ^
   include/asm-generic/io.h:604:59: warning: performing pointer arithmetic on a 
null pointer has undefined behavior [-Wnull-pointer-arithmetic]
 604 | __raw_writel((u32 __force)cpu_to_le32(value), PCI_IOBASE + 
addr);
 |   ~~ ^
   include/asm-generic/io.h:692:20: warning: performing pointer arithmetic on a 
null pointer has undefined behavior 

RE: [PATCH net-next 2/2] tun: AF_XDP Rx zero-copy support

2024-01-27 Thread wangyunjian
> > -Original Message-
> > From: Jason Wang [mailto:jasow...@redhat.com]
> > Sent: Thursday, January 25, 2024 12:49 PM
> > To: wangyunjian 
> > Cc: m...@redhat.com; willemdebruijn.ker...@gmail.com; k...@kernel.org;
> > da...@davemloft.net; magnus.karls...@intel.com;
> > net...@vger.kernel.org; linux-kernel@vger.kernel.org;
> > k...@vger.kernel.org; virtualizat...@lists.linux.dev; xudingke
> > 
> > Subject: Re: [PATCH net-next 2/2] tun: AF_XDP Rx zero-copy support
> >
> > On Wed, Jan 24, 2024 at 5:38 PM Yunjian Wang
> 
> > wrote:
> > >
> > > Now the zero-copy feature of AF_XDP socket is supported by some
> > > drivers, which can reduce CPU utilization on the xdp program.
> > > This patch set allows tun to support AF_XDP Rx zero-copy feature.
> > >
> > > This patch tries to address this by:
> > > - Use peek_len to consume a xsk->desc and get xsk->desc length.
> > > - When the tun support AF_XDP Rx zero-copy, the vq's array maybe empty.
> > > So add a check for empty vq's array in vhost_net_buf_produce().
> > > - add XDP_SETUP_XSK_POOL and ndo_xsk_wakeup callback support
> > > - add tun_put_user_desc function to copy the Rx data to VM
> >
> > Code explains themselves, let's explain why you need to do this.
> >
> > 1) why you want to use peek_len
> > 2) for "vq's array", what does it mean?
> > 3) from the view of TUN/TAP tun_put_user_desc() is the TX path, so I
> > guess you meant TX zerocopy instead of RX (as I don't see codes for
> > RX?)
> 
> OK, I agree and use TX zerocopy instead of RX zerocopy. I meant RX zerocopy
> from the view of vhost-net.
> 
> >
> > A big question is how could you handle GSO packets from userspace/guests?
> 
> Now by disabling VM's TSO and csum feature. XDP does not support GSO
> packets.
> However, this feature can be added once XDP supports it in the future.
> 
> >
> > >
> > > Signed-off-by: Yunjian Wang 
> > > ---
> > >  drivers/net/tun.c   | 165
> > +++-
> > >  drivers/vhost/net.c |  18 +++--
> > >  2 files changed, 176 insertions(+), 7 deletions(-)

[...]

> > >
> > >  static int peek_head_len(struct vhost_net_virtqueue *rvq, struct
> > > sock
> > > *sk)  {
> > > +   struct socket *sock = sk->sk_socket;
> > > struct sk_buff *head;
> > > int len = 0;
> > > unsigned long flags;
> > >
> > > -   if (rvq->rx_ring)
> > > -   return vhost_net_buf_peek(rvq);
> > > +   if (rvq->rx_ring) {
> > > +   len = vhost_net_buf_peek(rvq);
> > > +   if (likely(len))
> > > +   return len;
> > > +   }
> > > +
> > > +   if (sock->ops->peek_len)
> > > +   return sock->ops->peek_len(sock);
> >
> > What prevents you from reusing the ptr_ring here? Then you don't need
> > the above tricks.
> 
> Thank you for your suggestion. I will consider how to reuse the ptr_ring.

If ptr_ring is used to transfer xdp_descs, there is a problem: After some
xdp_descs are obtained through xsk_tx_peek_desc(), the descs may fail
to be added to ptr_ring. However, no API is available to implement the
rollback function.

Thanks

> 
> >
> > Thanks
> >
> >
> > >
> > > spin_lock_irqsave(>sk_receive_queue.lock, flags);
> > > head = skb_peek(>sk_receive_queue);
> > > --
> > > 2.33.0
> > >



RE: [PATCH net-next 2/2] tun: AF_XDP Rx zero-copy support

2024-01-25 Thread wangyunjian


> -Original Message-
> From: Jason Wang [mailto:jasow...@redhat.com]
> Sent: Thursday, January 25, 2024 12:49 PM
> To: wangyunjian 
> Cc: m...@redhat.com; willemdebruijn.ker...@gmail.com; k...@kernel.org;
> da...@davemloft.net; magnus.karls...@intel.com; net...@vger.kernel.org;
> linux-kernel@vger.kernel.org; k...@vger.kernel.org;
> virtualizat...@lists.linux.dev; xudingke 
> Subject: Re: [PATCH net-next 2/2] tun: AF_XDP Rx zero-copy support
> 
> On Wed, Jan 24, 2024 at 5:38 PM Yunjian Wang 
> wrote:
> >
> > Now the zero-copy feature of AF_XDP socket is supported by some
> > drivers, which can reduce CPU utilization on the xdp program.
> > This patch set allows tun to support AF_XDP Rx zero-copy feature.
> >
> > This patch tries to address this by:
> > - Use peek_len to consume a xsk->desc and get xsk->desc length.
> > - When the tun support AF_XDP Rx zero-copy, the vq's array maybe empty.
> > So add a check for empty vq's array in vhost_net_buf_produce().
> > - add XDP_SETUP_XSK_POOL and ndo_xsk_wakeup callback support
> > - add tun_put_user_desc function to copy the Rx data to VM
> 
> Code explains themselves, let's explain why you need to do this.
> 
> 1) why you want to use peek_len
> 2) for "vq's array", what does it mean?
> 3) from the view of TUN/TAP tun_put_user_desc() is the TX path, so I guess you
> meant TX zerocopy instead of RX (as I don't see codes for
> RX?)

OK, I agree and use TX zerocopy instead of RX zerocopy. I meant RX zerocopy
from the view of vhost-net. 

> 
> A big question is how could you handle GSO packets from userspace/guests?

Now by disabling VM's TSO and csum feature. XDP does not support GSO packets.
However, this feature can be added once XDP supports it in the future. 

> 
> >
> > Signed-off-by: Yunjian Wang 
> > ---
> >  drivers/net/tun.c   | 165
> +++-
> >  drivers/vhost/net.c |  18 +++--
> >  2 files changed, 176 insertions(+), 7 deletions(-)
> >
> > diff --git a/drivers/net/tun.c b/drivers/net/tun.c index
> > afa5497f7c35..248b0f8e07d1 100644
> > --- a/drivers/net/tun.c
> > +++ b/drivers/net/tun.c
> > @@ -77,6 +77,7 @@
> >  #include 
> >  #include 
> >  #include 
> > +#include 
> >
> >  #include 
> >  #include 
> > @@ -145,6 +146,10 @@ struct tun_file {
> > struct tun_struct *detached;
> > struct ptr_ring tx_ring;
> > struct xdp_rxq_info xdp_rxq;
> > +   struct xdp_desc desc;
> > +   /* protects xsk pool */
> > +   spinlock_t pool_lock;
> > +   struct xsk_buff_pool *pool;
> >  };
> >
> >  struct tun_page {
> > @@ -208,6 +213,8 @@ struct tun_struct {
> > struct bpf_prog __rcu *xdp_prog;
> > struct tun_prog __rcu *steering_prog;
> > struct tun_prog __rcu *filter_prog;
> > +   /* tracks AF_XDP ZC enabled queues */
> > +   unsigned long *af_xdp_zc_qps;
> > struct ethtool_link_ksettings link_ksettings;
> > /* init args */
> > struct file *file;
> > @@ -795,6 +802,8 @@ static int tun_attach(struct tun_struct *tun,
> > struct file *file,
> >
> > tfile->queue_index = tun->numqueues;
> > tfile->socket.sk->sk_shutdown &= ~RCV_SHUTDOWN;
> > +   tfile->desc.len = 0;
> > +   tfile->pool = NULL;
> >
> > if (tfile->detached) {
> > /* Re-attach detached tfile, updating XDP queue_index
> > */ @@ -989,6 +998,13 @@ static int tun_net_init(struct net_device *dev)
> > return err;
> > }
> >
> > +   tun->af_xdp_zc_qps = bitmap_zalloc(MAX_TAP_QUEUES,
> GFP_KERNEL);
> > +   if (!tun->af_xdp_zc_qps) {
> > +   security_tun_dev_free_security(tun->security);
> > +   free_percpu(dev->tstats);
> > +   return -ENOMEM;
> > +   }
> > +
> > tun_flow_init(tun);
> >
> > dev->hw_features = NETIF_F_SG | NETIF_F_FRAGLIST | @@
> -1009,6
> > +1025,7 @@ static int tun_net_init(struct net_device *dev)
> > tun_flow_uninit(tun);
> > security_tun_dev_free_security(tun->security);
> > free_percpu(dev->tstats);
> > +   bitmap_free(tun->af_xdp_zc_qps);
> > return err;
> > }
> > return 0;
> > @@ -1222,11 +1239,77 @@ static int tun_xdp_set(struct net_device *dev,
> struct bpf_prog *prog,
> > return 0;
>

RE: [PATCH net-next 2/2] tun: AF_XDP Rx zero-copy support

2024-01-25 Thread wangyunjian
> -Original Message-
> From: Willem de Bruijn [mailto:willemdebruijn.ker...@gmail.com]
> Sent: Thursday, January 25, 2024 3:05 AM
> To: wangyunjian ; m...@redhat.com;
> willemdebruijn.ker...@gmail.com; jasow...@redhat.com; k...@kernel.org;
> da...@davemloft.net; magnus.karls...@intel.com
> Cc: net...@vger.kernel.org; linux-kernel@vger.kernel.org;
> k...@vger.kernel.org; virtualizat...@lists.linux.dev; xudingke
> ; wangyunjian 
> Subject: Re: [PATCH net-next 2/2] tun: AF_XDP Rx zero-copy support
> 
> Yunjian Wang wrote:
> > Now the zero-copy feature of AF_XDP socket is supported by some
> > drivers, which can reduce CPU utilization on the xdp program.
> > This patch set allows tun to support AF_XDP Rx zero-copy feature.
> >
> > This patch tries to address this by:
> > - Use peek_len to consume a xsk->desc and get xsk->desc length.
> > - When the tun support AF_XDP Rx zero-copy, the vq's array maybe empty.
> > So add a check for empty vq's array in vhost_net_buf_produce().
> > - add XDP_SETUP_XSK_POOL and ndo_xsk_wakeup callback support
> > - add tun_put_user_desc function to copy the Rx data to VM
> >
> > Signed-off-by: Yunjian Wang 
> 
> I don't fully understand the higher level design of this feature yet.

This feature can reduce the memory copy for the xdp program. 

> 
> But some initial comments at the code level.
> 
> > ---
> >  drivers/net/tun.c   | 165
> +++-
> >  drivers/vhost/net.c |  18 +++--
> >  2 files changed, 176 insertions(+), 7 deletions(-)
> >
> > diff --git a/drivers/net/tun.c b/drivers/net/tun.c index
> > afa5497f7c35..248b0f8e07d1 100644
> > --- a/drivers/net/tun.c
> > +++ b/drivers/net/tun.c
> > @@ -77,6 +77,7 @@
> >  #include 
> >  #include 
> >  #include 
> > +#include 
> >
> >  #include 
> >  #include 
> > @@ -145,6 +146,10 @@ struct tun_file {
> > struct tun_struct *detached;
> > struct ptr_ring tx_ring;
> > struct xdp_rxq_info xdp_rxq;
> > +   struct xdp_desc desc;
> > +   /* protects xsk pool */
> > +   spinlock_t pool_lock;
> > +   struct xsk_buff_pool *pool;
> >  };
> >
> >  struct tun_page {
> > @@ -208,6 +213,8 @@ struct tun_struct {
> > struct bpf_prog __rcu *xdp_prog;
> > struct tun_prog __rcu *steering_prog;
> > struct tun_prog __rcu *filter_prog;
> > +   /* tracks AF_XDP ZC enabled queues */
> > +   unsigned long *af_xdp_zc_qps;
> > struct ethtool_link_ksettings link_ksettings;
> > /* init args */
> > struct file *file;
> > @@ -795,6 +802,8 @@ static int tun_attach(struct tun_struct *tun,
> > struct file *file,
> >
> > tfile->queue_index = tun->numqueues;
> > tfile->socket.sk->sk_shutdown &= ~RCV_SHUTDOWN;
> > +   tfile->desc.len = 0;
> > +   tfile->pool = NULL;
> >
> > if (tfile->detached) {
> > /* Re-attach detached tfile, updating XDP queue_index */ @@ 
> > -989,6
> > +998,13 @@ static int tun_net_init(struct net_device *dev)
> > return err;
> > }
> >
> > +   tun->af_xdp_zc_qps = bitmap_zalloc(MAX_TAP_QUEUES, GFP_KERNEL);
> > +   if (!tun->af_xdp_zc_qps) {
> > +   security_tun_dev_free_security(tun->security);
> > +   free_percpu(dev->tstats);
> > +   return -ENOMEM;
> > +   }
> > +
> > tun_flow_init(tun);
> >
> > dev->hw_features = NETIF_F_SG | NETIF_F_FRAGLIST | @@ -1009,6
> > +1025,7 @@ static int tun_net_init(struct net_device *dev)
> > tun_flow_uninit(tun);
> > security_tun_dev_free_security(tun->security);
> > free_percpu(dev->tstats);
> > +   bitmap_free(tun->af_xdp_zc_qps);
> 
> Please release state in inverse order of acquire.

Will done.

> 
> > return err;
> > }
> > return 0;
> > @@ -1222,11 +1239,77 @@ static int tun_xdp_set(struct net_device *dev,
> struct bpf_prog *prog,
> > return 0;
> >  }
> >
> > +static int tun_xsk_pool_enable(struct net_device *netdev,
> > +  struct xsk_buff_pool *pool,
> > +  u16 qid)
> > +{
> > +   struct tun_struct *tun = netdev_priv(netdev);
> > +   struct tun_file *tfile;
> > +   unsigned long flags;
> > +
> > +   rcu_read_lock();
> > +   tfile = rtnl_dereference(tun->tfiles[qid]);
> > +   if (!tfile) {
> > +   rcu_read_unlock();
> > 

Re: [PATCH net-next 2/2] tun: AF_XDP Rx zero-copy support

2024-01-24 Thread Jason Wang
On Wed, Jan 24, 2024 at 5:38 PM Yunjian Wang  wrote:
>
> Now the zero-copy feature of AF_XDP socket is supported by some
> drivers, which can reduce CPU utilization on the xdp program.
> This patch set allows tun to support AF_XDP Rx zero-copy feature.
>
> This patch tries to address this by:
> - Use peek_len to consume a xsk->desc and get xsk->desc length.
> - When the tun support AF_XDP Rx zero-copy, the vq's array maybe empty.
> So add a check for empty vq's array in vhost_net_buf_produce().
> - add XDP_SETUP_XSK_POOL and ndo_xsk_wakeup callback support
> - add tun_put_user_desc function to copy the Rx data to VM

Code explains themselves, let's explain why you need to do this.

1) why you want to use peek_len
2) for "vq's array", what does it mean?
3) from the view of TUN/TAP tun_put_user_desc() is the TX path, so I
guess you meant TX zerocopy instead of RX (as I don't see codes for
RX?)

A big question is how could you handle GSO packets from userspace/guests?

>
> Signed-off-by: Yunjian Wang 
> ---
>  drivers/net/tun.c   | 165 +++-
>  drivers/vhost/net.c |  18 +++--
>  2 files changed, 176 insertions(+), 7 deletions(-)
>
> diff --git a/drivers/net/tun.c b/drivers/net/tun.c
> index afa5497f7c35..248b0f8e07d1 100644
> --- a/drivers/net/tun.c
> +++ b/drivers/net/tun.c
> @@ -77,6 +77,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>
>  #include 
>  #include 
> @@ -145,6 +146,10 @@ struct tun_file {
> struct tun_struct *detached;
> struct ptr_ring tx_ring;
> struct xdp_rxq_info xdp_rxq;
> +   struct xdp_desc desc;
> +   /* protects xsk pool */
> +   spinlock_t pool_lock;
> +   struct xsk_buff_pool *pool;
>  };
>
>  struct tun_page {
> @@ -208,6 +213,8 @@ struct tun_struct {
> struct bpf_prog __rcu *xdp_prog;
> struct tun_prog __rcu *steering_prog;
> struct tun_prog __rcu *filter_prog;
> +   /* tracks AF_XDP ZC enabled queues */
> +   unsigned long *af_xdp_zc_qps;
> struct ethtool_link_ksettings link_ksettings;
> /* init args */
> struct file *file;
> @@ -795,6 +802,8 @@ static int tun_attach(struct tun_struct *tun, struct file 
> *file,
>
> tfile->queue_index = tun->numqueues;
> tfile->socket.sk->sk_shutdown &= ~RCV_SHUTDOWN;
> +   tfile->desc.len = 0;
> +   tfile->pool = NULL;
>
> if (tfile->detached) {
> /* Re-attach detached tfile, updating XDP queue_index */
> @@ -989,6 +998,13 @@ static int tun_net_init(struct net_device *dev)
> return err;
> }
>
> +   tun->af_xdp_zc_qps = bitmap_zalloc(MAX_TAP_QUEUES, GFP_KERNEL);
> +   if (!tun->af_xdp_zc_qps) {
> +   security_tun_dev_free_security(tun->security);
> +   free_percpu(dev->tstats);
> +   return -ENOMEM;
> +   }
> +
> tun_flow_init(tun);
>
> dev->hw_features = NETIF_F_SG | NETIF_F_FRAGLIST |
> @@ -1009,6 +1025,7 @@ static int tun_net_init(struct net_device *dev)
> tun_flow_uninit(tun);
> security_tun_dev_free_security(tun->security);
> free_percpu(dev->tstats);
> +   bitmap_free(tun->af_xdp_zc_qps);
> return err;
> }
> return 0;
> @@ -1222,11 +1239,77 @@ static int tun_xdp_set(struct net_device *dev, struct 
> bpf_prog *prog,
> return 0;
>  }
>
> +static int tun_xsk_pool_enable(struct net_device *netdev,
> +  struct xsk_buff_pool *pool,
> +  u16 qid)
> +{
> +   struct tun_struct *tun = netdev_priv(netdev);
> +   struct tun_file *tfile;
> +   unsigned long flags;
> +
> +   rcu_read_lock();
> +   tfile = rtnl_dereference(tun->tfiles[qid]);
> +   if (!tfile) {
> +   rcu_read_unlock();
> +   return -ENODEV;
> +   }
> +
> +   spin_lock_irqsave(>pool_lock, flags);
> +   xsk_pool_set_rxq_info(pool, >xdp_rxq);
> +   tfile->pool = pool;
> +   spin_unlock_irqrestore(>pool_lock, flags);
> +
> +   rcu_read_unlock();
> +   set_bit(qid, tun->af_xdp_zc_qps);
> +
> +   return 0;
> +}
> +
> +static int tun_xsk_pool_disable(struct net_device *netdev, u16 qid)
> +{
> +   struct tun_struct *tun = netdev_priv(netdev);
> +   struct tun_file *tfile;
> +   unsigned long flags;
> +
> +   if (!test_bit(qid, tun->af_xdp_zc_qps))
> +   return 0;
> +
> +   clear_bit(qid, tun->af_xdp_zc_qps);
> +
> +   rcu_read_lock();
> +   tfile = rtnl_dereference(tun->tfiles[qid]);
> +   if (!tfile) {
> +   rcu_read_unlock();
> +   return 0;
> +   }
> +
> +   spin_lock_irqsave(>pool_lock, flags);
> +   if (tfile->desc.len) {
> +   xsk_tx_completed(tfile->pool, 1);
> +   tfile->desc.len = 0;
> +   }
> +   tfile->pool = NULL;
> +   spin_unlock_irqrestore(>pool_lock, flags);
> 

Re: [PATCH net-next 2/2] tun: AF_XDP Rx zero-copy support

2024-01-24 Thread Jason Wang
On Thu, Jan 25, 2024 at 3:05 AM Willem de Bruijn
 wrote:
>
> Yunjian Wang wrote:
> > Now the zero-copy feature of AF_XDP socket is supported by some
> > drivers, which can reduce CPU utilization on the xdp program.
> > This patch set allows tun to support AF_XDP Rx zero-copy feature.
> >
> > This patch tries to address this by:
> > - Use peek_len to consume a xsk->desc and get xsk->desc length.
> > - When the tun support AF_XDP Rx zero-copy, the vq's array maybe empty.
> > So add a check for empty vq's array in vhost_net_buf_produce().
> > - add XDP_SETUP_XSK_POOL and ndo_xsk_wakeup callback support
> > - add tun_put_user_desc function to copy the Rx data to VM
> >
> > Signed-off-by: Yunjian Wang 
>
> I don't fully understand the higher level design of this feature yet.
>
> But some initial comments at the code level.
>
> > ---
> >  drivers/net/tun.c   | 165 +++-
> >  drivers/vhost/net.c |  18 +++--
> >  2 files changed, 176 insertions(+), 7 deletions(-)
> >

[...]

> >  struct tun_page {
> > @@ -208,6 +21
> >
> > diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
> > index f2ed7167c848..a1f143ad2341 100644
> > --- a/drivers/vhost/net.c
> > +++ b/drivers/vhost/net.c
>
> For virtio maintainer: is it okay to have tun and vhost/net changes in
> the same patch, or is it better to split them?

It's better to split, but as you comment below, if it must be done in
one patch we need to explain why.

>
> > @@ -169,9 +169,10 @@ static int vhost_net_buf_is_empty(struct vhost_net_buf 
> > *rxq)
> >
> >  static void *vhost_net_buf_consume(struct vhost_net_buf *rxq)
> >  {
> > - void *ret = vhost_net_buf_get_ptr(rxq);
> > - ++rxq->head;
> > - return ret;
> > + if (rxq->tail == rxq->head)
> > + return NULL;
> > +
> > + return rxq->queue[rxq->head++];
>
> Why this change?

Thanks




Re: [PATCH net-next 2/2] tun: AF_XDP Rx zero-copy support

2024-01-24 Thread Willem de Bruijn
Yunjian Wang wrote:
> Now the zero-copy feature of AF_XDP socket is supported by some
> drivers, which can reduce CPU utilization on the xdp program.
> This patch set allows tun to support AF_XDP Rx zero-copy feature.
> 
> This patch tries to address this by:
> - Use peek_len to consume a xsk->desc and get xsk->desc length.
> - When the tun support AF_XDP Rx zero-copy, the vq's array maybe empty.
> So add a check for empty vq's array in vhost_net_buf_produce().
> - add XDP_SETUP_XSK_POOL and ndo_xsk_wakeup callback support
> - add tun_put_user_desc function to copy the Rx data to VM
> 
> Signed-off-by: Yunjian Wang 

I don't fully understand the higher level design of this feature yet.

But some initial comments at the code level.

> ---
>  drivers/net/tun.c   | 165 +++-
>  drivers/vhost/net.c |  18 +++--
>  2 files changed, 176 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/net/tun.c b/drivers/net/tun.c
> index afa5497f7c35..248b0f8e07d1 100644
> --- a/drivers/net/tun.c
> +++ b/drivers/net/tun.c
> @@ -77,6 +77,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  #include 
>  #include 
> @@ -145,6 +146,10 @@ struct tun_file {
>   struct tun_struct *detached;
>   struct ptr_ring tx_ring;
>   struct xdp_rxq_info xdp_rxq;
> + struct xdp_desc desc;
> + /* protects xsk pool */
> + spinlock_t pool_lock;
> + struct xsk_buff_pool *pool;
>  };
>  
>  struct tun_page {
> @@ -208,6 +213,8 @@ struct tun_struct {
>   struct bpf_prog __rcu *xdp_prog;
>   struct tun_prog __rcu *steering_prog;
>   struct tun_prog __rcu *filter_prog;
> + /* tracks AF_XDP ZC enabled queues */
> + unsigned long *af_xdp_zc_qps;
>   struct ethtool_link_ksettings link_ksettings;
>   /* init args */
>   struct file *file;
> @@ -795,6 +802,8 @@ static int tun_attach(struct tun_struct *tun, struct file 
> *file,
>  
>   tfile->queue_index = tun->numqueues;
>   tfile->socket.sk->sk_shutdown &= ~RCV_SHUTDOWN;
> + tfile->desc.len = 0;
> + tfile->pool = NULL;
>  
>   if (tfile->detached) {
>   /* Re-attach detached tfile, updating XDP queue_index */
> @@ -989,6 +998,13 @@ static int tun_net_init(struct net_device *dev)
>   return err;
>   }
>  
> + tun->af_xdp_zc_qps = bitmap_zalloc(MAX_TAP_QUEUES, GFP_KERNEL);
> + if (!tun->af_xdp_zc_qps) {
> + security_tun_dev_free_security(tun->security);
> + free_percpu(dev->tstats);
> + return -ENOMEM;
> + }
> +
>   tun_flow_init(tun);
>  
>   dev->hw_features = NETIF_F_SG | NETIF_F_FRAGLIST |
> @@ -1009,6 +1025,7 @@ static int tun_net_init(struct net_device *dev)
>   tun_flow_uninit(tun);
>   security_tun_dev_free_security(tun->security);
>   free_percpu(dev->tstats);
> + bitmap_free(tun->af_xdp_zc_qps);

Please release state in inverse order of acquire.

>   return err;
>   }
>   return 0;
> @@ -1222,11 +1239,77 @@ static int tun_xdp_set(struct net_device *dev, struct 
> bpf_prog *prog,
>   return 0;
>  }
>  
> +static int tun_xsk_pool_enable(struct net_device *netdev,
> +struct xsk_buff_pool *pool,
> +u16 qid)
> +{
> + struct tun_struct *tun = netdev_priv(netdev);
> + struct tun_file *tfile;
> + unsigned long flags;
> +
> + rcu_read_lock();
> + tfile = rtnl_dereference(tun->tfiles[qid]);
> + if (!tfile) {
> + rcu_read_unlock();
> + return -ENODEV;
> + }

No need for rcu_read_lock with rtnl_dereference.

Consider ASSERT_RTNL() if unsure whether this patch could be reached
without the rtnl held.

> +
> + spin_lock_irqsave(>pool_lock, flags);
> + xsk_pool_set_rxq_info(pool, >xdp_rxq);
> + tfile->pool = pool;
> + spin_unlock_irqrestore(>pool_lock, flags);
> +
> + rcu_read_unlock();
> + set_bit(qid, tun->af_xdp_zc_qps);

What are the concurrency semantics: there's a spinlock to make
the update to xdp_rxq and pool a critical section, but the bitmap
is not part of this? Please also then document why the irqsave.

> +
> + return 0;
> +}
> +
> +static int tun_xsk_pool_disable(struct net_device *netdev, u16 qid)
> +{
> + struct tun_struct *tun = netdev_priv(netdev);
> + struct tun_file *tfile;
> + unsigned long flags;
> +
> + if (!test_bit(qid, tun->af_xdp_zc_qps))
> + return 0;
> +
> + clear_bit(qid, tun->af_xdp_zc_qps);

Time of check to time of use race between test and clear? Or is
there no race because anything that clears will hold the RTNL? If so,
please add a comment.

> +
> + rcu_read_lock();
> + tfile = rtnl_dereference(tun->tfiles[qid]);
> + if (!tfile) {
> + rcu_read_unlock();
> + return 0;
> + }
> +
> + spin_lock_irqsave(>pool_lock, flags);
> + if (tfile->desc.len) {
> +