Re: [PATCH bpf-next v3 3/3] xsk: build skb by page

2021-01-26 Thread Magnus Karlsson
On Mon, Jan 25, 2021 at 4:22 PM Xuan Zhuo  wrote:
>
> On Mon, 25 Jan 2021 14:16:16 +0100, Magnus Karlsson 
>  wrote:
> > On Mon, Jan 25, 2021 at 1:43 PM Xuan Zhuo  
> > wrote:
> > >
> > > On Mon, 25 Jan 2021 08:44:38 +0100, Magnus Karlsson 
> > >  wrote:
> > > > On Mon, Jan 25, 2021 at 3:27 AM Xuan Zhuo  
> > > > wrote:
> > > > >
> > > > > On Fri, 22 Jan 2021 19:37:06 +0100, Magnus Karlsson 
> > > > >  wrote:
> > > > > > On Fri, Jan 22, 2021 at 6:26 PM Alexander Lobakin  
> > > > > > wrote:
> > > > > > >
> > > > > > > From: Xuan Zhuo 
> > > > > > > Date: Fri, 22 Jan 2021 23:39:15 +0800
> > > > > > >
> > > > > > > > On Fri, 22 Jan 2021 13:55:14 +0100, Magnus Karlsson 
> > > > > > > >  wrote:
> > > > > > > > > On Fri, Jan 22, 2021 at 1:39 PM Alexander Lobakin 
> > > > > > > > >  wrote:
> > > > > > > > > >
> > > > > > > > > > From: Magnus Karlsson 
> > > > > > > > > > Date: Fri, 22 Jan 2021 13:18:47 +0100
> > > > > > > > > >
> > > > > > > > > > > On Fri, Jan 22, 2021 at 12:57 PM Alexander Lobakin 
> > > > > > > > > > >  wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > From: Alexander Lobakin 
> > > > > > > > > > > > Date: Fri, 22 Jan 2021 11:47:45 +
> > > > > > > > > > > >
> > > > > > > > > > > > > From: Eric Dumazet 
> > > > > > > > > > > > > Date: Thu, 21 Jan 2021 16:41:33 +0100
> > > > > > > > > > > > >
> > > > > > > > > > > > > > On 1/21/21 2:47 PM, Xuan Zhuo wrote:
> > > > > > > > > > > > > > > This patch constructs the skb from pages to
> > > > > > > > > > > > > > > avoid memory copy
> > > > > > > > > > > > > > > overhead.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > This relies on IFF_TX_SKB_NO_LINEAR: only
> > > > > > > > > > > > > > > network cards whose
> > > > > > > > > > > > > > > priv_flags advertise IFF_TX_SKB_NO_LINEAR
> > > > > > > > > > > > > > > build the skb directly from pages. If this
> > > > > > > > > > > > > > > feature is not supported, it is still
> > > > > > > > > > > > > > > necessary to copy data to construct the skb.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >  Performance Testing 
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > The test environment is Aliyun ECS server.
> > > > > > > > > > > > > > > Test cmd:
> > > > > > > > > > > > > > > ```
> > > > > > > > > > > > > > > xdpsock -i eth0 -t  -S -s 
> > > > > > > > > > > > > > > ```
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Test result data:
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > size    64      512     1024    1500
> > > > > > > > > > > > > > > copy    1916747 1775988 1600203 1440054
> > > > > > > > > > > > > > > page    1974058 1953655 1945463 1904478
> > > > > > > > > > > > > > > percent 3.0%    10.0%   21.58%  32.3%
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Signed-off-by: Xuan Zhuo 
> > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > Reviewed-by: Dust Li 
> > > > > > > > > > > > > > > ---
> > > > > > > > > > > > > > >  net/xdp/xsk.c | 104 
> > > > > > > > > > > > > > > --
> > > > > > > > > > > > > > >  1 file changed, 86 insertions(+), 18 deletions(-)
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
> > > > > > > > > > > > > > > index 4a83117..38af7f1 100644
> > > > > > > > > > > > > > > --- a/net/xdp/xsk.c
> > > > > > > > > > > > > > > +++ b/net/xdp/xsk.c
> > > > > > > > > > > > > > > @@ -430,6 +430,87 @@ static void 
> > > > > > > > > > > > > > > xsk_destruct_skb(struct sk_buff *skb)
> > > > > > > > > > > > > > >   sock_wfree(skb);
> > > > > > > > > > > > > > >  }
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > +static struct sk_buff 
> > > > > > > > > > > > > > > *xsk_build_skb_zerocopy(struct xdp_sock *xs,
> > > > > > > > > > > > > > > +   struct 
> > > > > > > > > > > > > > > xdp_desc *desc)
> > > > > > > > > > > > > > > +{
> > > > > > > > > > > > > > > + u32 len, offset, copy, copied;
> > > > > > > > > > > > > > > + struct sk_buff *skb;
> > > > > > > > > > > > > > > + struct page *page;
> > > > > > > > > > > > > > > + void *buffer;
> > > > > > > > > > > > > > > + int err, i;
> > > > > > > > > > > > > > > + u64 addr;
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > + skb = sock_alloc_send_skb(&xs->sk, 0, 1, &err);
> > > > > > > > > > > > > > > + if (unlikely(!skb))
> > > > > > > > > > > > > > > + return ERR_PTR(err);
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > + addr = desc->addr;
> > > > > > > > > > > > > > > + len = desc->len;
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > + buffer = xsk_buff_raw_get_data(xs->pool, addr);
> > > > > > > > > > > > > > > + offset = offset_in_page(buffer);
> > > > > > > > > > > > > > > + addr = buffer - xs->pool->addrs;
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > + 

Re: [PATCH bpf-next v3 3/3] xsk: build skb by page

2021-01-25 Thread Alexander Lobakin
From: Xuan Zhuo 
Date: Mon, 25 Jan 2021 22:57:07 +0800

> On Mon, 25 Jan 2021 13:25:45 +, Alexander Lobakin  wrote:
> > From: Xuan Zhuo 
> > Date: Mon, 25 Jan 2021 11:10:43 +0800
> >
> > > On Fri, 22 Jan 2021 16:24:17 +, Alexander Lobakin  
> > > wrote:
> > > > From: Xuan Zhuo 
> > > > Date: Fri, 22 Jan 2021 23:36:29 +0800
> > > >
> > > > > On Fri, 22 Jan 2021 12:08:00 +, Alexander Lobakin 
> > > > >  wrote:
> > > > > > From: Alexander Lobakin 
> > > > > > Date: Fri, 22 Jan 2021 11:55:35 +
> > > > > >
> > > > > > > From: Alexander Lobakin 
> > > > > > > Date: Fri, 22 Jan 2021 11:47:45 +
> > > > > > >
> > > > > > > > From: Eric Dumazet 
> > > > > > > > Date: Thu, 21 Jan 2021 16:41:33 +0100
> > > > > > > >
> > > > > > > > > On 1/21/21 2:47 PM, Xuan Zhuo wrote:
> > > > > > > > > > This patch constructs the skb from pages to avoid 
> > > > > > > > > > memory copy
> > > > > > > > > > overhead.
> > > > > > > > > >
> > > > > > > > > > This relies on IFF_TX_SKB_NO_LINEAR: only network cards 
> > > > > > > > > > whose
> > > > > > > > > > priv_flags advertise IFF_TX_SKB_NO_LINEAR build the skb 
> > > > > > > > > > directly from pages. If this feature is not supported, 
> > > > > > > > > > it is still
> > > > > > > > > > necessary to copy data to construct the skb.
> > > > > > > > > >
> > > > > > > > > >  Performance Testing 
> > > > > > > > > >
> > > > > > > > > > The test environment is Aliyun ECS server.
> > > > > > > > > > Test cmd:
> > > > > > > > > > ```
> > > > > > > > > > xdpsock -i eth0 -t  -S -s 
> > > > > > > > > > ```
> > > > > > > > > >
> > > > > > > > > > Test result data:
> > > > > > > > > >
> > > > > > > > > > size    64      512     1024    1500
> > > > > > > > > > copy    1916747 1775988 1600203 1440054
> > > > > > > > > > page    1974058 1953655 1945463 1904478
> > > > > > > > > > percent 3.0%    10.0%   21.58%  32.3%
> > > > > > > > > >
> > > > > > > > > > Signed-off-by: Xuan Zhuo 
> > > > > > > > > > Reviewed-by: Dust Li 
> > > > > > > > > > ---
> > > > > > > > > >  net/xdp/xsk.c | 104 
> > > > > > > > > > --
> > > > > > > > > >  1 file changed, 86 insertions(+), 18 deletions(-)
> > > > > > > > > >
> > > > > > > > > > diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
> > > > > > > > > > index 4a83117..38af7f1 100644
> > > > > > > > > > --- a/net/xdp/xsk.c
> > > > > > > > > > +++ b/net/xdp/xsk.c
> > > > > > > > > > @@ -430,6 +430,87 @@ static void xsk_destruct_skb(struct 
> > > > > > > > > > sk_buff *skb)
> > > > > > > > > > sock_wfree(skb);
> > > > > > > > > >  }
> > > > > > > > > >
> > > > > > > > > > +static struct sk_buff *xsk_build_skb_zerocopy(struct 
> > > > > > > > > > xdp_sock *xs,
> > > > > > > > > > + struct xdp_desc 
> > > > > > > > > > *desc)
> > > > > > > > > > +{
> > > > > > > > > > +   u32 len, offset, copy, copied;
> > > > > > > > > > +   struct sk_buff *skb;
> > > > > > > > > > +   struct page *page;
> > > > > > > > > > +   void *buffer;
> > > > > > > > > > +   int err, i;
> > > > > > > > > > +   u64 addr;
> > > > > > > > > > +
> > > > > > > > > > +   skb = sock_alloc_send_skb(&xs->sk, 0, 1, &err);
> > > > > >
> > > > > > Also,
> > > > > > maybe we should allocate it with NET_SKB_PAD so NIC drivers could
> > > > > > use some reserved space?
> > > > > >
> > > > > > skb = sock_alloc_send_skb(&xs->sk, NET_SKB_PAD, 1, &err);
> > > > > > ...
> > > > > > skb_reserve(skb, NET_SKB_PAD);
> > > > > >
> > > > > > Eric, what do you think?
> > > > >
> > > > > I think you are right. Some headroom should be added for such
> > > > > devices, and it should also be added in the copy mode below. Would
> > > > > LL_RESERVED_SPACE be more appropriate?
> > > >
> > > > No. If you look at __netdev_alloc_skb() and __napi_alloc_skb(), they
> > > > reserve NET_SKB_PAD at the beginning of linear area. Documentation of
> > > > __build_skb() also says that driver should reserve NET_SKB_PAD before
> > > > the actual frame, so it is a standardized hardware-independent
> > > > headroom.
> > >
> > > My understanding is that those scenarios are on the receive path, where
> > > the extra space is used by the protocol stack, especially RPS. Does this
> > > also apply to the send path?
> > >
> > > > Leaving that space in skb->head will allow developers to implement
> > > > IFF_TX_SKB_NO_LINEAR in a wider variety of drivers, especially when
> > > > a driver has to prepend some sort of data before the actual frame.
> > > > Since it's usually of a size of one cacheline, shouldn't be a big
> > > > deal.
> > > >
> > >
> > > I agree. Some network cards do require extra space; for example,
> > > virtio-net needs to prepend a virtio_net_hdr_mrg_rxbuf before skb->data,
> > > which is where my original understanding comes from. When we send the skb 
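The headroom mechanics under discussion here (reserve NET_SKB_PAD up front with skb_reserve() so a driver can later prepend its own header with skb_push()) can be sketched with a small userspace model. This is illustrative only, not kernel code: the struct, the helper names, and the NET_SKB_PAD value of 64 are assumptions for the sketch.

```c
/* Userspace sketch of the skb headroom idea: reserving pad up front
 * lets a driver prepend a header later without reallocating. */
#include <assert.h>
#include <stddef.h>

#define NET_SKB_PAD 64  /* typical value; arch-dependent in the kernel */

struct skb_model {
    unsigned char buf[2048];
    size_t data_off;    /* offset of the data pointer within buf */
    size_t len;         /* bytes of payload currently in the skb */
};

/* models skb_reserve(): move the data pointer forward, creating headroom */
static void model_reserve(struct skb_model *s, size_t pad)
{
    s->data_off += pad;
}

/* models skb_push(): prepend hdr bytes by moving data back; returns the
 * remaining headroom, or -1 if there is not enough room to prepend */
static long model_push(struct skb_model *s, size_t hdr)
{
    if (hdr > s->data_off)
        return -1;
    s->data_off -= hdr;
    s->len += hdr;
    return (long)s->data_off;
}
```

Without the up-front reserve, the same push fails, which is exactly the situation a driver with needed headroom would hit.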

Re: [PATCH bpf-next v3 3/3] xsk: build skb by page

2021-01-25 Thread Alexander Lobakin
From: Xuan Zhuo 
Date: Mon, 25 Jan 2021 11:10:43 +0800

> On Fri, 22 Jan 2021 16:24:17 +, Alexander Lobakin  wrote:
> > From: Xuan Zhuo 
> > Date: Fri, 22 Jan 2021 23:36:29 +0800
> >
> > > On Fri, 22 Jan 2021 12:08:00 +, Alexander Lobakin  
> > > wrote:
> > > > From: Alexander Lobakin 
> > > > Date: Fri, 22 Jan 2021 11:55:35 +
> > > >
> > > > > From: Alexander Lobakin 
> > > > > Date: Fri, 22 Jan 2021 11:47:45 +
> > > > >
> > > > > > From: Eric Dumazet 
> > > > > > Date: Thu, 21 Jan 2021 16:41:33 +0100
> > > > > >
> > > > > > > On 1/21/21 2:47 PM, Xuan Zhuo wrote:
> > > > > > > > This patch constructs the skb from pages to avoid 
> > > > > > > > memory copy
> > > > > > > > overhead.
> > > > > > > >
> > > > > > > > This relies on IFF_TX_SKB_NO_LINEAR: only network cards whose
> > > > > > > > priv_flags advertise IFF_TX_SKB_NO_LINEAR build the skb
> > > > > > > > directly from pages. If this feature is not supported, it is 
> > > > > > > > still
> > > > > > > > necessary to copy data to construct the skb.
> > > > > > > >
> > > > > > > >  Performance Testing 
> > > > > > > >
> > > > > > > > The test environment is Aliyun ECS server.
> > > > > > > > Test cmd:
> > > > > > > > ```
> > > > > > > > xdpsock -i eth0 -t  -S -s 
> > > > > > > > ```
> > > > > > > >
> > > > > > > > Test result data:
> > > > > > > >
> > > > > > > > size    64      512     1024    1500
> > > > > > > > copy    1916747 1775988 1600203 1440054
> > > > > > > > page    1974058 1953655 1945463 1904478
> > > > > > > > percent 3.0%    10.0%   21.58%  32.3%
> > > > > > > >
> > > > > > > > Signed-off-by: Xuan Zhuo 
> > > > > > > > Reviewed-by: Dust Li 
> > > > > > > > ---
> > > > > > > >  net/xdp/xsk.c | 104 
> > > > > > > > --
> > > > > > > >  1 file changed, 86 insertions(+), 18 deletions(-)
> > > > > > > >
> > > > > > > > diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
> > > > > > > > index 4a83117..38af7f1 100644
> > > > > > > > --- a/net/xdp/xsk.c
> > > > > > > > +++ b/net/xdp/xsk.c
> > > > > > > > @@ -430,6 +430,87 @@ static void xsk_destruct_skb(struct 
> > > > > > > > sk_buff *skb)
> > > > > > > > sock_wfree(skb);
> > > > > > > >  }
> > > > > > > >
> > > > > > > > +static struct sk_buff *xsk_build_skb_zerocopy(struct xdp_sock 
> > > > > > > > *xs,
> > > > > > > > + struct xdp_desc 
> > > > > > > > *desc)
> > > > > > > > +{
> > > > > > > > +   u32 len, offset, copy, copied;
> > > > > > > > +   struct sk_buff *skb;
> > > > > > > > +   struct page *page;
> > > > > > > > +   void *buffer;
> > > > > > > > +   int err, i;
> > > > > > > > +   u64 addr;
> > > > > > > > +
> > > > > > > > +   skb = sock_alloc_send_skb(&xs->sk, 0, 1, &err);
> > > >
> > > > Also,
> > > > maybe we should allocate it with NET_SKB_PAD so NIC drivers could
> > > > use some reserved space?
> > > >
> > > > skb = sock_alloc_send_skb(&xs->sk, NET_SKB_PAD, 1, &err);
> > > > ...
> > > > skb_reserve(skb, NET_SKB_PAD);
> > > >
> > > > Eric, what do you think?
> > >
> > > I think you are right. Some headroom should be added for such
> > > devices, and it should also be added in the copy mode below. Would
> > > LL_RESERVED_SPACE be more appropriate?
> >
> > No. If you look at __netdev_alloc_skb() and __napi_alloc_skb(), they
> > reserve NET_SKB_PAD at the beginning of linear area. Documentation of
> > __build_skb() also says that driver should reserve NET_SKB_PAD before
> > the actual frame, so it is a standardized hardware-independent
> > headroom.
> 
> My understanding is that those scenarios are on the receive path, where the
> extra space is used by the protocol stack, especially RPS. Does this also
> apply to the send path?
> 
> > Leaving that space in skb->head will allow developers to implement
> > IFF_TX_SKB_NO_LINEAR in a wider variety of drivers, especially when
> > a driver has to prepend some sort of data before the actual frame.
> > Since it's usually of a size of one cacheline, shouldn't be a big
> > deal.
> >
> 
> I agree. Some network cards do require extra space; for example, virtio-net
> needs to prepend a virtio_net_hdr_mrg_rxbuf before skb->data, which is where
> my original understanding comes from. When we hand the skb to the driver,
> the driver may need some memory in front of the frame, so, following the
> implementation of __ip_append_data, I felt that reserving LL_RESERVED_SPACE
> would be a suitable solution.
> 
> I may still be missing the use case you have in mind. Could you elaborate on
> what this space would be used for?

LL_RESERVED_SPACE() consists of the L2 header size (Ethernet in most
cases) plus dev->needed_headroom. That is not a value to count on, as:
 - the L2 header is already present in the XSK buffer;
 - not all drivers set 
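For readers following this subthread, the macro being debated can be modeled in plain C. This is a sketch under the assumption that LL_RESERVED_SPACE() is, as in include/linux/netdevice.h around this kernel version, the sum of hard_header_len and needed_headroom rounded up to the next multiple of HH_DATA_MOD (16); the function name here is ours, not the kernel's.

```c
/* Userspace model of the LL_RESERVED_SPACE() rounding: sum the L2
 * header length and the device's needed headroom, then round up to a
 * multiple of HH_DATA_MOD so the reserved region stays aligned. */
#include <assert.h>

#define HH_DATA_MOD 16

static unsigned int ll_reserved_space(unsigned int hard_header_len,
                                      unsigned int needed_headroom)
{
    unsigned int sum = hard_header_len + needed_headroom;

    /* round down to a HH_DATA_MOD boundary, then add one full
     * HH_DATA_MOD: this always leaves at least `sum` bytes reserved */
    return (sum & ~(HH_DATA_MOD - 1u)) + HH_DATA_MOD;
}
```

Note that for a plain Ethernet device with no needed_headroom this yields 16 bytes, which is why the macro alone is smaller than NET_SKB_PAD on most setups.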

Re: [PATCH bpf-next v3 3/3] xsk: build skb by page

2021-01-25 Thread Magnus Karlsson
On Mon, Jan 25, 2021 at 1:43 PM Xuan Zhuo  wrote:
>
> On Mon, 25 Jan 2021 08:44:38 +0100, Magnus Karlsson 
>  wrote:
> > On Mon, Jan 25, 2021 at 3:27 AM Xuan Zhuo  
> > wrote:
> > >
> > > On Fri, 22 Jan 2021 19:37:06 +0100, Magnus Karlsson 
> > >  wrote:
> > > > On Fri, Jan 22, 2021 at 6:26 PM Alexander Lobakin  
> > > > wrote:
> > > > >
> > > > > From: Xuan Zhuo 
> > > > > Date: Fri, 22 Jan 2021 23:39:15 +0800
> > > > >
> > > > > > On Fri, 22 Jan 2021 13:55:14 +0100, Magnus Karlsson 
> > > > > >  wrote:
> > > > > > > On Fri, Jan 22, 2021 at 1:39 PM Alexander Lobakin 
> > > > > > >  wrote:
> > > > > > > >
> > > > > > > > From: Magnus Karlsson 
> > > > > > > > Date: Fri, 22 Jan 2021 13:18:47 +0100
> > > > > > > >
> > > > > > > > > On Fri, Jan 22, 2021 at 12:57 PM Alexander Lobakin 
> > > > > > > > >  wrote:
> > > > > > > > > >
> > > > > > > > > > From: Alexander Lobakin 
> > > > > > > > > > Date: Fri, 22 Jan 2021 11:47:45 +
> > > > > > > > > >
> > > > > > > > > > > From: Eric Dumazet 
> > > > > > > > > > > Date: Thu, 21 Jan 2021 16:41:33 +0100
> > > > > > > > > > >
> > > > > > > > > > > > On 1/21/21 2:47 PM, Xuan Zhuo wrote:
> > > > > > > > > > > > > This patch constructs the skb from pages to 
> > > > > > > > > > > > > avoid memory copy
> > > > > > > > > > > > > overhead.
> > > > > > > > > > > > >
> > > > > > > > > > > > > This relies on IFF_TX_SKB_NO_LINEAR: only 
> > > > > > > > > > > > > network cards whose
> > > > > > > > > > > > > priv_flags advertise IFF_TX_SKB_NO_LINEAR build 
> > > > > > > > > > > > > the skb directly from pages. If this feature is 
> > > > > > > > > > > > > not supported, it is still
> > > > > > > > > > > > > necessary to copy data to construct the skb.
> > > > > > > > > > > > >
> > > > > > > > > > > > >  Performance Testing 
> > > > > > > > > > > > >
> > > > > > > > > > > > > The test environment is Aliyun ECS server.
> > > > > > > > > > > > > Test cmd:
> > > > > > > > > > > > > ```
> > > > > > > > > > > > > xdpsock -i eth0 -t  -S -s 
> > > > > > > > > > > > > ```
> > > > > > > > > > > > >
> > > > > > > > > > > > > Test result data:
> > > > > > > > > > > > >
> > > > > > > > > > > > > size    64      512     1024    1500
> > > > > > > > > > > > > copy    1916747 1775988 1600203 1440054
> > > > > > > > > > > > > page    1974058 1953655 1945463 1904478
> > > > > > > > > > > > > percent 3.0%    10.0%   21.58%  32.3%
> > > > > > > > > > > > >
> > > > > > > > > > > > > Signed-off-by: Xuan Zhuo 
> > > > > > > > > > > > > Reviewed-by: Dust Li 
> > > > > > > > > > > > > ---
> > > > > > > > > > > > >  net/xdp/xsk.c | 104 
> > > > > > > > > > > > > --
> > > > > > > > > > > > >  1 file changed, 86 insertions(+), 18 deletions(-)
> > > > > > > > > > > > >
> > > > > > > > > > > > > diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
> > > > > > > > > > > > > index 4a83117..38af7f1 100644
> > > > > > > > > > > > > --- a/net/xdp/xsk.c
> > > > > > > > > > > > > +++ b/net/xdp/xsk.c
> > > > > > > > > > > > > @@ -430,6 +430,87 @@ static void 
> > > > > > > > > > > > > xsk_destruct_skb(struct sk_buff *skb)
> > > > > > > > > > > > >   sock_wfree(skb);
> > > > > > > > > > > > >  }
> > > > > > > > > > > > >
> > > > > > > > > > > > > +static struct sk_buff *xsk_build_skb_zerocopy(struct 
> > > > > > > > > > > > > xdp_sock *xs,
> > > > > > > > > > > > > +   struct 
> > > > > > > > > > > > > xdp_desc *desc)
> > > > > > > > > > > > > +{
> > > > > > > > > > > > > + u32 len, offset, copy, copied;
> > > > > > > > > > > > > + struct sk_buff *skb;
> > > > > > > > > > > > > + struct page *page;
> > > > > > > > > > > > > + void *buffer;
> > > > > > > > > > > > > + int err, i;
> > > > > > > > > > > > > + u64 addr;
> > > > > > > > > > > > > +
> > > > > > > > > > > > > + skb = sock_alloc_send_skb(&xs->sk, 0, 1, &err);
> > > > > > > > > > > > > + if (unlikely(!skb))
> > > > > > > > > > > > > + return ERR_PTR(err);
> > > > > > > > > > > > > +
> > > > > > > > > > > > > + addr = desc->addr;
> > > > > > > > > > > > > + len = desc->len;
> > > > > > > > > > > > > +
> > > > > > > > > > > > > + buffer = xsk_buff_raw_get_data(xs->pool, addr);
> > > > > > > > > > > > > + offset = offset_in_page(buffer);
> > > > > > > > > > > > > + addr = buffer - xs->pool->addrs;
> > > > > > > > > > > > > +
> > > > > > > > > > > > > + for (copied = 0, i = 0; copied < len; i++) {
> > > > > > > > > > > > > + page = xs->pool->umem->pgs[addr >> 
> > > > > > > > > > > > > PAGE_SHIFT];
> > > > > > > > > > > > > +
> > > > > > > > > > > > > + get_page(page);
> > > > > > > > > > > > > +
> > > > > > > > > > > > > + copy = min_t(u32, PAGE_SIZE - offset, len - 
> > > > > > > > > > > > > copied);
> > > > > > > > > > > > > +
> > > > > > > > > > > > > + skb_fill_page_desc(skb, i, page, offset, 
> > > > > > > > > > > > > copy);
> > > > > > > > > > > > > +
> > > 

Re: [PATCH bpf-next v3 3/3] xsk: build skb by page

2021-01-25 Thread Magnus Karlsson
On Mon, Jan 25, 2021 at 3:27 AM Xuan Zhuo  wrote:
>
> On Fri, 22 Jan 2021 19:37:06 +0100, Magnus Karlsson 
>  wrote:
> > On Fri, Jan 22, 2021 at 6:26 PM Alexander Lobakin  wrote:
> > >
> > > From: Xuan Zhuo 
> > > Date: Fri, 22 Jan 2021 23:39:15 +0800
> > >
> > > > On Fri, 22 Jan 2021 13:55:14 +0100, Magnus Karlsson 
> > > >  wrote:
> > > > > On Fri, Jan 22, 2021 at 1:39 PM Alexander Lobakin  
> > > > > wrote:
> > > > > >
> > > > > > From: Magnus Karlsson 
> > > > > > Date: Fri, 22 Jan 2021 13:18:47 +0100
> > > > > >
> > > > > > > On Fri, Jan 22, 2021 at 12:57 PM Alexander Lobakin 
> > > > > > >  wrote:
> > > > > > > >
> > > > > > > > From: Alexander Lobakin 
> > > > > > > > Date: Fri, 22 Jan 2021 11:47:45 +
> > > > > > > >
> > > > > > > > > From: Eric Dumazet 
> > > > > > > > > Date: Thu, 21 Jan 2021 16:41:33 +0100
> > > > > > > > >
> > > > > > > > > > On 1/21/21 2:47 PM, Xuan Zhuo wrote:
> > > > > > > > > > > This patch constructs the skb from pages to avoid 
> > > > > > > > > > > memory copy
> > > > > > > > > > > overhead.
> > > > > > > > > > >
> > > > > > > > > > > This relies on IFF_TX_SKB_NO_LINEAR: only network 
> > > > > > > > > > > cards whose
> > > > > > > > > > > priv_flags advertise IFF_TX_SKB_NO_LINEAR build the 
> > > > > > > > > > > skb directly from pages. If this feature is not 
> > > > > > > > > > > supported, it is still
> > > > > > > > > > > necessary to copy data to construct the skb.
> > > > > > > > > > >
> > > > > > > > > > >  Performance Testing 
> > > > > > > > > > >
> > > > > > > > > > > The test environment is Aliyun ECS server.
> > > > > > > > > > > Test cmd:
> > > > > > > > > > > ```
> > > > > > > > > > > xdpsock -i eth0 -t  -S -s 
> > > > > > > > > > > ```
> > > > > > > > > > >
> > > > > > > > > > > Test result data:
> > > > > > > > > > >
> > > > > > > > > > > size    64      512     1024    1500
> > > > > > > > > > > copy    1916747 1775988 1600203 1440054
> > > > > > > > > > > page    1974058 1953655 1945463 1904478
> > > > > > > > > > > percent 3.0%    10.0%   21.58%  32.3%
> > > > > > > > > > >
> > > > > > > > > > > Signed-off-by: Xuan Zhuo 
> > > > > > > > > > > Reviewed-by: Dust Li 
> > > > > > > > > > > ---
> > > > > > > > > > >  net/xdp/xsk.c | 104 
> > > > > > > > > > > --
> > > > > > > > > > >  1 file changed, 86 insertions(+), 18 deletions(-)
> > > > > > > > > > >
> > > > > > > > > > > diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
> > > > > > > > > > > index 4a83117..38af7f1 100644
> > > > > > > > > > > --- a/net/xdp/xsk.c
> > > > > > > > > > > +++ b/net/xdp/xsk.c
> > > > > > > > > > > @@ -430,6 +430,87 @@ static void xsk_destruct_skb(struct 
> > > > > > > > > > > sk_buff *skb)
> > > > > > > > > > >   sock_wfree(skb);
> > > > > > > > > > >  }
> > > > > > > > > > >
> > > > > > > > > > > +static struct sk_buff *xsk_build_skb_zerocopy(struct 
> > > > > > > > > > > xdp_sock *xs,
> > > > > > > > > > > +   struct xdp_desc 
> > > > > > > > > > > *desc)
> > > > > > > > > > > +{
> > > > > > > > > > > + u32 len, offset, copy, copied;
> > > > > > > > > > > + struct sk_buff *skb;
> > > > > > > > > > > + struct page *page;
> > > > > > > > > > > + void *buffer;
> > > > > > > > > > > + int err, i;
> > > > > > > > > > > + u64 addr;
> > > > > > > > > > > +
> > > > > > > > > > > + skb = sock_alloc_send_skb(&xs->sk, 0, 1, &err);
> > > > > > > > > > > + if (unlikely(!skb))
> > > > > > > > > > > + return ERR_PTR(err);
> > > > > > > > > > > +
> > > > > > > > > > > + addr = desc->addr;
> > > > > > > > > > > + len = desc->len;
> > > > > > > > > > > +
> > > > > > > > > > > + buffer = xsk_buff_raw_get_data(xs->pool, addr);
> > > > > > > > > > > + offset = offset_in_page(buffer);
> > > > > > > > > > > + addr = buffer - xs->pool->addrs;
> > > > > > > > > > > +
> > > > > > > > > > > + for (copied = 0, i = 0; copied < len; i++) {
> > > > > > > > > > > + page = xs->pool->umem->pgs[addr >> PAGE_SHIFT];
> > > > > > > > > > > +
> > > > > > > > > > > + get_page(page);
> > > > > > > > > > > +
> > > > > > > > > > > + copy = min_t(u32, PAGE_SIZE - offset, len - 
> > > > > > > > > > > copied);
> > > > > > > > > > > +
> > > > > > > > > > > + skb_fill_page_desc(skb, i, page, offset, copy);
> > > > > > > > > > > +
> > > > > > > > > > > + copied += copy;
> > > > > > > > > > > + addr += copy;
> > > > > > > > > > > + offset = 0;
> > > > > > > > > > > + }
> > > > > > > > > > > +
> > > > > > > > > > > + skb->len += len;
> > > > > > > > > > > + skb->data_len += len;
> > > > > > > > > >
> > > > > > > > > > > + skb->truesize += len;
> > > > > > > > > >
> > > > > > > > > > This is not the truesize, unfortunately.
> > > > > > > > > >
> > > > > > > > > > We need to account for the number of pages, not number of 
> > > > > > > > > > bytes.
> > > > > > > > >
> > > > > > > > > The easiest 

Re: [PATCH bpf-next v3 3/3] xsk: build skb by page

2021-01-22 Thread Magnus Karlsson
On Fri, Jan 22, 2021 at 6:26 PM Alexander Lobakin  wrote:
>
> From: Xuan Zhuo 
> Date: Fri, 22 Jan 2021 23:39:15 +0800
>
> > On Fri, 22 Jan 2021 13:55:14 +0100, Magnus Karlsson 
> >  wrote:
> > > On Fri, Jan 22, 2021 at 1:39 PM Alexander Lobakin  wrote:
> > > >
> > > > From: Magnus Karlsson 
> > > > Date: Fri, 22 Jan 2021 13:18:47 +0100
> > > >
> > > > > On Fri, Jan 22, 2021 at 12:57 PM Alexander Lobakin  
> > > > > wrote:
> > > > > >
> > > > > > From: Alexander Lobakin 
> > > > > > Date: Fri, 22 Jan 2021 11:47:45 +
> > > > > >
> > > > > > > From: Eric Dumazet 
> > > > > > > Date: Thu, 21 Jan 2021 16:41:33 +0100
> > > > > > >
> > > > > > > > On 1/21/21 2:47 PM, Xuan Zhuo wrote:
> > > > > > > > > This patch constructs the skb from pages to avoid 
> > > > > > > > > memory copy
> > > > > > > > > overhead.
> > > > > > > > >
> > > > > > > > > This relies on IFF_TX_SKB_NO_LINEAR: only network cards 
> > > > > > > > > whose
> > > > > > > > > priv_flags advertise IFF_TX_SKB_NO_LINEAR build the skb 
> > > > > > > > > directly from pages. If this feature is not supported, it 
> > > > > > > > > is still
> > > > > > > > > necessary to copy data to construct the skb.
> > > > > > > > >
> > > > > > > > >  Performance Testing 
> > > > > > > > >
> > > > > > > > > The test environment is Aliyun ECS server.
> > > > > > > > > Test cmd:
> > > > > > > > > ```
> > > > > > > > > xdpsock -i eth0 -t  -S -s 
> > > > > > > > > ```
> > > > > > > > >
> > > > > > > > > Test result data:
> > > > > > > > >
> > > > > > > > > size    64      512     1024    1500
> > > > > > > > > copy    1916747 1775988 1600203 1440054
> > > > > > > > > page    1974058 1953655 1945463 1904478
> > > > > > > > > percent 3.0%    10.0%   21.58%  32.3%
> > > > > > > > >
> > > > > > > > > Signed-off-by: Xuan Zhuo 
> > > > > > > > > Reviewed-by: Dust Li 
> > > > > > > > > ---
> > > > > > > > >  net/xdp/xsk.c | 104 
> > > > > > > > > --
> > > > > > > > >  1 file changed, 86 insertions(+), 18 deletions(-)
> > > > > > > > >
> > > > > > > > > diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
> > > > > > > > > index 4a83117..38af7f1 100644
> > > > > > > > > --- a/net/xdp/xsk.c
> > > > > > > > > +++ b/net/xdp/xsk.c
> > > > > > > > > @@ -430,6 +430,87 @@ static void xsk_destruct_skb(struct 
> > > > > > > > > sk_buff *skb)
> > > > > > > > >   sock_wfree(skb);
> > > > > > > > >  }
> > > > > > > > >
> > > > > > > > > +static struct sk_buff *xsk_build_skb_zerocopy(struct 
> > > > > > > > > xdp_sock *xs,
> > > > > > > > > +   struct xdp_desc *desc)
> > > > > > > > > +{
> > > > > > > > > + u32 len, offset, copy, copied;
> > > > > > > > > + struct sk_buff *skb;
> > > > > > > > > + struct page *page;
> > > > > > > > > + void *buffer;
> > > > > > > > > + int err, i;
> > > > > > > > > + u64 addr;
> > > > > > > > > +
> > > > > > > > > + skb = sock_alloc_send_skb(&xs->sk, 0, 1, &err);
> > > > > > > > > + if (unlikely(!skb))
> > > > > > > > > + return ERR_PTR(err);
> > > > > > > > > +
> > > > > > > > > + addr = desc->addr;
> > > > > > > > > + len = desc->len;
> > > > > > > > > +
> > > > > > > > > + buffer = xsk_buff_raw_get_data(xs->pool, addr);
> > > > > > > > > + offset = offset_in_page(buffer);
> > > > > > > > > + addr = buffer - xs->pool->addrs;
> > > > > > > > > +
> > > > > > > > > + for (copied = 0, i = 0; copied < len; i++) {
> > > > > > > > > + page = xs->pool->umem->pgs[addr >> PAGE_SHIFT];
> > > > > > > > > +
> > > > > > > > > + get_page(page);
> > > > > > > > > +
> > > > > > > > > + copy = min_t(u32, PAGE_SIZE - offset, len - copied);
> > > > > > > > > +
> > > > > > > > > + skb_fill_page_desc(skb, i, page, offset, copy);
> > > > > > > > > +
> > > > > > > > > + copied += copy;
> > > > > > > > > + addr += copy;
> > > > > > > > > + offset = 0;
> > > > > > > > > + }
> > > > > > > > > +
> > > > > > > > > + skb->len += len;
> > > > > > > > > + skb->data_len += len;
> > > > > > > >
> > > > > > > > > + skb->truesize += len;
> > > > > > > >
> > > > > > > > This is not the truesize, unfortunately.
> > > > > > > >
> > > > > > > > We need to account for the number of pages, not number of bytes.
> > > > > > >
> > > > > > > The easiest solution is:
> > > > > > >
> > > > > > >   skb->truesize += PAGE_SIZE * i;
> > > > > > >
> > > > > > > i would be equal to skb_shinfo(skb)->nr_frags after exiting the 
> > > > > > > loop.
> > > > > >
> > > > > > Oops, pls ignore this. I forgot that XSK buffers are not
> > > > > > "one per page".
> > > > > > We need to count the number of pages manually and then do
> > > > > >
> > > > > > skb->truesize += PAGE_SIZE * npages;
> > > > > >
> > > > > > Right.
> > > > >
> > > > > There are two possible packet buffer (chunk) sizes in a umem, 2K and
> > > > > 4K on a system with a PAGE_SIZE of 4K. If I remember correctly, and

Re: [PATCH bpf-next v3 3/3] xsk: build skb by page

2021-01-22 Thread Alexander Lobakin
From: Magnus Karlsson 
Date: Fri, 22 Jan 2021 19:37:06 +0100

> On Fri, Jan 22, 2021 at 6:26 PM Alexander Lobakin  wrote:
> >
> > From: Xuan Zhuo 
> > Date: Fri, 22 Jan 2021 23:39:15 +0800
> >
> > > On Fri, 22 Jan 2021 13:55:14 +0100, Magnus Karlsson 
> > >  wrote:
> > > > On Fri, Jan 22, 2021 at 1:39 PM Alexander Lobakin  
> > > > wrote:
> > > > >
> > > > > From: Magnus Karlsson 
> > > > > Date: Fri, 22 Jan 2021 13:18:47 +0100
> > > > >
> > > > > > On Fri, Jan 22, 2021 at 12:57 PM Alexander Lobakin  
> > > > > > wrote:
> > > > > > >
> > > > > > > From: Alexander Lobakin 
> > > > > > > Date: Fri, 22 Jan 2021 11:47:45 +
> > > > > > >
> > > > > > > > From: Eric Dumazet 
> > > > > > > > Date: Thu, 21 Jan 2021 16:41:33 +0100
> > > > > > > >
> > > > > > > > > On 1/21/21 2:47 PM, Xuan Zhuo wrote:
> > > > > > > > > > This patch constructs the skb from pages to avoid 
> > > > > > > > > > memory copy
> > > > > > > > > > overhead.
> > > > > > > > > >
> > > > > > > > > > This relies on IFF_TX_SKB_NO_LINEAR: only network cards 
> > > > > > > > > > whose
> > > > > > > > > > priv_flags advertise IFF_TX_SKB_NO_LINEAR build the skb 
> > > > > > > > > > directly from pages. If this feature is not supported, 
> > > > > > > > > > it is still
> > > > > > > > > > necessary to copy data to construct the skb.
> > > > > > > > > >
> > > > > > > > > >  Performance Testing 
> > > > > > > > > >
> > > > > > > > > > The test environment is Aliyun ECS server.
> > > > > > > > > > Test cmd:
> > > > > > > > > > ```
> > > > > > > > > > xdpsock -i eth0 -t  -S -s 
> > > > > > > > > > ```
> > > > > > > > > >
> > > > > > > > > > Test result data:
> > > > > > > > > >
> > > > > > > > > > size    64      512     1024    1500
> > > > > > > > > > copy    1916747 1775988 1600203 1440054
> > > > > > > > > > page    1974058 1953655 1945463 1904478
> > > > > > > > > > percent 3.0%    10.0%   21.58%  32.3%
> > > > > > > > > >
> > > > > > > > > > Signed-off-by: Xuan Zhuo 
> > > > > > > > > > Reviewed-by: Dust Li 
> > > > > > > > > > ---
> > > > > > > > > >  net/xdp/xsk.c | 104 
> > > > > > > > > > --
> > > > > > > > > >  1 file changed, 86 insertions(+), 18 deletions(-)
> > > > > > > > > >
> > > > > > > > > > diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
> > > > > > > > > > index 4a83117..38af7f1 100644
> > > > > > > > > > --- a/net/xdp/xsk.c
> > > > > > > > > > +++ b/net/xdp/xsk.c
> > > > > > > > > > @@ -430,6 +430,87 @@ static void xsk_destruct_skb(struct 
> > > > > > > > > > sk_buff *skb)
> > > > > > > > > >   sock_wfree(skb);
> > > > > > > > > >  }
> > > > > > > > > >
> > > > > > > > > > +static struct sk_buff *xsk_build_skb_zerocopy(struct xdp_sock *xs,
> > > > > > > > > > +   struct xdp_desc *desc)
> > > > > > > > > > +{
> > > > > > > > > > + u32 len, offset, copy, copied;
> > > > > > > > > > + struct sk_buff *skb;
> > > > > > > > > > + struct page *page;
> > > > > > > > > > + void *buffer;
> > > > > > > > > > + int err, i;
> > > > > > > > > > + u64 addr;
> > > > > > > > > > +
> > > > > > > > > > + skb = sock_alloc_send_skb(&xs->sk, 0, 1, &err);
> > > > > > > > > > + if (unlikely(!skb))
> > > > > > > > > > + return ERR_PTR(err);
> > > > > > > > > > +
> > > > > > > > > > + addr = desc->addr;
> > > > > > > > > > + len = desc->len;
> > > > > > > > > > +
> > > > > > > > > > + buffer = xsk_buff_raw_get_data(xs->pool, addr);
> > > > > > > > > > + offset = offset_in_page(buffer);
> > > > > > > > > > + addr = buffer - xs->pool->addrs;
> > > > > > > > > > +
> > > > > > > > > > + for (copied = 0, i = 0; copied < len; i++) {
> > > > > > > > > > + page = xs->pool->umem->pgs[addr >> PAGE_SHIFT];
> > > > > > > > > > +
> > > > > > > > > > + get_page(page);
> > > > > > > > > > +
> > > > > > > > > > + copy = min_t(u32, PAGE_SIZE - offset, len - copied);
> > > > > > > > > > +
> > > > > > > > > > + skb_fill_page_desc(skb, i, page, offset, copy);
> > > > > > > > > > +
> > > > > > > > > > + copied += copy;
> > > > > > > > > > + addr += copy;
> > > > > > > > > > + offset = 0;
> > > > > > > > > > + }
> > > > > > > > > > +
> > > > > > > > > > + skb->len += len;
> > > > > > > > > > + skb->data_len += len;
> > > > > > > > >
> > > > > > > > > > + skb->truesize += len;
> > > > > > > > >
> > > > > > > > > This is not the truesize, unfortunately.
> > > > > > > > >
> > > > > > > > > We need to account for the number of pages, not the number of bytes.
> > > > > > > >
> > > > > > > > The easiest solution is:
> > > > > > > >
> > > > > > > >   skb->truesize += PAGE_SIZE * i;
> > > > > > > >
> > > > > > > > i would be equal to skb_shinfo(skb)->nr_frags after exiting the loop.
> > > > > > >
> > > > > > > Oops, pls ignore this. I forgot that XSK buffers are not
> > > 

Re: [PATCH bpf-next v3 3/3] xsk: build skb by page

2021-01-22 Thread Alexander Lobakin
From: Xuan Zhuo 
Date: Fri, 22 Jan 2021 23:39:15 +0800

> On Fri, 22 Jan 2021 13:55:14 +0100, Magnus Karlsson 
>  wrote:
> > On Fri, Jan 22, 2021 at 1:39 PM Alexander Lobakin  wrote:
> > >
> > > From: Magnus Karlsson 
> > > Date: Fri, 22 Jan 2021 13:18:47 +0100
> > >
> > > > On Fri, Jan 22, 2021 at 12:57 PM Alexander Lobakin  
> > > > wrote:
> > > > >
> > > > > From: Alexander Lobakin 
> > > > > Date: Fri, 22 Jan 2021 11:47:45 +
> > > > >
> > > > > > From: Eric Dumazet 
> > > > > > Date: Thu, 21 Jan 2021 16:41:33 +0100
> > > > > >
> > > > > > > On 1/21/21 2:47 PM, Xuan Zhuo wrote:
> > > > > > > > This patch constructs the skb from pages to save memory
> > > > > > > > copy overhead.
> > > > > > > >
> > > > > > > > This function is implemented based on IFF_TX_SKB_NO_LINEAR.
> > > > > > > > Only network cards whose priv_flags support
> > > > > > > > IFF_TX_SKB_NO_LINEAR will use pages to construct the skb
> > > > > > > > directly. If this feature is not supported, it is still
> > > > > > > > necessary to copy data to construct the skb.
> > > > > > > >
> > > > > > > >  Performance Testing 
> > > > > > > >
> > > > > > > > The test environment is Aliyun ECS server.
> > > > > > > > Test cmd:
> > > > > > > > ```
> > > > > > > > xdpsock -i eth0 -t  -S -s 
> > > > > > > > ```
> > > > > > > >
> > > > > > > > Test result data:
> > > > > > > >
> > > > > > > > size    64      512     1024    1500
> > > > > > > > copy    1916747 1775988 1600203 1440054
> > > > > > > > page    1974058 1953655 1945463 1904478
> > > > > > > > percent 3.0%    10.0%   21.58%  32.3%
> > > > > > > >
> > > > > > > > Signed-off-by: Xuan Zhuo 
> > > > > > > > Reviewed-by: Dust Li 
> > > > > > > > ---
> > > > > > > >  net/xdp/xsk.c | 104 
> > > > > > > > --
> > > > > > > >  1 file changed, 86 insertions(+), 18 deletions(-)
> > > > > > > >
> > > > > > > > diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
> > > > > > > > index 4a83117..38af7f1 100644
> > > > > > > > --- a/net/xdp/xsk.c
> > > > > > > > +++ b/net/xdp/xsk.c
> > > > > > > > @@ -430,6 +430,87 @@ static void xsk_destruct_skb(struct 
> > > > > > > > sk_buff *skb)
> > > > > > > >   sock_wfree(skb);
> > > > > > > >  }
> > > > > > > >
> > > > > > > > +static struct sk_buff *xsk_build_skb_zerocopy(struct xdp_sock 
> > > > > > > > *xs,
> > > > > > > > +   struct xdp_desc *desc)
> > > > > > > > +{
> > > > > > > > + u32 len, offset, copy, copied;
> > > > > > > > + struct sk_buff *skb;
> > > > > > > > + struct page *page;
> > > > > > > > + void *buffer;
> > > > > > > > + int err, i;
> > > > > > > > + u64 addr;
> > > > > > > > +
> > > > > > > > + skb = sock_alloc_send_skb(&xs->sk, 0, 1, &err);
> > > > > > > > + if (unlikely(!skb))
> > > > > > > > + return ERR_PTR(err);
> > > > > > > > +
> > > > > > > > + addr = desc->addr;
> > > > > > > > + len = desc->len;
> > > > > > > > +
> > > > > > > > + buffer = xsk_buff_raw_get_data(xs->pool, addr);
> > > > > > > > + offset = offset_in_page(buffer);
> > > > > > > > + addr = buffer - xs->pool->addrs;
> > > > > > > > +
> > > > > > > > + for (copied = 0, i = 0; copied < len; i++) {
> > > > > > > > + page = xs->pool->umem->pgs[addr >> PAGE_SHIFT];
> > > > > > > > +
> > > > > > > > + get_page(page);
> > > > > > > > +
> > > > > > > > + copy = min_t(u32, PAGE_SIZE - offset, len - copied);
> > > > > > > > +
> > > > > > > > + skb_fill_page_desc(skb, i, page, offset, copy);
> > > > > > > > +
> > > > > > > > + copied += copy;
> > > > > > > > + addr += copy;
> > > > > > > > + offset = 0;
> > > > > > > > + }
> > > > > > > > +
> > > > > > > > + skb->len += len;
> > > > > > > > + skb->data_len += len;
> > > > > > >
> > > > > > > > + skb->truesize += len;
> > > > > > >
> > > > > > > This is not the truesize, unfortunately.
> > > > > > >
> > > > > > > We need to account for the number of pages, not number of bytes.
> > > > > >
> > > > > > The easiest solution is:
> > > > > >
> > > > > >   skb->truesize += PAGE_SIZE * i;
> > > > > >
> > > > > > i would be equal to skb_shinfo(skb)->nr_frags after exiting the 
> > > > > > loop.
> > > > >
> > > > > Oops, pls ignore this. I forgot that XSK buffers are not
> > > > > "one per page".
> > > > > We need to count the number of pages manually and then do
> > > > >
> > > > > skb->truesize += PAGE_SIZE * npages;
> > > > >
> > > > > Right.
> > > >
> > > > There are two possible packet buffer (chunk) sizes in a umem, 2K and
> > > > 4K on a system with a PAGE_SIZE of 4K. If I remember correctly, and
> > > > please correct me if wrong, truesize is used for memory accounting.
> > > > But in this code, no kernel memory has been allocated (apart from the
> > > > skb). The page is just a part of the umem that has been already
> > > > allocated beforehand and by user-space in this case. So what should
> > > > 

Re: [PATCH bpf-next v3 3/3] xsk: build skb by page

2021-01-22 Thread Alexander Lobakin
From: Xuan Zhuo 
Date: Fri, 22 Jan 2021 23:36:29 +0800

> On Fri, 22 Jan 2021 12:08:00 +, Alexander Lobakin  wrote:
> > From: Alexander Lobakin 
> > Date: Fri, 22 Jan 2021 11:55:35 +
> >
> > > From: Alexander Lobakin 
> > > Date: Fri, 22 Jan 2021 11:47:45 +
> > >
> > > > From: Eric Dumazet 
> > > > Date: Thu, 21 Jan 2021 16:41:33 +0100
> > > >
> > > > > On 1/21/21 2:47 PM, Xuan Zhuo wrote:
> > > > > > This patch constructs the skb from pages to save memory copy
> > > > > > overhead.
> > > > > >
> > > > > > This function is implemented based on IFF_TX_SKB_NO_LINEAR. Only
> > > > > > network cards whose priv_flags support IFF_TX_SKB_NO_LINEAR will
> > > > > > use pages to construct the skb directly. If this feature is not
> > > > > > supported, it is still necessary to copy data to construct the skb.
> > > > > >
> > > > > >  Performance Testing 
> > > > > >
> > > > > > The test environment is Aliyun ECS server.
> > > > > > Test cmd:
> > > > > > ```
> > > > > > xdpsock -i eth0 -t  -S -s 
> > > > > > ```
> > > > > >
> > > > > > Test result data:
> > > > > >
> > > > > > size    64      512     1024    1500
> > > > > > copy    1916747 1775988 1600203 1440054
> > > > > > page    1974058 1953655 1945463 1904478
> > > > > > percent 3.0%    10.0%   21.58%  32.3%
> > > > > >
> > > > > > Signed-off-by: Xuan Zhuo 
> > > > > > Reviewed-by: Dust Li 
> > > > > > ---
> > > > > >  net/xdp/xsk.c | 104 
> > > > > > --
> > > > > >  1 file changed, 86 insertions(+), 18 deletions(-)
> > > > > >
> > > > > > diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
> > > > > > index 4a83117..38af7f1 100644
> > > > > > --- a/net/xdp/xsk.c
> > > > > > +++ b/net/xdp/xsk.c
> > > > > > @@ -430,6 +430,87 @@ static void xsk_destruct_skb(struct sk_buff 
> > > > > > *skb)
> > > > > > sock_wfree(skb);
> > > > > >  }
> > > > > >
> > > > > > +static struct sk_buff *xsk_build_skb_zerocopy(struct xdp_sock *xs,
> > > > > > + struct xdp_desc *desc)
> > > > > > +{
> > > > > > +   u32 len, offset, copy, copied;
> > > > > > +   struct sk_buff *skb;
> > > > > > +   struct page *page;
> > > > > > +   void *buffer;
> > > > > > +   int err, i;
> > > > > > +   u64 addr;
> > > > > > +
> > > > > > > +   skb = sock_alloc_send_skb(&xs->sk, 0, 1, &err);
> >
> > Also,
> > maybe we should allocate it with NET_SKB_PAD so NIC drivers could
> > use some reserved space?
> >
> > skb = sock_alloc_send_skb(&xs->sk, NET_SKB_PAD, 1, &err);
> > ...
> > skb_reserve(skb, NET_SKB_PAD);
> >
> > Eric, what do you think?
> 
> I think you are right. Some headroom should be reserved for such devices.
> This space should also be added in the copy mode below. Is
> LL_RESERVED_SPACE more appropriate?

No. If you look at __netdev_alloc_skb() and __napi_alloc_skb(), they
reserve NET_SKB_PAD at the beginning of the linear area. The
documentation of __build_skb() also says that a driver should reserve
NET_SKB_PAD before the actual frame, so it is a standardized,
hardware-independent headroom.
Leaving that space in skb->head will allow developers to implement
IFF_TX_SKB_NO_LINEAR in a wider variety of drivers, especially when
a driver has to prepend some sort of data before the actual frame.
Since it's usually the size of one cacheline, it shouldn't be a big
deal.


[ I also had an idea of allocating an skb with a headroom of
NET_SKB_PAD + 256 bytes, so nearly all drivers could just call
pskb_pull_tail() to support this type of skbuff without much
effort, but I think that it's better to teach drivers to support
xmitting of really headless ones. If virtio_net can do it, why
shouldn't the others? ]

> > > > > > +   if (unlikely(!skb))
> > > > > > +   return ERR_PTR(err);
> > > > > > +
> > > > > > +   addr = desc->addr;
> > > > > > +   len = desc->len;
> > > > > > +
> > > > > > +   buffer = xsk_buff_raw_get_data(xs->pool, addr);
> > > > > > +   offset = offset_in_page(buffer);
> > > > > > +   addr = buffer - xs->pool->addrs;
> > > > > > +
> > > > > > +   for (copied = 0, i = 0; copied < len; i++) {
> > > > > > +   page = xs->pool->umem->pgs[addr >> PAGE_SHIFT];
> > > > > > +
> > > > > > +   get_page(page);
> > > > > > +
> > > > > > +   copy = min_t(u32, PAGE_SIZE - offset, len - copied);
> > > > > > +
> > > > > > +   skb_fill_page_desc(skb, i, page, offset, copy);
> > > > > > +
> > > > > > +   copied += copy;
> > > > > > +   addr += copy;
> > > > > > +   offset = 0;
> > > > > > +   }
> > > > > > +
> > > > > > +   skb->len += len;
> > > > > > +   skb->data_len += len;
> > > > >
> > > > > > +   skb->truesize += len;
> > > > >
> > > > > This is not the truesize, unfortunately.
> > > > >
> > > > > We need to account for the number of pages, not number of bytes.
> > > >
> > > > The easiest solution is:
> > > >
> > > > skb->truesize += PAGE_SIZE * i;
> > > 

Re: [PATCH bpf-next v3 3/3] xsk: build skb by page

2021-01-22 Thread Magnus Karlsson
On Fri, Jan 22, 2021 at 1:39 PM Alexander Lobakin  wrote:
>
> From: Magnus Karlsson 
> Date: Fri, 22 Jan 2021 13:18:47 +0100
>
> > On Fri, Jan 22, 2021 at 12:57 PM Alexander Lobakin  wrote:
> > >
> > > From: Alexander Lobakin 
> > > Date: Fri, 22 Jan 2021 11:47:45 +
> > >
> > > > From: Eric Dumazet 
> > > > Date: Thu, 21 Jan 2021 16:41:33 +0100
> > > >
> > > > > On 1/21/21 2:47 PM, Xuan Zhuo wrote:
> > > > > > This patch constructs the skb from pages to save memory copy
> > > > > > overhead.
> > > > > >
> > > > > > This function is implemented based on IFF_TX_SKB_NO_LINEAR. Only
> > > > > > network cards whose priv_flags support IFF_TX_SKB_NO_LINEAR will
> > > > > > use pages to construct the skb directly. If this feature is not
> > > > > > supported, it is still necessary to copy data to construct the skb.
> > > > > >
> > > > > >  Performance Testing 
> > > > > >
> > > > > > The test environment is Aliyun ECS server.
> > > > > > Test cmd:
> > > > > > ```
> > > > > > xdpsock -i eth0 -t  -S -s 
> > > > > > ```
> > > > > >
> > > > > > Test result data:
> > > > > >
> > > > > > size    64      512     1024    1500
> > > > > > copy    1916747 1775988 1600203 1440054
> > > > > > page    1974058 1953655 1945463 1904478
> > > > > > percent 3.0%    10.0%   21.58%  32.3%
> > > > > >
> > > > > > Signed-off-by: Xuan Zhuo 
> > > > > > Reviewed-by: Dust Li 
> > > > > > ---
> > > > > >  net/xdp/xsk.c | 104 
> > > > > > --
> > > > > >  1 file changed, 86 insertions(+), 18 deletions(-)
> > > > > >
> > > > > > diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
> > > > > > index 4a83117..38af7f1 100644
> > > > > > --- a/net/xdp/xsk.c
> > > > > > +++ b/net/xdp/xsk.c
> > > > > > @@ -430,6 +430,87 @@ static void xsk_destruct_skb(struct sk_buff 
> > > > > > *skb)
> > > > > >   sock_wfree(skb);
> > > > > >  }
> > > > > >
> > > > > > +static struct sk_buff *xsk_build_skb_zerocopy(struct xdp_sock *xs,
> > > > > > +   struct xdp_desc *desc)
> > > > > > +{
> > > > > > + u32 len, offset, copy, copied;
> > > > > > + struct sk_buff *skb;
> > > > > > + struct page *page;
> > > > > > + void *buffer;
> > > > > > + int err, i;
> > > > > > + u64 addr;
> > > > > > +
> > > > > > + skb = sock_alloc_send_skb(&xs->sk, 0, 1, &err);
> > > > > > + if (unlikely(!skb))
> > > > > > + return ERR_PTR(err);
> > > > > > +
> > > > > > + addr = desc->addr;
> > > > > > + len = desc->len;
> > > > > > +
> > > > > > + buffer = xsk_buff_raw_get_data(xs->pool, addr);
> > > > > > + offset = offset_in_page(buffer);
> > > > > > + addr = buffer - xs->pool->addrs;
> > > > > > +
> > > > > > + for (copied = 0, i = 0; copied < len; i++) {
> > > > > > + page = xs->pool->umem->pgs[addr >> PAGE_SHIFT];
> > > > > > +
> > > > > > + get_page(page);
> > > > > > +
> > > > > > + copy = min_t(u32, PAGE_SIZE - offset, len - copied);
> > > > > > +
> > > > > > + skb_fill_page_desc(skb, i, page, offset, copy);
> > > > > > +
> > > > > > + copied += copy;
> > > > > > + addr += copy;
> > > > > > + offset = 0;
> > > > > > + }
> > > > > > +
> > > > > > + skb->len += len;
> > > > > > + skb->data_len += len;
> > > > >
> > > > > > + skb->truesize += len;
> > > > >
> > > > > This is not the truesize, unfortunately.
> > > > >
> > > > > We need to account for the number of pages, not number of bytes.
> > > >
> > > > The easiest solution is:
> > > >
> > > >   skb->truesize += PAGE_SIZE * i;
> > > >
> > > > i would be equal to skb_shinfo(skb)->nr_frags after exiting the loop.
> > >
> > > Oops, pls ignore this. I forgot that XSK buffers are not
> > > "one per page".
> > > We need to count the number of pages manually and then do
> > >
> > > skb->truesize += PAGE_SIZE * npages;
> > >
> > > Right.
> >
> > There are two possible packet buffer (chunk) sizes in a umem, 2K and
> > 4K on a system with a PAGE_SIZE of 4K. If I remember correctly, and
> > please correct me if wrong, truesize is used for memory accounting.
> > But in this code, no kernel memory has been allocated (apart from the
> > skb). The page is just a part of the umem that has been already
> > allocated beforehand and by user-space in this case. So what should
> > truesize be in this case? Do we add 0, chunk_size * i, or the
> > complicated case of counting exactly how many 4K pages that are used
> > when the chunk_size is 2K, as two chunks could occupy the same page,
> > or just the upper bound of PAGE_SIZE * i that is likely a good
> > approximation in most cases? Just note that there might be other uses
> > of truesize that I am unaware of that could impact this choice.
>
> Truesize is "what amount of memory does this skb occupy with all its
> fragments, linear space and struct sk_buff itself". The closer it is
> to the actual value, the better.
> In this case, I think adding chunk_size * i would be enough.

Re: [PATCH bpf-next v3 3/3] xsk: build skb by page

2021-01-22 Thread Alexander Lobakin
From: Magnus Karlsson 
Date: Fri, 22 Jan 2021 13:18:47 +0100

> On Fri, Jan 22, 2021 at 12:57 PM Alexander Lobakin  wrote:
> >
> > From: Alexander Lobakin 
> > Date: Fri, 22 Jan 2021 11:47:45 +
> >
> > > From: Eric Dumazet 
> > > Date: Thu, 21 Jan 2021 16:41:33 +0100
> > >
> > > > On 1/21/21 2:47 PM, Xuan Zhuo wrote:
> > > > > This patch constructs the skb from pages to save memory copy
> > > > > overhead.
> > > > >
> > > > > This function is implemented based on IFF_TX_SKB_NO_LINEAR. Only
> > > > > network cards whose priv_flags support IFF_TX_SKB_NO_LINEAR will
> > > > > use pages to construct the skb directly. If this feature is not
> > > > > supported, it is still necessary to copy data to construct the skb.
> > > > >
> > > > >  Performance Testing 
> > > > >
> > > > > The test environment is Aliyun ECS server.
> > > > > Test cmd:
> > > > > ```
> > > > > xdpsock -i eth0 -t  -S -s 
> > > > > ```
> > > > >
> > > > > Test result data:
> > > > >
> > > > > size    64      512     1024    1500
> > > > > copy    1916747 1775988 1600203 1440054
> > > > > page    1974058 1953655 1945463 1904478
> > > > > percent 3.0%    10.0%   21.58%  32.3%
> > > > >
> > > > > Signed-off-by: Xuan Zhuo 
> > > > > Reviewed-by: Dust Li 
> > > > > ---
> > > > >  net/xdp/xsk.c | 104 
> > > > > --
> > > > >  1 file changed, 86 insertions(+), 18 deletions(-)
> > > > >
> > > > > diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
> > > > > index 4a83117..38af7f1 100644
> > > > > --- a/net/xdp/xsk.c
> > > > > +++ b/net/xdp/xsk.c
> > > > > @@ -430,6 +430,87 @@ static void xsk_destruct_skb(struct sk_buff *skb)
> > > > >   sock_wfree(skb);
> > > > >  }
> > > > >
> > > > > +static struct sk_buff *xsk_build_skb_zerocopy(struct xdp_sock *xs,
> > > > > +   struct xdp_desc *desc)
> > > > > +{
> > > > > + u32 len, offset, copy, copied;
> > > > > + struct sk_buff *skb;
> > > > > + struct page *page;
> > > > > + void *buffer;
> > > > > + int err, i;
> > > > > + u64 addr;
> > > > > +
> > > > > > + skb = sock_alloc_send_skb(&xs->sk, 0, 1, &err);
> > > > > + if (unlikely(!skb))
> > > > > + return ERR_PTR(err);
> > > > > +
> > > > > + addr = desc->addr;
> > > > > + len = desc->len;
> > > > > +
> > > > > + buffer = xsk_buff_raw_get_data(xs->pool, addr);
> > > > > + offset = offset_in_page(buffer);
> > > > > + addr = buffer - xs->pool->addrs;
> > > > > +
> > > > > + for (copied = 0, i = 0; copied < len; i++) {
> > > > > + page = xs->pool->umem->pgs[addr >> PAGE_SHIFT];
> > > > > +
> > > > > + get_page(page);
> > > > > +
> > > > > + copy = min_t(u32, PAGE_SIZE - offset, len - copied);
> > > > > +
> > > > > + skb_fill_page_desc(skb, i, page, offset, copy);
> > > > > +
> > > > > + copied += copy;
> > > > > + addr += copy;
> > > > > + offset = 0;
> > > > > + }
> > > > > +
> > > > > + skb->len += len;
> > > > > + skb->data_len += len;
> > > >
> > > > > + skb->truesize += len;
> > > >
> > > > This is not the truesize, unfortunately.
> > > >
> > > > We need to account for the number of pages, not number of bytes.
> > >
> > > The easiest solution is:
> > >
> > >   skb->truesize += PAGE_SIZE * i;
> > >
> > > i would be equal to skb_shinfo(skb)->nr_frags after exiting the loop.
> >
> > Oops, pls ignore this. I forgot that XSK buffers are not
> > "one per page".
> > We need to count the number of pages manually and then do
> >
> > skb->truesize += PAGE_SIZE * npages;
> >
> > Right.
> 
> There are two possible packet buffer (chunk) sizes in a umem, 2K and
> 4K on a system with a PAGE_SIZE of 4K. If I remember correctly, and
> please correct me if wrong, truesize is used for memory accounting.
> But in this code, no kernel memory has been allocated (apart from the
> skb). The page is just a part of the umem that has been already
> allocated beforehand and by user-space in this case. So what should
> truesize be in this case? Do we add 0, chunk_size * i, or the
> complicated case of counting exactly how many 4K pages that are used
> when the chunk_size is 2K, as two chunks could occupy the same page,
> or just the upper bound of PAGE_SIZE * i that is likely a good
> approximation in most cases? Just note that there might be other uses
> of truesize that I am unaware of that could impact this choice.

Truesize is "what amount of memory does this skb occupy with all its
fragments, linear space and struct sk_buff itself". The closer it is
to the actual value, the better.
In this case, I think adding chunk_size * i would be enough.

(PAGE_SIZE * i can be overwhelming when chunk_size is 2K, especially
for setups with PAGE_SIZE > SZ_4K)

> > > > > +
> > > > > + refcount_add(len, &xs->sk.sk_wmem_alloc);
> > > > > +
> > > > > + return skb;
> > > > > +}
> > > > > +
> > >
> > > Al
> >
> > Thanks,
> > Al

Al



Re: [PATCH bpf-next v3 3/3] xsk: build skb by page

2021-01-22 Thread Alexander Lobakin
From: Alexander Lobakin 
Date: Fri, 22 Jan 2021 11:55:35 +

> From: Alexander Lobakin 
> Date: Fri, 22 Jan 2021 11:47:45 +
> 
> > From: Eric Dumazet 
> > Date: Thu, 21 Jan 2021 16:41:33 +0100
> > 
> > > On 1/21/21 2:47 PM, Xuan Zhuo wrote:
> > > > This patch constructs the skb from pages to save memory copy
> > > > overhead.
> > > > 
> > > > This function is implemented based on IFF_TX_SKB_NO_LINEAR. Only
> > > > network cards whose priv_flags support IFF_TX_SKB_NO_LINEAR will
> > > > use pages to construct the skb directly. If this feature is not
> > > > supported, it is still necessary to copy data to construct the skb.
> > > > 
> > > >  Performance Testing 
> > > > 
> > > > The test environment is Aliyun ECS server.
> > > > Test cmd:
> > > > ```
> > > > xdpsock -i eth0 -t  -S -s 
> > > > ```
> > > > 
> > > > Test result data:
> > > > 
> > > > size    64      512     1024    1500
> > > > copy    1916747 1775988 1600203 1440054
> > > > page    1974058 1953655 1945463 1904478
> > > > percent 3.0%    10.0%   21.58%  32.3%
> > > > 
> > > > Signed-off-by: Xuan Zhuo 
> > > > Reviewed-by: Dust Li 
> > > > ---
> > > >  net/xdp/xsk.c | 104 
> > > > --
> > > >  1 file changed, 86 insertions(+), 18 deletions(-)
> > > > 
> > > > diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
> > > > index 4a83117..38af7f1 100644
> > > > --- a/net/xdp/xsk.c
> > > > +++ b/net/xdp/xsk.c
> > > > @@ -430,6 +430,87 @@ static void xsk_destruct_skb(struct sk_buff *skb)
> > > > sock_wfree(skb);
> > > >  }
> > > >  
> > > > +static struct sk_buff *xsk_build_skb_zerocopy(struct xdp_sock *xs,
> > > > + struct xdp_desc *desc)
> > > > +{
> > > > +   u32 len, offset, copy, copied;
> > > > +   struct sk_buff *skb;
> > > > +   struct page *page;
> > > > +   void *buffer;
> > > > +   int err, i;
> > > > +   u64 addr;
> > > > +
> > > > +   skb = sock_alloc_send_skb(&xs->sk, 0, 1, &err);

Also,
maybe we should allocate it with NET_SKB_PAD so NIC drivers could
use some reserved space?

skb = sock_alloc_send_skb(&xs->sk, NET_SKB_PAD, 1, &err);
...
skb_reserve(skb, NET_SKB_PAD);

Eric, what do you think?

> > > > +   if (unlikely(!skb))
> > > > +   return ERR_PTR(err);
> > > > +
> > > > +   addr = desc->addr;
> > > > +   len = desc->len;
> > > > +
> > > > +   buffer = xsk_buff_raw_get_data(xs->pool, addr);
> > > > +   offset = offset_in_page(buffer);
> > > > +   addr = buffer - xs->pool->addrs;
> > > > +
> > > > +   for (copied = 0, i = 0; copied < len; i++) {
> > > > +   page = xs->pool->umem->pgs[addr >> PAGE_SHIFT];
> > > > +
> > > > +   get_page(page);
> > > > +
> > > > +   copy = min_t(u32, PAGE_SIZE - offset, len - copied);
> > > > +
> > > > +   skb_fill_page_desc(skb, i, page, offset, copy);
> > > > +
> > > > +   copied += copy;
> > > > +   addr += copy;
> > > > +   offset = 0;
> > > > +   }
> > > > +
> > > > +   skb->len += len;
> > > > +   skb->data_len += len;
> > > 
> > > > +   skb->truesize += len;
> > > 
> > > This is not the truesize, unfortunately.
> > > 
> > > We need to account for the number of pages, not number of bytes.
> > 
> > The easiest solution is:
> > 
> > skb->truesize += PAGE_SIZE * i;
> > 
> > i would be equal to skb_shinfo(skb)->nr_frags after exiting the loop.
> 
> Oops, pls ignore this. I forgot that XSK buffers are not
> "one per page".
> We need to count the number of pages manually and then do
> 
>   skb->truesize += PAGE_SIZE * npages;
> 
> Right.
> 
> > > > +
> > > > +   refcount_add(len, &xs->sk.sk_wmem_alloc);
> > > > +
> > > > +   return skb;
> > > > +}
> > > > +
> > 
> > Al
> 
> Thanks,
> Al

Al



Re: [PATCH bpf-next v3 3/3] xsk: build skb by page

2021-01-22 Thread Magnus Karlsson
On Fri, Jan 22, 2021 at 12:57 PM Alexander Lobakin  wrote:
>
> From: Alexander Lobakin 
> Date: Fri, 22 Jan 2021 11:47:45 +
>
> > From: Eric Dumazet 
> > Date: Thu, 21 Jan 2021 16:41:33 +0100
> >
> > > On 1/21/21 2:47 PM, Xuan Zhuo wrote:
> > > > This patch constructs the skb from pages to save memory copy
> > > > overhead.
> > > >
> > > > This function is implemented based on IFF_TX_SKB_NO_LINEAR. Only
> > > > network cards whose priv_flags support IFF_TX_SKB_NO_LINEAR will
> > > > use pages to construct the skb directly. If this feature is not
> > > > supported, it is still necessary to copy data to construct the skb.
> > > >
> > > >  Performance Testing 
> > > >
> > > > The test environment is Aliyun ECS server.
> > > > Test cmd:
> > > > ```
> > > > xdpsock -i eth0 -t  -S -s 
> > > > ```
> > > >
> > > > Test result data:
> > > >
> > > > size    64      512     1024    1500
> > > > copy    1916747 1775988 1600203 1440054
> > > > page    1974058 1953655 1945463 1904478
> > > > percent 3.0%    10.0%   21.58%  32.3%
> > > >
> > > > Signed-off-by: Xuan Zhuo 
> > > > Reviewed-by: Dust Li 
> > > > ---
> > > >  net/xdp/xsk.c | 104 
> > > > --
> > > >  1 file changed, 86 insertions(+), 18 deletions(-)
> > > >
> > > > diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
> > > > index 4a83117..38af7f1 100644
> > > > --- a/net/xdp/xsk.c
> > > > +++ b/net/xdp/xsk.c
> > > > @@ -430,6 +430,87 @@ static void xsk_destruct_skb(struct sk_buff *skb)
> > > >   sock_wfree(skb);
> > > >  }
> > > >
> > > > +static struct sk_buff *xsk_build_skb_zerocopy(struct xdp_sock *xs,
> > > > +   struct xdp_desc *desc)
> > > > +{
> > > > + u32 len, offset, copy, copied;
> > > > + struct sk_buff *skb;
> > > > + struct page *page;
> > > > + void *buffer;
> > > > + int err, i;
> > > > + u64 addr;
> > > > +
> > > > > + skb = sock_alloc_send_skb(&xs->sk, 0, 1, &err);
> > > > + if (unlikely(!skb))
> > > > + return ERR_PTR(err);
> > > > +
> > > > + addr = desc->addr;
> > > > + len = desc->len;
> > > > +
> > > > + buffer = xsk_buff_raw_get_data(xs->pool, addr);
> > > > + offset = offset_in_page(buffer);
> > > > + addr = buffer - xs->pool->addrs;
> > > > +
> > > > + for (copied = 0, i = 0; copied < len; i++) {
> > > > + page = xs->pool->umem->pgs[addr >> PAGE_SHIFT];
> > > > +
> > > > + get_page(page);
> > > > +
> > > > + copy = min_t(u32, PAGE_SIZE - offset, len - copied);
> > > > +
> > > > + skb_fill_page_desc(skb, i, page, offset, copy);
> > > > +
> > > > + copied += copy;
> > > > + addr += copy;
> > > > + offset = 0;
> > > > + }
> > > > +
> > > > + skb->len += len;
> > > > + skb->data_len += len;
> > >
> > > > + skb->truesize += len;
> > >
> > > This is not the truesize, unfortunately.
> > >
> > > We need to account for the number of pages, not number of bytes.
> >
> > The easiest solution is:
> >
> >   skb->truesize += PAGE_SIZE * i;
> >
> > i would be equal to skb_shinfo(skb)->nr_frags after exiting the loop.
>
> Oops, pls ignore this. I forgot that XSK buffers are not
> "one per page".
> We need to count the number of pages manually and then do
>
> skb->truesize += PAGE_SIZE * npages;
>
> Right.

There are two possible packet buffer (chunk) sizes in a umem, 2K and
4K on a system with a PAGE_SIZE of 4K. If I remember correctly, and
please correct me if wrong, truesize is used for memory accounting.
But in this code, no kernel memory has been allocated (apart from the
skb). The page is just a part of the umem that has been already
allocated beforehand and by user-space in this case. So what should
truesize be in this case? Do we add 0, chunk_size * i, or the
complicated case of counting exactly how many 4K pages that are used
when the chunk_size is 2K, as two chunks could occupy the same page,
or just the upper bound of PAGE_SIZE * i that is likely a good
approximation in most cases? Just note that there might be other uses
of truesize that I am unaware of that could impact this choice.

> > > > +
> > > > + refcount_add(len, &xs->sk.sk_wmem_alloc);
> > > > +
> > > > + return skb;
> > > > +}
> > > > +
> >
> > Al
>
> Thanks,
> Al
>


Re: [PATCH bpf-next v3 3/3] xsk: build skb by page

2021-01-22 Thread Alexander Lobakin
From: Alexander Lobakin 
Date: Fri, 22 Jan 2021 11:47:45 +

> From: Eric Dumazet 
> Date: Thu, 21 Jan 2021 16:41:33 +0100
> 
> > On 1/21/21 2:47 PM, Xuan Zhuo wrote:
> > > This patch constructs the skb from pages to save memory copy
> > > overhead.
> > > 
> > > This function is implemented based on IFF_TX_SKB_NO_LINEAR. Only
> > > network cards whose priv_flags support IFF_TX_SKB_NO_LINEAR will
> > > use pages to construct the skb directly. If this feature is not
> > > supported, it is still necessary to copy data to construct the skb.
> > > 
> > >  Performance Testing 
> > > 
> > > The test environment is Aliyun ECS server.
> > > Test cmd:
> > > ```
> > > xdpsock -i eth0 -t  -S -s 
> > > ```
> > > 
> > > Test result data:
> > > 
> > > size    64      512     1024    1500
> > > copy    1916747 1775988 1600203 1440054
> > > page    1974058 1953655 1945463 1904478
> > > percent 3.0%    10.0%   21.58%  32.3%
> > > 
> > > Signed-off-by: Xuan Zhuo 
> > > Reviewed-by: Dust Li 
> > > ---
> > >  net/xdp/xsk.c | 104 
> > > --
> > >  1 file changed, 86 insertions(+), 18 deletions(-)
> > > 
> > > diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
> > > index 4a83117..38af7f1 100644
> > > --- a/net/xdp/xsk.c
> > > +++ b/net/xdp/xsk.c
> > > @@ -430,6 +430,87 @@ static void xsk_destruct_skb(struct sk_buff *skb)
> > >   sock_wfree(skb);
> > >  }
> > >  
> > > +static struct sk_buff *xsk_build_skb_zerocopy(struct xdp_sock *xs,
> > > +   struct xdp_desc *desc)
> > > +{
> > > + u32 len, offset, copy, copied;
> > > + struct sk_buff *skb;
> > > + struct page *page;
> > > + void *buffer;
> > > + int err, i;
> > > + u64 addr;
> > > +
> > > + skb = sock_alloc_send_skb(&xs->sk, 0, 1, &err);
> > > + if (unlikely(!skb))
> > > + return ERR_PTR(err);
> > > +
> > > + addr = desc->addr;
> > > + len = desc->len;
> > > +
> > > + buffer = xsk_buff_raw_get_data(xs->pool, addr);
> > > + offset = offset_in_page(buffer);
> > > + addr = buffer - xs->pool->addrs;
> > > +
> > > + for (copied = 0, i = 0; copied < len; i++) {
> > > + page = xs->pool->umem->pgs[addr >> PAGE_SHIFT];
> > > +
> > > + get_page(page);
> > > +
> > > + copy = min_t(u32, PAGE_SIZE - offset, len - copied);
> > > +
> > > + skb_fill_page_desc(skb, i, page, offset, copy);
> > > +
> > > + copied += copy;
> > > + addr += copy;
> > > + offset = 0;
> > > + }
> > > +
> > > + skb->len += len;
> > > + skb->data_len += len;
> > 
> > > + skb->truesize += len;
> > 
> > This is not the truesize, unfortunately.
> > 
> > We need to account for the number of pages, not number of bytes.
> 
> The easiest solution is:
> 
>   skb->truesize += PAGE_SIZE * i;
> 
> i would be equal to skb_shinfo(skb)->nr_frags after exiting the loop.

Oops, pls ignore this. I forgot that XSK buffers are not
"one per page".
We need to count the number of pages manually and then do

skb->truesize += PAGE_SIZE * npages;

Right.

> > > +
> > > + refcount_add(len, &xs->sk.sk_wmem_alloc);
> > > +
> > > + return skb;
> > > +}
> > > +
> 
> Al

Thanks,
Al



Re: [PATCH bpf-next v3 3/3] xsk: build skb by page

2021-01-22 Thread Alexander Lobakin
From: Eric Dumazet 
Date: Thu, 21 Jan 2021 16:41:33 +0100

> On 1/21/21 2:47 PM, Xuan Zhuo wrote:
> > This patch constructs the skb directly from pages to avoid memory copy
> > overhead.
> > 
> > This function is implemented based on IFF_TX_SKB_NO_LINEAR. Only when the
> > network card's priv_flags support IFF_TX_SKB_NO_LINEAR will pages be used
> > to construct the skb directly. If this feature is not supported, it is
> > still necessary to copy the data to construct the skb.
> > 
> >  Performance Testing 
> > 
> > The test environment is Aliyun ECS server.
> > Test cmd:
> > ```
> > xdpsock -i eth0 -t  -S -s 
> > ```
> > 
> > Test result data:
> > 
> > size    64      512     1024    1500
> > copy    1916747 1775988 1600203 1440054
> > page    1974058 1953655 1945463 1904478
> > percent 3.0%    10.0%   21.58%  32.3%
> > 
> > Signed-off-by: Xuan Zhuo 
> > Reviewed-by: Dust Li 
> > ---
> >  net/xdp/xsk.c | 104 ++++++++++++++++++++++++++++++++++++++++++++++++------
> >  1 file changed, 86 insertions(+), 18 deletions(-)
> > 
> > diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
> > index 4a83117..38af7f1 100644
> > --- a/net/xdp/xsk.c
> > +++ b/net/xdp/xsk.c
> > @@ -430,6 +430,87 @@ static void xsk_destruct_skb(struct sk_buff *skb)
> > sock_wfree(skb);
> >  }
> >  
> > +static struct sk_buff *xsk_build_skb_zerocopy(struct xdp_sock *xs,
> > + struct xdp_desc *desc)
> > +{
> > +   u32 len, offset, copy, copied;
> > +   struct sk_buff *skb;
> > +   struct page *page;
> > +   void *buffer;
> > +   int err, i;
> > +   u64 addr;
> > +
> > +   skb = sock_alloc_send_skb(&xs->sk, 0, 1, &err);
> > +   if (unlikely(!skb))
> > +   return ERR_PTR(err);
> > +
> > +   addr = desc->addr;
> > +   len = desc->len;
> > +
> > +   buffer = xsk_buff_raw_get_data(xs->pool, addr);
> > +   offset = offset_in_page(buffer);
> > +   addr = buffer - xs->pool->addrs;
> > +
> > +   for (copied = 0, i = 0; copied < len; i++) {
> > +   page = xs->pool->umem->pgs[addr >> PAGE_SHIFT];
> > +
> > +   get_page(page);
> > +
> > +   copy = min_t(u32, PAGE_SIZE - offset, len - copied);
> > +
> > +   skb_fill_page_desc(skb, i, page, offset, copy);
> > +
> > +   copied += copy;
> > +   addr += copy;
> > +   offset = 0;
> > +   }
> > +
> > +   skb->len += len;
> > +   skb->data_len += len;
> 
> > +   skb->truesize += len;
> 
> This is not the truesize, unfortunately.
> 
> We need to account for the number of pages, not number of bytes.

The easiest solution is:

skb->truesize += PAGE_SIZE * i;

i would be equal to skb_shinfo(skb)->nr_frags after exiting the loop.

> > +
> > +   refcount_add(len, &xs->sk.sk_wmem_alloc);
> > +
> > +   return skb;
> > +}
> > +

Al