Re: [PATCH net-next v3] virtio-net: page_to_skb() use build_skb when there's sufficient tailroom

2021-04-27 Thread Jason Wang


在 2021/4/27 上午10:46, Xuan Zhuo 写道:

On Tue, 20 Apr 2021 10:41:03 +0800, Jason Wang  wrote:


Btw, since the patch modifies a critical path of virtio-net I suggest to
test the following cases:

1) netperf TCP stream
2) netperf UDP with packet size from 64 to PAGE_SIZE
3) XDP_PASS with 1)
4) XDP_PASS with 2)
5) XDP metadata with 1)
6) XDP metadata with 2)

I have completed the above test on the latest net-next branch. The tested script
and xdp code are as follows. The kernel is with KCOV and everything is normal
without exception.

Thanks.



Looks good.

Thanks for the testing.




## test script:
#!/usr/bin/env sh


for s in $(seq 64 4096)
do
netperf -H 192.168.122.202  -t UDP_STREAM -- -m $s
done

for s in $(seq 64 4096)
do
netperf -H 192.168.122.202  -t TCP_STREAM -- -m $s
done

## xdp pass:

#define KBUILD_MODNAME "foo"
#include 
#include 
#include 
#include 
#include 
#include 
#include 

#define DEFAULT_TTL 64
#define MAX_PCKT_SIZE 600
#define ICMP_TOOBIG_SIZE 98
#define ICMP_TOOBIG_PAYLOAD_SIZE 92


#define SEC(NAME) __attribute__((section(NAME), used))


SEC("xdp")
int _xdp(struct xdp_md *xdp)
{
return XDP_PASS;
}

char _license[] SEC("license") = "GPL";

## xdp metadata:

#define KBUILD_MODNAME "foo"
#include 
#include 
#include 
#include 
#include 
#include 
#include 

static long (*bpf_xdp_adjust_meta)(struct xdp_md *xdp_md, int delta) = 
(void *) 54;


#define SEC(NAME) __attribute__((section(NAME), used))

struct meta_info {
__u32 mark;
} __attribute__((aligned(4)));

SEC("xdp_mark")
int _xdp_mark(struct xdp_md *ctx)
{
struct meta_info *meta;
void *data, *data_end;
int ret;

/* Reserve space in-front of data pointer for our meta info.
 * (Notice drivers not supporting data_meta will fail here!)
 */
ret = bpf_xdp_adjust_meta(ctx, -(int)sizeof(*meta));
if (ret < 0)
return XDP_ABORTED;

/* Notice: Kernel-side verifier requires that loading of
 * ctx->data MUST happen _after_ helper bpf_xdp_adjust_meta(),
 * as pkt-data pointers are invalidated.  Helpers that require
 * this are determined/marked by bpf_helper_changes_pkt_data()
 */
data = (void *)(unsigned long)ctx->data;

/* Check data_meta have room for meta_info struct */
meta = (void *)(unsigned long)ctx->data_meta;
if ((void *)(meta + 1) > data)
return XDP_ABORTED;

meta->mark = 42;

return XDP_PASS;
}


char _license[] SEC("license") = "GPL";



___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: [PATCH net-next v3] virtio-net: page_to_skb() use build_skb when there's sufficient tailroom

2021-04-22 Thread Ido Schimmel
On Thu, Apr 22, 2021 at 08:12:31PM +0800, Xuan Zhuo wrote:
> Thank you very much for reporting this problem. Can you try this patch? Of
> course, it also includes two patches from eric.
> 
>  af39c8f72301 virtio-net: fix use-after-free in page_to_skb()
>  f5d7872a8b8a virtio-net: restrict build_skb() use to some arches

Applied your patch on top of net-next, looks good. Feel free to add:

Tested-by: Ido Schimmel 

Thanks for the quick fix
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH net-next v3] virtio-net: page_to_skb() use build_skb when there's sufficient tailroom

2021-04-22 Thread Ido Schimmel
On Fri, Apr 16, 2021 at 05:16:15PM +0800, Xuan Zhuo wrote:
> In page_to_skb(), if we have enough tailroom to save skb_shared_info, we
> can use build_skb to create skb directly. No need to alloc for
> additional space. And it can save a 'frags slot', which is very friendly
> to GRO.
> 
> Here, if the payload of the received package is too small (less than
> GOOD_COPY_LEN), we still choose to copy it directly to the space got by
> napi_alloc_skb. So we can reuse these pages.
> 
> Testing Machine:
> The four queues of the network card are bound to the cpu1.
> 
> Test command:
> for ((i=0;i<5;++i)); do sockperf tp --ip 192.168.122.64 -m 1000 -t 150& 
> done
> 
> The size of the udp package is 1000, so in the case of this patch, there
> will always be enough tailroom to use build_skb. The sent udp packet
> will be discarded because there is no port to receive it. The irqsoftd
> of the machine is 100%, we observe the received quantity displayed by
> sar -n DEV 1:
> 
> no build_skb:  956864.00 rxpck/s
> build_skb:1158465.00 rxpck/s
> 
> Signed-off-by: Xuan Zhuo 
> Suggested-by: Jason Wang 

We have VMs that use virtio_net for their management interface. After
this patch was applied we started seeing crashes when these VMs access
an NFS file system. I thought Eric's patches will fix it, but problem
persists even with his two patches:

af39c8f72301 virtio-net: fix use-after-free in page_to_skb()
f5d7872a8b8a virtio-net: restrict build_skb() use to some arches

Reverting all three patches makes the problem go away. A KASAN enabled
kernel emits the following (decoded) stack trace.

[1]
BUG: KASAN: use-after-free in skb_gro_receive (net/core/skbuff.c:4260)
Write of size 16 at addr 88811619fffc by task kworker/u9:0/534
CPU: 2 PID: 534 Comm: kworker/u9:0 Not tainted 
5.12.0-rc7-custom-16372-gb150be05b806 #3382
Hardware name: QEMU MSN2700, BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 
04/01/2014
Workqueue: xprtiod xs_stream_data_receive_workfn [sunrpc]
Call Trace:
 
dump_stack (lib/dump_stack.c:122)
print_address_description.constprop.0 (mm/kasan/report.c:233)
kasan_report.cold (mm/kasan/report.c:400 mm/kasan/report.c:416)
skb_gro_receive (net/core/skbuff.c:4260)
tcp_gro_receive (net/ipv4/tcp_offload.c:266 (discriminator 1))
tcp4_gro_receive (net/ipv4/tcp_offload.c:316)
inet_gro_receive (net/ipv4/af_inet.c:1545 (discriminator 2))
dev_gro_receive (net/core/dev.c:6075)
napi_gro_receive (net/core/dev.c:6168 net/core/dev.c:6198)
receive_buf (drivers/net/virtio_net.c:1151) virtio_net
virtnet_poll (drivers/net/virtio_net.c:1415 drivers/net/virtio_net.c:1519) 
virtio_net
__napi_poll (net/core/dev.c:6964)
net_rx_action (net/core/dev.c:7033 net/core/dev.c:7118)
__do_softirq (./arch/x86/include/asm/jump_label.h:25 
./include/linux/jump_label.h:200 ./include/trace/events/irq.h:142 
kernel/softirq.c:346)
irq_exit_rcu (kernel/softirq.c:221 kernel/softirq.c:422 kernel/softirq.c:434)
common_interrupt (arch/x86/kernel/irq.c:240 (discriminator 14))

asm_common_interrupt (./arch/x86/include/asm/idtentry.h:623)
RIP: 0010:read_hpet (arch/x86/kernel/hpet.c:823 arch/x86/kernel/hpet.c:793)
Code: 90 48 8b 05 a5 ef 6f 02 48 89 44 24 68 48 c1 e8 20 89 c2 3b 44 24 4c 74 
d1 89 d0 e9 c7 fe ff ff e8 38 90 35 00 fb 8b 44 24 6c  b8 fe ff ff 8b 54 24 
6
All code

   0:   90  nop
   1:   48 8b 05 a5 ef 6f 02mov0x26fefa5(%rip),%rax# 0x26fefad
   8:   48 89 44 24 68  mov%rax,0x68(%rsp)
   d:   48 c1 e8 20 shr$0x20,%rax
  11:   89 c2   mov%eax,%edx
  13:   3b 44 24 4c cmp0x4c(%rsp),%eax
  17:   74 d1   je 0xffea
  19:   89 d0   mov%edx,%eax
  1b:   e9 c7 fe ff ff  jmpq   0xfee7
  20:   e8 38 90 35 00  callq  0x35905d
  25:   fb  sti
  26:   8b 44 24 6c mov0x6c(%rsp),%eax
  2a:*  e9 b8 fe ff ff  jmpq   0xfee7   <-- 
trapping instruction
  2f:   8b 54 24 06 mov0x6(%rsp),%edx

Code starting with the faulting instruction
===
   0:   e9 b8 fe ff ff  jmpq   0xfebd
   5:   8b 54 24 06 mov0x6(%rsp),%edx
c 89 d0 e9 ad fe ff ff e8 fe 8c 35 00 e9
RSP: 0018:c900015a7a68 EFLAGS: 0206
RAX: 4ad84e9a RBX: 1920002b4f4e RCX: 97f7
RDX:  RSI: 0001 RDI: 
RBP: 0200 R08: 0001 R09: 8c645737
R10: fbfff18c8ae6 R11: 0001 R12: dc00
R13: 0016f23c724e R14: 888154e24000 R15: 88810c2b2c00
ktime_get (kernel/time/timekeeping.c:290 (discriminator 4) 
kernel/time/timekeeping.c:386 (discriminator 4) kernel/time/timekeeping.c:829 
(discriminator 4))
xprt_lookup_rqst (net/sunrpc/xprt.c:1049) sunrpc
xs_read_stream.constprop.0 (net/sunrpc/xprtsock.c:595 
net/sunrpc/xprtsock.c:646) sunrpc

Re: [PATCH net-next v3] virtio-net: page_to_skb() use build_skb when there's sufficient tailroom

2021-04-20 Thread Jason Wang


在 2021/4/20 下午8:35, Xuan Zhuo 写道:

I realize this has been merged to net-next already, but I'm getting a
use-after-free with KASAN in page_to_skb() with this patch. Reverting this
change fixes the UAF. I've included the KASAN dump below, and a couple of
comments inline.

I think something went wrong, this was merged before it was acked. Based on the
Jason Wang's comments, there are still some tests that have not been done?

If it has been merged, what should I do now, can I make up the test?



I think so, please test net-next which should contains the fixes from Eric.

Thanks





Thanks.



___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: [PATCH net-next v3] virtio-net: page_to_skb() use build_skb when there's sufficient tailroom

2021-04-20 Thread Eric Dumazet



On 4/16/21 11:16 AM, Xuan Zhuo wrote:
> In page_to_skb(), if we have enough tailroom to save skb_shared_info, we
> can use build_skb to create skb directly. No need to alloc for
> additional space. And it can save a 'frags slot', which is very friendly
> to GRO.
> 
> Here, if the payload of the received package is too small (less than
> GOOD_COPY_LEN), we still choose to copy it directly to the space got by
> napi_alloc_skb. So we can reuse these pages.
> 
> Testing Machine:
> The four queues of the network card are bound to the cpu1.
> 
> Test command:
> for ((i=0;i<5;++i)); do sockperf tp --ip 192.168.122.64 -m 1000 -t 150& 
> done
> 
> The size of the udp package is 1000, so in the case of this patch, there
> will always be enough tailroom to use build_skb. The sent udp packet
> will be discarded because there is no port to receive it. The irqsoftd
> of the machine is 100%, we observe the received quantity displayed by
> sar -n DEV 1:
> 
> no build_skb:  956864.00 rxpck/s
> build_skb:1158465.00 rxpck/s
> 
> Signed-off-by: Xuan Zhuo 
> Suggested-by: Jason Wang 
> ---
> 
> v3: fix the truesize when headroom > 0
> 
> v2: conflict resolution
> 
>  drivers/net/virtio_net.c | 69 
>  1 file changed, 48 insertions(+), 21 deletions(-)
> 
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index 101659cd4b87..8cd76037c724 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -379,21 +379,17 @@ static struct sk_buff *page_to_skb(struct virtnet_info 
> *vi,
>  struct receive_queue *rq,
>  struct page *page, unsigned int offset,
>  unsigned int len, unsigned int truesize,
> -bool hdr_valid, unsigned int metasize)
> +bool hdr_valid, unsigned int metasize,
> +unsigned int headroom)
>  {
>   struct sk_buff *skb;
>   struct virtio_net_hdr_mrg_rxbuf *hdr;
>   unsigned int copy, hdr_len, hdr_padded_len;
> - char *p;
> + int tailroom, shinfo_size;
> + char *p, *hdr_p;
> 
>   p = page_address(page) + offset;
> -
> - /* copy small packet so we can reuse these pages for small data */
> - skb = napi_alloc_skb(&rq->napi, GOOD_COPY_LEN);
> - if (unlikely(!skb))
> - return NULL;
> -
> - hdr = skb_vnet_hdr(skb);
> + hdr_p = p;
> 
>   hdr_len = vi->hdr_len;
>   if (vi->mergeable_rx_bufs)
> @@ -401,14 +397,38 @@ static struct sk_buff *page_to_skb(struct virtnet_info 
> *vi,
>   else
>   hdr_padded_len = sizeof(struct padded_vnet_hdr);
> 
> - /* hdr_valid means no XDP, so we can copy the vnet header */
> - if (hdr_valid)
> - memcpy(hdr, p, hdr_len);
> + /* If headroom is not 0, there is an offset between the beginning of the
> +  * data and the allocated space, otherwise the data and the allocated
> +  * space are aligned.
> +  */
> + if (headroom) {
> + /* The actual allocated space size is PAGE_SIZE. */
> + truesize = PAGE_SIZE;
> + tailroom = truesize - len - offset;
> + } else {
> + tailroom = truesize - len;
> + }
> 
>   len -= hdr_len;
>   offset += hdr_padded_len;
>   p += hdr_padded_len;
> 
> + shinfo_size = SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
> +
> + if (len > GOOD_COPY_LEN && tailroom >= shinfo_size) {
> + skb = build_skb(p, truesize);
> + if (unlikely(!skb))
> + return NULL;
> +
> + skb_put(skb, len);
> + goto ok;
> + }
> +
> + /* copy small packet so we can reuse these pages for small data */
> + skb = napi_alloc_skb(&rq->napi, GOOD_COPY_LEN);
> + if (unlikely(!skb))
> + return NULL;
> +
>   /* Copy all frame if it fits skb->head, otherwise
>* we let virtio_net_hdr_to_skb() and GRO pull headers as needed.
>*/
> @@ -418,11 +438,6 @@ static struct sk_buff *page_to_skb(struct virtnet_info 
> *vi,
>   copy = ETH_HLEN + metasize;
>   skb_put_data(skb, p, copy);
> 
> - if (metasize) {
> - __skb_pull(skb, metasize);
> - skb_metadata_set(skb, metasize);
> - }
> -
>   len -= copy;
>   offset += copy;
> 
> @@ -431,7 +446,7 @@ static struct sk_buff *page_to_skb(struct virtnet_info 
> *vi,
>   skb_add_rx_frag(skb, 0, page, offset, len, truesize);
>   else
>   put_page(page);

Here the struct page has been freed..

> - return skb;
> + goto ok;
>   }
> 
>   /*
> @@ -458,6 +473,18 @@ static struct sk_buff *page_to_skb(struct virtnet_info 
> *vi,
>   if (page)
>   give_pages(rq, page);
> 
> +ok:
> + /* hdr_valid means no XDP, so we can copy the vnet header */
> + if (hdr_v

Re: [PATCH net-next v3] virtio-net: page_to_skb() use build_skb when there's sufficient tailroom

2021-04-19 Thread Jason Wang


在 2021/4/20 上午12:48, David Ahern 写道:

On 4/16/21 2:16 AM, Xuan Zhuo wrote:

In page_to_skb(), if we have enough tailroom to save skb_shared_info, we
can use build_skb to create skb directly. No need to alloc for
additional space. And it can save a 'frags slot', which is very friendly
to GRO.

Here, if the payload of the received package is too small (less than
GOOD_COPY_LEN), we still choose to copy it directly to the space got by
napi_alloc_skb. So we can reuse these pages.

Testing Machine:
 The four queues of the network card are bound to the cpu1.

Test command:
 for ((i=0;i<5;++i)); do sockperf tp --ip 192.168.122.64 -m 1000 -t 150& 
done

The size of the udp package is 1000, so in the case of this patch, there
will always be enough tailroom to use build_skb. The sent udp packet
will be discarded because there is no port to receive it. The irqsoftd
of the machine is 100%, we observe the received quantity displayed by
sar -n DEV 1:

no build_skb:  956864.00 rxpck/s
build_skb:1158465.00 rxpck/s


virtio_net is using napi_consume_skb, so napi_build_skb should show a
small increase from build_skb.



Yes and we probably need to do this in receive_small().

Thanks

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: [PATCH net-next v3] virtio-net: page_to_skb() use build_skb when there's sufficient tailroom

2021-04-19 Thread Jason Wang


在 2021/4/20 上午10:38, Jason Wang 写道:

:
+    /* hdr_valid means no XDP, so we can copy the vnet header */
+    if (hdr_valid) {
+    hdr = skb_vnet_hdr(skb);
+    memcpy(hdr, hdr_p, hdr_len);


and hdr_p is dereferenced here.



Right, I tend to recover the way to copy hdr and set meta just after 
alloc/build_skb().


Thanks 



Btw, since the patch modifies a critical path of virtio-net I suggest to 
test the following cases:


1) netperf TCP stream
2) netperf UDP with packet size from 64 to PAGE_SIZE
3) XDP_PASS with 1)
4) XDP_PASS with 2)
5) XDP metadata with 1)
6) XDP metadata with 2)

Thanks

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: [PATCH net-next v3] virtio-net: page_to_skb() use build_skb when there's sufficient tailroom

2021-04-19 Thread Jason Wang


在 2021/4/20 上午7:29, Mat Martineau 写道:


On Fri, 16 Apr 2021, Xuan Zhuo wrote:


In page_to_skb(), if we have enough tailroom to save skb_shared_info, we
can use build_skb to create skb directly. No need to alloc for
additional space. And it can save a 'frags slot', which is very friendly
to GRO.

Here, if the payload of the received package is too small (less than
GOOD_COPY_LEN), we still choose to copy it directly to the space got by
napi_alloc_skb. So we can reuse these pages.

Testing Machine:
   The four queues of the network card are bound to the cpu1.

Test command:
   for ((i=0;i<5;++i)); do sockperf tp --ip 192.168.122.64 -m 1000 -t 
150& done


The size of the udp package is 1000, so in the case of this patch, there
will always be enough tailroom to use build_skb. The sent udp packet
will be discarded because there is no port to receive it. The irqsoftd
of the machine is 100%, we observe the received quantity displayed by
sar -n DEV 1:

no build_skb:  956864.00 rxpck/s
build_skb:    1158465.00 rxpck/s

Signed-off-by: Xuan Zhuo 
Suggested-by: Jason Wang 
---

v3: fix the truesize when headroom > 0

v2: conflict resolution

drivers/net/virtio_net.c | 69 
1 file changed, 48 insertions(+), 21 deletions(-)


Xuan,

I realize this has been merged to net-next already, but I'm getting a 
use-after-free with KASAN in page_to_skb() with this patch. Reverting 
this change fixes the UAF. I've included the KASAN dump below, and a 
couple of comments inline.





diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 101659cd4b87..8cd76037c724 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -379,21 +379,17 @@ static struct sk_buff *page_to_skb(struct 
virtnet_info *vi,

   struct receive_queue *rq,
   struct page *page, unsigned int offset,
   unsigned int len, unsigned int truesize,
-   bool hdr_valid, unsigned int metasize)
+   bool hdr_valid, unsigned int metasize,
+   unsigned int headroom)
{
struct sk_buff *skb;
struct virtio_net_hdr_mrg_rxbuf *hdr;
unsigned int copy, hdr_len, hdr_padded_len;
-    char *p;
+    int tailroom, shinfo_size;
+    char *p, *hdr_p;

p = page_address(page) + offset;
-
-    /* copy small packet so we can reuse these pages for small data */
-    skb = napi_alloc_skb(&rq->napi, GOOD_COPY_LEN);
-    if (unlikely(!skb))
-    return NULL;
-
-    hdr = skb_vnet_hdr(skb);
+    hdr_p = p;


hdr_p is assigned here, pointer to an address in the provided page...



hdr_len = vi->hdr_len;
if (vi->mergeable_rx_bufs)


(snip)

@@ -431,7 +446,7 @@ static struct sk_buff *page_to_skb(struct 
virtnet_info *vi,

    skb_add_rx_frag(skb, 0, page, offset, len, truesize);
    else
    put_page(page);


page is potentially released here...


-    return skb;
+    goto ok;
}

/*
@@ -458,6 +473,18 @@ static struct sk_buff *page_to_skb(struct 
virtnet_info *vi,

if (page)
    give_pages(rq, page);

+ok:
+    /* hdr_valid means no XDP, so we can copy the vnet header */
+    if (hdr_valid) {
+    hdr = skb_vnet_hdr(skb);
+    memcpy(hdr, hdr_p, hdr_len);


and hdr_p is dereferenced here.



Right, I tend to recover the way to copy hdr and set meta just after 
alloc/build_skb().


Thanks




I'm seeing this KASAN UAF at boot time in a kvm VM (Fedora 33 host and 
guest, if that helps):


[   61.202483] 
==

[   61.204005] BUG: KASAN: use-after-free in page_to_skb+0x32a/0x4b0
[   61.205387] Read of size 12 at addr 888105bdf800 by task 
NetworkManager/579
[   61.207035] [   61.207408] CPU: 0 PID: 579 Comm: NetworkManager Not 
tainted 5.12.0-rc7+ #2
[   61.208715] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), 
BIOS 1.14.0-1.fc33 04/01/2014

[   61.210257] Call Trace:
[   61.210730]  
[   61.211209]  dump_stack+0x93/0xc2
[   61.211996]  print_address_description.constprop.0+0x18/0x130
[   61.213310]  ? page_to_skb+0x32a/0x4b0
[   61.214318]  ? page_to_skb+0x32a/0x4b0
[   61.215085]  kasan_report.cold+0x7f/0x111
[   61.215966]  ? trace_hardirqs_off+0x10/0xe0
[   61.216823]  ? page_to_skb+0x32a/0x4b0
[   61.217809]  kasan_check_range+0xf9/0x1e0
[   61.217834]  memcpy+0x20/0x60
[   61.217848]  page_to_skb+0x32a/0x4b0
[   61.217888]  receive_buf+0x1434/0x2690
[   61.217926]  ? page_to_skb+0x4b0/0x4b0
[   61.217947]  ? find_held_lock+0x85/0xa0
[   61.217964]  ? lock_release+0x1d0/0x400
[   61.217974]  ? virtnet_poll+0x1d8/0x6b0
[   61.217983]  ? detach_buf_split+0x254/0x290
[   61.218008]  ? virtqueue_get_buf_ctx_split+0x145/0x1f0
[   61.218032]  virtnet_poll+0x2a8/0x6b0
[   61.218057]  ? receive_buf+0x2690/0x2690
[   61.218067]  ? lock_release+0x400/0x400
[   61.218119]  __napi_poll+0x57/0x2f0
[   61.229624]  net_rx_action+0x4dd/0x590
[   61.230453]  ? napi_threaded_poll+0x2b0/0x2b0
[   61

Re: [PATCH net-next v3] virtio-net: page_to_skb() use build_skb when there's sufficient tailroom

2021-04-19 Thread David Ahern
On 4/16/21 2:16 AM, Xuan Zhuo wrote:
> In page_to_skb(), if we have enough tailroom to save skb_shared_info, we
> can use build_skb to create skb directly. No need to alloc for
> additional space. And it can save a 'frags slot', which is very friendly
> to GRO.
> 
> Here, if the payload of the received package is too small (less than
> GOOD_COPY_LEN), we still choose to copy it directly to the space got by
> napi_alloc_skb. So we can reuse these pages.
> 
> Testing Machine:
> The four queues of the network card are bound to the cpu1.
> 
> Test command:
> for ((i=0;i<5;++i)); do sockperf tp --ip 192.168.122.64 -m 1000 -t 150& 
> done
> 
> The size of the udp package is 1000, so in the case of this patch, there
> will always be enough tailroom to use build_skb. The sent udp packet
> will be discarded because there is no port to receive it. The irqsoftd
> of the machine is 100%, we observe the received quantity displayed by
> sar -n DEV 1:
> 
> no build_skb:  956864.00 rxpck/s
> build_skb:1158465.00 rxpck/s
> 

virtio_net is using napi_consume_skb, so napi_build_skb should show a
small increase from build_skb.

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH net-next v3] virtio-net: page_to_skb() use build_skb when there's sufficient tailroom

2021-04-18 Thread Jason Wang


在 2021/4/16 下午5:16, Xuan Zhuo 写道:

In page_to_skb(), if we have enough tailroom to save skb_shared_info, we
can use build_skb to create skb directly. No need to alloc for
additional space. And it can save a 'frags slot', which is very friendly
to GRO.

Here, if the payload of the received package is too small (less than
GOOD_COPY_LEN), we still choose to copy it directly to the space got by
napi_alloc_skb. So we can reuse these pages.

Testing Machine:
 The four queues of the network card are bound to the cpu1.

Test command:
 for ((i=0;i<5;++i)); do sockperf tp --ip 192.168.122.64 -m 1000 -t 150& 
done

The size of the udp package is 1000, so in the case of this patch, there
will always be enough tailroom to use build_skb. The sent udp packet
will be discarded because there is no port to receive it. The irqsoftd
of the machine is 100%, we observe the received quantity displayed by
sar -n DEV 1:

no build_skb:  956864.00 rxpck/s
build_skb:1158465.00 rxpck/s



I suggess to test the case of XDP_PASS in this case as well.




Signed-off-by: Xuan Zhuo 
Suggested-by: Jason Wang 
---

v3: fix the truesize when headroom > 0

v2: conflict resolution

  drivers/net/virtio_net.c | 69 
  1 file changed, 48 insertions(+), 21 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 101659cd4b87..8cd76037c724 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -379,21 +379,17 @@ static struct sk_buff *page_to_skb(struct virtnet_info 
*vi,
   struct receive_queue *rq,
   struct page *page, unsigned int offset,
   unsigned int len, unsigned int truesize,
-  bool hdr_valid, unsigned int metasize)
+  bool hdr_valid, unsigned int metasize,
+  unsigned int headroom)
  {
struct sk_buff *skb;
struct virtio_net_hdr_mrg_rxbuf *hdr;
unsigned int copy, hdr_len, hdr_padded_len;
-   char *p;
+   int tailroom, shinfo_size;
+   char *p, *hdr_p;

p = page_address(page) + offset;
-
-   /* copy small packet so we can reuse these pages for small data */
-   skb = napi_alloc_skb(&rq->napi, GOOD_COPY_LEN);
-   if (unlikely(!skb))
-   return NULL;
-
-   hdr = skb_vnet_hdr(skb);
+   hdr_p = p;

hdr_len = vi->hdr_len;
if (vi->mergeable_rx_bufs)
@@ -401,14 +397,38 @@ static struct sk_buff *page_to_skb(struct virtnet_info 
*vi,
else
hdr_padded_len = sizeof(struct padded_vnet_hdr);

-   /* hdr_valid means no XDP, so we can copy the vnet header */
-   if (hdr_valid)
-   memcpy(hdr, p, hdr_len);
+   /* If headroom is not 0, there is an offset between the beginning of the
+* data and the allocated space, otherwise the data and the allocated
+* space are aligned.
+*/
+   if (headroom) {
+   /* The actual allocated space size is PAGE_SIZE. */



I think the comment in receive_mergeable() is better:

    /* Buffers with headroom use PAGE_SIZE as alloc size,
 * see add_recvbuf_mergeable() + get_mergeable_buf_len()
 */



+   truesize = PAGE_SIZE;
+   tailroom = truesize - len - offset;
+   } else {
+   tailroom = truesize - len;
+   }

len -= hdr_len;
offset += hdr_padded_len;
p += hdr_padded_len;

+   shinfo_size = SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
+
+   if (len > GOOD_COPY_LEN && tailroom >= shinfo_size) {



Any reason that you don't use build_skb() for len < GOOD_COPY_LEN?

Other looks good.

Thanks



+   skb = build_skb(p, truesize);
+   if (unlikely(!skb))
+   return NULL;
+
+   skb_put(skb, len);
+   goto ok;
+   }
+
+   /* copy small packet so we can reuse these pages for small data */
+   skb = napi_alloc_skb(&rq->napi, GOOD_COPY_LEN);
+   if (unlikely(!skb))
+   return NULL;
+
/* Copy all frame if it fits skb->head, otherwise
 * we let virtio_net_hdr_to_skb() and GRO pull headers as needed.
 */
@@ -418,11 +438,6 @@ static struct sk_buff *page_to_skb(struct virtnet_info *vi,
copy = ETH_HLEN + metasize;
skb_put_data(skb, p, copy);

-   if (metasize) {
-   __skb_pull(skb, metasize);
-   skb_metadata_set(skb, metasize);
-   }
-
len -= copy;
offset += copy;

@@ -431,7 +446,7 @@ static struct sk_buff *page_to_skb(struct virtnet_info *vi,
skb_add_rx_frag(skb, 0, page, offset, len, truesize);
else
put_page(page);
-   return skb;
+   goto ok;
}

/*
@@ -458,6 +473,1