Re: [PATCH net-next 2/5] virtio_net: Add page_pool support to improve performance

2023-07-05 Thread Liang Chen
On Wed, Jul 5, 2023 at 2:05 PM Jason Wang  wrote:
>
> On Wed, Jul 5, 2023 at 1:41 PM Liang Chen  wrote:
> >
> > On Fri, Jun 9, 2023 at 10:57 AM Liang Chen  
> > wrote:
> > >
> > > On Thu, Jun 8, 2023 at 8:38 AM Jason Wang  wrote:
> > > >
> > > > On Thu, Jun 8, 2023 at 4:17 AM Michael S. Tsirkin  
> > > > wrote:
> > > > >
> > > > > On Wed, Jun 07, 2023 at 05:08:59PM +0800, Liang Chen wrote:
> > > > > > On Tue, May 30, 2023 at 9:19 AM Liang Chen 
> > > > > >  wrote:
> > > > > > >
> > > > > > > On Mon, May 29, 2023 at 5:55 PM Michael S. Tsirkin 
> > > > > > >  wrote:
> > > > > > > >
> > > > > > > > On Mon, May 29, 2023 at 03:27:56PM +0800, Liang Chen wrote:
> > > > > > > > > On Sun, May 28, 2023 at 2:20 PM Michael S. Tsirkin 
> > > > > > > > >  wrote:
> > > > > > > > > >
> > > > > > > > > > On Fri, May 26, 2023 at 01:46:18PM +0800, Liang Chen wrote:
> > > > > > > > > > > The implementation at the moment uses one page per packet 
> > > > > > > > > > > in both the
> > > > > > > > > > > normal and XDP path. In addition, introduce a module
> > > > > > > > > > > parameter to enable
> > > > > > > > > > > or disable the usage of page pool (disabled by default).
> > > > > > > > > > >
> > > > > > > > > > > In single-core vm testing environments, it gives a modest 
> > > > > > > > > > > performance gain
> > > > > > > > > > > in the normal path.
> > > > > > > > > > >   Upstream codebase: 47.5 Gbits/sec
> > > > > > > > > > >   Upstream codebase + page_pool support: 50.2 Gbits/sec
> > > > > > > > > > >
> > > > > > > > > > > In multi-core vm testing environments, the most
> > > > > > > > > > > significant performance
> > > > > > > > > > > gain is observed in XDP cpumap:
> > > > > > > > > > >   Upstream codebase: 1.38 Gbits/sec
> > > > > > > > > > >   Upstream codebase + page_pool support: 9.74 Gbits/sec
> > > > > > > > > > >
> > > > > > > > > > > With this foundation, we can further integrate page pool 
> > > > > > > > > > > fragmentation and
> > > > > > > > > > > DMA map/unmap support.
> > > > > > > > > > >
> > > > > > > > > > > Signed-off-by: Liang Chen 
> > > > > > > > > >
> > > > > > > > > > Why off by default?
> > > > > > > > > > I am guessing it sometimes has performance costs too?
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > What happens if we use page pool for big mode too?
> > > > > > > > > > The less modes we have the better...
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > > Sure, now I believe it makes sense to enable it by default. 
> > > > > > > > > When the
> > > > > > > > > packet size is very small, it reduces the likelihood of skb
> > > > > > > > > coalescing. But such cases are rare.
> > > > > > > >
> > > > > > > > small packets are rare? These workloads are easy to create 
> > > > > > > > actually.
> > > > > > > > Pls try and include benchmark with small packet size.
> > > > > > > >
> > > > > > >
> > > > > > > Sure, Thanks!
> > > > > >
> > > > > > Before going ahead and posting v2 patch, I would like to hear more
> > > > > > advice for the cases of small packets. I have done more performance
> > > > > > benchmark with small packets since then. Here is a list of iperf
> > > > > > output,
> > > > > >
> > > > > > With PP and PP fragmenting:
> > > > > > 256:    [  5] 505.00-510.00 sec  1.34 GBytes  2.31 Gbits/sec    0    144 KBytes
> > > > > > 1K:     [  5]  30.00-35.00  sec  4.63 GBytes  7.95 Gbits/sec    0    223 KBytes
> > > > > > 2K:     [  5]  65.00-70.00  sec  8.33 GBytes  14.3 Gbits/sec    0    324 KBytes
> > > > > > 4K:     [  5]  30.00-35.00  sec  13.3 GBytes  22.8 Gbits/sec    0   1.08 MBytes
> > > > > > 8K:     [  5]  50.00-55.00  sec  18.9 GBytes  32.4 Gbits/sec    0    744 KBytes
> > > > > > 16K:    [  5]  25.00-30.00  sec  24.6 GBytes  42.3 Gbits/sec    0    963 KBytes
> > > > > > 32K:    [  5]  45.00-50.00  sec  29.8 GBytes  51.2 Gbits/sec    0   1.25 MBytes
> > > > > > 64K:    [  5]  35.00-40.00  sec  34.0 GBytes  58.4 Gbits/sec    0   1.70 MBytes
> > > > > > 128K:   [  5]  45.00-50.00  sec  36.7 GBytes  63.1 Gbits/sec    0   4.26 MBytes
> > > > > > 256K:   [  5]  30.00-35.00  sec  40.0 GBytes  68.8 Gbits/sec    0   3.20 MBytes
> > > >
> > > > Note that the virtio-net driver is lacking things like BQL and others, so
> > > > it might suffer from buffer bloat for TCP performance. Would you mind
> > > > measuring, e.g. with testpmd on the vhost side, to see the rx PPS?
> > > >
> > >
> > > No problem. Before we proceed to measure with testpmd, could you
> > > please take a look at the PPS measurements we obtained previously and
> > > see if they are sufficient? Though we will only utilize page pool for
> > > xdp on v2.
> > >
> > > netperf -H 192.168.124.197 -p  -t UDP_STREAM -l 0 -- -m $((1))
> > >
> > > with page pool:
> > > 1.
> > > Average:   IFACE   rxpck/s   txpck/s    rxkB/s    txkB/s   rxcmp/s   txcmp/s  rxmcst/s   %ifutil

Re: [PATCH net-next 2/5] virtio_net: Add page_pool support to improve performance

2023-07-05 Thread Jason Wang
On Wed, Jul 5, 2023 at 1:41 PM Liang Chen  wrote:
>
> On Fri, Jun 9, 2023 at 10:57 AM Liang Chen  wrote:
> >
> > On Thu, Jun 8, 2023 at 8:38 AM Jason Wang  wrote:
> > >
> > > On Thu, Jun 8, 2023 at 4:17 AM Michael S. Tsirkin  wrote:
> > > >
> > > > On Wed, Jun 07, 2023 at 05:08:59PM +0800, Liang Chen wrote:
> > > > > On Tue, May 30, 2023 at 9:19 AM Liang Chen 
> > > > >  wrote:
> > > > > >
> > > > > > On Mon, May 29, 2023 at 5:55 PM Michael S. Tsirkin 
> > > > > >  wrote:
> > > > > > >
> > > > > > > On Mon, May 29, 2023 at 03:27:56PM +0800, Liang Chen wrote:
> > > > > > > > On Sun, May 28, 2023 at 2:20 PM Michael S. Tsirkin 
> > > > > > > >  wrote:
> > > > > > > > >
> > > > > > > > > On Fri, May 26, 2023 at 01:46:18PM +0800, Liang Chen wrote:
> > > > > > > > > > The implementation at the moment uses one page per packet 
> > > > > > > > > > in both the
> > > > > > > > > > normal and XDP path. In addition, introduce a module
> > > > > > > > > > parameter to enable
> > > > > > > > > > or disable the usage of page pool (disabled by default).
> > > > > > > > > >
> > > > > > > > > > In single-core vm testing environments, it gives a modest 
> > > > > > > > > > performance gain
> > > > > > > > > > in the normal path.
> > > > > > > > > >   Upstream codebase: 47.5 Gbits/sec
> > > > > > > > > >   Upstream codebase + page_pool support: 50.2 Gbits/sec
> > > > > > > > > >
> > > > > > > > > > In multi-core vm testing environments, the most significant
> > > > > > > > > > performance
> > > > > > > > > > gain is observed in XDP cpumap:
> > > > > > > > > >   Upstream codebase: 1.38 Gbits/sec
> > > > > > > > > >   Upstream codebase + page_pool support: 9.74 Gbits/sec
> > > > > > > > > >
> > > > > > > > > > With this foundation, we can further integrate page pool 
> > > > > > > > > > fragmentation and
> > > > > > > > > > DMA map/unmap support.
> > > > > > > > > >
> > > > > > > > > > Signed-off-by: Liang Chen 
> > > > > > > > >
> > > > > > > > > Why off by default?
> > > > > > > > > I am guessing it sometimes has performance costs too?
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > What happens if we use page pool for big mode too?
> > > > > > > > > The less modes we have the better...
> > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > > > Sure, now I believe it makes sense to enable it by default. 
> > > > > > > > When the
> > > > > > > > packet size is very small, it reduces the likelihood of skb
> > > > > > > > coalescing. But such cases are rare.
> > > > > > >
> > > > > > > small packets are rare? These workloads are easy to create 
> > > > > > > actually.
> > > > > > > Pls try and include benchmark with small packet size.
> > > > > > >
> > > > > >
> > > > > > Sure, Thanks!
> > > > >
> > > > > Before going ahead and posting v2 patch, I would like to hear more
> > > > > advice for the cases of small packets. I have done more performance
> > > > > benchmark with small packets since then. Here is a list of iperf
> > > > > output,
> > > > >
> > > > > With PP and PP fragmenting:
> > > > > 256:    [  5] 505.00-510.00 sec  1.34 GBytes  2.31 Gbits/sec    0    144 KBytes
> > > > > 1K:     [  5]  30.00-35.00  sec  4.63 GBytes  7.95 Gbits/sec    0    223 KBytes
> > > > > 2K:     [  5]  65.00-70.00  sec  8.33 GBytes  14.3 Gbits/sec    0    324 KBytes
> > > > > 4K:     [  5]  30.00-35.00  sec  13.3 GBytes  22.8 Gbits/sec    0   1.08 MBytes
> > > > > 8K:     [  5]  50.00-55.00  sec  18.9 GBytes  32.4 Gbits/sec    0    744 KBytes
> > > > > 16K:    [  5]  25.00-30.00  sec  24.6 GBytes  42.3 Gbits/sec    0    963 KBytes
> > > > > 32K:    [  5]  45.00-50.00  sec  29.8 GBytes  51.2 Gbits/sec    0   1.25 MBytes
> > > > > 64K:    [  5]  35.00-40.00  sec  34.0 GBytes  58.4 Gbits/sec    0   1.70 MBytes
> > > > > 128K:   [  5]  45.00-50.00  sec  36.7 GBytes  63.1 Gbits/sec    0   4.26 MBytes
> > > > > 256K:   [  5]  30.00-35.00  sec  40.0 GBytes  68.8 Gbits/sec    0   3.20 MBytes
> > >
> > > Note that the virtio-net driver is lacking things like BQL and others, so
> > > it might suffer from buffer bloat for TCP performance. Would you mind
> > > measuring, e.g. with testpmd on the vhost side, to see the rx PPS?
> > >
> >
> > No problem. Before we proceed to measure with testpmd, could you
> > please take a look at the PPS measurements we obtained previously and
> > see if they are sufficient? Though we will only utilize page pool for
> > xdp on v2.
> >
> > netperf -H 192.168.124.197 -p  -t UDP_STREAM -l 0 -- -m $((1))
> >
> > with page pool:
> > 1.
> > Average:   IFACE   rxpck/s   txpck/s    rxkB/s    txkB/s   rxcmp/s   txcmp/s  rxmcst/s   %ifutil
> > Average:  enp8s0 655092.27      0.35  27508.77      0.03      0.00      0.00      0.00      0.00
> > 2.
> > Average:   IFACE   rxpck/s   txpck/s    rxkB/s    txkB/s   rxcmp/s   txcmp/s  rxmcst/s   %ifutil
> > Average:  enp8s0 654749.87      0.63  27494.42      0.05      0.00      0.00      0.00      0.00

Re: [PATCH net-next 2/5] virtio_net: Add page_pool support to improve performance

2023-07-04 Thread Liang Chen
On Fri, Jun 9, 2023 at 10:57 AM Liang Chen  wrote:
>
> On Thu, Jun 8, 2023 at 8:38 AM Jason Wang  wrote:
> >
> > On Thu, Jun 8, 2023 at 4:17 AM Michael S. Tsirkin  wrote:
> > >
> > > On Wed, Jun 07, 2023 at 05:08:59PM +0800, Liang Chen wrote:
> > > > On Tue, May 30, 2023 at 9:19 AM Liang Chen  
> > > > wrote:
> > > > >
> > > > > On Mon, May 29, 2023 at 5:55 PM Michael S. Tsirkin  
> > > > > wrote:
> > > > > >
> > > > > > On Mon, May 29, 2023 at 03:27:56PM +0800, Liang Chen wrote:
> > > > > > > On Sun, May 28, 2023 at 2:20 PM Michael S. Tsirkin 
> > > > > > >  wrote:
> > > > > > > >
> > > > > > > > On Fri, May 26, 2023 at 01:46:18PM +0800, Liang Chen wrote:
> > > > > > > > > The implementation at the moment uses one page per packet in 
> > > > > > > > > both the
> > > > > > > > > normal and XDP path. In addition, introduce a module
> > > > > > > > > parameter to enable
> > > > > > > > > or disable the usage of page pool (disabled by default).
> > > > > > > > >
> > > > > > > > > In single-core vm testing environments, it gives a modest 
> > > > > > > > > performance gain
> > > > > > > > > in the normal path.
> > > > > > > > >   Upstream codebase: 47.5 Gbits/sec
> > > > > > > > >   Upstream codebase + page_pool support: 50.2 Gbits/sec
> > > > > > > > >
> > > > > > > > > In multi-core vm testing environments, the most significant
> > > > > > > > > performance
> > > > > > > > > gain is observed in XDP cpumap:
> > > > > > > > >   Upstream codebase: 1.38 Gbits/sec
> > > > > > > > >   Upstream codebase + page_pool support: 9.74 Gbits/sec
> > > > > > > > >
> > > > > > > > > With this foundation, we can further integrate page pool 
> > > > > > > > > fragmentation and
> > > > > > > > > DMA map/unmap support.
> > > > > > > > >
> > > > > > > > > Signed-off-by: Liang Chen 
> > > > > > > >
> > > > > > > > Why off by default?
> > > > > > > > I am guessing it sometimes has performance costs too?
> > > > > > > >
> > > > > > > >
> > > > > > > > What happens if we use page pool for big mode too?
> > > > > > > > The less modes we have the better...
> > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > > > Sure, now I believe it makes sense to enable it by default. When 
> > > > > > > the
> > > > > > > packet size is very small, it reduces the likelihood of skb
> > > > > > > coalescing. But such cases are rare.
> > > > > >
> > > > > > small packets are rare? These workloads are easy to create actually.
> > > > > > Pls try and include benchmark with small packet size.
> > > > > >
> > > > >
> > > > > Sure, Thanks!
> > > >
> > > > Before going ahead and posting v2 patch, I would like to hear more
> > > > advice for the cases of small packets. I have done more performance
> > > > benchmark with small packets since then. Here is a list of iperf
> > > > output,
> > > >
> > > > With PP and PP fragmenting:
> > > > 256:    [  5] 505.00-510.00 sec  1.34 GBytes  2.31 Gbits/sec    0    144 KBytes
> > > > 1K:     [  5]  30.00-35.00  sec  4.63 GBytes  7.95 Gbits/sec    0    223 KBytes
> > > > 2K:     [  5]  65.00-70.00  sec  8.33 GBytes  14.3 Gbits/sec    0    324 KBytes
> > > > 4K:     [  5]  30.00-35.00  sec  13.3 GBytes  22.8 Gbits/sec    0   1.08 MBytes
> > > > 8K:     [  5]  50.00-55.00  sec  18.9 GBytes  32.4 Gbits/sec    0    744 KBytes
> > > > 16K:    [  5]  25.00-30.00  sec  24.6 GBytes  42.3 Gbits/sec    0    963 KBytes
> > > > 32K:    [  5]  45.00-50.00  sec  29.8 GBytes  51.2 Gbits/sec    0   1.25 MBytes
> > > > 64K:    [  5]  35.00-40.00  sec  34.0 GBytes  58.4 Gbits/sec    0   1.70 MBytes
> > > > 128K:   [  5]  45.00-50.00  sec  36.7 GBytes  63.1 Gbits/sec    0   4.26 MBytes
> > > > 256K:   [  5]  30.00-35.00  sec  40.0 GBytes  68.8 Gbits/sec    0   3.20 MBytes
> >
> > Note that the virtio-net driver is lacking things like BQL and others, so
> > it might suffer from buffer bloat for TCP performance. Would you mind
> > measuring, e.g. with testpmd on the vhost side, to see the rx PPS?
> >
>
> No problem. Before we proceed to measure with testpmd, could you
> please take a look at the PPS measurements we obtained previously and
> see if they are sufficient? Though we will only utilize page pool for
> xdp on v2.
>
> netperf -H 192.168.124.197 -p  -t UDP_STREAM -l 0 -- -m $((1))
>
> with page pool:
> 1.
> Average:   IFACE   rxpck/s   txpck/s    rxkB/s    txkB/s   rxcmp/s   txcmp/s  rxmcst/s   %ifutil
> Average:  enp8s0 655092.27      0.35  27508.77      0.03      0.00      0.00      0.00      0.00
> 2.
> Average:   IFACE   rxpck/s   txpck/s    rxkB/s    txkB/s   rxcmp/s   txcmp/s  rxmcst/s   %ifutil
> Average:  enp8s0 654749.87      0.63  27494.42      0.05      0.00      0.00      0.00      0.00
> 3.
> Average:   IFACE   rxpck/s   txpck/s    rxkB/s    txkB/s   rxcmp/s   txcmp/s  rxmcst/s   %ifutil
> Average:  enp8s0 654230.40      0.10  27472.57      0.01      0.00      0.00      0.00      0.00
> 4.
> Average:   IFACE   rxpck/s   txpck/s    rxkB/s    txkB/s   rxcmp/s   txcmp/s  rxmcst/s   %ifutil
> Average:  enp8s0 656661.33      0.15  27574.65      0.01      0.00      0.00      0.00      0.00

Re: [PATCH net-next 2/5] virtio_net: Add page_pool support to improve performance

2023-06-08 Thread Liang Chen
On Thu, Jun 8, 2023 at 8:38 AM Jason Wang  wrote:
>
> On Thu, Jun 8, 2023 at 4:17 AM Michael S. Tsirkin  wrote:
> >
> > On Wed, Jun 07, 2023 at 05:08:59PM +0800, Liang Chen wrote:
> > > On Tue, May 30, 2023 at 9:19 AM Liang Chen  
> > > wrote:
> > > >
> > > > On Mon, May 29, 2023 at 5:55 PM Michael S. Tsirkin  
> > > > wrote:
> > > > >
> > > > > On Mon, May 29, 2023 at 03:27:56PM +0800, Liang Chen wrote:
> > > > > > On Sun, May 28, 2023 at 2:20 PM Michael S. Tsirkin 
> > > > > >  wrote:
> > > > > > >
> > > > > > > On Fri, May 26, 2023 at 01:46:18PM +0800, Liang Chen wrote:
> > > > > > > > The implementation at the moment uses one page per packet in 
> > > > > > > > both the
> > > > > > > > normal and XDP path. In addition, introduce a module
> > > > > > > > parameter to enable
> > > > > > > > or disable the usage of page pool (disabled by default).
> > > > > > > >
> > > > > > > > In single-core vm testing environments, it gives a modest 
> > > > > > > > performance gain
> > > > > > > > in the normal path.
> > > > > > > >   Upstream codebase: 47.5 Gbits/sec
> > > > > > > >   Upstream codebase + page_pool support: 50.2 Gbits/sec
> > > > > > > >
> > > > > > > > In multi-core vm testing environments, the most significant
> > > > > > > > performance
> > > > > > > > gain is observed in XDP cpumap:
> > > > > > > >   Upstream codebase: 1.38 Gbits/sec
> > > > > > > >   Upstream codebase + page_pool support: 9.74 Gbits/sec
> > > > > > > >
> > > > > > > > With this foundation, we can further integrate page pool 
> > > > > > > > fragmentation and
> > > > > > > > DMA map/unmap support.
> > > > > > > >
> > > > > > > > Signed-off-by: Liang Chen 
> > > > > > >
> > > > > > > Why off by default?
> > > > > > > I am guessing it sometimes has performance costs too?
> > > > > > >
> > > > > > >
> > > > > > > What happens if we use page pool for big mode too?
> > > > > > > The less modes we have the better...
> > > > > > >
> > > > > > >
> > > > > >
> > > > > > Sure, now I believe it makes sense to enable it by default. When the
> > > > > > packet size is very small, it reduces the likelihood of skb
> > > > > > coalescing. But such cases are rare.
> > > > >
> > > > > small packets are rare? These workloads are easy to create actually.
> > > > > Pls try and include benchmark with small packet size.
> > > > >
> > > >
> > > > Sure, Thanks!
> > >
> > > Before going ahead and posting v2 patch, I would like to hear more
> > > advice for the cases of small packets. I have done more performance
> > > benchmark with small packets since then. Here is a list of iperf
> > > output,
> > >
> > > With PP and PP fragmenting:
> > > 256:    [  5] 505.00-510.00 sec  1.34 GBytes  2.31 Gbits/sec    0    144 KBytes
> > > 1K:     [  5]  30.00-35.00  sec  4.63 GBytes  7.95 Gbits/sec    0    223 KBytes
> > > 2K:     [  5]  65.00-70.00  sec  8.33 GBytes  14.3 Gbits/sec    0    324 KBytes
> > > 4K:     [  5]  30.00-35.00  sec  13.3 GBytes  22.8 Gbits/sec    0   1.08 MBytes
> > > 8K:     [  5]  50.00-55.00  sec  18.9 GBytes  32.4 Gbits/sec    0    744 KBytes
> > > 16K:    [  5]  25.00-30.00  sec  24.6 GBytes  42.3 Gbits/sec    0    963 KBytes
> > > 32K:    [  5]  45.00-50.00  sec  29.8 GBytes  51.2 Gbits/sec    0   1.25 MBytes
> > > 64K:    [  5]  35.00-40.00  sec  34.0 GBytes  58.4 Gbits/sec    0   1.70 MBytes
> > > 128K:   [  5]  45.00-50.00  sec  36.7 GBytes  63.1 Gbits/sec    0   4.26 MBytes
> > > 256K:   [  5]  30.00-35.00  sec  40.0 GBytes  68.8 Gbits/sec    0   3.20 MBytes
>
> Note that the virtio-net driver is lacking things like BQL and others, so
> it might suffer from buffer bloat for TCP performance. Would you mind
> measuring, e.g. with testpmd on the vhost side, to see the rx PPS?
>

No problem. Before we proceed to measure with testpmd, could you
please take a look at the PPS measurements we obtained previously and
see if they are sufficient? Though we will only utilize page pool for
xdp on v2.

netperf -H 192.168.124.197 -p  -t UDP_STREAM -l 0 -- -m $((1))

with page pool:
1.
Average:   IFACE   rxpck/s   txpck/s    rxkB/s    txkB/s   rxcmp/s   txcmp/s  rxmcst/s   %ifutil
Average:  enp8s0 655092.27      0.35  27508.77      0.03      0.00      0.00      0.00      0.00
2.
Average:   IFACE   rxpck/s   txpck/s    rxkB/s    txkB/s   rxcmp/s   txcmp/s  rxmcst/s   %ifutil
Average:  enp8s0 654749.87      0.63  27494.42      0.05      0.00      0.00      0.00      0.00
3.
Average:   IFACE   rxpck/s   txpck/s    rxkB/s    txkB/s   rxcmp/s   txcmp/s  rxmcst/s   %ifutil
Average:  enp8s0 654230.40      0.10  27472.57      0.01      0.00      0.00      0.00      0.00
4.
Average:   IFACE   rxpck/s   txpck/s    rxkB/s    txkB/s   rxcmp/s   txcmp/s  rxmcst/s   %ifutil
Average:  enp8s0 656661.33      0.15  27574.65      0.01      0.00      0.00      0.00      0.00


without page pool:
1.
Average:   IFACE   rxpck/s   txpck/s    rxkB/s    txkB/s   rxcmp/s   txcmp/s  rxmcst/s   %ifutil
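(The column layout above matches sysstat's per-interface network report, where rxpck/s is received packets per second, i.e. the RX PPS figure under discussion. Assuming the numbers were collected with sar, which the thread does not state explicitly, a typical invocation on the receiver would be something like

    sar -n DEV 1 30

sampling the interface counters once per second for 30 seconds and printing the Average lines quoted above at the end.)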

Re: [PATCH net-next 2/5] virtio_net: Add page_pool support to improve performance

2023-06-08 Thread Liang Chen
On Thu, Jun 8, 2023 at 4:17 AM Michael S. Tsirkin  wrote:
>
> On Wed, Jun 07, 2023 at 05:08:59PM +0800, Liang Chen wrote:
> > On Tue, May 30, 2023 at 9:19 AM Liang Chen  
> > wrote:
> > >
> > > On Mon, May 29, 2023 at 5:55 PM Michael S. Tsirkin  
> > > wrote:
> > > >
> > > > On Mon, May 29, 2023 at 03:27:56PM +0800, Liang Chen wrote:
> > > > > On Sun, May 28, 2023 at 2:20 PM Michael S. Tsirkin  
> > > > > wrote:
> > > > > >
> > > > > > On Fri, May 26, 2023 at 01:46:18PM +0800, Liang Chen wrote:
> > > > > > > The implementation at the moment uses one page per packet in both 
> > > > > > > the
> > > > > > > normal and XDP path. In addition, introduce a module
> > > > > > > to enable
> > > > > > > or disable the usage of page pool (disabled by default).
> > > > > > >
> > > > > > > In single-core vm testing environments, it gives a modest 
> > > > > > > performance gain
> > > > > > > in the normal path.
> > > > > > >   Upstream codebase: 47.5 Gbits/sec
> > > > > > >   Upstream codebase + page_pool support: 50.2 Gbits/sec
> > > > > > >
> > > > > > > In multi-core vm testing environments, the most significant
> > > > > > > performance
> > > > > > > gain is observed in XDP cpumap:
> > > > > > >   Upstream codebase: 1.38 Gbits/sec
> > > > > > >   Upstream codebase + page_pool support: 9.74 Gbits/sec
> > > > > > >
> > > > > > > With this foundation, we can further integrate page pool 
> > > > > > > fragmentation and
> > > > > > > DMA map/unmap support.
> > > > > > >
> > > > > > > Signed-off-by: Liang Chen 
> > > > > >
> > > > > > Why off by default?
> > > > > > I am guessing it sometimes has performance costs too?
> > > > > >
> > > > > >
> > > > > > What happens if we use page pool for big mode too?
> > > > > > The less modes we have the better...
> > > > > >
> > > > > >
> > > > >
> > > > > Sure, now I believe it makes sense to enable it by default. When the
> > > > > packet size is very small, it reduces the likelihood of skb
> > > > > coalescing. But such cases are rare.
> > > >
> > > > small packets are rare? These workloads are easy to create actually.
> > > > Pls try and include benchmark with small packet size.
> > > >
> > >
> > > Sure, Thanks!
> >
> > Before going ahead and posting v2 patch, I would like to hear more
> > advice for the cases of small packets. I have done more performance
> > benchmark with small packets since then. Here is a list of iperf
> > output,
> >
> > With PP and PP fragmenting:
> > 256:    [  5] 505.00-510.00 sec  1.34 GBytes  2.31 Gbits/sec    0    144 KBytes
> > 1K:     [  5]  30.00-35.00  sec  4.63 GBytes  7.95 Gbits/sec    0    223 KBytes
> > 2K:     [  5]  65.00-70.00  sec  8.33 GBytes  14.3 Gbits/sec    0    324 KBytes
> > 4K:     [  5]  30.00-35.00  sec  13.3 GBytes  22.8 Gbits/sec    0   1.08 MBytes
> > 8K:     [  5]  50.00-55.00  sec  18.9 GBytes  32.4 Gbits/sec    0    744 KBytes
> > 16K:    [  5]  25.00-30.00  sec  24.6 GBytes  42.3 Gbits/sec    0    963 KBytes
> > 32K:    [  5]  45.00-50.00  sec  29.8 GBytes  51.2 Gbits/sec    0   1.25 MBytes
> > 64K:    [  5]  35.00-40.00  sec  34.0 GBytes  58.4 Gbits/sec    0   1.70 MBytes
> > 128K:   [  5]  45.00-50.00  sec  36.7 GBytes  63.1 Gbits/sec    0   4.26 MBytes
> > 256K:   [  5]  30.00-35.00  sec  40.0 GBytes  68.8 Gbits/sec    0   3.20 MBytes
> >
> > Without PP:
> > 256:    [  5] 680.00-685.00 sec  1.57 GBytes  2.69 Gbits/sec    0    359 KBytes
> > 1K:     [  5]  75.00-80.00  sec  5.47 GBytes  9.40 Gbits/sec    0    730 KBytes
> > 2K:     [  5]  65.00-70.00  sec  9.46 GBytes  16.2 Gbits/sec    0   1.99 MBytes
> > 4K:     [  5]  30.00-35.00  sec  14.5 GBytes  25.0 Gbits/sec    0   1.20 MBytes
> > 8K:     [  5]  45.00-50.00  sec  19.9 GBytes  34.1 Gbits/sec    0   1.72 MBytes
> > 16K:    [  5]   5.00-10.00  sec  23.8 GBytes  40.9 Gbits/sec    0   2.90 MBytes
> > 32K:    [  5]  15.00-20.00  sec  28.0 GBytes  48.1 Gbits/sec    0   3.03 MBytes
> > 64K:    [  5]  60.00-65.00  sec  31.8 GBytes  54.6 Gbits/sec    0   3.05 MBytes
> > 128K:   [  5]  45.00-50.00  sec  33.0 GBytes  56.6 Gbits/sec    1   3.03 MBytes
> > 256K:   [  5]  25.00-30.00  sec  34.7 GBytes  59.6 Gbits/sec    0   3.11 MBytes
> >
> >
> > The major factor contributing to the performance drop is the reduction
> > of skb coalescing. Additionally, without the page pool, small packets
> > can still benefit from the allocation of 8 contiguous pages by
> > breaking them down into smaller pieces. This effectively reduces the
> > frequency of page allocation from the buddy system. For instance, the
> > arrival of 32 1K packets only triggers one alloc_page call. Therefore,
> > the benefits of using a page pool are limited in such cases. In fact,
> > without page pool fragmenting enabled, it can even hinder performance
> > from this perspective.
> >
> > Upon further consideration, I tend to believe making page pool the
> > default option may not be appropriate. As you pointed out, we cannot
> > simply ignore the performance impact on small packets. Any comments on
> > this will be much appreciated.
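To make the allocation arithmetic above concrete, here is a small stand-alone sketch of the claim that 32 roughly 1K packets can be served out of a single higher-order allocation. The order-3 chunk size and the 1 KiB per-buffer size are assumptions taken from the discussion; in the driver the per-buffer size actually follows the EWMA packet-length estimate, so the exact count varies.

#include <stdio.h>

int main(void)
{
	const unsigned long page_size = 4096;               /* typical PAGE_SIZE */
	const unsigned int  order     = 3;                  /* 8 contiguous pages */
	const unsigned long chunk     = page_size << order; /* 32 KiB per buddy allocation */
	const unsigned long buf       = 1024;               /* ~1K receive buffers */
	const unsigned long pkts      = 32;                 /* packets in the example */

	unsigned long bufs_per_alloc = chunk / buf;         /* 32 buffers per chunk */
	unsigned long allocs = (pkts + bufs_per_alloc - 1) / bufs_per_alloc;

	printf("one order-%u allocation yields %lu x %lu-byte buffers\n",
	       order, bufs_per_alloc, buf);
	printf("%lu such packets need %lu buddy allocation(s)\n", pkts, allocs);
	return 0;
}

With a page pool but without fragmenting, each packet instead consumes a whole page from the pool, which is the overhead the paragraph above is describing.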

Re: [PATCH net-next 2/5] virtio_net: Add page_pool support to improve performance

2023-06-07 Thread Xuan Zhuo
On Thu, 8 Jun 2023 08:38:14 +0800, Jason Wang  wrote:
> On Thu, Jun 8, 2023 at 4:17 AM Michael S. Tsirkin  wrote:
> >
> > On Wed, Jun 07, 2023 at 05:08:59PM +0800, Liang Chen wrote:
> > > On Tue, May 30, 2023 at 9:19 AM Liang Chen  
> > > wrote:
> > > >
> > > > On Mon, May 29, 2023 at 5:55 PM Michael S. Tsirkin  
> > > > wrote:
> > > > >
> > > > > On Mon, May 29, 2023 at 03:27:56PM +0800, Liang Chen wrote:
> > > > > > On Sun, May 28, 2023 at 2:20 PM Michael S. Tsirkin 
> > > > > >  wrote:
> > > > > > >
> > > > > > > On Fri, May 26, 2023 at 01:46:18PM +0800, Liang Chen wrote:
> > > > > > > > The implementation at the moment uses one page per packet in 
> > > > > > > > both the
> > > > > > > > normal and XDP path. In addition, introduce a module
> > > > > > > > parameter to enable
> > > > > > > > or disable the usage of page pool (disabled by default).
> > > > > > > >
> > > > > > > > In single-core vm testing environments, it gives a modest 
> > > > > > > > performance gain
> > > > > > > > in the normal path.
> > > > > > > >   Upstream codebase: 47.5 Gbits/sec
> > > > > > > >   Upstream codebase + page_pool support: 50.2 Gbits/sec
> > > > > > > >
> > > > > > > > In multi-core vm testing environments, the most significant
> > > > > > > > performance
> > > > > > > > gain is observed in XDP cpumap:
> > > > > > > >   Upstream codebase: 1.38 Gbits/sec
> > > > > > > >   Upstream codebase + page_pool support: 9.74 Gbits/sec
> > > > > > > >
> > > > > > > > With this foundation, we can further integrate page pool 
> > > > > > > > fragmentation and
> > > > > > > > DMA map/unmap support.
> > > > > > > >
> > > > > > > > Signed-off-by: Liang Chen 
> > > > > > >
> > > > > > > Why off by default?
> > > > > > > I am guessing it sometimes has performance costs too?
> > > > > > >
> > > > > > >
> > > > > > > What happens if we use page pool for big mode too?
> > > > > > > The less modes we have the better...
> > > > > > >
> > > > > > >
> > > > > >
> > > > > > Sure, now I believe it makes sense to enable it by default. When the
> > > > > > packet size is very small, it reduces the likelihood of skb
> > > > > > coalescing. But such cases are rare.
> > > > >
> > > > > small packets are rare? These workloads are easy to create actually.
> > > > > Pls try and include benchmark with small packet size.
> > > > >
> > > >
> > > > Sure, Thanks!
> > >
> > > Before going ahead and posting v2 patch, I would like to hear more
> > > advice for the cases of small packets. I have done more performance
> > > benchmark with small packets since then. Here is a list of iperf
> > > output,
> > >
> > > With PP and PP fragmenting:
> > > 256:    [  5] 505.00-510.00 sec  1.34 GBytes  2.31 Gbits/sec    0    144 KBytes
> > > 1K:     [  5]  30.00-35.00  sec  4.63 GBytes  7.95 Gbits/sec    0    223 KBytes
> > > 2K:     [  5]  65.00-70.00  sec  8.33 GBytes  14.3 Gbits/sec    0    324 KBytes
> > > 4K:     [  5]  30.00-35.00  sec  13.3 GBytes  22.8 Gbits/sec    0   1.08 MBytes
> > > 8K:     [  5]  50.00-55.00  sec  18.9 GBytes  32.4 Gbits/sec    0    744 KBytes
> > > 16K:    [  5]  25.00-30.00  sec  24.6 GBytes  42.3 Gbits/sec    0    963 KBytes
> > > 32K:    [  5]  45.00-50.00  sec  29.8 GBytes  51.2 Gbits/sec    0   1.25 MBytes
> > > 64K:    [  5]  35.00-40.00  sec  34.0 GBytes  58.4 Gbits/sec    0   1.70 MBytes
> > > 128K:   [  5]  45.00-50.00  sec  36.7 GBytes  63.1 Gbits/sec    0   4.26 MBytes
> > > 256K:   [  5]  30.00-35.00  sec  40.0 GBytes  68.8 Gbits/sec    0   3.20 MBytes
>
> Note that the virtio-net driver is lacking things like BQL and others, so
> it might suffer from buffer bloat for TCP performance. Would you mind
> measuring, e.g. with testpmd on the vhost side, to see the rx PPS?
>
> > >
> > > Without PP:
> > > 256:    [  5] 680.00-685.00 sec  1.57 GBytes  2.69 Gbits/sec    0    359 KBytes
> > > 1K:     [  5]  75.00-80.00  sec  5.47 GBytes  9.40 Gbits/sec    0    730 KBytes
> > > 2K:     [  5]  65.00-70.00  sec  9.46 GBytes  16.2 Gbits/sec    0   1.99 MBytes
> > > 4K:     [  5]  30.00-35.00  sec  14.5 GBytes  25.0 Gbits/sec    0   1.20 MBytes
> > > 8K:     [  5]  45.00-50.00  sec  19.9 GBytes  34.1 Gbits/sec    0   1.72 MBytes
> > > 16K:    [  5]   5.00-10.00  sec  23.8 GBytes  40.9 Gbits/sec    0   2.90 MBytes
> > > 32K:    [  5]  15.00-20.00  sec  28.0 GBytes  48.1 Gbits/sec    0   3.03 MBytes
> > > 64K:    [  5]  60.00-65.00  sec  31.8 GBytes  54.6 Gbits/sec    0   3.05 MBytes
> > > 128K:   [  5]  45.00-50.00  sec  33.0 GBytes  56.6 Gbits/sec    1   3.03 MBytes
> > > 256K:   [  5]  25.00-30.00  sec  34.7 GBytes  59.6 Gbits/sec    0   3.11 MBytes
> > >
> > >
> > > The major factor contributing to the performance drop is the reduction
> > > of skb coalescing. Additionally, without the page pool, small packets
> > > can still benefit from the allocation of 8 contiguous pages by
> > > breaking them down 

Re: [PATCH net-next 2/5] virtio_net: Add page_pool support to improve performance

2023-06-07 Thread Jason Wang
On Thu, Jun 8, 2023 at 4:17 AM Michael S. Tsirkin  wrote:
>
> On Wed, Jun 07, 2023 at 05:08:59PM +0800, Liang Chen wrote:
> > On Tue, May 30, 2023 at 9:19 AM Liang Chen  
> > wrote:
> > >
> > > On Mon, May 29, 2023 at 5:55 PM Michael S. Tsirkin  
> > > wrote:
> > > >
> > > > On Mon, May 29, 2023 at 03:27:56PM +0800, Liang Chen wrote:
> > > > > On Sun, May 28, 2023 at 2:20 PM Michael S. Tsirkin  
> > > > > wrote:
> > > > > >
> > > > > > On Fri, May 26, 2023 at 01:46:18PM +0800, Liang Chen wrote:
> > > > > > > The implementation at the moment uses one page per packet in both 
> > > > > > > the
> > > > > > > normal and XDP path. In addition, introduce a module
> > > > > > > to enable
> > > > > > > or disable the usage of page pool (disabled by default).
> > > > > > >
> > > > > > > In single-core vm testing environments, it gives a modest 
> > > > > > > performance gain
> > > > > > > in the normal path.
> > > > > > >   Upstream codebase: 47.5 Gbits/sec
> > > > > > >   Upstream codebase + page_pool support: 50.2 Gbits/sec
> > > > > > >
> > > > > > > In multi-core vm testing environments, the most significant
> > > > > > > performance
> > > > > > > gain is observed in XDP cpumap:
> > > > > > >   Upstream codebase: 1.38 Gbits/sec
> > > > > > >   Upstream codebase + page_pool support: 9.74 Gbits/sec
> > > > > > >
> > > > > > > With this foundation, we can further integrate page pool 
> > > > > > > fragmentation and
> > > > > > > DMA map/unmap support.
> > > > > > >
> > > > > > > Signed-off-by: Liang Chen 
> > > > > >
> > > > > > Why off by default?
> > > > > > I am guessing it sometimes has performance costs too?
> > > > > >
> > > > > >
> > > > > > What happens if we use page pool for big mode too?
> > > > > > The less modes we have the better...
> > > > > >
> > > > > >
> > > > >
> > > > > Sure, now I believe it makes sense to enable it by default. When the
> > > > > packet size is very small, it reduces the likelihood of skb
> > > > > coalescing. But such cases are rare.
> > > >
> > > > small packets are rare? These workloads are easy to create actually.
> > > > Pls try and include benchmark with small packet size.
> > > >
> > >
> > > Sure, Thanks!
> >
> > Before going ahead and posting v2 patch, I would like to hear more
> > advice for the cases of small packets. I have done more performance
> > benchmark with small packets since then. Here is a list of iperf
> > output,
> >
> > With PP and PP fragmenting:
> > 256:    [  5] 505.00-510.00 sec  1.34 GBytes  2.31 Gbits/sec    0    144 KBytes
> > 1K:     [  5]  30.00-35.00  sec  4.63 GBytes  7.95 Gbits/sec    0    223 KBytes
> > 2K:     [  5]  65.00-70.00  sec  8.33 GBytes  14.3 Gbits/sec    0    324 KBytes
> > 4K:     [  5]  30.00-35.00  sec  13.3 GBytes  22.8 Gbits/sec    0   1.08 MBytes
> > 8K:     [  5]  50.00-55.00  sec  18.9 GBytes  32.4 Gbits/sec    0    744 KBytes
> > 16K:    [  5]  25.00-30.00  sec  24.6 GBytes  42.3 Gbits/sec    0    963 KBytes
> > 32K:    [  5]  45.00-50.00  sec  29.8 GBytes  51.2 Gbits/sec    0   1.25 MBytes
> > 64K:    [  5]  35.00-40.00  sec  34.0 GBytes  58.4 Gbits/sec    0   1.70 MBytes
> > 128K:   [  5]  45.00-50.00  sec  36.7 GBytes  63.1 Gbits/sec    0   4.26 MBytes
> > 256K:   [  5]  30.00-35.00  sec  40.0 GBytes  68.8 Gbits/sec    0   3.20 MBytes

Note that the virtio-net driver is lacking things like BQL and others, so
it might suffer from buffer bloat for TCP performance. Would you mind
measuring, e.g. with testpmd on the vhost side, to see the rx PPS?
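As an illustration of the kind of measurement being requested (a sketch under assumed paths and core numbers, not the setup used in this thread): a vhost-user port can be attached to DPDK's testpmd and run in rxonly mode, so the guest's transmit rate shows up as RX PPS on the host side.

    dpdk-testpmd -l 0-1 -n 4 \
        --vdev 'net_vhost0,iface=/tmp/vhost-net.sock,queues=1' -- -i
    testpmd> set fwd rxonly
    testpmd> start
    testpmd> show port stats all

The Rx-pps value in the port stats output is the number of interest; older DPDK releases ship the binary as testpmd rather than dpdk-testpmd.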

> >
> > Without PP:
> > 256:    [  5] 680.00-685.00 sec  1.57 GBytes  2.69 Gbits/sec    0    359 KBytes
> > 1K:     [  5]  75.00-80.00  sec  5.47 GBytes  9.40 Gbits/sec    0    730 KBytes
> > 2K:     [  5]  65.00-70.00  sec  9.46 GBytes  16.2 Gbits/sec    0   1.99 MBytes
> > 4K:     [  5]  30.00-35.00  sec  14.5 GBytes  25.0 Gbits/sec    0   1.20 MBytes
> > 8K:     [  5]  45.00-50.00  sec  19.9 GBytes  34.1 Gbits/sec    0   1.72 MBytes
> > 16K:    [  5]   5.00-10.00  sec  23.8 GBytes  40.9 Gbits/sec    0   2.90 MBytes
> > 32K:    [  5]  15.00-20.00  sec  28.0 GBytes  48.1 Gbits/sec    0   3.03 MBytes
> > 64K:    [  5]  60.00-65.00  sec  31.8 GBytes  54.6 Gbits/sec    0   3.05 MBytes
> > 128K:   [  5]  45.00-50.00  sec  33.0 GBytes  56.6 Gbits/sec    1   3.03 MBytes
> > 256K:   [  5]  25.00-30.00  sec  34.7 GBytes  59.6 Gbits/sec    0   3.11 MBytes
> >
> >
> > The major factor contributing to the performance drop is the reduction
> > of skb coalescing. Additionally, without the page pool, small packets
> > can still benefit from the allocation of 8 contiguous pages by
> > breaking them down into smaller pieces. This effectively reduces the
> > frequency of page allocation from the buddy system. For instance, the
> > arrival of 32 1K packets only triggers one alloc_page call. Therefore,
> > the benefits of using a page pool are limited in such cases.

I wonder if we can 

Re: [PATCH net-next 2/5] virtio_net: Add page_pool support to improve performance

2023-06-07 Thread Michael S. Tsirkin
On Wed, Jun 07, 2023 at 05:08:59PM +0800, Liang Chen wrote:
> On Tue, May 30, 2023 at 9:19 AM Liang Chen  wrote:
> >
> > On Mon, May 29, 2023 at 5:55 PM Michael S. Tsirkin  wrote:
> > >
> > > On Mon, May 29, 2023 at 03:27:56PM +0800, Liang Chen wrote:
> > > > On Sun, May 28, 2023 at 2:20 PM Michael S. Tsirkin  
> > > > wrote:
> > > > >
> > > > > On Fri, May 26, 2023 at 01:46:18PM +0800, Liang Chen wrote:
> > > > > > The implementation at the moment uses one page per packet in both 
> > > > > > the
> > > > > > normal and XDP path. In addition, introduce a module parameter to
> > > > > > enable
> > > > > > or disable the usage of page pool (disabled by default).
> > > > > >
> > > > > > In single-core vm testing environments, it gives a modest 
> > > > > > performance gain
> > > > > > in the normal path.
> > > > > >   Upstream codebase: 47.5 Gbits/sec
> > > > > >   Upstream codebase + page_pool support: 50.2 Gbits/sec
> > > > > >
> > > > > > In multi-core vm testing environments, the most significant
> > > > > > performance
> > > > > > gain is observed in XDP cpumap:
> > > > > >   Upstream codebase: 1.38 Gbits/sec
> > > > > >   Upstream codebase + page_pool support: 9.74 Gbits/sec
> > > > > >
> > > > > > With this foundation, we can further integrate page pool 
> > > > > > fragmentation and
> > > > > > DMA map/unmap support.
> > > > > >
> > > > > > Signed-off-by: Liang Chen 
> > > > >
> > > > > Why off by default?
> > > > > I am guessing it sometimes has performance costs too?
> > > > >
> > > > >
> > > > > What happens if we use page pool for big mode too?
> > > > > The less modes we have the better...
> > > > >
> > > > >
> > > >
> > > > Sure, now I believe it makes sense to enable it by default. When the
> > > > packet size is very small, it reduces the likelihood of skb
> > > > coalescing. But such cases are rare.
> > >
> > > small packets are rare? These workloads are easy to create actually.
> > > Pls try and include benchmark with small packet size.
> > >
> >
> > Sure, Thanks!
> 
> Before going ahead and posting v2 patch, I would like to hear more
> advice for the cases of small packets. I have done more performance
> benchmark with small packets since then. Here is a list of iperf
> output,
> 
> With PP and PP fragmenting:
> 256:    [  5] 505.00-510.00 sec  1.34 GBytes  2.31 Gbits/sec    0    144 KBytes
> 1K:     [  5]  30.00-35.00  sec  4.63 GBytes  7.95 Gbits/sec    0    223 KBytes
> 2K:     [  5]  65.00-70.00  sec  8.33 GBytes  14.3 Gbits/sec    0    324 KBytes
> 4K:     [  5]  30.00-35.00  sec  13.3 GBytes  22.8 Gbits/sec    0   1.08 MBytes
> 8K:     [  5]  50.00-55.00  sec  18.9 GBytes  32.4 Gbits/sec    0    744 KBytes
> 16K:    [  5]  25.00-30.00  sec  24.6 GBytes  42.3 Gbits/sec    0    963 KBytes
> 32K:    [  5]  45.00-50.00  sec  29.8 GBytes  51.2 Gbits/sec    0   1.25 MBytes
> 64K:    [  5]  35.00-40.00  sec  34.0 GBytes  58.4 Gbits/sec    0   1.70 MBytes
> 128K:   [  5]  45.00-50.00  sec  36.7 GBytes  63.1 Gbits/sec    0   4.26 MBytes
> 256K:   [  5]  30.00-35.00  sec  40.0 GBytes  68.8 Gbits/sec    0   3.20 MBytes
>
> Without PP:
> 256:    [  5] 680.00-685.00 sec  1.57 GBytes  2.69 Gbits/sec    0    359 KBytes
> 1K:     [  5]  75.00-80.00  sec  5.47 GBytes  9.40 Gbits/sec    0    730 KBytes
> 2K:     [  5]  65.00-70.00  sec  9.46 GBytes  16.2 Gbits/sec    0   1.99 MBytes
> 4K:     [  5]  30.00-35.00  sec  14.5 GBytes  25.0 Gbits/sec    0   1.20 MBytes
> 8K:     [  5]  45.00-50.00  sec  19.9 GBytes  34.1 Gbits/sec    0   1.72 MBytes
> 16K:    [  5]   5.00-10.00  sec  23.8 GBytes  40.9 Gbits/sec    0   2.90 MBytes
> 32K:    [  5]  15.00-20.00  sec  28.0 GBytes  48.1 Gbits/sec    0   3.03 MBytes
> 64K:    [  5]  60.00-65.00  sec  31.8 GBytes  54.6 Gbits/sec    0   3.05 MBytes
> 128K:   [  5]  45.00-50.00  sec  33.0 GBytes  56.6 Gbits/sec    1   3.03 MBytes
> 256K:   [  5]  25.00-30.00  sec  34.7 GBytes  59.6 Gbits/sec    0   3.11 MBytes
> 
> 
> The major factor contributing to the performance drop is the reduction
> of skb coalescing. Additionally, without the page pool, small packets
> can still benefit from the allocation of 8 contiguous pages by
> breaking them down into smaller pieces. This effectively reduces the
> frequency of page allocation from the buddy system. For instance, the
> arrival of 32 1K packets only triggers one alloc_page call. Therefore,
> the benefits of using a page pool are limited in such cases. In fact,
> without page pool fragmenting enabled, it can even hinder performance
> from this perspective.
> 
> Upon further consideration, I tend to believe making page pool the
> default option may not be appropriate. As you pointed out, we cannot
> simply ignore the performance impact on small packets. Any comments on
> this will be much appreciated.
> 
> 
> Thanks,
> Liang


So, let's only use page pool for XDP then?
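For context on the XDP cpumap path, where the commit message reports its largest gain: cpumap redirection moves frame processing from the CPU that received the packet to another CPU. A minimal redirect program looks roughly like the sketch below; it is a generic illustration (fixed target CPU, and user space still has to populate the map with a queue size), not the test program used in this thread.

#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

/* Entries must be filled in from user space (struct bpf_cpumap_val,
 * mainly the qsize field) before redirection takes effect. */
struct {
	__uint(type, BPF_MAP_TYPE_CPUMAP);
	__uint(max_entries, 8);
	__type(key, __u32);
	__type(value, struct bpf_cpumap_val);
} cpu_map SEC(".maps");

SEC("xdp")
int xdp_redirect_to_cpu(struct xdp_md *ctx)
{
	__u32 target_cpu = 1;	/* placeholder: steer everything to CPU 1 */

	return bpf_redirect_map(&cpu_map, target_cpu, 0);
}

char _license[] SEC("license") = "GPL";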

> 
> > > > The usage of page pool for big mode is being evaluated now. Thanks!
> > > >
> > > > > > 

Re: [PATCH net-next 2/5] virtio_net: Add page_pool support to improve performance

2023-06-07 Thread Liang Chen
On Wed, Jun 7, 2023 at 5:36 PM Xuan Zhuo  wrote:
>
> On Wed, 7 Jun 2023 17:08:59 +0800, Liang Chen  
> wrote:
> > On Tue, May 30, 2023 at 9:19 AM Liang Chen  
> > wrote:
> > >
> > > On Mon, May 29, 2023 at 5:55 PM Michael S. Tsirkin  
> > > wrote:
> > > >
> > > > On Mon, May 29, 2023 at 03:27:56PM +0800, Liang Chen wrote:
> > > > > On Sun, May 28, 2023 at 2:20 PM Michael S. Tsirkin  
> > > > > wrote:
> > > > > >
> > > > > > On Fri, May 26, 2023 at 01:46:18PM +0800, Liang Chen wrote:
> > > > > > > The implementation at the moment uses one page per packet in both 
> > > > > > > the
> > > > > > > normal and XDP path. In addition, introduce a module
> > > > > > > to enable
> > > > > > > or disable the usage of page pool (disabled by default).
> > > > > > >
> > > > > > > In single-core vm testing environments, it gives a modest 
> > > > > > > performance gain
> > > > > > > in the normal path.
> > > > > > >   Upstream codebase: 47.5 Gbits/sec
> > > > > > >   Upstream codebase + page_pool support: 50.2 Gbits/sec
> > > > > > >
> > > > > > > In multi-core vm testing environments, the most significant
> > > > > > > performance
> > > > > > > gain is observed in XDP cpumap:
> > > > > > >   Upstream codebase: 1.38 Gbits/sec
> > > > > > >   Upstream codebase + page_pool support: 9.74 Gbits/sec
> > > > > > >
> > > > > > > With this foundation, we can further integrate page pool 
> > > > > > > fragmentation and
> > > > > > > DMA map/unmap support.
> > > > > > >
> > > > > > > Signed-off-by: Liang Chen 
> > > > > >
> > > > > > Why off by default?
> > > > > > I am guessing it sometimes has performance costs too?
> > > > > >
> > > > > >
> > > > > > What happens if we use page pool for big mode too?
> > > > > > The less modes we have the better...
> > > > > >
> > > > > >
> > > > >
> > > > > Sure, now I believe it makes sense to enable it by default. When the
> > > > > packet size is very small, it reduces the likelihood of skb
> > > > > coalescing. But such cases are rare.
> > > >
> > > > small packets are rare? These workloads are easy to create actually.
> > > > Pls try and include benchmark with small packet size.
> > > >
> > >
> > > Sure, Thanks!
> >
> > Before going ahead and posting v2 patch, I would like to hear more
> > advice for the cases of small packets. I have done more performance
> > benchmark with small packets since then. Here is a list of iperf
> > output,
>
> Could you show the command line?
>
> Thanks
>
>

Sure.   iperf3 -c  -i 5 -f g -t 0 -l 
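(For completeness, and as an assumption since only the client side is shown: the receiver would simply run an iperf3 server, iperf3 -s, and -l is the per-write message size that produces the 256-byte to 256K row labels in the tables below, with -f g selecting Gbits/sec output and -i 5 a five-second reporting interval.)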

> >
> > With PP and PP fragmenting:
> > 256:    [  5] 505.00-510.00 sec  1.34 GBytes  2.31 Gbits/sec    0    144 KBytes
> > 1K:     [  5]  30.00-35.00  sec  4.63 GBytes  7.95 Gbits/sec    0    223 KBytes
> > 2K:     [  5]  65.00-70.00  sec  8.33 GBytes  14.3 Gbits/sec    0    324 KBytes
> > 4K:     [  5]  30.00-35.00  sec  13.3 GBytes  22.8 Gbits/sec    0   1.08 MBytes
> > 8K:     [  5]  50.00-55.00  sec  18.9 GBytes  32.4 Gbits/sec    0    744 KBytes
> > 16K:    [  5]  25.00-30.00  sec  24.6 GBytes  42.3 Gbits/sec    0    963 KBytes
> > 32K:    [  5]  45.00-50.00  sec  29.8 GBytes  51.2 Gbits/sec    0   1.25 MBytes
> > 64K:    [  5]  35.00-40.00  sec  34.0 GBytes  58.4 Gbits/sec    0   1.70 MBytes
> > 128K:   [  5]  45.00-50.00  sec  36.7 GBytes  63.1 Gbits/sec    0   4.26 MBytes
> > 256K:   [  5]  30.00-35.00  sec  40.0 GBytes  68.8 Gbits/sec    0   3.20 MBytes
> >
> > Without PP:
> > 256:    [  5] 680.00-685.00 sec  1.57 GBytes  2.69 Gbits/sec    0    359 KBytes
> > 1K:     [  5]  75.00-80.00  sec  5.47 GBytes  9.40 Gbits/sec    0    730 KBytes
> > 2K:     [  5]  65.00-70.00  sec  9.46 GBytes  16.2 Gbits/sec    0   1.99 MBytes
> > 4K:     [  5]  30.00-35.00  sec  14.5 GBytes  25.0 Gbits/sec    0   1.20 MBytes
> > 8K:     [  5]  45.00-50.00  sec  19.9 GBytes  34.1 Gbits/sec    0   1.72 MBytes
> > 16K:    [  5]   5.00-10.00  sec  23.8 GBytes  40.9 Gbits/sec    0   2.90 MBytes
> > 32K:    [  5]  15.00-20.00  sec  28.0 GBytes  48.1 Gbits/sec    0   3.03 MBytes
> > 64K:    [  5]  60.00-65.00  sec  31.8 GBytes  54.6 Gbits/sec    0   3.05 MBytes
> > 128K:   [  5]  45.00-50.00  sec  33.0 GBytes  56.6 Gbits/sec    1   3.03 MBytes
> > 256K:   [  5]  25.00-30.00  sec  34.7 GBytes  59.6 Gbits/sec    0   3.11 MBytes
> >
> >
> > The major factor contributing to the performance drop is the reduction
> > of skb coalescing. Additionally, without the page pool, small packets
> > can still benefit from the allocation of 8 contiguous pages by
> > breaking them down into smaller pieces. This effectively reduces the
> > frequency of page allocation from the buddy system. For instance, the
> > arrival of 32 1K packets only triggers one alloc_page call. Therefore,
> > the benefits of using a page pool are limited in such cases. In fact,
> > without page pool fragmenting enabled, it can even hinder performance
> > from this perspective.
> >
> > Upon further consideration, I tend to believe making page pool the
> > default option may not be appropriate. As you pointed out, we cannot
> > simply ignore the performance impact on small packets. Any comments on
> > this will be much appreciated.

Re: [PATCH net-next 2/5] virtio_net: Add page_pool support to improve performance

2023-06-07 Thread Xuan Zhuo
On Wed, 7 Jun 2023 17:08:59 +0800, Liang Chen  wrote:
> On Tue, May 30, 2023 at 9:19 AM Liang Chen  wrote:
> >
> > On Mon, May 29, 2023 at 5:55 PM Michael S. Tsirkin  wrote:
> > >
> > > On Mon, May 29, 2023 at 03:27:56PM +0800, Liang Chen wrote:
> > > > On Sun, May 28, 2023 at 2:20 PM Michael S. Tsirkin  
> > > > wrote:
> > > > >
> > > > > On Fri, May 26, 2023 at 01:46:18PM +0800, Liang Chen wrote:
> > > > > > The implementation at the moment uses one page per packet in both 
> > > > > > the
> > > > > > normal and XDP path. In addition, introduce a module parameter to
> > > > > > enable
> > > > > > or disable the usage of page pool (disabled by default).
> > > > > >
> > > > > > In single-core vm testing environments, it gives a modest 
> > > > > > performance gain
> > > > > > in the normal path.
> > > > > >   Upstream codebase: 47.5 Gbits/sec
> > > > > >   Upstream codebase + page_pool support: 50.2 Gbits/sec
> > > > > >
> > > > > > In multi-core vm testing environments, the most significant
> > > > > > performance
> > > > > > gain is observed in XDP cpumap:
> > > > > >   Upstream codebase: 1.38 Gbits/sec
> > > > > >   Upstream codebase + page_pool support: 9.74 Gbits/sec
> > > > > >
> > > > > > With this foundation, we can further integrate page pool 
> > > > > > fragmentation and
> > > > > > DMA map/unmap support.
> > > > > >
> > > > > > Signed-off-by: Liang Chen 
> > > > >
> > > > > Why off by default?
> > > > > I am guessing it sometimes has performance costs too?
> > > > >
> > > > >
> > > > > What happens if we use page pool for big mode too?
> > > > > The less modes we have the better...
> > > > >
> > > > >
> > > >
> > > > Sure, now I believe it makes sense to enable it by default. When the
> > > > packet size is very small, it reduces the likelihood of skb
> > > > coalescing. But such cases are rare.
> > >
> > > small packets are rare? These workloads are easy to create actually.
> > > Pls try and include benchmark with small packet size.
> > >
> >
> > Sure, Thanks!
>
> Before going ahead and posting v2 patch, I would like to hear more
> advice for the cases of small packets. I have done more performance
> benchmark with small packets since then. Here is a list of iperf
> output,

Could you show the command line?

Thanks


>
> With PP and PP fragmenting:
> 256:    [  5] 505.00-510.00 sec  1.34 GBytes  2.31 Gbits/sec    0    144 KBytes
> 1K:     [  5]  30.00-35.00  sec  4.63 GBytes  7.95 Gbits/sec    0    223 KBytes
> 2K:     [  5]  65.00-70.00  sec  8.33 GBytes  14.3 Gbits/sec    0    324 KBytes
> 4K:     [  5]  30.00-35.00  sec  13.3 GBytes  22.8 Gbits/sec    0   1.08 MBytes
> 8K:     [  5]  50.00-55.00  sec  18.9 GBytes  32.4 Gbits/sec    0    744 KBytes
> 16K:    [  5]  25.00-30.00  sec  24.6 GBytes  42.3 Gbits/sec    0    963 KBytes
> 32K:    [  5]  45.00-50.00  sec  29.8 GBytes  51.2 Gbits/sec    0   1.25 MBytes
> 64K:    [  5]  35.00-40.00  sec  34.0 GBytes  58.4 Gbits/sec    0   1.70 MBytes
> 128K:   [  5]  45.00-50.00  sec  36.7 GBytes  63.1 Gbits/sec    0   4.26 MBytes
> 256K:   [  5]  30.00-35.00  sec  40.0 GBytes  68.8 Gbits/sec    0   3.20 MBytes
>
> Without PP:
> 256:    [  5] 680.00-685.00 sec  1.57 GBytes  2.69 Gbits/sec    0    359 KBytes
> 1K:     [  5]  75.00-80.00  sec  5.47 GBytes  9.40 Gbits/sec    0    730 KBytes
> 2K:     [  5]  65.00-70.00  sec  9.46 GBytes  16.2 Gbits/sec    0   1.99 MBytes
> 4K:     [  5]  30.00-35.00  sec  14.5 GBytes  25.0 Gbits/sec    0   1.20 MBytes
> 8K:     [  5]  45.00-50.00  sec  19.9 GBytes  34.1 Gbits/sec    0   1.72 MBytes
> 16K:    [  5]   5.00-10.00  sec  23.8 GBytes  40.9 Gbits/sec    0   2.90 MBytes
> 32K:    [  5]  15.00-20.00  sec  28.0 GBytes  48.1 Gbits/sec    0   3.03 MBytes
> 64K:    [  5]  60.00-65.00  sec  31.8 GBytes  54.6 Gbits/sec    0   3.05 MBytes
> 128K:   [  5]  45.00-50.00  sec  33.0 GBytes  56.6 Gbits/sec    1   3.03 MBytes
> 256K:   [  5]  25.00-30.00  sec  34.7 GBytes  59.6 Gbits/sec    0   3.11 MBytes
>
>
> The major factor contributing to the performance drop is the reduction
> of skb coalescing. Additionally, without the page pool, small packets
> can still benefit from the allocation of 8 contiguous pages by
> breaking them down into smaller pieces. This effectively reduces the
> frequency of page allocation from the buddy system. For instance, the
> arrival of 32 1K packets only triggers one alloc_page call. Therefore,
> the benefits of using a page pool are limited in such cases. In fact,
> without page pool fragmenting enabled, it can even hinder performance
> from this perspective.
>
> Upon further consideration, I tend to believe making page pool the
> default option may not be appropriate. As you pointed out, we cannot
> simply ignore the performance impact on small packets. Any comments on
> this will be much appreciated.
>
>
> Thanks,
> Liang
>
>
> > > > The usage of page pool for big mode is being evaluated now. Thanks!
> > > >
> > > > > > ---
> > > > > 

Re: [PATCH net-next 2/5] virtio_net: Add page_pool support to improve performance

2023-06-07 Thread Xuan Zhuo
On Wed, 7 Jun 2023 17:11:44 +0800, Liang Chen  wrote:
> On Wed, May 31, 2023 at 11:12 AM Xuan Zhuo  wrote:
> >
> > On Mon, 29 May 2023 15:28:17 +0800, Liang Chen  
> > wrote:
> > > On Sun, May 28, 2023 at 2:40 PM Michael S. Tsirkin  
> > > wrote:
> > > >
> > > > On Sat, May 27, 2023 at 08:35:01PM +0800, Liang Chen wrote:
> > > > > On Fri, May 26, 2023 at 2:51 PM Jason Wang  
> > > > > wrote:
> > > > > >
> > > > > > On Fri, May 26, 2023 at 1:46 PM Liang Chen 
> > > > > >  wrote:
> > > > > > >
> > > > > > > The implementation at the moment uses one page per packet in both 
> > > > > > > the
> > > > > > > normal and XDP path.
> > > > > >
> > > > > > It's better to explain why we need a page pool and how it can help 
> > > > > > the
> > > > > > performance.
> > > > > >
> > > > >
> > > > > Sure, I will include that on v2.
> > > > > > > In addition, introduce a module parameter to enable
> > > > > > > or disable the usage of page pool (disabled by default).
> > > > > >
> > > > > > If page pool wins for most of the cases, any reason to disable it 
> > > > > > by default?
> > > > > >
> > > > >
> > > > > Thank you for raising the point. It does make sense to enable it by 
> > > > > default.
> > > >
> > > > I'd like to see more benchmarks pls then, with a variety of packet
> > > > sizes, udp and tcp.
> > > >
> > >
> > > Sure, more benchmarks will be provided. Thanks.
> >
> >
> > I think so.
> >
> > I did this, but I did not find any improvement, so I gave it up.
> >
> > Thanks.
> >
> >
>
> Our UDP benchmark shows a steady 0.8 percent change in PPS
> measurement. However, when conducting iperf TCP stream performance
> testing, the results vary depending on the packet size and testing
> setup. With small packet sizes, the performance actually drops
> slightly due to the reasons I explained in the previous email. On the
> other hand, with large packets, we need to ensure that the sender side
> doesn't become the bottleneck. To achieve this, our setup uses a
> single-core vm to keep the receiver busy, which allows us to identify
> performance differences in the receiving path.

Could you show some numbers?

Thanks.


>
>
> Thanks,
> Liang
>
>
>
>
> > >
> > >
> > > > > > >
> > > > > > > In single-core vm testing environments, it gives a modest 
> > > > > > > performance gain
> > > > > > > in the normal path.
> > > > > > >   Upstream codebase: 47.5 Gbits/sec
> > > > > > >   Upstream codebase + page_pool support: 50.2 Gbits/sec
> > > > > > >
> > > > > > > In multi-core vm testing environments, the most significant
> > > > > > > performance
> > > > > > > gain is observed in XDP cpumap:
> > > > > > >   Upstream codebase: 1.38 Gbits/sec
> > > > > > >   Upstream codebase + page_pool support: 9.74 Gbits/sec
> > > > > >
> > > > > > Please show more details on the test. E.g which kinds of tests have
> > > > > > you measured?
> > > > > >
> > > > > > Btw, it would be better to measure PPS as well.
> > > > > >
> > > > >
> > > > > Sure. It will be added on v2.
> > > > > > >
> > > > > > > With this foundation, we can further integrate page pool 
> > > > > > > fragmentation and
> > > > > > > DMA map/unmap support.
> > > > > > >
> > > > > > > Signed-off-by: Liang Chen 
> > > > > > > ---
> > > > > > >  drivers/net/virtio_net.c | 188 
> > > > > > > ++-
> > > > > >
> > > > > > I believe we should make virtio-net select CONFIG_PAGE_POOL or do
> > > > > > the ifdef tricks at least.
> > > > > >
> > > > >
> > > > > Sure. It will be done in v2.
> > > > > > >  1 file changed, 146 insertions(+), 42 deletions(-)
> > > > > > >
> > > > > > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > > > > > > index c5dca0d92e64..99c0ca0c1781 100644
> > > > > > > --- a/drivers/net/virtio_net.c
> > > > > > > +++ b/drivers/net/virtio_net.c
> > > > > > > @@ -31,6 +31,9 @@ module_param(csum, bool, 0444);
> > > > > > >  module_param(gso, bool, 0444);
> > > > > > >  module_param(napi_tx, bool, 0644);
> > > > > > >
> > > > > > > +static bool page_pool_enabled;
> > > > > > > +module_param(page_pool_enabled, bool, 0400);
> > > > > > > +
> > > > > > >  /* FIXME: MTU in config. */
> > > > > > >  #define GOOD_PACKET_LEN (ETH_HLEN + VLAN_HLEN + ETH_DATA_LEN)
> > > > > > >  #define GOOD_COPY_LEN  128
> > > > > > > @@ -159,6 +162,9 @@ struct receive_queue {
> > > > > > > /* Chain pages by the private ptr. */
> > > > > > > struct page *pages;
> > > > > > >
> > > > > > > +   /* Page pool */
> > > > > > > +   struct page_pool *page_pool;
> > > > > > > +
> > > > > > > /* Average packet length for mergeable receive buffers. */
> > > > > > > struct ewma_pkt_len mrg_avg_pkt_len;
> > > > > > >
> > > > > > > @@ -459,6 +465,14 @@ static struct sk_buff 
> > > > > > > *virtnet_build_skb(void *buf, unsigned int buflen,
> > > > > > > return skb;
> > > > > > >  }
> > > > > > >
> > > > > > > +static void virtnet_put_page(struct receive_queue *rq, struct 
> > > > > > > page *page)
> > 

Re: [PATCH net-next 2/5] virtio_net: Add page_pool support to improve performance

2023-06-07 Thread Liang Chen
On Wed, May 31, 2023 at 11:12 AM Xuan Zhuo  wrote:
>
> On Mon, 29 May 2023 15:28:17 +0800, Liang Chen  
> wrote:
> > On Sun, May 28, 2023 at 2:40 PM Michael S. Tsirkin  wrote:
> > >
> > > On Sat, May 27, 2023 at 08:35:01PM +0800, Liang Chen wrote:
> > > > On Fri, May 26, 2023 at 2:51 PM Jason Wang  wrote:
> > > > >
> > > > > On Fri, May 26, 2023 at 1:46 PM Liang Chen 
> > > > >  wrote:
> > > > > >
> > > > > > The implementation at the moment uses one page per packet in both 
> > > > > > the
> > > > > > normal and XDP path.
> > > > >
> > > > > It's better to explain why we need a page pool and how it can help the
> > > > > performance.
> > > > >
> > > >
> > > > Sure, I will include that on v2.
> > > > > > In addition, introduce a module parameter to enable
> > > > > > or disable the usage of page pool (disabled by default).
> > > > >
> > > > > If page pool wins for most of the cases, any reason to disable it by 
> > > > > default?
> > > > >
> > > >
> > > > Thank you for raising the point. It does make sense to enable it by 
> > > > default.
> > >
> > > I'd like to see more benchmarks pls then, with a variety of packet
> > > sizes, udp and tcp.
> > >
> >
> > Sure, more benchmarks will be provided. Thanks.
>
>
> I think so.
>
> I did this, but I did not find any improvement, so I gave it up.
>
> Thanks.
>
>

Our UDP benchmark shows a steady 0.8 percent change in PPS
measurement. However, when conducting iperf TCP stream performance
testing, the results vary depending on the packet size and testing
setup. With small packet sizes, the performance actually drops
slightly due to the reasons I explained in the previous email. On the
other hand, with large packets, we need to ensure that the sender side
doesn't become the bottleneck. To achieve this, our setup uses a
single-core vm to keep the receiver busy, which allows us to identify
performance differences in the receiving path.


Thanks,
Liang
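(One way to reproduce the single-core receiver constraint described above, as an assumption about the setup since the VM configuration is not shown in the thread, is simply to give the receiving guest one vCPU, e.g. -smp 1 on the QEMU command line, so the receive path saturates before the sender does.)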




> >
> >
> > > > > >
> > > > > > In single-core vm testing environments, it gives a modest 
> > > > > > performance gain
> > > > > > in the normal path.
> > > > > >   Upstream codebase: 47.5 Gbits/sec
> > > > > >   Upstream codebase + page_pool support: 50.2 Gbits/sec
> > > > > >
> > > > > > In multi-core vm testing environments, the most significant
> > > > > > performance
> > > > > > gain is observed in XDP cpumap:
> > > > > >   Upstream codebase: 1.38 Gbits/sec
> > > > > >   Upstream codebase + page_pool support: 9.74 Gbits/sec
> > > > >
> > > > > Please show more details on the test. E.g which kinds of tests have
> > > > > you measured?
> > > > >
> > > > > Btw, it would be better to measure PPS as well.
> > > > >
> > > >
> > > > Sure. It will be added on v2.
> > > > > >
> > > > > > With this foundation, we can further integrate page pool 
> > > > > > fragmentation and
> > > > > > DMA map/unmap support.
> > > > > >
> > > > > > Signed-off-by: Liang Chen 
> > > > > > ---
> > > > > >  drivers/net/virtio_net.c | 188 
> > > > > > ++-
> > > > >
> > > > > I believe we should make virtio-net to select CONFIG_PAGE_POOL or do
> > > > > the ifdef tricks at least.
> > > > >
> > > >
> > > > Sure. it will be done on v2.
> > > > > >  1 file changed, 146 insertions(+), 42 deletions(-)
> > > > > >
> > > > > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > > > > > index c5dca0d92e64..99c0ca0c1781 100644
> > > > > > --- a/drivers/net/virtio_net.c
> > > > > > +++ b/drivers/net/virtio_net.c
> > > > > > @@ -31,6 +31,9 @@ module_param(csum, bool, 0444);
> > > > > >  module_param(gso, bool, 0444);
> > > > > >  module_param(napi_tx, bool, 0644);
> > > > > >
> > > > > > +static bool page_pool_enabled;
> > > > > > +module_param(page_pool_enabled, bool, 0400);
> > > > > > +
> > > > > >  /* FIXME: MTU in config. */
> > > > > >  #define GOOD_PACKET_LEN (ETH_HLEN + VLAN_HLEN + ETH_DATA_LEN)
> > > > > >  #define GOOD_COPY_LEN  128
> > > > > > @@ -159,6 +162,9 @@ struct receive_queue {
> > > > > > /* Chain pages by the private ptr. */
> > > > > > struct page *pages;
> > > > > >
> > > > > > +   /* Page pool */
> > > > > > +   struct page_pool *page_pool;
> > > > > > +
> > > > > > /* Average packet length for mergeable receive buffers. */
> > > > > > struct ewma_pkt_len mrg_avg_pkt_len;
> > > > > >
> > > > > > @@ -459,6 +465,14 @@ static struct sk_buff *virtnet_build_skb(void 
> > > > > > *buf, unsigned int buflen,
> > > > > > return skb;
> > > > > >  }
> > > > > >
> > > > > > +static void virtnet_put_page(struct receive_queue *rq, struct page 
> > > > > > *page)
> > > > > > +{
> > > > > > +   if (rq->page_pool)
> > > > > > +   page_pool_put_full_page(rq->page_pool, page, true);
> > > > > > +   else
> > > > > > +   put_page(page);
> > > > > > +}
> > > > > > +
> > > > > >  /* Called from bottom half context */
> > > > > >  static struct sk_buff *page_to_skb(struct virtnet_info *vi,
> > > > > >   

Re: [PATCH net-next 2/5] virtio_net: Add page_pool support to improve performance

2023-06-07 Thread Liang Chen
On Tue, May 30, 2023 at 9:19 AM Liang Chen  wrote:
>
> On Mon, May 29, 2023 at 5:55 PM Michael S. Tsirkin  wrote:
> >
> > On Mon, May 29, 2023 at 03:27:56PM +0800, Liang Chen wrote:
> > > On Sun, May 28, 2023 at 2:20 PM Michael S. Tsirkin  
> > > wrote:
> > > >
> > > > On Fri, May 26, 2023 at 01:46:18PM +0800, Liang Chen wrote:
> > > > > The implementation at the moment uses one page per packet in both the
> > > > > normal and XDP path. In addition, introducing a module parameter to 
> > > > > enable
> > > > > or disable the usage of page pool (disabled by default).
> > > > >
> > > > > In single-core vm testing environments, it gives a modest performance 
> > > > > gain
> > > > > in the normal path.
> > > > >   Upstream codebase: 47.5 Gbits/sec
> > > > >   Upstream codebase + page_pool support: 50.2 Gbits/sec
> > > > >
> > > > > In multi-core vm testing environments, The most significant 
> > > > > performance
> > > > > gain is observed in XDP cpumap:
> > > > >   Upstream codebase: 1.38 Gbits/sec
> > > > >   Upstream codebase + page_pool support: 9.74 Gbits/sec
> > > > >
> > > > > With this foundation, we can further integrate page pool 
> > > > > fragmentation and
> > > > > DMA map/unmap support.
> > > > >
> > > > > Signed-off-by: Liang Chen 
> > > >
> > > > Why off by default?
> > > > I am guessing it sometimes has performance costs too?
> > > >
> > > >
> > > > What happens if we use page pool for big mode too?
> > > > The less modes we have the better...
> > > >
> > > >
> > >
> > > Sure, now I believe it makes sense to enable it by default. When the
> > > packet size is very small, it reduces the likelihood of skb
> > > coalescing. But such cases are rare.
> >
> > small packets are rare? These workloads are easy to create actually.
> > Pls try and include benchmark with small packet size.
> >
>
> Sure, Thanks!

Before going ahead and posting the v2 patch, I would like to hear more
advice for the cases of small packets. I have done more performance
benchmarking with small packets since then. Here is the iperf output,
listed by packet size (columns: interval, transfer, bitrate,
retransmits, cwnd):

With PP and PP fragmenting:
256:    [  5] 505.00-510.00 sec   1.34 GBytes   2.31 Gbits/sec   0    144 KBytes
1K:     [  5]  30.00-35.00  sec   4.63 GBytes   7.95 Gbits/sec   0    223 KBytes
2K:     [  5]  65.00-70.00  sec   8.33 GBytes   14.3 Gbits/sec   0    324 KBytes
4K:     [  5]  30.00-35.00  sec   13.3 GBytes   22.8 Gbits/sec   0   1.08 MBytes
8K:     [  5]  50.00-55.00  sec   18.9 GBytes   32.4 Gbits/sec   0    744 KBytes
16K:    [  5]  25.00-30.00  sec   24.6 GBytes   42.3 Gbits/sec   0    963 KBytes
32K:    [  5]  45.00-50.00  sec   29.8 GBytes   51.2 Gbits/sec   0   1.25 MBytes
64K:    [  5]  35.00-40.00  sec   34.0 GBytes   58.4 Gbits/sec   0   1.70 MBytes
128K:   [  5]  45.00-50.00  sec   36.7 GBytes   63.1 Gbits/sec   0   4.26 MBytes
256K:   [  5]  30.00-35.00  sec   40.0 GBytes   68.8 Gbits/sec   0   3.20 MBytes

Without PP:
256:    [  5] 680.00-685.00 sec   1.57 GBytes   2.69 Gbits/sec   0    359 KBytes
1K:     [  5]  75.00-80.00  sec   5.47 GBytes   9.40 Gbits/sec   0    730 KBytes
2K:     [  5]  65.00-70.00  sec   9.46 GBytes   16.2 Gbits/sec   0   1.99 MBytes
4K:     [  5]  30.00-35.00  sec   14.5 GBytes   25.0 Gbits/sec   0   1.20 MBytes
8K:     [  5]  45.00-50.00  sec   19.9 GBytes   34.1 Gbits/sec   0   1.72 MBytes
16K:    [  5]   5.00-10.00  sec   23.8 GBytes   40.9 Gbits/sec   0   2.90 MBytes
32K:    [  5]  15.00-20.00  sec   28.0 GBytes   48.1 Gbits/sec   0   3.03 MBytes
64K:    [  5]  60.00-65.00  sec   31.8 GBytes   54.6 Gbits/sec   0   3.05 MBytes
128K:   [  5]  45.00-50.00  sec   33.0 GBytes   56.6 Gbits/sec   1   3.03 MBytes
256K:   [  5]  25.00-30.00  sec   34.7 GBytes   59.6 Gbits/sec   0   3.11 MBytes


The major factor contributing to the performance drop is the reduction
of skb coalescing. Additionally, without the page pool, small packets
can still benefit from the allocation of 8 contiguous pages, which are
then carved up into smaller pieces. This effectively reduces the
frequency of page allocation from the buddy system. For instance, the
arrival of 32 1K packets triggers only one alloc_page call. Therefore,
the benefits of using a page pool are limited in such cases. In fact,
without page pool fragmenting enabled, it can even hinder performance
from this perspective.
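
To make that arithmetic concrete, below is a rough sketch of the idea
(illustration only, not the driver code: frag_cache, frag_alloc and
FRAG_REFILL_ORDER are made-up names, and refcounting/freeing of the old
page is omitted). A cached order-3 page is carved into buf_len slices,
so 32 1K buffers cost a single buddy allocation, whereas the page pool
path in this series takes one page per packet.

#include <linux/gfp.h>
#include <linux/mm.h>

/* An order-3 refill gives 8 contiguous pages (32KB with 4K pages). */
#define FRAG_REFILL_ORDER	3

struct frag_cache {
	struct page	*page;		/* current higher-order page */
	unsigned int	offset;		/* next free byte within it */
	unsigned int	size;		/* total bytes in the allocation */
};

static void *frag_alloc(struct frag_cache *fc, unsigned int buf_len, gfp_t gfp)
{
	void *buf;

	/* Refill only when the cached compound page is used up. */
	if (!fc->page || fc->offset + buf_len > fc->size) {
		fc->page = alloc_pages(gfp | __GFP_COMP, FRAG_REFILL_ORDER);
		if (!fc->page)
			return NULL;
		fc->size = PAGE_SIZE << FRAG_REFILL_ORDER;
		fc->offset = 0;
	}

	/* Hand out one slice; 1K slices mean 32 buffers per alloc_pages() call. */
	buf = page_address(fc->page) + fc->offset;
	fc->offset += buf_len;
	return buf;
}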

Upon further consideration, I tend to believe making page pool the
default option may not be appropriate. As you pointed out, we cannot
simply ignore the performance impact on small packets. Any comments on
this will be much appreciated.


Thanks,
Liang


> > > The usage of page pool for big mode is being evaluated now. Thanks!
> > >
> > > > > ---
> > > > >  drivers/net/virtio_net.c | 188 
> > > > > ++-
> > > > >  1 file changed, 146 insertions(+), 42 deletions(-)
> > > > >
> > > > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > > > > index c5dca0d92e64..99c0ca0c1781 100644
> > > > > --- a/drivers/net/virtio_net.c
> > > > > +++ 

Re: [PATCH net-next 2/5] virtio_net: Add page_pool support to improve performance

2023-05-30 Thread Xuan Zhuo
On Mon, 29 May 2023 15:28:17 +0800, Liang Chen  
wrote:
> On Sun, May 28, 2023 at 2:40 PM Michael S. Tsirkin  wrote:
> >
> > On Sat, May 27, 2023 at 08:35:01PM +0800, Liang Chen wrote:
> > > On Fri, May 26, 2023 at 2:51 PM Jason Wang  wrote:
> > > >
> > > > On Fri, May 26, 2023 at 1:46 PM Liang Chen  
> > > > wrote:
> > > > >
> > > > > The implementation at the moment uses one page per packet in both the
> > > > > normal and XDP path.
> > > >
> > > > It's better to explain why we need a page pool and how it can help the
> > > > performance.
> > > >
> > >
> > > Sure, I will include that on v2.
> > > > > In addition, introducing a module parameter to enable
> > > > > or disable the usage of page pool (disabled by default).
> > > >
> > > > If page pool wins for most of the cases, any reason to disable it by 
> > > > default?
> > > >
> > >
> > > Thank you for raising the point. It does make sense to enable it by 
> > > default.
> >
> > I'd like to see more benchmarks pls then, with a variety of packet
> > sizes, udp and tcp.
> >
>
> Sure, more benchmarks will be provided. Thanks.


I think so.

I did this, but I did not find any improvement, so I gave up on it.

Thanks.


>
>
> > > > >
> > > > > In single-core vm testing environments, it gives a modest performance 
> > > > > gain
> > > > > in the normal path.
> > > > >   Upstream codebase: 47.5 Gbits/sec
> > > > >   Upstream codebase + page_pool support: 50.2 Gbits/sec
> > > > >
> > > > > In multi-core vm testing environments, The most significant 
> > > > > performance
> > > > > gain is observed in XDP cpumap:
> > > > >   Upstream codebase: 1.38 Gbits/sec
> > > > >   Upstream codebase + page_pool support: 9.74 Gbits/sec
> > > >
> > > > Please show more details on the test. E.g which kinds of tests have
> > > > you measured?
> > > >
> > > > Btw, it would be better to measure PPS as well.
> > > >
> > >
> > > Sure. It will be added on v2.
> > > > >
> > > > > With this foundation, we can further integrate page pool 
> > > > > fragmentation and
> > > > > DMA map/unmap support.
> > > > >
> > > > > Signed-off-by: Liang Chen 
> > > > > ---
> > > > >  drivers/net/virtio_net.c | 188 
> > > > > ++-
> > > >
> > > > I believe we should make virtio-net to select CONFIG_PAGE_POOL or do
> > > > the ifdef tricks at least.
> > > >
> > >
> > > Sure. it will be done on v2.
> > > > >  1 file changed, 146 insertions(+), 42 deletions(-)
> > > > >
> > > > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > > > > index c5dca0d92e64..99c0ca0c1781 100644
> > > > > --- a/drivers/net/virtio_net.c
> > > > > +++ b/drivers/net/virtio_net.c
> > > > > @@ -31,6 +31,9 @@ module_param(csum, bool, 0444);
> > > > >  module_param(gso, bool, 0444);
> > > > >  module_param(napi_tx, bool, 0644);
> > > > >
> > > > > +static bool page_pool_enabled;
> > > > > +module_param(page_pool_enabled, bool, 0400);
> > > > > +
> > > > >  /* FIXME: MTU in config. */
> > > > >  #define GOOD_PACKET_LEN (ETH_HLEN + VLAN_HLEN + ETH_DATA_LEN)
> > > > >  #define GOOD_COPY_LEN  128
> > > > > @@ -159,6 +162,9 @@ struct receive_queue {
> > > > > /* Chain pages by the private ptr. */
> > > > > struct page *pages;
> > > > >
> > > > > +   /* Page pool */
> > > > > +   struct page_pool *page_pool;
> > > > > +
> > > > > /* Average packet length for mergeable receive buffers. */
> > > > > struct ewma_pkt_len mrg_avg_pkt_len;
> > > > >
> > > > > @@ -459,6 +465,14 @@ static struct sk_buff *virtnet_build_skb(void 
> > > > > *buf, unsigned int buflen,
> > > > > return skb;
> > > > >  }
> > > > >
> > > > > +static void virtnet_put_page(struct receive_queue *rq, struct page 
> > > > > *page)
> > > > > +{
> > > > > +   if (rq->page_pool)
> > > > > +   page_pool_put_full_page(rq->page_pool, page, true);
> > > > > +   else
> > > > > +   put_page(page);
> > > > > +}
> > > > > +
> > > > >  /* Called from bottom half context */
> > > > >  static struct sk_buff *page_to_skb(struct virtnet_info *vi,
> > > > >struct receive_queue *rq,
> > > > > @@ -555,7 +569,7 @@ static struct sk_buff *page_to_skb(struct 
> > > > > virtnet_info *vi,
> > > > > hdr = skb_vnet_hdr(skb);
> > > > > memcpy(hdr, hdr_p, hdr_len);
> > > > > if (page_to_free)
> > > > > -   put_page(page_to_free);
> > > > > +   virtnet_put_page(rq, page_to_free);
> > > > >
> > > > > return skb;
> > > > >  }
> > > > > @@ -802,7 +816,7 @@ static int virtnet_xdp_xmit(struct net_device 
> > > > > *dev,
> > > > > return ret;
> > > > >  }
> > > > >
> > > > > -static void put_xdp_frags(struct xdp_buff *xdp)
> > > > > +static void put_xdp_frags(struct xdp_buff *xdp, struct receive_queue 
> > > > > *rq)
> > > > >  {
> > > >
> > > > rq could be fetched from xdp_rxq_info?
> > >
> > > Yeah, it has the queue_index there.
> > > >
> > > > > struct skb_shared_info 

Re: [PATCH net-next 2/5] virtio_net: Add page_pool support to improve performance

2023-05-29 Thread Liang Chen
On Mon, May 29, 2023 at 5:55 PM Michael S. Tsirkin  wrote:
>
> On Mon, May 29, 2023 at 03:27:56PM +0800, Liang Chen wrote:
> > On Sun, May 28, 2023 at 2:20 PM Michael S. Tsirkin  wrote:
> > >
> > > On Fri, May 26, 2023 at 01:46:18PM +0800, Liang Chen wrote:
> > > > The implementation at the moment uses one page per packet in both the
> > > > normal and XDP path. In addition, introducing a module parameter to 
> > > > enable
> > > > or disable the usage of page pool (disabled by default).
> > > >
> > > > In single-core vm testing environments, it gives a modest performance 
> > > > gain
> > > > in the normal path.
> > > >   Upstream codebase: 47.5 Gbits/sec
> > > >   Upstream codebase + page_pool support: 50.2 Gbits/sec
> > > >
> > > > In multi-core vm testing environments, The most significant performance
> > > > gain is observed in XDP cpumap:
> > > >   Upstream codebase: 1.38 Gbits/sec
> > > >   Upstream codebase + page_pool support: 9.74 Gbits/sec
> > > >
> > > > With this foundation, we can further integrate page pool fragmentation 
> > > > and
> > > > DMA map/unmap support.
> > > >
> > > > Signed-off-by: Liang Chen 
> > >
> > > Why off by default?
> > > I am guessing it sometimes has performance costs too?
> > >
> > >
> > > What happens if we use page pool for big mode too?
> > > The less modes we have the better...
> > >
> > >
> >
> > Sure, now I believe it makes sense to enable it by default. When the
> > packet size is very small, it reduces the likelihood of skb
> > coalescing. But such cases are rare.
>
> small packets are rare? These workloads are easy to create actually.
> Pls try and include benchmark with small packet size.
>

Sure, Thanks!
> > The usage of page pool for big mode is being evaluated now. Thanks!
> >
> > > > ---
> > > >  drivers/net/virtio_net.c | 188 ++-
> > > >  1 file changed, 146 insertions(+), 42 deletions(-)
> > > >
> > > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > > > index c5dca0d92e64..99c0ca0c1781 100644
> > > > --- a/drivers/net/virtio_net.c
> > > > +++ b/drivers/net/virtio_net.c
> > > > @@ -31,6 +31,9 @@ module_param(csum, bool, 0444);
> > > >  module_param(gso, bool, 0444);
> > > >  module_param(napi_tx, bool, 0644);
> > > >
> > > > +static bool page_pool_enabled;
> > > > +module_param(page_pool_enabled, bool, 0400);
> > > > +
> > > >  /* FIXME: MTU in config. */
> > > >  #define GOOD_PACKET_LEN (ETH_HLEN + VLAN_HLEN + ETH_DATA_LEN)
> > > >  #define GOOD_COPY_LEN  128
> > > > @@ -159,6 +162,9 @@ struct receive_queue {
> > > >   /* Chain pages by the private ptr. */
> > > >   struct page *pages;
> > > >
> > > > + /* Page pool */
> > > > + struct page_pool *page_pool;
> > > > +
> > > >   /* Average packet length for mergeable receive buffers. */
> > > >   struct ewma_pkt_len mrg_avg_pkt_len;
> > > >
> > > > @@ -459,6 +465,14 @@ static struct sk_buff *virtnet_build_skb(void 
> > > > *buf, unsigned int buflen,
> > > >   return skb;
> > > >  }
> > > >
> > > > +static void virtnet_put_page(struct receive_queue *rq, struct page 
> > > > *page)
> > > > +{
> > > > + if (rq->page_pool)
> > > > + page_pool_put_full_page(rq->page_pool, page, true);
> > > > + else
> > > > + put_page(page);
> > > > +}
> > > > +
> > > >  /* Called from bottom half context */
> > > >  static struct sk_buff *page_to_skb(struct virtnet_info *vi,
> > > >  struct receive_queue *rq,
> > > > @@ -555,7 +569,7 @@ static struct sk_buff *page_to_skb(struct 
> > > > virtnet_info *vi,
> > > >   hdr = skb_vnet_hdr(skb);
> > > >   memcpy(hdr, hdr_p, hdr_len);
> > > >   if (page_to_free)
> > > > - put_page(page_to_free);
> > > > + virtnet_put_page(rq, page_to_free);
> > > >
> > > >   return skb;
> > > >  }
> > > > @@ -802,7 +816,7 @@ static int virtnet_xdp_xmit(struct net_device *dev,
> > > >   return ret;
> > > >  }
> > > >
> > > > -static void put_xdp_frags(struct xdp_buff *xdp)
> > > > +static void put_xdp_frags(struct xdp_buff *xdp, struct receive_queue 
> > > > *rq)
> > > >  {
> > > >   struct skb_shared_info *shinfo;
> > > >   struct page *xdp_page;
> > > > @@ -812,7 +826,7 @@ static void put_xdp_frags(struct xdp_buff *xdp)
> > > >   shinfo = xdp_get_shared_info_from_buff(xdp);
> > > >   for (i = 0; i < shinfo->nr_frags; i++) {
> > > >   xdp_page = skb_frag_page(&shinfo->frags[i]);
> > > > - put_page(xdp_page);
> > > > + virtnet_put_page(rq, xdp_page);
> > > >   }
> > > >   }
> > > >  }
> > > > @@ -903,7 +917,11 @@ static struct page *xdp_linearize_page(struct 
> > > > receive_queue *rq,
> > > >   if (page_off + *len + tailroom > PAGE_SIZE)
> > > >   return NULL;
> > > >
> > > > - page = alloc_page(GFP_ATOMIC);
> > > > + if (rq->page_pool)
> > > > + page = 

Re: [PATCH net-next 2/5] virtio_net: Add page_pool support to improve performance

2023-05-29 Thread Michael S. Tsirkin
On Mon, May 29, 2023 at 03:27:56PM +0800, Liang Chen wrote:
> On Sun, May 28, 2023 at 2:20 PM Michael S. Tsirkin  wrote:
> >
> > On Fri, May 26, 2023 at 01:46:18PM +0800, Liang Chen wrote:
> > > The implementation at the moment uses one page per packet in both the
> > > normal and XDP path. In addition, introducing a module parameter to enable
> > > or disable the usage of page pool (disabled by default).
> > >
> > > In single-core vm testing environments, it gives a modest performance gain
> > > in the normal path.
> > >   Upstream codebase: 47.5 Gbits/sec
> > >   Upstream codebase + page_pool support: 50.2 Gbits/sec
> > >
> > > In multi-core vm testing environments, The most significant performance
> > > gain is observed in XDP cpumap:
> > >   Upstream codebase: 1.38 Gbits/sec
> > >   Upstream codebase + page_pool support: 9.74 Gbits/sec
> > >
> > > With this foundation, we can further integrate page pool fragmentation and
> > > DMA map/unmap support.
> > >
> > > Signed-off-by: Liang Chen 
> >
> > Why off by default?
> > I am guessing it sometimes has performance costs too?
> >
> >
> > What happens if we use page pool for big mode too?
> > The less modes we have the better...
> >
> >
> 
> Sure, now I believe it makes sense to enable it by default. When the
> packet size is very small, it reduces the likelihood of skb
> coalescing. But such cases are rare.

small packets are rare? These workloads are easy to create actually.
Pls try and include benchmark with small packet size.

> The usage of page pool for big mode is being evaluated now. Thanks!
> 
> > > ---
> > >  drivers/net/virtio_net.c | 188 ++-
> > >  1 file changed, 146 insertions(+), 42 deletions(-)
> > >
> > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > > index c5dca0d92e64..99c0ca0c1781 100644
> > > --- a/drivers/net/virtio_net.c
> > > +++ b/drivers/net/virtio_net.c
> > > @@ -31,6 +31,9 @@ module_param(csum, bool, 0444);
> > >  module_param(gso, bool, 0444);
> > >  module_param(napi_tx, bool, 0644);
> > >
> > > +static bool page_pool_enabled;
> > > +module_param(page_pool_enabled, bool, 0400);
> > > +
> > >  /* FIXME: MTU in config. */
> > >  #define GOOD_PACKET_LEN (ETH_HLEN + VLAN_HLEN + ETH_DATA_LEN)
> > >  #define GOOD_COPY_LEN  128
> > > @@ -159,6 +162,9 @@ struct receive_queue {
> > >   /* Chain pages by the private ptr. */
> > >   struct page *pages;
> > >
> > > + /* Page pool */
> > > + struct page_pool *page_pool;
> > > +
> > >   /* Average packet length for mergeable receive buffers. */
> > >   struct ewma_pkt_len mrg_avg_pkt_len;
> > >
> > > @@ -459,6 +465,14 @@ static struct sk_buff *virtnet_build_skb(void *buf, 
> > > unsigned int buflen,
> > >   return skb;
> > >  }
> > >
> > > +static void virtnet_put_page(struct receive_queue *rq, struct page *page)
> > > +{
> > > + if (rq->page_pool)
> > > + page_pool_put_full_page(rq->page_pool, page, true);
> > > + else
> > > + put_page(page);
> > > +}
> > > +
> > >  /* Called from bottom half context */
> > >  static struct sk_buff *page_to_skb(struct virtnet_info *vi,
> > >  struct receive_queue *rq,
> > > @@ -555,7 +569,7 @@ static struct sk_buff *page_to_skb(struct 
> > > virtnet_info *vi,
> > >   hdr = skb_vnet_hdr(skb);
> > >   memcpy(hdr, hdr_p, hdr_len);
> > >   if (page_to_free)
> > > - put_page(page_to_free);
> > > + virtnet_put_page(rq, page_to_free);
> > >
> > >   return skb;
> > >  }
> > > @@ -802,7 +816,7 @@ static int virtnet_xdp_xmit(struct net_device *dev,
> > >   return ret;
> > >  }
> > >
> > > -static void put_xdp_frags(struct xdp_buff *xdp)
> > > +static void put_xdp_frags(struct xdp_buff *xdp, struct receive_queue *rq)
> > >  {
> > >   struct skb_shared_info *shinfo;
> > >   struct page *xdp_page;
> > > @@ -812,7 +826,7 @@ static void put_xdp_frags(struct xdp_buff *xdp)
> > >   shinfo = xdp_get_shared_info_from_buff(xdp);
> > >   for (i = 0; i < shinfo->nr_frags; i++) {
> > >   xdp_page = skb_frag_page(&shinfo->frags[i]);
> > > - put_page(xdp_page);
> > > + virtnet_put_page(rq, xdp_page);
> > >   }
> > >   }
> > >  }
> > > @@ -903,7 +917,11 @@ static struct page *xdp_linearize_page(struct 
> > > receive_queue *rq,
> > >   if (page_off + *len + tailroom > PAGE_SIZE)
> > >   return NULL;
> > >
> > > - page = alloc_page(GFP_ATOMIC);
> > > + if (rq->page_pool)
> > > + page = page_pool_dev_alloc_pages(rq->page_pool);
> > > + else
> > > + page = alloc_page(GFP_ATOMIC);
> > > +
> > >   if (!page)
> > >   return NULL;
> > >
> > > @@ -926,21 +944,24 @@ static struct page *xdp_linearize_page(struct 
> > > receive_queue *rq,
> > >* is sending packet larger than the MTU.
> > >*/
> > >   

Re: [PATCH net-next 2/5] virtio_net: Add page_pool support to improve performance

2023-05-29 Thread Liang Chen
On Sun, May 28, 2023 at 2:40 PM Michael S. Tsirkin  wrote:
>
> On Sat, May 27, 2023 at 08:35:01PM +0800, Liang Chen wrote:
> > On Fri, May 26, 2023 at 2:51 PM Jason Wang  wrote:
> > >
> > > On Fri, May 26, 2023 at 1:46 PM Liang Chen  
> > > wrote:
> > > >
> > > > The implementation at the moment uses one page per packet in both the
> > > > normal and XDP path.
> > >
> > > It's better to explain why we need a page pool and how it can help the
> > > performance.
> > >
> >
> > Sure, I will include that on v2.
> > > > In addition, introducing a module parameter to enable
> > > > or disable the usage of page pool (disabled by default).
> > >
> > > If page pool wins for most of the cases, any reason to disable it by 
> > > default?
> > >
> >
> > Thank you for raising the point. It does make sense to enable it by default.
>
> I'd like to see more benchmarks pls then, with a variety of packet
> sizes, udp and tcp.
>

Sure, more benchmarks will be provided. Thanks.


> > > >
> > > > In single-core vm testing environments, it gives a modest performance 
> > > > gain
> > > > in the normal path.
> > > >   Upstream codebase: 47.5 Gbits/sec
> > > >   Upstream codebase + page_pool support: 50.2 Gbits/sec
> > > >
> > > > In multi-core vm testing environments, The most significant performance
> > > > gain is observed in XDP cpumap:
> > > >   Upstream codebase: 1.38 Gbits/sec
> > > >   Upstream codebase + page_pool support: 9.74 Gbits/sec
> > >
> > > Please show more details on the test. E.g which kinds of tests have
> > > you measured?
> > >
> > > Btw, it would be better to measure PPS as well.
> > >
> >
> > Sure. It will be added on v2.
> > > >
> > > > With this foundation, we can further integrate page pool fragmentation 
> > > > and
> > > > DMA map/unmap support.
> > > >
> > > > Signed-off-by: Liang Chen 
> > > > ---
> > > >  drivers/net/virtio_net.c | 188 ++-
> > >
> > > I believe we should make virtio-net to select CONFIG_PAGE_POOL or do
> > > the ifdef tricks at least.
> > >
> >
> > Sure. it will be done on v2.
> > > >  1 file changed, 146 insertions(+), 42 deletions(-)
> > > >
> > > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > > > index c5dca0d92e64..99c0ca0c1781 100644
> > > > --- a/drivers/net/virtio_net.c
> > > > +++ b/drivers/net/virtio_net.c
> > > > @@ -31,6 +31,9 @@ module_param(csum, bool, 0444);
> > > >  module_param(gso, bool, 0444);
> > > >  module_param(napi_tx, bool, 0644);
> > > >
> > > > +static bool page_pool_enabled;
> > > > +module_param(page_pool_enabled, bool, 0400);
> > > > +
> > > >  /* FIXME: MTU in config. */
> > > >  #define GOOD_PACKET_LEN (ETH_HLEN + VLAN_HLEN + ETH_DATA_LEN)
> > > >  #define GOOD_COPY_LEN  128
> > > > @@ -159,6 +162,9 @@ struct receive_queue {
> > > > /* Chain pages by the private ptr. */
> > > > struct page *pages;
> > > >
> > > > +   /* Page pool */
> > > > +   struct page_pool *page_pool;
> > > > +
> > > > /* Average packet length for mergeable receive buffers. */
> > > > struct ewma_pkt_len mrg_avg_pkt_len;
> > > >
> > > > @@ -459,6 +465,14 @@ static struct sk_buff *virtnet_build_skb(void 
> > > > *buf, unsigned int buflen,
> > > > return skb;
> > > >  }
> > > >
> > > > +static void virtnet_put_page(struct receive_queue *rq, struct page 
> > > > *page)
> > > > +{
> > > > +   if (rq->page_pool)
> > > > +   page_pool_put_full_page(rq->page_pool, page, true);
> > > > +   else
> > > > +   put_page(page);
> > > > +}
> > > > +
> > > >  /* Called from bottom half context */
> > > >  static struct sk_buff *page_to_skb(struct virtnet_info *vi,
> > > >struct receive_queue *rq,
> > > > @@ -555,7 +569,7 @@ static struct sk_buff *page_to_skb(struct 
> > > > virtnet_info *vi,
> > > > hdr = skb_vnet_hdr(skb);
> > > > memcpy(hdr, hdr_p, hdr_len);
> > > > if (page_to_free)
> > > > -   put_page(page_to_free);
> > > > +   virtnet_put_page(rq, page_to_free);
> > > >
> > > > return skb;
> > > >  }
> > > > @@ -802,7 +816,7 @@ static int virtnet_xdp_xmit(struct net_device *dev,
> > > > return ret;
> > > >  }
> > > >
> > > > -static void put_xdp_frags(struct xdp_buff *xdp)
> > > > +static void put_xdp_frags(struct xdp_buff *xdp, struct receive_queue 
> > > > *rq)
> > > >  {
> > >
> > > rq could be fetched from xdp_rxq_info?
> >
> > Yeah, it has the queue_index there.
> > >
> > > > struct skb_shared_info *shinfo;
> > > > struct page *xdp_page;
> > > > @@ -812,7 +826,7 @@ static void put_xdp_frags(struct xdp_buff *xdp)
> > > > shinfo = xdp_get_shared_info_from_buff(xdp);
> > > > for (i = 0; i < shinfo->nr_frags; i++) {
> > > > xdp_page = skb_frag_page(&shinfo->frags[i]);
> > > > -   put_page(xdp_page);
> > > > +   virtnet_put_page(rq, xdp_page);
> > > >  

Re: [PATCH net-next 2/5] virtio_net: Add page_pool support to improve performance

2023-05-29 Thread Liang Chen
On Sun, May 28, 2023 at 2:27 PM Michael S. Tsirkin  wrote:
>
> On Sat, May 27, 2023 at 12:11:25AM +0800, kernel test robot wrote:
> > Hi Liang,
> >
> > kernel test robot noticed the following build errors:
> >
> > [auto build test ERROR on net-next/main]
> >
> > url:
> > https://github.com/intel-lab-lkp/linux/commits/Liang-Chen/virtio_net-Add-page_pool-support-to-improve-performance/20230526-135805
> > base:   net-next/main
> > patch link:
> > https://lore.kernel.org/r/20230526054621.18371-2-liangchen.linux%40gmail.com
> > patch subject: [PATCH net-next 2/5] virtio_net: Add page_pool support to 
> > improve performance
> > config: x86_64-defconfig 
> > (https://download.01.org/0day-ci/archive/20230526/202305262334.gifq3wpg-...@intel.com/config)
> > compiler: gcc-11 (Debian 11.3.0-12) 11.3.0
> > reproduce (this is a W=1 build):
> > # 
> > https://github.com/intel-lab-lkp/linux/commit/bfba563f43bba37181d8502cb2e566c32f96ec9e
> > git remote add linux-review https://github.com/intel-lab-lkp/linux
> > git fetch --no-tags linux-review 
> > Liang-Chen/virtio_net-Add-page_pool-support-to-improve-performance/20230526-135805
> > git checkout bfba563f43bba37181d8502cb2e566c32f96ec9e
> > # save the config file
> > mkdir build_dir && cp config build_dir/.config
> > make W=1 O=build_dir ARCH=x86_64 olddefconfig
> > make W=1 O=build_dir ARCH=x86_64 SHELL=/bin/bash
> >
> > If you fix the issue, kindly add following tag where applicable
> > | Reported-by: kernel test robot 
> > | Closes: 
> > https://lore.kernel.org/oe-kbuild-all/202305262334.gifq3wpg-...@intel.com/
> >
> > All errors (new ones prefixed by >>):
> >
> >ld: vmlinux.o: in function `virtnet_find_vqs':
> > >> virtio_net.c:(.text+0x901fb5): undefined reference to `page_pool_create'
> >ld: vmlinux.o: in function `add_recvbuf_mergeable.isra.0':
> > >> virtio_net.c:(.text+0x905618): undefined reference to 
> > >> `page_pool_alloc_pages'
> >ld: vmlinux.o: in function `xdp_linearize_page':
> >virtio_net.c:(.text+0x906b6b): undefined reference to 
> > `page_pool_alloc_pages'
> >ld: vmlinux.o: in function `mergeable_xdp_get_buf.isra.0':
> >virtio_net.c:(.text+0x90728f): undefined reference to 
> > `page_pool_alloc_pages'
>
>
> you need to tweak Kconfig to select PAGE_POOL I think.
>

Sure, thanks!


> > --
> > 0-DAY CI Kernel Test Service
> > https://github.com/intel/lkp-tests/wiki
>

Re: [PATCH net-next 2/5] virtio_net: Add page_pool support to improve performance

2023-05-29 Thread Liang Chen
On Sun, May 28, 2023 at 2:20 PM Michael S. Tsirkin  wrote:
>
> On Fri, May 26, 2023 at 01:46:18PM +0800, Liang Chen wrote:
> > The implementation at the moment uses one page per packet in both the
> > normal and XDP path. In addition, introducing a module parameter to enable
> > or disable the usage of page pool (disabled by default).
> >
> > In single-core vm testing environments, it gives a modest performance gain
> > in the normal path.
> >   Upstream codebase: 47.5 Gbits/sec
> >   Upstream codebase + page_pool support: 50.2 Gbits/sec
> >
> > In multi-core vm testing environments, The most significant performance
> > gain is observed in XDP cpumap:
> >   Upstream codebase: 1.38 Gbits/sec
> >   Upstream codebase + page_pool support: 9.74 Gbits/sec
> >
> > With this foundation, we can further integrate page pool fragmentation and
> > DMA map/unmap support.
> >
> > Signed-off-by: Liang Chen 
>
> Why off by default?
> I am guessing it sometimes has performance costs too?
>
>
> What happens if we use page pool for big mode too?
> The less modes we have the better...
>
>

Sure, now I believe it makes sense to enable it by default. When the
packet size is very small, it reduces the likelihood of skb
coalescing. But such cases are rare.
The usage of page pool for big mode is being evaluated now. Thanks!

> > ---
> >  drivers/net/virtio_net.c | 188 ++-
> >  1 file changed, 146 insertions(+), 42 deletions(-)
> >
> > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > index c5dca0d92e64..99c0ca0c1781 100644
> > --- a/drivers/net/virtio_net.c
> > +++ b/drivers/net/virtio_net.c
> > @@ -31,6 +31,9 @@ module_param(csum, bool, 0444);
> >  module_param(gso, bool, 0444);
> >  module_param(napi_tx, bool, 0644);
> >
> > +static bool page_pool_enabled;
> > +module_param(page_pool_enabled, bool, 0400);
> > +
> >  /* FIXME: MTU in config. */
> >  #define GOOD_PACKET_LEN (ETH_HLEN + VLAN_HLEN + ETH_DATA_LEN)
> >  #define GOOD_COPY_LEN  128
> > @@ -159,6 +162,9 @@ struct receive_queue {
> >   /* Chain pages by the private ptr. */
> >   struct page *pages;
> >
> > + /* Page pool */
> > + struct page_pool *page_pool;
> > +
> >   /* Average packet length for mergeable receive buffers. */
> >   struct ewma_pkt_len mrg_avg_pkt_len;
> >
> > @@ -459,6 +465,14 @@ static struct sk_buff *virtnet_build_skb(void *buf, 
> > unsigned int buflen,
> >   return skb;
> >  }
> >
> > +static void virtnet_put_page(struct receive_queue *rq, struct page *page)
> > +{
> > + if (rq->page_pool)
> > + page_pool_put_full_page(rq->page_pool, page, true);
> > + else
> > + put_page(page);
> > +}
> > +
> >  /* Called from bottom half context */
> >  static struct sk_buff *page_to_skb(struct virtnet_info *vi,
> >  struct receive_queue *rq,
> > @@ -555,7 +569,7 @@ static struct sk_buff *page_to_skb(struct virtnet_info 
> > *vi,
> >   hdr = skb_vnet_hdr(skb);
> >   memcpy(hdr, hdr_p, hdr_len);
> >   if (page_to_free)
> > - put_page(page_to_free);
> > + virtnet_put_page(rq, page_to_free);
> >
> >   return skb;
> >  }
> > @@ -802,7 +816,7 @@ static int virtnet_xdp_xmit(struct net_device *dev,
> >   return ret;
> >  }
> >
> > -static void put_xdp_frags(struct xdp_buff *xdp)
> > +static void put_xdp_frags(struct xdp_buff *xdp, struct receive_queue *rq)
> >  {
> >   struct skb_shared_info *shinfo;
> >   struct page *xdp_page;
> > @@ -812,7 +826,7 @@ static void put_xdp_frags(struct xdp_buff *xdp)
> >   shinfo = xdp_get_shared_info_from_buff(xdp);
> >   for (i = 0; i < shinfo->nr_frags; i++) {
> >   xdp_page = skb_frag_page(>frags[i]);
> > - put_page(xdp_page);
> > + virtnet_put_page(rq, xdp_page);
> >   }
> >   }
> >  }
> > @@ -903,7 +917,11 @@ static struct page *xdp_linearize_page(struct 
> > receive_queue *rq,
> >   if (page_off + *len + tailroom > PAGE_SIZE)
> >   return NULL;
> >
> > - page = alloc_page(GFP_ATOMIC);
> > + if (rq->page_pool)
> > + page = page_pool_dev_alloc_pages(rq->page_pool);
> > + else
> > + page = alloc_page(GFP_ATOMIC);
> > +
> >   if (!page)
> >   return NULL;
> >
> > @@ -926,21 +944,24 @@ static struct page *xdp_linearize_page(struct 
> > receive_queue *rq,
> >* is sending packet larger than the MTU.
> >*/
> >   if ((page_off + buflen + tailroom) > PAGE_SIZE) {
> > - put_page(p);
> > + virtnet_put_page(rq, p);
> >   goto err_buf;
> >   }
> >
> >   memcpy(page_address(page) + page_off,
> >  page_address(p) + off, buflen);
> >   page_off += buflen;
> > - put_page(p);
> > + virtnet_put_page(rq, 

Re: [PATCH net-next 2/5] virtio_net: Add page_pool support to improve performance

2023-05-28 Thread Michael S. Tsirkin
On Sat, May 27, 2023 at 08:35:01PM +0800, Liang Chen wrote:
> On Fri, May 26, 2023 at 2:51 PM Jason Wang  wrote:
> >
> > On Fri, May 26, 2023 at 1:46 PM Liang Chen  
> > wrote:
> > >
> > > The implementation at the moment uses one page per packet in both the
> > > normal and XDP path.
> >
> > It's better to explain why we need a page pool and how it can help the
> > performance.
> >
> 
> Sure, I will include that on v2.
> > > In addition, introducing a module parameter to enable
> > > or disable the usage of page pool (disabled by default).
> >
> > If page pool wins for most of the cases, any reason to disable it by 
> > default?
> >
> 
> Thank you for raising the point. It does make sense to enable it by default.

I'd like to see more benchmarks pls then, with a variety of packet
sizes, udp and tcp.

> > >
> > > In single-core vm testing environments, it gives a modest performance gain
> > > in the normal path.
> > >   Upstream codebase: 47.5 Gbits/sec
> > >   Upstream codebase + page_pool support: 50.2 Gbits/sec
> > >
> > > In multi-core vm testing environments, The most significant performance
> > > gain is observed in XDP cpumap:
> > >   Upstream codebase: 1.38 Gbits/sec
> > >   Upstream codebase + page_pool support: 9.74 Gbits/sec
> >
> > Please show more details on the test. E.g which kinds of tests have
> > you measured?
> >
> > Btw, it would be better to measure PPS as well.
> >
> 
> Sure. It will be added on v2.
> > >
> > > With this foundation, we can further integrate page pool fragmentation and
> > > DMA map/unmap support.
> > >
> > > Signed-off-by: Liang Chen 
> > > ---
> > >  drivers/net/virtio_net.c | 188 ++-
> >
> > I believe we should make virtio-net to select CONFIG_PAGE_POOL or do
> > the ifdef tricks at least.
> >
> 
> Sure. it will be done on v2.
> > >  1 file changed, 146 insertions(+), 42 deletions(-)
> > >
> > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > > index c5dca0d92e64..99c0ca0c1781 100644
> > > --- a/drivers/net/virtio_net.c
> > > +++ b/drivers/net/virtio_net.c
> > > @@ -31,6 +31,9 @@ module_param(csum, bool, 0444);
> > >  module_param(gso, bool, 0444);
> > >  module_param(napi_tx, bool, 0644);
> > >
> > > +static bool page_pool_enabled;
> > > +module_param(page_pool_enabled, bool, 0400);
> > > +
> > >  /* FIXME: MTU in config. */
> > >  #define GOOD_PACKET_LEN (ETH_HLEN + VLAN_HLEN + ETH_DATA_LEN)
> > >  #define GOOD_COPY_LEN  128
> > > @@ -159,6 +162,9 @@ struct receive_queue {
> > > /* Chain pages by the private ptr. */
> > > struct page *pages;
> > >
> > > +   /* Page pool */
> > > +   struct page_pool *page_pool;
> > > +
> > > /* Average packet length for mergeable receive buffers. */
> > > struct ewma_pkt_len mrg_avg_pkt_len;
> > >
> > > @@ -459,6 +465,14 @@ static struct sk_buff *virtnet_build_skb(void *buf, 
> > > unsigned int buflen,
> > > return skb;
> > >  }
> > >
> > > +static void virtnet_put_page(struct receive_queue *rq, struct page *page)
> > > +{
> > > +   if (rq->page_pool)
> > > +   page_pool_put_full_page(rq->page_pool, page, true);
> > > +   else
> > > +   put_page(page);
> > > +}
> > > +
> > >  /* Called from bottom half context */
> > >  static struct sk_buff *page_to_skb(struct virtnet_info *vi,
> > >struct receive_queue *rq,
> > > @@ -555,7 +569,7 @@ static struct sk_buff *page_to_skb(struct 
> > > virtnet_info *vi,
> > > hdr = skb_vnet_hdr(skb);
> > > memcpy(hdr, hdr_p, hdr_len);
> > > if (page_to_free)
> > > -   put_page(page_to_free);
> > > +   virtnet_put_page(rq, page_to_free);
> > >
> > > return skb;
> > >  }
> > > @@ -802,7 +816,7 @@ static int virtnet_xdp_xmit(struct net_device *dev,
> > > return ret;
> > >  }
> > >
> > > -static void put_xdp_frags(struct xdp_buff *xdp)
> > > +static void put_xdp_frags(struct xdp_buff *xdp, struct receive_queue *rq)
> > >  {
> >
> > rq could be fetched from xdp_rxq_info?
> 
> Yeah, it has the queue_index there.
> >
> > > struct skb_shared_info *shinfo;
> > > struct page *xdp_page;
> > > @@ -812,7 +826,7 @@ static void put_xdp_frags(struct xdp_buff *xdp)
> > > shinfo = xdp_get_shared_info_from_buff(xdp);
> > > for (i = 0; i < shinfo->nr_frags; i++) {
> > > xdp_page = skb_frag_page(&shinfo->frags[i]);
> > > -   put_page(xdp_page);
> > > +   virtnet_put_page(rq, xdp_page);
> > > }
> > > }
> > >  }
> > > @@ -903,7 +917,11 @@ static struct page *xdp_linearize_page(struct 
> > > receive_queue *rq,
> > > if (page_off + *len + tailroom > PAGE_SIZE)
> > > return NULL;
> > >
> > > -   page = alloc_page(GFP_ATOMIC);
> > > +   if (rq->page_pool)
> > > +   page = page_pool_dev_alloc_pages(rq->page_pool);
> > > + 

Re: [PATCH net-next 2/5] virtio_net: Add page_pool support to improve performance

2023-05-28 Thread Michael S. Tsirkin
On Sat, May 27, 2023 at 12:11:25AM +0800, kernel test robot wrote:
> Hi Liang,
> 
> kernel test robot noticed the following build errors:
> 
> [auto build test ERROR on net-next/main]
> 
> url:
> https://github.com/intel-lab-lkp/linux/commits/Liang-Chen/virtio_net-Add-page_pool-support-to-improve-performance/20230526-135805
> base:   net-next/main
> patch link:
> https://lore.kernel.org/r/20230526054621.18371-2-liangchen.linux%40gmail.com
> patch subject: [PATCH net-next 2/5] virtio_net: Add page_pool support to 
> improve performance
> config: x86_64-defconfig 
> (https://download.01.org/0day-ci/archive/20230526/202305262334.gifq3wpg-...@intel.com/config)
> compiler: gcc-11 (Debian 11.3.0-12) 11.3.0
> reproduce (this is a W=1 build):
> # 
> https://github.com/intel-lab-lkp/linux/commit/bfba563f43bba37181d8502cb2e566c32f96ec9e
> git remote add linux-review https://github.com/intel-lab-lkp/linux
> git fetch --no-tags linux-review 
> Liang-Chen/virtio_net-Add-page_pool-support-to-improve-performance/20230526-135805
> git checkout bfba563f43bba37181d8502cb2e566c32f96ec9e
> # save the config file
> mkdir build_dir && cp config build_dir/.config
> make W=1 O=build_dir ARCH=x86_64 olddefconfig
> make W=1 O=build_dir ARCH=x86_64 SHELL=/bin/bash
> 
> If you fix the issue, kindly add following tag where applicable
> | Reported-by: kernel test robot 
> | Closes: 
> https://lore.kernel.org/oe-kbuild-all/202305262334.gifq3wpg-...@intel.com/
> 
> All errors (new ones prefixed by >>):
> 
>ld: vmlinux.o: in function `virtnet_find_vqs':
> >> virtio_net.c:(.text+0x901fb5): undefined reference to `page_pool_create'
>ld: vmlinux.o: in function `add_recvbuf_mergeable.isra.0':
> >> virtio_net.c:(.text+0x905618): undefined reference to 
> >> `page_pool_alloc_pages'
>ld: vmlinux.o: in function `xdp_linearize_page':
>virtio_net.c:(.text+0x906b6b): undefined reference to 
> `page_pool_alloc_pages'
>ld: vmlinux.o: in function `mergeable_xdp_get_buf.isra.0':
>virtio_net.c:(.text+0x90728f): undefined reference to 
> `page_pool_alloc_pages'


you need to tweak Kconfig to select PAGE_POOL I think.

> -- 
> 0-DAY CI Kernel Test Service
> https://github.com/intel/lkp-tests/wiki



Re: [PATCH net-next 2/5] virtio_net: Add page_pool support to improve performance

2023-05-28 Thread Michael S. Tsirkin
On Fri, May 26, 2023 at 01:46:18PM +0800, Liang Chen wrote:
> The implementation at the moment uses one page per packet in both the
> normal and XDP path. In addition, introducing a module parameter to enable
> or disable the usage of page pool (disabled by default).
> 
> In single-core vm testing environments, it gives a modest performance gain
> in the normal path.
>   Upstream codebase: 47.5 Gbits/sec
>   Upstream codebase + page_pool support: 50.2 Gbits/sec
> 
> In multi-core vm testing environments, The most significant performance
> gain is observed in XDP cpumap:
>   Upstream codebase: 1.38 Gbits/sec
>   Upstream codebase + page_pool support: 9.74 Gbits/sec
> 
> With this foundation, we can further integrate page pool fragmentation and
> DMA map/unmap support.
> 
> Signed-off-by: Liang Chen 

Why off by default?
I am guessing it sometimes has performance costs too?


What happens if we use page pool for big mode too?
The less modes we have the better...


> ---
>  drivers/net/virtio_net.c | 188 ++-
>  1 file changed, 146 insertions(+), 42 deletions(-)
> 
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index c5dca0d92e64..99c0ca0c1781 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -31,6 +31,9 @@ module_param(csum, bool, 0444);
>  module_param(gso, bool, 0444);
>  module_param(napi_tx, bool, 0644);
>  
> +static bool page_pool_enabled;
> +module_param(page_pool_enabled, bool, 0400);
> +
>  /* FIXME: MTU in config. */
>  #define GOOD_PACKET_LEN (ETH_HLEN + VLAN_HLEN + ETH_DATA_LEN)
>  #define GOOD_COPY_LEN  128
> @@ -159,6 +162,9 @@ struct receive_queue {
>   /* Chain pages by the private ptr. */
>   struct page *pages;
>  
> + /* Page pool */
> + struct page_pool *page_pool;
> +
>   /* Average packet length for mergeable receive buffers. */
>   struct ewma_pkt_len mrg_avg_pkt_len;
>  
> @@ -459,6 +465,14 @@ static struct sk_buff *virtnet_build_skb(void *buf, 
> unsigned int buflen,
>   return skb;
>  }
>  
> +static void virtnet_put_page(struct receive_queue *rq, struct page *page)
> +{
> + if (rq->page_pool)
> + page_pool_put_full_page(rq->page_pool, page, true);
> + else
> + put_page(page);
> +}
> +
>  /* Called from bottom half context */
>  static struct sk_buff *page_to_skb(struct virtnet_info *vi,
>  struct receive_queue *rq,
> @@ -555,7 +569,7 @@ static struct sk_buff *page_to_skb(struct virtnet_info 
> *vi,
>   hdr = skb_vnet_hdr(skb);
>   memcpy(hdr, hdr_p, hdr_len);
>   if (page_to_free)
> - put_page(page_to_free);
> + virtnet_put_page(rq, page_to_free);
>  
>   return skb;
>  }
> @@ -802,7 +816,7 @@ static int virtnet_xdp_xmit(struct net_device *dev,
>   return ret;
>  }
>  
> -static void put_xdp_frags(struct xdp_buff *xdp)
> +static void put_xdp_frags(struct xdp_buff *xdp, struct receive_queue *rq)
>  {
>   struct skb_shared_info *shinfo;
>   struct page *xdp_page;
> @@ -812,7 +826,7 @@ static void put_xdp_frags(struct xdp_buff *xdp)
>   shinfo = xdp_get_shared_info_from_buff(xdp);
>   for (i = 0; i < shinfo->nr_frags; i++) {
>   xdp_page = skb_frag_page(&shinfo->frags[i]);
> - put_page(xdp_page);
> + virtnet_put_page(rq, xdp_page);
>   }
>   }
>  }
> @@ -903,7 +917,11 @@ static struct page *xdp_linearize_page(struct 
> receive_queue *rq,
>   if (page_off + *len + tailroom > PAGE_SIZE)
>   return NULL;
>  
> - page = alloc_page(GFP_ATOMIC);
> + if (rq->page_pool)
> + page = page_pool_dev_alloc_pages(rq->page_pool);
> + else
> + page = alloc_page(GFP_ATOMIC);
> +
>   if (!page)
>   return NULL;
>  
> @@ -926,21 +944,24 @@ static struct page *xdp_linearize_page(struct 
> receive_queue *rq,
>* is sending packet larger than the MTU.
>*/
>   if ((page_off + buflen + tailroom) > PAGE_SIZE) {
> - put_page(p);
> + virtnet_put_page(rq, p);
>   goto err_buf;
>   }
>  
>   memcpy(page_address(page) + page_off,
>  page_address(p) + off, buflen);
>   page_off += buflen;
> - put_page(p);
> + virtnet_put_page(rq, p);
>   }
>  
>   /* Headroom does not contribute to packet length */
>   *len = page_off - VIRTIO_XDP_HEADROOM;
>   return page;
>  err_buf:
> - __free_pages(page, 0);
> + if (rq->page_pool)
> + page_pool_put_full_page(rq->page_pool, page, true);
> + else
> + __free_pages(page, 0);
>   return NULL;
>  }
>  
> @@ -1144,7 +1165,7 @@ static void mergeable_buf_free(struct receive_queue 
> *rq, int num_buf,
>   }
>   stats->bytes += len;
>  

Re: [PATCH net-next 2/5] virtio_net: Add page_pool support to improve performance

2023-05-27 Thread Liang Chen
On Fri, May 26, 2023 at 2:51 PM Jason Wang  wrote:
>
> On Fri, May 26, 2023 at 1:46 PM Liang Chen  wrote:
> >
> > The implementation at the moment uses one page per packet in both the
> > normal and XDP path.
>
> It's better to explain why we need a page pool and how it can help the
> performance.
>

Sure, I will include that on v2.
> > In addition, introducing a module parameter to enable
> > or disable the usage of page pool (disabled by default).
>
> If page pool wins for most of the cases, any reason to disable it by default?
>

Thank you for raising the point. It does make sense to enable it by default.
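
Concretely that should just be flipping the parameter's initial value in
v2, e.g. (sketch only, not the posted change):

static bool page_pool_enabled = true;
module_param(page_pool_enabled, bool, 0400);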
> >
> > In single-core vm testing environments, it gives a modest performance gain
> > in the normal path.
> >   Upstream codebase: 47.5 Gbits/sec
> >   Upstream codebase + page_pool support: 50.2 Gbits/sec
> >
> > In multi-core vm testing environments, The most significant performance
> > gain is observed in XDP cpumap:
> >   Upstream codebase: 1.38 Gbits/sec
> >   Upstream codebase + page_pool support: 9.74 Gbits/sec
>
> Please show more details on the test. E.g which kinds of tests have
> you measured?
>
> Btw, it would be better to measure PPS as well.
>

Sure. It will be added on v2.
> >
> > With this foundation, we can further integrate page pool fragmentation and
> > DMA map/unmap support.
> >
> > Signed-off-by: Liang Chen 
> > ---
> >  drivers/net/virtio_net.c | 188 ++-
>
> I believe we should make virtio-net to select CONFIG_PAGE_POOL or do
> the ifdef tricks at least.
>

Sure. It will be done on v2.
> >  1 file changed, 146 insertions(+), 42 deletions(-)
> >
> > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > index c5dca0d92e64..99c0ca0c1781 100644
> > --- a/drivers/net/virtio_net.c
> > +++ b/drivers/net/virtio_net.c
> > @@ -31,6 +31,9 @@ module_param(csum, bool, 0444);
> >  module_param(gso, bool, 0444);
> >  module_param(napi_tx, bool, 0644);
> >
> > +static bool page_pool_enabled;
> > +module_param(page_pool_enabled, bool, 0400);
> > +
> >  /* FIXME: MTU in config. */
> >  #define GOOD_PACKET_LEN (ETH_HLEN + VLAN_HLEN + ETH_DATA_LEN)
> >  #define GOOD_COPY_LEN  128
> > @@ -159,6 +162,9 @@ struct receive_queue {
> > /* Chain pages by the private ptr. */
> > struct page *pages;
> >
> > +   /* Page pool */
> > +   struct page_pool *page_pool;
> > +
> > /* Average packet length for mergeable receive buffers. */
> > struct ewma_pkt_len mrg_avg_pkt_len;
> >
> > @@ -459,6 +465,14 @@ static struct sk_buff *virtnet_build_skb(void *buf, 
> > unsigned int buflen,
> > return skb;
> >  }
> >
> > +static void virtnet_put_page(struct receive_queue *rq, struct page *page)
> > +{
> > +   if (rq->page_pool)
> > +   page_pool_put_full_page(rq->page_pool, page, true);
> > +   else
> > +   put_page(page);
> > +}
> > +
> >  /* Called from bottom half context */
> >  static struct sk_buff *page_to_skb(struct virtnet_info *vi,
> >struct receive_queue *rq,
> > @@ -555,7 +569,7 @@ static struct sk_buff *page_to_skb(struct virtnet_info 
> > *vi,
> > hdr = skb_vnet_hdr(skb);
> > memcpy(hdr, hdr_p, hdr_len);
> > if (page_to_free)
> > -   put_page(page_to_free);
> > +   virtnet_put_page(rq, page_to_free);
> >
> > return skb;
> >  }
> > @@ -802,7 +816,7 @@ static int virtnet_xdp_xmit(struct net_device *dev,
> > return ret;
> >  }
> >
> > -static void put_xdp_frags(struct xdp_buff *xdp)
> > +static void put_xdp_frags(struct xdp_buff *xdp, struct receive_queue *rq)
> >  {
>
> rq could be fetched from xdp_rxq_info?

Yeah, it has the queue_index there.
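
Something like the following minimal sketch of that idea (not the posted
patch; it lives inside drivers/net/virtio_net.c and assumes vi is
reachable at the call sites, as it is on the rx path), looking the queue
up from the index carried in xdp_rxq_info instead of passing rq down:

static void put_xdp_frags(struct virtnet_info *vi, struct xdp_buff *xdp)
{
	struct receive_queue *rq = &vi->rq[xdp->rxq->queue_index];
	struct skb_shared_info *shinfo;
	struct page *xdp_page;
	int i;

	if (xdp_buff_has_frags(xdp)) {
		shinfo = xdp_get_shared_info_from_buff(xdp);
		for (i = 0; i < shinfo->nr_frags; i++) {
			xdp_page = skb_frag_page(&shinfo->frags[i]);
			virtnet_put_page(rq, xdp_page);
		}
	}
}
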
>
> > struct skb_shared_info *shinfo;
> > struct page *xdp_page;
> > @@ -812,7 +826,7 @@ static void put_xdp_frags(struct xdp_buff *xdp)
> > shinfo = xdp_get_shared_info_from_buff(xdp);
> > for (i = 0; i < shinfo->nr_frags; i++) {
> > xdp_page = skb_frag_page(&shinfo->frags[i]);
> > -   put_page(xdp_page);
> > +   virtnet_put_page(rq, xdp_page);
> > }
> > }
> >  }
> > @@ -903,7 +917,11 @@ static struct page *xdp_linearize_page(struct 
> > receive_queue *rq,
> > if (page_off + *len + tailroom > PAGE_SIZE)
> > return NULL;
> >
> > -   page = alloc_page(GFP_ATOMIC);
> > +   if (rq->page_pool)
> > +   page = page_pool_dev_alloc_pages(rq->page_pool);
> > +   else
> > +   page = alloc_page(GFP_ATOMIC);
> > +
> > if (!page)
> > return NULL;
> >
> > @@ -926,21 +944,24 @@ static struct page *xdp_linearize_page(struct 
> > receive_queue *rq,
> >  * is sending packet larger than the MTU.
> >  */
> > if ((page_off + buflen + tailroom) > PAGE_SIZE) {
> > -   put_page(p);
> > + 

Re: [PATCH net-next 2/5] virtio_net: Add page_pool support to improve performance

2023-05-26 Thread kernel test robot
Hi Liang,

kernel test robot noticed the following build errors:

[auto build test ERROR on net-next/main]

url:
https://github.com/intel-lab-lkp/linux/commits/Liang-Chen/virtio_net-Add-page_pool-support-to-improve-performance/20230526-135805
base:   net-next/main
patch link:
https://lore.kernel.org/r/20230526054621.18371-2-liangchen.linux%40gmail.com
patch subject: [PATCH net-next 2/5] virtio_net: Add page_pool support to 
improve performance
config: x86_64-defconfig 
(https://download.01.org/0day-ci/archive/20230526/202305262334.gifq3wpg-...@intel.com/config)
compiler: gcc-11 (Debian 11.3.0-12) 11.3.0
reproduce (this is a W=1 build):
# 
https://github.com/intel-lab-lkp/linux/commit/bfba563f43bba37181d8502cb2e566c32f96ec9e
git remote add linux-review https://github.com/intel-lab-lkp/linux
git fetch --no-tags linux-review 
Liang-Chen/virtio_net-Add-page_pool-support-to-improve-performance/20230526-135805
git checkout bfba563f43bba37181d8502cb2e566c32f96ec9e
# save the config file
mkdir build_dir && cp config build_dir/.config
make W=1 O=build_dir ARCH=x86_64 olddefconfig
make W=1 O=build_dir ARCH=x86_64 SHELL=/bin/bash

If you fix the issue, kindly add following tag where applicable
| Reported-by: kernel test robot 
| Closes: 
https://lore.kernel.org/oe-kbuild-all/202305262334.gifq3wpg-...@intel.com/

All errors (new ones prefixed by >>):

   ld: vmlinux.o: in function `virtnet_find_vqs':
>> virtio_net.c:(.text+0x901fb5): undefined reference to `page_pool_create'
   ld: vmlinux.o: in function `add_recvbuf_mergeable.isra.0':
>> virtio_net.c:(.text+0x905618): undefined reference to `page_pool_alloc_pages'
   ld: vmlinux.o: in function `xdp_linearize_page':
   virtio_net.c:(.text+0x906b6b): undefined reference to `page_pool_alloc_pages'
   ld: vmlinux.o: in function `mergeable_xdp_get_buf.isra.0':
   virtio_net.c:(.text+0x90728f): undefined reference to `page_pool_alloc_pages'

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki


Re: [PATCH net-next 2/5] virtio_net: Add page_pool support to improve performance

2023-05-26 Thread Jason Wang
On Fri, May 26, 2023 at 1:46 PM Liang Chen  wrote:
>
> The implementation at the moment uses one page per packet in both the
> normal and XDP path.

It's better to explain why we need a page pool and how it can help the
performance.

> In addition, introducing a module parameter to enable
> or disable the usage of page pool (disabled by default).

If page pool wins for most of the cases, any reason to disable it by default?

>
> In single-core vm testing environments, it gives a modest performance gain
> in the normal path.
>   Upstream codebase: 47.5 Gbits/sec
>   Upstream codebase + page_pool support: 50.2 Gbits/sec
>
> In multi-core vm testing environments, The most significant performance
> gain is observed in XDP cpumap:
>   Upstream codebase: 1.38 Gbits/sec
>   Upstream codebase + page_pool support: 9.74 Gbits/sec

Please show more details on the test. E.g which kinds of tests have
you measured?

Btw, it would be better to measure PPS as well.

>
> With this foundation, we can further integrate page pool fragmentation and
> DMA map/unmap support.
>
> Signed-off-by: Liang Chen 
> ---
>  drivers/net/virtio_net.c | 188 ++-

I believe we should make virtio-net select CONFIG_PAGE_POOL or do
the ifdef tricks at least.
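
A minimal sketch of the ifdef option (illustration only; the simpler
alternative is a one-line "select PAGE_POOL" in the VIRTIO_NET Kconfig
entry):

#if IS_ENABLED(CONFIG_PAGE_POOL)
static void virtnet_put_page(struct receive_queue *rq, struct page *page)
{
	if (rq->page_pool)
		page_pool_put_full_page(rq->page_pool, page, true);
	else
		put_page(page);
}
#else
/* rq->page_pool is never created in this configuration. */
static void virtnet_put_page(struct receive_queue *rq, struct page *page)
{
	put_page(page);
}
#endif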

>  1 file changed, 146 insertions(+), 42 deletions(-)
>
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index c5dca0d92e64..99c0ca0c1781 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -31,6 +31,9 @@ module_param(csum, bool, 0444);
>  module_param(gso, bool, 0444);
>  module_param(napi_tx, bool, 0644);
>
> +static bool page_pool_enabled;
> +module_param(page_pool_enabled, bool, 0400);
> +
>  /* FIXME: MTU in config. */
>  #define GOOD_PACKET_LEN (ETH_HLEN + VLAN_HLEN + ETH_DATA_LEN)
>  #define GOOD_COPY_LEN  128
> @@ -159,6 +162,9 @@ struct receive_queue {
> /* Chain pages by the private ptr. */
> struct page *pages;
>
> +   /* Page pool */
> +   struct page_pool *page_pool;
> +
> /* Average packet length for mergeable receive buffers. */
> struct ewma_pkt_len mrg_avg_pkt_len;
>
> @@ -459,6 +465,14 @@ static struct sk_buff *virtnet_build_skb(void *buf, 
> unsigned int buflen,
> return skb;
>  }
>
> +static void virtnet_put_page(struct receive_queue *rq, struct page *page)
> +{
> +   if (rq->page_pool)
> +   page_pool_put_full_page(rq->page_pool, page, true);
> +   else
> +   put_page(page);
> +}
> +
>  /* Called from bottom half context */
>  static struct sk_buff *page_to_skb(struct virtnet_info *vi,
>struct receive_queue *rq,
> @@ -555,7 +569,7 @@ static struct sk_buff *page_to_skb(struct virtnet_info 
> *vi,
> hdr = skb_vnet_hdr(skb);
> memcpy(hdr, hdr_p, hdr_len);
> if (page_to_free)
> -   put_page(page_to_free);
> +   virtnet_put_page(rq, page_to_free);
>
> return skb;
>  }
> @@ -802,7 +816,7 @@ static int virtnet_xdp_xmit(struct net_device *dev,
> return ret;
>  }
>
> -static void put_xdp_frags(struct xdp_buff *xdp)
> +static void put_xdp_frags(struct xdp_buff *xdp, struct receive_queue *rq)
>  {

rq could be fetched from xdp_rxq_info?

> struct skb_shared_info *shinfo;
> struct page *xdp_page;
> @@ -812,7 +826,7 @@ static void put_xdp_frags(struct xdp_buff *xdp)
> shinfo = xdp_get_shared_info_from_buff(xdp);
> for (i = 0; i < shinfo->nr_frags; i++) {
> xdp_page = skb_frag_page(&shinfo->frags[i]);
> -   put_page(xdp_page);
> +   virtnet_put_page(rq, xdp_page);
> }
> }
>  }
> @@ -903,7 +917,11 @@ static struct page *xdp_linearize_page(struct 
> receive_queue *rq,
> if (page_off + *len + tailroom > PAGE_SIZE)
> return NULL;
>
> -   page = alloc_page(GFP_ATOMIC);
> +   if (rq->page_pool)
> +   page = page_pool_dev_alloc_pages(rq->page_pool);
> +   else
> +   page = alloc_page(GFP_ATOMIC);
> +
> if (!page)
> return NULL;
>
> @@ -926,21 +944,24 @@ static struct page *xdp_linearize_page(struct 
> receive_queue *rq,
>  * is sending packet larger than the MTU.
>  */
> if ((page_off + buflen + tailroom) > PAGE_SIZE) {
> -   put_page(p);
> +   virtnet_put_page(rq, p);
> goto err_buf;
> }
>
> memcpy(page_address(page) + page_off,
>page_address(p) + off, buflen);
> page_off += buflen;
> -   put_page(p);
> +   virtnet_put_page(rq, p);
> }
>
> /* Headroom does not contribute to packet length */
> *len = page_off - VIRTIO_XDP_HEADROOM;
> return page;
>  err_buf:
> -   __free_pages(page,