Re: [PATCH net-next v5 0/5] vsock/virtio: optimizations to increase the throughput

2019-07-30 Thread Stefano Garzarella
On Tue, Jul 30, 2019 at 11:55:09AM -0400, Michael S. Tsirkin wrote:
> On Tue, Jul 30, 2019 at 11:54:53AM -0400, Michael S. Tsirkin wrote:
> > On Tue, Jul 30, 2019 at 05:43:29PM +0200, Stefano Garzarella wrote:
> > > This series tries to increase the throughput of virtio-vsock with slight
> > > changes.
> > > While I was testing the v2 of this series I discovered a huge use of
> > > memory, so I added patch 1 to mitigate this issue. I put it in this
> > > series in order to better track the performance trends.
> > > 
> > > v5:
> > > - rebased all patches on net-next
> > > - added Stefan's R-b and Michael's A-b
> > 
> > This doesn't solve all issues around allocation - as I mentioned, I think
> > we will need to improve accounting for that, and maybe add pre-allocation.

Yes, I'll work on it following your suggestions.

> > But it's a great series of steps in the right direction!
> > 

Thank you very much :)
Stefano

> 
> 
> So
> 
> Acked-by: Michael S. Tsirkin 
> 
> > 
> > > v4: https://patchwork.kernel.org/cover/11047717
> > > v3: https://patchwork.kernel.org/cover/10970145
> > > v2: https://patchwork.kernel.org/cover/10938743
> > > v1: https://patchwork.kernel.org/cover/10885431
> > > 
> > > Below are the benchmarks step by step. I used iperf3 [1] modified with
> > > VSOCK support. As Michael suggested in the v1, I booted host and guest
> > > with 'nosmap'.
> > > 
> > > A brief description of the patches:
> > > - Patch 1:     limit the memory usage with an extra copy for small packets
> > > - Patches 2+3: reduce the number of credit update messages sent to the
> > >                transmitter (see the sketch after this list)
> > > - Patches 4+5: allow the host to split packets over multiple buffers and
> > >                use VIRTIO_VSOCK_MAX_PKT_BUF_SIZE as the max packet size
> > >                allowed
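
To make the patches 2+3 idea concrete, here is a minimal userspace sketch
(hypothetical names and threshold, not the code from the series): the receiver
counts the bytes it has consumed since the last credit update and notifies the
transmitter only once at least a maximum-sized packet worth of space has been
freed, rather than after every read.

    /*
     * Minimal sketch of the patches 2+3 idea, assuming hypothetical names
     * and a hypothetical threshold; this is not the kernel code from the
     * series.
     */
    #include <stdbool.h>
    #include <stdint.h>

    #define PKT_BUF_SIZE_MAX (64u * 1024u)   /* assumed update threshold */

    struct credit_state {
            uint32_t fwd_cnt;        /* bytes consumed by the receiver so far */
            uint32_t last_fwd_cnt;   /* fwd_cnt advertised in the last update */
    };

    /* Called after the receiver consumes 'bytes' from its buffer; returns
     * true when a credit update message should be sent to the transmitter. */
    static bool credit_update_needed(struct credit_state *s, uint32_t bytes)
    {
            s->fwd_cnt += bytes;

            /* Stay silent until a full max-sized packet of space is free. */
            if (s->fwd_cnt - s->last_fwd_cnt < PKT_BUF_SIZE_MAX)
                    return false;

            s->last_fwd_cnt = s->fwd_cnt;    /* an update goes out now */
            return true;
    }

Sending fewer credit updates means fewer small control packets crossing the
virtqueue, which is presumably where the p 2+3 columns in the tables below
gain most of their throughput.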
> > > 
> > > host -> guest [Gbps]
> > > pkt_size  before opt   p 1      p 2+3    p 4+5
> > >
> > > 32        0.032        0.030    0.048    0.051
> > > 64        0.061        0.059    0.108    0.117
> > > 128       0.122        0.112    0.227    0.234
> > > 256       0.244        0.241    0.418    0.415
> > > 512       0.459        0.466    0.847    0.865
> > > 1K        0.927        0.919    1.657    1.641
> > > 2K        1.884        1.813    3.262    3.269
> > > 4K        3.378        3.326    6.044    6.195
> > > 8K        5.637        5.676    10.141   11.287
> > > 16K       8.250        8.402    15.976   16.736
> > > 32K       13.327       13.204   19.013   20.515
> > > 64K       21.241       21.341   20.973   21.879
> > > 128K      21.851       22.354   21.816   23.203
> > > 256K      21.408       21.693   21.846   24.088
> > > 512K      21.600       21.899   21.921   24.106
> > > 
> > > guest -> host [Gbps]
> > > pkt_size  before opt   p 1      p 2+3    p 4+5
> > >
> > > 32        0.045        0.046    0.057    0.057
> > > 64        0.089        0.091    0.103    0.104
> > > 128       0.170        0.179    0.192    0.200
> > > 256       0.364        0.351    0.361    0.379
> > > 512       0.709        0.699    0.731    0.790
> > > 1K        1.399        1.407    1.395    1.427
> > > 2K        2.670        2.684    2.745    2.835
> > > 4K        5.171        5.199    5.305    5.451
> > > 8K        8.442        8.500    10.083   9.941
> > > 16K       12.305       12.259   13.519   15.385
> > > 32K       11.418       11.150   11.988   24.680
> > > 64K       10.778       10.659   11.589   35.273
> > > 128K      10.421       10.339   10.939   40.338
> > > 256K      10.300       9.719    10.508   36.562
> > > 512K      9.833        9.808    10.612   35.979
> > > 
> > > As Stefan suggested in the v1, I also measured the efficiency in this way:
> > > efficiency = Mbps / (%CPU_Host + %CPU_Guest)
> > >
> > > The '%CPU_Guest' is measured inside the VM. I know that this is not the
> > > best way, but it's provided for free by iperf3 and can serve as an
> > > indication.
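
As a worked example of the formula above, the following standalone C snippet
computes the efficiency from a throughput and the two CPU figures; the input
values are made up for illustration, not taken from the measurements.

    /* Worked example of: efficiency = Mbps / (%CPU_Host + %CPU_Guest).
     * The inputs below are hypothetical, not values from the tables. */
    #include <stdio.h>

    static double efficiency(double mbps, double cpu_host_pct,
                             double cpu_guest_pct)
    {
            return mbps / (cpu_host_pct + cpu_guest_pct);
    }

    int main(void)
    {
            /* e.g. 21000 Mbps at 50% host CPU + 40% guest CPU -> ~233.33 */
            printf("%.2f\n", efficiency(21000.0, 50.0, 40.0));
            return 0;
    }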
> > > 
> > > host -> guest efficiency [Mbps / (%CPU_Host + %CPU_Guest)]
> > > pkt_size  before opt   p 1      p 2+3    p 4+5
> > >
> > > 32        0.35         0.45     0.79     1.02
> > > 64        0.56         0.80     1.41     1.54
> > > 128       1.11         1.52     3.03     3.12
> > > 256       2.20         2.16     5.44     5.58
> > > 512       4.17         4.18     10.96    11.46
> > > 1K        8.30         8.26     20.99    20.89
> > > 2K        16.82        16.31    39.76    39.73
> > > 4K        30.89        30.79    74.07    75.73
> > > 8K        53.74        54.49    124.24   148.91
> > > 16K       80.68        83.63    200.21   232.79
> > > 32K       132.27       132.52   260.81   357.07
> > > 64K       229.82       230.40   300.19   444.18
> > > 128K      332.60       329.78   331.51   492.28
> > > 256K      331.06       337.22   339.59   511.59
> > > 512K      335.58       328.50   331.56   504.56
> > > 
> > > guest -> host efficiency [Mbps / (%CPU_Host + %CPU_Guest)]
> > > pkt_size  before opt   p 1      p 2+3    p 4+5
> > >
> > > 32        0.43         0.43     0.53     0.56
> > > 64        0.85         0.86     1.04     1.10

Re: [PATCH net-next v5 0/5] vsock/virtio: optimizations to increase the throughput

2019-07-30 Thread Michael S. Tsirkin
On Tue, Jul 30, 2019 at 11:54:53AM -0400, Michael S. Tsirkin wrote:
> On Tue, Jul 30, 2019 at 05:43:29PM +0200, Stefano Garzarella wrote:
> > This series tries to increase the throughput of virtio-vsock with slight
> > changes.
> > While I was testing the v2 of this series I discovered a huge use of
> > memory, so I added patch 1 to mitigate this issue. I put it in this series
> > in order to better track the performance trends.
> > 
> > v5:
> > - rebased all patches on net-next
> > - added Stefan's R-b and Michael's A-b
> 
> This doesn't solve all issues around allocation - as I mentioned, I think
> we will need to improve accounting for that, and maybe add pre-allocation.
> But it's a great series of steps in the right direction!
> 


So

Acked-by: Michael S. Tsirkin 

> 
> > v4: https://patchwork.kernel.org/cover/11047717
> > v3: https://patchwork.kernel.org/cover/10970145
> > v2: https://patchwork.kernel.org/cover/10938743
> > v1: https://patchwork.kernel.org/cover/10885431
> > 
> > Below are the benchmarks step by step. I used iperf3 [1] modified with VSOCK
> > support. As Michael suggested in the v1, I booted host and guest with 
> > 'nosmap'.
> > 
> > A brief description of the patches:
> > - Patch 1:     limit the memory usage with an extra copy for small packets
> > - Patches 2+3: reduce the number of credit update messages sent to the
> >                transmitter
> > - Patches 4+5: allow the host to split packets over multiple buffers and use
> >                VIRTIO_VSOCK_MAX_PKT_BUF_SIZE as the max packet size allowed
> > 
> > host -> guest [Gbps]
> > pkt_size  before opt   p 1      p 2+3    p 4+5
> >
> > 32        0.032        0.030    0.048    0.051
> > 64        0.061        0.059    0.108    0.117
> > 128       0.122        0.112    0.227    0.234
> > 256       0.244        0.241    0.418    0.415
> > 512       0.459        0.466    0.847    0.865
> > 1K        0.927        0.919    1.657    1.641
> > 2K        1.884        1.813    3.262    3.269
> > 4K        3.378        3.326    6.044    6.195
> > 8K        5.637        5.676    10.141   11.287
> > 16K       8.250        8.402    15.976   16.736
> > 32K       13.327       13.204   19.013   20.515
> > 64K       21.241       21.341   20.973   21.879
> > 128K      21.851       22.354   21.816   23.203
> > 256K      21.408       21.693   21.846   24.088
> > 512K      21.600       21.899   21.921   24.106
> > 
> > guest -> host [Gbps]
> > pkt_size  before opt   p 1      p 2+3    p 4+5
> >
> > 32        0.045        0.046    0.057    0.057
> > 64        0.089        0.091    0.103    0.104
> > 128       0.170        0.179    0.192    0.200
> > 256       0.364        0.351    0.361    0.379
> > 512       0.709        0.699    0.731    0.790
> > 1K        1.399        1.407    1.395    1.427
> > 2K        2.670        2.684    2.745    2.835
> > 4K        5.171        5.199    5.305    5.451
> > 8K        8.442        8.500    10.083   9.941
> > 16K       12.305       12.259   13.519   15.385
> > 32K       11.418       11.150   11.988   24.680
> > 64K       10.778       10.659   11.589   35.273
> > 128K      10.421       10.339   10.939   40.338
> > 256K      10.300       9.719    10.508   36.562
> > 512K      9.833        9.808    10.612   35.979
> > 
> > As Stefan suggested in the v1, I also measured the efficiency in this way:
> > efficiency = Mbps / (%CPU_Host + %CPU_Guest)
> >
> > The '%CPU_Guest' is measured inside the VM. I know that this is not the
> > best way, but it's provided for free by iperf3 and can serve as an
> > indication.
> > 
> > host -> guest efficiency [Mbps / (%CPU_Host + %CPU_Guest)]
> > pkt_size  before opt   p 1      p 2+3    p 4+5
> >
> > 32        0.35         0.45     0.79     1.02
> > 64        0.56         0.80     1.41     1.54
> > 128       1.11         1.52     3.03     3.12
> > 256       2.20         2.16     5.44     5.58
> > 512       4.17         4.18     10.96    11.46
> > 1K        8.30         8.26     20.99    20.89
> > 2K        16.82        16.31    39.76    39.73
> > 4K        30.89        30.79    74.07    75.73
> > 8K        53.74        54.49    124.24   148.91
> > 16K       80.68        83.63    200.21   232.79
> > 32K       132.27       132.52   260.81   357.07
> > 64K       229.82       230.40   300.19   444.18
> > 128K      332.60       329.78   331.51   492.28
> > 256K      331.06       337.22   339.59   511.59
> > 512K      335.58       328.50   331.56   504.56
> > 
> > guest -> host efficiency [Mbps / (%CPU_Host + %CPU_Guest)]
> > pkt_size  before opt   p 1      p 2+3    p 4+5
> >
> > 32        0.43         0.43     0.53     0.56
> > 64        0.85         0.86     1.04     1.10
> > 128       1.63         1.71     2.07     2.13
> > 256       3.48         3.35     4.02     4.22
> > 512       6.80         6.67     7.97     8.63
> > 1K        13.32        13.31    15.72    15.94
> > 2K        25.79        25.92    30.84    30.98
> > 4K        50.37        50.48    58.79    59.69
> > 8K        95.90        96.15    107.04   110.33
> > 16K       145.80       145.43   143.97   174.70
> > 

Re: [PATCH net-next v5 0/5] vsock/virtio: optimizations to increase the throughput

2019-07-30 Thread Michael S. Tsirkin
On Tue, Jul 30, 2019 at 05:43:29PM +0200, Stefano Garzarella wrote:
> This series tries to increase the throughput of virtio-vsock with slight
> changes.
> While I was testing the v2 of this series I discovered a huge use of memory,
> so I added patch 1 to mitigate this issue. I put it in this series in order
> to better track the performance trends.
> 
> v5:
> - rebased all patches on net-next
> - added Stefan's R-b and Michael's A-b

This doesn't solve all issues around allocation - as I mentioned, I think
we will need to improve accounting for that, and maybe add pre-allocation.
But it's a great series of steps in the right direction!



> v4: https://patchwork.kernel.org/cover/11047717
> v3: https://patchwork.kernel.org/cover/10970145
> v2: https://patchwork.kernel.org/cover/10938743
> v1: https://patchwork.kernel.org/cover/10885431
> 
> Below are the benchmarks step by step. I used iperf3 [1] modified with VSOCK
> support. As Michael suggested in the v1, I booted host and guest with 
> 'nosmap'.
> 
> A brief description of patches:
> - Patches 1:   limit the memory usage with an extra copy for small packets
> - Patches 2+3: reduce the number of credit update messages sent to the
>transmitter
> - Patches 4+5: allow the host to split packets on multiple buffers and use
>VIRTIO_VSOCK_MAX_PKT_BUF_SIZE as the max packet size allowed
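
Here is a minimal sketch of the patches 4+5 idea, with hypothetical names and
buffer sizes rather than the actual kernel code: the payload of one packet is
copied across as many fixed-size receive buffers as needed instead of being
required to fit in a single buffer.

    /*
     * Illustrative sketch (hypothetical names and sizes, not the actual
     * patches): split one payload across several fixed-size rx buffers
     * instead of requiring it to fit in a single buffer.
     */
    #include <stddef.h>
    #include <string.h>

    #define RX_BUF_SIZE  4096u    /* assumed per-buffer size */
    #define NUM_RX_BUFS  16u

    /* Copy 'len' bytes into consecutive rx buffers; returns the number of
     * buffers used, or 0 if the payload does not fit in the buffers given. */
    static size_t split_payload(const unsigned char *payload, size_t len,
                                unsigned char bufs[NUM_RX_BUFS][RX_BUF_SIZE])
    {
            size_t used = 0, off = 0;

            while (off < len) {
                    size_t chunk = len - off;

                    if (used == NUM_RX_BUFS)
                            return 0;        /* not enough buffers */
                    if (chunk > RX_BUF_SIZE)
                            chunk = RX_BUF_SIZE;

                    memcpy(bufs[used++], payload + off, chunk);
                    off += chunk;
            }
            return used;
    }

Raising the size cap to VIRTIO_VSOCK_MAX_PKT_BUF_SIZE while spreading the
payload over several buffers is what the p 4+5 columns below measure.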
> 
> host -> guest [Gbps]
> pkt_size  before opt   p 1      p 2+3    p 4+5
>
> 32        0.032        0.030    0.048    0.051
> 64        0.061        0.059    0.108    0.117
> 128       0.122        0.112    0.227    0.234
> 256       0.244        0.241    0.418    0.415
> 512       0.459        0.466    0.847    0.865
> 1K        0.927        0.919    1.657    1.641
> 2K        1.884        1.813    3.262    3.269
> 4K        3.378        3.326    6.044    6.195
> 8K        5.637        5.676    10.141   11.287
> 16K       8.250        8.402    15.976   16.736
> 32K       13.327       13.204   19.013   20.515
> 64K       21.241       21.341   20.973   21.879
> 128K      21.851       22.354   21.816   23.203
> 256K      21.408       21.693   21.846   24.088
> 512K      21.600       21.899   21.921   24.106
> 
> guest -> host [Gbps]
> pkt_size  before opt   p 1      p 2+3    p 4+5
>
> 32        0.045        0.046    0.057    0.057
> 64        0.089        0.091    0.103    0.104
> 128       0.170        0.179    0.192    0.200
> 256       0.364        0.351    0.361    0.379
> 512       0.709        0.699    0.731    0.790
> 1K        1.399        1.407    1.395    1.427
> 2K        2.670        2.684    2.745    2.835
> 4K        5.171        5.199    5.305    5.451
> 8K        8.442        8.500    10.083   9.941
> 16K       12.305       12.259   13.519   15.385
> 32K       11.418       11.150   11.988   24.680
> 64K       10.778       10.659   11.589   35.273
> 128K      10.421       10.339   10.939   40.338
> 256K      10.300       9.719    10.508   36.562
> 512K      9.833        9.808    10.612   35.979
> 
> As Stefan suggested in the v1, I also measured the efficiency in this way:
> efficiency = Mbps / (%CPU_Host + %CPU_Guest)
>
> The '%CPU_Guest' is measured inside the VM. I know that this is not the
> best way, but it's provided for free by iperf3 and can serve as an
> indication.
> 
> host -> guest efficiency [Mbps / (%CPU_Host + %CPU_Guest)]
> pkt_size  before opt   p 1      p 2+3    p 4+5
>
> 32        0.35         0.45     0.79     1.02
> 64        0.56         0.80     1.41     1.54
> 128       1.11         1.52     3.03     3.12
> 256       2.20         2.16     5.44     5.58
> 512       4.17         4.18     10.96    11.46
> 1K        8.30         8.26     20.99    20.89
> 2K        16.82        16.31    39.76    39.73
> 4K        30.89        30.79    74.07    75.73
> 8K        53.74        54.49    124.24   148.91
> 16K       80.68        83.63    200.21   232.79
> 32K       132.27       132.52   260.81   357.07
> 64K       229.82       230.40   300.19   444.18
> 128K      332.60       329.78   331.51   492.28
> 256K      331.06       337.22   339.59   511.59
> 512K      335.58       328.50   331.56   504.56
> 
> guest -> host efficiency [Mbps / (%CPU_Host + %CPU_Guest)]
> pkt_size  before opt   p 1      p 2+3    p 4+5
>
> 32        0.43         0.43     0.53     0.56
> 64        0.85         0.86     1.04     1.10
> 128       1.63         1.71     2.07     2.13
> 256       3.48         3.35     4.02     4.22
> 512       6.80         6.67     7.97     8.63
> 1K        13.32        13.31    15.72    15.94
> 2K        25.79        25.92    30.84    30.98
> 4K        50.37        50.48    58.79    59.69
> 8K        95.90        96.15    107.04   110.33
> 16K       145.80       145.43   143.97   174.70
> 32K       147.06       144.74   146.02   282.48
> 64K       145.25       143.99   141.62   406.40
> 128K      149.34       146.96   147.49   489.34
> 256K      156.35       149.81   152.21   536.37
> 512K      151.65       150.74   151.52   519.93
> 
> [1] https://github.com/stefano-garzarella/iperf/
> 
> Stefano Garzarella (5):
>   vsock/virtio: