request for stable (was Re: [PATCH v4 0/5] vsock/virtio: optimizations to increase the throughput)

2019-09-03 Thread Michael S. Tsirkin
Patches 1, 3, and 4 are needed for stable.
Dave, could you queue them there, please?

On Wed, Jul 17, 2019 at 01:30:25PM +0200, Stefano Garzarella wrote:
> This series tries to increase the throughput of virtio-vsock with slight
> changes.
> While I was testing the v2 of this series I discovered a huge use of memory,
> so I added patch 1 to mitigate this issue. I put it in this series in order
> to better track the performance trends.
> 
> v4:
> - rebased all patches on current master (conflicts in Patch 4)
> - Patch 1: added Stefan's R-b
> - Patch 3: removed the lock when buf_alloc is written [David];
>            moved this patch after "vsock/virtio: reduce credit update
>            messages" to make it clearer
> - Patch 4: vhost_exceeds_weight() was recently introduced, so I've resolved
>            some conflicts (see the sketch below)
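> 
> (For context on the Patch 4 item: vhost_exceeds_weight() caps how many
> packets/bytes a vhost handler processes before it requeues itself. A
> minimal sketch of the usual calling idiom, taken from the generic vhost
> pattern rather than from this series' code:)
> 
> 	int pkts = 0, total_len = 0;
> 
> 	do {
> 		/* dequeue and handle one packet here,
> 		 * adding its length to total_len */
> 	} while (!vhost_exceeds_weight(vq, ++pkts, total_len));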
> 
> v3: https://patchwork.kernel.org/cover/10970145
> 
> v2: https://patchwork.kernel.org/cover/10938743
> 
> v1: https://patchwork.kernel.org/cover/10885431
> 
> Below are the benchmarks step by step. I used iperf3 [1] modified with VSOCK
> support. As Michael suggested in the v1, I booted host and guest with
> 'nosmap'.
> 
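> For reference, a typical pair of runs with a VSOCK-enabled iperf3 build
> looks roughly like this (a sketch assuming the fork's --vsock option; CID 2
> is the well-known host CID the guest connects to):
> 
>   host$  iperf3 --vsock -s
>   guest$ iperf3 --vsock -c 2 -l 64K
> 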
> A brief description of the patches:
> - Patch 1:     limit the memory usage with an extra copy for small packets
> - Patches 2+3: reduce the number of credit update messages sent to the
>                transmitter (sketched below)
> - Patches 4+5: allow the host to split packets over multiple buffers and use
>                VIRTIO_VSOCK_MAX_PKT_BUF_SIZE as the max packet size allowed
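> 
> A minimal sketch of the credit-update idea behind patches 2+3 (illustrative
> names and policy, not the actual patch code): rather than telling the
> transmitter about freed buffer space after every read, the receiver sends a
> credit update only once a sizeable chunk has been consumed:
> 
> 	/* Sketch: should the receiver send a credit update now? */
> 	static bool need_credit_update(u32 buf_alloc, u32 fwd_cnt,
> 				       u32 last_fwd_cnt)
> 	{
> 		/* Bytes consumed since the last update sent to the peer. */
> 		u32 consumed = fwd_cnt - last_fwd_cnt;
> 
> 		/* Refresh the peer's view only after one maximum-sized
> 		 * packet's worth of space has been freed. */
> 		return consumed >= VIRTIO_VSOCK_MAX_PKT_BUF_SIZE;
> 	}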
> 
> host -> guest [Gbps]
> pkt_size   before opt      p 1    p 2+3    p 4+5
> 
> 32              0.032    0.030    0.048    0.051
> 64              0.061    0.059    0.108    0.117
> 128             0.122    0.112    0.227    0.234
> 256             0.244    0.241    0.418    0.415
> 512             0.459    0.466    0.847    0.865
> 1K              0.927    0.919    1.657    1.641
> 2K              1.884    1.813    3.262    3.269
> 4K              3.378    3.326    6.044    6.195
> 8K              5.637    5.676   10.141   11.287
> 16K             8.250    8.402   15.976   16.736
> 32K            13.327   13.204   19.013   20.515
> 64K            21.241   21.341   20.973   21.879
> 128K           21.851   22.354   21.816   23.203
> 256K           21.408   21.693   21.846   24.088
> 512K           21.600   21.899   21.921   24.106
> 
> guest -> host [Gbps]
> pkt_size   before opt      p 1    p 2+3    p 4+5
> 
> 32              0.045    0.046    0.057    0.057
> 64              0.089    0.091    0.103    0.104
> 128             0.170    0.179    0.192    0.200
> 256             0.364    0.351    0.361    0.379
> 512             0.709    0.699    0.731    0.790
> 1K              1.399    1.407    1.395    1.427
> 2K              2.670    2.684    2.745    2.835
> 4K              5.171    5.199    5.305    5.451
> 8K              8.442    8.500   10.083    9.941
> 16K            12.305   12.259   13.519   15.385
> 32K            11.418   11.150   11.988   24.680
> 64K            10.778   10.659   11.589   35.273
> 128K           10.421   10.339   10.939   40.338
> 256K           10.300    9.719   10.508   36.562
> 512K            9.833    9.808   10.612   35.979
> 
> As Stefan suggested in the v1, I also measured the efficiency in this way:
> efficiency = Mbps / (%CPU_Host + %CPU_Guest)
> 
> The '%CPU_Guest' is taken inside the VM. I know that it is not the best way,
> but it's provided for free by iperf3 and can give an indication.
> 
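> As a concrete reading of the tables below: in the 512K host -> guest case
> before the optimizations, 21.600 Gbps is 21600 Mbps, and 21600 / 335.58 is
> roughly 64, i.e. host and guest together consumed about 64 CPU percentage
> points during that run.
> 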
> host -> guest efficiency [Mbps / (%CPU_Host + %CPU_Guest)]
> pkt_size   before opt      p 1    p 2+3    p 4+5
> 
> 32               0.35     0.45     0.79     1.02
> 64               0.56     0.80     1.41     1.54
> 128              1.11     1.52     3.03     3.12
> 256              2.20     2.16     5.44     5.58
> 512              4.17     4.18    10.96    11.46
> 1K               8.30     8.26    20.99    20.89
> 2K              16.82    16.31    39.76    39.73
> 4K              30.89    30.79    74.07    75.73
> 8K              53.74    54.49   124.24   148.91
> 16K             80.68    83.63   200.21   232.79
> 32K            132.27   132.52   260.81   357.07
> 64K            229.82   230.40   300.19   444.18
> 128K           332.60   329.78   331.51   492.28
> 256K           331.06   337.22   339.59   511.59
> 512K           335.58   328.50   331.56   504.56
> 
> guest -> host efficiency [Mbps / (%CPU_Host + %CPU_Guest)]
> pkt_size   before opt      p 1    p 2+3    p 4+5
> 
> 32               0.43     0.43     0.53     0.56
> 64               0.85     0.86     1.04     1.10
> 128              1.63     1.71     2.07     2.13
> 256              3.48     3.35     4.02     4.22
> 512              6.80     6.67     7.97     8.63
> 1K              13.32    13.31    15.72    15.94
> 2K              25.79    25.92    30.84    30.98
> 4K              50.37    50.48    58.79    59.69
> 8K              95.90    96.15   107.04   110.33
> 16K            145.80   145.43   143.97   174.70
> 32K            147.06   144.74   146.02   282.48
> 64K            145.25   143.99   141.62   406.40
> 128K           149.34   146.96   147.49   489.34
> 256K           156.35   149.81   152.21   536.37
> 512K           151.65   150.74

Re: [PATCH v4 0/5] vsock/virtio: optimizations to increase the throughput

2019-07-30 Thread Stefano Garzarella
On Tue, Jul 30, 2019 at 06:03:24PM +0800, Jason Wang wrote:
> 
> On 2019/7/30 at 5:40 PM, Stefano Garzarella wrote:
> > On Mon, Jul 29, 2019 at 09:59:23AM -0400, Michael S. Tsirkin wrote:
> > > On Wed, Jul 17, 2019 at 01:30:25PM +0200, Stefano Garzarella wrote:
> > > > This series tries to increase the throughput of virtio-vsock with slight
> > > > changes.
> > > > While I was testing the v2 of this series I discovered a huge use of
> > > > memory, so I added patch 1 to mitigate this issue. I put it in this
> > > > series in order to better track the performance trends.
> > > Series:
> > > 
> > > Acked-by: Michael S. Tsirkin 
> > > 
> > > Can this go into net-next?
> > > 
> > I think so.
> > Michael, Stefan, thanks for acking the series!
> > 
> > Should I resend it with the net-next tag?
> > 
> > Thanks,
> > Stefano
> 
> 
> I think so.

Okay, I'll resend it!

Thanks,
Stefano


Re: [PATCH v4 0/5] vsock/virtio: optimizations to increase the throughput

2019-07-30 Thread Jason Wang



On 2019/7/30 at 5:40 PM, Stefano Garzarella wrote:

On Mon, Jul 29, 2019 at 09:59:23AM -0400, Michael S. Tsirkin wrote:

On Wed, Jul 17, 2019 at 01:30:25PM +0200, Stefano Garzarella wrote:

This series tries to increase the throughput of virtio-vsock with slight
changes.
While I was testing the v2 of this series I discovered a huge use of memory,
so I added patch 1 to mitigate this issue. I put it in this series in order
to better track the performance trends.

Series:

Acked-by: Michael S. Tsirkin 

Can this go into net-next?


I think so.
Michael, Stefan, thanks for acking the series!

Should I resend it with the net-next tag?

Thanks,
Stefano



I think so.

Thanks



Re: [PATCH v4 0/5] vsock/virtio: optimizations to increase the throughput

2019-07-30 Thread Stefano Garzarella
On Mon, Jul 29, 2019 at 09:59:23AM -0400, Michael S. Tsirkin wrote:
> On Wed, Jul 17, 2019 at 01:30:25PM +0200, Stefano Garzarella wrote:
> > This series tries to increase the throughput of virtio-vsock with slight
> > changes.
> > While I was testing the v2 of this series I discovered a huge use of
> > memory, so I added patch 1 to mitigate this issue. I put it in this series
> > in order to better track the performance trends.
> 
> Series:
> 
> Acked-by: Michael S. Tsirkin 
> 
> Can this go into net-next?
> 

I think so.
Michael, Stefan, thanks for acking the series!

Should I resend it with the net-next tag?

Thanks,
Stefano


Re: [PATCH v4 0/5] vsock/virtio: optimizations to increase the throughput

2019-07-29 Thread Michael S. Tsirkin
On Wed, Jul 17, 2019 at 01:30:25PM +0200, Stefano Garzarella wrote:
> This series tries to increase the throughput of virtio-vsock with slight
> changes.
> While I was testing the v2 of this series I discovered a huge use of memory,
> so I added patch 1 to mitigate this issue. I put it in this series in order
> to better track the performance trends.

Series:

Acked-by: Michael S. Tsirkin 

Can this go into net-next?


> v4:
> - rebased all patches on current master (conflicts in Patch 4)
> - Patch 1: added Stefan's R-b
> - Patch 3: removed the lock when buf_alloc is written [David];
>            moved this patch after "vsock/virtio: reduce credit update
>            messages" to make it clearer
> - Patch 4: vhost_exceeds_weight() was recently introduced, so I've resolved
>            some conflicts
> 
> v3: https://patchwork.kernel.org/cover/10970145
> 
> v2: https://patchwork.kernel.org/cover/10938743
> 
> v1: https://patchwork.kernel.org/cover/10885431
> 
> Below are the benchmarks step by step. I used iperf3 [1] modified with VSOCK
> support. As Michael suggested in the v1, I booted host and guest with
> 'nosmap'.
> 
> A brief description of the patches:
> - Patch 1:     limit the memory usage with an extra copy for small packets
> - Patches 2+3: reduce the number of credit update messages sent to the
>                transmitter
> - Patches 4+5: allow the host to split packets over multiple buffers and use
>                VIRTIO_VSOCK_MAX_PKT_BUF_SIZE as the max packet size allowed
>                (splitting loop sketched below)
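> 
> A minimal sketch of the splitting idea behind patches 4+5 (illustrative
> names, not the actual patch code): the payload is scattered across as many
> rx buffers as needed instead of being bounded by a single buffer:
> 
> 	/* Sketch: scatter 'len' payload bytes across fixed-size buffers.
> 	 * Returns the number of bytes queued; the caller keeps the rest
> 	 * of the packet for the next batch of available buffers. */
> 	static size_t split_into_buffers(const u8 *payload, size_t len,
> 					 u8 **bufs, size_t buf_size,
> 					 size_t nbufs)
> 	{
> 		size_t copied = 0, i;
> 
> 		for (i = 0; i < nbufs && copied < len; i++) {
> 			size_t chunk = min(buf_size, len - copied);
> 
> 			memcpy(bufs[i], payload + copied, chunk);
> 			copied += chunk;
> 		}
> 
> 		return copied;
> 	}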
> 
> host -> guest [Gbps]
> pkt_size   before opt      p 1    p 2+3    p 4+5
> 
> 32              0.032    0.030    0.048    0.051
> 64              0.061    0.059    0.108    0.117
> 128             0.122    0.112    0.227    0.234
> 256             0.244    0.241    0.418    0.415
> 512             0.459    0.466    0.847    0.865
> 1K              0.927    0.919    1.657    1.641
> 2K              1.884    1.813    3.262    3.269
> 4K              3.378    3.326    6.044    6.195
> 8K              5.637    5.676   10.141   11.287
> 16K             8.250    8.402   15.976   16.736
> 32K            13.327   13.204   19.013   20.515
> 64K            21.241   21.341   20.973   21.879
> 128K           21.851   22.354   21.816   23.203
> 256K           21.408   21.693   21.846   24.088
> 512K           21.600   21.899   21.921   24.106
> 
> guest -> host [Gbps]
> pkt_size   before opt      p 1    p 2+3    p 4+5
> 
> 32              0.045    0.046    0.057    0.057
> 64              0.089    0.091    0.103    0.104
> 128             0.170    0.179    0.192    0.200
> 256             0.364    0.351    0.361    0.379
> 512             0.709    0.699    0.731    0.790
> 1K              1.399    1.407    1.395    1.427
> 2K              2.670    2.684    2.745    2.835
> 4K              5.171    5.199    5.305    5.451
> 8K              8.442    8.500   10.083    9.941
> 16K            12.305   12.259   13.519   15.385
> 32K            11.418   11.150   11.988   24.680
> 64K            10.778   10.659   11.589   35.273
> 128K           10.421   10.339   10.939   40.338
> 256K           10.300    9.719   10.508   36.562
> 512K            9.833    9.808   10.612   35.979
> 
> As Stefan suggested in the v1, I also measured the efficiency in this way:
> efficiency = Mbps / (%CPU_Host + %CPU_Guest)
> 
> The '%CPU_Guest' is taken inside the VM. I know that it is not the best way,
> but it's provided for free by iperf3 and can give an indication.
> 
> host -> guest efficiency [Mbps / (%CPU_Host + %CPU_Guest)]
> pkt_size   before opt      p 1    p 2+3    p 4+5
> 
> 32               0.35     0.45     0.79     1.02
> 64               0.56     0.80     1.41     1.54
> 128              1.11     1.52     3.03     3.12
> 256              2.20     2.16     5.44     5.58
> 512              4.17     4.18    10.96    11.46
> 1K               8.30     8.26    20.99    20.89
> 2K              16.82    16.31    39.76    39.73
> 4K              30.89    30.79    74.07    75.73
> 8K              53.74    54.49   124.24   148.91
> 16K             80.68    83.63   200.21   232.79
> 32K            132.27   132.52   260.81   357.07
> 64K            229.82   230.40   300.19   444.18
> 128K           332.60   329.78   331.51   492.28
> 256K           331.06   337.22   339.59   511.59
> 512K           335.58   328.50   331.56   504.56
> 
> guest -> host efficiency [Mbps / (%CPU_Host + %CPU_Guest)]
> pkt_size   before opt      p 1    p 2+3    p 4+5
> 
> 32               0.43     0.43     0.53     0.56
> 64               0.85     0.86     1.04     1.10
> 128              1.63     1.71     2.07     2.13
> 256              3.48     3.35     4.02     4.22
> 512              6.80     6.67     7.97     8.63
> 1K              13.32    13.31    15.72    15.94
> 2K              25.79    25.92    30.84    30.98
> 4K              50.37    50.48    58.79    59.69
> 8K              95.90    96.15   107.04   110.33
> 16K            145.80   145.43   143.97   174.70
> 32K            147.06   144.74   146.02   282.48
> 64K            145.25   143.99   141.62   406.40
> 128K           149.34   146.96   147.49   489.34
> 256K           156.35   149.81   152.21   536.37
> 512K           151.65   150.74

Re: [PATCH v4 0/5] vsock/virtio: optimizations to increase the throughput

2019-07-29 Thread Stefan Hajnoczi
On Mon, Jul 22, 2019 at 11:14:34AM +0200, Stefano Garzarella wrote:
> On Mon, Jul 22, 2019 at 10:08:35AM +0100, Stefan Hajnoczi wrote:
> > On Wed, Jul 17, 2019 at 01:30:25PM +0200, Stefano Garzarella wrote:
> > > This series tries to increase the throughput of virtio-vsock with slight
> > > changes.
> > > While I was testing the v2 of this series I discovered a huge use of
> > > memory, so I added patch 1 to mitigate this issue. I put it in this
> > > series in order to better track the performance trends.
> > > 
> > > v4:
> > > - rebased all patches on current master (conflicts in Patch 4)
> > > - Patch 1: added Stefan's R-b
> > > - Patch 3: removed the lock when buf_alloc is written [David];
> > >            moved this patch after "vsock/virtio: reduce credit update
> > >            messages" to make it clearer
> > > - Patch 4: vhost_exceeds_weight() was recently introduced, so I've
> > >            resolved some conflicts
> > 
> > Stefano: Do you want to continue experimenting before we merge this
> > patch series?  The code looks functionally correct and the performance
> > increases, so I'm happy for it to be merged.
> 
> I think we can merge this series.
> 
> I'll continue to do other experiments (e.g. removing TX workers, allocating
> pages, etc.) but I think these changes are prerequisites for the other
> patches, so we can merge them.
> 
> Thank you very much for the reviews!

All patches have been reviewed by now.  Have an Ack for good measure:

Acked-by: Stefan Hajnoczi 

The topics discussed in sub-threads relate to longer-term optimization
work that doesn't block this series.  Please merge.




Re: [PATCH v4 0/5] vsock/virtio: optimizations to increase the throughput

2019-07-22 Thread Stefano Garzarella
On Mon, Jul 22, 2019 at 10:08:35AM +0100, Stefan Hajnoczi wrote:
> On Wed, Jul 17, 2019 at 01:30:25PM +0200, Stefano Garzarella wrote:
> > This series tries to increase the throughput of virtio-vsock with slight
> > changes.
> > While I was testing the v2 of this series I discovered a huge use of
> > memory, so I added patch 1 to mitigate this issue. I put it in this series
> > in order to better track the performance trends.
> > 
> > v4:
> > - rebased all patches on current master (conflicts in Patch 4)
> > - Patch 1: added Stefan's R-b
> > - Patch 3: removed the lock when buf_alloc is written [David];
> >            moved this patch after "vsock/virtio: reduce credit update
> >            messages" to make it clearer
> > - Patch 4: vhost_exceeds_weight() was recently introduced, so I've
> >            resolved some conflicts
> 
> Stefano: Do you want to continue experimenting before we merge this
> patch series?  The code looks functionally correct and the performance
> increases, so I'm happy for it to be merged.

I think we can merge this series.

I'll continue to do other experiments (e.g. removing TX workers, allocating
pages, etc.) but I think these changes are prerequisites for the other
patches, so we can merge them.

Thank you very much for the reviews!
Stefano


Re: [PATCH v4 0/5] vsock/virtio: optimizations to increase the throughput

2019-07-22 Thread Stefan Hajnoczi
On Wed, Jul 17, 2019 at 01:30:25PM +0200, Stefano Garzarella wrote:
> This series tries to increase the throughput of virtio-vsock with slight
> changes.
> While I was testing the v2 of this series I discovered a huge use of memory,
> so I added patch 1 to mitigate this issue. I put it in this series in order
> to better track the performance trends.
> 
> v4:
> - rebased all patches on current master (conflicts in Patch 4)
> - Patch 1: added Stefan's R-b
> - Patch 3: removed the lock when buf_alloc is written [David];
>            moved this patch after "vsock/virtio: reduce credit update
>            messages" to make it clearer
> - Patch 4: vhost_exceeds_weight() was recently introduced, so I've resolved
>            some conflicts

Stefano: Do you want to continue experimenting before we merge this
patch series?  The code looks functionally correct and the performance
increases, so I'm happy for it to be merged.

