Re: [PATCH net-next v5 0/5] vsock/virtio: optimizations to increase the throughput
On Tue, Jul 30, 2019 at 11:55:09AM -0400, Michael S. Tsirkin wrote:
> On Tue, Jul 30, 2019 at 11:54:53AM -0400, Michael S. Tsirkin wrote:
> > On Tue, Jul 30, 2019 at 05:43:29PM +0200, Stefano Garzarella wrote:
> > > This series tries to increase the throughput of virtio-vsock with slight
> > > changes.
> > > While I was testing the v2 of this series I discovered a huge use of
> > > memory, so I added patch 1 to mitigate this issue. I put it in this
> > > series in order to better track the performance trends.
> > >
> > > v5:
> > > - rebased all patches on net-next
> > > - added Stefan's R-b and Michael's A-b
> >
> > This doesn't solve all issues around allocation - as I mentioned I think
> > we will need to improve accounting for that,
> > and maybe add pre-allocation.

Yes, I'll work on it following your suggestions.

> > But it's a great series of steps in the right direction!

Thank you very much :)

Stefano

> So
> Acked-by: Michael S. Tsirkin
>
> > > v4: https://patchwork.kernel.org/cover/11047717
> > > v3: https://patchwork.kernel.org/cover/10970145
> > > v2: https://patchwork.kernel.org/cover/10938743
> > > v1: https://patchwork.kernel.org/cover/10885431
> > >
> > > Below are the benchmarks step by step. I used iperf3 [1] modified with
> > > VSOCK support. As Michael suggested in the v1, I booted host and guest
> > > with 'nosmap'.
> > >
> > > A brief description of the patches:
> > > - Patch 1: limit the memory usage with an extra copy for small packets
> > > - Patches 2+3: reduce the number of credit update messages sent to the
> > >   transmitter
> > > - Patches 4+5: allow the host to split packets across multiple buffers
> > >   and use VIRTIO_VSOCK_MAX_PKT_BUF_SIZE as the max packet size allowed
> > >
> > > [...]
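To give a feel for what patch 1 in the quoted series does, here is a minimal, illustrative C sketch of the "extra copy for small packets" idea. It is not the kernel code from the series; the 4 KiB receive-buffer size and the 128-byte threshold are assumptions made only for this example.

/*
 * Simplified sketch: receive buffers are allocated at full size, so
 * queuing them directly on the destination socket lets a flood of tiny
 * packets pin a disproportionate amount of memory.  Copying small
 * payloads into a right-sized buffer lets the large rx buffer be
 * recycled immediately.  Threshold and sizes are illustrative only.
 */
#include <stdlib.h>
#include <string.h>

#define RX_BUF_SIZE         (4 * 1024)  /* size every rx buffer is allocated with */
#define SMALL_PKT_THRESHOLD 128         /* illustrative cut-off for the extra copy */

struct rx_pkt {
        void   *buf;      /* payload buffer owned by the packet */
        size_t  buf_len;  /* bytes actually allocated (what accounting sees) */
        size_t  len;      /* bytes of payload */
};

/* Called before queuing a received packet on the destination socket. */
static int shrink_small_pkt(struct rx_pkt *pkt)
{
        void *small;

        if (pkt->len > SMALL_PKT_THRESHOLD)
                return 0;               /* big payload: keep the original buffer */

        small = malloc(pkt->len);
        if (!small)
                return -1;              /* keep the original buffer on failure */

        memcpy(small, pkt->buf, pkt->len);
        free(pkt->buf);                 /* the full-size rx buffer can be reused */
        pkt->buf = small;
        pkt->buf_len = pkt->len;        /* memory accounting now matches reality */
        return 0;
}

The trade-off is one extra memcpy per small packet in exchange for releasing the full-size receive buffer immediately, which bounds the memory a slow reader can pin.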
Re: [PATCH net-next v5 0/5] vsock/virtio: optimizations to increase the throughput
On Tue, Jul 30, 2019 at 11:54:53AM -0400, Michael S. Tsirkin wrote:
> On Tue, Jul 30, 2019 at 05:43:29PM +0200, Stefano Garzarella wrote:
> > This series tries to increase the throughput of virtio-vsock with slight
> > changes.
> > While I was testing the v2 of this series I discovered a huge use of
> > memory, so I added patch 1 to mitigate this issue. I put it in this
> > series in order to better track the performance trends.
> >
> > v5:
> > - rebased all patches on net-next
> > - added Stefan's R-b and Michael's A-b
>
> This doesn't solve all issues around allocation - as I mentioned I think
> we will need to improve accounting for that,
> and maybe add pre-allocation.
> But it's a great series of steps in the right direction!

So

Acked-by: Michael S. Tsirkin

> > v4: https://patchwork.kernel.org/cover/11047717
> > v3: https://patchwork.kernel.org/cover/10970145
> > v2: https://patchwork.kernel.org/cover/10938743
> > v1: https://patchwork.kernel.org/cover/10885431
> >
> > Below are the benchmarks step by step. I used iperf3 [1] modified with
> > VSOCK support. As Michael suggested in the v1, I booted host and guest
> > with 'nosmap'.
> >
> > A brief description of the patches:
> > - Patch 1: limit the memory usage with an extra copy for small packets
> > - Patches 2+3: reduce the number of credit update messages sent to the
> >   transmitter
> > - Patches 4+5: allow the host to split packets across multiple buffers
> >   and use VIRTIO_VSOCK_MAX_PKT_BUF_SIZE as the max packet size allowed
> >
> > [...]
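As a rough illustration of the credit-update batching described for patches 2+3, the sketch below shows the general shape of the idea rather than the actual virtio-vsock code: instead of sending an explicit credit update every time the application consumes data, one is sent only once enough receive space has been freed, since outgoing packets already carry the current credit. The threshold value and the struct/field names here are assumptions for the example.

#include <stdint.h>
#include <stdbool.h>

/* Illustrative threshold for when an explicit update is worth sending. */
#define CREDIT_UPDATE_THRESHOLD (64 * 1024)

struct rx_credit {
        uint32_t fwd_cnt;       /* total bytes consumed by the receiver */
        uint32_t last_fwd_cnt;  /* fwd_cnt value last advertised to the sender */
};

/* Called after the application dequeues 'bytes' from the socket. */
static bool need_credit_update(struct rx_credit *c, uint32_t bytes)
{
        c->fwd_cnt += bytes;

        /* Only wake the transmitter when enough space was freed to matter. */
        if (c->fwd_cnt - c->last_fwd_cnt < CREDIT_UPDATE_THRESHOLD)
                return false;

        c->last_fwd_cnt = c->fwd_cnt;   /* advertised in the explicit update */
        return true;
}

The effect is fewer packets on the wire per byte of payload, which is where the p 2+3 columns in the quoted tables gain most of their improvement for small and medium packet sizes.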
Re: [PATCH net-next v5 0/5] vsock/virtio: optimizations to increase the throughput
On Tue, Jul 30, 2019 at 05:43:29PM +0200, Stefano Garzarella wrote:
> This series tries to increase the throughput of virtio-vsock with slight
> changes.
> While I was testing the v2 of this series I discovered a huge use of memory,
> so I added patch 1 to mitigate this issue. I put it in this series in order
> to better track the performance trends.
>
> v5:
> - rebased all patches on net-next
> - added Stefan's R-b and Michael's A-b

This doesn't solve all issues around allocation - as I mentioned I think
we will need to improve accounting for that,
and maybe add pre-allocation.
But it's a great series of steps in the right direction!

> v4: https://patchwork.kernel.org/cover/11047717
> v3: https://patchwork.kernel.org/cover/10970145
> v2: https://patchwork.kernel.org/cover/10938743
> v1: https://patchwork.kernel.org/cover/10885431
>
> Below are the benchmarks step by step. I used iperf3 [1] modified with VSOCK
> support. As Michael suggested in the v1, I booted host and guest with
> 'nosmap'.
>
> A brief description of the patches:
> - Patch 1: limit the memory usage with an extra copy for small packets
> - Patches 2+3: reduce the number of credit update messages sent to the
>   transmitter
> - Patches 4+5: allow the host to split packets across multiple buffers and
>   use VIRTIO_VSOCK_MAX_PKT_BUF_SIZE as the max packet size allowed
>
> host -> guest [Gbps]
> pkt_size  before opt       p 1     p 2+3     p 4+5
>
> 32             0.032     0.030     0.048     0.051
> 64             0.061     0.059     0.108     0.117
> 128            0.122     0.112     0.227     0.234
> 256            0.244     0.241     0.418     0.415
> 512            0.459     0.466     0.847     0.865
> 1K             0.927     0.919     1.657     1.641
> 2K             1.884     1.813     3.262     3.269
> 4K             3.378     3.326     6.044     6.195
> 8K             5.637     5.676    10.141    11.287
> 16K            8.250     8.402    15.976    16.736
> 32K           13.327    13.204    19.013    20.515
> 64K           21.241    21.341    20.973    21.879
> 128K          21.851    22.354    21.816    23.203
> 256K          21.408    21.693    21.846    24.088
> 512K          21.600    21.899    21.921    24.106
>
> guest -> host [Gbps]
> pkt_size  before opt       p 1     p 2+3     p 4+5
>
> 32             0.045     0.046     0.057     0.057
> 64             0.089     0.091     0.103     0.104
> 128            0.170     0.179     0.192     0.200
> 256            0.364     0.351     0.361     0.379
> 512            0.709     0.699     0.731     0.790
> 1K             1.399     1.407     1.395     1.427
> 2K             2.670     2.684     2.745     2.835
> 4K             5.171     5.199     5.305     5.451
> 8K             8.442     8.500    10.083     9.941
> 16K           12.305    12.259    13.519    15.385
> 32K           11.418    11.150    11.988    24.680
> 64K           10.778    10.659    11.589    35.273
> 128K          10.421    10.339    10.939    40.338
> 256K          10.300     9.719    10.508    36.562
> 512K           9.833     9.808    10.612    35.979
>
> As Stefan suggested in the v1, I also measured the efficiency in this way:
> efficiency = Mbps / (%CPU_Host + %CPU_Guest)
>
> The '%CPU_Guest' is taken inside the VM. I know that it is not the best way,
> but it's provided for free by iperf3 and can be an indication.
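To read the metric concretely (invented numbers, not one of the measured runs below): a run reporting 10 Gbps, i.e. 10000 Mbps, with the host side at 150% CPU and the guest side at 50% CPU scores 10000 / (150 + 50) = 50 Mbps per percentage point of CPU.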
>
> host -> guest efficiency [Mbps / (%CPU_Host + %CPU_Guest)]
> pkt_size  before opt       p 1     p 2+3     p 4+5
>
> 32              0.35      0.45      0.79      1.02
> 64              0.56      0.80      1.41      1.54
> 128             1.11      1.52      3.03      3.12
> 256             2.20      2.16      5.44      5.58
> 512             4.17      4.18     10.96     11.46
> 1K              8.30      8.26     20.99     20.89
> 2K             16.82     16.31     39.76     39.73
> 4K             30.89     30.79     74.07     75.73
> 8K             53.74     54.49    124.24    148.91
> 16K            80.68     83.63    200.21    232.79
> 32K           132.27    132.52    260.81    357.07
> 64K           229.82    230.40    300.19    444.18
> 128K          332.60    329.78    331.51    492.28
> 256K          331.06    337.22    339.59    511.59
> 512K          335.58    328.50    331.56    504.56
>
> guest -> host efficiency [Mbps / (%CPU_Host + %CPU_Guest)]
> pkt_size  before opt       p 1     p 2+3     p 4+5
>
> 32              0.43      0.43      0.53      0.56
> 64              0.85      0.86      1.04      1.10
> 128             1.63      1.71      2.07      2.13
> 256             3.48      3.35      4.02      4.22
> 512             6.80      6.67      7.97      8.63
> 1K             13.32     13.31     15.72     15.94
> 2K             25.79     25.92     30.84     30.98
> 4K             50.37     50.48     58.79     59.69
> 8K             95.90     96.15    107.04    110.33
> 16K           145.80    145.43    143.97    174.70
> 32K           147.06    144.74    146.02    282.48
> 64K           145.25    143.99    141.62    406.40
> 128K          149.34    146.96    147.49    489.34
> 256K          156.35    149.81    152.21    536.37
> 512K          151.65    150.74    151.52    519.93
>
> [1] https://github.com/stefano-garzarella/iperf/
>
> Stefano Garzarella (5):
>   vsock/virtio:
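Finally, a simplified sketch of the buffer-splitting idea behind patches 4+5 in the quoted cover letter. This is not the vhost code from the series; get_next_rx_buf() is a hypothetical helper and the 4096-byte buffer size is an assumption, used only to show how a packet larger than a single guest rx buffer can be copied across several of them, which is what allows the transmit side to use VIRTIO_VSOCK_MAX_PKT_BUF_SIZE as the packet size limit.

#include <stddef.h>
#include <string.h>

#define RX_BUF_SIZE 4096        /* illustrative size of one guest rx buffer */

/* Hypothetical helper: fetch the next available guest rx buffer, or NULL. */
extern void *get_next_rx_buf(void);

/* Copy a packet of 'len' bytes into as many rx buffers as needed. */
static size_t deliver_pkt(const char *pkt, size_t len)
{
        size_t off = 0;

        while (off < len) {
                size_t chunk = len - off;
                void *buf = get_next_rx_buf();

                if (!buf)
                        break;                  /* no buffers: stop, retry later */
                if (chunk > RX_BUF_SIZE)
                        chunk = RX_BUF_SIZE;    /* split across multiple buffers */

                memcpy(buf, pkt + off, chunk);
                off += chunk;
        }
        return off;                             /* bytes actually delivered */
}

Larger packets mean fewer per-packet round trips for the same amount of data, which matches the large jump in the p 4+5 column for 32K and larger payloads in the guest -> host tables above.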