No problem, I will add Documentation/topics/userspace-udp-performance-tuning.rst 
to document more information. This will also help reduce the VM-to-VM UDP 
packet loss rate.

-----Original Message-----
From: Ilya Maximets [mailto:i.maxim...@ovn.org] 
Sent: September 3, 2020 20:03
To: Yi Yang (杨燚)-Cloud Service Group <yangy...@inspur.com>; f...@sysclose.org
Cc: yang_y...@163.com; ovs-dev@openvswitch.org; i.maxim...@ovn.org; Aaron 
Conole <acon...@redhat.com>
Subject: Re: Re: [ovs-dev] Re: Re: [PATCH] userspace: fix bad UDP performance 
issue of veth

On 9/3/20 4:06 AM, Yi Yang (杨燚)-Cloud Service Group wrote:
> As I replied to Aaron's concern, users need to use 
> /proc/sys/net/core/rmem_max and /proc/sys/net/core/wmem_max to set the 
> values they prefer; the final value is the smaller of 
> rmem_max (SO_RCVBUF)/wmem_max (SO_SNDBUF) and 1073741823. So 
> /proc/sys/net/core/rmem_max and /proc/sys/net/core/wmem_max are exactly the 
> knobs you're expecting.

AFAIU, the question here is not the size of the buffers, but the fact of 
calling setsockopt() with SO_RCVBUF/SO_SNDBUF regardless of the value.

By making this syscall you're disabling a big pile of optimizations in the 
kernel TCP stack, and that is not a good thing to do.  So, if a user wants 
better UDP performance and agrees to sacrifice some features of the TCP stack, 
we could allow that with a special configuration knob for the OVS interface.  
But we should not sacrifice any TCP features by default, even if that 
increases UDP performance.
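
For illustration only (this is not OVS code): a small Python sketch of the clamping behavior discussed in this thread. On Linux, a SO_RCVBUF request is clamped to net.core.rmem_max, and the kernel doubles the granted value internally to account for bookkeeping overhead, so getsockopt() reports twice the clamped request.

```python
# Sketch: how Linux clamps an oversized SO_RCVBUF request.
import socket

REQUESTED = 1073741823  # the huge value discussed in the patch

# Current hard ceiling set by the administrator via procfs/sysctl.
with open("/proc/sys/net/core/rmem_max") as f:
    rmem_max = int(f.read())

s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
s.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, REQUESTED)
# The kernel grants min(REQUESTED, rmem_max), then doubles it.
granted = s.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
s.close()

print(f"rmem_max={rmem_max}, granted={granted}")
```

Note that making the setsockopt() call at all is what sets the "buffer size is locked" flag on the socket; the granted size is secondary to that side effect.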

Documentation with pros and cons will be needed for this configuration knob.

Best regards, Ilya Maximets.

> 
> -----Original Message-----
> From: Flavio Leitner [mailto:f...@sysclose.org]
> Sent: September 3, 2020 1:32
> To: Yi Yang (杨燚)-Cloud Service Group <yangy...@inspur.com>
> Cc: yang_y...@163.com; ovs-dev@openvswitch.org; i.maxim...@ovn.org
> Subject: Re: [ovs-dev] Re: Re: [PATCH] userspace: fix bad UDP performance 
> issue of veth
> 
> On Mon, Aug 31, 2020 at 12:38:16AM +0000, Yi Yang (杨燚)-Cloud Service Group wrote:
>> Flavio, per my test, it also improved TCP performance; you can use the 
>> run-iperf.sh script I sent to try it. Actually, iperf3 does the same 
>> thing with its -w option.
> 
> I believe the results improve in lab tests in a controlled environment. 
> Kernel buffer auto-tuning and backlogging come into play when the system is 
> busy with something else and you have multiple devices and TCP streams 
> going on.  For instance, the bufferbloat problem Aaron already mentioned.
> 
> So, I agree with Aaron that this needs a config knob for cases where using 
> the maximum is not a good idea. For instance, I know that some software 
> solutions out there recommend pumping those defaults up to huge numbers, 
> which might be OK for that solution, but it may cause OVS issues under load.
> 
> Does that make sense?
> 
> fbl
> 
>>
>> -----Original Message-----
>> From: Flavio Leitner [mailto:f...@sysclose.org]
>> Sent: August 27, 2020 21:28
>> To: Yi Yang (杨燚)-Cloud Service Group <yangy...@inspur.com>
>> Cc: acon...@redhat.com; yang_y...@163.com; ovs-dev@openvswitch.org; 
>> i.maxim...@ovn.org
>> Subject: Re: [ovs-dev] Re: [PATCH] userspace: fix bad UDP performance 
>> issue of veth
>>
>>
>> Hi,
>>
>>
>> On Wed, Aug 26, 2020 at 12:47:43AM +0000, Yi Yang (杨燚)-Cloud Service Group wrote:
>>> Aaron, thanks for your comments. Actually, the final value depends on 
>>> /proc/sys/net/core/rmem_max and /proc/sys/net/core/wmem_max, so it 
>>> is still configurable. setsockopt(...) will set it to the smaller of 
>>> 1073741823 and r/wmem_max.
>>>
>>> -----Original Message-----
>>> From: dev [mailto:ovs-dev-boun...@openvswitch.org] On Behalf Of Aaron Conole
>>> Sent: August 25, 2020 23:26
>>> To: yang_y...@163.com
>>> Cc: ovs-dev@openvswitch.org; i.maxim...@ovn.org; f...@sysclose.org
>>> Subject: Re: [ovs-dev] [PATCH] userspace: fix bad UDP performance issue 
>>> of veth
>>>
>>> yang_y...@163.com writes:
>>>
>>>> From: Yi Yang <yangy...@inspur.com>
>>>>
>>>> iperf3 UDP performance in the veth-to-veth case is very bad because 
>>>> of heavy packet loss. The root cause is that rmem_default and 
>>>> wmem_default are just 212992, but the iperf3 UDP test uses an 8K UDP 
>>>> message, which results in many IP fragments when the MTU is 1500: 
>>>> one 8K UDP send enqueues 6 fragments to the socket receive queue, 
>>>> and the small default socket buffer can't hold that many packets, 
>>>> so many of them are lost.
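
[The fragment count in the commit message follows from IPv4 fragmentation arithmetic; a quick sanity check, assuming a 1500-byte MTU and a 20-byte IPv4 header without options:]

```python
import math

mtu = 1500
ip_header = 20       # IPv4 header, no options
udp_header = 8
udp_payload = 8192   # iperf3's "8K" UDP message

# The IP layer fragments the UDP header plus payload.
total = udp_header + udp_payload                # 8200 bytes
# Every non-final fragment must carry a multiple of 8 payload bytes.
per_fragment = (mtu - ip_header) // 8 * 8       # 1480 bytes
fragments = math.ceil(total / per_fragment)

print(fragments)  # 6, matching the commit message
```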
>>>>
>>>> This commit fixes the packet loss issue by setting the socket 
>>>> receive and send buffers to the maximum possible value, so packets 
>>>> are no longer lost. This also helps TCP performance because there 
>>>> are no retransmits.
>>>>
>>>> By the way, a big socket buffer doesn't mean a big buffer is 
>>>> allocated when the socket is created; actually no extra memory is 
>>>> allocated compared to the default socket buffer size. It just means 
>>>> more skbuffs can be enqueued to the socket receive and send queues, 
>>>> so packets are not lost.
>>>>
>>>> The results below are for your reference.
>>>>
>>>> The result before applying this commit
>>>> ======================================
>>>> $ ip netns exec ns02 iperf3 -t 5 -i 1 -u -b 100M -c 10.15.2.6 --get-server-output -A 5
>>>> Connecting to host 10.15.2.6, port 5201
>>>> [  4] local 10.15.2.2 port 59053 connected to 10.15.2.6 port 5201
>>>> [ ID] Interval           Transfer     Bandwidth       Total Datagrams
>>>> [  4]   0.00-1.00   sec  10.8 MBytes  90.3 Mbits/sec  1378
>>>> [  4]   1.00-2.00   sec  11.9 MBytes   100 Mbits/sec  1526
>>>> [  4]   2.00-3.00   sec  11.9 MBytes   100 Mbits/sec  1526
>>>> [  4]   3.00-4.00   sec  11.9 MBytes   100 Mbits/sec  1526
>>>> [  4]   4.00-5.00   sec  11.9 MBytes   100 Mbits/sec  1526
>>>> - - - - - - - - - - - - - - - - - - - - - - - - -
>>>> [ ID] Interval           Transfer     Bandwidth       Jitter    Lost/Total Datagrams
>>>> [  4]   0.00-5.00   sec  58.5 MBytes  98.1 Mbits/sec  0.047 ms  357/531 (67%)
>>>> [  4] Sent 531 datagrams
>>>>
>>>> Server output:
>>>> -----------------------------------------------------------
>>>> Accepted connection from 10.15.2.2, port 60314
>>>> [  5] local 10.15.2.6 port 5201 connected to 10.15.2.2 port 59053
>>>> [ ID] Interval           Transfer     Bandwidth       Jitter    Lost/Total Datagrams
>>>> [  5]   0.00-1.00   sec  1.36 MBytes  11.4 Mbits/sec  0.047 ms  357/531 (67%)
>>>> [  5]   1.00-2.00   sec  0.00 Bytes  0.00 bits/sec  0.047 ms  0/0 (-nan%)
>>>> [  5]   2.00-3.00   sec  0.00 Bytes  0.00 bits/sec  0.047 ms  0/0 (-nan%)
>>>> [  5]   3.00-4.00   sec  0.00 Bytes  0.00 bits/sec  0.047 ms  0/0 (-nan%)
>>>> [  5]   4.00-5.00   sec  0.00 Bytes  0.00 bits/sec  0.047 ms  0/0 (-nan%)
>>>>
>>>> iperf Done.
>>>>
>>>> The result after applying this commit
>>>> =====================================
>>>> $ sudo ip netns exec ns02 iperf3 -t 5 -i 1 -u -b 4G -c 10.15.2.6 --get-server-output -A 5
>>>> Connecting to host 10.15.2.6, port 5201
>>>> [  4] local 10.15.2.2 port 48547 connected to 10.15.2.6 port 5201
>>>> [ ID] Interval           Transfer     Bandwidth       Total Datagrams
>>>> [  4]   0.00-1.00   sec   440 MBytes  3.69 Gbits/sec  56276
>>>> [  4]   1.00-2.00   sec   481 MBytes  4.04 Gbits/sec  61579
>>>> [  4]   2.00-3.00   sec   474 MBytes  3.98 Gbits/sec  60678
>>>> [  4]   3.00-4.00   sec   480 MBytes  4.03 Gbits/sec  61452
>>>> [  4]   4.00-5.00   sec   480 MBytes  4.03 Gbits/sec  61441
>>>> - - - - - - - - - - - - - - - - - - - - - - - - -
>>>> [ ID] Interval           Transfer     Bandwidth       Jitter    Lost/Total Datagrams
>>>> [  4]   0.00-5.00   sec  2.30 GBytes  3.95 Gbits/sec  0.024 ms  0/301426 (0%)
>>>> [  4] Sent 301426 datagrams
>>>>
>>>> Server output:
>>>> -----------------------------------------------------------
>>>> Accepted connection from 10.15.2.2, port 60320
>>>> [  5] local 10.15.2.6 port 5201 connected to 10.15.2.2 port 48547
>>>> [ ID] Interval           Transfer     Bandwidth       Jitter    Lost/Total Datagrams
>>>> [  5]   0.00-1.00   sec   209 MBytes  1.75 Gbits/sec  0.021 ms  0/26704 (0%)
>>>> [  5]   1.00-2.00   sec   258 MBytes  2.16 Gbits/sec  0.025 ms  0/32967 (0%)
>>>> [  5]   2.00-3.00   sec   258 MBytes  2.16 Gbits/sec  0.022 ms  0/32987 (0%)
>>>> [  5]   3.00-4.00   sec   257 MBytes  2.16 Gbits/sec  0.023 ms  0/32954 (0%)
>>>> [  5]   4.00-5.00   sec   257 MBytes  2.16 Gbits/sec  0.021 ms  0/32937 (0%)
>>>> [  5]   5.00-6.00   sec   255 MBytes  2.14 Gbits/sec  0.026 ms  0/32685 (0%)
>>>> [  5]   6.00-7.00   sec   254 MBytes  2.13 Gbits/sec  0.025 ms  0/32453 (0%)
>>>> [  5]   7.00-8.00   sec   255 MBytes  2.14 Gbits/sec  0.026 ms  0/32679 (0%)
>>>> [  5]   8.00-9.00   sec   255 MBytes  2.14 Gbits/sec  0.022 ms  0/32669 (0%)
>>>>
>>>> iperf Done.
>>>>
>>>> Signed-off-by: Yi Yang <yangy...@inspur.com>
>>>> ---
>>>
>>> I think we should make it configurable.  Each RXQ will potentially allow a 
>>> huge number of skbuffs to be enqueued after this.  That might, ironically, 
>>> lead to worse performance (since there can be a bufferbloat effect at 
>>> higher rmem values, as documented at 
>>> https://serverfault.com/questions/410230/higher-rmem-max-value-leading-to-more-packet-loss).
>>>
>>> I think it should be a decision the operator can make.  Currently, they 
>>> can modify it anyway via procfs, so we shouldn't break that.
>>> Instead, I think there could be a config knob (or maybe reuse the
>>> 'n_{r,t}xq_desc'?) that is used when set, and otherwise the defaults are 
>>> kept.
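
[A minimal sketch of the knob Aaron describes; the helper name and the "unset means don't touch" convention are hypothetical, not actual OVS code. The point is that setsockopt() is only called when the operator explicitly configured a size, so by default the kernel's buffer auto-tuning stays enabled:]

```python
import socket

def apply_sock_buf_size(sock, configured_size=None):
    """Hypothetical knob handling: only override kernel defaults when
    the operator explicitly configured a buffer size."""
    if configured_size is None:
        # Knob unset: skip setsockopt() entirely, leaving the kernel's
        # auto-tuning (and the SO_RCVBUF "locked" flag) untouched.
        return sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, configured_size)
    return sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)

s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
default_size = apply_sock_buf_size(s)          # knob unset: left alone
tuned_size = apply_sock_buf_size(s, 1 << 20)   # knob set to 1 MiB
s.close()
print(default_size, tuned_size)
```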
>>
>> If my memory serves me right, calling setsockopt(SO_RCVBUF) disables the 
>> kernel's buffer auto-tuning, which very often hurts TCP performance, 
>> especially under load.
>>
>> --
>> fbl
>> _______________________________________________
>> dev mailing list
>> d...@openvswitch.org
>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
> 
> --
> fbl
> 
