On 8/31/22 17:14, Ilya Maximets wrote:
> On 8/31/22 17:12, Han Zhou wrote:
>>
>>
>> On Wed, Aug 31, 2022 at 2:06 AM Ilya Maximets <[email protected]> wrote:
>>>
>>> On 8/31/22 01:32, Han Zhou wrote:
>>>>
>>>>
>>>> On Tue, Aug 30, 2022 at 9:35 AM Ilya Maximets <[email protected]> wrote:
>>>>>
>>>>> On 8/24/22 08:40, Han Zhou wrote:
>>>>>> The ls_in_pre_stateful priority 120 flow that saves dst IP and Port to
>>>>>> registers is causing a critical dataplane performance impact to
>>>>>> short-lived connections, because it unwildcards megaflows with exact
>>>>>> match on dst IP and L4 ports. Any new connections with a different
>>>>>> client side L4 port will encounter datapath flow miss and upcall to
>>>>>> ovs-vswitchd, which makes typical use cases such as HTTP/1.0 based
>>>>>> RESTful API calls suffer big performance degradations.
>>>>>>
>>>>>> These fields (dst IP and port) were saved to registers to solve a
>>>>>> problem in the LB hairpin use case when different VIPs share an
>>>>>> overlapping backend+port [0]. The change [0] might not have had as wide
>>>>>> a performance impact at the time, because one of the match conditions,
>>>>>> "REGBIT_CONNTRACK_NAT == 1", was set only for established and natted
>>>>>> traffic, while now the impact is more obvious because
>>>>>> REGBIT_CONNTRACK_NAT is set for all IP traffic (if any VIP is
>>>>>> configured on the LS) since commit [1], after several other indirectly
>>>>>> related optimizations and refactors.
>>>>>>
>>>>>> This patch fixes the problem by modifying the priority-120 flows in
>>>>>> ls_in_pre_stateful. Instead of blindly saving dst IP and L4 port for any
>>>>>> traffic with REGBIT_CONNTRACK_NAT == 1, we now save dst IP and L4
>>>>>> port only for traffic matching the LB VIPs, because these are the ones
>>>>>> that need to be saved for the hairpin purpose. The existing priority-110
>>>>>> flows will match the rest of the traffic just like before but will not
>>>>>> save dst IP and L4 port, so server->client traffic will not unwildcard
>>>>>> megaflows with client-side L4 ports.
>>>>>
>>>>> Hmm, but if higher priority flows have matches on these fields, datapath
>>>>> flows will have them unwildcarded anyway.  So, why exactly that is better
>>>>> than the current approach?
>>>>>
>>>> Hi Ilya,
>>>>
>>>> The problem with the current approach is that it blindly saves the L4 dst
>>>> port for any traffic in any direction, as long as there are VIPs
>>>> configured on the datapath.
>>>> So consider the most typical scenario of a client sending API requests to
>>>> server backends behind a VIP. On the server side, any *reply* packets
>>>> would hit the flow that saves the client-side L4 port, because in the
>>>> server->client direction the client port is the dst. If the client sends
>>>> 10 requests, each with a different source port, the server side will end
>>>> up with unwildcarded DP flows like below (192.168.1.2 is the client IP):
>>>> recirc_id(0),in_port(4),eth(src=00:00:00:00:00:00/01:00:00:00:00:00,dst=00:00:01:01:02:04),eth_type(0x0800),ipv4(dst=192.168.1.2,proto=6,frag=no),tcp(dst=51224),
>>>>  packets:5, bytes:2475, used:1.118s, flags:FP., 
>>>> actions:ct(zone=8,nat),recirc(0x20)
>>>> recirc_id(0),in_port(4),eth(src=00:00:00:00:00:00/01:00:00:00:00:00,dst=00:00:01:01:02:04),eth_type(0x0800),ipv4(dst=192.168.1.2,proto=6,frag=no),tcp(dst=51226),
>>>>  packets:5, bytes:2475, used:1.105s, flags:FP., 
>>>> actions:ct(zone=8,nat),recirc(0x21)
>>>> recirc_id(0),in_port(4),eth(src=00:00:00:00:00:00/01:00:00:00:00:00,dst=00:00:01:01:02:04),eth_type(0x0800),ipv4(dst=192.168.1.2,proto=6,frag=no),tcp(dst=37798),
>>>>  packets:5, bytes:2475, used:0.574s, flags:FP., 
>>>> actions:ct(zone=8,nat),recirc(0x40)
>>>> recirc_id(0),in_port(4),eth(src=00:00:00:00:00:00/01:00:00:00:00:00,dst=00:00:01:01:02:04),eth_type(0x0800),ipv4(dst=192.168.1.2,proto=6,frag=no),tcp(dst=51250),
>>>>  packets:5, bytes:2475, used:0.872s, flags:FP., 
>>>> actions:ct(zone=8,nat),recirc(0x2d)
>>>> recirc_id(0),in_port(4),eth(src=00:00:00:00:00:00/01:00:00:00:00:00,dst=00:00:01:01:02:04),eth_type(0x0800),ipv4(dst=192.168.1.2,proto=6,frag=no),tcp(dst=46940),
>>>>  packets:5, bytes:2475, used:0.109s, flags:FP., 
>>>> actions:ct(zone=8,nat),recirc(0x60)
>>>> recirc_id(0),in_port(4),eth(src=00:00:00:00:00:00/01:00:00:00:00:00,dst=00:00:01:01:02:04),eth_type(0x0800),ipv4(dst=192.168.1.2,proto=6,frag=no),tcp(dst=46938),
>>>>  packets:5, bytes:2475, used:0.118s, flags:FP., 
>>>> actions:ct(zone=8,nat),recirc(0x5f)
>>>> recirc_id(0),in_port(4),eth(src=00:00:00:00:00:00/01:00:00:00:00:00,dst=00:00:01:01:02:04),eth_type(0x0800),ipv4(dst=192.168.1.2,proto=6,frag=no),tcp(dst=51236),
>>>>  packets:5, bytes:2475, used:0.938s, flags:FP., 
>>>> actions:ct(zone=8,nat),recirc(0x26)
>>>> ...
>>>>
>>>> As a result, DP flows explode and every new request is going to be a miss
>>>> and upcall to userspace, which is very inefficient. Even worse, since the
>>>> flow is so generic, even traffic unrelated to the VIP has the same impact,
>>>> as long as a server on an LS with any VIP configured is replying to client
>>>> requests.
>>>> With the fix, only the client->VIP packets would hit such flows, and in
>>>> those cases the dst port is the server's (well-known) port, which is
>>>> expected to be matched in megaflows anyway, while the client-side port is
>>>> not unwildcarded, so new requests/replies will match megaflows in the fast
>>>> path.
>>>> The above megaflows become:
>>>> recirc_id(0),in_port(4),eth(src=00:00:00:00:00:00/01:00:00:00:00:00,dst=00:00:01:01:02:04),eth_type(0x0800),ipv4(dst=128.0.0.0/128.0.0.0,frag=no),
>>>>  packets:263, bytes:112082, used:0.013s, flags:SFP.,
>>>> actions:ct(zone=8,nat),recirc(0xd)
>>>
>>>
>>> Oh, OK.  Thanks for the explanation!
>>>
>>> So, it's reply traffic, and it will not have too much unwildcarded at the
>>> L3 level since, I suppose, it typically has a destination address in a
>>> different subnet.
>>
>> After the fix, yes. Before the fix, no, because of the flow that saves dst 
>> IP and port to registers.
>>
>>> So, the ipv4 trie on addresses cuts off the rest of the L3/L4 headers
>>> including source ip and the ports from the match criteria.
>>
>> Sorry, I am not sure I understand your question here.
>> If you are talking about the server(source)->client(destination) direction, 
>> for the source/server ip and port, this is correct (before and after the 
>> fix).
>> If you are talking about the client ip and ports, it is the case after the 
>> fix, but not before the fix.
> 
> 
> I meant after the fix.  Ack.

"Ack" as if I got the explanation.  I didn't review the code itself. :)

> Thanks for clarification!
> 
>>
>> Thanks,
>> Han
>>
>>>
>>> Did I get that right?
>>>
>>>>
>>>> Thanks,
>>>> Han
>>>>
>>>>> I see how that can help for the case where vIPs have no ports specified,
>>>>> because we will not have ports unwildcarded in this case, but I thought
>>>>> that's a very unlikely scenario for, e.g., ovn-kubernetes setups.  And if
>>>>> even one vIP has a port, all the datapath flows will have a port
>>>>> match.  Or am I missing something?
>>>>>
>>>>> Best regards, Ilya Maximets.
>>>
> 

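For readers without the patch handy, the shape of the flows under discussion is roughly the following. This is illustrative only: the table number, register bit, and actions are reconstructed from the discussion rather than quoted from northd, and the VIP 10.0.0.10:80 is hypothetical.

```
# Before: priority-120 saves dst IP/port for ALL traffic with
# REGBIT_CONNTRACK_NAT (reg0[2]) set, unwildcarding ip4.dst and tcp.dst.
table=?(ls_in_pre_stateful), priority=120,
  match=(reg0[2] == 1 && ip4 && tcp),
  action=(reg1 = ip4.dst; reg2[0..15] = tcp.dst; ct_next;)

# After: priority-120 matches only configured VIP:port tuples, so reply
# and non-VIP traffic falls through to the generic priority-110 flow,
# which does not touch the registers.
table=?(ls_in_pre_stateful), priority=120,
  match=(reg0[2] == 1 && ip4.dst == 10.0.0.10 && tcp.dst == 80),
  action=(reg1 = ip4.dst; reg2[0..15] = tcp.dst; ct_next;)
table=?(ls_in_pre_stateful), priority=110,
  match=(reg0[2] == 1),
  action=(ct_next;)
```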
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
