On 8/31/22 17:12, Han Zhou wrote:
>
>
> On Wed, Aug 31, 2022 at 2:06 AM Ilya Maximets <[email protected]> wrote:
>>
>> On 8/31/22 01:32, Han Zhou wrote:
>> >
>> >
>> > On Tue, Aug 30, 2022 at 9:35 AM Ilya Maximets <[email protected]> wrote:
>> >>
>> >> On 8/24/22 08:40, Han Zhou wrote:
>> >> > The ls_in_pre_stateful priority 120 flow that saves dst IP and port to
>> >> > registers is causing a critical dataplane performance impact to
>> >> > short-lived connections, because it unwildcards megaflows with exact
>> >> > match on dst IP and L4 ports.  Any new connection with a different
>> >> > client side L4 port will encounter a datapath flow miss and upcall to
>> >> > ovs-vswitchd, which makes typical use cases such as HTTP/1.0 based
>> >> > RESTful API calls suffer big performance degradations.
>> >> >
>> >> > These fields (dst IP and port) were saved to registers to solve a
>> >> > problem of the LB hairpin use case when different VIPs share
>> >> > overlapping backend+port [0].  The change [0] might not have had as
>> >> > wide a performance impact back then, because at that time one of the
>> >> > match conditions, "REGBIT_CONNTRACK_NAT == 1", was set only for
>> >> > established and natted traffic, while now the impact is more obvious
>> >> > because REGBIT_CONNTRACK_NAT has been set for all IP traffic (if any
>> >> > VIP is configured on the LS) since commit [1], after several other
>> >> > indirectly related optimizations and refactors.
>> >> >
>> >> > This patch fixes the problem by modifying the priority-120 flows in
>> >> > ls_in_pre_stateful.  Instead of blindly saving dst IP and L4 port for
>> >> > any traffic with REGBIT_CONNTRACK_NAT == 1, we now save dst IP and L4
>> >> > port only for traffic matching the LB VIPs, because these are the ones
>> >> > that need to be saved for the hairpin purpose.
>> >> > The existing priority-110
>> >> > flows will match the rest of the traffic just like before, but will
>> >> > not save dst IP and L4 port, so any server->client traffic will not
>> >> > unwildcard megaflows with client side L4 ports.
>> >>
>> >> Hmm, but if higher priority flows have matches on these fields, datapath
>> >> flows will have them unwildcarded anyway.  So, why exactly is that
>> >> better than the current approach?
>> >>
>> > Hi Ilya,
>> >
>> > The problem with the current approach is that it blindly saves the L4 dst
>> > port for any traffic in any direction, as long as there are VIPs
>> > configured on the datapath.
>> > So consider the most typical scenario of a client sending API requests to
>> > server backends behind a VIP.  On the server side, any *reply* packets
>> > would hit the flow that saves the client side L4 port, because for the
>> > server->client direction the client port is the dst.  If the client sends
>> > 10 requests, each with a different source port, the server side will end
>> > up with unwildcarded DP flows like below (192.168.1.2 is the client IP):
>> > recirc_id(0),in_port(4),eth(src=00:00:00:00:00:00/01:00:00:00:00:00,dst=00:00:01:01:02:04),eth_type(0x0800),ipv4(dst=192.168.1.2,proto=6,frag=no),tcp(dst=51224),
>> > packets:5, bytes:2475, used:1.118s, flags:FP.,
>> > actions:ct(zone=8,nat),recirc(0x20)
>> > recirc_id(0),in_port(4),eth(src=00:00:00:00:00:00/01:00:00:00:00:00,dst=00:00:01:01:02:04),eth_type(0x0800),ipv4(dst=192.168.1.2,proto=6,frag=no),tcp(dst=51226),
>> > packets:5, bytes:2475, used:1.105s, flags:FP.,
>> > actions:ct(zone=8,nat),recirc(0x21)
>> > recirc_id(0),in_port(4),eth(src=00:00:00:00:00:00/01:00:00:00:00:00,dst=00:00:01:01:02:04),eth_type(0x0800),ipv4(dst=192.168.1.2,proto=6,frag=no),tcp(dst=37798),
>> > packets:5, bytes:2475, used:0.574s, flags:FP.,
>> > actions:ct(zone=8,nat),recirc(0x40)
>> > recirc_id(0),in_port(4),eth(src=00:00:00:00:00:00/01:00:00:00:00:00,dst=00:00:01:01:02:04),eth_type(0x0800),ipv4(dst=192.168.1.2,proto=6,frag=no),tcp(dst=51250),
>> > packets:5, bytes:2475, used:0.872s, flags:FP.,
>> > actions:ct(zone=8,nat),recirc(0x2d)
>> > recirc_id(0),in_port(4),eth(src=00:00:00:00:00:00/01:00:00:00:00:00,dst=00:00:01:01:02:04),eth_type(0x0800),ipv4(dst=192.168.1.2,proto=6,frag=no),tcp(dst=46940),
>> > packets:5, bytes:2475, used:0.109s, flags:FP.,
>> > actions:ct(zone=8,nat),recirc(0x60)
>> > recirc_id(0),in_port(4),eth(src=00:00:00:00:00:00/01:00:00:00:00:00,dst=00:00:01:01:02:04),eth_type(0x0800),ipv4(dst=192.168.1.2,proto=6,frag=no),tcp(dst=46938),
>> > packets:5, bytes:2475, used:0.118s, flags:FP.,
>> > actions:ct(zone=8,nat),recirc(0x5f)
>> > recirc_id(0),in_port(4),eth(src=00:00:00:00:00:00/01:00:00:00:00:00,dst=00:00:01:01:02:04),eth_type(0x0800),ipv4(dst=192.168.1.2,proto=6,frag=no),tcp(dst=51236),
>> > packets:5, bytes:2475, used:0.938s, flags:FP.,
>> > actions:ct(zone=8,nat),recirc(0x26)
>> > ...
>> >
>> > As a result, DP flows explode and every new request is going to be a miss
>> > and an upcall to userspace, which is very inefficient.  Even worse, as
>> > the flow is so generic, even traffic unrelated to the VIP would have the
>> > same impact, as long as a server on an LS with any VIP configured is
>> > replying to client requests.
>> > With the fix, only the client->VIP packets would hit such flows, and in
>> > those cases the dst port is the server's (well-known) port, which is
>> > expected to be matched in megaflows anyway, while the client side port is
>> > not unwildcarded, so new requests/replies will match megaflows in the
>> > fast path.
>> > The above megaflows become:
>> > recirc_id(0),in_port(4),eth(src=00:00:00:00:00:00/01:00:00:00:00:00,dst=00:00:01:01:02:04),eth_type(0x0800),ipv4(dst=128.0.0.0/128.0.0.0,frag=no),
>> > packets:263, bytes:112082, used:0.013s, flags:SFP.,
>> > actions:ct(zone=8,nat),recirc(0xd)
>>
>> Oh, OK.  Thanks for the explanation!
>>
>> So, it's reply traffic, and it will not have matches on the L3 level
>> unwildcarded too much since, I suppose, it typically has a destination
>> address in a different subnet.
>
> After the fix, yes.  Before the fix, no, because of the flow that saves dst
> IP and port to registers.
>
>> So, the ipv4 trie on addresses cuts off the rest of the L3/L4 headers,
>> including source IP and the ports, from the match criteria.
>
> Sorry, I am not sure I understand your question here.
> If you are talking about the server(source)->client(destination) direction,
> then for the source/server IP and port this is correct (both before and
> after the fix).
> If you are talking about the client IP and ports, it is the case after the
> fix, but not before the fix.
I meant after the fix.  Ack.  Thanks for the clarification!

> Thanks,
> Han
>
>>
>> Did I get that right?
>>
>> >
>> > Thanks,
>> > Han
>> >
>> >> I see how that can help for the case where VIPs have no ports specified,
>> >> because we will not have ports unwildcarded in that case, but I thought
>> >> that's a very unlikely scenario for, e.g., ovn-kubernetes setups.  And
>> >> even if one VIP has a port, all the datapath flows will have a port
>> >> match.  Or am I missing something?
>> >>
>> >> Best regards, Ilya Maximets.
>> _______________________________________________
>> dev mailing list
>> [email protected]
>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
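[Editor's illustration] The megaflow-cache effect discussed in this thread can be sketched with a small toy model.  This is only an illustration under simplified assumptions: real megaflows use bit masks rather than whole-field exact matches, and the `MegaflowCache` class and its field names are invented for the sketch, not OVS internals.

```python
# Toy model of a megaflow cache: a flow covers every packet that has the
# same values in the *unwildcarded* fields.  A lookup miss stands in for
# a userspace upcall to ovs-vswitchd.
class MegaflowCache:
    def __init__(self, match_fields):
        self.match_fields = match_fields  # fields the flows unwildcard
        self.flows = set()
        self.upcalls = 0                  # misses that go to userspace

    def lookup(self, packet):
        key = tuple(sorted((f, packet[f]) for f in self.match_fields))
        if key not in self.flows:
            self.upcalls += 1             # flow miss -> upcall, install flow
            self.flows.add(key)

# Server->client replies: the dst (client-side) L4 port differs per request.
replies = [{"ip_dst": "192.168.1.2", "tcp_dst": 51224 + i} for i in range(10)]

# Before the fix: reply traffic unwildcards dst IP *and* L4 dst port,
# so every reply with a new client port is a miss.
before = MegaflowCache(match_fields=["ip_dst", "tcp_dst"])
for pkt in replies:
    before.lookup(pkt)

# After the fix: reply traffic no longer unwildcards the L4 dst port,
# so a single megaflow absorbs all ten replies.
after = MegaflowCache(match_fields=["ip_dst"])
for pkt in replies:
    after.lookup(pkt)

print(before.upcalls, len(before.flows))  # 10 upcalls, 10 flows
print(after.upcalls, len(after.flows))    # 1 upcall, 1 flow
```

With an exact match on the client-side port, every short-lived connection pays an upcall, which is the flow explosion shown in the dump above; leaving the port wildcarded lets one megaflow serve all replies.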
