Dumitru Ceara <[email protected]> writes: > On 5/26/21 4:34 PM, Paolo Valerio wrote: >> Hi Dumitru, >> >> Dumitru Ceara <[email protected]> writes: >> >>> On 4/28/21 11:56 PM, Paolo Valerio wrote: >>>> this patch introduces for the userspace datapath the handling >>>> of rules like the following: >>>> >>>> ct(commit,nat(src=0.0.0.0),...) >>>> >>>> Kernel datapath already handle this case that is particularly >>>> handy in scenarios like the following: >>>> >>>> Given A: 10.1.1.1, B: 192.168.2.100, C: 10.1.1.2 >>>> >>>> A opens a connection toward B on port 80 selecting as source port 10000. >>>> B's IP gets dnat'ed to C's IP (10.1.1.1:10000 -> 192.168.2.100:80). >>>> >>>> This will result in: >>>> >>>> tcp,orig=(src=10.1.1.1,dst=192.168.2.100,sport=10000,dport=80),reply=(src=10.1.1.2,dst=10.1.1.1,sport=80,dport=10000),protoinfo=(state=ESTABLISHED) >>>> >>>> A now tries to establish another connection with C using source port >>>> 10000, this time using C's IP address (10.1.1.1:10000 -> 10.1.1.2:80). >>>> >>>> This second connection, if processed by conntrack with no SNAT/DNAT >>>> involved, collides with the reverse tuple of the first connection, >>>> so the entry for this valid connection doesn't get created. >>>> >>>> With this commit, and adding a SNAT rule with 0.0.0.0 for >>>> 10.1.1.1:10000 -> 10.1.1.2:80 will allow to create the conn entry: >>>> >>>> tcp,orig=(src=10.1.1.1,dst=10.1.1.2,sport=10000,dport=80),reply=(src=10.1.1.2,dst=10.1.1.1,sport=80,dport=10001),protoinfo=(state=ESTABLISHED) >>>> tcp,orig=(src=10.1.1.1,dst=192.168.2.100,sport=10000,dport=80),reply=(src=10.1.1.2,dst=10.1.1.1,sport=80,dport=10000),protoinfo=(state=ESTABLISHED) >>>> >>>> The issue exists even in the opposite case (with A trying to connect >>>> to C using B's IP after establishing a direct connection from A to C). >>>> >>>> This commit refactors the relevant function in a way that both of the >>>> previously mentioned cases are handled as well. >>>> >>>> Suggested-by: Eelco Chaudron <[email protected]> >>>> Signed-off-by: Paolo Valerio <[email protected]> >>>> --- >>> >>> Hi Paolo, >>> >>> There seems to be a functionality difference between kernel and >>> userspace datapath, wrt. all-zero IP SNAT. >>> >>> With the sample script below, when using the kernel datapath, p1 is able >>> to connect to the VIP (DNAT) with backend p2. However, when using the >>> userspace datapath, it doesn't work because userspace conntrack will >>> always perform an additional SNAT if two commit operations happen for >>> the same packet in the same zone. >>> >>> Arguably this pipeline could be optimized, but it's actually what OVN >>> generates today, with the addition that on the second commit we also >>> perform SNAT(0.0.0.0). OVN first executes the load balancing stage >>> (DNAT + commit) and later executes the firewalling stage (ACL) and if >>> the packet should be allowed the connection gets committed again. This >>> also covers the case when packets are not load balanced, and only >>> committed in the ACL stage. >>> >> >> The behavior is present in the userspace dp without this series. >> As of today, multiple commits appear to be managed in kernel dp but not >> in userspace dp. > > Ok, I see. The thing is in OVN we don't have chains of multiple commits > performing NAT in the same zone. But that will have to change when we > start using all-zero IP SNAT. So, even though this is not a new issue, > it's being hit now that we start using all-zero IP SNAT. >
Absolutely, that was just useful to know that a possible fix should be done in a different patch, IMO. >> >> E.g. with the current release (so no null snat involved), using your >> script, setting an IP address other than all-zero (even the same source >> IP of the sender in the orig direction) in the snat rule: >> >> ovs-ofctl add-flow br-test 'table=1,tcp >> actions=ct(commit,nat(src=42.42.42.3),table=2)' >> >> there's basically the same behavior for both datapaths: >> - one working entry for kernel dp >> - two for userspace with the subsequent failure >> >> Kernel dp tries to prevent the second one from being executed >> caching and doing other checks, while the userspace dp doesn't take >> care of it. >> >> Of course, in this case, the example is made up and just like you said, >> the problem can be avoided changing the rules. > > Yes, we can try to work around in OVN but it seems a bit artificial to > me to change the generic openflow pipeline configured by OVN just so > that we avoid datapath-specific bugs. > I see. >> >>> While I think we can try and work around this in OVN by using additional >>> openflow register bits to differentiate between traffic that was load >>> balanced or not, it would make it way easier if the sequence: >>> >>> table=T: ct(commit, nat(dst=x.y.z.t), zone=Z, table=T+1) >>> >>> followed by (without changing conntrack zone) >>> >>> table=T+1: ct(commit, nat(src=0.0.0.0), zone=Z) >>> >>> would translate the second commit to a no-op similar to what the kernel >>> does. >>> >> >> It would be good to prevent failures (as long as it is possible) and >> have a consistent behavior between datapaths. > > Yes, I think that's one of the goals indeed, consistency between > datapaths. However, at least from my knowledge, most OVN deployments > today use the kernel datapath. So it would be great if the patch we > discussed with Aaron, to add the ct_all_zero_snat (or whatever name we > decide on) capability to the Datapath ovsdb record and mark it as > supported for the kernel datapath, would make it in OVS earlier rather > than later. > Yes, I agree, that would be very useful. > I'm also assuming that such a patch would be backportable to stable OVS > branches which would unblock OVN and allow us to fix at least the > deployments that use the kernel datapath. > > I went ahead and sent an OVN RFC series to implement that: > > http://patchwork.ozlabs.org/project/ovn/list/?series=245843 > great! Thank you for that. >> >> Nice catch though! >> >> Paolo > > Thanks, > Dumitru > >> >>> Thanks, >>> Dumitru >>> >>> ovs-vsctl add-br br-test >>> ovs-vsctl set bridge br-test datapath_type=netdev >>> ovs-vsctl add-port br-test p1 -- set interface p1 type=internal >>> ovs-vsctl add-port br-test p2 -- set interface p2 type=internal >>> ip netns add p1 >>> ip link set p1 netns p1 >>> ip netns exec p1 ip link set up dev p1 >>> ip netns exec p1 ip link set p1 address 00:00:00:00:00:01 >>> ip netns exec p1 ip addr add dev p1 42.42.42.1/24 >>> ip netns exec p1 ip neigh add 42.42.42.42 lladdr 00:00:00:00:00:02 dev p1 >>> >>> ip netns add p2 >>> ip link set p2 netns p2 >>> ip netns exec p2 ip link set up dev p2 >>> ip netns exec p2 ip link set p2 address 00:00:00:00:00:02 >>> ip netns exec p2 ip addr add dev p2 42.42.42.2/24 >>> >>> ovs-ofctl del-flows br-test >>> ovs-ofctl add-flow br-test >>> 'table=0,priority=2,in_port=1,tcp,nw_dst=42.42.42.42,tp_dst=8080 >>> actions=ct(commit,nat(dst=42.42.42.2:80),table=1)' >>> ovs-ofctl add-flow br-test 'table=0,priority=2,in_port=2,tcp >>> actions=ct(nat,table=1)' >>> ovs-ofctl add-flow br-test 'table=0,priority=1 actions=resubmit(,2)' >>> ovs-ofctl add-flow br-test 'table=1,tcp >>> actions=ct(commit,nat(src=0.0.0.0),table=2)' >>> ovs-ofctl add-flow br-test 'table=2,in_port=1 actions=2' >>> ovs-ofctl add-flow br-test 'table=2,in_port=2 actions=1' >>> >>> # Start a server. >>> ip netns exec p2 nc -l -v -k 42.42.42.2 80 >>> >>> # Start client1 (should succeed): >>> ip netns exec p1 nc -v -z -s 42.42.42.1 -p 4000 42.42.42.42 8080 >> _______________________________________________ dev mailing list [email protected] https://mail.openvswitch.org/mailman/listinfo/ovs-dev
