On 5/26/21 4:34 PM, Paolo Valerio wrote:
> Hi Dumitru,
>
> Dumitru Ceara <[email protected]> writes:
>
>> On 4/28/21 11:56 PM, Paolo Valerio wrote:
>>> This patch introduces, for the userspace datapath, the handling of
>>> rules like the following:
>>>
>>> ct(commit,nat(src=0.0.0.0),...)
>>>
>>> The kernel datapath already handles this case, which is particularly
>>> handy in scenarios like the following:
>>>
>>> Given A: 10.1.1.1, B: 192.168.2.100, C: 10.1.1.2
>>>
>>> A opens a connection toward B on port 80, selecting source port
>>> 10000. B's IP gets dnat'ed to C's IP
>>> (10.1.1.1:10000 -> 192.168.2.100:80).
>>>
>>> This results in:
>>>
>>> tcp,orig=(src=10.1.1.1,dst=192.168.2.100,sport=10000,dport=80),reply=(src=10.1.1.2,dst=10.1.1.1,sport=80,dport=10000),protoinfo=(state=ESTABLISHED)
>>>
>>> A now tries to establish another connection with C using source port
>>> 10000, this time using C's IP address (10.1.1.1:10000 -> 10.1.1.2:80).
>>>
>>> This second connection, if processed by conntrack with no SNAT/DNAT
>>> involved, collides with the reverse tuple of the first connection,
>>> so the entry for this valid connection doesn't get created.
>>>
>>> With this commit, adding a SNAT rule with 0.0.0.0 for
>>> 10.1.1.1:10000 -> 10.1.1.2:80 allows the conn entry to be created:
>>>
>>> tcp,orig=(src=10.1.1.1,dst=10.1.1.2,sport=10000,dport=80),reply=(src=10.1.1.2,dst=10.1.1.1,sport=80,dport=10001),protoinfo=(state=ESTABLISHED)
>>> tcp,orig=(src=10.1.1.1,dst=192.168.2.100,sport=10000,dport=80),reply=(src=10.1.1.2,dst=10.1.1.1,sport=80,dport=10000),protoinfo=(state=ESTABLISHED)
>>>
>>> The issue also exists in the opposite case (with A trying to connect
>>> to C using B's IP after establishing a direct connection from A to C).
>>>
>>> This commit refactors the relevant function so that both of the
>>> previously mentioned cases are handled.
>>>
>>> Suggested-by: Eelco Chaudron <[email protected]>
>>> Signed-off-by: Paolo Valerio <[email protected]>
>>> ---
>>
>> Hi Paolo,
>>
>> There seems to be a functionality difference between the kernel and
>> userspace datapaths wrt. all-zero IP SNAT.
>>
>> With the sample script below, when using the kernel datapath, p1 is
>> able to connect to the VIP (DNAT) with backend p2. However, when
>> using the userspace datapath, it doesn't work, because userspace
>> conntrack will always perform an additional SNAT if two commit
>> operations happen for the same packet in the same zone.
>>
>> Arguably this pipeline could be optimized, but it's actually what OVN
>> generates today, with the addition that on the second commit we also
>> perform SNAT(0.0.0.0). OVN first executes the load balancing stage
>> (DNAT + commit) and later executes the firewalling stage (ACL); if
>> the packet should be allowed, the connection gets committed again.
>> This also covers the case when packets are not load balanced and are
>> only committed in the ACL stage.
>>
>
> The behavior is present in the userspace dp without this series.
> As of today, multiple commits appear to be handled by the kernel dp
> but not by the userspace dp.
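
To make the double-commit effect easy to see, the conntrack entries on
the userspace datapath can be dumped after running the script at the
end of this mail; a minimal check (the grep pattern simply matches the
subnet used in that script) could be:

    # Dump the userspace datapath conntrack table and keep only the
    # entries for the test subnet:
    ovs-appctl dpctl/dump-conntrack | grep '42.42.42'
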
Ok, I see. The thing is, in OVN we don't have chains of multiple
commits performing NAT in the same zone. But that will have to change
when we start using all-zero IP SNAT. So, even though this is not a new
issue, it's being hit now that we start using all-zero IP SNAT.

>
> E.g., with the current release (so no null snat involved), using your
> script and setting an IP address other than all-zero (even the same
> source IP of the sender in the orig direction) in the snat rule:
>
> ovs-ofctl add-flow br-test 'table=1,tcp actions=ct(commit,nat(src=42.42.42.3),table=2)'
>
> there's basically the same behavior for both datapaths:
> - one working entry for the kernel dp
> - two for userspace, with the subsequent failure
>
> The kernel dp tries to prevent the second one from being executed by
> caching and doing other checks, while the userspace dp doesn't take
> care of it.
>
> Of course, in this case, the example is made up and, just like you
> said, the problem can be avoided by changing the rules.

Yes, we can try to work around it in OVN, but it seems a bit artificial
to me to change the generic OpenFlow pipeline configured by OVN just so
that we avoid datapath-specific bugs.

>
>> While I think we can try and work around this in OVN by using
>> additional OpenFlow register bits to differentiate between traffic
>> that was load balanced or not, it would make it way easier if the
>> sequence:
>>
>> table=T: ct(commit, nat(dst=x.y.z.t), zone=Z, table=T+1)
>>
>> followed by (without changing conntrack zone)
>>
>> table=T+1: ct(commit, nat(src=0.0.0.0), zone=Z)
>>
>> would translate the second commit to a no-op, similar to what the
>> kernel does.
>>
>
> It would be good to prevent failures (as long as it is possible) and
> have consistent behavior between datapaths.

Yes, I think that's indeed one of the goals: consistency between
datapaths.

However, at least to my knowledge, most OVN deployments today use the
kernel datapath. So it would be great if the patch we discussed with
Aaron, adding the ct_all_zero_snat capability (or whatever name we
decide on) to the Datapath ovsdb record and marking it as supported for
the kernel datapath, made it into OVS sooner rather than later. I'm
also assuming that such a patch would be backportable to stable OVS
branches, which would unblock OVN and allow us to fix at least the
deployments that use the kernel datapath.

I went ahead and sent an OVN RFC series to implement that:
http://patchwork.ozlabs.org/project/ovn/list/?series=245843

> Nice catch though!
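
If such a capability patch lands, a consumer could probe it from the
Datapath table; a rough sketch (the capability name is the one
discussed above and may still change) would be:

    # List the datapath capabilities known to ovs-vswitchd and check
    # whether the all-zero SNAT one is advertised:
    ovs-vsctl --columns=capabilities list Datapath | grep ct_all_zero_snat
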
>
> Paolo

Thanks,
Dumitru

>
>> Thanks,
>> Dumitru
>>
>> ovs-vsctl add-br br-test
>> ovs-vsctl set bridge br-test datapath_type=netdev
>> ovs-vsctl add-port br-test p1 -- set interface p1 type=internal
>> ovs-vsctl add-port br-test p2 -- set interface p2 type=internal
>> ip netns add p1
>> ip link set p1 netns p1
>> ip netns exec p1 ip link set up dev p1
>> ip netns exec p1 ip link set p1 address 00:00:00:00:00:01
>> ip netns exec p1 ip addr add dev p1 42.42.42.1/24
>> ip netns exec p1 ip neigh add 42.42.42.42 lladdr 00:00:00:00:00:02 dev p1
>>
>> ip netns add p2
>> ip link set p2 netns p2
>> ip netns exec p2 ip link set up dev p2
>> ip netns exec p2 ip link set p2 address 00:00:00:00:00:02
>> ip netns exec p2 ip addr add dev p2 42.42.42.2/24
>>
>> ovs-ofctl del-flows br-test
>> ovs-ofctl add-flow br-test 'table=0,priority=2,in_port=1,tcp,nw_dst=42.42.42.42,tp_dst=8080 actions=ct(commit,nat(dst=42.42.42.2:80),table=1)'
>> ovs-ofctl add-flow br-test 'table=0,priority=2,in_port=2,tcp actions=ct(nat,table=1)'
>> ovs-ofctl add-flow br-test 'table=0,priority=1 actions=resubmit(,2)'
>> ovs-ofctl add-flow br-test 'table=1,tcp actions=ct(commit,nat(src=0.0.0.0),table=2)'
>> ovs-ofctl add-flow br-test 'table=2,in_port=1 actions=2'
>> ovs-ofctl add-flow br-test 'table=2,in_port=2 actions=1'
>>
>> # Start a server.
>> ip netns exec p2 nc -l -v -k 42.42.42.2 80
>>
>> # Start client1 (should succeed):
>> ip netns exec p1 nc -v -z -s 42.42.42.1 -p 4000 42.42.42.42 8080
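
To also exercise the reverse-tuple collision from the commit message
with this topology, one extra flow is needed first, since with the
rules above, traffic from p1 straight to the backend's own address only
hits the priority=1 resubmit and bypasses conntrack. A sketch (not part
of the original script; flow and probe are illustrative only):

    # Send direct p1 -> backend traffic through conntrack too, so that
    # table 1 applies the all-zero SNAT commit to it:
    ovs-ofctl add-flow br-test 'table=0,priority=2,in_port=1,tcp,nw_dst=42.42.42.2 actions=ct(nat,table=1)'

    # Start client2: same source port as client1, directly to the
    # backend; its reverse tuple collides with client1's entry unless
    # the all-zero SNAT picks a new source port:
    ip netns exec p1 nc -v -z -s 42.42.42.1 -p 4000 42.42.42.2 80
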
