Dumitru Ceara <[email protected]> writes:

> On 5/26/21 4:34 PM, Paolo Valerio wrote:
>> Hi Dumitru,
>> 
>> Dumitru Ceara <[email protected]> writes:
>> 
>>> On 4/28/21 11:56 PM, Paolo Valerio wrote:
>>>> this patch introduces for the userspace datapath the handling
>>>> of rules like the following:
>>>>
>>>> ct(commit,nat(src=0.0.0.0),...)
>>>>
>>>> Kernel datapath already handle this case that is particularly
>>>> handy in scenarios like the following:
>>>>
>>>> Given A: 10.1.1.1, B: 192.168.2.100, C: 10.1.1.2
>>>>
>>>> A opens a connection toward B on port 80 selecting as source port 10000.
>>>> B's IP gets dnat'ed to C's IP (10.1.1.1:10000 -> 192.168.2.100:80).
>>>>
>>>> This will result in:
>>>>
>>>> tcp,orig=(src=10.1.1.1,dst=192.168.2.100,sport=10000,dport=80),reply=(src=10.1.1.2,dst=10.1.1.1,sport=80,dport=10000),protoinfo=(state=ESTABLISHED)
>>>>
>>>> A now tries to establish another connection with C using source port
>>>> 10000, this time using C's IP address (10.1.1.1:10000 -> 10.1.1.2:80).
>>>>
>>>> This second connection, if processed by conntrack with no SNAT/DNAT
>>>> involved, collides with the reverse tuple of the first connection,
>>>> so the entry for this valid connection doesn't get created.
>>>>
>>>> With this commit, and adding a SNAT rule with 0.0.0.0 for
>>>> 10.1.1.1:10000 -> 10.1.1.2:80 will allow to create the conn entry:
>>>>
>>>> tcp,orig=(src=10.1.1.1,dst=10.1.1.2,sport=10000,dport=80),reply=(src=10.1.1.2,dst=10.1.1.1,sport=80,dport=10001),protoinfo=(state=ESTABLISHED)
>>>> tcp,orig=(src=10.1.1.1,dst=192.168.2.100,sport=10000,dport=80),reply=(src=10.1.1.2,dst=10.1.1.1,sport=80,dport=10000),protoinfo=(state=ESTABLISHED)
>>>>
>>>> The issue exists even in the opposite case (with A trying to connect
>>>> to C using B's IP after establishing a direct connection from A to C).
>>>>
>>>> This commit refactors the relevant function in a way that both of the
>>>> previously mentioned cases are handled as well.
>>>>
>>>> Suggested-by: Eelco Chaudron <[email protected]>
>>>> Signed-off-by: Paolo Valerio <[email protected]>
>>>> ---
>>>
>>> Hi Paolo,
>>>
>>> There seems to be a functionality difference between kernel and
>>> userspace datapath, wrt. all-zero IP SNAT.
>>>
>>> With the sample script below, when using the kernel datapath, p1 is able
>>> to connect to the VIP (DNAT) with backend p2.  However, when using the
>>> userspace datapath, it doesn't work because userspace conntrack will
>>> always perform an additional SNAT if two commit operations happen for
>>> the same packet in the same zone.
>>>
>>> Arguably this pipeline could be optimized, but it's actually what OVN
>>> generates today, with the addition that on the second commit we also
>>> perform SNAT(0.0.0.0).  OVN first executes the load balancing stage
>>> (DNAT + commit) and later executes the firewalling stage (ACL) and if
>>> the packet should be allowed the connection gets committed again.  This
>>> also covers the case when packets are not load balanced, and only
>>> committed in the ACL stage.
>>>
>> 
>> The behavior is present in the userspace dp without this series.
>> As of today, multiple commits appear to be managed in kernel dp but not
>> in userspace dp.
>
> Ok, I see.  The thing is in OVN we don't have chains of multiple commits
> performing NAT in the same zone.  But that will have to change when we
> start using all-zero IP SNAT.  So, even though this is not a new issue,
> it's being hit now that we start using all-zero IP SNAT.
>

Absolutely, that was just useful to know that a possible fix should be done in
a different patch, IMO.

>> 
>> E.g. with the current release (so no null snat involved), using your
>> script, setting an IP address other than all-zero (even the same source
>> IP of the sender in the orig direction) in the snat rule:
>> 
>> ovs-ofctl add-flow br-test 'table=1,tcp 
>> actions=ct(commit,nat(src=42.42.42.3),table=2)'
>> 
>> there's basically the same behavior for both datapaths:
>> - one working entry for kernel dp
>> - two for userspace with the subsequent failure
>> 
>> Kernel dp tries to prevent the second one from being executed
>> caching and doing other checks, while the userspace dp doesn't take
>> care of it.
>> 
>> Of course, in this case, the example is made up and just like you said,
>> the problem can be avoided changing the rules.
>
> Yes, we can try to work around in OVN but it seems a bit artificial to
> me to change the generic openflow pipeline configured by OVN just so
> that we avoid datapath-specific bugs.
>

I see.

>> 
>>> While I think we can try and work around this in OVN by using additional
>>> openflow register bits to differentiate between traffic that was load
>>> balanced or not, it would make it way easier if the sequence:
>>>
>>> table=T: ct(commit, nat(dst=x.y.z.t), zone=Z, table=T+1)
>>>
>>> followed by (without changing conntrack zone)
>>>
>>> table=T+1: ct(commit, nat(src=0.0.0.0), zone=Z)
>>>
>>> would translate the second commit to a no-op similar to what the kernel
>>> does.
>>>
>> 
>> It would be good to prevent failures (as long as it is possible) and
>> have a consistent behavior between datapaths.
>
> Yes, I think that's one of the goals indeed, consistency between
> datapaths.  However, at least from my knowledge, most OVN deployments
> today use the kernel datapath.  So it would be great if the patch we
> discussed with Aaron, to add the ct_all_zero_snat (or whatever name we
> decide on) capability to the Datapath ovsdb record and mark it as
> supported for the kernel datapath, would make it in OVS earlier rather
> than later.
>

Yes, I agree, that would be very useful.

> I'm also assuming that such a patch would be backportable to stable OVS
> branches which would unblock OVN and allow us to fix at least the
> deployments that use the kernel datapath.
>
> I went ahead and sent an OVN RFC series to implement that:
>
> http://patchwork.ozlabs.org/project/ovn/list/?series=245843
>

great! Thank you for that.

>> 
>> Nice catch though!
>> 
>> Paolo
>
> Thanks,
> Dumitru
>
>> 
>>> Thanks,
>>> Dumitru
>>>
>>> ovs-vsctl add-br br-test
>>> ovs-vsctl set bridge br-test datapath_type=netdev
>>> ovs-vsctl add-port br-test p1 -- set interface p1 type=internal
>>> ovs-vsctl add-port br-test p2 -- set interface p2 type=internal
>>> ip netns add p1
>>> ip link set p1 netns p1
>>> ip netns exec p1 ip link set up dev p1
>>> ip netns exec p1 ip link set p1 address 00:00:00:00:00:01
>>> ip netns exec p1 ip addr add dev p1 42.42.42.1/24
>>> ip netns exec p1 ip neigh add 42.42.42.42 lladdr 00:00:00:00:00:02 dev p1
>>>
>>> ip netns add p2
>>> ip link set p2 netns p2
>>> ip netns exec p2 ip link set up dev p2
>>> ip netns exec p2 ip link set p2 address 00:00:00:00:00:02
>>> ip netns exec p2 ip addr add dev p2 42.42.42.2/24
>>>
>>> ovs-ofctl del-flows br-test
>>> ovs-ofctl add-flow br-test 
>>> 'table=0,priority=2,in_port=1,tcp,nw_dst=42.42.42.42,tp_dst=8080 
>>> actions=ct(commit,nat(dst=42.42.42.2:80),table=1)'
>>> ovs-ofctl add-flow br-test 'table=0,priority=2,in_port=2,tcp 
>>> actions=ct(nat,table=1)'
>>> ovs-ofctl add-flow br-test 'table=0,priority=1 actions=resubmit(,2)'
>>> ovs-ofctl add-flow br-test 'table=1,tcp 
>>> actions=ct(commit,nat(src=0.0.0.0),table=2)'
>>> ovs-ofctl add-flow br-test 'table=2,in_port=1 actions=2'
>>> ovs-ofctl add-flow br-test 'table=2,in_port=2 actions=1'
>>>
>>> # Start a server.
>>> ip netns exec p2 nc -l -v -k 42.42.42.2 80
>>>
>>> # Start client1 (should succeed):
>>> ip netns exec p1 nc -v -z -s 42.42.42.1 -p 4000 42.42.42.42 8080
>> 

_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to