On 5/21/22 12:49, Frode Nordahl wrote:
> On Thu, May 19, 2022 at 3:39 PM Frode Nordahl
> <[email protected]> wrote:
>>
>> On Sat, May 14, 2022 at 2:10 AM Ilya Maximets <[email protected]> wrote:
>>>
>>> On 5/13/22 10:36, Frode Nordahl wrote:
>>>> On Fri, Mar 11, 2022 at 2:04 PM Liam Young <[email protected]>
>>>> wrote:
>>>>>
>>>>> Hi,
>>>>>
>>>>> <tl;dr> Commit 355fef6f2 seems to break connectivity in my setup</tl;dr>
>>>>
>>>> After OVN started colocating sNAT and dNAT operations in the same CT
>>>> zone [0], the above-mentioned OVS commit appears to also break OVN [1].
>>>> I have been discussing this with Numan and he has found a correlation
>>>> with the behavior of the Open vSwitch kernel datapath conntrack
>>>> `skb_nfct_cached` function [2], i.e. if that is changed to always
>>>> return 'false' the erratic behavior disappears.
>>>>
>>>> So I guess the question then becomes: is this an OVS userspace or an
>>>> OVS kernel problem?
>>>>
>>>> We have a reproducer in [3].
>>>>
>>>> 0:
>>>> https://github.com/ovn-org/ovn/commit/4deac4509abbedd6ffaecf27eed01ddefccea40a
>>>> 1: https://bugs.launchpad.net/ubuntu/+source/ovn/+bug/1967856
>>>> 2:
>>>> https://elixir.bootlin.com/linux/latest/source/net/openvswitch/conntrack.c#L683
>>>> 3: https://bugs.launchpad.net/ubuntu/+source/ovn/+bug/1967856/comments/6
>>>>
>>>
>>> Hrm. I think there is a logical bug in the implementation of the ct()
>>> datapath action in both datapaths.
>>>
>>> The problem appears when the OpenFlow pipeline for the packet contains
>>> several ct() actions. The NXAST_CT action (i.e. every ct() action) must
>>> always put the packet into an untracked state. The tracked state is set
>>> only in the 'recirc_table' where the packet is cloned by the ct()
>>> action for further processing.
>>>
>>> If an OF pipeline looks like this:
>>>
>>> actions=ct(),something,something,ct(),something
>>>
>>> each action must be entered with the 'untracked' conntrack state in
>>> the packet metadata.
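>>>
>>> Concretely (a made-up rule, just for illustration), with
>>>
>>>   actions=ct(commit,zone=1),mod_dl_dst:00:00:00:00:00:02,ct(zone=2,table=10)
>>>
>>> the second ct() must see the packet as untracked again, and only the
>>> clone recirculated into table 10 ends up in the 'trk' state.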
>>>
>>> However, the ct() action in the datapath, unlike in OpenFlow, doesn't
>>> work this way. It modifies the conntrack state of the packet it is
>>> processing. During flow translation OVS inserts a recirculation right
>>> after the datapath ct() action. This way the conntrack state affects
>>> only the processing after recirculation. Sort of.
>>>
>>> The issue is that not all OpenFlow ct() actions have recirc_table
>>> specified. These actions are supposed to change the state of the
>>> conntrack subsystem, but they still must leave the packet itself in the
>>> untracked state with no strings attached. But since datapath ct()
>>> actions don't work that way, and since no recirculation is inserted
>>> (no recirc_table specified), the packet carries the conntrack state
>>> along to all subsequent actions after conntrack execution.
>>> This doesn't impact normal actions, but it seems to break subsequent
>>> ct() actions in the same pipeline.
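>>>
>>> Roughly (simplified translations, just to illustrate the difference):
>>>
>>>   OpenFlow: ct(zone=8,table=20)  ->  datapath: ct(zone=8),recirc(0xN)
>>>   OpenFlow: ct(commit,zone=8)    ->  datapath: ct(commit,zone=8)
>>>
>>> In the first case everything that depends on the conntrack state runs
>>> in the recirculated clone; in the second there is no recirculation, so
>>> whatever state ct() set stays attached to the packet for the rest of
>>> the datapath action list.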
>>>
>>> In general, it looks to me like the ct() action in the datapath is
>>> not supposed to cache any connection data inside the packet's metadata.
>>> This seems to be the root cause of the problem. Fields that OF knows
>>> about should not trigger any issues if carried along with a packet,
>>> but datapath-specific metadata cannot be cleared, because OFproto
>>> simply doesn't know about it.
>>>
>>> But, I guess, not caching the connection and other internal structures
>>> might significantly impact datapath performance. Will it?
>>>
>>> Looking at the reproducer in [3], it has, for example, the flow with the
>>> following actions:
>>>
>>> actions:ct(commit,zone=8,mark=0/0x1,nat(src)),
>>> set(eth(src=00:00:00:00:00:01,dst=00:00:00:00:00:06)),
>>> set(ipv4(src=172.18.2.10,dst=192.168.100.6,ttl=62)),
>>> ct(zone=8),recirc(0x4)
>>>
>>> So, if the first ct() changes some datapath-specific packet metadata,
>>> the second ct() may try to use that information.
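>>>
>>> For the kernel datapath, the cached information in question is the
>>> nf_conn pointer attached to the skb.  Roughly paraphrasing the check
>>> from [2] (a sketch, not the exact kernel code):
>>>
>>>     /* Sketch: the second ct() reuses the nf_conn left behind by the
>>>      * first ct() and skips a fresh lookup if the cached entry still
>>>      * looks usable for the current netns and zone. */
>>>     static bool skb_nfct_cached_sketch(struct net *net,
>>>                                        const struct ovs_conntrack_info *info,
>>>                                        struct sk_buff *skb)
>>>     {
>>>         enum ip_conntrack_info ctinfo;
>>>         struct nf_conn *ct = nf_ct_get(skb, &ctinfo);
>>>
>>>         if (!ct)
>>>             return false;   /* nothing cached on the skb */
>>>         if (!net_eq(net, read_pnet(&ct->ct_net)))
>>>             return false;   /* cached entry is from another netns */
>>>         if (!nf_ct_zone_equal_any(info->ct, nf_ct_zone(ct)))
>>>             return false;   /* cached entry is from another zone */
>>>         /* ... further checks on helper/NAT/confirmed state ... */
>>>         return true;        /* reuse the cached connection */
>>>     }
>>>
>>> which would line up with Numan's observation above that forcing this
>>> check to return 'false' makes the misbehavior disappear.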
>>>
>>> It looks like during the flow translation we must add a ct_clear action
>>> explicitly after every ct() action, unless it was the last action in
>>> the list. This would end up adding a lot of ct_clear actions
>>> everywhere.
>>>
>>> Another option is the patch below, which tries to track whether a
>>> ct_clear is required and adds that action before the next ct() action
>>> in the datapath.
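>>>
>>> With that approach the reproducer's actions above would, hypothetically
>>> (sketching the expected translation, not actual output), come out as:
>>>
>>>   ct(commit,zone=8,mark=0/0x1,nat(src)),
>>>   set(eth(src=00:00:00:00:00:01,dst=00:00:00:00:00:06)),
>>>   set(ipv4(src=172.18.2.10,dst=192.168.100.6,ttl=62)),
>>>   ct_clear,ct(zone=8),recirc(0x4)
>>>
>>> i.e. ct_clear gets emitted only because another ct() follows; a flow
>>> that ends after the first ct() would not get one.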
>>>
>>> BTW, the test [3] fails with both kernel and userspace datapaths.
>>>
>>> The change below should fix the problem, I think.
>>> It looks like we also have to put ct_clear action before translating
>>> output to the patch port if we're in 'conntracked' state.
>>>
>>> And I do not know how to fix the problem for kernels without ct_clear
>>> support. I don't think we can clear metadata that is internal to
>>> the kernel. Ideas are welcome.
>>>
>>> CC: Aaron, Paolo.
>>>
>>> Any thoughts?
>>
>> Thank you so much for the detailed explanation of what's going on and
>> for providing a proposed fix.
>>
>> I took it for a spin and it does indeed appear to fix the OVN hairpin
>> issue, but unfortunately it does not appear to fix the issue Liam
>> reported for the ML2/OVS use case [4].
>>
>> When trying that use case with your patch I see this in the Open vSwitch log:
>> 2022-05-19T08:17:02.668Z|00001|ofproto_dpif_xlate(handler1)|WARN|over
>> max translation depth 64 on bridge br-int while processing
>> ct_state=new|trk,ct_ipv6_src=fc00:24:159c:555b:f816:3eff:fe8b:3ad0,ct_ipv6_dst=fc00:24:159c:555b:f816:3eff:fe8f:9302,ct_nw_proto=58,ct_tp_src=128,ct_tp_dst=0,icmp6,in_port=3,vlan_tci=0x0000,dl_src=fa:16:3e:8b:3a:d0,dl_dst=fa:16:3e:8f:93:02,ipv6_src=fc00:24:159c:555b:f816:3eff:fe8b:3ad0,ipv6_dst=fc00:24:159c:555b:f816:3eff:fe8f:9302,ipv6_label=0xac5db,nw_tos=0,nw_ecn=0,nw_ttl=64,icmp_type=128,icmp_code=0
>> 2022-05-19T08:17:02.668Z|00002|ofproto_dpif_upcall(handler1)|WARN|Dropped
>> 2 log messages in last 205 seconds (most recently, 187 seconds ago)
>> due to excessive rate
>> 2022-05-19T08:17:02.668Z|00003|ofproto_dpif_upcall(handler1)|WARN|Flow:
>> ct_state=new|trk,ct_ipv6_src=fc00:24:159c:555b:f816:3eff:fe8b:3ad0,ct_ipv6_dst=fc00:24:159c:555b:f816:3eff:fe8f:9302,ct_nw_proto=58,ct_tp_src=128,ct_tp_dst=0,icmp6,in_port=5,vlan_tci=0x0000,dl_src=fa:16:3e:8b:3a:d0,dl_dst=fa:16:3e:8f:93:02,ipv6_src=fc00:24:159c:555b:f816:3eff:fe8b:3ad0,ipv6_dst=fc00:24:159c:555b:f816:3eff:fe8f:9302,ipv6_label=0xac5db,nw_tos=0,nw_ecn=0,nw_ttl=64,icmp_type=128,icmp_code=0
>>
>> bridge("br-int")
>> ----------------
>> 0. priority 0, cookie 0x56d6d58c018b51bd
>> goto_table:60
>> 60. priority 3, cookie 0x56d6d58c018b51bd
>> NORMAL
>> >>>> received packet on unknown port 5 <<<<
>> >> no input bundle, dropping
>>
>> Final flow: unchanged
>> Megaflow:
>> recirc_id=0,eth,ipv6,in_port=5,dl_src=fa:16:3e:8b:3a:d0,dl_dst=fa:16:3e:8f:93:02,nw_frag=no
>> Datapath actions: drop
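>>
>> (For the record, the trace above came from the upcall handler's
>> rate-limited warning; the same translation can be replayed by hand with
>> something along the lines of `ovs-appctl ofproto/trace br-int
>> 'in_port=5,...'`, filling in the rest of the flow from the log.)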
>>
>>
>> On the back of your explanation and the log output I made an attempt
>> at adding this patch on top of yours:
>> ---
>> diff --git a/ofproto/ofproto-dpif-xlate.c b/ofproto/ofproto-dpif-xlate.c
>> index 110dab0ec..2955e8e4d 100644
>> --- a/ofproto/ofproto-dpif-xlate.c
>> +++ b/ofproto/ofproto-dpif-xlate.c
>> @@ -7144,6 +7144,7 @@ do_xlate_actions(const struct ofpact *ofpacts, size_t ofpacts_len,
>> * resubmit to the frozen actions.
>> */
>> case OFPACT_RESUBMIT:
>> + ctx->pending_ct_clear = true;
>> xlate_ofpact_resubmit(ctx, ofpact_get_RESUBMIT(a), last);
>> continue;
>> case OFPACT_GOTO_TABLE:
>> ---
>>
>> And the two patches together do appear to resolve the issue reported
>> in [4] as well as the OVN hairpin issue [1]. It does however make a
>> couple of tests fail, so I need to look into whether that is expected
>> from the change or whether the approach must be changed.
>>
>> 4: https://bugs.launchpad.net/openvswitch/+bug/1964117
>
> The following patch also works and does not cause any tests to fail.
>
> ---
> diff --git a/ofproto/ofproto-dpif-xlate.c b/ofproto/ofproto-dpif-xlate.c
> index 110dab0ec..905f6994d 100644
> --- a/ofproto/ofproto-dpif-xlate.c
> +++ b/ofproto/ofproto-dpif-xlate.c
> @@ -7354,7 +7354,9 @@ do_xlate_actions(const struct ofpact *ofpacts, size_t ofpacts_len,
> break;
>
> case OFPACT_CT_CLEAR:
> - if (ctx->conntracked || ctx->pending_ct_clear) {
> + if (ctx->conntracked || ctx->pending_ct_clear
> + || (flow->ct_state && flow->ct_state & CS_TRACKED))
> + {
> compose_ct_clear_action(ctx);
> }
> break;
> ---
>
> The OpenStack ML2/OVS Open vSwitch firewall driver appears to make
> explicit use of the ct_clear action as part of progressing packets
> through the tables, and the optimization in 355fef6f2 interrupts that
> workflow.
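>
> For illustration (a made-up rule in the style of the firewall driver,
> not copied from Neutron), the kind of flow involved looks something
> like:
>
>   table=71, priority=10,ct_state=+trk,... actions=ct_clear,resubmit(,72)
>
> and it relies on the datapath actually executing the ct_clear. With
> 355fef6f2 the translation only composes a datapath ct_clear when the
> xlate context believes conntrack state is present (ctx->conntracked),
> so an explicit ct_clear like the one above can be optimized away even
> though the kernel still has connection data cached on the packet.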
>
> I am not quite sure whether the definition of `struct
> xlate_ctx->conntracked` is "this flow has been exposed to conntrack"
> or "conntrack sees this flow as established". If anyone can shed light
> on that, we would know whether the above patch is OK as is, or whether
> this flow state should have set struct xlate_ctx->conntracked to true
> in the first place.
I believe that ctx->conntracked should mean "this flow has been exposed
to conntrack". And it looks like the condition:
(!ctx->conntracked && flow->ct_state & CS_TRACKED)
should not be possible. We're probably missing a clear_conntrack() call
somewhere.
One thing I seem to have missed in my patch is clearing the conntrack
state after returning from patch port processing:
diff --git a/ofproto/ofproto-dpif-xlate.c b/ofproto/ofproto-dpif-xlate.c
index 110dab0ec..53d4f78b2 100644
--- a/ofproto/ofproto-dpif-xlate.c
+++ b/ofproto/ofproto-dpif-xlate.c
@@ -3955,8 +3955,13 @@ patch_port_output(struct xlate_ctx *ctx, const struct xport *in_dev,
/* The out bridge's conntrack execution should have no effect on the
* original bridge. */
- ctx->conntracked = old_conntrack;
ctx->pending_ct_clear = old_ct_clear;
+ if (ctx->conntracked) {
+ /* Conntrack was involved in the other bridge. We need to clear
+ * whatever information was cached in the datapath. */
+ ctx->pending_ct_clear = true;
+ }
+ ctx->conntracked = old_conntrack;
/* The fact that the out bridge exits (for any reason) does not mean
* that the original bridge should exit. Specifically, if the out
---
But that doesn't seem to be related to the (!ctx->conntracked &&
flow->ct_state & CS_TRACKED) case...
>
> The output from `conntrack -E` with the above patch is in [5], and
> without it is in [6].
>
> 5: https://pastebin.ubuntu.com/p/tQr6PhXD9X/
> 6: https://pastebin.ubuntu.com/p/WkDmpc6xRZ/
>