On Thu, May 19, 2022 at 3:39 PM Frode Nordahl <frode.nord...@canonical.com> wrote: > > On Sat, May 14, 2022 at 2:10 AM Ilya Maximets <i.maxim...@ovn.org> wrote: > > > > On 5/13/22 10:36, Frode Nordahl wrote: > > > On Fri, Mar 11, 2022 at 2:04 PM Liam Young <liam.yo...@canonical.com> > > > wrote: > > >> > > >> Hi, > > >> > > >> <tl;dr> Commit 355fef6f2 seems to break connectivity in my setup</tl;dr> > > > > > > After OVN started colocating sNAT and dNAT operations in the same CT > > > zone [0] the above mentioned OVS commit appears to also break OVN [1]. > > > I have been discussing with Numan and he has found a correlation with > > > the behavior of the open vswitch kernel data path conntrack > > > `skb_nfct_cached` function, i.e. if that is changed to always return > > > 'false' the erratic behavior disappears. > > > > > > So I guess the question then becomes, is this a OVS userspace or OVS > > > kernel problem? > > > > > > We have a reproducer in [3]. > > > > > > 0: > > > https://github.com/ovn-org/ovn/commit/4deac4509abbedd6ffaecf27eed01ddefccea40a > > > 1: https://bugs.launchpad.net/ubuntu/+source/ovn/+bug/1967856 > > > 2: > > > https://elixir.bootlin.com/linux/latest/source/net/openvswitch/conntrack.c#L683 > > > 3: https://bugs.launchpad.net/ubuntu/+source/ovn/+bug/1967856/comments/6 > > > > > > > Hrm. I think, there is a logical bug in implementations of ct() > > datapath action in both datapaths. > > > > The problem appears when the OpenFlow pipeline for the packet contains > > several ct() actions. NXAST_CT action (i.e. every ct() action) must > > always put the packet into an untracked state. Tracked state will be > > set only in the 'recirc_table' where the packet is cloned by the ct() > > action for further processing. > > > > If an OF pipeline looks like this: > > > > actions=ct(),something,something,ct(),something > > > > each action must be entered with the 'untracked' conntrack state in > > the packet metadata. > > > > However, ct() action in the datapath, unlike OpenFlow, doesn't work > > this way. It modifies the conntrack state for the packet it is processing. > > During the flow translation OVS inserts recirculation right after the > > datapath ct() action. This way conntrack state affects only processing > > after recirculation. Sort of. > > > > The issue is that not all OpenFlow ct() actions have recirc_table > > specified. These actions supposed to change the state of the conntrack > > subsystem, but they still must leave the packet itself in the untracked > > state with no strings attached. But since datapath ct() actions doesn't > > work that way and since recirculation is not inserted (no recirc_table > > specified), packet after conntrack execution carries the state along to > > all other actions. > > This doesn't impact normal actions, but seems to break subsequent ct() > > actions on the same pipeline. > > > > In general, it looks to me that ct() action in the datapath is > > not supposed to cache any connection data inside the packet's metadata. > > This seems to be the root cause of the problem. Fields that OF knows > > about should not trigger any issues if carried along with a packet, > > but datapath-specific metadata can not be cleared, because OFproto > > simply doesn't know about it. > > > > But, I guess, not caching the connection and other internal structures > > might significantly impact datapath performance. Will it? > > > > Looking at the reproducer in [3], it has, for example, the flow with the > > following actions: > > > > actions:ct(commit,zone=8,mark=0/0x1,nat(src)), > > set(eth(src=00:00:00:00:00:01,dst=00:00:00:00:00:06)), > > set(ipv4(src=172.18.2.10,dst=192.168.100.6,ttl=62)), > > ct(zone=8),recirc(0x4) > > > > So, if the first ct() will change some datapath-specific packet metadata, > > the second ct() may try to use that information. > > > > It looks like during the flow translation we must add ct_clear action > > explicitly after every ct() action, unless it was the last action in > > the list. This will end up with adding a lot of ct_clear actions > > everywhere. > > > > Another option is the patch below that tries to track if ct_clear > > is required and adds that action before the next ct() action in > > the datapath. > > > > BTW, the test [3] fails with both kernel and userspace datapaths. > > > > The change below should fix the problem, I think. > > It looks like we also have to put ct_clear action before translating > > output to the patch port if we're in 'conntracked' state. > > > > And I do not know how to fix the problem for kernels without ct_clear > > support. I don't think we can clear metadata that is internal to > > the kernel. Ideas are welcome. > > > > CC: Aaron, Paolo. > > > > Any thoughts? > > Thank you so much for the detailed explanation of what's going on and > for providing a proposed fix. > > I took it for a spin and it does indeed appear to fix the OVN hairpin > issue, but it does unfortunately not appear to fix the issue Liam > reported for the ML2/OVS use case [4]. > > When trying that use case with your patch I see this in the Open vSwitch log: > 2022-05-19T08:17:02.668Z|00001|ofproto_dpif_xlate(handler1)|WARN|over > max translation depth 64 on bridge br-int while processing > ct_state=new|trk,ct_ipv6_src=fc00:24:159c:555b:f816:3eff:fe8b:3ad0,ct_ipv6_dst=fc00:24:159c:555b:f816:3eff:fe8f:9302,ct_nw_proto=58,ct_tp_src=128,ct_tp_dst=0,icmp6,in_port=3,vlan_tci=0x0000,dl_src=fa:16:3e:8b:3a:d0,dl_dst=fa:16:3e:8f:93:02,ipv6_src=fc00:24:159c:555b:f816:3eff:fe8b:3ad0,ipv6_dst=fc00:24:159c:555b:f816:3eff:fe8f:9302,ipv6_label=0xac5db,nw_tos=0,nw_ecn=0,nw_ttl=64,icmp_type=128,icmp_code=0 > 2022-05-19T08:17:02.668Z|00002|ofproto_dpif_upcall(handler1)|WARN|Dropped > 2 log messages in last 205 seconds (most recently, 187 seconds ago) > due to excessive rate > 2022-05-19T08:17:02.668Z|00003|ofproto_dpif_upcall(handler1)|WARN|Flow: > ct_state=new|trk,ct_ipv6_src=fc00:24:159c:555b:f816:3eff:fe8b:3ad0,ct_ipv6_dst=fc00:24:159c:555b:f816:3eff:fe8f:9302,ct_nw_proto=58,ct_tp_src=128,ct_tp_dst=0,icmp6,in_port=5,vlan_tci=0x0000,dl_src=fa:16:3e:8b:3a:d0,dl_dst=fa:16:3e:8f:93:02,ipv6_src=fc00:24:159c:555b:f816:3eff:fe8b:3ad0,ipv6_dst=fc00:24:159c:555b:f816:3eff:fe8f:9302,ipv6_label=0xac5db,nw_tos=0,nw_ecn=0,nw_ttl=64,icmp_type=128,icmp_code=0 > > bridge("br-int") > ---------------- > 0. priority 0, cookie 0x56d6d58c018b51bd > goto_table:60 > 60. priority 3, cookie 0x56d6d58c018b51bd > NORMAL > >>>> received packet on unknown port 5 <<<< > >> no input bundle, dropping > > Final flow: unchanged > Megaflow: > recirc_id=0,eth,ipv6,in_port=5,dl_src=fa:16:3e:8b:3a:d0,dl_dst=fa:16:3e:8f:93:02,nw_frag=no > Datapath actions: drop > > > On the back of your explanation and the log output I made an attempt > at adding this patch on top of yours: > --- > diff --git a/ofproto/ofproto-dpif-xlate.c b/ofproto/ofproto-dpif-xlate.c > index 110dab0ec..2955e8e4d 100644 > --- a/ofproto/ofproto-dpif-xlate.c > +++ b/ofproto/ofproto-dpif-xlate.c > @@ -7144,6 +7144,7 @@ do_xlate_actions(const struct ofpact *ofpacts, > size_t ofpacts_len, > * resubmit to the frozen actions. > */ > case OFPACT_RESUBMIT: > + ctx->pending_ct_clear = true; > xlate_ofpact_resubmit(ctx, ofpact_get_RESUBMIT(a), last); > continue; > case OFPACT_GOTO_TABLE: > --- > > And the two patches together do appear to resolve the issue reported > in [4] as well as the OVN hairpin issue [1]. It does however make a > couple of tests fail, so I need to look into if that is expected from > the change or if the approach must be changed. > > 4: https://bugs.launchpad.net/openvswitch/+bug/1964117
The following patch also works and does not cause any tests to fail. --- diff --git a/ofproto/ofproto-dpif-xlate.c b/ofproto/ofproto-dpif-xlate.c index 110dab0ec..905f6994d 100644 --- a/ofproto/ofproto-dpif-xlate.c +++ b/ofproto/ofproto-dpif-xlate.c @@ -7354,7 +7354,9 @@ do_xlate_actions(const struct ofpact *ofpacts, size_t ofpacts_len, break; case OFPACT_CT_CLEAR: - if (ctx->conntracked || ctx->pending_ct_clear) { + if (ctx->conntracked || ctx->pending_ct_clear + || (flow->ct_state && flow->ct_state & CS_TRACKED)) + { compose_ct_clear_action(ctx); } break; --- The OpenStack ML2/OVS Open vSwitch firewall driver appears to make explicit use of ct_clear action as part of progressing packets through the tables and the optimization in 355fef6f2 interrupts that workflow. I am not quite sure if the definition of `struct xlate_ctx->conntracked` is "this flow has been exposed to conntrack" or if it is "conntrack sees this flow as established", if anyone can shed light on that we would know if the above patch is ok as is or if this flow state should have set struct xlate_ctx->conntracked to true in the first place. The output from `conntrack -E` with the above patch is in [5], and without it is in [6]. 5: https://pastebin.ubuntu.com/p/tQr6PhXD9X/ 6: https://pastebin.ubuntu.com/p/WkDmpc6xRZ/ -- Frode Nordahl > -- > Frode Nordahl > > > > > > Best regards, Ilya Maximets. > > > > --- > > diff --git a/include/openvswitch/ofp-packet.h > > b/include/openvswitch/ofp-packet.h > > index 77128d829..26b07e07f 100644 > > --- a/include/openvswitch/ofp-packet.h > > +++ b/include/openvswitch/ofp-packet.h > > @@ -133,6 +133,9 @@ struct ofputil_packet_in_private { > > /* NXCPT_CONNTRACKED. */ > > bool conntracked; > > > > + /* NXCPT_PENDING_CT_CLEAR. */ > > + bool pending_ct_clear; > > + > > /* NXCPT_ACTIONS. */ > > struct ofpact *actions; > > size_t actions_len; > > diff --git a/lib/ofp-packet.c b/lib/ofp-packet.c > > index 9485ddfc9..1df092471 100644 > > --- a/lib/ofp-packet.c > > +++ b/lib/ofp-packet.c > > @@ -425,6 +425,7 @@ enum nx_continuation_prop_type { > > NXCPT_ACTIONS, > > NXCPT_ACTION_SET, > > NXCPT_ODP_PORT, > > + NXCPT_PENDING_CT_CLEAR, > > }; > > > > /* Only NXT_PACKET_IN2 (not NXT_RESUME) should include NXCPT_USERDATA, so > > this > > @@ -460,6 +461,10 @@ ofputil_put_packet_in_private(const struct > > ofputil_packet_in_private *pin, > > ofpprop_put_flag(msg, NXCPT_CONNTRACKED); > > } > > > > + if (pin->pending_ct_clear) { > > + ofpprop_put_flag(msg, NXCPT_PENDING_CT_CLEAR); > > + } > > + > > if (pin->actions_len) { > > /* Divide 'pin->actions' into groups that begins with an > > * unroll_xlate action. For each group, emit a NXCPT_TABLE_ID and > > @@ -863,6 +868,10 @@ ofputil_decode_packet_in_private(const struct > > ofp_header *oh, bool loose, > > pin->conntracked = true; > > break; > > > > + case NXCPT_PENDING_CT_CLEAR: > > + pin->pending_ct_clear = true; > > + break; > > + > > case NXCPT_TABLE_ID: > > error = ofpprop_parse_u8(&payload, &table_id); > > break; > > diff --git a/ofproto/ofproto-dpif-rid.h b/ofproto/ofproto-dpif-rid.h > > index 4df630c62..af6aee475 100644 > > --- a/ofproto/ofproto-dpif-rid.h > > +++ b/ofproto/ofproto-dpif-rid.h > > @@ -153,6 +153,7 @@ struct frozen_state { > > size_t stack_size; > > mirror_mask_t mirrors; /* Mirrors already output. */ > > bool conntracked; /* Conntrack occurred prior to freeze. */ > > + bool pending_ct_clear; /* ct_clear required before the next > > ct(). */ > > bool was_mpls; /* MPLS packet */ > > struct uuid xport_uuid; /* UUID of 1st port packet received on. > > */ > > > > diff --git a/ofproto/ofproto-dpif-xlate.c b/ofproto/ofproto-dpif-xlate.c > > index 8e5d030ac..2826b3d10 100644 > > --- a/ofproto/ofproto-dpif-xlate.c > > +++ b/ofproto/ofproto-dpif-xlate.c > > @@ -396,6 +396,10 @@ struct xlate_ctx { > > * state from the datapath should be honored after thawing. */ > > bool conntracked; > > > > + /* True if datapath ct_clear action has to be performed before the next > > + * ct() action. */ > > + bool pending_ct_clear; > > + > > /* Pointer to an embedded NAT action in a conntrack action, or NULL. */ > > struct ofpact_nat *ct_nat_action; > > > > @@ -3858,6 +3862,7 @@ patch_port_output(struct xlate_ctx *ctx, const struct > > xport *in_dev, > > struct flow old_flow = ctx->xin->flow; > > struct flow_tnl old_flow_tnl_wc = ctx->wc->masks.tunnel; > > bool old_conntrack = ctx->conntracked; > > + bool old_ct_clear = ctx->pending_ct_clear; > > bool old_was_mpls = ctx->was_mpls; > > ovs_version_t old_version = ctx->xin->tables_version; > > struct ofpbuf old_stack = ctx->stack; > > @@ -3877,6 +3882,12 @@ patch_port_output(struct xlate_ctx *ctx, const > > struct xport *in_dev, > > ctx->wc->masks.tunnel.metadata.tab = flow->tunnel.metadata.tab; > > memset(flow->regs, 0, sizeof flow->regs); > > flow->actset_output = OFPP_UNSET; > > + if (ctx->conntracked) { > > + /* Patch ports do not carry conntrack information, so any conntrack > > + * state must be removed from the packet before processing ct > > actions > > + * in another bridge. */ > > + ctx->pending_ct_clear = true; > > + } > > clear_conntrack(ctx); > > ctx->xin->trace = xlate_report(ctx, OFT_BRIDGE, "bridge(\"%s\")", > > out_dev->xbridge->name); > > @@ -3945,6 +3956,7 @@ patch_port_output(struct xlate_ctx *ctx, const struct > > xport *in_dev, > > /* The out bridge's conntrack execution should have no effect on the > > * original bridge. */ > > ctx->conntracked = old_conntrack; > > + ctx->pending_ct_clear = old_ct_clear; > > > > /* The fact that the out bridge exits (for any reason) does not mean > > * that the original bridge should exit. Specifically, if the out > > @@ -4930,6 +4942,7 @@ xlate_controller_action(struct xlate_ctx *ctx, int > > len, > > .stack_size = ctx->stack.size, > > .mirrors = ctx->mirrors, > > .conntracked = ctx->conntracked, > > + .pending_ct_clear = ctx->pending_ct_clear, > > .was_mpls = ctx->was_mpls, > > .ofpacts = NULL, > > .ofpacts_len = 0, > > @@ -5005,6 +5018,7 @@ finish_freezing__(struct xlate_ctx *ctx, uint8_t > > table) > > .stack_size = ctx->stack.size, > > .mirrors = ctx->mirrors, > > .conntracked = ctx->conntracked, > > + .pending_ct_clear = ctx->pending_ct_clear, > > .was_mpls = ctx->was_mpls, > > .xport_uuid = ctx->xin->xport_uuid, > > .ofpacts = ctx->frozen_actions.data, > > @@ -5835,6 +5849,7 @@ clone_xlate_actions(const struct ofpact *actions, > > size_t actions_len, > > struct flow old_base = ctx->base_flow; > > bool old_was_mpls = ctx->was_mpls; > > bool old_conntracked = ctx->conntracked; > > + bool old_ct_clear = ctx->pending_ct_clear; > > > > /* The actions are not reversible, a datapath clone action is > > * required to encode the translation. Select the clone action > > @@ -5883,6 +5898,7 @@ dp_clone_done: > > /* The clone's conntrack execution should have no effect on the > > original > > * packet. */ > > ctx->conntracked = old_conntracked; > > + ctx->pending_ct_clear = old_ct_clear; > > > > /* Popping MPLS from the clone should have no effect on the original > > * packet. */ > > @@ -6128,6 +6144,18 @@ freeze_unroll_actions(const struct ofpact *a, const > > struct ofpact *end, > > } > > } > > > > +static void > > +compose_ct_clear_action(struct xlate_ctx *ctx) > > +{ > > + clear_conntrack(ctx); > > + ctx->pending_ct_clear = false; > > + /* This action originally existed without dpif support. So to preserve > > + * compatibility, only append it if the dpif supports it. */ > > + if (ctx->xbridge->support.ct_clear) { > > + nl_msg_put_flag(ctx->odp_actions, OVS_ACTION_ATTR_CT_CLEAR); > > + } > > +} > > + > > static void > > put_ct_mark(const struct flow *flow, struct ofpbuf *odp_actions, > > struct flow_wildcards *wc) > > @@ -6336,25 +6364,21 @@ compose_conntrack_action(struct xlate_ctx *ctx, > > struct ofpact_conntrack *ofc, > > compose_recirculate_and_fork(ctx, ofc->recirc_table, zone); > > } > > > > + /* Datapath will update the conntrack state for the packet, but > > + * NXAST_CT action must always put the packet into an untracked > > + * state. Execution continues in the current table without > > + * recirculation, hence conntrack state should not be available. > > + * We'll ask the datapath to clear what OVS_ACTION_ATTR_CT might > > + * have changed in the packet metadata before the next ct() action. */ > > + ctx->pending_ct_clear = true; > > + > > ctx->ct_nat_action = NULL; > > > > /* The ct_* fields are only available in the scope of the > > 'recirc_table' > > * call chain. */ > > - flow_clear_conntrack(&ctx->xin->flow); > > xlate_report(ctx, OFT_DETAIL, "Sets the packet to an untracked state, " > > "and clears all the conntrack fields."); > > - ctx->conntracked = false; > > -} > > - > > -static void > > -compose_ct_clear_action(struct xlate_ctx *ctx) > > -{ > > clear_conntrack(ctx); > > - /* This action originally existed without dpif support. So to preserve > > - * compatibility, only append it if the dpif supports it. */ > > - if (ctx->xbridge->support.ct_clear) { > > - nl_msg_put_flag(ctx->odp_actions, OVS_ACTION_ATTR_CT_CLEAR); > > - } > > } > > > > /* check_pkt_larger action checks the packet length and stores the > > @@ -6415,6 +6439,7 @@ xlate_check_pkt_larger(struct xlate_ctx *ctx, > > struct flow old_base = ctx->base_flow; > > bool old_was_mpls = ctx->was_mpls; > > bool old_conntracked = ctx->conntracked; > > + bool old_ct_clear = ctx->pending_ct_clear; > > > > size_t offset = nl_msg_start_nested(ctx->odp_actions, > > OVS_ACTION_ATTR_CHECK_PKT_LEN); > > @@ -6436,6 +6461,7 @@ xlate_check_pkt_larger(struct xlate_ctx *ctx, > > ctx->base_flow = old_base; > > ctx->was_mpls = old_was_mpls; > > ctx->conntracked = old_conntracked; > > + ctx->pending_ct_clear = old_ct_clear; > > ctx->xin->flow = old_flow; > > > > /* If the flow translation for the IF_GREATER case requires freezing, > > @@ -6466,6 +6492,7 @@ xlate_check_pkt_larger(struct xlate_ctx *ctx, > > ctx->base_flow = old_base; > > ctx->was_mpls = old_was_mpls; > > ctx->conntracked = old_conntracked; > > + ctx->pending_ct_clear = old_ct_clear; > > ctx->xin->flow = old_flow; > > ctx->exit = old_exit; > > } > > @@ -7316,11 +7343,17 @@ do_xlate_actions(const struct ofpact *ofpacts, > > size_t ofpacts_len, > > } > > > > case OFPACT_CT: > > + if (ctx->pending_ct_clear) { > > + /* Previous ct() action might have made some changes but > > + * the packet must be untracked. Clearing the state before > > + * executing a new ct() action. */ > > + compose_ct_clear_action(ctx); > > + } > > compose_conntrack_action(ctx, ofpact_get_CT(a), last); > > break; > > > > case OFPACT_CT_CLEAR: > > - if (ctx->conntracked) { > > + if (ctx->conntracked || ctx->pending_ct_clear) { > > compose_ct_clear_action(ctx); > > } > > break; > > @@ -7710,6 +7743,7 @@ xlate_actions(struct xlate_in *xin, struct xlate_out > > *xout) > > > > .was_mpls = false, > > .conntracked = false, > > + .pending_ct_clear = false, > > > > .ct_nat_action = NULL, > > > > @@ -7774,6 +7808,8 @@ xlate_actions(struct xlate_in *xin, struct xlate_out > > *xout) > > clear_conntrack(&ctx); > > } > > > > + ctx.pending_ct_clear = state->pending_ct_clear; > > + > > /* Restore pipeline metadata. May change flow's in_port and other > > * metadata to the values that existed when freezing was > > triggered. */ > > frozen_metadata_to_flow(&ctx.xbridge->ofproto->up, > > @@ -8113,6 +8149,7 @@ xlate_resume(struct ofproto_dpif *ofproto, > > .stack_size = pin->stack_size, > > .mirrors = pin->mirrors, > > .conntracked = pin->conntracked, > > + .pending_ct_clear = pin->pending_ct_clear, > > .xport_uuid = UUID_ZERO, > > > > /* When there are no actions, xlate_actions() will search the flow > > --- _______________________________________________ discuss mailing list disc...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-discuss