On Mon, Jan 26, 2026 at 9:25 PM Erlon Rodrigues Cruz <[email protected]>
wrote:

> Hey Ales,
>
>
> On Mon, Jan 26, 2026 at 10:18 AM Ales Musil <[email protected]> wrote:
>
>>
>>
>> On Fri, Jan 23, 2026 at 8:00 PM Erlon R. Cruz <[email protected]>
>> wrote:
>>
>>> v2: Rebased on current upstream main, removed external python
>>> code traffic generators, added scenario tests for: dhcp, negative
>>> udp and ovn rule generation, documentation and many code clean ups.
>>> DHCP traffic is being dropped for some reason. Still need to figure
>>> that out for v3.
>>>
>>> At the current state, OVN can not handle fragmented traffic for ACLs
>>> in the userspace datapath (DPDK). Just like in the case of LB
>>> (commit 20a96b9), the kernel DP will try to reassemble the fragments
>>> during CT lookup, however userspace won't reassemble them.
>>>
>>> This patch allows OVN to handle fragmented traffic by defining a
>>> translation table on southbound that leverages OpenFlow connection
>>> tracking capabilities. When a stateful flow is created on NB, we add
>>> a hint in the flow. This hint will be read in SB and if the
>>> connection tracking is set to be used, SB will use the alternative
>>> translation table that will use the connection tracking information.
>>>
>>> This approach should not change the current behavior and it's only
>>> enabled if acl_ct_translation is set:
>>>
>>> ovn-nbctl set NB_Global . options:acl_ct_translation=true
>>>
>>> Signed-off-by: Erlon R. Cruz <[email protected]>
>>> ---
>>>
>>
>> Hi Erlon,
>>
>> thank you for the v2. It needs a rebase. I have also some comments down
>> below.
>>
>>
>>>  NEWS                         |   5 +
>>>  controller/lflow.c           |  24 ++-
>>>  controller/lflow.h           |   1 +
>>>  include/ovn/logical-fields.h |   3 +
>>>  lib/logical-fields.c         |  67 ++++++
>>>  northd/en-global-config.c    |   5 +
>>>  northd/lflow-mgr.c           |  51 ++++-
>>>  northd/lflow-mgr.h           |   8 +-
>>>  northd/northd.c              |  41 +++-
>>>  ovn-nb.xml                   |  20 ++
>>>  tests/atlocal.in             |   3 +
>>>  tests/ovn.at                 |  71 +++++++
>>>  tests/system-ovn.at          | 389 +++++++++++++++++++++++++++++++++++
>>>  13 files changed, 668 insertions(+), 20 deletions(-)
>>>
>>> diff --git a/NEWS b/NEWS
>>> index 9883fb81d..7062701c8 100644
>>> --- a/NEWS
>>> +++ b/NEWS
>>> @@ -1,5 +1,10 @@
>>>  Post v25.09.0
>>>  -------------
>>> +   - Fixed support for fragmented traffic in the userspace datapath.
>>> Added the
>>> +     "acl_ct_translation" NB_Global option to enable connection tracking
>>> +     based L4 field translation for stateful ACLs. When enabled allows
>>> proper
>>> +     handling of IP fragmentation in userspace datapaths. This option
>>> breaks
>>> +     hardware offloading and is disabled by default.
>>>
>>
>> We are usually adding NEWS at the end.
>>
>>
>>>     - Added support for TLS Server Name Indication (SNI) with the new
>>>       --ssl-server-name option in OVN utilities and daemons. This allows
>>>       specifying the server name for SNI, which is useful when connecting
>>> diff --git a/controller/lflow.c b/controller/lflow.c
>>> index 784a0d2dd..139181546 100644
>>> --- a/controller/lflow.c
>>> +++ b/controller/lflow.c
>>> @@ -51,10 +51,19 @@ COVERAGE_DEFINE(consider_logical_flow);
>>>  /* Contains "struct expr_symbol"s for fields supported by OVN lflows. */
>>>  static struct shash symtab;
>>>
>>> +/* Alternative symbol table for ACL CT translation.
>>> + * This symbol table maps L4 port fields (tcp/udp/sctp) to their
>>> connection
>>> + * tracking equivalents (ct_tp_src/ct_tp_dst with ct_proto predicates).
>>> + * This allows matching on all IP fragments (not just the first
>>> fragment)
>>> + * so that all fragments can be matched based on the connection
>>> tracking state.
>>> + */
>>> +static struct shash acl_ct_symtab;
>>> +
>>>  void
>>>  lflow_init(void)
>>>  {
>>>      ovn_init_symtab(&symtab);
>>> +    ovn_init_acl_ct_symtab(&acl_ct_symtab);
>>>  }
>>>
>>>  struct lookup_port_aux {
>>> @@ -984,7 +993,16 @@ convert_match_to_expr(const struct
>>> sbrec_logical_flow *lflow,
>>>                       lflow->match);
>>>          return NULL;
>>>      }
>>> -    struct expr *e = expr_parse_string(lex_str_get(&match_s), &symtab,
>>> +
>>> +    /* Check if this logical flow requires ACL CT translation.
>>> +     * If the tags contains "acl_ct_translation"="true", we use the
>>> +     * alternative symbol table that maps L4 fields (tcp/udp/sctp ports)
>>> +     * to their CT equivalents. */
>>> +    const char *ct_trans = smap_get(&lflow->tags, "acl_ct_translation");
>>> +    struct shash *symtab_to_use = (ct_trans && !strcmp(ct_trans, "true")
>>> +                                   ? &acl_ct_symtab : &symtab);
>>>
>>
>> nit: smap_get_bool(..., false);
>>
>>
>>> +
>>> +    struct expr *e = expr_parse_string(lex_str_get(&match_s),
>>> symtab_to_use,
>>>                                         addr_sets, port_groups,
>>> &addr_sets_ref,
>>>                                         &port_groups_ref,
>>>                                         ldp->datapath->tunnel_key,
>>> @@ -1016,7 +1034,7 @@ convert_match_to_expr(const struct
>>> sbrec_logical_flow *lflow,
>>>              e = expr_combine(EXPR_T_AND, e, *prereqs);
>>>              *prereqs = NULL;
>>>          }
>>> -        e = expr_annotate(e, &symtab, &error);
>>> +        e = expr_annotate(e, symtab_to_use, &error);
>>>      }
>>>      if (error) {
>>>          static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(1, 1);
>>> @@ -2045,6 +2063,8 @@ lflow_destroy(void)
>>>  {
>>>      expr_symtab_destroy(&symtab);
>>>      shash_destroy(&symtab);
>>> +    expr_symtab_destroy(&acl_ct_symtab);
>>> +    shash_destroy(&acl_ct_symtab);
>>>  }
>>>
>>>  bool
>>> diff --git a/controller/lflow.h b/controller/lflow.h
>>> index 30c6dbebd..3e762d49b 100644
>>> --- a/controller/lflow.h
>>> +++ b/controller/lflow.h
>>> @@ -156,6 +156,7 @@ struct lflow_ctx_out {
>>>  };
>>>
>>>  void lflow_init(void);
>>> +void ovn_init_acl_ct_symtab(struct shash *acl_symtab);
>>>
>>
>> nit: Please remove this declaration it moved into logical-fields.h.
>>
>>
>>>  void lflow_run(struct lflow_ctx_in *, struct lflow_ctx_out *);
>>>  void lflow_handle_cached_flows(struct lflow_cache *,
>>>                                 const struct sbrec_logical_flow_table *);
>>> diff --git a/include/ovn/logical-fields.h b/include/ovn/logical-fields.h
>>> index f0d34196a..8f23249a0 100644
>>> --- a/include/ovn/logical-fields.h
>>> +++ b/include/ovn/logical-fields.h
>>> @@ -236,6 +236,9 @@ const char *event_to_string(enum
>>> ovn_controller_event event);
>>>  int string_to_event(const char *s);
>>>  const struct ovn_field *ovn_field_from_name(const char *name);
>>>
>>> +void ovn_init_acl_ct_symtab(struct shash *symtab);
>>> +void expr_symtab_remove(struct shash *symtab, const char *name);
>>> +
>>>  /* OVN CT label values
>>>   * ===================
>>>   * These are specific ct.label bit values OVN uses to track different
>>> types
>>> diff --git a/lib/logical-fields.c b/lib/logical-fields.c
>>> index c8bddcdc5..334a7e244 100644
>>> --- a/lib/logical-fields.c
>>> +++ b/lib/logical-fields.c
>>> @@ -451,3 +451,70 @@ ovn_field_from_name(const char *name)
>>>
>>>      return shash_find_data(&ovnfield_by_name, name);
>>>  }
>>> +
>>> +/* Removes the symbol with 'name' from 'symtab', freeing its memory. */
>>> +void
>>> +expr_symtab_remove(struct shash *symtab, const char *name)
>>> +{
>>> +    struct expr_symbol *symbol = shash_find_and_delete(symtab, name);
>>> +    if (symbol) {
>>> +        free(symbol->name);
>>> +        free(symbol->prereqs);
>>> +        free(symbol->predicate);
>>> +        free(symbol);
>>> +    }
>>> +}
>>> +
>>> +/* Initialize a symbol table for ACL CT translation.
>>> + * This creates an alternative symbol table that maps L4 port fields
>>> + * (tcp/udp/sctp) to their connection tracking equivalents. This allows
>>> + * matching on all IP fragments (not just the first) by using CT state
>>> + * which is available for all fragments. */
>>> +void
>>> +ovn_init_acl_ct_symtab(struct shash *acl_symtab)
>>> +{
>>> +    /* Initialize with the standard symbol table. */
>>> +    ovn_init_symtab(acl_symtab);
>>> +
>>> +    /* Remove the original tcp/udp/sctp symbols that we will override.
>>> */
>>> +    expr_symtab_remove(acl_symtab, "tcp.src");
>>> +    expr_symtab_remove(acl_symtab, "tcp.dst");
>>> +    expr_symtab_remove(acl_symtab, "tcp");
>>> +    expr_symtab_remove(acl_symtab, "udp.src");
>>> +    expr_symtab_remove(acl_symtab, "udp.dst");
>>> +    expr_symtab_remove(acl_symtab, "udp");
>>> +    expr_symtab_remove(acl_symtab, "sctp.src");
>>> +    expr_symtab_remove(acl_symtab, "sctp.dst");
>>> +    expr_symtab_remove(acl_symtab, "sctp");
>>> +
>>> +    /* Add ct_proto field - CT original direction protocol. Used in the
>>> +     * tcp/udp/sctp predicate expansions below. */
>>> +    expr_symtab_add_field(acl_symtab, "ct_proto", MFF_CT_NW_PROTO,
>>> +                          "ct.trk", false);
>>> +
>>> +    /* Override TCP protocol and port fields to use CT equivalents.
>>> +     * When "tcp" is used as a predicate, it expands to "ct_proto == 6"
>>> +     * instead of "ip.proto == 6". */
>>> +    expr_symtab_add_predicate(acl_symtab, "tcp",
>>> +                              "ct.trk && !ct.inv && ct_proto == 6");
>>> +    expr_symtab_add_field(acl_symtab, "tcp.src", MFF_CT_TP_SRC,
>>> +                          "tcp", false);
>>> +    expr_symtab_add_field(acl_symtab, "tcp.dst", MFF_CT_TP_DST,
>>> +                          "tcp", false);
>>> +
>>> +    /* Override UDP protocol and port fields */
>>> +    expr_symtab_add_predicate(acl_symtab, "udp",
>>> +                              "ct.trk && !ct.inv && ct_proto == 17");
>>> +    expr_symtab_add_field(acl_symtab, "udp.src", MFF_CT_TP_SRC,
>>> +                          "udp", false);
>>> +    expr_symtab_add_field(acl_symtab, "udp.dst", MFF_CT_TP_DST,
>>> +                          "udp", false);
>>> +
>>> +    /* Override SCTP protocol and port fields */
>>> +    expr_symtab_add_predicate(acl_symtab, "sctp",
>>> +                              "ct.trk && !ct.inv && ct_proto == 132");
>>> +    expr_symtab_add_field(acl_symtab, "sctp.src", MFF_CT_TP_SRC,
>>> +                          "sctp", false);
>>> +    expr_symtab_add_field(acl_symtab, "sctp.dst", MFF_CT_TP_DST,
>>> +                          "sctp", false);
>>> +}
>>> diff --git a/northd/en-global-config.c b/northd/en-global-config.c
>>> index 2556b2888..a6f37b0c3 100644
>>> --- a/northd/en-global-config.c
>>> +++ b/northd/en-global-config.c
>>> @@ -717,6 +717,11 @@ check_nb_options_out_of_sync(
>>>          return true;
>>>      }
>>>
>>> +    if (config_out_of_sync(&nb->options, &config_data->nb_options,
>>> +                           "acl_ct_translation", false)) {
>>> +        return true;
>>> +    }
>>> +
>>>      return false;
>>>  }
>>>
>>> diff --git a/northd/lflow-mgr.c b/northd/lflow-mgr.c
>>> index f21024903..66a519780 100644
>>> --- a/northd/lflow-mgr.c
>>> +++ b/northd/lflow-mgr.c
>>> @@ -184,6 +184,7 @@ struct ovn_lflow {
>>>      struct ovn_dp_group *dpg;    /* Link to unique Sb datapath group. */
>>>      const char *where;
>>>      const char *flow_desc;
>>> +    bool acl_ct_translation;     /* Use CT-based L4 field translation.
>>> */
>>>
>>>      struct uuid sb_uuid;         /* SB DB row uuid, specified by
>>> northd. */
>>>      struct ovs_list referenced_by;  /* List of struct lflow_ref_node. */
>>> @@ -725,7 +726,7 @@ lflow_ref_sync_lflows(struct lflow_ref *lflow_ref,
>>>   * then it may corrupt the hmap.  Caller should ensure thread safety
>>>   * for such scenarios.
>>>   */
>>> -static void
>>> +static struct ovn_lflow *
>>>  lflow_table_add_lflow__(struct lflow_table *lflow_table,
>>>                         const struct ovn_synced_datapath *sdp,
>>>                         const unsigned long *dp_bitmap, size_t
>>> dp_bitmap_len,
>>> @@ -798,9 +799,17 @@ lflow_table_add_lflow__(struct lflow_table
>>> *lflow_table,
>>>      ovn_dp_group_add_with_reference(lflow, sdp, dp_bitmap,
>>> dp_bitmap_len);
>>>
>>>      lflow_hash_unlock(hash_lock);
>>> +    return lflow;
>>>  }
>>>
>>>  void
>>> +ovn_lflow_set_acl_ct_translation(struct ovn_lflow *lflow,
>>> +                                 bool acl_ct_translation)
>>> +{
>>> +    lflow->acl_ct_translation = acl_ct_translation;
>>> +}
>>> +
>>> +struct ovn_lflow *
>>>  lflow_table_add_lflow(struct lflow_table_add_args *args)
>>>  {
>>>      /* It is invalid for both args->dp_bitmap and args->sdp to be
>>> @@ -811,11 +820,13 @@ lflow_table_add_lflow(struct lflow_table_add_args
>>> *args)
>>>          args->sdp = NULL;
>>>      }
>>>
>>> -    lflow_table_add_lflow__(args->table, args->sdp, args->dp_bitmap,
>>> -                            args->dp_bitmap_len, args->stage,
>>> args->priority,
>>> -                            args->match, args->actions, args->io_port,
>>> -                            args->ctrl_meter, args->stage_hint,
>>> args->where,
>>> -                            args->flow_desc, args->lflow_ref);
>>> +    return lflow_table_add_lflow__(args->table, args->sdp,
>>> args->dp_bitmap,
>>> +                                    args->dp_bitmap_len, args->stage,
>>> +                                    args->priority, args->match,
>>> +                                    args->actions, args->io_port,
>>> +                                    args->ctrl_meter, args->stage_hint,
>>> +                                    args->where, args->flow_desc,
>>> +                                    args->lflow_ref);
>>>  }
>>>
>>>  struct ovn_dp_group *
>>> @@ -945,6 +956,7 @@ ovn_lflow_init(struct ovn_lflow *lflow,
>>>      lflow->stage_hint = stage_hint;
>>>      lflow->ctrl_meter = ctrl_meter;
>>>      lflow->flow_desc = flow_desc;
>>> +    lflow->acl_ct_translation = false;
>>>      lflow->dpg = NULL;
>>>      lflow->where = where;
>>>      lflow->sb_uuid = sbuuid;
>>> @@ -1122,12 +1134,20 @@ sync_lflow_to_sb(struct ovn_lflow *lflow,
>>>          sbrec_logical_flow_set_match(sbflow, lflow->match);
>>>          sbrec_logical_flow_set_actions(sbflow, lflow->actions);
>>>          sbrec_logical_flow_set_flow_desc(sbflow, lflow->flow_desc);
>>> -        if (lflow->io_port) {
>>> +
>>> +        /* Set tags for io_port and/or acl_ct_translation if needed. */
>>> +        if (lflow->io_port || lflow->acl_ct_translation) {
>>>              struct smap tags = SMAP_INITIALIZER(&tags);
>>> -            smap_add(&tags, "in_out_port", lflow->io_port);
>>> +            if (lflow->io_port) {
>>> +                smap_add(&tags, "in_out_port", lflow->io_port);
>>> +            }
>>> +            if (lflow->acl_ct_translation) {
>>> +                smap_add(&tags, "acl_ct_translation", "true");
>>> +            }
>>>              sbrec_logical_flow_set_tags(sbflow, &tags);
>>>              smap_destroy(&tags);
>>>          }
>>> +
>>>          sbrec_logical_flow_set_controller_meter(sbflow,
>>> lflow->ctrl_meter);
>>>
>>>          /* Trim the source locator lflow->where, which looks something
>>> like
>>> @@ -1193,6 +1213,21 @@ sync_lflow_to_sb(struct ovn_lflow *lflow,
>>>                  }
>>>              }
>>>          }
>>> +
>>> +        /* Update acl_ct_translation marker in tags if needed.
>>> +         * This must be outside ovn_internal_version_changed check
>>> because
>>> +         * the option can be enabled/disabled at runtime. */
>>> +        bool cur_has_ct_trans = smap_get_bool(&sbflow->tags,
>>> +                                              "acl_ct_translation",
>>> false);
>>> +        if (lflow->acl_ct_translation != cur_has_ct_trans) {
>>> +            if (lflow->acl_ct_translation) {
>>> +                sbrec_logical_flow_update_tags_setkey(
>>> +                    sbflow, "acl_ct_translation", "true");
>>> +            } else {
>>> +                sbrec_logical_flow_update_tags_delkey(
>>> +                    sbflow, "acl_ct_translation");
>>> +            }
>>> +        }
>>>      }
>>>
>>>      if (lflow->dp) {
>>> diff --git a/northd/lflow-mgr.h b/northd/lflow-mgr.h
>>> index 4a1655c35..16aa0dd07 100644
>>> --- a/northd/lflow-mgr.h
>>> +++ b/northd/lflow-mgr.h
>>> @@ -24,6 +24,7 @@
>>>  struct ovsdb_idl_txn;
>>>  struct ovn_datapath;
>>>  struct ovsdb_idl_row;
>>> +struct ovn_lflow;
>>>
>>>  /* lflow map which stores the logical flows. */
>>>  struct lflow_table {
>>> @@ -89,7 +90,12 @@ struct lflow_table_add_args {
>>>      const char *where;
>>>  };
>>>
>>> -void lflow_table_add_lflow(struct lflow_table_add_args *args);
>>> +struct ovn_lflow *lflow_table_add_lflow(struct lflow_table_add_args
>>> *args);
>>> +
>>> +/* Set the acl_ct_translation flag on an lflow. When true,
>>> ovn-controller will
>>> + * use a symbol table that maps L4 port fields to CT equivalents. */
>>> +void ovn_lflow_set_acl_ct_translation(struct ovn_lflow *lflow,
>>> +                                      bool acl_ct_translation);
>>>
>>
>> We should use the new macro approach e.g. WITH_CT_TRANSLATION
>> as we have for io port etc. instead of defining some extra function.
>>
>>
>>>
>>>
>>>  #define WITH_HINT(HINT) .stage_hint = HINT
>>> diff --git a/northd/northd.c b/northd/northd.c
>>> index d79fe40c9..52233260f 100644
>>> --- a/northd/northd.c
>>> +++ b/northd/northd.c
>>> @@ -83,12 +83,18 @@ static bool check_lsp_is_up;
>>>  static bool install_ls_lb_from_router;
>>>
>>>  /* Use common zone for SNAT and DNAT if this option is set to "true". */
>>> -static bool use_common_zone = false;
>>> +static bool use_common_zone;
>>>
>>>  /* If this option is 'true' northd will make use of ct.inv match fields.
>>>   * Otherwise, it will avoid using it.  The default is true. */
>>>  static bool use_ct_inv_match = true;
>>>
>>> +/* If this option is 'true' northd will flag the related ACL flows to
>>> use
>>> + * connection tracking fields to properly handle IP fragments. By
>>> default this
>>> + * option is set to 'false'.
>>> + */
>>> +static bool acl_ct_translation = false;
>>> +
>>>  /* If this option is 'true' northd will implicitly add a lowest-priority
>>>   * drop rule in the ACL stage of logical switches that have at least one
>>>   * ACL.
>>> @@ -7199,6 +7205,12 @@ consider_acl(struct lflow_table *lflows, const
>>> struct ovn_datapath *od,
>>>          match_tier_len = match->length;
>>>      }
>>>
>>> +    /* Check if this ACL needs CT translation for fragment handling.
>>> +     * All stateful ACLs are marked when the option is enabled; the
>>> actual
>>> +     * translation only affects L4 port fields in ovn-controller. */
>>> +    bool needs_ct_trans = has_stateful && acl_ct_translation;
>>> +    struct ovn_lflow *lflow;
>>> +
>>>      if (!has_stateful
>>>          || !strcmp(acl->action, "pass")
>>>          || !strcmp(acl->action, "allow-stateless")) {
>>> @@ -7216,6 +7228,7 @@ consider_acl(struct lflow_table *lflows, const
>>> struct ovn_datapath *od,
>>>
>>>          ds_put_cstr(actions, "next;");
>>>          ds_put_format(match, "(%s)", acl->match);
>>> +        /* Stateless ACLs don't need CT translation. */
>>>          ovn_lflow_add(lflows, od, stage, priority, ds_cstr(match),
>>>                        ds_cstr(actions), lflow_ref,
>>> WITH_HINT(&acl->header_));
>>>          return;
>>> @@ -7284,8 +7297,10 @@ consider_acl(struct lflow_table *lflows, const
>>> struct ovn_datapath *od,
>>>                            (uint8_t) acl->network_function_group->id);
>>>          }
>>>          ds_put_cstr(actions, "next;");
>>> -        ovn_lflow_add(lflows, od, stage, priority, ds_cstr(match),
>>> -                      ds_cstr(actions), lflow_ref,
>>> WITH_HINT(&acl->header_));
>>> +        lflow = ovn_lflow_add(lflows, od, stage, priority,
>>> ds_cstr(match),
>>> +                              ds_cstr(actions), lflow_ref,
>>> +                              WITH_HINT(&acl->header_));
>>> +        ovn_lflow_set_acl_ct_translation(lflow, needs_ct_trans);
>>>
>>>          /* Match on traffic in the request direction for an established
>>>           * connection tracking entry that has not been marked for
>>> @@ -7314,8 +7329,10 @@ consider_acl(struct lflow_table *lflows, const
>>> struct ovn_datapath *od,
>>>                            (uint8_t) acl->network_function_group->id);
>>>          }
>>>          ds_put_cstr(actions, "next;");
>>> -        ovn_lflow_add(lflows, od, stage, priority, ds_cstr(match),
>>> -                      ds_cstr(actions), lflow_ref,
>>> WITH_HINT(&acl->header_));
>>> +        lflow = ovn_lflow_add(lflows, od, stage, priority,
>>> ds_cstr(match),
>>> +                              ds_cstr(actions), lflow_ref,
>>> +                              WITH_HINT(&acl->header_));
>>> +        ovn_lflow_set_acl_ct_translation(lflow, needs_ct_trans);
>>>      } else if (!strcmp(acl->action, "drop")
>>>                 || !strcmp(acl->action, "reject")) {
>>>          if (acl->network_function_group) {
>>> @@ -7340,8 +7357,10 @@ consider_acl(struct lflow_table *lflows, const
>>> struct ovn_datapath *od,
>>>          build_acl_sample_label_action(actions, acl, acl->sample_new,
>>> NULL,
>>>                                        obs_stage);
>>>          ds_put_cstr(actions, "next;");
>>> -        ovn_lflow_add(lflows, od, stage, priority, ds_cstr(match),
>>> -                      ds_cstr(actions), lflow_ref,
>>> WITH_HINT(&acl->header_));
>>> +        lflow = ovn_lflow_add(lflows, od, stage, priority,
>>> ds_cstr(match),
>>> +                              ds_cstr(actions), lflow_ref,
>>> +                              WITH_HINT(&acl->header_));
>>> +        ovn_lflow_set_acl_ct_translation(lflow, needs_ct_trans);
>>>          /* For an existing connection without ct_mark.blocked set, we've
>>>           * encountered a policy change. ACLs previously allowed
>>>           * this connection and we committed the connection tracking
>>> @@ -7366,8 +7385,10 @@ consider_acl(struct lflow_table *lflows, const
>>> struct ovn_datapath *od,
>>>          ds_put_format(actions,
>>>                        "ct_commit { ct_mark.blocked = 1; "
>>>                        "ct_label.obs_point_id = %"PRIu32"; }; next;",
>>> obs_pid);
>>> -        ovn_lflow_add(lflows, od, stage, priority, ds_cstr(match),
>>> -                      ds_cstr(actions), lflow_ref,
>>> WITH_HINT(&acl->header_));
>>> +        lflow = ovn_lflow_add(lflows, od, stage, priority,
>>> ds_cstr(match),
>>> +                              ds_cstr(actions), lflow_ref,
>>> +                              WITH_HINT(&acl->header_));
>>> +        ovn_lflow_set_acl_ct_translation(lflow, needs_ct_trans);
>>>      }
>>>  }
>>>
>>> @@ -20386,6 +20407,8 @@ ovnnb_db_run(struct northd_input *input_data,
>>>
>>>      use_ct_inv_match = smap_get_bool(input_data->nb_options,
>>>                                       "use_ct_inv_match", true);
>>> +    acl_ct_translation = smap_get_bool(input_data->nb_options,
>>> +                                       "acl_ct_translation", false);
>>>
>>>      /* deprecated, use --event instead */
>>>      controller_event_en = smap_get_bool(input_data->nb_options,
>>> diff --git a/ovn-nb.xml b/ovn-nb.xml
>>> index e74c0d010..2b610ff1c 100644
>>> --- a/ovn-nb.xml
>>> +++ b/ovn-nb.xml
>>> @@ -327,6 +327,26 @@
>>>          </p>
>>>        </column>
>>>
>>> +      <column name="options" key="acl_ct_translation">
>>> +        <p>
>>> +          If set to <code>true</code>, <code>ovn-northd</code> will
>>> enable
>>> +          connection tracking based L4 field translation for stateful
>>> ACLs.
>>> +          When enabled, ACL matches on L4 port fields (tcp/udp/sctp) use
>>> +          connection tracking state instead of packet headers. This
>>> allows
>>> +          proper handling of IP fragmentation in userspace datapaths
>>> (e.g.,
>>> +          DPDK) where fragments are not automatically reassembled during
>>> +          connection tracking lookup.
>>> +        </p>
>>> +        <p>
>>> +          <em>Important:</em> Enabling this option breaks hardware
>>> offloading
>>> +          of flows. Only enable this option if you need to handle
>>> fragmented
>>> +          traffic with stateful ACLs in userspace datapaths.
>>> +        </p>
>>> +        <p>
>>> +          The default value is <code>false</code>.
>>> +        </p>
>>> +      </column>
>>> +
>>>        <column name="options" key="default_acl_drop">
>>>          <p>
>>>            If set to <code>true</code>., <code>ovn-northd</code> will
>>> diff --git a/tests/atlocal.in b/tests/atlocal.in
>>> index 9a3cf2d16..f0d5eb243 100644
>>> --- a/tests/atlocal.in
>>> +++ b/tests/atlocal.in
>>> @@ -189,6 +189,9 @@ find_command dhcpd
>>>  # Set HAVE_DHCLIENT
>>>  find_command dhclient
>>>
>>> +# Set HAVE_DNSMASQ
>>> +find_command dnsmasq
>>> +
>>>  # Set HAVE_BFDD_BEACON
>>>  find_command bfdd-beacon
>>>
>>> diff --git a/tests/ovn.at b/tests/ovn.at
>>> index 445a74ce5..9399bcb5a 100644
>>> --- a/tests/ovn.at
>>> +++ b/tests/ovn.at
>>> @@ -44152,3 +44152,74 @@ check ovn-nbctl --wait=hv lsp-set-type down_ext
>>> localnet
>>>  OVN_CLEANUP([hv1],[hv2])
>>>  AT_CLEANUP
>>>  ])
>>> +
>>> +OVN_FOR_EACH_NORTHD([
>>> +AT_SETUP([ACL CT translation - Rules])
>>> +AT_KEYWORDS([acl_ct_translation_rules])
>>> +ovn_start
>>> +
>>> +check ovn-nbctl ls-add ls1
>>> +
>>> +check ovn-nbctl lsp-add ls1 lp1 \
>>> +    -- lsp-set-addresses lp1 "f0:00:00:00:00:01 10.0.0.1"
>>> +check ovn-nbctl lsp-add ls1 lp2 \
>>> +    -- lsp-set-addresses lp2 "f0:00:00:00:00:02 10.0.0.2"
>>> +
>>> +net_add n1
>>> +sim_add hv1
>>> +
>>> +as hv1
>>> +ovs-vsctl add-br br-phys
>>> +ovn_attach n1 br-phys 192.168.0.1
>>> +ovs-vsctl -- add-port br-int hv1-vif1 -- \
>>> +    set interface hv1-vif1 external-ids:iface-id=lp1 ofport-request=1
>>
>> +ovs-vsctl -- add-port br-int hv1-vif2 -- \
>>> +    set interface hv1-vif2 external-ids:iface-id=lp2 ofport-request=2
>>>
>>
>> nit: Please drop the ofport-request.
>>
>>
>>> +
>>> +# Create port group and add ACLs with TCP/UDP matches
>>> +lp1_uuid=$(ovn-nbctl --bare --columns=_uuid find logical_switch_port
>>> name=lp1)
>>> +lp2_uuid=$(ovn-nbctl --bare --columns=_uuid find logical_switch_port
>>> name=lp2)
>>> +check ovn-nbctl pg-add pg1 $lp1_uuid $lp2_uuid
>>> +
>>> +check ovn-nbctl acl-add pg1 from-lport 1002 \
>>> +    "inport == @pg1 && ip4 && tcp && tcp.dst == 80" allow-related
>>> +check ovn-nbctl acl-add pg1 from-lport 1002 \
>>> +    "inport == @pg1 && ip4 && udp && udp.dst == 53" allow-related
>>> +check ovn-nbctl acl-add pg1 to-lport 1002 \
>>> +    "outport == @pg1 && ip4 && tcp && tcp.src == 80" allow-related
>>> +check ovn-nbctl acl-add pg1 to-lport 1002 \
>>> +    "outport == @pg1 && ip4 && udp && udp.src == 53" allow-related
>>> +check ovn-nbctl --wait=hv --log acl-add pg1 to-lport 100 "outport ==
>>> @pg1" drop
>>> +
>>> +# Verify lflows do NOT have acl_ct_translation tag when option is
>>> disabled (default)
>>> +AT_CHECK([ovn-sbctl --bare --columns=tags find Logical_Flow | grep -c
>>> "acl_ct_translation"], [1], [ignore], [ignore])
>>>
>>
>> nit: Please use "row_count" instead. That applies to the whole test.
>>
>>
>>> +
>>> +# Verify OpenFlows use packet header fields (tcp.dst, udp.dst) not CT
>>> fields
>>> +# When CT translation is OFF, we should see tp_dst matches
>>> +as hv1
>>> +AT_CHECK([ovs-ofctl dump-flows br-int | grep -E "tp_dst=80|tp_dst=53" |
>>> head -1 | grep -q "tp_dst"], [0])
>>>
>>
>> We can actually use getting a table directly via "ovn-debug
>> lflow-stage-to-oftable <STAGE_NAME>".
>> That also applies to the whole test.
>>
>> +
>>> +# Now enable ACL CT translation
>>> +check ovn-nbctl --wait=hv set NB_Global .
>>> options:acl_ct_translation=true
>>> +
>>> +# Verify lflows now have acl_ct_translation tag in the tags column
>>> +AT_CHECK([ovn-sbctl --bare --columns=tags find Logical_Flow | grep -c
>>> "acl_ct_translation"], [0], [ignore])
>>>
>> +
>>> +# Verify OpenFlows now use CT fields (ct_tp_dst) instead of packet
>>> headers
>>> +# When CT translation is ON, we should see ct_tp_dst matches
>>> +as hv1
>>> +AT_CHECK([ovs-ofctl dump-flows br-int | grep -E
>>> "ct_tp_dst=80|ct_tp_dst=53" | head -1 | grep -q "ct_tp_dst"], [0])
>>> +
>>> +# Now disable ACL CT translation again
>>> +check ovn-nbctl --wait=hv set NB_Global .
>>> options:acl_ct_translation=false
>>> +
>>> +# Verify lflows no longer have acl_ct_translation tag
>>> +AT_CHECK([ovn-sbctl --bare --columns=tags find Logical_Flow | grep -c
>>> "acl_ct_translation"], [1], [ignore], [ignore])
>>> +
>>> +# Verify OpenFlows reverted to packet header fields
>>> +as hv1
>>> +AT_CHECK([ovs-ofctl dump-flows br-int | grep -E "tp_dst=80|tp_dst=53" |
>>> head -1 | grep -q "tp_dst"], [0])
>>> +
>>> +OVN_CLEANUP([hv1])
>>> +AT_CLEANUP
>>> +])
>>> diff --git a/tests/system-ovn.at b/tests/system-ovn.at
>>> index 5b3dc47fd..539a8fa3d 100644
>>> --- a/tests/system-ovn.at
>>> +++ b/tests/system-ovn.at
>>> @@ -20347,3 +20347,392 @@ OVS_TRAFFIC_VSWITCHD_STOP(["/.*error
>>> receiving.*/d
>>>  /.*terminating with signal 15.*/d"])
>>>  AT_CLEANUP
>>>  ])
>>> +
>>> +OVN_FOR_EACH_NORTHD([
>>> +AT_SETUP([ACL CT translation - UDP fragmentation])
>>> +AT_KEYWORDS([acl_ct_translation_udp_fragmentation])
>>> +AT_SKIP_IF([test $HAVE_TCPDUMP = no])
>>> +
>>> +CHECK_CONNTRACK()
>>> +
>>> +ovn_start
>>> +OVS_TRAFFIC_VSWITCHD_START()
>>> +ADD_BR([br-int])
>>> +
>>> +# Set external-ids in br-int needed for ovn-controller
>>> +check ovs-vsctl \
>>> +        -- set Open_vSwitch . external-ids:system-id=hv1 \
>>> +        -- set Open_vSwitch .
>>> external-ids:ovn-remote=unix:$ovs_base/ovn-sb/ovn-sb.sock \
>>> +        -- set Open_vSwitch . external-ids:ovn-encap-type=geneve \
>>> +        -- set Open_vSwitch . external-ids:ovn-encap-ip=169.0.0.1 \
>>> +        -- set bridge br-int fail-mode=secure
>>> other-config:disable-in-band=true
>>> +
>>> +# Start ovn-controller
>>> +start_daemon ovn-controller
>>> +
>>> +# Set the minimal fragment size for userspace DP.
>>> +ovs-appctl dpctl/ipf-set-min-frag v4 500
>>> +
>>> +check ovn-nbctl ls-add internal
>>> +check ovn-nbctl lsp-add internal client \
>>> +    -- lsp-set-addresses client "f0:00:00:01:02:03 172.16.1.3"
>>> +check ovn-nbctl lsp-add internal server \
>>> +    -- lsp-set-addresses server "f0:00:0f:01:02:03 172.16.1.2"
>>> +
>>> +ADD_NAMESPACES(client)
>>> +ADD_VETH(client, client, br-int, "172.16.1.3/24", "f0:00:00:01:02:03",
>>> \
>>> +         "172.16.1.1")
>>> +NS_EXEC([client], [ip l set dev client mtu 900])
>>> +
>>> +ADD_NAMESPACES(server)
>>> +ADD_VETH(server, server, br-int, "172.16.1.2/24", "f0:00:0f:01:02:03",
>>> \
>>> +         "172.16.1.1")
>>> +NS_EXEC([server], [ip l set dev server mtu 900])
>>> +
>>> +# Create data file for traffic testing (8000 bytes will be fragmented
>>> with MTU 900)
>>> +printf %8000s > datafile
>>> +
>>> +# Create port group with both client and server
>>> +client_uuid=$(ovn-nbctl --bare --columns=_uuid find logical_switch_port
>>> name=client)
>>> +server_uuid=$(ovn-nbctl --bare --columns=_uuid find logical_switch_port
>>> name=server)
>>> +check ovn-nbctl pg-add internal_vms $client_uuid $server_uuid
>>> +
>>> +# ACL rules - allow outgoing traffic
>>> +check ovn-nbctl acl-add internal_vms from-lport 1002 "inport ==
>>> @internal_vms && ip4 && udp" allow-related
>>> +check ovn-nbctl acl-add internal_vms from-lport 1002 "inport ==
>>> @internal_vms && ip4" allow-related
>>> +
>>> +# ACL rules - allow incoming UDP on specific ports
>>> +check ovn-nbctl acl-add internal_vms to-lport 1002 "outport ==
>>> @internal_vms && ip4 && udp && udp.dst == 5060" allow-related
>>> +check ovn-nbctl acl-add internal_vms to-lport 1002 "outport ==
>>> @internal_vms && ip4 && udp && udp.dst == 5061" allow-related
>>> +
>>> +# Allow ARP
>>> +check ovn-nbctl acl-add internal_vms to-lport 1002 "outport ==
>>> @internal_vms && arp" allow-related
>>> +
>>> +# Drop rule
>>> +check ovn-nbctl --log --severity=info acl-add internal_vms to-lport 100
>>> "outport == @internal_vms" drop
>>> +
>>> +# Enable ACL CT translation for fragmentation handling
>>> +check ovn-nbctl --wait=hv set NB_Global .
>>> options:acl_ct_translation=true
>>> +
>>> +check ovn-nbctl --wait=hv sync
>>> +check ovs-appctl dpctl/flush-conntrack
>>> +
>>> +# Start tcpdump to capture IP fragments on both sides
>>> +NETNS_START_TCPDUMP([server], [-U -i server -Q in -nn ip and
>>> '(ip[[6:2]] & 0x3fff != 0)'], [tcpdump-udp-server])
>>> +NETNS_START_TCPDUMP([client], [-U -i client -Q in -nn ip and
>>> '(ip[[6:2]] & 0x3fff != 0)'], [tcpdump-udp-client])
>>> +
>>> +# Start UDP listeners on both sides
>>> +NETNS_DAEMONIZE([server], [nc -l -u 172.16.1.2 5060 > udp_server.rcvd],
>>> [server.pid])
>>> +NETNS_DAEMONIZE([client], [nc -l -u 172.16.1.3 5061 > udp_client.rcvd],
>>> [client.pid])
>>> +
>>> +# Client sends to server (will be fragmented due to MTU 900)
>>> +NS_CHECK_EXEC([client], [cat datafile | nc -u 172.16.1.2 5060 -q 1],
>>> [0], [ignore], [ignore])
>>>
>>
>> My nc version doesn't support -q. We probably need a different trick to
>> make it wait.
>>
>>
>>> +
>>> +# Server sends to client (will be fragmented due to MTU 900)
>>> +NS_CHECK_EXEC([server], [cat datafile | nc -u 172.16.1.3 5061 -q 1],
>>> [0], [ignore], [ignore])
>>> +
>>> +OVS_WAIT_UNTIL([test -s udp_server.rcvd])
>>> +OVS_WAIT_UNTIL([test -s udp_client.rcvd])
>>> +
>>> +# Verify both sides received data
>>> +check cmp datafile udp_server.rcvd
>>> +check cmp datafile udp_client.rcvd
>>> +
>>> +# Verify IP fragments were received on both sides (at least 5 fragments
>>> confirms fragmentation)
>>> +OVS_WAIT_UNTIL([test $(cat tcpdump-udp-server.tcpdump | wc -l) -ge 5])
>>> +OVS_WAIT_UNTIL([test $(cat tcpdump-udp-client.tcpdump | wc -l) -ge 5])
>>> +
>>> +AS_BOX([UDP fragmentation test passed - bi-directional traffic with IP
>>> fragments])
>>>
>>
>> nit: We can drop this box the test would indicate failure.
>>
>>
>>> +
>>> +OVN_CLEANUP_CONTROLLER([hv1])
>>> +OVN_CLEANUP_NORTHD
>>> +
>>> +as
>>> +OVS_TRAFFIC_VSWITCHD_STOP(["/failed to query port patch-.*/d
>>> +/connection dropped.*/d
>>> +/WARN|netdev@ovs-netdev: execute.*/d
>>> +/dpif|WARN|system@ovs-system: execute.*/d
>>> +"])
>>> +AT_CLEANUP
>>> +])
>>> +
>>> +OVN_FOR_EACH_NORTHD([
>>> +AT_SETUP([ACL CT translation - TCP traffic])
>>> +AT_KEYWORDS([acl_ct_translation_tcp_fragmentation])
>>> +
>>> +CHECK_CONNTRACK()
>>> +
>>> +ovn_start
>>> +OVS_TRAFFIC_VSWITCHD_START()
>>> +ADD_BR([br-int])
>>> +
>>> +# Set external-ids in br-int needed for ovn-controller
>>> +check ovs-vsctl \
>>> +        -- set Open_vSwitch . external-ids:system-id=hv1 \
>>> +        -- set Open_vSwitch .
>>> external-ids:ovn-remote=unix:$ovs_base/ovn-sb/ovn-sb.sock \
>>> +        -- set Open_vSwitch . external-ids:ovn-encap-type=geneve \
>>> +        -- set Open_vSwitch . external-ids:ovn-encap-ip=169.0.0.1 \
>>> +        -- set bridge br-int fail-mode=secure
>>> other-config:disable-in-band=true
>>> +
>>> +# Start ovn-controller
>>> +start_daemon ovn-controller
>>> +
>>> +check ovn-nbctl ls-add internal
>>> +check ovn-nbctl lsp-add internal client \
>>> +    -- lsp-set-addresses client "f0:00:00:01:02:03 172.16.1.3"
>>> +check ovn-nbctl lsp-add internal server \
>>> +    -- lsp-set-addresses server "f0:00:0f:01:02:03 172.16.1.2"
>>> +
>>> +ADD_NAMESPACES(client)
>>> +ADD_VETH(client, client, br-int, "172.16.1.3/24", "f0:00:00:01:02:03",
>>> \
>>> +         "172.16.1.1")
>>> +
>>> +ADD_NAMESPACES(server)
>>> +ADD_VETH(server, server, br-int, "172.16.1.2/24", "f0:00:0f:01:02:03",
>>> \
>>> +         "172.16.1.1")
>>> +
>>> +# Create data file for traffic testing
>>> +printf %8000s > datafile
>>> +
>>> +# Create port group with both client and server
>>> +client_uuid=$(ovn-nbctl --bare --columns=_uuid find logical_switch_port
>>> name=client)
>>> +server_uuid=$(ovn-nbctl --bare --columns=_uuid find logical_switch_port
>>> name=server)
>>> +check ovn-nbctl pg-add internal_vms $client_uuid $server_uuid
>>> +
>>> +# ACL rules - allow outgoing traffic
>>> +check ovn-nbctl acl-add internal_vms from-lport 1002 "inport ==
>>> @internal_vms && ip4 && tcp" allow-related
>>> +check ovn-nbctl acl-add internal_vms from-lport 1002 "inport ==
>>> @internal_vms && ip4" allow-related
>>> +
>>> +# ACL rules - allow incoming TCP on specific ports
>>> +check ovn-nbctl acl-add internal_vms to-lport 1002 "outport ==
>>> @internal_vms && ip4 && tcp && tcp.dst == 8080" allow-related
>>> +check ovn-nbctl acl-add internal_vms to-lport 1002 "outport ==
>>> @internal_vms && ip4 && tcp && tcp.dst == 8081" allow-related
>>> +
>>> +# Allow ARP
>>> +check ovn-nbctl acl-add internal_vms to-lport 1002 "outport ==
>>> @internal_vms && arp" allow-related
>>> +
>>> +# Drop rule
>>> +check ovn-nbctl --log --severity=info acl-add internal_vms to-lport 100
>>> "outport == @internal_vms" drop
>>> +
>>> +# Enable ACL CT translation
>>> +check ovn-nbctl --wait=hv set NB_Global .
>>> options:acl_ct_translation=true
>>> +
>>> +check ovn-nbctl --wait=hv sync
>>> +check ovs-appctl dpctl/flush-conntrack
>>> +
>>> +# Test client -> server direction
>>> +NETNS_DAEMONIZE([server], [nc -l 172.16.1.2 8080 > tcp_server.rcvd],
>>> [server.pid])
>>> +NS_CHECK_EXEC([client], [cat datafile | nc 172.16.1.2 8080 -q 1], [0],
>>> [ignore], [ignore])
>>> +
>>> +OVS_WAIT_UNTIL([test -s tcp_server.rcvd])
>>> +check cmp datafile tcp_server.rcvd
>>> +
>>> +# Clean up first direction
>>> +kill $(cat server.pid) 2>/dev/null || true
>>> +
>>> +# Test server -> client direction
>>> +NETNS_DAEMONIZE([client], [nc -l 172.16.1.3 8081 > tcp_client.rcvd],
>>> [client.pid])
>>> +NS_CHECK_EXEC([server], [cat datafile | nc 172.16.1.3 8081 -q 1], [0],
>>> [ignore], [ignore])
>>> +
>>> +OVS_WAIT_UNTIL([test -s tcp_client.rcvd])
>>> +check cmp datafile tcp_client.rcvd
>>> +
>>> +AS_BOX([TCP traffic test passed - bi-directional traffic])
>>> +
>>> +OVN_CLEANUP_CONTROLLER([hv1])
>>> +OVN_CLEANUP_NORTHD
>>> +
>>> +as
>>> +OVS_TRAFFIC_VSWITCHD_STOP(["/failed to query port patch-.*/d
>>> +/connection dropped.*/d
>>> +"])
>>> +AT_CLEANUP
>>> +])
>>> +
>>> +OVN_FOR_EACH_NORTHD([
>>> +AT_SETUP([ACL CT translation - negative test])
>>>
>>
>> The UDP + TCP + negative use the same infrastructure, maybe
>> we can squash them in a single test that just changes the ACL rules,
>> WDYT?
>>
>>
> I was conducting all the tests within the same infrastructure, but this
> made it more difficult to debug and identify which tests were failing, and
> the test was becoming very large.
>
>
>
>> +AT_KEYWORDS([acl_ct_translation_udp_fragmentation_negative])
>>> +AT_SKIP_IF([test $HAVE_TCPDUMP = no])
>>> +
>>> +CHECK_CONNTRACK()
>>> +
>>> +ovn_start
>>> +OVS_TRAFFIC_VSWITCHD_START()
>>> +ADD_BR([br-int])
>>> +
>>> +# Set external-ids in br-int needed for ovn-controller
>>> +check ovs-vsctl \
>>> +        -- set Open_vSwitch . external-ids:system-id=hv1 \
>>> +        -- set Open_vSwitch .
>>> external-ids:ovn-remote=unix:$ovs_base/ovn-sb/ovn-sb.sock \
>>> +        -- set Open_vSwitch . external-ids:ovn-encap-type=geneve \
>>> +        -- set Open_vSwitch . external-ids:ovn-encap-ip=169.0.0.1 \
>>> +        -- set bridge br-int fail-mode=secure
>>> other-config:disable-in-band=true
>>> +
>>> +# Start ovn-controller
>>> +start_daemon ovn-controller
>>> +
>>> +check ovn-nbctl ls-add internal
>>> +check ovn-nbctl lsp-add internal client \
>>> +    -- lsp-set-addresses client "f0:00:00:01:02:03 172.16.1.3"
>>> +check ovn-nbctl lsp-add internal server \
>>> +    -- lsp-set-addresses server "f0:00:0f:01:02:03 172.16.1.2"
>>> +
>>> +ADD_NAMESPACES(client)
>>> +ADD_VETH(client, client, br-int, "172.16.1.3/24", "f0:00:00:01:02:03",
>>> \
>>> +         "172.16.1.1")
>>> +
>>> +ADD_NAMESPACES(server)
>>> +ADD_VETH(server, server, br-int, "172.16.1.2/24", "f0:00:0f:01:02:03",
>>> \
>>> +         "172.16.1.1")
>>> +
>>> +# Create port group with both client and server
>>> +client_uuid=$(ovn-nbctl --bare --columns=_uuid find logical_switch_port
>>> name=client)
>>> +server_uuid=$(ovn-nbctl --bare --columns=_uuid find logical_switch_port
>>> name=server)
>>> +check ovn-nbctl pg-add internal_vms $client_uuid $server_uuid
>>> +
>>> +# ACL rules - allow outgoing traffic
>>> +check ovn-nbctl acl-add internal_vms from-lport 1002 "inport ==
>>> @internal_vms && ip4 && udp" allow-related
>>> +check ovn-nbctl acl-add internal_vms from-lport 1002 "inport ==
>>> @internal_vms && ip4" allow-related
>>> +
>>> +# ACL rules - allow incoming UDP ONLY on port 5060 (NOT 4000)
>>> +check ovn-nbctl acl-add internal_vms to-lport 1002 "outport ==
>>> @internal_vms && ip4 && udp && udp.dst == 5060" allow-related
>>> +
>>> +# Allow ARP
>>> +check ovn-nbctl acl-add internal_vms to-lport 1002 "outport ==
>>> @internal_vms && arp" allow-related
>>> +
>>> +# Drop rule - this should drop traffic on port 4000
>>> +check ovn-nbctl --log --severity=info acl-add internal_vms to-lport 100
>>> "outport == @internal_vms" drop
>>> +
>>> +# Enable ACL CT translation
>>> +check ovn-nbctl --wait=hv set NB_Global .
>>> options:acl_ct_translation=true
>>> +
>>> +check ovn-nbctl --wait=hv sync
>>> +check ovs-appctl dpctl/flush-conntrack
>>> +
>>> +# Start tcpdump to capture any incoming packets on server
>>> +NETNS_START_TCPDUMP([server], [-U -i server -Q in -nn udp port 4000],
>>> [tcpdump-drop-server])
>>> +
>>> +# Start UDP listener on server (on port 4000 which is NOT allowed by
>>> ACLs)
>>> +NETNS_DAEMONIZE([server], [nc -l -u 172.16.1.2 4000 > udp_drop.rcvd],
>>> [drop_server.pid])
>>> +
>>> +# Client sends to server on disallowed port
>>> +NS_CHECK_EXEC([client], [echo "test" | timeout 2 nc -u 172.16.1.2 4000
>>> -q 1], [0], [ignore], [ignore])
>>> +
>>> +# Wait a bit for any packets to arrive
>>> +sleep 2
>>> +
>>> +# Verify server did NOT receive any data (file should be empty or not
>>> exist)
>>> +AT_CHECK([test ! -s udp_drop.rcvd], [0])
>>> +
>>> +# Verify tcpdump captured no packets (traffic was dropped by ACL)
>>> +AT_CHECK([test $(cat tcpdump-drop-server.tcpdump | wc -l) -eq 0], [0])
>>> +
>>> +AS_BOX([Negative test passed - traffic on disallowed port was dropped])
>>> +
>>> +OVN_CLEANUP_CONTROLLER([hv1])
>>> +OVN_CLEANUP_NORTHD
>>> +
>>> +as
>>> +OVS_TRAFFIC_VSWITCHD_STOP(["/failed to query port patch-.*/d
>>> +/connection dropped.*/d
>>> +"])
>>> +AT_CLEANUP
>>> +])
>>> +
>>> +OVN_FOR_EACH_NORTHD([
>>> +AT_SETUP([ACL CT translation with DHCP traffic])
>>> +AT_KEYWORDS([acl_ct_translation_dhcp_fragmentation])
>>> +AT_SKIP_IF([test $HAVE_DNSMASQ = no])
>>> +AT_SKIP_IF([test $HAVE_DHCLIENT = no])
>>>
>>
>> If we want to keep this test we need to install both dnsmasq and dhclient
>> to the container images.
>>
>>
>>> +
>>> +CHECK_CONNTRACK()
>>> +
>>> +ovn_start
>>> +OVS_TRAFFIC_VSWITCHD_START()
>>> +ADD_BR([br-int])
>>> +
>>> +# Set external-ids in br-int needed for ovn-controller
>>> +check ovs-vsctl \
>>> +        -- set Open_vSwitch . external-ids:system-id=hv1 \
>>> +        -- set Open_vSwitch .
>>> external-ids:ovn-remote=unix:$ovs_base/ovn-sb/ovn-sb.sock \
>>> +        -- set Open_vSwitch . external-ids:ovn-encap-type=geneve \
>>> +        -- set Open_vSwitch . external-ids:ovn-encap-ip=169.0.0.1 \
>>> +        -- set bridge br-int fail-mode=secure
>>> other-config:disable-in-band=true
>>> +
>>> +# Start ovn-controller
>>> +start_daemon ovn-controller
>>> +
>>> +# Create logical switch
>>> +check ovn-nbctl ls-add ls1
>>> +
>>> +# Create DHCP server port
>>> +check ovn-nbctl lsp-add ls1 dhcp-server \
>>> +    -- lsp-set-addresses dhcp-server "00:00:00:00:02:00 192.168.1.2"
>>> +
>>> +# Create DHCP client port
>>> +check ovn-nbctl lsp-add ls1 dhcp-client \
>>> +    -- lsp-set-addresses dhcp-client "00:00:00:00:02:01"
>>> +
>>> +# Add OVS ports
>>> +ADD_NAMESPACES(server, client)
>>> +ADD_VETH(server, server, br-int, "192.168.1.2/24", "00:00:00:00:02:00")
>>> +ADD_VETH(client, client, br-int, "0.0.0.0/0", "00:00:00:00:02:01")
>>> +
>>> +# Bind logical ports to OVS ports
>>> +check ovs-vsctl set Interface ovs-server
>>> external_ids:iface-id=dhcp-server
>>> +check ovs-vsctl set Interface ovs-client
>>> external_ids:iface-id=dhcp-client
>>> +
>>> +# Create port group with both ports
>>> +server_uuid=$(ovn-nbctl --bare --columns=_uuid find logical_switch_port
>>> name=dhcp-server)
>>> +client_uuid=$(ovn-nbctl --bare --columns=_uuid find logical_switch_port
>>> name=dhcp-client)
>>> +check ovn-nbctl pg-add dhcp_vms $server_uuid $client_uuid
>>> +
>>> +# Add ACL rules - allow outgoing traffic from both ports
>>> +check ovn-nbctl acl-add dhcp_vms from-lport 1002 "inport == @dhcp_vms
>>> && ip4" allow-related
>>> +check ovn-nbctl acl-add dhcp_vms from-lport 1002 "inport == @dhcp_vms
>>> && udp" allow-related
>>> +
>>> +# Add ACL rules for DHCP traffic (ports 67 and 68)
>>> +check ovn-nbctl acl-add dhcp_vms to-lport 1002 "outport == @dhcp_vms &&
>>> ip4 && udp && udp.dst == 67" allow-related
>>> +check ovn-nbctl acl-add dhcp_vms to-lport 1002 "outport == @dhcp_vms &&
>>> ip4 && udp && udp.dst == 68" allow-related
>>> +
>>> +# Allow ARP and broadcast
>>> +check ovn-nbctl acl-add dhcp_vms to-lport 1002 "outport == @dhcp_vms &&
>>> arp" allow-related
>>> +
>>> +# Add drop rule
>>> +check ovn-nbctl --log --severity=info acl-add dhcp_vms to-lport 100
>>> "outport == @dhcp_vms" drop
>>> +
>>> +# Enable ACL CT translation
>>> +check ovn-nbctl --wait=hv set NB_Global .
>>> options:acl_ct_translation=false
>>> +
>>> +# Wait for flows to be installed
>>> +check ovn-nbctl --wait=hv sync
>>> +
>>> +# Start dnsmasq as DHCP server in the server namespace
>>> +# dnsmasq doesn't have AppArmor restrictions like dhcpd
>>> +NETNS_DAEMONIZE([server], [dnsmasq --no-daemon --no-resolv --no-hosts \
>>> +    --interface=server --bind-interfaces \
>>> +    --dhcp-range=192.168.1.100,192.168.1.100,255.255.255.0,1h \
>>> +    --dhcp-option=option:router,192.168.1.2 \
>>> +    --log-dhcp], [dnsmasq.pid])
>>> +
>>> +# Give dnsmasq time to start
>>> +sleep 1
>>> +
>>> +# Request IP via DHCP using dhclient
>>> +NS_CHECK_EXEC([client], [dhclient -1 -v client], [0], [ignore],
>>> [ignore])
>>>
>>
>> This is leaking the dhclient, in other words it keeps running even when
>> the test is done.
>>
>>
> Good catch.
>
>
>> +
>>> +# Verify client got an IP address from DHCP
>>> +NS_CHECK_EXEC([client], [ip addr show client | grep -q
>>> "192.168.1.100"], [0])
>>> +
>>> +AS_BOX([DHCP test passed - client received IP 192.168.1.100 via DHCP
>>> with ACL CT translation enabled])
>>> +
>>> +OVN_CLEANUP_CONTROLLER([hv1])
>>> +
>>> +OVN_CLEANUP_NORTHD
>>> +
>>> +as
>>> +OVS_TRAFFIC_VSWITCHD_STOP(["/failed to query port patch-.*/d
>>> +/connection dropped.*/d
>>> +/dnsmasq.*/d
>>> +"])
>>> +AT_CLEANUP
>>> +])
>>> --
>>> 2.43.0
>>>
>>>
>>
>> I was able to figure out why is it happening, dunno if we should keep the
>> test around
>> but it can be reproduced even with multicast UDP traffic so it might be
>> enough to adjust
>> the UDP test to also send multicast.  The following two logical flows are
>> to "blame":
>>
>>
> Well, I think the broader the coverage, the better. The advantage of
> having a DHCP client/server is that the traffic will contain all the
> nuances of real traffic, and we don't have to encode all protocol data. For
> example, the return traffic should be difficult to generate since it will
> not be multicast. I tried to reuse NETNS_START_DHCPD, but I encountered
> various permission errors and couldn't get it to work. Dnsmasq seems easier
> to use and doesn't rely on an external config file.
>

We should reuse what is already provided in the test instead
of adding a new dependency, also one thing to note about dhclient
it's unmaintained and deprecated in Fedora, we should find
a different client to use.


>
>
>>         /* Do not send multicast packets to conntrack. */
>>         ovn_lflow_add(lflows, od, S_SWITCH_IN_PRE_ACL, 110, "eth.mcast",
>>                       "next;", lflow_ref);
>>         ovn_lflow_add(lflows, od, S_SWITCH_OUT_PRE_ACL, 110, "eth.mcast",
>>                       "next;", lflow_ref);
>>
>> We basically skip CT in case of multicast, and because the translation
>> adds new
>> prereq of +ct.trk we will never match that flow. I'm afraid that there
>> isn't any
>> easy/good solution. If we disable those flows we are imposing performance
>> penalty on multicast traffic. If we say that's acceptable we need to
>> clearly
>> state that too in the documentation it's in the same ballpark with
>> "broken"
>> HW offload.
>>
>
> What if we discard it only for DHCP traffic? I got it passing with
> something like this:
>
>         if (acl_udp_ct_translation) {
>             ovn_lflow_add(lflows, od, S_SWITCH_IN_PRE_ACL, 120,
>                           "eth.mcast && udp && "
>                           "(udp.src == 67 || udp.src == 68 || "
>                           "udp.dst == 67 || udp.dst == 68)",
>                           REGBIT_CONNTRACK_DEFRAG" = 1; next;",
>                           lflow_ref);
>         }
>
> This takes higher priority than the more general multicast rule, so
> performance should not be a problem if we apply it for DHCP.
>

As I replied to Brian it might make more sense
CT for DHCP traffic in general regardless of this
option. Unfortunately that still doesn't solve the
issue with multicast traffic in general.


>
>> Regards,
>> Ales
>>
>
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to