On Mon, Jan 26, 2026 at 9:25 PM Erlon Rodrigues Cruz <[email protected]> wrote:
> Hey Ales, > > > On Mon, Jan 26, 2026 at 10:18 AM Ales Musil <[email protected]> wrote: > >> >> >> On Fri, Jan 23, 2026 at 8:00 PM Erlon R. Cruz <[email protected]> >> wrote: >> >>> v2: Rebased on current upstream main, removed external python >>> code traffic generators, added scenario tests for: dhcp, negative >>> udp and ovn rule generation, documentation and many code clean ups. >>> DHCP traffic is being dropped for some reason. Still need to figure >>> that out for v3. >>> >>> At the current state, OVN can not handle fragmented traffic for ACLs >>> in the userspace datapath (DPDK). Just like in the case of LB >>> (commit 20a96b9), the kernel DP will try to reassemble the fragments >>> during CT lookup, however userspace won't reassemble them. >>> >>> This patch allows OVN to handle fragmented traffic by defining a >>> translation table on southbound that leverages OpenFlow connection >>> tracking capabilities. When a stateful flow is created on NB, we add >>> a hint in the flow. This hint will be read in SB and if the >>> connection tracking is set to be used, SB will use the alternative >>> translation table that will use the connection tracking information. >>> >>> This approach should not change the current behavior and it's only >>> enabled if acl_ct_translation is set: >>> >>> ovn-nbctl set NB_Global . options:acl_ct_translation=true >>> >>> Signed-off-by: Erlon R. Cruz <[email protected]> >>> --- >>> >> >> Hi Erlon, >> >> thank you for the v2. It needs a rebase. I have also some comments down >> below. >> >> >>> NEWS | 5 + >>> controller/lflow.c | 24 ++- >>> controller/lflow.h | 1 + >>> include/ovn/logical-fields.h | 3 + >>> lib/logical-fields.c | 67 ++++++ >>> northd/en-global-config.c | 5 + >>> northd/lflow-mgr.c | 51 ++++- >>> northd/lflow-mgr.h | 8 +- >>> northd/northd.c | 41 +++- >>> ovn-nb.xml | 20 ++ >>> tests/atlocal.in | 3 + >>> tests/ovn.at | 71 +++++++ >>> tests/system-ovn.at | 389 +++++++++++++++++++++++++++++++++++ >>> 13 files changed, 668 insertions(+), 20 deletions(-) >>> >>> diff --git a/NEWS b/NEWS >>> index 9883fb81d..7062701c8 100644 >>> --- a/NEWS >>> +++ b/NEWS >>> @@ -1,5 +1,10 @@ >>> Post v25.09.0 >>> ------------- >>> + - Fixed support for fragmented traffic in the userspace datapath. >>> Added the >>> + "acl_ct_translation" NB_Global option to enable connection tracking >>> + based L4 field translation for stateful ACLs. When enabled allows >>> proper >>> + handling of IP fragmentation in userspace datapaths. This option >>> breaks >>> + hardware offloading and is disabled by default. >>> >> >> We are usually adding NEWS at the end. >> >> >>> - Added support for TLS Server Name Indication (SNI) with the new >>> --ssl-server-name option in OVN utilities and daemons. This allows >>> specifying the server name for SNI, which is useful when connecting >>> diff --git a/controller/lflow.c b/controller/lflow.c >>> index 784a0d2dd..139181546 100644 >>> --- a/controller/lflow.c >>> +++ b/controller/lflow.c >>> @@ -51,10 +51,19 @@ COVERAGE_DEFINE(consider_logical_flow); >>> /* Contains "struct expr_symbol"s for fields supported by OVN lflows. */ >>> static struct shash symtab; >>> >>> +/* Alternative symbol table for ACL CT translation. >>> + * This symbol table maps L4 port fields (tcp/udp/sctp) to their >>> connection >>> + * tracking equivalents (ct_tp_src/ct_tp_dst with ct_proto predicates). >>> + * This allows matching on all IP fragments (not just the first >>> fragment) >>> + * so that all fragments can be matched based on the connection >>> tracking state. >>> + */ >>> +static struct shash acl_ct_symtab; >>> + >>> void >>> lflow_init(void) >>> { >>> ovn_init_symtab(&symtab); >>> + ovn_init_acl_ct_symtab(&acl_ct_symtab); >>> } >>> >>> struct lookup_port_aux { >>> @@ -984,7 +993,16 @@ convert_match_to_expr(const struct >>> sbrec_logical_flow *lflow, >>> lflow->match); >>> return NULL; >>> } >>> - struct expr *e = expr_parse_string(lex_str_get(&match_s), &symtab, >>> + >>> + /* Check if this logical flow requires ACL CT translation. >>> + * If the tags contains "acl_ct_translation"="true", we use the >>> + * alternative symbol table that maps L4 fields (tcp/udp/sctp ports) >>> + * to their CT equivalents. */ >>> + const char *ct_trans = smap_get(&lflow->tags, "acl_ct_translation"); >>> + struct shash *symtab_to_use = (ct_trans && !strcmp(ct_trans, "true") >>> + ? &acl_ct_symtab : &symtab); >>> >> >> nit: smap_get_bool(..., false); >> >> >>> + >>> + struct expr *e = expr_parse_string(lex_str_get(&match_s), >>> symtab_to_use, >>> addr_sets, port_groups, >>> &addr_sets_ref, >>> &port_groups_ref, >>> ldp->datapath->tunnel_key, >>> @@ -1016,7 +1034,7 @@ convert_match_to_expr(const struct >>> sbrec_logical_flow *lflow, >>> e = expr_combine(EXPR_T_AND, e, *prereqs); >>> *prereqs = NULL; >>> } >>> - e = expr_annotate(e, &symtab, &error); >>> + e = expr_annotate(e, symtab_to_use, &error); >>> } >>> if (error) { >>> static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(1, 1); >>> @@ -2045,6 +2063,8 @@ lflow_destroy(void) >>> { >>> expr_symtab_destroy(&symtab); >>> shash_destroy(&symtab); >>> + expr_symtab_destroy(&acl_ct_symtab); >>> + shash_destroy(&acl_ct_symtab); >>> } >>> >>> bool >>> diff --git a/controller/lflow.h b/controller/lflow.h >>> index 30c6dbebd..3e762d49b 100644 >>> --- a/controller/lflow.h >>> +++ b/controller/lflow.h >>> @@ -156,6 +156,7 @@ struct lflow_ctx_out { >>> }; >>> >>> void lflow_init(void); >>> +void ovn_init_acl_ct_symtab(struct shash *acl_symtab); >>> >> >> nit: Please remove this declaration it moved into logical-fields.h. >> >> >>> void lflow_run(struct lflow_ctx_in *, struct lflow_ctx_out *); >>> void lflow_handle_cached_flows(struct lflow_cache *, >>> const struct sbrec_logical_flow_table *); >>> diff --git a/include/ovn/logical-fields.h b/include/ovn/logical-fields.h >>> index f0d34196a..8f23249a0 100644 >>> --- a/include/ovn/logical-fields.h >>> +++ b/include/ovn/logical-fields.h >>> @@ -236,6 +236,9 @@ const char *event_to_string(enum >>> ovn_controller_event event); >>> int string_to_event(const char *s); >>> const struct ovn_field *ovn_field_from_name(const char *name); >>> >>> +void ovn_init_acl_ct_symtab(struct shash *symtab); >>> +void expr_symtab_remove(struct shash *symtab, const char *name); >>> + >>> /* OVN CT label values >>> * =================== >>> * These are specific ct.label bit values OVN uses to track different >>> types >>> diff --git a/lib/logical-fields.c b/lib/logical-fields.c >>> index c8bddcdc5..334a7e244 100644 >>> --- a/lib/logical-fields.c >>> +++ b/lib/logical-fields.c >>> @@ -451,3 +451,70 @@ ovn_field_from_name(const char *name) >>> >>> return shash_find_data(&ovnfield_by_name, name); >>> } >>> + >>> +/* Removes the symbol with 'name' from 'symtab', freeing its memory. */ >>> +void >>> +expr_symtab_remove(struct shash *symtab, const char *name) >>> +{ >>> + struct expr_symbol *symbol = shash_find_and_delete(symtab, name); >>> + if (symbol) { >>> + free(symbol->name); >>> + free(symbol->prereqs); >>> + free(symbol->predicate); >>> + free(symbol); >>> + } >>> +} >>> + >>> +/* Initialize a symbol table for ACL CT translation. >>> + * This creates an alternative symbol table that maps L4 port fields >>> + * (tcp/udp/sctp) to their connection tracking equivalents. This allows >>> + * matching on all IP fragments (not just the first) by using CT state >>> + * which is available for all fragments. */ >>> +void >>> +ovn_init_acl_ct_symtab(struct shash *acl_symtab) >>> +{ >>> + /* Initialize with the standard symbol table. */ >>> + ovn_init_symtab(acl_symtab); >>> + >>> + /* Remove the original tcp/udp/sctp symbols that we will override. >>> */ >>> + expr_symtab_remove(acl_symtab, "tcp.src"); >>> + expr_symtab_remove(acl_symtab, "tcp.dst"); >>> + expr_symtab_remove(acl_symtab, "tcp"); >>> + expr_symtab_remove(acl_symtab, "udp.src"); >>> + expr_symtab_remove(acl_symtab, "udp.dst"); >>> + expr_symtab_remove(acl_symtab, "udp"); >>> + expr_symtab_remove(acl_symtab, "sctp.src"); >>> + expr_symtab_remove(acl_symtab, "sctp.dst"); >>> + expr_symtab_remove(acl_symtab, "sctp"); >>> + >>> + /* Add ct_proto field - CT original direction protocol. Used in the >>> + * tcp/udp/sctp predicate expansions below. */ >>> + expr_symtab_add_field(acl_symtab, "ct_proto", MFF_CT_NW_PROTO, >>> + "ct.trk", false); >>> + >>> + /* Override TCP protocol and port fields to use CT equivalents. >>> + * When "tcp" is used as a predicate, it expands to "ct_proto == 6" >>> + * instead of "ip.proto == 6". */ >>> + expr_symtab_add_predicate(acl_symtab, "tcp", >>> + "ct.trk && !ct.inv && ct_proto == 6"); >>> + expr_symtab_add_field(acl_symtab, "tcp.src", MFF_CT_TP_SRC, >>> + "tcp", false); >>> + expr_symtab_add_field(acl_symtab, "tcp.dst", MFF_CT_TP_DST, >>> + "tcp", false); >>> + >>> + /* Override UDP protocol and port fields */ >>> + expr_symtab_add_predicate(acl_symtab, "udp", >>> + "ct.trk && !ct.inv && ct_proto == 17"); >>> + expr_symtab_add_field(acl_symtab, "udp.src", MFF_CT_TP_SRC, >>> + "udp", false); >>> + expr_symtab_add_field(acl_symtab, "udp.dst", MFF_CT_TP_DST, >>> + "udp", false); >>> + >>> + /* Override SCTP protocol and port fields */ >>> + expr_symtab_add_predicate(acl_symtab, "sctp", >>> + "ct.trk && !ct.inv && ct_proto == 132"); >>> + expr_symtab_add_field(acl_symtab, "sctp.src", MFF_CT_TP_SRC, >>> + "sctp", false); >>> + expr_symtab_add_field(acl_symtab, "sctp.dst", MFF_CT_TP_DST, >>> + "sctp", false); >>> +} >>> diff --git a/northd/en-global-config.c b/northd/en-global-config.c >>> index 2556b2888..a6f37b0c3 100644 >>> --- a/northd/en-global-config.c >>> +++ b/northd/en-global-config.c >>> @@ -717,6 +717,11 @@ check_nb_options_out_of_sync( >>> return true; >>> } >>> >>> + if (config_out_of_sync(&nb->options, &config_data->nb_options, >>> + "acl_ct_translation", false)) { >>> + return true; >>> + } >>> + >>> return false; >>> } >>> >>> diff --git a/northd/lflow-mgr.c b/northd/lflow-mgr.c >>> index f21024903..66a519780 100644 >>> --- a/northd/lflow-mgr.c >>> +++ b/northd/lflow-mgr.c >>> @@ -184,6 +184,7 @@ struct ovn_lflow { >>> struct ovn_dp_group *dpg; /* Link to unique Sb datapath group. */ >>> const char *where; >>> const char *flow_desc; >>> + bool acl_ct_translation; /* Use CT-based L4 field translation. >>> */ >>> >>> struct uuid sb_uuid; /* SB DB row uuid, specified by >>> northd. */ >>> struct ovs_list referenced_by; /* List of struct lflow_ref_node. */ >>> @@ -725,7 +726,7 @@ lflow_ref_sync_lflows(struct lflow_ref *lflow_ref, >>> * then it may corrupt the hmap. Caller should ensure thread safety >>> * for such scenarios. >>> */ >>> -static void >>> +static struct ovn_lflow * >>> lflow_table_add_lflow__(struct lflow_table *lflow_table, >>> const struct ovn_synced_datapath *sdp, >>> const unsigned long *dp_bitmap, size_t >>> dp_bitmap_len, >>> @@ -798,9 +799,17 @@ lflow_table_add_lflow__(struct lflow_table >>> *lflow_table, >>> ovn_dp_group_add_with_reference(lflow, sdp, dp_bitmap, >>> dp_bitmap_len); >>> >>> lflow_hash_unlock(hash_lock); >>> + return lflow; >>> } >>> >>> void >>> +ovn_lflow_set_acl_ct_translation(struct ovn_lflow *lflow, >>> + bool acl_ct_translation) >>> +{ >>> + lflow->acl_ct_translation = acl_ct_translation; >>> +} >>> + >>> +struct ovn_lflow * >>> lflow_table_add_lflow(struct lflow_table_add_args *args) >>> { >>> /* It is invalid for both args->dp_bitmap and args->sdp to be >>> @@ -811,11 +820,13 @@ lflow_table_add_lflow(struct lflow_table_add_args >>> *args) >>> args->sdp = NULL; >>> } >>> >>> - lflow_table_add_lflow__(args->table, args->sdp, args->dp_bitmap, >>> - args->dp_bitmap_len, args->stage, >>> args->priority, >>> - args->match, args->actions, args->io_port, >>> - args->ctrl_meter, args->stage_hint, >>> args->where, >>> - args->flow_desc, args->lflow_ref); >>> + return lflow_table_add_lflow__(args->table, args->sdp, >>> args->dp_bitmap, >>> + args->dp_bitmap_len, args->stage, >>> + args->priority, args->match, >>> + args->actions, args->io_port, >>> + args->ctrl_meter, args->stage_hint, >>> + args->where, args->flow_desc, >>> + args->lflow_ref); >>> } >>> >>> struct ovn_dp_group * >>> @@ -945,6 +956,7 @@ ovn_lflow_init(struct ovn_lflow *lflow, >>> lflow->stage_hint = stage_hint; >>> lflow->ctrl_meter = ctrl_meter; >>> lflow->flow_desc = flow_desc; >>> + lflow->acl_ct_translation = false; >>> lflow->dpg = NULL; >>> lflow->where = where; >>> lflow->sb_uuid = sbuuid; >>> @@ -1122,12 +1134,20 @@ sync_lflow_to_sb(struct ovn_lflow *lflow, >>> sbrec_logical_flow_set_match(sbflow, lflow->match); >>> sbrec_logical_flow_set_actions(sbflow, lflow->actions); >>> sbrec_logical_flow_set_flow_desc(sbflow, lflow->flow_desc); >>> - if (lflow->io_port) { >>> + >>> + /* Set tags for io_port and/or acl_ct_translation if needed. */ >>> + if (lflow->io_port || lflow->acl_ct_translation) { >>> struct smap tags = SMAP_INITIALIZER(&tags); >>> - smap_add(&tags, "in_out_port", lflow->io_port); >>> + if (lflow->io_port) { >>> + smap_add(&tags, "in_out_port", lflow->io_port); >>> + } >>> + if (lflow->acl_ct_translation) { >>> + smap_add(&tags, "acl_ct_translation", "true"); >>> + } >>> sbrec_logical_flow_set_tags(sbflow, &tags); >>> smap_destroy(&tags); >>> } >>> + >>> sbrec_logical_flow_set_controller_meter(sbflow, >>> lflow->ctrl_meter); >>> >>> /* Trim the source locator lflow->where, which looks something >>> like >>> @@ -1193,6 +1213,21 @@ sync_lflow_to_sb(struct ovn_lflow *lflow, >>> } >>> } >>> } >>> + >>> + /* Update acl_ct_translation marker in tags if needed. >>> + * This must be outside ovn_internal_version_changed check >>> because >>> + * the option can be enabled/disabled at runtime. */ >>> + bool cur_has_ct_trans = smap_get_bool(&sbflow->tags, >>> + "acl_ct_translation", >>> false); >>> + if (lflow->acl_ct_translation != cur_has_ct_trans) { >>> + if (lflow->acl_ct_translation) { >>> + sbrec_logical_flow_update_tags_setkey( >>> + sbflow, "acl_ct_translation", "true"); >>> + } else { >>> + sbrec_logical_flow_update_tags_delkey( >>> + sbflow, "acl_ct_translation"); >>> + } >>> + } >>> } >>> >>> if (lflow->dp) { >>> diff --git a/northd/lflow-mgr.h b/northd/lflow-mgr.h >>> index 4a1655c35..16aa0dd07 100644 >>> --- a/northd/lflow-mgr.h >>> +++ b/northd/lflow-mgr.h >>> @@ -24,6 +24,7 @@ >>> struct ovsdb_idl_txn; >>> struct ovn_datapath; >>> struct ovsdb_idl_row; >>> +struct ovn_lflow; >>> >>> /* lflow map which stores the logical flows. */ >>> struct lflow_table { >>> @@ -89,7 +90,12 @@ struct lflow_table_add_args { >>> const char *where; >>> }; >>> >>> -void lflow_table_add_lflow(struct lflow_table_add_args *args); >>> +struct ovn_lflow *lflow_table_add_lflow(struct lflow_table_add_args >>> *args); >>> + >>> +/* Set the acl_ct_translation flag on an lflow. When true, >>> ovn-controller will >>> + * use a symbol table that maps L4 port fields to CT equivalents. */ >>> +void ovn_lflow_set_acl_ct_translation(struct ovn_lflow *lflow, >>> + bool acl_ct_translation); >>> >> >> We should use the new macro approach e.g. WITH_CT_TRANSLATION >> as we have for io port etc. instead of defining some extra function. >> >> >>> >>> >>> #define WITH_HINT(HINT) .stage_hint = HINT >>> diff --git a/northd/northd.c b/northd/northd.c >>> index d79fe40c9..52233260f 100644 >>> --- a/northd/northd.c >>> +++ b/northd/northd.c >>> @@ -83,12 +83,18 @@ static bool check_lsp_is_up; >>> static bool install_ls_lb_from_router; >>> >>> /* Use common zone for SNAT and DNAT if this option is set to "true". */ >>> -static bool use_common_zone = false; >>> +static bool use_common_zone; >>> >>> /* If this option is 'true' northd will make use of ct.inv match fields. >>> * Otherwise, it will avoid using it. The default is true. */ >>> static bool use_ct_inv_match = true; >>> >>> +/* If this option is 'true' northd will flag the related ACL flows to >>> use >>> + * connection tracking fields to properly handle IP fragments. By >>> default this >>> + * option is set to 'false'. >>> + */ >>> +static bool acl_ct_translation = false; >>> + >>> /* If this option is 'true' northd will implicitly add a lowest-priority >>> * drop rule in the ACL stage of logical switches that have at least one >>> * ACL. >>> @@ -7199,6 +7205,12 @@ consider_acl(struct lflow_table *lflows, const >>> struct ovn_datapath *od, >>> match_tier_len = match->length; >>> } >>> >>> + /* Check if this ACL needs CT translation for fragment handling. >>> + * All stateful ACLs are marked when the option is enabled; the >>> actual >>> + * translation only affects L4 port fields in ovn-controller. */ >>> + bool needs_ct_trans = has_stateful && acl_ct_translation; >>> + struct ovn_lflow *lflow; >>> + >>> if (!has_stateful >>> || !strcmp(acl->action, "pass") >>> || !strcmp(acl->action, "allow-stateless")) { >>> @@ -7216,6 +7228,7 @@ consider_acl(struct lflow_table *lflows, const >>> struct ovn_datapath *od, >>> >>> ds_put_cstr(actions, "next;"); >>> ds_put_format(match, "(%s)", acl->match); >>> + /* Stateless ACLs don't need CT translation. */ >>> ovn_lflow_add(lflows, od, stage, priority, ds_cstr(match), >>> ds_cstr(actions), lflow_ref, >>> WITH_HINT(&acl->header_)); >>> return; >>> @@ -7284,8 +7297,10 @@ consider_acl(struct lflow_table *lflows, const >>> struct ovn_datapath *od, >>> (uint8_t) acl->network_function_group->id); >>> } >>> ds_put_cstr(actions, "next;"); >>> - ovn_lflow_add(lflows, od, stage, priority, ds_cstr(match), >>> - ds_cstr(actions), lflow_ref, >>> WITH_HINT(&acl->header_)); >>> + lflow = ovn_lflow_add(lflows, od, stage, priority, >>> ds_cstr(match), >>> + ds_cstr(actions), lflow_ref, >>> + WITH_HINT(&acl->header_)); >>> + ovn_lflow_set_acl_ct_translation(lflow, needs_ct_trans); >>> >>> /* Match on traffic in the request direction for an established >>> * connection tracking entry that has not been marked for >>> @@ -7314,8 +7329,10 @@ consider_acl(struct lflow_table *lflows, const >>> struct ovn_datapath *od, >>> (uint8_t) acl->network_function_group->id); >>> } >>> ds_put_cstr(actions, "next;"); >>> - ovn_lflow_add(lflows, od, stage, priority, ds_cstr(match), >>> - ds_cstr(actions), lflow_ref, >>> WITH_HINT(&acl->header_)); >>> + lflow = ovn_lflow_add(lflows, od, stage, priority, >>> ds_cstr(match), >>> + ds_cstr(actions), lflow_ref, >>> + WITH_HINT(&acl->header_)); >>> + ovn_lflow_set_acl_ct_translation(lflow, needs_ct_trans); >>> } else if (!strcmp(acl->action, "drop") >>> || !strcmp(acl->action, "reject")) { >>> if (acl->network_function_group) { >>> @@ -7340,8 +7357,10 @@ consider_acl(struct lflow_table *lflows, const >>> struct ovn_datapath *od, >>> build_acl_sample_label_action(actions, acl, acl->sample_new, >>> NULL, >>> obs_stage); >>> ds_put_cstr(actions, "next;"); >>> - ovn_lflow_add(lflows, od, stage, priority, ds_cstr(match), >>> - ds_cstr(actions), lflow_ref, >>> WITH_HINT(&acl->header_)); >>> + lflow = ovn_lflow_add(lflows, od, stage, priority, >>> ds_cstr(match), >>> + ds_cstr(actions), lflow_ref, >>> + WITH_HINT(&acl->header_)); >>> + ovn_lflow_set_acl_ct_translation(lflow, needs_ct_trans); >>> /* For an existing connection without ct_mark.blocked set, we've >>> * encountered a policy change. ACLs previously allowed >>> * this connection and we committed the connection tracking >>> @@ -7366,8 +7385,10 @@ consider_acl(struct lflow_table *lflows, const >>> struct ovn_datapath *od, >>> ds_put_format(actions, >>> "ct_commit { ct_mark.blocked = 1; " >>> "ct_label.obs_point_id = %"PRIu32"; }; next;", >>> obs_pid); >>> - ovn_lflow_add(lflows, od, stage, priority, ds_cstr(match), >>> - ds_cstr(actions), lflow_ref, >>> WITH_HINT(&acl->header_)); >>> + lflow = ovn_lflow_add(lflows, od, stage, priority, >>> ds_cstr(match), >>> + ds_cstr(actions), lflow_ref, >>> + WITH_HINT(&acl->header_)); >>> + ovn_lflow_set_acl_ct_translation(lflow, needs_ct_trans); >>> } >>> } >>> >>> @@ -20386,6 +20407,8 @@ ovnnb_db_run(struct northd_input *input_data, >>> >>> use_ct_inv_match = smap_get_bool(input_data->nb_options, >>> "use_ct_inv_match", true); >>> + acl_ct_translation = smap_get_bool(input_data->nb_options, >>> + "acl_ct_translation", false); >>> >>> /* deprecated, use --event instead */ >>> controller_event_en = smap_get_bool(input_data->nb_options, >>> diff --git a/ovn-nb.xml b/ovn-nb.xml >>> index e74c0d010..2b610ff1c 100644 >>> --- a/ovn-nb.xml >>> +++ b/ovn-nb.xml >>> @@ -327,6 +327,26 @@ >>> </p> >>> </column> >>> >>> + <column name="options" key="acl_ct_translation"> >>> + <p> >>> + If set to <code>true</code>, <code>ovn-northd</code> will >>> enable >>> + connection tracking based L4 field translation for stateful >>> ACLs. >>> + When enabled, ACL matches on L4 port fields (tcp/udp/sctp) use >>> + connection tracking state instead of packet headers. This >>> allows >>> + proper handling of IP fragmentation in userspace datapaths >>> (e.g., >>> + DPDK) where fragments are not automatically reassembled during >>> + connection tracking lookup. >>> + </p> >>> + <p> >>> + <em>Important:</em> Enabling this option breaks hardware >>> offloading >>> + of flows. Only enable this option if you need to handle >>> fragmented >>> + traffic with stateful ACLs in userspace datapaths. >>> + </p> >>> + <p> >>> + The default value is <code>false</code>. >>> + </p> >>> + </column> >>> + >>> <column name="options" key="default_acl_drop"> >>> <p> >>> If set to <code>true</code>., <code>ovn-northd</code> will >>> diff --git a/tests/atlocal.in b/tests/atlocal.in >>> index 9a3cf2d16..f0d5eb243 100644 >>> --- a/tests/atlocal.in >>> +++ b/tests/atlocal.in >>> @@ -189,6 +189,9 @@ find_command dhcpd >>> # Set HAVE_DHCLIENT >>> find_command dhclient >>> >>> +# Set HAVE_DNSMASQ >>> +find_command dnsmasq >>> + >>> # Set HAVE_BFDD_BEACON >>> find_command bfdd-beacon >>> >>> diff --git a/tests/ovn.at b/tests/ovn.at >>> index 445a74ce5..9399bcb5a 100644 >>> --- a/tests/ovn.at >>> +++ b/tests/ovn.at >>> @@ -44152,3 +44152,74 @@ check ovn-nbctl --wait=hv lsp-set-type down_ext >>> localnet >>> OVN_CLEANUP([hv1],[hv2]) >>> AT_CLEANUP >>> ]) >>> + >>> +OVN_FOR_EACH_NORTHD([ >>> +AT_SETUP([ACL CT translation - Rules]) >>> +AT_KEYWORDS([acl_ct_translation_rules]) >>> +ovn_start >>> + >>> +check ovn-nbctl ls-add ls1 >>> + >>> +check ovn-nbctl lsp-add ls1 lp1 \ >>> + -- lsp-set-addresses lp1 "f0:00:00:00:00:01 10.0.0.1" >>> +check ovn-nbctl lsp-add ls1 lp2 \ >>> + -- lsp-set-addresses lp2 "f0:00:00:00:00:02 10.0.0.2" >>> + >>> +net_add n1 >>> +sim_add hv1 >>> + >>> +as hv1 >>> +ovs-vsctl add-br br-phys >>> +ovn_attach n1 br-phys 192.168.0.1 >>> +ovs-vsctl -- add-port br-int hv1-vif1 -- \ >>> + set interface hv1-vif1 external-ids:iface-id=lp1 ofport-request=1 >> >> +ovs-vsctl -- add-port br-int hv1-vif2 -- \ >>> + set interface hv1-vif2 external-ids:iface-id=lp2 ofport-request=2 >>> >> >> nit: Please drop the ofport-request. >> >> >>> + >>> +# Create port group and add ACLs with TCP/UDP matches >>> +lp1_uuid=$(ovn-nbctl --bare --columns=_uuid find logical_switch_port >>> name=lp1) >>> +lp2_uuid=$(ovn-nbctl --bare --columns=_uuid find logical_switch_port >>> name=lp2) >>> +check ovn-nbctl pg-add pg1 $lp1_uuid $lp2_uuid >>> + >>> +check ovn-nbctl acl-add pg1 from-lport 1002 \ >>> + "inport == @pg1 && ip4 && tcp && tcp.dst == 80" allow-related >>> +check ovn-nbctl acl-add pg1 from-lport 1002 \ >>> + "inport == @pg1 && ip4 && udp && udp.dst == 53" allow-related >>> +check ovn-nbctl acl-add pg1 to-lport 1002 \ >>> + "outport == @pg1 && ip4 && tcp && tcp.src == 80" allow-related >>> +check ovn-nbctl acl-add pg1 to-lport 1002 \ >>> + "outport == @pg1 && ip4 && udp && udp.src == 53" allow-related >>> +check ovn-nbctl --wait=hv --log acl-add pg1 to-lport 100 "outport == >>> @pg1" drop >>> + >>> +# Verify lflows do NOT have acl_ct_translation tag when option is >>> disabled (default) >>> +AT_CHECK([ovn-sbctl --bare --columns=tags find Logical_Flow | grep -c >>> "acl_ct_translation"], [1], [ignore], [ignore]) >>> >> >> nit: Please use "row_count" instead. That applies to the whole test. >> >> >>> + >>> +# Verify OpenFlows use packet header fields (tcp.dst, udp.dst) not CT >>> fields >>> +# When CT translation is OFF, we should see tp_dst matches >>> +as hv1 >>> +AT_CHECK([ovs-ofctl dump-flows br-int | grep -E "tp_dst=80|tp_dst=53" | >>> head -1 | grep -q "tp_dst"], [0]) >>> >> >> We can actually use getting a table directly via "ovn-debug >> lflow-stage-to-oftable <STAGE_NAME>". >> That also applies to the whole test. >> >> + >>> +# Now enable ACL CT translation >>> +check ovn-nbctl --wait=hv set NB_Global . >>> options:acl_ct_translation=true >>> + >>> +# Verify lflows now have acl_ct_translation tag in the tags column >>> +AT_CHECK([ovn-sbctl --bare --columns=tags find Logical_Flow | grep -c >>> "acl_ct_translation"], [0], [ignore]) >>> >> + >>> +# Verify OpenFlows now use CT fields (ct_tp_dst) instead of packet >>> headers >>> +# When CT translation is ON, we should see ct_tp_dst matches >>> +as hv1 >>> +AT_CHECK([ovs-ofctl dump-flows br-int | grep -E >>> "ct_tp_dst=80|ct_tp_dst=53" | head -1 | grep -q "ct_tp_dst"], [0]) >>> + >>> +# Now disable ACL CT translation again >>> +check ovn-nbctl --wait=hv set NB_Global . >>> options:acl_ct_translation=false >>> + >>> +# Verify lflows no longer have acl_ct_translation tag >>> +AT_CHECK([ovn-sbctl --bare --columns=tags find Logical_Flow | grep -c >>> "acl_ct_translation"], [1], [ignore], [ignore]) >>> + >>> +# Verify OpenFlows reverted to packet header fields >>> +as hv1 >>> +AT_CHECK([ovs-ofctl dump-flows br-int | grep -E "tp_dst=80|tp_dst=53" | >>> head -1 | grep -q "tp_dst"], [0]) >>> + >>> +OVN_CLEANUP([hv1]) >>> +AT_CLEANUP >>> +]) >>> diff --git a/tests/system-ovn.at b/tests/system-ovn.at >>> index 5b3dc47fd..539a8fa3d 100644 >>> --- a/tests/system-ovn.at >>> +++ b/tests/system-ovn.at >>> @@ -20347,3 +20347,392 @@ OVS_TRAFFIC_VSWITCHD_STOP(["/.*error >>> receiving.*/d >>> /.*terminating with signal 15.*/d"]) >>> AT_CLEANUP >>> ]) >>> + >>> +OVN_FOR_EACH_NORTHD([ >>> +AT_SETUP([ACL CT translation - UDP fragmentation]) >>> +AT_KEYWORDS([acl_ct_translation_udp_fragmentation]) >>> +AT_SKIP_IF([test $HAVE_TCPDUMP = no]) >>> + >>> +CHECK_CONNTRACK() >>> + >>> +ovn_start >>> +OVS_TRAFFIC_VSWITCHD_START() >>> +ADD_BR([br-int]) >>> + >>> +# Set external-ids in br-int needed for ovn-controller >>> +check ovs-vsctl \ >>> + -- set Open_vSwitch . external-ids:system-id=hv1 \ >>> + -- set Open_vSwitch . >>> external-ids:ovn-remote=unix:$ovs_base/ovn-sb/ovn-sb.sock \ >>> + -- set Open_vSwitch . external-ids:ovn-encap-type=geneve \ >>> + -- set Open_vSwitch . external-ids:ovn-encap-ip=169.0.0.1 \ >>> + -- set bridge br-int fail-mode=secure >>> other-config:disable-in-band=true >>> + >>> +# Start ovn-controller >>> +start_daemon ovn-controller >>> + >>> +# Set the minimal fragment size for userspace DP. >>> +ovs-appctl dpctl/ipf-set-min-frag v4 500 >>> + >>> +check ovn-nbctl ls-add internal >>> +check ovn-nbctl lsp-add internal client \ >>> + -- lsp-set-addresses client "f0:00:00:01:02:03 172.16.1.3" >>> +check ovn-nbctl lsp-add internal server \ >>> + -- lsp-set-addresses server "f0:00:0f:01:02:03 172.16.1.2" >>> + >>> +ADD_NAMESPACES(client) >>> +ADD_VETH(client, client, br-int, "172.16.1.3/24", "f0:00:00:01:02:03", >>> \ >>> + "172.16.1.1") >>> +NS_EXEC([client], [ip l set dev client mtu 900]) >>> + >>> +ADD_NAMESPACES(server) >>> +ADD_VETH(server, server, br-int, "172.16.1.2/24", "f0:00:0f:01:02:03", >>> \ >>> + "172.16.1.1") >>> +NS_EXEC([server], [ip l set dev server mtu 900]) >>> + >>> +# Create data file for traffic testing (8000 bytes will be fragmented >>> with MTU 900) >>> +printf %8000s > datafile >>> + >>> +# Create port group with both client and server >>> +client_uuid=$(ovn-nbctl --bare --columns=_uuid find logical_switch_port >>> name=client) >>> +server_uuid=$(ovn-nbctl --bare --columns=_uuid find logical_switch_port >>> name=server) >>> +check ovn-nbctl pg-add internal_vms $client_uuid $server_uuid >>> + >>> +# ACL rules - allow outgoing traffic >>> +check ovn-nbctl acl-add internal_vms from-lport 1002 "inport == >>> @internal_vms && ip4 && udp" allow-related >>> +check ovn-nbctl acl-add internal_vms from-lport 1002 "inport == >>> @internal_vms && ip4" allow-related >>> + >>> +# ACL rules - allow incoming UDP on specific ports >>> +check ovn-nbctl acl-add internal_vms to-lport 1002 "outport == >>> @internal_vms && ip4 && udp && udp.dst == 5060" allow-related >>> +check ovn-nbctl acl-add internal_vms to-lport 1002 "outport == >>> @internal_vms && ip4 && udp && udp.dst == 5061" allow-related >>> + >>> +# Allow ARP >>> +check ovn-nbctl acl-add internal_vms to-lport 1002 "outport == >>> @internal_vms && arp" allow-related >>> + >>> +# Drop rule >>> +check ovn-nbctl --log --severity=info acl-add internal_vms to-lport 100 >>> "outport == @internal_vms" drop >>> + >>> +# Enable ACL CT translation for fragmentation handling >>> +check ovn-nbctl --wait=hv set NB_Global . >>> options:acl_ct_translation=true >>> + >>> +check ovn-nbctl --wait=hv sync >>> +check ovs-appctl dpctl/flush-conntrack >>> + >>> +# Start tcpdump to capture IP fragments on both sides >>> +NETNS_START_TCPDUMP([server], [-U -i server -Q in -nn ip and >>> '(ip[[6:2]] & 0x3fff != 0)'], [tcpdump-udp-server]) >>> +NETNS_START_TCPDUMP([client], [-U -i client -Q in -nn ip and >>> '(ip[[6:2]] & 0x3fff != 0)'], [tcpdump-udp-client]) >>> + >>> +# Start UDP listeners on both sides >>> +NETNS_DAEMONIZE([server], [nc -l -u 172.16.1.2 5060 > udp_server.rcvd], >>> [server.pid]) >>> +NETNS_DAEMONIZE([client], [nc -l -u 172.16.1.3 5061 > udp_client.rcvd], >>> [client.pid]) >>> + >>> +# Client sends to server (will be fragmented due to MTU 900) >>> +NS_CHECK_EXEC([client], [cat datafile | nc -u 172.16.1.2 5060 -q 1], >>> [0], [ignore], [ignore]) >>> >> >> My nc version doesn't support -q. We probably need a different trick to >> make it wait. >> >> >>> + >>> +# Server sends to client (will be fragmented due to MTU 900) >>> +NS_CHECK_EXEC([server], [cat datafile | nc -u 172.16.1.3 5061 -q 1], >>> [0], [ignore], [ignore]) >>> + >>> +OVS_WAIT_UNTIL([test -s udp_server.rcvd]) >>> +OVS_WAIT_UNTIL([test -s udp_client.rcvd]) >>> + >>> +# Verify both sides received data >>> +check cmp datafile udp_server.rcvd >>> +check cmp datafile udp_client.rcvd >>> + >>> +# Verify IP fragments were received on both sides (at least 5 fragments >>> confirms fragmentation) >>> +OVS_WAIT_UNTIL([test $(cat tcpdump-udp-server.tcpdump | wc -l) -ge 5]) >>> +OVS_WAIT_UNTIL([test $(cat tcpdump-udp-client.tcpdump | wc -l) -ge 5]) >>> + >>> +AS_BOX([UDP fragmentation test passed - bi-directional traffic with IP >>> fragments]) >>> >> >> nit: We can drop this box the test would indicate failure. >> >> >>> + >>> +OVN_CLEANUP_CONTROLLER([hv1]) >>> +OVN_CLEANUP_NORTHD >>> + >>> +as >>> +OVS_TRAFFIC_VSWITCHD_STOP(["/failed to query port patch-.*/d >>> +/connection dropped.*/d >>> +/WARN|netdev@ovs-netdev: execute.*/d >>> +/dpif|WARN|system@ovs-system: execute.*/d >>> +"]) >>> +AT_CLEANUP >>> +]) >>> + >>> +OVN_FOR_EACH_NORTHD([ >>> +AT_SETUP([ACL CT translation - TCP traffic]) >>> +AT_KEYWORDS([acl_ct_translation_tcp_fragmentation]) >>> + >>> +CHECK_CONNTRACK() >>> + >>> +ovn_start >>> +OVS_TRAFFIC_VSWITCHD_START() >>> +ADD_BR([br-int]) >>> + >>> +# Set external-ids in br-int needed for ovn-controller >>> +check ovs-vsctl \ >>> + -- set Open_vSwitch . external-ids:system-id=hv1 \ >>> + -- set Open_vSwitch . >>> external-ids:ovn-remote=unix:$ovs_base/ovn-sb/ovn-sb.sock \ >>> + -- set Open_vSwitch . external-ids:ovn-encap-type=geneve \ >>> + -- set Open_vSwitch . external-ids:ovn-encap-ip=169.0.0.1 \ >>> + -- set bridge br-int fail-mode=secure >>> other-config:disable-in-band=true >>> + >>> +# Start ovn-controller >>> +start_daemon ovn-controller >>> + >>> +check ovn-nbctl ls-add internal >>> +check ovn-nbctl lsp-add internal client \ >>> + -- lsp-set-addresses client "f0:00:00:01:02:03 172.16.1.3" >>> +check ovn-nbctl lsp-add internal server \ >>> + -- lsp-set-addresses server "f0:00:0f:01:02:03 172.16.1.2" >>> + >>> +ADD_NAMESPACES(client) >>> +ADD_VETH(client, client, br-int, "172.16.1.3/24", "f0:00:00:01:02:03", >>> \ >>> + "172.16.1.1") >>> + >>> +ADD_NAMESPACES(server) >>> +ADD_VETH(server, server, br-int, "172.16.1.2/24", "f0:00:0f:01:02:03", >>> \ >>> + "172.16.1.1") >>> + >>> +# Create data file for traffic testing >>> +printf %8000s > datafile >>> + >>> +# Create port group with both client and server >>> +client_uuid=$(ovn-nbctl --bare --columns=_uuid find logical_switch_port >>> name=client) >>> +server_uuid=$(ovn-nbctl --bare --columns=_uuid find logical_switch_port >>> name=server) >>> +check ovn-nbctl pg-add internal_vms $client_uuid $server_uuid >>> + >>> +# ACL rules - allow outgoing traffic >>> +check ovn-nbctl acl-add internal_vms from-lport 1002 "inport == >>> @internal_vms && ip4 && tcp" allow-related >>> +check ovn-nbctl acl-add internal_vms from-lport 1002 "inport == >>> @internal_vms && ip4" allow-related >>> + >>> +# ACL rules - allow incoming TCP on specific ports >>> +check ovn-nbctl acl-add internal_vms to-lport 1002 "outport == >>> @internal_vms && ip4 && tcp && tcp.dst == 8080" allow-related >>> +check ovn-nbctl acl-add internal_vms to-lport 1002 "outport == >>> @internal_vms && ip4 && tcp && tcp.dst == 8081" allow-related >>> + >>> +# Allow ARP >>> +check ovn-nbctl acl-add internal_vms to-lport 1002 "outport == >>> @internal_vms && arp" allow-related >>> + >>> +# Drop rule >>> +check ovn-nbctl --log --severity=info acl-add internal_vms to-lport 100 >>> "outport == @internal_vms" drop >>> + >>> +# Enable ACL CT translation >>> +check ovn-nbctl --wait=hv set NB_Global . >>> options:acl_ct_translation=true >>> + >>> +check ovn-nbctl --wait=hv sync >>> +check ovs-appctl dpctl/flush-conntrack >>> + >>> +# Test client -> server direction >>> +NETNS_DAEMONIZE([server], [nc -l 172.16.1.2 8080 > tcp_server.rcvd], >>> [server.pid]) >>> +NS_CHECK_EXEC([client], [cat datafile | nc 172.16.1.2 8080 -q 1], [0], >>> [ignore], [ignore]) >>> + >>> +OVS_WAIT_UNTIL([test -s tcp_server.rcvd]) >>> +check cmp datafile tcp_server.rcvd >>> + >>> +# Clean up first direction >>> +kill $(cat server.pid) 2>/dev/null || true >>> + >>> +# Test server -> client direction >>> +NETNS_DAEMONIZE([client], [nc -l 172.16.1.3 8081 > tcp_client.rcvd], >>> [client.pid]) >>> +NS_CHECK_EXEC([server], [cat datafile | nc 172.16.1.3 8081 -q 1], [0], >>> [ignore], [ignore]) >>> + >>> +OVS_WAIT_UNTIL([test -s tcp_client.rcvd]) >>> +check cmp datafile tcp_client.rcvd >>> + >>> +AS_BOX([TCP traffic test passed - bi-directional traffic]) >>> + >>> +OVN_CLEANUP_CONTROLLER([hv1]) >>> +OVN_CLEANUP_NORTHD >>> + >>> +as >>> +OVS_TRAFFIC_VSWITCHD_STOP(["/failed to query port patch-.*/d >>> +/connection dropped.*/d >>> +"]) >>> +AT_CLEANUP >>> +]) >>> + >>> +OVN_FOR_EACH_NORTHD([ >>> +AT_SETUP([ACL CT translation - negative test]) >>> >> >> The UDP + TCP + negative use the same infrastructure, maybe >> we can squash them in a single test that just changes the ACL rules, >> WDYT? >> >> > I was conducting all the tests within the same infrastructure, but this > made it more difficult to debug and identify which tests were failing, and > the test was becoming very large. > > > >> +AT_KEYWORDS([acl_ct_translation_udp_fragmentation_negative]) >>> +AT_SKIP_IF([test $HAVE_TCPDUMP = no]) >>> + >>> +CHECK_CONNTRACK() >>> + >>> +ovn_start >>> +OVS_TRAFFIC_VSWITCHD_START() >>> +ADD_BR([br-int]) >>> + >>> +# Set external-ids in br-int needed for ovn-controller >>> +check ovs-vsctl \ >>> + -- set Open_vSwitch . external-ids:system-id=hv1 \ >>> + -- set Open_vSwitch . >>> external-ids:ovn-remote=unix:$ovs_base/ovn-sb/ovn-sb.sock \ >>> + -- set Open_vSwitch . external-ids:ovn-encap-type=geneve \ >>> + -- set Open_vSwitch . external-ids:ovn-encap-ip=169.0.0.1 \ >>> + -- set bridge br-int fail-mode=secure >>> other-config:disable-in-band=true >>> + >>> +# Start ovn-controller >>> +start_daemon ovn-controller >>> + >>> +check ovn-nbctl ls-add internal >>> +check ovn-nbctl lsp-add internal client \ >>> + -- lsp-set-addresses client "f0:00:00:01:02:03 172.16.1.3" >>> +check ovn-nbctl lsp-add internal server \ >>> + -- lsp-set-addresses server "f0:00:0f:01:02:03 172.16.1.2" >>> + >>> +ADD_NAMESPACES(client) >>> +ADD_VETH(client, client, br-int, "172.16.1.3/24", "f0:00:00:01:02:03", >>> \ >>> + "172.16.1.1") >>> + >>> +ADD_NAMESPACES(server) >>> +ADD_VETH(server, server, br-int, "172.16.1.2/24", "f0:00:0f:01:02:03", >>> \ >>> + "172.16.1.1") >>> + >>> +# Create port group with both client and server >>> +client_uuid=$(ovn-nbctl --bare --columns=_uuid find logical_switch_port >>> name=client) >>> +server_uuid=$(ovn-nbctl --bare --columns=_uuid find logical_switch_port >>> name=server) >>> +check ovn-nbctl pg-add internal_vms $client_uuid $server_uuid >>> + >>> +# ACL rules - allow outgoing traffic >>> +check ovn-nbctl acl-add internal_vms from-lport 1002 "inport == >>> @internal_vms && ip4 && udp" allow-related >>> +check ovn-nbctl acl-add internal_vms from-lport 1002 "inport == >>> @internal_vms && ip4" allow-related >>> + >>> +# ACL rules - allow incoming UDP ONLY on port 5060 (NOT 4000) >>> +check ovn-nbctl acl-add internal_vms to-lport 1002 "outport == >>> @internal_vms && ip4 && udp && udp.dst == 5060" allow-related >>> + >>> +# Allow ARP >>> +check ovn-nbctl acl-add internal_vms to-lport 1002 "outport == >>> @internal_vms && arp" allow-related >>> + >>> +# Drop rule - this should drop traffic on port 4000 >>> +check ovn-nbctl --log --severity=info acl-add internal_vms to-lport 100 >>> "outport == @internal_vms" drop >>> + >>> +# Enable ACL CT translation >>> +check ovn-nbctl --wait=hv set NB_Global . >>> options:acl_ct_translation=true >>> + >>> +check ovn-nbctl --wait=hv sync >>> +check ovs-appctl dpctl/flush-conntrack >>> + >>> +# Start tcpdump to capture any incoming packets on server >>> +NETNS_START_TCPDUMP([server], [-U -i server -Q in -nn udp port 4000], >>> [tcpdump-drop-server]) >>> + >>> +# Start UDP listener on server (on port 4000 which is NOT allowed by >>> ACLs) >>> +NETNS_DAEMONIZE([server], [nc -l -u 172.16.1.2 4000 > udp_drop.rcvd], >>> [drop_server.pid]) >>> + >>> +# Client sends to server on disallowed port >>> +NS_CHECK_EXEC([client], [echo "test" | timeout 2 nc -u 172.16.1.2 4000 >>> -q 1], [0], [ignore], [ignore]) >>> + >>> +# Wait a bit for any packets to arrive >>> +sleep 2 >>> + >>> +# Verify server did NOT receive any data (file should be empty or not >>> exist) >>> +AT_CHECK([test ! -s udp_drop.rcvd], [0]) >>> + >>> +# Verify tcpdump captured no packets (traffic was dropped by ACL) >>> +AT_CHECK([test $(cat tcpdump-drop-server.tcpdump | wc -l) -eq 0], [0]) >>> + >>> +AS_BOX([Negative test passed - traffic on disallowed port was dropped]) >>> + >>> +OVN_CLEANUP_CONTROLLER([hv1]) >>> +OVN_CLEANUP_NORTHD >>> + >>> +as >>> +OVS_TRAFFIC_VSWITCHD_STOP(["/failed to query port patch-.*/d >>> +/connection dropped.*/d >>> +"]) >>> +AT_CLEANUP >>> +]) >>> + >>> +OVN_FOR_EACH_NORTHD([ >>> +AT_SETUP([ACL CT translation with DHCP traffic]) >>> +AT_KEYWORDS([acl_ct_translation_dhcp_fragmentation]) >>> +AT_SKIP_IF([test $HAVE_DNSMASQ = no]) >>> +AT_SKIP_IF([test $HAVE_DHCLIENT = no]) >>> >> >> If we want to keep this test we need to install both dnsmasq and dhclient >> to the container images. >> >> >>> + >>> +CHECK_CONNTRACK() >>> + >>> +ovn_start >>> +OVS_TRAFFIC_VSWITCHD_START() >>> +ADD_BR([br-int]) >>> + >>> +# Set external-ids in br-int needed for ovn-controller >>> +check ovs-vsctl \ >>> + -- set Open_vSwitch . external-ids:system-id=hv1 \ >>> + -- set Open_vSwitch . >>> external-ids:ovn-remote=unix:$ovs_base/ovn-sb/ovn-sb.sock \ >>> + -- set Open_vSwitch . external-ids:ovn-encap-type=geneve \ >>> + -- set Open_vSwitch . external-ids:ovn-encap-ip=169.0.0.1 \ >>> + -- set bridge br-int fail-mode=secure >>> other-config:disable-in-band=true >>> + >>> +# Start ovn-controller >>> +start_daemon ovn-controller >>> + >>> +# Create logical switch >>> +check ovn-nbctl ls-add ls1 >>> + >>> +# Create DHCP server port >>> +check ovn-nbctl lsp-add ls1 dhcp-server \ >>> + -- lsp-set-addresses dhcp-server "00:00:00:00:02:00 192.168.1.2" >>> + >>> +# Create DHCP client port >>> +check ovn-nbctl lsp-add ls1 dhcp-client \ >>> + -- lsp-set-addresses dhcp-client "00:00:00:00:02:01" >>> + >>> +# Add OVS ports >>> +ADD_NAMESPACES(server, client) >>> +ADD_VETH(server, server, br-int, "192.168.1.2/24", "00:00:00:00:02:00") >>> +ADD_VETH(client, client, br-int, "0.0.0.0/0", "00:00:00:00:02:01") >>> + >>> +# Bind logical ports to OVS ports >>> +check ovs-vsctl set Interface ovs-server >>> external_ids:iface-id=dhcp-server >>> +check ovs-vsctl set Interface ovs-client >>> external_ids:iface-id=dhcp-client >>> + >>> +# Create port group with both ports >>> +server_uuid=$(ovn-nbctl --bare --columns=_uuid find logical_switch_port >>> name=dhcp-server) >>> +client_uuid=$(ovn-nbctl --bare --columns=_uuid find logical_switch_port >>> name=dhcp-client) >>> +check ovn-nbctl pg-add dhcp_vms $server_uuid $client_uuid >>> + >>> +# Add ACL rules - allow outgoing traffic from both ports >>> +check ovn-nbctl acl-add dhcp_vms from-lport 1002 "inport == @dhcp_vms >>> && ip4" allow-related >>> +check ovn-nbctl acl-add dhcp_vms from-lport 1002 "inport == @dhcp_vms >>> && udp" allow-related >>> + >>> +# Add ACL rules for DHCP traffic (ports 67 and 68) >>> +check ovn-nbctl acl-add dhcp_vms to-lport 1002 "outport == @dhcp_vms && >>> ip4 && udp && udp.dst == 67" allow-related >>> +check ovn-nbctl acl-add dhcp_vms to-lport 1002 "outport == @dhcp_vms && >>> ip4 && udp && udp.dst == 68" allow-related >>> + >>> +# Allow ARP and broadcast >>> +check ovn-nbctl acl-add dhcp_vms to-lport 1002 "outport == @dhcp_vms && >>> arp" allow-related >>> + >>> +# Add drop rule >>> +check ovn-nbctl --log --severity=info acl-add dhcp_vms to-lport 100 >>> "outport == @dhcp_vms" drop >>> + >>> +# Enable ACL CT translation >>> +check ovn-nbctl --wait=hv set NB_Global . >>> options:acl_ct_translation=false >>> + >>> +# Wait for flows to be installed >>> +check ovn-nbctl --wait=hv sync >>> + >>> +# Start dnsmasq as DHCP server in the server namespace >>> +# dnsmasq doesn't have AppArmor restrictions like dhcpd >>> +NETNS_DAEMONIZE([server], [dnsmasq --no-daemon --no-resolv --no-hosts \ >>> + --interface=server --bind-interfaces \ >>> + --dhcp-range=192.168.1.100,192.168.1.100,255.255.255.0,1h \ >>> + --dhcp-option=option:router,192.168.1.2 \ >>> + --log-dhcp], [dnsmasq.pid]) >>> + >>> +# Give dnsmasq time to start >>> +sleep 1 >>> + >>> +# Request IP via DHCP using dhclient >>> +NS_CHECK_EXEC([client], [dhclient -1 -v client], [0], [ignore], >>> [ignore]) >>> >> >> This is leaking the dhclient, in other words it keeps running even when >> the test is done. >> >> > Good catch. > > >> + >>> +# Verify client got an IP address from DHCP >>> +NS_CHECK_EXEC([client], [ip addr show client | grep -q >>> "192.168.1.100"], [0]) >>> + >>> +AS_BOX([DHCP test passed - client received IP 192.168.1.100 via DHCP >>> with ACL CT translation enabled]) >>> + >>> +OVN_CLEANUP_CONTROLLER([hv1]) >>> + >>> +OVN_CLEANUP_NORTHD >>> + >>> +as >>> +OVS_TRAFFIC_VSWITCHD_STOP(["/failed to query port patch-.*/d >>> +/connection dropped.*/d >>> +/dnsmasq.*/d >>> +"]) >>> +AT_CLEANUP >>> +]) >>> -- >>> 2.43.0 >>> >>> >> >> I was able to figure out why is it happening, dunno if we should keep the >> test around >> but it can be reproduced even with multicast UDP traffic so it might be >> enough to adjust >> the UDP test to also send multicast. The following two logical flows are >> to "blame": >> >> > Well, I think the broader the coverage, the better. The advantage of > having a DHCP client/server is that the traffic will contain all the > nuances of real traffic, and we don't have to encode all protocol data. For > example, the return traffic should be difficult to generate since it will > not be multicast. I tried to reuse NETNS_START_DHCPD, but I encountered > various permission errors and couldn't get it to work. Dnsmasq seems easier > to use and doesn't rely on an external config file. > We should reuse what is already provided in the test instead of adding a new dependency, also one thing to note about dhclient it's unmaintained and deprecated in Fedora, we should find a different client to use. > > >> /* Do not send multicast packets to conntrack. */ >> ovn_lflow_add(lflows, od, S_SWITCH_IN_PRE_ACL, 110, "eth.mcast", >> "next;", lflow_ref); >> ovn_lflow_add(lflows, od, S_SWITCH_OUT_PRE_ACL, 110, "eth.mcast", >> "next;", lflow_ref); >> >> We basically skip CT in case of multicast, and because the translation >> adds new >> prereq of +ct.trk we will never match that flow. I'm afraid that there >> isn't any >> easy/good solution. If we disable those flows we are imposing performance >> penalty on multicast traffic. If we say that's acceptable we need to >> clearly >> state that too in the documentation it's in the same ballpark with >> "broken" >> HW offload. >> > > What if we discard it only for DHCP traffic? I got it passing with > something like this: > > if (acl_udp_ct_translation) { > ovn_lflow_add(lflows, od, S_SWITCH_IN_PRE_ACL, 120, > "eth.mcast && udp && " > "(udp.src == 67 || udp.src == 68 || " > "udp.dst == 67 || udp.dst == 68)", > REGBIT_CONNTRACK_DEFRAG" = 1; next;", > lflow_ref); > } > > This takes higher priority than the more general multicast rule, so > performance should not be a problem if we apply it for DHCP. > As I replied to Brian it might make more sense CT for DHCP traffic in general regardless of this option. Unfortunately that still doesn't solve the issue with multicast traffic in general. > >> Regards, >> Ales >> > _______________________________________________ dev mailing list [email protected] https://mail.openvswitch.org/mailman/listinfo/ovs-dev
