Hi Xie Liu,

Thanks for the patch!

On 11/7/25 5:48 AM, [email protected] wrote:
> From: Xie Liu <[email protected]>
> 
> Consider the case of stateful Firewall for N-S traffic:
> 
> PUBLIC---S1-(S1-R1)---------(R1-S1)-R1 -------- S2 ---- VM1
> 
> Configuration:
> 
> ovn-nbctl pg-add pg_dgw
> ovn-nbctl pg-set-ports pg_dgw S1-R1
> ovn-nbctl acl-add pg_dgw from-lport 2000 "inport == @pg_dgw && ip4  && icmp4" 
> allow-related
> ovn-nbctl acl-add pg_dgw from-lport 1000 "outport == @pg_dgw && ip4" drop
> ovn-nbctl acl-add pg_dgw to-lport 1000 "outport == @pg_dgw && ip4" drop
> ovn-nbctl lsp-set-options S1-R1 router-port=R1-S1 enable_router_port_acl=true
> 
> VM1 pings external network.
> 
> Through this patch[1], the ovn-controller assigned a CT zone ID
> to the localnet LSP but not the dgw LSP.
> 
> This caused ACL failures: ICMP reply packets from external networks
> performed CT lookups in the wrong zone, couldn't match established
> connections, and were incorrectly dropped.

While I understand why the CT lookup in the wrong zone (in the VIF zone)
causes issues, I have my concerns about this behavior change.

> 
> Fix by ensuring ports without CT zone allocation use default zone 0,
> preventing incorrect zone inheritance and restoring proper ACL behavior
> for distributed gateway scenarios.
> 

Default zone 0 is most likely already used by the host traffic.  Which
means we might have unexpected collisions here.  Please see the
ovn-kubernetes use case where traffic from the host itself is also sent
through conntrack (also in zone 0).

Would a correct approach be to instead allocate explicit zone ids for
ports that have enable_router_port_acl=true?

> [1]https://github.com/ovn-org/ovn/commit/5ae7d2cb60a50541e88e8f5c74a669e2aa7acdda
> 
> Reported-at: https://github.com/ovn-org/ovn/issues/264
> Signed-off-by: Xie Liu <[email protected]>
> ---
>  controller/physical.c | 11 ++++-------
>  1 file changed, 4 insertions(+), 7 deletions(-)
> 
> diff --git a/controller/physical.c b/controller/physical.c
> index 6ac5dcd3f..127f21e59 100644
> --- a/controller/physical.c
> +++ b/controller/physical.c
> @@ -1141,9 +1141,9 @@ static void
>  put_zones_ofpacts(const struct zone_ids *zone_ids, struct ofpbuf *ofpacts_p)
>  {
>      if (zone_ids) {
> -        if (zone_ids->ct) {
> -            put_load(zone_ids->ct, MFF_LOG_CT_ZONE, 0, 16, ofpacts_p);
> -        }
> +        /* Always load CT zone (0 if unallocated) to ensure
> +         * consistent connection tracking metadata. */
> +        put_load(zone_ids->ct, MFF_LOG_CT_ZONE, 0, 16, ofpacts_p);
>          if (zone_ids->dnat) {
>              put_load(zone_ids->dnat, MFF_LOG_DNAT_ZONE, 0, 32, ofpacts_p);
>          }
> @@ -2012,7 +2012,6 @@ consider_port_binding(const struct physical_ctx *ctx,
>          ofpact_put_CT_CLEAR(ofpacts_p);
>          put_load(0, MFF_LOG_DNAT_ZONE, 0, 32, ofpacts_p);
>          put_load(0, MFF_LOG_SNAT_ZONE, 0, 32, ofpacts_p);
> -        put_load(0, MFF_LOG_CT_ZONE, 0, 16, ofpacts_p);
>          struct zone_ids peer_zones = get_zone_ids(peer, ctx->ct_zones);
>          load_logical_ingress_metadata(peer, &peer_zones, ctx->n_encap_ips,
>                                        ctx->encap_ips, ofpacts_p, false);
> @@ -2560,9 +2559,7 @@ static void
>  local_set_ct_zone_and_output_pb(int tunnel_key, int64_t zone_id,
>                                  struct ofpbuf *ofpacts)
>  {
> -    if (zone_id) {
> -        put_load(zone_id, MFF_LOG_CT_ZONE, 0, 16, ofpacts);
> -    }
> +    put_load(zone_id, MFF_LOG_CT_ZONE, 0, 16, ofpacts);
>      put_load(tunnel_key, MFF_LOG_OUTPORT, 0, 32, ofpacts);
>      put_resubmit(OFTABLE_CHECK_LOOPBACK, ofpacts);
>  }

It would also be great if you could add a test for this use case.

Thank you!

Regards,
Dumitru


_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to