Re: [ovs-dev] [PATCH net-next v4] net: openvswitch: Check vport netdev name

2024-04-19 Thread Jakub Kicinski
On Fri, 19 Apr 2024 12:31:33 +0800 Jun Gu wrote:
> From: "jun.gu" 
> 
> Ensure that the provided netdev name is not one of its aliases to
> prevent unnecessary creation and destruction of the vport by
> ovs-vswitchd.
> 
> Signed-off-by: jun.gu  
> Acked-by: Eelco Chaudron 

I said: When you repost, start a new thread, do not post new version
in-reply-to.

If you don't understand what something means - ask :|
Now try again.
-- 
pw-bot: cr
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [ovs-discuss] [branch-2.16] ovn distributed gateway chassisredirect strip_vlan not taking effect with stt

2024-04-19 Thread aginwala
Hi All:

Part of upgrading OVN north south gateway to the new 5.4 kernel , VMs
connectivity is lost when setting chassis for provider network lrp to this
new gateway. For interconnection gateways and hypervisors its not an issue/
lrp
_uuid   : 387a735d-fc11-4e90-8655-07785aa024af
chassis : b80a285b-586a-42d9-b189-69d641f143b1
datapath: d9219b69-5961-4f24-8414-1d4054b23169
external_ids: {}
gateway_chassis : [728adc6d-3236-4637-86e3-0f6745cf1b50,
7a372e68-c228-400b-9a4b-439cf234ed40, 82295a9c-02aa-416b-bac3-83755c687caf,
d1b42374-c475-4745-abdb-36e72140c5b5]
logical_port: "cr-lrp-9a1d2341-efb0-4e7e-839d-99b01944ba2e"
mac : ["74:db:d1:80:d3:af 10.169.247.140/24"]
nat_addresses   : []
options :
{distributed-port="lrp-9a1d2341-efb0-4e7e-839d-99b01944ba2e"}
parent_port : []
tag : []
tunnel_key  : 2
type: chassisredirect

provider network
port provnet-f239a6e8-73a5-4f95-8410-f7b3e0befe90
type: localnet
tag: 20
addresses: ["unknown"]
## encap ip for ovn is on eth0

## gw interfaces brens2f0 hosts uplink provider network
ovs-vsctl list-br
br-int
brens2f0
ovs-vsctl list-ports brens2f0
ens2f0
patch-provnet-f239a6e8-73a5-4f95-8410-f7b3e0befe90-to-br-int
## fail mode secure
ovs-vsctl get-fail-mode br-int
secure
## set chassis
ovn-nbctl lrp-set-gateway-chassis lrp-9a1d2341-efb0-4e7e-839d-99b01944ba2e
cee81be9-f782-4c82-800e-c5c5327531e4 101

ovn-controller is running as a container on the new gateway
ovn-controller --version
ovn-controller (Open vSwitch) 2.11.1-13
OpenFlow versions 0x4:0x4

## ovs on the host 5.4 kernel
ovs-vsctl --version
ovs-vsctl (Open vSwitch) 2.16.0
DB Schema 8.3.0

ovs-ofctl --version
ovs-ofctl (Open vSwitch) 2.16.0
OpenFlow versions 0x1:0x6


Digging further with tcpdump on the destination vm interface shows vlan
being present causing connectivity failure and no reply packet
20:26:06.371540 74:db:d1:80:09:01 > 74:db:d1:80:0a:15, ethertype 802.1Q
(0x8100), length 102: vlan 20, p 0, ethertype IPv4, (tos 0x0, ttl 56, id
53702, offset 0, flags [none], proto ICMP (1), length 84) 10.228.4.180 >
10.78.8.42: ICMP echo request, id 7765, seq 791, length 64 20:26:07.375960
74:db:d1:80:09:01 > 74:db:d1:80:0a:15, ethertype 802.1Q (0x8100), length
102: vlan 20, p 0, ethertype IPv4, (tos 0x0, ttl 56, id 36269, offset 0,
flags [none], proto ICMP (1), length 84) 10.228.4.180 > 10.78.8.42: ICMP
echo request, id 7765, seq 792, length 64

openflow rules for atrip vlan 20 is correct that are programmed with ovn on
new/old gw :
ovs-ofctl dump-flows br-int | grep strip_vlan | grep 20
cookie=0x0, duration=27.894s, table=65, n_packets=136, n_bytes=19198,
idle_age=0, priority=100,reg15=0x1,metadata=0x1
actions=mod_vlan_vid:20,output:161,strip_vlan
cookie=0x0, duration=30.055s, table=0, n_packets=1592, n_bytes=130783,
idle_age=0, priority=150,in_port=161,dl_vlan=20
actions=strip_vlan,load:0xe1->NXM_NX_REG13[],load:0x36->NXM_NX_REG11[],load:0xd7->NXM_NX_REG12[],load:0x1->OXM_OF_METADATA[],load:0x1->NXM_NX_REG14[],resubmit(,8)


Checking ovs datapath flow shows vlan being present
ovs-dpctl dump-flows  | grep vlan
recirc_id(0x422),tunnel(tun_id=0x1006605,src=10.172.66.144,dst=10.173.84.83,flags(-df+csum+key)),in_port(1),ct_state(+new-est-rel-rpl-inv+trk),ct_label(0/0x1),eth(dst=74:db:d1:80:0a:15),eth_type(0x8100),vlan(vid=20/0x14),encap(eth_type(0x0800),ipv4(frag=no)),
packets:1713, bytes:174726, used:0.145s, actions:5

Couldn't find much drift with ofproto/trace
ovs-appctl ofproto/trace br-int in_port=2321,dl_vlan=20
running on old/new gw (replace with in_port)


Tried stripping on the hypervisor/compute and data plane is ok but thats
not the right approach
ovs-ofctl add-flow br-int "priority=65535,dl_vlan=20
actions=strip_vlan,output:4597"

Downgrading the kernel to 4.15 and pinning to ovs 2.11 restores the data
plane with no vlan and 802.1q in the tcpdump on the destion workload tap
interface.


Is it a bug or known issue with later versions; post 2.11 version of ovs
when tagged vlan is present for provider network?

Tried to pin oflow version to 1.4 too but didn't help much as strip_vlan
flows are good. Any pointers further would be great as we continue to debug.


Regards,
Aliasgar
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH ovn v5] northd, controller: Use paused controller action for packet buffering.

2024-04-19 Thread Numan Siddique
On Fri, Apr 19, 2024 at 7:09 AM Ales Musil  wrote:

> The current packet injection loses ct_state in the process. When
> the ct_state is lost we might commit to DNAT zone and perform
> zero SNAT after the packet injection. This causes the first session
> to be wrong as the reply packets are not unDNATted.
>
> Instead of re-injecting the packet back into the pipeline when
> we get the MAC binding, use paused controller action. The paused
> controller action stores ct_state, which is important for the behavior
> of the resumed packet.
>
> At the same time bump the OvS submodule latest branch-3.3. This is
> mainly for [0], which fixes metering for paused controller actions.
>
> In order to make sure that the paused action works during upgrade add
> the output implicitly. Once the upgrade is done northd will create option
> to inform controllers that the implicit action is no longer needed.
>
> [0] c560f6ca3257 ("ofproto-dpif-xlate: Fix continuations with associated
> metering.")
>
> Reported-at: https://issues.redhat.com/browse/FDP-439
> Signed-off-by: Ales Musil 
> ---
> v5: Rebase on top of current main.
> v4: Fix copy paste error in the global_handler.
> v3: Rebase on top of current main.
> Add flag to ensure that the paused action works during upgrade.
> v2: Fix the Jira link and add ack from Mark.
>

Thanks.  I applied this patch to the main.
It doesn't apply cleanly to branch-24.03.  Please submit a backport patch
for branches - 24.03 and 23.09.

Numan


---
>  controller/lflow.c  |  1 +
>  controller/lflow.h  |  1 +
>  controller/mac-learn.c  | 30 
>  controller/mac-learn.h  |  9 ++--
>  controller/ovn-controller.c | 21 
>  controller/pinctrl.c| 64 +
>  include/ovn/actions.h   |  3 ++
>  lib/actions.c   | 47 +-
>  northd/en-global-config.c   |  4 ++
>  northd/northd.c |  6 +--
>  tests/multinode.at  |  8 
>  tests/ovn-northd.at |  3 ++
>  tests/ovn.at|  8 ++--
>  tests/system-ovn.at | 95 +
>  tests/test-ovn.c|  1 +
>  15 files changed, 239 insertions(+), 62 deletions(-)
>
> diff --git a/controller/lflow.c b/controller/lflow.c
> index 895d17d19..760ec0b41 100644
> --- a/controller/lflow.c
> +++ b/controller/lflow.c
> @@ -874,6 +874,7 @@ add_matches_to_flow_table(const struct
> sbrec_logical_flow *lflow,
>  .collector_ids = l_ctx_in->collector_ids,
>  .lflow_uuid = lflow->header_.uuid,
>  .dp_key = ldp->datapath->tunnel_key,
> +.explicit_arp_ns_output = l_ctx_in->explicit_arp_ns_output,
>
>  .pipeline = ingress ? OVNACT_P_INGRESS : OVNACT_P_EGRESS,
>  .ingress_ptable = OFTABLE_LOG_INGRESS_PIPELINE,
> diff --git a/controller/lflow.h b/controller/lflow.h
> index 9b7ffa19c..295d004f4 100644
> --- a/controller/lflow.h
> +++ b/controller/lflow.h
> @@ -130,6 +130,7 @@ struct lflow_ctx_in {
>  bool lb_hairpin_use_ct_mark;
>  bool localnet_learn_fdb;
>  bool localnet_learn_fdb_changed;
> +bool explicit_arp_ns_output;
>  };
>
>  struct lflow_ctx_out {
> diff --git a/controller/mac-learn.c b/controller/mac-learn.c
> index 071f01b4f..0c3b60c23 100644
> --- a/controller/mac-learn.c
> +++ b/controller/mac-learn.c
> @@ -199,15 +199,24 @@ ovn_fdb_add(struct hmap *fdbs, uint32_t dp_key,
> struct eth_addr mac,
>  /* packet buffering functions */
>
>  struct packet_data *
> -ovn_packet_data_create(struct ofpbuf ofpacts,
> -   const struct dp_packet *original_packet)
> +ovn_packet_data_create(const struct ofputil_packet_in *pin,
> +   const struct ofpbuf *continuation)
>  {
>  struct packet_data *pd = xmalloc(sizeof *pd);
>
> -pd->ofpacts = ofpacts;
> -/* clone the packet to send it later with correct L2 address */
> -pd->p = dp_packet_clone_data(dp_packet_data(original_packet),
> - dp_packet_size(original_packet));
> +pd->pin = (struct ofputil_packet_in) {
> +.packet = xmemdup(pin->packet, pin->packet_len),
> +.packet_len = pin->packet_len,
> +.flow_metadata = pin->flow_metadata,
> +.reason = pin->reason,
> +.table_id = pin->table_id,
> +.cookie = pin->cookie,
> +/* Userdata are empty on purpose,
> + * it is not needed for the continuation. */
> +.userdata = NULL,
> +.userdata_len = 0,
> +};
> +pd->continuation = ofpbuf_clone(continuation);
>
>  return pd;
>  }
> @@ -216,8 +225,8 @@ ovn_packet_data_create(struct ofpbuf ofpacts,
>  void
>  ovn_packet_data_destroy(struct packet_data *pd)
>  {
> -dp_packet_delete(pd->p);
> -ofpbuf_uninit(>ofpacts);
> +free(pd->pin.packet);
> +ofpbuf_delete(pd->continuation);
>  free(pd);
>  }
>
> @@ -307,7 +316,10 @@ ovn_buffered_packets_ctx_run(struct
> buffered_packets_ctx *ctx,
>
>  struct packet_data 

[ovs-dev] [PATCH ovn] northd: Do not incrementally proccess changes for disabled LR.

2024-04-19 Thread Ales Musil
When the logical router is disabled it is not populated in the datapaths
and is not internally available to I-P engine in the hmap. This could
lead to a crash during I-P processing:

To fix that make sure we process incrementally only LR that are enabled.

Reported-at: https://issues.redhat.com/browse/FDP-571
Signed-off-by: Ales Musil 
---
 northd/northd.c | 5 +
 tests/ovn-northd.at | 7 +++
 2 files changed, 12 insertions(+)

diff --git a/northd/northd.c b/northd/northd.c
index 37f443e70..b834f1274 100644
--- a/northd/northd.c
+++ b/northd/northd.c
@@ -4689,6 +4689,11 @@ fail:
 static bool
 lr_changes_can_be_handled(const struct nbrec_logical_router *lr)
 {
+/* We can't do I-P processing when the router is disabled. */
+if (!lrouter_is_enabled(lr)) {
+return false;
+}
+
 /* Check if the columns are changed in this row. */
 enum nbrec_logical_router_column_id col;
 for (col = 0; col < NBREC_LOGICAL_ROUTER_N_COLUMNS; col++) {
diff --git a/tests/ovn-northd.at b/tests/ovn-northd.at
index be006fb32..4f59a64db 100644
--- a/tests/ovn-northd.at
+++ b/tests/ovn-northd.at
@@ -11452,6 +11452,13 @@ check_engine_stats lflow recompute nocompute
 check_engine_stats sync_to_sb_lb recompute compute
 CHECK_NO_CHANGE_AFTER_RECOMPUTE
 
+check ovn-nbctl --wait=sb lr-add lr2 -- set logical_router lr2 enabled=false
+check as northd ovn-appctl -t ovn-northd inc-engine/clear-stats
+check ovn-nbctl --wait=sb lb-add lb4 10.0.0.40:4040 10.0.40.40:4050 \
+-- lr-lb-add lr2 lb4
+check_engine_stats northd recompute nocompute
+CHECK_NO_CHANGE_AFTER_RECOMPUTE
+
 AT_CLEANUP
 ])
 
-- 
2.44.0

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH ovn] northd: Allow DHCP request from the lport if enabled DHCPv4

2024-04-19 Thread Numan Siddique
On Thu, Apr 18, 2024 at 11:20 PM shy liu  wrote:

> >
> >
> >
> > On Tue, Apr 16, 2024 at 1:22 AM  wrote:
> >>
> >> From: shylou 
> >>
> >> DHCP for VM fails when removing default security group rules
> >> using a CMS like Neutron ML2/OVN [1]. This is because DHCP
> >> requests from VMs may be dropped by ACLs. To fix this issue,
> >> we add a lflow with a priority of 34000 to allow DHCP requests
> >> from the logical port if the CMS has enabled native DHCPv4
> >> for this port.
> >>
> >> [1]https://bugs.launchpad.net/neutron/+bug/1926515
> >>
> >> Signed-off-by: Xie Liu 
> >
> >
> > Thanks for the patch.
> >
> > I don't think this is the correct fix.  If neutron wants to allow DHCP
> overriding any ACL rules,
> > it should add a high priority ACL to allow DHCP.   What if a user wants
> to add an explicit ACL to
> > drop DHCP from certain ports ?
> >
> > Thanks
> > Numan
> >
> Thanks,  Numan
> I have a puzzling question: Why would users want to block
> DHCP request packets after enabling DHCP for any LSPs ?
> They could justly not enable DHCP for them at all, right?
>

When a user doesn't enable DHCP for an LSP,  it doesn't mean it will
be blocked.  It just means that OVN will not respond to that DHCP request.
The DHCP discovery/request packet will continue the pipeline and if there
is any DHCP server
configured, it will reply to it.  Whereas an ACL to block DHCP will just
drop the packet from that LSP
in the ACL pipeline.

Thanks
Numan



> >> ---
> >>  northd/northd.c | 10 ++
> >>  tests/ovn-northd.at |  5 +
> >>  2 files changed, 15 insertions(+)
> >>
> >> diff --git a/northd/northd.c b/northd/northd.c
> >> index 2c3560ce2..ca641a19e 100644
> >> --- a/northd/northd.c
> >> +++ b/northd/northd.c
> >> @@ -8414,6 +8414,16 @@ build_dhcpv4_options_flows(struct ovn_port *op,
> >>   meter_groups),
> >>
> >nbsp->dhcpv4_options->header_,
> >>lflow_ref);
> >> +/* Add 34000 priority flow to allow DHCP request from the
> lport
> >> + * if the CMS has enabled native DHCPv4 for this lport.
> >> + * */
> >> +ovn_lflow_add_with_lport_and_hint(lflows, op->od,
> >> +  S_SWITCH_IN_ACL_EVAL,
> 34000,
> >> +  ds_cstr(),
> >> +
> REGBIT_ACL_VERDICT_ALLOW" = 1; next;",
> >> +  op->key,
> >> +  >nbsp->header_,
> >> +  lflow_ref);
> >>  ds_clear();
> >>
> >>  /* If REGBIT_DHCP_OPTS_RESULT is set, it means the
> >> diff --git a/tests/ovn-northd.at b/tests/ovn-northd.at
> >> index 6fdd761da..7657aefff 100644
> >> --- a/tests/ovn-northd.at
> >> +++ b/tests/ovn-northd.at
> >> @@ -4897,6 +4897,11 @@ ovn-nbctl --wait=sb lsp-set-dhcpv4-options
> sw0-port1 $CIDR_UUID
> >>  ovn-sbctl dump-flows sw0 > sw0flows
> >>  AT_CAPTURE_FILE([sw0flows])
> >>
> >> +AT_CHECK([grep "_acl_eval" sw0flows | grep sw0-port1 |
> ovn_strip_lflows], [0], [dnl
> >> +  table=??(ls_in_acl_eval ), priority=34000, match=(inport ==
> "sw0-port1" && eth.src == 50:54:00:00:00:01 && (ip4.src == {10.0.0.2,
> 0.0.0.0} && ip4.dst == {10.0.0.1, 255.255.255.255}) && udp.src == 68 &&
> udp.dst == 67), action=(reg8[[16]] = 1; next;)
> >> +  table=??(ls_out_acl_eval), priority=34000, match=(outport ==
> "sw0-port1" && eth.src == c0:ff:ee:00:00:01 && ip4.src == 10.0.0.1 && udp
> && udp.src == 67 && udp.dst == 68), action=(reg8[[16]] = 1; next;)
> >> +])
> >> +
> >>  AT_CHECK([grep -w "ls_in_dhcp_options" sw0flows | ovn_strip_lflows],
> [0], [dnl
> >>table=??(ls_in_dhcp_options ), priority=0, match=(1),
> action=(next;)
> >>table=??(ls_in_dhcp_options ), priority=100  , match=(inport ==
> "sw0-port1" && eth.src == 50:54:00:00:00:01 && (ip4.src == {10.0.0.2,
> 0.0.0.0} && ip4.dst == {10.0.0.1, 255.255.255.255}) && udp.src == 68 &&
> udp.dst == 67), action=(reg0[[3]] = put_dhcp_opts(offerip = 10.0.0.2,
> hostname = "foo", lease_time = 3600, netmask = 255.255.255.0, router =
> 10.0.0.1, server_id = 10.0.0.1); next;)
> >> --
> >> 2.42.0.windows.2
> >>
> >> ___
> >> dev mailing list
> >> d...@openvswitch.org
> >> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
> >>
> ___
> dev mailing list
> d...@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH v3 6/6] netdev-dpdk: Refactor tunnel checksum offloading.

2024-04-19 Thread David Marchand
All informations required for checksum offloading can be deduced by
already tracked dp_packet l3_ofs, l4_ofs, inner_l3_ofs and inner_l4_ofs
fields.
Remove DPDK specific l[2-4]_len from generic OVS code.

netdev-dpdk code then fills mbuf specifics step by step:
- outer_l2_len and outer_l3_len are needed for tunneling (and below
  features),
- l2_len and l3_len are needed for IP and L4 checksum (and below features),
- l4_len and tso_segsz are needed when doing TSO,

Signed-off-by: David Marchand 
---
 lib/dp-packet.h | 37 --
 lib/netdev-dpdk.c   | 35 ++---
 lib/netdev-native-tnl.c | 50 +
 3 files changed, 27 insertions(+), 95 deletions(-)

diff --git a/lib/dp-packet.h b/lib/dp-packet.h
index 3622764c47..a75b1c5cdb 100644
--- a/lib/dp-packet.h
+++ b/lib/dp-packet.h
@@ -604,25 +604,6 @@ dp_packet_get_nd_payload(const struct dp_packet *b)
 }
 
 #ifdef DPDK_NETDEV
-static inline void
-dp_packet_set_l2_len(struct dp_packet *b, size_t l2_len)
-{
-b->mbuf.l2_len = l2_len;
-}
-
-static inline void
-dp_packet_set_l3_len(struct dp_packet *b, size_t l3_len)
-{
-b->mbuf.l3_len = l3_len;
-}
-
-static inline void
-dp_packet_set_l4_len(struct dp_packet *b, size_t l4_len)
-{
-b->mbuf.l4_len = l4_len;
-}
-
-
 static inline uint64_t *
 dp_packet_ol_flags_ptr(const struct dp_packet *b)
 {
@@ -642,24 +623,6 @@ dp_packet_flow_mark_ptr(const struct dp_packet *b)
 }
 
 #else
-static inline void
-dp_packet_set_l2_len(struct dp_packet *b OVS_UNUSED, size_t l2_len OVS_UNUSED)
-{
-/* There is no implementation. */
-}
-
-static inline void
-dp_packet_set_l3_len(struct dp_packet *b OVS_UNUSED, size_t l3_len OVS_UNUSED)
-{
-/* There is no implementation. */
-}
-
-static inline void
-dp_packet_set_l4_len(struct dp_packet *b OVS_UNUSED, size_t l4_len OVS_UNUSED)
-{
-/* There is no implementation. */
-}
-
 static inline uint32_t *
 dp_packet_ol_flags_ptr(const struct dp_packet *b)
 {
diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
index 1dad2ef833..31dd6f1d5a 100644
--- a/lib/netdev-dpdk.c
+++ b/lib/netdev-dpdk.c
@@ -2584,6 +2584,9 @@ static bool
 netdev_dpdk_prep_hwol_packet(struct netdev_dpdk *dev, struct rte_mbuf *mbuf)
 {
 struct dp_packet *pkt = CONTAINER_OF(mbuf, struct dp_packet, mbuf);
+void *l2;
+void *l3;
+void *l4;
 
 const uint64_t all_inner_requests = (RTE_MBUF_F_TX_IP_CKSUM |
  RTE_MBUF_F_TX_L4_MASK |
@@ -2613,11 +2616,6 @@ netdev_dpdk_prep_hwol_packet(struct netdev_dpdk *dev, 
struct rte_mbuf *mbuf)
 return true;
 }
 
-ovs_assert(dp_packet_l4(pkt));
-
-/* If packet is vxlan or geneve tunnel packet, calculate outer
- * l2 len and outer l3 len. Inner l2/l3/l4 len are calculated
- * before. */
 const uint64_t tunnel_type = mbuf->ol_flags & RTE_MBUF_F_TX_TUNNEL_MASK;
 if (OVS_UNLIKELY(tunnel_type &&
  tunnel_type != RTE_MBUF_F_TX_TUNNEL_GENEVE &&
@@ -2635,6 +2633,11 @@ netdev_dpdk_prep_hwol_packet(struct netdev_dpdk *dev, 
struct rte_mbuf *mbuf)
  (char *) dp_packet_eth(pkt);
 mbuf->outer_l3_len = (char *) dp_packet_l4(pkt) -
  (char *) dp_packet_l3(pkt);
+
+/* Inner L2 length must account for the tunnel header length. */
+l2 = dp_packet_l4(pkt);
+l3 = dp_packet_inner_l3(pkt);
+l4 = dp_packet_inner_l4(pkt);
 } else {
 /* If no outer offloading is requested, clear outer marks. */
 mbuf->ol_flags &= ~all_outer_marks;
@@ -2642,8 +2645,9 @@ netdev_dpdk_prep_hwol_packet(struct netdev_dpdk *dev, 
struct rte_mbuf *mbuf)
 mbuf->outer_l3_len = 0;
 
 /* Skip outer headers. */
-mbuf->l2_len += (char *) dp_packet_l4(pkt) -
-(char *) dp_packet_eth(pkt);
+l2 = dp_packet_eth(pkt);
+l3 = dp_packet_inner_l3(pkt);
+l4 = dp_packet_inner_l4(pkt);
 }
 } else {
 if (tunnel_type) {
@@ -2663,22 +2667,27 @@ netdev_dpdk_prep_hwol_packet(struct netdev_dpdk *dev, 
struct rte_mbuf *mbuf)
 }
 mbuf->outer_l2_len = 0;
 mbuf->outer_l3_len = 0;
-mbuf->l2_len = (char *) dp_packet_l3(pkt) -
-   (char *) dp_packet_eth(pkt);
-mbuf->l3_len = (char *) dp_packet_l4(pkt) -
-   (char *) dp_packet_l3(pkt);
+
+l2 = dp_packet_eth(pkt);
+l3 = dp_packet_l3(pkt);
+l4 = dp_packet_l4(pkt);
 }
 
+ovs_assert(l4);
+
+mbuf->l2_len = (char *) l3 - (char *) l2;
+mbuf->l3_len = (char *) l4 - (char *) l3;
+
 if (mbuf->ol_flags & RTE_MBUF_F_TX_TCP_SEG) {
-struct tcp_header *th = dp_packet_l4(pkt);
+struct tcp_header *th = l4;
 uint16_t link_tso_segsz;
 int hdr_len;
 
+mbuf->l4_len = TCP_OFFSET(th->tcp_ctl) * 4;
   

[ovs-dev] [PATCH v3 5/6] netdev-dpdk: Use guest TSO segmentation size hint.

2024-04-19 Thread David Marchand
In a typical setup like:
guest A <-virtio-> OVS A <-vxlan-> OVS B <-virtio-> guest B

TSO packets from guest A are segmented against the OVS A physical port
mtu adjusted by the vxlan tunnel header size, regardless of guest A
interface mtu.

As an example, let's say guest A and guest B mtu are set to 1500 bytes.
OVS A and OVS B physical ports mtu are set to 1600 bytes.
Guest A will request TCP segmentation for 1448 bytes segments.
On the other hand, OVS A will request 1498 bytes segments to the HW.
This results in OVS B dropping packets because decapsulated packets
are larger than the vhost-user port (serving guest B) mtu.

2024-04-17T14:13:01.239Z|2|netdev_dpdk(pmd-c03/id:7)|WARN|vhost0:
Too big size 1564 max_packet_len 1518

vhost-user ports expose a guest mtu by filling mbuf->tso_segsz.
Use it as a hint.

This may result in segments (on the wire) slightly shorter than the
optimal size.

Reported-at: https://github.com/openvswitch/ovs-issues/issues/321
Signed-off-by: David Marchand 
---
Note:
As we trust the guest with this change, should we put a lower limit on
mbuf->tso_segsz?

---
 lib/netdev-dpdk.c | 11 ---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
index 661269e4b6..1dad2ef833 100644
--- a/lib/netdev-dpdk.c
+++ b/lib/netdev-dpdk.c
@@ -2671,14 +2671,19 @@ netdev_dpdk_prep_hwol_packet(struct netdev_dpdk *dev, 
struct rte_mbuf *mbuf)
 
 if (mbuf->ol_flags & RTE_MBUF_F_TX_TCP_SEG) {
 struct tcp_header *th = dp_packet_l4(pkt);
+uint16_t link_tso_segsz;
 int hdr_len;
 
 if (tunnel_type) {
-mbuf->tso_segsz = dev->mtu - mbuf->l2_len - mbuf->l3_len -
-  mbuf->l4_len - mbuf->outer_l3_len;
+link_tso_segsz = dev->mtu - mbuf->l2_len - mbuf->l3_len -
+ mbuf->l4_len - mbuf->outer_l3_len;
 } else {
 mbuf->l4_len = TCP_OFFSET(th->tcp_ctl) * 4;
-mbuf->tso_segsz = dev->mtu - mbuf->l3_len - mbuf->l4_len;
+link_tso_segsz = dev->mtu - mbuf->l3_len - mbuf->l4_len;
+}
+
+if (!mbuf->tso_segsz || mbuf->tso_segsz > link_tso_segsz) {
+mbuf->tso_segsz = link_tso_segsz;
 }
 
 hdr_len = mbuf->l2_len + mbuf->l3_len + mbuf->l4_len;
-- 
2.44.0

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH v3 4/6] netdev-dpdk: Refactor TSO request code.

2024-04-19 Thread David Marchand
Replace check on th == NULL with an assert() because dp_packet_l4(pkt)
is priorly used to compute (outer) L3 length.

Besides, filling l4_len and tso_segsz only matters to TSO, so there is
no need to check for other L4 checksum offloading requests.

Signed-off-by: David Marchand 
---
 lib/netdev-dpdk.c | 36 +++-
 1 file changed, 11 insertions(+), 25 deletions(-)

diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
index 8b6a3ed189..661269e4b6 100644
--- a/lib/netdev-dpdk.c
+++ b/lib/netdev-dpdk.c
@@ -2584,7 +2584,6 @@ static bool
 netdev_dpdk_prep_hwol_packet(struct netdev_dpdk *dev, struct rte_mbuf *mbuf)
 {
 struct dp_packet *pkt = CONTAINER_OF(mbuf, struct dp_packet, mbuf);
-struct tcp_header *th;
 
 const uint64_t all_inner_requests = (RTE_MBUF_F_TX_IP_CKSUM |
  RTE_MBUF_F_TX_L4_MASK |
@@ -2614,6 +2613,8 @@ netdev_dpdk_prep_hwol_packet(struct netdev_dpdk *dev, 
struct rte_mbuf *mbuf)
 return true;
 }
 
+ovs_assert(dp_packet_l4(pkt));
+
 /* If packet is vxlan or geneve tunnel packet, calculate outer
  * l2 len and outer l3 len. Inner l2/l3/l4 len are calculated
  * before. */
@@ -2667,22 +2668,10 @@ netdev_dpdk_prep_hwol_packet(struct netdev_dpdk *dev, 
struct rte_mbuf *mbuf)
 mbuf->l3_len = (char *) dp_packet_l4(pkt) -
(char *) dp_packet_l3(pkt);
 }
-th = dp_packet_l4(pkt);
 
 if (mbuf->ol_flags & RTE_MBUF_F_TX_TCP_SEG) {
-if (!th) {
-VLOG_WARN_RL(, "%s: TCP Segmentation without L4 header"
- " pkt len: %"PRIu32"", dev->up.name, mbuf->pkt_len);
-return false;
-}
-}
-
-if ((mbuf->ol_flags & RTE_MBUF_F_TX_L4_MASK) == RTE_MBUF_F_TX_TCP_CKSUM) {
-if (!th) {
-VLOG_WARN_RL(, "%s: TCP offloading without L4 header"
- " pkt len: %"PRIu32"", dev->up.name, mbuf->pkt_len);
-return false;
-}
+struct tcp_header *th = dp_packet_l4(pkt);
+int hdr_len;
 
 if (tunnel_type) {
 mbuf->tso_segsz = dev->mtu - mbuf->l2_len - mbuf->l3_len -
@@ -2692,16 +2681,13 @@ netdev_dpdk_prep_hwol_packet(struct netdev_dpdk *dev, 
struct rte_mbuf *mbuf)
 mbuf->tso_segsz = dev->mtu - mbuf->l3_len - mbuf->l4_len;
 }
 
-if (mbuf->ol_flags & RTE_MBUF_F_TX_TCP_SEG) {
-int hdr_len = mbuf->l2_len + mbuf->l3_len + mbuf->l4_len;
-if (OVS_UNLIKELY((hdr_len +
-  mbuf->tso_segsz) > dev->max_packet_len)) {
-VLOG_WARN_RL(, "%s: Oversized TSO packet. hdr: %"PRIu32", "
- "gso: %"PRIu32", max len: %"PRIu32"",
- dev->up.name, hdr_len, mbuf->tso_segsz,
- dev->max_packet_len);
-return false;
-}
+hdr_len = mbuf->l2_len + mbuf->l3_len + mbuf->l4_len;
+if (OVS_UNLIKELY((hdr_len + mbuf->tso_segsz) > dev->max_packet_len)) {
+VLOG_WARN_RL(, "%s: Oversized TSO packet. hdr: %"PRIu32", "
+ "gso: %"PRIu32", max len: %"PRIu32"",
+ dev->up.name, hdr_len, mbuf->tso_segsz,
+ dev->max_packet_len);
+return false;
 }
 }
 
-- 
2.44.0

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH v3 3/6] netdev-dpdk: Fix inner checksum when outer is not supported.

2024-04-19 Thread David Marchand
If outer checksum is not supported and OVS already set L3/L4 outer
checksums in the packet, no outer mark should be left in ol_flags
(as it confuses some driver, like net/ixgbe).

l2_len must be adjusted to account for the tunnel header.

Fixes: 084c8087292c ("userspace: Support VXLAN and GENEVE TSO.")
Signed-off-by: David Marchand 
---
 lib/netdev-dpdk.c | 19 +++
 1 file changed, 15 insertions(+), 4 deletions(-)

diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
index f732716141..8b6a3ed189 100644
--- a/lib/netdev-dpdk.c
+++ b/lib/netdev-dpdk.c
@@ -2629,10 +2629,21 @@ netdev_dpdk_prep_hwol_packet(struct netdev_dpdk *dev, 
struct rte_mbuf *mbuf)
 }
 
 if (tunnel_type && (mbuf->ol_flags & all_inner_requests)) {
-mbuf->outer_l2_len = (char *) dp_packet_l3(pkt) -
- (char *) dp_packet_eth(pkt);
-mbuf->outer_l3_len = (char *) dp_packet_l4(pkt) -
- (char *) dp_packet_l3(pkt);
+if (mbuf->ol_flags & all_outer_requests) {
+mbuf->outer_l2_len = (char *) dp_packet_l3(pkt) -
+ (char *) dp_packet_eth(pkt);
+mbuf->outer_l3_len = (char *) dp_packet_l4(pkt) -
+ (char *) dp_packet_l3(pkt);
+} else {
+/* If no outer offloading is requested, clear outer marks. */
+mbuf->ol_flags &= ~all_outer_marks;
+mbuf->outer_l2_len = 0;
+mbuf->outer_l3_len = 0;
+
+/* Skip outer headers. */
+mbuf->l2_len += (char *) dp_packet_l4(pkt) -
+(char *) dp_packet_eth(pkt);
+}
 } else {
 if (tunnel_type) {
 /* No inner offload is requested, fallback to non tunnel
-- 
2.44.0

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH v3 1/6] netdev-dpdk: Fallback to non tunnel checksum offloading.

2024-04-19 Thread David Marchand
The outer checksum offloading API in DPDK is ambiguous and was
implemented by Intel folks in their drivers with the assumption that
any outer offloading always goes with an inner offloading request.

With net/i40e and net/ice drivers, in the case of encapsulating a ARP
packet in a vxlan tunnel (which results in requesting outer ip checksum
with a tunnel context but no inner offloading request), a Tx failure is
triggered, associated with a port MDD event.
2024-03-27T16:02:07.084Z|00018|dpdk|WARN|ice_interrupt_handler(): OICR:
MDD event

To avoid this situation, if no checksum or segmentation offloading is
requested on the inner part of a packet, fallback to "normal" (non outer)
offloading request.

Reported-at: https://github.com/openvswitch/ovs-issues/issues/321
Fixes: 084c8087292c ("userspace: Support VXLAN and GENEVE TSO.")
Fixes: f81d782c1906 ("netdev-native-tnl: Mark all vxlan/geneve packets as 
tunneled.")
Signed-off-by: David Marchand 
---
Changes since v2:
- kept offloads disabled for net/i40e and net/ice as this patch does not
  fix outer udp checksum (a DPDK fix is required),
- updated commitlog with details to reproduce the issue,
- adjusted indent,

Changes since v1:
- reset inner marks before converting outer requests,
- fixed some coding style,

---
 lib/netdev-dpdk.c | 71 +++
 1 file changed, 41 insertions(+), 30 deletions(-)

diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
index 2111f77681..7e109903c0 100644
--- a/lib/netdev-dpdk.c
+++ b/lib/netdev-dpdk.c
@@ -2584,16 +2584,18 @@ netdev_dpdk_prep_hwol_packet(struct netdev_dpdk *dev, 
struct rte_mbuf *mbuf)
 struct dp_packet *pkt = CONTAINER_OF(mbuf, struct dp_packet, mbuf);
 struct tcp_header *th;
 
-const uint64_t all_requests = (RTE_MBUF_F_TX_IP_CKSUM |
-   RTE_MBUF_F_TX_L4_MASK  |
-   RTE_MBUF_F_TX_OUTER_IP_CKSUM  |
-   RTE_MBUF_F_TX_OUTER_UDP_CKSUM |
-   RTE_MBUF_F_TX_TCP_SEG);
-const uint64_t all_marks = (RTE_MBUF_F_TX_IPV4 |
-RTE_MBUF_F_TX_IPV6 |
-RTE_MBUF_F_TX_OUTER_IPV4 |
-RTE_MBUF_F_TX_OUTER_IPV6 |
-RTE_MBUF_F_TX_TUNNEL_MASK);
+const uint64_t all_inner_requests = (RTE_MBUF_F_TX_IP_CKSUM |
+ RTE_MBUF_F_TX_L4_MASK |
+ RTE_MBUF_F_TX_TCP_SEG);
+const uint64_t all_outer_requests = (RTE_MBUF_F_TX_OUTER_IP_CKSUM |
+ RTE_MBUF_F_TX_OUTER_UDP_CKSUM);
+const uint64_t all_requests = all_inner_requests | all_outer_requests;
+const uint64_t all_inner_marks = (RTE_MBUF_F_TX_IPV4 |
+  RTE_MBUF_F_TX_IPV6);
+const uint64_t all_outer_marks = (RTE_MBUF_F_TX_OUTER_IPV4 |
+  RTE_MBUF_F_TX_OUTER_IPV6 |
+  RTE_MBUF_F_TX_TUNNEL_MASK);
+const uint64_t all_marks = all_inner_marks | all_outer_marks;
 
 if (!(mbuf->ol_flags & all_requests)) {
 /* No offloads requested, no marks should be set. */
@@ -2614,34 +2616,43 @@ netdev_dpdk_prep_hwol_packet(struct netdev_dpdk *dev, 
struct rte_mbuf *mbuf)
  * l2 len and outer l3 len. Inner l2/l3/l4 len are calculated
  * before. */
 const uint64_t tunnel_type = mbuf->ol_flags & RTE_MBUF_F_TX_TUNNEL_MASK;
-if (tunnel_type == RTE_MBUF_F_TX_TUNNEL_GENEVE ||
-tunnel_type == RTE_MBUF_F_TX_TUNNEL_VXLAN) {
-mbuf->outer_l2_len = (char *) dp_packet_l3(pkt) -
- (char *) dp_packet_eth(pkt);
-mbuf->outer_l3_len = (char *) dp_packet_l4(pkt) -
- (char *) dp_packet_l3(pkt);
-
-/* If neither inner checksums nor TSO is requested, inner marks
- * should not be set. */
-if (!(mbuf->ol_flags & (RTE_MBUF_F_TX_IP_CKSUM |
-RTE_MBUF_F_TX_L4_MASK  |
-RTE_MBUF_F_TX_TCP_SEG))) {
-mbuf->ol_flags &= ~(RTE_MBUF_F_TX_IPV4 |
-RTE_MBUF_F_TX_IPV6);
-}
-} else if (OVS_UNLIKELY(tunnel_type)) {
+if (OVS_UNLIKELY(tunnel_type &&
+ tunnel_type != RTE_MBUF_F_TX_TUNNEL_GENEVE &&
+ tunnel_type != RTE_MBUF_F_TX_TUNNEL_VXLAN)) {
 VLOG_WARN_RL(, "%s: Unexpected tunnel type: %#"PRIx64,
  netdev_get_name(>up), tunnel_type);
 netdev_dpdk_mbuf_dump(netdev_get_name(>up),
   "Packet with unexpected tunnel type", mbuf);
 return false;
+}
+
+if (tunnel_type && (mbuf->ol_flags & all_inner_requests)) {
+mbuf->outer_l2_len = (char *) dp_packet_l3(pkt) -
+ (char *) dp_packet_eth(pkt);
+mbuf->outer_l3_len = (char *) 

[ovs-dev] [PATCH v3 2/6] netdev-dpdk: Disable outer UDP checksum for net/iavf.

2024-04-19 Thread David Marchand
Same as the commit 6f93d8e62f13 ("netdev-dpdk: Disable outer UDP checksum
offload for ice/i40e driver."), disable outer UDP checksum and related
offloads for net/iavf.

Fixes: 084c8087292c ("userspace: Support VXLAN and GENEVE TSO.")
Signed-off-by: David Marchand 
---
Note:
- DPDK (in progress) fixes can be found at:
  https://patchwork.dpdk.org/project/dpdk/list/?series=31780=*

---
 lib/netdev-dpdk.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
index 7e109903c0..f732716141 100644
--- a/lib/netdev-dpdk.c
+++ b/lib/netdev-dpdk.c
@@ -1355,12 +1355,14 @@ dpdk_eth_dev_init(struct netdev_dpdk *dev)
 }
 
 if (!strcmp(info.driver_name, "net_ice")
-|| !strcmp(info.driver_name, "net_i40e")) {
+|| !strcmp(info.driver_name, "net_i40e")
+|| !strcmp(info.driver_name, "net_iavf")) {
 /* FIXME: Driver advertises the capability but doesn't seem
  * to actually support it correctly.  Can remove this once
  * the driver is fixed on DPDK side. */
 VLOG_INFO("%s: disabled Tx outer udp checksum offloads for a "
-  "net/ice or net/i40e port.", netdev_get_name(>up));
+  "net/ice, net/i40e or net/iavf port.",
+  netdev_get_name(>up));
 info.tx_offload_capa &= ~RTE_ETH_TX_OFFLOAD_OUTER_UDP_CKSUM;
 info.tx_offload_capa &= ~RTE_ETH_TX_OFFLOAD_VXLAN_TNL_TSO;
 info.tx_offload_capa &= ~RTE_ETH_TX_OFFLOAD_GENEVE_TNL_TSO;
-- 
2.44.0

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [Patch ovn v4 2/2] northd: Fix direct access to SNAT network.

2024-04-19 Thread Martin Kalcok
This change fixes bug that breaks ability of machines from external
networks, to communicate with machines in SNATed networks (specifically
when using a Distributed router).

Currently when a machine (S1) on an external network tries to talk
over TCP with a machine (A1) in a network that has enabled SNAT, the
connection is established successfully. However after the three-way
handshake, any packets that come from the A1 machine will have their
source address translated by the Distributed router, breaking the
communication.

Existing rule in `build_lrouter_out_snat_flow` that decides which
packets should be SNATed already tries to avoid SNATing packets in
reply direction with `(!ct.trk || !ct.rpl)`. However, previous stages
in the distributed LR egress pipeline do not initiate the CT state.

Additionally we need to commit new connections that originate from
external networks into CT, so that the packets in the reply direction
(back to the external network) can be properly identified.

Rationale:

In my original RFC [0], there were questions about the motivation for
fixing this issue. I'll try to summarize why I think this is a bug
that should be fixed.

1. Current implementation for Distributed router already tries to
   avoid SNATing packets in the reply direction, it's just missing
   initialized CT states to make proper decisions.

2. This same scenario works with Gateway Router. I tested with
   following setup:

foo -- R1 -- join - R3 -- alice
  |
bar --R2

R1 is a Distributed router with SNAT for foo. R2 is a Gateway
router with SNAT for bar. R3 is a Gateway router with no SNAT.
Using 'alice1' as a client I was able to talk over TCP with
'bar1' but connection with 'foo1' failed.

3. Regarding security and "leaking" of internal IPs. Reading through
   RFC 4787 [1], 5382 [2] and their update in 7857 [3], the
   specifications do not seem to mandate that SNAT implementations
   must filter incoming traffic destined directly to the internal
   network. Sections like "5. Filtering Behavior" in 4787 and
   "4.3 Externally Initiated Connections" in 5382 describe only
   behavior for traffic destined to external IP/Port associated
   with NAT on the device that performs NAT.

   Besides, with the current implementation, it's already possible
   to scan the internal network with pings and TCP syn scanning.

4. We do have customers/clouds that depend on this functionality.
   This is a scenario that used to work in Openstack with ML2/OVS
   and migrating those clouds to ML2/OVN would break it.

[0]https://mail.openvswitch.org/pipermail/ovs-dev/2024-February/411670.html
[1]https://datatracker.ietf.org/doc/html/rfc4787
[2]https://datatracker.ietf.org/doc/html/rfc5382
[3]https://datatracker.ietf.org/doc/html/rfc7857

Signed-off-by: Martin Kalcok 
---
 northd/northd.c | 66 ---
 northd/ovn-northd.8.xml | 28 +++
 tests/ovn-northd.at | 76 +
 tests/system-ovn.at | 68 
 4 files changed, 219 insertions(+), 19 deletions(-)

diff --git a/northd/northd.c b/northd/northd.c
index 37f443e70..2c2eee445 100644
--- a/northd/northd.c
+++ b/northd/northd.c
@@ -14425,20 +14425,27 @@ build_lrouter_out_is_dnat_local(struct lflow_table 
*lflows,
 
 static void
 build_lrouter_out_snat_match(struct lflow_table *lflows,
- const struct ovn_datapath *od,
- const struct nbrec_nat *nat, struct ds *match,
- bool distributed_nat, int cidr_bits, bool is_v6,
- struct ovn_port *l3dgw_port,
- struct lflow_ref *lflow_ref)
+ const struct ovn_datapath *od,
+ const struct nbrec_nat *nat,
+ struct ds *match,
+ bool distributed_nat, int cidr_bits,
+ bool is_v6,
+ struct ovn_port *l3dgw_port,
+ struct lflow_ref *lflow_ref,
+ bool is_reverse)
 {
 ds_clear(match);
 
-ds_put_format(match, "ip && ip%c.src == %s", is_v6 ? '6' : '4',
+ds_put_format(match, "ip && ip%c.%s == %s",
+  is_v6 ? '6' : '4',
+  is_reverse ? "dst" : "src",
   nat->logical_ip);
 
 if (!od->is_gw_router) {
 /* Distributed router. */
-ds_put_format(match, " && outport == %s", l3dgw_port->json_key);
+ds_put_format(match, " && %s == %s",
+  is_reverse ? "inport" : "outport",
+  l3dgw_port->json_key);
 if (od->n_l3dgw_ports) {
 ds_put_format(match, " && is_chassis_resident(\"%s\")",
   

[ovs-dev] [Patch ovn v4 1/2] actions: New action ct_commit_to_zone.

2024-04-19 Thread Martin Kalcok
This change adds a new action 'ct_commit_to_zone' that enables users to commit
the flow into a specific zone in the connection tracker.

A new feature flag, OVN_FEATURE_CT_COMMIT_TO_ZONE, is also included to avoid
issues during upgrade in case the northd is upgraded to a version that supports
this action before the controller is upgraded.

Note that this action is meaningful only in the context of Logical Router
datapath. Logical Switch datapath does not use multiple zones and this action
falls back to committing the connection into the default zone for the Logical
Switch.

Signed-off-by: Martin Kalcok 
---
 controller/chassis.c  |   8 ++
 include/ovn/actions.h |   8 +-
 include/ovn/features.h|   1 +
 lib/actions.c | 155 +-
 northd/en-global-config.c |  10 +++
 northd/en-global-config.h |   1 +
 ovn-sb.xml|  22 +-
 tests/ovn.at  |  16 
 utilities/ovn-trace.c |  45 ---
 9 files changed, 199 insertions(+), 67 deletions(-)

diff --git a/controller/chassis.c b/controller/chassis.c
index ad75df288..9bb2eba95 100644
--- a/controller/chassis.c
+++ b/controller/chassis.c
@@ -371,6 +371,7 @@ chassis_build_other_config(const struct ovs_chassis_cfg 
*ovs_cfg,
 smap_replace(config, OVN_FEATURE_FDB_TIMESTAMP, "true");
 smap_replace(config, OVN_FEATURE_LS_DPG_COLUMN, "true");
 smap_replace(config, OVN_FEATURE_CT_COMMIT_NAT_V2, "true");
+smap_replace(config, OVN_FEATURE_CT_COMMIT_TO_ZONE, "true");
 }
 
 /*
@@ -516,6 +517,12 @@ chassis_other_config_changed(const struct ovs_chassis_cfg 
*ovs_cfg,
 return true;
 }
 
+if (!smap_get_bool(_rec->other_config,
+   OVN_FEATURE_CT_COMMIT_TO_ZONE,
+   false)) {
+return true;
+}
+
 return false;
 }
 
@@ -648,6 +655,7 @@ update_supported_sset(struct sset *supported)
 sset_add(supported, OVN_FEATURE_FDB_TIMESTAMP);
 sset_add(supported, OVN_FEATURE_LS_DPG_COLUMN);
 sset_add(supported, OVN_FEATURE_CT_COMMIT_NAT_V2);
+sset_add(supported, OVN_FEATURE_CT_COMMIT_TO_ZONE);
 }
 
 static void
diff --git a/include/ovn/actions.h b/include/ovn/actions.h
index 8e794450c..aa5e4ab89 100644
--- a/include/ovn/actions.h
+++ b/include/ovn/actions.h
@@ -68,6 +68,7 @@ struct collector_set_ids;
 OVNACT(CT_NEXT,   ovnact_ct_next) \
 /* CT_COMMIT_V1 is not supported anymore. */  \
 OVNACT(CT_COMMIT_V2,  ovnact_nest)\
+OVNACT(CT_COMMIT_TO_ZONE, ovnact_ct_commit_to_zone) \
 OVNACT(CT_DNAT,   ovnact_ct_nat)  \
 OVNACT(CT_SNAT,   ovnact_ct_nat)  \
 OVNACT(CT_DNAT_IN_CZONE,  ovnact_ct_nat)  \
@@ -76,7 +77,7 @@ struct collector_set_ids;
 OVNACT(CT_LB_MARK,ovnact_ct_lb)   \
 OVNACT(SELECT,ovnact_select)  \
 OVNACT(CT_CLEAR,  ovnact_null)\
-OVNACT(CT_COMMIT_NAT, ovnact_ct_commit_nat)   \
+OVNACT(CT_COMMIT_NAT, ovnact_ct_commit_to_zone) \
 OVNACT(CLONE, ovnact_nest)\
 OVNACT(ARP,   ovnact_nest)\
 OVNACT(ICMP4, ovnact_nest)\
@@ -290,11 +291,12 @@ struct ovnact_ct_nat {
 uint8_t ltable; /* Logical table ID of next table. */
 };
 
-/* OVNACT_CT_COMMIT_NAT. */
-struct ovnact_ct_commit_nat {
+/* OVNACT_CT_COMMIT_TO_ZONE, OVNACT_CT_COMMIT_NAT. */
+struct ovnact_ct_commit_to_zone {
 struct ovnact ovnact;
 
 bool dnat_zone;
+bool do_nat;
 uint8_t ltable;
 };
 
diff --git a/include/ovn/features.h b/include/ovn/features.h
index 08f1d8288..35a5d8ba0 100644
--- a/include/ovn/features.h
+++ b/include/ovn/features.h
@@ -28,6 +28,7 @@
 #define OVN_FEATURE_FDB_TIMESTAMP "fdb-timestamp"
 #define OVN_FEATURE_LS_DPG_COLUMN "ls-dpg-column"
 #define OVN_FEATURE_CT_COMMIT_NAT_V2 "ct-commit-nat-v2"
+#define OVN_FEATURE_CT_COMMIT_TO_ZONE "ct-commit-to-zone"
 
 /* OVS datapath supported features.  Based on availability OVN might generate
  * different types of openflows.
diff --git a/lib/actions.c b/lib/actions.c
index 39bb5bc8a..b9bdcd48d 100644
--- a/lib/actions.c
+++ b/lib/actions.c
@@ -870,6 +870,45 @@ parse_ct_nat(struct action_context *ctx, const char *name,
 }
 }
 
+static void
+parse_ct_commit_to_zone(struct action_context *ctx,  const char *name,
+bool do_nat, bool require_param,
+struct ovnact_ct_commit_to_zone *cn)
+{
+add_prerequisite(ctx, "ip");
+
+if (ctx->pp->cur_ltable >= ctx->pp->n_tables) {
+lexer_error(ctx->lexer,
+"\"%s\" action not allowed in last table.", name);
+return;
+}
+
+cn->ltable = ctx->pp->cur_ltable + 1;
+cn->do_nat = do_nat;
+cn->dnat_zone = true;
+
+if (require_param) {
+lexer_force_match(ctx->lexer, LEX_T_LPAREN);
+} else {
+if (!lexer_match(ctx->lexer, 

Re: [ovs-dev] [PATCH ovn] controller: Remove the ovn-set-local-ip option.

2024-04-19 Thread Dumitru Ceara
On 4/19/24 13:14, Ales Musil wrote:
> The local_ip should be present for chassis with single encap whenever
> we configure its interface in OvS. Not having the local_ip can lead to
> traffic being dropped on the other side of tunnel because the source
> IP might be different, this is more likely to happen in pure IPv6
> deployments.
> 
> Remove the option as with the local_ip being enforced
> also for single encap it became "true" in all scenarios, and it's not
> needed anymore.
> 
> Reported-at: https://issues.redhat.com/browse/FDP-570
> Signed-off-by: Ales Musil 
> ---

Hi Han,

When you have time would you mind double checking this in case we missed
some scenario?

Thanks,
Dumitru

>  NEWS|  3 +++
>  controller/encaps.c | 31 +++
>  controller/ovn-controller.8.xml | 14 +---
>  tests/ovn-controller.at | 38 +++--
>  4 files changed, 39 insertions(+), 47 deletions(-)
> 
> diff --git a/NEWS b/NEWS
> index 141f1831c..9adf6a31c 100644
> --- a/NEWS
> +++ b/NEWS
> @@ -13,6 +13,9 @@ Post v24.03.0
>  "lflow-stage-to-oftable STAGE_NAME" that converts stage name into 
> OpenFlow
>  table id.
>- Rename the ovs-sandbox script to ovn-sandbox.
> +  - Remove "ovn-set-local-ip" config option from vswitchd
> +external-ids, the option is no longer needed as it became effectively
> +"true" for all scenarios.
>  
>  OVN v24.03.0 - 01 Mar 2024
>  --
> diff --git a/controller/encaps.c b/controller/encaps.c
> index a9cb604b8..b5ef66371 100644
> --- a/controller/encaps.c
> +++ b/controller/encaps.c
> @@ -208,11 +208,12 @@ out:
>  static void
>  tunnel_add(struct tunnel_ctx *tc, const struct sbrec_sb_global *sbg,
> const char *new_chassis_id, const struct sbrec_encap *encap,
> -   bool must_set_local_ip, const char *local_ip,
> +   const char *local_ip,
> const struct ovsrec_open_vswitch_table *ovs_table)
>  {
>  struct smap options = SMAP_INITIALIZER();
>  smap_add(, "remote_ip", encap->ip);
> +smap_add(, "local_ip", local_ip);
>  smap_add(, "key", "flow");
>  const char *dst_port = smap_get(>options, "dst_port");
>  const char *csum = smap_get(>options, "csum");
> @@ -239,7 +240,6 @@ tunnel_add(struct tunnel_ctx *tc, const struct 
> sbrec_sb_global *sbg,
>  const struct ovsrec_open_vswitch *cfg =
>  ovsrec_open_vswitch_table_first(ovs_table);
>  
> -bool set_local_ip = must_set_local_ip;
>  if (cfg) {
>  /* If the tos option is configured, get it */
>  const char *encap_tos =
> @@ -259,19 +259,10 @@ tunnel_add(struct tunnel_ctx *tc, const struct 
> sbrec_sb_global *sbg,
>  if (encap_df) {
>  smap_add(, "df_default", encap_df);
>  }
> -
> -if (!set_local_ip) {
> -/* If ovn-set-local-ip option is configured, get it */
> -set_local_ip =
> -get_chassis_external_id_value_bool(
> ->external_ids, tc->this_chassis->name,
> -"ovn-set-local-ip", false);
> -}
>  }
>  
>  /* Add auth info if ipsec is enabled. */
>  if (sbg->ipsec) {
> -set_local_ip = true;
>  smap_add(, "remote_name", new_chassis_id);
>  
>  /* Force NAT-T traversal via configuration */
> @@ -290,10 +281,6 @@ tunnel_add(struct tunnel_ctx *tc, const struct 
> sbrec_sb_global *sbg,
>  }
>  }
>  
> -if (set_local_ip) {
> -smap_add(, "local_ip", local_ip);
> -}
> -
>  /* If there's an existing tunnel record that does not need any change,
>   * keep it.  Otherwise, create a new record (if there was an existing
>   * record, the new record will supplant it and encaps_run() will delete
> @@ -412,18 +399,6 @@ chassis_tunnel_add(const struct sbrec_chassis 
> *chassis_rec,
>  continue;
>  }
>  
> -/* Check if need to pass the local ip. We always set local ip if 
> there
> - * are multiple local IPs for the selected encap type. */
> -int count = 0;
> -bool set_local_ip = false;
> -for (int j = 0; j < this_chassis->n_encaps; j++) {
> -if (pref_type == get_tunnel_type(this_chassis->encaps[j]->type) 
> &&
> -count++ > 0) {
> -set_local_ip = true;
> -break;
> -}
> -}
> -
>  for (int j = 0; j < this_chassis->n_encaps; j++) {
>  if (pref_type != get_tunnel_type(this_chassis->encaps[j]->type)) 
> {
>  continue;
> @@ -431,7 +406,7 @@ chassis_tunnel_add(const struct sbrec_chassis 
> *chassis_rec,
>  VLOG_DBG("tunnel_add: '%s', local ip: %s", chassis_rec->name,
>   this_chassis->encaps[j]->ip);
>  tunnel_add(tc, sbg, chassis_rec->name, chassis_rec->encaps[i],
> -   set_local_ip, this_chassis->encaps[j]->ip, 

[ovs-dev] [PATCH ovn] controller: Remove the ovn-set-local-ip option.

2024-04-19 Thread Ales Musil
The local_ip should be present for chassis with single encap whenever
we configure its interface in OvS. Not having the local_ip can lead to
traffic being dropped on the other side of tunnel because the source
IP might be different, this is more likely to happen in pure IPv6
deployments.

Remove the option as with the local_ip being enforced
also for single encap it became "true" in all scenarios, and it's not
needed anymore.

Reported-at: https://issues.redhat.com/browse/FDP-570
Signed-off-by: Ales Musil 
---
 NEWS|  3 +++
 controller/encaps.c | 31 +++
 controller/ovn-controller.8.xml | 14 +---
 tests/ovn-controller.at | 38 +++--
 4 files changed, 39 insertions(+), 47 deletions(-)

diff --git a/NEWS b/NEWS
index 141f1831c..9adf6a31c 100644
--- a/NEWS
+++ b/NEWS
@@ -13,6 +13,9 @@ Post v24.03.0
 "lflow-stage-to-oftable STAGE_NAME" that converts stage name into OpenFlow
 table id.
   - Rename the ovs-sandbox script to ovn-sandbox.
+  - Remove "ovn-set-local-ip" config option from vswitchd
+external-ids, the option is no longer needed as it became effectively
+"true" for all scenarios.
 
 OVN v24.03.0 - 01 Mar 2024
 --
diff --git a/controller/encaps.c b/controller/encaps.c
index a9cb604b8..b5ef66371 100644
--- a/controller/encaps.c
+++ b/controller/encaps.c
@@ -208,11 +208,12 @@ out:
 static void
 tunnel_add(struct tunnel_ctx *tc, const struct sbrec_sb_global *sbg,
const char *new_chassis_id, const struct sbrec_encap *encap,
-   bool must_set_local_ip, const char *local_ip,
+   const char *local_ip,
const struct ovsrec_open_vswitch_table *ovs_table)
 {
 struct smap options = SMAP_INITIALIZER();
 smap_add(, "remote_ip", encap->ip);
+smap_add(, "local_ip", local_ip);
 smap_add(, "key", "flow");
 const char *dst_port = smap_get(>options, "dst_port");
 const char *csum = smap_get(>options, "csum");
@@ -239,7 +240,6 @@ tunnel_add(struct tunnel_ctx *tc, const struct 
sbrec_sb_global *sbg,
 const struct ovsrec_open_vswitch *cfg =
 ovsrec_open_vswitch_table_first(ovs_table);
 
-bool set_local_ip = must_set_local_ip;
 if (cfg) {
 /* If the tos option is configured, get it */
 const char *encap_tos =
@@ -259,19 +259,10 @@ tunnel_add(struct tunnel_ctx *tc, const struct 
sbrec_sb_global *sbg,
 if (encap_df) {
 smap_add(, "df_default", encap_df);
 }
-
-if (!set_local_ip) {
-/* If ovn-set-local-ip option is configured, get it */
-set_local_ip =
-get_chassis_external_id_value_bool(
->external_ids, tc->this_chassis->name,
-"ovn-set-local-ip", false);
-}
 }
 
 /* Add auth info if ipsec is enabled. */
 if (sbg->ipsec) {
-set_local_ip = true;
 smap_add(, "remote_name", new_chassis_id);
 
 /* Force NAT-T traversal via configuration */
@@ -290,10 +281,6 @@ tunnel_add(struct tunnel_ctx *tc, const struct 
sbrec_sb_global *sbg,
 }
 }
 
-if (set_local_ip) {
-smap_add(, "local_ip", local_ip);
-}
-
 /* If there's an existing tunnel record that does not need any change,
  * keep it.  Otherwise, create a new record (if there was an existing
  * record, the new record will supplant it and encaps_run() will delete
@@ -412,18 +399,6 @@ chassis_tunnel_add(const struct sbrec_chassis *chassis_rec,
 continue;
 }
 
-/* Check if need to pass the local ip. We always set local ip if there
- * are multiple local IPs for the selected encap type. */
-int count = 0;
-bool set_local_ip = false;
-for (int j = 0; j < this_chassis->n_encaps; j++) {
-if (pref_type == get_tunnel_type(this_chassis->encaps[j]->type) &&
-count++ > 0) {
-set_local_ip = true;
-break;
-}
-}
-
 for (int j = 0; j < this_chassis->n_encaps; j++) {
 if (pref_type != get_tunnel_type(this_chassis->encaps[j]->type)) {
 continue;
@@ -431,7 +406,7 @@ chassis_tunnel_add(const struct sbrec_chassis *chassis_rec,
 VLOG_DBG("tunnel_add: '%s', local ip: %s", chassis_rec->name,
  this_chassis->encaps[j]->ip);
 tunnel_add(tc, sbg, chassis_rec->name, chassis_rec->encaps[i],
-   set_local_ip, this_chassis->encaps[j]->ip, ovs_table);
+   this_chassis->encaps[j]->ip, ovs_table);
 tuncnt++;
 }
 }
diff --git a/controller/ovn-controller.8.xml b/controller/ovn-controller.8.xml
index 5ebef048d..85e7966d7 100644
--- a/controller/ovn-controller.8.xml
+++ b/controller/ovn-controller.8.xml
@@ -367,16 +367,6 @@
 of how many entries there are in the cache.  By default this is set to

[ovs-dev] [PATCH ovn v5] northd, controller: Use paused controller action for packet buffering.

2024-04-19 Thread Ales Musil
The current packet injection loses ct_state in the process. When
the ct_state is lost we might commit to DNAT zone and perform
zero SNAT after the packet injection. This causes the first session
to be wrong as the reply packets are not unDNATted.

Instead of re-injecting the packet back into the pipeline when
we get the MAC binding, use paused controller action. The paused
controller action stores ct_state, which is important for the behavior
of the resumed packet.

At the same time bump the OvS submodule latest branch-3.3. This is
mainly for [0], which fixes metering for paused controller actions.

In order to make sure that the paused action works during upgrade add
the output implicitly. Once the upgrade is done northd will create option
to inform controllers that the implicit action is no longer needed.

[0] c560f6ca3257 ("ofproto-dpif-xlate: Fix continuations with associated 
metering.")

Reported-at: https://issues.redhat.com/browse/FDP-439
Signed-off-by: Ales Musil 
---
v5: Rebase on top of current main.
v4: Fix copy paste error in the global_handler.
v3: Rebase on top of current main.
Add flag to ensure that the paused action works during upgrade.
v2: Fix the Jira link and add ack from Mark.
---
 controller/lflow.c  |  1 +
 controller/lflow.h  |  1 +
 controller/mac-learn.c  | 30 
 controller/mac-learn.h  |  9 ++--
 controller/ovn-controller.c | 21 
 controller/pinctrl.c| 64 +
 include/ovn/actions.h   |  3 ++
 lib/actions.c   | 47 +-
 northd/en-global-config.c   |  4 ++
 northd/northd.c |  6 +--
 tests/multinode.at  |  8 
 tests/ovn-northd.at |  3 ++
 tests/ovn.at|  8 ++--
 tests/system-ovn.at | 95 +
 tests/test-ovn.c|  1 +
 15 files changed, 239 insertions(+), 62 deletions(-)

diff --git a/controller/lflow.c b/controller/lflow.c
index 895d17d19..760ec0b41 100644
--- a/controller/lflow.c
+++ b/controller/lflow.c
@@ -874,6 +874,7 @@ add_matches_to_flow_table(const struct sbrec_logical_flow 
*lflow,
 .collector_ids = l_ctx_in->collector_ids,
 .lflow_uuid = lflow->header_.uuid,
 .dp_key = ldp->datapath->tunnel_key,
+.explicit_arp_ns_output = l_ctx_in->explicit_arp_ns_output,
 
 .pipeline = ingress ? OVNACT_P_INGRESS : OVNACT_P_EGRESS,
 .ingress_ptable = OFTABLE_LOG_INGRESS_PIPELINE,
diff --git a/controller/lflow.h b/controller/lflow.h
index 9b7ffa19c..295d004f4 100644
--- a/controller/lflow.h
+++ b/controller/lflow.h
@@ -130,6 +130,7 @@ struct lflow_ctx_in {
 bool lb_hairpin_use_ct_mark;
 bool localnet_learn_fdb;
 bool localnet_learn_fdb_changed;
+bool explicit_arp_ns_output;
 };
 
 struct lflow_ctx_out {
diff --git a/controller/mac-learn.c b/controller/mac-learn.c
index 071f01b4f..0c3b60c23 100644
--- a/controller/mac-learn.c
+++ b/controller/mac-learn.c
@@ -199,15 +199,24 @@ ovn_fdb_add(struct hmap *fdbs, uint32_t dp_key, struct 
eth_addr mac,
 /* packet buffering functions */
 
 struct packet_data *
-ovn_packet_data_create(struct ofpbuf ofpacts,
-   const struct dp_packet *original_packet)
+ovn_packet_data_create(const struct ofputil_packet_in *pin,
+   const struct ofpbuf *continuation)
 {
 struct packet_data *pd = xmalloc(sizeof *pd);
 
-pd->ofpacts = ofpacts;
-/* clone the packet to send it later with correct L2 address */
-pd->p = dp_packet_clone_data(dp_packet_data(original_packet),
- dp_packet_size(original_packet));
+pd->pin = (struct ofputil_packet_in) {
+.packet = xmemdup(pin->packet, pin->packet_len),
+.packet_len = pin->packet_len,
+.flow_metadata = pin->flow_metadata,
+.reason = pin->reason,
+.table_id = pin->table_id,
+.cookie = pin->cookie,
+/* Userdata are empty on purpose,
+ * it is not needed for the continuation. */
+.userdata = NULL,
+.userdata_len = 0,
+};
+pd->continuation = ofpbuf_clone(continuation);
 
 return pd;
 }
@@ -216,8 +225,8 @@ ovn_packet_data_create(struct ofpbuf ofpacts,
 void
 ovn_packet_data_destroy(struct packet_data *pd)
 {
-dp_packet_delete(pd->p);
-ofpbuf_uninit(>ofpacts);
+free(pd->pin.packet);
+ofpbuf_delete(pd->continuation);
 free(pd);
 }
 
@@ -307,7 +316,10 @@ ovn_buffered_packets_ctx_run(struct buffered_packets_ctx 
*ctx,
 
 struct packet_data *pd;
 LIST_FOR_EACH_POP (pd, node, >queue) {
-struct eth_header *eth = dp_packet_data(pd->p);
+struct dp_packet packet;
+dp_packet_use_const(, pd->pin.packet, pd->pin.packet_len);
+
+struct eth_header *eth = dp_packet_data();
 eth->eth_dst = mac;
 
 ovs_list_push_back(>ready_packets_data, >node);
diff --git a/controller/mac-learn.h 

Re: [ovs-dev] [PATCH ovn] ovs: Bump the submodule to the tip of branch-3.3.

2024-04-19 Thread Dumitru Ceara
On 4/19/24 10:50, Ales Musil wrote:
> There are some improvements that are needed for further work on OVN
> side. Most notably the commits mentioned below:
> 
> a0df9c85def0 ("netdev-dummy: Add local route entries for IP addresses.")
> c67de08f1d82 ("dpif-netdev: Increase MAX_RECIRC_DEPTH to 8.")
> c560f6ca3257 ("ofproto-dpif-xlate: Fix continuations with associated 
> metering.")
> 6448c1b697d7 ("vlog: Log stack trace on vlog_abort.")
> 
> Signed-off-by: Ales Musil 
> ---

Thanks, Ales!  Applied to main and 24.03.

Regards,
Dumitru

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH ovn] ovs: Bump the submodule to the tip of branch-3.3.

2024-04-19 Thread Ales Musil
There are some improvements that are needed for further work on OVN
side. Most notably the commits mentioned below:

a0df9c85def0 ("netdev-dummy: Add local route entries for IP addresses.")
c67de08f1d82 ("dpif-netdev: Increase MAX_RECIRC_DEPTH to 8.")
c560f6ca3257 ("ofproto-dpif-xlate: Fix continuations with associated metering.")
6448c1b697d7 ("vlog: Log stack trace on vlog_abort.")

Signed-off-by: Ales Musil 
---
 ovs | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/ovs b/ovs
index fe55ce37a..f19448b86 16
--- a/ovs
+++ b/ovs
@@ -1 +1 @@
-Subproject commit fe55ce37a7b090d09dee5c01ae0797320ad678f6
+Subproject commit f19448b8618967a108ec6f34713dd811ce1d1334
-- 
2.44.0

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH ovn v3] northd: Add support for disabling vxlan mode.

2024-04-19 Thread Dumitru Ceara
On 4/4/24 20:41, Vladislav Odintsov wrote:
> Commit [1] introduced a "vxlan mode" concept.  It brought a limitation
> for available tunnel IDs because of lack of space in VXLAN VNI.
> In vxlan mode OVN is limited by 4095 datapaths (LRs or non-transit LSs)
> and 2047 logical switch ports per datapath.
> 
> Prior to this patch vxlan mode was enabled automatically if at least one
> chassis had encap of vxlan type.  In scenarios where one want to use VXLAN
> only for HW VTEP (RAMP) switch, such limitation makes no sence.
> 
> This patch adds support for explicit disabling of vxlan mode via
> Northbound database.
> 
> 1: https://github.com/ovn-org/ovn/commit/b07f1bc3d068
> 
> CC: Ihar Hrachyshka 
> Fixes: b07f1bc3d068 ("Add VXLAN support for non-VTEP datapath bindings")
> Signed-off-by: Vladislav Odintsov 
> ---

Thanks for the v3, Vladislav!

[...]

> diff --git a/ovn-nb.xml b/ovn-nb.xml
> index b652046a7..9b2cb355e 100644
> --- a/ovn-nb.xml
> +++ b/ovn-nb.xml
> @@ -381,6 +381,18 @@
>  of SB changes would be very noticeable.
>
>  
> +  
> +Be default if at least one chassis in OVN cluster has VXLAN encap,

Typo: "Be default".

> +northd will run in a `vxlan mode`, which means it splits 24-bit
> +VXLAN VNI for datapath and port tunnel IDs allocation evenly.  
> Maximum
> +datapaths count in this mode is 4095 and maximum ports per datapath 
> is
> +2047.  Rest of IDs are used for multicast groups.

Maybe "The rest of the IDs are used for.."?

> +In case VXLAN encaps are needed on chassis only to support HW VTEP
> +functionality, and main encap type is GENEVE or STT, set this option 
> to
> +`false` to use defaults -- 16-bit space for datapath tunnel IDS and 
> 15
> +bits for port tunnel IDs.
> +  
> +
>
>  
>These options control how routes are advertised between OVN
> diff --git a/tests/ovn-northd.at b/tests/ovn-northd.at
> index fc2c972a4..6edb1129e 100644
> --- a/tests/ovn-northd.at
> +++ b/tests/ovn-northd.at
> @@ -2819,6 +2819,35 @@ AT_CHECK(
>  get_tunnel_keys
>  AT_CHECK([test $lsp02 = 3 && test $ls1 = 123])
>  
> +AT_CLEANUP
> +])
> +OVN_FOR_EACH_NORTHD_NO_HV([
> +AT_SETUP([check vxlan mode disabling])
> +ovn_start
> +
> +# Create a fake chassis with vxlan encap to implicitly enable vxlan mode.
> +ovn-sbctl \
> +--id=@e create encap chassis_name=hv1 ip="192.168.0.1" type="vxlan" \
> +-- --id=@c create chassis name=hv1 encaps=@e
> +
> +cmd="ovn-nbctl --wait=sb"
> +for i in {1..4097..1}; do
> +cmd="${cmd} -- ls-add lsw-${i}"
> +done
> +
> +eval $cmd

Couldn't this be just "check $cmd"?

> +
> +check_row_count nb:Logical_Switch 4097
> +wait_row_count sb:Datapath_Binding 4095
> +
> +OVS_WAIT_UNTIL([grep "all datapath tunnel ids exhausted" 
> northd/ovn-northd.log])
> +
> +# Explicitly disable vxlan mode and check that two remaining datapaths were 
> created.
> +check ovn-nbctl set NB_Global . options:disable_vxlan_mode=true
> +
> +check_row_count nb:Logical_Switch 4097
> +wait_row_count sb:Datapath_Binding 4097
> +
>  AT_CLEANUP
>  ])
>  

Aside for the minor things above (which I can address when applying the
patch so no need for v4) the rest looks good to me.  Ihar, what do you
think?

Regards,
Dumitru

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH ovn] northd: Add support for disabling vxlan mode.

2024-04-19 Thread Dumitru Ceara
On 4/4/24 21:23, Ihar Hrachyshka wrote:
> On Thu, Apr 4, 2024 at 6:06 AM Dumitru Ceara  wrote:
> 
>> On 4/3/24 22:05, Vladislav Odintsov wrote:
>>> re-sending email because ovs list rejected previous its content for some
>> reason:
>>>
>>> Hi Ihar,
>>>
>>
>> Hi Vladislav, Ihar,
>>
>>> thanks for your quick reaction!
>>> I didn’t see mentioned thread, but I think that it is not safe enough to
>> have automatic detection of this scenario here.
>>>
>>> Imagine: for VXLAN with HW VTEP scenario besides VXLAN encap one must
>> configure also either GENEVE and/or STT encap(s) for HV chassis.
>>>
>>> So, detection could be implemented like this:
>>> Check all non-VTEP chassis' encaps and find "effective encap" for each
>> of them. If we detect at least one chassis with "effective encap" == vxlan,
>> then enable vxlan mode. Normal mode otherwise.
>>> "effective encap" means that for 'vxlan,geneve,stt' encaps effective is
>> geneve, for 'vxlan,stt' -> stt, for 'vxlan' -> vxlan.
>>> Such behavior was my first idea.
>>>
>>> But I decided that there possible flapping of modes if there is a
>> problem/bug in deployment tooling and it is enough to have only one chassis
>> with wrong encap set to affect vxlan mode for entire OVN cluster. Such mode
>> flapping can result in problems with tunnel ids allocation.
>>
>> These are valid points.
>>
>>
> This is a valid concern. Should it perhaps be handled in the general case
> then? Meaning: calculate the max tunid once and store it in db? (I am
> thinking that maybe recalculating the max_tunid on every engine cycle - or
> on every controller restart - is unsafe?)
> 

OTOH, this is a rather "static" characteristic of the cluster and the
CMS can easily know what kind of encapsulation it's configuring OVN to use.

I think I'd keep it simple (for OVN).

> 
>>> So it seems that to have an option that statically sets vxlan mode is
>> more resilient.
>>
>> In general we try to avoid new config knobs.
>> .
>>> What do you think?
>>>
>>
>> But in this case it make actually be easier if we offload the work of
>> determining vxlan-mode to the CMS.
>>
>>>
 On 3 Apr 2024, at 20:43, Ihar Hrachyshka  wrote:

 Thank you Vladislav.

 FYI it was reported in the past in
>> https://mail.openvswitch.org/pipermail/ovs-discuss/2022-July/051931.html
>> but fell through cracks then. Thanks for picking it up!

 In your patch, you introduce a new config option to disable the
>> 'vxlan-mode' behavior. This will definitely work. But I wonder if we can
>> automatically detect this scenario by ignoring the chassis that are VTEP
>> from consideration? I believe ovn-controller-vtep sets `is-vtep` in
>> other_config, so - would it work if we modify is_vxlan_mode to consider it
>> too?

 Thanks again for looking into this.
 Ihar

 On Wed, Apr 3, 2024 at 6:34 AM Vladislav Odintsov > > wrote:
> Commit [1] introduced a "vxlan mode" concept.  It brought a limitation
> for available tunnel IDs because of lack of space in VXLAN VNI.
> In vxlan mode OVN is limited by 4095 datapaths (LRs or non-transit LSs)
> and 2047 logical switch ports per datapath.
>
> Prior to this patch vxlan mode was enabled automatically if at least
>> one
> chassis had encap of vxlan type.  In scenarios where one want to use
>> VXLAN
> only for HW VTEP (RAMP) switch, such limitation makes no sence.
>
> This patch adds support for explicit disabling of vxlan mode via
> Northbound database.
>
> 0: https://github.com/ovn-org/ovn/commit/b07f1bc3d068
>
> CC: Ihar Hrachyshka mailto:ihrac...@redhat.com>>
> Fixes: b07f1bc3d068 ("Add VXLAN support for non-VTEP datapath
>> bindings")
> Signed-off-by: Vladislav Odintsov > odiv...@gmail.com>>
> ---
>  northd/en-global-config.c |  9 +++-
>  northd/northd.c   | 90 ++-
>  northd/northd.h   |  6 ++-
>  ovn-nb.xml| 12 ++
>  tests/ovn-northd.at    | 29 +
>  5 files changed, 94 insertions(+), 52 deletions(-)
>
> diff --git a/northd/en-global-config.c b/northd/en-global-config.c
> index 34e393b33..9310c4575 100644
> --- a/northd/en-global-config.c
> +++ b/northd/en-global-config.c
> @@ -115,8 +115,8 @@ en_global_config_run(struct engine_node *node ,
>> void *data)
>   config_data->svc_monitor_mac);
>  }
>
> -char *max_tunid = xasprintf("%d",
> -get_ovn_max_dp_key_local(sbrec_chassis_table));
> +init_vxlan_mode(>options, sbrec_chassis_table);
> +char *max_tunid = xasprintf("%d", get_ovn_max_dp_key_local());
>  smap_replace(options, "max_tunid", max_tunid);
>  free(max_tunid);
>
> @@ -523,6 +523,11 @@ check_nb_options_out_of_sync(const struct
>> nbrec_nb_global *nb,
>  return true;
>  }
>
> +  

Re: [ovs-dev] [PATCH ovn] controller: Fix an issue wrt cleanup of stale patch port.

2024-04-19 Thread Dumitru Ceara
On 3/28/24 13:28, Priyankar Jain wrote:
> Issue:
> Upon updating the network_name option of localnet port from one physical
> bridge to another, ovn-controller fails to cleanup the peer localnet
> patch port from the old bridge and ends up creating a duplicate peer
> localnet patch port which fails in the following ovsdb transaction:
> 
> ```
> "Transaction causes multiple rows in \"Interface\" table to have
> identical values
> (patch-localnet_a7d47490-8a90-4c4d-8266-2915ad07c185-to-br-int)
> ```
> 
> Current workflow:
> 1. Keep a set of all existing localnet ports on br-int. Let us call this
>set: existing_ports.
> 2. For each localnet port in SB:
>2.1 Create a patch port on br-int. (if it already exists on br-int,
>do not create but remove the entry from exisitng_ports set).
>2.2 Create a peer patch port on br-phys-x. (if it already exists on the
>bridge, do not create but remove the entry from exisitng_ports set).
> 3. Finally, cleanup all the ports and its peers from exisiting_ports.
> 
> With the above workflow, upon network_name change of localnet LSP, since
> ports on br-int does not change, only peer port needs to be move from
> br-phys-x to br-phys-y, the localnet port is removed from
> exisiting_ports in #2.1 and its peer never becomes eligible for cleanup.
> 
> Fix:
> Include both patch port on br-int and its peer port in the
> exisiting_ports set as part of #1.
> This make sures that the peer port is only removed from existing_ports
> only if it is already present on the correct bridge. (#2.1/#2.2)
> Otherwise, during the cleanup in #3 it will be considered.
> 

Hi Priyankar,

Thanks for the patch!

> Fixes: b600316 ("Don't delete patch ports that don't belong to br-int")

This should actually be:

Fixes: b600316f252a ("Don't delete patch ports that don't belong to br-int")

> Signed-off-by: Priyankar Jain 
> ---
>  controller/patch.c |  32 +++--
>  tests/ovn.at   | 109 +
>  2 files changed, 124 insertions(+), 17 deletions(-)
> 
> diff --git a/controller/patch.c b/controller/patch.c
> index c1cd5060d..4fed6e375 100644
> --- a/controller/patch.c
> +++ b/controller/patch.c
> @@ -314,6 +314,21 @@ patch_run(struct ovsdb_idl_txn *ovs_idl_txn,
>  || smap_get(>external_ids, "ovn-l3gateway-port")
>  || smap_get(>external_ids, "ovn-logical-patch-port")) {
>  shash_add(_ports, port->name, port);
> +/* Also add peer ports to the list. */
> +for (size_t j = 0; j < port->n_interfaces; j++) {
> +struct ovsrec_interface *p_iface = port->interfaces[j];
> +if (strcmp(p_iface->type, "patch")) {
> +continue;
> +}
> +const char *peer_name = smap_get(_iface->options, "peer");
> +if (peer_name) {
> +const struct ovsrec_port *peer_port =
> +get_port(ovsrec_port_by_name, peer_name);
> +if (peer_port) {
> +shash_add(_ports, peer_port->name, 
> peer_port);
> +}
> +}
> +}
>  }
>  }
>  
> @@ -336,23 +351,6 @@ patch_run(struct ovsdb_idl_txn *ovs_idl_txn,
>   * ovn-controller.  Otherwise it may cause unncessary dataplane
>   * interruption during restart/upgrade. */
>  if (!daemon_started_recently()) {
> -/* delete peer patch port first */
> -for (size_t i = 0; i < port->n_interfaces; i++) {
> -struct ovsrec_interface *iface = port->interfaces[i];
> -if (strcmp(iface->type, "patch")) {
> -continue;
> -}
> -const char *iface_peer = smap_get(>options, "peer");
> -if (iface_peer) {
> -const struct ovsrec_port *peer_port =
> -get_port(ovsrec_port_by_name, iface_peer);
> -if (peer_port) {
> -remove_port(bridge_table, peer_port);
> -}
> -}
> -}
> -
> -/* now delete integration bridge patch port */
>  remove_port(bridge_table, port);
>  }
>  }

The fix looks correct to me.

> diff --git a/tests/ovn.at b/tests/ovn.at
> index 4d0c7ad53..487e727c0 100644
> --- a/tests/ovn.at
> +++ b/tests/ovn.at
> @@ -36454,6 +36454,115 @@ OVN_CLEANUP([hv1])
>  AT_CLEANUP
>  ])
>  
> +OVN_FOR_EACH_NORTHD([
> +AT_SETUP([ovn-controller update network_name option for localnet port])
> +ovn_start
> +net_add n1
> +
> +sim_add hv1
> +as hv1
> +
> +# Bridge Topology
> +# Initial: br-int and br-phys-1 connected through ovn-localnet patch port.
> +#
> +# br-phys-1 -- br-int
> +#
> +# Final: br-int and br-phys-3 connected through ovn-localnet patch port.
> +# Similarly br-int-2 and br-hys-2 connected through localnet patch port
> +# but not owned by this 

Re: [ovs-dev] [PATCH ovn] northd: Support routing over other address families.

2024-04-19 Thread Dumitru Ceara
On 3/27/24 09:43, Felix Huettner via dev wrote:
> In most cases IPv4 packets are routed only over other IPv4 networks and
> IPv6 packets are routed only over IPv6 networks. However there is no
> interent reason for this limitation. Routing IPv4 packets over IPv6
> networks just requires the router to contain a route for an IPv4 network
> with an IPv6 nexthop.
> 
> This was previously prevented in OVN in ovn-nbctl and northd. By
> removing these filters the forwarding will work if the mac addresses are
> prepopulated.
> 
> If the mac addresses are not prepopulated we will attempt to resolve them 
> using
> the original address family of the packet and not the address family of the
> nexthop. This will fail and we will not forward the packet.
> 
> This feature can for example be used by service providers to
> interconnect multiple IPv4 networks of a customer without needing to
> negotiate free IPv4 addresses by just using any IPv6 address.
> 
> Signed-off-by: Felix Huettner 
> ---

Hi Felix,

Thanks for the patch!  It's a very useful addition to the OVN feature
set.  The code looks mostly OK to me, I only had some minor comments,
please see below.

>  NEWS  |   4 +
>  northd/northd.c   |  45 ++--
>  tests/ovn-nbctl.at|   8 +-
>  tests/ovn.at  | 615 ++
>  utilities/ovn-nbctl.c |  12 +-
>  5 files changed, 650 insertions(+), 34 deletions(-)
> 
> diff --git a/NEWS b/NEWS
> index 4d6ebea89..b419b2628 100644
> --- a/NEWS
> +++ b/NEWS
> @@ -12,6 +12,10 @@ Post v24.03.0
>  flow table id.
>  "lflow-stage-to-oftable STAGE_NAME" that converts stage name into 
> OpenFlow
>  table id.
> +  - Allow Static Routes where the address families of ip_prefix and nexthop
> +diverge (e.g. IPv4 packets over IPv6 links). This is currently limited to
> +nexthops that have their mac addresses prepopulated (so
> +dynamic_neigh_routers must be false).
>  
>  OVN v24.03.0 - 01 Mar 2024
>  --
> diff --git a/northd/northd.c b/northd/northd.c
> index 1839b7d8b..0359cde89 100644
> --- a/northd/northd.c
> +++ b/northd/northd.c
> @@ -10238,18 +10238,6 @@ parsed_routes_add(struct ovn_datapath *od, const 
> struct hmap *lr_ports,
>  return NULL;
>  }
>  
> -/* Verify that ip_prefix and nexthop have same address familiy. */
> -if (valid_nexthop) {
> -if (IN6_IS_ADDR_V4MAPPED() != IN6_IS_ADDR_V4MAPPED()) 
> {
> -static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 1);
> -VLOG_WARN_RL(, "Address family doesn't match between 
> 'ip_prefix'"
> - " %s and 'nexthop' %s in static route "UUID_FMT,
> - route->ip_prefix, route->nexthop,
> - UUID_ARGS(>header_.uuid));
> -return NULL;
> -}
> -}
> -
>  /* Verify that ip_prefix and nexthop are on the same network. */
>  if (!is_discard_route &&
>  !find_static_route_outport(od, lr_ports, route,
> @@ -10666,7 +10654,7 @@ build_ecmp_route_flow(struct lflow_table *lflows, 
> struct ovn_datapath *od,
>struct lflow_ref *lflow_ref)
>  
>  {
> -bool is_ipv4 = IN6_IS_ADDR_V4MAPPED(>prefix);
> +bool is_ipv4_network = IN6_IS_ADDR_V4MAPPED(>prefix);
>  uint16_t priority;
>  struct ecmp_route_list_node *er;
>  struct ds route_match = DS_EMPTY_INITIALIZER;
> @@ -10675,7 +10663,8 @@ build_ecmp_route_flow(struct lflow_table *lflows, 
> struct ovn_datapath *od,
>  int ofs = !strcmp(eg->origin, ROUTE_ORIGIN_CONNECTED) ?
>  ROUTE_PRIO_OFFSET_CONNECTED: ROUTE_PRIO_OFFSET_STATIC;
>  build_route_match(NULL, eg->route_table_id, prefix_s, eg->plen,
> -  eg->is_src_route, is_ipv4, _match, , 
> ofs);
> +  eg->is_src_route, is_ipv4_network, _match,
> +  , ofs);
>  free(prefix_s);
>  
>  struct ds actions = DS_EMPTY_INITIALIZER;
> @@ -10708,7 +10697,11 @@ build_ecmp_route_flow(struct lflow_table *lflows, 
> struct ovn_datapath *od,
>  /* Find the outgoing port. */
>  const char *lrp_addr_s = NULL;
>  struct ovn_port *out_port = NULL;
> -if (!find_static_route_outport(od, lr_ports, route, is_ipv4,
> +bool is_ipv4_gateway = is_ipv4_network;
> +if (route->nexthop && route->nexthop[0]) {
> +  is_ipv4_gateway = strchr(route->nexthop, '.') ? true : false;
> +}

It might be a bit cleaner to just store the next-hop family in the
parsed_route structure at parse time, in parsed_routes_add().

> +if (!find_static_route_outport(od, lr_ports, route, is_ipv4_gateway,
> _addr_s, _port)) {
>  continue;
>  }
> @@ -10733,9 +10726,9 @@ build_ecmp_route_flow(struct lflow_table *lflows, 
> struct ovn_datapath *od,
>"eth.src = %s; "
>"outport = %s; "
>   

[ovs-dev] [PATCH net-next v4] net: openvswitch: Check vport netdev name

2024-04-19 Thread Jun Gu
Ensure that the provided netdev name is not one of its aliases to
prevent unnecessary creation and destruction of the vport by
ovs-vswitchd.

Signed-off-by: Jun Gu 
Acked-by: Eelco Chaudron 
---
 net/openvswitch/vport-netdev.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/net/openvswitch/vport-netdev.c b/net/openvswitch/vport-netdev.c
index 903537a5da22..618edc346c0f 100644
--- a/net/openvswitch/vport-netdev.c
+++ b/net/openvswitch/vport-netdev.c
@@ -78,7 +78,10 @@ struct vport *ovs_netdev_link(struct vport *vport, const 
char *name)
int err;
 
vport->dev = dev_get_by_name(ovs_dp_get_net(vport->dp), name);
-   if (!vport->dev) {
+   /* Ensure that the device exists and that the provided
+* name is not one of its aliases.
+*/
+   if (!vport->dev || strcmp(name, ovs_vport_name(vport))) {
err = -ENODEV;
goto error_free_vport;
}
-- 
2.25.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev