Re: [ovs-dev] [PATCH v2] openvswitch: Add support to count upall packets

2022-11-14 Thread Eelco Chaudron


On 15 Nov 2022, at 8:28, wangchuanlei wrote:

> Added the total number of upcalls and the total of upcall failures.
> Due to ovs-userspace do not support NLA_NESTED, here still use the
> "struct ovs_vport_upcall_stats"

I guess that was the idea, we should start using the NLA_NESTED attribute 
rather than just embedding data structures.
The required backend support should be there, functions likes 
nl_msg_start_nested(), and NL_NESTED_FOR_EACH() exists.

> Thank you
> wangchuanlei
>
> On 14 Sep 2022, at 14:14, wangchuanlei wrote:
>
>> Add support to count upcall packets on every interface.
>> I always encounter high cpu use of ovs-vswictchd, this help to check
>> which interface send too many packets without open vlog/set switch,
>> and have no influence on datapath.
>
> Hi,
>
> I did not do a full review, but I think we should not try to make the
>  same mistake as before and embed a structure inside a netlink message.
>
> You are adding “struct ovs_vport_upcall_stats” but in theory,
>  you could have just added the new entry to “ovs_vport_stats”.
>  But this is breaking userspace as it expects an exact structure size :(
>
> So I think the right approach would be to have “
> OVS_VPORT_ATTR_UPCALL_STATS” be an NLA_NESTED type, and have
> individual stat attributes as NLA_U64 (or whatever type you need).
>
> What is also confusing is that you use upcall_packets in
>  ovs_vport_upcall_stats, which to me are the total of up calls,
> but you called it n_missed in your stats. I think you should try to
>  avoid missed in the upcall path, and just call it n_upcall_packets also.
>
> In addition, I think you should keep two types of statics, and make them
>  available, namely the total number of upcalls and the total of upcall
> failures.
>
> Cheers,
>
> Eelco
>
> Signed-off-by: wangchuanlei 
> ---
>  include/uapi/linux/openvswitch.h |  6 
>  net/openvswitch/datapath.c   | 54 +++-
>  net/openvswitch/datapath.h   | 12 +++
>  net/openvswitch/vport.c  | 33 +++
>  net/openvswitch/vport.h  |  4 +++
>  5 files changed, 108 insertions(+), 1 deletion(-)
>
> diff --git a/include/uapi/linux/openvswitch.h 
> b/include/uapi/linux/openvswitch.h
> index 94066f87e9ee..bb671d92b711 100644
> --- a/include/uapi/linux/openvswitch.h
> +++ b/include/uapi/linux/openvswitch.h
> @@ -126,6 +126,11 @@ struct ovs_vport_stats {
>   __u64   tx_dropped; /* no space available in linux  */
>  };
>
> +struct ovs_vport_upcall_stats {
> + __u64   upcall_success; /* total packets upcalls  succeed */
> + __u64   upcall_fail;/* total packets 
> upcalls  failed */
> +};
> +
>  /* Allow last Netlink attribute to be unaligned */
>  #define OVS_DP_F_UNALIGNED   (1 << 0)
>
> @@ -277,6 +282,7 @@ enum ovs_vport_attr {
>   OVS_VPORT_ATTR_PAD,
>   OVS_VPORT_ATTR_IFINDEX,
>   OVS_VPORT_ATTR_NETNSID,
> + OVS_VPORT_ATTR_UPCALL_STATS, /* struct ovs_vport_upcall_stats */
>   __OVS_VPORT_ATTR_MAX
>  };
>
> diff --git a/net/openvswitch/datapath.c b/net/openvswitch/datapath.c
> index c8a9075ddd0a..8b8ea95f94ae 100644
> --- a/net/openvswitch/datapath.c
> +++ b/net/openvswitch/datapath.c
> @@ -209,6 +209,28 @@ static struct vport *new_vport(const struct vport_parms 
> *parms)
>   return vport;
>  }
>
> +static void ovs_vport_upcalls(struct sk_buff *skb,
> +   const struct dp_upcall_info *upcall_info,
> +   bool upcall_success)
> +{
> + if (upcall_info->cmd == OVS_PACKET_CMD_MISS ||
> + upcall_info->cmd == OVS_PACKET_CMD_ACTION) {
> + const struct vport *p = OVS_CB(skb)->input_vport;
> + struct vport_upcall_stats_percpu *vport_stats;
> + u64 *stats_counter_upcall;
> +
> + vport_stats = this_cpu_ptr(p->vport_upcall_stats_percpu);
> + if (upcall_success)
> + stats_counter_upcall = _stats->n_upcall_success;
> + else
> + stats_counter_upcall = _stats->n_upcall_fail;
> +
> + u64_stats_update_begin(_stats->syncp);
> + (*stats_counter_upcall)++;
> + u64_stats_update_end(_stats->syncp);
> + }
> +}
> +
>  void ovs_dp_detach_port(struct vport *p)
>  {
>   ASSERT_OVSL();
> @@ -216,6 +238,9 @@ void ovs_dp_detach_port(struct vport *p)
>   /* First drop references to device. */
>   hlist_del_rcu(>dp_hash_node);
>
> + /* Free percpu memory */
> + free_percpu(p->vport_upcall_stats_percpu);
> +
>   /* Then destroy it. */
>   ovs_vport_del(p);
>  }
> @@ -305,8 +330,12 @@ int ovs_dp_upcall(struct datapath *dp, struct sk_buff 
> *skb,
>   err = queue_userspace_packet(dp, skb, key, upcall_info, cutlen);
>   else
>   err = queue_gso_packets(dp, skb, key, upcall_info, cutlen);
> - if (err)
> + if (err) {
> + ovs_vport_upcalls(skb, upcall_info, 

[ovs-dev] [PATCH v2] openvswitch: Add support to count upall packets

2022-11-14 Thread wangchuanlei
Added the total number of upcalls and the total of upcall failures.
Due to ovs-userspace do not support NLA_NESTED, here still use the
"struct ovs_vport_upcall_stats"

Thank you
wangchuanlei

On 14 Sep 2022, at 14:14, wangchuanlei wrote:

> Add support to count upcall packets on every interface.
> I always encounter high cpu use of ovs-vswictchd, this help to check
> which interface send too many packets without open vlog/set switch,
> and have no influence on datapath.

Hi,

I did not do a full review, but I think we should not try to make the
 same mistake as before and embed a structure inside a netlink message.

You are adding “struct ovs_vport_upcall_stats” but in theory,
 you could have just added the new entry to “ovs_vport_stats”.
 But this is breaking userspace as it expects an exact structure size :(

So I think the right approach would be to have “
OVS_VPORT_ATTR_UPCALL_STATS” be an NLA_NESTED type, and have
individual stat attributes as NLA_U64 (or whatever type you need).

What is also confusing is that you use upcall_packets in
 ovs_vport_upcall_stats, which to me are the total of up calls,
but you called it n_missed in your stats. I think you should try to
 avoid missed in the upcall path, and just call it n_upcall_packets also.

In addition, I think you should keep two types of statics, and make them
 available, namely the total number of upcalls and the total of upcall
failures.

Cheers,

Eelco

Signed-off-by: wangchuanlei 
---
 include/uapi/linux/openvswitch.h |  6 
 net/openvswitch/datapath.c   | 54 +++-
 net/openvswitch/datapath.h   | 12 +++
 net/openvswitch/vport.c  | 33 +++
 net/openvswitch/vport.h  |  4 +++
 5 files changed, 108 insertions(+), 1 deletion(-)

diff --git a/include/uapi/linux/openvswitch.h b/include/uapi/linux/openvswitch.h
index 94066f87e9ee..bb671d92b711 100644
--- a/include/uapi/linux/openvswitch.h
+++ b/include/uapi/linux/openvswitch.h
@@ -126,6 +126,11 @@ struct ovs_vport_stats {
__u64   tx_dropped; /* no space available in linux  */
 };
 
+struct ovs_vport_upcall_stats {
+   __u64   upcall_success; /* total packets upcalls  succeed */
+   __u64   upcall_fail;/* total packets 
upcalls  failed */
+};
+
 /* Allow last Netlink attribute to be unaligned */
 #define OVS_DP_F_UNALIGNED (1 << 0)
 
@@ -277,6 +282,7 @@ enum ovs_vport_attr {
OVS_VPORT_ATTR_PAD,
OVS_VPORT_ATTR_IFINDEX,
OVS_VPORT_ATTR_NETNSID,
+   OVS_VPORT_ATTR_UPCALL_STATS, /* struct ovs_vport_upcall_stats */
__OVS_VPORT_ATTR_MAX
 };
 
diff --git a/net/openvswitch/datapath.c b/net/openvswitch/datapath.c
index c8a9075ddd0a..8b8ea95f94ae 100644
--- a/net/openvswitch/datapath.c
+++ b/net/openvswitch/datapath.c
@@ -209,6 +209,28 @@ static struct vport *new_vport(const struct vport_parms 
*parms)
return vport;
 }
 
+static void ovs_vport_upcalls(struct sk_buff *skb,
+ const struct dp_upcall_info *upcall_info,
+ bool upcall_success)
+{
+   if (upcall_info->cmd == OVS_PACKET_CMD_MISS ||
+   upcall_info->cmd == OVS_PACKET_CMD_ACTION) {
+   const struct vport *p = OVS_CB(skb)->input_vport;
+   struct vport_upcall_stats_percpu *vport_stats;
+   u64 *stats_counter_upcall;
+
+   vport_stats = this_cpu_ptr(p->vport_upcall_stats_percpu);
+   if (upcall_success)
+   stats_counter_upcall = _stats->n_upcall_success;
+   else
+   stats_counter_upcall = _stats->n_upcall_fail;
+
+   u64_stats_update_begin(_stats->syncp);
+   (*stats_counter_upcall)++;
+   u64_stats_update_end(_stats->syncp);
+   }
+}
+
 void ovs_dp_detach_port(struct vport *p)
 {
ASSERT_OVSL();
@@ -216,6 +238,9 @@ void ovs_dp_detach_port(struct vport *p)
/* First drop references to device. */
hlist_del_rcu(>dp_hash_node);
 
+   /* Free percpu memory */
+   free_percpu(p->vport_upcall_stats_percpu);
+
/* Then destroy it. */
ovs_vport_del(p);
 }
@@ -305,8 +330,12 @@ int ovs_dp_upcall(struct datapath *dp, struct sk_buff *skb,
err = queue_userspace_packet(dp, skb, key, upcall_info, cutlen);
else
err = queue_gso_packets(dp, skb, key, upcall_info, cutlen);
-   if (err)
+   if (err) {
+   ovs_vport_upcalls(skb, upcall_info, false);
goto err;
+   } else {
+   ovs_vport_upcalls(skb, upcall_info, true);
+   }
 
return 0;
 
@@ -1825,6 +1854,13 @@ static int ovs_dp_cmd_new(struct sk_buff *skb, struct 
genl_info *info)
goto err_destroy_portids;
}
 
+   vport->vport_upcall_stats_percpu =
+   netdev_alloc_pcpu_stats(struct 
vport_upcall_stats_percpu);
+   if 

Re: [ovs-dev] [PATCH ovn] pinctrl: Use 16-aligned 32-bit fields in bfd_msg.

2022-11-14 Thread Ales Musil
On Wed, Nov 2, 2022 at 2:33 PM Dumitru Ceara  wrote:

> This avoids misaligned accesses as flagged by UBSan when running CoPP
> system tests:
>
>   controller/pinctrl.c:7129:28: runtime error: member access within
> misaligned address 0x61b627be for type 'const struct bfd_msg', which
> requires 4 byte alignment
>   0x61b627be: note: pointer points here
>00 20 d3 d4 20 60  05 18 a8 99 e7 7b 92 1d  36 73 00 0f 42 40 00 0f  42
> 40 00 00 00 00 00 00  00 03
>^
>   #0 0x621b8f in pinctrl_check_bfd_msg
> /root/ovn/controller/pinctrl.c:7129:28
>   #1 0x621b8f in pinctrl_handle_bfd_msg
> /root/ovn/controller/pinctrl.c:7183:10
>   #2 0x621b8f in process_packet_in
> /root/ovn/controller/pinctrl.c:3320:9
>   #3 0x621b8f in pinctrl_recv /root/ovn/controller/pinctrl.c:3386:9
>   #4 0x621b8f in pinctrl_handler /root/ovn/controller/pinctrl.c:3481:17
>   #5 0xa2d89a in ovsthread_wrapper
> /root/ovn/ovs/lib/ovs-thread.c:422:12
>   #6 0x7fcb598081ce in start_thread (/lib64/libpthread.so.0+0x81ce)
>   #7 0x7fcb58439dd2 in clone (/lib64/libc.so.6+0x39dd2)
>
> Fixes: 117203584d98 ("controller: introduce BFD tx path in
> ovn-controller.")
> Reported-by: Ilya Maximets 
> Signed-off-by: Dumitru Ceara 
> ---
>  controller/pinctrl.c | 26 ++
>  lib/ovn-l7.h | 10 +-
>  2 files changed, 19 insertions(+), 17 deletions(-)
>
> diff --git a/controller/pinctrl.c b/controller/pinctrl.c
> index 8859cb080d..9913d5ae12 100644
> --- a/controller/pinctrl.c
> +++ b/controller/pinctrl.c
> @@ -6890,13 +6890,14 @@ pinctrl_find_bfd_monitor_entry_by_port(char *ip,
> uint16_t port)
>  }
>
>  static struct bfd_entry *
> -pinctrl_find_bfd_monitor_entry_by_disc(char *ip, ovs_be32 disc)
> +pinctrl_find_bfd_monitor_entry_by_disc(const char *ip,
> +   const ovs_16aligned_be32 *disc)
>  {
>  struct bfd_entry *ret = NULL, *entry;
>
>  HMAP_FOR_EACH_WITH_HASH (entry, node, hash_string(ip, 0),
>   _monitor_map) {
> -if (entry->local_disc == disc) {
> +if (entry->local_disc == get_16aligned_be32(disc)) {
>  ret = entry;
>  break;
>  }
> @@ -6955,11 +6956,11 @@ bfd_monitor_put_bfd_msg(struct bfd_entry *entry,
> struct dp_packet *packet,
>  msg->length = BFD_PACKET_LEN;
>  msg->flags = final ? BFD_FLAG_FINAL : 0;
>  msg->flags |= entry->state << 6;
> -msg->my_disc = entry->local_disc;
> -msg->your_disc = entry->remote_disc;
> +put_16aligned_be32(>my_disc, entry->local_disc);
> +put_16aligned_be32(>your_disc, entry->remote_disc);
>  /* min_tx and min_rx are in us - RFC 5880 page 9 */
> -msg->min_tx = htonl(entry->local_min_tx * 1000);
> -msg->min_rx = htonl(entry->local_min_rx * 1000);
> +put_16aligned_be32(>min_tx, htonl(entry->local_min_tx * 1000));
> +put_16aligned_be32(>min_rx, htonl(entry->local_min_rx * 1000));
>
>  if (!IN6_IS_ADDR_V4MAPPED(>ip_src)) {
>  /* IPv6 needs UDP checksum calculated */
> @@ -7154,7 +7155,7 @@ pinctrl_check_bfd_msg(const struct flow *ip_flow,
> struct dp_packet *pkt_in)
>  return false;
>  }
>
> -if (!msg->my_disc) {
> +if (!get_16aligned_be32(>my_disc)) {
>  VLOG_DBG_RL(, "BFD action: my_disc not set");
>  return false;
>  }
> @@ -7165,7 +7166,8 @@ pinctrl_check_bfd_msg(const struct flow *ip_flow,
> struct dp_packet *pkt_in)
>  }
>
>  enum bfd_state peer_state = msg->flags >> 6;
> -if (peer_state >= BFD_STATE_INIT && !msg->your_disc) {
> +if (peer_state >= BFD_STATE_INIT
> +&& !get_16aligned_be32(>your_disc)) {
>  VLOG_DBG_RL(,
>  "BFD action: remote state is %s and your_disc is not
> set",
>  bfd_get_status(peer_state));
> @@ -7193,7 +7195,7 @@ pinctrl_handle_bfd_msg(struct rconn *swconn, const
> struct flow *ip_flow,
>
>  const struct bfd_msg *msg = dp_packet_get_udp_payload(pkt_in);
>  struct bfd_entry *entry =
> -pinctrl_find_bfd_monitor_entry_by_disc(ip_src, msg->your_disc);
> +pinctrl_find_bfd_monitor_entry_by_disc(ip_src, >your_disc);
>
>  if (!entry) {
>  free(ip_src);
> @@ -7201,9 +7203,9 @@ pinctrl_handle_bfd_msg(struct rconn *swconn, const
> struct flow *ip_flow,
>  }
>
>  bool change_state = false;
> -entry->remote_disc = msg->my_disc;
> -uint32_t remote_min_tx = ntohl(msg->min_tx) / 1000;
> -entry->remote_min_rx = ntohl(msg->min_rx) / 1000;
> +entry->remote_disc = get_16aligned_be32(>my_disc);
> +uint32_t remote_min_tx = ntohl(get_16aligned_be32(>min_tx)) /
> 1000;
> +entry->remote_min_rx = ntohl(get_16aligned_be32(>min_rx)) / 1000;
>  entry->detection_timeout = msg->mult * MAX(remote_min_tx,
> entry->local_min_rx);
>
> diff --git a/lib/ovn-l7.h b/lib/ovn-l7.h
> index b03b945d89..2b20bc380a 100644
> --- 

Re: [ovs-dev] [PATCH ovn] controller: Fix QoS for ports with undetected or too low link speed.

2022-11-14 Thread Ales Musil
On Tue, Nov 1, 2022 at 3:01 PM Ilya Maximets  wrote:

> If netdev_set_qos() is called without specifying max-rate, it will
> use reported link speed of that interface instead as a ceil.
>
> If OVS libraries are not able to detect the link speed, the 100 Mbps
> will be used.  Also, TAP interfaces do report 10 Mbps as their link
> speed by default.
>
> In both cases all the traffic will be unnecessary limited to just
> 100 or 10 Mpbs regardless of the 'qos_max_rate' configured for that
> port in OVN databases, because maximum rate of a single queue can not
> be higher than a total maximum rate of a QoS type.
>
> Fix that by always specifying the max-rate as a maximum supported
> value - (2^32 - 1) * 8 bits per second.  This way netdev_set_qos()
> will not need to check the link speed or guess a value.
>
> Fixes: 8dda482b8059 ("Check and allocate free qdisc queue id for ports
> with qos parameters")
> Reported-at: https://bugzilla.redhat.com/2136716
> Signed-off-by: Ilya Maximets 
> ---
>  controller/binding.c | 9 -
>  tests/system-ovn.at  | 6 +-
>  2 files changed, 13 insertions(+), 2 deletions(-)
>
> diff --git a/controller/binding.c b/controller/binding.c
> index c3d2b2e42..a50379895 100644
> --- a/controller/binding.c
> +++ b/controller/binding.c
> @@ -220,7 +220,14 @@ set_noop_qos(struct ovsdb_idl_txn *ovs_idl_txn,
>  static void
>  set_qos_type(struct netdev *netdev, const char *type)
>  {
> -int error = netdev_set_qos(netdev, type, NULL);
> +/* 34359738360 == (2^32 - 1) * 8.  netdev_set_qos() doesn't support
> + * 64-bit rate netlink attributes, so the maximum value is 2^32 - 1
> bytes.
> + * The 'max-rate' config option is in bits, so multiplying by 8.
> + * Without setting max-rate the reported link speed will be used,
> which
> + * can be unrecognized for certain NICs or reported too low for
> virtual
> + * interfaces. */
> +const struct smap conf = SMAP_CONST1(, "max-rate",
> "34359738360");
> +int error = netdev_set_qos(netdev, type, );
>  if (error) {
>  static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(1, 1);
>  VLOG_WARN_RL(, "%s: could not set qdisc type \"%s\" (%s)",
> diff --git a/tests/system-ovn.at b/tests/system-ovn.at
> index 20c058415..cc267ba25 100644
> --- a/tests/system-ovn.at
> +++ b/tests/system-ovn.at
> @@ -6353,8 +6353,12 @@ OVS_WAIT_UNTIL([tc qdisc show | grep -q 'htb 1: dev
> ovs-public'])
>  OVS_WAIT_UNTIL([tc class show dev ovs-public | \
>  grep -q 'class htb .* rate 200Kbit ceil 300Kbit burst
> 375000b cburst 375000b'])
>
> -AT_CHECK([ovn-nbctl remove Logical_Switch_Port public options
> qos_min_rate=20])
> +
>  AT_CHECK([ovn-nbctl remove Logical_Switch_Port public options
> qos_max_rate=30])
> +OVS_WAIT_UNTIL([tc class show dev ovs-public | \
> +grep -q 'class htb .* rate 200Kbit ceil 34359Mbit burst
> 375000b .*'])
> +
> +AT_CHECK([ovn-nbctl remove Logical_Switch_Port public options
> qos_min_rate=20])
>  AT_CHECK([ovn-nbctl remove Logical_Switch_Port public options
> qos_burst=300])
>  OVS_WAIT_UNTIL([test "$(tc qdisc show | grep 'htb 1: dev ovs-public')" =
> ""])
>
> --
> 2.37.3
>
> ___
> dev mailing list
> d...@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>
>
Looks good to me, thanks.

Acked-by: Ales Musil 
-- 

Ales Musil

Senior Software Engineer - OVN Core

Red Hat EMEA 

amu...@redhat.comIM: amusil

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v3 ovn] controller: improve buffered packets management

2022-11-14 Thread Ales Musil
On Mon, Oct 24, 2022 at 11:29 AM Lorenzo Bianconi <
lorenzo.bianc...@redhat.com> wrote:

> Improve buffered packet management in ovn-controller avoid useless loop
> in run_buffered_binding routine and using datapath key and output port
> key as buffered_packets_map hashmap hash. Add new selftest for buffered
> packets.
>
> Signed-off-by: Lorenzo Bianconi 
> ---
> Changes since v2:
> - improve hash function
> Changes since v1:
> - improve code readability
> ---
>  controller/pinctrl.c | 118 
>  tests/system-ovn.at  | 124 +++
>  2 files changed, 208 insertions(+), 34 deletions(-)
>
> diff --git a/controller/pinctrl.c b/controller/pinctrl.c
> index 8859cb080..dfcd0cea8 100644
> --- a/controller/pinctrl.c
> +++ b/controller/pinctrl.c
> @@ -182,6 +182,7 @@ static void destroy_buffered_packets_map(void);
>  static void
>  run_buffered_binding(struct ovsdb_idl_index
> *sbrec_mac_binding_by_lport_ip,
>   struct ovsdb_idl_index
> *sbrec_port_binding_by_datapath,
> + struct ovsdb_idl_index *sbrec_port_binding_by_name,
>   const struct hmap *local_datapaths)
>  OVS_REQUIRES(pinctrl_mutex);
>
> @@ -1430,6 +1431,9 @@ struct buffered_packets {
>  struct in6_addr ip;
>  struct eth_addr ea;
>
> +uint64_t dp_key;
> +uint64_t port_key;
> +
>  long long int timestamp;
>
>  struct buffer_info data[BUFFER_QUEUE_DEPTH];
> @@ -1556,38 +1560,38 @@ buffered_packets_map_gc(void)
>  }
>
>  static struct buffered_packets *
> -pinctrl_find_buffered_packets(const struct in6_addr *ip, uint32_t hash)
> +pinctrl_find_buffered_packets(uint64_t dp_key, uint64_t port_key,
> +  uint32_t hash)
>  {
>  struct buffered_packets *qp;
> -
> -HMAP_FOR_EACH_WITH_HASH (qp, hmap_node, hash,
> - _packets_map) {
> -if (IN6_ARE_ADDR_EQUAL(>ip, ip)) {
> +HMAP_FOR_EACH_WITH_HASH (qp, hmap_node, hash, _packets_map) {
> +if (qp->dp_key == dp_key && qp->port_key == port_key) {
>  return qp;
>  }
>  }
>  return NULL;
>  }
>
> +static uint32_t
> +pinctrl_buffer_apcket_hash(uint64_t dp_key, uint64_t port_key)
>

nit: typo in the function name "pinctrl_buffer_apcket_hash" ->
"pinctrl_buffer_packet_hash".
I guess this can be fixed during merge, let's see if others agree.


> +{
> +uint32_t hash = 0;
> +hash = hash_add64(hash, port_key);
> +hash = hash_add64(hash, dp_key);
> +return hash_finish(hash, 16);
> +}
> +
>  /* Called with in the pinctrl_handler thread context. */
>  static int
>  pinctrl_handle_buffered_packets(struct dp_packet *pkt_in,
>  const struct match *md, bool is_arp)
>  OVS_REQUIRES(pinctrl_mutex)
>  {
> -struct buffered_packets *bp;
> -struct dp_packet *clone;
> -struct in6_addr addr;
> -
> -if (is_arp) {
> -addr = in6_addr_mapped_ipv4(htonl(md->flow.regs[0]));
> -} else {
> -ovs_be128 ip6 = hton128(flow_get_xxreg(>flow, 0));
> -memcpy(, , sizeof addr);
> -}
> -
> -uint32_t hash = hash_bytes(, sizeof addr, 0);
> -bp = pinctrl_find_buffered_packets(, hash);
> +uint64_t dp_key = ntohll(md->flow.metadata);
> +uint64_t oport_key = md->flow.regs[MFF_LOG_OUTPORT - MFF_REG0];
> +uint32_t hash = pinctrl_buffer_apcket_hash(dp_key, oport_key);
> +struct buffered_packets *bp
> += pinctrl_find_buffered_packets(dp_key, oport_key, hash);
>  if (!bp) {
>  if (hmap_count(_packets_map) >= 1000) {
>  COVERAGE_INC(pinctrl_drop_buffered_packets_map);
> @@ -1597,12 +1601,20 @@ pinctrl_handle_buffered_packets(struct dp_packet
> *pkt_in,
>  bp = xmalloc(sizeof *bp);
>  hmap_insert(_packets_map, >hmap_node, hash);
>  bp->head = bp->tail = 0;
> -bp->ip = addr;
> +if (is_arp) {
> +bp->ip = in6_addr_mapped_ipv4(htonl(md->flow.regs[0]));
> +} else {
> +ovs_be128 ip6 = hton128(flow_get_xxreg(>flow, 0));
> +memcpy(>ip, , sizeof bp->ip);
> +}
> +bp->dp_key = dp_key;
> +bp->port_key = oport_key;
>  }
>  bp->timestamp = time_msec();
>  /* clone the packet to send it later with correct L2 address */
> -clone = dp_packet_clone_data(dp_packet_data(pkt_in),
> - dp_packet_size(pkt_in));
> +struct dp_packet *clone
> += dp_packet_clone_data(dp_packet_data(pkt_in),
> +   dp_packet_size(pkt_in));
>  buffered_push_packet(bp, clone, md);
>
>  return 0;
> @@ -3586,6 +3598,7 @@ pinctrl_run(struct ovsdb_idl_txn *ovnsb_idl_txn,
>sbrec_ip_multicast_opts);
>  run_buffered_binding(sbrec_mac_binding_by_lport_ip,
>   sbrec_port_binding_by_datapath,
> + sbrec_port_binding_by_name,
>   

Re: [ovs-dev] [PATCH ovn] tests: Fix flaky test "IPv6 Neighbor Solicitation for unknown MAC"

2022-11-14 Thread Ales Musil
On Mon, Nov 14, 2022 at 11:51 AM Xavier Simonart 
wrote:

> Fixes: 9ac548524218 ("pinctrl: Send RARPs for external ipv6 interfaces")
>
> Signed-off-by: Xavier Simonart 
> ---
>  tests/ovn.at | 6 --
>  1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/tests/ovn.at b/tests/ovn.at
> index 6552681bd..3131b8d02 100644
> --- a/tests/ovn.at
> +++ b/tests/ovn.at
> @@ -13954,8 +13954,9 @@ AT_CHECK([cat 2.packets | cut -c 117-], [0],
> [expout])
>
>  # Check that NS packets are not flooded across routing domains. That means
>  # that hv2 should not send any packets across the physical network.
> +# Use the grep here to filter out rarp packets that might have arrived
>  $PYTHON "$ovs_srcdir/utilities/ovs-pcap.in" hv2/br-phys_n1-tx.pcap | \
> -trim_zeros > 2.packets
> +grep -v  | trim_zeros > 2.packets
>  AT_CHECK([cat 2.packets], [0], [])
>
>  # Now send a packet with destination ip other than
> @@ -13995,8 +13996,9 @@ AT_CHECK([cat 2.packets | cut -c 117-], [0],
> [expout])
>
>  # Check that NS packets are not flooded across routing domains. That means
>  # that hv2 should not send any packets across the physical network.
> +# Use the grep here to filter out rarp packets that might have arrived
>  $PYTHON "$ovs_srcdir/utilities/ovs-pcap.in" hv2/br-phys_n1-tx.pcap | \
> -trim_zeros > 2.packets
> +grep -v  | trim_zeros > 2.packets
>  AT_CHECK([cat 2.packets], [0], [])
>
>  OVN_CLEANUP([hv1])
> --
> 2.31.1
>
> ___
> dev mailing list
> d...@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>
>
Looks good to me, thanks!

Acked-by: Ales Musil 

-- 

Ales Musil

Senior Software Engineer - OVN Core

Red Hat EMEA 

amu...@redhat.comIM: amusil

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH v3] utilities/ofctl: add-meters for save and restore

2022-11-14 Thread Wan Junjie
put dump-meters' result in one line so add-meters can handle.
save and restore meters when restart ovs.
bundle functions are not implemented in this patch.

Signed-off-by: Wan Junjie 
---
v3:
add '--oneline' option for dump-meter(s) command

v2:
fix failed testcases, as dump-meters format changes
---
 include/openvswitch/ofp-meter.h |   9 +-
 lib/ofp-meter.c | 103 -
 lib/ofp-print.c |   5 +-
 tests/dpif-netdev.at|   9 ++
 utilities/ovs-ofctl.8.in|  20 ++-
 utilities/ovs-ofctl.c   | 263 +---
 utilities/ovs-save  |   8 +
 7 files changed, 383 insertions(+), 34 deletions(-)

diff --git a/include/openvswitch/ofp-meter.h b/include/openvswitch/ofp-meter.h
index 6776eae87..0514b4ec4 100644
--- a/include/openvswitch/ofp-meter.h
+++ b/include/openvswitch/ofp-meter.h
@@ -61,7 +61,8 @@ int ofputil_decode_meter_config(struct ofpbuf *,
 struct ofputil_meter_config *,
 struct ofpbuf *bands);
 void ofputil_format_meter_config(struct ds *,
- const struct ofputil_meter_config *);
+ const struct ofputil_meter_config *,
+ int);
 
 struct ofputil_meter_mod {
 uint16_t command;
@@ -79,6 +80,12 @@ char *parse_ofp_meter_mod_str(struct ofputil_meter_mod *, 
const char *string,
 OVS_WARN_UNUSED_RESULT;
 void ofputil_format_meter_mod(struct ds *, const struct ofputil_meter_mod *);
 
+char *parse_ofp_meter_mod_file(const char *file_name,
+ int command,
+ struct ofputil_meter_mod **mms, size_t *n_mms,
+ enum ofputil_protocol *usable_protocols)
+OVS_WARN_UNUSED_RESULT;
+
 struct ofputil_meter_stats {
 uint32_t meter_id;
 uint32_t flow_count;
diff --git a/lib/ofp-meter.c b/lib/ofp-meter.c
index 9ea40a0bf..b94cb6a05 100644
--- a/lib/ofp-meter.c
+++ b/lib/ofp-meter.c
@@ -15,6 +15,7 @@
  */
 
 #include 
+#include 
 #include "openvswitch/ofp-meter.h"
 #include "byte-order.h"
 #include "nx-match.h"
@@ -57,7 +58,7 @@ void
 ofputil_format_meter_band(struct ds *s, enum ofp13_meter_flags flags,
   const struct ofputil_meter_band *mb)
 {
-ds_put_cstr(s, "\ntype=");
+ds_put_cstr(s, "type=");
 switch (mb->type) {
 case OFPMBT13_DROP:
 ds_put_cstr(s, "drop");
@@ -343,7 +344,7 @@ ofp_print_meter_flags(struct ds *s, enum ofp13_meter_flags 
flags)
 
 void
 ofputil_format_meter_config(struct ds *s,
-const struct ofputil_meter_config *mc)
+const struct ofputil_meter_config *mc, int oneline)
 {
 uint16_t i;
 
@@ -354,9 +355,12 @@ ofputil_format_meter_config(struct ds *s,
 
 ds_put_cstr(s, "bands=");
 for (i = 0; i < mc->n_bands; ++i) {
+ds_put_cstr(s, oneline > 0 ? " ": "\n");
 ofputil_format_meter_band(s, mc->flags, >bands[i]);
 }
-ds_put_char(s, '\n');
+if (oneline == 0) {
+ds_put_char(s, '\n');
+}
 }
 
 static enum ofperr
@@ -578,6 +582,24 @@ parse_ofp_meter_mod_str__(struct ofputil_meter_mod *mm, 
char *string,
 
 /* Meters require at least OF 1.3. */
 *usable_protocols = OFPUTIL_P_OF13_UP;
+if (command == -2) {
+size_t len;
+
+string += strspn(string, " \t\r\n");   /* Skip white space. */
+len = strcspn(string, ", \t\r\n"); /* Get length of the first token. */
+
+if (!strncmp(string, "add", len)) {
+command = OFPMC13_ADD;
+} else if (!strncmp(string, "delete", len)) {
+command = OFPMC13_DELETE;
+} else if (!strncmp(string, "modify", len)) {
+command = OFPMC13_MODIFY;
+} else {
+len = 0;
+command = OFPMC13_ADD;
+}
+string += len;
+}
 
 switch (command) {
 case -1:
@@ -606,6 +628,11 @@ parse_ofp_meter_mod_str__(struct ofputil_meter_mod *mm, 
char *string,
 mm->meter.n_bands = 0;
 mm->meter.bands = NULL;
 
+if (command == OFPMC13_DELETE && string[0] == '\0') {
+mm->meter.meter_id = OFPM13_ALL;
+return NULL;
+}
+
 if (fields & F_BANDS) {
 band_str = strstr(string, "band");
 if (!band_str) {
@@ -805,5 +832,73 @@ ofputil_format_meter_mod(struct ds *s, const struct 
ofputil_meter_mod *mm)
 ds_put_format(s, " cmd:%d ", mm->command);
 }
 
-ofputil_format_meter_config(s, >meter);
+ofputil_format_meter_config(s, >meter, 0);
+}
+
+/* If 'command' is given as -2, each line may start with a command name ("add",
+ * "modify", "delete").  A missing command name is treated as "add".
+ */
+char * OVS_WARN_UNUSED_RESULT
+parse_ofp_meter_mod_file(const char *file_name,
+ int command,
+ struct ofputil_meter_mod **mms, size_t *n_mms,
+ enum ofputil_protocol 

Re: [ovs-dev] [PATCH ovn v2 2/2] northd: Add I-P for syncing SB address sets.

2022-11-14 Thread Numan Siddique
On Mon, Nov 14, 2022 at 3:23 PM Mark Michelson  wrote:
>
> Hi Numan, I have just one minor suggestion below.
>
> On 11/14/22 11:48, num...@ovn.org wrote:
> > From: Numan Siddique 
> >
> > Updates to NB address sets and NB port groups are handled
> > incrementally for syncing the SB address sets.  This patch
> > doesn't support syncing the SB Address sets for the router
> > load balancer vips incrementally, instead a full recompute is
> > triggered for any changes to NB load balancers, NB load balancer
> > groups and NB logical routers.
> >
> > Signed-off-by: Numan Siddique 
> > ---
> >   northd/en-sb-sync.c  | 202 ---
> >   northd/en-sb-sync.h  |   6 ++
> >   northd/inc-proc-northd.c |  18 +++-
> >   tests/ovn-northd.at  |  52 ++
> >   4 files changed, 260 insertions(+), 18 deletions(-)
> >
> > diff --git a/northd/en-sb-sync.c b/northd/en-sb-sync.c
> > index c3ba315df..8a17998ec 100644
> > --- a/northd/en-sb-sync.c
> > +++ b/northd/en-sb-sync.c
> > @@ -22,6 +22,7 @@
> >   #include "openvswitch/util.h"
> >
> >   #include "en-sb-sync.h"
> > +#include "include/ovn/expr.h"
> >   #include "lib/inc-proc-eng.h"
> >   #include "lib/lb.h"
> >   #include "lib/ovn-nb-idl.h"
> > @@ -41,6 +42,13 @@ static void sync_address_sets(const struct 
> > nbrec_address_set_table *,
> > const struct sbrec_address_set_table *,
> > struct ovsdb_idl_txn *ovnsb_txn,
> > struct hmap *datapaths);
> > +static const struct sbrec_address_set *sb_address_set_lookup_by_name(
> > +struct ovsdb_idl_index *, const char *name);
> > +static void update_sb_addr_set(const char **nb_addresses, size_t 
> > n_addresses,
> > +   const struct sbrec_address_set *);
> > +static void build_port_group_address_set(const struct nbrec_port_group *,
> > + struct svec *ipv4_addrs,
> > + struct svec *ipv6_addrs);
> >
> >   void *
> >   en_sb_sync_init(struct engine_node *node OVS_UNUSED,
> > @@ -94,6 +102,98 @@ en_address_set_sync_cleanup(void *data OVS_UNUSED)
> >
> >   }
> >
> > +bool
> > +address_set_sync_nb_address_set_handler(struct engine_node *node 
> > OVS_UNUSED,
> > +void *data OVS_UNUSED)
> > +{
> > +const struct nbrec_address_set_table *nb_address_set_table =
> > +EN_OVSDB_GET(engine_get_input("NB_address_set", node));
> > +
> > +/* Return false if an address set is created or deleted.
> > + * Handle I-P for only updated address sets. */
> > +const struct nbrec_address_set *nb_addr_set;
> > +NBREC_ADDRESS_SET_TABLE_FOR_EACH_TRACKED (nb_addr_set,
> > +  nb_address_set_table) {
> > +if (nbrec_address_set_is_new(nb_addr_set) ||
> > +nbrec_address_set_is_deleted(nb_addr_set)) {
> > +return false;
> > +}
> > +}
> > +
> > +struct ovsdb_idl_index *sbrec_address_set_by_name =
> > +engine_ovsdb_node_get_index(
> > +engine_get_input("SB_address_set", node),
> > +"sbrec_address_set_by_name");
> > +
> > +NBREC_ADDRESS_SET_TABLE_FOR_EACH_TRACKED (nb_addr_set,
> > +  nb_address_set_table) {
> > +const struct sbrec_address_set *sb_addr_set =
> > +sb_address_set_lookup_by_name(sbrec_address_set_by_name,
> > +  nb_addr_set->name);
> > +if (!sb_addr_set) {
> > +return false;
> > +}
> > +update_sb_addr_set((const char **) nb_addr_set->addresses,
> > +   nb_addr_set->n_addresses, sb_addr_set);
> > +}
> > +
> > +return true;
> > +}
> > +
> > +bool
> > +address_set_sync_nb_port_group_handler(struct engine_node *node OVS_UNUSED,
> > +   void *data OVS_UNUSED)
> > +{
> > +const struct nbrec_port_group *nb_pg;
> > +const struct nbrec_port_group_table *nb_port_group_table =
> > +EN_OVSDB_GET(engine_get_input("NB_port_group", node));
> > +NBREC_PORT_GROUP_TABLE_FOR_EACH_TRACKED (nb_pg, nb_port_group_table) {
> > +if (nbrec_port_group_is_new(nb_pg) ||
> > +nbrec_port_group_is_deleted(nb_pg)) {
> > +return false;
> > +}
> > +}
> > +
> > +struct ovsdb_idl_index *sbrec_address_set_by_name =
> > +engine_ovsdb_node_get_index(
> > +engine_get_input("SB_address_set", node),
> > +"sbrec_address_set_by_name");
> > +NBREC_PORT_GROUP_TABLE_FOR_EACH_TRACKED (nb_pg, nb_port_group_table) {
> > +char *ipv4_addrs_name = xasprintf("%s_ip4", nb_pg->name);
> > +const struct sbrec_address_set *sb_addr_set_v4 =
> > +sb_address_set_lookup_by_name(sbrec_address_set_by_name,
> > + 

Re: [ovs-dev] [PATCH ovn v2 1/2] northd IP: Add a new engine node 'en_sb_sync' to sync SB tables.

2022-11-14 Thread Mark Michelson

Acked-by: Mark Michelson 

On 11/14/22 11:47, num...@ovn.org wrote:

From: Numan Siddique 

A sub-engine node 'en_address_set_sync' is added with-in the
'en_sb_sync' node to sync the Address_Set table in the
SB database.  To start with, it falls back to full recompute
all the time.  Upcoming patch will add the incremental processing
support to sync the SB Address_Set table.

'en_sb_sync' engine node can be enhanced further to sync other
SB tables like - Port_Group, DHCP_Options, DNS etc.

Signed-off-by: Numan Siddique 
---
  lib/ovn-util.c|  30 +
  lib/ovn-util.h|   3 +
  northd/automake.mk|   4 +
  northd/en-northd-output.c |  57 ++
  northd/en-northd-output.h |  17 +++
  northd/en-sb-sync.c   | 230 ++
  northd/en-sb-sync.h   |  14 +++
  northd/inc-proc-northd.c  |  30 -
  northd/northd.c   | 173 ++--
  northd/northd.h   |   1 +
  10 files changed, 394 insertions(+), 165 deletions(-)
  create mode 100644 northd/en-northd-output.c
  create mode 100644 northd/en-northd-output.h
  create mode 100644 northd/en-sb-sync.c
  create mode 100644 northd/en-sb-sync.h

diff --git a/lib/ovn-util.c b/lib/ovn-util.c
index 5dca72714..4f939f460 100644
--- a/lib/ovn-util.c
+++ b/lib/ovn-util.c
@@ -938,3 +938,33 @@ daemon_started_recently(void)
  /* Ensure that at least an amount of time has passed. */
  return time_wall_msec() - startup_ts <= DAEMON_STARTUP_DELAY_MS;
  }
+
+/* Builds a unique address set compatible name ([a-zA-Z_.][a-zA-Z_.0-9]*)
+ * for the router's load balancer VIP address set, combining the logical
+ * router's datapath tunnel key and address family.
+ *
+ * Also prefixes the name with 'prefix'.
+ */
+static char *
+lr_lb_address_set_name_(uint32_t lr_tunnel_key, const char *prefix,
+int addr_family)
+{
+return xasprintf("%s_rtr_lb_%"PRIu32"_ip%s", prefix, lr_tunnel_key,
+ addr_family == AF_INET ? "4" : "6");
+}
+
+/* Builds the router's load balancer VIP address set name. */
+char *
+lr_lb_address_set_name(uint32_t lr_tunnel_key, int addr_family)
+{
+return lr_lb_address_set_name_(lr_tunnel_key, "", addr_family);
+}
+
+/* Builds a string that refers to the the router's load balancer VIP address
+ * set name, that is: $.
+ */
+char *
+lr_lb_address_set_ref(uint32_t lr_tunnel_key, int addr_family)
+{
+return lr_lb_address_set_name_(lr_tunnel_key, "$", addr_family);
+}
diff --git a/lib/ovn-util.h b/lib/ovn-util.h
index a1f1cf0ad..809ff1d36 100644
--- a/lib/ovn-util.h
+++ b/lib/ovn-util.h
@@ -315,4 +315,7 @@ void daemon_started_recently_ignore(void);
  bool daemon_started_recently(void);
  int64_t daemon_startup_ts(void);
  
+char *lr_lb_address_set_name(uint32_t lr_tunnel_key, int addr_family);

+char *lr_lb_address_set_ref(uint32_t lr_tunnel_key, int addr_family);
+
  #endif /* OVN_UTIL_H */
diff --git a/northd/automake.mk b/northd/automake.mk
index 81582867d..138b7b32e 100644
--- a/northd/automake.mk
+++ b/northd/automake.mk
@@ -10,6 +10,10 @@ northd_ovn_northd_SOURCES = \
northd/en-northd.h \
northd/en-lflow.c \
northd/en-lflow.h \
+   northd/en-northd-output.c \
+   northd/en-northd-output.h \
+   northd/en-sb-sync.c \
+   northd/en-sb-sync.h \
northd/inc-proc-northd.c \
northd/inc-proc-northd.h \
northd/ipam.c \
diff --git a/northd/en-northd-output.c b/northd/en-northd-output.c
new file mode 100644
index 0..0033d371e
--- /dev/null
+++ b/northd/en-northd-output.c
@@ -0,0 +1,57 @@
+/*
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at:
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+#include 
+
+#include 
+#include 
+#include 
+
+#include "openvswitch/util.h"
+
+#include "en-northd-output.h"
+#include "lib/inc-proc-eng.h"
+
+void *
+en_northd_output_init(struct engine_node *node OVS_UNUSED,
+  struct engine_arg *arg OVS_UNUSED)
+{
+return NULL;
+}
+
+void
+en_northd_output_run(struct engine_node *node, void *data OVS_UNUSED)
+{
+engine_set_node_state(node, EN_UPDATED);
+}
+
+void
+en_northd_output_cleanup(void *data OVS_UNUSED)
+{
+
+}
+
+bool
+northd_output_sb_sync_handler(struct engine_node *node, void *data OVS_UNUSED)
+{
+engine_set_node_state(node, EN_UPDATED);
+return true;
+}
+
+bool
+northd_output_lflow_handler(struct engine_node *node, void *data OVS_UNUSED)
+{
+engine_set_node_state(node, EN_UPDATED);
+return true;
+}

Re: [ovs-dev] [PATCH ovn v2 2/2] northd: Add I-P for syncing SB address sets.

2022-11-14 Thread Mark Michelson

Hi Numan, I have just one minor suggestion below.

On 11/14/22 11:48, num...@ovn.org wrote:

From: Numan Siddique 

Updates to NB address sets and NB port groups are handled
incrementally for syncing the SB address sets.  This patch
doesn't support syncing the SB Address sets for the router
load balancer vips incrementally, instead a full recompute is
triggered for any changes to NB load balancers, NB load balancer
groups and NB logical routers.

Signed-off-by: Numan Siddique 
---
  northd/en-sb-sync.c  | 202 ---
  northd/en-sb-sync.h  |   6 ++
  northd/inc-proc-northd.c |  18 +++-
  tests/ovn-northd.at  |  52 ++
  4 files changed, 260 insertions(+), 18 deletions(-)

diff --git a/northd/en-sb-sync.c b/northd/en-sb-sync.c
index c3ba315df..8a17998ec 100644
--- a/northd/en-sb-sync.c
+++ b/northd/en-sb-sync.c
@@ -22,6 +22,7 @@
  #include "openvswitch/util.h"
  
  #include "en-sb-sync.h"

+#include "include/ovn/expr.h"
  #include "lib/inc-proc-eng.h"
  #include "lib/lb.h"
  #include "lib/ovn-nb-idl.h"
@@ -41,6 +42,13 @@ static void sync_address_sets(const struct 
nbrec_address_set_table *,
const struct sbrec_address_set_table *,
struct ovsdb_idl_txn *ovnsb_txn,
struct hmap *datapaths);
+static const struct sbrec_address_set *sb_address_set_lookup_by_name(
+struct ovsdb_idl_index *, const char *name);
+static void update_sb_addr_set(const char **nb_addresses, size_t n_addresses,
+   const struct sbrec_address_set *);
+static void build_port_group_address_set(const struct nbrec_port_group *,
+ struct svec *ipv4_addrs,
+ struct svec *ipv6_addrs);
  
  void *

  en_sb_sync_init(struct engine_node *node OVS_UNUSED,
@@ -94,6 +102,98 @@ en_address_set_sync_cleanup(void *data OVS_UNUSED)
  
  }
  
+bool

+address_set_sync_nb_address_set_handler(struct engine_node *node OVS_UNUSED,
+void *data OVS_UNUSED)
+{
+const struct nbrec_address_set_table *nb_address_set_table =
+EN_OVSDB_GET(engine_get_input("NB_address_set", node));
+
+/* Return false if an address set is created or deleted.
+ * Handle I-P for only updated address sets. */
+const struct nbrec_address_set *nb_addr_set;
+NBREC_ADDRESS_SET_TABLE_FOR_EACH_TRACKED (nb_addr_set,
+  nb_address_set_table) {
+if (nbrec_address_set_is_new(nb_addr_set) ||
+nbrec_address_set_is_deleted(nb_addr_set)) {
+return false;
+}
+}
+
+struct ovsdb_idl_index *sbrec_address_set_by_name =
+engine_ovsdb_node_get_index(
+engine_get_input("SB_address_set", node),
+"sbrec_address_set_by_name");
+
+NBREC_ADDRESS_SET_TABLE_FOR_EACH_TRACKED (nb_addr_set,
+  nb_address_set_table) {
+const struct sbrec_address_set *sb_addr_set =
+sb_address_set_lookup_by_name(sbrec_address_set_by_name,
+  nb_addr_set->name);
+if (!sb_addr_set) {
+return false;
+}
+update_sb_addr_set((const char **) nb_addr_set->addresses,
+   nb_addr_set->n_addresses, sb_addr_set);
+}
+
+return true;
+}
+
+bool
+address_set_sync_nb_port_group_handler(struct engine_node *node OVS_UNUSED,
+   void *data OVS_UNUSED)
+{
+const struct nbrec_port_group *nb_pg;
+const struct nbrec_port_group_table *nb_port_group_table =
+EN_OVSDB_GET(engine_get_input("NB_port_group", node));
+NBREC_PORT_GROUP_TABLE_FOR_EACH_TRACKED (nb_pg, nb_port_group_table) {
+if (nbrec_port_group_is_new(nb_pg) ||
+nbrec_port_group_is_deleted(nb_pg)) {
+return false;
+}
+}
+
+struct ovsdb_idl_index *sbrec_address_set_by_name =
+engine_ovsdb_node_get_index(
+engine_get_input("SB_address_set", node),
+"sbrec_address_set_by_name");
+NBREC_PORT_GROUP_TABLE_FOR_EACH_TRACKED (nb_pg, nb_port_group_table) {
+char *ipv4_addrs_name = xasprintf("%s_ip4", nb_pg->name);
+const struct sbrec_address_set *sb_addr_set_v4 =
+sb_address_set_lookup_by_name(sbrec_address_set_by_name,
+  ipv4_addrs_name);
+if (!sb_addr_set_v4) {
+free(ipv4_addrs_name);
+return false;
+}
+char *ipv6_addrs_name = xasprintf("%s_ip6", nb_pg->name);
+const struct sbrec_address_set *sb_addr_set_v6 =
+sb_address_set_lookup_by_name(sbrec_address_set_by_name,
+  ipv6_addrs_name);
+if (!sb_addr_set_v6) {
+free(ipv4_addrs_name);
+  

[ovs-dev] [PATCH v2] rhel: move conf.db to /var/lib/openvswitch, using symlinks

2022-11-14 Thread Timothy Redaelli
conf.db is by default at /etc/openvswitch, but it should be at
/var/lib/openvswitch like on Debian or like ovnnb_db.db and ovnsb_db.db.

If conf.db already exists in /etc/openvswitch then it's moved to
/var/lib/openvswitch.
Symlinks are created for conf.db and .conf.db.~lock~ into /etc/openvswitch
for backward compatibility.

Reported-at: https://bugzilla.redhat.com/1830857
Reported-by: Yedidyah Bar David 
Signed-off-by: Timothy Redaelli 
---
v1 -> v2:
- Use hugetlbfs group instead of openvswitch when the package is built
  with dpdk (as reported by Flavio)
---
 rhel/openvswitch-fedora.spec.in | 27 +++
 1 file changed, 23 insertions(+), 4 deletions(-)

diff --git a/rhel/openvswitch-fedora.spec.in b/rhel/openvswitch-fedora.spec.in
index df296ea08..c8d238fbc 100644
--- a/rhel/openvswitch-fedora.spec.in
+++ b/rhel/openvswitch-fedora.spec.in
@@ -239,8 +239,6 @@ rm -rf $RPM_BUILD_ROOT/%{_datadir}/openvswitch/python/
 
 install -d -m 0755 $RPM_BUILD_ROOT/%{_sharedstatedir}/openvswitch
 
-touch $RPM_BUILD_ROOT%{_sysconfdir}/openvswitch/conf.db
-touch $RPM_BUILD_ROOT%{_sysconfdir}/openvswitch/.conf.db.~lock~
 touch $RPM_BUILD_ROOT%{_sysconfdir}/openvswitch/system-id.conf
 
 install -p -m 644 -D selinux/openvswitch-custom.pp \
@@ -329,6 +327,27 @@ if [ $1 -eq 1 ]; then
 fi
 %endif
 
+# Ensure that /etc/openvswitch/conf.db links to /var/lib/openvswitch,
+# moving an existing file if there is one.
+#
+# Ditto for .conf.db.~lock~.
+for base in conf.db .conf.db.~lock~; do
+new=/var/lib/openvswitch/$base
+old=/etc/openvswitch/$base
+if test -f $old && test ! -e $new; then
+mv $old $new
+fi
+if test ! -e $old && test ! -h $old; then
+ln -s $new $old
+fi
+touch $new
+%if %{with dpdk}
+chown openvswitch:hugetlbfs $new
+%else
+chown openvswitch:openvswitch $new
+%endif
+done
+
 %if 0%{?systemd_post:1}
 # This may not enable openvswitch service or do daemon-reload.
 %systemd_post %{name}.service
@@ -414,8 +433,8 @@ fi
 %endif
 %dir %{_sysconfdir}/openvswitch
 %{_sysconfdir}/openvswitch/default.conf
-%config %ghost %{_sysconfdir}/openvswitch/conf.db
-%ghost %{_sysconfdir}/openvswitch/.conf.db.~lock~
+%config %ghost %{_sharedstatedir}/openvswitch/conf.db
+%ghost %{_sharedstatedir}/openvswitch/.conf.db.~lock~
 %config %ghost %{_sysconfdir}/openvswitch/system-id.conf
 %config(noreplace) %{_sysconfdir}/sysconfig/openvswitch
 %defattr(-,root,root)
-- 
2.38.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH ovn] northd: bypass connection tracking for stateless flows when there are LB flows present

2022-11-14 Thread Numan Siddique
On Wed, Nov 9, 2022 at 4:15 PM Numan Siddique  wrote:
>
> On Wed, Nov 9, 2022 at 4:11 PM Han Zhou  wrote:
> >
> > On Tue, Nov 8, 2022 at 7:51 AM venu.iyer  wrote:
> > >
> > > Currently, even stateless flows are subject to connection tracking when
> > there are
> > > LB rules (for DNAT). However, if a flow needs to be subjected to LB, then
> > it shouldn't
> > > be configured as stateless.
> > >
> > > A stateless flow means we should not track it, and this change exempts
> > stateless
> > > flows from being tracked regardless of whether LB rules are present or
> > not.
> > >
> > > Signed-off-by: venu.iyer 
> > > Acked-by: Han Zhou 

I had a quick look and it looks ok to me.
Can you please add system tests to cover some scenarios for this use
case to avoid future regressions ?

Thanks
Numan




> > > ---
> > >  northd/northd.c | 24 +-
> > >  northd/ovn-northd.8.xml | 56 ++---
> > >  ovn-nb.xml  |  3 +++
> > >  tests/ovn-northd.at | 48 ---
> > >  tests/ovn.at|  4 +--
> > >  5 files changed, 69 insertions(+), 66 deletions(-)
> > >
> > > diff --git a/northd/northd.c b/northd/northd.c
> > > index b7388afc5..da4beede6 100644
> > > --- a/northd/northd.c
> > > +++ b/northd/northd.c
> > > @@ -137,8 +137,8 @@ enum ovn_stage {
> > >  PIPELINE_STAGE(SWITCH, IN,  L2_UNKNOWN,24, "ls_in_l2_unknown")
> >  \
> > >
> >  \
> > >  /* Logical switch egress stages. */
> >   \
> > > -PIPELINE_STAGE(SWITCH, OUT, PRE_LB,   0, "ls_out_pre_lb")
> >   \
> > > -PIPELINE_STAGE(SWITCH, OUT, PRE_ACL,  1, "ls_out_pre_acl")
> >  \
> > > +PIPELINE_STAGE(SWITCH, OUT, PRE_ACL,  0, "ls_out_pre_acl")
> >  \
> > > +PIPELINE_STAGE(SWITCH, OUT, PRE_LB,   1, "ls_out_pre_lb")
> >   \
> > >  PIPELINE_STAGE(SWITCH, OUT, PRE_STATEFUL, 2, "ls_out_pre_stateful")
> >   \
> > >  PIPELINE_STAGE(SWITCH, OUT, ACL_HINT, 3, "ls_out_acl_hint")
> >   \
> > >  PIPELINE_STAGE(SWITCH, OUT, ACL,  4, "ls_out_acl")
> >  \
> > > @@ -210,6 +210,7 @@ enum ovn_stage {
> > >  #define REGBIT_ACL_LABEL  "reg0[13]"
> > >  #define REGBIT_FROM_RAMP  "reg0[14]"
> > >  #define REGBIT_PORT_SEC_DROP  "reg0[15]"
> > > +#define REGBIT_ACL_STATELESS  "reg0[16]"
> > >
> > >  #define REG_ORIG_DIP_IPV4 "reg1"
> > >  #define REG_ORIG_DIP_IPV6 "xxreg1"
> > > @@ -271,7 +272,7 @@ enum ovn_stage {
> > >   * | R0 | REGBIT_{CONNTRACK/DHCP/DNS}  |   |
> >  |
> > >   * || REGBIT_{HAIRPIN/HAIRPIN_REPLY}   |   |
> >  |
> > >   * || REGBIT_ACL_HINT_{ALLOW_NEW/ALLOW/DROP/BLOCK} |   |
> >  |
> > > - * || REGBIT_ACL_LABEL | X |
> >  |
> > > + * || REGBIT_ACL_{LABEL/STATELESS} | X |
> >  |
> > >   * ++--+ X |
> >  |
> > >   * | R1 | ORIG_DIP_IPV4 (>= IN_PRE_STATEFUL)   | R |
> >  |
> > >   * ++--+ E |
> >  |
> > > @@ -5677,17 +5678,18 @@ build_stateless_filter(struct ovn_datapath *od,
> > > const struct nbrec_acl *acl,
> > > struct hmap *lflows)
> > >  {
> > > +const char *action = REGBIT_ACL_STATELESS" = 1; next;";
> > >  if (!strcmp(acl->direction, "from-lport")) {
> > >  ovn_lflow_add_with_hint(lflows, od, S_SWITCH_IN_PRE_ACL,
> > >  acl->priority + OVN_ACL_PRI_OFFSET,
> > >  acl->match,
> > > -"next;",
> > > +action,
> > >  >header_);
> > >  } else {
> > >  ovn_lflow_add_with_hint(lflows, od, S_SWITCH_OUT_PRE_ACL,
> > >  acl->priority + OVN_ACL_PRI_OFFSET,
> > >  acl->match,
> > > -"next;",
> > > +action,
> > >  >header_);
> > >  }
> > >  }
> > > @@ -5779,6 +5781,10 @@ build_pre_acls(struct ovn_datapath *od, const
> > struct hmap *port_groups,
> > >REGBIT_CONNTRACK_DEFRAG" = 1; next;");
> > >  ovn_lflow_add(lflows, od, S_SWITCH_OUT_PRE_ACL, 100, "ip",
> > >REGBIT_CONNTRACK_DEFRAG" = 1; next;");
> > > +} else if (od->has_lb_vip) {
> > > +/* We'll build stateless filters if there are LB rules so that
> > > + * the stateless flows are not tracked in pre-lb. */
> > > +build_stateless_filters(od, port_groups, lflows);
> > >  }
> > >  }
> > >
> > > @@ -5913,6 +5919,11 @@ build_pre_lb(struct ovn_datapath *od, const struct
> > shash *meter_groups,
> > >   S_SWITCH_IN_PRE_LB, S_SWITCH_OUT_PRE_LB,
> > >   110, lflows);
> > >  

[ovs-dev] [PATCH ovn v2 2/2] northd: Add I-P for syncing SB address sets.

2022-11-14 Thread numans
From: Numan Siddique 

Updates to NB address sets and NB port groups are handled
incrementally for syncing the SB address sets.  This patch
doesn't support syncing the SB Address sets for the router
load balancer vips incrementally, instead a full recompute is
triggered for any changes to NB load balancers, NB load balancer
groups and NB logical routers.

Signed-off-by: Numan Siddique 
---
 northd/en-sb-sync.c  | 202 ---
 northd/en-sb-sync.h  |   6 ++
 northd/inc-proc-northd.c |  18 +++-
 tests/ovn-northd.at  |  52 ++
 4 files changed, 260 insertions(+), 18 deletions(-)

diff --git a/northd/en-sb-sync.c b/northd/en-sb-sync.c
index c3ba315df..8a17998ec 100644
--- a/northd/en-sb-sync.c
+++ b/northd/en-sb-sync.c
@@ -22,6 +22,7 @@
 #include "openvswitch/util.h"
 
 #include "en-sb-sync.h"
+#include "include/ovn/expr.h"
 #include "lib/inc-proc-eng.h"
 #include "lib/lb.h"
 #include "lib/ovn-nb-idl.h"
@@ -41,6 +42,13 @@ static void sync_address_sets(const struct 
nbrec_address_set_table *,
   const struct sbrec_address_set_table *,
   struct ovsdb_idl_txn *ovnsb_txn,
   struct hmap *datapaths);
+static const struct sbrec_address_set *sb_address_set_lookup_by_name(
+struct ovsdb_idl_index *, const char *name);
+static void update_sb_addr_set(const char **nb_addresses, size_t n_addresses,
+   const struct sbrec_address_set *);
+static void build_port_group_address_set(const struct nbrec_port_group *,
+ struct svec *ipv4_addrs,
+ struct svec *ipv6_addrs);
 
 void *
 en_sb_sync_init(struct engine_node *node OVS_UNUSED,
@@ -94,6 +102,98 @@ en_address_set_sync_cleanup(void *data OVS_UNUSED)
 
 }
 
+bool
+address_set_sync_nb_address_set_handler(struct engine_node *node OVS_UNUSED,
+void *data OVS_UNUSED)
+{
+const struct nbrec_address_set_table *nb_address_set_table =
+EN_OVSDB_GET(engine_get_input("NB_address_set", node));
+
+/* Return false if an address set is created or deleted.
+ * Handle I-P for only updated address sets. */
+const struct nbrec_address_set *nb_addr_set;
+NBREC_ADDRESS_SET_TABLE_FOR_EACH_TRACKED (nb_addr_set,
+  nb_address_set_table) {
+if (nbrec_address_set_is_new(nb_addr_set) ||
+nbrec_address_set_is_deleted(nb_addr_set)) {
+return false;
+}
+}
+
+struct ovsdb_idl_index *sbrec_address_set_by_name =
+engine_ovsdb_node_get_index(
+engine_get_input("SB_address_set", node),
+"sbrec_address_set_by_name");
+
+NBREC_ADDRESS_SET_TABLE_FOR_EACH_TRACKED (nb_addr_set,
+  nb_address_set_table) {
+const struct sbrec_address_set *sb_addr_set =
+sb_address_set_lookup_by_name(sbrec_address_set_by_name,
+  nb_addr_set->name);
+if (!sb_addr_set) {
+return false;
+}
+update_sb_addr_set((const char **) nb_addr_set->addresses,
+   nb_addr_set->n_addresses, sb_addr_set);
+}
+
+return true;
+}
+
+bool
+address_set_sync_nb_port_group_handler(struct engine_node *node OVS_UNUSED,
+   void *data OVS_UNUSED)
+{
+const struct nbrec_port_group *nb_pg;
+const struct nbrec_port_group_table *nb_port_group_table =
+EN_OVSDB_GET(engine_get_input("NB_port_group", node));
+NBREC_PORT_GROUP_TABLE_FOR_EACH_TRACKED (nb_pg, nb_port_group_table) {
+if (nbrec_port_group_is_new(nb_pg) ||
+nbrec_port_group_is_deleted(nb_pg)) {
+return false;
+}
+}
+
+struct ovsdb_idl_index *sbrec_address_set_by_name =
+engine_ovsdb_node_get_index(
+engine_get_input("SB_address_set", node),
+"sbrec_address_set_by_name");
+NBREC_PORT_GROUP_TABLE_FOR_EACH_TRACKED (nb_pg, nb_port_group_table) {
+char *ipv4_addrs_name = xasprintf("%s_ip4", nb_pg->name);
+const struct sbrec_address_set *sb_addr_set_v4 =
+sb_address_set_lookup_by_name(sbrec_address_set_by_name,
+  ipv4_addrs_name);
+if (!sb_addr_set_v4) {
+free(ipv4_addrs_name);
+return false;
+}
+char *ipv6_addrs_name = xasprintf("%s_ip6", nb_pg->name);
+const struct sbrec_address_set *sb_addr_set_v6 =
+sb_address_set_lookup_by_name(sbrec_address_set_by_name,
+  ipv6_addrs_name);
+if (!sb_addr_set_v6) {
+free(ipv4_addrs_name);
+free(ipv6_addrs_name);
+return false;
+}
+
+struct svec ipv4_addrs = 

[ovs-dev] [PATCH ovn v2 1/2] northd IP: Add a new engine node 'en_sb_sync' to sync SB tables.

2022-11-14 Thread numans
From: Numan Siddique 

A sub-engine node 'en_address_set_sync' is added with-in the
'en_sb_sync' node to sync the Address_Set table in the
SB database.  To start with, it falls back to full recompute
all the time.  Upcoming patch will add the incremental processing
support to sync the SB Address_Set table.

'en_sb_sync' engine node can be enhanced further to sync other
SB tables like - Port_Group, DHCP_Options, DNS etc.

Signed-off-by: Numan Siddique 
---
 lib/ovn-util.c|  30 +
 lib/ovn-util.h|   3 +
 northd/automake.mk|   4 +
 northd/en-northd-output.c |  57 ++
 northd/en-northd-output.h |  17 +++
 northd/en-sb-sync.c   | 230 ++
 northd/en-sb-sync.h   |  14 +++
 northd/inc-proc-northd.c  |  30 -
 northd/northd.c   | 173 ++--
 northd/northd.h   |   1 +
 10 files changed, 394 insertions(+), 165 deletions(-)
 create mode 100644 northd/en-northd-output.c
 create mode 100644 northd/en-northd-output.h
 create mode 100644 northd/en-sb-sync.c
 create mode 100644 northd/en-sb-sync.h

diff --git a/lib/ovn-util.c b/lib/ovn-util.c
index 5dca72714..4f939f460 100644
--- a/lib/ovn-util.c
+++ b/lib/ovn-util.c
@@ -938,3 +938,33 @@ daemon_started_recently(void)
 /* Ensure that at least an amount of time has passed. */
 return time_wall_msec() - startup_ts <= DAEMON_STARTUP_DELAY_MS;
 }
+
+/* Builds a unique address set compatible name ([a-zA-Z_.][a-zA-Z_.0-9]*)
+ * for the router's load balancer VIP address set, combining the logical
+ * router's datapath tunnel key and address family.
+ *
+ * Also prefixes the name with 'prefix'.
+ */
+static char *
+lr_lb_address_set_name_(uint32_t lr_tunnel_key, const char *prefix,
+int addr_family)
+{
+return xasprintf("%s_rtr_lb_%"PRIu32"_ip%s", prefix, lr_tunnel_key,
+ addr_family == AF_INET ? "4" : "6");
+}
+
+/* Builds the router's load balancer VIP address set name. */
+char *
+lr_lb_address_set_name(uint32_t lr_tunnel_key, int addr_family)
+{
+return lr_lb_address_set_name_(lr_tunnel_key, "", addr_family);
+}
+
+/* Builds a string that refers to the the router's load balancer VIP address
+ * set name, that is: $.
+ */
+char *
+lr_lb_address_set_ref(uint32_t lr_tunnel_key, int addr_family)
+{
+return lr_lb_address_set_name_(lr_tunnel_key, "$", addr_family);
+}
diff --git a/lib/ovn-util.h b/lib/ovn-util.h
index a1f1cf0ad..809ff1d36 100644
--- a/lib/ovn-util.h
+++ b/lib/ovn-util.h
@@ -315,4 +315,7 @@ void daemon_started_recently_ignore(void);
 bool daemon_started_recently(void);
 int64_t daemon_startup_ts(void);
 
+char *lr_lb_address_set_name(uint32_t lr_tunnel_key, int addr_family);
+char *lr_lb_address_set_ref(uint32_t lr_tunnel_key, int addr_family);
+
 #endif /* OVN_UTIL_H */
diff --git a/northd/automake.mk b/northd/automake.mk
index 81582867d..138b7b32e 100644
--- a/northd/automake.mk
+++ b/northd/automake.mk
@@ -10,6 +10,10 @@ northd_ovn_northd_SOURCES = \
northd/en-northd.h \
northd/en-lflow.c \
northd/en-lflow.h \
+   northd/en-northd-output.c \
+   northd/en-northd-output.h \
+   northd/en-sb-sync.c \
+   northd/en-sb-sync.h \
northd/inc-proc-northd.c \
northd/inc-proc-northd.h \
northd/ipam.c \
diff --git a/northd/en-northd-output.c b/northd/en-northd-output.c
new file mode 100644
index 0..0033d371e
--- /dev/null
+++ b/northd/en-northd-output.c
@@ -0,0 +1,57 @@
+/*
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at:
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+#include 
+
+#include 
+#include 
+#include 
+
+#include "openvswitch/util.h"
+
+#include "en-northd-output.h"
+#include "lib/inc-proc-eng.h"
+
+void *
+en_northd_output_init(struct engine_node *node OVS_UNUSED,
+  struct engine_arg *arg OVS_UNUSED)
+{
+return NULL;
+}
+
+void
+en_northd_output_run(struct engine_node *node, void *data OVS_UNUSED)
+{
+engine_set_node_state(node, EN_UPDATED);
+}
+
+void
+en_northd_output_cleanup(void *data OVS_UNUSED)
+{
+
+}
+
+bool
+northd_output_sb_sync_handler(struct engine_node *node, void *data OVS_UNUSED)
+{
+engine_set_node_state(node, EN_UPDATED);
+return true;
+}
+
+bool
+northd_output_lflow_handler(struct engine_node *node, void *data OVS_UNUSED)
+{
+engine_set_node_state(node, EN_UPDATED);
+return true;
+}
diff --git a/northd/en-northd-output.h b/northd/en-northd-output.h
new file mode 100644

[ovs-dev] [PATCH ovn v2 0/2] northd IP for address sets

2022-11-14 Thread numans
From: Numan Siddique 

This patch series adds incremental processing of address sets in
ovn-northd.

v1 -> v2
---
  - Added the test case in ovn-northd.at
  - Full recompute function of address sets will also use mutate instead
of setting all the addresses of the address set.


Numan Siddique (2):
  northd IP: Add a new engine node 'en_sb_sync' to sync SB tables.
  northd: Add I-P for syncing SB address sets.

 lib/ovn-util.c|  30 +++
 lib/ovn-util.h|   3 +
 northd/automake.mk|   4 +
 northd/en-northd-output.c |  57 ++
 northd/en-northd-output.h |  17 ++
 northd/en-sb-sync.c   | 402 ++
 northd/en-sb-sync.h   |  20 ++
 northd/inc-proc-northd.c  |  42 +++-
 northd/northd.c   | 173 +---
 northd/northd.h   |   1 +
 tests/ovn-northd.at   |  52 +
 11 files changed, 636 insertions(+), 165 deletions(-)
 create mode 100644 northd/en-northd-output.c
 create mode 100644 northd/en-northd-output.h
 create mode 100644 northd/en-sb-sync.c
 create mode 100644 northd/en-sb-sync.h

-- 
2.38.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH] vswitchd: Publish per iface received multicast packets.

2022-11-14 Thread Mike Pattrick
On Wed, Nov 9, 2022 at 3:32 PM David Marchand  wrote:
>
> The count of received multicast packets has been computed internally,
> but not exposed to ovsdb. Fix this.
>
> Signed-off-by: David Marchand 

Acked-by: Mike Pattrick 

> ---
> Strictly speaking, nothing was broken so I put no Fixes: tag.
> However, this is probably something we should backport to enhance
> exposed stats.
>
> ---
>  vswitchd/bridge.c | 1 +
>  1 file changed, 1 insertion(+)
>

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [RFC PATCH ovn] northd, controller: Commit flows dropped by ACLs in a separate CT zone

2022-11-14 Thread Adrian Moreno

Hi,

On 10/20/22 15:49, Abhiram Sangana wrote:

Hi Dumitru,

Thanks for reviewing the patch.


On 19 Oct 2022, at 14:09, Dumitru Ceara  wrote:

Hi Abhiram,

Thanks for the patch!  I only skimmed the changes so this is not a full
review but more of a discussion starter.

On 10/18/22 17:33, Abhiram Sangana wrote:

To identify connections dropped by ACLs, users can enable logging for ACLs
but this approach does not scale. ACL logging uses "controller" action
which causes a significant spike in the CPU usage of ovs-vswitchd (and
ovn-controller to a lesser extent) even with metering enabled (observed
65% ovs-vswitchd CPU usage for logging 1000 packets per second). Another
approach is to use drop sampling (patch by Adrian Moreno currently in
review) but we might miss specific connections of interest with this
approach.

This patch commits connections dropped by ACLs to the connection tracking
table with a specific ACL label that was introduced in 0e0228be (
northd: Add ACL label). The dropped connections are committed in a separate
CT zone so that they can be managed independently.



I'm not sure I understand how the CMS can manage this.  How is this
better than sampling?  Committed connections are going to time out at
some point (30 sec by default for udp/icmp with the kernel datapath).
So the CMS will have to continuously monitor the contents of the
conntrack zone?  Aren't we just moving the CPU load somewhere else with
this?  Even so, there's a chance an entry is missed.


Linux nf_conntrack module supports sending connection tracking events
to userspace via ctnetlink ("net.netfilter.nf_conntrack_events" kernel
parameter). So, CMS can parse the stream of new connection events from
conntrack and log the packets based on CT label.



Isn't this datapath-specific? this won't be available for the netdev datapath, 
right?



An issue with sampling is that if there are a large number of packets
for a particular connection(s), packets of other connections might not
get sampled and we miss information about these connections. With the
conntrack approach, we get a single event for each connection (until
they time out), so there is lesser load on the CMS/collector and lesser
likelihood of missing connections.


I don't see why sampling would inherently miss packets. The current RFC for ACL 
sampling (different from the generic drop sampling one) allows the user to 
specify a probability per ACL so 100% of packets can be sampled in connection 
establishment drops while we sample N% of accepted traffic having a lot of 
flexibility in the inevitable performance vs accuracy tradeoff.


I'd like to better understand what are the limitations of the current approach 
to see if it can be improved in any way.


Thanks,
--
Adrián






Each logical port is assigned a new zone for committing dropped flows.
The zone is loaded into register MFF_LOG_ACL_DROP_ZONE.

A new lflow action "ct_commit_drop" is introduced that commits flows
to connection tracking table in a zone identified by
MFF_LOG_ACL_DROP_ZONE register.

An ACL with "drop" action and non-empty label is translated to "ct_commit_drop"
instead of silently dropping the packet.

Signed-off-by: Abhiram Sangana 
---
controller/ovn-controller.c  | 23 ++---
controller/physical.c| 32 +--
include/ovn/actions.h|  1 +
include/ovn/logical-fields.h |  1 +
lib/actions.c| 50 
lib/ovn-util.c   |  4 +--
lib/ovn-util.h   |  2 +-
northd/northd.c  | 14 --
northd/ovn-northd.8.xml  | 14 --
ovn-sb.xml   | 17 
ovs  |  2 +-


This shouldn't need to change the OVS submodule.


My bad. Will fix this.

Thanks,
Abhiram Sangana


Thanks,
Dumitru


utilities/ovn-nbctl.c|  7 ++---
12 files changed, 151 insertions(+), 16 deletions(-)

diff --git a/controller/ovn-controller.c b/controller/ovn-controller.c
index c97744d57..1ad20fe55 100644
--- a/controller/ovn-controller.c
+++ b/controller/ovn-controller.c
@@ -660,8 +660,15 @@ update_ct_zones(const struct shash *binding_lports,
 unsigned long unreq_snat_zones[BITMAP_N_LONGS(MAX_CT_ZONES)];

 struct shash_node *shash_node;
+const struct binding_lport *lport;
 SHASH_FOR_EACH (shash_node, binding_lports) {
 sset_add(_users, shash_node->name);
+
+/* Zone for committing dropped connections of a vNIC. */
+lport = shash_node->data;
+char *drop_zone = alloc_ct_zone_key(>pb->header_.uuid, "drop");
+sset_add(_users, drop_zone);
+free(drop_zone);
 }

 /* Local patched datapath (gateway routers) need zones assigned. */
@@ -670,8 +677,8 @@ update_ct_zones(const struct shash *binding_lports,
 HMAP_FOR_EACH (ld, hmap_node, local_datapaths) {
 /* XXX Add method to limit zone assignment to logical router
  * datapaths with NAT */
-char *dnat = 

Re: [ovs-dev] [PATCH v3] netdev-dpdk: add control plane protection support

2022-11-14 Thread Robin Jarry
Hi Kevin,

Kevin Traynor, Nov 11, 2022 at 19:15:
> > +reta_conf_size = (dev->reta_size / RTE_ETH_RETA_GROUP_SIZE)
> > +* sizeof(struct rte_eth_rss_reta_entry64);
>
> In dpdk_eth_dev_init, we get reta_size from driver,
>
> mlx5_ethdev.c
>
> 333├>info->reta_size = priv->reta_idx_n ?
> 334│ priv->reta_idx_n : config->ind_table_max_size;
>
> (gdb) p priv->reta_idx_n
> $5 = 1
> (gdb) p config->ind_table_max_size
> $6 = 512
>
> and store:
> dev->reta_size = info.reta_size;
>
> Now we use it,
> dev->reta_size = 1 / RTE_ETH_RETA_GROUP_SIZE   (64)
> but it results in reta_conf_size = 0
>
> > +reta_conf = xmalloc(reta_conf_size);
>
> xmalloc only allocates 1 byte (void *p = malloc(size ? size : 1);)

Hmm indeed this is bad :)

Since reworking the RSS fallback I may not have re-tested with CX-5,
only with Intel NICs.

I am quite surprised of this return value. Could it be an issue with the
mlx5 dpdk driver?

Thanks for testing!

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH ovn] tests: Fix flaky test "IPv6 Neighbor Solicitation for unknown MAC"

2022-11-14 Thread Xavier Simonart
Fixes: 9ac548524218 ("pinctrl: Send RARPs for external ipv6 interfaces")

Signed-off-by: Xavier Simonart 
---
 tests/ovn.at | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/tests/ovn.at b/tests/ovn.at
index 6552681bd..3131b8d02 100644
--- a/tests/ovn.at
+++ b/tests/ovn.at
@@ -13954,8 +13954,9 @@ AT_CHECK([cat 2.packets | cut -c 117-], [0], [expout])
 
 # Check that NS packets are not flooded across routing domains. That means
 # that hv2 should not send any packets across the physical network.
+# Use the grep here to filter out rarp packets that might have arrived
 $PYTHON "$ovs_srcdir/utilities/ovs-pcap.in" hv2/br-phys_n1-tx.pcap | \
-trim_zeros > 2.packets
+grep -v  | trim_zeros > 2.packets
 AT_CHECK([cat 2.packets], [0], [])
 
 # Now send a packet with destination ip other than
@@ -13995,8 +13996,9 @@ AT_CHECK([cat 2.packets | cut -c 117-], [0], [expout])
 
 # Check that NS packets are not flooded across routing domains. That means
 # that hv2 should not send any packets across the physical network.
+# Use the grep here to filter out rarp packets that might have arrived
 $PYTHON "$ovs_srcdir/utilities/ovs-pcap.in" hv2/br-phys_n1-tx.pcap | \
-trim_zeros > 2.packets
+grep -v  | trim_zeros > 2.packets
 AT_CHECK([cat 2.packets], [0], [])
 
 OVN_CLEANUP([hv1])
-- 
2.31.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH] debian, rhel: Enable Werror option in spec files

2022-11-14 Thread Simon Horman
On Mon, Nov 14, 2022 at 10:46:40AM +, Eli Britstein wrote:
> 
> 
> >-Original Message-
> >From: Simon Horman 
> >Sent: Monday, 14 November 2022 12:44
> >To: Eli Britstein 
> >Cc: d...@openvswitch.org; Ilya Maximets ; Simon
> >Horman ; Salem Sol 
> >Subject: Re: [PATCH] debian, rhel: Enable Werror option in spec files
> >
> >External email: Use caution opening links or attachments
> >
> >
> >On Sun, Nov 13, 2022 at 04:46:23PM +0200, Eli Britstein wrote:
> >> Following resolving DPDK cast align warnings as stated in [1], enforce
> >> -Werror for RPM builds too.
> >>
> >> [1] 0b6d2faace76 ("ci: Remove -Wno-cast-align from CI.")
> >>
> >> Signed-off-by: Eli Britstein 
> >
> >Reviewed-by: Simon Horman 
> >
> >I'll let this sit a little longer for review to accumulate before applying.
> Actually, I see CI fails with it, as there are some warnings that are now 
> discovered. We need to fix them first.

Ok, lets drop this for now.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH] debian, rhel: Enable Werror option in spec files

2022-11-14 Thread Eli Britstein via dev



>-Original Message-
>From: Simon Horman 
>Sent: Monday, 14 November 2022 12:44
>To: Eli Britstein 
>Cc: d...@openvswitch.org; Ilya Maximets ; Simon
>Horman ; Salem Sol 
>Subject: Re: [PATCH] debian, rhel: Enable Werror option in spec files
>
>External email: Use caution opening links or attachments
>
>
>On Sun, Nov 13, 2022 at 04:46:23PM +0200, Eli Britstein wrote:
>> Following resolving DPDK cast align warnings as stated in [1], enforce
>> -Werror for RPM builds too.
>>
>> [1] 0b6d2faace76 ("ci: Remove -Wno-cast-align from CI.")
>>
>> Signed-off-by: Eli Britstein 
>
>Reviewed-by: Simon Horman 
>
>I'll let this sit a little longer for review to accumulate before applying.
Actually, I see CI fails with it, as there are some warnings that are now 
discovered. We need to fix them first.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH] debian, rhel: Enable Werror option in spec files

2022-11-14 Thread Simon Horman
On Sun, Nov 13, 2022 at 04:46:23PM +0200, Eli Britstein wrote:
> Following resolving DPDK cast align warnings as stated in [1], enforce
> -Werror for RPM builds too.
> 
> [1] 0b6d2faace76 ("ci: Remove -Wno-cast-align from CI.")
> 
> Signed-off-by: Eli Britstein 

Reviewed-by: Simon Horman 

I'll let this sit a little longer for review to accumulate before
applying.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v3] netdev-dpdk: add control plane protection support

2022-11-14 Thread Kevin Traynor

Fixing To: to add back in OVS ML.

+cc Ori, as mlx5 operation discussed below.

On 11/11/2022 18:15, Kevin Traynor wrote:

Hi Robin,

On 21/10/2022 15:53, Robin Jarry wrote:

Some control protocols are used to maintain link status between
forwarding engines (e.g. LACP). When the system is not sized properly,
the PMD threads may not be able to process all incoming traffic from the
configured Rx queues. When a signaling packet of such protocols is
dropped, it can cause link flapping, worsening the situation.

Use the RTE flow API to redirect these protocols into a dedicated Rx
queue. The assumption is made that the ratio between control protocol
traffic and user data traffic is very low and thus this dedicated Rx
queue will never get full. The RSS redirection table is re-programmed to
only use the other Rx queues. The RSS table size is stored in the
netdev_dpdk structure at port initialization to avoid requesting the
information again when changing the port configuration.

The additional Rx queue will be assigned a PMD core like any other Rx
queue. Polling that extra queue may introduce increased latency and
a slight performance penalty at the benefit of preventing link flapping.

This feature must be enabled per port on specific protocols via the
cp-protection option. This option takes a coma-separated list of
protocol names. It is only supported on ethernet ports.

If the user has already configured multiple Rx queues on the port, an
additional one will be allocated for control plane packets. If the
hardware cannot satisfy the requested number of requested Rx queues, the
last Rx queue will be assigned for control plane. If only one Rx queue
is available, the cp-protection feature will be disabled. If the
hardware does not support the RTE flow matchers/actions, the feature
will be disabled.

It cannot be enabled when other_config:hw-offload=true as it may
conflict with the offloaded RTE flows. Similarly, if hw-offload is
enabled while some ports already have cp-protection enabled, the RTE
flow offloading will be disabled on these ports.

Example use:

   ovs-vsctl add-bond br-phy bond0 phy0 phy1 -- \
 set interface phy0 type=dpdk options:dpdk-devargs=:ca:00.0 -- \
 set interface phy0 options:cp-protection=lacp -- \
 set interface phy1 type=dpdk options:dpdk-devargs=:ca:00.1 -- \
 set interface phy1 options:cp-protection=lacp

As a starting point, only one protocol is supported: LACP. Other
protocols can be added in the future. NIC compatibility should be
checked.

To validate that this works as intended, I used a traffic generator to
generate random traffic slightly above the machine capacity at line rate
on a two ports bond interface. OVS is configured to receive traffic on
two VLANs and pop/push them in a br-int bridge based on tags set on
patch ports.

 +--+
 | DUT  |
 |++|
 ||   br-int   || default flow, action=NORMAL
 ||||
 || patch10patch11 ||
 |+---|---|+|
 ||   | |
 |+---|---|+|
 || patch00patch01 ||
 ||  tag:10tag:20  ||
 ||||
 ||   br-phy   || default flow, action=NORMAL
 ||||
 ||   bond0|| balance-slb, lacp=passive, lacp-time=fast
 ||phy0   phy1 ||
 |+--|-|---+|
 +---|-|+
 | |
 +---|-|+
 | port0  port1 | balance L3/L4, lacp=active, lacp-time=fast
 | lag  | mode trunk VLANs 10, 20
 |  |
 |switch|
 |  |
 |  vlan 10vlan 20  |  mode access
 |   port2  port3   |
 +-|--|-+
   |  |
 +-|--|-+
 |   port0  port1   |  Random traffic that is properly balanced
 |  |  across the bond ports in both directions.
 |  traffic generator   |
 +--+

Without cp-protection, the bond0 links are randomly switching to
"defaulted" when one of the LACP packets sent by the switch is dropped
because the RX queues are full and the PMD threads did not process them
fast enough. When that happens, all traffic must go through a single
link which causes above line rate traffic to be dropped.

When cp-protection is enabled, no LACP packet is dropped and the bond
links remain enabled at all times, maximizing the throughput.

This feature may be considered as "QoS". However, it does not work by
limiting the rate of traffic explicitly. It only guarantees that some
protocols have a lower chance of being dropped because the PMD cores
cannot keep up with regular traffic.

The choice of protocols is limited on purpose. This is not meant to be
configurable by users. Some limited configurability could be considered
in the future but it would expose to more 

[ovs-dev] [PATCH ovn v5] ovn-controller: Fixed missing flows after interface deletion

2022-11-14 Thread Xavier Simonart
In the following scenario:
- interface "old" is created and external_ids:iface-id is set (to lp)
- interface "new" is created and external_ids:iface-id is set (to same lp)
- interface "old" is deleted
flows related to lp were deleted.

Note that after "new" interface is created, flows use "new" ofport.
The state where old and new interfaces have both external_ids:iface-id set at
the same time is "invalid", and all flows are not installed for lpold.

Fixes: 3ae8470edc64 ("I-P: Handle runtime data changes for pflow_output 
engine.")
Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=2129866

Signed-off-by: Xavier Simonart 

---
v2: - Use bool instead of int for binding count to better reflect only
  one additional binding is supported.
- Fix use after free.
- Remove debug logging from test case
v3: - Based on Dumitru's and Mark's feedback:
  - Support any number of interfaces bound to the same port
  - Use recomputes to make code simpler (this is corner case)
  - Added test case using three interfaces bound to te same port
v4: - Updated based on Ales' feedback
- Also support scenario for port-types different than localport
- Added test case for VIF port
- Rebased on latest main
v5: - Updated test case based on Numan/Dumitru's feedback (hit ofport = 0)
  and added comments in code.
- Rebased on latest main.
---
 controller/binding.c |  36 ++
 controller/binding.h |   1 +
 tests/ovn.at | 168 +++
 3 files changed, 205 insertions(+)

diff --git a/controller/binding.c b/controller/binding.c
index c3d2b2e42..5f04b9a74 100644
--- a/controller/binding.c
+++ b/controller/binding.c
@@ -1866,6 +1866,7 @@ build_local_bindings(struct binding_ctx_in *b_ctx_in,
 lbinding = local_binding_create(iface_id, iface_rec);
 local_binding_add(local_bindings, lbinding);
 } else {
+lbinding->multiple_bindings = true;
 static struct vlog_rate_limit rl =
 VLOG_RATE_LIMIT_INIT(1, 5);
 VLOG_WARN_RL(
@@ -2156,6 +2157,10 @@ consider_iface_claim(const struct ovsrec_interface 
*iface_rec,
 lbinding = local_binding_create(iface_id, iface_rec);
 local_binding_add(local_bindings, lbinding);
 } else {
+if (lbinding->iface && lbinding->iface != iface_rec) {
+lbinding->multiple_bindings = true;
+b_ctx_out->local_lports_changed = true;
+}
 lbinding->iface = iface_rec;
 }
 
@@ -2174,6 +2179,13 @@ consider_iface_claim(const struct ovsrec_interface 
*iface_rec,
 return true;
 }
 
+/* If multiple bindings to the same port, remove the "old" binding.
+ * This ensures that change tracking is correct.
+ */
+if (lbinding->multiple_bindings) {
+remove_related_lport(pb, b_ctx_out);
+}
+
 enum en_lport_type lport_type = get_lport_type(pb);
 if (lport_type == LP_LOCALPORT) {
 return consider_localport(pb, b_ctx_in, b_ctx_out);
@@ -2226,6 +2238,29 @@ consider_iface_release(const struct ovsrec_interface 
*iface_rec,
 struct shash *binding_lports = _ctx_out->lbinding_data->lports;
 
 lbinding = local_binding_find(local_bindings, iface_id);
+
+if (lbinding) {
+if (lbinding->multiple_bindings) {
+VLOG_INFO("Multiple bindings for %s: force recompute to clean up",
+  iface_id);
+return false;
+} else {
+int64_t ofport = iface_rec->n_ofport ? *iface_rec->ofport : 0;
+if (lbinding->iface != iface_rec && !ofport) {
+/* If external_ids:iface-id is set within the same transaction
+ * as adding an interface to a bridge, ovn-controller is
+ * usually initially notified of ovs interface changes with
+ * ofport == 0. If the lport was bound to a different interface
+ * we do not want to release it.
+ */
+VLOG_DBG("Not releasing lport %s as %s was claimed "
+ "and %s was never bound)", iface_id, lbinding->iface ?
+ lbinding->iface->name : "", iface_rec->name);
+return true;
+}
+}
+}
+
 struct binding_lport *b_lport =
 local_binding_get_primary_or_localport_lport(lbinding);
 if (is_binding_lport_this_chassis(b_lport, b_ctx_in->chassis_rec)) {
@@ -3034,6 +3069,7 @@ local_binding_create(const char *name, const struct 
ovsrec_interface *iface)
 struct local_binding *lbinding = xzalloc(sizeof *lbinding);
 lbinding->name = xstrdup(name);
 lbinding->iface = iface;
+lbinding->multiple_bindings = false;
 ovs_list_init(>binding_lports);
 
 return lbinding;
diff --git a/controller/binding.h b/controller/binding.h
index ad959a9e6..6c3a98b02 100644
--- a/controller/binding.h
+++