Re: [ovs-dev] [PATCH v3] system-dpdk.at: Add warning log in mfex fuzzy test.

2022-02-09 Thread David Marchand
On Wed, Feb 9, 2022 at 11:11 AM Kumar Amber  wrote:
>
> Some specific warning are seen on various systems
> which may not be visible on others but good to add
> such logs to test to avoid test-case failure.
>
> Thw warning only effects the fuzzy tests due to
> more than 1000+ flows being offloading simultanously.
>
> Wilcarding flow count number as for different systems
> under test the number could vary in the warning log.
>
> Suggested-by: Eelco Chaudron 
> Signed-off-by: Kumar Amber 

Though I can't reproduce the issue, the patch lgtm.
Reviewed-by: David Marchand 


-- 
David Marchand

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v2 07/18] python: introduce OpenFlow Flow parsing

2022-02-09 Thread Adrian Moreno

Hi Mark

On 2/5/22 03:55, Mark Michelson wrote:

On 1/31/22 01:20, Adrian Moreno wrote:



On 1/28/22 22:18, Ilya Maximets wrote:

On 1/28/22 17:04, Adrian Moreno wrote:

Introduce OFPFlow class and all its decoders.

Most of the decoders are generic (from decoders.py). Some have special
syntax and need a specific implementation.

Decoders for nat are moved to the common decoders.py because it's syntax
is shared with other types of flows (e.g: dpif flows).

Signed-off-by: Adrian Moreno 
---
  python/automake.mk   |   2 +
  python/ovs/flows/decoders.py | 108 +
  python/ovs/flows/ofp.py  | 453 +++
  python/ovs/flows/ofp_act.py  | 243 +++
  4 files changed, 806 insertions(+)
  create mode 100644 python/ovs/flows/ofp.py
  create mode 100644 python/ovs/flows/ofp_act.py

diff --git a/python/automake.mk b/python/automake.mk
index b7debfbd9..7b6d6596f 100644
--- a/python/automake.mk
+++ b/python/automake.mk
@@ -31,6 +31,8 @@ ovs_pyfiles = \
  python/ovs/flows/flow.py \
  python/ovs/flows/kv.py \
  python/ovs/flows/list.py \
+    python/ovs/flows/ofp.py \
+    python/ovs/flows/ofp_act.py \
  python/ovs/json.py \
  python/ovs/jsonrpc.py \
  python/ovs/ovsuuid.py \
diff --git a/python/ovs/flows/decoders.py b/python/ovs/flows/decoders.py
index 2f8e5bd0a..1462b0b9d 100644
--- a/python/ovs/flows/decoders.py
+++ b/python/ovs/flows/decoders.py
@@ -6,6 +6,7 @@ object.
  """
  import netaddr
+import re
  class Decoder(object):
@@ -414,3 +415,110 @@ class IPMask(Decoder):
  def to_json(self):
  return str(self)
+
+
+def decode_free_output(value):
+    """The value of the output action can be found free, i.e: without the
+    'output' keyword. This decoder decodes its value when found this way."""
+    try:
+    return "output", {"port": int(value)}
+    except ValueError:
+    return "output", {"port": value.strip('"')}
+
+
+ipv4 = r"(?:\d{1,3}.?){3}\d{1,3}"
+ipv4_capture = r"({ipv4})".format(ipv4=ipv4)
+"""
+The following IPv6 regexp is a modified version of the one in:
+https://community.helpsystems.com/forums/intermapper/miscellaneous-topics/5acc4fcf-fa83-e511-80cf-0050568460e4?_ga=2.113564423.1432958022.1523882681-2146416484.1523557976 


+
+It matches all these types of ipv6 addresses:
+fe80::::0204:61ff:fe9d:f156
+fe80:0:0:0:204:61ff:fe9d:f156
+fe80::204:61ff:fe9d:f156
+fe80::::0204:61ff:254.157.241.86
+fe80:0:0:0:0204:61ff:254.157.241.86
+fe80::204:61ff:254.157.241.86
+::1
+2001::
+fe80::
+"""
+ipv6 = 
r"(?:(?:(?:[0-9A-Fa-f]{1,4}:){7}(?:[0-9A-Fa-f]{1,4}|:))|(?:(?:[0-9A-Fa-f]{1,4}:){6}(?::[0-9A-Fa-f]{1,4}|(?:(?:25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(?:\.(?:25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3})|:))|(?:(?:[0-9A-Fa-f]{1,4}:){5}(?:(?:(?::[0-9A-Fa-f]{1,4}){1,2})|:(?:(?:25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(?:\.(?:25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3})|:))|(?:(?:[0-9A-Fa-f]{1,4}:){4}(?:(?:(?::[0-9A-Fa-f]{1,4}){1,3})|(?:(?::[0-9A-Fa-f]{1,4})?:(?:(?:25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(?:\.(?:25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3}))|:))|(?:(?:[0-9A-Fa-f]{1,4}:){3}(?:(?:(?::[0-9A-Fa-f]{1,4}){1,4})|(?:(?::[0-9A-Fa-f]{1,4}){0,2}:(?:(?:25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(?:\.(?:25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3}))|:))|(?:(?:[0-9A-Fa-f]{1,4}:){2}(?:(?:(?::[0-9A-Fa-f]{1,4}){1,5})|(?:(?::[0-9A-Fa-f]{1,4}){0,3}:(?:(?:25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(?:\.(?:25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3}))|:))|(?:(?:[0-9A-Fa-f]{1,4}:){1}(?:(?:(?::[0-9A-Fa-f]{1,4}){1,6})|(?:(?::[0-9A-Fa-f]{1,4}){0,4}:(?:(?:25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(?:\.(?:25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3}))|:))|(?::(?:(?:(?::[0-9A-Fa-f]{1,4}){1,7})|(?:(?::[0-9A-Fa-f]{1,4}){0,5}:(?:(?:25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)(?:\.(?:25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)){3}))|:)))" 
# noqa: E501


This line seems to be corrupted the patch in patchwork.

In any case, can we not have this line in the patch?
It doesn't look maintainable.

Can we use something like 'ipaddress' module to parse
ip addresses instead?



I agree having this huge blob is not very maintainable, maybe we can roll back 
to the previous, more simplistic, regexp.


The goal of this regexp is not to parse the address but to extract it from the 
ip/port range where we can have [ip_start:ip_end]:port_start:port_end. After 
the substrings are extracted we use 'netaddr' module to extract it.
I think that regex is not a good choice here. There are too many variables (IPv4 
vs IPv6) and optional parameters (ports may or may not be present, range may 
only have one value) that regexes become complex very quickly. Throw in the fact 
that ":" is the separator for IPv6 address sections AND is the separator between 
IP addresses and ports, plus the fact that the presence of a port changes how to 
express an IPv6 address, and you're talking about some unmaintainable regexes, 
even if they're not as monstrous as the current IPv6 one.




If we go back to the previous approach I think it doesn't get _too_ 

Re: [ovs-dev] [PATCH RFC dpdk-latest v3 0/1] Enable vhost async API's in OvS.

2022-02-09 Thread Hu, Jiayu
Hi all,

> -Original Message-
> From: Pai G, Sunil 
> Sent: Wednesday, February 9, 2022 4:32 PM
> To: Ilya Maximets ; Maxime Coquelin
> ; d...@openvswitch.org; Hu, Jiayu
> 
> Cc: Van Haaren, Harry ; Ferriter, Cian
> ; Stokes, Ian ;
> david.march...@redhat.com; Mcnamara, John 
> Subject: RE: [PATCH RFC dpdk-latest v3 0/1] Enable vhost async API's in OvS.
> 
> > >>> This version of the patch seems to have negative impact on
> > >>> performance
> > >> for burst traffic profile[1].
> > >>> Benefits seen with the previous version (v2) was up to ~1.6x for
> > >>> 1568 byte
> > >> packets compared to ~1.2x seen with the current design (v3) as
> > >> measured on new Intel hardware that supports DSA [2] , CPU @ 1.8Ghz.
> > >>> The cause of the drop seems to be because of the excessive vhost
> > >>> txq
> > >> contention across the PMD threads.
> > >>
> > >> So it means the Tx/Rx queue pairs aren't consumed by the same PMD
> > >> thread. can you confirm?
> > >
> > > Yes, the completion polls for a given txq happens on a single PMD
> > thread(on the same thread where its corresponding rxq is being polled)
> > but other threads can submit(enqueue) packets on the same txq,  which
> > leads to contention.

It seems 40% perf degradation is caused by virtqueue contention between Rx and
Tx PMD threads. But I am really curious about what causes up to 40% perf drop?
It's core busy-waiting due to spin-lock or cache thrashing of virtqueue struct?
Or something else?

In the latest vhost patch, I have replaced spinlock to try-lock to avoid 
busy-waiting.
If OVS data path can also avoid busy-waiting, will it help on performance? 
Could we
have a try?

> >
> > Why this process can't be lockless?
> > If we have to lock the device, maybe we can do both submission and
> > completion from the thread that polls corresponding Rx queue?
> > Tx threads may enqueue mbufs to some lockless ring inside the
> > rte_vhost_enqueue_burst.  Rx thread may dequeue them and submit jobs
> > to dma device and check completions.  No locks required.

The lockless ring is like batching or caching for Tx packets. It can be directly
done in OVS, IMHO. For example, a Tx queue has a lockless ring, and Tx thread
inserts packets to the ring, and Rx thread consumes packets from the ring and
submits copy and polls completion.

Thanks,
Jiayu
> >
> 
> Thank you for the comments, Ilya.
> 
> Hi Jiayu, Maxime,
> 
> Could I request your opinions on this from the vhost library perspective ?
> 
> Thanks and regards,
> Sunil

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [ovs-dev v2] tunnel: Remove padding from packet when encapsulating.

2022-02-09 Thread Tonghao Zhang
On Tue, Jan 25, 2022 at 3:16 PM Harold Huang  wrote:
>
> Looks good to me.
>
> Reviewed-by: Harold Huang 
Hi Ilya, do you have an opinion?

>  于2022年1月25日周二 13:25写道:
> >
> > From: Tonghao Zhang 
> >
> > The old version of openvswitch doesn't remove the padding from
> > packet before L3+ conntrack processing and then packets are dropped
> > in linux kernel stack. The patch [1] fixes the issue. We fix this
> > issue on gateway which running ovs-dpdk as a quick workaround. Padding
> > should be removed because tunnel size + inner size > 64B.
> > More detailes, see [1]
> >
> > [1] - 
> > https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git/commit/?id=9382fe71c0058465e942a633869629929102843d
> > Signed-off-by: Tonghao Zhang 
> > ---
> > v2: add OVS_UNLIKELY
> > v1: this version was submitted a year ago
> > http://patchwork.ozlabs.org/project/openvswitch/patch/20201214030936.87354-1-xiangxia.m@gmail.com/
> > ---
> >  lib/netdev-native-tnl.c | 4 
> >  1 file changed, 4 insertions(+)
> >
> > diff --git a/lib/netdev-native-tnl.c b/lib/netdev-native-tnl.c
> > index b89dfdd52a86..e23d71d4aec1 100644
> > --- a/lib/netdev-native-tnl.c
> > +++ b/lib/netdev-native-tnl.c
> > @@ -149,11 +149,15 @@ void *
> >  netdev_tnl_push_ip_header(struct dp_packet *packet,
> > const void *header, int size, int *ip_tot_size)
> >  {
> > +int padding = dp_packet_l2_pad_size(packet);
> >  struct eth_header *eth;
> >  struct ip_header *ip;
> >  struct ovs_16aligned_ip6_hdr *ip6;
> >
> >  eth = dp_packet_push_uninit(packet, size);
> > +if (OVS_UNLIKELY(padding)) {
> > +dp_packet_set_size(packet, dp_packet_size(packet) - padding);
> > +}
> >  *ip_tot_size = dp_packet_size(packet) - sizeof (struct eth_header);
> >
> >  memcpy(eth, header, size);
> > --
> > 2.27.0
> >
> > ___
> > dev mailing list
> > d...@openvswitch.org
> > https://mail.openvswitch.org/mailman/listinfo/ovs-dev



-- 
Best regards, Tonghao
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH net-next v8] net: openvswitch: IPv6: Add IPv6 extension header support

2022-02-09 Thread Jakub Kicinski
On Wed, 9 Feb 2022 12:46:01 -0800 Cpp Code wrote:
> > ok, I see advantage of using skb_header_pointer() in this case, but
> > replacing all check_header() with skb_header_pointer() would add lot
> > of copy operation in flow extract. Anyways for this use case
> > skb_header_pointer() is fine.
> >
> > Acked-by: Pravin B Shelar   
> 
> Could this be applied please.

Please repost with Pravin's ack included.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] OVS with Bifurcation Example

2022-02-09 Thread Tonghao Zhang
On Wed, Feb 9, 2022 at 11:52 PM Lazuardi Nasution
 wrote:
>
> HI Tonghao,
>
> Do you know how to enable isolate mode on OVS-DPDK? By searching 
> "rte_flow_isolate" on OVS source code, I think OVS-DPDK doesn't support it 
> right now.
The patchset [1] is not accepted in upstream(I plan to send v2), you
should download the [1] and apply them to your repo.

and try the command
ovs-vsctl set interface  options:isolate-offload-mode=true

[1] - http://patchwork.ozlabs.org/project/openvswitch/list/?series=220295
> Best regards.
>
> On Wed, Feb 9, 2022 at 8:56 PM Tonghao Zhang  wrote:
>>
>> On Tue, Feb 8, 2022 at 4:34 PM Lazuardi Nasution  
>> wrote:
>> >
>> > Hi Tonghao,
>> >
>> > How can I know if my driver support it. i have ConnectX-5 EN, ConnectX-6 
>> > VPI and Connect-6 DX.
>> I tested this patch on Connect-6 DX. but now there is no hw NIC for me
>> to test again.
>> if you enable the isolate mode on hw nic, then you can install the
>> flows, for example, to match the tunnel packet via dpctl command with
>> --consistent option.
>> http://patchwork.ozlabs.org/project/openvswitch/patch/20201214022001.84273-3-xiangxia.m@gmail.com/
>>
>> > Best regards.
>> >
>> > On Tue, Feb 8, 2022, 3:27 PM Tonghao Zhang  
>> > wrote:
>> >>
>> >> On Tue, Feb 8, 2022 at 4:16 PM Lazuardi Nasution  
>> >> wrote:
>> >> >
>> >> > Hi Tonghao,
>> >> >
>> >> > Do you have any example or tutorial regarding this?
>> >> No, but I plan to write a doc about this. The patch 1/4 only enable
>> >> isolate_offload_mode of dpdk netdev.
>> >> If your driver support isolate mode of dpdk, it can work in ovs.
>> >> > Best regards.
>> >> >
>> >> >
>> >> > On Tue, Feb 8, 2022, 2:51 PM Tonghao Zhang  
>> >> > wrote:
>> >> >>
>> >> >> On Sat, Feb 5, 2022 at 8:59 PM Lazuardi Nasution 
>> >> >>  wrote:
>> >> >> >
>> >> >> > Hi,
>> >> >> >
>> >> >> > Is there any example about how to use OVS with Bifurcation driver? I 
>> >> >> > have
>> >> >> > read that with Intel NIC I should create specific VF for that, but 
>> >> >> > with
>> >> >> > Mellanox NIC it is not required to have VF.
>> >> >> Hi
>> >> >> there is a patchset, I hope it can help you.
>> >> >> http://patchwork.ozlabs.org/project/openvswitch/list/?series=220295
>> >> >> > Best regards.
>> >> >>
>> >> >> Best regards, Tonghao
>> >>
>> >>
>> >>
>> >> --
>> >> Best regards, Tonghao
>>
>>
>>
>> --
>> Best regards, Tonghao



-- 
Best regards, Tonghao
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH ovn] controller: add ovn-set-local-ip option

2022-02-09 Thread Vladislav Odintsov
Hi Mohammad,

thanks for the review!

I’ve added the requested testcase and submitted the v2:
https://patchwork.ozlabs.org/project/ovn/patch/20220209233314.51948-1-odiv...@gmail.com/

Regards,
Vladislav Odintsov

> On 6 Feb 2022, at 12:59, Mohammad Heib  wrote:
> 
> Hi Vladislav,
> 
> your change looks good to me.
> just if you can add a small test under ./tests to validate your change and
> make sure that future changes will not break your change (see 
> ./tests/ovn-ipsec.at  for a good example).
> 
> Thanks,
> 
> On 1/27/22 18:11, Vladislav Odintsov wrote:
>> When transport node has multiple interfaces (vlans) and
>> ovn-encap-ip on different hosts need to be configured
>> from different VLANs source IP for encapsulated packet
>> can be not the same, which is expected by remote system.
>> 
>> Explicitely setting local_ip resolves such problem.
>> 
>> Signed-off-by: Vladislav Odintsov
>> ---
>>  controller/encaps.c | 37 +
>>  controller/ovn-controller.8.xml |  7 +++
>>  2 files changed, 31 insertions(+), 13 deletions(-)
>> 
>> diff --git a/controller/encaps.c b/controller/encaps.c
>> index 66e0cd8cd..3b0c92931 100644
>> --- a/controller/encaps.c
>> +++ b/controller/encaps.c
>> @@ -23,6 +23,7 @@
>>  #include "openvswitch/vlog.h"
>>  #include "lib/ovn-sb-idl.h"
>>  #include "ovn-controller.h"
>> +#include "smap.h"
>>VLOG_DEFINE_THIS_MODULE(encaps);
>>  @@ -176,8 +177,31 @@ tunnel_add(struct tunnel_ctx *tc, const struct 
>> sbrec_sb_global *sbg,
>>  smap_add(, "dst_port", dst_port);
>>  }
>>  +const struct ovsrec_open_vswitch *cfg =
>> +ovsrec_open_vswitch_table_first(ovs_table);
>> +
>> +bool set_local_ip = false;
>> +if (cfg) {
>> +/* If the tos option is configured, get it */
>> +const char *encap_tos = smap_get_def(>external_ids,
>> +   "ovn-encap-tos", "none");
>> +
>> +if (encap_tos && strcmp(encap_tos, "none")) {
>> +smap_add(, "tos", encap_tos);
>> +}
>> +
>> +/* If ovn-set-local-ip option is configured, get it */
>> +set_local_ip = smap_get_bool(>external_ids, "ovn-set-local-ip",
>> + false);
>> +}
>> +
>>  /* Add auth info if ipsec is enabled. */
>>  if (sbg->ipsec) {
>> +set_local_ip = true;
>> +smap_add(, "remote_name", new_chassis_id);
>> +}
>> +
>> +if (set_local_ip) {
>>  const struct sbrec_chassis *this_chassis = tc->this_chassis;
>>  const char *local_ip = NULL;
>>  @@ -200,19 +224,6 @@ tunnel_add(struct tunnel_ctx *tc, const struct 
>> sbrec_sb_global *sbg,
>>  if (local_ip) {
>>  smap_add(, "local_ip", local_ip);
>>  }
>> -smap_add(, "remote_name", new_chassis_id);
>> -}
>> -
>> -const struct ovsrec_open_vswitch *cfg =
>> -ovsrec_open_vswitch_table_first(ovs_table);
>> -/* If the tos option is configured, get it */
>> -if (cfg) {
>> -const char *encap_tos = smap_get_def(>external_ids,
>> -   "ovn-encap-tos", "none");
>> -
>> -if (encap_tos && strcmp(encap_tos, "none")) {
>> -smap_add(, "tos", encap_tos);
>> -}
>>  }
>>/* If there's an existing chassis record that does not need any 
>> change,
>> diff --git a/controller/ovn-controller.8.xml 
>> b/controller/ovn-controller.8.xml
>> index e9708fe64..cc9a7d1c2 100644
>> --- a/controller/ovn-controller.8.xml
>> +++ b/controller/ovn-controller.8.xml
>> @@ -304,6 +304,13 @@
>>  of how many entries there are in the cache.  By default this is set 
>> to
>>  3 (30 seconds).
>>
>> +  external_ids:ovn-set-local-ip
>> +  
>> +The boolean flag indicates if ovn-controller when 
>> create
>> +tunnel ports should set local_ip parameter.  Can be
>> +heplful to pin source outer IP for the tunnel when multiple 
>> interfaces
>> +are used on the host for overlay traffic.
>> +  
>>  
>>
> ___
> dev mailing list
> d...@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH ovn v2] controller: add ovn-set-local-ip option

2022-02-09 Thread Vladislav Odintsov
When transport node has multiple interfaces (vlans) and
ovn-encap-ip on different hosts need to be configured
from different VLANs source IP for encapsulated packet
can be not the same, which is expected by remote system.

Explicitely setting local_ip resolves such problem.

Signed-off-by: Vladislav Odintsov 
---
 controller/encaps.c | 37 +
 controller/ovn-controller.8.xml |  7 +++
 tests/ovn-controller.at |  9 
 3 files changed, 40 insertions(+), 13 deletions(-)

diff --git a/controller/encaps.c b/controller/encaps.c
index 66e0cd8cd..3b0c92931 100644
--- a/controller/encaps.c
+++ b/controller/encaps.c
@@ -23,6 +23,7 @@
 #include "openvswitch/vlog.h"
 #include "lib/ovn-sb-idl.h"
 #include "ovn-controller.h"
+#include "smap.h"
 
 VLOG_DEFINE_THIS_MODULE(encaps);
 
@@ -176,8 +177,31 @@ tunnel_add(struct tunnel_ctx *tc, const struct 
sbrec_sb_global *sbg,
 smap_add(, "dst_port", dst_port);
 }
 
+const struct ovsrec_open_vswitch *cfg =
+ovsrec_open_vswitch_table_first(ovs_table);
+
+bool set_local_ip = false;
+if (cfg) {
+/* If the tos option is configured, get it */
+const char *encap_tos = smap_get_def(>external_ids,
+   "ovn-encap-tos", "none");
+
+if (encap_tos && strcmp(encap_tos, "none")) {
+smap_add(, "tos", encap_tos);
+}
+
+/* If ovn-set-local-ip option is configured, get it */
+set_local_ip = smap_get_bool(>external_ids, "ovn-set-local-ip",
+ false);
+}
+
 /* Add auth info if ipsec is enabled. */
 if (sbg->ipsec) {
+set_local_ip = true;
+smap_add(, "remote_name", new_chassis_id);
+}
+
+if (set_local_ip) {
 const struct sbrec_chassis *this_chassis = tc->this_chassis;
 const char *local_ip = NULL;
 
@@ -200,19 +224,6 @@ tunnel_add(struct tunnel_ctx *tc, const struct 
sbrec_sb_global *sbg,
 if (local_ip) {
 smap_add(, "local_ip", local_ip);
 }
-smap_add(, "remote_name", new_chassis_id);
-}
-
-const struct ovsrec_open_vswitch *cfg =
-ovsrec_open_vswitch_table_first(ovs_table);
-/* If the tos option is configured, get it */
-if (cfg) {
-const char *encap_tos = smap_get_def(>external_ids,
-   "ovn-encap-tos", "none");
-
-if (encap_tos && strcmp(encap_tos, "none")) {
-smap_add(, "tos", encap_tos);
-}
 }
 
 /* If there's an existing chassis record that does not need any change,
diff --git a/controller/ovn-controller.8.xml b/controller/ovn-controller.8.xml
index e9708fe64..cc9a7d1c2 100644
--- a/controller/ovn-controller.8.xml
+++ b/controller/ovn-controller.8.xml
@@ -304,6 +304,13 @@
 of how many entries there are in the cache.  By default this is set to
 3 (30 seconds).
   
+  external_ids:ovn-set-local-ip
+  
+The boolean flag indicates if ovn-controller when create
+tunnel ports should set local_ip parameter.  Can be
+heplful to pin source outer IP for the tunnel when multiple interfaces
+are used on the host for overlay traffic.
+  
 
 
 
diff --git a/tests/ovn-controller.at b/tests/ovn-controller.at
index 2f39e5f3e..9e6302e5a 100644
--- a/tests/ovn-controller.at
+++ b/tests/ovn-controller.at
@@ -298,6 +298,15 @@ OVS_WAIT_UNTIL([check_tunnel_property type geneve])
 ovs-vsctl del-port ovn-fakech-0
 OVS_WAIT_UNTIL([check_tunnel_property type geneve])
 
+# set `ovn-set-local-ip` option to true and check if tunnel parameters
+OVS_WAIT_WHILE([check_tunnel_property options:local_ip "\"192.168.0.1\""])
+ovs-vsctl set open . external_ids:ovn-set-local-ip=true
+OVS_WAIT_UNTIL([check_tunnel_property options:local_ip "\"192.168.0.1\""])
+
+# Change the local_ip on the OVS side and check than OVN fixes it
+ovs-vsctl set interface ovn-fakech-0 options:local_ip="1.1.1.1"
+OVS_WAIT_UNTIL([check_tunnel_property options:local_ip "\"192.168.0.1\""])
+
 # Gracefully terminate daemons
 OVN_CLEANUP_SBOX([hv])
 OVN_CLEANUP_VSWITCH([main])
-- 
2.30.0

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v2] ofproto: fix ipfix not always sampling on egress

2022-02-09 Thread Ilya Maximets
On 1/25/22 10:09, Eelco Chaudron wrote:
> 
> 
> On 24 Jan 2022, at 12:50, Adrian Moreno wrote:
> 
>> We are currently requiring in_port to be a valid port number for ipfix
>> sampling even if the sampling is done on the output port (egress).
>>
>> This restriction is not really needed and can affect pipelines that
>> intentionally set the in_port to OFPP_NONE during flow processing. For
>> instance, OVN does this, see:
>>
>> cfa547821 Fix ovn-controller generated packets from getting dropped for
>> reject ACL action.
>>
>> This patch skips ipfix sampling only if both (ingress and egress) ports
>> are invalid.
>>
>> Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=2016346
>> Signed-off-by: Adrian Moreno 
>> ---
> 
> Thanks for adding the test. Reviewed and tested and it looks good!
> 
> Acked-by: Eelco Chaudron 

Thanks, Adrian and Eelco!  Applied and backported down to 2.13.

Best regards, Ilya Maximets.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH net-next v8] net: openvswitch: IPv6: Add IPv6 extension header support

2022-02-09 Thread Cpp Code
On Thu, Dec 9, 2021 at 11:36 PM Pravin Shelar  wrote:
>
> ()
>
> On Mon, Dec 6, 2021 at 3:00 PM Cpp Code  wrote:
> >
> > On Thu, Dec 2, 2021 at 9:28 PM Pravin Shelar  wrote:
> > >
> > > On Thu, Dec 2, 2021 at 12:20 PM Cpp Code  wrote:
> > > >
> > > > On Wed, Dec 1, 2021 at 11:34 PM Pravin Shelar  
> > > > wrote:
> > > > >
> > > > > On Wed, Nov 24, 2021 at 11:33 AM Toms Atteka  
> > > > > wrote:
> > > > > >
> > > > > > This change adds a new OpenFlow field OFPXMT_OFB_IPV6_EXTHDR and
> > > > > > packets can be filtered using ipv6_ext flag.
> > > > > >
> > > > > > Signed-off-by: Toms Atteka 
> > > > > > ---
> > > > > >  include/uapi/linux/openvswitch.h |   6 ++
> > > > > >  net/openvswitch/flow.c   | 140 
> > > > > > +++
> > > > > >  net/openvswitch/flow.h   |  14 
> > > > > >  net/openvswitch/flow_netlink.c   |  26 +-
> > > > > >  4 files changed, 184 insertions(+), 2 deletions(-)
> > > > > >
> > > > > > diff --git a/include/uapi/linux/openvswitch.h 
> > > > > > b/include/uapi/linux/openvswitch.h
> > > > > > index a87b44cd5590..43790f07e4a2 100644
> > > > > > --- a/include/uapi/linux/openvswitch.h
> > > > > > +++ b/include/uapi/linux/openvswitch.h
> > > > > > @@ -342,6 +342,7 @@ enum ovs_key_attr {
> > > > > > OVS_KEY_ATTR_CT_ORIG_TUPLE_IPV4,   /* struct 
> > > > > > ovs_key_ct_tuple_ipv4 */
> > > > > > OVS_KEY_ATTR_CT_ORIG_TUPLE_IPV6,   /* struct 
> > > > > > ovs_key_ct_tuple_ipv6 */
> > > > > > OVS_KEY_ATTR_NSH,   /* Nested set of ovs_nsh_key_* */
> > > > > > +   OVS_KEY_ATTR_IPV6_EXTHDRS,  /* struct ovs_key_ipv6_exthdr */
> > > > > >
> > > > > >  #ifdef __KERNEL__
> > > > > > OVS_KEY_ATTR_TUNNEL_INFO,  /* struct ip_tunnel_info */
> > > > > > @@ -421,6 +422,11 @@ struct ovs_key_ipv6 {
> > > > > > __u8   ipv6_frag;   /* One of OVS_FRAG_TYPE_*. */
> > > > > >  };
> > > > > >
> > > > > > +/* separate structure to support backward compatibility with older 
> > > > > > user space */
> > > > > > +struct ovs_key_ipv6_exthdrs {
> > > > > > +   __u16  hdrs;
> > > > > > +};
> > > > > > +
> > > > > >  struct ovs_key_tcp {
> > > > > > __be16 tcp_src;
> > > > > > __be16 tcp_dst;
> > > > > > diff --git a/net/openvswitch/flow.c b/net/openvswitch/flow.c
> > > > > > index 9d375e74b607..28acb40437ca 100644
> > > > > > --- a/net/openvswitch/flow.c
> > > > > > +++ b/net/openvswitch/flow.c
> > > > > > @@ -239,6 +239,144 @@ static bool icmphdr_ok(struct sk_buff *skb)
> > > > > >   sizeof(struct icmphdr));
> > > > > >  }
> > > > > >
> > > > > > +/**
> > > > > > + * get_ipv6_ext_hdrs() - Parses packet and sets IPv6 extension 
> > > > > > header flags.
> > > > > > + *
> > > > > > + * @skb: buffer where extension header data starts in packet
> > > > > > + * @nh: ipv6 header
> > > > > > + * @ext_hdrs: flags are stored here
> > > > > > + *
> > > > > > + * OFPIEH12_UNREP is set if more than one of a given IPv6 
> > > > > > extension header
> > > > > > + * is unexpectedly encountered. (Two destination options headers 
> > > > > > may be
> > > > > > + * expected and would not cause this bit to be set.)
> > > > > > + *
> > > > > > + * OFPIEH12_UNSEQ is set if IPv6 extension headers were not in the 
> > > > > > order
> > > > > > + * preferred (but not required) by RFC 2460:
> > > > > > + *
> > > > > > + * When more than one extension header is used in the same packet, 
> > > > > > it is
> > > > > > + * recommended that those headers appear in the following order:
> > > > > > + *  IPv6 header
> > > > > > + *  Hop-by-Hop Options header
> > > > > > + *  Destination Options header
> > > > > > + *  Routing header
> > > > > > + *  Fragment header
> > > > > > + *  Authentication header
> > > > > > + *  Encapsulating Security Payload header
> > > > > > + *  Destination Options header
> > > > > > + *  upper-layer header
> > > > > > + */
> > > > > > +static void get_ipv6_ext_hdrs(struct sk_buff *skb, struct ipv6hdr 
> > > > > > *nh,
> > > > > > + u16 *ext_hdrs)
> > > > > > +{
> > > > > > +   u8 next_type = nh->nexthdr;
> > > > > > +   unsigned int start = skb_network_offset(skb) + 
> > > > > > sizeof(struct ipv6hdr);
> > > > > > +   int dest_options_header_count = 0;
> > > > > > +
> > > > > > +   *ext_hdrs = 0;
> > > > > > +
> > > > > > +   while (ipv6_ext_hdr(next_type)) {
> > > > > > +   struct ipv6_opt_hdr _hdr, *hp;
> > > > > > +
> > > > > > +   switch (next_type) {
> > > > > > +   case IPPROTO_NONE:
> > > > > > +   *ext_hdrs |= OFPIEH12_NONEXT;
> > > > > > +   /* stop parsing */
> > > > > > +   return;
> > > > > > +
> > > > > > +   case IPPROTO_ESP:
> > > > > > +   if (*ext_hdrs & OFPIEH12_ESP)
> > > > > > +   *ext_hdrs |= OFPIEH12_UNREP;
> > > > > > +   

Re: [ovs-dev] [PATCH ovn v2] northd: Document --dummy-numa option.

2022-02-09 Thread Numan Siddique
On Tue, Feb 8, 2022 at 12:30 PM Mark Michelson  wrote:
>
> Signed-off-by: Mark Michelson 

LGTM.

Acked-by: Numan Siddique 

Numan



> ---
> v1 -> v2:
> * Fixed a mismatched  tag in the XML
> ---
>  northd/ovn-northd.8.xml | 49 +
>  northd/ovn-northd.c |  1 +
>  2 files changed, 50 insertions(+)
>
> diff --git a/northd/ovn-northd.8.xml b/northd/ovn-northd.8.xml
> index 79f35bc16..cd2617b0a 100644
> --- a/northd/ovn-northd.8.xml
> +++ b/northd/ovn-northd.8.xml
> @@ -68,6 +68,55 @@
>restarting a process or disturbing a running system.
>  
>
> +  --dummy-numa
> +  
> +
> +  Typically, OVS uses sysfs to determine the number of NUMA nodes and
> +  CPU cores that are available on a machine. The parallelization code
> +  in OVN uses this information to determine if there are enough
> +  resources to use parallelization. The current algorithm enables
> +  parallelization if the total number of CPU cores divided by the
> +  number of NUMA nodes is greater than or equal to four.
> +
> +
> +
> +  In certain situations, it may be desirable to enable 
> parallelization
> +  on a system that otherwise would not have it allowed. The
> +  --dummy-numa option allows for you to fake the NUMA
> +  nodes and cores that OVS thinks your system has. The syntax 
> consists
> +  of using numbers to represent the NUMA node IDs. The number of 
> times
> +  that a NUMA node ID appears represents how many CPU cores that NUMA
> +  node contains. So for instance, if you did the following:
> +
> +
> +
> +  --dummy-numa=0,0,0,0
> +
> +
> +
> +  it would make OVS assume that you have a single NUMA node with ID 
> 0,
> +  and that NUMA node consists of four CPU cores. Similarly, you could
> +  do:
> +
> +
> +
> +  --dummy-numa=0,0,0,0,0,0,1,1,1,1,1,1
> +
> +
> +
> +  to make OVS assume you have two NUMA nodes with IDs 0 and 1, each
> +  with six CPU cores.
> +
> +
> +
> +  Currently, the only affect this option has is on whether
> +  parallelization can be enabled in ovn-northd. There are no NUMA 
> node
> +  or CPU core-specific actions performed by OVN. Setting
> +  --dummy-numa in ovn-northd does not affect how other 
> OVS
> +  processes on the system (such as ovs-vswitchd) count the number of
> +  NUMA nodes and CPU cores; this setting is local to ovn-northd.
> +
> +  
>  
>  
>database in the above options must be an OVSDB active or
> diff --git a/northd/ovn-northd.c b/northd/ovn-northd.c
> index 80303503a..44d4ca706 100644
> --- a/northd/ovn-northd.c
> +++ b/northd/ovn-northd.c
> @@ -525,6 +525,7 @@ Options:\n\
>--ovnsb-db=DATABASE   connect to ovn-sb database at DATABASE\n\
>  (default: %s)\n\
>--dry-run start in paused state (do not commit db 
> changes)\n\
> +  --dummy-numa  override default NUMA node and CPU core 
> discovery\n\
>--unixctl=SOCKET  override default control socket name\n\
>-h, --helpdisplay this help message\n\
>-o, --options list available options\n\
> --
> 2.31.1
>
> ___
> dev mailing list
> d...@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH ovn v2 11/11] ovn-controller: Handle addresses addition in address set incrementally.

2022-02-09 Thread 0-day Robot
Bleep bloop.  Greetings Han Zhou, I am a robot and I have tried out your patch.
Thanks for your contribution.

I encountered some error that I wasn't expecting.  See the details below.


checkpatch:
WARNING: Comment with 'xxx' marker
#233 FILE: controller/lflow.c:655:
 * XXX: if necessary, we can optimize this by checking all the address set

Lines checked: 721, Warnings: 1, Errors: 0


Please check this out.  If you feel there has been an error, please email 
acon...@redhat.com

Thanks,
0-day Robot
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH ovn 04/11] ovn-controller.c: Remove unnecessary asserts and useless variables.

2022-02-09 Thread Han Zhou
On Tue, Feb 8, 2022 at 11:03 PM 0-day Robot  wrote:
>
> Bleep bloop.  Greetings Han Zhou, I am a robot and I have tried out your
patch.
> Thanks for your contribution.
>
> I encountered some error that I wasn't expecting.  See the details below.
>
>
> build:
> gcc -std=gnu99 -DHAVE_CONFIG_H -I.   -I ./include  -I ./include -I ./ovn
-I ./include -I ./lib -I ./lib -I
/var/lib/jenkins/jobs/0day_robot_upstream_build_ovn_from_pw/workspace/OVSDIR/include
-I
/var/lib/jenkins/jobs/0day_robot_upstream_build_ovn_from_pw/workspace/OVSDIR/include
-I
/var/lib/jenkins/jobs/0day_robot_upstream_build_ovn_from_pw/workspace/OVSDIR/lib
-I
/var/lib/jenkins/jobs/0day_robot_upstream_build_ovn_from_pw/workspace/OVSDIR/lib
-I
/var/lib/jenkins/jobs/0day_robot_upstream_build_ovn_from_pw/workspace/OVSDIR
-I
/var/lib/jenkins/jobs/0day_robot_upstream_build_ovn_from_pw/workspace/OVSDIR
   -Wstrict-prototypes -Wall -Wextra -Wno-sign-compare -Wpointer-arith
-Wformat -Wformat-security -Wswitch-enum -Wunused-parameter
-Wbad-function-cast -Wcast-align -Wstrict-prototypes -Wold-style-definition
-Wmissing-prototypes -Wmissing-field-initializers -fno-strict-aliasing
-Wshadow -Werror -Werror  -g -O2 -MT controller/lport.o -MD -MP -MF
$depbase.Tpo -c -o controller/lport.o controller/lport.c &&\
> mv -f $depbase.Tpo $depbase.Po
> depbase=`echo controller/ofctrl.o | sed 's|[^/]*$|.deps/&|;s|\.o$||'`;\
> gcc -std=gnu99 -DHAVE_CONFIG_H -I.   -I ./include  -I ./include -I ./ovn
-I ./include -I ./lib -I ./lib -I
/var/lib/jenkins/jobs/0day_robot_upstream_build_ovn_from_pw/workspace/OVSDIR/include
-I
/var/lib/jenkins/jobs/0day_robot_upstream_build_ovn_from_pw/workspace/OVSDIR/include
-I
/var/lib/jenkins/jobs/0day_robot_upstream_build_ovn_from_pw/workspace/OVSDIR/lib
-I
/var/lib/jenkins/jobs/0day_robot_upstream_build_ovn_from_pw/workspace/OVSDIR/lib
-I
/var/lib/jenkins/jobs/0day_robot_upstream_build_ovn_from_pw/workspace/OVSDIR
-I
/var/lib/jenkins/jobs/0day_robot_upstream_build_ovn_from_pw/workspace/OVSDIR
   -Wstrict-prototypes -Wall -Wextra -Wno-sign-compare -Wpointer-arith
-Wformat -Wformat-security -Wswitch-enum -Wunused-parameter
-Wbad-function-cast -Wcast-align -Wstrict-prototypes -Wold-style-definition
-Wmissing-prototypes -Wmissing-field-initializers -fno-strict-aliasing
-Wshadow -Werror -Werror  -g -O2 -MT controller/ofctrl.o -MD -MP -MF
$depbase.Tpo -c -o controller/ofctrl.o controller/ofctrl.c &&\
> mv -f $depbase.Tpo $depbase.Po
> depbase=`echo controller/pinctrl.o | sed 's|[^/]*$|.deps/&|;s|\.o$||'`;\
> gcc -std=gnu99 -DHAVE_CONFIG_H -I.   -I ./include  -I ./include -I ./ovn
-I ./include -I ./lib -I ./lib -I
/var/lib/jenkins/jobs/0day_robot_upstream_build_ovn_from_pw/workspace/OVSDIR/include
-I
/var/lib/jenkins/jobs/0day_robot_upstream_build_ovn_from_pw/workspace/OVSDIR/include
-I
/var/lib/jenkins/jobs/0day_robot_upstream_build_ovn_from_pw/workspace/OVSDIR/lib
-I
/var/lib/jenkins/jobs/0day_robot_upstream_build_ovn_from_pw/workspace/OVSDIR/lib
-I
/var/lib/jenkins/jobs/0day_robot_upstream_build_ovn_from_pw/workspace/OVSDIR
-I
/var/lib/jenkins/jobs/0day_robot_upstream_build_ovn_from_pw/workspace/OVSDIR
   -Wstrict-prototypes -Wall -Wextra -Wno-sign-compare -Wpointer-arith
-Wformat -Wformat-security -Wswitch-enum -Wunused-parameter
-Wbad-function-cast -Wcast-align -Wstrict-prototypes -Wold-style-definition
-Wmissing-prototypes -Wmissing-field-initializers -fno-strict-aliasing
-Wshadow -Werror -Werror  -g -O2 -MT controller/pinctrl.o -MD -MP -MF
$depbase.Tpo -c -o controller/pinctrl.o controller/pinctrl.c &&\
> mv -f $depbase.Tpo $depbase.Po
> depbase=`echo controller/patch.o | sed 's|[^/]*$|.deps/&|;s|\.o$||'`;\
> gcc -std=gnu99 -DHAVE_CONFIG_H -I.   -I ./include  -I ./include -I ./ovn
-I ./include -I ./lib -I ./lib -I
/var/lib/jenkins/jobs/0day_robot_upstream_build_ovn_from_pw/workspace/OVSDIR/include
-I
/var/lib/jenkins/jobs/0day_robot_upstream_build_ovn_from_pw/workspace/OVSDIR/include
-I
/var/lib/jenkins/jobs/0day_robot_upstream_build_ovn_from_pw/workspace/OVSDIR/lib
-I
/var/lib/jenkins/jobs/0day_robot_upstream_build_ovn_from_pw/workspace/OVSDIR/lib
-I
/var/lib/jenkins/jobs/0day_robot_upstream_build_ovn_from_pw/workspace/OVSDIR
-I
/var/lib/jenkins/jobs/0day_robot_upstream_build_ovn_from_pw/workspace/OVSDIR
   -Wstrict-prototypes -Wall -Wextra -Wno-sign-compare -Wpointer-arith
-Wformat -Wformat-security -Wswitch-enum -Wunused-parameter
-Wbad-function-cast -Wcast-align -Wstrict-prototypes -Wold-style-definition
-Wmissing-prototypes -Wmissing-field-initializers -fno-strict-aliasing
-Wshadow -Werror -Werror  -g -O2 -MT controller/patch.o -MD -MP -MF
$depbase.Tpo -c -o controller/patch.o controller/patch.c &&\
> mv -f $depbase.Tpo $depbase.Po
> depbase=`echo controller/ovn-controller.o | sed
's|[^/]*$|.deps/&|;s|\.o$||'`;\
> gcc -std=gnu99 -DHAVE_CONFIG_H -I.   -I ./include  -I ./include -I ./ovn
-I ./include -I ./lib -I ./lib -I
/var/lib/jenkins/jobs/0day_robot_upstream_build_ovn_from_pw/workspace/OVSDIR/include
-I

[ovs-dev] [PATCH ovn v2 11/11] ovn-controller: Handle addresses addition in address set incrementally.

2022-02-09 Thread Han Zhou
To avoid reprocessing the lflow when a referenced address set has new
addresses added, this patch generates a fake address set that only
contains the added addresses for flow generation, and then eliminates
the flows that are not related to the newly added addresses.

Scale test shows obvious performance gains because the time complexity
changed from O(n) to O(1). The bigger the size of address set, the more
CPU savings. With the AS size of 10k, the test shows ~40x speed up.

Test setup:
CPU: Intel(R) Core(TM) i9-7920X CPU @ 2.90GHz.
5 ACL all referencing an address set of 10,000 IPs.

Measure the time spent by ovn-controller for handling one IP
addition from the address set.

Before: ~400ms
After: 11-12ms

Signed-off-by: Han Zhou 
---
 controller/lflow-conj-ids.c |  20 ++
 controller/lflow-conj-ids.h |   1 +
 controller/lflow.c  | 360 ++--
 controller/ovn-controller.c |   8 +
 include/ovn/expr.h  |   1 +
 lib/expr.c  |   2 +-
 tests/ovn-controller.at |  41 ++--
 7 files changed, 397 insertions(+), 36 deletions(-)

diff --git a/controller/lflow-conj-ids.c b/controller/lflow-conj-ids.c
index bfe63862a..372391e5a 100644
--- a/controller/lflow-conj-ids.c
+++ b/controller/lflow-conj-ids.c
@@ -157,6 +157,26 @@ lflow_conj_ids_alloc_specified(struct conj_ids *conj_ids,
 return true;
 }
 
+/* Find and return the start id that is allocated to the logical flow. Return
+ * 0 if not found. */
+uint32_t
+lflow_conj_ids_find(struct conj_ids *conj_ids, const struct uuid *lflow_uuid)
+{
+struct lflow_conj_node *lflow_conj;
+bool found = false;
+HMAP_FOR_EACH_WITH_HASH (lflow_conj, hmap_node, uuid_hash(lflow_uuid),
+ _ids->lflow_conj_ids) {
+if (uuid_equals(_conj->lflow_uuid, lflow_uuid)) {
+found = true;
+break;
+}
+}
+if (!found) {
+return 0;
+}
+return lflow_conj->start_conj_id;
+}
+
 /* Frees the conjunction IDs used by lflow_uuid. */
 void
 lflow_conj_ids_free(struct conj_ids *conj_ids, const struct uuid *lflow_uuid)
diff --git a/controller/lflow-conj-ids.h b/controller/lflow-conj-ids.h
index 6da0a612c..a1f7a95a5 100644
--- a/controller/lflow-conj-ids.h
+++ b/controller/lflow-conj-ids.h
@@ -35,6 +35,7 @@ bool lflow_conj_ids_alloc_specified(struct conj_ids *,
 const struct uuid *lflow_uuid,
 uint32_t start_conj_id, uint32_t n_conjs);
 void lflow_conj_ids_free(struct conj_ids *, const struct uuid *lflow_uuid);
+uint32_t lflow_conj_ids_find(struct conj_ids *, const struct uuid *lflow_uuid);
 void lflow_conj_ids_init(struct conj_ids *);
 void lflow_conj_ids_destroy(struct conj_ids *);
 void lflow_conj_ids_clear(struct conj_ids *);
diff --git a/controller/lflow.c b/controller/lflow.c
index 9b1184c0e..658b3109b 100644
--- a/controller/lflow.c
+++ b/controller/lflow.c
@@ -74,6 +74,19 @@ struct condition_aux {
 struct lflow_resource_ref *lfrr;
 };
 
+static struct expr *
+convert_match_to_expr(const struct sbrec_logical_flow *,
+  const struct sbrec_datapath_binding *,
+  struct expr **prereqs, const struct shash *addr_sets,
+  const struct shash *port_groups,
+  struct lflow_resource_ref *, bool *pg_addr_set_ref);
+static void
+add_matches_to_flow_table(const struct sbrec_logical_flow *,
+  const struct sbrec_datapath_binding *,
+  struct hmap *matches, uint8_t ptable,
+  uint8_t output_ptable, struct ofpbuf *ovnacts,
+  bool ingress, struct lflow_ctx_in *,
+  struct lflow_ctx_out *);
 static void
 consider_logical_flow(const struct sbrec_logical_flow *lflow,
   struct hmap *dhcp_opts, struct hmap *dhcpv6_opts,
@@ -513,6 +526,262 @@ as_info_from_expr_const(const char *as_name, const union 
expr_constant *c,
 return true;
 }
 
+/* Parses the lflow regarding the changed address set 'as_name', and gererates
+ * ovs flows for the newly added addresses in 'as_diff_added' only. It is
+ * similar to consider_logical_flow__, with the below differences:
+ *
+ * - It has one more arg 'as_ref_count' to deduce how many flows are expected
+ *   to be added.
+ * - It uses a small fake address set that contains only the added addresses
+ *   to replace the original address set temporarily and restores it after
+ *   parsing.
+ * - It doesn't check or touch lflow-cache, because lflow-cache is disabled
+ *   when address-sets/port-groups are used.
+ * - It doesn't check non-local lports because it should have been checked
+ *   when the lflow is initially parsed, and if it is non-local and skipped
+ *   then it wouldn't have the address set parsed and referenced.
+ *
+ * Because of these differences, it is just cleaner to keep it as a separate
+ * function. */
+static bool

[ovs-dev] [PATCH ovn v2 08/11] ovn-controller: Add tests for different ACL address set usage patterns.

2022-02-09 Thread Han Zhou
Add test cases to prepare for fine-grained address set incremental
processing. Also add coverage counters for consider_logical_flow and
check in the test cases.

Signed-off-by: Han Zhou 
---
 controller/lflow.c  |2 +
 tests/ovn-controller.at | 1177 +++
 2 files changed, 1179 insertions(+)

diff --git a/controller/lflow.c b/controller/lflow.c
index 2a75af5bd..b5254bd7c 100644
--- a/controller/lflow.c
+++ b/controller/lflow.c
@@ -41,6 +41,7 @@
 VLOG_DEFINE_THIS_MODULE(lflow);
 
 COVERAGE_DEFINE(lflow_run);
+COVERAGE_DEFINE(consider_logical_flow);
 
 /* Symbol table. */
 
@@ -1081,6 +1082,7 @@ consider_logical_flow(const struct sbrec_logical_flow 
*lflow,
 }
 ovs_assert(!dp_group || !dp);
 
+COVERAGE_INC(consider_logical_flow);
 if (!is_recompute) {
 ovs_assert(!lflows_processed_find(l_ctx_out->lflows_processed,
   >header_.uuid));
diff --git a/tests/ovn-controller.at b/tests/ovn-controller.at
index 2f39e5f3e..b16162ef4 100644
--- a/tests/ovn-controller.at
+++ b/tests/ovn-controller.at
@@ -853,3 +853,1180 @@ OVS_WAIT_UNTIL([as hv1 ovs-ofctl dump-flows br-int | 
grep table=38 | grep -q "re
 OVN_CLEANUP([hv1])
 AT_CLEANUP
 ])
+
+AT_SETUP([ovn-controller - I-P for address set update: no conjunction])
+AT_KEYWORDS([as-i-p])
+
+ovn_start
+
+net_add n1
+sim_add hv1
+as hv1
+check ovs-vsctl add-br br-phys
+ovn_attach n1 br-phys 192.168.0.1
+check ovs-vsctl -- add-port br-int hv1-vif1 -- \
+set interface hv1-vif1 external-ids:iface-id=ls1-lp1
+
+check ovn-nbctl ls-add ls1
+
+check ovn-nbctl lsp-add ls1 ls1-lp1 \
+-- lsp-set-addresses ls1-lp1 "f0:00:00:00:00:01"
+
+wait_for_ports_up
+ovn-appctl -t ovn-controller vlog/set file:dbg
+
+dp_key=$(printf "%x" $(fetch_column datapath tunnel_key external_ids:name=ls1))
+port_key=$(printf "%x" $(fetch_column port_binding tunnel_key 
logical_port=ls1-lp1))
+
+read_counter() {
+ovn-appctl -t ovn-controller coverage/read-counter $1
+}
+
+ovn-nbctl create address_set name=as1
+check ovn-nbctl acl-add ls1 to-lport 100 'outport == "ls1-lp1" && ip4.src == 
$as1' drop
+
+# Add IPs to as1 for 10 times, 1 IP each time.
+reprocess_count_old=$(read_counter consider_logical_flow)
+
+for i in $(seq 10); do
+check ovn-nbctl add address_set as1 addresses 10.0.0.$i
+check ovn-nbctl --wait=hv sync
+if test "$i" = 3; then
+AT_CHECK_UNQUOTED([ovs-ofctl dump-flows br-int 
table=44,reg15=0x$port_key | \
+grep -v reply | awk '{print $7, $8}' | sort], [0], [dnl
+priority=1100,ip,reg15=0x$port_key,metadata=0x$dp_key,nw_src=10.0.0.1 
actions=drop
+priority=1100,ip,reg15=0x$port_key,metadata=0x$dp_key,nw_src=10.0.0.2 
actions=drop
+priority=1100,ip,reg15=0x$port_key,metadata=0x$dp_key,nw_src=10.0.0.3 
actions=drop
+])
+fi
+AT_CHECK_UNQUOTED([ovs-ofctl dump-flows br-int table=44 | grep -c 
"priority=1100"], [0], [$i
+])
+done
+
+reprocess_count_new=$(read_counter consider_logical_flow)
+AT_CHECK([echo $(($reprocess_count_new - $reprocess_count_old))], [0], [10
+])
+
+# Remove the IPs from as1, 1 IP each time.
+reprocess_count_old=$(read_counter consider_logical_flow)
+
+for i in $(seq 10); do
+check ovn-nbctl remove address_set as1 addresses 10.0.0.$i
+check ovn-nbctl --wait=hv sync
+if test "$i" = 9; then
+AT_CHECK_UNQUOTED([ovs-ofctl dump-flows br-int 
table=44,reg15=0x$port_key | \
+grep -v reply | awk '{print $7, $8}'], [0], [dnl
+priority=1100,ip,reg15=0x$port_key,metadata=0x$dp_key,nw_src=10.0.0.10 
actions=drop
+])
+fi
+if test "$i" = 10; then
+AT_CHECK_UNQUOTED([ovs-ofctl dump-flows br-int table=44 | grep 
"priority=1100"], [1], [ignore])
+else
+AT_CHECK_UNQUOTED([ovs-ofctl dump-flows br-int table=44 | grep -c 
"priority=1100"], [0], [$((10 - $i))
+])
+fi
+done
+
+reprocess_count_new=$(read_counter consider_logical_flow)
+AT_CHECK([echo $(($reprocess_count_new - $reprocess_count_old))], [0], [10
+])
+
+# Add IPs to as1 for 10 times, 2 IPs each time.
+reprocess_count_old=$(read_counter consider_logical_flow)
+
+for i in $(seq 10); do
+check ovn-nbctl add address_set as1 addresses 10.0.0.$i,10.0.1.$i
+check ovn-nbctl --wait=hv sync
+if test "$i" = 3; then
+AT_CHECK_UNQUOTED([ovs-ofctl dump-flows br-int 
table=44,reg15=0x$port_key | \
+grep -v reply | awk '{print $7, $8}' | sort], [0], [dnl
+priority=1100,ip,reg15=0x$port_key,metadata=0x$dp_key,nw_src=10.0.0.1 
actions=drop
+priority=1100,ip,reg15=0x$port_key,metadata=0x$dp_key,nw_src=10.0.0.2 
actions=drop
+priority=1100,ip,reg15=0x$port_key,metadata=0x$dp_key,nw_src=10.0.0.3 
actions=drop
+priority=1100,ip,reg15=0x$port_key,metadata=0x$dp_key,nw_src=10.0.1.1 
actions=drop
+priority=1100,ip,reg15=0x$port_key,metadata=0x$dp_key,nw_src=10.0.1.2 
actions=drop
+priority=1100,ip,reg15=0x$port_key,metadata=0x$dp_key,nw_src=10.0.1.3 
actions=drop
+])
+fi
+AT_CHECK_UNQUOTED([ovs-ofctl dump-flows 

[ovs-dev] [PATCH ovn v2 10/11] ovn-controller: Handle addresses deletion in address set incrementally.

2022-02-09 Thread Han Zhou
The cost of reprocessing a lflow referencing a big address set can be very
high. Now a single address change in an address set would cause the
related logical flows being reprocessed. When the change rate of an
address set is high, ovn-controller would be busy reprocessing lflows.

This patch handles addresses deletion incrementally. It deletes the
related flows for the deleted addresses only, without deleting
and recreating unrelated flows unnecessarily.

Scale test shows obvious performance gains because the time complexity
changed from O(n) to O(1). The bigger the size of address set, the more
CPU savings. With the AS size of 10k, the test shows ~40x speed up.

Test setup:
CPU: Intel(R) Core(TM) i9-7920X CPU @ 2.90GHz.
5 ACL all referencing an address set of 10,000 IPs.

Measure the time spent by ovn-controller for handling one IP deletion
from the address set.

Before: ~400ms
After: 11-12ms

There is memory cost increase, due to the index built to track each
individual IP. The total memory cost for the OF flows in ovn-controller
increased ~20% in the 10k AS size test.

Before:
ofctrl_desired_flow_usage-KB:22248
ofctrl_installed_flow_usage-KB:14850
ofctrl_sb_flow_ref_usage-KB:7208

After:
ofctrl_desired_flow_usage-KB:22248
ofctrl_installed_flow_usage-KB:14850
ofctrl_sb_flow_ref_usage-KB:15551

Signed-off-by: Han Zhou 
---
 controller/lflow.c  | 149 
 controller/lflow.h  |  10 +++
 controller/ofctrl.c |  71 +
 controller/ofctrl.h |   5 ++
 controller/ovn-controller.c |  25 --
 tests/ovn-controller.at |  20 +++--
 6 files changed, 265 insertions(+), 15 deletions(-)

diff --git a/controller/lflow.c b/controller/lflow.c
index 8064e3d43..9b1184c0e 100644
--- a/controller/lflow.c
+++ b/controller/lflow.c
@@ -485,6 +485,155 @@ lflow_handle_changed_flows(struct lflow_ctx_in *l_ctx_in,
 return ret;
 }
 
+static bool
+as_info_from_expr_const(const char *as_name, const union expr_constant *c,
+struct addrset_info *as_info)
+{
+as_info->name = as_name;
+as_info->ip = c->value.ipv6;
+if (c->masked) {
+as_info->mask = c->mask.ipv6;
+} else {
+/* Generate mask so that it is the same as what's added for
+ * expr->cmp.mask. See make_cmp__() in expr.c. */
+union mf_subvalue mask;
+memset(, 0, sizeof mask);
+if (c->format == LEX_F_IPV4) {
+mask.ipv4 = be32_prefix_mask(32);
+} else if (c->format == LEX_F_IPV6) {
+mask.ipv6 = ipv6_create_mask(128);
+} else if (c->format == LEX_F_ETHERNET) {
+mask.mac = eth_addr_exact;
+} else {
+/* Not an address */
+return false;
+}
+as_info->mask = mask.ipv6;
+}
+return true;
+}
+
+static bool
+as_update_can_be_handled(const char *as_name, struct addr_set_diff *as_diff,
+ struct lflow_ctx_in *l_ctx_in)
+{
+struct expr_constant_set *as = shash_find_data(l_ctx_in->addr_sets,
+   as_name);
+ovs_assert(as);
+size_t n_added = as_diff->added ? as_diff->added->n_values : 0;
+size_t n_deleted = as_diff->deleted ? as_diff->deleted->n_values : 0;
+size_t old_as_size = as->n_values + n_deleted - n_added;
+
+/* If the change may impact n_conj, i.e. the template of the flows would
+ * change, we must reprocess the lflow. */
+if (old_as_size <= 1 || as->n_values <= 1) {
+return false;
+}
+
+/* If the size of the diff is too big, reprocessing may be more
+ * efficient than incrementally processing the diffs.  */
+if ((n_added + n_deleted) >= as->n_values) {
+return false;
+}
+
+return true;
+}
+
+/* Handles address set update incrementally - processes only the diff
+ * (added/deleted) addresses in the address set. If it cannot handle the update
+ * incrementally, returns false, so that the caller will trigger reprocessing
+ * for the lflow.
+ *
+ * The reasons that the function returns false are:
+ *
+ * - The size of the address set changed to/from 0 or 1, which means the
+ *   'template' of the lflow translation is changed. In this case reprocessing
+ *   doesn't impact performance because the size of the address set is already
+ *   very small.
+ *
+ * - The size of the change is equal or bigger than the new size. In this case
+ *   it doesn't make sense to incrementally processing the changes because
+ *   reprocessing can be faster.
+ *
+ * - When the address set information couldn't be properly tracked during lflow
+ *   parsing. The typical cases are:
+ *
+ *  - The relational operator to the address set is not '=='. In this case
+ *there is no 1-1 mapping between the addresses and the flows
+ *generated.
+ *
+ *  - The sub expression of the address set is combined with other sub-
+ *expressions/constants, usually because of 

[ovs-dev] [PATCH ovn v2 09/11] lflow: Track reference count of address sets when parsing lflows.

2022-02-09 Thread Han Zhou
Not only track the address set references but also track how many times
each address set is referenced when parsing an lflow. This count will be
needed by address set incremental processing.

Signed-off-by: Han Zhou 
---
 controller/lflow.c | 32 ++--
 controller/lflow.h |  3 +++
 include/ovn/expr.h |  4 ++--
 lib/expr.c | 16 
 4 files changed, 35 insertions(+), 20 deletions(-)

diff --git a/controller/lflow.c b/controller/lflow.c
index b5254bd7c..8064e3d43 100644
--- a/controller/lflow.c
+++ b/controller/lflow.c
@@ -90,7 +90,8 @@ static void lflows_processed_add(struct hmap 
*lflows_processed,
 static void lflows_processed_remove(struct hmap *lflows_processed,
 struct lflow_processed_node *node);
 static void lflow_resource_add(struct lflow_resource_ref *, enum ref_type,
-   const char *ref_name, const struct uuid *);
+   const char *ref_name, const struct uuid *,
+   size_t ref_count);
 static struct ref_lflow_node *ref_lflow_lookup(struct hmap *ref_lflow_table,
enum ref_type,
const char *ref_name);
@@ -115,7 +116,7 @@ lookup_port_cb(const void *aux_, const char *port_name, 
unsigned int *portp)
  * in the future when the lport's port binding changes, the logical flow
  * that references this lport can be reprocessed. */
 lflow_resource_add(aux->lfrr, REF_TYPE_PORTBINDING, port_name,
-   >lflow->header_.uuid);
+   >lflow->header_.uuid, 0);
 
 const struct sbrec_port_binding *pb
 = lport_lookup_by_name(aux->sbrec_port_binding_by_name, port_name);
@@ -131,7 +132,7 @@ lookup_port_cb(const void *aux_, const char *port_name, 
unsigned int *portp)
 struct ds mg_key = DS_EMPTY_INITIALIZER;
 get_mc_group_key(port_name, aux->dp->tunnel_key, _key);
 lflow_resource_add(aux->lfrr, REF_TYPE_MC_GROUP, ds_cstr(_key),
-   >lflow->header_.uuid);
+   >lflow->header_.uuid, 0);
 ds_destroy(_key);
 
 const struct sbrec_multicast_group *mg = mcgroup_lookup_by_dp_name(
@@ -173,7 +174,7 @@ is_chassis_resident_cb(const void *c_aux_, const char 
*port_name)
  * that in the future when the lport's port-binding changes the logical
  * flow that references this lport can be reprocessed. */
 lflow_resource_add(c_aux->lfrr, REF_TYPE_PORTBINDING, port_name,
-   _aux->lflow->header_.uuid);
+   _aux->lflow->header_.uuid, 0);
 
 const struct sbrec_port_binding *pb
 = lport_lookup_by_name(c_aux->sbrec_port_binding_by_name, port_name);
@@ -266,7 +267,8 @@ lflow_ref_lookup(struct hmap *lflow_ref_table,
 
 static void
 lflow_resource_add(struct lflow_resource_ref *lfrr, enum ref_type type,
-   const char *ref_name, const struct uuid *lflow_uuid)
+   const char *ref_name, const struct uuid *lflow_uuid,
+   size_t ref_count)
 {
 struct ref_lflow_node *rlfn = ref_lflow_lookup(>ref_lflow_table,
type, ref_name);
@@ -302,6 +304,7 @@ lflow_resource_add(struct lflow_resource_ref *lfrr, enum 
ref_type type,
 
 struct lflow_ref_list_node *lrln = xzalloc(sizeof *lrln);
 lrln->lflow_uuid = *lflow_uuid;
+lrln->ref_count = ref_count;
 lrln->rlfn = rlfn;
 hmap_insert(>lflow_uuids, >hmap_node, uuid_hash(lflow_uuid));
 ovs_list_push_back(>lflow_ref_head, >list_node);
@@ -750,7 +753,7 @@ convert_match_to_expr(const struct sbrec_logical_flow 
*lflow,
   struct lflow_resource_ref *lfrr,
   bool *pg_addr_set_ref)
 {
-struct sset addr_sets_ref = SSET_INITIALIZER(_sets_ref);
+struct shash addr_sets_ref = SHASH_INITIALIZER(_sets_ref);
 struct sset port_groups_ref = SSET_INITIALIZER(_groups_ref);
 char *error = NULL;
 
@@ -758,22 +761,23 @@ convert_match_to_expr(const struct sbrec_logical_flow 
*lflow,
port_groups, _sets_ref,
_groups_ref, dp->tunnel_key,
);
-const char *addr_set_name;
-SSET_FOR_EACH (addr_set_name, _sets_ref) {
-lflow_resource_add(lfrr, REF_TYPE_ADDRSET, addr_set_name,
-   >header_.uuid);
+struct shash_node *addr_sets_ref_node;
+SHASH_FOR_EACH (addr_sets_ref_node, _sets_ref) {
+lflow_resource_add(lfrr, REF_TYPE_ADDRSET, addr_sets_ref_node->name,
+   >header_.uuid,
+   *(size_t *)addr_sets_ref_node->data);
 }
 const char *port_group_name;
 SSET_FOR_EACH (port_group_name, _groups_ref) {
 lflow_resource_add(lfrr, REF_TYPE_PORTGROUP, port_group_name,
-   

[ovs-dev] [PATCH ovn v2 06/11] ovn-controller: Tracking SB address set updates.

2022-02-09 Thread Han Zhou
This patch tracks the added and removed addresses for each updated
address sets.

Signed-off-by: Han Zhou 
---
 controller/ovn-controller.c | 174 +---
 include/ovn/expr.h  |  10 ++-
 lib/expr.c  | 133 ---
 3 files changed, 234 insertions(+), 83 deletions(-)

diff --git a/controller/ovn-controller.c b/controller/ovn-controller.c
index e73523ce2..24e09f78f 100644
--- a/controller/ovn-controller.c
+++ b/controller/ovn-controller.c
@@ -1393,7 +1393,7 @@ struct ed_type_addr_sets {
 bool change_tracked;
 struct sset new;
 struct sset deleted;
-struct sset updated;
+struct shash updated;
 };
 
 static void *
@@ -1406,19 +1406,44 @@ en_addr_sets_init(struct engine_node *node OVS_UNUSED,
 as->change_tracked = false;
 sset_init(>new);
 sset_init(>deleted);
-sset_init(>updated);
+shash_init(>updated);
 return as;
 }
 
+struct addr_set_diff {
+struct expr_constant_set *added;
+struct expr_constant_set *deleted;
+};
+
+static void
+en_addr_sets_clear_tracked_data(void *data)
+{
+struct ed_type_addr_sets *as = data;
+sset_clear(>new);
+sset_clear(>deleted);
+struct shash_node *node, *next;
+SHASH_FOR_EACH_SAFE (node, next, >updated) {
+struct addr_set_diff *asd = node->data;
+expr_constant_set_destroy(asd->added);
+free(asd->added);
+expr_constant_set_destroy(asd->deleted);
+free(asd->deleted);
+}
+shash_clear_free_data(>updated);
+as->change_tracked = false;
+}
+
 static void
 en_addr_sets_cleanup(void *data)
 {
+en_addr_sets_clear_tracked_data(data);
+
 struct ed_type_addr_sets *as = data;
 expr_const_sets_destroy(>addr_sets);
 shash_destroy(>addr_sets);
 sset_destroy(>new);
 sset_destroy(>deleted);
-sset_destroy(>updated);
+shash_destroy(>updated);
 }
 
 /* Iterate address sets in the southbound database.  Create and update the
@@ -1437,8 +1462,8 @@ addr_sets_init(const struct sbrec_address_set_table 
*address_set_table,
 
 static void
 addr_sets_update(const struct sbrec_address_set_table *address_set_table,
- struct shash *addr_sets, struct sset *new,
- struct sset *deleted, struct sset *updated)
+ struct shash *addr_sets, struct sset *added,
+ struct sset *deleted, struct shash *updated)
 {
 const struct sbrec_address_set *as;
 SBREC_ADDRESS_SET_TABLE_FOR_EACH_TRACKED (as, address_set_table) {
@@ -1446,13 +1471,23 @@ addr_sets_update(const struct sbrec_address_set_table 
*address_set_table,
 expr_const_sets_remove(addr_sets, as->name);
 sset_add(deleted, as->name);
 } else {
-expr_const_sets_add_integers(addr_sets, as->name,
- (const char *const *) as->addresses,
- as->n_addresses);
-if (sbrec_address_set_is_new(as)) {
-sset_add(new, as->name);
+struct expr_constant_set *cs_old = shash_find_data(addr_sets,
+   as->name);
+if (!cs_old) {
+sset_add(added, as->name);
+expr_const_sets_add_integers(addr_sets, as->name,
+(const char *const *) as->addresses, as->n_addresses);
 } else {
-sset_add(updated, as->name);
+/* Find out the diff for the updated address set. */
+struct expr_constant_set *cs_new =
+expr_constant_set_create_integers(
+(const char *const *) as->addresses, as->n_addresses);
+struct addr_set_diff *as_diff = xmalloc(sizeof *as_diff);
+expr_constant_set_integers_diff(cs_old, cs_new,
+_diff->added,
+_diff->deleted);
+shash_add(updated, as->name, as_diff);
+expr_const_sets_add(addr_sets, as->name, cs_new);
 }
 }
 }
@@ -1463,9 +1498,6 @@ en_addr_sets_run(struct engine_node *node, void *data)
 {
 struct ed_type_addr_sets *as = data;
 
-sset_clear(>new);
-sset_clear(>deleted);
-sset_clear(>updated);
 expr_const_sets_destroy(>addr_sets);
 
 struct sbrec_address_set_table *as_table =
@@ -1483,10 +1515,6 @@ addr_sets_sb_address_set_handler(struct engine_node 
*node, void *data)
 {
 struct ed_type_addr_sets *as = data;
 
-sset_clear(>new);
-sset_clear(>deleted);
-sset_clear(>updated);
-
 struct sbrec_address_set_table *as_table =
 (struct sbrec_address_set_table *)EN_OVSDB_GET(
 engine_get_input("SB_address_set", node));
@@ -1495,7 +1523,7 @@ addr_sets_sb_address_set_handler(struct engine_node 
*node, void *data)
  >deleted, >updated);
 
 if 

[ovs-dev] [PATCH ovn v2 07/11] lflow.c: Set "changed" properly in lflow_handle_changed_ref().

2022-02-09 Thread Han Zhou
The current code sets changed to true only when a lflow is reprocessed,
if the lflow is deleted (not found) the changed would be false, even if
we have deleted desired flows. Fortunately this should not cause real
problems yet because if a lflow is deleted it should later trigger the
lflow_handle_changed_flows(), which always update the engine state. But
it is still better to set it properly in lflow_handle_changed_ref() in
the first placer.

Signed-off-by: Han Zhou 
---
 controller/lflow.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/controller/lflow.c b/controller/lflow.c
index 79652735c..2a75af5bd 100644
--- a/controller/lflow.c
+++ b/controller/lflow.c
@@ -516,6 +516,7 @@ lflow_handle_changed_ref(enum ref_type ref_type, const char 
*ref_name,
 if (ovs_list_is_empty(_todo)) {
 return true;
 }
+*changed = true;
 
 struct hmap dhcp_opts = HMAP_INITIALIZER(_opts);
 struct hmap dhcpv6_opts = HMAP_INITIALIZER(_opts);
@@ -584,7 +585,6 @@ lflow_handle_changed_ref(enum ref_type ref_type, const char 
*ref_name,
 consider_logical_flow(lflow, _opts, _opts,
   _ra_opts, _event_opts, false,
   l_ctx_in, l_ctx_out);
-*changed = true;
 }
 HMAP_FOR_EACH_SAFE (ofrn, ofrn_next, hmap_node, _remove_nodes) {
 hmap_remove(_remove_nodes, >hmap_node);
-- 
2.30.2

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH ovn v2 05/11] ovn-controller.c: Refactor init_lflow_ctx.

2022-02-09 Thread Han Zhou
The function gets rt_data and non_vif_data from args but all the other
data from engine inputs, which looks weird. Just unify the pattern and
get all data from engine inputs in the function.

Signed-off-by: Han Zhou 
---
 controller/ovn-controller.c | 61 +
 1 file changed, 14 insertions(+), 47 deletions(-)

diff --git a/controller/ovn-controller.c b/controller/ovn-controller.c
index 65642f4b7..e73523ce2 100644
--- a/controller/ovn-controller.c
+++ b/controller/ovn-controller.c
@@ -2191,8 +2191,6 @@ struct ed_type_lflow_output {
 
 static void
 init_lflow_ctx(struct engine_node *node,
-   struct ed_type_runtime_data *rt_data,
-   struct ed_type_non_vif_data *non_vif_data,
struct ed_type_lflow_output *fo,
struct lflow_ctx_in *l_ctx_in,
struct lflow_ctx_out *l_ctx_out)
@@ -2274,6 +2272,12 @@ init_lflow_ctx(struct engine_node *node,
 
 ovs_assert(chassis);
 
+struct ed_type_runtime_data *rt_data =
+engine_get_input_data("runtime_data", node);
+
+struct ed_type_non_vif_data *non_vif_data =
+engine_get_input_data("non_vif_data", node);
+
 struct ed_type_addr_sets *as_data =
 engine_get_input_data("addr_sets", node);
 struct shash *addr_sets = _data->addr_sets;
@@ -2360,11 +2364,6 @@ en_lflow_output_cleanup(void *data)
 static void
 en_lflow_output_run(struct engine_node *node, void *data)
 {
-struct ed_type_runtime_data *rt_data =
-engine_get_input_data("runtime_data", node);
-struct ed_type_non_vif_data *non_vif_data =
-engine_get_input_data("non_vif_data", node);
-
 struct ovsrec_open_vswitch_table *ovs_table =
 (struct ovsrec_open_vswitch_table *)EN_OVSDB_GET(
 engine_get_input("OVS_open_vswitch", node));
@@ -2409,7 +2408,7 @@ en_lflow_output_run(struct engine_node *node, void *data)
 
 struct lflow_ctx_in l_ctx_in;
 struct lflow_ctx_out l_ctx_out;
-init_lflow_ctx(node, rt_data, non_vif_data, fo, _ctx_in, _ctx_out);
+init_lflow_ctx(node, fo, _ctx_in, _ctx_out);
 lflow_run(_ctx_in, _ctx_out);
 
 engine_set_node_state(node, EN_UPDATED);
@@ -2418,15 +2417,10 @@ en_lflow_output_run(struct engine_node *node, void 
*data)
 static bool
 lflow_output_sb_logical_flow_handler(struct engine_node *node, void *data)
 {
-struct ed_type_runtime_data *rt_data =
-engine_get_input_data("runtime_data", node);
-struct ed_type_non_vif_data *non_vif_data =
-engine_get_input_data("non_vif_data", node);
-
 struct ed_type_lflow_output *fo = data;
 struct lflow_ctx_in l_ctx_in;
 struct lflow_ctx_out l_ctx_out;
-init_lflow_ctx(node, rt_data, non_vif_data, fo, _ctx_in, _ctx_out);
+init_lflow_ctx(node, fo, _ctx_in, _ctx_out);
 
 bool handled = lflow_handle_changed_flows(_ctx_in, _ctx_out);
 
@@ -2462,16 +2456,11 @@ lflow_output_sb_mac_binding_handler(struct engine_node 
*node, void *data)
 static bool
 lflow_output_sb_multicast_group_handler(struct engine_node *node, void *data)
 {
-struct ed_type_runtime_data *rt_data =
-engine_get_input_data("runtime_data", node);
-struct ed_type_non_vif_data *non_vif_data =
-engine_get_input_data("non_vif_data", node);
-
 struct ed_type_lflow_output *lfo = data;
 
 struct lflow_ctx_in l_ctx_in;
 struct lflow_ctx_out l_ctx_out;
-init_lflow_ctx(node, rt_data, non_vif_data, lfo, _ctx_in, _ctx_out);
+init_lflow_ctx(node, lfo, _ctx_in, _ctx_out);
 if (!lflow_handle_changed_mc_groups(_ctx_in, _ctx_out)) {
 return false;
 }
@@ -2483,16 +2472,11 @@ lflow_output_sb_multicast_group_handler(struct 
engine_node *node, void *data)
 static bool
 lflow_output_sb_port_binding_handler(struct engine_node *node, void *data)
 {
-struct ed_type_runtime_data *rt_data =
-engine_get_input_data("runtime_data", node);
-struct ed_type_non_vif_data *non_vif_data =
-engine_get_input_data("non_vif_data", node);
-
 struct ed_type_lflow_output *lfo = data;
 
 struct lflow_ctx_in l_ctx_in;
 struct lflow_ctx_out l_ctx_out;
-init_lflow_ctx(node, rt_data, non_vif_data, lfo, _ctx_in, _ctx_out);
+init_lflow_ctx(node, lfo, _ctx_in, _ctx_out);
 if (!lflow_handle_changed_port_bindings(_ctx_in, _ctx_out)) {
 return false;
 }
@@ -2505,11 +2489,6 @@ static bool
 _lflow_output_resource_ref_handler(struct engine_node *node, void *data,
   enum ref_type ref_type)
 {
-struct ed_type_runtime_data *rt_data =
-engine_get_input_data("runtime_data", node);
-struct ed_type_non_vif_data *non_vif_data =
-engine_get_input_data("non_vif_data", node);
-
 struct ed_type_addr_sets *as_data =
 engine_get_input_data("addr_sets", node);
 
@@ -2520,7 +2499,7 @@ _lflow_output_resource_ref_handler(struct engine_node 
*node, void *data,
 
 struct lflow_ctx_in l_ctx_in;
 struct lflow_ctx_out l_ctx_out;
-

[ovs-dev] [PATCH ovn v2 03/11] ovn-controller: Track individual IP information of address set during lflow parsing.

2022-02-09 Thread Han Zhou
This patch tracks individual address information when parsing address
sets from logical flows, and link to the corresponding desired flow
resulted from the IP.

Signed-off-by: Han Zhou 
---
 controller/lflow.c|  19 --
 controller/ofctrl.c   | 130 ++
 controller/ofctrl.h   |  19 --
 controller/physical.c |   2 +-
 include/ovn/expr.h|   9 +++
 lib/expr.c|  81 --
 6 files changed, 224 insertions(+), 36 deletions(-)

diff --git a/controller/lflow.c b/controller/lflow.c
index 8b32c7469..79652735c 100644
--- a/controller/lflow.c
+++ b/controller/lflow.c
@@ -692,13 +692,23 @@ add_matches_to_flow_table(const struct sbrec_logical_flow 
*lflow,
 }
 }
 }
+
+struct addrset_info as_info = {
+.name = m->as_name,
+.ip = m->as_ip,
+.mask = m->as_mask
+};
 if (!m->n) {
 ofctrl_add_flow_metered(l_ctx_out->flow_table, ptable,
 lflow->priority,
 lflow->header_.uuid.parts[0], >match,
 , >header_.uuid,
-ctrl_meter_id);
+ctrl_meter_id,
+as_info.name ? _info : NULL);
 } else {
+if (m->n > 1) {
+ovs_assert(!as_info.name);
+}
 uint64_t conj_stubs[64 / 8];
 struct ofpbuf conj;
 
@@ -716,7 +726,8 @@ add_matches_to_flow_table(const struct sbrec_logical_flow 
*lflow,
 ofctrl_add_or_append_flow(l_ctx_out->flow_table, ptable,
   lflow->priority, 0,
   >match, , >header_.uuid,
-  ctrl_meter_id);
+  ctrl_meter_id,
+  as_info.name ? _info : NULL);
 ofpbuf_uninit();
 }
 }
@@ -1524,7 +1535,7 @@ add_lb_ct_snat_hairpin_dp_flows(struct ovn_controller_lb 
*lb,
 ofctrl_add_or_append_flow(flow_table, OFTABLE_CT_SNAT_HAIRPIN, 200,
   lb->slb->header_.uuid.parts[0],
   _match, _acts, >slb->header_.uuid,
-  NX_CTLR_NO_METER);
+  NX_CTLR_NO_METER, NULL);
 }
 
 ofpbuf_uninit(_acts);
@@ -1698,7 +1709,7 @@ add_lb_ct_snat_hairpin_vip_flow(struct ovn_controller_lb 
*lb,
 ofctrl_add_or_append_flow(flow_table, OFTABLE_CT_SNAT_HAIRPIN, priority,
   lb->slb->header_.uuid.parts[0],
   , , >slb->header_.uuid,
-  NX_CTLR_NO_METER);
+  NX_CTLR_NO_METER, NULL);
 ofpbuf_uninit();
 
 }
diff --git a/controller/ofctrl.c b/controller/ofctrl.c
index bcd6cea79..7671a3b7a 100644
--- a/controller/ofctrl.c
+++ b/controller/ofctrl.c
@@ -165,15 +165,35 @@ struct sb_to_flow {
 struct uuid sb_uuid;
 struct ovs_list flows; /* A list of struct sb_flow_ref nodes that
   are referenced by the sb_uuid. */
+struct ovs_list addrsets; /* A list of struct sb_addrset_ref. */
 };
 
 struct sb_flow_ref {
 struct ovs_list sb_list; /* List node in desired_flow.references. */
-struct ovs_list flow_list; /* List node in sb_to_flow.desired_flows. */
+struct ovs_list flow_list; /* List node in sb_to_flow.flows. */
+struct ovs_list as_ip_flow_list; /* List node in ip_to_flow_node.flows. */
 struct desired_flow *flow;
 struct uuid sb_uuid;
 };
 
+struct sb_addrset_ref {
+struct ovs_list list_node; /* List node in sb_to_flow.addrsets. */
+char *name; /* Name of the address set. */
+struct hmap ip_to_flow_map; /* map from IPs in the address set to flows.
+   Each node is struct ip_to_flow_node. */
+};
+
+struct ip_to_flow_node {
+struct hmap_node hmap_node; /* Node in sb_addrset_ref.ip_to_flow_map. */
+struct in6_addr as_ip;
+struct in6_addr as_mask;
+
+/* A list of struct sb_flow_ref. A single IP in an address set can be
+ * used by multiple flows.  e.g., in match:
+ * ip.src == $as1 && ip.dst == $as1. */
+struct ovs_list flows;
+};
+
 /* An installed flow, in static variable installed_lflows/installed_pflows.
  *
  * Installed flows are updated in ofctrl_put for maintaining the flow
@@ -1030,9 +1050,26 @@ sb_to_flow_find(struct hmap *uuid_flow_table, const 
struct uuid *sb_uuid)
 return NULL;
 }
 
+static struct ip_to_flow_node *
+ip_to_flow_find(struct hmap *ip_to_flow_map, const struct in6_addr *as_ip,
+const struct in6_addr *as_mask)
+{
+uint32_t hash = hash_bytes(as_ip, sizeof *as_ip, 0);
+
+struct ip_to_flow_node *itfn;
+HMAP_FOR_EACH_WITH_HASH (itfn, hmap_node, hash, 

[ovs-dev] [PATCH ovn v2 04/11] ovn-controller.c: Remove unnecessary asserts and useless variables.

2022-02-09 Thread Han Zhou
Signed-off-by: Han Zhou 
---
 controller/ovn-controller.c | 28 
 1 file changed, 28 deletions(-)

diff --git a/controller/ovn-controller.c b/controller/ovn-controller.c
index 8631bccbc..65642f4b7 100644
--- a/controller/ovn-controller.c
+++ b/controller/ovn-controller.c
@@ -2422,14 +2422,6 @@ lflow_output_sb_logical_flow_handler(struct engine_node 
*node, void *data)
 engine_get_input_data("runtime_data", node);
 struct ed_type_non_vif_data *non_vif_data =
 engine_get_input_data("non_vif_data", node);
-struct ovsrec_open_vswitch_table *ovs_table =
-(struct ovsrec_open_vswitch_table *)EN_OVSDB_GET(
-engine_get_input("OVS_open_vswitch", node));
-struct ovsrec_bridge_table *bridge_table =
-(struct ovsrec_bridge_table *)EN_OVSDB_GET(
-engine_get_input("OVS_bridge", node));
-const struct ovsrec_bridge *br_int = get_br_int(bridge_table, ovs_table);
-ovs_assert(br_int);
 
 struct ed_type_lflow_output *fo = data;
 struct lflow_ctx_in l_ctx_in;
@@ -2524,26 +2516,6 @@ _lflow_output_resource_ref_handler(struct engine_node 
*node, void *data,
 struct ed_type_port_groups *pg_data =
 engine_get_input_data("port_groups", node);
 
-struct ovsrec_open_vswitch_table *ovs_table =
-(struct ovsrec_open_vswitch_table *)EN_OVSDB_GET(
-engine_get_input("OVS_open_vswitch", node));
-struct ovsrec_bridge_table *bridge_table =
-(struct ovsrec_bridge_table *)EN_OVSDB_GET(
-engine_get_input("OVS_bridge", node));
-const struct ovsrec_bridge *br_int = get_br_int(bridge_table, ovs_table);
-const char *chassis_id = get_ovs_chassis_id(ovs_table);
-
-struct ovsdb_idl_index *sbrec_chassis_by_name =
-engine_ovsdb_node_get_index(
-engine_get_input("SB_chassis", node),
-"name");
-const struct sbrec_chassis *chassis = NULL;
-if (chassis_id) {
-chassis = chassis_lookup_by_name(sbrec_chassis_by_name, chassis_id);
-}
-
-ovs_assert(br_int && chassis);
-
 struct ed_type_lflow_output *fo = data;
 
 struct lflow_ctx_in l_ctx_in;
-- 
2.30.2

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH ovn v2 02/11] ofctrl.c: Combine remove_flows_from_sb_to_flow and ofctrl_flood_remove_flows.

2022-02-09 Thread Han Zhou
There are redundant logic between these functions. Refactor and combine
them.

Signed-off-by: Han Zhou 
---
 controller/ofctrl.c | 128 
 1 file changed, 59 insertions(+), 69 deletions(-)

diff --git a/controller/ofctrl.c b/controller/ofctrl.c
index 08fcfed8b..bcd6cea79 100644
--- a/controller/ofctrl.c
+++ b/controller/ofctrl.c
@@ -257,6 +257,11 @@ static uint32_t ovn_flow_match_hash(const struct ovn_flow 
*);
 static char *ovn_flow_to_string(const struct ovn_flow *);
 static void ovn_flow_log(const struct ovn_flow *, const char *action);
 
+static void remove_flows_from_sb_to_flow(struct ovn_desired_flow_table *,
+ struct sb_to_flow *,
+ const char *log_msg,
+ struct hmap *flood_remove_nodes);
+
 /* OpenFlow connection to the switch. */
 static struct rconn *swconn;
 
@@ -1165,36 +1170,6 @@ ofctrl_add_or_append_flow(struct ovn_desired_flow_table 
*desired_flows,
 }
 }
 
-/* Remove ovn_flows for the given "sb_to_flow" node in the uuid_flow_table.
- * Optionally log the message for each flow that is acturally removed, if
- * log_msg is not NULL. */
-static void
-remove_flows_from_sb_to_flow(struct ovn_desired_flow_table *flow_table,
- struct sb_to_flow *stf,
- const char *log_msg)
-{
-struct sb_flow_ref *sfr, *next;
-LIST_FOR_EACH_SAFE (sfr, next, flow_list, >flows) {
-ovs_list_remove(>sb_list);
-ovs_list_remove(>flow_list);
-struct desired_flow *f = sfr->flow;
-mem_stats.sb_flow_ref_usage -= sb_flow_ref_size(sfr);
-free(sfr);
-
-if (ovs_list_is_empty(>references)) {
-if (log_msg) {
-ovn_flow_log(>flow, log_msg);
-}
-hmap_remove(_table->match_flow_table,
->match_hmap_node);
-track_or_destroy_for_flow_del(flow_table, f);
-}
-}
-hmap_remove(_table->uuid_flow_table, >hmap_node);
-mem_stats.sb_flow_ref_usage -= sb_to_flow_size(stf);
-free(stf);
-}
-
 void
 ofctrl_remove_flows(struct ovn_desired_flow_table *flow_table,
 const struct uuid *sb_uuid)
@@ -1202,7 +1177,8 @@ ofctrl_remove_flows(struct ovn_desired_flow_table 
*flow_table,
 struct sb_to_flow *stf = sb_to_flow_find(_table->uuid_flow_table,
  sb_uuid);
 if (stf) {
-remove_flows_from_sb_to_flow(flow_table, stf, "ofctrl_remove_flow");
+remove_flows_from_sb_to_flow(flow_table, stf, "ofctrl_remove_flow",
+ NULL);
 }
 
 /* remove any related group and meter info */
@@ -1243,32 +1219,73 @@ flood_remove_flows_for_sb_uuid(struct 
ovn_desired_flow_table *flow_table,
 return;
 }
 
+remove_flows_from_sb_to_flow(flow_table, stf, "flood remove",
+ flood_remove_nodes);
+}
+
+void
+ofctrl_flood_remove_flows(struct ovn_desired_flow_table *flow_table,
+  struct hmap *flood_remove_nodes)
+{
+struct ofctrl_flood_remove_node *ofrn;
+int i, n = 0;
+
+/* flood_remove_flows_for_sb_uuid() will modify the 'flood_remove_nodes'
+ * hash map by inserting new items, so we can't use it for iteration.
+ * Copying the sb_uuids into an array. */
+struct uuid *sb_uuids;
+sb_uuids = xmalloc(hmap_count(flood_remove_nodes) * sizeof *sb_uuids);
+HMAP_FOR_EACH (ofrn, hmap_node, flood_remove_nodes) {
+sb_uuids[n++] = ofrn->sb_uuid;
+}
+
+for (i = 0; i < n; i++) {
+flood_remove_flows_for_sb_uuid(flow_table, _uuids[i],
+   flood_remove_nodes);
+}
+free(sb_uuids);
+
+/* remove any related group and meter info */
+HMAP_FOR_EACH (ofrn, hmap_node, flood_remove_nodes) {
+ovn_extend_table_remove_desired(groups, >sb_uuid);
+ovn_extend_table_remove_desired(meters, >sb_uuid);
+}
+}
+
+/* Remove ovn_flows for the given "sb_to_flow" node in the uuid_flow_table.
+ * Optionally log the message for each flow that is acturally removed, if
+ * log_msg is not NULL. */
+static void
+remove_flows_from_sb_to_flow(struct ovn_desired_flow_table *flow_table,
+ struct sb_to_flow *stf,
+ const char *log_msg,
+ struct hmap *flood_remove_nodes)
+{
 /* ovn_flows that have other references and waiting to be removed. */
 struct ovs_list to_be_removed = OVS_LIST_INITIALIZER(_be_removed);
 
 /* Traverse all flows for the given sb_uuid. */
 struct sb_flow_ref *sfr, *next;
 LIST_FOR_EACH_SAFE (sfr, next, flow_list, >flows) {
-struct desired_flow *f = sfr->flow;
-ovn_flow_log(>flow, "flood remove");
-
 ovs_list_remove(>sb_list);
 ovs_list_remove(>flow_list);
+struct desired_flow 

[ovs-dev] [PATCH ovn v2 01/11] expr.c: Use expr_destroy and expr_clone instead of free and xmemdup.

2022-02-09 Thread Han Zhou
Use the functions for safer memory operation.

Signed-off-by: Han Zhou 
---
 lib/expr.c | 44 ++--
 1 file changed, 22 insertions(+), 22 deletions(-)

diff --git a/lib/expr.c b/lib/expr.c
index e3f6bb892..5fc6c1ce9 100644
--- a/lib/expr.c
+++ b/lib/expr.c
@@ -186,7 +186,7 @@ expr_combine(enum expr_type type, struct expr *a, struct 
expr *b)
 } else if (a->type == type) {
 if (b->type == type) {
 ovs_list_splice(>andor, b->andor.next, >andor);
-free(b);
+expr_destroy(b);
 } else {
 ovs_list_push_back(>andor, >node);
 }
@@ -210,7 +210,7 @@ expr_insert_andor(struct expr *andor, struct expr *before, 
struct expr *new)
 /* Conjunction junction, what's your function? */
 }
 ovs_list_splice(>node, new->andor.next, >andor);
-free(new);
+expr_destroy(new);
 } else {
 ovs_list_insert(>node, >node);
 }
@@ -276,11 +276,11 @@ expr_fix_andor(struct expr *expr, bool short_circuit)
 
 if (ovs_list_is_short(>andor)) {
 if (ovs_list_is_empty(>andor)) {
-free(expr);
+expr_destroy(expr);
 return expr_create_boolean(!short_circuit);
 } else {
-sub = expr_from_node(ovs_list_front(>andor));
-free(expr);
+sub = expr_from_node(ovs_list_pop_front(>andor));
+expr_destroy(expr);
 return sub;
 }
 } else {
@@ -2096,7 +2096,7 @@ expr_simplify_relational(struct expr *expr)
  * and similarly for "tcp.dst <= 1234". */
 struct expr *new = NULL;
 if (eq) {
-new = xmemdup(expr, sizeof *expr);
+new = expr_clone(expr);
 new->cmp.relop = EXPR_R_EQ;
 }
 
@@ -2105,7 +2105,7 @@ expr_simplify_relational(struct expr *expr)
  z = bitwise_scan(value, sizeof *value, lt, z + 1, end)) {
 struct expr *e;
 
-e = xmemdup(expr, sizeof *expr);
+e = expr_clone(expr);
 e->cmp.relop = EXPR_R_EQ;
 bitwise_toggle_bit(>cmp.value, sizeof e->cmp.value, z);
 bitwise_zero(>cmp.value, sizeof e->cmp.value, start, z - start);
@@ -2324,7 +2324,7 @@ crush_and_string(struct expr *expr, const struct 
expr_symbol *symbol)
 expr_destroy(expr);
 return new;
 }
-free(new);
+expr_destroy(new);
 break;
 case EXPR_T_CONDITION:
 OVS_NOT_REACHED();
@@ -2463,14 +2463,14 @@ crush_and_numeric(struct expr *expr, const struct 
expr_symbol *symbol)
 expr_destroy(sub);
 }
 }
-free(disjuncts);
-free(expr);
+expr_destroy(disjuncts);
+expr_destroy(expr);
 if (ovs_list_is_empty(>andor)) {
-free(or);
+expr_destroy(or);
 return expr_create_boolean(false);
 } else if (ovs_list_is_short(>andor)) {
 struct expr *cmp = expr_from_node(ovs_list_pop_front(>andor));
-free(or);
+expr_destroy(or);
 return cmp;
 } else {
 return crush_cmps(or, symbol);
@@ -2517,15 +2517,15 @@ crush_and_numeric(struct expr *expr, const struct 
expr_symbol *symbol)
 }
 expr_destroy(as);
 expr_destroy(bs);
-free(new);
+expr_destroy(new);
 
 if (ovs_list_is_empty(>andor)) {
 expr_destroy(expr);
-free(or);
+expr_destroy(or);
 return expr_create_boolean(false);
 } else if (ovs_list_is_short(>andor)) {
 struct expr *cmp = expr_from_node(ovs_list_pop_front(>andor));
-free(or);
+expr_destroy(or);
 if (ovs_list_is_empty(>andor)) {
 expr_destroy(expr);
 return crush_cmps(cmp, symbol);
@@ -2675,7 +2675,7 @@ expr_sort(struct expr *expr)
 qsort(subs, n, sizeof *subs, compare_expr_sort);
 
 ovs_list_init(>andor);
-free(expr);
+expr_destroy(expr);
 expr = NULL;
 
 for (i = 0; i < n; ) {
@@ -2708,7 +2708,7 @@ expr_sort(struct expr *expr)
 expr = crushed;
 break;
 } else {
-free(crushed);
+expr_destroy(crushed);
 }
 } else {
 expr = expr_combine(EXPR_T_AND, expr, crushed);
@@ -2754,8 +2754,8 @@ expr_normalize_and(struct expr *expr)
 }
 }
 if (ovs_list_is_short(>andor)) {
-struct expr *sub = expr_from_node(ovs_list_front(>andor));
-free(expr);
+struct expr *sub = expr_from_node(ovs_list_pop_front(>andor));
+expr_destroy(expr);
 return sub;
 }
 
@@ -2813,7 +2813,7 @@ expr_normalize_or(struct expr *expr)
 expr_destroy(expr);
 return new;
 }
-free(new);
+expr_destroy(new);
 

[ovs-dev] [PATCH ovn v2 00/11] ovn-controller: Fine-grained address set incremental processing.

2022-02-09 Thread Han Zhou
Today although the incremental processing engine of ovn-controller handles
address set changes incrementally, it is at the logical flow level instead of
individual addresses level. A single address change in an address set would
cause all the related logical flows being reprocessed.  The cost of
reprocessing a lflow referencing a big address set can be very high. When the
change rate of anaddress sets is high, ovn-controller would be busy reprocessing
logical flows.

This patch series optimizes this typical scenario for large scale environment
by incrementally processing each individual address updates. When the change is
small (e.g. adding/deleting a single address in an address set), this results
in constant processing time, regardless of the size of the address set.

There are limitations that these approaches can't apply. For example, when an
ACL is in the below forms:

ip.src == $as1 || ip.dst == $as2
ip.src == {$as1, $as2}
ip.src == {$as1, ip1}

In these cases during lflow parsing the expressions are combined to a single
OR, which loses the tracking information for the address sets' IPs and flows
generated. There are other cases that can't be handled that are documented for
the function lflow_handle_addr_set_update, and also added in test cases.  In
these cases it just fall back to the old approach that reprocesses the
lflow. So, this change doesn't add any new constraint to the users, but just
leave some cases as unoptimized as it was before.

Scale test shows obvious performance gains because the time complexity
changed from O(n) to O(1). The bigger the size of address set, the more
CPU savings. With the AS size of 10k, the test shows ~40x speed up.

Test setup:
CPU: Intel(R) Core(TM) i9-7920X CPU @ 2.90GHz.
5 ACL all referencing an address set of 10,000 IPs.

Measure the time spent by ovn-controller for handling one IP deletion
from the address set:

Before: ~400ms
After: 11-12ms

There is memory cost increase, due to the index built to track each individual
addresses. The total memory cost for the OF flows in ovn-controller increased
~20% in the 10k AS size test.

Before:
ofctrl_desired_flow_usage-KB:22248
ofctrl_installed_flow_usage-KB:14850
ofctrl_sb_flow_ref_usage-KB:7208

After:
ofctrl_desired_flow_usage-KB:22248
ofctrl_installed_flow_usage-KB:14850
ofctrl_sb_flow_ref_usage-KB:15551

---
v1 -> v2:
- Fixed a build error of patch 4, which was caused by misplacing a change to
  patch 4 which should have been in patch 5. Updated patch 5 commit message as
  well.

Changes after RFC:
- Added a new patch for maintaining ref_count in logical flow resource
  reference for resource type: address-set, which is needed for the correctness
  of both IP addition and deletion I-P in some corner cases.
- Fixed the corner case when the same address set is used multiple times in the
  same lflow with one of the references untrackable. Added test case to cover
  as well.
- Added more documentation, such as the limitations and corner cases.
- Added tests for ipv6 and mac address support.
- Other minor improvements.

Han Zhou (11):
  expr.c: Use expr_destroy and expr_clone instead of free and xmemdup.
  ofctrl.c: Combine remove_flows_from_sb_to_flow and
ofctrl_flood_remove_flows.
  ovn-controller: Track individual IP information of address set during
lflow parsing.
  ovn-controller.c: Remove unnecessary asserts and useless variables.
  ovn-controller.c: Refactor init_lflow_ctx.
  ovn-controller: Tracking SB address set updates.
  lflow.c: Set "changed" properly in lflow_handle_changed_ref().
  ovn-controller: Add tests for different ACL address set usage
patterns.
  lflow: Track reference count of address sets when parsing lflows.
  ovn-controller: Handle addresses deletion in address set
incrementally.
  ovn-controller: Handle addresses addition in address set
incrementally.

 controller/lflow-conj-ids.c |   20 +
 controller/lflow-conj-ids.h |1 +
 controller/lflow.c  |  532 +++-
 controller/lflow.h  |   13 +
 controller/ofctrl.c |  329 +++---
 controller/ofctrl.h |   24 +-
 controller/ovn-controller.c |  278 
 controller/physical.c   |2 +-
 include/ovn/expr.h  |   24 +-
 lib/expr.c  |  274 ++--
 tests/ovn-controller.at | 1186 +++
 11 files changed, 2378 insertions(+), 305 deletions(-)

-- 
2.30.2

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH 00/13] Fix undefined behavior in loop macros

2022-02-09 Thread Adrian Moreno



On 2/9/22 16:55, Ilya Maximets wrote:

c. The changes in this OVS patchset seem to allow easier backport but
     that won't really be the case for OVN.

I wonder if it makes sense to keep both versions:
- the old *_SAFE(.., next, ..)
- a new *_SAFE_V2 (or a better name) that drops the need for explicitly
    supplied "next".

What do you think?


How about *_SAFE_OLD(...) for the old one and *_SAFE(...) for the new one? :-D
I'm just thinking that it would be a way to promote the use of the new and 
cleaner iterator in new code.

On the other hand, moving most of the users to the new macro should be a good example 
while we keep backwards compatibility. But even if we do keep the old name of the old 
version of the macro, it should have a new implementation that makes sure the 
"next" variable is not left pointing to something invalid, so I guess the 
backwards compatibility is broken anyway. What do you think?



Maybe we can overload the macro instead?
With something like this:
   
https://stackoverflow.com/questions/9183993/msvc-variadic-macro-expansion/24028231#24028231

Best regards, Ilya Maximets.



That's a nice solution, yes.

Thanks.
--
Adrián Moreno

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH 00/13] Fix undefined behavior in loop macros

2022-02-09 Thread Ilya Maximets
>> c. The changes in this OVS patchset seem to allow easier backport but
>>     that won't really be the case for OVN.
>>
>> I wonder if it makes sense to keep both versions:
>> - the old *_SAFE(.., next, ..)
>> - a new *_SAFE_V2 (or a better name) that drops the need for explicitly
>>    supplied "next".
>>
>> What do you think?
> 
> How about *_SAFE_OLD(...) for the old one and *_SAFE(...) for the new one? :-D
> I'm just thinking that it would be a way to promote the use of the new and 
> cleaner iterator in new code.
> 
> On the other hand, moving most of the users to the new macro should be a good 
> example while we keep backwards compatibility. But even if we do keep the old 
> name of the old version of the macro, it should have a new implementation 
> that makes sure the "next" variable is not left pointing to something 
> invalid, so I guess the backwards compatibility is broken anyway. What do you 
> think?


Maybe we can overload the macro instead?
With something like this:
  
https://stackoverflow.com/questions/9183993/msvc-variadic-macro-expansion/24028231#24028231

Best regards, Ilya Maximets.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] OVS with Bifurcation Example

2022-02-09 Thread Lazuardi Nasution
HI Tonghao,

Do you know how to enable isolate mode on OVS-DPDK? By searching
"rte_flow_isolate" on OVS source code, I think OVS-DPDK doesn't support it
right now.

Best regards.

On Wed, Feb 9, 2022 at 8:56 PM Tonghao Zhang 
wrote:

> On Tue, Feb 8, 2022 at 4:34 PM Lazuardi Nasution 
> wrote:
> >
> > Hi Tonghao,
> >
> > How can I know if my driver support it. i have ConnectX-5 EN, ConnectX-6
> VPI and Connect-6 DX.
> I tested this patch on Connect-6 DX. but now there is no hw NIC for me
> to test again.
> if you enable the isolate mode on hw nic, then you can install the
> flows, for example, to match the tunnel packet via dpctl command with
> --consistent option.
>
> http://patchwork.ozlabs.org/project/openvswitch/patch/20201214022001.84273-3-xiangxia.m@gmail.com/
>
> > Best regards.
> >
> > On Tue, Feb 8, 2022, 3:27 PM Tonghao Zhang 
> wrote:
> >>
> >> On Tue, Feb 8, 2022 at 4:16 PM Lazuardi Nasution <
> mrxlazuar...@gmail.com> wrote:
> >> >
> >> > Hi Tonghao,
> >> >
> >> > Do you have any example or tutorial regarding this?
> >> No, but I plan to write a doc about this. The patch 1/4 only enable
> >> isolate_offload_mode of dpdk netdev.
> >> If your driver support isolate mode of dpdk, it can work in ovs.
> >> > Best regards.
> >> >
> >> >
> >> > On Tue, Feb 8, 2022, 2:51 PM Tonghao Zhang 
> wrote:
> >> >>
> >> >> On Sat, Feb 5, 2022 at 8:59 PM Lazuardi Nasution <
> mrxlazuar...@gmail.com> wrote:
> >> >> >
> >> >> > Hi,
> >> >> >
> >> >> > Is there any example about how to use OVS with Bifurcation driver?
> I have
> >> >> > read that with Intel NIC I should create specific VF for that, but
> with
> >> >> > Mellanox NIC it is not required to have VF.
> >> >> Hi
> >> >> there is a patchset, I hope it can help you.
> >> >> http://patchwork.ozlabs.org/project/openvswitch/list/?series=220295
> >> >> > Best regards.
> >> >>
> >> >> Best regards, Tonghao
> >>
> >>
> >>
> >> --
> >> Best regards, Tonghao
>
>
>
> --
> Best regards, Tonghao
>
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH 00/13] Fix undefined behavior in loop macros

2022-02-09 Thread Adrian Moreno



On 2/9/22 16:23, Dumitru Ceara wrote:

On 1/24/22 11:34, Adrian Moreno wrote:

When running builds with UBSan, some undefined behavior was detected in the 
iteration of common data data structures in OVS.
Coincidentally, a bug was reported [1] whose root cause whas another, this time 
undetected, undefined behavior in the iteration macros.

 From both cases, we conclude that the way we're currently iterating the data 
structures is prone to errors and UB. This series is an attempt to rewrite 
those macros in a UB-safe manner.


Hi Adrian,

Thanks for this!  The patchset needs a small rebase, nothing major
though.  That's also why 0-day robot failed to build.



The core problem is that #define OBJECT_CONTAINING(POINTER, OBJECT, MEMBER) macro is 
being used on invalid POINTER values. In some cases we use NULL to compute the 
end-of-loop condition. In others, we allow it to point to non-contained objects (e.g: a 
non-contained stack allocated "struct ovs_list" as in [1]).

In order to systematically solve this in all cases this series introduces a new set of 
macros that implement a multi-variable loop iteration. They declare a hidden iterator 
variable inside the loop, use to iterate and evaluate the loop condition and only compute 
its OBJECT_CONTAINING if it satisfies the loop condition. One consequence of this safety 
guard is that the pointer provided by the user is set to NULL after the loop (if not 
exited via "break;").



It seems that sparse is not too happy about these changes.

In the documentation we recommend sparse version 0.5.1 or later.
However, sparse complains about various things when run with any version
older than 0.6.2 (more specifically after [0]).  I think we should just
recommend a newer version of sparse and that's it.

[0] 
https://git.kernel.org/pub/scm/devel/sparse/sparse.git/commit/?id=ffb24e18c9b83e5878ee9ca4513deb5de235e15c

However, that's not the only issue with sparse.  It seems on specific
distributions (e.g., on my Fedora 34 test machine) sparse fails to use
the right headers.  I made it work with this change, although I'm not
sure this is the best way of doing things:

diff --git a/acinclude.m4 b/acinclude.m4
index 0c360fd1ef73..f704bf36cdfe 100644
--- a/acinclude.m4
+++ b/acinclude.m4
@@ -1424,7 +1424,7 @@ AC_DEFUN([OVS_ENABLE_SPARSE],
 : ${SPARSE=sparse}
 AC_SUBST([SPARSE])
 AC_CONFIG_COMMANDS_PRE(
- [CC='$(if $(C:0=),env REAL_CC="'"$CC"'" CHECK="$(SPARSE) $(SPARSE_WERROR) -I 
$(top_srcdir)/include/sparse $(SPARSEFLAGS) $(SPARSE_EXTRA_INCLUDES) " cgcc $(CGCCFLAGS),'"$CC"')'])
+ [CC='$(if $(C:0=),env REAL_CC="'"$CC"'" CHECK="$(SPARSE) $(SPARSE_WERROR) -I 
$(top_srcdir)/include/sparse -I $(top_srcdir)/include $(SPARSEFLAGS) $(SPARSE_EXTRA_INCLUDES) " cgcc 
$(CGCCFLAGS),'"$CC"')'])
  
 AC_ARG_ENABLE(

   [sparse],
---



Thanks for bisecting sparse and pinpoint the problematic patch.
I will add your patch and the recommendation to use sparse > 0.6.2 in the next 
version of the series.




Apart from normal iteration, many OVS data structures have a _SAFE version of 
the loop which require the user to declare an extra variable to hold the next 
value of the iterator. The _SAFE version of the multi-variable iterators have 
the extra benefit of not requiring such extra variable.
On relevant data structures, an initial patch rewrites the macros in a 
backwards compatible manner and a second patch modifies all the callers to 
remove the unneeded variable. The fist patch would be easy to backport and the 
second would make code cleaner for the master branch.


Although this sounds nice, I'm afraid it creates some issues:
a. OVN relies in some places on the fact that it can explicitly access
the "next" of the iterator, e.g.:
https://github.com/ovn-org/ovn/blob/main/lib/expr.c#L1958

We can fix it but it's not too nice.

b. There's currently some inconsistency in the way safe container
iterators can be used:
- SSET_FOR_EACH_SAFE() still expects the user to pass a "next"
  variable.
- the IDL safe iterator helpers in ovsdb-idlc still require the user
  to provide a "next" variable:
  https://github.com/openvswitch/ovs/blob/master/ovsdb/ovsdb-idlc.in#L254



Good catch! I'll take care of those two iterators as well and make it 
homogeneous in the next version.



c. The changes in this OVS patchset seem to allow easier backport but
that won't really be the case for OVN.

I wonder if it makes sense to keep both versions:
- the old *_SAFE(.., next, ..)
- a new *_SAFE_V2 (or a better name) that drops the need for explicitly
   supplied "next".

What do you think?


How about *_SAFE_OLD(...) for the old one and *_SAFE(...) for the new one? :-D
I'm just thinking that it would be a way to promote the use of the new and 
cleaner iterator in new code.


On the other hand, moving most of the users to the new macro should be a good 
example while we keep backwards compatibility. But even if we do keep the old 
name of 

Re: [ovs-dev] [PATCH 00/13] Fix undefined behavior in loop macros

2022-02-09 Thread Dumitru Ceara
On 2/9/22 16:23, Dumitru Ceara wrote:
> However, that's not the only issue with sparse.  It seems on specific
> distributions (e.g., on my Fedora 34 test machine) sparse fails to use
> the right headers.  I made it work with this change, although I'm not
> sure this is the best way of doing things:
> 
> diff --git a/acinclude.m4 b/acinclude.m4
> index 0c360fd1ef73..f704bf36cdfe 100644
> --- a/acinclude.m4
> +++ b/acinclude.m4
> @@ -1424,7 +1424,7 @@ AC_DEFUN([OVS_ENABLE_SPARSE],
> : ${SPARSE=sparse}
> AC_SUBST([SPARSE])
> AC_CONFIG_COMMANDS_PRE(
> - [CC='$(if $(C:0=),env REAL_CC="'"$CC"'" CHECK="$(SPARSE) 
> $(SPARSE_WERROR) -I $(top_srcdir)/include/sparse $(SPARSEFLAGS) 
> $(SPARSE_EXTRA_INCLUDES) " cgcc $(CGCCFLAGS),'"$CC"')'])
> + [CC='$(if $(C:0=),env REAL_CC="'"$CC"'" CHECK="$(SPARSE) 
> $(SPARSE_WERROR) -I $(top_srcdir)/include/sparse -I $(top_srcdir)/include 
> $(SPARSEFLAGS) $(SPARSE_EXTRA_INCLUDES) " cgcc $(CGCCFLAGS),'"$CC"')'])
>  
> AC_ARG_ENABLE(
>   [sparse],
> ---

Turns out this was kind of my fault because I had OVS headers installed
on the test system .  Those headers obviously had the old macro
definitions.  So we need to make sure we use the ones from the tree instead.

Regards,
Dumitru

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH 00/13] Fix undefined behavior in loop macros

2022-02-09 Thread Dumitru Ceara
On 1/24/22 11:34, Adrian Moreno wrote:
> When running builds with UBSan, some undefined behavior was detected in the 
> iteration of common data data structures in OVS.
> Coincidentally, a bug was reported [1] whose root cause whas another, this 
> time undetected, undefined behavior in the iteration macros.
> 
> From both cases, we conclude that the way we're currently iterating the data 
> structures is prone to errors and UB. This series is an attempt to rewrite 
> those macros in a UB-safe manner.

Hi Adrian,

Thanks for this!  The patchset needs a small rebase, nothing major
though.  That's also why 0-day robot failed to build.

> 
> The core problem is that #define OBJECT_CONTAINING(POINTER, OBJECT, MEMBER) 
> macro is being used on invalid POINTER values. In some cases we use NULL to 
> compute the end-of-loop condition. In others, we allow it to point to 
> non-contained objects (e.g: a non-contained stack allocated "struct ovs_list" 
> as in [1]).
> 
> In order to systematically solve this in all cases this series introduces a 
> new set of macros that implement a multi-variable loop iteration. They 
> declare a hidden iterator variable inside the loop, use to iterate and 
> evaluate the loop condition and only compute its OBJECT_CONTAINING if it 
> satisfies the loop condition. One consequence of this safety guard is that 
> the pointer provided by the user is set to NULL after the loop (if not exited 
> via "break;").
> 

It seems that sparse is not too happy about these changes.

In the documentation we recommend sparse version 0.5.1 or later.
However, sparse complains about various things when run with any version
older than 0.6.2 (more specifically after [0]).  I think we should just
recommend a newer version of sparse and that's it.

[0] 
https://git.kernel.org/pub/scm/devel/sparse/sparse.git/commit/?id=ffb24e18c9b83e5878ee9ca4513deb5de235e15c

However, that's not the only issue with sparse.  It seems on specific
distributions (e.g., on my Fedora 34 test machine) sparse fails to use
the right headers.  I made it work with this change, although I'm not
sure this is the best way of doing things:

diff --git a/acinclude.m4 b/acinclude.m4
index 0c360fd1ef73..f704bf36cdfe 100644
--- a/acinclude.m4
+++ b/acinclude.m4
@@ -1424,7 +1424,7 @@ AC_DEFUN([OVS_ENABLE_SPARSE],
: ${SPARSE=sparse}
AC_SUBST([SPARSE])
AC_CONFIG_COMMANDS_PRE(
- [CC='$(if $(C:0=),env REAL_CC="'"$CC"'" CHECK="$(SPARSE) $(SPARSE_WERROR) 
-I $(top_srcdir)/include/sparse $(SPARSEFLAGS) $(SPARSE_EXTRA_INCLUDES) " cgcc 
$(CGCCFLAGS),'"$CC"')'])
+ [CC='$(if $(C:0=),env REAL_CC="'"$CC"'" CHECK="$(SPARSE) $(SPARSE_WERROR) 
-I $(top_srcdir)/include/sparse -I $(top_srcdir)/include $(SPARSEFLAGS) 
$(SPARSE_EXTRA_INCLUDES) " cgcc $(CGCCFLAGS),'"$CC"')'])
 
AC_ARG_ENABLE(
  [sparse],
---

> Apart from normal iteration, many OVS data structures have a _SAFE version of 
> the loop which require the user to declare an extra variable to hold the next 
> value of the iterator. The _SAFE version of the multi-variable iterators have 
> the extra benefit of not requiring such extra variable.
> On relevant data structures, an initial patch rewrites the macros in a 
> backwards compatible manner and a second patch modifies all the callers to 
> remove the unneeded variable. The fist patch would be easy to backport and 
> the second would make code cleaner for the master branch.

Although this sounds nice, I'm afraid it creates some issues:
a. OVN relies in some places on the fact that it can explicitly access
   the "next" of the iterator, e.g.:
   https://github.com/ovn-org/ovn/blob/main/lib/expr.c#L1958

   We can fix it but it's not too nice.

b. There's currently some inconsistency in the way safe container
   iterators can be used:
   - SSET_FOR_EACH_SAFE() still expects the user to pass a "next"
 variable.
   - the IDL safe iterator helpers in ovsdb-idlc still require the user
 to provide a "next" variable:
 https://github.com/openvswitch/ovs/blob/master/ovsdb/ovsdb-idlc.in#L254

c. The changes in this OVS patchset seem to allow easier backport but
   that won't really be the case for OVN.

I wonder if it makes sense to keep both versions:
- the old *_SAFE(.., next, ..)
- a new *_SAFE_V2 (or a better name) that drops the need for explicitly
  supplied "next".

What do you think?

> 
> Testing notes:
> In order to verify this series removes all the loop-related UB, I've tested 
> it on top of Dumitru's series [2] (without patch 1/11, which can hide some 
> still invalid use of OBJECT_CONTAINING).
> I've also verified no extra errors are reported through clang-analyzer.
> 
> Limitations:
> The proposed approach benefits code readability, therefore the name of the 
> iterator variable is derived from the name of the object pointer given by the 
> caller.
> This means that in an unlikely but still possible case in which a caller 
> wants to nest two loops with the same iterating pointer object, the inner 
> 

Re: [ovs-dev] [PATCH RFC dpdk-latest v3 0/1] Enable vhost async API's in OvS.

2022-02-09 Thread Richardson, Bruce



> -Original Message-
> From: David Marchand 
> Sent: Wednesday, February 9, 2022 3:05 PM
> To: Ilya Maximets 
> Cc: Kevin Traynor ; Pai G, Sunil
> ; Maxime Coquelin ;
> d...@openvswitch.org; Mcnamara, John ; Hu, Jiayu
> ; Richardson, Bruce 
> Subject: Re: [ovs-dev] [PATCH RFC dpdk-latest v3 0/1] Enable vhost async
> API's in OvS.
> 
> On Wed, Feb 9, 2022 at 3:01 PM Ilya Maximets  wrote:
> > >> Why this process can't be lockless?
> > >> If we have to lock the device, maybe we can do both submission
> > >> and completion from the thread that polls corresponding Rx queue?
> > >> Tx threads may enqueue mbufs to some lockless ring inside the
> > >> rte_vhost_enqueue_burst.  Rx thread may dequeue them and submit
> > >> jobs to dma device and check completions.  No locks required.
> > >>
> > >
> > > This still means that Rx polling has to be taking place for OVS
> > > Tx to the device to operate.
> > >
> > > Isn't that a new dependency on OVS being pushed from the driver? and
> wouldn't it rule out OVS being able to switch to an interrupt mode, or
> reduce polling in the future if there was no/low packets to Rx.
> > >
> > > Of course, they could be mutually exclusive features that might have
> an opt-in, especially seen as one is performance related and the other is
> about power saving.
> >
> > AFAICT, vhost library doesn't handle interrupts, so OVS will need to
> > implement them, i.e. create private interrupt handle and register
> > all the kickfd descriptors there.  At this point, I think, we might
> > as well create a second private interrupt handle that will listen
> > on fds that Tx thread will kick every time after successful enqueue
> > if dma enqueue is enabled.  This all can happen solely in OVS and we
> > may even have a different wakeup mechanism since we're not bound to
> > use DPDK interrupts, which are just epolls anyway.
> 
> I agree, this is not a blocker for an interrupt mode.
> 
> Just a note, that the vhost library already provides kickfd as part of
> the vring structure.
> A api is available to request notifications from the guest on a specific
> vring.
> (And no experimental API for this!)
> 
> About the DPDK interrupt framework, OVS does not need the epoll stuff
> even for "normal" DPDK devices Rx interrupts.
> eventfds from vfio-pci / other kmods can already be retrieved from
> existing ethdev API without using DPDK interrupt thread/framework.
> 
> 
> >
> > In any case, some extra engineering will be required to support vhost
> > rx interrupts even without dma.
> 
> I have a PoC in my branches.
> I'll send a RFC on this topic, after 22.03-rc1/2.
> 
> 
> >
> > Also, is dma engine capable of generating interrupts?  Does DPDK API
> > support that anyhow?
> 
> Cc: Bruce who may know.
> At least, I see nothing in current dmadev API.
 
You are right, there is nothing in dmadev (yet) for interrupt support. However, 
if necessary, I'm sure it could be added.

/Bruce

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH RFC dpdk-latest v3 0/1] Enable vhost async API's in OvS.

2022-02-09 Thread David Marchand
On Wed, Feb 9, 2022 at 3:01 PM Ilya Maximets  wrote:
> >> Why this process can't be lockless?
> >> If we have to lock the device, maybe we can do both submission
> >> and completion from the thread that polls corresponding Rx queue?
> >> Tx threads may enqueue mbufs to some lockless ring inside the
> >> rte_vhost_enqueue_burst.  Rx thread may dequeue them and submit
> >> jobs to dma device and check completions.  No locks required.
> >>
> >
> > This still means that Rx polling has to be taking place for OVS
> > Tx to the device to operate.
> >
> > Isn't that a new dependency on OVS being pushed from the driver? and 
> > wouldn't it rule out OVS being able to switch to an interrupt mode, or 
> > reduce polling in the future if there was no/low packets to Rx.
> >
> > Of course, they could be mutually exclusive features that might have an 
> > opt-in, especially seen as one is performance related and the other is 
> > about power saving.
>
> AFAICT, vhost library doesn't handle interrupts, so OVS will need to
> implement them, i.e. create private interrupt handle and register
> all the kickfd descriptors there.  At this point, I think, we might
> as well create a second private interrupt handle that will listen
> on fds that Tx thread will kick every time after successful enqueue
> if dma enqueue is enabled.  This all can happen solely in OVS and we
> may even have a different wakeup mechanism since we're not bound to
> use DPDK interrupts, which are just epolls anyway.

I agree, this is not a blocker for an interrupt mode.

Just a note, that the vhost library already provides kickfd as part of
the vring structure.
A api is available to request notifications from the guest on a specific vring.
(And no experimental API for this!)

About the DPDK interrupt framework, OVS does not need the epoll stuff
even for "normal" DPDK devices Rx interrupts.
eventfds from vfio-pci / other kmods can already be retrieved from
existing ethdev API without using DPDK interrupt thread/framework.


>
> In any case, some extra engineering will be required to support vhost
> rx interrupts even without dma.

I have a PoC in my branches.
I'll send a RFC on this topic, after 22.03-rc1/2.


>
> Also, is dma engine capable of generating interrupts?  Does DPDK API
> support that anyhow?

Cc: Bruce who may know.
At least, I see nothing in current dmadev API.


-- 
David Marchand

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH ovn] Add support for port binding migration

2022-02-09 Thread Ihar Hrachyshka

Thank you Mark!!! Great feedback. See below.

On 2/7/22 4:49 PM, Mark Michelson wrote:

Hi Ihar,

In addition to the findings in-line down below, I have some high-level 
questions.



Why does the port announces its readiness with a RARP packet? 
Presumably, a CMS will set the migration-destination option on a 
logical switch port, then will do something that prompts the the 
device to send a RARP. If the CMS can prompt a device to send a RARP, 
couldn't the CMS just mark migration-unblocked=true in the SBDB 
directly? Or is there something more to this?


RARP injection comes from behavior of qemu where it issues a series of 
RARPs when unpaused / started to inform the world about VM location / 
presence. Before it issues RARPs, it's unsafe to maintain communication 
with the new location due to potential duplicate frames arriving from 
network stacks of both VM instances.


In this scenario we are trying to avoid any delays produced by CMS > 
SBDB > flow calculation > flow update chain. Hence presetting all flows, 
(including the one that "activates" / unblocks the new location on RARP 
observed by meddling with flows).


I am with you on injecting libvirt behavior into OVN but not sure how to 
implement a similar activation mechanism in a generic form. The good 
news is if CMS doesn't care / doesn't need the mechanism, it can always 
tag the port with migration-unblocked in SBDB.


Side note: one thing that I was thinking about while working on this 
feature is: can it be useful to have more than 2 locations of the same 
port, e.g. outside of live migration scenario? Like having 3 locations, 
one running an actual client, another analyzing all the traffic that is 
mirrored to it while it's in "migration" mode (should probably call it 
differently then), another perhaps dumping traffic to a file... This 
could involve redesign of the patch - switching from 
"migration_destination" column to a "list of port ghosts' locations"




One issue with this change is that it leaves the southbound 
port_binding entry "dirty" by having migration-unblocked set to true 
forever after the migration completes. 


Not forever. The option is cleared once migration-destination is unset 
by CMS. (Port completed migration.) I suspect the whole thing would work 
w/o the option set anywhere, we could store the fact in-memory. It's 
just that in the unlikely scenario of ovn-controller dying in the middle 
of migration, the knowledge may be lost, blocking flows reinstated. 
(Probably not that big of a deal anyway since migration-destination 
clearance should happen in quick succession, removing the blocking 
flows, among other things.)


Consider what is necessary if you wanted to migrate that port a second 
time to a different hypervisor. In addition to setting the new 
migration-destination on the NB logical port, you would need to 
explicitly set migration-unblocked to "false" (or clear it) on the SB 
port_binding. Otherwise, the blocking flows would not be installed on 
the new migration destination, resulting in it going live early. It 
may be a better idea to have a temporary setting that is set when the 
migration is happening and is cleared once the migration completes. 
This way the starting and end state of the migration is identical, and 
you can perform multiple migrations the same each time.


Yes, and as I said above, we already clear it w/ the patch. If CMS never 
completes migration, then yes, migration-unblocked may linger. But then 
you presumably won't get your port moved to another location w/o first 
clearing migration state. But if that's of concern, keeping the 
observable fact in-memory will work too.


Also, a minor nitpick, I'm not a big fan of the name 
"migration-unblocked" due to the "un" in its name. It makes if 
statements like


    if (!smap_get_bool(_smap, "migration-unblocked", false))

a bit hard to follow. I can think of two ways to remove the negation

Option 1: Change to "migration-blocked" (or something that sounds less 
like an error, maybe "migration-in-progress") and reverse all the 
true-false logic used throughout.


    if (smap_get_bool(_smap, "migration-blocked", true))

This would follow nicely with the suggestion I had about having 
something set temporarily during the migration and then being removed 
when the migration is complete.


Option 2: Keep the true-false logic the same but change the name to 
something positively-named like "migration-completed".


    if (!smap_get_bool(_smap, "migration-completed", false))

In Neutron API, there's a concept of "port activation". Perhaps we could 
call the option "activated". (Also useful if want to move farther from 
migration terminology.)




See below for more findings. All of those take the patch at face value 
and do not incorporate the suggestions above.


Thanks for such a detailed review at this point in the life of the 
patch. :) Duly noted / will include in next iterations. Also note that 
this first version of the patch 

Re: [ovs-dev] [PATCH v2] tc: Fix incorrect TC rule for decap+encap datapath flow

2022-02-09 Thread Ilya Maximets
On 2/6/22 09:22, Roi Dayan wrote:
> 
> 
> On 2022-02-04 3:33 PM, Eelco Chaudron wrote:
>>
>>
>> On 2 Dec 2021, at 13:38, Roi Dayan via dev wrote:
>>
>>> A datapath flow generated for traffic from vxlan port to another vxlan port
>>> looks like this:
>>>
>>> tunnel(tun_id=0x65,src=10.10.11.3,dst=10.10.11.2,ttl=0/0,tp_dst=4789,flags(+key)),...,in_port(vxlan_sys_4789),...,
>>>  
>>> actions:set(tunnel(tun_id=0x66,src=10.10.12.2,dst=10.10.12.3,tp_dst=4789,flags(key))),vxlan_sys_4789
>>>
>>> The generated TC rule with explicit tunnel key unset action added after
>>> tunnel key set action, which is wrong.
>>>
>>> filter protocol ip pref 7 flower chain 0 handle 0x1
>>>    dst_mac fa:16:3e:2a:4e:23
>>>    eth_type ipv4
>>>    ip_tos 0x0/3
>>>    enc_dst_ip 10.10.11.2
>>>    enc_src_ip 10.10.11.3
>>>    enc_key_id 101
>>>    enc_dst_port 4789
>>>    ip_flags nofrag
>>>    not_in_hw
>>>  action order 1: tunnel_key  set
>>>  src_ip 10.10.12.2
>>>  dst_ip 10.10.12.3
>>>  key_id 102
>>>  dst_port 4789
>>>  nocsum pipe
>>>   index 1 ref 1 bind 1 installed 568 sec used 0 sec
>>>  Action statistics:
>>>  Sent 46620 bytes 555 pkt (dropped 0, overlimits 0 requeues 0)
>>>  backlog 0b 0p requeues 0
>>>
>>>  action order 2: tunnel_key  unset pipe
>>>   index 2 ref 1 bind 1 installed 568 sec used 0 sec
>>>  Action statistics:
>>>  Sent 46620 bytes 555 pkt (dropped 0, overlimits 0 requeues 0)
>>>  backlog 0b 0p requeues 0
>>>
>>>  action order 3: mirred (Egress Redirect to device vxlan_sys_4789) 
>>> stolen
>>>  index 1 ref 1 bind 1 installed 568 sec used 0 sec
>>>  Action statistics:
>>>  Sent 46620 bytes 555 pkt (dropped 0, overlimits 0 requeues 0)
>>>  backlog 0b 0p requeues 0
>>>  cookie e0c82bfd504b701428b00db6b08db3b2
>>>
>>> Fix it by also adding the the tunnel key unset action before the tunnel
>>> key set action and not only before output port.
>>>
>>> Fixes: 7c53bd7839d8 ("tc: Move tunnel_key unset action before output ports")
>>> Signed-off-by: Roi Dayan 
>>> Reviewed-by: Paul Blakey 
>>> ---
>>
>> The changes look good to me, but I think it’s time we start adding test 
>> cases to system-offloads-traffic.at, or maybe even a new TC specific set, 
>> what do you think?
>>
>> Acked-by: Eelco Chaudron 
>>
> 
> yes time to do that for multiple fixes already done.
> I'll check about adding a test for this case in a new commit later.
> 
> thanks for the review.

Thanks, Roi and Eelco!  Applied and backported down to 2.13.

Best regards, Ilya Maximets.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH RFC dpdk-latest v3 0/1] Enable vhost async API's in OvS.

2022-02-09 Thread Ilya Maximets
On 2/9/22 11:19, Kevin Traynor wrote:
> On 03/02/2022 11:21, Ilya Maximets wrote:
>> On 2/3/22 11:48, Pai G, Sunil wrote:
>>> Hi Maxime,
>>>
> This version of the patch seems to have negative impact on performance
 for burst traffic profile[1].
> Benefits seen with the previous version (v2) was up to ~1.6x for 1568 byte
 packets compared to ~1.2x seen with the current design (v3) as measured on
 new Intel hardware that supports DSA [2] , CPU @ 1.8Ghz.
> The cause of the drop seems to be because of the excessive vhost txq
 contention across the PMD threads.

 So it means the Tx/Rx queue pairs aren't consumed by the same PMD
 thread. can you confirm?
>>>
>>> Yes, the completion polls for a given txq happens on a single PMD thread(on 
>>> the same thread where its corresponding rxq is being polled) but other 
>>> threads can submit(enqueue) packets on the same txq,  which leads to 
>>> contention.
>>
>> Why this process can't be lockless?
>> If we have to lock the device, maybe we can do both submission
>> and completion from the thread that polls corresponding Rx queue?
>> Tx threads may enqueue mbufs to some lockless ring inside the
>> rte_vhost_enqueue_burst.  Rx thread may dequeue them and submit
>> jobs to dma device and check completions.  No locks required.
>>
> 
> This still means that Rx polling has to be taking place for OVS
> Tx to the device to operate.
> 
> Isn't that a new dependency on OVS being pushed from the driver? and wouldn't 
> it rule out OVS being able to switch to an interrupt mode, or reduce polling 
> in the future if there was no/low packets to Rx.
> 
> Of course, they could be mutually exclusive features that might have an 
> opt-in, especially seen as one is performance related and the other is about 
> power saving.

AFAICT, vhost library doesn't handle interrupts, so OVS will need to
implement them, i.e. create private interrupt handle and register
all the kickfd descriptors there.  At this point, I think, we might
as well create a second private interrupt handle that will listen
on fds that Tx thread will kick every time after successful enqueue
if dma enqueue is enabled.  This all can happen solely in OVS and we
may even have a different wakeup mechanism since we're not bound to
use DPDK interrupts, which are just epolls anyway.

In any case, some extra engineering will be required to support vhost
rx interrupts even without dma.

Also, is dma engine capable of generating interrupts?  Does DPDK API
support that anyhow?

> 
> Maybe there could be other reasons for not Rx polling a device? I can't think 
> of any right now.
> 
>>>

> [1]:
> https://builders.intel.com/docs/networkbuilders/open-vswitch-optimized
> -deployment-benchmark-technology-guide.pdf
> [2]:
> https://01.org/blogs/2019/introducing-intel-data-streaming-accelerator
>
> Thanks and regards
> Sunil
>>>
>>
>> ___
>> dev mailing list
>> d...@openvswitch.org
>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>>
> 

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] OVS with Bifurcation Example

2022-02-09 Thread Tonghao Zhang
On Tue, Feb 8, 2022 at 4:34 PM Lazuardi Nasution  wrote:
>
> Hi Tonghao,
>
> How can I know if my driver support it. i have ConnectX-5 EN, ConnectX-6 VPI 
> and Connect-6 DX.
I tested this patch on Connect-6 DX. but now there is no hw NIC for me
to test again.
if you enable the isolate mode on hw nic, then you can install the
flows, for example, to match the tunnel packet via dpctl command with
--consistent option.
http://patchwork.ozlabs.org/project/openvswitch/patch/20201214022001.84273-3-xiangxia.m@gmail.com/

> Best regards.
>
> On Tue, Feb 8, 2022, 3:27 PM Tonghao Zhang  wrote:
>>
>> On Tue, Feb 8, 2022 at 4:16 PM Lazuardi Nasution  
>> wrote:
>> >
>> > Hi Tonghao,
>> >
>> > Do you have any example or tutorial regarding this?
>> No, but I plan to write a doc about this. The patch 1/4 only enable
>> isolate_offload_mode of dpdk netdev.
>> If your driver support isolate mode of dpdk, it can work in ovs.
>> > Best regards.
>> >
>> >
>> > On Tue, Feb 8, 2022, 2:51 PM Tonghao Zhang  
>> > wrote:
>> >>
>> >> On Sat, Feb 5, 2022 at 8:59 PM Lazuardi Nasution  
>> >> wrote:
>> >> >
>> >> > Hi,
>> >> >
>> >> > Is there any example about how to use OVS with Bifurcation driver? I 
>> >> > have
>> >> > read that with Intel NIC I should create specific VF for that, but with
>> >> > Mellanox NIC it is not required to have VF.
>> >> Hi
>> >> there is a patchset, I hope it can help you.
>> >> http://patchwork.ozlabs.org/project/openvswitch/list/?series=220295
>> >> > Best regards.
>> >>
>> >> Best regards, Tonghao
>>
>>
>>
>> --
>> Best regards, Tonghao



-- 
Best regards, Tonghao
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v2 06/18] python: add flow base class

2022-02-09 Thread Adrian Moreno



On 1/31/22 22:09, James Troup wrote:

Adrian Moreno  writes:


diff --git a/python/ovs/flows/flow.py b/python/ovs/flows/flow.py
new file mode 100644
index 0..2456d5f87
--- /dev/null
+++ b/python/ovs/flows/flow.py
@@ -0,0 +1,125 @@


[...]


+class Flow(object):
+"""The Flow class is a base class for other types of concrete flows
+(such as OFproto Flows or DPIF Flows).
+
+A flow is basically comprised of a number of sections.
+For each section named {section_name}, the flow object will have the
+following attributes:
+ - {section_name} will return the sections data in a formatted way.
+ - {section_name}_kv will return the sections data as a list of KeyValues.
+
+Args:
+sections (list[Section]): List of sections that comprise the flow
+orig (str): Original flow string.
+id (Any): Optional; identifier that clients can use to uniqely identify


uniquely



Thanks for catching all the typos and sorry. I'll fix them all in the next 
version.

--
Adrián Moreno

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] ovs-monitor-ipsec: add list-commands command

2022-02-09 Thread Mohammad Heib
Currently ovs-python unixctl implement the list-commands
operation as 'help'  command which doesn't match the ovs-appctl
man page and that can confuse the end-users who want to check
the supported operations of the ovs-monitor-ipsec.

This patch adds a list-commands alias name to'help' operation.

Signed-off-by: Mohammad Heib 
---
 ipsec/ovs-monitor-ipsec.in | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/ipsec/ovs-monitor-ipsec.in b/ipsec/ovs-monitor-ipsec.in
index a8b0705d9..f891c767d 100755
--- a/ipsec/ovs-monitor-ipsec.in
+++ b/ipsec/ovs-monitor-ipsec.in
@@ -1308,7 +1308,8 @@ def main():
 idl = ovs.db.idl.Idl(remote, schema_helper)
 
 ovs.daemon.daemonize()
-
+ovs.unixctl.command_register("list-commands", "", 0, 0,
+ ovs.unixctl._unixctl_help, None)
 ovs.unixctl.command_register("xfrm/policies", "", 0, 0,
  unixctl_xfrm_policies, None)
 ovs.unixctl.command_register("xfrm/state", "", 0, 0,
-- 
2.34.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v2 03/18] python: add list parser

2022-02-09 Thread Adrian Moreno



On 1/31/22 22:06, James Troup wrote:

Adrian Moreno  writes:


diff --git a/python/ovs/flows/list.py b/python/ovs/flows/list.py
new file mode 100644
index 0..cafd23d0a
--- /dev/null
+++ b/python/ovs/flows/list.py
@@ -0,0 +1,124 @@
+import re
+
+from ovs.flows.kv import KeyValue, KeyMetadata, ParseError
+from ovs.flows.decoders import decode_default
+
+
+class ListDecoders(object):
+"""ListDecoders is used by ListParser to decode the elements in the list.
+
+A decoder is a function that accepts a value and returns its decoded
+object.
+
+ListDecoders is initialized with a list of of tuples that contains the


s/of of/of/



Thanks James. I'll fix it in the next version.
--
Adrián Moreno

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v2 03/18] python: add list parser

2022-02-09 Thread Adrian Moreno



On 2/5/22 03:55, Mark Michelson wrote:

On 1/28/22 11:04, Adrian Moreno wrote:

Some openflow or dpif flows encode their arguments in lists, eg:
"some_action(arg1,arg2,arg3)". In order to decode this in a way that can
be then stored and queried, add ListParser and ListDecoders classes
that parse lists into KeyValue instances.

The ListParser / ListDecoders mechanism is quite similar to KVParser and
KVDecoders. Since the "key" of the different KeyValue objects is now
ommited, it has to be provided by ListDecoders.

For example, take the openflow action "resubmit" that can be written as:

 resubmit([port],[table][,ct])

Can be decoded by creating a ListDecoders instance such as:

 ListDecoders([
   ("port", decode_default),
   ("table", decode_int),
   ("ct", decode_flag),
 ])

Naturally, the order of the decoders must be kept.

Signed-off-by: Adrian Moreno 
---
  python/automake.mk   |   1 +
  python/ovs/flows/list.py | 124 +++
  2 files changed, 125 insertions(+)
  create mode 100644 python/ovs/flows/list.py

diff --git a/python/automake.mk b/python/automake.mk
index 7ce842d66..73438d615 100644
--- a/python/automake.mk
+++ b/python/automake.mk
@@ -29,6 +29,7 @@ ovs_pyfiles = \
  python/ovs/flows/__init__.py \
  python/ovs/flows/decoders.py \
  python/ovs/flows/kv.py \
+    python/ovs/flows/list.py \
  python/ovs/json.py \
  python/ovs/jsonrpc.py \
  python/ovs/ovsuuid.py \
diff --git a/python/ovs/flows/list.py b/python/ovs/flows/list.py
new file mode 100644
index 0..cafd23d0a
--- /dev/null
+++ b/python/ovs/flows/list.py
@@ -0,0 +1,124 @@
+import re
+
+from ovs.flows.kv import KeyValue, KeyMetadata, ParseError
+from ovs.flows.decoders import decode_default
+
+
+class ListDecoders(object):
+    """ListDecoders is used by ListParser to decode the elements in the list.
+
+    A decoder is a function that accepts a value and returns its decoded
+    object.
+
+    ListDecoders is initialized with a list of of tuples that contains the
+    keyword and the decoding function associated with each position in the
+    list. The order is, therefore, important.
+
+    Args:
+    decoders (list of tuples): Optional; a list of tuples.
+    The first element in the tuple is the keyword associated with the
+    value. The second element in the tuple is the decoder function.
+    """
+
+    def __init__(self, decoders=None):
+    self._decoders = decoders or list()
+
+    def decode(self, index, value_str):
+    """Decode the index'th element of the list.
+
+    Args:
+    index (int): The position in the list of the element to decode.
+    value_str (str): The value string to decode.
+    """
+    if index < 0 or index >= len(self._decoders):
+    return self._default_decoder(index, value_str)
+
+    try:
+    key = self._decoders[index][0]
+    value = self._decoders[index][1](value_str)
+    return key, value
+    except Exception as e:
+    raise ParseError(
+    "Failed to decode value_str {}: {}".format(value_str, str(e))
+    )
+
+    @staticmethod
+    def _default_decoder(index, value):
+    key = "elem_{}".format(index)
+    return key, decode_default(value)
+
+
+class ListParser(object):
+    """ListParser parses a list of values and stores them as key-value pairs.
+
+    It uses a ListDecoders instance to decode each element in the list.
+
+    Args:
+    string (str): The string to parse.
+    decoders (ListDecoders): Optional, the decoders to use.
+    delims (list): Optional, list of delimiters of the list. Defaults to
+    [','].
+    """
+    def __init__(self, string, decoders=None, delims=[","]):
+    self._string = string
+    self._decoders = decoders or ListDecoders()
+    self._keyval = list()
+    self._regexp = r"({})".format("|".join(delims))
+
+    def kv(self):
+    return self._keyval
+
+    def __iter__(self):
+    return iter(self._keyval)
+
+    def parse(self):
+    """Parse the list in string.
+
+    Raises:
+    ParseError if any parsing error occurs.
+    """
+    kpos = 0
+    index = 0
+    while kpos < len(self._string) and self._string[kpos] != "\n":
+    split_parts = re.split(self._regexp, self._string[kpos:], 1)
+    if len(split_parts) == 0:
+    break


Like I commented in patch 1, I don't think len(split_parts) can be 0 here.



Agree. I'll remove it in the next version.


+
+    value_str = split_parts[0]
+
+    key, value = self._decoders.decode(index, value_str)
+
+    meta = KeyMetadata(
+    kpos=kpos,
+    vpos=kpos,
+    kstring=value_str,
+    vstring=value_str,
+    )
+    self._keyval.append(KeyValue(key, 

Re: [ovs-dev] [PATCH v2 02/18] python: add mask, ip and eth decoders

2022-02-09 Thread Adrian Moreno

On 1/31/22 20:07, James Troup wrote:

Adrian Moreno  writes:


diff --git a/python/ovs/flows/decoders.py b/python/ovs/flows/decoders.py
index 73d5c0c60..2f8e5bd0a 100644
--- a/python/ovs/flows/decoders.py
+++ b/python/ovs/flows/decoders.py
@@ -16,3 +25,392 @@ def decode_default(value):
  return int(value, 0)
  except ValueError:
  return value
+
+
+def decode_flag(value):
+"""Decode a flag. It's existance is just flagged by returning True."""


'existence'


+return True
+
+
+def decode_int(value):
+"""Integer decoder.
+
+Both base10 and base16 integers are supported.


Is this true?  If `value` is a string, it isn't.  int("0x4") throws an
exception but int(0x4) doesn't.


+Used for fields such as:
+n_bytes=34
+metadata=0x4
+"""
+return int(value, 0)


[...]


+class IntMask(Decoder):
+"""Base class for Integer Mask decoder classes.
+
+It supports decoding a value/mask pair. The class has to be derived,
+and the size attribute must be set.
+"""
+
+size = None  # Size in bits.
+
+def __init__(self, string):
+if not self.size:
+raise NotImplementedError(
+"IntMask should be derived and " "size should be fixed"
+)


Any reason not to join the strings here?


+class EthMask(Decoder):
+"""EthMask represents an Ethernet address with optional mask.


This is nitpicky, but the capitalisation of Ethernet is inconsistent
throughout this function; it'd be nice to clean it up.


[...]


+def __eq__(self, other):
+"""Equality operator.
+
+Both the Ethernet address and the mask are compared. This can be used
+to implement filters where we expecte a specific mask to be present,


'expect'


[...]


+class IPMask(Decoder):
+"""IPMask stores an IPv6 or IPv4 and a mask.
+
+It uses netaddr.IPAddress.
+
+IPMasks can represent valid CIDRs or randomly maked IP Addresses.


'masked'?


+def __contains__(self, other):
+"""Contains operator.
+
+Only comparing valid CIDRs is supported.
+
+Args:
+other (netaddr.IPAddres or IPMask): An IP address.


'netaddr.IPAddress'




Thanks James,

I'll fix all the typos in the next version.

--
Adrián Moreno

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v2 01/18] python: add generic Key-Value parser

2022-02-09 Thread Adrian Moreno

Thanks Mark.

On 2/5/22 03:55, Mark Michelson wrote:

On 1/28/22 11:04, Adrian Moreno wrote:

Most of ofproto and dpif flows are based on key-value pairs. These
key-value pairs can be represented in several ways, eg: key:value,
key=value, key(value).

Add the following classes that allow parsing of key-value strings:
* KeyValue: holds a key-value pair
* KeyMetadata: holds some metadata associated with a KeyValue such as
   the original key and value strings and their position in the global
   string
* KVParser: is able to parse a string and extract it's key-value pairs
   as KeyValue instances. Before creating the KeyValue instance it tries
   to decode the value via the KVDecoders
* KVDecoders holds a number of decoders that KVParser can use to decode
   key-value pairs. It accepts a dictionary of keys and callables to
   allow users to specify what decoder (i.e: callable) to use for each
   key

Also, flake8 seems to be incorrectly reporting an error (E203) in:
"slice[index + offset : index + offset]" which is PEP8 compliant. So,
ignore this error.

Signed-off-by: Adrian Moreno 
---
  Makefile.am  |   3 +-
  python/automake.mk   |   6 +-
  python/ovs/flows/__init__.py |   0
  python/ovs/flows/decoders.py |  18 ++
  python/ovs/flows/kv.py   | 320 +++
  python/setup.py  |   2 +-
  6 files changed, 346 insertions(+), 3 deletions(-)
  create mode 100644 python/ovs/flows/__init__.py
  create mode 100644 python/ovs/flows/decoders.py
  create mode 100644 python/ovs/flows/kv.py

diff --git a/Makefile.am b/Makefile.am
index cb8076433..4f51d225e 100644
--- a/Makefile.am
+++ b/Makefile.am
@@ -391,6 +391,7 @@ ALL_LOCAL += flake8-check
  #   E128 continuation line under-indented for visual indent
  #   E129 visually indented line with same indent as next logical line
  #   E131 continuation line unaligned for hanging indent
+#   E203 whitespace before ':'
  #   E722 do not use bare except, specify exception instead
  #   W503 line break before binary operator
  #   W504 line break after binary operator
@@ -403,7 +404,7 @@ ALL_LOCAL += flake8-check
  #   H233 Python 3.x incompatible use of print operator
  #   H238 old style class declaration, use new style (inherit from `object`)
  FLAKE8_SELECT = H231,H232,H233,H238
-FLAKE8_IGNORE = 
E121,E123,E125,E126,E127,E128,E129,E131,E722,W503,W504,F811,D,H,I
+FLAKE8_IGNORE = 
E121,E123,E125,E126,E127,E128,E129,E131,E203,E722,W503,W504,F811,D,H,I

  flake8-check: $(FLAKE8_PYFILES)
  $(FLAKE8_WERROR)$(AM_V_GEN) \
    src='$^' && \
diff --git a/python/automake.mk b/python/automake.mk
index 767512f17..7ce842d66 100644
--- a/python/automake.mk
+++ b/python/automake.mk
@@ -16,7 +16,6 @@ ovs_pyfiles = \
  python/ovs/compat/sortedcontainers/sorteddict.py \
  python/ovs/compat/sortedcontainers/sortedset.py \
  python/ovs/daemon.py \
-    python/ovs/fcntl_win.py \
  python/ovs/db/__init__.py \
  python/ovs/db/custom_index.py \
  python/ovs/db/data.py \
@@ -26,6 +25,10 @@ ovs_pyfiles = \
  python/ovs/db/schema.py \
  python/ovs/db/types.py \
  python/ovs/fatal_signal.py \
+    python/ovs/fcntl_win.py \
+    python/ovs/flows/__init__.py \
+    python/ovs/flows/decoders.py \
+    python/ovs/flows/kv.py \
  python/ovs/json.py \
  python/ovs/jsonrpc.py \
  python/ovs/ovsuuid.py \
@@ -42,6 +45,7 @@ ovs_pyfiles = \
  python/ovs/version.py \
  python/ovs/vlog.py \
  python/ovs/winutils.py
+
  # These python files are used at build time but not runtime,
  # so they are not installed.
  EXTRA_DIST += \
diff --git a/python/ovs/flows/__init__.py b/python/ovs/flows/__init__.py
new file mode 100644
index 0..e69de29bb
diff --git a/python/ovs/flows/decoders.py b/python/ovs/flows/decoders.py
new file mode 100644
index 0..73d5c0c60
--- /dev/null
+++ b/python/ovs/flows/decoders.py
@@ -0,0 +1,18 @@
+""" Defines helpful decoders that can be used to decode information from the
+flows.
+
+A decoder is generally a callable that accepts a string and returns the value
+object.
+"""
+
+
+def decode_default(value):
+    """Default decoder.
+
+    It tries to convert into an integer value and, if it fails, just
+    returns the string.
+    """
+    try:
+    return int(value, 0)
+    except ValueError:
+    return value
diff --git a/python/ovs/flows/kv.py b/python/ovs/flows/kv.py
new file mode 100644
index 0..78cfe627e
--- /dev/null
+++ b/python/ovs/flows/kv.py
@@ -0,0 +1,320 @@
+""" Common helper classes for flow Key-Value parsing.
+"""
+
+import functools
+import re
+
+from ovs.flows.decoders import decode_default
+
+
+class ParseError(RuntimeError):
+    """Exception raised when an error occurs during parsing."""
+    pass
+
+
+class KeyMetadata(object):


Is python 3.4 the latest version we're allowed to use features from? If not, 
this would be an excellent candidate to be a dataclass (python 3.7+).




I agree it clearly looks like a 

Re: [ovs-dev] [PATCH v3] system-dpdk.at: Add warning log in mfex fuzzy test.

2022-02-09 Thread Eelco Chaudron



On 9 Feb 2022, at 10:50, Kumar Amber wrote:

> Some specific warning are seen on various systems
> which may not be visible on others but good to add
> such logs to test to avoid test-case failure.
>
> Thw warning only effects the fuzzy tests due to
> more than 1000+ flows being offloading simultanously.
>
> Wilcarding flow count number as for different systems
> under test the number could vary in the warning log.
>
> Suggested-by: Eelco Chaudron 
> Signed-off-by: Kumar Amber 
>

Looks good to me.

Acked-by: Eelco Chaudron 

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH RFC dpdk-latest v3 0/1] Enable vhost async API's in OvS.

2022-02-09 Thread Kevin Traynor

On 03/02/2022 11:21, Ilya Maximets wrote:

On 2/3/22 11:48, Pai G, Sunil wrote:

Hi Maxime,


This version of the patch seems to have negative impact on performance

for burst traffic profile[1].

Benefits seen with the previous version (v2) was up to ~1.6x for 1568 byte

packets compared to ~1.2x seen with the current design (v3) as measured on
new Intel hardware that supports DSA [2] , CPU @ 1.8Ghz.

The cause of the drop seems to be because of the excessive vhost txq

contention across the PMD threads.

So it means the Tx/Rx queue pairs aren't consumed by the same PMD
thread. can you confirm?


Yes, the completion polls for a given txq happens on a single PMD thread(on the 
same thread where its corresponding rxq is being polled) but other threads can 
submit(enqueue) packets on the same txq,  which leads to contention.


Why this process can't be lockless?
If we have to lock the device, maybe we can do both submission
and completion from the thread that polls corresponding Rx queue?
Tx threads may enqueue mbufs to some lockless ring inside the
rte_vhost_enqueue_burst.  Rx thread may dequeue them and submit
jobs to dma device and check completions.  No locks required.



This still means that Rx polling has to be taking place for OVS
Tx to the device to operate.

Isn't that a new dependency on OVS being pushed from the driver? and 
wouldn't it rule out OVS being able to switch to an interrupt mode, or 
reduce polling in the future if there was no/low packets to Rx.


Of course, they could be mutually exclusive features that might have an 
opt-in, especially seen as one is performance related and the other is 
about power saving.


Maybe there could be other reasons for not Rx polling a device? I can't 
think of any right now.







[1]:
https://builders.intel.com/docs/networkbuilders/open-vswitch-optimized
-deployment-benchmark-technology-guide.pdf
[2]:
https://01.org/blogs/2019/introducing-intel-data-streaming-accelerator

Thanks and regards
Sunil




___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev



___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH v3] system-dpdk.at: Add warning log in mfex fuzzy test.

2022-02-09 Thread Kumar Amber
Some specific warning are seen on various systems
which may not be visible on others but good to add
such logs to test to avoid test-case failure.

Thw warning only effects the fuzzy tests due to
more than 1000+ flows being offloading simultanously.

Wilcarding flow count number as for different systems
under test the number could vary in the warning log.

Suggested-by: Eelco Chaudron 
Signed-off-by: Kumar Amber 

---
v2:
- wildcard flow number count in output log check.(David)
v3:
-wildcard only the flow count number (1000+) in warning.(Eelco)
---
 tests/system-dpdk.at | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/tests/system-dpdk.at b/tests/system-dpdk.at
index 9384cf7f0..c3ee6990c 100644
--- a/tests/system-dpdk.at
+++ b/tests/system-dpdk.at
@@ -273,7 +273,9 @@ OVS_WAIT_UNTIL([test `ovs-vsctl get interface p1 statistics 
| grep -oP 'rx_packe
 
 dnl Clean up
 AT_CHECK([ovs-vsctl del-port br0 p1], [], [stdout], [stderr])
-OVS_VSWITCHD_STOP("[SYSTEM_DPDK_ALLOWED_LOGS]")
+OVS_VSWITCHD_STOP("m4_join([], [SYSTEM_DPDK_ALLOWED_LOGS], [
+\@upcall: datapath reached the dynamic limit of .* flows.@d
+])")
 AT_CLEANUP
 dnl --
 
-- 
2.25.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v2 01/18] python: add generic Key-Value parser

2022-02-09 Thread Adrian Moreno



On 1/31/22 20:29, James Troup wrote:

Adrian Moreno  writes:


diff --git a/python/ovs/flows/decoders.py b/python/ovs/flows/decoders.py
new file mode 100644
index 0..73d5c0c60
--- /dev/null
+++ b/python/ovs/flows/decoders.py
@@ -0,0 +1,18 @@
+""" Defines helpful decoders that can be used to decode information from the


Leading whitespace?

[...]


--- /dev/null
+++ b/python/ovs/flows/kv.py
@@ -0,0 +1,320 @@
+""" Common helper classes for flow Key-Value parsing.


Id.


+class KeyMetadata(object):
+"""Class for keeping key metadata.
+


[...]


+delim (string): Optional, the string use as delimiter between the key
+and the value.
+end_delim (string): Optional, the string use as end delimiter between
+the key and the value.


Either s/string use/string used/ or s/string use/string to use/ ?


+class KVDecoders(object):


[...]


+@staticmethod
+def _default_free_decoder(key):
+"""Default decoder for free kewords."""


keywords

[...]


+parenthesys_pattern = re.compile(r"(\(|\))")


Not sure if this spelling is intentional or?



Thanks James.
I'll address all the typos and spelling in the next version.

--
Adrián Moreno

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [ovs-dev v4] ofproto: add refcount to ofproto to fix ofproto use-after-free

2022-02-09 Thread Peng He
Adrian Moreno  于2022年2月9日周三 17:41写道:

>
>
> On 2/9/22 10:08, Peng He wrote:
> > Hi, Adrian,
> >
> >
> >
> > Adrian Moreno mailto:amore...@redhat.com>>
> 于2022年2月8日
> > 周二 18:14写道:
> >
> >
> >
> > On 2/5/22 04:58, Peng He wrote:
> >  > Hi, Adrian,
> >  >
> >  >
> >  >
> >  > Adrian Moreno mailto:amore...@redhat.com>
> > >>
> 于2022年2月4日
> >  > 周五 21:37写道:
> >  >
> >  >
> >  >
> >  > On 1/18/22 16:01, Peng He wrote:
> >  >  >  From hepeng:
> >  >  >
> >  >
> >
> https://patchwork.ozlabs.org/project/openvswitch/patch/20200717015041.82746-1-hepeng.0...@bytedance.com/#2487473
> > <
> https://patchwork.ozlabs.org/project/openvswitch/patch/20200717015041.82746-1-hepeng.0...@bytedance.com/#2487473
> >
> >  >
> >   <
> https://patchwork.ozlabs.org/project/openvswitch/patch/20200717015041.82746-1-hepeng.0...@bytedance.com/#2487473
> <
> https://patchwork.ozlabs.org/project/openvswitch/patch/20200717015041.82746-1-hepeng.0...@bytedance.com/#2487473
> >>
> >  >  >
> >  >  > also from guohongzhi  > 
> >  >  >>>:
> >  >  >
> >  >
> >
> http://patchwork.ozlabs.org/project/openvswitch/patch/20200306130555.19884-1-guohongz...@huawei.com/
> > <
> http://patchwork.ozlabs.org/project/openvswitch/patch/20200306130555.19884-1-guohongz...@huawei.com/
> >
> >  >
> >   <
> http://patchwork.ozlabs.org/project/openvswitch/patch/20200306130555.19884-1-guohongz...@huawei.com/
> <
> http://patchwork.ozlabs.org/project/openvswitch/patch/20200306130555.19884-1-guohongz...@huawei.com/
> >>
> >  >  >
> >  >  > also from a discussion about the mixing use of RCU and
> refcount in
> > the mail
> >  >  > list with Ilya Maximets, William Tu, Ben Pfaf, and Gaëtan
> Rivet.
> >  >  >
> >  >  > A summary, as quoted from Ilya:
> >  >  >
> >  >  > "
> >  >  > RCU for ofproto was introduced for one
> >  >  > and only one reason - to avoid freeing ofproto while rules
> are still
> >  >  > alive.  This was done in commit f416c8d61601 ("ofproto:
> RCU postpone
> >  >  > rule destruction.").  The goal was to allow using rules
> without
> >  >  > refcounting them within a single grace period.  And that
> forced us
> >  >  > to postpone destruction of the ofproto for a single grace
> period.
> >  >  > Later commit 39c9459355b6 ("Use classifier versioning.")
> made it
> >  >  > possible for rules to be alive for more than one grace
> period, so
> >  >  > the commit made ofproto wait for 2 grace periods by double
> postponing.
> >  >  > As we can see now, that wasn't enough and we have to wait
> for more
> >  >  > than 2 grace periods in certain cases.
> >  >  > "
> >  >  >
> >  >  > In a short, the ofproto should have a longer life time
> than rule, if
> >  >  > the rule lasts for more than 2 grace periods, the ofproto
> should live
> >  >  > longer to ensure rule->ofproto is valid. It's hard to
> predict how long
> >  >  > a ofproto should live, thus we need to use refcount on
> ofproto to make
> >  >  > things easy. The controversial part is that we have
> already used
> > RCU postpone
> >  >  > to delay ofproto destrution, if we have to add refcount,
> is it
> > simpler to
> >  >  > use just refcount without RCU postpone?
> >  >  >
> >  >  > IMO, I think going back to the pure refcount solution is
> more
> >  >  > complicated than mixing using both.
> >  >  >
> >  >  > Gaëtan Rive asks some questions on guohongzhi's v2 patch:
> >  >  >
> >  >  > during ofproto_rule_create, should we use ofproto_ref
> >  >  > or ofproto_try_ref? how can we make sure the ofproto is
> alive?
> >  >  >
> >  >  > By using RCU, ofproto has three states:
> >  >  >
> >  >  > state 1: alive, with refcount >= 1
> >  >  > state 2: dying, with refcount == 0, however pointer is
> valid
> >  >  > state 3: died, memory freed, pointer might be dangling.
> >  >  >
> >  >  > Without using RCU, there is no state 2, thus, we have to
> be very
> > careful
> >  >  > every time we see a ofproto pointer. In contrast, with
> RCU, we can
> > be sure
> >  >  > that it's alive at least in this grace peroid, so we can
> just check if
> >  >  > it is dying by ofproto_try_ref.
> >  >  >
> >  >  > This shows that by mixing use of RCU and refcount we can
> save a
> > lot of work
> >  >  > worrying if ofproto is dangling.
> >  >  >
> >  >  > In short, the RCU part makes sure the ofproto is alive
> 

Re: [ovs-dev] [ovs-dev v4] ofproto: add refcount to ofproto to fix ofproto use-after-free

2022-02-09 Thread Adrian Moreno



On 2/9/22 10:08, Peng He wrote:

Hi, Adrian,



Adrian Moreno mailto:amore...@redhat.com>> 于2022年2月8日 
周二 18:14写道:




On 2/5/22 04:58, Peng He wrote:
 > Hi, Adrian,
 >
 >
 >
 > Adrian Moreno mailto:amore...@redhat.com>
>> 于2022年2月4日
 > 周五 21:37写道:
 >
 >
 >
 >     On 1/18/22 16:01, Peng He wrote:
 >      >  From hepeng:
 >      >
 >

https://patchwork.ozlabs.org/project/openvswitch/patch/20200717015041.82746-1-hepeng.0...@bytedance.com/#2487473


 >   
  >

 >      >
 >      > also from guohongzhi mailto:guohongz...@huawei.com>
 >     >>:
 >      >
 >

http://patchwork.ozlabs.org/project/openvswitch/patch/20200306130555.19884-1-guohongz...@huawei.com/


 >   
  >

 >      >
 >      > also from a discussion about the mixing use of RCU and refcount in
the mail
 >      > list with Ilya Maximets, William Tu, Ben Pfaf, and Gaëtan Rivet.
 >      >
 >      > A summary, as quoted from Ilya:
 >      >
 >      > "
 >      > RCU for ofproto was introduced for one
 >      > and only one reason - to avoid freeing ofproto while rules are 
still
 >      > alive.  This was done in commit f416c8d61601 ("ofproto: RCU 
postpone
 >      > rule destruction.").  The goal was to allow using rules without
 >      > refcounting them within a single grace period.  And that forced us
 >      > to postpone destruction of the ofproto for a single grace period.
 >      > Later commit 39c9459355b6 ("Use classifier versioning.") made it
 >      > possible for rules to be alive for more than one grace period, so
 >      > the commit made ofproto wait for 2 grace periods by double 
postponing.
 >      > As we can see now, that wasn't enough and we have to wait for more
 >      > than 2 grace periods in certain cases.
 >      > "
 >      >
 >      > In a short, the ofproto should have a longer life time than rule, 
if
 >      > the rule lasts for more than 2 grace periods, the ofproto should 
live
 >      > longer to ensure rule->ofproto is valid. It's hard to predict how 
long
 >      > a ofproto should live, thus we need to use refcount on ofproto to 
make
 >      > things easy. The controversial part is that we have already used
RCU postpone
 >      > to delay ofproto destrution, if we have to add refcount, is it
simpler to
 >      > use just refcount without RCU postpone?
 >      >
 >      > IMO, I think going back to the pure refcount solution is more
 >      > complicated than mixing using both.
 >      >
 >      > Gaëtan Rive asks some questions on guohongzhi's v2 patch:
 >      >
 >      > during ofproto_rule_create, should we use ofproto_ref
 >      > or ofproto_try_ref? how can we make sure the ofproto is alive?
 >      >
 >      > By using RCU, ofproto has three states:
 >      >
 >      > state 1: alive, with refcount >= 1
 >      > state 2: dying, with refcount == 0, however pointer is valid
 >      > state 3: died, memory freed, pointer might be dangling.
 >      >
 >      > Without using RCU, there is no state 2, thus, we have to be very
careful
 >      > every time we see a ofproto pointer. In contrast, with RCU, we can
be sure
 >      > that it's alive at least in this grace peroid, so we can just 
check if
 >      > it is dying by ofproto_try_ref.
 >      >
 >      > This shows that by mixing use of RCU and refcount we can save a
lot of work
 >      > worrying if ofproto is dangling.
 >      >
 >      > In short, the RCU part makes sure the ofproto is alive when we 
use it,
 >      > and the refcount part makes sure it lives longer enough.
 >      >
 >      > Also regarding a new patch filed recently, people are now making 
use
 >      > of RCU to protect ofproto:
 >      >
 >      >
 >

https://patchwork.ozlabs.org/project/openvswitch/patch/1638530715-44436-1-git-send-email-wangyunj...@huawei.com/


Re: [ovs-dev] [ovs-dev v4] ofproto: add refcount to ofproto to fix ofproto use-after-free

2022-02-09 Thread Peng He
Hi, Adrian,



Adrian Moreno  于2022年2月8日周二 18:14写道:

>
>
> On 2/5/22 04:58, Peng He wrote:
> > Hi, Adrian,
> >
> >
> >
> > Adrian Moreno mailto:amore...@redhat.com>>
> 于2022年2月4日
> > 周五 21:37写道:
> >
> >
> >
> > On 1/18/22 16:01, Peng He wrote:
> >  >  From hepeng:
> >  >
> >
> https://patchwork.ozlabs.org/project/openvswitch/patch/20200717015041.82746-1-hepeng.0...@bytedance.com/#2487473
> > <
> https://patchwork.ozlabs.org/project/openvswitch/patch/20200717015041.82746-1-hepeng.0...@bytedance.com/#2487473
> >
> >  >
> >  > also from guohongzhi  > >:
> >  >
> >
> http://patchwork.ozlabs.org/project/openvswitch/patch/20200306130555.19884-1-guohongz...@huawei.com/
> > <
> http://patchwork.ozlabs.org/project/openvswitch/patch/20200306130555.19884-1-guohongz...@huawei.com/
> >
> >  >
> >  > also from a discussion about the mixing use of RCU and refcount
> in the mail
> >  > list with Ilya Maximets, William Tu, Ben Pfaf, and Gaëtan Rivet.
> >  >
> >  > A summary, as quoted from Ilya:
> >  >
> >  > "
> >  > RCU for ofproto was introduced for one
> >  > and only one reason - to avoid freeing ofproto while rules are
> still
> >  > alive.  This was done in commit f416c8d61601 ("ofproto: RCU
> postpone
> >  > rule destruction.").  The goal was to allow using rules without
> >  > refcounting them within a single grace period.  And that forced us
> >  > to postpone destruction of the ofproto for a single grace period.
> >  > Later commit 39c9459355b6 ("Use classifier versioning.") made it
> >  > possible for rules to be alive for more than one grace period, so
> >  > the commit made ofproto wait for 2 grace periods by double
> postponing.
> >  > As we can see now, that wasn't enough and we have to wait for more
> >  > than 2 grace periods in certain cases.
> >  > "
> >  >
> >  > In a short, the ofproto should have a longer life time than rule,
> if
> >  > the rule lasts for more than 2 grace periods, the ofproto should
> live
> >  > longer to ensure rule->ofproto is valid. It's hard to predict how
> long
> >  > a ofproto should live, thus we need to use refcount on ofproto to
> make
> >  > things easy. The controversial part is that we have already used
> RCU postpone
> >  > to delay ofproto destrution, if we have to add refcount, is it
> simpler to
> >  > use just refcount without RCU postpone?
> >  >
> >  > IMO, I think going back to the pure refcount solution is more
> >  > complicated than mixing using both.
> >  >
> >  > Gaëtan Rive asks some questions on guohongzhi's v2 patch:
> >  >
> >  > during ofproto_rule_create, should we use ofproto_ref
> >  > or ofproto_try_ref? how can we make sure the ofproto is alive?
> >  >
> >  > By using RCU, ofproto has three states:
> >  >
> >  > state 1: alive, with refcount >= 1
> >  > state 2: dying, with refcount == 0, however pointer is valid
> >  > state 3: died, memory freed, pointer might be dangling.
> >  >
> >  > Without using RCU, there is no state 2, thus, we have to be very
> careful
> >  > every time we see a ofproto pointer. In contrast, with RCU, we
> can be sure
> >  > that it's alive at least in this grace peroid, so we can just
> check if
> >  > it is dying by ofproto_try_ref.
> >  >
> >  > This shows that by mixing use of RCU and refcount we can save a
> lot of work
> >  > worrying if ofproto is dangling.
> >  >
> >  > In short, the RCU part makes sure the ofproto is alive when we
> use it,
> >  > and the refcount part makes sure it lives longer enough.
> >  >
> >  > Also regarding a new patch filed recently, people are now making
> use
> >  > of RCU to protect ofproto:
> >  >
> >  >
> >
> https://patchwork.ozlabs.org/project/openvswitch/patch/1638530715-44436-1-git-send-email-wangyunj...@huawei.com/
> > <
> https://patchwork.ozlabs.org/project/openvswitch/patch/1638530715-44436-1-git-send-email-wangyunj...@huawei.com/
> >
> >  >
> >  > In this patch, I have merged guohongzhi's patch and mine, and
> fixes
> >  > accoring to the previous comments.
> >  >
> >  > Signed-off-by: Peng He  > >
> >  > Signed-off-by: guohongzhi  > >
> >  > ---
> >  >   ofproto/ofproto-dpif-xlate-cache.c |  2 +
> >  >   ofproto/ofproto-dpif-xlate.c   | 14 ---
> >  >   ofproto/ofproto-dpif.c | 24 +++-
> >  >   ofproto/ofproto-provider.h |  2 +
> >  >   ofproto/ofproto.c  | 62
> +++---
> >  >   ofproto/ofproto.h  |  4 ++
> >  >   6 files changed, 87 insertions(+), 21 deletions(-)
> >  >
> >  > diff --git a/ofproto/ofproto-dpif-xlate-cache.c
> > 

Re: [ovs-dev] [PATCH RFC dpdk-latest v3 0/1] Enable vhost async API's in OvS.

2022-02-09 Thread Pai G, Sunil
> >>> This version of the patch seems to have negative impact on
> >>> performance
> >> for burst traffic profile[1].
> >>> Benefits seen with the previous version (v2) was up to ~1.6x for
> >>> 1568 byte
> >> packets compared to ~1.2x seen with the current design (v3) as
> >> measured on new Intel hardware that supports DSA [2] , CPU @ 1.8Ghz.
> >>> The cause of the drop seems to be because of the excessive vhost txq
> >> contention across the PMD threads.
> >>
> >> So it means the Tx/Rx queue pairs aren't consumed by the same PMD
> >> thread. can you confirm?
> >
> > Yes, the completion polls for a given txq happens on a single PMD
> thread(on the same thread where its corresponding rxq is being polled) but
> other threads can submit(enqueue) packets on the same txq,  which leads to
> contention.
> 
> Why this process can't be lockless?
> If we have to lock the device, maybe we can do both submission and
> completion from the thread that polls corresponding Rx queue?
> Tx threads may enqueue mbufs to some lockless ring inside the
> rte_vhost_enqueue_burst.  Rx thread may dequeue them and submit jobs
> to dma device and check completions.  No locks required.
> 

Thank you for the comments, Ilya.

Hi Jiayu, Maxime,

Could I request your opinions on this from the vhost library perspective ? 

Thanks and regards,
Sunil 
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev