Re: [ovs-dev] [PATCH ovn] Documentation: Update release schedule to include 2022.

2021-02-09 Thread Numan Siddique
On Tue, Feb 2, 2021 at 2:05 AM Mark Michelson  wrote:
>
> Signed-off-by: Mark Michelson 

Acked-by: Numan Siddique 

Numan

> ---
>  Documentation/internals/release-process.rst | 20 +---
>  1 file changed, 13 insertions(+), 7 deletions(-)
>
> diff --git a/Documentation/internals/release-process.rst 
> b/Documentation/internals/release-process.rst
> index 875e01554..f37c09e51 100644
> --- a/Documentation/internals/release-process.rst
> +++ b/Documentation/internals/release-process.rst
> @@ -126,28 +126,34 @@ approximate:
>  Release Calendar
>  
>
> -The 2020 timetable is shown below. Note that these dates are not set in 
> stone.
> +The 2021 timetable is shown below. Note that these dates are not set in 
> stone.
>  If extenuating circumstances arise, a release may be delayed from its target
>  date.
>
>  +-+-+-+-+
>  | Release | Soft Freeze | Branch Creation | Release |
>  +-+-+-+-+
> -| 20.12.0 | Nov 6   | Nov 20  | Dec 4   |
> +| 21.03.0 | Feb 5   | Feb 19  | Mar 5   |
> ++-+-+-+-+
> +| 21.06.0 | May 7   | May 21  | Jun 4   |
> ++-+-+-+-+
> +| 21.09.0 | Aug 6   | Aug 20  | Sep 3   |
> ++-+-+-+-+
> +| 21.12.0 | Nov 5   | Nov 19  | Dec 3   |
>  +-+-+-+-+
>
> -Below is the 2021 timetable
> +Below is the 2022 timetable
>
>  +-+-+-+-+
>  | Release | Soft Freeze | Branch Creation | Release |
>  +-+-+-+-+
> -| 21.03.0 | Feb 5   | Feb 19  | Mar 5   |
> +| 22.03.0 | Feb 4   | Feb 18  | Mar 4   |
>  +-+-+-+-+
> -| 21.06.0 | May 7   | May 21  | Jun 4   |
> +| 22.06.0 | May 6   | May 20  | Jun 3   |
>  +-+-+-+-+
> -| 21.09.0 | Aug 6   | Aug 20  | Sep 3   |
> +| 22.09.0 | Aug 5   | Aug 19  | Sep 2   |
>  +-+-+-+-+
> -| 21.12.0 | Nov 5   | Nov 19  | Dec 3   |
> +| 22.12.0 | Nov 4   | Nov 18  | Dec 2   |
>  +-+-+-+-+
>
>  Contact
> --
> 2.29.2
>
> ___
> dev mailing list
> d...@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>


Re: [ovs-dev] [PATCH v3 ovn 00/10] ovn-controller: Make lflow cache size configurable.

2021-02-09 Thread Numan Siddique
On Wed, Feb 10, 2021 at 1:37 AM Dumitru Ceara  wrote:
>
> Scale tests have identified the lflow cache as one of the main memory
> consumers in ovn-controller.  This series refactors the lflow cache code
> and adds configuration knobs to limit the size (in lines and/or memory)
> of the cache.
>
> Patches 1 and 6 fix issues with the already existing lflow cache code.
> Even though patch 6 is a bug fix, it's easier to add it later in the
> series because it uses the new lflow cache statistics (from patch 4)
> to add a unit test that exercises the buggy scenario.
>
> Changes in v3:
> - Addressed Mark and Numan's comments (individual changes listed in
>   each patch).
> - Added acks where applicable.
> Changes in v2:
> - Added two bug fixes for already existing problems (patches 1 and 6).
> - Added unit tests as requested by Mark.
> - Added support for evicting "less important" entries when the cache
>   limit is reached.
> - Improved cache entries memory accounting.
>
> Dumitru Ceara (10):
>   lflow: Fix cache update when I-P engine aborts.
>   lflow: Refactor convert_match_to_expr() to explicitly consume prereqs.
>   lflow-cache: Move the lflow cache to its own module.
>   lflow-cache: Add lflow-cache/show-stats command.
>   lflow-cache: Add unit tests.
>   lflow: Do not cache non-conjunctive flows that use address 
> sets/portgroups.
>   lflow-cache: Add coverage counters.
>   lflow-cache: Reclaim heap memory after cache flush.
>   lflow-cache: Make maximum number of cache entries configurable.
>   lflow-cache: Make max cache memory usage configurable.


Thanks Dumitru for addressing the comments in v3.

I applied this series to master.

Numan

>
>
>  NEWS|5
>  configure.ac|1
>  controller/automake.mk  |2
>  controller/chassis.c|   44 
>  controller/lflow-cache.c|  371 
>  controller/lflow-cache.h|   81 
>  controller/lflow.c  |  378 +---
>  controller/lflow.h  |   10 -
>  controller/ovn-controller.8.xml |   34 +++
>  controller/ovn-controller.c |  106 +++---
>  controller/test-lflow-cache.c   |  238 +++
>  controller/test-ofctrl-seqno.c  |   18 --
>  include/ovn/expr.h  |3
>  lib/expr.c  |   43 
>  tests/automake.mk   |8 +
>  tests/ovn-lflow-cache.at|  405 
> +++
>  tests/ovn.at|   82 
>  tests/test-utils.c  |   49 +
>  tests/test-utils.h  |   26 +++
>  tests/testsuite.at  |1
>  20 files changed, 1597 insertions(+), 308 deletions(-)
>  create mode 100644 controller/lflow-cache.c
>  create mode 100644 controller/lflow-cache.h
>  create mode 100644 controller/test-lflow-cache.c
>  create mode 100644 tests/ovn-lflow-cache.at
>  create mode 100644 tests/test-utils.c
>  create mode 100644 tests/test-utils.h
>


[ovs-dev] [RFC PATCH] dp-packet-gso: Add GSO support.

2021-02-09 Thread William Tu
This patch adds GSO support for IPv4 TCP, when userspace-tso is enabled.
Tested by using a veth to send a TSO packet to OVS, which segments it
into smaller TCP segments and forwards them to a netdev-afxdp port in
another namespace.

Future work includes:
1. GSO for UDP, and IPv6 TCP/UDP GSO.
2. Tunnel GSO: VxLan GSO, Geneve GSO, GRE GSO...
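The per-segment header fix-ups that GSO performs reduce to simple arithmetic: segment i's TCP sequence number advances by the payload already emitted, and the IPv4 ID increments by one per segment. A minimal sketch of just that numbering (names are illustrative, not OVS APIs):

```c
#include <stdint.h>

/* Toy model of GSO header renumbering: for a TSO packet whose
 * 'payload_len' bytes of TCP payload are split into MSS-sized
 * segments, compute each segment's TCP sequence number and IPv4 ID. */
static uint16_t
gso_seg_numbers(uint32_t base_seq, uint16_t base_ipid, uint16_t mss,
                uint32_t payload_len, uint32_t seqs[], uint16_t ipids[])
{
    uint16_t n = (payload_len + mss - 1) / mss;  /* number of segments */

    for (uint16_t i = 0; i < n; i++) {
        seqs[i] = base_seq + (uint32_t) i * mss;  /* payload sent so far */
        ipids[i] = base_ipid + i;                 /* one ID per segment */
    }
    return n;
}
```

The real patch derives the per-segment payload length from the actual packet (see update_ipv4_tcp_headers() in the diff) rather than assuming every segment is exactly MSS bytes.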

Tested using
$ make check-afxdp TESTSUITEFLAGS='3'

Or the script below:
  ovs-vsctl set Open_vSwitch . other_config:userspace-tso-enable=true
  ovs-vsctl -- add-br br0 -- set Bridge br0 datapath_type=netdev
  ip netns add at_ns0
  ip link add p0 type veth peer name afxdp-p0
  ip link set p0 netns at_ns0
  ip link set dev afxdp-p0 up
  ovs-vsctl add-port br0 afxdp-p0
  ip netns exec at_ns0 sh << NS_EXEC_HEREDOC
  ip addr add "10.1.1.1/24" dev p0
  ip link set dev p0 up
  NS_EXEC_HEREDOC

  ip netns add at_ns1
  ip link add p1 type veth peer name afxdp-p1
  ip link set p1 netns at_ns1
  ip link set dev afxdp-p1 up
  ovs-vsctl add-port br0 afxdp-p1 -- set int afxdp-p1 type=afxdp

  ip netns exec at_ns1 sh << NS_EXEC_HEREDOC
  ip addr add "10.1.1.2/24" dev p1
  ip link set dev p1 up
  NS_EXEC_HEREDOC

  ip netns exec at_ns0 ping -c 3 -i .2 10.1.1.2
  ip netns exec at_ns1 ethtool -K p1 tx off
  ip netns exec at_ns1 iperf -s
  ip netns exec at_ns0 iperf -c 10.1.1.2 -t1

Tested-at: https://github.com/williamtu/ovs-travis/actions/runs/553156643
Signed-off-by: William Tu 
---
 lib/automake.mk   |   2 +
 lib/dp-packet-gso.c   | 149 ++
 lib/dp-packet-gso.h   |  27 +
 lib/netdev-afxdp.c|   6 ++
 lib/netdev.c  |  88 +++--
 lib/packets.c |  35 
 lib/packets.h |   1 +
 tests/system-afxdp.at |  32 +++
 8 files changed, 324 insertions(+), 16 deletions(-)
 create mode 100644 lib/dp-packet-gso.c
 create mode 100644 lib/dp-packet-gso.h

diff --git a/lib/automake.mk b/lib/automake.mk
index 39afbff9d1a0..57f504d52f5c 100644
--- a/lib/automake.mk
+++ b/lib/automake.mk
@@ -104,6 +104,8 @@ lib_libopenvswitch_la_SOURCES = \
lib/dpctl.h \
lib/dp-packet.h \
lib/dp-packet.c \
+   lib/dp-packet-gso.h \
+   lib/dp-packet-gso.c \
lib/dpdk.h \
lib/dpif-netdev-lookup.h \
lib/dpif-netdev-lookup.c \
diff --git a/lib/dp-packet-gso.c b/lib/dp-packet-gso.c
new file mode 100644
index ..5ae7c88298a5
--- /dev/null
+++ b/lib/dp-packet-gso.c
@@ -0,0 +1,149 @@
+/*
+ * Copyright (c) 2021 VMware, Inc.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at:
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+#include <config.h>
+
+#include 
+#include 
+#include 
+#include 
+
+#include "coverage.h"
+#include "csum.h"
+#include "dp-packet.h"
+#include "dp-packet-gso.h"
+#include "dpif-netdev.h"
+#include "openvswitch/compiler.h"
+#include "openvswitch/dynamic-string.h"
+#include "openvswitch/vlog.h"
+#include "packets.h"
+#include "util.h"
+
+VLOG_DEFINE_THIS_MODULE(dp_packet_gso);
+static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 20);
+
+/* Update each segment's IP header total length and ID, and its TCP
+ * header's sequence number.  Finally, recompute the IP and TCP
+ * checksums. */
+static void
+update_ipv4_tcp_headers(const struct dp_packet *src, struct dp_packet **pkts,
+uint16_t nb_segs)
+{
+struct tcp_header *tcp;
+struct ip_header *ip;
+struct dp_packet *p;
+uint32_t tcp_seq;
+uint16_t ipid;
+int i;
+
+ip = dp_packet_l3(src);
+ipid = ntohs(ip->ip_id);
+tcp = dp_packet_l4(src);
+tcp_seq = ntohl(get_16aligned_be32(&tcp->tcp_seq));
+
+for (i = 0; i < nb_segs; i++) {
+p = pkts[i];
+
+ip = dp_packet_l3(p);
+ip->ip_tot_len = htons(dp_packet_l3_size(p));
+ip->ip_id = htons(ipid);
+ip->ip_csum = 0;
+ip->ip_csum = csum(ip, sizeof *ip);
+
+tcp = dp_packet_l4(p);
+put_16aligned_be32(&tcp->tcp_seq, htonl(tcp_seq));
+packet_csum_tcpudp(p);
+
+ipid += 1;
+tcp_seq += (const char *) dp_packet_tail(p) -
+   (const char *) dp_packet_l4(p) -
+   TCP_OFFSET(tcp->tcp_ctl) * 4;
+}
+}
+
+static void
+hdr_segment_init(struct dp_packet *dst, const struct dp_packet *src)
+{
+/* Copy the following fields into the returned buffer: l2_pad_size,
+ * l2_5_ofs, l3_ofs, l4_ofs, cutlen, packet_type and md. */
+memcpy(&dst->l2_pad_size, &src->l2_pad_size,
+   sizeof(struct dp_packet) -
+   offsetof(struct dp_packet, l2_pad_size));
+
+  

Re: [ovs-dev] [PATCH ovn 1/6] controller: Split mac learning code to a separate file.

2021-02-09 Thread Mark Michelson

Hi Numan, only one finding down below.

I know that you've added some end-to-end tests for MAC learning in later 
patches, but I think adding some basic unit tests for mac-learn.c is 
also useful.


On 2/5/21 1:59 AM, num...@ovn.org wrote:

From: Numan Siddique 

This patch moves the 'struct put_mac_binding' from controller/pinctrl.c
to controller/mac-learn.c as a 'struct mac_binding' and adds the
relevant functions to access it.

Signed-off-by: Numan Siddique 
---
  controller/automake.mk |   5 +-
  controller/mac-learn.c | 101 
  controller/mac-learn.h |  46 +++
  controller/pinctrl.c   | 102 -
  4 files changed, 181 insertions(+), 73 deletions(-)
  create mode 100644 controller/mac-learn.c
  create mode 100644 controller/mac-learn.h

diff --git a/controller/automake.mk b/controller/automake.mk
index 480578eda3..9b8debd2ff 100644
--- a/controller/automake.mk
+++ b/controller/automake.mk
@@ -27,7 +27,10 @@ controller_ovn_controller_SOURCES = \
controller/ovn-controller.c \
controller/ovn-controller.h \
controller/physical.c \
-   controller/physical.h
+   controller/physical.h \
+   controller/mac-learn.c \
+   controller/mac-learn.h
+
  controller_ovn_controller_LDADD = lib/libovn.la 
$(OVS_LIBDIR)/libopenvswitch.la
  man_MANS += controller/ovn-controller.8
  EXTRA_DIST += controller/ovn-controller.8.xml
diff --git a/controller/mac-learn.c b/controller/mac-learn.c
new file mode 100644
index 00..36a6d6e589
--- /dev/null
+++ b/controller/mac-learn.c
@@ -0,0 +1,101 @@
+/* Copyright (c) 2020, Red Hat, Inc.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at:
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+#include <config.h>
+
+#include "mac-learn.h"
+
+/* OpenvSwitch lib includes. */
+#include "openvswitch/vlog.h"
+#include "lib/smap.h"
+
+VLOG_DEFINE_THIS_MODULE(mac_learn);
+
+#define MAX_MAC_BINDINGS 1000
+
+static size_t mac_binding_hash(uint32_t dp_key, uint32_t port_key,
+   struct in6_addr *);
+static struct mac_binding *mac_binding_find(struct hmap *mac_bindings,
+uint32_t dp_key,
+uint32_t port_key,
+struct in6_addr *ip, size_t hash);
+
+void
+ovn_mac_bindings_init(struct hmap *mac_bindings)
+{
+hmap_init(mac_bindings);
+}
+
+void
+ovn_mac_bindings_flush(struct hmap *mac_bindings)
+{
+struct mac_binding *mb;
+HMAP_FOR_EACH_POP (mb, hmap_node, mac_bindings) {
+free(mb);
+}
+}
+
+void
+ovn_mac_bindings_destroy(struct hmap *mac_bindings)
+{
+ovn_mac_bindings_flush(mac_bindings);
+hmap_destroy(mac_bindings);
+}
+
+struct mac_binding *
+ovn_mac_binding_add(struct hmap *mac_bindings, uint32_t dp_key,
+uint32_t port_key, struct in6_addr *ip,
+struct eth_addr mac)
+{
+uint32_t hash = mac_binding_hash(dp_key, port_key, ip);
+
+struct mac_binding *mb =
+mac_binding_find(mac_bindings, dp_key, port_key, ip, hash);
+if (!mb) {
+if (hmap_count(mac_bindings) >= MAX_MAC_BINDINGS) {
+return NULL;
+}
+
+mb = xmalloc(sizeof *mb);
+mb->dp_key = dp_key;
+mb->port_key = port_key;
+mb->ip = *ip;
+hmap_insert(mac_bindings, &mb->hmap_node, hash);
+}
+mb->mac = mac;
+
+return mb;
+}
+
+static size_t
+mac_binding_hash(uint32_t dp_key, uint32_t port_key, struct in6_addr *ip)
+{
+return hash_bytes(ip, sizeof *ip, hash_2words(dp_key, port_key));
+}
+
+static struct mac_binding *
+mac_binding_find(struct hmap *mac_bindings, uint32_t dp_key,
+   uint32_t port_key, struct in6_addr *ip, size_t hash)
+{
+struct mac_binding *mb;
+HMAP_FOR_EACH_WITH_HASH (mb, hmap_node, hash, mac_bindings) {
+if (mb->dp_key == dp_key && mb->port_key == port_key &&
+IN6_ARE_ADDR_EQUAL(&mb->ip, ip)) {
+return mb;
+}
+}
+
+return NULL;
+}
diff --git a/controller/mac-learn.h b/controller/mac-learn.h
new file mode 100644
index 00..3a8520c360
--- /dev/null
+++ b/controller/mac-learn.h
@@ -0,0 +1,46 @@
+/*
+ * Copyright (c) 2020 Red Hat, Inc.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at:
+ *
+ * 

Re: [ovs-dev] [PATCH ovn] expr: Combine multiple ipv4 with wildcard mask.

2021-02-09 Thread Dumitru Ceara

Resending as my previous try didn't make it on the mailing list.

On 2/9/21 1:06 PM, Dumitru Ceara wrote:

On 2/9/21 12:26 PM, gmingchen(陈供明) wrote:

Hi Dumitru,
Thanks for the review.

On 2021/2/8, 6:19 PM, "Dumitru Ceara"  wrote:

 On 2/7/21 1:25 PM, gmingchen(陈供明) wrote:
 > From: Gongming Chen 

 Hi Gongming,

 First of all, thanks for the contribution!

 This is not a full review, just some comments for now.

 It seems that there's a memory leak with your patch applied.
 AddressSanitizer reports:

 Direct leak of 12 byte(s) in 1 object(s) allocated from:
  #0 0x49644d in malloc (/home/runner/work/ovn/ovn/ovn-20.12.90/_build/sub/tests/ovstest+0x49644d)
  #1 0x538604 in xmalloc /home/runner/work/ovn/ovn/ovs_src/lib/util.c:138:15
  #2 0x62636f in expr_const_sets_add /home/runner/work/ovn/ovn/ovn-20.12.90/_build/sub/../../lib/expr.c:1237:18
  #3 0x4cf8f6 in create_addr_sets /home/runner/work/ovn/ovn/ovn-20.12.90/_build/sub/../../tests/test-ovn.c:230:5
  #4 0x4cf8f6 in test_parse_expr__ /home/runner/work/ovn/ovn/ovn-20.12.90/_build/sub/../../tests/test-ovn.c:296:5
  #5 0x4dfd04 in ovs_cmdl_run_command__ /home/runner/work/ovn/ovn/ovs_src/lib/command-line.c:247:17
  #6 0x4c6810 in test_ovn_main /home/runner/work/ovn/ovn/ovn-20.12.90/_build/sub/../../tests/test-ovn.c:1635:5
  #7 0x4c6810 in ovstest_wrapper_test_ovn_main__ /home/runner/work/ovn/ovn/ovn-20.12.90/_build/sub/../../tests/test-ovn.c:1638:1
  #8 0x4dfd04 in ovs_cmdl_run_command__ /home/runner/work/ovn/ovn/ovs_src/lib/command-line.c:247:17
  #9 0x4c5fe3 in main /home/runner/work/ovn/ovn/ovn-20.12.90/_build/sub/../../tests/ovstest.c:150:9
  #10 0x7f71aa6ddbf6 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x21bf6)

 Full reports are available in the ovsrobot OVN github CI run artifacts:

 https://github.com/ovsrobot/ovn/actions/runs/545416492

 Just a note, if you push the branch to your own fork it will trigger the
 github action to run CI, and the "linux clang test asan" job will also
 enable AddressSanitizer.

Thanks, the free of ip_data.ip is indeed missing, resulting in a
memory leak.


Ok.



 There are also a few style-related issues (e.g., sizeof args), please
 see the guidelines here:

 https://github.com/ovn-org/ovn/blob/master/Documentation/internals/contributing/coding-style.rst



Sorry, does this mean that
ip_r_data->ip = xmalloc(4 * sizeof(uint32_t))
should be replaced with
ip_r_data->ip = xmalloc(4 * sizeof ip_r_data->ip)?


Actually:
sizeof *ip_r_data->ip
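For anyone skimming this style point: `sizeof *ptr` ties the allocation size to the pointer's element type, so the code stays correct if that type later changes. A tiny illustration of the convention (the struct and function names here are made up, not from the patch under review):

```c
#include <stdint.h>
#include <stdlib.h>

/* Stand-in structure holding a dynamically sized array of IPv4
 * addresses, used only to demonstrate the "sizeof *ptr" idiom. */
struct ip_list {
    uint32_t *ip;
    size_t n;
};

static int
ip_list_alloc(struct ip_list *l, size_t n)
{
    /* Preferred: "sizeof *l->ip" tracks the element type automatically;
     * "sizeof(uint32_t)" would silently break if 'ip' ever became a
     * uint64_t pointer. */
    l->ip = malloc(n * sizeof *l->ip);
    if (!l->ip) {
        return 0;
    }
    l->n = n;
    return 1;
}
```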



 >
 > In the ovn security group, each host ip corresponds to at least 4
 > flows (different connection states). As the scale of hosts using the
 > security group increases, the ovs security group flow table will
 > grow sharply, especially when the remote group feature in OpenStack
 > is applied.
 >
 > This patch merges ipv4 addresses with wildcard masks, and replaces
 > these ipv4 addresses with the merged ip/mask. This will greatly
 > reduce the entries in the ovs security group flow table, especially
 > when the host count is large. After being used in a production
 > environment, the number of ovs flows will be reduced by at least 50%
 > in most scenarios, when the remote group in OpenStack is applied.

 I think it would be great to describe the algorithm here, in the
 commit log, but also in the comments in the code.

You are right, I will resubmit and add the description of the
algorithm to the commit log and the code comments.


Cool, thanks.



 >
 > Analysis of the simplest scenario: a 1.1.1.0/24 network, with the
 > OpenStack security group remote group feature enabled and 253
 > virtual machine ports created (1.1.1.2-1.1.1.254).
 >
 > Only focusing on the number of ip addresses, in the table=44 table:
 > ./configure --disable-combine-ipv4:
 > 1.1.1.2-1.1.1.254 (253 flows) * 4 (connection states) *
 > 1 (local net of localport) = 1012
 >
 > ./configure --enable-combine-ipv4 (default):
 > 1.1.1.2/31
 > 1.1.1.4/30
 > 1.1.1.8/29
 > 1.1.1.16/28
 > 1.1.1.32/27
 > 1.1.1.64/26
 > 1.1.1.128/26
 > 1.1.1.192/27
 > 1.1.1.224/28
 > 1.1.1.240/29
 > 1.1.1.248/30
 > 1.1.1.252/31
 > 1.1.1.254
 > 13 flows * 4 (connection states) * 1 (local net of localport) = 52
 >
 > Reduced from 1012 flows to 52, a 19.4x reduction.
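The 13 prefixes listed above are exactly the minimal CIDR cover of the range 1.1.1.2-1.1.1.254. A self-contained sketch of that split (greedy largest-aligned-block; illustrative only — the patch itself operates on expr constants, not raw ranges):

```c
#include <stddef.h>
#include <stdint.h>

/* Cover the inclusive IPv4 range [lo, hi] (host byte order) with the
 * minimal set of CIDR blocks: at each step emit the largest
 * power-of-two block that is aligned at the current address and fits
 * in the remaining span. */
static size_t
cidr_cover(uint32_t lo, uint32_t hi, uint32_t addrs[], int plens[],
           size_t max)
{
    size_t n = 0;
    uint64_t cur = lo, end = (uint64_t) hi + 1;

    while (cur < end && n < max) {
        /* Largest power of two dividing 'cur' (2^32 for cur == 0). */
        uint64_t align = cur ? (cur & -cur) : (uint64_t) 1 << 32;
        uint64_t span = end - cur;
        uint64_t size = 1;

        while (size * 2 <= span && size * 2 <= align) {
            size *= 2;
        }

        int plen = 32;                     /* size == 2^(32 - plen) */
        for (uint64_t s = size; s > 1; s >>= 1) {
            plen--;
        }
        addrs[n] = (uint32_t) cur;
        plens[n] = plen;
        n++;
        cur += size;
    }
    return n;
}
```

Running this over 0x01010102-0x010101FE (1.1.1.2-1.1.1.254) reproduces the 13 prefixes from the commit message.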
 >
 > Some scenarios are similar to the following:
 > 1.1.1.2, 1.1.1.6
 > After combining:
 > 1.1.1.2/255.255.255.251
 > This will slightly increase the difficulty of finding the flow
 > entry corresponding to a single address.
 > such as:
 > ovs-ofctl dump-flows br-int | grep 

Re: [ovs-dev] [PATCH v2 ovn 00/10] ovn-controller: Make lflow cache size configurable.

2021-02-09 Thread Dumitru Ceara

On 2/8/21 11:09 PM, Mark Michelson wrote:

Hi Dumitru,

Thanks for the v2. I had one small nit on patch 6. Aside from that (and 
Numan's comments), looks good.


Acked-by: Mark Michelson 


Hi Mark, Numan,

Thanks for the reviews!

I sent a v3 with all the suggested changes.

Regards,
Dumitru



[ovs-dev] [PATCH ovn v3 09/10] lflow-cache: Make maximum number of cache entries configurable.

2021-02-09 Thread Dumitru Ceara
Add a new OVS external-id, "ovn-limit-lflow-cache", through which users
can specify the maximum size of the ovn-controller logical flow cache.

To maintain backwards compatibility the default behavior is to not
enforce any limit on the size of the cache.

When the cache becomes full, the rule is to prefer more "important"
cache entries over less "important" ones.  That is, evict entries of
type LCACHE_T_CONJ_ID if there's no room to add an entry of type
LCACHE_T_EXPR.  Similarly, evict entries of type LCACHE_T_CONJ_ID or
LCACHE_T_EXPR if there's no room to add an entry of type
LCACHE_T_MATCHES.
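The eviction order described above is a strict priority scan: to admit an entry of a given type into a full cache, evict from the least important type upward, never from the same or a more important type. A toy model (counts only; the enum mirrors the patch, but the function is hypothetical, not the real lflow_cache_make_room__()):

```c
#include <stddef.h>

/* Cache entry types in increasing order of importance, as in the
 * patch: conjunction-id offsets are cheapest to recompute, cached
 * expression matches are the most expensive. */
enum lcache_type {
    LCACHE_T_CONJ_ID,
    LCACHE_T_EXPR,
    LCACHE_T_MATCHES,
    LCACHE_T_MAX,
};

/* Toy model of making room for an entry of type 'type' in a full
 * cache: evict one entry of a strictly less important type, if any.
 * Returns the evicted type, or -1 if nothing can be evicted. */
static int
cache_make_room(size_t counts[LCACHE_T_MAX], enum lcache_type type)
{
    for (int t = 0; t < (int) type; t++) {
        if (counts[t] > 0) {
            counts[t]--;
            return t;
        }
    }
    return -1;
}
```

Note the consequence spelled out in the commit message: an incoming LCACHE_T_CONJ_ID entry can never evict anything, while an LCACHE_T_MATCHES entry can displace either of the other two types.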

Acked-by: Mark Michelson 
Acked-by: Numan Siddique 
Signed-off-by: Dumitru Ceara 
---
v3:
- add comment to lflow_cache_make_room__().
- update UT outputs.
---
 NEWS|3 +
 controller/chassis.c|   23 -
 controller/lflow-cache.c|   55 -
 controller/lflow-cache.h|2 -
 controller/ovn-controller.8.xml |   16 ++
 controller/ovn-controller.c |8 +++
 controller/test-lflow-cache.c   |   12 +++--
 tests/ovn-lflow-cache.at|  104 +++
 8 files changed, 213 insertions(+), 10 deletions(-)

diff --git a/NEWS b/NEWS
index 2044671..6e10557 100644
--- a/NEWS
+++ b/NEWS
@@ -16,6 +16,9 @@ Post-v20.12.0
   - Add a new option to Load_Balancer.options, "hairpin_snat_ip", to allow
 users to explicitly select which source IP should be used for load
 balancer hairpin traffic.
+  - ovn-controller: Add a configuration knob, through OVS external-id
+"ovn-limit-lflow-cache", to allow enforcing a limit for the size of the
+logical flow cache.
 
 OVN v20.12.0 - 18 Dec 2020
 --
diff --git a/controller/chassis.c b/controller/chassis.c
index 0937e33..1b63d8e 100644
--- a/controller/chassis.c
+++ b/controller/chassis.c
@@ -50,6 +50,7 @@ struct ovs_chassis_cfg {
 const char *monitor_all;
 const char *chassis_macs;
 const char *enable_lflow_cache;
+const char *limit_lflow_cache;
 
 /* Set of encap types parsed from the 'ovn-encap-type' external-id. */
 struct sset encap_type_set;
@@ -136,6 +137,12 @@ get_enable_lflow_cache(const struct smap *ext_ids)
 }
 
 static const char *
+get_limit_lflow_cache(const struct smap *ext_ids)
+{
+return smap_get_def(ext_ids, "ovn-limit-lflow-cache", "");
+}
+
+static const char *
 get_encap_csum(const struct smap *ext_ids)
 {
 return smap_get_def(ext_ids, "ovn-encap-csum", "true");
@@ -257,6 +264,7 @@ chassis_parse_ovs_config(const struct 
ovsrec_open_vswitch_table *ovs_table,
 ovs_cfg->monitor_all = get_monitor_all(>external_ids);
 ovs_cfg->chassis_macs = get_chassis_mac_mappings(>external_ids);
 ovs_cfg->enable_lflow_cache = get_enable_lflow_cache(>external_ids);
+ovs_cfg->limit_lflow_cache = get_limit_lflow_cache(>external_ids);
 
if (!chassis_parse_ovs_encap_type(encap_type, &ovs_cfg->encap_type_set)) {
 return false;
@@ -284,13 +292,16 @@ chassis_build_other_config(struct smap *config, const 
char *bridge_mappings,
const char *datapath_type, const char *cms_options,
const char *monitor_all, const char *chassis_macs,
const char *iface_types,
-   const char *enable_lflow_cache, bool is_interconn)
+   const char *enable_lflow_cache,
+   const char *limit_lflow_cache,
+   bool is_interconn)
 {
 smap_replace(config, "ovn-bridge-mappings", bridge_mappings);
 smap_replace(config, "datapath-type", datapath_type);
 smap_replace(config, "ovn-cms-options", cms_options);
 smap_replace(config, "ovn-monitor-all", monitor_all);
 smap_replace(config, "ovn-enable-lflow-cache", enable_lflow_cache);
+smap_replace(config, "ovn-limit-lflow-cache", limit_lflow_cache);
 smap_replace(config, "iface-types", iface_types);
 smap_replace(config, "ovn-chassis-mac-mappings", chassis_macs);
 smap_replace(config, "is-interconn", is_interconn ? "true" : "false");
@@ -307,6 +318,7 @@ chassis_other_config_changed(const char *bridge_mappings,
  const char *monitor_all,
  const char *chassis_macs,
  const char *enable_lflow_cache,
+ const char *limit_lflow_cache,
  const struct ds *iface_types,
  bool is_interconn,
  const struct sbrec_chassis *chassis_rec)
@@ -346,6 +358,13 @@ chassis_other_config_changed(const char *bridge_mappings,
 return true;
 }
 
+const char *chassis_limit_lflow_cache =
get_limit_lflow_cache(&chassis_rec->other_config);
+
+if (strcmp(limit_lflow_cache, chassis_limit_lflow_cache)) {
+return true;
+}
+
 const char *chassis_mac_mappings =
 

[ovs-dev] [PATCH ovn v3 10/10] lflow-cache: Make max cache memory usage configurable.

2021-02-09 Thread Dumitru Ceara
Add a new OVS external-id, "ovn-memlimit-lflow-cache-kb", through which
users can specify the maximum amount of memory (in KB) ovn-controller
can use for caching logical flows.

To maintain backwards compatibility the default behavior is to not
enforce any memory limit on the size of the cache.

Add lflow cache reports to "ovn-appctl -t ovn-controller memory/show".

The memory usage for cache entries of type LCACHE_T_EXPR is computed by
doing another pass through the expression tree.  While this adds a few
extra cycles, entries of type LCACHE_T_EXPR shouldn't be very common
because they are created only for flows with matches that include
"is_chassis_resident()" but do not reference port groups and address sets.

Acked-by: Mark Michelson 
Signed-off-by: Dumitru Ceara 
---
v3:
- Add total cache memory usage to lflow-cache/show-stats output.
- Update lflow-cache UTs.
---
 NEWS|8 +++--
 controller/chassis.c|   21 +
 controller/lflow-cache.c|   58 ++--
 controller/lflow-cache.h|   12 ++--
 controller/lflow.c  |8 +++--
 controller/ovn-controller.8.xml |7 
 controller/ovn-controller.c |8 -
 controller/test-lflow-cache.c   |   31 +---
 include/ovn/expr.h  |3 +-
 lib/expr.c  |   39 -
 tests/ovn-lflow-cache.at|   62 ++-
 11 files changed, 216 insertions(+), 41 deletions(-)

diff --git a/NEWS b/NEWS
index 6e10557..ddb5977 100644
--- a/NEWS
+++ b/NEWS
@@ -16,9 +16,11 @@ Post-v20.12.0
   - Add a new option to Load_Balancer.options, "hairpin_snat_ip", to allow
 users to explicitly select which source IP should be used for load
 balancer hairpin traffic.
-  - ovn-controller: Add a configuration knob, through OVS external-id
-"ovn-limit-lflow-cache", to allow enforcing a limit for the size of the
-logical flow cache.
+  - ovn-controller: Add configuration knobs, through OVS external-id
+"ovn-limit-lflow-cache" and "ovn-memlimit-lflow-cache-kb", to allow
+enforcing a limit for the size of the logical flow cache based on
+maximum number of entries and/or memory usage.
+  - ovn-controller: Add lflow cache related memory reports.
 
 OVN v20.12.0 - 18 Dec 2020
 --
diff --git a/controller/chassis.c b/controller/chassis.c
index 1b63d8e..310132d 100644
--- a/controller/chassis.c
+++ b/controller/chassis.c
@@ -51,6 +51,7 @@ struct ovs_chassis_cfg {
 const char *chassis_macs;
 const char *enable_lflow_cache;
 const char *limit_lflow_cache;
+const char *memlimit_lflow_cache;
 
 /* Set of encap types parsed from the 'ovn-encap-type' external-id. */
 struct sset encap_type_set;
@@ -143,6 +144,12 @@ get_limit_lflow_cache(const struct smap *ext_ids)
 }
 
 static const char *
+get_memlimit_lflow_cache(const struct smap *ext_ids)
+{
+return smap_get_def(ext_ids, "ovn-memlimit-lflow-cache-kb", "");
+}
+
+static const char *
 get_encap_csum(const struct smap *ext_ids)
 {
 return smap_get_def(ext_ids, "ovn-encap-csum", "true");
@@ -265,6 +272,8 @@ chassis_parse_ovs_config(const struct 
ovsrec_open_vswitch_table *ovs_table,
 ovs_cfg->chassis_macs = get_chassis_mac_mappings(>external_ids);
 ovs_cfg->enable_lflow_cache = get_enable_lflow_cache(>external_ids);
 ovs_cfg->limit_lflow_cache = get_limit_lflow_cache(>external_ids);
+ovs_cfg->memlimit_lflow_cache =
+get_memlimit_lflow_cache(>external_ids);
 
if (!chassis_parse_ovs_encap_type(encap_type, &ovs_cfg->encap_type_set)) {
 return false;
@@ -294,6 +303,7 @@ chassis_build_other_config(struct smap *config, const char 
*bridge_mappings,
const char *iface_types,
const char *enable_lflow_cache,
const char *limit_lflow_cache,
+   const char *memlimit_lflow_cache,
bool is_interconn)
 {
 smap_replace(config, "ovn-bridge-mappings", bridge_mappings);
@@ -302,6 +312,7 @@ chassis_build_other_config(struct smap *config, const char 
*bridge_mappings,
 smap_replace(config, "ovn-monitor-all", monitor_all);
 smap_replace(config, "ovn-enable-lflow-cache", enable_lflow_cache);
 smap_replace(config, "ovn-limit-lflow-cache", limit_lflow_cache);
+smap_replace(config, "ovn-memlimit-lflow-cache-kb", memlimit_lflow_cache);
 smap_replace(config, "iface-types", iface_types);
 smap_replace(config, "ovn-chassis-mac-mappings", chassis_macs);
 smap_replace(config, "is-interconn", is_interconn ? "true" : "false");
@@ -319,6 +330,7 @@ chassis_other_config_changed(const char *bridge_mappings,
  const char *chassis_macs,
  const char *enable_lflow_cache,
  const char *limit_lflow_cache,
+ 

[ovs-dev] [PATCH ovn v3 08/10] lflow-cache: Reclaim heap memory after cache flush.

2021-02-09 Thread Dumitru Ceara
If possible, automatically reclaim heap memory when the lflow cache is
flushed.  This can be an expensive operation but cache flushing is not
a very common operation.

This change is inspired by Ilya Maximets' OVS commit:
  f38f98a2c0dd ("ovsdb-server: Reclaim heap memory after compaction.")

Additionally, when flushing the cache, also shrink the backing hmap.

Acked-by: Mark Michelson 
Acked-by: Numan Siddique 
Signed-off-by: Dumitru Ceara 
---
 configure.ac |1 +
 controller/lflow-cache.c |9 +
 2 files changed, 10 insertions(+)

diff --git a/configure.ac b/configure.ac
index b2d0843..ebb09a5 100644
--- a/configure.ac
+++ b/configure.ac
@@ -96,6 +96,7 @@ OVN_CHECK_DOT
 OVS_CHECK_IF_DL
 OVS_CHECK_STRTOK_R
AC_CHECK_DECLS([sys_siglist], [], [], [[#include <signal.h>]])
+AC_CHECK_DECLS([malloc_trim], [], [], [[#include <malloc.h>]])
AC_CHECK_MEMBERS([struct stat.st_mtim.tv_nsec, struct stat.st_mtimensec],
   [], [], [[#include <sys/stat.h>]])
AC_CHECK_MEMBERS([struct ifreq.ifr_flagshigh], [], [], [[#include <net/if.h>]])
diff --git a/controller/lflow-cache.c b/controller/lflow-cache.c
index be764d5..403aed4 100644
--- a/controller/lflow-cache.c
+++ b/controller/lflow-cache.c
@@ -17,6 +17,10 @@
 
#include <config.h>
 
+#if HAVE_DECL_MALLOC_TRIM
+#include <malloc.h>
+#endif
+
 #include "coverage.h"
 #include "lflow-cache.h"
 #include "lib/uuid.h"
@@ -86,7 +90,12 @@ lflow_cache_flush(struct lflow_cache *lc)
HMAP_FOR_EACH_SAFE (lce, lce_next, node, &lc->entries[i]) {
 lflow_cache_delete__(lc, lce);
 }
+hmap_shrink(&lc->entries[i]);
 }
+
+#if HAVE_DECL_MALLOC_TRIM
+malloc_trim(0);
+#endif
 }
 
 void



[ovs-dev] [PATCH ovn v3 07/10] lflow-cache: Add coverage counters.

2021-02-09 Thread Dumitru Ceara
These are available in ovn-controller logs or when explicitly requested
by users with "ovn-appctl -t ovn-controller coverage/show".

Acked-by: Mark Michelson 
Acked-by: Numan Siddique 
Signed-off-by: Dumitru Ceara 
---
 controller/lflow-cache.c |   24 
 1 file changed, 24 insertions(+)

diff --git a/controller/lflow-cache.c b/controller/lflow-cache.c
index 185f540..be764d5 100644
--- a/controller/lflow-cache.c
+++ b/controller/lflow-cache.c
@@ -17,10 +17,23 @@
 
#include <config.h>
 
+#include "coverage.h"
 #include "lflow-cache.h"
 #include "lib/uuid.h"
 #include "ovn/expr.h"
 
+COVERAGE_DEFINE(lflow_cache_flush);
+COVERAGE_DEFINE(lflow_cache_add_conj_id);
+COVERAGE_DEFINE(lflow_cache_add_expr);
+COVERAGE_DEFINE(lflow_cache_add_matches);
+COVERAGE_DEFINE(lflow_cache_free_conj_id);
+COVERAGE_DEFINE(lflow_cache_free_expr);
+COVERAGE_DEFINE(lflow_cache_free_matches);
+COVERAGE_DEFINE(lflow_cache_add);
+COVERAGE_DEFINE(lflow_cache_hit);
+COVERAGE_DEFINE(lflow_cache_miss);
+COVERAGE_DEFINE(lflow_cache_delete);
+
 static const char *lflow_cache_type_names[LCACHE_T_MAX] = {
 [LCACHE_T_CONJ_ID] = "cache-conj-id",
 [LCACHE_T_EXPR]= "cache-expr",
@@ -65,6 +78,7 @@ lflow_cache_flush(struct lflow_cache *lc)
 return;
 }
 
+COVERAGE_INC(lflow_cache_flush);
 for (size_t i = 0; i < LCACHE_T_MAX; i++) {
 struct lflow_cache_entry *lce;
 struct lflow_cache_entry *lce_next;
@@ -139,6 +153,7 @@ lflow_cache_add_conj_id(struct lflow_cache *lc, const 
struct uuid *lflow_uuid,
 if (!lcv) {
 return;
 }
+COVERAGE_INC(lflow_cache_add_conj_id);
 lcv->conj_id_ofs = conj_id_ofs;
 }
 
@@ -153,6 +168,7 @@ lflow_cache_add_expr(struct lflow_cache *lc, const struct 
uuid *lflow_uuid,
 expr_destroy(expr);
 return;
 }
+COVERAGE_INC(lflow_cache_add_expr);
 lcv->conj_id_ofs = conj_id_ofs;
 lcv->expr = expr;
 }
@@ -169,6 +185,7 @@ lflow_cache_add_matches(struct lflow_cache *lc, const struct uuid *lflow_uuid,
 free(matches);
 return;
 }
+COVERAGE_INC(lflow_cache_add_matches);
 lcv->expr_matches = matches;
 }
 
@@ -186,10 +203,12 @@ lflow_cache_get(struct lflow_cache *lc, const struct uuid *lflow_uuid)
 
 HMAP_FOR_EACH_WITH_HASH (lce, node, hash, &lc->entries[i]) {
 if (uuid_equals(&lce->lflow_uuid, lflow_uuid)) {
+COVERAGE_INC(lflow_cache_hit);
 return &lce->value;
 }
 }
 }
+COVERAGE_INC(lflow_cache_miss);
 return NULL;
 }
 
@@ -202,6 +221,7 @@ lflow_cache_delete(struct lflow_cache *lc, const struct uuid *lflow_uuid)
 
 struct lflow_cache_value *lcv = lflow_cache_get(lc, lflow_uuid);
 if (lcv) {
+COVERAGE_INC(lflow_cache_delete);
 lflow_cache_delete__(lc, CONTAINER_OF(lcv, struct lflow_cache_entry,
   value));
 }
@@ -217,6 +237,7 @@ lflow_cache_add__(struct lflow_cache *lc, const struct uuid *lflow_uuid,
 
 struct lflow_cache_entry *lce = xzalloc(sizeof *lce);
 
+COVERAGE_INC(lflow_cache_add);
 lce->lflow_uuid = *lflow_uuid;
 lce->value.type = type;
 hmap_insert(&lc->entries[type], &lce->node, uuid_hash(lflow_uuid));
@@ -236,11 +257,14 @@ lflow_cache_delete__(struct lflow_cache *lc, struct lflow_cache_entry *lce)
 OVS_NOT_REACHED();
 break;
 case LCACHE_T_CONJ_ID:
+COVERAGE_INC(lflow_cache_free_conj_id);
 break;
 case LCACHE_T_EXPR:
+COVERAGE_INC(lflow_cache_free_expr);
 expr_destroy(lce->value.expr);
 break;
 case LCACHE_T_MATCHES:
+COVERAGE_INC(lflow_cache_free_matches);
 expr_matches_destroy(lce->value.expr_matches);
 free(lce->value.expr_matches);
 break;



[ovs-dev] [PATCH ovn v3 06/10] lflow: Do not cache non-conjunctive flows that use address sets/portgroups.

2021-02-09 Thread Dumitru Ceara
Caching the conjunction id offset for flows that refer to address sets
and/or port groups but do not currently generate conjunctive matches,
e.g., because the port group has only one locally bound port, is wrong.

If ports are later added to the port group and/or addresses are added to
the address set, this flow will be reconsidered but at this point will
generate conjunctive matches.  We cannot use the cached conjunction id
offset because it might create collisions with other conjunction ids
created for unrelated flows.
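
A minimal model of the collision (illustrative only; the real allocator lives in ovn-controller): conjunction ids are handed out as ranges of size n_conjs, so an offset cached while n_conjs was zero reserves nothing, and replaying it once the flow does need conjunctions can overlap a range already given to an unrelated flow:

```python
# Toy conjunction-id allocator: each flow gets the range
# [ofs, ofs + n_conjs).  A flow with n_conjs == 0 reserves no ids.
next_ofs = 1

def allocate(n_conjs):
    global next_ofs
    ofs = next_ofs
    next_ofs += n_conjs          # n_conjs == 0 advances nothing
    return ofs

flow_a_ofs = allocate(0)         # cached while still non-conjunctive
flow_b_ofs = allocate(2)         # unrelated flow reserves two ids

# Flow A is later reconsidered and now needs two conjunction ids;
# reusing its stale cached offset collides with flow B's range.
flow_a_ids = {flow_a_ofs, flow_a_ofs + 1}
flow_b_ids = {flow_b_ofs, flow_b_ofs + 1}
print(flow_a_ids & flow_b_ids)   # overlapping conjunction ids
```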

Fixes: 1213bc827040 ("ovn-controller: Cache logical flow expr matches.")
Acked-by: Mark Michelson 
Signed-off-by: Dumitru Ceara 
---
v3:
- Fix ovn.at comment.
---
 controller/lflow.c |2 +-
 tests/ovn.at   |   12 
 2 files changed, 13 insertions(+), 1 deletion(-)

diff --git a/controller/lflow.c b/controller/lflow.c
index b8abfbb..8faedde 100644
--- a/controller/lflow.c
+++ b/controller/lflow.c
@@ -886,7 +886,7 @@ consider_logical_flow__(const struct sbrec_logical_flow *lflow,
  &lflow->header_.uuid, conj_id_ofs,
  cached_expr);
 cached_expr = NULL;
-} else {
+} else if (n_conjs) {
 lflow_cache_add_conj_id(l_ctx_out->lflow_cache,
&lflow->header_.uuid, conj_id_ofs);
 }
diff --git a/tests/ovn.at b/tests/ovn.at
index 878230a..55ea6d1 100644
--- a/tests/ovn.at
+++ b/tests/ovn.at
AT_CHECK([test "$(($conj_id_cnt + 2))" = "$(get_cache_count cache-conj-id)"], [0], [])
 AT_CHECK([test "$expr_cnt" = "$(get_cache_count cache-expr)"], [0], [])
 AT_CHECK([test "$matches_cnt" = "$(get_cache_count cache-matches)"], [0], [])
 
+AS_BOX([Check no caching for non-conjunctive port group/address set matches])
+conj_id_cnt=$(get_cache_count cache-conj-id)
+expr_cnt=$(get_cache_count cache-expr)
+matches_cnt=$(get_cache_count cache-matches)
+
+check ovn-nbctl acl-add ls1 from-lport 1 'inport == @pg2 && outport == @pg2 && is_chassis_resident("lsp1")' drop
+check ovn-nbctl --wait=hv sync
+
+AT_CHECK([test "$conj_id_cnt" = "$(get_cache_count cache-conj-id)"], [0], [])
+AT_CHECK([test "$expr_cnt" = "$(get_cache_count cache-expr)"], [0], [])
+AT_CHECK([test "$matches_cnt" = "$(get_cache_count cache-matches)"], [0], [])
+
 OVN_CLEANUP([hv1])
 AT_CLEANUP
 



[ovs-dev] [PATCH ovn v3 05/10] lflow-cache: Add unit tests.

2021-02-09 Thread Dumitru Ceara
Acked-by: Mark Michelson 
Acked-by: Numan Siddique 
Signed-off-by: Dumitru Ceara 
---
v3:
- change calls to lflow-cache API to use new conventions.
- update UT outputs to match new lflow-cache stats outputs.
---
 controller/test-lflow-cache.c  |  223 +++
 controller/test-ofctrl-seqno.c |   18 ---
 tests/automake.mk  |8 +
 tests/ovn-lflow-cache.at   |  257 
 tests/ovn.at   |   68 +++
 tests/test-utils.c |   49 
 tests/test-utils.h |   26 
 tests/testsuite.at |1 
 8 files changed, 632 insertions(+), 18 deletions(-)
 create mode 100644 controller/test-lflow-cache.c
 create mode 100644 tests/ovn-lflow-cache.at
 create mode 100644 tests/test-utils.c
 create mode 100644 tests/test-utils.h

diff --git a/controller/test-lflow-cache.c b/controller/test-lflow-cache.c
new file mode 100644
index 000..79fe8c6
--- /dev/null
+++ b/controller/test-lflow-cache.c
@@ -0,0 +1,223 @@
+/* Copyright (c) 2021, Red Hat, Inc.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at:
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+#include <config.h>
+
+#include "lib/uuid.h"
+#include "ovn/expr.h"
+#include "tests/ovstest.h"
+#include "tests/test-utils.h"
+#include "util.h"
+
+#include "lflow-cache.h"
+
+static void
+test_lflow_cache_add__(struct lflow_cache *lc, const char *op_type,
+   const struct uuid *lflow_uuid,
+   unsigned int conj_id_ofs,
+   struct expr *e)
+{
+printf("ADD %s:\n", op_type);
+printf("  conj-id-ofs: %u\n", conj_id_ofs);
+
+if (!strcmp(op_type, "conj-id")) {
+lflow_cache_add_conj_id(lc, lflow_uuid, conj_id_ofs);
+} else if (!strcmp(op_type, "expr")) {
+lflow_cache_add_expr(lc, lflow_uuid, conj_id_ofs, expr_clone(e));
+} else if (!strcmp(op_type, "matches")) {
+struct hmap *matches = xmalloc(sizeof *matches);
+ovs_assert(expr_to_matches(e, NULL, NULL, matches) == 0);
+ovs_assert(hmap_count(matches) == 1);
+lflow_cache_add_matches(lc, lflow_uuid, matches);
+} else {
+OVS_NOT_REACHED();
+}
+}
+
+static void
+test_lflow_cache_lookup__(struct lflow_cache *lc,
+  const struct uuid *lflow_uuid)
+{
+struct lflow_cache_value *lcv = lflow_cache_get(lc, lflow_uuid);
+
+printf("LOOKUP:\n");
+if (!lcv) {
+printf("  not found\n");
+return;
+}
+
+printf("  conj_id_ofs: %"PRIu32"\n", lcv->conj_id_ofs);
+switch (lcv->type) {
+case LCACHE_T_CONJ_ID:
+printf("  type: conj-id\n");
+break;
+case LCACHE_T_EXPR:
+printf("  type: expr\n");
+break;
+case LCACHE_T_MATCHES:
+printf("  type: matches\n");
+break;
+case LCACHE_T_NONE:
+OVS_NOT_REACHED();
+break;
+}
+}
+
+static void
+test_lflow_cache_delete__(struct lflow_cache *lc,
+  const struct uuid *lflow_uuid)
+{
+printf("DELETE\n");
+lflow_cache_delete(lc, lflow_uuid);
+}
+
+static void
+test_lflow_cache_stats__(struct lflow_cache *lc)
+{
+struct ds ds = DS_EMPTY_INITIALIZER;
+
+lflow_cache_get_stats(lc, &ds);
+printf("%s", ds_cstr(&ds));
+ds_destroy(&ds);
+}
+
+static void
+test_lflow_cache_operations(struct ovs_cmdl_context *ctx)
+{
+struct lflow_cache *lc = lflow_cache_create();
+struct expr *e = expr_create_boolean(true);
+bool enabled = !strcmp(ctx->argv[1], "true");
+unsigned int shift = 2;
+unsigned int n_ops;
+
+lflow_cache_enable(lc, enabled);
+test_lflow_cache_stats__(lc);
+
+if (!test_read_uint_value(ctx, shift++, "n_ops", &n_ops)) {
+goto done;
+}
+
+for (unsigned int i = 0; i < n_ops; i++) {
+const char *op = test_read_value(ctx, shift++, "op");
+
+if (!op) {
+goto done;
+}
+
+struct uuid lflow_uuid;
+uuid_generate(&lflow_uuid);
+
+if (!strcmp(op, "add")) {
+const char *op_type = test_read_value(ctx, shift++, "op_type");
+if (!op_type) {
+goto done;
+}
+
+unsigned int conj_id_ofs;
+if (!test_read_uint_value(ctx, shift++, "conj-id-ofs",
+  &conj_id_ofs)) {
+goto done;
+}
+
+test_lflow_cache_add__(lc, op_type, &lflow_uuid, conj_id_ofs, e);
+test_lflow_cache_lookup__(lc, 

[ovs-dev] [PATCH ovn v3 04/10] lflow-cache: Add lflow-cache/show-stats command.

2021-02-09 Thread Dumitru Ceara
Acked-by: Mark Michelson 
Signed-off-by: Dumitru Ceara 
---
v3:
- change lflow_cache_get_stats() to populate an output string.
- document lflow-cache/show-stats command.
---
 controller/lflow-cache.c|   27 +++
 controller/lflow-cache.h|2 ++
 controller/ovn-controller.8.xml |6 ++
 controller/ovn-controller.c |   17 +
 4 files changed, 52 insertions(+)

diff --git a/controller/lflow-cache.c b/controller/lflow-cache.c
index e12c69d..185f540 100644
--- a/controller/lflow-cache.c
+++ b/controller/lflow-cache.c
@@ -21,6 +21,12 @@
 #include "lib/uuid.h"
 #include "ovn/expr.h"
 
+static const char *lflow_cache_type_names[LCACHE_T_MAX] = {
+[LCACHE_T_CONJ_ID] = "cache-conj-id",
+[LCACHE_T_EXPR]= "cache-expr",
+[LCACHE_T_MATCHES] = "cache-matches",
+};
+
 struct lflow_cache {
 struct hmap entries[LCACHE_T_MAX];
 bool enabled;
@@ -103,6 +109,27 @@ lflow_cache_is_enabled(const struct lflow_cache *lc)
 }
 
 void
+lflow_cache_get_stats(const struct lflow_cache *lc, struct ds *output)
+{
+if (!output) {
+return;
+}
+
+if (!lc) {
+ds_put_cstr(output, "Invalid arguments.");
+return;
+}
+
+ds_put_format(output, "Enabled: %s\n",
+  lflow_cache_is_enabled(lc) ? "true" : "false");
+for (size_t i = 0; i < LCACHE_T_MAX; i++) {
+ds_put_format(output, "%-16s: %"PRIuSIZE"\n",
+  lflow_cache_type_names[i],
+  hmap_count(&lc->entries[i]));
+}
+}
+
+void
 lflow_cache_add_conj_id(struct lflow_cache *lc, const struct uuid *lflow_uuid,
 uint32_t conj_id_ofs)
 {
diff --git a/controller/lflow-cache.h b/controller/lflow-cache.h
index dce8341..03a64f6 100644
--- a/controller/lflow-cache.h
+++ b/controller/lflow-cache.h
@@ -18,6 +18,7 @@
 #ifndef LFLOW_CACHE_H
 #define LFLOW_CACHE_H 1
 
+#include "openvswitch/dynamic-string.h"
 #include "openvswitch/hmap.h"
 #include "openvswitch/uuid.h"
 
@@ -56,6 +57,7 @@ void lflow_cache_flush(struct lflow_cache *);
 void lflow_cache_destroy(struct lflow_cache *);
 void lflow_cache_enable(struct lflow_cache *, bool enabled);
 bool lflow_cache_is_enabled(const struct lflow_cache *);
+void lflow_cache_get_stats(const struct lflow_cache *, struct ds *output);
 
 void lflow_cache_add_conj_id(struct lflow_cache *,
  const struct uuid *lflow_uuid,
diff --git a/controller/ovn-controller.8.xml b/controller/ovn-controller.8.xml
index 23ceb86..1fa7af4 100644
--- a/controller/ovn-controller.8.xml
+++ b/controller/ovn-controller.8.xml
@@ -549,6 +549,12 @@
   
 Flushes the ovn-controller logical flow cache.
   
+
+      <dt><code>lflow-cache/show-stats</code></dt>
+      <dd>
+        Displays logical flow cache statistics: enabled/disabled, per cache
+        type entry counts.
+      </dd>
   
 
 
diff --git a/controller/ovn-controller.c b/controller/ovn-controller.c
index ae458f0..c77cfcd 100644
--- a/controller/ovn-controller.c
+++ b/controller/ovn-controller.c
@@ -82,6 +82,7 @@ static unixctl_cb_func debug_pause_execution;
 static unixctl_cb_func debug_resume_execution;
 static unixctl_cb_func debug_status_execution;
 static unixctl_cb_func lflow_cache_flush_cmd;
+static unixctl_cb_func lflow_cache_show_stats_cmd;
 static unixctl_cb_func debug_delay_nb_cfg_report;
 
 #define DEFAULT_BRIDGE_NAME "br-int"
@@ -2666,6 +2667,9 @@ main(int argc, char *argv[])
 unixctl_command_register("flush-lflow-cache", "[deprecated]", 0, 0,
  lflow_cache_flush_cmd,
 &flow_output_data->pd);
+unixctl_command_register("lflow-cache/show-stats", "", 0, 0,
+ lflow_cache_show_stats_cmd,
+ &flow_output_data->pd);
 
 bool reset_ovnsb_idl_min_index = false;
 unixctl_command_register("sb-cluster-state-reset", "", 0, 0,
@@ -3274,6 +3278,19 @@ lflow_cache_flush_cmd(struct unixctl_conn *conn OVS_UNUSED,
 }
 
 static void
+lflow_cache_show_stats_cmd(struct unixctl_conn *conn, int argc OVS_UNUSED,
+   const char *argv[] OVS_UNUSED, void *arg_)
+{
+struct flow_output_persistent_data *fo_pd = arg_;
+struct lflow_cache *lc = fo_pd->lflow_cache;
+struct ds ds = DS_EMPTY_INITIALIZER;
+
+lflow_cache_get_stats(lc, &ds);
+unixctl_command_reply(conn, ds_cstr(&ds));
+ds_destroy(&ds);
+}
+
+static void
 cluster_state_reset_cmd(struct unixctl_conn *conn, int argc OVS_UNUSED,
const char *argv[] OVS_UNUSED, void *idl_reset_)
 {



[ovs-dev] [PATCH ovn v3 03/10] lflow-cache: Move the lflow cache to its own module.

2021-02-09 Thread Dumitru Ceara
This abstracts the implementation details of the ovn-controller logical
flow cache and also refactors how the cache is used, allowing
further commits to expand its functionality.

Acked-by: Mark Michelson 
Acked-by: Numan Siddique 
Signed-off-by: Dumitru Ceara 
---
v3:
- pass 'struct uuid lflow_uuid *' instead of 'struct sbrec_logical_flow *'
  to all lflow-cache APIs.
---
 controller/automake.mk  |2 
 controller/lflow-cache.c|  222 
 controller/lflow-cache.h|   73 +
 controller/lflow.c  |  338 ++-
 controller/lflow.h  |8 -
 controller/ovn-controller.c |   57 +++
 lib/expr.c  |4 +
 7 files changed, 437 insertions(+), 267 deletions(-)
 create mode 100644 controller/lflow-cache.c
 create mode 100644 controller/lflow-cache.h

diff --git a/controller/automake.mk b/controller/automake.mk
index 480578e..75119a6 100644
--- a/controller/automake.mk
+++ b/controller/automake.mk
@@ -14,6 +14,8 @@ controller_ovn_controller_SOURCES = \
controller/ip-mcast.h \
controller/lflow.c \
controller/lflow.h \
+   controller/lflow-cache.c \
+   controller/lflow-cache.h \
controller/lport.c \
controller/lport.h \
controller/ofctrl.c \
diff --git a/controller/lflow-cache.c b/controller/lflow-cache.c
new file mode 100644
index 000..e12c69d
--- /dev/null
+++ b/controller/lflow-cache.c
@@ -0,0 +1,222 @@
+/*
+ * Copyright (c) 2015, 2016 Nicira, Inc.
+ * Copyright (c) 2021, Red Hat, Inc.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at:
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+#include <config.h>
+
+#include "lflow-cache.h"
+#include "lib/uuid.h"
+#include "ovn/expr.h"
+
+struct lflow_cache {
+struct hmap entries[LCACHE_T_MAX];
+bool enabled;
+};
+
+struct lflow_cache_entry {
+struct hmap_node node;
+struct uuid lflow_uuid; /* key */
+
+struct lflow_cache_value value;
+};
+
+static struct lflow_cache_value *lflow_cache_add__(
+struct lflow_cache *lc, const struct uuid *lflow_uuid,
+enum lflow_cache_type type);
+static void lflow_cache_delete__(struct lflow_cache *lc,
+ struct lflow_cache_entry *lce);
+
+struct lflow_cache *
+lflow_cache_create(void)
+{
+struct lflow_cache *lc = xmalloc(sizeof *lc);
+
+for (size_t i = 0; i < LCACHE_T_MAX; i++) {
+hmap_init(&lc->entries[i]);
+}
+
+lc->enabled = true;
+return lc;
+}
+
+void
+lflow_cache_flush(struct lflow_cache *lc)
+{
+if (!lc) {
+return;
+}
+
+for (size_t i = 0; i < LCACHE_T_MAX; i++) {
+struct lflow_cache_entry *lce;
+struct lflow_cache_entry *lce_next;
+
+HMAP_FOR_EACH_SAFE (lce, lce_next, node, &lc->entries[i]) {
+lflow_cache_delete__(lc, lce);
+}
+}
+}
+
+void
+lflow_cache_destroy(struct lflow_cache *lc)
+{
+if (!lc) {
+return;
+}
+
+lflow_cache_flush(lc);
+for (size_t i = 0; i < LCACHE_T_MAX; i++) {
+hmap_destroy(&lc->entries[i]);
+}
+free(lc);
+}
+
+void
+lflow_cache_enable(struct lflow_cache *lc, bool enabled)
+{
+if (!lc) {
+return;
+}
+
+if (lc->enabled && !enabled) {
+lflow_cache_flush(lc);
+}
+lc->enabled = enabled;
+}
+
+bool
+lflow_cache_is_enabled(const struct lflow_cache *lc)
+{
+return lc && lc->enabled;
+}
+
+void
+lflow_cache_add_conj_id(struct lflow_cache *lc, const struct uuid *lflow_uuid,
+uint32_t conj_id_ofs)
+{
+struct lflow_cache_value *lcv =
+lflow_cache_add__(lc, lflow_uuid, LCACHE_T_CONJ_ID);
+
+if (!lcv) {
+return;
+}
+lcv->conj_id_ofs = conj_id_ofs;
+}
+
+void
+lflow_cache_add_expr(struct lflow_cache *lc, const struct uuid *lflow_uuid,
+ uint32_t conj_id_ofs, struct expr *expr)
+{
+struct lflow_cache_value *lcv =
+lflow_cache_add__(lc, lflow_uuid, LCACHE_T_EXPR);
+
+if (!lcv) {
+expr_destroy(expr);
+return;
+}
+lcv->conj_id_ofs = conj_id_ofs;
+lcv->expr = expr;
+}
+
+void
+lflow_cache_add_matches(struct lflow_cache *lc, const struct uuid *lflow_uuid,
+struct hmap *matches)
+{
+struct lflow_cache_value *lcv =
+lflow_cache_add__(lc, lflow_uuid, LCACHE_T_MATCHES);
+
+if (!lcv) {
+expr_matches_destroy(matches);
+free(matches);
+return;
+}
+lcv->expr_matches = matches;
+}
+
+struct 

[ovs-dev] [PATCH v3 ovn 00/10] ovn-controller: Make lflow cache size configurable.

2021-02-09 Thread Dumitru Ceara
Scale tests have identified the lflow cache to be one of the main memory
consumers in ovn-controller.  This series refactors the lflow cache code
and adds configuration knobs to limit the size (in lines and/or memory)
of the cache.
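
As a rough sketch of the "limit the number of entries" idea (the actual eviction policy and memory accounting are in the later patches of the series; the class and names below are hypothetical), a count-capped cache evicts an existing entry before admitting a new one once the limit is reached:

```python
class CappedCache:
    """Toy entry-count-limited cache; not the ovn-controller code."""

    def __init__(self, max_entries):
        self.max_entries = max_entries
        self.entries = {}

    def add(self, uuid, value):
        if len(self.entries) >= self.max_entries:
            # The series evicts "less important" entry types first;
            # here we just drop an arbitrary entry to illustrate the bound.
            self.entries.pop(next(iter(self.entries)))
        self.entries[uuid] = value

cache = CappedCache(max_entries=2)
for uuid in ("lflow-1", "lflow-2", "lflow-3"):
    cache.add(uuid, object())
print(len(cache.entries))  # never exceeds the configured limit
```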

Patches 1 and 6 fix issues with the already existing lflow cache code.
Even though patch 6 is a bug fix, it's easier to add it later in the
series because it uses the new lflow cache statistics (from patch 4)
to add a unit test that exercises the buggy scenario.

Changes in v3:
- Addressed Mark and Numan's comments (individual changes listed in
  each patch).
- Added acks where applicable.
Changes in v2:
- Added two bug fixes for already existing problems (patches 1 and 6).
- Added unit tests as requested by Mark.
- Added support for evicting "less important" entries when the cache
  limit is reached.
- Improved cache entries memory accounting.

Dumitru Ceara (10):
  lflow: Fix cache update when I-P engine aborts.
  lflow: Refactor convert_match_to_expr() to explicitly consume prereqs.
  lflow-cache: Move the lflow cache to its own module.
  lflow-cache: Add lflow-cache/show-stats command.
  lflow-cache: Add unit tests.
  lflow: Do not cache non-conjunctive flows that use address sets/portgroups.
  lflow-cache: Add coverage counters.
  lflow-cache: Reclaim heap memory after cache flush.
  lflow-cache: Make maximum number of cache entries configurable.
  lflow-cache: Make max cache memory usage configurable.


 NEWS|5 
 configure.ac|1 
 controller/automake.mk  |2 
 controller/chassis.c|   44 
 controller/lflow-cache.c|  371 
 controller/lflow-cache.h|   81 
 controller/lflow.c  |  378 +---
 controller/lflow.h  |   10 -
 controller/ovn-controller.8.xml |   34 +++
 controller/ovn-controller.c |  106 +++---
 controller/test-lflow-cache.c   |  238 +++
 controller/test-ofctrl-seqno.c  |   18 --
 include/ovn/expr.h  |3 
 lib/expr.c  |   43 
 tests/automake.mk   |8 +
 tests/ovn-lflow-cache.at|  405 +++
 tests/ovn.at|   82 
 tests/test-utils.c  |   49 +
 tests/test-utils.h  |   26 +++
 tests/testsuite.at  |1 
 20 files changed, 1597 insertions(+), 308 deletions(-)
 create mode 100644 controller/lflow-cache.c
 create mode 100644 controller/lflow-cache.h
 create mode 100644 controller/test-lflow-cache.c
 create mode 100644 tests/ovn-lflow-cache.at
 create mode 100644 tests/test-utils.c
 create mode 100644 tests/test-utils.h



[ovs-dev] [PATCH ovn v3 02/10] lflow: Refactor convert_match_to_expr() to explicitly consume prereqs.

2021-02-09 Thread Dumitru Ceara
It was (and still is) the responsibility of the caller of
convert_match_to_expr() to explicitly free any 'prereqs' expression that
was passed as an argument if the expression parsing of the 'lflow' match failed.

However, convert_match_to_expr() now updates the value of '*prereqs' setting
it to NULL in the successful case.  This makes it easier for callers of the
function because 'expr_destroy(prereqs)' can now be called unconditionally.
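
The ownership convention can be sketched in Python, with a one-element list standing in for the C `struct expr **` (hypothetical names; the real functions are convert_match_to_expr() and expr_destroy()):

```python
freed = []

def expr_destroy(expr):
    # Mirrors OVS expr_destroy(): a safe no-op on NULL.
    if expr is not None:
        freed.append(expr)

def convert_match(match_ok, prereqs_ref):
    # prereqs_ref is a one-element list standing in for 'struct expr **'.
    if not match_ok:
        return None                  # failure: caller still owns *prereqs
    combined = ("AND", "match-expr", prereqs_ref[0])
    prereqs_ref[0] = None            # success: *prereqs consumed
    return combined

# Success path: prereqs consumed, so the unconditional destroy is a no-op.
prereqs = ["prereq-expr"]
expr = convert_match(True, prereqs)
expr_destroy(prereqs[0])

# Failure path: the caller still owns prereqs, and the very same
# unconditional destroy call now actually frees them.
prereqs2 = ["prereq-expr"]
expr2 = convert_match(False, prereqs2)
expr_destroy(prereqs2[0])
print(expr, freed)
```

Setting `*prereqs` to NULL on success is what lets every caller run the same cleanup code on both paths.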

Acked-by: Mark Michelson 
Acked-by: Numan Siddique 
Signed-off-by: Dumitru Ceara 
---
Note: This patch doesn't change the callers of convert_match_to_expr() to
take advantage of the new semantic but following patches do.
---
 controller/lflow.c |   14 --
 1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/controller/lflow.c b/controller/lflow.c
index 02a4480..340f1f0 100644
--- a/controller/lflow.c
+++ b/controller/lflow.c
@@ -758,11 +758,12 @@ add_matches_to_flow_table(const struct sbrec_logical_flow *lflow,
 /* Converts the match and returns the simplified expr tree.
  *
  * The caller should evaluate the conditions and normalize the expr tree.
+ * If parsing is successful, '*prereqs' is also consumed.
  */
 static struct expr *
 convert_match_to_expr(const struct sbrec_logical_flow *lflow,
   const struct sbrec_datapath_binding *dp,
-  struct expr *prereqs,
+  struct expr **prereqs,
   const struct shash *addr_sets,
   const struct shash *port_groups,
   struct lflow_resource_ref *lfrr,
@@ -795,8 +796,9 @@ convert_match_to_expr(const struct sbrec_logical_flow *lflow,
 sset_destroy(_groups_ref);
 
 if (!error) {
-if (prereqs) {
-e = expr_combine(EXPR_T_AND, e, prereqs);
+if (*prereqs) {
+e = expr_combine(EXPR_T_AND, e, *prereqs);
+*prereqs = NULL;
 }
 e = expr_annotate(e, &symtab, &error);
 }
@@ -854,7 +856,7 @@ consider_logical_flow__(const struct sbrec_logical_flow *lflow,
 .n_tables = LOG_PIPELINE_LEN,
 .cur_ltable = lflow->table_id,
 };
-struct expr *prereqs;
+struct expr *prereqs = NULL;
 char *error;
 
 error = ovnacts_parse_string(lflow->actions, &pp, &ovnacts, &prereqs);
@@ -885,7 +887,7 @@ consider_logical_flow__(const struct sbrec_logical_flow *lflow,
 struct expr *expr = NULL;
 if (!l_ctx_out->lflow_cache_map) {
 /* Caching is disabled. */
-expr = convert_match_to_expr(lflow, dp, prereqs, l_ctx_in->addr_sets,
+expr = convert_match_to_expr(lflow, dp, &prereqs, l_ctx_in->addr_sets,
  l_ctx_in->port_groups, l_ctx_out->lfrr,
  NULL);
 if (!expr) {
@@ -949,7 +951,7 @@ consider_logical_flow__(const struct sbrec_logical_flow *lflow,
 
 bool pg_addr_set_ref = false;
 if (!expr) {
-expr = convert_match_to_expr(lflow, dp, prereqs, l_ctx_in->addr_sets,
+expr = convert_match_to_expr(lflow, dp, &prereqs, l_ctx_in->addr_sets,
  l_ctx_in->port_groups, l_ctx_out->lfrr,
 &pg_addr_set_ref);
 if (!expr) {



[ovs-dev] [PATCH ovn v3 01/10] lflow: Fix cache update when I-P engine aborts.

2021-02-09 Thread Dumitru Ceara
If the I-P engine aborts in the middle of a run, the 'flow_output' node
change handlers or run callback might not be called.

As a side effect, Logical_Flow IDL tracked changes may not be processed
during the iteration.  As a consequence, if a Logical_Flow was removed
from the Southbound DB, its associated cache entry (if any) will not be
flushed.

This has two side effects:
1. Stale entries are kept in the cache until a full recompute or cache
   flush happens.
2. If a new Logical_Flow is added to the Southbound DB and it happens to
   have a UUID that matches that of a stale cache entry, then
   ovn-controller will install incorrect openflows.

IDL tracked changes are cleared at every iteration of ovn-controller.
Skipping the ovsdb_idl_track_clear() call if the I-P engine aborted is
not a valid option for now because it might lead to some of the IDL
changes being processed twice.

Instead, lflow_handle_cached_flows() is called now at every iteration
of ovn-controller making sure deleted flows are removed from the cache.
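
In outline, the fix moves the cache purge into a helper that runs on every iteration, before tracked changes are cleared, instead of only inside handlers an I-P abort can skip. A Python sketch (names other than the handler itself are hypothetical):

```python
def handle_cached_flows(cache, tracked_changes):
    """Analogue of lflow_handle_cached_flows(): drop cache entries for
    Logical_Flows that the IDL reports as deleted."""
    for uuid, is_deleted in tracked_changes:
        if is_deleted:
            cache.pop(uuid, None)

cache = {"uuid-1": "entry-1", "uuid-2": "entry-2"}
tracked = [("uuid-1", True), ("uuid-3", True)]  # uuid-1 deleted in SB DB

# Every iteration: purge deletions first, then clear tracked changes,
# so a stale entry can never outlive the tracked-change notification.
handle_cached_flows(cache, tracked)
tracked.clear()
print(sorted(cache))
```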

Also, rename the 'flush-lflow-cache' unixctl command to
'lflow-cache/flush' to better match the style of other command names.
The old 'flush-lflow-cache' command is kept (as deprecated) to avoid
breaking existing CMS scripts.

Fixes: 2662498bfd13 ("ovn-controller: Persist the conjunction ids allocated for conjuctive matches.")
Acked-by: Mark Michelson 
Signed-off-by: Dumitru Ceara 

---
v3:
- keep 'flush-lflow-cache' to avoid breaking CMS scripts.
- add man page entry for 'lflow-cache/flush'
---
 controller/lflow.c  |   30 +-
 controller/lflow.h  |4 +++-
 controller/ovn-controller.8.xml |5 +
 controller/ovn-controller.c |   24 
 tests/ovn.at|2 +-
 5 files changed, 46 insertions(+), 19 deletions(-)

diff --git a/controller/lflow.c b/controller/lflow.c
index 2b7d356..02a4480 100644
--- a/controller/lflow.c
+++ b/controller/lflow.c
@@ -1583,19 +1583,6 @@ lflow_run(struct lflow_ctx_in *l_ctx_in, struct lflow_ctx_out *l_ctx_out)
 {
 COVERAGE_INC(lflow_run);
 
-/* when lflow_run is called, it's possible that some of the logical flows
- * are deleted. We need to delete the lflow cache for these
- * lflows (if present), otherwise, they will not be deleted at all. */
-if (l_ctx_out->lflow_cache_map) {
-const struct sbrec_logical_flow *lflow;
-SBREC_LOGICAL_FLOW_TABLE_FOR_EACH_TRACKED (lflow,
-l_ctx_in->logical_flow_table) {
-if (sbrec_logical_flow_is_deleted(lflow)) {
-lflow_cache_delete(l_ctx_out->lflow_cache_map, lflow);
-}
-}
-}
-
 add_logical_flows(l_ctx_in, l_ctx_out);
 add_neighbor_flows(l_ctx_in->sbrec_port_binding_by_name,
l_ctx_in->mac_binding_table, l_ctx_in->local_datapaths,
@@ -1604,6 +1591,23 @@ lflow_run(struct lflow_ctx_in *l_ctx_in, struct lflow_ctx_out *l_ctx_out)
  l_ctx_out->flow_table);
 }
 
+/* Should be called at every ovn-controller iteration before IDL tracked
+ * changes are cleared to avoid maintaining cache entries for flows that
+ * don't exist anymore.
+ */
+void
+lflow_handle_cached_flows(struct hmap *lflow_cache_map,
+  const struct sbrec_logical_flow_table *flow_table)
+{
+const struct sbrec_logical_flow *lflow;
+
+SBREC_LOGICAL_FLOW_TABLE_FOR_EACH_TRACKED (lflow, flow_table) {
+if (sbrec_logical_flow_is_deleted(lflow)) {
+lflow_cache_delete(lflow_cache_map, lflow);
+}
+}
+}
+
 void
 lflow_destroy(void)
 {
diff --git a/controller/lflow.h b/controller/lflow.h
index ba79cc3..cf4f0e8 100644
--- a/controller/lflow.h
+++ b/controller/lflow.h
@@ -156,7 +156,9 @@ struct lflow_ctx_out {
 };
 
 void lflow_init(void);
-void  lflow_run(struct lflow_ctx_in *, struct lflow_ctx_out *);
+void lflow_run(struct lflow_ctx_in *, struct lflow_ctx_out *);
+void lflow_handle_cached_flows(struct hmap *lflow_cache,
+   const struct sbrec_logical_flow_table *);
 bool lflow_handle_changed_flows(struct lflow_ctx_in *, struct lflow_ctx_out *);
 bool lflow_handle_changed_ref(enum ref_type, const char *ref_name,
   struct lflow_ctx_in *, struct lflow_ctx_out *,
diff --git a/controller/ovn-controller.8.xml b/controller/ovn-controller.8.xml
index 29833c7..23ceb86 100644
--- a/controller/ovn-controller.8.xml
+++ b/controller/ovn-controller.8.xml
@@ -544,6 +544,11 @@
 end-to-end latency in a large scale environment.  See
 ovn-nbctl(8) for more details.
   
+
+      <dt><code>lflow-cache/flush</code></dt>
+      <dd>
+        Flushes the ovn-controller logical flow cache.
+      </dd>
   
 
 
diff --git a/controller/ovn-controller.c b/controller/ovn-controller.c
index ef3e0e9..0f54ccd 100644
--- a/controller/ovn-controller.c
+++ b/controller/ovn-controller.c
@@ 

[ovs-dev] [PATCH] rhel: Do not update kmod RPM newer major revision kernels

2021-02-09 Thread Greg Rose
The ovs-kmod-manage.sh script will run weak-updates even on newer
release kernels, installing an incompatible or unloadable kernel
module.

Update the script to never install weak-updates onto kernels with
newer major release versions.
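
The guard can be sketched as follows (Python standing in for the shell logic; the version strings are made-up examples, and the field layout major.minor.patch-major_rev matches the script's IFS split on '.' and '-'):

```python
import re

def major_rev(kversion):
    # "3.10.0-957.el7" splits into ['3', '10', '0', '957', 'el7'];
    # field 3 is the major revision, as in the script's IFS='.\|-' read.
    return int(re.split(r"[.\-]", kversion)[3])

def may_install(kmod_kernel, running_kernel):
    # Refuse when the kmod was built for an older major revision than
    # the kernel it would be weak-linked into.
    return major_rev(kmod_kernel) >= major_rev(running_kernel)

print(may_install("3.10.0-957.el7", "3.10.0-1062.el7"))   # older kmod: refuse
print(may_install("3.10.0-1062.el7", "3.10.0-957.el7"))
```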

VMware-BZ: #2717283
Signed-off-by: Greg Rose 
---
 rhel/usr_share_openvswitch_scripts_ovs-kmod-manage.sh | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/rhel/usr_share_openvswitch_scripts_ovs-kmod-manage.sh b/rhel/usr_share_openvswitch_scripts_ovs-kmod-manage.sh
index f147857e4..66b09472a 100644
--- a/rhel/usr_share_openvswitch_scripts_ovs-kmod-manage.sh
+++ b/rhel/usr_share_openvswitch_scripts_ovs-kmod-manage.sh
@@ -155,6 +155,16 @@ kmod_versions=()
 kversion=$(rpm -ql ${rpmname} | grep '\.ko$' | \
sed -n -e 's/^\/lib\/modules\/\(.*\)\/extra\/.*$/\1/p' | \
sort | uniq)
+
+IFS='.\|-' read installed_major installed_minor installed_patch \
+installed_major_rev installed_minor_rev installed_extra <<<"${kversion}"
+
+if [ "$installed_major_rev" -lt "$major_rev" ]; then
+echo Not installing RPM with major revision "$installed_major_rev" \
+to kernel with greater major revision "$major_rev.  Exiting"
+exit 1
+fi
+
 for kv in $kversion; do
 IFS='.\|-' read -r -a kv_nums <<<"${kv}"
 kmod_versions+=(${kv_nums[$ver_offset]})
-- 
2.17.1



Re: [ovs-dev] [PATCH ovn v2] Add weekly CI job that uses OVS master.

2021-02-09 Thread Mark Michelson

On 2/4/21 2:01 PM, Dumitru Ceara wrote:

On 1/28/21 9:35 PM, Mark Michelson wrote:

Signed-off-by: Mark Michelson 
---
This patch is based on the "Include OVS as a git submodule." patch,
which, at this time has not been merged into OVN master yet.
---
  .github/workflows/test.yml | 37 -
  1 file changed, 36 insertions(+), 1 deletion(-)

diff --git a/.github/workflows/test.yml b/.github/workflows/test.yml
index 916b14da2..f3a53a8b6 100644
--- a/.github/workflows/test.yml
+++ b/.github/workflows/test.yml
@@ -1,6 +1,11 @@
  name: Build and Test
-on: [push, pull_request]
+on:
+  push:
+  pull_request:
+  schedule:
+    # Run Sunday at midnight
+    - cron: '0 0 * * 0'
  jobs:
    build-linux:
@@ -49,10 +54,26 @@ jobs:
  steps:
  - name: checkout
+  if: github.event_name == 'push' || github.event_name == 'pull_request'

    uses: actions/checkout@v2
    with:
  submodules: recursive
+    # For weekly runs, don't update submodules
+    - name: checkout without submodule
+  if: github.event_name == 'schedule'
+  uses: actions/checkout@v2
+
+    # Weekly runs test using OVS master instead of the
+    # submodule.
+    - name: checkout OVS master
+  if: github.event_name == 'schedule'
+  uses: actions/checkout@v2
+  with:
+    repository: 'openvswitch/ovs'
+    path: 'ovs'
+    ref: 'master'
+
  - name: update APT cache
    run:  sudo apt update
@@ -105,9 +126,23 @@ jobs:
  steps:
  - name: checkout
+  if: github.event_name == 'push' || github.event_name == 'pull_request'

    uses: actions/checkout@v2
    with:
  submodules: recursive
+    # For weekly runs, don't update submodules
+    - name: checkout without submodule
+  if: github.event_name == 'schedule'
+  uses: actions/checkout@v2
+    # Weekly runs test using OVS master instead of the
+    # submodule.
+    - name: checkout OVS master
+  if: github.event_name == 'schedule'
+  uses: actions/checkout@v2
+  with:
+    repository: 'openvswitch/ovs'
+    path: 'ovs'
+    ref: 'master'
  - name: install dependencies
    run:  brew install automake libtool
  - name: prepare



Hi Mark,

I didn't try out the schedule part but the change looks good to me. 
Github CI on "push" passes fine.


Acked-by: Dumitru Ceara 

Thanks,
Dumitru



I have merged this to master.



Re: [ovs-dev] [PATCH ovn v3] Include OVS as a git submodule.

2021-02-09 Thread Mark Michelson

On 1/27/21 1:52 PM, Numan Siddique wrote:

On Tue, Jan 26, 2021 at 2:02 AM Dumitru Ceara  wrote:


On 1/25/21 7:39 PM, Mark Michelson wrote:

OVN developers have had issues with the current method by which OVS
source code is used by OVN.

* There is no way to record the minimum commit/version of OVS to use
when compiling OVN.
* When debugging issues, bisecting OVN commits may also require
simultaneously changing OVS commits. This makes for multiple moving
targets to try to track.
* Performance improvements made to OVS libraries and OVSDB may benefit
OVN. However, there's no way to encourage the use of the improved OVS
source.

Using a submodule allows OVN to record a specific commit of
OVS that is expected to be used.

Signed-off-by: Mark Michelson 
---


This looks good to me, thanks!

Acked-by: Dumitru Ceara 


LGTM too.

Acked-by: Numan Siddique 

We would definitely need another job in CI to test with OVS master.
That can be a follow up patch.

Thanks
Numan


I have merged this to master.







[ovs-dev] [PATCH ovn] northd: Provide the Gateway router option 'lb_force_snat_ip' to take router port ips.

2021-02-09 Thread numans
From: Numan Siddique 

When a Gateway router is configured with a load balancer
and is also configured with options:lb_force_snat_ip=,
OVN, after load balancing the destination IP to one of the
backends, also does a NAT on the source ip with the
lb_force_snat_ip if the packet is destined to a load balancer
VIP.

There is a problem with the snat of the source ip to 'lb_force_snat_ip'
in one particular use case.  When a packet enters the Gateway router
from a provider logical switch destined to the load balancer VIP,
it is first load balanced to one of the backends and then
the source ip is snatted to 'lb_force_snat_ip'.  If the chosen
backend is reachable via the provider logical switch, the
packet is hairpinned back and it may hit the wire with
the source ip 'lb_force_snat_ip'.  If 'lb_force_snat_ip' happens
to be an OVN internal IP, the packet may be dropped.

This patch addresses this issue by adding the option value
'lb_force_snat_ip=router_ip'.  If 'router_ip'
is set, OVN will snat the load balanced packet to the
router ip of the logical router port which is chosen as 'outport'
in the lr_in_ip_routing stage.

Example.

If the gateway router is

ovn-nbctl show lr0
router 68f20092-5563-44b8-9ccb-b11de3e3a66c (lr0)
port lr0-sw0
mac: "00:00:00:00:ff:01"
networks: ["10.0.0.1/24"]
port lr0-public
mac: "00:00:20:20:12:13"
networks: ["172.168.0.100/24"]

Then the below logical flows are added if 'lb_force_snat_ip'
is configured to 'router_ip'.

table=1 (lr_out_snat), priority=110
   match=(flags.force_snat_for_lb == 1 && ip4 && outport == "lr0-public"),
   action=(ct_snat(172.168.0.100);)

table=1 (lr_out_snat), priority=110
   match=(flags.force_snat_for_lb == 1 && ip4 && outport == "lr0-sw0")
   action=(ct_snat(10.0.0.1);)

For the above described scenario, the packet will have source ip as
172.168.0.100 which belongs to the provider logical switch CIDR.
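The per-port flow generation described above can be sketched in miniature as follows. This is an illustrative Python model of the policy, not the actual C implementation in northd/ovn-northd.c; it reuses the port names and IPs from the example:

```python
import ipaddress

def build_lb_force_snat_flows(router_ports):
    """Emit one priority-110 lr_out_snat flow per router port,
    SNATing load-balanced traffic to that port's own IP."""
    flows = []
    for name, cidr in router_ports:
        ip = ipaddress.ip_interface(cidr).ip
        proto = "ip4" if ip.version == 4 else "ip6"
        match = ('flags.force_snat_for_lb == 1 && %s && outport == "%s"'
                 % (proto, name))
        flows.append((110, match, "ct_snat(%s);" % ip))
    return flows

flows = build_lb_force_snat_flows(
    [("lr0-public", "172.168.0.100/24"), ("lr0-sw0", "10.0.0.1/24")])
for prio, match, action in flows:
    print(prio, match, action)
```

Running this reproduces the two lr_out_snat flows shown above, one per router port.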

Reported-by: Tim Rozet 
Signed-off-by: Numan Siddique 
---
 northd/ovn-northd.8.xml | 35 ++
 northd/ovn-northd.c | 66 --
 tests/ovn-northd.at | 79 +
 3 files changed, 177 insertions(+), 3 deletions(-)

diff --git a/northd/ovn-northd.8.xml b/northd/ovn-northd.8.xml
index 70065a36d9..27b28aff93 100644
--- a/northd/ovn-northd.8.xml
+++ b/northd/ovn-northd.8.xml
@@ -3653,6 +3653,32 @@ nd_ns {
   flags.force_snat_for_dnat == 1 && ip with an
   action ct_snat(B);.
 
+  
+
+  
+
+  If the Gateway router in the OVN Northbound database has been
+  configured to force SNAT a packet (that has been previously
+  load-balanced) using the router IP (i.e.
+  options:lb_force_snat_ip=router_ip), then for
+  each logical router port P attached to the Gateway
+  router, a priority-110 flow matches
+  flags.force_snat_for_lb == 1 && outport == P
+  with an action ct_snat(R);
+  where R is the configured router port IP.
+  If R is an IPv4 address then the match will also
+  include ip4 and if it is an IPv6 address, then the
+  match will also include ip6.
+
+
+
+  If the logical router port P is configured with multiple
+  IPv4 and multiple IPv6 addresses, only the first IPv4 address and
+  the first IPv6 address are considered.
+
+  
+
+  
 
   If the Gateway router in the OVN Northbound database has been
   configured to force SNAT a packet (that has been previously
@@ -3660,6 +3686,9 @@ nd_ns {
   flags.force_snat_for_lb == 1 && ip with an
   action ct_snat(B);.
 
+  
+
+  
 
   For each configuration in the OVN Northbound database, that asks
   to change the source IP address of a packet from an IP address of
@@ -3673,14 +3702,18 @@ nd_ns {
   options, then the action would be ip4/6.src=
   (B).
 
+  
 
+  
 
   If the NAT rule has allowed_ext_ips configured, then
   there is an additional match ip4.dst == allowed_ext_ips
   . Similarly, for IPV6, match would be ip6.dst ==
   allowed_ext_ips.
 
+  
 
+  
 
   If the NAT rule has exempted_ext_ips set, then
   there is an additional flow configured at the priority + 1 of
@@ -3689,7 +3722,9 @@ nd_ns {
   . This flow is used to bypass the ct_snat action for a packet
   which is destined to exempted_ext_ips.
 
+  
 
+  
 
   A priority-0 logical flow with match 1 has actions
   next;.
diff --git a/northd/ovn-northd.c b/northd/ovn-northd.c
index db6572a62b..ece158b71e 100644
--- a/northd/ovn-northd.c
+++ b/northd/ovn-northd.c
@@ -622,6 +622,7 @@ struct ovn_datapath {
 
 struct lport_addresses dnat_force_snat_addrs;
 struct lport_addresses lb_force_snat_addrs;
+bool 

Re: [ovs-dev] [PATCH 00/15] Netdev vxlan-decap offload

2021-02-09 Thread Sriharsha Basavapatna via dev
On Tue, Feb 9, 2021 at 8:41 PM Eli Britstein  wrote:
>
>
> On 2/8/2021 6:21 PM, Sriharsha Basavapatna wrote:
> > On Mon, Feb 8, 2021 at 7:33 PM Eli Britstein  wrote:
> >>
> >> On 2/8/2021 3:11 PM, Sriharsha Basavapatna wrote:
> >>> On Sun, Feb 7, 2021 at 4:58 PM Eli Britstein  wrote:
>  On 2/5/2021 8:26 PM, Sriharsha Basavapatna wrote:
> > On Fri, Feb 5, 2021 at 4:55 PM Sriharsha Basavapatna
> >  wrote:
> >> On Wed, Jan 27, 2021 at 11:40 PM Eli Britstein  
> >> wrote:
> >>> VXLAN decap in OVS-DPDK configuration consists of two flows:
> >>> F1: in_port(ens1f0),eth(),ipv4(),udp(), 
> >>> actions:tnl_pop(vxlan_sys_4789)
> >>> F2: tunnel(),in_port(vxlan_sys_4789),eth(),ipv4(), actions:ens1f0_0
> >>>
> >>> F1 is a classification flow. It has outer headers matches and it
> >>> classifies the packet as a VXLAN packet, and using tnl_pop action the
> >>> packet continues processing in F2.
> >>> F2 is a flow that has matches on tunnel metadata as well as on the 
> >>> inner
> >>> packet headers (as any other flow).
> >>>
> >>> In order to fully offload VXLAN decap path, both F1 and F2 should be
> >>> offloaded. As there are more than one flow in HW, it is possible that
> >>> F1 is done by HW but F2 is not. Packet is received by SW, and should 
> >>> be
> >>> processed starting from F2 as F1 was already done by HW.
> >>> Rte_flows are applicable only on physical port IDs. Vport flows (e.g. 
> >>> F2)
> >>> are applied on uplink ports attached to OVS.
> >>>
> >>> This patch-set makes use of [1] introduced in DPDK 20.11, that adds 
> >>> API
> >>> for tunnel offloads.
> >>>
> >>> Travis:
> >>> v1: https://travis-ci.org/github/elibritstein/OVS/builds/756418552
> >>>
> >>> GitHub Actions:
> >>> v1: https://github.com/elibritstein/OVS/actions/runs/515334647
> >>>
> >>> [1] https://mails.dpdk.org/archives/dev/2020-October/187314.html
> >>>
> >>> Eli Britstein (13):
> >>>  netdev-offload: Add HW miss packet state recover API
> >>>  netdev-dpdk: Introduce DPDK tunnel APIs
> >>>  netdev-offload-dpdk: Implement flow dump create/destroy APIs
> >>>  netdev-dpdk: Add flow_api support for netdev vxlan vports
> >>>  netdev-offload-dpdk: Implement HW miss packet recover for vport
> >>>  dpif-netdev: Add HW miss packet state recover logic
> >>>  netdev-offload-dpdk: Change log rate limits
> >>>  netdev-offload-dpdk: Support tunnel pop action
> >>>  netdev-offload-dpdk: Refactor offload rule creation
> >>>  netdev-dpdk: Introduce an API to query if a dpdk port is an 
> >>> uplink
> >>>port
> >>>  netdev-offload-dpdk: Map netdev and ufid to offload objects
> >>>  netdev-offload-dpdk: Support vports flows offload
> >>>  netdev-dpdk-offload: Add vxlan pattern matching function
> >>>
> >>> Ilya Maximets (2):
> >>>  netdev-offload: Allow offloading to netdev without ifindex.
> >>>  netdev-offload: Disallow offloading to unrelated tunneling 
> >>> vports.
> >>>
> >>> Documentation/howto/dpdk.rst  |   1 +
> >>> NEWS  |   2 +
> >>> lib/dpif-netdev.c |  49 +-
> >>> lib/netdev-dpdk.c | 135 ++
> >>> lib/netdev-dpdk.h | 104 -
> >>> lib/netdev-offload-dpdk.c | 851 
> >>> +-
> >>> lib/netdev-offload-provider.h |   5 +
> >>> lib/netdev-offload-tc.c   |   8 +
> >>> lib/netdev-offload.c  |  29 +-
> >>> lib/netdev-offload.h  |   1 +
> >>> 10 files changed, 1033 insertions(+), 152 deletions(-)
> >>>
> >>> --
> >>> 2.28.0.546.g385c171
> >>>
> >> Hi Eli,
> >>
> >> Thanks for posting this new patchset to support tunnel decap action 
> >> offload.
> >>
> >> I haven't looked at the entire patchset yet. But I focused on the
> >> patches that introduce 1-to-many mapping between an OVS flow (f2) and
> >> HW offloaded flows.
> >>
> >> Here is a representation of the design proposed in this patchset. A
> >> flow f2 (inner flow) between the VxLAN-vPort and VFRep-1, for which
> >> the underlying uplink/physical port is P0, gets offloaded to not only
> >> P0, but also to other physical ports P1, P2... and so on.
> >>
> >>P0 <> VxLAN-vPort <> VFRep-1
> >>
> >>P1
> >>P2
> >>...
> >>Pn
> >>
> >> IMO, the problems with this design are:
> >>
> >> - Offloading a flow to an unrelated physical device that has nothing
> >> to do with that flow (invalid device for the flow).
> >> - Offloading to not just one, but several such invalid physical 
> >> devices.
> >> - Consuming HW resources 

Re: [ovs-dev] [PATCH 00/15] Netdev vxlan-decap offload

2021-02-09 Thread Eli Britstein



On 2/8/2021 6:21 PM, Sriharsha Basavapatna wrote:

On Mon, Feb 8, 2021 at 7:33 PM Eli Britstein  wrote:


On 2/8/2021 3:11 PM, Sriharsha Basavapatna wrote:

On Sun, Feb 7, 2021 at 4:58 PM Eli Britstein  wrote:

On 2/5/2021 8:26 PM, Sriharsha Basavapatna wrote:

On Fri, Feb 5, 2021 at 4:55 PM Sriharsha Basavapatna
 wrote:

On Wed, Jan 27, 2021 at 11:40 PM Eli Britstein  wrote:

VXLAN decap in OVS-DPDK configuration consists of two flows:
F1: in_port(ens1f0),eth(),ipv4(),udp(), actions:tnl_pop(vxlan_sys_4789)
F2: tunnel(),in_port(vxlan_sys_4789),eth(),ipv4(), actions:ens1f0_0

F1 is a classification flow. It matches on the outer headers and
classifies the packet as a VXLAN packet; via the tnl_pop action the
packet continues processing in F2.
F2 is a flow that matches on tunnel metadata as well as on the inner
packet headers (like any other flow).

In order to fully offload the VXLAN decap path, both F1 and F2 should be
offloaded. As there is more than one flow in HW, it is possible that
F1 is done by HW but F2 is not. The packet is then received by SW and
should be processed starting from F2, as F1 was already done by HW.
Rte_flows are applicable only to physical port IDs. Vport flows (e.g. F2)
are applied on uplink ports attached to OVS.
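The partial-offload recovery just described (F1 executed in hardware, software resuming at F2) can be modeled minimally in Python. This is an illustrative sketch only; the dict keys and the vport name binding are invented for the example and do not correspond to actual OVS structure fields:

```python
def recover_and_classify(pkt):
    """Model of HW miss recovery: if the NIC already matched F1 and
    popped the VXLAN header (attaching tunnel metadata to the packet),
    software processing must resume at F2 instead of re-running F1."""
    if pkt.get("tnl_metadata") is not None:
        # F1 was executed in hardware; continue from the vport flow.
        pkt["in_port"] = "vxlan_sys_4789"
        return "F2"
    # Full software path: classify the outer headers first.
    return "F1"

assert recover_and_classify({"tnl_metadata": {"vni": 100}}) == "F2"
assert recover_and_classify({}) == "F1"
```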

This patch-set makes use of [1] introduced in DPDK 20.11, that adds API
for tunnel offloads.

Travis:
v1: https://travis-ci.org/github/elibritstein/OVS/builds/756418552

GitHub Actions:
v1: https://github.com/elibritstein/OVS/actions/runs/515334647

[1] https://mails.dpdk.org/archives/dev/2020-October/187314.html

Eli Britstein (13):
 netdev-offload: Add HW miss packet state recover API
 netdev-dpdk: Introduce DPDK tunnel APIs
 netdev-offload-dpdk: Implement flow dump create/destroy APIs
 netdev-dpdk: Add flow_api support for netdev vxlan vports
 netdev-offload-dpdk: Implement HW miss packet recover for vport
 dpif-netdev: Add HW miss packet state recover logic
 netdev-offload-dpdk: Change log rate limits
 netdev-offload-dpdk: Support tunnel pop action
 netdev-offload-dpdk: Refactor offload rule creation
 netdev-dpdk: Introduce an API to query if a dpdk port is an uplink
   port
 netdev-offload-dpdk: Map netdev and ufid to offload objects
 netdev-offload-dpdk: Support vports flows offload
 netdev-dpdk-offload: Add vxlan pattern matching function

Ilya Maximets (2):
 netdev-offload: Allow offloading to netdev without ifindex.
 netdev-offload: Disallow offloading to unrelated tunneling vports.

Documentation/howto/dpdk.rst  |   1 +
NEWS  |   2 +
lib/dpif-netdev.c |  49 +-
lib/netdev-dpdk.c | 135 ++
lib/netdev-dpdk.h | 104 -
lib/netdev-offload-dpdk.c | 851 +-
lib/netdev-offload-provider.h |   5 +
lib/netdev-offload-tc.c   |   8 +
lib/netdev-offload.c  |  29 +-
lib/netdev-offload.h  |   1 +
10 files changed, 1033 insertions(+), 152 deletions(-)

--
2.28.0.546.g385c171


Hi Eli,

Thanks for posting this new patchset to support tunnel decap action offload.

I haven't looked at the entire patchset yet. But I focused on the
patches that introduce 1-to-many mapping between an OVS flow (f2) and
HW offloaded flows.

Here is a representation of the design proposed in this patchset. A
flow f2 (inner flow) between the VxLAN-vPort and VFRep-1, for which
the underlying uplink/physical port is P0, gets offloaded to not only
P0, but also to other physical ports P1, P2... and so on.

   P0 <> VxLAN-vPort <> VFRep-1

   P1
   P2
   ...
   Pn

IMO, the problems with this design are:

- Offloading a flow to an unrelated physical device that has nothing
to do with that flow (invalid device for the flow).
- Offloading to not just one, but several such invalid physical devices.
- Consuming HW resources for a flow that is never seen or intended to
be processed by those physical devices.
- Impacts flow scale on other physical devices, since it would consume
their HW resources with a large number of such invalid flows.
- The indirect list used to track these multiple mappings complicates
the offload layer implementation.
- The addition of flow_dump_create() to offload APIs, just to parse
and get a list of user datapath netdevs is confusing and not needed.

I have been exploring an alternate design to address this problem of
figuring out the right physical device for a given tunnel inner-flow.
I will send a patch, please take a look so we can continue the discussion.

I just posted this patch, please see the link below; this is currently
based on the decap offload patchset (but it can be rebased if needed,
without the decap patchset too). The patch provides changes to pass
physical port (orig_in_port) information to the offload layer in the
context of flow F2. Note that additional changes would be needed in
the decap patchset to utilize this, if we 

Re: [ovs-dev] [PATCH ovn] expr: Combine multiple ipv4 with wildcard mask.

2021-02-09 Thread 陈供明
Hi Dumitru,
Thanks for review.

On 2021/2/8, 6:19 PM, "Dumitru Ceara"  wrote:

On 2/7/21 1:25 PM, gmingchen(陈供明) wrote:
> From: Gongming Chen 

Hi Gongming,

First of all, thanks for the contribution!

This is not a full review, just some comments for now.

It seems that there's a memory leak with your patch applied. 
AddressSanitizer reports:

Direct leak of 12 byte(s) in 1 object(s) allocated from:
 #0 0x49644d in malloc 
(/home/runner/work/ovn/ovn/ovn-20.12.90/_build/sub/tests/ovstest+0x49644d)
 #1 0x538604 in xmalloc 
/home/runner/work/ovn/ovn/ovs_src/lib/util.c:138:15
 #2 0x62636f in expr_const_sets_add 
/home/runner/work/ovn/ovn/ovn-20.12.90/_build/sub/../../lib/expr.c:1237:18
 #3 0x4cf8f6 in create_addr_sets 

/home/runner/work/ovn/ovn/ovn-20.12.90/_build/sub/../../tests/test-ovn.c:230:5
 #4 0x4cf8f6 in test_parse_expr__ 

/home/runner/work/ovn/ovn/ovn-20.12.90/_build/sub/../../tests/test-ovn.c:296:5
 #5 0x4dfd04 in ovs_cmdl_run_command__ 
/home/runner/work/ovn/ovn/ovs_src/lib/command-line.c:247:17
 #6 0x4c6810 in test_ovn_main 

/home/runner/work/ovn/ovn/ovn-20.12.90/_build/sub/../../tests/test-ovn.c:1635:5
 #7 0x4c6810 in ovstest_wrapper_test_ovn_main__ 

/home/runner/work/ovn/ovn/ovn-20.12.90/_build/sub/../../tests/test-ovn.c:1638:1
 #8 0x4dfd04 in ovs_cmdl_run_command__ 
/home/runner/work/ovn/ovn/ovs_src/lib/command-line.c:247:17
 #9 0x4c5fe3 in main 

/home/runner/work/ovn/ovn/ovn-20.12.90/_build/sub/../../tests/ovstest.c:150:9
 #10 0x7f71aa6ddbf6 in __libc_start_main 
(/lib/x86_64-linux-gnu/libc.so.6+0x21bf6)

Full reports are available in the ovsrobot OVN github CI run artifacts:
https://github.com/ovsrobot/ovn/actions/runs/545416492

Just a note, if you push the branch to your own fork it will trigger the 
github action to run CI and the "linux clang test asan" job will also 
enable AddressSanitizer.

Thanks, the free of ip_data.ip is indeed missing, resulting in a memory
leak.

There are also a few style related issues (e.g., sizeof args), please 
see the guidelines here:


https://github.com/ovn-org/ovn/blob/master/Documentation/internals/contributing/coding-style.rst

Sorry, does this mean that
ip_r_data->ip = xmalloc(4 * sizeof(uint32_t))
should be replaced with
ip_r_data->ip = xmalloc(4 * sizeof ip_r_data->ip)?

> 
> In the ovn security group, each host ip corresponds to at least 4 flow
> entries (different connection states). As the number of hosts using the
> security group increases, the ovs security group flow table will
> grow sharply, especially when the remote group
> feature in OpenStack is applied.
> 
> This patch merges ipv4 addresses using wildcard masks, and replaces these
> ipv4 addresses with the merged ip/mask entries. This greatly reduces the
> entries in the ovs security group flow table, especially when the number
> of hosts is large. As observed in a production environment, the number
> of ovs flow entries is reduced by at least 50% in most scenarios
> when the remote group in OpenStack is applied.

I think it would be great to describe the algorithm here, in the commit 
log, but also in the comments in the code.

You are right, I will resubmit and add the description of the algorithm to the
commit log and the code comments.

> 
> Analysis in the simplest scenario, a network 1.1.1.0/24 network, enable
> the OpenStack security group remote group feature, create 253 virtual
> machine ports(1.1.1.2-1.1.1.254).
> 
> Focusing only on the number of ip addresses, in table=44:
> ./configure --disable-combine-ipv4:
> 1.1.1.2-1.1.1.254 (253 flows) * 4 (connection states) *
> 1 (local net of localport) = 1012
> 
> ./configure --enable-combine-ipv4(default):
> 1.1.1.2/31
> 1.1.1.4/30
> 1.1.1.8/29
> 1.1.1.16/28
> 1.1.1.32/27
> 1.1.1.64/26
> 1.1.1.128/26
> 1.1.1.192/27
> 1.1.1.224/28
> 1.1.1.240/29
> 1.1.1.248/30
> 1.1.1.252/31
> 1.1.1.254
> 13 flows * 4 (connection states) * 1 (local net of localport) = 52
> 
> Reduced from 1012 flows to 52, a 19.4x reduction.
> 
> Some cases are similar to the following:
> 1.1.1.2, 1.1.1.6
> After the combine:
> 1.1.1.2/255.255.255.251
> This will slightly increase the difficulty of finding the flow entry
> corresponding to a single address.
> such as:
> ovs-ofctl dump-flows br-int | grep 1.1.1.6
> The result is empty.
> 1.1.1.6 will match 1.1.1.2/255.255.255.251
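The contiguous-prefix part of the merge quoted above can be reproduced with Python's standard ipaddress module. This sketch covers only the CIDR summarization; it does not implement the wildcard (non-contiguous) masks the patch also supports:

```python
import ipaddress

# Summarize the host range 1.1.1.2 - 1.1.1.254 into minimal CIDR blocks.
nets = list(ipaddress.summarize_address_range(
    ipaddress.IPv4Address("1.1.1.2"),
    ipaddress.IPv4Address("1.1.1.254")))
for net in nets:
    print(net)   # 13 networks, 1.1.1.2/31 through 1.1.1.254/32
```

The output is exactly the 13-entry list in the commit message, confirming the 1012-to-52 flow reduction arithmetic.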

Would it make sense and potentially make the life of the users easier by 
also adding a utility to automatically do the IP/MASK match?

I'm thinking of something like:
$ ovs-ofctl dump-flows br-int | ovn-match-ip 1.1.1.6
[..]
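A helper like the hypothetical ovn-match-ip above essentially needs a masked comparison: with a non-contiguous (wildcard) mask, an address matches when (addr & mask) == (value & mask). A minimal Python sketch of that check:

```python
import ipaddress

def ip_matches(addr, value, mask):
    """True if addr matches value/mask, where mask may be
    non-contiguous (a wildcard mask such as 255.255.255.251)."""
    a = int(ipaddress.IPv4Address(addr))
    v = int(ipaddress.IPv4Address(value))
    m = int(ipaddress.IPv4Address(mask))
    return (a & m) == (v & m)

# 1.1.1.6 is covered by the merged entry 1.1.1.2/255.255.255.251:
print(ip_matches("1.1.1.6", "1.1.1.2", "255.255.255.251"))  # True
```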

I 

[ovs-dev] [PATCH ovn v3 09/10] lflow-cache: Make maximum number of cache entries configurable.

2021-02-09 Thread Dumitru Ceara
Add a new OVS external-id, "ovn-limit-lflow-cache", through which users
can specify the maximum size of the ovn-controller logical flow cache.

To maintain backwards compatibility the default behavior is to not
enforce any limit on the size of the cache.

When the cache becomes full, the rule is to prefer more "important"
cache entries over less "important" ones.  That is, evict entries of
type LCACHE_T_CONJ_ID if there's no room to add an entry of type
LCACHE_T_EXPR.  Similarly, evict entries of type LCACHE_T_CONJ_ID or
LCACHE_T_EXPR if there's no room to add an entry of type
LCACHE_T_MATCHES.
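The eviction preference described above can be sketched as a small Python model. This is illustrative only, not the actual C implementation in controller/lflow-cache.c; type names follow the cache-type names used elsewhere in the series:

```python
# Cache entry types, in increasing order of importance.
PRIORITY = ["conj-id", "expr", "matches"]

def try_add(cache, capacity, new_type, entry):
    """Insert an entry, evicting one entry of a strictly less important
    type when the cache is full; fail if nothing can be evicted."""
    total = sum(len(entries) for entries in cache.values())
    if total >= capacity:
        for t in PRIORITY:
            if PRIORITY.index(t) < PRIORITY.index(new_type) and cache[t]:
                cache[t].pop()  # make room by dropping a cheaper entry
                break
        else:
            return False  # only equally or more important entries remain
    cache[new_type].append(entry)
    return True

cache = {"conj-id": ["c1"], "expr": [], "matches": []}
assert try_add(cache, 1, "matches", "m1")      # evicts the conj-id entry
assert not try_add(cache, 1, "conj-id", "c2")  # a matches entry outranks it
```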

Acked-by: Mark Michelson 
Acked-by: Numan Siddique 
Signed-off-by: Dumitru Ceara 
---
v3:
- add comment to lflow_cache_make_room__().
- update UT outputs.
---
 NEWS|3 +
 controller/chassis.c|   23 -
 controller/lflow-cache.c|   55 -
 controller/lflow-cache.h|2 -
 controller/ovn-controller.8.xml |   16 ++
 controller/ovn-controller.c |8 +++
 controller/test-lflow-cache.c   |   12 +++--
 tests/ovn-lflow-cache.at|  104 +++
 8 files changed, 213 insertions(+), 10 deletions(-)

diff --git a/NEWS b/NEWS
index 2044671..6e10557 100644
--- a/NEWS
+++ b/NEWS
@@ -16,6 +16,9 @@ Post-v20.12.0
   - Add a new option to Load_Balancer.options, "hairpin_snat_ip", to allow
 users to explicitly select which source IP should be used for load
 balancer hairpin traffic.
+  - ovn-controller: Add a configuration knob, through OVS external-id
+"ovn-limit-lflow-cache", to allow enforcing a limit for the size of the
+logical flow cache.
 
 OVN v20.12.0 - 18 Dec 2020
 --
diff --git a/controller/chassis.c b/controller/chassis.c
index 0937e33..1b63d8e 100644
--- a/controller/chassis.c
+++ b/controller/chassis.c
@@ -50,6 +50,7 @@ struct ovs_chassis_cfg {
 const char *monitor_all;
 const char *chassis_macs;
 const char *enable_lflow_cache;
+const char *limit_lflow_cache;
 
 /* Set of encap types parsed from the 'ovn-encap-type' external-id. */
 struct sset encap_type_set;
@@ -136,6 +137,12 @@ get_enable_lflow_cache(const struct smap *ext_ids)
 }
 
 static const char *
+get_limit_lflow_cache(const struct smap *ext_ids)
+{
+return smap_get_def(ext_ids, "ovn-limit-lflow-cache", "");
+}
+
+static const char *
 get_encap_csum(const struct smap *ext_ids)
 {
 return smap_get_def(ext_ids, "ovn-encap-csum", "true");
@@ -257,6 +264,7 @@ chassis_parse_ovs_config(const struct 
ovsrec_open_vswitch_table *ovs_table,
 ovs_cfg->monitor_all = get_monitor_all(>external_ids);
 ovs_cfg->chassis_macs = get_chassis_mac_mappings(>external_ids);
 ovs_cfg->enable_lflow_cache = get_enable_lflow_cache(>external_ids);
+ovs_cfg->limit_lflow_cache = get_limit_lflow_cache(>external_ids);
 
 if (!chassis_parse_ovs_encap_type(encap_type, _cfg->encap_type_set)) {
 return false;
@@ -284,13 +292,16 @@ chassis_build_other_config(struct smap *config, const 
char *bridge_mappings,
const char *datapath_type, const char *cms_options,
const char *monitor_all, const char *chassis_macs,
const char *iface_types,
-   const char *enable_lflow_cache, bool is_interconn)
+   const char *enable_lflow_cache,
+   const char *limit_lflow_cache,
+   bool is_interconn)
 {
 smap_replace(config, "ovn-bridge-mappings", bridge_mappings);
 smap_replace(config, "datapath-type", datapath_type);
 smap_replace(config, "ovn-cms-options", cms_options);
 smap_replace(config, "ovn-monitor-all", monitor_all);
 smap_replace(config, "ovn-enable-lflow-cache", enable_lflow_cache);
+smap_replace(config, "ovn-limit-lflow-cache", limit_lflow_cache);
 smap_replace(config, "iface-types", iface_types);
 smap_replace(config, "ovn-chassis-mac-mappings", chassis_macs);
 smap_replace(config, "is-interconn", is_interconn ? "true" : "false");
@@ -307,6 +318,7 @@ chassis_other_config_changed(const char *bridge_mappings,
  const char *monitor_all,
  const char *chassis_macs,
  const char *enable_lflow_cache,
+ const char *limit_lflow_cache,
  const struct ds *iface_types,
  bool is_interconn,
  const struct sbrec_chassis *chassis_rec)
@@ -346,6 +358,13 @@ chassis_other_config_changed(const char *bridge_mappings,
 return true;
 }
 
+const char *chassis_limit_lflow_cache =
+get_limit_lflow_cache(_rec->other_config);
+
+if (strcmp(limit_lflow_cache, chassis_limit_lflow_cache)) {
+return true;
+}
+
 const char *chassis_mac_mappings =
 

[ovs-dev] [PATCH ovn v3 07/10] lflow-cache: Add coverage counters.

2021-02-09 Thread Dumitru Ceara
These are available in ovn-controller logs or when explicitly requested
by users with "ovn-appctl -t ovn-controller coverage/show".

Acked-by: Mark Michelson 
Acked-by: Numan Siddique 
Signed-off-by: Dumitru Ceara 
---
 controller/lflow-cache.c |   24 
 1 file changed, 24 insertions(+)

diff --git a/controller/lflow-cache.c b/controller/lflow-cache.c
index 185f540..be764d5 100644
--- a/controller/lflow-cache.c
+++ b/controller/lflow-cache.c
@@ -17,10 +17,23 @@
 
 #include 
 
+#include "coverage.h"
 #include "lflow-cache.h"
 #include "lib/uuid.h"
 #include "ovn/expr.h"
 
+COVERAGE_DEFINE(lflow_cache_flush);
+COVERAGE_DEFINE(lflow_cache_add_conj_id);
+COVERAGE_DEFINE(lflow_cache_add_expr);
+COVERAGE_DEFINE(lflow_cache_add_matches);
+COVERAGE_DEFINE(lflow_cache_free_conj_id);
+COVERAGE_DEFINE(lflow_cache_free_expr);
+COVERAGE_DEFINE(lflow_cache_free_matches);
+COVERAGE_DEFINE(lflow_cache_add);
+COVERAGE_DEFINE(lflow_cache_hit);
+COVERAGE_DEFINE(lflow_cache_miss);
+COVERAGE_DEFINE(lflow_cache_delete);
+
 static const char *lflow_cache_type_names[LCACHE_T_MAX] = {
 [LCACHE_T_CONJ_ID] = "cache-conj-id",
 [LCACHE_T_EXPR]= "cache-expr",
@@ -65,6 +78,7 @@ lflow_cache_flush(struct lflow_cache *lc)
 return;
 }
 
+COVERAGE_INC(lflow_cache_flush);
 for (size_t i = 0; i < LCACHE_T_MAX; i++) {
 struct lflow_cache_entry *lce;
 struct lflow_cache_entry *lce_next;
@@ -139,6 +153,7 @@ lflow_cache_add_conj_id(struct lflow_cache *lc, const 
struct uuid *lflow_uuid,
 if (!lcv) {
 return;
 }
+COVERAGE_INC(lflow_cache_add_conj_id);
 lcv->conj_id_ofs = conj_id_ofs;
 }
 
@@ -153,6 +168,7 @@ lflow_cache_add_expr(struct lflow_cache *lc, const struct 
uuid *lflow_uuid,
 expr_destroy(expr);
 return;
 }
+COVERAGE_INC(lflow_cache_add_expr);
 lcv->conj_id_ofs = conj_id_ofs;
 lcv->expr = expr;
 }
@@ -169,6 +185,7 @@ lflow_cache_add_matches(struct lflow_cache *lc, const 
struct uuid *lflow_uuid,
 free(matches);
 return;
 }
+COVERAGE_INC(lflow_cache_add_matches);
 lcv->expr_matches = matches;
 }
 
@@ -186,10 +203,12 @@ lflow_cache_get(struct lflow_cache *lc, const struct uuid 
*lflow_uuid)
 
 HMAP_FOR_EACH_WITH_HASH (lce, node, hash, >entries[i]) {
 if (uuid_equals(>lflow_uuid, lflow_uuid)) {
+COVERAGE_INC(lflow_cache_hit);
 return >value;
 }
 }
 }
+COVERAGE_INC(lflow_cache_miss);
 return NULL;
 }
 
@@ -202,6 +221,7 @@ lflow_cache_delete(struct lflow_cache *lc, const struct 
uuid *lflow_uuid)
 
 struct lflow_cache_value *lcv = lflow_cache_get(lc, lflow_uuid);
 if (lcv) {
+COVERAGE_INC(lflow_cache_delete);
 lflow_cache_delete__(lc, CONTAINER_OF(lcv, struct lflow_cache_entry,
   value));
 }
@@ -217,6 +237,7 @@ lflow_cache_add__(struct lflow_cache *lc, const struct uuid 
*lflow_uuid,
 
 struct lflow_cache_entry *lce = xzalloc(sizeof *lce);
 
+COVERAGE_INC(lflow_cache_add);
 lce->lflow_uuid = *lflow_uuid;
 lce->value.type = type;
 hmap_insert(>entries[type], >node, uuid_hash(lflow_uuid));
@@ -236,11 +257,14 @@ lflow_cache_delete__(struct lflow_cache *lc, struct 
lflow_cache_entry *lce)
 OVS_NOT_REACHED();
 break;
 case LCACHE_T_CONJ_ID:
+COVERAGE_INC(lflow_cache_free_conj_id);
 break;
 case LCACHE_T_EXPR:
+COVERAGE_INC(lflow_cache_free_expr);
 expr_destroy(lce->value.expr);
 break;
 case LCACHE_T_MATCHES:
+COVERAGE_INC(lflow_cache_free_matches);
 expr_matches_destroy(lce->value.expr_matches);
 free(lce->value.expr_matches);
 break;



[ovs-dev] [PATCH ovn v3 05/10] lflow-cache: Add unit tests.

2021-02-09 Thread Dumitru Ceara
Acked-by: Mark Michelson 
Acked-by: Numan Siddique 
Signed-off-by: Dumitru Ceara 
---
v3:
- change calls to lflow-cache API to use new conventions.
- update UT outputs to match new lflow-cache stats outputs.
---
 controller/test-lflow-cache.c  |  223 +++
 controller/test-ofctrl-seqno.c |   18 ---
 tests/automake.mk  |8 +
 tests/ovn-lflow-cache.at   |  257 
 tests/ovn.at   |   68 +++
 tests/test-utils.c |   49 
 tests/test-utils.h |   26 
 tests/testsuite.at |1 
 8 files changed, 632 insertions(+), 18 deletions(-)
 create mode 100644 controller/test-lflow-cache.c
 create mode 100644 tests/ovn-lflow-cache.at
 create mode 100644 tests/test-utils.c
 create mode 100644 tests/test-utils.h

diff --git a/controller/test-lflow-cache.c b/controller/test-lflow-cache.c
new file mode 100644
index 000..79fe8c6
--- /dev/null
+++ b/controller/test-lflow-cache.c
@@ -0,0 +1,223 @@
+/* Copyright (c) 2021, Red Hat, Inc.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at:
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+#include 
+
+#include "lib/uuid.h"
+#include "ovn/expr.h"
+#include "tests/ovstest.h"
+#include "tests/test-utils.h"
+#include "util.h"
+
+#include "lflow-cache.h"
+
+static void
+test_lflow_cache_add__(struct lflow_cache *lc, const char *op_type,
+   const struct uuid *lflow_uuid,
+   unsigned int conj_id_ofs,
+   struct expr *e)
+{
+printf("ADD %s:\n", op_type);
+printf("  conj-id-ofs: %u\n", conj_id_ofs);
+
+if (!strcmp(op_type, "conj-id")) {
+lflow_cache_add_conj_id(lc, lflow_uuid, conj_id_ofs);
+} else if (!strcmp(op_type, "expr")) {
+lflow_cache_add_expr(lc, lflow_uuid, conj_id_ofs, expr_clone(e));
+} else if (!strcmp(op_type, "matches")) {
+struct hmap *matches = xmalloc(sizeof *matches);
+ovs_assert(expr_to_matches(e, NULL, NULL, matches) == 0);
+ovs_assert(hmap_count(matches) == 1);
+lflow_cache_add_matches(lc, lflow_uuid, matches);
+} else {
+OVS_NOT_REACHED();
+}
+}
+
+static void
+test_lflow_cache_lookup__(struct lflow_cache *lc,
+  const struct uuid *lflow_uuid)
+{
+struct lflow_cache_value *lcv = lflow_cache_get(lc, lflow_uuid);
+
+printf("LOOKUP:\n");
+if (!lcv) {
+printf("  not found\n");
+return;
+}
+
+printf("  conj_id_ofs: %"PRIu32"\n", lcv->conj_id_ofs);
+switch (lcv->type) {
+case LCACHE_T_CONJ_ID:
+printf("  type: conj-id\n");
+break;
+case LCACHE_T_EXPR:
+printf("  type: expr\n");
+break;
+case LCACHE_T_MATCHES:
+printf("  type: matches\n");
+break;
+case LCACHE_T_NONE:
+OVS_NOT_REACHED();
+break;
+}
+}
+
+static void
+test_lflow_cache_delete__(struct lflow_cache *lc,
+  const struct uuid *lflow_uuid)
+{
+printf("DELETE\n");
+lflow_cache_delete(lc, lflow_uuid);
+}
+
+static void
+test_lflow_cache_stats__(struct lflow_cache *lc)
+{
+struct ds ds = DS_EMPTY_INITIALIZER;
+
+lflow_cache_get_stats(lc, );
+printf("%s", ds_cstr());
+ds_destroy();
+}
+
+static void
+test_lflow_cache_operations(struct ovs_cmdl_context *ctx)
+{
+struct lflow_cache *lc = lflow_cache_create();
+struct expr *e = expr_create_boolean(true);
+bool enabled = !strcmp(ctx->argv[1], "true");
+unsigned int shift = 2;
+unsigned int n_ops;
+
+lflow_cache_enable(lc, enabled);
+test_lflow_cache_stats__(lc);
+
+if (!test_read_uint_value(ctx, shift++, "n_ops", _ops)) {
+goto done;
+}
+
+for (unsigned int i = 0; i < n_ops; i++) {
+const char *op = test_read_value(ctx, shift++, "op");
+
+if (!op) {
+goto done;
+}
+
+struct uuid lflow_uuid;
+uuid_generate(_uuid);
+
+if (!strcmp(op, "add")) {
+const char *op_type = test_read_value(ctx, shift++, "op_type");
+if (!op_type) {
+goto done;
+}
+
+unsigned int conj_id_ofs;
+if (!test_read_uint_value(ctx, shift++, "conj-id-ofs",
+  _id_ofs)) {
+goto done;
+}
+
+test_lflow_cache_add__(lc, op_type, _uuid, conj_id_ofs, e);
+test_lflow_cache_lookup__(lc, 

[ovs-dev] [PATCH ovn v3 04/10] lflow-cache: Add lflow-cache/show-stats command.

2021-02-09 Thread Dumitru Ceara
Acked-by: Mark Michelson 
Signed-off-by: Dumitru Ceara 
---
v3:
- change lflow_cache_get_stats() to populate an output string.
- document lflow-cache/show-stats command.
---
 controller/lflow-cache.c|   27 +++
 controller/lflow-cache.h|2 ++
 controller/ovn-controller.8.xml |6 ++
 controller/ovn-controller.c |   17 +
 4 files changed, 52 insertions(+)

diff --git a/controller/lflow-cache.c b/controller/lflow-cache.c
index e12c69d..185f540 100644
--- a/controller/lflow-cache.c
+++ b/controller/lflow-cache.c
@@ -21,6 +21,12 @@
 #include "lib/uuid.h"
 #include "ovn/expr.h"
 
+static const char *lflow_cache_type_names[LCACHE_T_MAX] = {
+[LCACHE_T_CONJ_ID] = "cache-conj-id",
+[LCACHE_T_EXPR]= "cache-expr",
+[LCACHE_T_MATCHES] = "cache-matches",
+};
+
 struct lflow_cache {
 struct hmap entries[LCACHE_T_MAX];
 bool enabled;
@@ -103,6 +109,27 @@ lflow_cache_is_enabled(const struct lflow_cache *lc)
 }
 
 void
+lflow_cache_get_stats(const struct lflow_cache *lc, struct ds *output)
+{
+if (!output) {
+return;
+}
+
+if (!lc) {
+ds_put_cstr(output, "Invalid arguments.");
+return;
+}
+
+ds_put_format(output, "Enabled: %s\n",
+  lflow_cache_is_enabled(lc) ? "true" : "false");
+for (size_t i = 0; i < LCACHE_T_MAX; i++) {
+ds_put_format(output, "%-16s: %"PRIuSIZE"\n",
+  lflow_cache_type_names[i],
+  hmap_count(&lc->entries[i]));
+}
+}
+
+void
 lflow_cache_add_conj_id(struct lflow_cache *lc, const struct uuid *lflow_uuid,
 uint32_t conj_id_ofs)
 {
diff --git a/controller/lflow-cache.h b/controller/lflow-cache.h
index dce8341..03a64f6 100644
--- a/controller/lflow-cache.h
+++ b/controller/lflow-cache.h
@@ -18,6 +18,7 @@
 #ifndef LFLOW_CACHE_H
 #define LFLOW_CACHE_H 1
 
+#include "openvswitch/dynamic-string.h"
 #include "openvswitch/hmap.h"
 #include "openvswitch/uuid.h"
 
@@ -56,6 +57,7 @@ void lflow_cache_flush(struct lflow_cache *);
 void lflow_cache_destroy(struct lflow_cache *);
 void lflow_cache_enable(struct lflow_cache *, bool enabled);
 bool lflow_cache_is_enabled(const struct lflow_cache *);
+void lflow_cache_get_stats(const struct lflow_cache *, struct ds *output);
 
 void lflow_cache_add_conj_id(struct lflow_cache *,
  const struct uuid *lflow_uuid,
diff --git a/controller/ovn-controller.8.xml b/controller/ovn-controller.8.xml
index 23ceb86..1fa7af4 100644
--- a/controller/ovn-controller.8.xml
+++ b/controller/ovn-controller.8.xml
@@ -549,6 +549,12 @@
   
 Flushes the ovn-controller logical flow cache.
   
+
+  lflow-cache/show-stats
+  
+Displays logical flow cache statistics: enabled/disabled, per cache
+type entry counts.
+  
   
 
 
diff --git a/controller/ovn-controller.c b/controller/ovn-controller.c
index ae458f0..c77cfcd 100644
--- a/controller/ovn-controller.c
+++ b/controller/ovn-controller.c
@@ -82,6 +82,7 @@ static unixctl_cb_func debug_pause_execution;
 static unixctl_cb_func debug_resume_execution;
 static unixctl_cb_func debug_status_execution;
 static unixctl_cb_func lflow_cache_flush_cmd;
+static unixctl_cb_func lflow_cache_show_stats_cmd;
 static unixctl_cb_func debug_delay_nb_cfg_report;
 
 #define DEFAULT_BRIDGE_NAME "br-int"
@@ -2666,6 +2667,9 @@ main(int argc, char *argv[])
 unixctl_command_register("flush-lflow-cache", "[deprecated]", 0, 0,
  lflow_cache_flush_cmd,
 &flow_output_data->pd);
+unixctl_command_register("lflow-cache/show-stats", "", 0, 0,
+ lflow_cache_show_stats_cmd,
+ &flow_output_data->pd);
 
 bool reset_ovnsb_idl_min_index = false;
 unixctl_command_register("sb-cluster-state-reset", "", 0, 0,
@@ -3274,6 +3278,19 @@ lflow_cache_flush_cmd(struct unixctl_conn *conn OVS_UNUSED,
 }
 
 static void
+lflow_cache_show_stats_cmd(struct unixctl_conn *conn, int argc OVS_UNUSED,
+   const char *argv[] OVS_UNUSED, void *arg_)
+{
+struct flow_output_persistent_data *fo_pd = arg_;
+struct lflow_cache *lc = fo_pd->lflow_cache;
+struct ds ds = DS_EMPTY_INITIALIZER;
+
+lflow_cache_get_stats(lc, &ds);
+unixctl_command_reply(conn, ds_cstr(&ds));
+ds_destroy(&ds);
+}
+
+static void
 cluster_state_reset_cmd(struct unixctl_conn *conn, int argc OVS_UNUSED,
const char *argv[] OVS_UNUSED, void *idl_reset_)
 {

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH ovn v3 03/10] lflow-cache: Move the lflow cache to its own module.

2021-02-09 Thread Dumitru Ceara
This abstracts the implementation details of the ovn-controller logical
flow cache and also refactors how the cache is used, allowing
further commits to expand its functionality.

Acked-by: Mark Michelson 
Acked-by: Numan Siddique 
Signed-off-by: Dumitru Ceara 
---
v3:
- pass 'struct uuid lflow_uuid *' instead of 'struct sbrec_logical_flow *'
  to all lflow-cache APIs.
---
 controller/automake.mk  |2 
 controller/lflow-cache.c|  222 
 controller/lflow-cache.h|   73 +
 controller/lflow.c  |  338 ++-
 controller/lflow.h  |8 -
 controller/ovn-controller.c |   57 +++
 lib/expr.c  |4 +
 7 files changed, 437 insertions(+), 267 deletions(-)
 create mode 100644 controller/lflow-cache.c
 create mode 100644 controller/lflow-cache.h

diff --git a/controller/automake.mk b/controller/automake.mk
index 480578e..75119a6 100644
--- a/controller/automake.mk
+++ b/controller/automake.mk
@@ -14,6 +14,8 @@ controller_ovn_controller_SOURCES = \
controller/ip-mcast.h \
controller/lflow.c \
controller/lflow.h \
+   controller/lflow-cache.c \
+   controller/lflow-cache.h \
controller/lport.c \
controller/lport.h \
controller/ofctrl.c \
diff --git a/controller/lflow-cache.c b/controller/lflow-cache.c
new file mode 100644
index 000..e12c69d
--- /dev/null
+++ b/controller/lflow-cache.c
@@ -0,0 +1,222 @@
+/*
+ * Copyright (c) 2015, 2016 Nicira, Inc.
+ * Copyright (c) 2021, Red Hat, Inc.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at:
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+#include 
+
+#include "lflow-cache.h"
+#include "lib/uuid.h"
+#include "ovn/expr.h"
+
+struct lflow_cache {
+struct hmap entries[LCACHE_T_MAX];
+bool enabled;
+};
+
+struct lflow_cache_entry {
+struct hmap_node node;
+struct uuid lflow_uuid; /* key */
+
+struct lflow_cache_value value;
+};
+
+static struct lflow_cache_value *lflow_cache_add__(
+struct lflow_cache *lc, const struct uuid *lflow_uuid,
+enum lflow_cache_type type);
+static void lflow_cache_delete__(struct lflow_cache *lc,
+ struct lflow_cache_entry *lce);
+
+struct lflow_cache *
+lflow_cache_create(void)
+{
+struct lflow_cache *lc = xmalloc(sizeof *lc);
+
+for (size_t i = 0; i < LCACHE_T_MAX; i++) {
+hmap_init(&lc->entries[i]);
+}
+
+lc->enabled = true;
+return lc;
+}
+
+void
+lflow_cache_flush(struct lflow_cache *lc)
+{
+if (!lc) {
+return;
+}
+
+for (size_t i = 0; i < LCACHE_T_MAX; i++) {
+struct lflow_cache_entry *lce;
+struct lflow_cache_entry *lce_next;
+
+HMAP_FOR_EACH_SAFE (lce, lce_next, node, &lc->entries[i]) {
+lflow_cache_delete__(lc, lce);
+}
+}
+}
+
+void
+lflow_cache_destroy(struct lflow_cache *lc)
+{
+if (!lc) {
+return;
+}
+
+lflow_cache_flush(lc);
+for (size_t i = 0; i < LCACHE_T_MAX; i++) {
+hmap_destroy(&lc->entries[i]);
+}
+free(lc);
+}
+
+void
+lflow_cache_enable(struct lflow_cache *lc, bool enabled)
+{
+if (!lc) {
+return;
+}
+
+if (lc->enabled && !enabled) {
+lflow_cache_flush(lc);
+}
+lc->enabled = enabled;
+}
+
+bool
+lflow_cache_is_enabled(const struct lflow_cache *lc)
+{
+return lc && lc->enabled;
+}
+
+void
+lflow_cache_add_conj_id(struct lflow_cache *lc, const struct uuid *lflow_uuid,
+uint32_t conj_id_ofs)
+{
+struct lflow_cache_value *lcv =
+lflow_cache_add__(lc, lflow_uuid, LCACHE_T_CONJ_ID);
+
+if (!lcv) {
+return;
+}
+lcv->conj_id_ofs = conj_id_ofs;
+}
+
+void
+lflow_cache_add_expr(struct lflow_cache *lc, const struct uuid *lflow_uuid,
+ uint32_t conj_id_ofs, struct expr *expr)
+{
+struct lflow_cache_value *lcv =
+lflow_cache_add__(lc, lflow_uuid, LCACHE_T_EXPR);
+
+if (!lcv) {
+expr_destroy(expr);
+return;
+}
+lcv->conj_id_ofs = conj_id_ofs;
+lcv->expr = expr;
+}
+
+void
+lflow_cache_add_matches(struct lflow_cache *lc, const struct uuid *lflow_uuid,
+struct hmap *matches)
+{
+struct lflow_cache_value *lcv =
+lflow_cache_add__(lc, lflow_uuid, LCACHE_T_MATCHES);
+
+if (!lcv) {
+expr_matches_destroy(matches);
+free(matches);
+return;
+}
+lcv->expr_matches = matches;
+}
+
+struct 

[ovs-dev] [PATCH ovn v3 02/10] lflow: Refactor convert_match_to_expr() to explicitly consume prereqs.

2021-02-09 Thread Dumitru Ceara
It was (and still is) the responsibility of the caller of
convert_match_to_expr() to explicitly free any 'prereqs' expression that
was passed as argument if the expression parsing of the 'lflow' match failed.

However, convert_match_to_expr() now updates the value of '*prereqs' setting
it to NULL in the successful case.  This makes it easier for callers of the
function because 'expr_destroy(prereqs)' can now be called unconditionally.

Acked-by: Mark Michelson 
Acked-by: Numan Siddique 
Signed-off-by: Dumitru Ceara 
---
Note: This patch doesn't change the callers of convert_match_to_expr() to
take advantage of the new semantic but following patches do.
---
 controller/lflow.c |   14 --
 1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/controller/lflow.c b/controller/lflow.c
index 02a4480..340f1f0 100644
--- a/controller/lflow.c
+++ b/controller/lflow.c
@@ -758,11 +758,12 @@ add_matches_to_flow_table(const struct sbrec_logical_flow *lflow,
 /* Converts the match and returns the simplified expr tree.
  *
  * The caller should evaluate the conditions and normalize the expr tree.
+ * If parsing is successful, '*prereqs' is also consumed.
  */
 static struct expr *
 convert_match_to_expr(const struct sbrec_logical_flow *lflow,
   const struct sbrec_datapath_binding *dp,
-  struct expr *prereqs,
+  struct expr **prereqs,
   const struct shash *addr_sets,
   const struct shash *port_groups,
   struct lflow_resource_ref *lfrr,
@@ -795,8 +796,9 @@ convert_match_to_expr(const struct sbrec_logical_flow *lflow,
  sset_destroy(&port_groups_ref);
 
 if (!error) {
-if (prereqs) {
-e = expr_combine(EXPR_T_AND, e, prereqs);
+if (*prereqs) {
+e = expr_combine(EXPR_T_AND, e, *prereqs);
+*prereqs = NULL;
 }
  e = expr_annotate(e, &symtab, &error);
 }
@@ -854,7 +856,7 @@ consider_logical_flow__(const struct sbrec_logical_flow *lflow,
 .n_tables = LOG_PIPELINE_LEN,
 .cur_ltable = lflow->table_id,
 };
-struct expr *prereqs;
+struct expr *prereqs = NULL;
 char *error;
 
  error = ovnacts_parse_string(lflow->actions, &pp, &ovnacts, &prereqs);
@@ -885,7 +887,7 @@ consider_logical_flow__(const struct sbrec_logical_flow *lflow,
 struct expr *expr = NULL;
 if (!l_ctx_out->lflow_cache_map) {
 /* Caching is disabled. */
-expr = convert_match_to_expr(lflow, dp, prereqs, l_ctx_in->addr_sets,
+expr = convert_match_to_expr(lflow, dp, &prereqs, l_ctx_in->addr_sets,
  l_ctx_in->port_groups, l_ctx_out->lfrr,
  NULL);
 if (!expr) {
@@ -949,7 +951,7 @@ consider_logical_flow__(const struct sbrec_logical_flow *lflow,
 
 bool pg_addr_set_ref = false;
 if (!expr) {
-expr = convert_match_to_expr(lflow, dp, prereqs, l_ctx_in->addr_sets,
+expr = convert_match_to_expr(lflow, dp, &prereqs, l_ctx_in->addr_sets,
  l_ctx_in->port_groups, l_ctx_out->lfrr,
 &pg_addr_set_ref);
 if (!expr) {



Re: [ovs-dev] Re: Re: [PATCH V3 2/4] Add GSO support for DPDK data path

2021-02-09 Thread Ilya Maximets
On 2/8/21 8:54 PM, William Tu wrote:
> On Mon, Feb 8, 2021 at 6:36 AM Ilya Maximets  wrote:
>>
>> On 2/8/21 1:56 AM, Yi Yang (杨燚)-云服务集团 wrote:
>>> Yes, GSO is OK, but GRO may have that issue. I didn't see that issue in 
>>> my OpenStack environment, so it would be great if we could have a test 
>>> case to trigger that issue.
>>>
>>> -Original Message-
>>> From: William Tu [mailto:u9012...@gmail.com]
>>> Sent: February 7, 2021 23:46
>>> To: Yi Yang (杨燚)-云服务集团 
>>> Cc: i.maxim...@ovn.org; yang_y...@163.com; ovs-dev@openvswitch.org; 
>>> f...@sysclose.org
>>> Subject: Re: [ovs-dev] Re: [PATCH V3 2/4] Add GSO support for DPDK data path
>>>
>>> On Tue, Oct 27, 2020 at 6:02 PM Yi Yang (杨燚)-云服务集团  
>>> wrote:

 -Original Message-
 From: dev [mailto:ovs-dev-boun...@openvswitch.org] On Behalf Of Ilya Maximets
 Sent: October 27, 2020 21:03
 To: yang_y...@163.com; ovs-dev@openvswitch.org
 Cc: f...@sysclose.org; i.maxim...@ovn.org
 Subject: Re: [ovs-dev] [PATCH V3 2/4] Add GSO support for DPDK data path

 On 8/7/20 12:56 PM, yang_y...@163.com wrote:
> From: Yi Yang 
>
> GSO (Generic Segmentation Offload) can segment large UDP and TCP
> packets into smaller packets that fit the destination MTU, especially
> for the case where the physical NIC can't do hardware VXLAN TSO and
> VXLAN UFO offload; GSO makes sure userspace TSO still works instead
> of dropping packets.
>
> In addition, GSO can help improve UDP performance when UFO is enabled
> in the VM.
>
> GSO supports TCP, UDP, VXLAN TCP, and VXLAN UDP; it is done in the Tx
> function of the physical NIC.
>
> Signed-off-by: Yi Yang 
> ---
>  lib/dp-packet.h|  21 +++-
>  lib/netdev-dpdk.c  | 358
> +
>  lib/netdev-linux.c |  17 ++-
>  lib/netdev.c   |  67 +++---
>  4 files changed, 417 insertions(+), 46 deletions(-)
>>>
>>> snip
>>>
>
> @@ -2339,24 +2428,19 @@ netdev_dpdk_prep_hwol_batch(struct netdev_dpdk *dev, struct rte_mbuf **pkts,
>  return cnt;
>  }
>
> -/* Tries to transmit 'pkts' to txq 'qid' of device 'dev'.  Takes ownership of
> - * 'pkts', even in case of failure.
> - *
> - * Returns the number of packets that weren't transmitted. */
> static inline int
> -netdev_dpdk_eth_tx_burst(struct netdev_dpdk *dev, int qid,
> - struct rte_mbuf **pkts, int cnt)
> +__netdev_dpdk_eth_tx_burst(struct netdev_dpdk *dev, int qid,
> +   struct rte_mbuf **pkts, int cnt)
>  {
>  uint32_t nb_tx = 0;
> -uint16_t nb_tx_prep = cnt;
> +uint32_t nb_tx_prep;
>
> -if (userspace_tso_enabled()) {
> -nb_tx_prep = rte_eth_tx_prepare(dev->port_id, qid, pkts, cnt);
> -if (nb_tx_prep != cnt) {
> -VLOG_WARN_RL(&rl, "%s: Output batch contains invalid packets. "
> - "Only %u/%u are valid: %s", dev->up.name, nb_tx_prep,
> - cnt, rte_strerror(rte_errno));
> -}
> +nb_tx_prep = rte_eth_tx_prepare(dev->port_id, qid, pkts, cnt);
> +if (nb_tx_prep != cnt) {
> +VLOG_WARN_RL(, "%s: Output batch contains invalid packets. "
> +  "Only %u/%u are valid: %s",
> + dev->up.name, nb_tx_prep,
> + cnt, rte_strerror(rte_errno));
>  }
>
>  while (nb_tx != nb_tx_prep) {
> @@ -2384,6 +2468,200 @@ netdev_dpdk_eth_tx_burst(struct netdev_dpdk *dev, int qid,
>  return cnt - nb_tx;
>  }
>
> +static inline void
> +set_multiseg_udptcp_cksum(struct rte_mbuf *mbuf)

 I didn't review the patch, only had a quick glance, but this part bothers 
 me.  OVS doesn't support multi-segment mbufs, so it should not be possible 
 for such mbufs being transmitted by OVS.  So, I do not understand why this 
 function needs to work with such mbufs.

 [Yi Yang] Only the DPDK driver/Tx function will use it, not OVS.
 set_multiseg_udptcp_cksum is called in the GSO part, the last step before the
 Tx function; before rte_gso_segment it is a big external mbuf, not a
 multi-segmented mbuf.

>>>
>>> Hi Ilya,
>>>
>>> Now I understand Yi Yang's point better and I agree with him.
>>> Looks like the patch does the GSO at the DPDK TX function.
>>> It creates a multi-seg mbuf after rte_gso_segment(), but will immediately 
>>> send the multi-seg mbuf out the DPDK port, without traversing other 
>>> parts of the OVS code. I guess in this case it should work OK?
>>
>> Hi.  GSO itself should be possible to implement, but we should
>> not enable support for multi-segment mbufs in the NIC, since this
>> might end up in multi-segment packets being received by OVS
>> causing lots of trouble (for example dp_packet_clone() uses simple
>> memcpy() that will result in invalid memory reads.)
> 
> 

Re: [ovs-dev] Re: Re: [PATCH] netdev-dpdk: fix incorrect shinfo initialization

2021-02-09 Thread Toshiaki Makita

On 2021/02/09 4:13, William Tu wrote:

On Mon, Feb 8, 2021 at 8:57 AM Ilya Maximets  wrote:


On 2/6/21 5:15 PM, William Tu wrote:

On Mon, Feb 1, 2021 at 5:48 PM Yi Yang (杨燚)-云服务集团  wrote:


Thanks Ilya, the net_tap PMD handles the tap device on the host side, so it can 
leverage the vnet header to do TSO/GSO; maybe the net_tap authors didn't know how 
to do this. From the source code, the tap fd doesn't have vnet header and TSO enabled.


thanks, learned a lot from these discussions.

I looked at the DPDK net_tap and indeed it doesn't support virtio net hdr.
Do you guys think it makes sense to add TSO at dpdk net_tap?
Or simply using the current OVS's userspace-enable-tso on tap/veth is
good enough?
(using type=system, not using dpdk port type on tap/veth.)

Regards,
William



I didn't benchmark all types of interfaces, but I'd say that, if you
need more or less high performance solution for userspace<->kernel
communication, you should, probably, take a look at virtio-user
ports with vhost kernel backend:
   https://doc.dpdk.org/guides/howto/virtio_user_as_exceptional_path.html
This should be the fastest and also feature-rich solution.

Thanks! I will give it a try.


Tap devices are not designed for high performance in general,
so I'd not suggest any of them for highly loaded ports.
If it's only for some small management traffic, it should be fine
to just use netdev-linux implementation.


That's what I thought until Flavio enables vnet header.


netdev-afxdp with pmd or non-pmd modes on a veth devices is another
(potentially high performance) solution.


When testing intra-host container-to-container performance,
the tap device is much faster than netdev-afxdp, especially with iperf TCP,
mostly due to the vnet header's TSO and csum offload features.
It's a big limitation that an XDP frame can't carry a large buffer or
partial csum information.

I reached the conclusion that for intra-host container-to-container
TCP performance, the configurations rank from fastest to slowest (ns: namespace):
0) dpdk vhostuser in ns0 -> vhostuser - OVS userspace
(But requires TCP in userspace and application modification)
1) veth0 in ns0 -> veth with TSO - OVS kernel module - veth with TSO
-> veth1 in ns1
2) tap0 in ns0 -> virtio_user - OVS userspace - virtio_user -> tap1 in ns1
3) tap0 in ns0 -> recv_tap - OVS with userspace-tso - tap_batch_send
-> tap1 in ns1
4) veth0 in ns0 -> af_packet sock - OVS with userspace-tso -
af_packet_sock -> veth1 in ns1
5) veth0 in ns0 -> netdev-afxdp - OVS - netdev-afxdp -> veth1 in ns1

I also tested Toshiaki's XDP offload patch,
https://www.mail-archive.com/ovs-dev@openvswitch.org/msg45930.html
I would guess it's between 2 to 4.


Note that native veth XDP is fast only when all of the packet processing is done
in the XDP world.
That means packets generated in containers are not fast even with native veth
XDP.
The fast case is when a phy device XDP_REDIRECTs frames to veth, then the peer
veth in the ns does something and XDP_TXes the frame, and then it is REDIRECTed
to another veth pair.

phy --XDP_REDIRECT--> veth-host0 --> veth-ns0 --XDP_TX--> veth-host0 --XDP_REDIRECT--> veth-host1 --> veth-ns1

Having said that, missing TSO is indeed a big limitation.
BTW there is some progress on TSO in XDP world...
https://patchwork.kernel.org/project/netdevbpf/cover/cover.1611086134.git.lore...@kernel.org/

Toshiaki Makita


Re: [ovs-dev] [PATCH] netdev-afxdp: Add start qid support.

2021-02-09 Thread Ilya Maximets
On 2/8/21 11:35 PM, William Tu wrote:
> On Mon, Feb 8, 2021 at 4:58 AM Ilya Maximets  wrote:
>>
>> On 2/7/21 5:05 PM, Toshiaki Makita wrote:
>>> On 2021/02/07 2:00, William Tu wrote:
 On Fri, Feb 5, 2021 at 1:08 PM Gregory Rose  wrote:
> On 2/4/2021 7:08 PM, William Tu wrote:
>> On Thu, Feb 4, 2021 at 3:17 PM Gregory Rose  wrote:
>>> On 2/3/2021 1:21 PM, William Tu wrote:
 Mellanox cards have a different XSK design. They require users to create
 dedicated queues for XSK. Unlike Intel NICs, which load the XDP program
 on all queues, Mellanox loads the XDP program only on a subset of its
 queues.

 When OVS uses AF_XDP with mlx5, it doesn't replace the existing RX and 
 TX
 queues in the channel with XSK RX and XSK TX queues, but it creates an
 additional pair of queues for XSK in that channel. To distinguish
 regular and XSK queues, mlx5 uses a different range of qids.
 That means, if the card has 24 queues, queues 0..11 correspond to
 regular queues, and queues 12..23 are XSK queues.
 In this case, we should attach the netdev-afxdp with 'start-qid=12'.

 I tested using Mellanox Connect-X 6Dx, by setting 'start-qid=1', and:
  $ ethtool -L enp2s0f0np0 combined 1
  # queue 0 is for non-XDP traffic, queue 1 is for XSK
  $ ethtool -N enp2s0f0np0 flow-type udp4 action 1
 note: we need additionally add flow-redirect rule to queue 1
>>>
>>> Seems awfully hardware dependent.  Is this just for Mellanox or does
>>> it have general usefulness?
>>>
>> It is just Mellanox's design which requires pre-configure the 
>> flow-director.
>> I only have cards from Intel and Mellanox so I don't know about other 
>> vendors.
>>
>> Thanks,
>> William
>>
>
> I think we need to abstract the HW layer a little bit.  This start-qid
> option is specific to a single piece of HW, at least at this point.
> We should expect that further HW  specific requirements for
> different NIC vendors will come up in the future.  I suggest
> adding a hw_options:mellanox:start-qid type hierarchy  so that
> as new HW requirements come up we can easily scale.  It will
> also make adding new vendors easier in the future.
>
> Even with NIC vendors you can't always count on each new generation
> design to always keep old requirements and methods for feature
> enablement.
>
> What do you think?
>
 Thanks for the feedback.
 So far I don't know whether other vendors will need this option or not.
>>>
>>> FWIU, this api "The lower half of the available amount of RX queues are 
>>> regular queues, and the upper half are XSK RX queues." is the result of 
>>> a long discussion about supporting dedicated/isolated XSK rings; it is not 
>>> meant to be a Mellanox-specific feature.
>>>
>>> https://patchwork.ozlabs.org/project/netdev/cover/20190524093431.20887-1-maxi...@mellanox.com/
>>> https://patchwork.ozlabs.org/project/netdev/cover/20190612155605.22450-1-maxi...@mellanox.com/
>>>
>>> Toshiaki Makita
>>
>> Thanks for the links.  Very helpful.
>>
>> From what I understand lower half of queues should still work, i.e.
>> it should still be possible to attach AF_XDP socket to them.  But
>> they will not work in zero-copy mode ("generic" only?).
>> William, could you check that?  Does it work and with which mode
>> "best-effort" ends up with?  And what kind of errors libbpf returns
>> if we're trying to enable zero-copy?
> 
> Thanks for your feedback.
> Yes, only zero-copy mode needs to be aware of this, meaning zero-copy
> mode has to use the upper half of the queues (the start-qid option here).
> Native mode and SKB mode works OK on upper and lower queues.
> When attaching zc XSK to lower half queue, libbpf returns EINVAL at
> xsk_socket__create().

OK.  Thanks for checking.


Re: [ovs-dev] [PATCH v4 3/5] netdev-offload: Add xdp flow api provider

2021-02-09 Thread Toshiaki Makita

On 2021/02/05 12:03, William Tu wrote:

On Thu, Feb 4, 2021 at 5:15 PM William Tu  wrote:


Hi Toshiaki,

Thanks for the patch. I have some questions inline.

On Thu, Jul 30, 2020 at 7:55 PM Toshiaki Makita
 wrote:


This provider offloads classifier to software XDP.

It works only when a custom XDP object is loaded by afxdp netdev.
The BPF program needs to implement classifier with array-of-maps for
subtable hashmaps and arraymap for subtable masks. The flow api
provider detects classifier support in the custom XDP program when
loading it.

The requirements for the BPF program is as below:

- A struct named "meta_info" in ".ovs_meta" section
   inform vswitchd of supported keys, actions, and the max number of
   actions.

- A map named "subtbl_masks"
   An array which implements a list. The entry contains struct
   xdp_subtables_mask provided by lib/netdev-offload-xdp.h followed by
   miniflow buf of any length for the mask.

- A map named "subtbl_masks_hd"
   A one entry int array which contains the first entry index of the list
   implemented by subtbl_masks.

- A map named "flow_table"
   An array-of-maps. Each entry points to subtable hash-map instantiated
   with the following "subtbl_template".
   The entry of each index of subtbl_masks has miniflow mask of subtable
   in the corresponding index of flow_table.

- A map named "subtbl_template"
   A template for subtable, used for entries of "flow_table".
   The key must be miniflow buf (without leading map).

- A map named "output_map"
   A devmap. Each entry represents odp port.

- A map named "xsks_map"
   A xskmap. Used for upcall.

For more details, refer to the reference BPF program in next commit.

In the future it may be possible to offload classifier to SmartNIC
through XDP, but currently it's not because map-in-map is not supported
by XDP hw-offload.

Signed-off-by: Toshiaki Makita 
---
  acinclude.m4  |   25 +
  configure.ac  |1 +
  lib/automake.mk   |8 +
  lib/bpf-util.c|   38 ++
  lib/bpf-util.h|   22 +
  lib/netdev-afxdp.c|  218 +-
  lib/netdev-afxdp.h|3 +
  lib/netdev-linux-private.h|2 +
  lib/netdev-offload-provider.h |8 +-
  lib/netdev-offload-xdp.c  | 1213 +
  lib/netdev-offload-xdp.h  |   49 ++
  lib/netdev-offload.c  |3 +
  12 files changed, 1588 insertions(+), 2 deletions(-)
  create mode 100644 lib/bpf-util.c
  create mode 100644 lib/bpf-util.h
  create mode 100644 lib/netdev-offload-xdp.c
  create mode 100644 lib/netdev-offload-xdp.h

diff --git a/acinclude.m4 b/acinclude.m4
index 4bac9dbdd..31ff0c013 100644
--- a/acinclude.m4
+++ b/acinclude.m4
@@ -329,6 +329,31 @@ AC_DEFUN([OVS_CHECK_LINUX_AF_XDP], [
AM_CONDITIONAL([HAVE_AF_XDP], test "$AF_XDP_ENABLE" = true)
  ])

+dnl OVS_CHECK_LINUX_XDP_OFFLOAD
+dnl
+dnl Check both llvm and libbpf support
+AC_DEFUN([OVS_CHECK_LINUX_XDP_OFFLOAD], [
+  AC_ARG_ENABLE([xdp_offload],
+[AC_HELP_STRING([--enable-xdp-offload],
+[Compile XDP offload])],
+[], [enable_xdp_offload=no])
+  AC_MSG_CHECKING([whether XDP offload is enabled])
+  if test "$enable_xdp_offload" != yes; then
+AC_MSG_RESULT([no])
+XDP_OFFLOAD_ENABLE=false
+  else
+if test "$enable_afxdp" == no; then
+  AC_MSG_ERROR([XDP offload depends on afxdp. Please add --enable-afxdp.])
+fi
+AC_MSG_RESULT([yes])
+XDP_OFFLOAD_ENABLE=true
+
+AC_DEFINE([HAVE_XDP_OFFLOAD], [1],
+  [Define to 1 if XDP offload compilation is available and 
enabled.])
+  fi
+  AM_CONDITIONAL([HAVE_XDP_OFFLOAD], test "$XDP_OFFLOAD_ENABLE" = true)
+])
+
  dnl OVS_CHECK_DPDK
  dnl
  dnl Configure DPDK source tree
diff --git a/configure.ac b/configure.ac
index 8d37af9db..530112c49 100644
--- a/configure.ac
+++ b/configure.ac
@@ -99,6 +99,7 @@ OVS_CHECK_DOT
  OVS_CHECK_IF_DL
  OVS_CHECK_STRTOK_R
  OVS_CHECK_LINUX_AF_XDP
+OVS_CHECK_LINUX_XDP_OFFLOAD
  AC_CHECK_DECLS([sys_siglist], [], [], [[#include ]])
  AC_CHECK_MEMBERS([struct stat.st_mtim.tv_nsec, struct stat.st_mtimensec],
[], [], [[#include ]])
diff --git a/lib/automake.mk b/lib/automake.mk
index 920c958e3..9486b32e7 100644
--- a/lib/automake.mk
+++ b/lib/automake.mk
@@ -451,6 +451,14 @@ lib_libopenvswitch_la_SOURCES += \
 lib/netdev-afxdp.h
  endif

+if HAVE_XDP_OFFLOAD
+lib_libopenvswitch_la_SOURCES += \
+   lib/bpf-util.c \
+   lib/bpf-util.h \
+   lib/netdev-offload-xdp.c \
+   lib/netdev-offload-xdp.h
+endif
+
  if DPDK_NETDEV
  lib_libopenvswitch_la_SOURCES += \
 lib/dpdk.c \
diff --git a/lib/bpf-util.c b/lib/bpf-util.c
new file mode 100644
index 0..324cfbe1d
--- /dev/null
+++ b/lib/bpf-util.c
@@ -0,0 +1,38 @@
+/*
+ * Copyright (c) 2020 NTT Corp.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the 

Re: [ovs-dev] [PATCH v4 0/5] XDP offload using flow API provider

2021-02-09 Thread Toshiaki Makita

On 2021/02/05 2:36, William Tu wrote:

Hi Toshiaki,

Thanks for the patch. I've been testing it for a couple days.
I liked it a lot! The compile and build process all work without any issues.


Hi, thank you for reviewing!
Sorry for taking time to reply. It took time to remember every detail of the 
patch set...


On Thu, Jul 30, 2020 at 7:55 PM Toshiaki Makita
 wrote:


This patch adds an XDP-based flow cache using the OVS netdev-offload
flow API provider.  When an OVS device has XDP offload enabled,
packets are first processed in the XDP flow cache (with parsing and
table lookup implemented in eBPF), and on a hit the action processing
is also done in the context of XDP, which has minimal overhead.

This provider is based on top of William's recently posted patch for
custom XDP load.  When a custom XDP is loaded, the provider detects if
the program supports classifier, and if supported it starts offloading
flows to the XDP program.

The patches are derived from xdp_flow[1], which is a mechanism similar to
this but implemented in kernel.


* Motivation

While the userspace datapath using netdev-afxdp or netdev-dpdk shows good
performance, there are use cases where packets are better processed in the
kernel, for example TCP/IP connections, or container to container
connections.  The current solution is to use a tap device or af_packet with
extra kernel-to/from-userspace overhead.  But with XDP, a better solution
is to steer packets earlier in the XDP program, which decides whether to
send them to the userspace datapath or keep them in the kernel.

One problem with the current netdev-afxdp is that it forwards all packets to
userspace.  The first patch from William (netdev-afxdp: Enable loading XDP
program.) only provides the interface to load an XDP program; however, users
usually don't know how to write their own XDP program.

XDP also supports HW-offload so it may be possible to offload flows to
HW through this provider in the future, although not currently.
The reason is that map-in-map is required for our program to support
classifier with subtables in XDP, but map-in-map is not offloadable.
If map-in-map becomes offloadable, HW-offload of our program may also
be possible.


I think it's too far away for XDP to be offloaded into HW and meet OVS's
feature requirements.


I don't know blockers other than map-in-map, but probably there are more.
If you can provide explicit blockers I can add them in the cover letter.


There is a research prototype here, FYI.
https://www.usenix.org/conference/osdi20/presentation/brunella


This is a presentation about FPGA, not HW offload to SmartNIC, right?




* How to use

1. Install clang/llvm >= 9, libbpf >= 0.0.6 (included in kernel 5.5), and
kernel >= 5.3.

2. make with --enable-afxdp --enable-xdp-offload
--enable-bpf will generate XDP program "bpf/flowtable_afxdp.o".  Note that


typo: I think you mean --enable-xdp-offload


Thanks.




the BPF object will not be installed anywhere by "make install" at this point.

3. Load custom XDP program
E.g.
$ ovs-vsctl add-port ovsbr0 veth0 -- set int veth0 options:xdp-mode=native \
   options:xdp-obj="/path/to/ovs/bpf/flowtable_afxdp.o"
$ ovs-vsctl add-port ovsbr0 veth1 -- set int veth1 options:xdp-mode=native \
   options:xdp-obj="/path/to/ovs/bpf/flowtable_afxdp.o"

4. Enable XDP_REDIRECT
If you use veth devices, make sure to load some (possibly dummy) programs
on the peers of veth devices. This patch set includes a program which
does nothing but returns XDP_PASS. You can use it for the veth peer like
this:
$ ip link set veth1 xdpdrv object /path/to/ovs/bpf/xdp_noop.o section xdp


I'd suggest not using "veth1" as an example, because in (3) above, people
might think "veth1" is already attached to ovsbr0.
IIUC, here your "veth1" should be the device at the peer inside
another namespace.


Sure, will rename it.



Some HW NIC drivers require as many queues as cores on its system. Tweak
queues using "ethtool -L".

5. Enable hw-offload
$ ovs-vsctl set Open_vSwitch . other_config:offload-driver=linux_xdp
$ ovs-vsctl set Open_vSwitch . other_config:hw-offload=true
This will starts offloading flows to the XDP program.

You should be able to see some maps installed, including "debug_stats".
$ bpftool map

If packets are successfully redirected by the XDP program,
debug_stats[2] will be counted.
$ bpftool map dump id 

Currently only very limited keys and output actions are supported.
For example NORMAL action entry and IP based matching work with current
key support. VLAN actions used by port tag/trunks are also supported.



I don't know if this is too much to ask for.
I wonder if you, or we working together, could add at least tunnel
support, e.g. VXLAN?
The current version is a good prototype for people to test an L2/L3
XDP offload switch,
but without a good use case, it's hard to attract more people to
contribute or use it.


I think we have discussed this before.
Vxlan or other tunneling is indeed important, but that's not straightforward.
Push is easy, but pop is 

Re: [ovs-dev] [PATCH v2] netlink: ignore IFLA_WIRELESS events

2021-02-09 Thread Michał Kazior
On Tue, 9 Feb 2021 at 00:58, Gregory Rose  wrote:
> On 1/14/2021 1:09 AM, Michal Kazior wrote:
> > From: Michal Kazior 
[...]
>
> Hi Michal,
>
> The patch looks fine to me.  Applies cleanly to master
> and did not cause any regressions in 'make check'.
>
> I'm curious since I don't ever use OVS with wireless ports
> if there is somewhere this issue has been reported.  If
> so then we should add it to the commit message.

Hi Greg,

I'm unaware of any reports of this sort and I wouldn't be surprised to
be the first one. Given my experience - and the fact this has been
sitting on the mailing list for almost 2 months with no comments now -
I imagine not many people tried mixing OVS with wext drivers,
especially vendor drivers.


Michał