[ovs-dev] [v10 12/12] dpif-netdev: add mfex options to scalar dpif

2021-07-12 Thread Kumar Amber
From: kumar Amber 

This commit adds the optimized MFEX options so that they are
executed as part of the scalar DPIF.
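
The control flow this patch adds to dfc_processing() can be sketched in
isolation as below. This is only an illustration: struct pkt, struct key,
scalar_extract() and ipv4_only_extract() are hypothetical stand-ins, not the
OVS types or functions. The optimized function is tried first, and the scalar
parser is used on a miss, when no optimized function is selected, or when
metadata is already valid.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the OVS packet and key types. */
struct pkt { const uint8_t *data; size_t len; };
struct key { uint16_t eth_type; };

/* Optimized extractor: returns nonzero on a hit, 0 on a miss. */
typedef uint32_t (*mfex_fn)(const struct pkt *, struct key *);

/* Generic parser: handles any frame with a complete Ethernet header. */
static void
scalar_extract(const struct pkt *p, struct key *k)
{
    k->eth_type = p->len >= 14
                  ? (uint16_t) (p->data[12] << 8 | p->data[13]) : 0;
}

/* Specialized parser: only recognizes IPv4 (EtherType 0x0800). */
static uint32_t
ipv4_only_extract(const struct pkt *p, struct key *k)
{
    if (p->len < 14 || p->data[12] != 0x08 || p->data[13] != 0x00) {
        return 0;
    }
    k->eth_type = 0x0800;
    return 1;
}

/* Try the optimized function first, then fall back to the scalar parser.
 * Returns true only when the optimized function produced the key. */
static bool
extract_with_fallback(mfex_fn opt, bool md_is_valid,
                      const struct pkt *p, struct key *k)
{
    if (opt && !md_is_valid && opt(p, k)) {
        return true;
    }
    scalar_extract(p, k);
    return false;
}
```

The boolean result plays the role of the n_mfex_opt_hit increment in the
patch: it is true only for packets handled by the optimized path.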

Signed-off-by: kumar Amber 
Acked-by: Flavio Leitner 
---
 lib/dpif-netdev.c | 23 ++-
 1 file changed, 22 insertions(+), 1 deletion(-)

diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index cca211837..14c98e450 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -7029,6 +7029,7 @@ dfc_processing(struct dp_netdev_pmd_thread *pmd,
 size_t n_missed = 0, n_emc_hit = 0, n_phwol_hit = 0,  n_mfex_opt_hit = 0;
 struct dfc_cache *cache = &pmd->flow_cache;
 struct dp_packet *packet;
+struct dp_packet_batch single_packet;
 const size_t cnt = dp_packet_batch_size(packets_);
 uint32_t cur_min = pmd->ctx.emc_insert_min;
 const uint32_t recirc_depth = *recirc_depth_get();
@@ -7039,6 +7040,11 @@ dfc_processing(struct dp_netdev_pmd_thread *pmd,
 size_t map_cnt = 0;
 bool batch_enable = true;
 
+single_packet.count = 1;
+
+miniflow_extract_func mfex_func;
+atomic_read_relaxed(&pmd->miniflow_extract_opt, &mfex_func);
+
 atomic_read_relaxed(&pmd->dp->smc_enable_db, &smc_enable_db);
 pmd_perf_update_counter(&pmd->perf_stats,
 md_is_valid ? PMD_STAT_RECIRC : PMD_STAT_RECV,
@@ -7089,7 +7095,22 @@ dfc_processing(struct dp_netdev_pmd_thread *pmd,
 }
 }
 
-miniflow_extract(packet, &key->mf);
+/* Set the count and packet for miniflow_opt with batch_size 1. */
+if ((mfex_func) && (!md_is_valid)) {
+single_packet.packets[0] = packet;
+int mf_ret;
+
+mf_ret = mfex_func(&single_packet, key, 1, port_no, pmd);
+/* Fallback to original miniflow_extract if there is a miss. */
+if (mf_ret) {
+n_mfex_opt_hit++;
+} else {
+miniflow_extract(packet, &key->mf);
+}
+} else {
+miniflow_extract(packet, &key->mf);
+}
+
 key->len = 0; /* Not computed yet. */
 key->hash =
 (md_is_valid == false)
-- 
2.25.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [v10 11/12] dpif-netdev/mfex: add more AVX512 traffic profiles

2021-07-12 Thread Kumar Amber
From: Harry van Haaren 

This commit adds 3 new traffic profile implementations to the
existing avx512 miniflow extract infrastructure. The profiles added are:
- Ether()/IP()/TCP()
- Ether()/Dot1Q()/IP()/UDP()
- Ether()/Dot1Q()/IP()/TCP()

The avx512 code here is designed to scale to more traffic profiles, as
well as to additional CPU ISA levels. Note that adding an implementation
primarily means adding static const data, which the compiler then
specializes away when the profile specific function is declared below.

As a result, the code is relatively maintainable, and scalable for new
traffic profiles as well as new ISA, and does not lower performance
compared with manually written code for each profile/ISA.

Note that confidence in the correctness of each implementation is
achieved through autovalidation, unit tests with known packets, and
fuzz tested packets.
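
To illustrate the "profile as static const data" idea, here is a scalar
sketch of a zero-masking byte shuffle driven by a table. It is not the AVX512
code from this patch: struct profile, the two example profiles and
apply_profile() are hypothetical names, and the tables are shortened to 16
bytes for readability.

```c
#include <stddef.h>
#include <stdint.h>

#define NU 0xFF   /* "Not used": the output byte is zeroed. */

/* A hypothetical profile: pure data saying where each output byte
 * is read from in the packet. */
struct profile {
    uint8_t shuffle[16];
};

/* Illustrative only: copy both MAC addresses and the EtherType. */
static const struct profile ether_profile = {
    .shuffle = { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, NU, NU },
};

/* Illustrative only: same, but read the type after a 4-byte VLAN tag. */
static const struct profile vlan_profile = {
    .shuffle = { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 16, 17, NU, NU },
};

/* Scalar stand-in for a zero-masking byte permute:
 * out[i] = in[shuffle[i]], or 0 when the slot is NU or out of range. */
static void
apply_profile(const struct profile *p, const uint8_t *in, size_t in_len,
              uint8_t out[16])
{
    for (size_t i = 0; i < 16; i++) {
        uint8_t idx = p->shuffle[i];

        out[i] = (idx != NU && idx < in_len) ? in[idx] : 0;
    }
}
```

Supporting another traffic profile in this sketch means adding one more
table; apply_profile() itself does not change.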

Signed-off-by: Harry van Haaren 
Acked-by: Eelco Chaudron 
Acked-by: Flavio Leitner 
---

Hi Readers,

If you have a traffic profile you'd like to see accelerated using
avx512 code, please send me an email and we can collaborate on adding
support for it!

Regards, -Harry

---

v5:
- fix review comments(Ian, Flavio, Eelco)
---
---
 NEWS  |   2 +
 lib/dpif-netdev-extract-avx512.c  | 152 ++
 lib/dpif-netdev-private-extract.c |  30 ++
 lib/dpif-netdev-private-extract.h |  10 ++
 4 files changed, 194 insertions(+)

diff --git a/NEWS b/NEWS
index 26cd85978..849008a80 100644
--- a/NEWS
+++ b/NEWS
@@ -41,6 +41,8 @@ Post-v2.15.0
  * Add build time configure command to enable auto-validatior as default
miniflow implementation at build time.
  * Cache results for CPU ISA checks, reduces overhead on repeated lookups.
+ * Add AVX512 based optimized miniflow extract function for traffic type
+   IPv4/UDP, IPv4/TCP, Vlan/IPv4/UDP and Vlan/Ipv4/TCP.
- ovs-ctl:
  * New option '--no-record-hostname' to disable hostname configuration
in ovsdb on startup.
diff --git a/lib/dpif-netdev-extract-avx512.c b/lib/dpif-netdev-extract-avx512.c
index c06e53582..ecb0be70d 100644
--- a/lib/dpif-netdev-extract-avx512.c
+++ b/lib/dpif-netdev-extract-avx512.c
@@ -136,6 +136,13 @@ _mm512_maskz_permutexvar_epi8_wrap(__mmask64 kmask, __m512i idx, __m512i a)
 
 #define PATTERN_ETHERTYPE_MASK PATTERN_ETHERTYPE_GEN(0xFF, 0xFF)
 #define PATTERN_ETHERTYPE_IPV4 PATTERN_ETHERTYPE_GEN(0x08, 0x00)
+#define PATTERN_ETHERTYPE_DT1Q PATTERN_ETHERTYPE_GEN(0x81, 0x00)
+
+/* VLAN (Dot1Q) patterns and masks. */
+#define PATTERN_DT1Q_MASK   \
+  0x00, 0x00, 0xFF, 0xFF,
+#define PATTERN_DT1Q_IPV4   \
+  0x00, 0x00, 0x08, 0x00,
 
 /* Generator for checking IPv4 ver, ihl, and proto */
 #define PATTERN_IPV4_GEN(VER_IHL, FLAG_OFF_B0, FLAG_OFF_B1, PROTO) \
@@ -161,6 +168,29 @@ _mm512_maskz_permutexvar_epi8_wrap(__mmask64 kmask, __m512i idx, __m512i a)
   34, 35, 36, 37, NU, NU, NU, NU, NU, NU, NU, NU, NU, NU, NU, NU, /* UDP */   \
   NU, NU, NU, NU, NU, NU, NU, NU, NU, NU, NU, NU, NU, NU, NU, NU, /* Unused. */
 
+/* TCP shuffle: tcp_ctl bits require mask/processing, not included here. */
+#define PATTERN_IPV4_TCP_SHUFFLE \
+   0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, NU, NU, /* Ether */ \
+  26, 27, 28, 29, 30, 31, 32, 33, NU, NU, NU, NU, 20, 15, 22, 23, /* IPv4 */  \
+  NU, NU, NU, NU, NU, NU, NU, NU, 34, 35, 36, 37, NU, NU, NU, NU, /* TCP */   \
+  NU, NU, NU, NU, NU, NU, NU, NU, NU, NU, NU, NU, NU, NU, NU, NU, /* Unused. */
+
+#define PATTERN_DT1Q_IPV4_UDP_SHUFFLE \
+  /* Ether (2 blocks): Note that *VLAN* type is written here. */  \
+  0,  1,  2,  3,  4,  5,  6,  7, 8,  9, 10, 11, 16, 17,  0,  0,   \
+  /* VLAN (1 block): Note that the *EtherHdr->Type* is written here. */   \
+  12, 13, 14, 15, 0, 0, 0, 0, \
+  30, 31, 32, 33, 34, 35, 36, 37, 0, 0, 0, 0, 24, 19, 26, 27, /* IPv4 */  \
+  38, 39, 40, 41, NU, NU, NU, NU, /* UDP */
+
+#define PATTERN_DT1Q_IPV4_TCP_SHUFFLE \
+  /* Ether (2 blocks): Note that *VLAN* type is written here. */  \
+  0,  1,  2,  3,  4,  5,  6,  7, 8,  9, 10, 11, 16, 17,  0,  0,   \
+  /* VLAN (1 block): Note that the *EtherHdr->Type* is written here. */   \
+  12, 13, 14, 15, 0, 0, 0, 0, \
+  30, 31, 32, 33, 34, 35, 36, 37, 0, 0, 0, 0, 24, 19, 26, 27, /* IPv4 */  \
+  NU, NU, NU, NU, NU, NU, NU, NU, 38, 39, 40, 41, NU, NU, NU, NU, /* TCP */   \
+  NU, NU, NU, NU, NU, NU, NU, NU, /* Unused. */
 
 /* Generation of K-mask bitmask values, to zero out data in result. Note that
  * these correspond 1:1 to the above "*_SHUFFLE" values, and bit used must be
@@ -170,12 +200,22 @@ _mm512_maskz_permutexvar_epi8_wrap(__mmask64 kmask, __m512i idx, __m512i a)
  * 

[ovs-dev] [v10 10/12] dpif-netdev/mfex: Add AVX512 based optimized miniflow extract

2021-07-12 Thread Kumar Amber
From: Harry van Haaren 

This commit adds AVX512 implementations of miniflow extract.
By using the 64 bytes available in an AVX512 register, it is
possible to convert a packet to a miniflow data-structure in
a small number of instructions.

The implementation here probes for Ether()/IP()/UDP() traffic,
and builds the appropriate miniflow data-structure for packets
that match the probe.
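
A probe of this kind can be pictured as a masked compare against a pattern.
The scalar sketch below is a simplification and an assumption on my part: the
names probe_mask, probe_data and probe_matches() are hypothetical, only the
EtherType, IPv4 version/IHL and IPv4 protocol bytes are checked, and the
fragment checks of the real implementation are left out.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PROBE_LEN 24   /* Ethernet header (14) + first 10 IPv4 bytes. */

/* Only bytes with a nonzero mask take part in the comparison. */
static const uint8_t probe_mask[PROBE_LEN] = {
    [12] = 0xFF, [13] = 0xFF,   /* EtherType. */
    [14] = 0xFF,                /* IPv4 version and header length. */
    [23] = 0xFF,                /* IPv4 protocol. */
};

/* Expected values for Ether()/IP()/UDP(). */
static const uint8_t probe_data[PROBE_LEN] = {
    [12] = 0x08, [13] = 0x00,   /* IPv4. */
    [14] = 0x45,                /* Version 4, 20-byte header. */
    [23] = 0x11,                /* UDP. */
};

/* Returns true when every masked byte equals the expected pattern. */
static bool
probe_matches(const uint8_t *pkt, size_t len)
{
    if (len < PROBE_LEN) {
        return false;
    }
    for (size_t i = 0; i < PROBE_LEN; i++) {
        if ((pkt[i] & probe_mask[i]) != probe_data[i]) {
            return false;
        }
    }
    return true;
}
```

In the vector version the loop collapses into a single masked compare over
the whole register, which is where the instruction savings come from.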

The implementation here is auto-validated by the miniflow
extract autovalidator, hence its correctness can be easily
tested and verified.

Note that this commit is designed to easily allow addition of new
traffic profiles in a scalable way, without code duplication for
each traffic profile.

Signed-off-by: Harry van Haaren 

---
v9:
- include comments from flavio
v8:
- include documentation on AVX512 MFEX as per Eelco's suggestion
v7:
- fix minor review sentences (Eelco)
v5:
- fix review comments(Ian, Flavio, Eelco)
- include assert for flow abi change
- include assert for offset changes
---
---
 lib/automake.mk   |   1 +
 lib/dpif-netdev-extract-avx512.c  | 478 ++
 lib/dpif-netdev-private-extract.c |  13 +
 lib/dpif-netdev-private-extract.h |  30 ++
 4 files changed, 522 insertions(+)
 create mode 100644 lib/dpif-netdev-extract-avx512.c

diff --git a/lib/automake.mk b/lib/automake.mk
index f4f36325e..299f81939 100644
--- a/lib/automake.mk
+++ b/lib/automake.mk
@@ -39,6 +39,7 @@ lib_libopenvswitchavx512_la_CFLAGS = \
$(AM_CFLAGS)
 lib_libopenvswitchavx512_la_SOURCES = \
lib/dpif-netdev-lookup-avx512-gather.c \
+   lib/dpif-netdev-extract-avx512.c \
lib/dpif-netdev-avx512.c
 lib_libopenvswitchavx512_la_LDFLAGS = \
-static
diff --git a/lib/dpif-netdev-extract-avx512.c b/lib/dpif-netdev-extract-avx512.c
new file mode 100644
index 0..c06e53582
--- /dev/null
+++ b/lib/dpif-netdev-extract-avx512.c
@@ -0,0 +1,478 @@
+/*
+ * Copyright (c) 2021 Intel.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at:
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+/*
+ * AVX512 Miniflow Extract.
+ *
+ * This file contains optimized implementations of miniflow_extract()
+ * for specific common traffic patterns. The optimizations allow for
+ * quick probing of a specific packet type, and if a match with a specific
+ * type is found, a shuffle like procedure builds up the required miniflow.
+ *
+ * Process
+ * -
+ *
+ * The procedure is to classify the packet based on the traffic type
+ * using predefined bit-masks and arrange the packet header data using shuffle
+ * instructions to a pre-defined place as required by the miniflow.
+ * This eliminates the if-else ladder to identify the packet data and add data
+ * as per protocol which is present.
+ */
+
+#ifdef __x86_64__
+/* Sparse cannot handle the AVX512 instructions. */
+#if !defined(__CHECKER__)
+
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "flow.h"
+#include "dpdk.h"
+
+#include "dpif-netdev-private-dpcls.h"
+#include "dpif-netdev-private-extract.h"
+#include "dpif-netdev-private-flow.h"
+
+/* AVX512-BW level permutex2var_epi8 emulation. */
+static inline __m512i
+__attribute__((target("avx512bw")))
+_mm512_maskz_permutex2var_epi8_skx(__mmask64 k_mask,
+   __m512i v_data_0,
+   __m512i v_shuf_idxs,
+   __m512i v_data_1)
+{
+/* Manipulate shuffle indexes for u16 size. */
+__mmask64 k_mask_odd_lanes = 0x;
+/* Clear away ODD lane bytes. Cannot be done above due to no u8 shift. */
+__m512i v_shuf_idx_evn = _mm512_mask_blend_epi8(k_mask_odd_lanes,
+v_shuf_idxs,
+_mm512_setzero_si512());
+v_shuf_idx_evn = _mm512_srli_epi16(v_shuf_idx_evn, 1);
+
+__m512i v_shuf_idx_odd = _mm512_srli_epi16(v_shuf_idxs, 9);
+
+/* Shuffle each half at 16-bit width. */
+__m512i v_shuf1 = _mm512_permutex2var_epi16(v_data_0, v_shuf_idx_evn,
+v_data_1);
+__m512i v_shuf2 = _mm512_permutex2var_epi16(v_data_0, v_shuf_idx_odd,
+v_data_1);
+
+/* Find if the shuffle index was odd, via mask and compare. */
+uint16_t index_odd_mask = 0x1;
+const __m512i v_index_mask_u16 = _mm512_set1_epi16(index_odd_mask);
+
+/* EVEN lanes, find if u8 index was odd,  result as u16 bitmask. */
+__m512i 

[ovs-dev] [v10 09/12] dpdk: add additional CPU ISA detection strings

2021-07-12 Thread Kumar Amber
From: Harry van Haaren 

This commit enables OVS to check at runtime for more detailed
AVX512 capabilities, specifically Byte and Word (BW) extensions,
and Vector Bit Manipulation Instructions (VBMI).

These instructions will be used in the CPU ISA optimized
implementations of traffic profile aware miniflow extract.
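
For readers without a DPDK build at hand, the same kind of string-to-feature
lookup can be sketched with the GCC/Clang builtin __builtin_cpu_supports().
This is not how the patch does it (the patch uses DPDK's RTE_CPUFLAG_* flags
through CHECK_CPU_FEATURE); cpu_has_isa() is a hypothetical helper, and on
non-x86 builds the sketch simply reports false.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: map a feature name to a runtime CPU check.
 * Unknown names, and all names on non-x86 builds, report false. */
static bool
cpu_has_isa(const char *feature)
{
#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
    /* The builtin requires a string literal, hence one branch per name. */
    if (!strcmp(feature, "avx512f")) {
        return __builtin_cpu_supports("avx512f") != 0;
    }
    if (!strcmp(feature, "avx512bw")) {
        return __builtin_cpu_supports("avx512bw") != 0;
    }
    if (!strcmp(feature, "avx512vbmi")) {
        return __builtin_cpu_supports("avx512vbmi") != 0;
    }
#endif
    (void) feature;
    return false;
}
```

The result depends on the machine the code runs on, so no particular output
should be expected from it.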

Signed-off-by: Harry van Haaren 
Acked-by: Eelco Chaudron 
Acked-by: Flavio Leitner 
---
 NEWS   | 1 +
 lib/dpdk.c | 2 ++
 2 files changed, 3 insertions(+)

diff --git a/NEWS b/NEWS
index 581bff225..26cd85978 100644
--- a/NEWS
+++ b/NEWS
@@ -40,6 +40,7 @@ Post-v2.15.0
traffic.
  * Add build time configure command to enable auto-validatior as default
miniflow implementation at build time.
+ * Cache results for CPU ISA checks, reduces overhead on repeated lookups.
- ovs-ctl:
  * New option '--no-record-hostname' to disable hostname configuration
in ovsdb on startup.
diff --git a/lib/dpdk.c b/lib/dpdk.c
index 9de2af58e..1b8f8e55b 100644
--- a/lib/dpdk.c
+++ b/lib/dpdk.c
@@ -706,6 +706,8 @@ dpdk_get_cpu_has_isa(const char *arch, const char *feature)
 #if __x86_64__
 /* CPU flags only defined for the architecture that support it. */
 CHECK_CPU_FEATURE(feature, "avx512f", RTE_CPUFLAG_AVX512F);
+CHECK_CPU_FEATURE(feature, "avx512bw", RTE_CPUFLAG_AVX512BW);
+CHECK_CPU_FEATURE(feature, "avx512vbmi", RTE_CPUFLAG_AVX512VBMI);
 CHECK_CPU_FEATURE(feature, "avx512vpopcntdq", RTE_CPUFLAG_AVX512VPOPCNTDQ);
 CHECK_CPU_FEATURE(feature, "bmi2", RTE_CPUFLAG_BMI2);
 #endif
-- 
2.25.1



[ovs-dev] [v10 08/12] dpif/stats: add miniflow extract opt hits counter

2021-07-12 Thread Kumar Amber
From: Harry van Haaren 

This commit adds a new counter to be displayed to the user when
requesting datapath packet statistics. It counts the number of
packets that were parsed, and had a miniflow built from them, by the
optimized miniflow extract parsers.

The ovs-appctl command "dpif-netdev/pmd-perf-show" now has an
extra entry indicating if the optimized MFEX was hit:

  - MFEX Opt hits:6786432  (100.0 %)
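
How such a line can be derived is sketched below, assuming (as the AVX512
DPIF hunk in this patch does) that hits are counted as the set bits of a
per-batch hit mask. The helper names are hypothetical, and the guard against
zero passes is an assumption of the sketch rather than something taken from
the patch.

```c
#include <stdint.h>

/* Each set bit in the mask marks one packet in the batch that the
 * optimized miniflow extract handled. */
static uint32_t
mfex_hits_from_mask(uint64_t mf_mask)
{
    return (uint32_t) __builtin_popcountll(mf_mask);
}

/* Percentage shown next to the counter; 0 when nothing was processed. */
static double
hit_percent(uint64_t hits, uint64_t passes)
{
    return passes ? 100.0 * hits / passes : 0.0;
}
```

For a full batch of 32 packets that all hit, the mask is 0xFFFFFFFF and the
counter grows by 32.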

Signed-off-by: Harry van Haaren 
Acked-by: Flavio Leitner 
---
v7:
- fix review comments(Eelco)
v5:
- fix review comments(Ian, Flavio, Eelco)
---
---
 lib/dpif-netdev-avx512.c|  3 +++
 lib/dpif-netdev-perf.c  |  3 +++
 lib/dpif-netdev-perf.h  |  1 +
 lib/dpif-netdev-unixctl.man |  4 
 lib/dpif-netdev.c   | 12 +++-
 tests/pmd.at|  6 --
 6 files changed, 22 insertions(+), 7 deletions(-)

diff --git a/lib/dpif-netdev-avx512.c b/lib/dpif-netdev-avx512.c
index 7772b7abf..544d36903 100644
--- a/lib/dpif-netdev-avx512.c
+++ b/lib/dpif-netdev-avx512.c
@@ -310,8 +310,11 @@ dp_netdev_input_outer_avx512(struct dp_netdev_pmd_thread *pmd,
 }
 
 /* At this point we don't return error anymore, so commit stats here. */
+uint32_t mfex_hit_cnt = __builtin_popcountll(mf_mask);
 pmd_perf_update_counter(&pmd->perf_stats, PMD_STAT_RECV, batch_size);
 pmd_perf_update_counter(&pmd->perf_stats, PMD_STAT_PHWOL_HIT, phwol_hits);
+pmd_perf_update_counter(&pmd->perf_stats, PMD_STAT_MFEX_OPT_HIT,
+mfex_hit_cnt);
 pmd_perf_update_counter(&pmd->perf_stats, PMD_STAT_EXACT_HIT, emc_hits);
 pmd_perf_update_counter(&pmd->perf_stats, PMD_STAT_SMC_HIT, smc_hits);
 pmd_perf_update_counter(&pmd->perf_stats, PMD_STAT_MASKED_HIT,
diff --git a/lib/dpif-netdev-perf.c b/lib/dpif-netdev-perf.c
index 7103a2d4d..d7676ea2b 100644
--- a/lib/dpif-netdev-perf.c
+++ b/lib/dpif-netdev-perf.c
@@ -247,6 +247,7 @@ pmd_perf_format_overall_stats(struct ds *str, struct pmd_perf_stats *s,
 "  Rx packets:%12"PRIu64"  (%.0f Kpps, %.0f cycles/pkt)\n"
 "  Datapath passes:   %12"PRIu64"  (%.2f passes/pkt)\n"
 "  - PHWOL hits:  %12"PRIu64"  (%5.1f %%)\n"
+"  - MFEX Opt hits:   %12"PRIu64"  (%5.1f %%)\n"
 "  - EMC hits:%12"PRIu64"  (%5.1f %%)\n"
 "  - SMC hits:%12"PRIu64"  (%5.1f %%)\n"
 "  - Megaflow hits:   %12"PRIu64"  (%5.1f %%, %.2f "
@@ -258,6 +259,8 @@ pmd_perf_format_overall_stats(struct ds *str, struct pmd_perf_stats *s,
 passes, rx_packets ? 1.0 * passes / rx_packets : 0,
 stats[PMD_STAT_PHWOL_HIT],
 100.0 * stats[PMD_STAT_PHWOL_HIT] / passes,
+stats[PMD_STAT_MFEX_OPT_HIT],
+100.0 * stats[PMD_STAT_MFEX_OPT_HIT] / passes,
 stats[PMD_STAT_EXACT_HIT],
 100.0 * stats[PMD_STAT_EXACT_HIT] / passes,
 stats[PMD_STAT_SMC_HIT],
diff --git a/lib/dpif-netdev-perf.h b/lib/dpif-netdev-perf.h
index 8b1a52387..834c26260 100644
--- a/lib/dpif-netdev-perf.h
+++ b/lib/dpif-netdev-perf.h
@@ -57,6 +57,7 @@ extern "C" {
 
 enum pmd_stat_type {
 PMD_STAT_PHWOL_HIT, /* Packets that had a partial HWOL hit (phwol). */
+PMD_STAT_MFEX_OPT_HIT,  /* Packets that had miniflow optimized match. */
 PMD_STAT_EXACT_HIT, /* Packets that had an exact match (emc). */
 PMD_STAT_SMC_HIT,   /* Packets that had a sig match hit (SMC). */
 PMD_STAT_MASKED_HIT,/* Packets that matched in the flow table. */
diff --git a/lib/dpif-netdev-unixctl.man b/lib/dpif-netdev-unixctl.man
index 83ce4f1c5..f34758416 100644
--- a/lib/dpif-netdev-unixctl.man
+++ b/lib/dpif-netdev-unixctl.man
@@ -16,6 +16,9 @@ packet lookups performed by the datapath. Beware that a recirculated packet
 experiences one additional lookup per recirculation, so there may be
 more lookups than forwarded packets in the datapath.
 
+The MFEX Opt hits counter displays the number of packets processed by the
+optimized miniflow extract implementations.
+
 Cycles are counted using the TSC or similar facilities (when available on
 the platform). The duration of one cycle depends on the processing platform.
 
@@ -136,6 +139,7 @@ pmd thread numa_id 0 core_id 1:
   Rx packets: 2399607  (2381 Kpps, 848 cycles/pkt)
   Datapath passes:3599415  (1.50 passes/pkt)
   - PHWOL hits: 0  (  0.0 %)
+  - MFEX Opt hits:3570133  ( 99.5 %)
   - EMC hits:  336472  (  9.3 %)
   - SMC hits:   0  ( 0.0 %)
   - Megaflow hits:3262943  ( 90.7 %, 1.00 subtbl lookups/hit)
diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index 1132a0ad5..cca211837 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -648,6 +648,7 @@ pmd_info_show_stats(struct ds *reply,
   "  packet recirculations: %"PRIu64"\n"
   "  avg. datapath passes per packet: %.02f\n"
   "  phwol hits: %"PRIu64"\n"
+  "  mfex opt hits: 

[ovs-dev] [v10 07/12] test/system-dpdk: Add unit test for mfex autovalidator

2021-07-12 Thread Kumar Amber
Tests:
  6: OVS-DPDK - MFEX Autovalidator
  7: OVS-DPDK - MFEX Autovalidator Fuzzy

Added a new directory to store the PCAP file used
in the tests, and a script to generate the fuzzy traffic
PCAP used in the fuzzy unit test.

Signed-off-by: Kumar Amber 
Acked-by: Flavio Leitner 
---
v7:
- fix review comments(Eelco)
v5:
- fix review comments(Ian, Flavio, Eelco)
- remove sleep from first test and added minor 5 sec sleep to fuzzy
---
---
 Documentation/topics/dpdk/bridge.rst |  55 +++
 tests/.gitignore |   1 +
 tests/automake.mk|   5 +++
 tests/mfex_fuzzy.py  |  31 +++
 tests/pcap/mfex_test.pcap| Bin 0 -> 416 bytes
 tests/system-dpdk.at |  49 
 6 files changed, 141 insertions(+)
 create mode 100755 tests/mfex_fuzzy.py
 create mode 100644 tests/pcap/mfex_test.pcap

diff --git a/Documentation/topics/dpdk/bridge.rst b/Documentation/topics/dpdk/bridge.rst
index 662446401..7b81d0305 100644
--- a/Documentation/topics/dpdk/bridge.rst
+++ b/Documentation/topics/dpdk/bridge.rst
@@ -345,3 +345,58 @@ A compile time option is available in order to test it with the OVS unit
 test suite. Use the following configure option ::
 
 $ ./configure --enable-mfex-default-autovalidator
+
+Unit Test Miniflow Extract
+++++++++++++++++++++++++++
+
+Unit test can also be used to test the workflow mentioned above by running
+the following test-case in tests/system-dpdk.at ::
+
+make check-dpdk TESTSUITEFLAGS='-k MFEX'
+OVS-DPDK - MFEX Autovalidator
+
+The unit test uses multiple traffic types to test the correctness of the
+implementations.
+
+Running Fuzzy test with Autovalidator
++++++++++++++++++++++++++++++++++++++
+
+Fuzzy tests can also be done on miniflow extract with the help of
+auto-validator and Scapy. The steps below describe how to
+reproduce the setup with IP being fuzzed to generate packets.
+
+Scapy is used to create fuzzy IP packets and save them into a PCAP ::
+
+pkt = fuzz(Ether()/IP()/TCP())
+
+Set the miniflow extract to autovalidator using ::
+
+$ ovs-appctl dpif-netdev/miniflow-parser-set autovalidator
+
+OVS is configured to receive the generated packets ::
+
+$ ovs-vsctl add-port br0 pcap0 -- \
+set Interface pcap0 type=dpdk options:dpdk-devargs=net_pcap0
+"rx_pcap=fuzzy.pcap"
+
+With this workflow, the autovalidator will ensure that all MFEX
+implementations are classifying each packet in exactly the same way.
+If an optimized MFEX implementation causes a different miniflow to be
+generated, the autovalidator has ovs_assert and logging statements that
+will inform about the issue.
+
+Unit Fuzzy test with Autovalidator
+++++++++++++++++++++++++++++++++++
+
+The prerequisite before running the unit test is to run the script provided ::
+
+tests/mfex_fuzzy.py
+
+This script generates a pcap with multiple types of fuzzed packets to be used
+in the below unit test-case.
+
+Unit test can also be used to test the workflow mentioned above by running
+the following test-case in tests/system-dpdk.at ::
+
+make check-dpdk TESTSUITEFLAGS='-k MFEX'
+OVS-DPDK - MFEX Autovalidator Fuzzy
diff --git a/tests/.gitignore b/tests/.gitignore
index 45b4f67b2..a3d927e5d 100644
--- a/tests/.gitignore
+++ b/tests/.gitignore
@@ -11,6 +11,7 @@
 /ovsdb-cluster-testsuite
 /ovsdb-cluster-testsuite.dir/
 /ovsdb-cluster-testsuite.log
+/pcap/
 /pki/
 /system-afxdp-testsuite
 /system-afxdp-testsuite.dir/
diff --git a/tests/automake.mk b/tests/automake.mk
index f45f8d76c..2bcf054b0 100644
--- a/tests/automake.mk
+++ b/tests/automake.mk
@@ -143,6 +143,11 @@ $(srcdir)/tests/fuzz-regression-list.at: tests/automake.mk
echo "TEST_FUZZ_REGRESSION([$$basename])"; \
done > $@.tmp && mv $@.tmp $@
 
+EXTRA_DIST += $(MFEX_AUTOVALIDATOR_TESTS)
+MFEX_AUTOVALIDATOR_TESTS = \
+   tests/pcap/mfex_test.pcap \
+   tests/mfex_fuzzy.py
+
 OVSDB_CLUSTER_TESTSUITE_AT = \
tests/ovsdb-cluster-testsuite.at \
tests/ovsdb-execution.at \
diff --git a/tests/mfex_fuzzy.py b/tests/mfex_fuzzy.py
new file mode 100755
index 0..395158b0d
--- /dev/null
+++ b/tests/mfex_fuzzy.py
@@ -0,0 +1,31 @@
+#!/usr/bin/python3
+try:
+   from scapy.all import *
+except ModuleNotFoundError as err:
+   print(str(err) + ": Scapy")
+import sys
+
+path = str(sys.argv[1]) + "/pcap/fuzzy.pcap"
+pktdump = PcapWriter(path, append=False, sync=True)
+
+for i in range(0, 2000):
+
+   # Generate random protocol bases, use a fuzz() over the combined packet for full fuzzing.
+   eth = Ether(src=RandMAC(), dst=RandMAC())
+   vlan = Dot1Q()
+   ipv4 = IP(src=RandIP(), dst=RandIP())
+   ipv6 = IPv6(src=RandIP6(), dst=RandIP6())
+   udp = UDP(dport=RandShort(), sport=RandShort())
+   tcp = TCP(dport=RandShort(), sport=RandShort())
+
+   # IPv4 packets with fuzzing
+   pktdump.write(fuzz(eth/ipv4/udp))
+   pktdump.write(fuzz(eth/ipv4/tcp))
+   

[ovs-dev] [v10 06/12] dpif-netdev: Add packet count and core id parameters for study

2021-07-12 Thread Kumar Amber
This commit introduces an additional command line parameter
for the mfex study function. If the user provides a packet count,
study uses it as the minimum number of packets that must be processed;
otherwise a default value is chosen.
It also introduces a third parameter for choosing a particular PMD core.

$ ovs-appctl dpif-netdev/miniflow-parser-set study 500 3
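
The selection rule described above can be sketched as follows: hits are
accumulated per implementation, and once the minimum packet count is reached
the implementation with the most hits wins. The struct layout, N_IMPLS and
study_update() are hypothetical and only loosely modelled on
mfex_study_traffic(); ties are resolved in favour of the lowest index here,
which is an assumption of the sketch.

```c
#include <stdint.h>

#define N_IMPLS 3   /* Hypothetical number of implementations studied. */

struct study_stats {
    uint32_t pkt_count;            /* Packets studied so far. */
    uint32_t impl_hits[N_IMPLS];   /* Hits per implementation. */
};

/* Account for one batch. Returns the index of the best implementation
 * once at least min_pkts packets were studied, or -1 while still
 * studying. */
static int
study_update(struct study_stats *s, uint32_t batch_size,
             const uint32_t hits[N_IMPLS], uint32_t min_pkts)
{
    int best = 0;

    s->pkt_count += batch_size;
    for (int i = 0; i < N_IMPLS; i++) {
        s->impl_hits[i] += hits[i];
    }
    if (s->pkt_count < min_pkts) {
        return -1;
    }
    for (int i = 1; i < N_IMPLS; i++) {
        if (s->impl_hits[i] > s->impl_hits[best]) {
            best = i;
        }
    }
    return best;
}
```

A larger minimum count makes the decision more robust against a short burst
of unrepresentative traffic, at the cost of a longer study phase.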

Signed-off-by: Kumar Amber 

---
v10:
- fix review comments Eelco
v9:
- fix review comments Flavio
v7:
- change the command parameters for core_id and study_pkt_cnt
v5:
- fix review comments(Ian, Flavio, Eelco)
- introduce pmd core id parameter
---
---
 Documentation/topics/dpdk/bridge.rst |  37 +++-
 lib/dpif-netdev-extract-study.c  |  25 +-
 lib/dpif-netdev-private-extract.h|   9 ++
 lib/dpif-netdev.c| 128 +--
 4 files changed, 187 insertions(+), 12 deletions(-)

diff --git a/Documentation/topics/dpdk/bridge.rst b/Documentation/topics/dpdk/bridge.rst
index 0fa9341ac..662446401 100644
--- a/Documentation/topics/dpdk/bridge.rst
+++ b/Documentation/topics/dpdk/bridge.rst
@@ -284,12 +284,45 @@ command also shows whether the CPU supports each implementation ::
 
 An implementation can be selected manually by the following command ::
 
-$ ovs-appctl dpif-netdev/miniflow-parser-set study
+$ ovs-appctl dpif-netdev/miniflow-parser-set [-pmd core_id] [name]
+ [study_cnt]
+
+The above command has two optional parameters: study_cnt and core_id.
+The core_id parameter sets a particular miniflow extract function on the
+PMD thread running on that core. The study_cnt parameter, which is specific
+to study and ignored by other implementations, sets how many packets
+are needed to choose the best implementation.
 
 Also user can select the study implementation which studies the traffic for
 a specific number of packets by applying all available implementations of
 miniflow extract and then chooses the one with the most optimal result for
-that traffic pattern.
+that traffic pattern. The user can optionally provide a packet count
+[study_cnt] parameter which is the minimum number of packets that OVS must
+study before choosing an optimal implementation. If no packet count is
+provided, then the default value, 128, is chosen. Also, as there is no
+synchronization point between threads, one PMD thread might still be running
+a previous round, and can now decide on earlier data.
+
+The packet count is a global value, and parallel study() executions with
+differing packet counts will use the most recent value the user provided.
+
+Study can be selected with packet count by the following command ::
+
+$ ovs-appctl dpif-netdev/miniflow-parser-set study 1024
+
+Study can be selected with packet count and explicit PMD selection
+by the following command ::
+
+$ ovs-appctl dpif-netdev/miniflow-parser-set -pmd 3 study 1024
+
+In the above command the parameter after -pmd is the core ID of the PMD
+thread and this can also be used to explicitly set the miniflow
+extraction function pointer on different PMD threads.
+
+Scalar can be selected on core 3 by the following command where
+study count can be put as any arbitrary number or left blank::
+
+$ ovs-appctl dpif-netdev/miniflow-parser-set -pmd 3 scalar
 
 Miniflow Extract Validation
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~
diff --git a/lib/dpif-netdev-extract-study.c b/lib/dpif-netdev-extract-study.c
index eddb35682..61260cb70 100644
--- a/lib/dpif-netdev-extract-study.c
+++ b/lib/dpif-netdev-extract-study.c
@@ -25,7 +25,7 @@
 
 VLOG_DEFINE_THIS_MODULE(dpif_mfex_extract_study);
 
-static atomic_uint32_t mfex_study_pkts_count = 0;
+static atomic_uint32_t  mfex_study_pkts_count = MFEX_MAX_PKT_COUNT;
 
 /* Struct to hold miniflow study stats. */
 struct study_stats {
@@ -48,6 +48,27 @@ mfex_study_get_study_stats_ptr(void)
 return stats;
 }
 
+int
+mfex_set_study_pkt_cnt(uint32_t pkt_cmp_count, const char *name)
+{
+struct dpif_miniflow_extract_impl *miniflow_funcs;
+miniflow_funcs = dpif_mfex_impl_info_get();
+
+/* If the packet count is set and implementation called is study then
+ * set packet counter to requested number else set the packet counter
+ * to default number.
+ */
+if ((strcmp(miniflow_funcs[MFEX_IMPL_STUDY].name, name) == 0) &&
+(pkt_cmp_count != 0)) {
+
+mfex_study_pkts_count = pkt_cmp_count;
+
+return 0;
+}
+
+return -EINVAL;
+}
+
 uint32_t
 mfex_study_traffic(struct dp_packet_batch *packets,
struct netdev_flow_key *keys,
@@ -86,7 +107,7 @@ mfex_study_traffic(struct dp_packet_batch *packets,
 /* Choose the best implementation after a minimum packets have been
  * processed.
  */
-if (stats->pkt_count >= MFEX_MAX_PKT_COUNT) {
+if (stats->pkt_count >= mfex_study_pkts_count) {
 uint32_t best_func_index = MFEX_IMPL_START_IDX;
 uint32_t max_hits = 0;
 for (int i = MFEX_IMPL_START_IDX; i 

[ovs-dev] [v10 05/12] dpif-netdev: Add configure to enable autovalidator at build time.

2021-07-12 Thread Kumar Amber
This commit adds a new configure option that lets the user enable
the autovalidator by default at build time, so that the unit tests
run with it by default.

 $ ./configure --enable-mfex-default-autovalidator
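
The configure option only adds -DMFEX_AUTOVALIDATOR_DEFAULT to the compiler
flags; the choice itself is then made by the preprocessor. A minimal sketch,
with a hypothetical enum ordering and a hypothetical helper
mfex_default_impl():

```c
/* Hypothetical subset of the implementation indexes. */
enum mfex_impl {
    MFEX_IMPL_AUTOVALIDATOR,
    MFEX_IMPL_SCALAR,
    MFEX_IMPL_STUDY,
};

/* The default is fixed at compile time: the autovalidator when built
 * with -DMFEX_AUTOVALIDATOR_DEFAULT, the scalar parser otherwise. */
static enum mfex_impl
mfex_default_impl(void)
{
#ifdef MFEX_AUTOVALIDATOR_DEFAULT
    return MFEX_IMPL_AUTOVALIDATOR;
#else
    return MFEX_IMPL_SCALAR;
#endif
}
```

Because the macro only changes the initial value, an implementation selected
later at runtime with ovs-appctl would still take precedence.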

Signed-off-by: Kumar Amber 
Co-authored-by: Harry van Haaren 
Signed-off-by: Harry van Haaren 

---
v10:
- rework default set
v9:
- fix review comments Flavio
v7:
- fix review comments(Eelco, Flavio)
v5:
- fix review comments(Ian, Flavio, Eelco)
---
---
 Documentation/topics/dpdk/bridge.rst |  5 +
 NEWS |  3 ++-
 acinclude.m4 | 16 
 configure.ac |  1 +
 lib/dpif-netdev-private-extract.c|  4 
 5 files changed, 28 insertions(+), 1 deletion(-)

diff --git a/Documentation/topics/dpdk/bridge.rst b/Documentation/topics/dpdk/bridge.rst
index 6f37f2a75..0fa9341ac 100644
--- a/Documentation/topics/dpdk/bridge.rst
+++ b/Documentation/topics/dpdk/bridge.rst
@@ -307,3 +307,8 @@ implementations provide the same results.
 To set the Miniflow autovalidator, use this command ::
 
 $ ovs-appctl dpif-netdev/miniflow-parser-set autovalidator
+
+A compile time option is available in order to test it with the OVS unit
+test suite. Use the following configure option ::
+
+$ ./configure --enable-mfex-default-autovalidator
diff --git a/NEWS b/NEWS
index 4a7b89409..581bff225 100644
--- a/NEWS
+++ b/NEWS
@@ -38,6 +38,8 @@ Post-v2.15.0
  * Add study function to miniflow function table which studies packet
and automatically chooses the best miniflow implementation for that
traffic.
+ * Add build time configure command to enable auto-validatior as default
+   miniflow implementation at build time.
- ovs-ctl:
  * New option '--no-record-hostname' to disable hostname configuration
in ovsdb on startup.
@@ -57,7 +59,6 @@ Post-v2.15.0
  whether the SNAT with all-zero IP address is supported.
  See ovs-vswitchd.conf.db(5) for details.
 
-
 v2.15.0 - 15 Feb 2021
 -
- OVSDB:
diff --git a/acinclude.m4 b/acinclude.m4
index 343303447..5a48f0335 100644
--- a/acinclude.m4
+++ b/acinclude.m4
@@ -14,6 +14,22 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.
 
+dnl Set OVS MFEX Autovalidator as default miniflow extract at compile time?
+dnl This enables automatically running all unit tests with all MFEX
+dnl implementations.
+AC_DEFUN([OVS_CHECK_MFEX_AUTOVALIDATOR], [
+  AC_ARG_ENABLE([mfex-default-autovalidator],
+[AC_HELP_STRING([--enable-mfex-default-autovalidator], [Enable MFEX autovalidator as default miniflow_extract implementation.])],
+[autovalidator=yes],[autovalidator=no])
+  AC_MSG_CHECKING([whether MFEX Autovalidator is default implementation])
+  if test "$autovalidator" != yes; then
+AC_MSG_RESULT([no])
+  else
+OVS_CFLAGS="$OVS_CFLAGS -DMFEX_AUTOVALIDATOR_DEFAULT"
+AC_MSG_RESULT([yes])
+  fi
+])
+
 dnl Set OVS DPCLS Autovalidator as default subtable search at compile time?
 dnl This enables automatically running all unit tests with all DPCLS
 dnl implementations.
diff --git a/configure.ac b/configure.ac
index e45685a6c..46c402892 100644
--- a/configure.ac
+++ b/configure.ac
@@ -186,6 +186,7 @@ OVS_ENABLE_SPARSE
 OVS_CTAGS_IDENTIFIERS
 OVS_CHECK_DPCLS_AUTOVALIDATOR
 OVS_CHECK_DPIF_AVX512_DEFAULT
+OVS_CHECK_MFEX_AUTOVALIDATOR
 OVS_CHECK_BINUTILS_AVX512
 
 AC_ARG_VAR(KARCH, [Kernel Architecture String])
diff --git a/lib/dpif-netdev-private-extract.c b/lib/dpif-netdev-private-extract.c
index 64745f66c..f007a7a80 100644
--- a/lib/dpif-netdev-private-extract.c
+++ b/lib/dpif-netdev-private-extract.c
@@ -60,7 +60,11 @@ void
 dpif_miniflow_extract_init(void)
 {
 atomic_uintptr_t *mfex_func = (void *)&default_mfex_func;
+#ifdef MFEX_AUTOVALIDATOR_DEFAULT
+int mfex_idx = MFEX_IMPL_AUTOVALIDATOR;
+#else
 int mfex_idx = MFEX_IMPL_SCALAR;
+#endif
 
 /* Call probe on each impl, and cache the result. */
 for (int i = 0; i < MFEX_IMPL_MAX; i++) {
-- 
2.25.1



[ovs-dev] [v10 04/12] docs/dpdk/bridge: add miniflow extract section.

2021-07-12 Thread Kumar Amber
This commit adds a section to the dpdk/bridge.rst netdev documentation,
detailing the added miniflow functionality. The newly added commands are
documented, and sample output is provided.

The use of the auto-validator and the special study function is also
described in detail, as is running fuzzy tests.

Signed-off-by: Kumar Amber 
Co-authored-by: Cian Ferriter 
Signed-off-by: Cian Ferriter 
Co-authored-by: Harry van Haaren 
Signed-off-by: Harry van Haaren 
Acked-by: Flavio Leitner 

---
v10:
- fix minor typos.
v7:
- fix review comments(Eelco)
v5:
- fix review comments(Ian, Flavio, Eelco)
---
---
 Documentation/topics/dpdk/bridge.rst | 51 
 1 file changed, 51 insertions(+)

diff --git a/Documentation/topics/dpdk/bridge.rst b/Documentation/topics/dpdk/bridge.rst
index 2d0850836..6f37f2a75 100644
--- a/Documentation/topics/dpdk/bridge.rst
+++ b/Documentation/topics/dpdk/bridge.rst
@@ -256,3 +256,54 @@ The following line should be seen in the configure output 
when the above option
 is used ::
 
 checking whether DPIF AVX512 is default implementation... yes
+
+Miniflow Extract
+----------------
+
+Miniflow extract (MFEX) parses the raw packets and extracts the important
+header information into a compressed miniflow. This miniflow is composed of
+bits and blocks: the bits signify which blocks are set or have values, whereas
+the blocks hold the metadata, IP, UDP, VLAN, etc. These values are used by the
+datapath for switching decisions later. The optimized miniflow extract is
+traffic specific to speed up the lookup, whereas the scalar implementation
+works for all traffic patterns.
+
+Most modern CPUs have SIMD capabilities. SIMD instructions process a vector
+of data elements rather than a single element at a time. OVS provides
+multiple implementations of miniflow extract, which allows the user to take
+advantage of SIMD instructions like AVX512 to gain additional performance.
+
+A list of implementations can be obtained by the following command. The
+command also shows whether the CPU supports each implementation ::
+
+    $ ovs-appctl dpif-netdev/miniflow-parser-get
+    Available Optimized Miniflow Extracts:
+    autovalidator (available: True, pmds: none)
+    scalar (available: True, pmds: 1,15)
+    study (available: True, pmds: none)
+
+An implementation can be selected manually by the following command ::
+
+    $ ovs-appctl dpif-netdev/miniflow-parser-set study
+
+The user can also select the study implementation, which studies the traffic
+for a specific number of packets by applying all available implementations of
+miniflow extract and then chooses the one with the best result for that
+traffic pattern.
+
+Miniflow Extract Validation
+~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+As multiple versions of miniflow extract can co-exist, each with different
+CPU ISA optimizations, it is important to validate that they all give the
+exact same results. To easily test all miniflow implementations, an
+``autovalidator`` implementation of the miniflow exists. This implementation
+runs all other available miniflow extract implementations, and verifies that
+the results are identical.
+
+Running the OVS unit tests with the autovalidator enabled ensures all
+implementations provide the same results.
+
+To set the Miniflow autovalidator, use this command ::
+
+    $ ovs-appctl dpif-netdev/miniflow-parser-set autovalidator
-- 
2.25.1



[ovs-dev] [v10 03/12] dpif-netdev: Add study function to select the best mfex function

2021-07-12 Thread Kumar Amber
The study function runs all the available implementations
of miniflow_extract, picks the one whose hitmask has the
most hits, and sets the mfex to that function.

Study can be run at runtime using the following command:

$ ovs-appctl dpif-netdev/miniflow-parser-set study
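
The selection step described above can be sketched roughly as follows. This
is only an illustration, not the code from the patch: study_pick_best() and
its arguments are hypothetical names, and the sketch assumes the per
implementation hit counts have already been accumulated.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not part of the patch): returns the index of the
 * implementation with the highest hit count, or -1 if no implementation
 * classified any packet. On a tie, the lowest index wins. */
int
study_pick_best(const uint32_t *hitcount, size_t n_impls)
{
    assert(hitcount != NULL);

    int best = -1;
    uint32_t best_hits = 0;

    for (size_t i = 0; i < n_impls; i++) {
        if (hitcount[i] > best_hits) {
            best_hits = hitcount[i];
            best = (int) i;
        }
    }
    return best;
}
```

Returning -1 when nothing was hit leaves the caller free to keep the scalar
default; that choice is an assumption of this sketch.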

Signed-off-by: Kumar Amber 
Co-authored-by: Harry van Haaren 
Signed-off-by: Harry van Haaren 
Acked-by: Eelco Chaudron 

---
v10:
- fix minor comments from Eelco
v9:
- fix comments Flavio
v8:
- fix review comments Flavio
v7:
- fix review comments(Eelco)
v5:
- fix review comments(Ian, Flavio, Eelco)
- add Atomic set in study
---
---
 NEWS  |   3 +
 lib/automake.mk   |   1 +
 lib/dpif-netdev-extract-study.c   | 136 ++
 lib/dpif-netdev-private-extract.c |  12 +++
 lib/dpif-netdev-private-extract.h |  19 +
 5 files changed, 171 insertions(+)
 create mode 100644 lib/dpif-netdev-extract-study.c

diff --git a/NEWS b/NEWS
index cf254bcfe..4a7b89409 100644
--- a/NEWS
+++ b/NEWS
@@ -35,6 +35,9 @@ Post-v2.15.0
  * Add command line option to switch between MFEX function pointers.
  * Add miniflow extract auto-validator function to compare different
miniflow extract implementations against default implementation.
+ * Add study function to miniflow function table which studies packet
+   and automatically chooses the best miniflow implementation for that
+   traffic.
- ovs-ctl:
  * New option '--no-record-hostname' to disable hostname configuration
in ovsdb on startup.
diff --git a/lib/automake.mk b/lib/automake.mk
index 53b8abc0f..f4f36325e 100644
--- a/lib/automake.mk
+++ b/lib/automake.mk
@@ -107,6 +107,7 @@ lib_libopenvswitch_la_SOURCES = \
lib/dp-packet.h \
lib/dp-packet.c \
lib/dpdk.h \
+   lib/dpif-netdev-extract-study.c \
lib/dpif-netdev-lookup.h \
lib/dpif-netdev-lookup.c \
lib/dpif-netdev-lookup-autovalidator.c \
diff --git a/lib/dpif-netdev-extract-study.c b/lib/dpif-netdev-extract-study.c
new file mode 100644
index 0..eddb35682
--- /dev/null
+++ b/lib/dpif-netdev-extract-study.c
@@ -0,0 +1,136 @@
+/*
+ * Copyright (c) 2021 Intel.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at:
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+#include 
+#include 
+#include 
+#include 
+
+#include "dpif-netdev-private-thread.h"
+#include "openvswitch/vlog.h"
+#include "ovs-thread.h"
+
+VLOG_DEFINE_THIS_MODULE(dpif_mfex_extract_study);
+
+static atomic_uint32_t mfex_study_pkts_count = 0;
+
+/* Struct to hold miniflow study stats. */
+struct study_stats {
+uint32_t pkt_count;
+uint32_t impl_hitcount[MFEX_IMPL_MAX];
+};
+
+/* Define per thread data to hold the study stats. */
+DEFINE_PER_THREAD_MALLOCED_DATA(struct study_stats *, study_stats);
+
+/* Allocate per thread PMD pointer space for study_stats. */
+static inline struct study_stats *
+mfex_study_get_study_stats_ptr(void)
+{
+struct study_stats *stats = study_stats_get();
+if (OVS_UNLIKELY(!stats)) {
+   stats = xzalloc(sizeof *stats);
+   study_stats_set_unsafe(stats);
+}
+return stats;
+}
+
+uint32_t
+mfex_study_traffic(struct dp_packet_batch *packets,
+   struct netdev_flow_key *keys,
+   uint32_t keys_size, odp_port_t in_port,
+   struct dp_netdev_pmd_thread *pmd_handle)
+{
+uint32_t hitmask = 0;
+uint32_t mask = 0;
+struct dp_netdev_pmd_thread *pmd = pmd_handle;
+struct dpif_miniflow_extract_impl *miniflow_funcs;
+struct study_stats *stats = mfex_study_get_study_stats_ptr();
+miniflow_funcs = dpif_mfex_impl_info_get();
+
+/* Run traffic optimized miniflow_extract to collect the hitmask
+ * to be compared after certain packets have been hit to choose
+ * the best miniflow_extract version for that traffic.
+ */
+for (int i = MFEX_IMPL_START_IDX; i < MFEX_IMPL_MAX; i++) {
+if (!miniflow_funcs[i].available) {
+continue;
+}
+
+hitmask = miniflow_funcs[i].extract_func(packets, keys, keys_size,
+ in_port, pmd_handle);
+stats->impl_hitcount[i] += count_1bits(hitmask);
+
+/* If traffic is not classified then we don't overwrite the keys
+ * array in miniflow implementations, so it's safe to create a
+ * mask for all those packets whose miniflows have been created.
+ */
+mask |= hitmask;
+}
+
+

[ovs-dev] [v10 02/12] dpif-netdev: Add auto validation function for miniflow extract

2021-07-12 Thread Kumar Amber
This patch introduces the auto-validation function, which
compares the results that different miniflow implementations
produce for a batch of packets against the linear (scalar)
miniflow extract, and returns a hitmask.

The autovalidator function can be triggered at runtime using the
following command:

$ ovs-appctl dpif-netdev/miniflow-parser-set autovalidator
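
The comparison loop can be pictured with the simplified sketch below. It is
not the patch code: struct sketch_key stands in for the real key type,
sketch_validate() is a hypothetical name, and __builtin_ctz() is a GCC/Clang
builtin used here in place of the raw_ctz() helper seen in the diff.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a per-packet extraction result. */
struct sketch_key {
    uint64_t bits;
    uint64_t blocks[4];
};

/* Compares the optimized results in 'test' against the reference results in
 * 'good', but only for the packets whose bit is set in 'hit_mask'. Returns a
 * bitmask of the packets whose results differ. */
uint32_t
sketch_validate(const struct sketch_key *good, const struct sketch_key *test,
                uint32_t hit_mask)
{
    assert(good != NULL && test != NULL);

    uint32_t failed = 0;

    while (hit_mask) {
        /* Index of the lowest set bit. */
        uint32_t i = (uint32_t) __builtin_ctz(hit_mask);
        /* Clear that bit so the loop terminates. */
        hit_mask &= hit_mask - 1;

        if (good[i].bits != test[i].bits
            || memcmp(good[i].blocks, test[i].blocks,
                      sizeof good[i].blocks)) {
            failed |= UINT32_C(1) << i;
        }
    }
    return failed;
}
```

Packets that an implementation did not classify are skipped, since their
keys were never written.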

Signed-off-by: Kumar Amber 
Co-authored-by: Harry van Haaren 
Signed-off-by: Harry van Haaren 

---
v9:
- fix review comments Flavio
v6:
-fix review comments(Eelco)
v5:
- fix review comments(Ian, Flavio, Eelco)
- remove ovs assert and switch to default after a batch of packets
  is processed
- Atomic set and get introduced
- fix raw_ctz for windows build
---
---
 NEWS  |   2 +
 lib/dpif-netdev-private-extract.c | 150 ++
 lib/dpif-netdev-private-extract.h |  22 +
 lib/dpif-netdev.c |   2 +-
 4 files changed, 175 insertions(+), 1 deletion(-)

diff --git a/NEWS b/NEWS
index b0f08e96d..cf254bcfe 100644
--- a/NEWS
+++ b/NEWS
@@ -33,6 +33,8 @@ Post-v2.15.0
CPU supports it. This enhances performance by using the native vpopcount
instructions, instead of the emulated version of vpopcount.
  * Add command line option to switch between MFEX function pointers.
+ * Add miniflow extract auto-validator function to compare different
+   miniflow extract implementations against default implementation.
- ovs-ctl:
  * New option '--no-record-hostname' to disable hostname configuration
in ovsdb on startup.
diff --git a/lib/dpif-netdev-private-extract.c 
b/lib/dpif-netdev-private-extract.c
index 11d2ed2ec..6c5afd13d 100644
--- a/lib/dpif-netdev-private-extract.c
+++ b/lib/dpif-netdev-private-extract.c
@@ -38,6 +38,11 @@ static ATOMIC(miniflow_extract_func) default_mfex_func = 
NULL;
  */
 static struct dpif_miniflow_extract_impl mfex_impls[] = {
 
+[MFEX_IMPL_AUTOVALIDATOR] = {
+.probe = NULL,
+.extract_func = dpif_miniflow_extract_autovalidator,
+.name = "autovalidator", },
+
 [MFEX_IMPL_SCALAR] = {
 .probe = NULL,
 .extract_func = NULL,
@@ -155,3 +160,148 @@ dp_mfex_impl_get_by_name(const char *name, 
miniflow_extract_func *out_func)
 
 return -ENOENT;
 }
+
+uint32_t
+dpif_miniflow_extract_autovalidator(struct dp_packet_batch *packets,
+struct netdev_flow_key *keys,
+uint32_t keys_size, odp_port_t in_port,
+struct dp_netdev_pmd_thread *pmd_handle)
+{
+const size_t cnt = dp_packet_batch_size(packets);
+uint16_t good_l2_5_ofs[NETDEV_MAX_BURST];
+uint16_t good_l3_ofs[NETDEV_MAX_BURST];
+uint16_t good_l4_ofs[NETDEV_MAX_BURST];
+uint16_t good_l2_pad_size[NETDEV_MAX_BURST];
+struct dp_packet *packet;
+struct dp_netdev_pmd_thread *pmd = pmd_handle;
+struct netdev_flow_key test_keys[NETDEV_MAX_BURST];
+
+if (keys_size < cnt) {
+miniflow_extract_func default_func = NULL;
+atomic_uintptr_t *pmd_func = (void *)&pmd->miniflow_extract_opt;
+atomic_store_relaxed(pmd_func, (uintptr_t) default_func);
+VLOG_ERR("Invalid key size supplied, Key_size: %d less than "
+ "batch_size:  %" PRIuSIZE"\n", keys_size, cnt);
+VLOG_ERR("Autovalidator is disabled.\n");
+return 0;
+}
+
+/* Run scalar miniflow_extract to get default result. */
+DP_PACKET_BATCH_FOR_EACH (i, packet, packets) {
+pkt_metadata_init(&packet->md, in_port);
+miniflow_extract(packet, &keys[i].mf);
+
+/* Store known good metadata to compare with optimized metadata. */
+good_l2_5_ofs[i] = packet->l2_5_ofs;
+good_l3_ofs[i] = packet->l3_ofs;
+good_l4_ofs[i] = packet->l4_ofs;
+good_l2_pad_size[i] = packet->l2_pad_size;
+}
+
+uint32_t batch_failed = 0;
+/* Iterate through each version of miniflow implementations. */
+for (int j = MFEX_IMPL_START_IDX; j < MFEX_IMPL_MAX; j++) {
+if (!mfex_impls[j].available) {
+continue;
+}
+/* Reset keys and offsets before each implementation. */
+memset(test_keys, 0, keys_size * sizeof(struct netdev_flow_key));
+DP_PACKET_BATCH_FOR_EACH (i, packet, packets) {
+dp_packet_reset_offsets(packet);
+}
+/* Call optimized miniflow for each batch of packet. */
+uint32_t hit_mask = mfex_impls[j].extract_func(packets, test_keys,
+   keys_size, in_port,
+   pmd_handle);
+
+/* Do a miniflow compare for bits, blocks and offsets for all the
+ * classified packets in the hitmask marked by set bits. */
+while (hit_mask) {
+/* Index for the set bit. */
+uint32_t i = raw_ctz(hit_mask);
+/* Set the index in hitmask to Zero. */
+

[ovs-dev] [v10 01/12] dpif-netdev: Add command line and function pointer for miniflow extract

2021-07-12 Thread Kumar Amber
This patch introduces the MFEX function pointers, which allow
the user to switch between the different miniflow extract implementations
that OVS provides, each optimized for a CPU ISA.

The user can query the miniflow extract variants available
for that CPU with the following command:

$ ovs-appctl dpif-netdev/miniflow-parser-get

Similarly, a user can set the miniflow implementation with the following
command:

$ ovs-appctl dpif-netdev/miniflow-parser-set name

This gives the user more performance and the flexibility to choose
the miniflow implementation according to their needs.
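
As a rough illustration of the idea, a table of named implementations with a
lookup by name could look like the sketch below. All sketch_* names, the
simplified function signature, and the -ENODEV return for an unavailable
entry are assumptions made for this example; only the -ENOENT return for an
unknown name mirrors what the diff shows.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simplified, hypothetical extract signature. */
typedef uint32_t (*sketch_extract_func)(const uint8_t *pkt, size_t len);

uint32_t
sketch_scalar(const uint8_t *pkt, size_t len)
{
    (void) pkt;
    return len ? 1 : 0;
}

uint32_t
sketch_vector(const uint8_t *pkt, size_t len)
{
    (void) pkt;
    return len ? 1 : 0;
}

struct sketch_impl {
    const char *name;
    sketch_extract_func func;
    int available;              /* Result of a CPU probe, cached. */
};

struct sketch_impl sketch_impls[] = {
    { "scalar", sketch_scalar, 1 },
    { "vector", sketch_vector, 0 },   /* Pretend the CPU lacks the ISA. */
};

/* Looks up 'name' and stores its function pointer in '*out_func'. */
int
sketch_impl_get_by_name(const char *name, sketch_extract_func *out_func)
{
    assert(name != NULL && out_func != NULL);

    for (size_t i = 0; i < sizeof sketch_impls / sizeof sketch_impls[0]; i++) {
        if (strcmp(sketch_impls[i].name, name) == 0) {
            if (!sketch_impls[i].available) {
                return -ENODEV;
            }
            *out_func = sketch_impls[i].func;
            return 0;
        }
    }
    return -ENOENT;
}
```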

Signed-off-by: Kumar Amber 
Co-authored-by: Harry van Haaren 
Signed-off-by: Harry van Haaren 

---
v10:
- fix build errors
- rework default set and atomic global variable
v9:
- fix review comments from Flavio
v7:
- fix review comments(Eelco, Flavio)
v5:
- fix review comments(Ian, Flavio, Eelco)
- add enum to hold mfex indexes
- add new get and set implementations
- add Atomic set and get
---
---
 NEWS  |   1 +
 lib/automake.mk   |   2 +
 lib/dpif-netdev-avx512.c  |  31 +-
 lib/dpif-netdev-private-extract.c | 157 ++
 lib/dpif-netdev-private-extract.h | 113 +
 lib/dpif-netdev-private-thread.h  |   8 ++
 lib/dpif-netdev.c | 108 +++-
 7 files changed, 415 insertions(+), 5 deletions(-)
 create mode 100644 lib/dpif-netdev-private-extract.c
 create mode 100644 lib/dpif-netdev-private-extract.h

diff --git a/NEWS b/NEWS
index 6cdccc715..b0f08e96d 100644
--- a/NEWS
+++ b/NEWS
@@ -32,6 +32,7 @@ Post-v2.15.0
  * Enable the AVX512 DPCLS implementation to use VPOPCNT instruction if the
CPU supports it. This enhances performance by using the native vpopcount
instructions, instead of the emulated version of vpopcount.
+ * Add command line option to switch between MFEX function pointers.
- ovs-ctl:
  * New option '--no-record-hostname' to disable hostname configuration
in ovsdb on startup.
diff --git a/lib/automake.mk b/lib/automake.mk
index 3c9523c1a..53b8abc0f 100644
--- a/lib/automake.mk
+++ b/lib/automake.mk
@@ -118,6 +118,8 @@ lib_libopenvswitch_la_SOURCES = \
lib/dpif-netdev-private-dpcls.h \
lib/dpif-netdev-private-dpif.c \
lib/dpif-netdev-private-dpif.h \
+   lib/dpif-netdev-private-extract.c \
+   lib/dpif-netdev-private-extract.h \
lib/dpif-netdev-private-flow.h \
lib/dpif-netdev-private-thread.h \
lib/dpif-netdev-private.h \
diff --git a/lib/dpif-netdev-avx512.c b/lib/dpif-netdev-avx512.c
index 6f9aa8284..7772b7abf 100644
--- a/lib/dpif-netdev-avx512.c
+++ b/lib/dpif-netdev-avx512.c
@@ -149,6 +149,15 @@ dp_netdev_input_outer_avx512(struct dp_netdev_pmd_thread 
*pmd,
  * // do all processing (HWOL->MFEX->EMC->SMC)
  * }
  */
+
+/* Do a batch miniflow extract into keys. */
+uint32_t mf_mask = 0;
+miniflow_extract_func mfex_func;
+atomic_read_relaxed(&pmd->miniflow_extract_opt, &mfex_func);
+if (mfex_func) {
+mf_mask = mfex_func(packets, keys, batch_size, in_port, pmd);
+}
+
 uint32_t lookup_pkts_bitmask = (1ULL << batch_size) - 1;
 uint32_t iter = lookup_pkts_bitmask;
 while (iter) {
@@ -167,6 +176,13 @@ dp_netdev_input_outer_avx512(struct dp_netdev_pmd_thread 
*pmd,
 pkt_metadata_init(&packet->md, in_port);
 
 struct dp_netdev_flow *f = NULL;
+struct netdev_flow_key *key = &keys[i];
+
+/* Check the miniflow mask to see if the packet was correctly
+ * classified by vector mfex else do a scalar miniflow extract
+ * for that packet.
+ */
+bool mfex_hit = !!(mf_mask & (1 << i));
 
 /* Check for a partial hardware offload match. */
 if (hwol_enabled) {
@@ -177,7 +193,13 @@ dp_netdev_input_outer_avx512(struct dp_netdev_pmd_thread 
*pmd,
 }
 if (f) {
 rules[i] = &f->cr;
-pkt_meta[i].tcp_flags = parse_tcp_flags(packet);
+/* If AVX512 MFEX already classified the packet, use it. */
+if (mfex_hit) {
+pkt_meta[i].tcp_flags = miniflow_get_tcp_flags(&key->mf);
+} else {
+pkt_meta[i].tcp_flags = parse_tcp_flags(packet);
+}
+
 pkt_meta[i].bytes = dp_packet_size(packet);
 phwol_hits++;
 hwol_emc_smc_hitmask |= (1 << i);
@@ -185,9 +207,10 @@ dp_netdev_input_outer_avx512(struct dp_netdev_pmd_thread 
*pmd,
 }
 }
 
-/* Do miniflow extract into keys. */
-struct netdev_flow_key *key = &keys[i];
-miniflow_extract(packet, &key->mf);
+if (!mfex_hit) {
+/* Do a scalar miniflow extract into keys. */
+miniflow_extract(packet, &key->mf);
+}
 
 /* Cache TCP and byte values for all packets. */
 pkt_meta[i].bytes = 

[ovs-dev] [109 12/12] dpif-netdev: add mfex options to scalar dpif

2021-07-12 Thread kumar Amber
This commit adds the mfex optimized options to be
executed as part of scalar DPIF.
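
The per-packet control flow this patch adds can be summarized with the
simplified sketch below: try the optimized function on a batch of one, and
fall back to the scalar extract on a miss or when the metadata is already
valid. The sketch_* names and the toy "key" are hypothetical and only stand
in for the real types.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical optimized extract for one packet: returns a non-zero
 * hitmask if it recognized the packet and wrote the key. */
typedef uint32_t (*sketch_opt_func)(const uint8_t *pkt, uint32_t *key);

/* Scalar path: handles any packet. The "key" is just the first byte. */
void
sketch_scalar_extract(const uint8_t *pkt, uint32_t *key)
{
    *key = pkt[0];
}

/* Toy optimized path: only recognizes packets starting with 0x45. */
uint32_t
sketch_opt_extract(const uint8_t *pkt, uint32_t *key)
{
    if (pkt[0] != 0x45) {
        return 0;
    }
    *key = pkt[0];
    return 1;
}

/* Returns 1 if the optimized path produced the key, 0 if the scalar
 * fallback was used. */
int
sketch_extract_one(sketch_opt_func opt, int md_is_valid,
                   const uint8_t *pkt, uint32_t *key)
{
    assert(pkt != NULL && key != NULL);

    if (opt && !md_is_valid && opt(pkt, key)) {
        return 1;
    }
    sketch_scalar_extract(pkt, key);
    return 0;
}
```

The return value corresponds to what the patch counts in n_mfex_opt_hit.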

Signed-off-by: kumar Amber 
Acked-by: Flavio Leitner 
---
 lib/dpif-netdev.c | 23 ++-
 1 file changed, 22 insertions(+), 1 deletion(-)

diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index cca211837..14c98e450 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -7029,6 +7029,7 @@ dfc_processing(struct dp_netdev_pmd_thread *pmd,
 size_t n_missed = 0, n_emc_hit = 0, n_phwol_hit = 0,  n_mfex_opt_hit = 0;
 struct dfc_cache *cache = &pmd->flow_cache;
 struct dp_packet *packet;
+struct dp_packet_batch single_packet;
 const size_t cnt = dp_packet_batch_size(packets_);
 uint32_t cur_min = pmd->ctx.emc_insert_min;
 const uint32_t recirc_depth = *recirc_depth_get();
@@ -7039,6 +7040,11 @@ dfc_processing(struct dp_netdev_pmd_thread *pmd,
 size_t map_cnt = 0;
 bool batch_enable = true;
 
+single_packet.count = 1;
+
+miniflow_extract_func mfex_func;
+atomic_read_relaxed(&pmd->miniflow_extract_opt, &mfex_func);
+
 atomic_read_relaxed(&pmd->dp->smc_enable_db, &smc_enable_db);
 pmd_perf_update_counter(&pmd->perf_stats,
 md_is_valid ? PMD_STAT_RECIRC : PMD_STAT_RECV,
@@ -7089,7 +7095,22 @@ dfc_processing(struct dp_netdev_pmd_thread *pmd,
 }
 }
 
-miniflow_extract(packet, &key->mf);
+/* Set the count and packet for miniflow_opt with batch_size 1. */
+if ((mfex_func) && (!md_is_valid)) {
+single_packet.packets[0] = packet;
+int mf_ret;
+
+mf_ret = mfex_func(&single_packet, key, 1, port_no, pmd);
+/* Fallback to original miniflow_extract if there is a miss. */
+if (mf_ret) {
+n_mfex_opt_hit++;
+} else {
+miniflow_extract(packet, &key->mf);
+}
+} else {
+miniflow_extract(packet, &key->mf);
+}
+
 key->len = 0; /* Not computed yet. */
 key->hash =
 (md_is_valid == false)
-- 
2.25.1



[ovs-dev] [109 11/12] dpif-netdev/mfex: add more AVX512 traffic profiles

2021-07-12 Thread kumar Amber
From: Harry van Haaren 

This commit adds 3 new traffic profile implementations to the
existing avx512 miniflow extract infrastructure. The profiles added are:
- Ether()/IP()/TCP()
- Ether()/Dot1Q()/IP()/UDP()
- Ether()/Dot1Q()/IP()/TCP()

The design of the avx512 code here aims for scalability, both for adding
more traffic profiles and for enabling new CPU ISAs. Note that an
implementation primarily adds static const data, which the compiler then
specializes away when the profile-specific function is declared below.

As a result, the code is relatively maintainable and scalable for new
traffic profiles as well as new ISAs, and does not lower performance
compared with manually written code for each profile/ISA.

Note that confidence in the correctness of each implementation is
achieved through autovalidation, unit tests with known packets, and
fuzz tested packets.
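
To illustrate what probing for a traffic profile means, here is a scalar
sketch of the mask-and-compare step. It is not the AVX512 code from the
patch, which does this on a whole register at once; sketch_probe() and its
parameters are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if the first 'n' bytes of 'pkt', ANDed with 'mask', equal
 * 'pattern', and 0 otherwise. Bytes with a zero mask are "don't care". */
int
sketch_probe(const uint8_t *pkt, size_t len,
             const uint8_t *mask, const uint8_t *pattern, size_t n)
{
    assert(pkt != NULL && mask != NULL && pattern != NULL);

    if (len < n) {
        return 0;
    }
    for (size_t i = 0; i < n; i++) {
        if ((pkt[i] & mask[i]) != pattern[i]) {
            return 0;
        }
    }
    return 1;
}
```

For an Ether()/IP()/UDP() profile, the mask would cover the EtherType bytes
(offsets 12 and 13, value 0x0800), the IPv4 version/IHL byte (offset 14,
value 0x45) and the protocol byte (offset 23, value 0x11 for UDP). A TCP
profile differs only in the protocol value (0x06), which is why adding a
profile is mostly a matter of adding constant data.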

Signed-off-by: Harry van Haaren 
Acked-by: Eelco Chaudron 
Acked-by: Flavio Leitner 
---

Hi Readers,

If you have a traffic profile you'd like to see accelerated using
avx512 code, please send me an email and we can collaborate on adding
support for it!

Regards, -Harry

---

v5:
- fix review comments(Ian, Flavio, Eelco)
---
---
 NEWS  |   2 +
 lib/dpif-netdev-extract-avx512.c  | 152 ++
 lib/dpif-netdev-private-extract.c |  30 ++
 lib/dpif-netdev-private-extract.h |  10 ++
 4 files changed, 194 insertions(+)

diff --git a/NEWS b/NEWS
index 26cd85978..849008a80 100644
--- a/NEWS
+++ b/NEWS
@@ -41,6 +41,8 @@ Post-v2.15.0
  * Add build time configure command to enable auto-validatior as default
miniflow implementation at build time.
  * Cache results for CPU ISA checks, reduces overhead on repeated lookups.
+ * Add AVX512 based optimized miniflow extract function for traffic type
+   IPv4/UDP, IPv4/TCP, Vlan/IPv4/UDP and Vlan/Ipv4/TCP.
- ovs-ctl:
  * New option '--no-record-hostname' to disable hostname configuration
in ovsdb on startup.
diff --git a/lib/dpif-netdev-extract-avx512.c b/lib/dpif-netdev-extract-avx512.c
index c06e53582..ecb0be70d 100644
--- a/lib/dpif-netdev-extract-avx512.c
+++ b/lib/dpif-netdev-extract-avx512.c
@@ -136,6 +136,13 @@ _mm512_maskz_permutexvar_epi8_wrap(__mmask64 kmask, 
__m512i idx, __m512i a)
 
 #define PATTERN_ETHERTYPE_MASK PATTERN_ETHERTYPE_GEN(0xFF, 0xFF)
 #define PATTERN_ETHERTYPE_IPV4 PATTERN_ETHERTYPE_GEN(0x08, 0x00)
+#define PATTERN_ETHERTYPE_DT1Q PATTERN_ETHERTYPE_GEN(0x81, 0x00)
+
+/* VLAN (Dot1Q) patterns and masks. */
+#define PATTERN_DT1Q_MASK   \
+  0x00, 0x00, 0xFF, 0xFF,
+#define PATTERN_DT1Q_IPV4   \
+  0x00, 0x00, 0x08, 0x00,
 
 /* Generator for checking IPv4 ver, ihl, and proto */
 #define PATTERN_IPV4_GEN(VER_IHL, FLAG_OFF_B0, FLAG_OFF_B1, PROTO) \
@@ -161,6 +168,29 @@ _mm512_maskz_permutexvar_epi8_wrap(__mmask64 kmask, 
__m512i idx, __m512i a)
   34, 35, 36, 37, NU, NU, NU, NU, NU, NU, NU, NU, NU, NU, NU, NU, /* UDP */   \
   NU, NU, NU, NU, NU, NU, NU, NU, NU, NU, NU, NU, NU, NU, NU, NU, /* Unused. */
 
+/* TCP shuffle: tcp_ctl bits require mask/processing, not included here. */
+#define PATTERN_IPV4_TCP_SHUFFLE \
+   0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, NU, NU, /* Ether */ \
+  26, 27, 28, 29, 30, 31, 32, 33, NU, NU, NU, NU, 20, 15, 22, 23, /* IPv4 */  \
+  NU, NU, NU, NU, NU, NU, NU, NU, 34, 35, 36, 37, NU, NU, NU, NU, /* TCP */   \
+  NU, NU, NU, NU, NU, NU, NU, NU, NU, NU, NU, NU, NU, NU, NU, NU, /* Unused. */
+
+#define PATTERN_DT1Q_IPV4_UDP_SHUFFLE \
+  /* Ether (2 blocks): Note that *VLAN* type is written here. */  \
+  0,  1,  2,  3,  4,  5,  6,  7, 8,  9, 10, 11, 16, 17,  0,  0,   \
+  /* VLAN (1 block): Note that the *EtherHdr->Type* is written here. */   \
+  12, 13, 14, 15, 0, 0, 0, 0, \
+  30, 31, 32, 33, 34, 35, 36, 37, 0, 0, 0, 0, 24, 19, 26, 27, /* IPv4 */  \
+  38, 39, 40, 41, NU, NU, NU, NU, /* UDP */
+
+#define PATTERN_DT1Q_IPV4_TCP_SHUFFLE \
+  /* Ether (2 blocks): Note that *VLAN* type is written here. */  \
+  0,  1,  2,  3,  4,  5,  6,  7, 8,  9, 10, 11, 16, 17,  0,  0,   \
+  /* VLAN (1 block): Note that the *EtherHdr->Type* is written here. */   \
+  12, 13, 14, 15, 0, 0, 0, 0, \
+  30, 31, 32, 33, 34, 35, 36, 37, 0, 0, 0, 0, 24, 19, 26, 27, /* IPv4 */  \
+  NU, NU, NU, NU, NU, NU, NU, NU, 38, 39, 40, 41, NU, NU, NU, NU, /* TCP */   \
+  NU, NU, NU, NU, NU, NU, NU, NU, /* Unused. */
 
 /* Generation of K-mask bitmask values, to zero out data in result. Note that
  * these correspond 1:1 to the above "*_SHUFFLE" values, and bit used must be
@@ -170,12 +200,22 @@ _mm512_maskz_permutexvar_epi8_wrap(__mmask64 kmask, 
__m512i idx, __m512i a)
  * 

[ovs-dev] [109 10/12] dpif-netdev/mfex: Add AVX512 based optimized miniflow extract

2021-07-12 Thread kumar Amber
From: Harry van Haaren 

This commit adds AVX512 implementations of miniflow extract.
By using the 64 bytes available in an AVX512 register, it is
possible to convert a packet to a miniflow data-structure in
a small number of instructions.

The implementation here probes for Ether()/IP()/UDP() traffic,
and builds the appropriate miniflow data-structure for packets
that match the probe.

The implementation here is auto-validated by the miniflow
extract autovalidator, hence its correctness can be easily
tested and verified.

Note that this commit is designed to easily allow addition of new
traffic profiles in a scalable way, without code duplication for
each traffic profile.
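
The shuffle step can be emulated in scalar C to show the data movement. This
sketch is only a model of a masked byte permute, not the intrinsic-based
code in the patch; sketch_shuffle() is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Scalar model of a zero-masking byte shuffle: dst[i] = src[idx[i]] when
 * bit i of 'kmask' is set, and 0 otherwise. 'n' is limited to 64 to match
 * the 64 byte lanes of an AVX512 register. */
void
sketch_shuffle(const uint8_t *src, size_t src_len, const uint8_t *idx,
               uint64_t kmask, uint8_t *dst, size_t n)
{
    assert(src != NULL && idx != NULL && dst != NULL);
    assert(n <= 64);

    for (size_t i = 0; i < n; i++) {
        int keep = (int) ((kmask >> i) & 1);

        dst[i] = (keep && idx[i] < src_len) ? src[idx[i]] : 0;
    }
}
```

With a fixed index table and k-mask per traffic profile, one such permute
places every header byte where the miniflow expects it, without branching
on the protocol.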

Signed-off-by: Harry van Haaren 

---
v9:
- include comments from flavio
v8:
- include documentation on AVX512 MFEX as per Eelco's suggestion
v7:
- fix minor review sentences (Eelco)
v5:
- fix review comments(Ian, Flavio, Eelco)
- include assert for flow abi change
- include assert for offset changes
---
---
 lib/automake.mk   |   1 +
 lib/dpif-netdev-extract-avx512.c  | 478 ++
 lib/dpif-netdev-private-extract.c |  13 +
 lib/dpif-netdev-private-extract.h |  30 ++
 4 files changed, 522 insertions(+)
 create mode 100644 lib/dpif-netdev-extract-avx512.c

diff --git a/lib/automake.mk b/lib/automake.mk
index f4f36325e..299f81939 100644
--- a/lib/automake.mk
+++ b/lib/automake.mk
@@ -39,6 +39,7 @@ lib_libopenvswitchavx512_la_CFLAGS = \
$(AM_CFLAGS)
 lib_libopenvswitchavx512_la_SOURCES = \
lib/dpif-netdev-lookup-avx512-gather.c \
+   lib/dpif-netdev-extract-avx512.c \
lib/dpif-netdev-avx512.c
 lib_libopenvswitchavx512_la_LDFLAGS = \
-static
diff --git a/lib/dpif-netdev-extract-avx512.c b/lib/dpif-netdev-extract-avx512.c
new file mode 100644
index 0..c06e53582
--- /dev/null
+++ b/lib/dpif-netdev-extract-avx512.c
@@ -0,0 +1,478 @@
+/*
+ * Copyright (c) 2021 Intel.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at:
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+/*
+ * AVX512 Miniflow Extract.
+ *
+ * This file contains optimized implementations of miniflow_extract()
+ * for specific common traffic patterns. The optimizations allow for
+ * quick probing of a specific packet type, and if a match with a specific
+ * type is found, a shuffle like procedure builds up the required miniflow.
+ *
+ * Process
+ * -
+ *
+ * The procedure is to classify the packet based on the traffic type
+ * using predefined bit-masks and arrange the packet header data using shuffle
+ * instructions to a pre-defined place as required by the miniflow.
+ * This eliminates the if-else ladder to identify the packet data and add data
+ * as per protocol which is present.
+ */
+
+#ifdef __x86_64__
+/* Sparse cannot handle the AVX512 instructions. */
+#if !defined(__CHECKER__)
+
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "flow.h"
+#include "dpdk.h"
+
+#include "dpif-netdev-private-dpcls.h"
+#include "dpif-netdev-private-extract.h"
+#include "dpif-netdev-private-flow.h"
+
+/* AVX512-BW level permutex2var_epi8 emulation. */
+static inline __m512i
+__attribute__((target("avx512bw")))
+_mm512_maskz_permutex2var_epi8_skx(__mmask64 k_mask,
+   __m512i v_data_0,
+   __m512i v_shuf_idxs,
+   __m512i v_data_1)
+{
+/* Manipulate shuffle indexes for u16 size. */
+__mmask64 k_mask_odd_lanes = 0xAAAAAAAAAAAAAAAA;
+/* Clear away ODD lane bytes. Cannot be done above due to no u8 shift. */
+__m512i v_shuf_idx_evn = _mm512_mask_blend_epi8(k_mask_odd_lanes,
+v_shuf_idxs,
+_mm512_setzero_si512());
+v_shuf_idx_evn = _mm512_srli_epi16(v_shuf_idx_evn, 1);
+
+__m512i v_shuf_idx_odd = _mm512_srli_epi16(v_shuf_idxs, 9);
+
+/* Shuffle each half at 16-bit width. */
+__m512i v_shuf1 = _mm512_permutex2var_epi16(v_data_0, v_shuf_idx_evn,
+v_data_1);
+__m512i v_shuf2 = _mm512_permutex2var_epi16(v_data_0, v_shuf_idx_odd,
+v_data_1);
+
+/* Find if the shuffle index was odd, via mask and compare. */
+uint16_t index_odd_mask = 0x1;
+const __m512i v_index_mask_u16 = _mm512_set1_epi16(index_odd_mask);
+
+/* EVEN lanes, find if u8 index was odd,  result as u16 bitmask. */
+__m512i 

[ovs-dev] [109 09/12] dpdk: add additional CPU ISA detection strings

2021-07-12 Thread kumar Amber
From: Harry van Haaren 

This commit enables OVS to check at runtime for more detailed
AVX512 capabilities, specifically Byte and Word (BW) extensions,
and Vector Bit Manipulation Instructions (VBMI).

These instructions will be used in the CPU ISA optimized
implementations of traffic profile aware miniflow extract.

Signed-off-by: Harry van Haaren 
Acked-by: Eelco Chaudron 
Acked-by: Flavio Leitner 
---
 NEWS   | 1 +
 lib/dpdk.c | 2 ++
 2 files changed, 3 insertions(+)

diff --git a/NEWS b/NEWS
index 581bff225..26cd85978 100644
--- a/NEWS
+++ b/NEWS
@@ -40,6 +40,7 @@ Post-v2.15.0
traffic.
  * Add build time configure command to enable auto-validatior as default
miniflow implementation at build time.
+ * Cache results for CPU ISA checks, reduces overhead on repeated lookups.
- ovs-ctl:
  * New option '--no-record-hostname' to disable hostname configuration
in ovsdb on startup.
diff --git a/lib/dpdk.c b/lib/dpdk.c
index 9de2af58e..1b8f8e55b 100644
--- a/lib/dpdk.c
+++ b/lib/dpdk.c
@@ -706,6 +706,8 @@ dpdk_get_cpu_has_isa(const char *arch, const char *feature)
 #if __x86_64__
 /* CPU flags only defined for the architecture that support it. */
 CHECK_CPU_FEATURE(feature, "avx512f", RTE_CPUFLAG_AVX512F);
+CHECK_CPU_FEATURE(feature, "avx512bw", RTE_CPUFLAG_AVX512BW);
+CHECK_CPU_FEATURE(feature, "avx512vbmi", RTE_CPUFLAG_AVX512VBMI);
 CHECK_CPU_FEATURE(feature, "avx512vpopcntdq", RTE_CPUFLAG_AVX512VPOPCNTDQ);
 CHECK_CPU_FEATURE(feature, "bmi2", RTE_CPUFLAG_BMI2);
 #endif
-- 
2.25.1



[ovs-dev] [109 08/12] dpif/stats: add miniflow extract opt hits counter

2021-07-12 Thread kumar Amber
From: Harry van Haaren 

This commit adds a new counter to be displayed to the user when
requesting datapath packet statistics. It counts the number of
packets that are parsed, and have a miniflow built from them, by
the optimized miniflow extract parsers.

The ovs-appctl command "dpif-netdev/pmd-perf-show" now has an
extra entry indicating if the optimized MFEX was hit:

  - MFEX Opt hits:6786432  (100.0 %)
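
How such a counter and its percentage can be derived is sketched below. The
sketch_* helpers are hypothetical; the zero-passes guard is an addition of
this sketch, and __builtin_popcount() is a GCC/Clang builtin.

```c
#include <assert.h>
#include <stdint.h>

/* Number of packets in a batch that the optimized MFEX classified, given
 * the hitmask it returned (one bit per packet). */
uint32_t
sketch_count_hits(uint32_t mf_mask)
{
    return (uint32_t) __builtin_popcount(mf_mask);
}

/* Percentage of datapath passes that were optimized MFEX hits. Returns 0
 * when there were no passes, to avoid dividing by zero. */
double
sketch_hit_pct(uint64_t hits, uint64_t passes)
{
    assert(hits <= passes);

    return passes ? 100.0 * (double) hits / (double) passes : 0.0;
}
```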

Signed-off-by: Harry van Haaren 
Acked-by: Flavio Leitner 
---
v7:
- fix review comments(Eelco)
v5:
- fix review comments(Ian, Flavio, Eelco)
---
---
 lib/dpif-netdev-avx512.c|  3 +++
 lib/dpif-netdev-perf.c  |  3 +++
 lib/dpif-netdev-perf.h  |  1 +
 lib/dpif-netdev-unixctl.man |  4 
 lib/dpif-netdev.c   | 12 +++-
 tests/pmd.at|  6 --
 6 files changed, 22 insertions(+), 7 deletions(-)

diff --git a/lib/dpif-netdev-avx512.c b/lib/dpif-netdev-avx512.c
index 7772b7abf..544d36903 100644
--- a/lib/dpif-netdev-avx512.c
+++ b/lib/dpif-netdev-avx512.c
@@ -310,8 +310,11 @@ dp_netdev_input_outer_avx512(struct dp_netdev_pmd_thread 
*pmd,
 }
 
 /* At this point we don't return error anymore, so commit stats here. */
+uint32_t mfex_hit_cnt = __builtin_popcountll(mf_mask);
 pmd_perf_update_counter(&pmd->perf_stats, PMD_STAT_RECV, batch_size);
 pmd_perf_update_counter(&pmd->perf_stats, PMD_STAT_PHWOL_HIT, phwol_hits);
+pmd_perf_update_counter(&pmd->perf_stats, PMD_STAT_MFEX_OPT_HIT,
+mfex_hit_cnt);
 pmd_perf_update_counter(&pmd->perf_stats, PMD_STAT_EXACT_HIT, emc_hits);
 pmd_perf_update_counter(&pmd->perf_stats, PMD_STAT_SMC_HIT, smc_hits);
 pmd_perf_update_counter(&pmd->perf_stats, PMD_STAT_MASKED_HIT,
diff --git a/lib/dpif-netdev-perf.c b/lib/dpif-netdev-perf.c
index 7103a2d4d..d7676ea2b 100644
--- a/lib/dpif-netdev-perf.c
+++ b/lib/dpif-netdev-perf.c
@@ -247,6 +247,7 @@ pmd_perf_format_overall_stats(struct ds *str, struct 
pmd_perf_stats *s,
 "  Rx packets:%12"PRIu64"  (%.0f Kpps, %.0f cycles/pkt)\n"
 "  Datapath passes:   %12"PRIu64"  (%.2f passes/pkt)\n"
 "  - PHWOL hits:  %12"PRIu64"  (%5.1f %%)\n"
+"  - MFEX Opt hits:   %12"PRIu64"  (%5.1f %%)\n"
 "  - EMC hits:%12"PRIu64"  (%5.1f %%)\n"
 "  - SMC hits:%12"PRIu64"  (%5.1f %%)\n"
 "  - Megaflow hits:   %12"PRIu64"  (%5.1f %%, %.2f "
@@ -258,6 +259,8 @@ pmd_perf_format_overall_stats(struct ds *str, struct 
pmd_perf_stats *s,
 passes, rx_packets ? 1.0 * passes / rx_packets : 0,
 stats[PMD_STAT_PHWOL_HIT],
 100.0 * stats[PMD_STAT_PHWOL_HIT] / passes,
+stats[PMD_STAT_MFEX_OPT_HIT],
+100.0 * stats[PMD_STAT_MFEX_OPT_HIT] / passes,
 stats[PMD_STAT_EXACT_HIT],
 100.0 * stats[PMD_STAT_EXACT_HIT] / passes,
 stats[PMD_STAT_SMC_HIT],
diff --git a/lib/dpif-netdev-perf.h b/lib/dpif-netdev-perf.h
index 8b1a52387..834c26260 100644
--- a/lib/dpif-netdev-perf.h
+++ b/lib/dpif-netdev-perf.h
@@ -57,6 +57,7 @@ extern "C" {
 
 enum pmd_stat_type {
 PMD_STAT_PHWOL_HIT, /* Packets that had a partial HWOL hit (phwol). */
+PMD_STAT_MFEX_OPT_HIT,  /* Packets that had miniflow optimized match. */
 PMD_STAT_EXACT_HIT, /* Packets that had an exact match (emc). */
 PMD_STAT_SMC_HIT,   /* Packets that had a sig match hit (SMC). */
 PMD_STAT_MASKED_HIT,/* Packets that matched in the flow table. */
diff --git a/lib/dpif-netdev-unixctl.man b/lib/dpif-netdev-unixctl.man
index 83ce4f1c5..f34758416 100644
--- a/lib/dpif-netdev-unixctl.man
+++ b/lib/dpif-netdev-unixctl.man
@@ -16,6 +16,9 @@ packet lookups performed by the datapath. Beware that a recirculated packet
 experiences one additional lookup per recirculation, so there may be
 more lookups than forwarded packets in the datapath.
 
+The MFEX Opt hits counter displays the number of packets that are processed
+by the optimized miniflow extract implementations.
+
 Cycles are counted using the TSC or similar facilities (when available on
 the platform). The duration of one cycle depends on the processing platform.
 
@@ -136,6 +139,7 @@ pmd thread numa_id 0 core_id 1:
   Rx packets: 2399607  (2381 Kpps, 848 cycles/pkt)
   Datapath passes:3599415  (1.50 passes/pkt)
   - PHWOL hits: 0  (  0.0 %)
+  - MFEX Opt hits:3570133  ( 99.5 %)
   - EMC hits:  336472  (  9.3 %)
   - SMC hits:   0  ( 0.0 %)
   - Megaflow hits:3262943  ( 90.7 %, 1.00 subtbl lookups/hit)
diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index 1132a0ad5..cca211837 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -648,6 +648,7 @@ pmd_info_show_stats(struct ds *reply,
   "  packet recirculations: %"PRIu64"\n"
   "  avg. datapath passes per packet: %.02f\n"
   "  phwol hits: %"PRIu64"\n"
+  "  mfex opt hits: 

[ovs-dev] [109 07/12] test/system-dpdk: Add unit test for mfex autovalidator

2021-07-12 Thread kumar Amber
From: Kumar Amber 

Tests:
  6: OVS-DPDK - MFEX Autovalidator
  7: OVS-DPDK - MFEX Autovalidator Fuzzy

Add a new directory to store the PCAP file used
in the tests, and a script that generates the fuzzy traffic
PCAP used in the fuzzy unit test.

Signed-off-by: Kumar Amber 
Acked-by: Flavio Leitner 
---
v7:
- fix review comments(Eelco)
v5:
- fix review comments(Ian, Flavio, Eelco)
- remove sleep from first test and added minor 5 sec sleep to fuzzy
---
---
 Documentation/topics/dpdk/bridge.rst |  55 +++
 tests/.gitignore |   1 +
 tests/automake.mk|   5 +++
 tests/mfex_fuzzy.py  |  31 +++
 tests/pcap/mfex_test.pcap| Bin 0 -> 416 bytes
 tests/system-dpdk.at |  49 
 6 files changed, 141 insertions(+)
 create mode 100755 tests/mfex_fuzzy.py
 create mode 100644 tests/pcap/mfex_test.pcap

diff --git a/Documentation/topics/dpdk/bridge.rst b/Documentation/topics/dpdk/bridge.rst
index 662446401..7b81d0305 100644
--- a/Documentation/topics/dpdk/bridge.rst
+++ b/Documentation/topics/dpdk/bridge.rst
@@ -345,3 +345,58 @@ A compile time option is available in order to test it with the OVS unit
 test suite. Use the following configure option ::
 
 $ ./configure --enable-mfex-default-autovalidator
+
+Unit Test Miniflow Extract
+++++++++++++++++++++++++++
+
+Unit test can also be used to test the workflow mentioned above by running
+the following test-case in tests/system-dpdk.at ::
+
+make check-dpdk TESTSUITEFLAGS='-k MFEX'
+OVS-DPDK - MFEX Autovalidator
+
+The unit test uses multiple traffic types to test the correctness of the
+implementations.
+
+Running Fuzzy test with Autovalidator
++++++++++++++++++++++++++++++++++++++
+
+Fuzzy tests can also be done on miniflow extract with the help of the
+auto-validator and Scapy. The steps below describe how to reproduce
+the setup, with IP being fuzzed to generate packets.
+
+Scapy is used to create fuzzy IP packets and save them into a PCAP ::
+
+pkt = fuzz(Ether()/IP()/TCP())
+
+Set the miniflow extract to autovalidator using ::
+
+$ ovs-appctl dpif-netdev/miniflow-parser-set autovalidator
+
+OVS is configured to receive the generated packets ::
+
+$ ovs-vsctl add-port br0 pcap0 -- \
+set Interface pcap0 type=dpdk \
+options:dpdk-devargs=net_pcap0,rx_pcap=fuzzy.pcap
+
+With this workflow, the autovalidator will ensure that all MFEX
+implementations are classifying each packet in exactly the same way.
+If an optimized MFEX implementation causes a different miniflow to be
+generated, the autovalidator has ovs_assert and logging statements that
+will inform about the issue.
+
+Unit Fuzzy test with Autovalidator
+++++++++++++++++++++++++++++++++++
+
+The prerequisite before running the unit test is to run the provided script ::
+
+tests/mfex_fuzzy.py
+
+This script generates a pcap with multiple types of fuzzed packets to be used
+in the unit test-case below.
+
+Unit test can also be used to test the workflow mentioned above by running
+the following test-case in tests/system-dpdk.at ::
+
+make check-dpdk TESTSUITEFLAGS='-k MFEX'
+OVS-DPDK - MFEX Autovalidator Fuzzy
diff --git a/tests/.gitignore b/tests/.gitignore
index 45b4f67b2..a3d927e5d 100644
--- a/tests/.gitignore
+++ b/tests/.gitignore
@@ -11,6 +11,7 @@
 /ovsdb-cluster-testsuite
 /ovsdb-cluster-testsuite.dir/
 /ovsdb-cluster-testsuite.log
+/pcap/
 /pki/
 /system-afxdp-testsuite
 /system-afxdp-testsuite.dir/
diff --git a/tests/automake.mk b/tests/automake.mk
index f45f8d76c..2bcf054b0 100644
--- a/tests/automake.mk
+++ b/tests/automake.mk
@@ -143,6 +143,11 @@ $(srcdir)/tests/fuzz-regression-list.at: tests/automake.mk
echo "TEST_FUZZ_REGRESSION([$$basename])"; \
done > $@.tmp && mv $@.tmp $@
 
+EXTRA_DIST += $(MFEX_AUTOVALIDATOR_TESTS)
+MFEX_AUTOVALIDATOR_TESTS = \
+   tests/pcap/mfex_test.pcap \
+   tests/mfex_fuzzy.py
+
 OVSDB_CLUSTER_TESTSUITE_AT = \
tests/ovsdb-cluster-testsuite.at \
tests/ovsdb-execution.at \
diff --git a/tests/mfex_fuzzy.py b/tests/mfex_fuzzy.py
new file mode 100755
index 0..395158b0d
--- /dev/null
+++ b/tests/mfex_fuzzy.py
@@ -0,0 +1,31 @@
+#!/usr/bin/python3
+try:
+   from scapy.all import *
+except ModuleNotFoundError as err:
+   print(str(err) + ": Scapy")
+import sys
+
+path = str(sys.argv[1]) + "/pcap/fuzzy.pcap"
+pktdump = PcapWriter(path, append=False, sync=True)
+
+for i in range(0, 2000):
+
+   # Generate random protocol bases, use a fuzz() over the combined packet for full fuzzing.
+   eth = Ether(src=RandMAC(), dst=RandMAC())
+   vlan = Dot1Q()
+   ipv4 = IP(src=RandIP(), dst=RandIP())
+   ipv6 = IPv6(src=RandIP6(), dst=RandIP6())
+   udp = UDP(dport=RandShort(), sport=RandShort())
+   tcp = TCP(dport=RandShort(), sport=RandShort())
+
+   # IPv4 packets with fuzzing
+   pktdump.write(fuzz(eth/ipv4/udp))
+   pktdump.write(fuzz(eth/ipv4/tcp))
+  

[ovs-dev] [109 06/12] dpif-netdev: Add packet count and core id parameters for study

2021-07-12 Thread kumar Amber
From: Kumar Amber 

This commit introduces an additional command line parameter
for the mfex study function. If the user provides a packet count,
study uses it as the minimum number of packets that must be processed
before choosing an implementation; otherwise a default value is chosen.
It also introduces a parameter for choosing a particular PMD core.

$ ovs-appctl dpif-netdev/miniflow-parser-set -pmd 3 study 500
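
For readers following the thread, the count handling described above can be
pictured with a minimal standalone sketch. This is not the code in this patch:
the sketch_* names and the 128-packet default macro are assumptions made for
illustration, and plain C11 atomics stand in for the OVS atomic wrappers.

```c
#include <errno.h>
#include <stdatomic.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical default, mirroring the 128-packet default in the docs. */
#define SKETCH_DEFAULT_STUDY_PKTS 128

/* Global minimum packet count shared by every thread that runs study. */
static _Atomic uint32_t sketch_study_pkts = SKETCH_DEFAULT_STUDY_PKTS;

/* Accept a new count only for the "study" implementation and only if it is
 * non-zero; otherwise leave the current value untouched and return -EINVAL. */
static int
sketch_set_study_pkt_cnt(uint32_t pkt_cnt, const char *name)
{
    if (strcmp(name, "study") != 0 || pkt_cnt == 0) {
        return -EINVAL;
    }
    atomic_store_explicit(&sketch_study_pkts, pkt_cnt, memory_order_relaxed);
    return 0;
}

/* Read the count that study compares its per-thread packet counter against. */
static uint32_t
sketch_get_study_pkt_cnt(void)
{
    return atomic_load_explicit(&sketch_study_pkts, memory_order_relaxed);
}
```

With this shape, a request such as "study 500" stores 500, while a zero count
or a non-study name is rejected and the previous value stays in effect.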

Signed-off-by: Kumar Amber 

---
v10:
- fix review comments Eelco
v9:
- fix review comments Flavio
v7:
- change the command parameters for core_id and study_pkt_cnt
v5:
- fix review comments(Ian, Flavio, Eelco)
- introduce pmd core id parameter
---
---
 Documentation/topics/dpdk/bridge.rst |  37 +++-
 lib/dpif-netdev-extract-study.c  |  25 +-
 lib/dpif-netdev-private-extract.h|   9 ++
 lib/dpif-netdev.c| 128 +--
 4 files changed, 187 insertions(+), 12 deletions(-)

diff --git a/Documentation/topics/dpdk/bridge.rst b/Documentation/topics/dpdk/bridge.rst
index 0fa9341ac..662446401 100644
--- a/Documentation/topics/dpdk/bridge.rst
+++ b/Documentation/topics/dpdk/bridge.rst
@@ -284,12 +284,45 @@ command also shows whether the CPU supports each implementation ::
 
 An implementation can be selected manually by the following command ::
 
-$ ovs-appctl dpif-netdev/miniflow-parser-set study
+$ ovs-appctl dpif-netdev/miniflow-parser-set [-pmd core_id] [name]
+ [study_cnt]
+
+The above command has two optional parameters: study_cnt and core_id.
+The core_id sets a particular miniflow extract function on the specific
+pmd thread running on that core. The study_cnt parameter, which is specific
+to study and ignored by other implementations, sets how many packets
+are needed to choose the best implementation.
 
 Also user can select the study implementation which studies the traffic for
 a specific number of packets by applying all available implementations of
 miniflow extract and then chooses the one with the most optimal result for
-that traffic pattern.
+that traffic pattern. The user can optionally provide a packet count
+[study_cnt] parameter which is the minimum number of packets that OVS must
+study before choosing an optimal implementation. If no packet count is
+provided, then the default value, 128, is chosen. Also, as there is no
+synchronization point between threads, one PMD thread might still be running
+a previous round, and can now decide on earlier data.
+
+The packet count is a global value, and parallel study() executions with
+differing packet counts will use the most recent value provided by the user.
+
+Study can be selected with packet count by the following command ::
+
+$ ovs-appctl dpif-netdev/miniflow-parser-set study 1024
+
+Study can be selected with packet count and explicit PMD selection
+by the following command ::
+
+$ ovs-appctl dpif-netdev/miniflow-parser-set -pmd 3 study 1024
+
+In the above command the -pmd argument is the core ID of the PMD
+thread, and it can also be used to explicitly set the miniflow
+extraction function pointer on different PMD threads.
+
+Scalar can be selected on core 3 by the following command, where the
+study count can be any arbitrary number or left blank ::
+
+$ ovs-appctl dpif-netdev/miniflow-parser-set -pmd 3 scalar
 
 Miniflow Extract Validation
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~
diff --git a/lib/dpif-netdev-extract-study.c b/lib/dpif-netdev-extract-study.c
index eddb35682..61260cb70 100644
--- a/lib/dpif-netdev-extract-study.c
+++ b/lib/dpif-netdev-extract-study.c
@@ -25,7 +25,7 @@
 
 VLOG_DEFINE_THIS_MODULE(dpif_mfex_extract_study);
 
-static atomic_uint32_t mfex_study_pkts_count = 0;
+static atomic_uint32_t  mfex_study_pkts_count = MFEX_MAX_PKT_COUNT;
 
 /* Struct to hold miniflow study stats. */
 struct study_stats {
@@ -48,6 +48,27 @@ mfex_study_get_study_stats_ptr(void)
 return stats;
 }
 
+int
+mfex_set_study_pkt_cnt(uint32_t pkt_cmp_count, const char *name)
+{
+struct dpif_miniflow_extract_impl *miniflow_funcs;
+miniflow_funcs = dpif_mfex_impl_info_get();
+
+/* If the packet count is set and implementation called is study then
+ * set packet counter to requested number else set the packet counter
+ * to default number.
+ */
+if ((strcmp(miniflow_funcs[MFEX_IMPL_STUDY].name, name) == 0) &&
+(pkt_cmp_count != 0)) {
+
+mfex_study_pkts_count = pkt_cmp_count;
+
+return 0;
+}
+
+return -EINVAL;
+}
+
 uint32_t
 mfex_study_traffic(struct dp_packet_batch *packets,
struct netdev_flow_key *keys,
@@ -86,7 +107,7 @@ mfex_study_traffic(struct dp_packet_batch *packets,
 /* Choose the best implementation after a minimum packets have been
  * processed.
  */
-if (stats->pkt_count >= MFEX_MAX_PKT_COUNT) {
+if (stats->pkt_count >= mfex_study_pkts_count) {
 uint32_t best_func_index = MFEX_IMPL_START_IDX;
 uint32_t max_hits = 0;
 for (int i = 

[ovs-dev] [109 05/12] dpif-netdev: Add configure to enable autovalidator at build time.

2021-07-12 Thread kumar Amber
From: Kumar Amber 

This commit adds a new configure option that lets the user enable the
autovalidator by default at build time, so that the unit tests
run against it by default.

 $ ./configure --enable-mfex-default-autovalidator
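
As a rough illustration of what the option amounts to, the sketch below picks
the start-up default at compile time from a preprocessor define. The enum and
function names are hypothetical stand-ins, not the identifiers used in OVS;
only the MFEX_AUTOVALIDATOR_DEFAULT macro name is taken from this patch.

```c
/* Hypothetical stand-ins for the MFEX implementation indexes. */
enum sketch_mfex_idx {
    SKETCH_MFEX_AUTOVALIDATOR,
    SKETCH_MFEX_SCALAR,
    SKETCH_MFEX_STUDY,
};

/* Pick the start-up default at compile time.  Building with
 * -DMFEX_AUTOVALIDATOR_DEFAULT (which the configure option appends to the
 * compiler flags) flips the default from scalar to the autovalidator. */
static enum sketch_mfex_idx
sketch_default_mfex_idx(void)
{
#ifdef MFEX_AUTOVALIDATOR_DEFAULT
    return SKETCH_MFEX_AUTOVALIDATOR;
#else
    return SKETCH_MFEX_SCALAR;
#endif
}
```

Because the choice is made by the preprocessor, a build without the option
carries no runtime cost and keeps scalar as the default.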

Signed-off-by: Kumar Amber 
Co-authored-by: Harry van Haaren 
Signed-off-by: Harry van Haaren 

---
v10:
- rework default set
v9:
- fix review comments Flavio
v7:
- fix review comments(Eelco, Flavio)
v5:
- fix review comments(Ian, Flavio, Eelco)
---
---
 Documentation/topics/dpdk/bridge.rst |  5 +
 NEWS |  3 ++-
 acinclude.m4 | 16 
 configure.ac |  1 +
 lib/dpif-netdev-private-extract.c|  4 
 5 files changed, 28 insertions(+), 1 deletion(-)

diff --git a/Documentation/topics/dpdk/bridge.rst b/Documentation/topics/dpdk/bridge.rst
index 6f37f2a75..0fa9341ac 100644
--- a/Documentation/topics/dpdk/bridge.rst
+++ b/Documentation/topics/dpdk/bridge.rst
@@ -307,3 +307,8 @@ implementations provide the same results.
 To set the Miniflow autovalidator, use this command ::
 
 $ ovs-appctl dpif-netdev/miniflow-parser-set autovalidator
+
+A compile time option is available in order to test it with the OVS unit
+test suite. Use the following configure option ::
+
+$ ./configure --enable-mfex-default-autovalidator
diff --git a/NEWS b/NEWS
index 4a7b89409..581bff225 100644
--- a/NEWS
+++ b/NEWS
@@ -38,6 +38,8 @@ Post-v2.15.0
  * Add study function to miniflow function table which studies packet
and automatically chooses the best miniflow implementation for that
traffic.
+ * Add configure option to enable the auto-validator as the default
+   miniflow implementation at build time.
- ovs-ctl:
  * New option '--no-record-hostname' to disable hostname configuration
in ovsdb on startup.
@@ -57,7 +59,6 @@ Post-v2.15.0
  whether the SNAT with all-zero IP address is supported.
  See ovs-vswitchd.conf.db(5) for details.
 
-
 v2.15.0 - 15 Feb 2021
 ---------------------
- OVSDB:
diff --git a/acinclude.m4 b/acinclude.m4
index 343303447..5a48f0335 100644
--- a/acinclude.m4
+++ b/acinclude.m4
@@ -14,6 +14,22 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.
 
+dnl Set OVS MFEX Autovalidator as default miniflow extract at compile time?
+dnl This enables automatically running all unit tests with all MFEX
+dnl implementations.
+AC_DEFUN([OVS_CHECK_MFEX_AUTOVALIDATOR], [
+  AC_ARG_ENABLE([mfex-default-autovalidator],
+[AC_HELP_STRING([--enable-mfex-default-autovalidator], [Enable MFEX autovalidator as default miniflow_extract implementation.])],
+[autovalidator=yes],[autovalidator=no])
+  AC_MSG_CHECKING([whether MFEX Autovalidator is default implementation])
+  if test "$autovalidator" != yes; then
+AC_MSG_RESULT([no])
+  else
+OVS_CFLAGS="$OVS_CFLAGS -DMFEX_AUTOVALIDATOR_DEFAULT"
+AC_MSG_RESULT([yes])
+  fi
+])
+
 dnl Set OVS DPCLS Autovalidator as default subtable search at compile time?
 dnl This enables automatically running all unit tests with all DPCLS
 dnl implementations.
diff --git a/configure.ac b/configure.ac
index e45685a6c..46c402892 100644
--- a/configure.ac
+++ b/configure.ac
@@ -186,6 +186,7 @@ OVS_ENABLE_SPARSE
 OVS_CTAGS_IDENTIFIERS
 OVS_CHECK_DPCLS_AUTOVALIDATOR
 OVS_CHECK_DPIF_AVX512_DEFAULT
+OVS_CHECK_MFEX_AUTOVALIDATOR
 OVS_CHECK_BINUTILS_AVX512
 
 AC_ARG_VAR(KARCH, [Kernel Architecture String])
diff --git a/lib/dpif-netdev-private-extract.c b/lib/dpif-netdev-private-extract.c
index 64745f66c..f007a7a80 100644
--- a/lib/dpif-netdev-private-extract.c
+++ b/lib/dpif-netdev-private-extract.c
@@ -60,7 +60,11 @@ void
 dpif_miniflow_extract_init(void)
 {
 atomic_uintptr_t *mfex_func = (void *)&default_mfex_func;
+#ifdef MFEX_AUTOVALIDATOR_DEFAULT
+int mfex_idx = MFEX_IMPL_AUTOVALIDATOR;
+#else
 int mfex_idx = MFEX_IMPL_SCALAR;
+#endif
 
 /* Call probe on each impl, and cache the result. */
 for (int i = 0; i < MFEX_IMPL_MAX; i++) {
-- 
2.25.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [109 04/12] docs/dpdk/bridge: add miniflow extract section.

2021-07-12 Thread kumar Amber
From: Kumar Amber 

This commit adds a section to the dpdk/bridge.rst netdev documentation,
detailing the added miniflow functionality. The newly added commands are
documented, and sample output is provided.

The use of auto-validator and special study function is also described
in detail as well as running fuzzy tests.

Signed-off-by: Kumar Amber 
Co-authored-by: Cian Ferriter 
Signed-off-by: Cian Ferriter 
Co-authored-by: Harry van Haaren 
Signed-off-by: Harry van Haaren 
Acked-by: Flavio Leitner 

---
v10:
- fix minor typos.
v7:
- fix review comments(Eelco)
v5:
- fix review comments(Ian, Flavio, Eelco)
---
---
 Documentation/topics/dpdk/bridge.rst | 51 
 1 file changed, 51 insertions(+)

diff --git a/Documentation/topics/dpdk/bridge.rst b/Documentation/topics/dpdk/bridge.rst
index 2d0850836..6f37f2a75 100644
--- a/Documentation/topics/dpdk/bridge.rst
+++ b/Documentation/topics/dpdk/bridge.rst
@@ -256,3 +256,54 @@ The following line should be seen in the configure output when the above option
 is used ::
 
 checking whether DPIF AVX512 is default implementation... yes
+
+Miniflow Extract
+----------------
+
+Miniflow extract (MFEX) performs parsing of the raw packets and extracts the
+important header information into a compressed miniflow. This miniflow is
+composed of bits and blocks, where the bits signify which blocks are set or
+have values, whereas the blocks hold the metadata, ip, udp, vlan, etc. These
+values are used by the datapath for switching decisions later. The optimized
+miniflow extract is traffic specific to speed up the lookup, whereas the
+scalar works for ALL traffic patterns.
+
+Most modern CPUs have SIMD capabilities. These SIMD instructions are able
+to process a vector rather than act on a single data item. OVS provides
+multiple implementations of miniflow extract. This allows the user to take
+advantage of SIMD instructions like AVX512 to gain additional performance.
+
+A list of implementations can be obtained by the following command. The
+command also shows whether the CPU supports each implementation ::
+
+$ ovs-appctl dpif-netdev/miniflow-parser-get
+Available Optimized Miniflow Extracts:
+autovalidator (available: True, pmds: none)
+scalar (available: True, pmds: 1,15)
+study (available: True, pmds: none)
+
+An implementation can be selected manually by the following command ::
+
+$ ovs-appctl dpif-netdev/miniflow-parser-set study
+
+Also user can select the study implementation which studies the traffic for
+a specific number of packets by applying all available implementations of
+miniflow extract and then chooses the one with the most optimal result for
+that traffic pattern.
+
+Miniflow Extract Validation
+~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+As multiple versions of miniflow extract can co-exist, each with different
+CPU ISA optimizations, it is important to validate that they all give the
+exact same results. To easily test all miniflow implementations, an
+``autovalidator`` implementation of the miniflow exists. This implementation
+runs all other available miniflow extract implementations, and verifies that
+the results are identical.
+
+Running the OVS unit tests with the autovalidator enabled ensures all
+implementations provide the same results.
+
+To set the Miniflow autovalidator, use this command ::
+
+$ ovs-appctl dpif-netdev/miniflow-parser-set autovalidator
-- 
2.25.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [109 03/12] dpif-netdev: Add study function to select the best mfex function

2021-07-12 Thread kumar Amber
From: Kumar Amber 

The study function runs all the available implementations
of miniflow_extract, chooses the one whose hitmask has the
most hits, and sets the mfex function pointer to that function.

Study can be run at runtime using the following command:

$ ovs-appctl dpif-netdev/miniflow-parser-set study
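
The selection rule can be summarised with a small standalone sketch. It is a
simplification under stated assumptions rather than the code in this patch:
the sketch_* names and the fixed table size are hypothetical, and the real
function also has to run each implementation and install the winner.

```c
#include <stdint.h>

#define SKETCH_IMPL_MAX 4          /* Hypothetical number of implementations. */

struct sketch_study_stats {
    uint32_t pkt_count;                      /* Packets studied so far. */
    uint32_t impl_hitcount[SKETCH_IMPL_MAX]; /* Hits per implementation. */
};

/* Count set bits in a hitmask (one bit per packet in the batch). */
static uint32_t
sketch_count_1bits(uint32_t x)
{
    uint32_t n = 0;

    for (; x; x &= x - 1) {
        n++;
    }
    return n;
}

/* Record one batch: hitmasks[i] is the hitmask returned by implementation i. */
static void
sketch_study_record(struct sketch_study_stats *stats,
                    const uint32_t hitmasks[SKETCH_IMPL_MAX],
                    uint32_t batch_size)
{
    for (int i = 0; i < SKETCH_IMPL_MAX; i++) {
        stats->impl_hitcount[i] += sketch_count_1bits(hitmasks[i]);
    }
    stats->pkt_count += batch_size;
}

/* Return the index with the most hits once min_pkts have been studied, or -1
 * if more traffic is needed or no implementation produced any hit. */
static int
sketch_study_choose(const struct sketch_study_stats *stats, uint32_t min_pkts)
{
    int best = -1;
    uint32_t max_hits = 0;

    if (stats->pkt_count < min_pkts) {
        return -1;
    }
    for (int i = 0; i < SKETCH_IMPL_MAX; i++) {
        if (stats->impl_hitcount[i] > max_hits) {
            max_hits = stats->impl_hitcount[i];
            best = i;
        }
    }
    return best;
}
```

Returning -1 when nothing hit leaves the caller free to stay on the scalar
path, which handles all traffic.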

Signed-off-by: Kumar Amber 
Co-authored-by: Harry van Haaren 
Signed-off-by: Harry van Haaren 
Acked-by: Eelco Chaudron 

---
v10:
- fix minor comments from Eelco
v9:
- fix comments Flavio
v8:
- fix review comments Flavio
v7:
- fix review comments(Eelco)
v5:
- fix review comments(Ian, Flavio, Eelco)
- add Atomic set in study
---
---
 NEWS  |   3 +
 lib/automake.mk   |   1 +
 lib/dpif-netdev-extract-study.c   | 136 ++
 lib/dpif-netdev-private-extract.c |  12 +++
 lib/dpif-netdev-private-extract.h |  19 +
 5 files changed, 171 insertions(+)
 create mode 100644 lib/dpif-netdev-extract-study.c

diff --git a/NEWS b/NEWS
index cf254bcfe..4a7b89409 100644
--- a/NEWS
+++ b/NEWS
@@ -35,6 +35,9 @@ Post-v2.15.0
  * Add command line option to switch between MFEX function pointers.
  * Add miniflow extract auto-validator function to compare different
miniflow extract implementations against default implementation.
+ * Add study function to miniflow function table which studies packet
+   and automatically chooses the best miniflow implementation for that
+   traffic.
- ovs-ctl:
  * New option '--no-record-hostname' to disable hostname configuration
in ovsdb on startup.
diff --git a/lib/automake.mk b/lib/automake.mk
index 53b8abc0f..f4f36325e 100644
--- a/lib/automake.mk
+++ b/lib/automake.mk
@@ -107,6 +107,7 @@ lib_libopenvswitch_la_SOURCES = \
lib/dp-packet.h \
lib/dp-packet.c \
lib/dpdk.h \
+   lib/dpif-netdev-extract-study.c \
lib/dpif-netdev-lookup.h \
lib/dpif-netdev-lookup.c \
lib/dpif-netdev-lookup-autovalidator.c \
diff --git a/lib/dpif-netdev-extract-study.c b/lib/dpif-netdev-extract-study.c
new file mode 100644
index 0..eddb35682
--- /dev/null
+++ b/lib/dpif-netdev-extract-study.c
@@ -0,0 +1,136 @@
+/*
+ * Copyright (c) 2021 Intel.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at:
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+#include 
+#include 
+#include 
+#include 
+
+#include "dpif-netdev-private-thread.h"
+#include "openvswitch/vlog.h"
+#include "ovs-thread.h"
+
+VLOG_DEFINE_THIS_MODULE(dpif_mfex_extract_study);
+
+static atomic_uint32_t mfex_study_pkts_count = 0;
+
+/* Struct to hold miniflow study stats. */
+struct study_stats {
+uint32_t pkt_count;
+uint32_t impl_hitcount[MFEX_IMPL_MAX];
+};
+
+/* Define per thread data to hold the study stats. */
+DEFINE_PER_THREAD_MALLOCED_DATA(struct study_stats *, study_stats);
+
+/* Allocate per thread PMD pointer space for study_stats. */
+static inline struct study_stats *
+mfex_study_get_study_stats_ptr(void)
+{
+struct study_stats *stats = study_stats_get();
+if (OVS_UNLIKELY(!stats)) {
+   stats = xzalloc(sizeof *stats);
+   study_stats_set_unsafe(stats);
+}
+return stats;
+}
+
+uint32_t
+mfex_study_traffic(struct dp_packet_batch *packets,
+   struct netdev_flow_key *keys,
+   uint32_t keys_size, odp_port_t in_port,
+   struct dp_netdev_pmd_thread *pmd_handle)
+{
+uint32_t hitmask = 0;
+uint32_t mask = 0;
+struct dp_netdev_pmd_thread *pmd = pmd_handle;
+struct dpif_miniflow_extract_impl *miniflow_funcs;
+struct study_stats *stats = mfex_study_get_study_stats_ptr();
+miniflow_funcs = dpif_mfex_impl_info_get();
+
+/* Run traffic optimized miniflow_extract to collect the hitmask
+ * to be compared after certain packets have been hit to choose
+ * the best miniflow_extract version for that traffic.
+ */
+for (int i = MFEX_IMPL_START_IDX; i < MFEX_IMPL_MAX; i++) {
+if (!miniflow_funcs[i].available) {
+continue;
+}
+
+hitmask = miniflow_funcs[i].extract_func(packets, keys, keys_size,
+ in_port, pmd_handle);
+stats->impl_hitcount[i] += count_1bits(hitmask);
+
+/* If traffic is not classified then we don't overwrite the keys
+ * array in miniflow implementations so it's safe to create a
+ * mask for all those packets whose miniflow have been created.
+ */
+mask |= hitmask;
+  

[ovs-dev] [109 01/12] dpif-netdev: Add command line and function pointer for miniflow extract

2021-07-12 Thread kumar Amber
From: Kumar Amber 

This patch introduces the MFEX function pointers, which allow
the user to switch between the different miniflow extract implementations
that OVS provides, each optimized for a particular CPU ISA.

The user can query the miniflow extract variants available
for that CPU with the following command:

$ ovs-appctl dpif-netdev/miniflow-parser-get

Similarly, the user can set the miniflow implementation with the following
command:

$ ovs-appctl dpif-netdev/miniflow-parser-set name

This gives the user more performance and the flexibility to choose
the miniflow implementation according to their needs.
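
To make the mechanism concrete, here is a minimal standalone sketch of a
name-indexed table plus an atomically swapped function pointer. It is only an
illustration: the extract signature is simplified, "toy_all" is an invented
implementation, and all sketch_* names are hypothetical rather than the
identifiers added by this patch.

```c
#include <errno.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified extract signature: returns a per-packet hitmask. */
typedef uint32_t (*sketch_extract_func)(const uint8_t *pkts, uint32_t n_pkts);

struct sketch_mfex_impl {
    const char *name;
    sketch_extract_func func;   /* NULL means "use the scalar path". */
};

/* Toy optimized variant that claims to classify every packet in the batch. */
static uint32_t
sketch_extract_all(const uint8_t *pkts, uint32_t n_pkts)
{
    (void) pkts;
    return n_pkts >= 32 ? UINT32_MAX : (UINT32_C(1) << n_pkts) - 1;
}

static const struct sketch_mfex_impl sketch_impls[] = {
    { "scalar", NULL },
    { "toy_all", sketch_extract_all },
};

/* Active function, stored as uintptr_t so it can be swapped atomically while
 * a datapath thread is reading it. */
static _Atomic uintptr_t sketch_active_func;

/* Look the name up in the table and publish the matching function. */
static int
sketch_mfex_set_by_name(const char *name)
{
    for (size_t i = 0; i < sizeof sketch_impls / sizeof sketch_impls[0]; i++) {
        if (strcmp(sketch_impls[i].name, name) == 0) {
            atomic_store_explicit(&sketch_active_func,
                                  (uintptr_t) sketch_impls[i].func,
                                  memory_order_relaxed);
            return 0;
        }
    }
    return -ENOENT;
}

/* Fetch the currently active function; NULL selects the scalar path. */
static sketch_extract_func
sketch_mfex_get(void)
{
    return (sketch_extract_func) atomic_load_explicit(&sketch_active_func,
                                                      memory_order_relaxed);
}
```

An unknown name leaves the active function unchanged, so a mistyped command
does not disturb a running datapath.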

Signed-off-by: Kumar Amber 
Co-authored-by: Harry van Haaren 
Signed-off-by: Harry van Haaren 

---
v10:
- fix build errors
- rework default set and atomic global variable
v9:
- fix review comments from Flavio
v7:
- fix review comments(Eelco, Flavio)
v5:
- fix review comments(Ian, Flavio, Eelco)
- add enum to hold mfex indexes
- add new get and set implementations
- add Atomic set and get
---
---
 NEWS  |   1 +
 lib/automake.mk   |   2 +
 lib/dpif-netdev-avx512.c  |  31 +-
 lib/dpif-netdev-private-extract.c | 157 ++
 lib/dpif-netdev-private-extract.h | 113 +
 lib/dpif-netdev-private-thread.h  |   8 ++
 lib/dpif-netdev.c | 108 +++-
 7 files changed, 415 insertions(+), 5 deletions(-)
 create mode 100644 lib/dpif-netdev-private-extract.c
 create mode 100644 lib/dpif-netdev-private-extract.h

diff --git a/NEWS b/NEWS
index 6cdccc715..b0f08e96d 100644
--- a/NEWS
+++ b/NEWS
@@ -32,6 +32,7 @@ Post-v2.15.0
  * Enable the AVX512 DPCLS implementation to use VPOPCNT instruction if the
CPU supports it. This enhances performance by using the native vpopcount
instructions, instead of the emulated version of vpopcount.
+ * Add command line option to switch between MFEX function pointers.
- ovs-ctl:
  * New option '--no-record-hostname' to disable hostname configuration
in ovsdb on startup.
diff --git a/lib/automake.mk b/lib/automake.mk
index 3c9523c1a..53b8abc0f 100644
--- a/lib/automake.mk
+++ b/lib/automake.mk
@@ -118,6 +118,8 @@ lib_libopenvswitch_la_SOURCES = \
lib/dpif-netdev-private-dpcls.h \
lib/dpif-netdev-private-dpif.c \
lib/dpif-netdev-private-dpif.h \
+   lib/dpif-netdev-private-extract.c \
+   lib/dpif-netdev-private-extract.h \
lib/dpif-netdev-private-flow.h \
lib/dpif-netdev-private-thread.h \
lib/dpif-netdev-private.h \
diff --git a/lib/dpif-netdev-avx512.c b/lib/dpif-netdev-avx512.c
index 6f9aa8284..7772b7abf 100644
--- a/lib/dpif-netdev-avx512.c
+++ b/lib/dpif-netdev-avx512.c
@@ -149,6 +149,15 @@ dp_netdev_input_outer_avx512(struct dp_netdev_pmd_thread *pmd,
  * // do all processing (HWOL->MFEX->EMC->SMC)
  * }
  */
+
+/* Do a batch miniflow extract into keys. */
+uint32_t mf_mask = 0;
+miniflow_extract_func mfex_func;
+atomic_read_relaxed(&pmd->miniflow_extract_opt, &mfex_func);
+if (mfex_func) {
+mf_mask = mfex_func(packets, keys, batch_size, in_port, pmd);
+}
+
 uint32_t lookup_pkts_bitmask = (1ULL << batch_size) - 1;
 uint32_t iter = lookup_pkts_bitmask;
 while (iter) {
@@ -167,6 +176,13 @@ dp_netdev_input_outer_avx512(struct dp_netdev_pmd_thread *pmd,
 pkt_metadata_init(&packet->md, in_port);
 
 struct dp_netdev_flow *f = NULL;
+struct netdev_flow_key *key = &keys[i];
+
+/* Check the miniflow mask to see if the packet was correctly
+ * classified by vector mfex else do a scalar miniflow extract
+ * for that packet.
+ */
+bool mfex_hit = !!(mf_mask & (1 << i));
 
 /* Check for a partial hardware offload match. */
 if (hwol_enabled) {
@@ -177,7 +193,13 @@ dp_netdev_input_outer_avx512(struct dp_netdev_pmd_thread *pmd,
 }
 if (f) {
 rules[i] = &f->cr;
-pkt_meta[i].tcp_flags = parse_tcp_flags(packet);
+/* If AVX512 MFEX already classified the packet, use it. */
+if (mfex_hit) {
+pkt_meta[i].tcp_flags = miniflow_get_tcp_flags(&key->mf);
+} else {
+pkt_meta[i].tcp_flags = parse_tcp_flags(packet);
+}
+
 pkt_meta[i].bytes = dp_packet_size(packet);
 phwol_hits++;
 hwol_emc_smc_hitmask |= (1 << i);
@@ -185,9 +207,10 @@ dp_netdev_input_outer_avx512(struct dp_netdev_pmd_thread *pmd,
 }
 }
 
-/* Do miniflow extract into keys. */
-struct netdev_flow_key *key = &keys[i];
-miniflow_extract(packet, &key->mf);
+if (!mfex_hit) {
+/* Do a scalar miniflow extract into keys. */
+miniflow_extract(packet, &key->mf);
+}
 
 /* Cache TCP and byte values for all packets. */
 

[ovs-dev] [109 02/12] dpif-netdev: Add auto validation function for miniflow extract

2021-07-12 Thread kumar Amber
From: Kumar Amber 

This patch introduces the auto-validation function, which
compares the miniflows that the different miniflow extract
implementations produce for a batch of packets against the scalar
miniflow extract and returns a hitmask.

The autovalidator function can be triggered at runtime using the
following command:

$ ovs-appctl dpif-netdev/miniflow-parser-set autovalidator
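
The comparison step can be sketched in isolation as follows. This is a
simplified illustration and not the function added by this patch: the key
layout and the sketch_* names are assumptions, and the real autovalidator
also checks packet offsets and logs each mismatch.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a per-packet miniflow key. */
struct sketch_key {
    uint64_t bits;
    uint64_t blocks[4];
};

/* Index of the lowest set bit; the mask must be non-zero. */
static uint32_t
sketch_ctz(uint32_t mask)
{
    uint32_t i = 0;

    while (!(mask & 1)) {
        mask >>= 1;
        i++;
    }
    return i;
}

/* Compare only the packets that the optimized implementation claims to have
 * classified (bits set in hit_mask).  Returns a mask of mismatching packets;
 * zero means the implementation agrees with the scalar reference. */
static uint32_t
sketch_validate(const struct sketch_key *good, const struct sketch_key *test,
                uint32_t hit_mask)
{
    uint32_t failed = 0;

    while (hit_mask) {
        uint32_t i = sketch_ctz(hit_mask);

        hit_mask &= hit_mask - 1;    /* Clear the bit just examined. */
        if (good[i].bits != test[i].bits
            || memcmp(good[i].blocks, test[i].blocks,
                      sizeof good[i].blocks)) {
            failed |= UINT32_C(1) << i;
        }
    }
    return failed;
}
```

Packets outside the hitmask are skipped on purpose: an optimized
implementation is allowed to miss a packet, but not to classify it wrongly.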

Signed-off-by: Kumar Amber 
Co-authored-by: Harry van Haaren 
Signed-off-by: Harry van Haaren 

---
v9:
- fix review comments Flavio
v6:
-fix review comments(Eelco)
v5:
- fix review comments(Ian, Flavio, Eelco)
- remove ovs assert and switch to default after a batch of packets
  is processed
- Atomic set and get introduced
- fix raw_ctz for windows build
---
---
 NEWS  |   2 +
 lib/dpif-netdev-private-extract.c | 150 ++
 lib/dpif-netdev-private-extract.h |  22 +
 lib/dpif-netdev.c |   2 +-
 4 files changed, 175 insertions(+), 1 deletion(-)

diff --git a/NEWS b/NEWS
index b0f08e96d..cf254bcfe 100644
--- a/NEWS
+++ b/NEWS
@@ -33,6 +33,8 @@ Post-v2.15.0
CPU supports it. This enhances performance by using the native vpopcount
instructions, instead of the emulated version of vpopcount.
  * Add command line option to switch between MFEX function pointers.
+ * Add miniflow extract auto-validator function to compare different
+   miniflow extract implementations against default implementation.
- ovs-ctl:
  * New option '--no-record-hostname' to disable hostname configuration
in ovsdb on startup.
diff --git a/lib/dpif-netdev-private-extract.c b/lib/dpif-netdev-private-extract.c
index 11d2ed2ec..6c5afd13d 100644
--- a/lib/dpif-netdev-private-extract.c
+++ b/lib/dpif-netdev-private-extract.c
@@ -38,6 +38,11 @@ static ATOMIC(miniflow_extract_func) default_mfex_func = NULL;
  */
 static struct dpif_miniflow_extract_impl mfex_impls[] = {
 
+[MFEX_IMPL_AUTOVALIDATOR] = {
+.probe = NULL,
+.extract_func = dpif_miniflow_extract_autovalidator,
+.name = "autovalidator", },
+
 [MFEX_IMPL_SCALAR] = {
 .probe = NULL,
 .extract_func = NULL,
@@ -155,3 +160,148 @@ dp_mfex_impl_get_by_name(const char *name, miniflow_extract_func *out_func)
 
 return -ENOENT;
 }
+
+uint32_t
+dpif_miniflow_extract_autovalidator(struct dp_packet_batch *packets,
+struct netdev_flow_key *keys,
+uint32_t keys_size, odp_port_t in_port,
+struct dp_netdev_pmd_thread *pmd_handle)
+{
+const size_t cnt = dp_packet_batch_size(packets);
+uint16_t good_l2_5_ofs[NETDEV_MAX_BURST];
+uint16_t good_l3_ofs[NETDEV_MAX_BURST];
+uint16_t good_l4_ofs[NETDEV_MAX_BURST];
+uint16_t good_l2_pad_size[NETDEV_MAX_BURST];
+struct dp_packet *packet;
+struct dp_netdev_pmd_thread *pmd = pmd_handle;
+struct netdev_flow_key test_keys[NETDEV_MAX_BURST];
+
+if (keys_size < cnt) {
+miniflow_extract_func default_func = NULL;
+atomic_uintptr_t *pmd_func = (void *)&pmd->miniflow_extract_opt;
+atomic_store_relaxed(pmd_func, (uintptr_t) default_func);
+VLOG_ERR("Invalid key size supplied, Key_size: %"PRIu32" less than "
+ "batch_size:  %" PRIuSIZE"\n", keys_size, cnt);
+VLOG_ERR("Autovalidator is disabled.\n");
+return 0;
+}
+
+/* Run scalar miniflow_extract to get default result. */
+DP_PACKET_BATCH_FOR_EACH (i, packet, packets) {
+pkt_metadata_init(&packet->md, in_port);
+miniflow_extract(packet, &keys[i].mf);
+
+/* Store known good metadata to compare with optimized metadata. */
+good_l2_5_ofs[i] = packet->l2_5_ofs;
+good_l3_ofs[i] = packet->l3_ofs;
+good_l4_ofs[i] = packet->l4_ofs;
+good_l2_pad_size[i] = packet->l2_pad_size;
+}
+
+uint32_t batch_failed = 0;
+/* Iterate through each version of miniflow implementations. */
+for (int j = MFEX_IMPL_START_IDX; j < MFEX_IMPL_MAX; j++) {
+if (!mfex_impls[j].available) {
+continue;
+}
+/* Reset keys and offsets before each implementation. */
+memset(test_keys, 0, keys_size * sizeof(struct netdev_flow_key));
+DP_PACKET_BATCH_FOR_EACH (i, packet, packets) {
+dp_packet_reset_offsets(packet);
+}
+/* Call optimized miniflow for each batch of packet. */
+uint32_t hit_mask = mfex_impls[j].extract_func(packets, test_keys,
+   keys_size, in_port,
+   pmd_handle);
+
+/* Do a miniflow compare for bits, blocks and offsets for all the
+ * classified packets in the hitmask marked by set bits. */
+while (hit_mask) {
+/* Index for the set bit. */
+uint32_t i = raw_ctz(hit_mask);
+/* Set the index in hitmask to 

[ovs-dev] [v10 00/12] MFEX Infrastructure + Optimizations

2021-07-12 Thread kumar Amber
v10 update:
- re-worked the default implementation
- fix comments from Flavio and Eelco
- Include Acks from Eelco in study
v9 update:
- Include review comments from Flavio
- Rebase onto Master
- Include Acks from Flavio
v8 updates:
- Include documentation on AVX512 MFEX as per Eelco's suggestion on list
v7 updates:
- Rebase onto DPIF v15
- Changed commands to get and set MFEX
- Fixed comments from Flavio, Eelco
- Segregated addition of MFEX options into separate patch 12 for Scalar DPIF
- Removed sleep from auto-validator and added frame counter check
- Documentation updates
- Minor bug fixes
v6 updates:
- Fix non-ssl build
v5 updates:
- rebase onto latest DPIF v14
- use Enum for mfex impls
- add pmd core id set parameter in set command
- get command modified to display the pmd thread for individual mfex functions
- resolved comments from Eelco, Ian, Flavio
- Use Atomic to get and set miniflow implementations
- removed and reduced sleep in unit tests
- fixed scalar miniflow perf degradation
v4 updates:
- rebase on to latest DPIF v13
- fix fuzzy.py script with random mac/ip
v3 updates:
- rebase on to latest DPIF v12
- add additional AVX512 traffic profiles for tcp and vlan
- add new command line for study function to add packet count
- add unit tests for fuzzy testing and auto-validation of mfex
- add mfex option hit stats to perf-show command
v2 updates:
- rebase on to latest DPIF v11
This patchset introduces miniflow extract infrastructure changes
which allow the user to choose between different ISA-optimized
miniflow extract variants. A variant can be chosen by the user or
selected automatically by OVS, based on a study of the packets, using
different commands.
The infrastructure also provides a way to check the correctness of
the different ISA-optimized miniflow extract variants against the scalar
version.
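
As an illustration of the correctness check described above, here is a
minimal sketch in C. It is not the OVS code: every sketch_* type and
function is a hypothetical stand-in for the packet batch, the flow key and
the extract implementations. It assumes that each optimized implementation
returns a bitmask of the packets it handled, and it compares only those
packets against the scalar result.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified stand-ins; these are not the OVS types. */
#define SKETCH_MAX_BURST 32

struct sketch_key {
    uint64_t bits;       /* Which blocks are present. */
    uint64_t blocks[4];  /* Extracted values. */
};

/* Assumed contract: fill keys[i] for every handled packet and return a
 * bitmask of the handled packet indexes. */
typedef uint32_t (*sketch_extract_func)(const uint8_t *pkts, size_t cnt,
                                        struct sketch_key *keys);

/* Reference ("scalar") extractor: handles every packet. */
uint32_t
sketch_scalar(const uint8_t *pkts, size_t cnt, struct sketch_key *keys)
{
    uint32_t mask = 0;

    for (size_t i = 0; i < cnt; i++) {
        keys[i].bits = 1;
        keys[i].blocks[0] = pkts[i];
        mask |= UINT32_C(1) << i;
    }
    return mask;
}

/* "Optimized" extractor that only handles even-valued packets. */
uint32_t
sketch_even_only(const uint8_t *pkts, size_t cnt, struct sketch_key *keys)
{
    uint32_t mask = 0;

    for (size_t i = 0; i < cnt; i++) {
        if (pkts[i] % 2 == 0) {
            keys[i].bits = 1;
            keys[i].blocks[0] = pkts[i];
            mask |= UINT32_C(1) << i;
        }
    }
    return mask;
}

/* Deliberately wrong extractor, used to show a detected mismatch. */
uint32_t
sketch_buggy(const uint8_t *pkts, size_t cnt, struct sketch_key *keys)
{
    uint32_t mask = sketch_scalar(pkts, cnt, keys);

    for (size_t i = 0; i < cnt; i++) {
        keys[i].blocks[0] += 1;
    }
    return mask;
}

/* Returns a bitmask of packet indexes where 'impl' claimed a hit but its
 * key differs from the scalar reference; 0 means the two agree. */
uint32_t
sketch_validate(sketch_extract_func impl, const uint8_t *pkts, size_t cnt)
{
    struct sketch_key good[SKETCH_MAX_BURST];
    struct sketch_key test[SKETCH_MAX_BURST];
    uint32_t failed = 0;

    if (cnt > SKETCH_MAX_BURST) {
        cnt = SKETCH_MAX_BURST;
    }
    memset(good, 0, sizeof good);
    memset(test, 0, sizeof test);
    sketch_scalar(pkts, cnt, good);

    uint32_t hit_mask = impl(pkts, cnt, test);
    while (hit_mask) {
        uint32_t i = 0;

        /* Find the index of the lowest set bit, then clear that bit. */
        while (!(hit_mask & (UINT32_C(1) << i))) {
            i++;
        }
        hit_mask &= hit_mask - 1;

        if (memcmp(&good[i], &test[i], sizeof good[i])) {
            failed |= UINT32_C(1) << i;
        }
    }
    return failed;
}
```

In this sketch an implementation that skips some packets still passes,
because only the packets it claims in its hit mask are compared; skipped
packets are assumed to fall back to the scalar path.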

Harry van Haaren (4):
  dpif/stats: add miniflow extract opt hits counter
  dpdk: add additional CPU ISA detection strings
  dpif-netdev/mfex: Add AVX512 based optimized miniflow extract
  dpif-netdev/mfex: add more AVX512 traffic profiles

Kumar Amber (7):
  dpif-netdev: Add command line and function pointer for miniflow
extract
  dpif-netdev: Add auto validation function for miniflow extract
  dpif-netdev: Add study function to select the best mfex function
  docs/dpdk/bridge: add miniflow extract section.
  dpif-netdev: Add configure to enable autovalidator at build time.
  dpif-netdev: Add packet count and core id paramters for study
  test/sytem-dpdk: Add unit test for mfex autovalidator

kumar Amber (1):
  dpif-netdev: add mfex options to scalar dpif

 Documentation/topics/dpdk/bridge.rst | 144 ++
 NEWS |  12 +-
 acinclude.m4 |  16 +
 configure.ac |   1 +
 lib/automake.mk  |   4 +
 lib/dpdk.c   |   2 +
 lib/dpif-netdev-avx512.c |  34 +-
 lib/dpif-netdev-extract-avx512.c | 630 +++
 lib/dpif-netdev-extract-study.c  | 157 +++
 lib/dpif-netdev-perf.c   |   3 +
 lib/dpif-netdev-perf.h   |   1 +
 lib/dpif-netdev-private-extract.c| 366 
 lib/dpif-netdev-private-extract.h| 203 +
 lib/dpif-netdev-private-thread.h |   8 +
 lib/dpif-netdev-unixctl.man  |   4 +
 lib/dpif-netdev.c| 255 ++-
 tests/.gitignore |   1 +
 tests/automake.mk|   5 +
 tests/mfex_fuzzy.py  |  31 ++
 tests/pcap/mfex_test.pcap| Bin 0 -> 416 bytes
 tests/pmd.at |   6 +-
 tests/system-dpdk.at |  49 +++
 22 files changed, 1918 insertions(+), 14 deletions(-)
 create mode 100644 lib/dpif-netdev-extract-avx512.c
 create mode 100644 lib/dpif-netdev-extract-study.c
 create mode 100644 lib/dpif-netdev-private-extract.c
 create mode 100644 lib/dpif-netdev-private-extract.h
 create mode 100755 tests/mfex_fuzzy.py
 create mode 100644 tests/pcap/mfex_test.pcap

-- 
2.25.1

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH V2 2/2] netdev-offload-dpdk: Fix ethernet type for VLANs

2021-07-12 Thread Eli Britstein
For VLANs, the Ethernet type to match should be specified in the
inner_type field of the VLAN match, and not in the type field of the
Ethernet match.
Fix it.
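
The field shuffle can be sketched as follows. This is only an
illustration: the sketch_* structs are hypothetical, simplified stand-ins
and not the DPDK rte_flow item definitions, values are kept in host byte
order, and the CFI masking done on the TCI in the real code is left out.

```c
#include <stdint.h>

/* Hypothetical, simplified pattern items (not the DPDK structs). */
struct sketch_eth_item {
    uint16_t type;      /* Ethertype that follows the MAC addresses. */
    uint8_t has_vlan;   /* 1 if a VLAN tag must be present. */
};

struct sketch_vlan_item {
    uint16_t tci;
    uint16_t inner_type; /* Ethertype that follows the VLAN tag. */
};

/* For a tagged match, the ethertype requested by the flow describes the
 * payload after the VLAN header, so it moves to inner_type, while the
 * Ethernet item's type takes the TPID of the tag. */
void
sketch_apply_vlan(struct sketch_eth_item *eth, struct sketch_vlan_item *vlan,
                  uint16_t tpid, uint16_t tci)
{
    vlan->tci = tci;
    vlan->inner_type = eth->type;
    eth->type = tpid;
    eth->has_vlan = 1;
}
```

For example, a flow matching IPv4 (0x0800) inside an 802.1Q tag (0x8100)
would end up with eth.type 0x8100 and vlan.inner_type 0x0800 in this
sketch.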

Fixes: e8a2b5bf92bb ("netdev-dpdk: implement flow offload with rte flow")
Signed-off-by: Eli Britstein 
Reviewed-by: Salem Sol 
---
 lib/netdev-offload-dpdk.c | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/lib/netdev-offload-dpdk.c b/lib/netdev-offload-dpdk.c
index 3e0d0643b..65f9b3685 100644
--- a/lib/netdev-offload-dpdk.c
+++ b/lib/netdev-offload-dpdk.c
@@ -1106,12 +1106,13 @@ parse_flow_match(struct netdev *netdev,
 spec->tci = match->flow.vlans[0].tci & ~htons(VLAN_CFI);
 mask->tci = match->wc.masks.vlans[0].tci & ~htons(VLAN_CFI);
 
-/* Match any protocols. */
-mask->inner_type = 0;
-
 if (eth_spec && eth_mask) {
 eth_spec->has_vlan = 1;
 eth_mask->has_vlan = 1;
+spec->inner_type = eth_spec->type;
+mask->inner_type = eth_mask->type;
+eth_spec->type = match->flow.vlans[0].tpid;
+eth_mask->type = match->wc.masks.vlans[0].tpid;
 }
 
 add_flow_pattern(patterns, RTE_FLOW_ITEM_TYPE_VLAN, spec, mask);
-- 
2.28.0.2311.g225365fb51



[ovs-dev] [PATCH V2 1/2] netdev-offload-dpdk: Use has_vlan match attribute

2021-07-12 Thread Eli Britstein
DPDK 20.11 introduced the ability to specify the existence/non-existence
of a VLAN tag by [1].
Use this attribute.

[1]: 09315fc83861 ("ethdev: add VLAN attributes to ethernet and VLAN items")

Signed-off-by: Eli Britstein 
Reviewed-by: Salem Sol 
---
 lib/netdev-offload-dpdk.c | 16 
 1 file changed, 16 insertions(+)

diff --git a/lib/netdev-offload-dpdk.c b/lib/netdev-offload-dpdk.c
index e7913292e..3e0d0643b 100644
--- a/lib/netdev-offload-dpdk.c
+++ b/lib/netdev-offload-dpdk.c
@@ -210,6 +210,8 @@ dump_flow_pattern(struct ds *s,
 
 ds_put_cstr(s, "eth ");
 if (eth_spec) {
+uint32_t has_vlan_mask;
+
 if (!eth_mask) {
 eth_mask = &rte_flow_item_eth_mask;
 }
@@ -222,6 +224,9 @@ dump_flow_pattern(struct ds *s,
 DUMP_PATTERN_ITEM(eth_mask->type, "type", "0x%04"PRIx16,
   ntohs(eth_spec->type),
   ntohs(eth_mask->type));
+has_vlan_mask = eth_mask->has_vlan ? UINT32_MAX : 0;
+DUMP_PATTERN_ITEM(has_vlan_mask, "has_vlan", "%d",
+  eth_spec->has_vlan, eth_mask->has_vlan);
 }
 ds_put_cstr(s, "/ ");
 } else if (item->type == RTE_FLOW_ITEM_TYPE_VLAN) {
@@ -1037,6 +1042,7 @@ parse_flow_match(struct netdev *netdev,
  struct flow_patterns *patterns,
  struct match *match)
 {
+struct rte_flow_item_eth *eth_spec = NULL, *eth_mask = NULL;
 struct flow *consumed_masks;
 uint8_t proto = 0;
 
@@ -1082,6 +1088,11 @@ parse_flow_match(struct netdev *netdev,
 memset(&consumed_masks->dl_src, 0, sizeof consumed_masks->dl_src);
 consumed_masks->dl_type = 0;
 
+spec->has_vlan = 0;
+mask->has_vlan = 1;
+eth_spec = spec;
+eth_mask = mask;
+
 add_flow_pattern(patterns, RTE_FLOW_ITEM_TYPE_ETH, spec, mask);
 }
 
@@ -1098,6 +1109,11 @@ parse_flow_match(struct netdev *netdev,
 /* Match any protocols. */
 mask->inner_type = 0;
 
+if (eth_spec && eth_mask) {
+eth_spec->has_vlan = 1;
+eth_mask->has_vlan = 1;
+}
+
 add_flow_pattern(patterns, RTE_FLOW_ITEM_TYPE_VLAN, spec, mask);
 }
 /* For untagged matching match->wc.masks.vlans[0].tci is 0x and
-- 
2.28.0.2311.g225365fb51



Re: [ovs-dev] [v9 06/12] dpif-netdev: Add packet count and core id paramters for study

2021-07-12 Thread Amber, Kumar
Hi Eelco,

Please find my comments inline.

> -Original Message-
> From: Eelco Chaudron 
> Sent: Monday, July 12, 2021 9:17 PM
> To: Amber, Kumar 
> Cc: ovs-dev@openvswitch.org; f...@sysclose.org; i.maxim...@ovn.org; Van
> Haaren, Harry ; Ferriter, Cian
> ; Stokes, Ian 
> Subject: Re: [v9 06/12] dpif-netdev: Add packet count and core id paramters 
> for
> study
> 
> 
> 
> On 12 Jul 2021, at 7:51, kumar Amber wrote:
> 
> > From: Kumar Amber 
> >
> > This commit introduces an additional command line parameter for the mfex
> > study function. If the user provides a packet count, study uses it as
> > the minimum number of packets which must be processed; otherwise a
> > default value is chosen.
> > Also introduces a third parameter for choosing a particular pmd core.
> >
> > $ ovs-appctl dpif-netdev/miniflow-parser-set study 500 3
> >
> > Signed-off-by: Kumar Amber 
> >
> > ---
> > v9:
> > - fix review comments Flavio
> > v7:
> > - change the command parameters for core_id and study_pkt_cnt
> > v5:
> > - fix review comments(Ian, Flavio, Eelco)
> > - introduce pmd core id parameter
> > ---
> > ---
> >  Documentation/topics/dpdk/bridge.rst |  39 -
> >  lib/dpif-netdev-extract-study.c  |  26 +-
> >  lib/dpif-netdev-private-extract.h|   9 ++
> >  lib/dpif-netdev.c| 121 +--
> >  4 files changed, 181 insertions(+), 14 deletions(-)
> >
> > diff --git a/Documentation/topics/dpdk/bridge.rst
> > b/Documentation/topics/dpdk/bridge.rst
> > index 4db416ddd..c31067c51 100644
> > --- a/Documentation/topics/dpdk/bridge.rst
> > +++ b/Documentation/topics/dpdk/bridge.rst
> > @@ -284,12 +284,45 @@ command also shows whether the CPU supports
> each implementation ::
> >
> >  An implementation can be selected manually by the following command ::
> >
> > -$ ovs-appctl dpif-netdev/miniflow-parser-set study
> > +$ ovs-appctl dpif-netdev/miniflow-parser-set [-pmd core_id] [name]
> > + [study_cnt]
> >
> > -Also user can select the study implementation which studies the
> > traffic for
> > +The above command has two optional parameters: study_cnt and core_id.
> > +The core_id set a particular miniflow extract function to a specific
> 
> The core_id sets
> 
> > +pmd thread on the core. Third parameter study_cnt, which is specific
> 
> The third parameter
> 

Fixed both typos.

> > +to study and ignored by other implementations, means how many packets
> > +are needed to choose the best implementation.
> > +
> > +The user can select the study implementation which studies the
> > +traffic for
> >  a specific number of packets by applying all available implementaions
> > of
> 
> implementations
> 
> >  miniflow extract and than chooses the one with most optimal result
> > for that
> 
> and then chooses ... with the most optimal
> 

Fixed these 2 as well.

> > -traffic pattern.
> > +traffic pattern. The user can optionally provide a packet count
> > +[study_cnt] parameter which is the minimum number of packets that OVS
> > +must study before choosing an optimal implementation. If no packet
> > +count is provided, then the default value, 128 is chosen. Also, as
> > +there is no synchronization point between threads, one PMD thread
> > +might still be running a previous round, and can now decide on earlier 
> > data.
> > +
> > +The per packet count is a global value, and parallel `study()`
> > +executions with
> 
> Should study() just be study?
>

Changed
 
> > +differing packet counts will use the most recent count value provided by the user.
> > +
> > +Study can be selected with packet count by the following command ::
> > +
> > +$ ovs-appctl dpif-netdev/miniflow-parser-set study 1024
> > +
> > +Study can be selected with packet count and explicit PMD selection by
> > +the following command ::
> > +
> > +$ ovs-appctl dpif-netdev/miniflow-parser-set -pmd 3 study 1024
> > +
> > +In the above command the last parameter is the CORE ID of the PMD
> > +thread and this can also be used to explicitly set the miniflow
> > +extraction function pointer on different PMD threads.
> > +
> > +Scalar can be selected on core 3 by the following command where study
> > +count can be put as any arbitary number or left blank::
> 
> arbitrary
> 

Fixed.
> > +
> > +$ ovs-appctl dpif-netdev/miniflow-parser-set -pmd 3 scalar
> >
> >  Miniflow Extract Validation
> >  ~~~
> > diff --git a/lib/dpif-netdev-extract-study.c b/lib/dpif-netdev-extract-study.c
> > index a19759bd9..2dc3faf83 100644
> > --- a/lib/dpif-netdev-extract-study.c
> > +++ b/lib/dpif-netdev-extract-study.c
> > @@ -25,7 +25,7 @@
> >
> >  VLOG_DEFINE_THIS_MODULE(dpif_mfex_extract_study);
> >
> > -static uint32_t mfex_study_pkts_count = 0;
> > +static uint32_t mfex_study_pkts_count = MFEX_MAX_PKT_COUNT;
> >
> >  /* Struct to hold miniflow study stats. */
> >  struct study_stats {
> > @@ -48,6 +48,28 @@ mfex_study_get_study_stats_ptr(void)
> >  return stats;
> 

Re: [ovs-dev] [v9 05/12] dpif-netdev: Add configure to enable autovalidator at build time.

2021-07-12 Thread Amber, Kumar
Hi Eelco,

Fixed all and reworked default.

> -Original Message-
> From: Eelco Chaudron 
> Sent: Monday, July 12, 2021 6:50 PM
> To: Amber, Kumar 
> Cc: ovs-dev@openvswitch.org; f...@sysclose.org; i.maxim...@ovn.org; Van
> Haaren, Harry ; Ferriter, Cian
> ; Stokes, Ian 
> Subject: Re: [v9 05/12] dpif-netdev: Add configure to enable autovalidator at
> build time.
> 
> 
> 
> On 12 Jul 2021, at 7:51, kumar Amber wrote:
> 
> > From: Kumar Amber 
> >
> > This commit adds a new configure option to allow the user to enable the
> > autovalidator by default at build time, thus allowing the unit tests
> > to run with it by default.
> >
> >  $ ./configure --enable-mfex-default-autovalidator
> >
> > Signed-off-by: Kumar Amber 
> > Co-authored-by: Harry van Haaren 
> > Signed-off-by: Harry van Haaren 
> >
> > ---
> > v9:
> > - fix review comments Flavio
> > v7:
> > - fix review comments(Eelco, Flavio)
> > v5:
> > - fix review comments(Ian, Flavio, Eelco)
> > ---
> > ---
> >  Documentation/topics/dpdk/bridge.rst |  5 +
> >  NEWS |  3 ++-
> >  acinclude.m4 | 16 
> >  configure.ac |  1 +
> >  lib/dpif-netdev-private-extract.c|  8 ++--
> >  5 files changed, 30 insertions(+), 3 deletions(-)
> >
> > diff --git a/Documentation/topics/dpdk/bridge.rst
> > b/Documentation/topics/dpdk/bridge.rst
> > index 7c618cf1f..4db416ddd 100644
> > --- a/Documentation/topics/dpdk/bridge.rst
> > +++ b/Documentation/topics/dpdk/bridge.rst
> > @@ -307,3 +307,8 @@ implementations provide the same results.
> >  To set the Miniflow autovalidator, use this command ::
> >
> >  $ ovs-appctl dpif-netdev/miniflow-parser-set autovalidator
> > +
> > +A compile time option is available in order to test it with the OVS
> > +unit test suite. Use the following configure option ::
> > +
> > +$ ./configure --enable-mfex-default-autovalidator
> > diff --git a/NEWS b/NEWS
> > index 4a7b89409..581bff225 100644
> > --- a/NEWS
> > +++ b/NEWS
> > @@ -38,6 +38,8 @@ Post-v2.15.0
> >   * Add study function to miniflow function table which studies packet
> > and automatically chooses the best miniflow implementation for that
> > traffic.
> > + * Add build time configure command to enable auto-validatior as 
> > default
> > +   miniflow implementation at build time.
> > - ovs-ctl:
> >   * New option '--no-record-hostname' to disable hostname configuration
> > in ovsdb on startup.
> > @@ -57,7 +59,6 @@ Post-v2.15.0
> >   whether the SNAT with all-zero IP address is supported.
> >   See ovs-vswitchd.conf.db(5) for details.
> >
> > -
> 
> You are removing a white space here unrelated to your changes. Please leave it
> in.
> 
> >  v2.15.0 - 15 Feb 2021
> >  -
> > - OVSDB:
> > diff --git a/acinclude.m4 b/acinclude.m4
> > index 343303447..5a48f0335 100644
> > --- a/acinclude.m4
> > +++ b/acinclude.m4
> > @@ -14,6 +14,22 @@
> >  # See the License for the specific language governing permissions and
> > # limitations under the License.
> >
> > +dnl Set OVS MFEX Autovalidator as default miniflow extract at compile time?
> > +dnl This enables automatically running all unit tests with all MFEX
> > +dnl implementations.
> > +AC_DEFUN([OVS_CHECK_MFEX_AUTOVALIDATOR], [
> > +  AC_ARG_ENABLE([mfex-default-autovalidator],
> > +[AC_HELP_STRING([--enable-mfex-default-autovalidator], 
> > [Enable
> MFEX autovalidator as default miniflow_extract implementation.])],
> > +[autovalidator=yes],[autovalidator=no])
> > +  AC_MSG_CHECKING([whether MFEX Autovalidator is default
> > +implementation])
> > +  if test "$autovalidator" != yes; then
> > +AC_MSG_RESULT([no])
> > +  else
> > +OVS_CFLAGS="$OVS_CFLAGS -DMFEX_AUTOVALIDATOR_DEFAULT"
> > +AC_MSG_RESULT([yes])
> > +  fi
> > +])
> > +
> >  dnl Set OVS DPCLS Autovalidator as default subtable search at compile time?
> >  dnl This enables automatically running all unit tests with all DPCLS
> > dnl implementations.
> > diff --git a/configure.ac b/configure.ac
> > index e45685a6c..46c402892 100644
> > --- a/configure.ac
> > +++ b/configure.ac
> > @@ -186,6 +186,7 @@ OVS_ENABLE_SPARSE
> >  OVS_CTAGS_IDENTIFIERS
> >  OVS_CHECK_DPCLS_AUTOVALIDATOR
> >  OVS_CHECK_DPIF_AVX512_DEFAULT
> > +OVS_CHECK_MFEX_AUTOVALIDATOR
> >  OVS_CHECK_BINUTILS_AVX512
> >
> >  AC_ARG_VAR(KARCH, [Kernel Architecture String])
> > diff --git a/lib/dpif-netdev-private-extract.c b/lib/dpif-netdev-private-extract.c
> > index 4ea111f94..ad71f238e 100644
> > --- a/lib/dpif-netdev-private-extract.c
> > +++ b/lib/dpif-netdev-private-extract.c
> > @@ -77,20 +77,24 @@ dp_mfex_impl_get_default(void)
> >  {
> >  atomic_uintptr_t *mfex_func = (void *)&default_mfex_func;
> >  static bool default_mfex_func_set = false;
> > +#ifdef MFEX_AUTOVALIDATOR_DEFAULT
> > +int mfex_idx = MFEX_IMPL_AUTOVALIDATOR;
> > +#else
> >  int mfex_idx = MFEX_IMPL_SCALAR;
> > +#endif
> >
> >  

Re: [ovs-dev] [v4] dpif/dpcls: limit count subtable search info logs

2021-07-12 Thread Amber, Kumar
Hi Flavio,

All fixed; ready to merge.

> -Original Message-
> From: Flavio Leitner 
> Sent: Monday, July 12, 2021 11:53 PM
> To: Amber, Kumar 
> Cc: ovs-dev@openvswitch.org
> Subject: Re: [ovs-dev] [v4] dpif/dpcls: limit count subtable search info logs
> 
> 
> Hi Kumar,
> 
> There is an issue with the signed-offs reported by 0-day Robot.
> For additional info, please check the link below and look for the tag Co-
> authored-by:
> https://github.com/openvswitch/ovs/blob/master/Documentation/internals/co
> ntributing/submitting-patches.rst#tags
> 
> Otherwise the patch looks good to me.
> Thanks,
> fbl
> 
> On Mon, Jul 12, 2021 at 11:44:05AM +0530, kumar Amber wrote:
> > From: Harry van Haaren 
> >
> > This commit avoids many instances of "using subtable X for miniflow (x,y)"
> > in the ovs-vswitchd log when using the DPCLS Autovalidator. This
> > occurs when no specialized subtable is found, and the generic "_any"
> > version of the avx512 subtable search implementation was used. This
> > change logs the subtable usage once, avoiding duplicates.
> >
> > Signed-off-by: Harry van Haaren 
> > Signed-off-by: kumar Amber 
> >
> > ---
> > v4:
> > - add doc update from Flavio
> > v3:
> > - add comments from Flavio
> > - add documentation update
> > ---
> >  Documentation/topics/dpdk/bridge.rst   | 34 ++
> >  lib/dpif-netdev-lookup-avx512-gather.c |  4 +--
> >  2 files changed, 36 insertions(+), 2 deletions(-)
> >
> > diff --git a/Documentation/topics/dpdk/bridge.rst
> > b/Documentation/topics/dpdk/bridge.rst
> > index 0f70a0cad..374e03eb0 100644
> > --- a/Documentation/topics/dpdk/bridge.rst
> > +++ b/Documentation/topics/dpdk/bridge.rst
> > @@ -182,6 +182,40 @@ chosen, and the 2nd occurance of that priority is
> > not used. Put in logical  terms, a subtable is chosen if its priority
> > is greater than the previous  best candidate.
> >
> > +Optimizing Specific Subtable Search
> > +~~~
> > +
> > +During the packet classification, the datapath can use specialized
> > +lookup tables to optimize the search. However, not all situations are
> > +optimized. If you see a message like the following one in the OVS
> > +logs, it means that there is no specialized implementation available
> > +for the current networking traffic. In this case, OVS will continue
> > +to process the traffic normally using a more generic lookup table."
> > +
> > +"Using non-specialized AVX512 lookup for subtable (4,1) and possibly 
> > others."
> > +
> > +(Note that the numbers 4 and 1 will likely be different in your logs)
> > +
> > +Additional specialized lookups can be added to OVS if the user
> > +provides that log message along with the command output as shown below
> > +to the OVS mailing list. Note that the numbers in the log message
> > +("subtable (X,Y)") need to match with the numbers in the provided
> > +command output ("dp-extra-info:miniflow_bits(X,Y)").
> > +
> > +"ovs-appctl dpctl/dump-flows -m", which results in output like this:
> > +
> > +ufid:82770b5d-ca38-44ff-8283-74ba36bd1ca5,
> skb_priority(0/0),skb_mark(0/0)
> > +,ct_state(0/0),ct_zone(0/0),ct_mark(0/0),ct_label(0/0),recirc_id(0),
> > +
> dp_hash(0/0),in_port(pcap0),packet_type(ns=0,id=0),eth(src=00:00:00:00:00:
> > +00/00:00:00:00:00:00,dst=ff:ff:ff:ff:ff:ff/00:00:00:00:00:00),eth_type(
> > +
> 0x8100),vlan(vid=1,pcp=0),encap(eth_type(0x0800),ipv4(src=127.0.0.1/0.0.0.0
> > +
> ,dst=127.0.0.1/0.0.0.0,proto=17/0,tos=0/0,ttl=64/0,frag=no),udp(src=53/0,
> > +dst=53/0)), packets:77072681, bytes:3545343326, used:0.000s, dp:ovs,
> > +actions:vhostuserclient0, dp-extra-info:miniflow_bits(4,1)
> > +
> > +Please send an email to the OVS mailing list ovs-dev@openvswitch.org
> > +with the output of the "dp-extra-info:miniflow_bits(4,1)" values.
> > +
> >  CPU ISA Testing and Validation
> >  ~~
> >
> > diff --git a/lib/dpif-netdev-lookup-avx512-gather.c
> > b/lib/dpif-netdev-lookup-avx512-gather.c
> > index bc359dc4a..ced846aa7 100644
> > --- a/lib/dpif-netdev-lookup-avx512-gather.c
> > +++ b/lib/dpif-netdev-lookup-avx512-gather.c
> > @@ -411,8 +411,8 @@ dpcls_subtable_avx512_gather_probe(uint32_t
> u0_bits, uint32_t u1_bits)
> >   */
> >  if (!f && (u0_bits + u1_bits) < (NUM_U64_IN_ZMM_REG * 2)) {
> >  f = dpcls_avx512_gather_mf_any;
> > -VLOG_INFO("Using avx512_gather_mf_any for subtable (%d,%d)\n",
> > -  u0_bits, u1_bits);
> > +VLOG_INFO_ONCE("Using non-specialized AVX512 lookup for subtable"
> > +   " (%d,%d) and possibly others.", u0_bits,
> > + u1_bits);
> >  }
> >
> >  return f;
> > --
> > 2.25.1
> >
> > ___
> > dev mailing list
> > d...@openvswitch.org
> > https://mail.openvswitch.org/mailman/listinfo/ovs-dev
> 
> --
> fbl

Re: [ovs-dev] [v9 01/12] dpif-netdev: Add command line and function pointer for miniflow extract

2021-07-12 Thread Amber, Kumar
Hi Eelco, Flavio ,

Please find my comments inline.



> > +/* For the first call, this will be chosen based on the
> > + * compile time flag and if no flag is set it is set to
> > + * default scalar.
> > + */
> > +if (OVS_UNLIKELY(!default_mfex_func_set)) {
> > +VLOG_INFO("Default MFEX implementation is %s.\n",
> > +  mfex_impls[mfex_idx].name);
> > +atomic_store_relaxed(mfex_func, (uintptr_t) mfex_impls
> > + [mfex_idx].extract_func);
> > +default_mfex_func_set = true;
> 
> If this only needs to be done once, why not move it to
> dpif_miniflow_extract_init() as suggested during the v6 review (in a later 
> patch)?
> This will remove this check every time dp_mfex_impl_get_default() gets called.
> 

Yes, it does make sense.
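
For reference, the suggested shape could look roughly like the sketch
below. It is only an illustration under stated assumptions: the sketch_*
names are hypothetical and not the OVS functions, the two "implementations"
are dummies, and the build-time choice is passed in as a plain argument
instead of a preprocessor flag. The point is that the default is stored
once at init time, so the getter carries no "already set?" check.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical function pointer type standing in for the extract func. */
typedef int (*sketch_mfex_func)(void);

/* Dummy implementations that just return an identifying tag. */
int sketch_scalar_impl(void) { return 1; }
int sketch_autovalidator_impl(void) { return 2; }

/* Default implementation, stored as an integer so it can be accessed
 * with relaxed atomics, similar in spirit to the quoted code. */
static atomic_uintptr_t sketch_default_func;

/* Called once during initialization: picks the default implementation. */
void
sketch_mfex_init(int autovalidator_default)
{
    sketch_mfex_func f = autovalidator_default ? sketch_autovalidator_impl
                                               : sketch_scalar_impl;

    atomic_store_explicit(&sketch_default_func, (uintptr_t) f,
                          memory_order_relaxed);
}

/* Hot-path getter: a single relaxed load, no first-call branch. */
sketch_mfex_func
sketch_mfex_get_default(void)
{
    return (sketch_mfex_func) atomic_load_explicit(&sketch_default_func,
                                                   memory_order_relaxed);
}
```

This relies on the init function running before any thread calls the
getter, which is an assumption of the sketch.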

> > +}
> > +
> > +return default_mfex_func;
> > +}
> > +
> > +int
> > +dp_mfex_impl_set_default_by_name(const char *name) {
> > +miniflow_extract_func new_default;
> > +atomic_uintptr_t *mfex_func = (void *)&default_mfex_func;
> > +
> > +int err = dp_mfex_impl_get_by_name(name, &new_default);
> > +
> > +if (!err) {
> > +atomic_store_relaxed(mfex_func, (uintptr_t) new_default);
> > +}
> > +
> > +return err;
> > +
> > +}
> > +
> > +void
> > +dp_mfex_impl_get(struct ds *reply, struct dp_netdev_pmd_thread
> **pmd_list,
> > + size_t pmd_list_size) {
> > +/* Add all MFEX functions to reply string. */
> > +ds_put_cstr(reply, "Available MFEX implementations:\n");
> > +
> > +for (int i = 0; i < MFEX_IMPL_MAX; i++) {
> > +ds_put_format(reply, "  %s (available: %s pmds: ",
> > +  mfex_impls[i].name, mfex_impls[i].available ?
> > +  "True" : "False");
> 
> Flavio mentioned that True/False did not make sense to an end-user, not sure 
> if
> he has the same feeling here?
> Maybe yes/no make more sense here? Flavio?
> 

Changes to available same as previous comments.

> > +
> > +for (size_t j = 0; j < pmd_list_size; j++) {
> > +struct dp_netdev_pmd_thread *pmd = pmd_list[j];
> > +if (pmd->core_id == NON_PMD_CORE_ID) {
> > +continue;
> > +}
> > +
> > +if (pmd->miniflow_extract_opt == mfex_impls[i].extract_func) {
> > +ds_put_format(reply, "%u,", pmd->core_id);
> > +}
> > +}
> > +
> > +ds_chomp(reply, ',');
> > +
> > +if (ds_last(reply) == ' ') {
> > +ds_put_cstr(reply, "none");
> > +}
> > +
> > +ds_put_cstr(reply, ")\n");
> > +}
> > +
> > +}
> > +
> > +/* This function checks all available MFEX implementations, and
> > +selects and
> > + * returns the function pointer to the one requested by "name". If
> > +nothing
> > + * is found it reutrns error.
> 
> reutrns -> returns
> 

Fixed.
> > + */
> > +int
> > +dp_mfex_impl_get_by_name(const char *name, miniflow_extract_func
> > +*out_func) {
> > +if ((name == NULL) || (out_func == NULL)) {
> > +return -EINVAL;
> > +}
> > +
> > +for (int i = 0; i < MFEX_IMPL_MAX; i++) {
> > +if (strcmp(mfex_impls[i].name, name) == 0) {
> > +/* Check available is set before exec. */
> > +if (!mfex_impls[i].available) {
> > +*out_func = NULL;
> > +return -ENODEV;
> > +}
> > +
> > +*out_func = mfex_impls[i].extract_func;
> > +return 0;
> > +}
> > +}
> > +
> > +return -ENOENT;
> > +}
> > diff --git a/lib/dpif-netdev-private-extract.h
> > b/lib/dpif-netdev-private-extract.h
> > new file mode 100644
> > index 0..ddf2e2845
> > --- /dev/null
> > +++ b/lib/dpif-netdev-private-extract.h
> > @@ -0,0 +1,111 @@
> > +/*
> > + * Copyright (c) 2021 Intel.
> > + *
> > + * Licensed under the Apache License, Version 2.0 (the "License");
> > + * you may not use this file except in compliance with the License.
> > + * You may obtain a copy of the License at:
> > + *
> > + * http://www.apache.org/licenses/LICENSE-2.0
> > + *
> > + * Unless required by applicable law or agreed to in writing,
> > +software
> > + * distributed under the License is distributed on an "AS IS" BASIS,
> > + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
> implied.
> > + * See the License for the specific language governing permissions
> > +and
> > + * limitations under the License.
> > + */
> > +
> > +#ifndef MFEX_AVX512_EXTRACT
> > +#define MFEX_AVX512_EXTRACT 1
> > +
> > +#include 
> > +
> > +/* Forward declarations. */
> > +struct dp_packet;
> > +struct miniflow;
> > +struct dp_netdev_pmd_thread;
> > +struct dp_packet_batch;
> > +struct netdev_flow_key;
> > +
> > +/* Function pointer prototype to be implemented in the optimized
> > +miniflow
> > + * extract code.
> > + * returns the hitmask of the processed packets on success.
> > + * returns zero on failure.
> > + */
> > +typedef uint32_t 

Re: [ovs-dev] [v9 04/12] docs/dpdk/bridge: add miniflow extract section.

2021-07-12 Thread Amber, Kumar
Hi Eelco 
fixed all typos in  v10

> -Original Message-
> From: Eelco Chaudron 
> Sent: Monday, July 12, 2021 6:37 PM
> To: Amber, Kumar 
> Cc: ovs-dev@openvswitch.org; f...@sysclose.org; i.maxim...@ovn.org; Van
> Haaren, Harry ; Ferriter, Cian
> ; Stokes, Ian 
> Subject: Re: [v9 04/12] docs/dpdk/bridge: add miniflow extract section.
> 
> 
> 
> On 12 Jul 2021, at 7:51, kumar Amber wrote:
> 
> > From: Kumar Amber 
> >
> > This commit adds a section to the dpdk/bridge.rst netdev
> > documentation, detailing the added miniflow functionality. The newly
> > added commands are documented, and sample output is provided.
> >
> > The use of auto-validator and special study function is also described
> > in detail as well as running fuzzy tests.
> >
> > Signed-off-by: Kumar Amber 
> > Co-authored-by: Cian Ferriter 
> > Signed-off-by: Cian Ferriter 
> > Co-authored-by: Harry van Haaren 
> > Signed-off-by: Harry van Haaren 
> > Acked-by: Flavio Leitner 
> >
> > ---
> > v7:
> > - fix review comments(Eelco)
> > v5:
> > - fix review comments(Ian, Flavio, Eelco)
> > ---
> > ---
> >  Documentation/topics/dpdk/bridge.rst | 51
> > 
> >  1 file changed, 51 insertions(+)
> >
> > diff --git a/Documentation/topics/dpdk/bridge.rst
> > b/Documentation/topics/dpdk/bridge.rst
> > index 2d0850836..7c618cf1f 100644
> > --- a/Documentation/topics/dpdk/bridge.rst
> > +++ b/Documentation/topics/dpdk/bridge.rst
> > @@ -256,3 +256,54 @@ The following line should be seen in the
> > configure output when the above option  is used ::
> >
> >  checking whether DPIF AVX512 is default implementation... yes
> > +
> > +Miniflow Extract
> > +
> > +
> > +Miniflow extract (MFEX) performs parsing of the raw packets and
> > +extracts the important header information into a compressed miniflow.
> > +This miniflow is composed of bits and blocks, where the bits signify
> > +which blocks are set or have values, whereas the blocks hold the
> > +metadata, ip, udp, vlan, etc. These values are used by the datapath
> > +for switching decisions later. The optimized miniflow extract is
> > +traffic specific to speed up the lookup, whereas the scalar works for
> > +all traffic patterns.
> > +
> > +Most modern CPUs have SIMD capabilities. These SIMD instructions are
> > +able to process a vector rather than act on one single data.
> 
> This sounds odd “rather than act on one single data.”?
> 
> > OVS provides multiple
> > +implementations of miniflow extract. This allows the user to take
> > +advantage of SIMD instructions like AVX512 to gain additional performance.
> > +
> > +A list of implementations can be obtained by the following command.
> > +The command also shows whether the CPU supports each implementation ::
> > +
> > +$ ovs-appctl dpif-netdev/miniflow-parser-get
> > +Available Optimized Miniflow Extracts:
> > +autovalidator (available: True, pmds: none)
> > +scalar (available: True, pmds: 1,15)
> > +study (available: True, pmds: none)
> > +
> > +An implementation can be selected manually by the following command ::
> > +
> > +$ ovs-appctl dpif-netdev/miniflow-parser-set study
> > +
> > +Also user can select the study implementation which studies the
> > +traffic for a specific number of packets by applying all available
> > +implementaions of
> 
> implementations
> 
> > +miniflow extract and than chooses the one with most optimal result
> > +for that
> 
> than -> then
> 
> most optimal -> the most optimal
> 
> > +traffic pattern.
> > +
> > +Miniflow Extract Validation
> > +~~~
> > +
> > +As multiple versions of miniflow extract can co-exist, each with
> > +different CPU ISA optimizations, it is important to validate that
> > +they all give the exact same results. To easily test all miniflow
> > +implementations, an ``autovalidator`` implementation of the miniflow
> > +exists. This implementation runs all other available miniflow extract
> > +implementations, and verifies that the results are identical.
> > +
> > +Running the OVS unit tests with the autovalidator enabled ensures all
> > +implementations provide the same results.
> > +
> > +To set the Miniflow autovalidator, use this command ::
> > +
> > +$ ovs-appctl dpif-netdev/miniflow-parser-set autovalidator
> > --
> > 2.25.1



Re: [ovs-dev] [v9 03/12] dpif-netdev: Add study function to select the best mfex function

2021-07-12 Thread Amber, Kumar
Hi Eelco ,

Thanks for the Ack. I will also fix the minor changes in the next patch.
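
For readers following the thread, the idea behind the study function can
be sketched as below. This is a simplified illustration and not the patch
itself: the sketch_* names are hypothetical, and the tie-breaking and "no
hits" behaviour are assumptions of the sketch. Hit counts are accumulated
per implementation, and once a minimum number of packets has been seen,
the implementation with the most hits is picked.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical number of candidate implementations. */
#define SKETCH_IMPL_MAX 3

struct sketch_study_stats {
    uint32_t pkt_count;                       /* Packets studied so far. */
    uint32_t impl_hitcount[SKETCH_IMPL_MAX];  /* Hits per implementation. */
};

/* Record one batch: 'batch_size' packets were offered to every
 * implementation and hits[i] of them were handled by implementation i. */
void
sketch_study_record(struct sketch_study_stats *stats, uint32_t batch_size,
                    const uint32_t hits[SKETCH_IMPL_MAX])
{
    stats->pkt_count += batch_size;
    for (size_t i = 0; i < SKETCH_IMPL_MAX; i++) {
        stats->impl_hitcount[i] += hits[i];
    }
}

/* Returns the index of the implementation with the most hits once at
 * least 'min_pkts' packets were studied. Returns -1 while still studying,
 * or when no implementation hit at all (the caller would then stay on
 * the scalar path). */
int
sketch_study_choose(const struct sketch_study_stats *stats, uint32_t min_pkts)
{
    uint32_t max_hits = 0;
    int best = -1;

    if (stats->pkt_count < min_pkts) {
        return -1;
    }
    for (size_t i = 0; i < SKETCH_IMPL_MAX; i++) {
        if (stats->impl_hitcount[i] > max_hits) {
            max_hits = stats->impl_hitcount[i];
            best = (int) i;
        }
    }
    return best;
}
```

After a choice is made, the caller would zero the stats so that study can
be run again for the next traffic type, as the quoted code below does
with memset().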


> > +for (int i = MFEX_IMPL_START_IDX; i < MFEX_IMPL_MAX; i++) {
> > +VLOG_DBG("MFEX study results for implementation %s:"
> > + " (hits %d/%d pkts)",
> > + miniflow_funcs[i].name, stats->impl_hitcount[i],
> > + stats->pkt_count);
> > +}
> 
> For all logs above “hits %d/%d pkts” should be “hits %u/%u pkts”
> 

Sure.

> > +}
> > +
> > +/* Reset stats so that study function can be called again
> > + * for next traffic type and optimal function ptr can be
> > + * chosen.
> > + */
> > +memset(stats, 0, sizeof(struct study_stats));
> > +}
> > +return mask;
> > +}
> > diff --git a/lib/dpif-netdev-private-extract.c
> > b/lib/dpif-netdev-private-extract.c
> > index 19f350349..4ea111f94 100644
> > --- a/lib/dpif-netdev-private-extract.c
> > +++ b/lib/dpif-netdev-private-extract.c
> > @@ -47,6 +47,11 @@ static struct dpif_miniflow_extract_impl mfex_impls[] =
> {
> >  .probe = NULL,
> >  .extract_func = NULL,
> >  .name = "scalar", },
> > +
> > +[MFEX_IMPL_STUDY] = {
> > +.probe = NULL,
> > +.extract_func = mfex_study_traffic,
> > +.name = "study", },
> >  };
> >
> >  BUILD_ASSERT_DECL(MFEX_IMPL_MAX == ARRAY_SIZE(mfex_impls)); @@ -
> 166,6
> > +171,12 @@ dp_mfex_impl_get_by_name(const char *name,
> miniflow_extract_func *out_func)
> >  return -ENOENT;
> >  }
> >
> > +void
> > +dpif_mfex_impl_info_get(struct dpif_miniflow_extract_impl **out_ptr)
> > +{
> > +*out_ptr = mfex_impls;
> > +}
> 
> If we are only interested in getting a pointer, why not just return it:
> 
>   struct dpif_miniflow_extract_impl *dpif_mfex_impl_info_get(void) {
>   return mfex_impls;
>   }
>

True.
 
> >  uint32_t
> >  dpif_miniflow_extract_autovalidator(struct dp_packet_batch *packets,
> >  struct netdev_flow_key *keys,
> > diff --git a/lib/dpif-netdev-private-extract.h
> > b/lib/dpif-netdev-private-extract.h
> > index de3270c88..32b7ccbb3 100644
> > --- a/lib/dpif-netdev-private-extract.h
> > +++ b/lib/dpif-netdev-private-extract.h
> > @@ -80,6 +80,7 @@ struct dpif_miniflow_extract_impl {  enum
> > dpif_miniflow_extract_impl_idx {
> >  MFEX_IMPL_AUTOVALIDATOR,
> >  MFEX_IMPL_SCALAR,
> > +MFEX_IMPL_STUDY,
> >  MFEX_IMPL_MAX
> >  };
> >
> > @@ -89,6 +90,9 @@ enum dpif_miniflow_extract_impl_idx {
> >
> >  #define MFEX_IMPL_START_IDX MFEX_IMPL_MAX
> >
> > +/* Max count of packets to be compared. */ #define
> MFEX_MAX_PKT_COUNT
> > +(128)
> > +
> >  /* This function returns all available implementations to the caller. The
> >   * quantity of implementations is returned by the int return value.
> >   */
> > @@ -109,6 +113,13 @@ miniflow_extract_func
> > dp_mfex_impl_get_default(void);
> >  /* Overrides the default MFEX with the user set MFEX. */  int
> > dp_mfex_impl_set_default_by_name(const char *name);
> >
> > +/* Retrieve the array of miniflow implementations for iteration.
> > + * On error, returns a negative number.
> > + * On success, returns the size of the arrays pointed to by the out 
> > parameter.
> > + */
> > +void
> > +dpif_mfex_impl_info_get(struct dpif_miniflow_extract_impl **out_ptr);
> > +
> >
> >  /* Initializes the available miniflow extract implementations by probing 
> > for
> >   * the CPU ISA requirements. As the runtime available CPU ISA does
> > not change @@ -130,4 +141,16 @@
> dpif_miniflow_extract_autovalidator(struct dp_packet_batch *batch,
> >  uint32_t keys_size, odp_port_t in_port,
> >  struct dp_netdev_pmd_thread
> > *pmd_handle);
> >
> > +/* Retrieve the number of packets by studying packets using different
> > +miniflow
> > + * implementations to choose the best implementation using the
> > +maximum hitmask
> > + * count.
> > + * On error, returns a zero for no packets.
> > + * On success, returns mask of the packets hit.
> > + */
> > +uint32_t
> > +mfex_study_traffic(struct dp_packet_batch *packets,
> > +   struct netdev_flow_key *keys,
> > +   uint32_t keys_size, odp_port_t in_port,
> > +   struct dp_netdev_pmd_thread *pmd_handle);
> > +
> >  #endif /* MFEX_AVX512_EXTRACT */
> > --
> > 2.25.1

Regards
Amber



[ovs-dev] [v5] dpif/dpcls: limit count subtable search info logs

2021-07-12 Thread kumar Amber
From: Harry van Haaren 

This commit avoids many instances of "using subtable X for miniflow (x,y)"
in the ovs-vswitchd log when using the DPCLS Autovalidator. This occurs
when no specialized subtable is found, and the generic "_any" version of
the avx512 subtable search implementation was used. This change logs the
subtable usage once, avoiding duplicates.

Signed-off-by: Harry van Haaren 
Signed-off-by: kumar Amber 
Co-authored-by: kumar Amber 

---
v5:
- fix checkpatch error
v4:
- add doc update from Flavio
v3:
- add comments from Flavio
- add documentation update
---
 Documentation/topics/dpdk/bridge.rst   | 34 ++
 lib/dpif-netdev-lookup-avx512-gather.c |  4 +--
 2 files changed, 36 insertions(+), 2 deletions(-)

diff --git a/Documentation/topics/dpdk/bridge.rst 
b/Documentation/topics/dpdk/bridge.rst
index 0f70a0cad..374e03eb0 100644
--- a/Documentation/topics/dpdk/bridge.rst
+++ b/Documentation/topics/dpdk/bridge.rst
@@ -182,6 +182,40 @@ chosen, and the 2nd occurance of that priority is not 
used. Put in logical
 terms, a subtable is chosen if its priority is greater than the previous
 best candidate.
 
+Optimizing Specific Subtable Search
+~~~
+
+During the packet classification, the datapath can use specialized
+lookup tables to optimize the search. However, not all situations
+are optimized. If you see a message like the following one in the OVS
+logs, it means that there is no specialized implementation available
+for the current networking traffic. In this case, OVS will continue
+to process the traffic normally using a more generic lookup table.
+
+"Using non-specialized AVX512 lookup for subtable (4,1) and possibly others."
+
+(Note that the numbers 4 and 1 will likely be different in your logs)
+
+Additional specialized lookups can be added to OVS if the user
+provides that log message along with the command output as shown
+below to the OVS mailing list. Note that the numbers in the log
+message ("subtable (X,Y)") need to match with the numbers in
+the provided command output ("dp-extra-info:miniflow_bits(X,Y)").
+
+"ovs-appctl dpctl/dump-flows -m", which results in output like this:
+
+ufid:82770b5d-ca38-44ff-8283-74ba36bd1ca5, skb_priority(0/0),skb_mark(0/0)
+,ct_state(0/0),ct_zone(0/0),ct_mark(0/0),ct_label(0/0),recirc_id(0),
+dp_hash(0/0),in_port(pcap0),packet_type(ns=0,id=0),eth(src=00:00:00:00:00:
+00/00:00:00:00:00:00,dst=ff:ff:ff:ff:ff:ff/00:00:00:00:00:00),eth_type(
+0x8100),vlan(vid=1,pcp=0),encap(eth_type(0x0800),ipv4(src=127.0.0.1/0.0.0.0
+,dst=127.0.0.1/0.0.0.0,proto=17/0,tos=0/0,ttl=64/0,frag=no),udp(src=53/0,
+dst=53/0)), packets:77072681, bytes:3545343326, used:0.000s, dp:ovs,
+actions:vhostuserclient0, dp-extra-info:miniflow_bits(4,1)
+
+Please send an email to the OVS mailing list ovs-dev@openvswitch.org with
+the output of the "dp-extra-info:miniflow_bits(4,1)" values.
+
 CPU ISA Testing and Validation
 ~~
 
diff --git a/lib/dpif-netdev-lookup-avx512-gather.c 
b/lib/dpif-netdev-lookup-avx512-gather.c
index bc359dc4a..ced846aa7 100644
--- a/lib/dpif-netdev-lookup-avx512-gather.c
+++ b/lib/dpif-netdev-lookup-avx512-gather.c
@@ -411,8 +411,8 @@ dpcls_subtable_avx512_gather_probe(uint32_t u0_bits, 
uint32_t u1_bits)
  */
 if (!f && (u0_bits + u1_bits) < (NUM_U64_IN_ZMM_REG * 2)) {
 f = dpcls_avx512_gather_mf_any;
-VLOG_INFO("Using avx512_gather_mf_any for subtable (%d,%d)\n",
-  u0_bits, u1_bits);
+VLOG_INFO_ONCE("Using non-specialized AVX512 lookup for subtable"
+   " (%d,%d) and possibly others.", u0_bits, u1_bits);
 }
 
 return f;
-- 
2.25.1
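
As a side note for readers unfamiliar with log-once macros, the behaviour this patch relies on (the message is emitted the first time the call site is reached and suppressed afterwards) can be sketched as follows. This is a minimal illustration using C11 atomic_flag; TOY_LOG_ONCE, toy_log_lines and toy_probe are hypothetical names, and this is not a description of how VLOG_INFO_ONCE is implemented in OVS.

```c
#include <assert.h> /* Only needed by callers that check the counter. */
#include <stdatomic.h>
#include <stdio.h>

/* Counts how many lines were actually emitted; for demonstration only. */
static int toy_log_lines;

/* Emits the message only the first time this particular call site runs.
 * Each expansion gets its own static flag, so different call sites are
 * tracked independently. */
#define TOY_LOG_ONCE(...)                                   \
    do {                                                    \
        static atomic_flag logged_ = ATOMIC_FLAG_INIT;      \
        if (!atomic_flag_test_and_set(&logged_)) {          \
            fprintf(stderr, __VA_ARGS__);                   \
            toy_log_lines++;                                \
        }                                                   \
    } while (0)

/* Hypothetical probe that falls back to a generic lookup. */
static void
toy_probe(unsigned int u0_bits, unsigned int u1_bits)
{
    TOY_LOG_ONCE("Using non-specialized lookup for subtable (%u,%u)"
                 " and possibly others.\n", u0_bits, u1_bits);
}
```

Because only the first call logs, later subtables with different bit counts are not reported, which is why the new message says "and possibly others".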



Re: [ovs-dev] [PATCH ovn] ovn-nb.xml: Fix the description for LB's skip_snat option.

2021-07-12 Thread Han Zhou
On Mon, Jul 12, 2021 at 12:42 PM Numan Siddique  wrote:
>
> On Fri, Jul 9, 2021 at 2:49 PM Han Zhou  wrote:
> >
> > force_snat_for_lb is a flag set in logical flow pipeline, while
> > lb_force_snat_ip is the option configured in NB DB.  In NB document we
> > should mention the actual option configured in NB instead of the flow
> > details.
> >
> > Signed-off-by: Han Zhou 
>
> Acked-by: Numan Siddique 
>
> Numan

Thanks Numan! Applied.
>
> > ---
> >  ovn-nb.xml | 5 +++--
> >  1 file changed, 3 insertions(+), 2 deletions(-)
> >
> > diff --git a/ovn-nb.xml b/ovn-nb.xml
> > index b6a0d1f43..d5efbb33e 100644
> > --- a/ovn-nb.xml
> > +++ b/ovn-nb.xml
> > @@ -1712,8 +1712,9 @@
> >
> >
> >  If the load balancing rule is configured with
skip_snat
> > -option, the force_snat_for_lb option configured for the router
> > -pipeline will not be applied for this load balancer.
> > +option, the option lb_force_snat_ip configured for the logical
router
> > +that references this load balancer will not be applied for
this load
> > +balancer.
> >
> >
> >
> > --
> > 2.30.2
> >


Re: [ovs-dev] [PATCH ovn 1/2] system-test: Fix flake in ECMP IPv6 symmetric reply test

2021-07-12 Thread Han Zhou
On Mon, Jul 12, 2021 at 10:28 AM Mark Gray  wrote:
>
> Statically add IPv6 neighbor MAC addresses to avoid NS messages
> evicting datapath flows causing occasional test failures.
>
> We also configure all interfaces to send only one IPv6 router
> solicitation message. These messages can cause datapath flows
> to be unexpectedly evicted causing test failures.
>
> Fixes: 7c927c0c0be1 ("ovn-northd: Fix IPv6 ECMP symmetric reply flows")
> Signed-off-by: Mark Gray 
> ---
>  tests/system-ovn.at | 51 +
>  1 file changed, 28 insertions(+), 23 deletions(-)
>
> diff --git a/tests/system-ovn.at b/tests/system-ovn.at
> index 79879c6e003b..fc377bbd1a47 100644
> --- a/tests/system-ovn.at
> +++ b/tests/system-ovn.at
> @@ -5833,17 +5833,35 @@ ovn-nbctl lr-route-add R3 fd01::/64 fd02::1
>
>  # Logical port 'alice1' in switch 'alice'.
>  ADD_NAMESPACES(alice1)
> +# Only send 1 router solicitation as any additional ones can cause
datapath
> +# flows to get evicted, causing unexpected failures below.
> +NS_CHECK_EXEC([alice1], [sysctl -w
net.ipv6.conf.default.router_solicitations=1], [0], [dnl
> +net.ipv6.conf.default.router_solicitations = 1
> +])
>  ADD_VETH(alice1, alice1, br-int, "fd01::2/64", "f0:00:00:01:02:04", \
>   "fd01::1")
>  OVS_WAIT_UNTIL([test "$(ip netns exec alice1 ip a | grep fd01::2 | grep
tentative)" = ""])
>  ovn-nbctl lsp-add alice alice1 \
>  -- lsp-set-addresses alice1 "f0:00:00:01:02:04 fd01::2"
> +# Add neighbour MAC address to avoid sending IPv6 NS messages which could
> +# cause datapath flows to be evicted
> +NS_CHECK_EXEC([alice1], [ip -6 neigh add fd01::1 lladdr
00:00:01:01:02:03 dev alice1], [0])
>
>  # Logical port 'bob1' in switch 'bob'.
>  ADD_NAMESPACES(bob1)
> +# Only send 1 router solicitation as any additional ones can cause
datapath
> +# flows to get evicted, causing unexpected failures below.
> +NS_CHECK_EXEC([bob1], [sysctl -w
net.ipv6.conf.default.router_solicitations=1], [0], [dnl
> +net.ipv6.conf.default.router_solicitations = 1
> +])
>  ADD_VETH(bob1, bob1, br-int, "fd07::1/64", "f0:00:00:01:02:06", \
>   "fd07::2")
>  OVS_WAIT_UNTIL([test "$(ip netns exec bob1 ip a | grep fd07::1 | grep
tentative)" = ""])
> +# Add neighbour MAC addresses to avoid sending IPv6 NS messages which
could
> +# cause datapath flows to be evicted
> +NS_CHECK_EXEC([bob1], [ip -6 neigh add fd07::2 lladdr 00:00:02:01:02:03
dev bob1], [0])
> +NS_CHECK_EXEC([bob1], [ip -6 neigh add fd07::3 lladdr 00:00:01:01:02:04
dev bob1], [0])
> +
>  ovn-nbctl lsp-add bob bob1 \
>  -- lsp-set-addresses bob1 "f0:00:00:01:02:06 fd07::1"
>
> @@ -5852,45 +5870,32 @@ ovn-nbctl --wait=hv sync
>
>  on_exit 'ovs-ofctl dump-flows br-int'
>
> -# Later in this test we will check for a datapath flow that matches:
> -# "ct_state(+new-est-rpl+trk).*ct(.*label=0x204010204/.*)".
Due
> -# to the way OVS generates datapath flows with wildcards, ICMPv6 NS
flows will
> -# evict this datapath flow. In order to ensure that the flow does not
> -# get evicted, we send one ping packet in order to carry out neighbor
> -# discovery. We then flush the datpath to remove the NS flows so that
the flow
> -# "ct_state(+new-est-rpl+trk).*ct(.*label=0x204010204/.*)"
will
> -# be present when we check for it.
> -NS_CHECK_EXEC([bob1], [ping -q -c 2 -i 0.3 -w 15 fd01::2 | FORMAT_PING],
\
> -[0], [dnl
> -2 packets transmitted, 2 received, 0% packet loss, time 0ms
> -])
> -ovs-appctl dpctl/del-flows
> -
>  # 'bob1' should be able to ping 'alice1' directly.
>  NS_CHECK_EXEC([bob1], [ping -q -c 20 -i 0.3 -w 15 fd01::2 |
FORMAT_PING], \
>  [0], [dnl
>  20 packets transmitted, 20 received, 0% packet loss, time 0ms
>  ])
>
> -# Ensure conntrack entry is present. We should not try to predict
> -# the tunnel key for the output port, so we strip it from the labels
> -# and just ensure that the known ethernet address is present.
> -AT_CHECK([ovs-appctl dpctl/dump-conntrack | FORMAT_CT(fd01::2) | \
> -sed -e 's/zone=[[0-9]]*/zone=/' |
> -sed -e
's/labels=0x[[0-9a-f]]*04010204/labels=0x04010204/'],
[0], [dnl
>
-icmpv6,orig=(src=fd07::1,dst=fd01::2,id=,type=128,code=0),reply=(src=fd01::2,dst=fd07::1,id=,type=129,code=0),zone=,labels=0x04010204
> -])
> -
>  # Ensure datapaths show conntrack states as expected
>  # Like with conntrack entries, we shouldn't try to predict
>  # port binding tunnel keys. So omit them from expected labels.
>  AT_CHECK([ovs-appctl dpctl/dump-flows | grep
'ct_state(+new-est-rpl+trk).*ct(.*label=0x204010204/.*)' -c],
[0], [dnl
>  1
>  ])
> +
>  AT_CHECK([ovs-appctl dpctl/dump-flows | grep
'ct_state(-new+est+rpl+trk).*ct_label(0x.*04010204/.*)' -c],
[0], [dnl
>  1
>  ])
>
> +# Ensure conntrack entry is present. We should not try to predict
> +# the tunnel key for the output port, so we strip it from the labels
> +# and just ensure that the known ethernet address is present.
> +AT_CHECK([ovs-appctl 

Re: [ovs-dev] [PATCH v2 0/9] OVSDB Relay Service Model. (Was: OVSDB 2-Tier deployment)

2021-07-12 Thread Ilya Maximets
On 6/25/21 3:33 PM, Dumitru Ceara wrote:
> On 6/12/21 3:59 AM, Ilya Maximets wrote:
>> Replication can be used to scale out read-only access to the database.
>> But there are clients that are not read-only, but read-mostly.
>> One of the main examples is ovn-controller that mostly monitors
>> updates from the Southbound DB, but needs to claim ports by sending
> >> transactions that change some database tables.
>>
>> Southbound database serves lots of connections: all connections
>> from ovn-controllers and some service connections from cloud
>> infrastructure, e.g. some OpenStack agents are monitoring updates.
>> At a high scale and with a big size of the database ovsdb-server
>> spends too much time processing monitor updates and it's required
>> to move this load somewhere else.  This patch-set aims to introduce
>> required functionality to scale out read-mostly connections by
> >> introducing a new OVSDB 'relay' service model.
>>
>> In this new service model ovsdb-server connects to existing OVSDB
>> server and maintains in-memory copy of the database.  It serves
> >> read-only transactions and monitor requests on its own, but forwards
>> write transactions to the relay source.
>>
>> Key differences from the active-backup replication:
>> - support for "write" transactions.
>> - no on-disk storage. (probably, faster operation)
>> - support for multiple remotes (connect to the clustered db).
>> - doesn't try to keep connection as long as possible, but
>>   faster reconnects to other remotes to avoid missing updates.
>> - No need to know the complete database schema beforehand,
>>   only the schema name.
>> - can be used along with other standalone and clustered databases
>>   by the same ovsdb-server process. (doesn't turn the whole
>>   jsonrpc server to read-only mode)
>> - supports modern version of monitors (monitor_cond_since),
>>   because based on ovsdb-cs.
>> - could be chained, i.e. multiple relays could be connected
>>   one to another in a row or in a tree-like form.
>>
>> Bringing all above functionality to the existing active-backup
>> replication doesn't look right as it will make it less reliable
>> for the actual backup use case, and this also would be much
>> harder from the implementation point of view, because current
>> replication code is not based on ovsdb-cs or idl and all the required
>> features would be likely duplicated or replication would be fully
>> re-written on top of ovsdb-cs with severe modifications of the former.
>>
>> Relay is somewhere in the middle between active-backup replication and
>> the clustered model taking a lot from both, therefore is hard to
>> implement on top of any of them.
>>
> >> To run ovsdb-server in relay mode, the user simply needs to run:
>>
>>   ovsdb-server --remote=punix:db.sock relay::
>>
>> e.g.
>>
>>   ovsdb-server --remote=punix:db.sock relay:OVN_Southbound:tcp:127.0.0.1:6642
>>
>> More details and examples in the documentation in the last patch
>> of the series.
>>
>> I actually tried to implement transaction forwarding on top of
> >> active-backup replication in v1 of this series, but it required
>> a lot of tricky changes, including schema format changes in order
>> to bring required information to the end clients, so I decided
>> to fully rewrite the functionality in v2 with a different approach.
>>
>> Future work:
>> - Add support for transaction history (it could be just inherited
>>   from the transaction ids received from the relay source).  This
>>   will allow clients to utilize monitor_cond_since while working
>>   with relay.
> 
> Hi Ilya,
> 
> I acked most of the patches in the series (except 7/9 which I think
> might need a rather straightforward change) and I saw Mark also left
> some comments.
> 
> I wonder though if the lack of monitor_cond_since will be a show stopper
> for deploying this in production?  Or do you expect reconnects to happen
> > less often due to the multi-tier nature of new deployments?

I do expect that relays will hide most of the re-connections, so clients
will have more stable connections.  In this case it should be fine to not
have monitor_cond_since for clients.  For sure, I'll work on adding
support for it.

Another factor is that deployments will, likely, have more relays
than the main servers, and so it should be easier to handle extra
load of downloading the whole database, if required.

> 
> I guess we need some scale test data with this deployed to have a better
> idea.

Sure, I collected some data from the scale tests and will include it
in the cover letter for v3.

> 
> In any case, very nice work!

Thanks!

> 
> Regards,
> Dumitru
> 
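
For anyone skimming the thread, the request handling described in the cover letter (read-only transactions and monitor requests served from the in-memory copy, write transactions forwarded to the relay source) boils down to a dispatch like the one sketched here. The toy_* names are hypothetical and this is not the ovsdb-server code, only an illustration of the split.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical request kinds a client may send to a relay. */
enum toy_request {
    TOY_REQ_READ_ONLY_TXN,
    TOY_REQ_MONITOR,
    TOY_REQ_WRITE_TXN
};

/* Hypothetical per-relay counters. */
struct toy_relay {
    size_t served_locally; /* Answered from the in-memory copy. */
    size_t forwarded;      /* Passed on to the relay source. */
};

/* Returns 1 if the request is forwarded to the relay source and 0 if the
 * relay answers it itself. */
static int
toy_relay_handle(struct toy_relay *relay, enum toy_request req)
{
    assert(relay);
    if (req == TOY_REQ_WRITE_TXN) {
        relay->forwarded++;
        return 1;
    }
    relay->served_locally++;
    return 0;
}
```

With read-mostly clients such as ovn-controller, most requests end up in served_locally, so the relay source only sees the comparatively rare write transactions plus the relay's own connection.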



Re: [ovs-dev] [PATCH v2 9/9] docs: Add documentation for ovsdb relay mode.

2021-07-12 Thread Ilya Maximets
On 7/2/21 1:05 PM, Mark Gray wrote:
> On 12/06/2021 03:00, Ilya Maximets wrote:
>> Main documentation for the service model and tutorial with the use case
>> and configuration examples.
>>
>> Signed-off-by: Ilya Maximets 
>> ---
>>  Documentation/automake.mk|   1 +
>>  Documentation/ref/ovsdb.7.rst|  62 --
>>  Documentation/topics/index.rst   |   1 +
>>  Documentation/topics/ovsdb-relay.rst | 124 +++
>>  NEWS |   3 +
>>  ovsdb/ovsdb-server.1.in  |  27 +++---
>>  6 files changed, 200 insertions(+), 18 deletions(-)
>>  create mode 100644 Documentation/topics/ovsdb-relay.rst
>>
>> diff --git a/Documentation/automake.mk b/Documentation/automake.mk
>> index bc30f94c5..213d9c867 100644
>> --- a/Documentation/automake.mk
>> +++ b/Documentation/automake.mk
>> @@ -52,6 +52,7 @@ DOC_SOURCE = \
>>  Documentation/topics/networking-namespaces.rst \
>>  Documentation/topics/openflow.rst \
>>  Documentation/topics/ovs-extensions.rst \
>> +Documentation/topics/ovsdb-relay.rst \
>>  Documentation/topics/ovsdb-replication.rst \
>>  Documentation/topics/porting.rst \
>>  Documentation/topics/record-replay.rst \
>> diff --git a/Documentation/ref/ovsdb.7.rst b/Documentation/ref/ovsdb.7.rst
>> index e4f1bf766..a5b8a9c33 100644
>> --- a/Documentation/ref/ovsdb.7.rst
>> +++ b/Documentation/ref/ovsdb.7.rst
>> @@ -121,13 +121,14 @@ schema checksum from a schema or database file, 
>> respectively.
>>  Service Models
>>  ==
>>  
>> -OVSDB supports three service models for databases: **standalone**,
>> -**active-backup**, and **clustered**.  The service models provide different
>> -compromises among consistency, availability, and partition tolerance.  They
>> -also differ in the number of servers required and in terms of performance.  
>> The
>> -standalone and active-backup database service models share one on-disk 
>> format,
>> -and clustered databases use a different format, but the OVSDB programs work
>> -with both formats.  ``ovsdb(5)`` documents these file formats.
>> +OVSDB supports four service models for databases: **standalone**,
>> +**active-backup**, **relay** and **clustered**.  The service models provide
>> +different compromises among consistency, availability, and partition 
>> tolerance.
>> +They also differ in the number of servers required and in terms of 
>> performance.
>> +The standalone and active-backup database service models share one on-disk
>> +format, and clustered databases use a different format, but the OVSDB 
>> programs
>> +work with both formats.  ``ovsdb(5)`` documents these file formats.  Relay
>> +databases has no on-disk storage.
> 
> s/has/have

OK.

> 
>>  
>>  RFC 7047, which specifies the OVSDB protocol, does not mandate or specify
>>  any particular service model.
>> @@ -406,6 +407,50 @@ following consequences:
>>that the client previously read.  The OVSDB client library in Open vSwitch
>>uses this feature to avoid servers with stale data.
>>  
>> +Relay Service Model
>> +---
>> +
>> +A **relay** database is a way to scale out read-mostly access to the
>> +existing database working in any service model including relay.
>> +
>> +Relay database creates and maintains an OVSDB connection with other OVSDB
> 
> s/other/another

OK.

> 
>> +server.  It uses this connection to maintain in-memory copy of the remote
> 
> s/maintain/maintain an/

OK.

> 
>> +database (a.k.a. the ``relay source``) keeping the copy up-to-date as the
>> +database content changes on relay source in the real time.
> 
> s/on/on the/

OK.

> 
>> +
>> +The purpose of relay server is to scale out the number of database clients.
>> +Read-only transactions and monitor requests are fully handled by the relay
>> +server itself.  For the transactions that requests database modifications,
>> +relay works as a proxy between the client and the relay source, i.e. it
>> +forwards transactions and replies between them.
>> +
>> +Compared to a clustered and active-backup models, relay service model 
>> provides
>> +read and write access to the database similarly to a clustered database (and
>> +even more scalable), but with generally insignificant performance overhead 
>> of
>> +an active-backup model.  At the same time it doesn't increase availability 
>> that
>> +needs to be covered by the service model of the relay source.
>> +
>> +Relay database has no on-disk storage and therefore cannot be converted to
>> +any other service model.
>> +
>> +If there is already a database started in any service model, to start a 
>> relay
>> +database server use ``ovsdb-server relay::``, where
>> + is the database name as specified in the schema of the 
>> database
>> +that existing server runs, and  is an OVSDB connection 
>> method
>> +(see `Connection Methods`_ below) that connects to the existing database
>> +server.   could contain a comma-separated list of 
>> 

Re: [ovs-dev] [PATCH v2 9/9] docs: Add documentation for ovsdb relay mode.

2021-07-12 Thread Ilya Maximets
On 6/25/21 3:35 PM, Dumitru Ceara wrote:
> On 6/12/21 4:00 AM, Ilya Maximets wrote:
>> Main documentation for the service model and tutorial with the use case
>> and configuration examples.
>>
>> Signed-off-by: Ilya Maximets 
>> ---
> 
> I left a few minor comments below.  With them addressed:
> 
> Acked-by: Dumitru Ceara 
> 
> Thanks!
> 
>>  Documentation/automake.mk|   1 +
>>  Documentation/ref/ovsdb.7.rst|  62 --
>>  Documentation/topics/index.rst   |   1 +
>>  Documentation/topics/ovsdb-relay.rst | 124 +++
>>  NEWS |   3 +
>>  ovsdb/ovsdb-server.1.in  |  27 +++---
>>  6 files changed, 200 insertions(+), 18 deletions(-)
>>  create mode 100644 Documentation/topics/ovsdb-relay.rst
>>
>> diff --git a/Documentation/automake.mk b/Documentation/automake.mk
>> index bc30f94c5..213d9c867 100644
>> --- a/Documentation/automake.mk
>> +++ b/Documentation/automake.mk
>> @@ -52,6 +52,7 @@ DOC_SOURCE = \
>>  Documentation/topics/networking-namespaces.rst \
>>  Documentation/topics/openflow.rst \
>>  Documentation/topics/ovs-extensions.rst \
>> +Documentation/topics/ovsdb-relay.rst \
>>  Documentation/topics/ovsdb-replication.rst \
>>  Documentation/topics/porting.rst \
>>  Documentation/topics/record-replay.rst \
>> diff --git a/Documentation/ref/ovsdb.7.rst b/Documentation/ref/ovsdb.7.rst
>> index e4f1bf766..a5b8a9c33 100644
>> --- a/Documentation/ref/ovsdb.7.rst
>> +++ b/Documentation/ref/ovsdb.7.rst
>> @@ -121,13 +121,14 @@ schema checksum from a schema or database file, 
>> respectively.
>>  Service Models
>>  ==
>>  
>> -OVSDB supports three service models for databases: **standalone**,
>> -**active-backup**, and **clustered**.  The service models provide different
>> -compromises among consistency, availability, and partition tolerance.  They
>> -also differ in the number of servers required and in terms of performance.  
>> The
>> -standalone and active-backup database service models share one on-disk 
>> format,
>> -and clustered databases use a different format, but the OVSDB programs work
>> -with both formats.  ``ovsdb(5)`` documents these file formats.
>> +OVSDB supports four service models for databases: **standalone**,
>> +**active-backup**, **relay** and **clustered**.  The service models provide
>> +different compromises among consistency, availability, and partition 
>> tolerance.
>> +They also differ in the number of servers required and in terms of 
>> performance.
>> +The standalone and active-backup database service models share one on-disk
>> +format, and clustered databases use a different format, but the OVSDB 
>> programs
>> +work with both formats.  ``ovsdb(5)`` documents these file formats.  Relay
>> +databases has no on-disk storage.
> 
> s/has/have

OK.

> 
>>  
>>  RFC 7047, which specifies the OVSDB protocol, does not mandate or specify
>>  any particular service model.
>> @@ -406,6 +407,50 @@ following consequences:
>>that the client previously read.  The OVSDB client library in Open vSwitch
>>uses this feature to avoid servers with stale data.
>>  
>> +Relay Service Model
>> +---
>> +
>> +A **relay** database is a way to scale out read-mostly access to the
>> +existing database working in any service model including relay.
>> +
>> +Relay database creates and maintains an OVSDB connection with other OVSDB
>> +server.  It uses this connection to maintain in-memory copy of the remote
>> +database (a.k.a. the ``relay source``) keeping the copy up-to-date as the
>> +database content changes on relay source in the real time.
>> +
>> +The purpose of relay server is to scale out the number of database clients.
>> +Read-only transactions and monitor requests are fully handled by the relay
>> +server itself.  For the transactions that requests database modifications,
> 
> s/requests/request

OK.

> 
>> +relay works as a proxy between the client and the relay source, i.e. it
>> +forwards transactions and replies between them.
>> +
>> +Compared to a clustered and active-backup models, relay service model 
>> provides
> 
> s/Compared to a/Compared to the

OK.

> 
>> +read and write access to the database similarly to a clustered database (and
>> +even more scalable), but with generally insignificant performance overhead 
>> of
> 
> Joke: citation needed

:)

> 
>> +an active-backup model.  At the same time it doesn't increase availability 
>> that
>> +needs to be covered by the service model of the relay source.
>> +
>> +Relay database has no on-disk storage and therefore cannot be converted to
>> +any other service model.
>> +
>> +If there is already a database started in any service model, to start a 
>> relay
>> +database server use ``ovsdb-server relay::``, where
>> + is the database name as specified in the schema of the 
>> database
>> +that existing server runs, and  is an OVSDB connection 
>> method
>> +(see 

Re: [ovs-dev] [PATCH ovn] northd: Process load balancer defrag flows once for all routers.

2021-07-12 Thread Mark Michelson
Hi Dumitru. Can you please rebase this? There's a conflict due to 
384a7c6237da8f88ab68a9abd0982f92d7d8c2d2 (northd: Refactor Logical Flows 
for routers with DNAT/Load Balancers).


On 7/6/21 6:45 AM, Lorenzo Bianconi wrote:

This allows creating the match strings for each LB VIP exactly once,
instead of once per datapath as it was before this change, reducing CPU
usage in the ovn-northd event processing loop.

On a scaled ovn-kubernetes-like deployment for 120 nodes, with 120
gateway logical routers and 16K load balancer VIPs attached to each
gateway router, this reduces event processing loop times in ovn-northd
from ~9.5 seconds to ~8.5 seconds.

Reported-at: https://bugzilla.redhat.com/1962833
Signed-off-by: Dumitru Ceara 


Acked-by: Lorenzo Bianconi 


---
  northd/ovn-northd.c | 98 ++---
  1 file changed, 48 insertions(+), 50 deletions(-)

diff --git a/northd/ovn-northd.c b/northd/ovn-northd.c
index 570c6a3ef..0b043edec 100644
--- a/northd/ovn-northd.c
+++ b/northd/ovn-northd.c
@@ -8993,6 +8993,48 @@ build_lswitch_flows_for_lb(struct ovn_northd_lb *lb, 
struct hmap *lflows,
  build_lb_rules(lflows, lb, match, action);
  }
  
+/* If there are any load balancing rules, we should send the packet to

+ * conntrack for defragmentation and tracking.  This helps with two things.
+ *
+ * 1. With tracking, we can send only new connections to pick a DNAT ip address
+ *from a group.
+ * 2. If there are L4 ports in load balancing rules, we need the
+ *defragmentation to match on L4 ports.
+ */
+static void
+build_lrouter_defrag_flows_for_lb(struct ovn_northd_lb *lb,
+  struct hmap *lflows,
+  struct ds *match)
+{
+if (!lb->n_nb_lr) {
+return;
+}
+
+/* A set to hold all ips that need defragmentation and tracking. */
+struct sset all_ips = SSET_INITIALIZER(_ips);
+for (size_t i = 0; i < lb->n_vips; i++) {
+struct ovn_lb_vip *lb_vip = >vips[i];
+
+if (!sset_add(_ips, lb_vip->vip_str)) {
+continue;
+}
+
+ds_clear(match);
+if (IN6_IS_ADDR_V4MAPPED(_vip->vip)) {
+ds_put_format(match, "ip && ip4.dst == %s", lb_vip->vip_str);
+} else {
+ds_put_format(match, "ip && ip6.dst == %s", lb_vip->vip_str);
+}
+for (size_t j = 0; j < lb->n_nb_lr; j++) {
+ovn_lflow_add_with_hint(lflows, lb->nb_lr[j],
+S_ROUTER_IN_DEFRAG, 100,
+ds_cstr(match), "ct_next;",
+>nlb->header_);
+}
+}
+sset_destroy(_ips);
+}
+
  static void
  build_lrouter_flows_for_lb(struct ovn_northd_lb *lb, struct hmap *lflows,
 struct shash *meter_groups,
@@ -9027,49 +9069,6 @@ build_lrouter_flows_for_lb(struct ovn_northd_lb *lb, 
struct hmap *lflows,
  }
  }
  
-static void
-build_lrouter_lb_flows(struct hmap *lflows, struct ovn_datapath *od,
-                       struct hmap *lbs, struct ds *match)
-{
-    /* A set to hold all ips that need defragmentation and tracking. */
-    struct sset all_ips = SSET_INITIALIZER(&all_ips);
-
-    for (int i = 0; i < od->nbr->n_load_balancer; i++) {
-        struct nbrec_load_balancer *nb_lb = od->nbr->load_balancer[i];
-        struct ovn_northd_lb *lb =
-            ovn_northd_lb_find(lbs, &nb_lb->header_.uuid);
-        ovs_assert(lb);
-
-        for (size_t j = 0; j < lb->n_vips; j++) {
-            struct ovn_lb_vip *lb_vip = &lb->vips[j];
-
-            if (!sset_contains(&all_ips, lb_vip->vip_str)) {
-                sset_add(&all_ips, lb_vip->vip_str);
-                /* If there are any load balancing rules, we should send
-                 * the packet to conntrack for defragmentation and
-                 * tracking.  This helps with two things.
-                 *
-                 * 1. With tracking, we can send only new connections to
-                 *    pick a DNAT ip address from a group.
-                 * 2. If there are L4 ports in load balancing rules, we
-                 *    need the defragmentation to match on L4 ports. */
-                ds_clear(match);
-                if (IN6_IS_ADDR_V4MAPPED(&lb_vip->vip)) {
-                    ds_put_format(match, "ip && ip4.dst == %s",
-                                  lb_vip->vip_str);
-                } else {
-                    ds_put_format(match, "ip && ip6.dst == %s",
-                                  lb_vip->vip_str);
-                }
-                ovn_lflow_add_with_hint(lflows, od, S_ROUTER_IN_DEFRAG,
-                                        100, ds_cstr(match), "ct_next;",
-                                        &nb_lb->header_);
-            }
-        }
-    }
-    sset_destroy(&all_ips);
-}
-
  #define ND_RA_MAX_INTERVAL_MAX 1800
  #define ND_RA_MAX_INTERVAL_MIN 4
  
@@ -11810,9 +11809,7 @@ lrouter_check_nat_entry(struct ovn_datapath *od, const struct 

Re: [ovs-dev] ovn-northd-ddlog - high mem and cpu usage when started with an existing DB

2021-07-12 Thread Ben Pfaff
On Thu, Jul 08, 2021 at 08:59:24PM +0200, Dumitru Ceara wrote:
> Hi Ben,
> 
> As discussed earlier, during the OVN meeting, I've noticed a new
> performance issue with ovn-northd-ddlog when running it against a
> database from one of our more recent scale tests:
> 
> http://people.redhat.com/~dceara/ovn-northd-ddlog-tests/20210708/ovnnb_db.db
> 
> ovn-northd-ddlog uses 100% CPU and never really reaches the point to
> perform the first transaction to the Southbound.  Memory usage is also
> very high, I stopped it at 45GB RSS.
> 
> To test I did:
> SANDBOXFLAGS="--nbdb-source=/tmp/ovnnb_db.db --ddlog" make sandbox

Thanks.  I've been spending a lot of time with this Friday and today.
It is a bit different from the other issues I've looked at.  The
previous ones were inefficient production of relatively small output.
This one is inefficient production (and storage) of rather large output
(millions of flows).  I'm trying to get help from Leonid on how to
reduce the memory usage.

Thanks,

Ben.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v2 7/9] ovsdb: relay: Reflect connection status in _Server database.

2021-07-12 Thread Ilya Maximets
On 7/2/21 12:48 PM, Mark Gray wrote:
> On 12/06/2021 03:00, Ilya Maximets wrote:
>> It might be important for clients to know that relay lost connection
>> with the relay remote, so they could re-connect to other relay.
> 
> Yeah this makes sense. I guess there are some deployment scenarios we
> should think about. For example:
> 
> * Are there any special considerations for upgrades? The relays seem to
> be quite ephemeral and could be stopped/started without much risk?

It should be easy to just stop them and start new ones, e.g.
scale up and down according to the current load.  So, upgrades
should not be a problem.  They are also able to update the database
schema on the fly, if needed.

> * What about raft leader elections?

Relays are not using 'leader-only', so they can connect to any
raft member.  Hence, elections should not be a problem.  If one of
the raft members falls out of the cluster, the relay will just
re-connect to a different server.  We may also extend relays in
the future to support leader-only connections, if needed.

> * I suppose there is a risk that inconsistencies could build up between
> relays in case of error?

If a data inconsistency is detected by a relay, it will flag it,
re-connect, and download a fresh copy of the database.  If servers
are sending different consistent data to different relays, well..
we can't really do anything about this.  Something must be seriously
wrong with the main servers.

> * Everything will be eventually consistent but I guess that is OK for OVN.

Yep.  All relays are getting the same updates, but maybe at
slightly different times.  The same is true for raft members, so
it should be fine for OVN.

> 
> I don't know a whole lot about this so I am just prompting you to see
> have you considered these?

Sure, I thought about this.  And these are good questions to ask.

> 
>>
>> Signed-off-by: Ilya Maximets 
>> ---
>>  ovsdb/_server.xml| 17 +
>>  ovsdb/ovsdb-server.c |  3 ++-
>>  ovsdb/relay.c| 34 ++
>>  ovsdb/relay.h|  4 
>>  4 files changed, 49 insertions(+), 9 deletions(-)
>>
>> diff --git a/ovsdb/_server.xml b/ovsdb/_server.xml
>> index 414be6715..b4606b25b 100644
>> --- a/ovsdb/_server.xml
>> +++ b/ovsdb/_server.xml
>> @@ -70,6 +70,15 @@
>>case of a relay database - until it connects to the relay source.
>>  
>>  
>> +
>> +  True if the database is connected to its storage.  A standalone 
>> database
>> +  is always connected.  A clustered database is connected if the server 
>> is
>> +  in contact with a majority of its cluster.  A relay database is 
>> connected
>> +  if the server is in contact with the relay source, i.e. is connected 
>> to
>> +  the server it syncs from.  An unconnected database cannot be modified 
>> and
>> +  its data might be unavailable or stale.
>> +
>> +
>>  
>>
>>  These columns are most interesting and in some cases only relevant 
>> for
>> @@ -77,14 +86,6 @@
>>  column is clustered.
>>
>>  
>> -  
>> -True if the database is connected to its storage.  A standalone or
>> -active-backup database is always connected.  A clustered database is
>> -connected if the server is in contact with a majority of its 
>> cluster.
>> -An unconnected database cannot be modified and its data might be
>> -unavailable or stale.
>> -  
>> -
>>
>>  True if the database is the leader in its cluster.  For a 
>> standalone or
>>  active-backup database, this is always true.  Always false for 
>> relay.
>> diff --git a/ovsdb/ovsdb-server.c b/ovsdb/ovsdb-server.c
>> index 77b1fbe40..cdd6cf7fd 100644
>> --- a/ovsdb/ovsdb-server.c
>> +++ b/ovsdb/ovsdb-server.c
>> @@ -1190,7 +1190,8 @@ update_database_status(struct ovsdb_row *row, struct 
>> db *db)
>>  ovsdb_util_write_string_column(row, "model",
>>  db->db->is_relay ? "relay" : 
>> ovsdb_storage_get_model(db->db->storage));
>>  ovsdb_util_write_bool_column(row, "connected",
>> - 
>> ovsdb_storage_is_connected(db->db->storage));
>> +db->db->is_relay ? ovsdb_relay_is_connected(db->db)
>> + : ovsdb_storage_is_connected(db->db->storage));
>>  ovsdb_util_write_bool_column(row, "leader",
>>  db->db->is_relay ? false : 
>> ovsdb_storage_is_leader(db->db->storage));
>>  ovsdb_util_write_uuid_column(row, "cid",
>> diff --git a/ovsdb/relay.c b/ovsdb/relay.c
>> index ef689c649..4a8f5c206 100644
>> --- a/ovsdb/relay.c
>> +++ b/ovsdb/relay.c
>> @@ -31,6 +31,7 @@
>>  #include "ovsdb-error.h"
>>  #include "row.h"
>>  #include "table.h"
>> +#include "timeval.h"
>>  #include "transaction.h"
>>  #include "transaction-forward.h"
>>  #include "util.h"
>> @@ -47,8 +48,36 @@ struct relay_ctx {
>>  struct ovsdb_schema *new_schema;
>>  schema_change_callback schema_change_cb;
>>  void 

Re: [ovs-dev] [PATCH v2 7/9] ovsdb: relay: Reflect connection status in _Server database.

2021-07-12 Thread Ilya Maximets
On 6/25/21 3:34 PM, Dumitru Ceara wrote:
> On 6/12/21 4:00 AM, Ilya Maximets wrote:
>> It might be important for clients to know that relay lost connection
>> with the relay remote, so they could re-connect to other relay.
>>
>> Signed-off-by: Ilya Maximets 
>> ---
> 
> [...]
> 
>>  
>> +#define RELAY_MAX_RECONNECTION_MS 30000
> 
> 30 seconds of relay "incorrectly" reporting that it is connected to the
> source seems quite long.  Also, should we make this configurable?

We can make it configurable in the future.
However, relays are meant to have multiple remotes, i.e. all servers
of a main ovsdb cluster, and they will re-connect between them as soon
as a disconnection is detected (by inactivity probe or in some other way).
So, there are two cases where a relay is not connected to the source for
a very long time:

1. All main servers are down.
   We can't really do anything in this case, and it doesn't matter if
   clients know about this or not, as they have no place to re-connect
   anyway.

2. Our relay for some reason is not able to reach any of the main
   servers, but still has connection with clients.  This case seems to
   be rare and it's likely that clients are split from the rest of the
   network along with their relay.  It seems also unlikely that
   re-connection to a different relay will make any difference in this
   scenario.

All in all, I don't think that it's necessarily a bad thing to keep
clients connected for an extra 30 seconds, because if the relay is not
able to re-connect, then it's unlikely that clients will be able to do that.

As I said, we can make this value configurable in the future, if there
is a need for it.
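
To make the behaviour above concrete, here is a minimal sketch of a
grace-period check.  It assumes the relay remembers when its link to the
source was last usable.  The struct, the function names and the
GRACE_PERIOD_MS macro are hypothetical and are not taken from the patch;
only the 30 second window comes from this discussion.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical grace period, mirroring the 30 seconds discussed above. */
#define GRACE_PERIOD_MS 30000

/* Hypothetical per-relay state; not the relay_ctx from the patch. */
struct relay_status {
    bool link_up;           /* Link to the source is usable right now. */
    bool ever_up;           /* Link has been usable at least once. */
    long long int last_up;  /* Time (ms) the link was last seen usable. */
};

/* Records the link state observed at time 'now' (in ms). */
static void
relay_status_update(struct relay_status *st, bool link_up, long long int now)
{
    st->link_up = link_up;
    if (link_up) {
        st->ever_up = true;
        st->last_up = now;
    }
}

/* Returns true while the link is usable, and keeps returning true for
 * GRACE_PERIOD_MS after it was last seen usable, so that a quick
 * re-connection to another remote is not reported to clients. */
static bool
relay_status_is_connected(const struct relay_status *st, long long int now)
{
    if (st->link_up) {
        return true;
    }
    return st->ever_up && now - st->last_up < GRACE_PERIOD_MS;
}
```

With this shape, a relay that fails over to another remote within the
window never reports "connected: false", while one that stays cut off
flips to false once the window expires.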

What do you think?

Best regards, Ilya Maximets.


Re: [ovs-dev] [PATCH v2 6/9] ovsdb: relay: Add support for transaction forwarding.

2021-07-12 Thread Ilya Maximets
On 6/25/21 3:34 PM, Dumitru Ceara wrote:
> On 6/12/21 4:00 AM, Ilya Maximets wrote:
>> Current version of ovsdb relay allows to scale out read-only
>> access to the primary database.  However, many clients are not
>> read-only but read-mostly.  For example, ovn-controller.
>>
>> In order to scale out database access for this case ovsdb-server
>> need to process transactions that are not read-only.  Relay is not
>> allowed to do that, i.e. not allowed to modify the database, but it
>> can act like a proxy and forward transactions that includes database
>> modifications to the primary server and forward replies back to a
>> client.  At the same time it may serve read-only transactions and
>> monitor requests by itself greatly reducing the load on primary
>> server.
>>
>> This configuration will slightly increase transaction latency, but
>> it's not very important for read-mostly use cases.
>>
>> Implementation details:
>> With this change instead of creating a trigger to commit the
>> transaction, ovsdb-server will create a trigger for transaction
>> forwarding.  Later, ovsdb_relay_run() will send all new transactions
>> to the relay source.  Once transaction reply received from the
>> relay source, ovsdb-relay module will update the state of the
>> transaction forwarding with the reply.  After that, trigger_run()
>> will complete the trigger and jsonrpc_server_run() will send the
>> reply back to the client.  Since transaction reply from the relay
>> source will be received after all the updates, client will receive
>> all the updates before receiving the transaction reply as it is in
>> a normal scenario with other database models.
>>
>> Signed-off-by: Ilya Maximets 
>> ---
> 
> I have a tiny nit below, otherwise:
> 
> Acked-by: Dumitru Ceara 
> 

Thanks!

> [...]
> 
>> diff --git a/ovsdb/relay.c b/ovsdb/relay.c
>> index 5f423a0b9..ef689c649 100644
>> --- a/ovsdb/relay.c
>> +++ b/ovsdb/relay.c
>> @@ -32,6 +32,7 @@
>>  #include "row.h"
>>  #include "table.h"
>>  #include "transaction.h"
>> +#include "transaction-forward.h"
>>  #include "util.h"
>>  
>>  VLOG_DEFINE_THIS_MODULE(relay);
>> @@ -298,6 +299,7 @@ ovsdb_relay_run(void)
>>  struct relay_ctx *ctx = node->data;
>>  struct ovs_list events;
>>  
>> +ovsdb_txn_forward_run(ctx->db, ctx->cs);
>>  ovsdb_cs_run(ctx->cs, );
>>  
>>  struct ovsdb_cs_event *event;
>> @@ -309,7 +311,9 @@ ovsdb_relay_run(void)
>>  
>>  switch (event->type) {
>>  case OVSDB_CS_EVENT_TYPE_RECONNECT:
>> -/* Nothing to do. */
>> +/* Cancelling all the transactions that was already sent but
> 
> Nit: s/was/were/
> 

OK.


Re: [ovs-dev] [PATCH v2 6/9] ovsdb: relay: Add support for transaction forwarding.

2021-07-12 Thread Ilya Maximets
On 7/2/21 12:24 PM, Mark Gray wrote:
> On 12/06/2021 03:00, Ilya Maximets wrote:
>> Current version of ovsdb relay allows to scale out read-only
>> access to the primary database.  However, many clients are not
>> read-only but read-mostly.  For example, ovn-controller.
>>
>> In order to scale out database access for this case ovsdb-server
>> need to process transactions that are not read-only.  Relay is not
>> allowed to do that, i.e. not allowed to modify the database, but it
>> can act like a proxy and forward transactions that includes database
>> modifications to the primary server and forward replies back to a
>> client.  At the same time it may serve read-only transactions and
>> monitor requests by itself greatly reducing the load on primary
>> server.
>>
>> This configuration will slightly increase transaction latency, but
>> it's not very important for read-mostly use cases.
>>
>> Implementation details:
>> With this change instead of creating a trigger to commit the
>> transaction, ovsdb-server will create a trigger for transaction
>> forwarding.  Later, ovsdb_relay_run() will send all new transactions
>> to the relay source.  Once transaction reply received from the
>> relay source, ovsdb-relay module will update the state of the
>> transaction forwarding with the reply.  After that, trigger_run()
>> will complete the trigger and jsonrpc_server_run() will send the
>> reply back to the client.  Since transaction reply from the relay
>> source will be received after all the updates, client will receive
>> all the updates before receiving the transaction reply as it is in
>> a normal scenario with other database models.
>>
>> Signed-off-by: Ilya Maximets 
>> ---



>> @@ -188,7 +196,7 @@ static bool
>>  ovsdb_trigger_try(struct ovsdb_trigger *t, long long int now)
>>  {
>>  /* Handle "initialized" state. */
>> -if (!t->reply) {
>> +if (!t->reply && !t->txn_forward) {
>>  ovs_assert(!t->progress);
>>  
>>  struct ovsdb_txn *txn = NULL;
>> @@ -198,13 +206,14 @@ ovsdb_trigger_try(struct ovsdb_trigger *t, long long 
>> int now)
>>  return false;
>>  }
>>  
>> -bool durable;
>> +bool durable, forwarding_needed;
>>  
>>  struct json *result;
>> +/* Trying to compose transaction. */
>>  txn = ovsdb_execute_compose(
>>  t->db, t->session, t->request->params, t->read_only,
>>  t->role, t->id, now - t->created, >timeout_msec,
>> -, );
>> +, _needed, );
>>  if (!txn) {
>>  if (result) {
>>  /* Complete.  There was an error but we still represent 
>> it
>> @@ -217,9 +226,20 @@ ovsdb_trigger_try(struct ovsdb_trigger *t, long long 
>> int now)
>>  return false;
>>  }
>>  
>> -/* Transition to "committing" state. */
>> -t->reply = jsonrpc_create_reply(result, t->request->id);
>> -t->progress = ovsdb_txn_propose_commit(txn, durable);
>> +if (forwarding_needed) {
>> +/* Transaction is good, but we don't need it. */
>> +ovsdb_txn_abort(txn);
>> +json_destroy(result);
>> +/* Transition to "forwarding" state. */
>> +t->txn_forward = ovsdb_txn_forward_create(t->db, 
>> t->request);
>> +/* Forward will not be completed immediately.  Will check
>> + * next time. */
>> +return false;
>> +} else {
>> +/* Transition to "committing" state. */
>> +t->reply = jsonrpc_create_reply(result, t->request->id);
>> +t->progress = ovsdb_txn_propose_commit(txn, durable);
>> +}
>>  } else if (!strcmp(t->request->method, "convert")) {
>>  /* Permission check. */
>>  if (t->role && *t->role) {
>> @@ -348,6 +368,18 @@ ovsdb_trigger_try(struct ovsdb_trigger *t, long long 
>> int now)
>>  ovsdb_trigger_complete(t);
>>  }
>>  
>> +return false;
>> +} else if (t->txn_forward) {
>> +/* Handle "forwarding" state. */
> 
> Should we assert that reply == NULL and progress == NULL?

Should not be necessary.  We're here from the else branch of
if (t->progress), so progress is definitely NULL.
But we can check for 'reply'.  I'll add the check.

> 
>> +if (!ovsdb_txn_forward_is_complete(t->txn_forward)) {
>> +return false;
>> +}
>> +
>> +/* Transition to "complete". */

But I'll add a check here, to be sure that we're not leaking
the reply.
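
As a rough illustration of the check being discussed, the "forwarding"
step could look like the sketch below.  The types and the function name
are hypothetical stand-ins (not ovsdb_trigger or ovsdb_txn_forward), and
the return value here simply means "trigger completed in this call",
which differs from ovsdb_trigger_try().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a forwarded transaction. */
struct toy_forward {
    bool complete;      /* Set once the relay source has answered. */
    const char *reply;  /* Reply from the source, owned until stolen. */
};

/* Hypothetical stand-in for a trigger; only the fields needed here. */
struct toy_trigger {
    const char *reply;        /* Non-NULL once a reply is ready. */
    struct toy_forward *fwd;  /* Non-NULL while in "forwarding" state. */
};

/* Advances a trigger in "forwarding" state.  Returns true if the
 * trigger moved to "complete" during this call. */
static bool
toy_trigger_poll_forward(struct toy_trigger *t)
{
    if (!t->fwd) {
        return false;   /* Not in "forwarding" state. */
    }

    /* The trigger must not already hold a reply, otherwise the
     * assignment below would overwrite (and leak) it. */
    assert(!t->reply);

    if (!t->fwd->complete) {
        return false;   /* Still waiting; check again next time. */
    }

    /* Transition to "complete": take ownership of the reply. */
    t->reply = t->fwd->reply;
    t->fwd->reply = NULL;
    t->fwd = NULL;
    return true;
}
```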

>> +t->reply = ovsdb_txn_forward_steal_reply(t->txn_forward);
>> +ovsdb_txn_forward_destroy(t->db, t->txn_forward);
>> +t->txn_forward = NULL;
>> +ovsdb_trigger_complete(t);
>>  return false;
>>  }
>>  

Re: [ovs-dev] [PATCH v2 5/9] ovsdb: New ovsdb 'relay' service model.

2021-07-12 Thread Ilya Maximets
On 6/25/21 3:34 PM, Dumitru Ceara wrote:
> On 6/12/21 4:00 AM, Ilya Maximets wrote:
>> New database service model 'relay' that is needed to scale out
>> read-mostly database access, e.g. ovn-controller connections to
>> OVN_Southbound.
>>
>> In this service model ovsdb-server connects to existing OVSDB
>> server and maintains in-memory copy of the database.  It serves
>> read-only transactions and monitor requests by its own, but
>> forwards write transactions to the relay source.
>>
>> Key differences from the active-backup replication:
>> - support for "write" transactions (next commit).
>> - no on-disk storage. (probably, faster operation)
>> - support for multiple remotes (connect to the clustered db).
>> - doesn't try to keep connection as long as possible, but
>>   faster reconnects to other remotes to avoid missing updates.
>> - No need to know the complete database schema beforehand,
>>   only the schema name.
>> - can be used along with other standalone and clustered databases
>>   by the same ovsdb-server process. (doesn't turn the whole
>>   jsonrpc server to read-only mode)
>> - supports modern version of monitors (monitor_cond_since),
>>   because based on ovsdb-cs.
>> - could be chained, i.e. multiple relays could be connected
>>   one to another in a row or in a tree-like form.
>> - doesn't increase availability.
>> - cannot be converted to other service models or become a main
>>   active server.
>>
>> Signed-off-by: Ilya Maximets 
>> ---
> 
> I have some very nitpicky comments below (and an unrelated bug report),
> nevertheless:
> 
> Acked-by: Dumitru Ceara 
> 
>>  ovsdb/_server.ovsschema |   7 +-
>>  ovsdb/_server.xml   |  16 +-
>>  ovsdb/automake.mk   |   2 +
>>  ovsdb/execution.c   |   5 +
>>  ovsdb/ovsdb-server.c|  97 
>>  ovsdb/ovsdb.c   |   2 +
>>  ovsdb/ovsdb.h   |   3 +
>>  ovsdb/relay.c   | 339 
>>  ovsdb/relay.h   |  34 
>>  9 files changed, 464 insertions(+), 41 deletions(-)
>>  create mode 100644 ovsdb/relay.c
>>  create mode 100644 ovsdb/relay.h
>>
>> diff --git a/ovsdb/_server.ovsschema b/ovsdb/_server.ovsschema
>> index a867e5cbf..e3d9d893b 100644
>> --- a/ovsdb/_server.ovsschema
>> +++ b/ovsdb/_server.ovsschema
>> @@ -1,13 +1,14 @@
>>  {"name": "_Server",
>> - "version": "1.1.0",
>> - "cksum": "3236486585 698",
>> + "version": "1.2.0",
>> + "cksum": "3009684573 744",
>>   "tables": {
>> "Database": {
>>   "columns": {
>> "name": {"type": "string"},
>> "model": {
>>   "type": {"key": {"type": "string",
>> -  "enum": ["set", ["standalone", "clustered"]]}}},
>> +  "enum": ["set",
>> + ["standalone", "clustered", 
>> "relay"]]}}},
>> "connected": {"type": "boolean"},
>> "leader": {"type": "boolean"},
>> "schema": {
>> diff --git a/ovsdb/_server.xml b/ovsdb/_server.xml
>> index 70cd22db7..414be6715 100644
>> --- a/ovsdb/_server.xml
>> +++ b/ovsdb/_server.xml
>> @@ -60,12 +60,14 @@
>>  
>>  
>>The storage model: standalone for a standalone or
>> -  active-backup database, clustered for a clustered 
>> database.
>> +  active-backup database, clustered for a clustered 
>> database,
>> +  relay for a relay database.
>>  
>>  
>>  
>>The database schema, as a JSON string.  In the case of a clustered
>> -  database, this is empty until it finishes joining its cluster.
>> +  database, this is empty until it finishes joining its cluster.  In the
>> +  case of a relay database - until it connects to the relay source.
>>  
>>  
>>  
>> @@ -85,20 +87,20 @@
>>  
>>
>>  True if the database is the leader in its cluster.  For a 
>> standalone or
>> -active-backup database, this is always true.
>> +active-backup database, this is always true.  Always false for 
>> relay.
>>
>>  
>>
>>  The cluster ID for this database, which is the same for all of the
>> -servers that host this particular clustered database.  For a 
>> standalone
>> -or active-backup database, this is empty.
>> +servers that host this particular clustered database.  For a
>> +standalone, active-backup or relay database, this is empty.
>>
>>  
>>
>>  The server ID for this database, different for each server that 
>> hosts a
>>  particular clustered database.  A server that hosts more than one
>>  clustered database will have a different sid in each 
>> one.
>> -For a standalone or active-backup database, this is empty.
>> +For a standalone, active-backup or relay database, this is empty.
>>
>>  
>>
>> @@ -112,7 +114,7 @@
>>  
>>  
>>  
>> -  For a standalone or active-backup database, this is empty.
>> +  For a standalone, active-backup or 

Re: [ovs-dev] [PATCH v2 5/9] ovsdb: New ovsdb 'relay' service model.

2021-07-12 Thread Ilya Maximets
On 6/19/21 8:30 PM, Mark Gray wrote:
> On 12/06/2021 03:00, Ilya Maximets wrote:
>> New database service model 'relay' that is needed to scale out
>> read-mostly database access, e.g. ovn-controller connections to
>> OVN_Southbound.
>>
>> In this service model ovsdb-server connects to existing OVSDB
>> server and maintains in-memory copy of the database.  It serves
>> read-only transactions and monitor requests by its own, but
>> forwards write transactions to the relay source.
>>
>> Key differences from the active-backup replication:
>> - support for "write" transactions (next commit).
>> - no on-disk storage. (probably, faster operation)
> 
> Any data to back this up?

Nope.  That's why "probably". :)

It's hard to directly compare active-backup with relay from this
aspect, because they are using different types of monitors to receive
updates, so performance might vary.  OTOH, my logic here is that both
implementations are using the ovsdb_txn_propose_commit_block() function
to apply received changes to the database.  And this function involves
ovsdb_storage_write(), which is a no-op in case of relay, but the actual
write to the file in case of a backup server.  Therefore relay should be
faster.
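
A toy model of that argument, just to show the shape of it: both models
share the same commit path, and only the storage step differs.  All
names below are hypothetical and do not correspond to the real
ovsdb_storage code; no performance numbers are implied.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical storage back end; not the real 'struct ovsdb_storage'. */
struct toy_storage {
    bool on_disk;          /* false models a relay: nothing to persist. */
    size_t bytes_written;  /* Bytes that would have gone to the file. */
};

/* Persists 'len' bytes of a transaction.  This is a no-op when there
 * is no on-disk storage. */
static void
toy_storage_write(struct toy_storage *s, size_t len)
{
    if (!s->on_disk) {
        return;
    }
    s->bytes_written += len;
}

/* Shared commit path: applies the change in memory, then hands it to
 * the storage layer.  Returns the new in-memory row count. */
static int
toy_commit(int *n_rows, struct toy_storage *s, size_t txn_len)
{
    (*n_rows)++;                    /* In-memory update, same for both. */
    toy_storage_write(s, txn_len);  /* Only does work when on_disk. */
    return *n_rows;
}
```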

> 
>> - support for multiple remotes (connect to the clustered db).
>> - doesn't try to keep connection as long as possible, but
>>   faster reconnects to other remotes to avoid missing updates.
>> - No need to know the complete database schema beforehand,
>>   only the schema name.
>> - can be used along with other standalone and clustered databases
>>   by the same ovsdb-server process. (doesn't turn the whole
>>   jsonrpc server to read-only mode)
>> - supports modern version of monitors (monitor_cond_since),
>>   because based on ovsdb-cs.
>> - could be chained, i.e. multiple relays could be connected
>>   one to another in a row or in a tree-like form.
> 
> Cool!
> 
>> - doesn't increase availability.
>> - cannot be converted to other service models or become a main
>>   active server.
>>
>> Signed-off-by: Ilya Maximets 
>> ---
>>  ovsdb/_server.ovsschema |   7 +-
>>  ovsdb/_server.xml   |  16 +-
>>  ovsdb/automake.mk   |   2 +
>>  ovsdb/execution.c   |   5 +
>>  ovsdb/ovsdb-server.c|  97 
>>  ovsdb/ovsdb.c   |   2 +
>>  ovsdb/ovsdb.h   |   3 +
>>  ovsdb/relay.c   | 339 
>>  ovsdb/relay.h   |  34 
>>  9 files changed, 464 insertions(+), 41 deletions(-)
>>  create mode 100644 ovsdb/relay.c
>>  create mode 100644 ovsdb/relay.h
>>
>> diff --git a/ovsdb/_server.ovsschema b/ovsdb/_server.ovsschema
>> index a867e5cbf..e3d9d893b 100644
>> --- a/ovsdb/_server.ovsschema
>> +++ b/ovsdb/_server.ovsschema
>> @@ -1,13 +1,14 @@
>>  {"name": "_Server",
>> - "version": "1.1.0",
>> - "cksum": "3236486585 698",
>> + "version": "1.2.0",
>> + "cksum": "3009684573 744",
>>   "tables": {
>> "Database": {
>>   "columns": {
>> "name": {"type": "string"},
>> "model": {
>>   "type": {"key": {"type": "string",
>> -  "enum": ["set", ["standalone", "clustered"]]}}},
>> +  "enum": ["set",
>> + ["standalone", "clustered", 
>> "relay"]]}}},
>> "connected": {"type": "boolean"},
>> "leader": {"type": "boolean"},
>> "schema": {
>> diff --git a/ovsdb/_server.xml b/ovsdb/_server.xml
>> index 70cd22db7..414be6715 100644
>> --- a/ovsdb/_server.xml
>> +++ b/ovsdb/_server.xml
>> @@ -60,12 +60,14 @@
>>  
>>  
>>The storage model: standalone for a standalone or
>> -  active-backup database, clustered for a clustered 
>> database.
>> +  active-backup database, clustered for a clustered 
>> database,
>> +  relay for a relay database.
>>  
>>  
>>  
>>The database schema, as a JSON string.  In the case of a clustered
>> -  database, this is empty until it finishes joining its cluster.
>> +  database, this is empty until it finishes joining its cluster.  In the
>> +  case of a relay database - until it connects to the relay source.
> 
> suggestion: "In the case of a relay database, this is empty until it
> connects to the relay source"

OK.

>>  
>>  
>>  
>> @@ -85,20 +87,20 @@
>>  
>>
>>  True if the database is the leader in its cluster.  For a 
>> standalone or
>> -active-backup database, this is always true.
>> +active-backup database, this is always true.  Always false for 
>> relay.
> 
> suggestion: "For a relay database, this is always false"

OK.

>>
>>  
>>
>>  The cluster ID for this database, which is the same for all of the
>> -servers that host this particular clustered database.  For a 
>> standalone
>> -or active-backup database, this is empty.
>> +servers that host this particular clustered database.  For a
>> +standalone, active-backup or relay database, this 

Re: [ovs-dev] [PATCH v3 ovn 2/2] controller: incrementally create ras port_binding list

2021-07-12 Thread Mark Michelson

On 7/12/21 2:18 PM, Lorenzo Bianconi wrote:

Incrementally manage local_active_ports_ras map for interfaces
where periodic router advertisement has been enabled. This patch
allows to avoid looping over all local interfaces to check if
periodic RA is running on the current port binding.

Acked-by: Mark Michelson 
Signed-off-by: Lorenzo Bianconi 
---
  controller/binding.c|  7 +++
  controller/binding.h|  1 +
  controller/ovn-controller.c | 10 +++-
  controller/pinctrl.c| 93 -
  controller/pinctrl.h|  3 +-
  5 files changed, 69 insertions(+), 45 deletions(-)

diff --git a/controller/binding.c b/controller/binding.c
index 9711ac850..09793a6f6 100644
--- a/controller/binding.c
+++ b/controller/binding.c
@@ -1672,6 +1672,9 @@ binding_run(struct binding_ctx_in *b_ctx_in, struct 
binding_ctx_out *b_ctx_out)
  update_active_pb_ras_pd(pb, b_ctx_out->local_datapaths,
  b_ctx_out->local_active_ports_ipv6_pd,
  "ipv6_prefix_delegation");
+update_active_pb_ras_pd(pb, b_ctx_out->local_datapaths,
+b_ctx_out->local_active_ports_ras,
+"ipv6_ra_send_periodic");
  
  enum en_lport_type lport_type = get_lport_type(pb);
  
@@ -2514,6 +2517,10 @@ delete_done:

  b_ctx_out->local_active_ports_ipv6_pd,
  "ipv6_prefix_delegation");
  
+update_active_pb_ras_pd(pb, b_ctx_out->local_datapaths,

+b_ctx_out->local_active_ports_ras,
+"ipv6_ra_send_periodic");
+
  enum en_lport_type lport_type = get_lport_type(pb);
  
  struct binding_lport *b_lport =

diff --git a/controller/binding.h b/controller/binding.h
index 60ad49da0..77197e742 100644
--- a/controller/binding.h
+++ b/controller/binding.h
@@ -73,6 +73,7 @@ void related_lports_destroy(struct related_lports *);
  struct binding_ctx_out {
  struct hmap *local_datapaths;
  struct shash *local_active_ports_ipv6_pd;
+struct shash *local_active_ports_ras;
  struct local_binding_data *lbinding_data;
  
  /* sset of (potential) local lports. */

diff --git a/controller/ovn-controller.c b/controller/ovn-controller.c
index c4eb54755..34134e87b 100644
--- a/controller/ovn-controller.c
+++ b/controller/ovn-controller.c
@@ -1031,6 +1031,7 @@ struct ed_type_runtime_data {
  struct hmap tracked_dp_bindings;
  
  struct shash local_active_ports_ipv6_pd;

+struct shash local_active_ports_ras;
  };
  
  /* struct ed_type_runtime_data has the below members for tracking the

@@ -1119,6 +1120,7 @@ en_runtime_data_init(struct engine_node *node OVS_UNUSED,
  smap_init(>local_iface_ids);
  local_binding_data_init(>lbinding_data);
  shash_init(>local_active_ports_ipv6_pd);
+shash_init(>local_active_ports_ras);
  
  /* Init the tracked data. */

  hmap_init(>tracked_dp_bindings);
@@ -1145,6 +1147,7 @@ en_runtime_data_cleanup(void *data)
  }
  hmap_destroy(_data->local_datapaths);
  shash_destroy(_data->local_active_ports_ipv6_pd);
+shash_destroy(_data->local_active_ports_ras);


Like in patch 1, this should be shash_destroy_free_data()


  local_binding_data_destroy(_data->lbinding_data);
  }
  
@@ -1225,6 +1228,8 @@ init_binding_ctx(struct engine_node *node,

  b_ctx_out->local_datapaths = _data->local_datapaths;
  b_ctx_out->local_active_ports_ipv6_pd =
  _data->local_active_ports_ipv6_pd;
+b_ctx_out->local_active_ports_ras =
+_data->local_active_ports_ras;
  b_ctx_out->local_lports = _data->local_lports;
  b_ctx_out->local_lports_changed = false;
  b_ctx_out->related_lports = _data->related_lports;
@@ -1243,6 +1248,7 @@ en_runtime_data_run(struct engine_node *node, void *data)
  struct ed_type_runtime_data *rt_data = data;
  struct hmap *local_datapaths = _data->local_datapaths;
  struct shash *local_active_ipv6_pd = _data->local_active_ports_ipv6_pd;
+struct shash *local_active_ras = _data->local_active_ports_ras;
  struct sset *local_lports = _data->local_lports;
  struct sset *active_tunnels = _data->active_tunnels;
  
@@ -1259,6 +1265,7 @@ en_runtime_data_run(struct engine_node *node, void *data)

  }
  hmap_clear(local_datapaths);
  shash_clear(local_active_ipv6_pd);
+shash_clear(local_active_ras);


Use shash_clear_free_data().


  local_binding_data_destroy(_data->lbinding_data);
  sset_destroy(local_lports);
  related_lports_destroy(_data->related_lports);
@@ -3274,7 +3281,8 @@ main(int argc, char *argv[])
  br_int, chassis,
  _data->local_datapaths,
  _data->active_tunnels,
-

Re: [ovs-dev] [PATCH v3 ovn 1/2] controller: incrementally create ipv6 prefix delegation port_binding list

2021-07-12 Thread Mark Michelson
Sorry Lorenzo but I found one more issue. Sorry for not noticing it 
during an earlier review.


On 7/12/21 2:18 PM, Lorenzo Bianconi wrote:

Incrementally manage local_active_ports_ipv6_pd map for interfaces
where IPv6 prefix-delegation has been enabled. This patch allows to
avoid looping over all local interfaces to check if prefix-delegation
is running on the current port binding.

Acked-by: Mark Michelson 
Signed-off-by: Lorenzo Bianconi 
---
  controller/binding.c|  32 +++
  controller/binding.h|   1 +
  controller/ovn-controller.c |  11 +++-
  controller/ovn-controller.h |   6 ++
  controller/pinctrl.c| 107 +---
  controller/pinctrl.h|   4 +-
  6 files changed, 103 insertions(+), 58 deletions(-)

diff --git a/controller/binding.c b/controller/binding.c
index 594babc98..9711ac850 100644
--- a/controller/binding.c
+++ b/controller/binding.c
@@ -574,6 +574,30 @@ remove_related_lport(const struct sbrec_port_binding *pb,
  }
  }
  
+static void

+update_active_pb_ras_pd(const struct sbrec_port_binding *pb,
+struct hmap *local_datapaths,
+struct shash *map, const char *conf)
+{
+bool ras_pd_conf = smap_get_bool(>options, conf, false);
+struct shash_node *iter = shash_find(map, pb->logical_port);
+
+if (iter && !ras_pd_conf) {
+shash_delete(map, iter);


There's a memory leak here. iter->data needs to be freed.


+return;
+}
+struct pb_ld_binding *ras_pd = NULL;
+if (!iter && ras_pd_conf) {
+ras_pd = xzalloc(sizeof *ras_pd);
+ras_pd->pb = pb;
+shash_add(map, pb->logical_port, ras_pd);
+}
+if (ras_pd) {


The logic here has changed since the first version of the patch, and I 
think it's wrong now. This now will only update ras_pd->ld if ras_pd was 
allocated during this function call. Previously, ld would be updated 
when the pb_ld_binding was found in the map. I think this is a bit 
confusing since you're dealing both with shash_node and pb_ld_binding 
types in this function. I think you can do something like this:


if (iter && !ras_pd_conf) {
    /* delete iter from map */
    return;
}
struct pb_ld_binding *ras_pd = NULL;
if (ras_pd_conf) {
    if (iter) {
        ras_pd = iter->data;
    } else {
        /* allocate ras_pd and add it to map */
    }
    ovs_assert(ras_pd);
    ras_pd->ld = get_local_datapath(...);
}
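
For reference, the same flow can be written out as a small standalone
sketch.  It uses a toy fixed-size map in place of 'struct shash' and
plain ints in place of the port binding and local datapath, so every
name here is hypothetical; the point is only that the value is freed on
removal and that 'ld' is refreshed for both new and existing entries.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

#define MAX_ENTRIES 8

/* Hypothetical stand-in for pb_ld_binding. */
struct toy_binding {
    int pb;  /* Stand-in for the port binding. */
    int ld;  /* Stand-in for the local datapath. */
};

/* Toy string-keyed map standing in for 'struct shash'.  Keys are not
 * copied, so callers must keep them alive. */
struct toy_map {
    const char *keys[MAX_ENTRIES];
    struct toy_binding *vals[MAX_ENTRIES];
};

/* Returns the slot holding 'key', or -1 if it is not in the map. */
static int
toy_map_find(const struct toy_map *m, const char *key)
{
    for (int i = 0; i < MAX_ENTRIES; i++) {
        if (m->keys[i] && !strcmp(m->keys[i], key)) {
            return i;
        }
    }
    return -1;
}

/* Returns the first unused slot, or -1 if the map is full. */
static int
toy_map_free_slot(const struct toy_map *m)
{
    for (int i = 0; i < MAX_ENTRIES; i++) {
        if (!m->keys[i]) {
            return i;
        }
    }
    return -1;
}

static void
toy_update_binding(struct toy_map *m, const char *key, bool enabled,
                   int pb, int ld)
{
    int idx = toy_map_find(m, key);

    if (!enabled) {
        if (idx >= 0) {
            /* Free the value as well as the entry, to avoid a leak. */
            free(m->vals[idx]);
            m->vals[idx] = NULL;
            m->keys[idx] = NULL;
        }
        return;
    }

    if (idx < 0) {
        /* Allocate a new entry and add it to the map. */
        idx = toy_map_free_slot(m);
        assert(idx >= 0);
        m->vals[idx] = calloc(1, sizeof *m->vals[idx]);
        assert(m->vals[idx]);
        m->vals[idx]->pb = pb;
        m->keys[idx] = key;
    }

    /* Refresh 'ld' whether the entry is new or was already present. */
    m->vals[idx]->ld = ld;
}
```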


+ras_pd->ld = get_local_datapath(local_datapaths,
+pb->datapath->tunnel_key);
+}
+}
+
  /* Corresponds to each Port_Binding.type. */
  enum en_lport_type {
  LP_UNKNOWN,
@@ -1645,6 +1669,10 @@ binding_run(struct binding_ctx_in *b_ctx_in, struct 
binding_ctx_out *b_ctx_out)
  const struct sbrec_port_binding *pb;
  SBREC_PORT_BINDING_TABLE_FOR_EACH (pb,
 b_ctx_in->port_binding_table) {
+update_active_pb_ras_pd(pb, b_ctx_out->local_datapaths,
+b_ctx_out->local_active_ports_ipv6_pd,
+"ipv6_prefix_delegation");
+
  enum en_lport_type lport_type = get_lport_type(pb);
  
  switch (lport_type) {

@@ -2482,6 +2510,10 @@ delete_done:
  continue;
  }
  
+update_active_pb_ras_pd(pb, b_ctx_out->local_datapaths,

+b_ctx_out->local_active_ports_ipv6_pd,
+"ipv6_prefix_delegation");
+
  enum en_lport_type lport_type = get_lport_type(pb);
  
  struct binding_lport *b_lport =

diff --git a/controller/binding.h b/controller/binding.h
index a08011ae2..60ad49da0 100644
--- a/controller/binding.h
+++ b/controller/binding.h
@@ -72,6 +72,7 @@ void related_lports_destroy(struct related_lports *);
  
  struct binding_ctx_out {

  struct hmap *local_datapaths;
+struct shash *local_active_ports_ipv6_pd;
  struct local_binding_data *lbinding_data;
  
  /* sset of (potential) local lports. */

diff --git a/controller/ovn-controller.c b/controller/ovn-controller.c
index 6a9c25f28..c4eb54755 100644
--- a/controller/ovn-controller.c
+++ b/controller/ovn-controller.c
@@ -1029,6 +1029,8 @@ struct ed_type_runtime_data {
  bool tracked;
  bool local_lports_changed;
  struct hmap tracked_dp_bindings;
+
+struct shash local_active_ports_ipv6_pd;
  };
  
  /* struct ed_type_runtime_data has the below members for tracking the

@@ -1116,6 +1118,7 @@ en_runtime_data_init(struct engine_node *node OVS_UNUSED,
  sset_init(&data->egress_ifaces);
  smap_init(&data->local_iface_ids);
  local_binding_data_init(&data->lbinding_data);
+shash_init(&data->local_active_ports_ipv6_pd);
  
  /* Init the tracked data. */

  hmap_init(&data->tracked_dp_bindings);
@@ -1141,6 +1144,7 @@ en_runtime_data_cleanup(void *data)
  free(cur_node);
  }
  hmap_destroy(&rt_data->local_datapaths);
+

Re: [ovs-dev] [PATCH v2 4/9] ovsdb: row: Add support for xor-based row updates.

2021-07-12 Thread Ilya Maximets
On 6/19/21 8:01 PM, Mark Gray wrote:
> On 12/06/2021 03:00, Ilya Maximets wrote:
>> This will be used to apply update3 type updates to ovsdb tables
>> while processing updates for future ovsdb 'relay' service model.
>>
>> 'ovsdb_datum_apply_diff' is allowed to fail, so adding support
>> to return this error.
>>
>> Signed-off-by: Ilya Maximets 
>> ---
>>  ovsdb/execution.c   |  5 +++--
>>  ovsdb/replication.c |  2 +-
>>  ovsdb/row.c | 30 +-
>>  ovsdb/row.h |  6 --
>>  ovsdb/table.c   |  9 +
>>  ovsdb/table.h   |  2 +-
>>  6 files changed, 39 insertions(+), 15 deletions(-)
>>
>> diff --git a/ovsdb/execution.c b/ovsdb/execution.c
>> index 3a0dad5d0..f6150e944 100644
>> --- a/ovsdb/execution.c
>> +++ b/ovsdb/execution.c
>> @@ -483,8 +483,9 @@ update_row_cb(const struct ovsdb_row *row, void *ur_)
>>  
>>  ur->n_matches++;
>>  if (!ovsdb_row_equal_columns(row, ur->row, ur->columns)) {
>> -ovsdb_row_update_columns(ovsdb_txn_row_modify(ur->txn, row),
>> - ur->row, ur->columns);
>> +ovsdb_error_assert(ovsdb_row_update_columns(
>> +   ovsdb_txn_row_modify(ur->txn, row),
>> +   ur->row, ur->columns, false));
>>  }
>>  
>>  return true;
>> diff --git a/ovsdb/replication.c b/ovsdb/replication.c
>> index b755976b0..d8b56d813 100644
>> --- a/ovsdb/replication.c
>> +++ b/ovsdb/replication.c
>> @@ -677,7 +677,7 @@ process_table_update(struct json *table_update, const 
>> char *table_name,
>>  struct ovsdb_error *error;
>>  error = (!new ? ovsdb_table_execute_delete(txn, &uuid, table)
>>   : !old ? ovsdb_table_execute_insert(txn, &uuid, table, new)
>> - : ovsdb_table_execute_update(txn, &uuid, table, new));
>> + : ovsdb_table_execute_update(txn, &uuid, table, new, 
>> false));
>>  if (error) {
>>  if (!strcmp(ovsdb_error_get_tag(error), "consistency 
>> violation")) {
>>  ovsdb_error_assert(error);
>> diff --git a/ovsdb/row.c b/ovsdb/row.c
>> index 755ab91a8..65a054621 100644
>> --- a/ovsdb/row.c
>> +++ b/ovsdb/row.c
>> @@ -163,20 +163,40 @@ ovsdb_row_equal_columns(const struct ovsdb_row *a,
>>  return true;
>>  }
>>  
>> -void
>> +struct ovsdb_error *
>>  ovsdb_row_update_columns(struct ovsdb_row *dst,
>>   const struct ovsdb_row *src,
>> - const struct ovsdb_column_set *columns)
>> + const struct ovsdb_column_set *columns,
>> + bool xor)
>>  {
>>  size_t i;
>>  
>>  for (i = 0; i < columns->n_columns; i++) {
>>  const struct ovsdb_column *column = columns->columns[i];
>> +struct ovsdb_datum xor_datum;
>> +struct ovsdb_error *error;
>> +
>> +if (xor) {
>> +error = ovsdb_datum_apply_diff(&xor_datum,
>> +   &dst->fields[column->index],
>> +   &src->fields[column->index],
>> +   &column->type);
>> +if (error) {
>> +return error;
>> +}
>> +}
>> +
>>  ovsdb_datum_destroy(&dst->fields[column->index], &column->type);
>> -ovsdb_datum_clone(&dst->fields[column->index],
>> -  &src->fields[column->index],
>> -  &column->type);
> 
> Could you move ovsdb_datum_destroy(&dst->fields[column->index],
> &column->type) into the "else" clause below and then merge the "if"
> clause below into the "if" clause above?

We still need to destroy for both branches, so what I can do is
something like this:

@@ -184,13 +185,11 @@ ovsdb_row_update_columns(struct ovsdb_row *dst,
 if (error) {
 return error;
 }
-}
 
-ovsdb_datum_destroy(&dst->fields[column->index], &column->type);
-
-if (xor) {
+ovsdb_datum_destroy(&dst->fields[column->index], &column->type);
 ovsdb_datum_swap(&dst->fields[column->index], &xor_datum);
 } else {
+ovsdb_datum_destroy(&dst->fields[column->index], &column->type);
 ovsdb_datum_clone(&dst->fields[column->index],
   &src->fields[column->index],
   &column->type);
---

i.e. copy the ovsdb_datum_destroy() to both branches and merge the "if"s,
but I found this harder to read.

I'll keep it as is for now, but if you think the version above is better,
I can use it.
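
For readers following along, the shape of the "as is" variant can be sketched on a toy type. This is only an illustration under stated assumptions: toy_datum, toy_datum_apply_diff() and toy_row_update() are hypothetical, a one-word bitset stands in for an ovsdb_datum, and the "bits outside valid_mask" failure is invented just to exercise the error path. It is not the ovsdb code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical toy datum: a heap-owned one-word bitset. */
struct toy_datum {
    uint32_t *bits;
};

static void
toy_datum_init(struct toy_datum *d, uint32_t value)
{
    d->bits = malloc(sizeof *d->bits);
    assert(d->bits);
    *d->bits = value;
}

static void
toy_datum_destroy(struct toy_datum *d)
{
    free(d->bits);
    d->bits = NULL;
}

/* Writes 'old' XOR 'diff' into 'out'.  Fails, without allocating anything,
 * if 'diff' has bits outside 'valid_mask' (an invented failure condition). */
static bool
toy_datum_apply_diff(struct toy_datum *out, const struct toy_datum *old,
                     const struct toy_datum *diff, uint32_t valid_mask)
{
    if (*diff->bits & ~valid_mask) {
        return false;
    }
    toy_datum_init(out, *old->bits ^ *diff->bits);
    return true;
}

/* Returns true on success.  On failure 'dst' is left untouched. */
static bool
toy_row_update(struct toy_datum *dst, const struct toy_datum *src,
               bool xor, uint32_t valid_mask)
{
    struct toy_datum xor_datum = { NULL };

    if (xor && !toy_datum_apply_diff(&xor_datum, dst, src, valid_mask)) {
        return false;               /* 'dst' has not been modified yet. */
    }

    toy_datum_destroy(dst);         /* One destroy shared by both branches. */

    if (xor) {
        *dst = xor_datum;           /* Take ownership of the computed result. */
    } else {
        toy_datum_init(dst, *src->bits);    /* Plain copy of 'src'. */
    }
    return true;
}

/* Returns 1 if the xor, failure and clone paths all behave as described. */
int
toy_xor_demo(void)
{
    struct toy_datum dst, diff, bad;

    toy_datum_init(&dst, 0x6);
    toy_datum_init(&diff, 0x3);
    toy_datum_init(&bad, 0x100);    /* Outside the 0xff mask used below. */

    bool ok = toy_row_update(&dst, &diff, true, 0xff);      /* 0x6 ^ 0x3 */
    uint32_t after_xor = *dst.bits;
    bool failed = !toy_row_update(&dst, &bad, true, 0xff);  /* Rejected. */
    uint32_t after_fail = *dst.bits;
    toy_row_update(&dst, &diff, false, 0xff);               /* Plain copy. */
    uint32_t after_clone = *dst.bits;

    toy_datum_destroy(&dst);
    toy_datum_destroy(&diff);
    toy_datum_destroy(&bad);

    return ok && failed && after_xor == 0x5 && after_fail == 0x5
           && after_clone == 0x3;
}
```

The diff has to be computed before the old value is destroyed, because it reads the old value and may fail; that ordering is why the single shared destroy sits between the two "if (xor)" checks.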

Best regards, Ilya Maximets.


Re: [ovs-dev] [PATCH] reconnect: Add graceful reconnect.

2021-07-12 Thread Ilya Maximets
On 6/29/21 1:20 PM, Dumitru Ceara wrote:
> Until now clients that needed to reconnect immediately could only use
> reconnect_force_reconnect().  However, reconnect_force_reconnect()
> doesn't reset the backoff for connections that were alive long enough
> (more than backoff seconds).
> 
> Moreover, the reconnect library cannot determine the exact reason why a
> client wishes to initiate a reconnection.  In most cases reconnection
> happens because of a fatal error when communicating with the remote,
> e.g., in the ovsdb-cs layer, when invalid messages are received from
> ovsdb-server.  In such cases it makes sense to not reset the backoff
> because the remote seems to be unhealthy.
> 
> There are however cases when reconnection is needed for other reasons.
> One such example is when ovsdb-clients require "leader-only" connections
> to clustered ovsdb-server databases.  Whenever the client determines
> that the remote is not a leader anymore, it decides to reconnect to a
> new remote from its list, searching for the new leader.  Using
> jsonrpc_force_reconnect() (which calls reconnect_force_reconnect()) will
> not reset backoff even though the former leader is still likely in good
> shape.
> 
> Since 3c2d6274bcee ("raft: Transfer leadership before creating
> snapshots.") leadership changes inside the clustered database happen
> more often and therefore "leader-only" clients need to reconnect more
> often too.  Not resetting the backoff every time a leadership change
> happens will cause all reconnections to happen with the maximum backoff
> (8 seconds) resulting in significant latency.
> 
> This commit also updates the Python reconnect and IDL implementations
> and adds tests for force-reconnect and graceful-reconnect.
> 
> Reported-at: https://bugzilla.redhat.com/1977264
> Signed-off-by: Dumitru Ceara 
> ---

Hi, Dumitru.

Thanks for working on this issue.  I've seen it in practice while running
OVN tests, but I still don't quite understand why it happens.  Could you
please describe how the state transitions work here for the ovsdb-idl case?

> +# Forcefully reconnect.
> +force-reconnect
> +  in RECONNECT for 0 ms (2000 ms backoff)
> +  1 successful connections out of 3 attempts, seqno 2
> +  disconnected
> +run
> +  should disconnect
> +connecting
> +  in CONNECTING for 0 ms (2000 ms backoff)

Especially this part seems wrong to me.  Because after 'should disconnect'
there should be 'disconnect' or 'connect-fail', but not 'connecting'.  We
literally should disconnect here, otherwise it's a violation of the reconnect
API.  And my concern is that ovsdb-cs or jsonrpc violates the API somewhere
by not calling reconnect_disconnected() when it is required, or there is some
other bug that makes the 'reconnect' module skip a few states in the FSM.

The logical workflow for the force-reconnect, from what I see in the code,
should be:

1. force-reconnect -> transition to S_RECONNECT
2. run -> in S_RECONNECT, so returning RECONNECT_DISCONNECT
3. disconnect -> check the state, update backoff and transition to S_BACKOFF
4. run -> in S_BACKOFF, so returning RECONNECT_CONNECT
5. connected

Something is fishy here, because ovsdb-cs somehow jumps over step #3 and
maybe also #4.
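
The expected sequence can be modelled with a very small state machine. This is only a toy sketch, not lib/reconnect.c: the toy_* names are hypothetical, timers and backoff values are ignored (the backoff is treated as already expired), and "connected without a prior disconnect" is simply flagged as a violation so the skipped step becomes visible.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy states and actions, loosely mirroring the steps above. */
enum toy_state { TOY_ACTIVE, TOY_RECONNECT, TOY_BACKOFF };
enum toy_action { TOY_NONE, TOY_DO_DISCONNECT, TOY_DO_CONNECT };

struct toy_fsm {
    enum toy_state state;
    bool violated;      /* Set when the client skips a required step. */
};

/* Step 1: request a reconnection of an active connection. */
static void
toy_force_reconnect(struct toy_fsm *fsm)
{
    if (fsm->state == TOY_ACTIVE) {
        fsm->state = TOY_RECONNECT;
    }
}

/* Steps 2 and 4: tell the client what it should do next. */
static enum toy_action
toy_run(const struct toy_fsm *fsm)
{
    switch (fsm->state) {
    case TOY_RECONNECT:
        return TOY_DO_DISCONNECT;
    case TOY_BACKOFF:
        return TOY_DO_CONNECT;      /* Backoff assumed to have expired. */
    default:
        return TOY_NONE;
    }
}

/* Step 3: the client reports that it dropped the connection. */
static void
toy_disconnected(struct toy_fsm *fsm)
{
    assert(fsm->state == TOY_RECONNECT);
    fsm->state = TOY_BACKOFF;
}

/* Step 5: the client reports a new connection. */
static void
toy_connected(struct toy_fsm *fsm)
{
    if (fsm->state != TOY_BACKOFF) {
        fsm->violated = true;       /* Step 3 was skipped. */
    }
    fsm->state = TOY_ACTIVE;
}

/* Walks the five steps; returns true if no step was skipped. */
bool
toy_walk(bool obey_disconnect)
{
    struct toy_fsm fsm = { TOY_ACTIVE, false };

    toy_force_reconnect(&fsm);                                    /* 1 */
    if (toy_run(&fsm) == TOY_DO_DISCONNECT && obey_disconnect) {  /* 2 */
        toy_disconnected(&fsm);                                   /* 3 */
    }
    (void) toy_run(&fsm);                                         /* 4 */
    toy_connected(&fsm);                                          /* 5 */

    return !fsm.violated;
}
```

Here toy_walk(true) follows all five steps, while toy_walk(false) models a client that ignores the disconnect request and goes straight to reporting a connection, which is the behaviour the test output seems to show.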

Best regards, Ilya Maximets.


Re: [ovs-dev] [PATCH] rhel: use /run instead of /var/run

2021-07-12 Thread Flavio Leitner
On Wed, May 12, 2021 at 05:08:08PM +0200, Timothy Redaelli wrote:
> Systemd unit file generates warnings about PID file path since /var/run
> is a legacy path so just use /run instead of /var/run.
> 
> /var/run is a symlink of /run starting from RHEL7 (and any other distribution
> that uses systemd).
> 
> Reported-at: https://bugzilla.redhat.com/1952081
> Signed-off-by: Timothy Redaelli 
> ---

Reproduced on F34:
Jul 12 17:03:28 p50 systemd[1]:
/usr/lib/systemd/system/ovs-vswitchd.service:12: PIDFile= references
a path below legacy directory /var/run/, updating
/var/run/openvswitch/ovs-vswitchd.pid →
/run/openvswitch/ovs-vswitchd.pid; please update the unit file
accordingly.

Acked-by: Flavio Leitner 

Thanks Timothy,
fbl


Re: [ovs-dev] [PATCH ovn] ovn-nb.xml: Fix the description for LB's skip_snat option.

2021-07-12 Thread Numan Siddique
On Fri, Jul 9, 2021 at 2:49 PM Han Zhou  wrote:
>
> force_snat_for_lb is a flag set in logical flow pipeline, while
> lb_force_snat_ip is the option configured in NB DB.  In NB document we
> should mention the actual option configured in NB instead of the flow
> details.
>
> Signed-off-by: Han Zhou 

Acked-by: Numan Siddique 

Numan

> ---
>  ovn-nb.xml | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/ovn-nb.xml b/ovn-nb.xml
> index b6a0d1f43..d5efbb33e 100644
> --- a/ovn-nb.xml
> +++ b/ovn-nb.xml
> @@ -1712,8 +1712,9 @@
>
>
>  If the load balancing rule is configured with skip_snat
> -option, the force_snat_for_lb option configured for the router
> -pipeline will not be applied for this load balancer.
> +option, the option lb_force_snat_ip configured for the logical router
> +that references this load balancer will not be applied for this load
> +balancer.
>
>
>
> --
> 2.30.2
>


Re: [ovs-dev] [ovn] branch-20.09 tests fail with OVS higher than 2.14.0

2021-07-12 Thread Numan Siddique
On Fri, Jul 9, 2021 at 9:01 AM Dumitru Ceara  wrote:
>
> On 7/8/21 6:34 PM, Vladislav Odintsov wrote:
> > Hi,
>
> Hi Vladislav,
>
> >
> > I see a constantly failing test when running OVN branch-20.09 against OVS
> > higher than 2.14.0 (2.14.1, 2.14.2, branch-2.14):
> > ovn -- ensure one gw controller restart in HA doesn't bounce the master
> >
> > ##  ##
> > ## Tested programs. ##
> > ##  ##
> > ./testsuite.at:1: 
> > /builddir/build/BUILD/ovn-20.09.1/openvswitch-2.14.1/vswitchd/ovs-vswitchd 
> > --version
> > ovs-vswitchd (Open vSwitch) 2.14.1
> > ./testsuite.at:1: 
> > /builddir/build/BUILD/ovn-20.09.1/openvswitch-2.14.1/utilities/ovs-vsctl 
> > --version
> > ovs-vsctl (Open vSwitch) 2.14.1
> > DB Schema 8.2.0
> > ## -- ##
> > ## Running the tests. ##
> > ## -- ##
> > testsuite: starting at: Thu Jul  8 17:44:21 MSK 2021
> > testsuite: ending at: Thu Jul  8 17:44:52 MSK 2021
> > testsuite: test suite duration: 0h 0m 31s
> > ## - ##
> > ## Test results. ##
> > ## - ##
> > ERROR: 1 test was run,
> > 1 failed unexpectedly.
> > ##  ##
> > ## Summary of the failures. ##
> > ##  ##
> > Failed tests:
> > ovn 20.09.1 test suite test groups:
> >  NUM: FILE-NAME:LINE TEST-GROUP-NAME
> >   KEYWORDS
> >   91: ovn.at:12245   ovn -- ensure one gw controller restart in HA 
> > doesn't bounce the master
> > ## -- ##
> > ## Detailed failed tests. ##
> > ## -- ##
> > # -*- compilation -*-
> > 91. ovn.at:12245: testing ovn -- ensure one gw controller restart in HA 
> > doesn't bounce the master ...
> > creating ovn-sb database
> > creating ovn-nb database
> > starting ovn-northd
> > starting backup ovn-northd
> > adding simulator 'main'
> > adding simulator 'gw1'
> > adding simulator 'gw2'
> > adding simulator 'hv1'
> > ./ovn.at:12277: ovn_populate_arp__
> > stdout:
> > OK
> > OK
> > OK
> > OK
> > OK
> > OK
> > 194ab858-5fe5-448c-9600-f00a52a120e6
> > 511c7f52-8f85-4193-872b-f87c23420dfd
> > dc46bb8e-35f5-420c-8874-b493c843fd31
> > Waiting until 1 rows in sb Chassis with name=gw2...
> > ovn-macros.at:346: waiting until test $count = $(count_rows $db:$table $a 
> > $b $c)...
> > ovn-macros.at:346: wait failed after 30 seconds
> > sb table Chassis has the following rows. 0 rows match instead of expected 1:
> > _uuid   : a74cd080-1302-4224-9590-462017d88783
> > encaps  : [393b112b-329d-4db1-9926-cd06124a6f2b, 
> > df12c311-1563-4202-a197-02686d835867]
> > external_ids: {datapath-type="", 
> > iface-types="dummy,dummy-internal,dummy-pmd,erspan,geneve,gre,gtpu,internal,ip6erspan,ip6gre,lisp,patch,stt,system,tap,vxlan",
> >  is-interconn="false", ovn-bridge-mappings="", ovn-chassis-mac-mappings="", 
> > ovn-cms-options="", ovn-enable-lflow-cache="true", ovn-monitor-all="false"}
> > hostname: bldrvm02
> > name: hv1
> > nb_cfg  : 0
> > other_config: {datapath-type="", 
> > iface-types="dummy,dummy-internal,dummy-pmd,erspan,geneve,gre,gtpu,internal,ip6erspan,ip6gre,lisp,patch,stt,system,tap,vxlan",
> >  is-interconn="false", ovn-bridge-mappings="", ovn-chassis-mac-mappings="", 
> > ovn-cms-options="", ovn-enable-lflow-cache="true", ovn-monitor-all="false"}
> > transport_zones : []
> > vtep_logical_switches: []
> > _uuid   : 780ad589-47d9-4658-b1fb-e0ec96f96ad0
> > encaps  : [2bed8f4c-dc55-444d-a259-b12c04a63b62, 
> > 752086e4-6372-4840-9f92-fd4a8df3ba21]
> > external_ids: {datapath-type="", 
> > iface-types="dummy,dummy-internal,dummy-pmd,erspan,geneve,gre,gtpu,internal,ip6erspan,ip6gre,lisp,patch,stt,system,tap,vxlan",
> >  is-interconn="false", ovn-bridge-mappings="phys:br-phys", 
> > ovn-chassis-mac-mappings="", ovn-cms-options="", 
> > ovn-enable-lflow-cache="true", ovn-monitor-all="false"}
> > hostname: bldrvm02
> > name: gw1
> > nb_cfg  : 0
> > other_config: {datapath-type="", 
> > iface-types="dummy,dummy-internal,dummy-pmd,erspan,geneve,gre,gtpu,internal,ip6erspan,ip6gre,lisp,patch,stt,system,tap,vxlan",
> >  is-interconn="false", ovn-bridge-mappings="phys:br-phys", 
> > ovn-chassis-mac-mappings="", ovn-cms-options="", 
> > ovn-enable-lflow-cache="true", ovn-monitor-all="false"}
> > transport_zones : []
> > vtep_logical_switches: []
> > ./ovs-macros.at:222: hard failure
> > 91. ovn.at:12245: 91. ovn -- ensure one gw controller restart in HA doesn't 
> > bounce the master (ovn.at:12245): FAILED (ovs-macros.at:222)
> >
> >
> > I also tried to build OVN with OVS 2.15 and it doesn’t build at all because 
> > of renaming "slave" to "member" in OVS.
> >
> > Some questions here:
>
> I'm not a maintainer (maintainers in cc) but I'll try to answer some of
> your questions.
>
> >
> > 1. As I understand, changes in OVS branch brought regression in OVN. 
> > 

Re: [ovs-dev] [PATCH ovn branch-20.09] ovn-controller: Monitor chassis_private by chassis name.

2021-07-12 Thread Numan Siddique
On Fri, Jul 9, 2021 at 11:54 AM Dumitru Ceara  wrote:
>
> The backport looks good to me, thanks!

Thanks.  I applied this patch to branch-20.09.

Numan

>
> On 7/9/21 4:00 PM, Vladislav Odintsov wrote:
> > Signed-off-by: Vladislav Odintsov 
> >
> > Regards,
> > Vladislav Odintsov
> >
> >> On 9 Jul 2021, at 16:55, Vladislav Odintsov  wrote:
> >>
> >> Acked-by: Vladislav Odintsov 
> >>
> >> Regards,
> >> Vladislav Odintsov
> >>
> >>> On 9 Jul 2021, at 16:10, Vladislav Odintsov  wrote:
> >>>
> >>> From: Dumitru Ceara 
> >>>
> >>> Remove the use of sbrec_chassis_is_new() for uncommitted records.  This
> >>> is not the way IDL *_is_new() functions are supposed to be used.
> >>>
> >>> Note: With this change if the system-id changes there will be a
> >>> transient error in ovn-controller due to ovn-controller trying to insert
> >>> a new chassis_private record.  This is due to the fact that the view of
> >>> the chassis_private table changes and only chassis_private records
> >>> matching the new chassis name are sent to ovn-controller.  This gets
> >>> corrected though in the next iteration of the ovn-controller processing
> >>> loop.
> >>>
> >>> Suggested-by: Han Zhou 
> >>> Reported-at: 
> >>> https://mail.openvswitch.org/pipermail/ovs-dev/2020-October/376339.html
> >>> Fixes: dce1af31b550 ("chassis: Fix chassis_private record updates when 
> >>> the system-id changes.")
> >>> Signed-off-by: Dumitru Ceara 
> >>> Acked-by: Mark Gray 
> >>> Signed-off-by: Han Zhou 
> >>> (cherry picked from commit 1f915da95dc725131b7df094d494af9fda88ea92)
> >>> ---
> >>> controller/ovn-controller.c | 6 +++---
> >>> 1 file changed, 3 insertions(+), 3 deletions(-)
> >>>
> >>> diff --git a/controller/ovn-controller.c b/controller/ovn-controller.c
> >>> index 3665c7b4e..b154a8486 100644
> >>> --- a/controller/ovn-controller.c
> >>> +++ b/controller/ovn-controller.c
> >>> @@ -181,7 +181,7 @@ update_sb_monitors(struct ovsdb_idl *ovnsb_idl,
> >>> * chassis */
> >>>sbrec_port_binding_add_clause_type(, OVSDB_F_EQ, "chassisredirect");
> >>>sbrec_port_binding_add_clause_type(, OVSDB_F_EQ, "external");
> >>> -if (chassis && !sbrec_chassis_is_new(chassis)) {
> >>> +if (chassis) {
> >>>/* This should be mostly redundant with the other clauses for port
> >>> * bindings, but it allows us to catch any ports that are assigned 
> >>> to
> >>> * us but should not be.  That way, we can clear their chassis
> >>> @@ -205,8 +205,8 @@ update_sb_monitors(struct ovsdb_idl *ovnsb_idl,
> >>>>header_.uuid);
> >>>
> >>>/* Monitors Chassis_Private record for current chassis only. */
> >>> -sbrec_chassis_private_add_clause_chassis(, OVSDB_F_EQ,
> >>> - >header_.uuid);
> >>> +sbrec_chassis_private_add_clause_name(, OVSDB_F_EQ,
> >>> +  chassis->name);
> >>>} else {
> >>>/* During initialization, we monitor all records in 
> >>> Chassis_Private so
> >>> * that we don't try to recreate existing ones. */
> >>> --
> >>> 2.30.0
> >>>
> >>


Re: [ovs-dev] [PATCH ovn v2] ovn-controller: Propagate nb-cfg-ts to local OVSDB.

2021-07-12 Thread Numan Siddique
On Thu, Jun 17, 2021 at 3:36 AM Dumitru Ceara  wrote:
>
> Also store the timestamp when ovn-controller started up.  This helps
> implementing alerts on the CMS side to detect whether ovn-controller is
> still alive and functioning well.
>
> Reported-at: https://bugzilla.redhat.com/1924751
> Reported-by: Casey Callendrello 
> Signed-off-by: Dumitru Ceara 
> ---

Thanks Dumitru and Mark.

I applied this patch to the main branch.

Numan

> v2:
> - Addressed Mark's comments:
>   - added units to documentation of timestamp fields.
>   - rephrased test comment.
>   - did *not* implement the micro optimization suggestion because
> there's a chance the local ovsdb gets out of sync (e.g., txns fail
> or values are changed externally) and ovn-controller should
> reconciliate the database.
> ---
>  controller/ovn-controller.8.xml | 25 +
>  controller/ovn-controller.c | 29 +++--
>  tests/ovn-controller.at | 11 +++
>  3 files changed, 59 insertions(+), 6 deletions(-)
>
> diff --git a/controller/ovn-controller.8.xml b/controller/ovn-controller.8.xml
> index 8886df568..77067c3a3 100644
> --- a/controller/ovn-controller.8.xml
> +++ b/controller/ovn-controller.8.xml
> @@ -418,6 +418,18 @@
>  
>
>
> +  
> +external-ids:ovn-startup-ts in the Bridge
> +table
> +  
> +
> +  
> +
> +  This key represents the timestamp (in milliseconds) at which
> +  ovn-controller process was started.
> +
> +  
> +
>
>  external-ids:ovn-nb-cfg in the Bridge table
>
> @@ -429,6 +441,19 @@
>flows have been successfully installed in OVS.
>  
>
> +
> +  
> +external-ids:ovn-nb-cfg-ts in the Bridge
> +table
> +  
> +
> +  
> +
> +  This key represents the timestamp (in milliseconds) of the last 
> known
> +  OVN_Southbound.SB_Global.nb_cfg value for which all
> +  flows have been successfully installed in OVS.
> +
> +  
>  
>
>  OVN Southbound Database Usage
> diff --git a/controller/ovn-controller.c b/controller/ovn-controller.c
> index addb08755..2f8ceff9f 100644
> --- a/controller/ovn-controller.c
> +++ b/controller/ovn-controller.c
> @@ -94,6 +94,8 @@ static unixctl_cb_func debug_delay_nb_cfg_report;
>  #define CONTROLLER_LOOP_STOPWATCH_NAME "ovn-controller-flow-generation"
>
>  #define OVS_NB_CFG_NAME "ovn-nb-cfg"
> +#define OVS_NB_CFG_TS_NAME "ovn-nb-cfg-ts"
> +#define OVS_STARTUP_TS_NAME "ovn-startup-ts"
>
>  static char *parse_options(int argc, char *argv[]);
>  OVS_NO_RETURN static void usage(void);
> @@ -788,19 +790,30 @@ static void
>  store_nb_cfg(struct ovsdb_idl_txn *sb_txn, struct ovsdb_idl_txn *ovs_txn,
>   const struct sbrec_chassis_private *chassis,
>   const struct ovsrec_bridge *br_int,
> - unsigned int delay_nb_cfg_report)
> + unsigned int delay_nb_cfg_report, int64_t startup_ts)
>  {
>  struct ofctrl_acked_seqnos *acked_nb_cfg_seqnos =
>  ofctrl_acked_seqnos_get(ofctrl_seq_type_nb_cfg);
>  uint64_t cur_cfg = acked_nb_cfg_seqnos->last_acked;
>
> +if (ovs_txn && br_int
> +&& startup_ts != smap_get_ullong(&br_int->external_ids,
> + OVS_STARTUP_TS_NAME, 0)) {
> +char *startup_ts_str = xasprintf("%"PRId64, startup_ts);
> +ovsrec_bridge_update_external_ids_setkey(br_int, OVS_STARTUP_TS_NAME,
> + startup_ts_str);
> +free(startup_ts_str);
> +}
> +
>  if (!cur_cfg) {
>  goto done;
>  }
>
> +long long ts_now = time_wall_msec();
> +
>  if (sb_txn && chassis && cur_cfg != chassis->nb_cfg) {
>  sbrec_chassis_private_set_nb_cfg(chassis, cur_cfg);
> -sbrec_chassis_private_set_nb_cfg_timestamp(chassis, 
> time_wall_msec());
> +sbrec_chassis_private_set_nb_cfg_timestamp(chassis, ts_now);
>
>  if (delay_nb_cfg_report) {
>  VLOG_INFO("Sleep for %u sec", delay_nb_cfg_report);
> @@ -808,12 +821,15 @@ store_nb_cfg(struct ovsdb_idl_txn *sb_txn, struct 
> ovsdb_idl_txn *ovs_txn,
>  }
>  }
>
> -if (ovs_txn && br_int &&
> -cur_cfg != smap_get_ullong(&br_int->external_ids,
> -   OVS_NB_CFG_NAME, 0)) {
> +if (ovs_txn && br_int && cur_cfg != 
> smap_get_ullong(&br_int->external_ids,
> +OVS_NB_CFG_NAME, 0)) 
> {
> +char *cur_cfg_ts_str = xasprintf("%lld", ts_now);
>  char *cur_cfg_str = xasprintf("%"PRId64, cur_cfg);
>  ovsrec_bridge_update_external_ids_setkey(br_int, OVS_NB_CFG_NAME,
>   cur_cfg_str);
> +ovsrec_bridge_update_external_ids_setkey(br_int, OVS_NB_CFG_TS_NAME,
> + 

Re: [ovs-dev] [v4] dpif/dpcls: limit count subtable search info logs

2021-07-12 Thread Flavio Leitner


Hi Kumar,

There is an issue with the signed-offs reported by 0-day Robot.
For additional info, please check the link below and look for the
tag Co-authored-by:
https://github.com/openvswitch/ovs/blob/master/Documentation/internals/contributing/submitting-patches.rst#tags

Otherwise the patch looks good to me.
Thanks,
fbl

On Mon, Jul 12, 2021 at 11:44:05AM +0530, kumar Amber wrote:
> From: Harry van Haaren 
> 
> This commit avoids many instances of "using subtable X for miniflow (x,y)"
> in the ovs-vswitchd log when using the DPCLS Autovalidator. This occurs
> when no specialized subtable is found, and the generic "_any" version of
> the avx512 subtable search implementation was used. This change logs the
> subtable usage once, avoiding duplicates.
> 
> Signed-off-by: Harry van Haaren 
> Signed-off-by: kumar Amber 
> 
> ---
> v4:
> - add doc update from Flavio
> v3:
> - add comments from Flavio
> - add documentation update
> ---
>  Documentation/topics/dpdk/bridge.rst   | 34 ++
>  lib/dpif-netdev-lookup-avx512-gather.c |  4 +--
>  2 files changed, 36 insertions(+), 2 deletions(-)
> 
> diff --git a/Documentation/topics/dpdk/bridge.rst 
> b/Documentation/topics/dpdk/bridge.rst
> index 0f70a0cad..374e03eb0 100644
> --- a/Documentation/topics/dpdk/bridge.rst
> +++ b/Documentation/topics/dpdk/bridge.rst
> @@ -182,6 +182,40 @@ chosen, and the 2nd occurance of that priority is not 
> used. Put in logical
>  terms, a subtable is chosen if its priority is greater than the previous
>  best candidate.
>  
> +Optimizing Specific Subtable Search
> +~~~
> +
> +During the packet classification, the datapath can use specialized
> +lookup tables to optimize the search. However, not all situations
> +are optimized. If you see a message like the following one in the OVS
> +logs, it means that there is no specialized implementation available
> +for the current networking traffic. In this case, OVS will continue
> +to process the traffic normally using a more generic lookup table."
> +
> +"Using non-specialized AVX512 lookup for subtable (4,1) and possibly others."
> +
> +(Note that the numbers 4 and 1 will likely be different in your logs)
> +
> +Additional specialized lookups can be added to OVS if the user
> +provides that log message along with the command output as show
> +below to the OVS mailing list. Note that the numbers in the log
> +message ("subtable (X,Y)") need to match with the numbers in
> +the provided command output ("dp-extra-info:miniflow_bits(X,Y)").
> +
> +"ovs-appctl dpctl/dump-flows -m", which results in output like this:
> +
> +ufid:82770b5d-ca38-44ff-8283-74ba36bd1ca5, 
> skb_priority(0/0),skb_mark(0/0)
> +,ct_state(0/0),ct_zone(0/0),ct_mark(0/0),ct_label(0/0),recirc_id(0),
> +
> dp_hash(0/0),in_port(pcap0),packet_type(ns=0,id=0),eth(src=00:00:00:00:00:
> +00/00:00:00:00:00:00,dst=ff:ff:ff:ff:ff:ff/00:00:00:00:00:00),eth_type(
> +
> 0x8100),vlan(vid=1,pcp=0),encap(eth_type(0x0800),ipv4(src=127.0.0.1/0.0.0.0
> +,dst=127.0.0.1/0.0.0.0,proto=17/0,tos=0/0,ttl=64/0,frag=no),udp(src=53/0,
> +dst=53/0)), packets:77072681, bytes:3545343326, used:0.000s, dp:ovs,
> +actions:vhostuserclient0, dp-extra-info:miniflow_bits(4,1)
> +
> +Please send an email to the OVS mailing list ovs-dev@openvswitch.org with
> +the output of the "dp-extra-info:miniflow_bits(4,1)" values.
> +
>  CPU ISA Testing and Validation
>  ~~
>  
> diff --git a/lib/dpif-netdev-lookup-avx512-gather.c 
> b/lib/dpif-netdev-lookup-avx512-gather.c
> index bc359dc4a..ced846aa7 100644
> --- a/lib/dpif-netdev-lookup-avx512-gather.c
> +++ b/lib/dpif-netdev-lookup-avx512-gather.c
> @@ -411,8 +411,8 @@ dpcls_subtable_avx512_gather_probe(uint32_t u0_bits, 
> uint32_t u1_bits)
>   */
>  if (!f && (u0_bits + u1_bits) < (NUM_U64_IN_ZMM_REG * 2)) {
>  f = dpcls_avx512_gather_mf_any;
> -VLOG_INFO("Using avx512_gather_mf_any for subtable (%d,%d)\n",
> -  u0_bits, u1_bits);
> +VLOG_INFO_ONCE("Using non-specialized AVX512 lookup for subtable"
> +   " (%d,%d) and possibly others.", u0_bits, u1_bits);
>  }
>  
>  return f;
> -- 
> 2.25.1
> 

-- 
fbl


[ovs-dev] [PATCH v3 ovn 1/2] controller: incrementally create ipv6 prefix delegation port_binding list

2021-07-12 Thread Lorenzo Bianconi
Incrementally manage the local_active_ports_ipv6_pd map for interfaces
where IPv6 prefix-delegation has been enabled. This patch avoids
looping over all local interfaces to check if prefix-delegation
is running on the current port binding.

Acked-by: Mark Michelson 
Signed-off-by: Lorenzo Bianconi 
---
 controller/binding.c|  32 +++
 controller/binding.h|   1 +
 controller/ovn-controller.c |  11 +++-
 controller/ovn-controller.h |   6 ++
 controller/pinctrl.c| 107 +---
 controller/pinctrl.h|   4 +-
 6 files changed, 103 insertions(+), 58 deletions(-)

diff --git a/controller/binding.c b/controller/binding.c
index 594babc98..9711ac850 100644
--- a/controller/binding.c
+++ b/controller/binding.c
@@ -574,6 +574,30 @@ remove_related_lport(const struct sbrec_port_binding *pb,
 }
 }
 
+static void
+update_active_pb_ras_pd(const struct sbrec_port_binding *pb,
+struct hmap *local_datapaths,
+struct shash *map, const char *conf)
+{
+bool ras_pd_conf = smap_get_bool(&pb->options, conf, false);
+struct shash_node *iter = shash_find(map, pb->logical_port);
+
+if (iter && !ras_pd_conf) {
+shash_delete(map, iter);
+return;
+}
+struct pb_ld_binding *ras_pd = NULL;
+if (!iter && ras_pd_conf) {
+ras_pd = xzalloc(sizeof *ras_pd);
+ras_pd->pb = pb;
+shash_add(map, pb->logical_port, ras_pd);
+}
+if (ras_pd) {
+ras_pd->ld = get_local_datapath(local_datapaths,
+pb->datapath->tunnel_key);
+}
+}
+
 /* Corresponds to each Port_Binding.type. */
 enum en_lport_type {
 LP_UNKNOWN,
@@ -1645,6 +1669,10 @@ binding_run(struct binding_ctx_in *b_ctx_in, struct 
binding_ctx_out *b_ctx_out)
 const struct sbrec_port_binding *pb;
 SBREC_PORT_BINDING_TABLE_FOR_EACH (pb,
b_ctx_in->port_binding_table) {
+update_active_pb_ras_pd(pb, b_ctx_out->local_datapaths,
+b_ctx_out->local_active_ports_ipv6_pd,
+"ipv6_prefix_delegation");
+
 enum en_lport_type lport_type = get_lport_type(pb);
 
 switch (lport_type) {
@@ -2482,6 +2510,10 @@ delete_done:
 continue;
 }
 
+update_active_pb_ras_pd(pb, b_ctx_out->local_datapaths,
+b_ctx_out->local_active_ports_ipv6_pd,
+"ipv6_prefix_delegation");
+
 enum en_lport_type lport_type = get_lport_type(pb);
 
 struct binding_lport *b_lport =
diff --git a/controller/binding.h b/controller/binding.h
index a08011ae2..60ad49da0 100644
--- a/controller/binding.h
+++ b/controller/binding.h
@@ -72,6 +72,7 @@ void related_lports_destroy(struct related_lports *);
 
 struct binding_ctx_out {
 struct hmap *local_datapaths;
+struct shash *local_active_ports_ipv6_pd;
 struct local_binding_data *lbinding_data;
 
 /* sset of (potential) local lports. */
diff --git a/controller/ovn-controller.c b/controller/ovn-controller.c
index 6a9c25f28..c4eb54755 100644
--- a/controller/ovn-controller.c
+++ b/controller/ovn-controller.c
@@ -1029,6 +1029,8 @@ struct ed_type_runtime_data {
 bool tracked;
 bool local_lports_changed;
 struct hmap tracked_dp_bindings;
+
+struct shash local_active_ports_ipv6_pd;
 };
 
 /* struct ed_type_runtime_data has the below members for tracking the
@@ -1116,6 +1118,7 @@ en_runtime_data_init(struct engine_node *node OVS_UNUSED,
 sset_init(&data->egress_ifaces);
 smap_init(&data->local_iface_ids);
 local_binding_data_init(&data->lbinding_data);
+shash_init(&data->local_active_ports_ipv6_pd);
 
 /* Init the tracked data. */
 hmap_init(&data->tracked_dp_bindings);
@@ -1141,6 +1144,7 @@ en_runtime_data_cleanup(void *data)
 free(cur_node);
 }
 hmap_destroy(&rt_data->local_datapaths);
+shash_destroy(&rt_data->local_active_ports_ipv6_pd);
 local_binding_data_destroy(&rt_data->lbinding_data);
 }
 
@@ -1219,6 +1223,8 @@ init_binding_ctx(struct engine_node *node,
 b_ctx_in->ovs_table = ovs_table;
 
 b_ctx_out->local_datapaths = &rt_data->local_datapaths;
+b_ctx_out->local_active_ports_ipv6_pd =
+&rt_data->local_active_ports_ipv6_pd;
 b_ctx_out->local_lports = &rt_data->local_lports;
 b_ctx_out->local_lports_changed = false;
 b_ctx_out->related_lports = &rt_data->related_lports;
@@ -1236,6 +1242,7 @@ en_runtime_data_run(struct engine_node *node, void *data)
 {
 struct ed_type_runtime_data *rt_data = data;
 struct hmap *local_datapaths = &rt_data->local_datapaths;
+struct shash *local_active_ipv6_pd = &rt_data->local_active_ports_ipv6_pd;
 struct sset *local_lports = &rt_data->local_lports;
 struct sset *active_tunnels = &rt_data->active_tunnels;
 
@@ -1251,6 +1258,7 @@ en_runtime_data_run(struct engine_node *node, void *data)
 free(cur_node);
 }

[ovs-dev] [PATCH v3 ovn 2/2] controller: incrementally create ras port_binding list

2021-07-12 Thread Lorenzo Bianconi
Incrementally manage local_active_ports_ras map for interfaces
where periodic router advertisement has been enabled. This avoids
looping over all local interfaces to check whether periodic RA is
running on the current port binding.

Acked-by: Mark Michelson 
Signed-off-by: Lorenzo Bianconi 
---
 controller/binding.c|  7 +++
 controller/binding.h|  1 +
 controller/ovn-controller.c | 10 +++-
 controller/pinctrl.c| 93 -
 controller/pinctrl.h|  3 +-
 5 files changed, 69 insertions(+), 45 deletions(-)

diff --git a/controller/binding.c b/controller/binding.c
index 9711ac850..09793a6f6 100644
--- a/controller/binding.c
+++ b/controller/binding.c
@@ -1672,6 +1672,9 @@ binding_run(struct binding_ctx_in *b_ctx_in, struct 
binding_ctx_out *b_ctx_out)
 update_active_pb_ras_pd(pb, b_ctx_out->local_datapaths,
 b_ctx_out->local_active_ports_ipv6_pd,
 "ipv6_prefix_delegation");
+update_active_pb_ras_pd(pb, b_ctx_out->local_datapaths,
+b_ctx_out->local_active_ports_ras,
+"ipv6_ra_send_periodic");
 
 enum en_lport_type lport_type = get_lport_type(pb);
 
@@ -2514,6 +2517,10 @@ delete_done:
 b_ctx_out->local_active_ports_ipv6_pd,
 "ipv6_prefix_delegation");
 
+update_active_pb_ras_pd(pb, b_ctx_out->local_datapaths,
+b_ctx_out->local_active_ports_ras,
+"ipv6_ra_send_periodic");
+
 enum en_lport_type lport_type = get_lport_type(pb);
 
 struct binding_lport *b_lport =
diff --git a/controller/binding.h b/controller/binding.h
index 60ad49da0..77197e742 100644
--- a/controller/binding.h
+++ b/controller/binding.h
@@ -73,6 +73,7 @@ void related_lports_destroy(struct related_lports *);
 struct binding_ctx_out {
 struct hmap *local_datapaths;
 struct shash *local_active_ports_ipv6_pd;
+struct shash *local_active_ports_ras;
 struct local_binding_data *lbinding_data;
 
 /* sset of (potential) local lports. */
diff --git a/controller/ovn-controller.c b/controller/ovn-controller.c
index c4eb54755..34134e87b 100644
--- a/controller/ovn-controller.c
+++ b/controller/ovn-controller.c
@@ -1031,6 +1031,7 @@ struct ed_type_runtime_data {
 struct hmap tracked_dp_bindings;
 
 struct shash local_active_ports_ipv6_pd;
+struct shash local_active_ports_ras;
 };
 
 /* struct ed_type_runtime_data has the below members for tracking the
@@ -1119,6 +1120,7 @@ en_runtime_data_init(struct engine_node *node OVS_UNUSED,
 smap_init(>local_iface_ids);
 local_binding_data_init(>lbinding_data);
 shash_init(>local_active_ports_ipv6_pd);
+shash_init(>local_active_ports_ras);
 
 /* Init the tracked data. */
 hmap_init(>tracked_dp_bindings);
@@ -1145,6 +1147,7 @@ en_runtime_data_cleanup(void *data)
 }
 hmap_destroy(_data->local_datapaths);
 shash_destroy(_data->local_active_ports_ipv6_pd);
+shash_destroy(_data->local_active_ports_ras);
 local_binding_data_destroy(_data->lbinding_data);
 }
 
@@ -1225,6 +1228,8 @@ init_binding_ctx(struct engine_node *node,
 b_ctx_out->local_datapaths = _data->local_datapaths;
 b_ctx_out->local_active_ports_ipv6_pd =
 _data->local_active_ports_ipv6_pd;
+b_ctx_out->local_active_ports_ras =
+_data->local_active_ports_ras;
 b_ctx_out->local_lports = _data->local_lports;
 b_ctx_out->local_lports_changed = false;
 b_ctx_out->related_lports = _data->related_lports;
@@ -1243,6 +1248,7 @@ en_runtime_data_run(struct engine_node *node, void *data)
 struct ed_type_runtime_data *rt_data = data;
 struct hmap *local_datapaths = _data->local_datapaths;
 struct shash *local_active_ipv6_pd = _data->local_active_ports_ipv6_pd;
+struct shash *local_active_ras = _data->local_active_ports_ras;
 struct sset *local_lports = _data->local_lports;
 struct sset *active_tunnels = _data->active_tunnels;
 
@@ -1259,6 +1265,7 @@ en_runtime_data_run(struct engine_node *node, void *data)
 }
 hmap_clear(local_datapaths);
 shash_clear(local_active_ipv6_pd);
+shash_clear(local_active_ras);
 local_binding_data_destroy(_data->lbinding_data);
 sset_destroy(local_lports);
 related_lports_destroy(_data->related_lports);
@@ -3274,7 +3281,8 @@ main(int argc, char *argv[])
 br_int, chassis,
 _data->local_datapaths,
 _data->active_tunnels,
-_data->local_active_ports_ipv6_pd);
+_data->local_active_ports_ipv6_pd,
+_data->local_active_ports_ras);
 /* Updating monitor conditions 

[ovs-dev] [PATCH v3 ovn 0/2] incrementally process ras-ipv6 pd router ports

2021-07-12 Thread Lorenzo Bianconi
https://bugzilla.redhat.com/show_bug.cgi?id=1944220

Changes since v2:
- use smap_get_bool instead of smap_get in update_active_pb_ras_pd routine

Changes since v1:
- use shash instead of hmap
- always remove the entry from shash if the user removed the ipv6_pd/ras info
  in port_binding option column
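
The add-or-remove behaviour described in the change notes above can be sketched roughly as follows. This is a minimal illustration only: `struct active_ports`, `active_ports_find()` and `active_ports_update()` are hypothetical stand-ins for the shash and for update_active_pb_ras_pd(), and the fixed-size array replaces the real hash table purely to keep the sketch self-contained.

```c
#include <stdbool.h>
#include <string.h>

#define MAX_PORTS 16   /* Arbitrary bound chosen for this sketch. */
#define NAME_LEN 32    /* Arbitrary bound chosen for this sketch. */

/* Hypothetical stand-in for the shash of locally active ports. */
struct active_ports {
    char names[MAX_PORTS][NAME_LEN];
    int n;
};

/* Returns the index of 'name', or -1 if it is not tracked. */
static int
active_ports_find(const struct active_ports *ap, const char *name)
{
    for (int i = 0; i < ap->n; i++) {
        if (!strcmp(ap->names[i], name)) {
            return i;
        }
    }
    return -1;
}

/* Tracks 'name' when the boolean option is enabled.  Otherwise the entry
 * is always dropped, so a port whose option was removed does not linger. */
static void
active_ports_update(struct active_ports *ap, const char *name, bool enabled)
{
    int idx = active_ports_find(ap, name);

    if (enabled) {
        if (idx < 0 && ap->n < MAX_PORTS) {
            strncpy(ap->names[ap->n], name, NAME_LEN - 1);
            ap->names[ap->n][NAME_LEN - 1] = '\0';
            ap->n++;
        }
    } else if (idx >= 0) {
        /* Move the last entry into the freed slot. */
        ap->n--;
        if (idx != ap->n) {
            memcpy(ap->names[idx], ap->names[ap->n], NAME_LEN);
        }
    }
}
```

With a structure like this, a consumer such as pinctrl would only need to walk the tracked entries instead of every local port binding.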

Lorenzo Bianconi (2):
  controller: incrementally create ipv6 prefix delegation port_binding list
  controller: incrementally create ras port_binding list

 controller/binding.c|  39 +++
 controller/binding.h|   2 +
 controller/ovn-controller.c |  19 +++-
 controller/ovn-controller.h |   6 ++
 controller/pinctrl.c| 198 ++--
 controller/pinctrl.h|   5 +-
 6 files changed, 169 insertions(+), 100 deletions(-)

-- 
2.31.1



Re: [ovs-dev] [PATCH net-next] openvswitch: Introduce per-cpu upcall dispatch

2021-07-12 Thread Flavio Leitner


Hi Joe,

Maybe you can take a look...
Thanks,
fbl

On Thu, Jul 08, 2021 at 11:40:12AM -0300, Flavio Leitner wrote:
> 
> Hi Pravin,
> 
> Any thoughts on this patch? We are closing OVS 2.16, so it would
> be nice to know if it looks okay or needs changes, especially
> changes related to the userspace interface.
> 
> Thanks,
> fbl
> 
> On Wed, Jun 30, 2021 at 05:53:49AM -0400, Mark Gray wrote:
> > The Open vSwitch kernel module uses the upcall mechanism to send
> > packets from kernel space to user space when it misses in the kernel
> > space flow table. The upcall sends packets via a Netlink socket.
> > Currently, a Netlink socket is created for every vport. In this way,
> > there is a 1:1 mapping between a vport and a Netlink socket.
> > When a packet is received by a vport, if it needs to be sent to
> > user space, it is sent via the corresponding Netlink socket.
> > 
> > This mechanism, with various iterations of the corresponding user
> > space code, has seen some limitations and issues:
> > 
> > * On systems with a large number of vports, there is a correspondingly
> > large number of Netlink sockets which can limit scaling.
> > (https://bugzilla.redhat.com/show_bug.cgi?id=1526306)
> > * Packet reordering on upcalls.
> > (https://bugzilla.redhat.com/show_bug.cgi?id=1844576)
> > * A thundering herd issue.
> > (https://bugzilla.redhat.com/show_bug.cgi?id=183)
> > 
> > This patch introduces an alternative, feature-negotiated, upcall
> > mode using a per-cpu dispatch rather than a per-vport dispatch.
> > 
> > In this mode, the Netlink socket to be used for the upcall is
> > selected based on the CPU of the thread that is executing the upcall.
> > In this way, it resolves the issues above as:
> > 
> > a) The number of Netlink sockets scales with the number of CPUs
> > rather than the number of vports.
> > b) Ordering per-flow is maintained as packets are distributed to
> > CPUs based on mechanisms such as RSS and flows are distributed
> > to a single user space thread.
> > c) Packets from a flow can only wake up one user space thread.
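
A rough sketch of the per-CPU selection described above, under stated assumptions: `struct pcpu_portids` and `pick_upcall_portid()` are hypothetical names, and wrapping the CPU number with a modulo when there are fewer sockets than CPUs is an assumption of this sketch, not something the text above specifies.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical container for the per-CPU array of Netlink port IDs that
 * user space would supply via OVS_DP_ATTR_PER_CPU_PIDS. */
struct pcpu_portids {
    uint32_t n_ids;
    const uint32_t *ids;
};

/* Picks the Netlink port ID for the CPU that is executing the upcall.
 * Returns 0 when no IDs are registered; this sketch treats 0 as "do not
 * send the upcall". */
static uint32_t
pick_upcall_portid(const struct pcpu_portids *p, uint32_t cpu_id)
{
    if (!p || !p->n_ids) {
        return 0;
    }
    /* Assumption: wrap around if there are more CPUs than sockets. */
    return p->ids[cpu_id % p->n_ids];
}
```

Because the choice depends only on the executing CPU, the number of sockets is bounded by the CPU count rather than by the vport count.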
> > 
> > The corresponding user space code can be found at:
> > https://mail.openvswitch.org/pipermail/ovs-dev/2021-April/382618.html
> > 
> > Bugzilla: https://bugzilla.redhat.com/1844576
> > Signed-off-by: Mark Gray 
> > ---
> > 
> > Notes:
> > v1 - Reworked based on Flavio's comments:
> >  * Fixed handling of userspace action case
> >  * Renamed 'struct dp_portids'
> >  * Fixed handling of return from kmalloc()
> >  * Removed check for dispatch type from ovs_dp_get_upcall_portid()
> >- Reworked based on Dan's comments:
> >  * Fixed handling of return from kmalloc()
> >- Reworked based on Pravin's comments:
> >  * Fixed handling of userspace action case
> >- Added kfree() in destroy_dp_rcu() to cleanup netlink port ids
> > 
> >  include/uapi/linux/openvswitch.h |  8 
> >  net/openvswitch/actions.c|  6 ++-
> >  net/openvswitch/datapath.c   | 70 +++-
> >  net/openvswitch/datapath.h   | 20 +
> >  4 files changed, 101 insertions(+), 3 deletions(-)
> > 
> > diff --git a/include/uapi/linux/openvswitch.h 
> > b/include/uapi/linux/openvswitch.h
> > index 8d16744edc31..6571b57b2268 100644
> > --- a/include/uapi/linux/openvswitch.h
> > +++ b/include/uapi/linux/openvswitch.h
> > @@ -70,6 +70,8 @@ enum ovs_datapath_cmd {
> >   * set on the datapath port (for OVS_ACTION_ATTR_MISS).  Only valid on
> >   * %OVS_DP_CMD_NEW requests. A value of zero indicates that upcalls should
> >   * not be sent.
> > + * OVS_DP_ATTR_PER_CPU_PIDS: Per-cpu array of PIDs for upcalls when
> > + * OVS_DP_F_DISPATCH_UPCALL_PER_CPU feature is set.
> >   * @OVS_DP_ATTR_STATS: Statistics about packets that have passed through 
> > the
> >   * datapath.  Always present in notifications.
> >   * @OVS_DP_ATTR_MEGAFLOW_STATS: Statistics about mega flow masks usage for 
> > the
> > @@ -87,6 +89,9 @@ enum ovs_datapath_attr {
> > OVS_DP_ATTR_USER_FEATURES,  /* OVS_DP_F_*  */
> > OVS_DP_ATTR_PAD,
> > OVS_DP_ATTR_MASKS_CACHE_SIZE,
> > +   OVS_DP_ATTR_PER_CPU_PIDS,   /* Netlink PIDS to receive upcalls in 
> > per-cpu
> > +* dispatch mode
> > +*/
> > __OVS_DP_ATTR_MAX
> >  };
> >  
> > @@ -127,6 +132,9 @@ struct ovs_vport_stats {
> >  /* Allow tc offload recirc sharing */
> >  #define OVS_DP_F_TC_RECIRC_SHARING (1 << 2)
> >  
> > +/* Allow per-cpu dispatch of upcalls */
> > +#define OVS_DP_F_DISPATCH_UPCALL_PER_CPU   (1 << 3)
> > +
> >  /* Fixed logical ports. */
> >  #define OVSP_LOCAL  ((__u32)0)
> >  
> > diff --git a/net/openvswitch/actions.c b/net/openvswitch/actions.c
> > index ef15d9eb4774..f79679746c62 100644
> > --- a/net/openvswitch/actions.c
> > +++ b/net/openvswitch/actions.c
> > @@ -924,7 +924,11 @@ static int output_userspace(struct datapath *dp, 
> > struct sk_buff *skb,
> >

Re: [ovs-dev] [PATCH v2 1/2] Optimize the poll loop for poll_immediate_wake()

2021-07-12 Thread 0-day Robot
Bleep bloop.  Greetings Anton Ivanov, I am a robot and I have tried out your patch.
Thanks for your contribution.

I encountered some error that I wasn't expecting.  See the details below.


checkpatch:
WARNING: Line has trailing whitespace
#171 FILE: lib/timeval.c:327:
 * shortcut. Otherwise there is at least one fd in it for 

Lines checked: 192, Warnings: 1, Errors: 0


Please check this out.  If you feel there has been an error, please email acon...@redhat.com

Thanks,
0-day Robot


[ovs-dev] [PATCH ovn 2/2] ipsec.at: Fix ipsec test flake

2021-07-12 Thread Mark Gray
Change order of command execution and add `ovn-nbctl --wait=hv sync`.
This ensures the vswitchd ovsdb instance is updated by the time
it is checked.

Fixes: ff2b6ff69740 ("ovn-controller: Add 'local_ip' option to tunnel ports for IPsec case")
Signed-off-by: Mark Gray 
---
 tests/ovn-ipsec.at | 8 +---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/tests/ovn-ipsec.at b/tests/ovn-ipsec.at
index 389ccff5836a..4c600a9f2779 100644
--- a/tests/ovn-ipsec.at
+++ b/tests/ovn-ipsec.at
@@ -14,9 +14,6 @@ ovn-nbctl lsp-set-addresses lp2 "f0:00:00:00:00:02 10.1.1.2"
 
 net_add n1   # Network to connect hv1 and hv2
 
-# Enable IPsec
-ovn-nbctl set nb_global . ipsec=true
-
 # Create hypervisor hv1 connected to n1
 sim_add hv1
 as hv1
@@ -45,6 +42,11 @@ ovs-vsctl \
 -- set Open_vSwitch . other_config:private_key=dummy-privkey.pem \
 -- set Open_vSwitch . other_config:ca_cert=dummy-cacert.pem
 
+# Enable IPsec
+ovn-nbctl set nb_global . ipsec=true
+
+check ovn-nbctl --wait=hv sync
+
 AT_CHECK([as hv2 ovs-vsctl get Interface ovn-hv1-0 options:remote_ip | tr -d 
'"\n'], [0], [192.168.0.1])
 AT_CHECK([as hv2 ovs-vsctl get Interface ovn-hv1-0 options:local_ip | tr -d 
'"\n'], [0], [192.168.0.2])
 AT_CHECK([as hv2 ovs-vsctl get Interface ovn-hv1-0 options:remote_name | tr -d 
'\n'], [0], [hv1])
-- 
2.27.0



[ovs-dev] [PATCH ovn 0/2] tests: Fix test flakes

2021-07-12 Thread Mark Gray
Fix two test flakes that have been observed in the OVS CI.

Mark Gray (2):
  system-test: Fix flake in ECMP IPv6 symmetric reply test
  ipsec.at: Fix ipsec test flake

 tests/ovn-ipsec.at  |  8 ---
 tests/system-ovn.at | 51 +
 2 files changed, 33 insertions(+), 26 deletions(-)

-- 
2.27.0




[ovs-dev] [PATCH ovn 1/2] system-test: Fix flake in ECMP IPv6 symmetric reply test

2021-07-12 Thread Mark Gray
Statically add IPv6 neighbor MAC addresses to avoid NS messages
evicting datapath flows causing occasional test failures.

We also configure all interfaces to send only one IPv6 router
solicitation message. These messages can cause datapath flows
to be unexpectedly evicted causing test failures.

Fixes: 7c927c0c0be1 ("ovn-northd: Fix IPv6 ECMP symmetric reply flows")
Signed-off-by: Mark Gray 
---
 tests/system-ovn.at | 51 +
 1 file changed, 28 insertions(+), 23 deletions(-)

diff --git a/tests/system-ovn.at b/tests/system-ovn.at
index 79879c6e003b..fc377bbd1a47 100644
--- a/tests/system-ovn.at
+++ b/tests/system-ovn.at
@@ -5833,17 +5833,35 @@ ovn-nbctl lr-route-add R3 fd01::/64 fd02::1
 
 # Logical port 'alice1' in switch 'alice'.
 ADD_NAMESPACES(alice1)
+# Only send 1 router solicitation as any additional ones can cause datapath
+# flows to get evicted, causing unexpected failures below.
+NS_CHECK_EXEC([alice1], [sysctl -w 
net.ipv6.conf.default.router_solicitations=1], [0], [dnl
+net.ipv6.conf.default.router_solicitations = 1
+])
 ADD_VETH(alice1, alice1, br-int, "fd01::2/64", "f0:00:00:01:02:04", \
  "fd01::1")
 OVS_WAIT_UNTIL([test "$(ip netns exec alice1 ip a | grep fd01::2 | grep 
tentative)" = ""])
 ovn-nbctl lsp-add alice alice1 \
 -- lsp-set-addresses alice1 "f0:00:00:01:02:04 fd01::2"
+# Add neighbour MAC address to avoid sending IPv6 NS messages which could
+# cause datapath flows to be evicted
+NS_CHECK_EXEC([alice1], [ip -6 neigh add fd01::1 lladdr 00:00:01:01:02:03 dev 
alice1], [0])
 
 # Logical port 'bob1' in switch 'bob'.
 ADD_NAMESPACES(bob1)
+# Only send 1 router solicitation as any additional ones can cause datapath
+# flows to get evicted, causing unexpected failures below.
+NS_CHECK_EXEC([bob1], [sysctl -w 
net.ipv6.conf.default.router_solicitations=1], [0], [dnl
+net.ipv6.conf.default.router_solicitations = 1
+])
 ADD_VETH(bob1, bob1, br-int, "fd07::1/64", "f0:00:00:01:02:06", \
  "fd07::2")
 OVS_WAIT_UNTIL([test "$(ip netns exec bob1 ip a | grep fd07::1 | grep 
tentative)" = ""])
+# Add neighbour MAC addresses to avoid sending IPv6 NS messages which could
+# cause datapath flows to be evicted
+NS_CHECK_EXEC([bob1], [ip -6 neigh add fd07::2 lladdr 00:00:02:01:02:03 dev 
bob1], [0])
+NS_CHECK_EXEC([bob1], [ip -6 neigh add fd07::3 lladdr 00:00:01:01:02:04 dev 
bob1], [0])
+
 ovn-nbctl lsp-add bob bob1 \
 -- lsp-set-addresses bob1 "f0:00:00:01:02:06 fd07::1"
 
@@ -5852,45 +5870,32 @@ ovn-nbctl --wait=hv sync
 
 on_exit 'ovs-ofctl dump-flows br-int'
 
-# Later in this test we will check for a datapath flow that matches:
-# "ct_state(+new-est-rpl+trk).*ct(.*label=0x204010204/.*)". Due
-# to the way OVS generates datapath flows with wildcards, ICMPv6 NS flows will
-# evict this datapath flow. In order to ensure that the flow does not
-# get evicted, we send one ping packet in order to carry out neighbor
-# discovery. We then flush the datpath to remove the NS flows so that the flow
-# "ct_state(+new-est-rpl+trk).*ct(.*label=0x204010204/.*)" will
-# be present when we check for it.
-NS_CHECK_EXEC([bob1], [ping -q -c 2 -i 0.3 -w 15 fd01::2 | FORMAT_PING], \
-[0], [dnl
-2 packets transmitted, 2 received, 0% packet loss, time 0ms
-])
-ovs-appctl dpctl/del-flows
-
 # 'bob1' should be able to ping 'alice1' directly.
 NS_CHECK_EXEC([bob1], [ping -q -c 20 -i 0.3 -w 15 fd01::2 | FORMAT_PING], \
 [0], [dnl
 20 packets transmitted, 20 received, 0% packet loss, time 0ms
 ])
 
-# Ensure conntrack entry is present. We should not try to predict
-# the tunnel key for the output port, so we strip it from the labels
-# and just ensure that the known ethernet address is present.
-AT_CHECK([ovs-appctl dpctl/dump-conntrack | FORMAT_CT(fd01::2) | \
-sed -e 's/zone=[[0-9]]*/zone=/' |
-sed -e 
's/labels=0x[[0-9a-f]]*04010204/labels=0x04010204/'], 
[0], [dnl
-icmpv6,orig=(src=fd07::1,dst=fd01::2,id=,type=128,code=0),reply=(src=fd01::2,dst=fd07::1,id=,type=129,code=0),zone=,labels=0x04010204
-])
-
 # Ensure datapaths show conntrack states as expected
 # Like with conntrack entries, we shouldn't try to predict
 # port binding tunnel keys. So omit them from expected labels.
 AT_CHECK([ovs-appctl dpctl/dump-flows | grep 
'ct_state(+new-est-rpl+trk).*ct(.*label=0x204010204/.*)' -c], [0], 
[dnl
 1
 ])
+
 AT_CHECK([ovs-appctl dpctl/dump-flows | grep 
'ct_state(-new+est+rpl+trk).*ct_label(0x.*04010204/.*)' -c], [0], 
[dnl
 1
 ])
 
+# Ensure conntrack entry is present. We should not try to predict
+# the tunnel key for the output port, so we strip it from the labels
+# and just ensure that the known ethernet address is present.
+AT_CHECK([ovs-appctl dpctl/dump-conntrack | FORMAT_CT(fd01::2) | \
+sed -e 's/zone=[[0-9]]*/zone=/' |
+sed -e 
's/labels=0x[[0-9a-f]]*04010204/labels=0x04010204/'], 
[0], [dnl

[ovs-dev] [PATCH v2 1/2] Optimize the poll loop for poll_immediate_wake()

2021-07-12 Thread anton . ivanov
From: Anton Ivanov 

If we are not obtaining any useful information out of the poll(),
such as whether an fd is busy or not, we do not need to do a poll() if
an immediate_wake() has been requested.

This cuts out all the pollfd hash additions, forming the poll
arguments and the actual poll() after a call to
poll_immediate_wake().
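
The idea can be illustrated with a stripped-down model of the loop. This is only a sketch: `struct mini_loop` and the `mini_*` functions are hypothetical, and they model the bookkeeping (which registrations are skipped, when poll() would be bypassed) rather than the real lib/poll-loop.c code.

```c
#include <stdbool.h>

#define MAX_FDS 8   /* Arbitrary bound chosen for this sketch. */

/* Hypothetical, simplified model of the per-thread poll loop state. */
struct mini_loop {
    int fds[MAX_FDS];
    int n_fds;
    bool immediate_wake;
};

/* Registers 'fd' for the next block, unless an immediate wake was already
 * requested, in which case the registration would be wasted work. */
static void
mini_fd_wait(struct mini_loop *loop, int fd)
{
    if (loop->immediate_wake) {
        return;
    }
    if (loop->n_fds < MAX_FDS) {
        loop->fds[loop->n_fds++] = fd;
    }
}

/* Requests that the next block return without waiting. */
static void
mini_immediate_wake(struct mini_loop *loop)
{
    loop->immediate_wake = true;
}

/* Returns the number of fds a real implementation would hand to poll();
 * 0 means poll() is skipped entirely.  Resets the loop for the next
 * iteration, as poll_block() does. */
static int
mini_block(struct mini_loop *loop)
{
    int n = loop->immediate_wake ? 0 : loop->n_fds;

    loop->n_fds = 0;
    loop->immediate_wake = false;
    return n;
}
```

Note that the flag has to be cleared at the end of every block, otherwise later iterations would never register or poll any fd.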

Signed-off-by: Anton Ivanov 
---
 lib/poll-loop.c | 69 -
 lib/timeval.c   | 11 +++-
 2 files changed, 56 insertions(+), 24 deletions(-)

diff --git a/lib/poll-loop.c b/lib/poll-loop.c
index 4e751ff2c..09bc4f5c4 100644
--- a/lib/poll-loop.c
+++ b/lib/poll-loop.c
@@ -53,6 +53,7 @@ struct poll_loop {
  * wake up immediately, or LLONG_MAX to wait forever. */
 long long int timeout_when; /* In msecs as returned by time_msec(). */
 const char *timeout_where;  /* Where 'timeout_when' was set. */
+bool immediate_wake;
 };
 
 static struct poll_loop *poll_loop(void);
@@ -107,6 +108,13 @@ poll_create_node(int fd, HANDLE wevent, short int events, 
const char *where)
 
 COVERAGE_INC(poll_create_node);
 
+if (loop->immediate_wake) {
+/* We have been asked to bail out of this poll loop.
+ * There is no point to engage in yack shaving a poll hmap.
+ */
+return;
+}
+
 /* Both 'fd' and 'wevent' cannot be set. */
 ovs_assert(!fd != !wevent);
 
@@ -181,8 +189,15 @@ poll_wevent_wait_at(HANDLE wevent, const char *where)
 void
 poll_timer_wait_at(long long int msec, const char *where)
 {
-long long int now = time_msec();
+long long int now;
 long long int when;
+struct poll_loop *loop = poll_loop();
+
+if (loop->immediate_wake) {
+return;
+}
+
+now = time_msec();
 
 if (msec <= 0) {
 /* Wake up immediately. */
@@ -229,7 +244,9 @@ poll_timer_wait_until_at(long long int when, const char 
*where)
 void
 poll_immediate_wake_at(const char *where)
 {
+struct poll_loop *loop = poll_loop();
 poll_timer_wait_at(0, where);
+loop->immediate_wake = true;
 }
 
 /* Logs, if appropriate, that the poll loop was awakened by an event
@@ -320,10 +337,10 @@ poll_block(void)
 {
 struct poll_loop *loop = poll_loop();
 struct poll_node *node;
-struct pollfd *pollfds;
+struct pollfd *pollfds = NULL;
 HANDLE *wevents = NULL;
 int elapsed;
-int retval;
+int retval = 0;
 int i;
 
 /* Register fatal signal events before actually doing any real work for
@@ -335,34 +352,38 @@ poll_block(void)
 }
 
 timewarp_run();
-pollfds = xmalloc(hmap_count(>poll_nodes) * sizeof *pollfds);
+if (!loop->immediate_wake) {
+pollfds = xmalloc(hmap_count(>poll_nodes) * sizeof *pollfds);
 
 #ifdef _WIN32
-wevents = xmalloc(hmap_count(>poll_nodes) * sizeof *wevents);
+wevents = xmalloc(hmap_count(>poll_nodes) * sizeof *wevents);
 #endif
 
-/* Populate with all the fds and events. */
-i = 0;
-HMAP_FOR_EACH (node, hmap_node, >poll_nodes) {
-pollfds[i] = node->pollfd;
+/* Populate with all the fds and events. */
+i = 0;
+HMAP_FOR_EACH (node, hmap_node, >poll_nodes) {
+pollfds[i] = node->pollfd;
 #ifdef _WIN32
-wevents[i] = node->wevent;
-if (node->pollfd.fd && node->wevent) {
-short int wsa_events = 0;
-if (node->pollfd.events & POLLIN) {
-wsa_events |= FD_READ | FD_ACCEPT | FD_CLOSE;
-}
-if (node->pollfd.events & POLLOUT) {
-wsa_events |= FD_WRITE | FD_CONNECT | FD_CLOSE;
+wevents[i] = node->wevent;
+if (node->pollfd.fd && node->wevent) {
+short int wsa_events = 0;
+if (node->pollfd.events & POLLIN) {
+wsa_events |= FD_READ | FD_ACCEPT | FD_CLOSE;
+}
+if (node->pollfd.events & POLLOUT) {
+wsa_events |= FD_WRITE | FD_CONNECT | FD_CLOSE;
+}
+WSAEventSelect(node->pollfd.fd, node->wevent, wsa_events);
 }
-WSAEventSelect(node->pollfd.fd, node->wevent, wsa_events);
-}
 #endif
-i++;
-}
+i++;
+}
 
-retval = time_poll(pollfds, hmap_count(>poll_nodes), wevents,
-   loop->timeout_when, );
+retval = time_poll(pollfds, hmap_count(>poll_nodes), wevents,
+   loop->timeout_when, );
+} else {
+retval = time_poll(NULL, 0, NULL, loop->timeout_when, );
+}
 if (retval < 0) {
 static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(1, 5);
 VLOG_ERR_RL(, "poll: %s", ovs_strerror(-retval));
@@ -381,6 +402,7 @@ poll_block(void)
 free_poll_nodes(loop);
 loop->timeout_when = LLONG_MAX;
 loop->timeout_where = NULL;
+loop->immediate_wake = false;
 free(pollfds);
 free(wevents);
 
@@ -417,6 +439,7 @@ poll_loop(void)
 loop = xzalloc(sizeof *loop);
 

[ovs-dev] [PATCH v2 2/2] Minimize the number of time calls in time_poll()

2021-07-12 Thread anton . ivanov
From: Anton Ivanov 

time_poll() makes an excessive number of time_msec() calls,
which incur a performance penalty.

1. Avoid the time_msec() call for the timeout calculation when
time_poll() is asked to skip poll().

2. Reuse the time_msec() result from the deadline calculation for
the last_wakeup and timeout calculations.
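
The timeout computation that results from point 1 can be sketched as a small pure function. `compute_time_left()` is a hypothetical helper written for illustration; in the patch the same logic is inline in time_poll(), and `n_pollfds == 0` is taken to mean "skip poll()".

```c
#include <limits.h>

/* Returns the poll() timeout in milliseconds, clamped to the range of int.
 * 'now' is a timestamp already obtained by the caller, so no additional
 * clock read is needed here. */
static int
compute_time_left(long long int now, long long int timeout_when,
                  int n_pollfds)
{
    if (n_pollfds == 0) {
        /* poll() will be skipped, so no timeout is needed. */
        return 0;
    }
    if (now >= timeout_when) {
        return 0;
    }
    if ((unsigned long long int) timeout_when - now > INT_MAX) {
        return INT_MAX;
    }
    return (int) (timeout_when - now);
}
```

Passing `now` in as an argument is what allows a single time_msec() result to be reused across the deadline check, the timeout calculation and last_wakeup.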

Signed-off-by: Anton Ivanov 
---
 lib/timeval.c | 36 +---
 1 file changed, 21 insertions(+), 15 deletions(-)

diff --git a/lib/timeval.c b/lib/timeval.c
index c6ac87376..64ab22e05 100644
--- a/lib/timeval.c
+++ b/lib/timeval.c
@@ -287,7 +287,7 @@ time_poll(struct pollfd *pollfds, int n_pollfds, HANDLE 
*handles OVS_UNUSED,
   long long int timeout_when, int *elapsed)
 {
 long long int *last_wakeup = last_wakeup_get();
-long long int start;
+long long int start, now;
 bool quiescent;
 int retval = 0;
 
@@ -297,28 +297,31 @@ time_poll(struct pollfd *pollfds, int n_pollfds, HANDLE 
*handles OVS_UNUSED,
 if (*last_wakeup && !thread_is_pmd()) {
 log_poll_interval(*last_wakeup);
 }
-start = time_msec();
+now = start = time_msec();
 
 timeout_when = MIN(timeout_when, deadline);
 quiescent = ovsrcu_is_quiescent();
 
 for (;;) {
-long long int now = time_msec();
 int time_left;
 
-if (now >= timeout_when) {
+if (n_pollfds == 0) {
 time_left = 0;
-} else if ((unsigned long long int) timeout_when - now > INT_MAX) {
-time_left = INT_MAX;
 } else {
-time_left = timeout_when - now;
-}
-
-if (!quiescent) {
-if (!time_left) {
-ovsrcu_quiesce();
+if (now >= timeout_when) {
+time_left = 0;
+} else if ((unsigned long long int) timeout_when - now > INT_MAX) {
+time_left = INT_MAX;
 } else {
-ovsrcu_quiesce_start();
+time_left = timeout_when - now;
+}
+
+if (!quiescent) {
+if (!time_left) {
+ovsrcu_quiesce();
+} else {
+ovsrcu_quiesce_start();
+}
 }
 }
 
@@ -329,6 +332,8 @@ time_poll(struct pollfd *pollfds, int n_pollfds, HANDLE 
*handles OVS_UNUSED,
  */
 if (n_pollfds != 0) {
 retval = poll(pollfds, n_pollfds, time_left);
+} else {
+retval = 0;
 }
 if (retval < 0) {
 retval = -errno;
@@ -355,7 +360,8 @@ time_poll(struct pollfd *pollfds, int n_pollfds, HANDLE 
*handles OVS_UNUSED,
 ovsrcu_quiesce_end();
 }
 
-if (deadline <= time_msec()) {
+now = time_msec();
+if (deadline <= now) {
 #ifndef _WIN32
 fatal_signal_handler(SIGALRM);
 #else
@@ -372,7 +378,7 @@ time_poll(struct pollfd *pollfds, int n_pollfds, HANDLE 
*handles OVS_UNUSED,
 break;
 }
 }
-*last_wakeup = time_msec();
+*last_wakeup = now;
 refresh_rusage();
 *elapsed = *last_wakeup - start;
 return retval;
-- 
2.20.1



Re: [ovs-dev] [v9 01/12] dpif-netdev: Add command line and function pointer for miniflow extract

2021-07-12 Thread Aaron Conole
Aaron Conole  writes:

> Ilya Maximets  writes:
>
>> On 7/12/21 5:10 PM, David Marchand wrote:
>>> On Mon, Jul 12, 2021 at 4:43 PM Ilya Maximets  wrote:
>> ovsrobot has issues with reporting the status right now, but this
>> patch fails the build in GHA:
>>   https://github.com/ovsrobot/ovs/actions/runs/1021787643
>
> Thanks for linking on results.
>
> I've spot-checked a bunch of the failing builds, and found 2 fixable code 
> issues.
> A few of the CI run's I can't find/explain the error, but I don't know of 
> a good
> way to "jump to the error" line, am I missing a trick, or is scrolling 
> the whole
> compiler output and checking errors the best method?

 typing 'error:' in the 'Search logs' field, usually gets you
 to the actual error faster, but, unfortunately, scrolling is
 the most reliable option.
>>> 
>>> GHA ui jumps at the last line of a failing step, but the problem is
>>> that, in OVS, we dump all logs which adds a lot of noise.
>>> 
>>> We could stop dumping them, since those logs are attached to the job
>>> as an archive.
>>> Like what is done in DPDK.
>>> http://git.dpdk.org/dpdk/tree/.ci/linux-build.sh#n3
>>> 
>>> WDYT?
>>
>> Yes, that is good thing to do.  We didn't do that because of
>> Travis CI, where we have no artifacts collected.
>
> +1 - we should bend over backwards to make things easier on Travis CI to
> the detriment of other platforms.

And by this, I mean the opposite - we should *NOT* bend over backwards
to make things easier on Travis CI.

>> But yes, checking for [ -n "$GITHUB_WORKFLOW" ] is a solution.
>>
>> Best regards, Ilya Maximets.



Re: [ovs-dev] [v9 01/12] dpif-netdev: Add command line and function pointer for miniflow extract

2021-07-12 Thread Aaron Conole
Ilya Maximets  writes:

> On 7/12/21 5:10 PM, David Marchand wrote:
>> On Mon, Jul 12, 2021 at 4:43 PM Ilya Maximets  wrote:
> ovsrobot has issues with reporting the status right now, but this
> patch fails the build in GHA:
>   https://github.com/ovsrobot/ovs/actions/runs/1021787643

 Thanks for linking on results.

 I've spot-checked a bunch of the failing builds, and found 2 fixable code 
 issues.
 A few of the CI run's I can't find/explain the error, but I don't know of 
 a good
 way to "jump to the error" line, am I missing a trick, or is scrolling the 
 whole
 compiler output and checking errors the best method?
>>>
>>> typing 'error:' in the 'Search logs' field, usually gets you
>>> to the actual error faster, but, unfortunately, scrolling is
>>> the most reliable option.
>> 
>> GHA ui jumps at the last line of a failing step, but the problem is
>> that, in OVS, we dump all logs which adds a lot of noise.
>> 
>> We could stop dumping them, since those logs are attached to the job
>> as an archive.
>> Like what is done in DPDK.
>> http://git.dpdk.org/dpdk/tree/.ci/linux-build.sh#n3
>> 
>> WDYT?
>
> Yes, that is good thing to do.  We didn't do that because of
> Travis CI, where we have no artifacts collected.

+1 - we should bend over backwards to make things easier on Travis CI to
the detriment of other platforms.

> But yes, checking for [ -n "$GITHUB_WORKFLOW" ] is a solution.
>
> Best regards, Ilya Maximets.



Re: [ovs-dev] [PATCH ovn] controller: Add stopwatch to measure OF update duration.

2021-07-12 Thread Mark Michelson

I pushed this to master.

On 7/8/21 4:27 PM, Mark Michelson wrote:

Acked-by: Mark Michelson 

On 7/6/21 10:41 AM, Dumitru Ceara wrote:

Also, shorten the CONTROLLER_LOOP_STOPWATCH_NAME name as there is a bug
in lib/stopwatch.c which fails to report an error when the stopwatch
name is longer than 32 characters.  CONTROLLER_LOOP_STOPWATCH_NAME was
getting very close to that and future commits might mimic the long name
and happen to go over the limit.
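
A guard of the kind suggested in the note below could look roughly like this. It is a sketch under assumptions: `SKETCH_MAX_NAME_LEN` and `stopwatch_name_fits()` are hypothetical, and the 32-character limit is taken from the description above rather than from lib/stopwatch.c.

```c
#include <stdbool.h>
#include <string.h>

/* Assumption: limit as quoted in the commit message, not read from
 * lib/stopwatch.c. */
#define SKETCH_MAX_NAME_LEN 32

/* Returns true if 'name' is short enough to be used as a stopwatch name
 * under the assumed limit. */
static bool
stopwatch_name_fits(const char *name)
{
    return strlen(name) <= SKETCH_MAX_NAME_LEN;
}
```

A caller could assert on this before stopwatch_create() so that an over-long name fails loudly instead of being silently mishandled.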

Signed-off-by: Dumitru Ceara 
---
Note: The OVS lib/stopwatch.c implementation should also be fixed to
report an error (or even assert) if the name supplied to
stopwatch_create() is longer than 32 characters.  But that's out of the
scope of this patch.
---
  controller/ovn-controller.c | 7 ++-
  1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/controller/ovn-controller.c b/controller/ovn-controller.c
index 9050380f3..6a9c25f28 100644
--- a/controller/ovn-controller.c
+++ b/controller/ovn-controller.c
@@ -93,7 +93,8 @@ static unixctl_cb_func debug_delay_nb_cfg_report;
  #define DEFAULT_PROBE_INTERVAL_MSEC 5000
  #define OFCTRL_DEFAULT_PROBE_INTERVAL_SEC 0
-#define CONTROLLER_LOOP_STOPWATCH_NAME "ovn-controller-flow-generation"
+#define CONTROLLER_LOOP_STOPWATCH_NAME "flow-generation"
+#define OFCTRL_PUT_STOPWATCH_NAME "flow-installation"
  #define OVS_NB_CFG_NAME "ovn-nb-cfg"
@@ -2845,6 +2846,7 @@ main(int argc, char *argv[])
  update_sb_monitors(ovnsb_idl_loop.idl, NULL, NULL, NULL, false);
  stopwatch_create(CONTROLLER_LOOP_STOPWATCH_NAME, SW_MS);
+    stopwatch_create(OFCTRL_PUT_STOPWATCH_NAME, SW_MS);
  /* Define inc-proc-engine nodes. */
  ENGINE_NODE_CUSTOM_DATA(ct_zones, "ct_zones");
@@ -3292,6 +3294,8 @@ main(int argc, char *argv[])
  pflow_output_data = 
engine_get_data(_pflow_output);

  if (lflow_output_data && pflow_output_data &&
  ct_zones_data) {
+    stopwatch_start(OFCTRL_PUT_STOPWATCH_NAME,
+    time_msec());
  ofctrl_put(_output_data->flow_table,
 _output_data->flow_table,
 _zones_data->pending,
@@ -3299,6 +3303,7 @@ main(int argc, char *argv[])
 ofctrl_seqno_get_req_cfg(),
 
engine_node_changed(_lflow_output),
 
engine_node_changed(_pflow_output));
+    stopwatch_stop(OFCTRL_PUT_STOPWATCH_NAME, 
time_msec());

  }
  ofctrl_seqno_run(ofctrl_get_cur_cfg());
  if_status_mgr_run(if_mgr, binding_data, 
!ovnsb_idl_txn,








Re: [ovs-dev] [PATCH ovn] northd-ddlog: Fix IP family match for DNAT flows.

2021-07-12 Thread Mark Michelson

I pushed this to master and branch-21.06.

On 7/8/21 4:19 PM, Mark Michelson wrote:

Acked-by: Mark Michelson 

On 7/7/21 10:09 AM, Dumitru Ceara wrote:

This was causing some IPv6 system tests to fail when run with
ovn-northd-ddlog.

Also fix cleanup of the northd process in system-ovn.at.  A few tests
were trying to stop ovn-northd (C version) even when run with
ovn-northd-ddlog.
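
The nature of the fix (choosing the match field from the address family instead of hard-coding ip4) can be sketched in C as follows. `build_dnat_match()` is a hypothetical helper for illustration only, and detecting IPv6 by the presence of a ':' is a simplification assumed here; the actual change is in the DDlog rules shown in the diff.

```c
#include <stdio.h>
#include <string.h>

/* Builds a logical flow match on the external IP of a DNAT rule, using
 * "ip6" or "ip4" as the field prefix depending on the address family.
 * Returns the value of snprintf(). */
static int
build_dnat_match(char *buf, size_t size, const char *external_ip)
{
    /* Simplification: any address containing ':' is treated as IPv6. */
    const char *ipx = strchr(external_ip, ':') ? "ip6" : "ip4";

    return snprintf(buf, size, "ip && %s.dst == %s", ipx, external_ip);
}
```

With a hard-coded "ip4.dst", an IPv6 external address would produce a match that can never hit, which is consistent with the IPv6 test failures mentioned above.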

Signed-off-by: Dumitru Ceara 
---
Note: There are some system-ovn.at tests that still fail with
ovn-northd-ddlog and need more investigation to see if it's a
test issue or a real bug.
---
  northd/ovn_northd.dl |  2 +-
  tests/system-ovn.at  | 24 
  2 files changed, 13 insertions(+), 13 deletions(-)

diff --git a/northd/ovn_northd.dl b/northd/ovn_northd.dl
index e27c944a0..dea13a91f 100644
--- a/northd/ovn_northd.dl
+++ b/northd/ovn_northd.dl
@@ -5687,7 +5687,7 @@ for (r in (._uuid = lr_uuid,
 } in
  if (nat.nat.__type == "dnat" or nat.nat.__type == "dnat_and_snat") {
  None = l3dgw_port in
-    var __match = "ip && ip4.dst == ${nat.nat.external_ip}" in
+    var __match = "ip && ${ipX}.dst == ${nat.nat.external_ip}" in
  (var ext_ip_match, var ext_flow) = lrouter_nat_add_ext_ip_match(
  r, nat, __match, ipX, true, mask) in
  {
diff --git a/tests/system-ovn.at b/tests/system-ovn.at
index f42cfc0db..c01fde131 100644
--- a/tests/system-ovn.at
+++ b/tests/system-ovn.at
@@ -1348,7 +1348,7 @@ as ovn-nb
  OVS_APP_EXIT_AND_WAIT([ovsdb-server])
  as northd
-OVS_APP_EXIT_AND_WAIT([ovn-northd])
+OVS_APP_EXIT_AND_WAIT([NORTHD_TYPE])
  as
  OVS_TRAFFIC_VSWITCHD_STOP(["/failed to query port patch-.*/d
@@ -3121,7 +3121,7 @@ as ovn-nb
  OVS_APP_EXIT_AND_WAIT([ovsdb-server])
  as northd
-OVS_APP_EXIT_AND_WAIT([ovn-northd])
+OVS_APP_EXIT_AND_WAIT([NORTHD_TYPE])
  as
  OVS_TRAFFIC_VSWITCHD_STOP(["/failed to query port patch-.*/d
@@ -4577,7 +4577,7 @@ as ovn-nb
  OVS_APP_EXIT_AND_WAIT([ovsdb-server])
  as northd
-OVS_APP_EXIT_AND_WAIT([ovn-northd])
+OVS_APP_EXIT_AND_WAIT([NORTHD_TYPE])
  as
  OVS_TRAFFIC_VSWITCHD_STOP(["/failed to query port patch-.*/d
@@ -4663,7 +4663,7 @@ as ovn-nb
  OVS_APP_EXIT_AND_WAIT([ovsdb-server])
  as northd
-OVS_APP_EXIT_AND_WAIT([ovn-northd])
+OVS_APP_EXIT_AND_WAIT([NORTHD_TYPE])
  as
  OVS_TRAFFIC_VSWITCHD_STOP(["/failed to query port patch-.*/d
@@ -4903,7 +4903,7 @@ as ovn-nb
  OVS_APP_EXIT_AND_WAIT([ovsdb-server])
  as northd
-OVS_APP_EXIT_AND_WAIT([ovn-northd])
+OVS_APP_EXIT_AND_WAIT([NORTHD_TYPE])
  as
  OVS_TRAFFIC_VSWITCHD_STOP(["/failed to query port patch-.*/d
@@ -5287,7 +5287,7 @@ as ovn-nb
  OVS_APP_EXIT_AND_WAIT([ovsdb-server])
  as northd
-OVS_APP_EXIT_AND_WAIT([ovn-northd])
+OVS_APP_EXIT_AND_WAIT([NORTHD_TYPE])
  as
  OVS_TRAFFIC_VSWITCHD_STOP(["/failed to query port patch-.*/d
@@ -5717,7 +5717,7 @@ as ovn-nb
  OVS_APP_EXIT_AND_WAIT([ovsdb-server])
  as northd
-OVS_APP_EXIT_AND_WAIT([ovn-northd])
+OVS_APP_EXIT_AND_WAIT([NORTHD_TYPE])
  as
  OVS_TRAFFIC_VSWITCHD_STOP(["/failed to query port patch-.*/d
@@ -5879,7 +5879,7 @@ as ovn-nb
  OVS_APP_EXIT_AND_WAIT([ovsdb-server])
  as northd
-OVS_APP_EXIT_AND_WAIT([ovn-northd])
+OVS_APP_EXIT_AND_WAIT([NORTHD_TYPE])
  as
  OVS_TRAFFIC_VSWITCHD_STOP(["/failed to query port patch-.*/d
@@ -5928,7 +5928,7 @@ as ovn-nb
  OVS_APP_EXIT_AND_WAIT([ovsdb-server])
  as northd
-OVS_APP_EXIT_AND_WAIT([ovn-northd])
+OVS_APP_EXIT_AND_WAIT([NORTHD_TYPE])
  as
  OVS_TRAFFIC_VSWITCHD_STOP(["/failed to query port patch-.*/d
@@ -6021,7 +6021,7 @@ as ovn-nb
  OVS_APP_EXIT_AND_WAIT([ovsdb-server])
  as northd
-OVS_APP_EXIT_AND_WAIT([ovn-northd])
+OVS_APP_EXIT_AND_WAIT([NORTHD_TYPE])
  as
  OVS_TRAFFIC_VSWITCHD_STOP(["/.*error receiving.*/d
@@ -6083,7 +6083,7 @@ as ovn-nb
  OVS_APP_EXIT_AND_WAIT([ovsdb-server])
  as northd
-OVS_APP_EXIT_AND_WAIT([ovn-northd])
+OVS_APP_EXIT_AND_WAIT([NORTHD_TYPE])
  as
  OVS_TRAFFIC_VSWITCHD_STOP(["/.*error receiving.*/d
@@ -6234,7 +6234,7 @@ as ovn-nb
  OVS_APP_EXIT_AND_WAIT([ovsdb-server])
  as northd
-OVS_APP_EXIT_AND_WAIT([ovn-northd])
+OVS_APP_EXIT_AND_WAIT([NORTHD_TYPE])
  as
  OVS_TRAFFIC_VSWITCHD_STOP(["/.*error receiving.*/d







Re: [ovs-dev] [v9 06/12] dpif-netdev: Add packet count and core id paramters for study

2021-07-12 Thread Eelco Chaudron


On 12 Jul 2021, at 7:51, kumar Amber wrote:

> From: Kumar Amber 
>
> This commit introduces an additional command line parameter
> for the mfex study function. If the user provides a packet count,
> study uses it as the minimum number of packets that must be processed;
> otherwise a default value is chosen.
> It also introduces a third parameter for choosing a particular PMD core.
>
> $ ovs-appctl dpif-netdev/miniflow-parser-set study 500 3
>
> Signed-off-by: Kumar Amber 
>
> ---
> v9:
> - fix review comments Flavio
> v7:
> - change the command parameters for core_id and study_pkt_cnt
> v5:
> - fix review comments(Ian, Flavio, Eelco)
> - introduce pmd core id parameter
> ---
> ---
>  Documentation/topics/dpdk/bridge.rst |  39 -
>  lib/dpif-netdev-extract-study.c  |  26 +-
>  lib/dpif-netdev-private-extract.h|   9 ++
>  lib/dpif-netdev.c| 121 +--
>  4 files changed, 181 insertions(+), 14 deletions(-)
>
> diff --git a/Documentation/topics/dpdk/bridge.rst b/Documentation/topics/dpdk/bridge.rst
> index 4db416ddd..c31067c51 100644
> --- a/Documentation/topics/dpdk/bridge.rst
> +++ b/Documentation/topics/dpdk/bridge.rst
> @@ -284,12 +284,45 @@ command also shows whether the CPU supports each implementation ::
>
>  An implementation can be selected manually by the following command ::
>
> -$ ovs-appctl dpif-netdev/miniflow-parser-set study
> +$ ovs-appctl dpif-netdev/miniflow-parser-set [-pmd core_id] [name]
> + [study_cnt]
>
> -Also user can select the study implementation which studies the traffic for
> +The above command has two optional parameters: study_cnt and core_id.
> +The core_id set a particular miniflow extract function to a specific

The core_id sets

> +pmd thread on the core. Third parameter study_cnt, which is specific

The third parameter

> +to study and ignored by other implementations, means how many packets
> +are needed to choose the best implementation.
> +
> +The user can select the study implementation which studies the traffic for
>  a specific number of packets by applying all available implementaions of

implementations

>  miniflow extract and than chooses the one with most optimal result for that

and then chooses ... with the most optimal

> -traffic pattern.
> +traffic pattern. The user can optionally provide a packet count [study_cnt]
> +parameter which is the minimum number of packets that OVS must study before
> +choosing an optimal implementation. If no packet count is provided, then the
> +default value, 128, is chosen. Also, as there is no synchronization point
> +between threads, one PMD thread might still be running a previous round,
> +and can now decide on earlier data.
> +
> +The per packet count is a global value, and parallel `study()` executions with

Should study() just be study?

> +differing packet counts will use the most recent count value provided by user.
> +
> +Study can be selected with packet count by the following command ::
> +
> +$ ovs-appctl dpif-netdev/miniflow-parser-set study 1024
> +
> +Study can be selected with packet count and explicit PMD selection
> +by the following command ::
> +
> +$ ovs-appctl dpif-netdev/miniflow-parser-set -pmd 3 study 1024
> +
> +In the above command the last parameter is the CORE ID of the PMD
> +thread and this can also be used to explicitly set the miniflow
> +extraction function pointer on different PMD threads.
> +
> +Scalar can be selected on core 3 by the following command where
> +study count can be put as any arbitary number or left blank::

arbitrary

> +
> +$ ovs-appctl dpif-netdev/miniflow-parser-set -pmd 3 scalar
>
>  Miniflow Extract Validation
>  ~~~~~~~~~~~~~~~~~~~~~~~~~~~
> diff --git a/lib/dpif-netdev-extract-study.c b/lib/dpif-netdev-extract-study.c
> index a19759bd9..2dc3faf83 100644
> --- a/lib/dpif-netdev-extract-study.c
> +++ b/lib/dpif-netdev-extract-study.c
> @@ -25,7 +25,7 @@
>
>  VLOG_DEFINE_THIS_MODULE(dpif_mfex_extract_study);
>
> -static uint32_t mfex_study_pkts_count = 0;
> +static uint32_t mfex_study_pkts_count = MFEX_MAX_PKT_COUNT;
>
>  /* Struct to hold miniflow study stats. */
>  struct study_stats {
> @@ -48,6 +48,28 @@ mfex_study_get_study_stats_ptr(void)
>  return stats;
>  }
>
> +uint32_t mfex_set_study_pkt_cnt(uint32_t pkt_cmp_count,
> +const char *name)

This needs to be int, not uint32_t as you return a negative value on error.
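
A minimal sketch of why the return type matters here. The function names and the -1 error value are hypothetical and only mirror the shape of the signature under review; they are not taken from the patch:

```c
#include <stdint.h>

/* Hypothetical setter declared with an unsigned return type that still
 * tries to report failure with a negative value. */
uint32_t
set_count_unsigned(uint32_t count)
{
    if (count == 0) {
        return -1;      /* Converted to UINT32_MAX, no longer negative. */
    }
    return 0;
}

/* Same logic with an int return type, so the error stays negative and a
 * caller can test it with 'ret < 0'. */
int
set_count_signed(uint32_t count)
{
    if (count == 0) {
        return -1;
    }
    return 0;
}
```

With the unsigned version, a caller checking for a negative result would never see the error, because the value it receives is a large positive number.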

> +{
> +struct dpif_miniflow_extract_impl *miniflow_funcs;
> +dpif_mfex_impl_info_get(_funcs);
> +
> +/* If the packet count is set and implementation called is study then
> + * set packet counter to requested number else set the packet counter
> + * to default number.
> + */
> +if ((strcmp(miniflow_funcs[MFEX_IMPL_STUDY].name, name) == 0) &&
> +(pkt_cmp_count != 0)) {
> +
> +atomic_uintptr_t *study_pck_cnt = (void *)_study_pkts_count;
> +

Re: [ovs-dev] [PATCH v2 ovn 2/2] controller: incrementally create ras port_binding list

2021-07-12 Thread Mark Michelson

Acked-by: Mark Michelson 

On 7/10/21 6:13 AM, Lorenzo Bianconi wrote:

Incrementally manage local_active_ports_ras map for interfaces
where periodic router advertisement has been enabled. This patch
avoids looping over all local interfaces to check if
periodic RA is running on the current port binding.

Signed-off-by: Lorenzo Bianconi 
---
  controller/binding.c|  7 +++
  controller/binding.h|  1 +
  controller/ovn-controller.c | 10 +++-
  controller/pinctrl.c| 93 -
  controller/pinctrl.h|  3 +-
  5 files changed, 69 insertions(+), 45 deletions(-)

diff --git a/controller/binding.c b/controller/binding.c
index f87eaec0c..b1b1e3b84 100644
--- a/controller/binding.c
+++ b/controller/binding.c
@@ -1672,6 +1672,9 @@ binding_run(struct binding_ctx_in *b_ctx_in, struct 
binding_ctx_out *b_ctx_out)
  update_active_pb_ras_pd(pb, b_ctx_out->local_datapaths,
  b_ctx_out->local_active_ports_ipv6_pd,
  "ipv6_prefix_delegation");
+update_active_pb_ras_pd(pb, b_ctx_out->local_datapaths,
+b_ctx_out->local_active_ports_ras,
+"ipv6_ra_send_periodic");
  
  enum en_lport_type lport_type = get_lport_type(pb);
  
@@ -2514,6 +2517,10 @@ delete_done:

  b_ctx_out->local_active_ports_ipv6_pd,
  "ipv6_prefix_delegation");
  
+update_active_pb_ras_pd(pb, b_ctx_out->local_datapaths,

+b_ctx_out->local_active_ports_ras,
+"ipv6_ra_send_periodic");
+
  enum en_lport_type lport_type = get_lport_type(pb);
  
  struct binding_lport *b_lport =

diff --git a/controller/binding.h b/controller/binding.h
index 60ad49da0..77197e742 100644
--- a/controller/binding.h
+++ b/controller/binding.h
@@ -73,6 +73,7 @@ void related_lports_destroy(struct related_lports *);
  struct binding_ctx_out {
  struct hmap *local_datapaths;
  struct shash *local_active_ports_ipv6_pd;
+struct shash *local_active_ports_ras;
  struct local_binding_data *lbinding_data;
  
  /* sset of (potential) local lports. */

diff --git a/controller/ovn-controller.c b/controller/ovn-controller.c
index 2bd402ab2..db2d82035 100644
--- a/controller/ovn-controller.c
+++ b/controller/ovn-controller.c
@@ -1030,6 +1030,7 @@ struct ed_type_runtime_data {
  struct hmap tracked_dp_bindings;
  
  struct shash local_active_ports_ipv6_pd;

+struct shash local_active_ports_ras;
  };
  
  /* struct ed_type_runtime_data has the below members for tracking the

@@ -1118,6 +1119,7 @@ en_runtime_data_init(struct engine_node *node OVS_UNUSED,
  smap_init(>local_iface_ids);
  local_binding_data_init(>lbinding_data);
  shash_init(>local_active_ports_ipv6_pd);
+shash_init(>local_active_ports_ras);
  
  /* Init the tracked data. */

  hmap_init(>tracked_dp_bindings);
@@ -1144,6 +1146,7 @@ en_runtime_data_cleanup(void *data)
  }
  hmap_destroy(_data->local_datapaths);
  shash_destroy(_data->local_active_ports_ipv6_pd);
+shash_destroy(_data->local_active_ports_ras);
  local_binding_data_destroy(_data->lbinding_data);
  }
  
@@ -1224,6 +1227,8 @@ init_binding_ctx(struct engine_node *node,

  b_ctx_out->local_datapaths = _data->local_datapaths;
  b_ctx_out->local_active_ports_ipv6_pd =
  _data->local_active_ports_ipv6_pd;
+b_ctx_out->local_active_ports_ras =
+_data->local_active_ports_ras;
  b_ctx_out->local_lports = _data->local_lports;
  b_ctx_out->local_lports_changed = false;
  b_ctx_out->related_lports = _data->related_lports;
@@ -1242,6 +1247,7 @@ en_runtime_data_run(struct engine_node *node, void *data)
  struct ed_type_runtime_data *rt_data = data;
  struct hmap *local_datapaths = _data->local_datapaths;
  struct shash *local_active_ipv6_pd = _data->local_active_ports_ipv6_pd;
+struct shash *local_active_ras = _data->local_active_ports_ras;
  struct sset *local_lports = _data->local_lports;
  struct sset *active_tunnels = _data->active_tunnels;
  
@@ -1258,6 +1264,7 @@ en_runtime_data_run(struct engine_node *node, void *data)

  }
  hmap_clear(local_datapaths);
  shash_clear(local_active_ipv6_pd);
+shash_clear(local_active_ras);
  local_binding_data_destroy(_data->lbinding_data);
  sset_destroy(local_lports);
  related_lports_destroy(_data->related_lports);
@@ -3272,7 +3279,8 @@ main(int argc, char *argv[])
  br_int, chassis,
  _data->local_datapaths,
  _data->active_tunnels,
-_data->local_active_ports_ipv6_pd);
+_data->local_active_ports_ipv6_pd,
+   

Re: [ovs-dev] [PATCH v2 ovn 1/2] controller: incrementally create ipv6 prefix delegation port_binding list

2021-07-12 Thread Mark Michelson

For the approach,

Acked-by: Mark Michelson 

I have one final suggestion down below.

On 7/10/21 6:13 AM, Lorenzo Bianconi wrote:

Incrementally manage local_active_ports_ipv6_pd map for interfaces
where IPv6 prefix-delegation has been enabled. This patch avoids
looping over all local interfaces to check if prefix-delegation
is running on the current port binding.

Signed-off-by: Lorenzo Bianconi 
---
  controller/binding.c|  32 +++
  controller/binding.h|   1 +
  controller/ovn-controller.c |  11 +++-
  controller/ovn-controller.h |   6 ++
  controller/pinctrl.c| 107 +---
  controller/pinctrl.h|   4 +-
  6 files changed, 103 insertions(+), 58 deletions(-)

diff --git a/controller/binding.c b/controller/binding.c
index 594babc98..f87eaec0c 100644
--- a/controller/binding.c
+++ b/controller/binding.c
@@ -574,6 +574,30 @@ remove_related_lport(const struct sbrec_port_binding *pb,
  }
  }
  
+static void
+update_active_pb_ras_pd(const struct sbrec_port_binding *pb,
+struct hmap *local_datapaths,
+struct shash *map, const char *conf)
+{
+const char *ras_pd_conf = smap_get(&pb->options, conf);


Since ras_pd_conf being "false" is the same as if it did not exist in the configuration, you could change this to:

bool ras_pd_conf = smap_get_bool(&pb->options, conf, false);

Then you can just do boolean comparisons on ras_pd_conf instead of 
string comparisons.
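
A minimal sketch of the difference between the two styles. The `struct opt` map and the `opt_get()` / `opt_get_bool()` helpers are hypothetical stand-ins written for this illustration, not the OVS smap API. For the values "true", "false", and an absent key, both tests reach the same decision:

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a string-to-string options map. */
struct opt {
    const char *key;
    const char *value;
};

/* Returns the value stored for 'key', or NULL if the key is absent. */
const char *
opt_get(const struct opt *opts, size_t n, const char *key)
{
    for (size_t i = 0; i < n; i++) {
        if (!strcmp(opts[i].key, key)) {
            return opts[i].value;
        }
    }
    return NULL;
}

/* Simplified boolean getter: "true" and "false" map to their values,
 * anything else (including an absent key) yields 'def'. */
bool
opt_get_bool(const struct opt *opts, size_t n, const char *key, bool def)
{
    const char *value = opt_get(opts, n, key);

    if (!value) {
        return def;
    }
    if (!strcmp(value, "true")) {
        return true;
    }
    if (!strcmp(value, "false")) {
        return false;
    }
    return def;
}

/* String-based test: disabled when the key is absent or set to "false". */
bool
is_disabled_str(const struct opt *opts, size_t n, const char *key)
{
    const char *conf = opt_get(opts, n, key);

    return !conf || !strcmp(conf, "false");
}

/* Boolean-based test: one call, no string comparison at the call site. */
bool
is_disabled_bool(const struct opt *opts, size_t n, const char *key)
{
    return !opt_get_bool(opts, n, key, false);
}
```

The two forms can differ for values other than "true" and "false", so the boolean form is only a drop-in replacement if such values are not expected.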



+struct shash_node *iter = shash_find(map, pb->logical_port);
+
+if (iter && (!ras_pd_conf || !strcmp(ras_pd_conf, "false"))) {
+shash_delete(map, iter);
+return;
+}
+struct pb_ld_binding *ras_pd = NULL;
+if (!iter && ras_pd_conf && !strcmp(ras_pd_conf, "true")) {
+ras_pd = xzalloc(sizeof *ras_pd);
+ras_pd->pb = pb;
+shash_add(map, pb->logical_port, ras_pd);
+}
+if (ras_pd) {
+ras_pd->ld = get_local_datapath(local_datapaths,
+pb->datapath->tunnel_key);
+}
+}
+
  /* Corresponds to each Port_Binding.type. */
  enum en_lport_type {
  LP_UNKNOWN,
@@ -1645,6 +1669,10 @@ binding_run(struct binding_ctx_in *b_ctx_in, struct 
binding_ctx_out *b_ctx_out)
  const struct sbrec_port_binding *pb;
  SBREC_PORT_BINDING_TABLE_FOR_EACH (pb,
 b_ctx_in->port_binding_table) {
+update_active_pb_ras_pd(pb, b_ctx_out->local_datapaths,
+b_ctx_out->local_active_ports_ipv6_pd,
+"ipv6_prefix_delegation");
+
  enum en_lport_type lport_type = get_lport_type(pb);
  
  switch (lport_type) {

@@ -2482,6 +2510,10 @@ delete_done:
  continue;
  }
  
+update_active_pb_ras_pd(pb, b_ctx_out->local_datapaths,

+b_ctx_out->local_active_ports_ipv6_pd,
+"ipv6_prefix_delegation");
+
  enum en_lport_type lport_type = get_lport_type(pb);
  
  struct binding_lport *b_lport =

diff --git a/controller/binding.h b/controller/binding.h
index a08011ae2..60ad49da0 100644
--- a/controller/binding.h
+++ b/controller/binding.h
@@ -72,6 +72,7 @@ void related_lports_destroy(struct related_lports *);
  
  struct binding_ctx_out {

  struct hmap *local_datapaths;
+struct shash *local_active_ports_ipv6_pd;
  struct local_binding_data *lbinding_data;
  
  /* sset of (potential) local lports. */

diff --git a/controller/ovn-controller.c b/controller/ovn-controller.c
index 9050380f3..2bd402ab2 100644
--- a/controller/ovn-controller.c
+++ b/controller/ovn-controller.c
@@ -1028,6 +1028,8 @@ struct ed_type_runtime_data {
  bool tracked;
  bool local_lports_changed;
  struct hmap tracked_dp_bindings;
+
+struct shash local_active_ports_ipv6_pd;
  };
  
  /* struct ed_type_runtime_data has the below members for tracking the

@@ -1115,6 +1117,7 @@ en_runtime_data_init(struct engine_node *node OVS_UNUSED,
  sset_init(>egress_ifaces);
  smap_init(>local_iface_ids);
  local_binding_data_init(>lbinding_data);
+shash_init(>local_active_ports_ipv6_pd);
  
  /* Init the tracked data. */

  hmap_init(>tracked_dp_bindings);
@@ -1140,6 +1143,7 @@ en_runtime_data_cleanup(void *data)
  free(cur_node);
  }
  hmap_destroy(_data->local_datapaths);
+shash_destroy(_data->local_active_ports_ipv6_pd);
  local_binding_data_destroy(_data->lbinding_data);
  }
  
@@ -1218,6 +1222,8 @@ init_binding_ctx(struct engine_node *node,

  b_ctx_in->ovs_table = ovs_table;
  
  b_ctx_out->local_datapaths = _data->local_datapaths;

+b_ctx_out->local_active_ports_ipv6_pd =
+_data->local_active_ports_ipv6_pd;
  b_ctx_out->local_lports = _data->local_lports;
  b_ctx_out->local_lports_changed = false;
  b_ctx_out->related_lports = _data->related_lports;
@@ 

Re: [ovs-dev] [v9 01/12] dpif-netdev: Add command line and function pointer for miniflow extract

2021-07-12 Thread Ilya Maximets
On 7/12/21 4:57 PM, Van Haaren, Harry wrote:
>> -Original Message-
>> From: Ilya Maximets 
>> Sent: Monday, July 12, 2021 3:43 PM
>> To: Van Haaren, Harry ; Ilya Maximets
>> ; Amber, Kumar ; ovs-
>> d...@openvswitch.org
>> Cc: f...@sysclose.org; echau...@redhat.com; Ferriter, Cian
>> ; Stokes, Ian 
>> Subject: Re: [v9 01/12] dpif-netdev: Add command line and function pointer 
>> for
>> miniflow extract
>>
>> On 7/12/21 4:02 PM, Van Haaren, Harry wrote:
>>>> -Original Message-
>>>> From: Ilya Maximets
>>>> Sent: Monday, July 12, 2021 2:25 PM
>>>> To: Amber, Kumar ; ovs-dev@openvswitch.org
>>>> Cc: f...@sysclose.org; echau...@redhat.com; i.maxim...@ovn.org; Van Haaren,
>>>> Harry ; Ferriter, Cian ; Stokes, Ian
>>>> Subject: Re: [v9 01/12] dpif-netdev: Add command line and function pointer for
>>>> miniflow extract
>>>>
>>>> On 7/12/21 7:51 AM, kumar Amber wrote:
>>>>> From: Kumar Amber
>>>>>
>>>>> This patch introduces the MFEX function pointers which allows
>>>>> the user to switch between different miniflow extract implementations
>>>>> which are provided by the OVS based on optimized ISA CPU.
>>>>>
>>>>> The user can query for the available miniflow extract variants available
>>>>> for that CPU by following commands:
>>>>>
>>>>> $ ovs-appctl dpif-netdev/miniflow-parser-get
>>>>>
>>>>> Similarly a user can set the miniflow implementation by the following
>>>>> command :
>>>>>
>>>>> $ ovs-appctl dpif-netdev/miniflow-parser-set name
>>>>>
>>>>> This allows for more performance and flexibility to the user to choose
>>>>> the miniflow implementation according to the needs.
>>>>>
>>>>> Signed-off-by: Kumar Amber
>>>>> Co-authored-by: Harry van Haaren
>>>>> Signed-off-by: Harry van Haaren
>>>>>
>>>>> ---
>>>>> v9:
>>>>> - fix review comments from Flavio
>>>>> v7:
>>>>> - fix review comments(Eelco, Flavio)
>>>>> v5:
>>>>> - fix review comments(Ian, Flavio, Eelco)
>>>>> - add enum to hold mfex indexes
>>>>> - add new get and set implementations
>>>>> - add Atomic set and get
>>>>> ---
>>>>
>>>> ovsrobot has issues with reporting the status right now, but this
>>>> patch fails the build in GHA:
>>>>   https://github.com/ovsrobot/ovs/actions/runs/1021787643
>>>
>>> Thanks for linking on results.
>>>
>>> I've spot-checked a bunch of the failing builds, and found 2 fixable code issues.
>>> A few of the CI runs I can't find/explain the error, but I don't know of a good
>>> way to "jump to the error" line, am I missing a trick, or is scrolling the whole
>>> compiler output and checking errors the best method?
>>
>> typing 'error:' in the 'Search logs' field, usually gets you
>> to the actual error faster, but, unfortunately, scrolling is
>> the most reliable option.
> 
> Okay, thanks.
> 
> 
>>> ISSUES:
>>> #1 : OVS Requires Mutex issue (Linux clang test dpdk build)
>>> 1291 ../../lib/dpif-netdev-private-extract.h:87:53: error: use of undeclared identifier 'dp_netdev_mutex'; did you mean 'dp_netdev_input'?
>>> 1292 size_t pmd_list_size) OVS_REQUIRES(dp_netdev_mutex);
>>>
>>> #2 : Unused Argument (As from mailing list review comment too, linux gcc dpdk --enable-shared)
>>> 2353 lib/dpif-netdev.c:1079:63: error: unused parameter ‘argc’ [-Werror=unused-parameter]
>>> 2354 dpif_miniflow_extract_impl_set(struct unixctl_conn *conn, int argc,
>>>
>>> #3 : Distcheck directory not valid? (linux gcc test 3.16 build. I cannot explain this?)
>>> make: *** [distcheck] Error 1
>>> 4490 Makefile:5298: recipe for target 'distcheck' failed
>>> 4491 + cat '*/_build/sub/tests/testsuite.log'
>>> 4492 cat: '*/_build/sub/tests/testsuite.log': No such file or directory
>>> 4493 Error: Process completed with exit code 1.
>>>
>>> SOLUTIONS:
>>> #1, likely to forward-decl the "dp_netdev_mutex" to make it available
>>> in the extract header file, and remove the "static" keyword so it's no longer limited
>>> to the dpif-netdev.c compilation unit.
>>>
>>> #2 is a simple OVS_UNUSED as Eelco suggested during review.
>>>
>>> #3, I'm not sure where the DistCheck issue arises from, it seems to be missing directories
>>> during the test run? Input appreciated, as pushing & hoping tends to be a tiresome
>>> and long process.
>>
>> This is just a result of the previous build failure.  Build
>> never reached the testsuite phase, so there are no testsuite
>> logs there.  You should not see this problem once build is
>> fixed.
> 
> Aha, good to know. Then a respin with the fixes for the above issues is
> our next step, will arrive on the mailing list soon.

If you have a github account it might be good to push patches
one-by-one there to be sure that everything is fine before
sending to the mail list to avoid re-spins due to build issues.


Re: [ovs-dev] [v9 01/12] dpif-netdev: Add command line and function pointer for miniflow extract

2021-07-12 Thread Ilya Maximets
On 7/12/21 5:10 PM, David Marchand wrote:
> On Mon, Jul 12, 2021 at 4:43 PM Ilya Maximets  wrote:
>>>> ovsrobot has issues with reporting the status right now, but this
>>>> patch fails the build in GHA:
>>>>   https://github.com/ovsrobot/ovs/actions/runs/1021787643
>>>
>>> Thanks for linking on results.
>>>
>>> I've spot-checked a bunch of the failing builds, and found 2 fixable code issues.
>>> A few of the CI runs I can't find/explain the error, but I don't know of a good
>>> way to "jump to the error" line, am I missing a trick, or is scrolling the whole
>>> compiler output and checking errors the best method?
>>
>> typing 'error:' in the 'Search logs' field, usually gets you
>> to the actual error faster, but, unfortunately, scrolling is
>> the most reliable option.
> 
> GHA ui jumps at the last line of a failing step, but the problem is
> that, in OVS, we dump all logs which adds a lot of noise.
> 
> We could stop dumping them, since those logs are attached to the job
> as an archive.
> Like what is done in DPDK.
> http://git.dpdk.org/dpdk/tree/.ci/linux-build.sh#n3
> 
> WDYT?

Yes, that is a good thing to do.  We didn't do that because of
Travis CI, where we have no artifacts collected.
But yes, checking for [ -n "$GITHUB_WORKFLOW" ] is a solution.

Best regards, Ilya Maximets.


Re: [ovs-dev] [PATCH ovn v3] controller: Avoid unnecessary load balancer flow processing.

2021-07-12 Thread Mark Michelson
Since this addressed Han's findings in v1 and I had already ACKed it, I pushed this change to the main branch. Thanks, Dumitru and Han!


On 7/12/21 10:14 AM, Dumitru Ceara wrote:

Whenever a Load_Balancer is updated, e.g., a VIP is added, the following
sequence of events happens:

1. The Southbound Load_Balancer record is updated.
2. The Southbound Datapath_Binding records on which the Load_Balancer is
applied are updated.
3. Southbound ovsdb-server sends updates about the Load_Balancer and
Datapath_Binding records to ovn-controller.
4. The IDL layer in ovn-controller processes the updates at #3, but
because of the SB schema references between tables [0] all logical
flows referencing the updated Datapath_Binding are marked as
"updated".  The same is true for Logical_DP_Group records
referencing the Datapath_Binding, and also for all logical flows
pointing to the new "updated" datapath groups.
5. ovn-controller ends up recomputing (removing/readding) all flows for
all these tracked updates.

 From the SB Schema:
 "Datapath_Binding": {
 "columns": {
 [...]
 "load_balancers": {"type": {"key": {"type": "uuid",
"refTable": "Load_Balancer",
"refType": "weak"},
 "min": 0,
 "max": "unlimited"}},
 [...]
 "Load_Balancer": {
 "columns": {
 "datapaths": {
 [...]
 "type": {"key": {"type": "uuid",
  "refTable": "Datapath_Binding"},
  "min": 0, "max": "unlimited"}},
 [...]
 "Logical_DP_Group": {
 "columns": {
 "datapaths":
 {"type": {"key": {"type": "uuid",
   "refTable": "Datapath_Binding",
   "refType": "weak"},
   "min": 0, "max": "unlimited"}}},
 [...]
 "Logical_Flow": {
 "columns": {
 "logical_datapath":
 {"type": {"key": {"type": "uuid",
   "refTable": "Datapath_Binding"},
   "min": 0, "max": 1}},
 "logical_dp_group":
 {"type": {"key": {"type": "uuid",
   "refTable": "Logical_DP_Group"},

In order to avoid this unnecessary Logical_Flow notification storm we
now remove the explicit reference from Datapath_Binding to
Load_Balancer and instead store raw UUIDs.

This means that on the ovn-controller side we need to perform a
Load_Balancer table lookup by UUID whenever a new datapath is added,
but that doesn't happen too often and the cost of the lookup is
negligible compared to the huge cost of processing the unnecessary
logical flow updates.
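
The trade-off described above can be sketched as follows. All types and function names here are hypothetical simplifications made up for illustration (they are not the IDL-generated API), and the linear scan only keeps the sketch short:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified 128-bit row identifier. */
struct row_uuid {
    uint32_t parts[4];
};

/* Hypothetical load balancer row: an identifier and a name. */
struct lb_row {
    struct row_uuid uuid;
    const char *name;
};

/* Hypothetical datapath row that stores raw UUIDs instead of direct
 * references to load balancer rows. */
struct dp_row {
    const struct row_uuid *load_balancers;
    size_t n_load_balancers;
};

/* Looks up a load balancer row by UUID; returns NULL if none matches. */
const struct lb_row *
lb_table_get_for_uuid(const struct lb_row *table, size_t n_rows,
                      const struct row_uuid *uuid)
{
    for (size_t i = 0; i < n_rows; i++) {
        if (!memcmp(&table[i].uuid, uuid, sizeof *uuid)) {
            return &table[i];
        }
    }
    return NULL;
}

/* Resolves every UUID stored on the datapath and counts the hits.  This
 * is the extra per-datapath work that replaces following a reference. */
size_t
dp_count_resolved_lbs(const struct dp_row *dp,
                      const struct lb_row *table, size_t n_rows)
{
    size_t n = 0;

    for (size_t i = 0; i < dp->n_load_balancers; i++) {
        if (lb_table_get_for_uuid(table, n_rows, &dp->load_balancers[i])) {
            n++;
        }
    }
    return n;
}
```

Because a raw UUID is not a checked reference in this sketch, a lookup can miss, so a consumer would need to tolerate a NULL result.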

This change is backwards compatible because the contents stored in the
database are not changed, just that the schema constraints are relaxed a
bit.

Some performance measurements: on a scale test deployment simulating an
ovn-kubernetes deployment with 120 nodes and a large load balancer
with 16K VIPs associated with each node's logical switch, the event
processing loop time in ovn-controller, when adding a new VIP, is
reduced from ~39 seconds to ~8 seconds.

There's no need to change the northd DDlog implementation.

Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=1978605
Acked-by: Mark Michelson 
Signed-off-by: Dumitru Ceara 
---
v3: Update SB schema version.
v2: Address Han's comments and add Mark's ack.
---
  controller/lflow.c  |  6 --
  northd/ovn-northd.c | 14 ++
  ovn-sb.ovsschema|  8 +++-
  3 files changed, 13 insertions(+), 15 deletions(-)

diff --git a/controller/lflow.c b/controller/lflow.c
index 60aa011ff..c58c4f25c 100644
--- a/controller/lflow.c
+++ b/controller/lflow.c
@@ -1744,8 +1744,10 @@ lflow_processing_end:
  /* Add load balancer hairpin flows if the datapath has any load balancers
   * associated. */
  for (size_t i = 0; i < dp->n_load_balancers; i++) {
-consider_lb_hairpin_flows(dp->load_balancers[i],
-  l_ctx_in->local_datapaths,
+const struct sbrec_load_balancer *lb =
+sbrec_load_balancer_table_get_for_uuid(l_ctx_in->lb_table,
+   &dp->load_balancers[i]);
+consider_lb_hairpin_flows(lb, l_ctx_in->local_datapaths,
l_ctx_out->flow_table);
  }
  
diff --git a/northd/ovn-northd.c b/northd/ovn-northd.c

index 562dc62b2..999c3f482 100644
--- a/northd/ovn-northd.c
+++ b/northd/ovn-northd.c
@@ -3635,19 +3635,17 @@ build_ovn_lbs(struct northd_context *ctx, struct hmap 
*datapaths,
  

Re: [ovs-dev] [v9 01/12] dpif-netdev: Add command line and function pointer for miniflow extract

2021-07-12 Thread David Marchand
On Mon, Jul 12, 2021 at 4:43 PM Ilya Maximets  wrote:
> >> ovsrobot has issues with reporting the status right now, but this
> >> patch fails the build in GHA:
> >>   https://github.com/ovsrobot/ovs/actions/runs/1021787643
> >
> > Thanks for linking on results.
> >
> > I've spot-checked a bunch of the failing builds, and found 2 fixable code issues.
> > A few of the CI runs I can't find/explain the error, but I don't know of a good
> > way to "jump to the error" line, am I missing a trick, or is scrolling the whole
> > compiler output and checking errors the best method?
>
> typing 'error:' in the 'Search logs' field, usually gets you
> to the actual error faster, but, unfortunately, scrolling is
> the most reliable option.

GHA ui jumps at the last line of a failing step, but the problem is
that, in OVS, we dump all logs which adds a lot of noise.

We could stop dumping them, since those logs are attached to the job
as an archive.
Like what is done in DPDK.
http://git.dpdk.org/dpdk/tree/.ci/linux-build.sh#n3

WDYT?


-- 
David Marchand



[ovs-dev] [PATCH V2 2/3] dpif-netdev: Fix offloads of modified flows

2021-07-12 Thread Eli Britstein
Association of a mark to a flow is done as part of its offload handling,
in the offloading thread. However, the PMD thread specifies whether an
offload request is an "add" or "modify" by the association of a mark to
the flow.
This is exposed to a race condition. A flow might be created with
actions that cannot be fully offloaded, for example flooding (before MAC
learning), and later modified to have actions that can be fully
offloaded. If the two requests are queued before the offload thread
handling, they are both marked as "add". When the offload thread handles
them, the first request is partially offloaded, and the second one is
ignored as the flow is already considered as offloaded.

Fix it by specifying add/modify of an offload request by the actual flow
state change, without relying on the mark.
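
The race described above can be sketched with a small single-threaded model. The names below are hypothetical simplifications written for this illustration, not the dpif-netdev code: the mark is only ever set by the offload thread, so two requests queued back to back both observe an invalid mark.

```c
#include <stdbool.h>

enum offload_op {
    OP_ADD,
    OP_MOD
};

/* Hypothetical "no mark assigned yet" value. */
#define NO_MARK 0u

/* Hypothetical flow: the mark is written only by the offload thread. */
struct flow {
    unsigned int mark;
};

/* Old scheme: the queuing side infers add/modify from the mark. */
enum offload_op
op_from_mark(const struct flow *flow)
{
    return flow->mark != NO_MARK ? OP_MOD : OP_ADD;
}

/* New scheme: the queuing side knows whether the flow already had
 * actions, which reflects the actual state change. */
enum offload_op
op_from_state(bool had_old_actions)
{
    return had_old_actions ? OP_MOD : OP_ADD;
}

/* Models an add followed by a modify, both queued before the offload
 * thread has run, and records the operation chosen for each request. */
void
queue_two_requests(bool use_mark, enum offload_op ops[2])
{
    struct flow flow = { NO_MARK };

    /* First request: the flow is being created. */
    ops[0] = use_mark ? op_from_mark(&flow) : op_from_state(false);

    /* The offload thread has not handled the first request yet, so the
     * mark is still unset when the second request is queued. */
    ops[1] = use_mark ? op_from_mark(&flow) : op_from_state(true);
}
```

In this model the mark-based scheme labels both requests as "add", while the state-based scheme labels the second one as "modify".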

Fixes: 3c7330ebf036 ("netdev-offload-dpdk: Support offload of output action.")
Signed-off-by: Eli Britstein 
Reviewed-by: Gaetan Rivet 
---
 lib/dpif-netdev.c | 16 
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index 21b0e025d..9b2b8d6d9 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -2451,7 +2451,8 @@ static void
 queue_netdev_flow_put(struct dp_netdev_pmd_thread *pmd,
   struct dp_netdev_flow *flow, struct match *match,
   const struct nlattr *actions, size_t actions_len,
-  odp_port_t orig_in_port)
+  odp_port_t orig_in_port,
+  const struct dp_netdev_actions *old_actions)
 {
 struct dp_flow_offload_item *offload;
 int op;
@@ -2467,11 +2468,9 @@ queue_netdev_flow_put(struct dp_netdev_pmd_thread *pmd,
 ovsthread_once_done(_thread_once);
 }
 
-if (flow->mark != INVALID_FLOW_MARK) {
-op = DP_NETDEV_FLOW_OFFLOAD_OP_MOD;
-} else {
-op = DP_NETDEV_FLOW_OFFLOAD_OP_ADD;
-}
+op = old_actions
+? DP_NETDEV_FLOW_OFFLOAD_OP_MOD
+: DP_NETDEV_FLOW_OFFLOAD_OP_ADD;
 offload = dp_netdev_alloc_flow_offload(pmd, flow, op);
 offload->match = *match;
 offload->actions = xmalloc(actions_len);
@@ -3323,7 +3322,7 @@ dp_netdev_flow_add(struct dp_netdev_pmd_thread *pmd,
 dp_netdev_flow_hash(>ufid));
 
 queue_netdev_flow_put(pmd, flow, match, actions, actions_len,
-  orig_in_port);
+  orig_in_port, NULL);
 
 if (OVS_UNLIKELY(!VLOG_DROP_DBG((_rl {
 struct ds ds = DS_EMPTY_INITIALIZER;
@@ -3410,7 +3409,8 @@ flow_put_on_pmd(struct dp_netdev_pmd_thread *pmd,
 ovsrcu_set(_flow->actions, new_actions);
 
 queue_netdev_flow_put(pmd, netdev_flow, match,
-  put->actions, put->actions_len, ODPP_NONE);
+  put->actions, put->actions_len, ODPP_NONE,
+  old_actions);
 
 if (stats) {
 get_dpif_flow_status(pmd->dp, netdev_flow, stats, NULL);
-- 
2.28.0.2311.g225365fb51



[ovs-dev] [PATCH V2 0/3] dpif-netdev offload transitions

2021-07-12 Thread Eli Britstein
This patch-set improves offload transition behavior.

Patch #1 avoids flushing PMD offloads unnecessarily.
Patch #2 fixes a race condition with flow modifications.
Patch #3 improves debuggability of flow modifications.

v2-v1: Rebase.

Travis:
v1: https://travis-ci.org/github/elibritstein/OVS/builds/767839987

GitHub Actions:
v1: https://github.com/elibritstein/OVS/actions/runs/769805954
- This run has encountered some internal GitHub problems.
- A previous good run, with the same code, only changed commit
  messages since:
https://github.com/elibritstein/OVS/actions/runs/70787
v2: https://github.com/elibritstein/OVS/actions/runs/1023045302

Eli Britstein (3):
  dpif-netdev: Do not flush PMD offloads on reload
  dpif-netdev: Fix offloads of modified flows
  dpif-netdev: Log flow modification in debug level

 lib/dpif-netdev.c | 130 ++
 1 file changed, 63 insertions(+), 67 deletions(-)

-- 
2.28.0.2311.g225365fb51



[ovs-dev] [PATCH V2 3/3] dpif-netdev: Log flow modification in debug level

2021-07-12 Thread Eli Britstein
Log flow modifications to help debugging.

Signed-off-by: Eli Britstein 
Reviewed-by: Gaetan Rivet 
---
 lib/dpif-netdev.c | 101 +-
 1 file changed, 55 insertions(+), 46 deletions(-)

diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index 9b2b8d6d9..caed3e7f2 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -2457,6 +2457,61 @@ queue_netdev_flow_put(struct dp_netdev_pmd_thread *pmd,
 struct dp_flow_offload_item *offload;
 int op;
 
+if (OVS_UNLIKELY(!VLOG_DROP_DBG((&upcall_rl)))) {
+struct ds ds = DS_EMPTY_INITIALIZER;
+struct ofpbuf key_buf, mask_buf;
+struct odp_flow_key_parms odp_parms = {
+.flow = &match->flow,
+.mask = &match->wc.masks,
+.support = dp_netdev_support,
+};
+
+ofpbuf_init(&key_buf, 0);
+ofpbuf_init(&mask_buf, 0);
+
+odp_flow_key_from_flow(&odp_parms, &key_buf);
+odp_parms.key_buf = &key_buf;
+odp_flow_key_from_mask(&odp_parms, &mask_buf);
+
+if (old_actions) {
+ds_put_cstr(&ds, "flow_mod: ");
+} else {
+ds_put_cstr(&ds, "flow_add: ");
+}
+odp_format_ufid(&flow->ufid, &ds);
+ds_put_cstr(&ds, " mega_");
+odp_format_ufid(&flow->mega_ufid, &ds);
+ds_put_cstr(&ds, " ");
+odp_flow_format(key_buf.data, key_buf.size,
+mask_buf.data, mask_buf.size,
+NULL, &ds, false);
+if (old_actions) {
+ds_put_cstr(&ds, ", old_actions:");
+format_odp_actions(&ds, old_actions->actions, old_actions->size,
+   NULL);
+}
+ds_put_cstr(&ds, ", actions:");
+format_odp_actions(&ds, actions, actions_len, NULL);
+
+VLOG_DBG("%s", ds_cstr(&ds));
+
+ofpbuf_uninit(&key_buf);
+ofpbuf_uninit(&mask_buf);
+
+/* Add a printout of the actual match installed. */
+struct match m;
+ds_clear(&ds);
+ds_put_cstr(&ds, "flow match: ");
+miniflow_expand(&flow->cr.flow.mf, &m.flow);
+miniflow_expand(&flow->cr.mask->mf, &m.wc.masks);
+memset(&m.tun_md, 0, sizeof m.tun_md);
+match_format(&m, NULL, &ds, OFP_DEFAULT_PRIORITY);
+
+VLOG_DBG("%s", ds_cstr(&ds));
+
+ds_destroy(&ds);
+}
+
 if (!netdev_is_flow_api_enabled()) {
 return;
 }
@@ -3324,52 +3379,6 @@ dp_netdev_flow_add(struct dp_netdev_pmd_thread *pmd,
 queue_netdev_flow_put(pmd, flow, match, actions, actions_len,
   orig_in_port, NULL);
 
-if (OVS_UNLIKELY(!VLOG_DROP_DBG((&upcall_rl)))) {
-struct ds ds = DS_EMPTY_INITIALIZER;
-struct ofpbuf key_buf, mask_buf;
-struct odp_flow_key_parms odp_parms = {
-.flow = &match->flow,
-.mask = &match->wc.masks,
-.support = dp_netdev_support,
-};
-
-ofpbuf_init(&key_buf, 0);
-ofpbuf_init(&mask_buf, 0);
-
-odp_flow_key_from_flow(&odp_parms, &key_buf);
-odp_parms.key_buf = &key_buf;
-odp_flow_key_from_mask(&odp_parms, &mask_buf);
-
-ds_put_cstr(&ds, "flow_add: ");
-odp_format_ufid(ufid, &ds);
-ds_put_cstr(&ds, " mega_");
-odp_format_ufid(&flow->mega_ufid, &ds);
-ds_put_cstr(&ds, " ");
-odp_flow_format(key_buf.data, key_buf.size,
-mask_buf.data, mask_buf.size,
-NULL, &ds, false);
-ds_put_cstr(&ds, ", actions:");
-format_odp_actions(&ds, actions, actions_len, NULL);
-
-VLOG_DBG("%s", ds_cstr(&ds));
-
-ofpbuf_uninit(&key_buf);
-ofpbuf_uninit(&mask_buf);
-
-/* Add a printout of the actual match installed. */
-struct match m;
-ds_clear(&ds);
-ds_put_cstr(&ds, "flow match: ");
-miniflow_expand(&flow->cr.flow.mf, &m.flow);
-miniflow_expand(&flow->cr.mask->mf, &m.wc.masks);
-memset(&m.tun_md, 0, sizeof m.tun_md);
-match_format(&m, NULL, &ds, OFP_DEFAULT_PRIORITY);
-
-VLOG_DBG("%s", ds_cstr(&ds));
-
-ds_destroy(&ds);
-}
-
 return flow;
 }
 
-- 
2.28.0.2311.g225365fb51
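
The shape of the resulting debug line can be sketched with plain snprintf().
This is a simplified illustration, not the OVS code: the function name and
the string arguments are hypothetical, and the real message also carries the
mega UFID and the formatted match between the UFID and the actions. It only
shows how the presence of old actions switches the prefix and adds the
"old_actions:" part.

```c
#include <stddef.h>
#include <stdio.h>

/* Format a simplified flow put log line into 'buf'.  'old_actions' is NULL
 * for a newly added flow and non-NULL for a modified one.  Returns the
 * snprintf() result. */
static int
sketch_format_flow_put(char *buf, size_t size, const char *ufid,
                       const char *old_actions, const char *actions)
{
    if (old_actions) {
        return snprintf(buf, size,
                        "flow_mod: %s, old_actions:%s, actions:%s",
                        ufid, old_actions, actions);
    }
    return snprintf(buf, size, "flow_add: %s, actions:%s", ufid, actions);
}
```

Because the logging now lives in queue_netdev_flow_put(), both the add path
and the modify path go through the same formatting code.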



[ovs-dev] [PATCH V2 1/3] dpif-netdev: Do not flush PMD offloads on reload

2021-07-12 Thread Eli Britstein
Before flushing offloads of a removed port was supported by [1], it was
necessary to flush the 'marks'. In doing so, all offloads of the PMD are
removed, including the ones that are not related to the removed port and
that are not modified following this removal. As a result, such flows are
evicted from being offloaded, and won't resume offloading.

As PMD offload flush is not necessary, avoid it.

[1] 62d1c28e9ce0 ("dpif-netdev: Flush offload rules upon port deletion.")

Signed-off-by: Eli Britstein 
Reviewed-by: Gaetan Rivet 
---
 lib/dpif-netdev.c | 13 -
 1 file changed, 13 deletions(-)

diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index 610949f36..21b0e025d 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -2232,18 +2232,6 @@ mark_to_flow_disassociate(struct dp_netdev_pmd_thread *pmd,
 return ret;
 }
 
-static void
-flow_mark_flush(struct dp_netdev_pmd_thread *pmd)
-{
-struct dp_netdev_flow *flow;
-
-CMAP_FOR_EACH (flow, mark_node, &flow_mark.mark_to_flow) {
-if (flow->pmd_id == pmd->core_id) {
-queue_netdev_flow_del(pmd, flow);
-}
-}
-}
-
 static struct dp_netdev_flow *
 mark_to_flow_find(const struct dp_netdev_pmd_thread *pmd,
   const uint32_t mark)
@@ -4811,7 +4799,6 @@ reload_affected_pmds(struct dp_netdev *dp)
 
 CMAP_FOR_EACH (pmd, node, &dp->poll_threads) {
 if (pmd->need_reload) {
-flow_mark_flush(pmd);
 dp_netdev_reload_pmd__(pmd);
 }
 }
-- 
2.28.0.2311.g225365fb51



Re: [ovs-dev] [v9 01/12] dpif-netdev: Add command line and function pointer for miniflow extract

2021-07-12 Thread Van Haaren, Harry
> -Original Message-
> From: Ilya Maximets 
> Sent: Monday, July 12, 2021 3:43 PM
> To: Van Haaren, Harry ; Ilya Maximets
> ; Amber, Kumar ; ovs-
> d...@openvswitch.org
> Cc: f...@sysclose.org; echau...@redhat.com; Ferriter, Cian
> ; Stokes, Ian 
> Subject: Re: [v9 01/12] dpif-netdev: Add command line and function pointer for
> miniflow extract
> 
> On 7/12/21 4:02 PM, Van Haaren, Harry wrote:
> >> -Original Message-
> >> From: Ilya Maximets 
> >> Sent: Monday, July 12, 2021 2:25 PM
> >> To: Amber, Kumar ; ovs-dev@openvswitch.org
> >> Cc: f...@sysclose.org; echau...@redhat.com; i.maxim...@ovn.org; Van Haaren,
> >> Harry ; Ferriter, Cian 
> >> ;
> >> Stokes, Ian 
> >> Subject: Re: [v9 01/12] dpif-netdev: Add command line and function pointer 
> >> for
> >> miniflow extract
> >>
> >> On 7/12/21 7:51 AM, kumar Amber wrote:
> >>> From: Kumar Amber 
> >>>
> >>> This patch introduces the MFEX function pointers which allows
> >>> the user to switch between different miniflow extract implementations
> >>> which are provided by the OVS based on optimized ISA CPU.
> >>>
> >>> The user can query the miniflow extract variants available
> >>> for that CPU with the following command:
> >>>
> >>> $ ovs-appctl dpif-netdev/miniflow-parser-get
> >>>
> >>> Similarly, a user can set the miniflow implementation with the following
> >>> command:
> >>>
> >>> $ ovs-appctl dpif-netdev/miniflow-parser-set name
> >>>
> >>> This allows for more performance and flexibility to the user to choose
> >>> the miniflow implementation according to the needs.
> >>>
> >>> Signed-off-by: Kumar Amber 
> >>> Co-authored-by: Harry van Haaren 
> >>> Signed-off-by: Harry van Haaren 
> >>>
> >>> ---
> >>> v9:
> >>> - fix review comments from Flavio
> >>> v7:
> >>> - fix review comments(Eelco, Flavio)
> >>> v5:
> >>> - fix review comments(Ian, Flavio, Eelco)
> >>> - add enum to hold mfex indexes
> >>> - add new get and set implementations
> >>> - add Atomic set and get
> >>> ---
> >>
> >> ovsrobot has issues with reporting the status right now, but this
> >> patch fails the build in GHA:
> >>   https://github.com/ovsrobot/ovs/actions/runs/1021787643
> >
> > Thanks for linking on results.
> >
> > I've spot-checked a bunch of the failing builds, and found 2 fixable code 
> > issues.
> > A few of the CI run's I can't find/explain the error, but I don't know of a 
> > good
> > way to "jump to the error" line, am I missing a trick, or is scrolling the 
> > whole
> > compiler output and checking errors the best method?
> 
> typing 'error:' in the 'Search logs' field, usually gets you
> to the actual error faster, but, unfortunately, scrolling is
> the most reliable option.

Okay, thanks.


> > ISSUES:
> > #1 : OVS Requires Mutex issue (Linux clang test dpdk build)
> > 1291../../lib/dpif-netdev-private-extract.h:87:53: error: use of undeclared
> identifier 'dp_netdev_mutex'; did you mean 'dp_netdev_input'?
> > 1292 size_t pmd_list_size) OVS_REQUIRES(dp_netdev_mutex);
> >
> > #2 : Unused Argument (As from mailing list review comment too, linux gcc 
> > dpdk --
> enable-shared)
> > 2353lib/dpif-netdev.c:1079:63: error: unused parameter ‘argc’ 
> > [-Werror=unused-
> parameter]
> > 2354 dpif_miniflow_extract_impl_set(struct unixctl_conn *conn, int argc,
> >
> > #3 : Distcheck directory not valid? (linux gcc test 3.16 build. I cannot 
> > explain this?)
> > make: *** [distcheck] Error 1
> > 4490Makefile:5298: recipe for target 'distcheck' failed
> > 4491+ cat '*/_build/sub/tests/testsuite.log'
> > 4492cat: '*/_build/sub/tests/testsuite.log': No such file or directory
> > 4493Error: Process completed with exit code 1.
> >
> > SOLUTIONS:
> > #1, likely to forward-decl the "dp_netdev_mutex" to make it available
> > in the extract header file, and remove the "static" keyword so its no 
> > longer limited
> > to the dpif-netdev.c compilation unit.
> >
> > #2 is a simple OVS_UNUSED as Eelco suggested during review.
> >
> > #3, I'm not sure where the DistCheck issue arise from, it seems to be 
> > missing
> directories
> > during the test run? Input appreciated, as pushing & hoping tends to be a 
> > tiresome
> > and long process.
> 
> This is just a result of the previous build failure.  Build
> never reached the testsuite phase, so there are no testsuite
> logs there.  You should not see this problem once build is
> fixed.

Aha, good to know. Then a respin with the fixes for the above issues is
our next step, will arrive on the mailing list soon.




Re: [ovs-dev] [v9 01/12] dpif-netdev: Add command line and function pointer for miniflow extract

2021-07-12 Thread Ilya Maximets
On 7/12/21 4:02 PM, Van Haaren, Harry wrote:
>> -Original Message-
>> From: Ilya Maximets 
>> Sent: Monday, July 12, 2021 2:25 PM
>> To: Amber, Kumar ; ovs-dev@openvswitch.org
>> Cc: f...@sysclose.org; echau...@redhat.com; i.maxim...@ovn.org; Van Haaren,
>> Harry ; Ferriter, Cian ;
>> Stokes, Ian 
>> Subject: Re: [v9 01/12] dpif-netdev: Add command line and function pointer 
>> for
>> miniflow extract
>>
>> On 7/12/21 7:51 AM, kumar Amber wrote:
>>> From: Kumar Amber 
>>>
>>> This patch introduces the MFEX function pointers which allows
>>> the user to switch between different miniflow extract implementations
>>> which are provided by the OVS based on optimized ISA CPU.
>>>
>>> The user can query the miniflow extract variants available
>>> for that CPU with the following command:
>>>
>>> $ ovs-appctl dpif-netdev/miniflow-parser-get
>>>
>>> Similarly, a user can set the miniflow implementation with the following
>>> command:
>>>
>>> $ ovs-appctl dpif-netdev/miniflow-parser-set name
>>>
>>> This allows for more performance and flexibility to the user to choose
>>> the miniflow implementation according to the needs.
>>>
>>> Signed-off-by: Kumar Amber 
>>> Co-authored-by: Harry van Haaren 
>>> Signed-off-by: Harry van Haaren 
>>>
>>> ---
>>> v9:
>>> - fix review comments from Flavio
>>> v7:
>>> - fix review comments(Eelco, Flavio)
>>> v5:
>>> - fix review comments(Ian, Flavio, Eelco)
>>> - add enum to hold mfex indexes
>>> - add new get and set implementations
>>> - add Atomic set and get
>>> ---
>>
>> ovsrobot has issues with reporting the status right now, but this
>> patch fails the build in GHA:
>>   https://github.com/ovsrobot/ovs/actions/runs/1021787643
> 
> Thanks for linking on results.
> 
> I've spot-checked a bunch of the failing builds, and found 2 fixable code 
> issues.
> A few of the CI run's I can't find/explain the error, but I don't know of a 
> good
> way to "jump to the error" line, am I missing a trick, or is scrolling the 
> whole
> compiler output and checking errors the best method?

Typing 'error:' in the 'Search logs' field usually gets you
to the actual error faster but, unfortunately, scrolling is
the most reliable option.

> 
> ISSUES:
> #1 : OVS Requires Mutex issue (Linux clang test dpdk build)
> 1291../../lib/dpif-netdev-private-extract.h:87:53: error: use of undeclared 
> identifier 'dp_netdev_mutex'; did you mean 'dp_netdev_input'? 
> 1292 size_t pmd_list_size) OVS_REQUIRES(dp_netdev_mutex);
> 
> #2 : Unused Argument (As from mailing list review comment too, linux gcc dpdk 
> --enable-shared)
> 2353lib/dpif-netdev.c:1079:63: error: unused parameter ‘argc’ 
> [-Werror=unused-parameter] 
> 2354 dpif_miniflow_extract_impl_set(struct unixctl_conn *conn, int argc,
> 
> #3 : Distcheck directory not valid? (linux gcc test 3.16 build. I cannot 
> explain this?)
> make: *** [distcheck] Error 1 
> 4490Makefile:5298: recipe for target 'distcheck' failed 
> 4491+ cat '*/_build/sub/tests/testsuite.log' 
> 4492cat: '*/_build/sub/tests/testsuite.log': No such file or directory 
> 4493Error: Process completed with exit code 1.
> 
> SOLUTIONS:
> #1, likely to forward-decl the "dp_netdev_mutex" to make it available
> in the extract header file, and remove the "static" keyword so its no longer 
> limited
> to the dpif-netdev.c compilation unit.
> 
> #2 is a simple OVS_UNUSED as Eelco suggested during review.
> 
> #3, I'm not sure where the DistCheck issue arise from, it seems to be missing 
> directories
> during the test run? Input appreciated, as pushing & hoping tends to be a 
> tiresome
> and long process.

This is just a result of the previous build failure.  Build
never reached the testsuite phase, so there are no testsuite
logs there.  You should not see this problem once build is
fixed.

> 
> 
>> Best regards, Ilya Maximets.
> 
> Regards, -Harry
> 



Re: [ovs-dev] [PATCH ovn] controller: Avoid unnecessary load balancer flow processing.

2021-07-12 Thread Dumitru Ceara
On 7/12/21 10:11 AM, Dumitru Ceara wrote:
> On 7/9/21 6:11 PM, Han Zhou wrote:
>>> To avoid this potentially expensive table walk, we use the load_balancer
>>> uuids stored in the datapath record itself (it's probably best to see
>>> those as hints I guess).
>>>
>> Thanks for the explain. What you described is indeed a dependency between
>> lflow and sb_load_balancer because in lflow's compute/change handlers
>> sb_load_balancer data is required. (otherwise we would not need to call
>> sbrec_load_balancer_get_for_uuid().
>>
>> However, since this dependency is already captured in the I-P, it is just
>> easy for this use case. We should simply use
>> sbrec_load_balancer_table_get_for_uuid() instead, which takes struct
>> sbrec_load_balancer_table* as argument and we already have it in the
>> lflow_ctx_in.lb_table as the input to lflow engine node.
>>
> 
> You're right, it's simpler like this, thanks for pointing out the
> sbrec_*table_get_for_uuid() variant.
> 
> I sent a v2:
> 
> http://patchwork.ozlabs.org/project/ovn/list/?series=253029
> 

Sorry for the noise; Ilya mentioned offline that I forgot to update the
schema version number. I sent a v3 taking care of that:

http://patchwork.ozlabs.org/project/ovn/list/?series=253094

Regards,
Dumitru



[ovs-dev] [PATCH ovn v3] controller: Avoid unnecessary load balancer flow processing.

2021-07-12 Thread Dumitru Ceara
Whenever a Load_Balancer is updated, e.g., a VIP is added, the following
sequence of events happens:

1. The Southbound Load_Balancer record is updated.
2. The Southbound Datapath_Binding records on which the Load_Balancer is
   applied are updated.
3. Southbound ovsdb-server sends updates about the Load_Balancer and
   Datapath_Binding records to ovn-controller.
4. The IDL layer in ovn-controller processes the updates at #3, but
   because of the SB schema references between tables [0] all logical
   flows referencing the updated Datapath_Binding are marked as
   "updated".  The same is true for Logical_DP_Group records
   referencing the Datapath_Binding, and also for all logical flows
   pointing to the new "updated" datapath groups.
5. ovn-controller ends up recomputing (removing/readding) all flows for
   all these tracked updates.

From the SB Schema:
"Datapath_Binding": {
"columns": {
[...]
"load_balancers": {"type": {"key": {"type": "uuid",
   "refTable": "Load_Balancer",
   "refType": "weak"},
"min": 0,
"max": "unlimited"}},
[...]
"Load_Balancer": {
"columns": {
"datapaths": {
[...]
"type": {"key": {"type": "uuid",
 "refTable": "Datapath_Binding"},
 "min": 0, "max": "unlimited"}},
[...]
"Logical_DP_Group": {
"columns": {
"datapaths":
{"type": {"key": {"type": "uuid",
  "refTable": "Datapath_Binding",
  "refType": "weak"},
  "min": 0, "max": "unlimited"}}},
[...]
"Logical_Flow": {
"columns": {
"logical_datapath":
{"type": {"key": {"type": "uuid",
  "refTable": "Datapath_Binding"},
  "min": 0, "max": 1}},
"logical_dp_group":
{"type": {"key": {"type": "uuid",
  "refTable": "Logical_DP_Group"},

In order to avoid this unnecessary Logical_Flow notification storm we
now remove the explicit reference from Datapath_Binding to
Load_Balancer and instead store raw UUIDs.

This means that on the ovn-controller side we need to perform a
Load_Balancer table lookup by UUID whenever a new datapath is added,
but that doesn't happen too often and the cost of the lookup is
negligible compared to the huge cost of processing the unnecessary
logical flow updates.
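
The lookup described above can be sketched in a self-contained way. This is
only an illustration under stated assumptions: the structs, the linear scan
and the function names are hypothetical, and the real code goes through the
generated IDL helper (sbrec_load_balancer_table_get_for_uuid() in the diff
below), not through anything shown here. The point is that a datapath now
carries plain UUIDs, and each one is resolved against the Load_Balancer
table when needed. Since a raw UUID is no longer a schema-enforced
reference, this sketch treats a miss as something a caller may want to
tolerate.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for a UUID and a Load_Balancer row. */
struct sketch_uuid {
    uint32_t parts[4];
};

struct sketch_lb_row {
    struct sketch_uuid uuid;
    const char *name;
};

/* Resolve one UUID against the table; returns NULL when no row matches.
 * A linear scan keeps the sketch short. */
static const struct sketch_lb_row *
sketch_lb_get_for_uuid(const struct sketch_lb_row *table, size_t n_rows,
                       const struct sketch_uuid *uuid)
{
    for (size_t i = 0; i < n_rows; i++) {
        if (!memcmp(&table[i].uuid, uuid, sizeof *uuid)) {
            return &table[i];
        }
    }
    return NULL;
}

/* Walk the raw UUIDs stored on a datapath and count the ones that still
 * resolve to a row, skipping stale entries. */
static size_t
sketch_count_resolved_lbs(const struct sketch_lb_row *table, size_t n_rows,
                          const struct sketch_uuid *refs, size_t n_refs)
{
    size_t n = 0;

    for (size_t i = 0; i < n_refs; i++) {
        if (sketch_lb_get_for_uuid(table, n_rows, &refs[i])) {
            n++;
        }
    }
    return n;
}
```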

This change is backwards compatible because the contents stored in the
database are not changed, just that the schema constraints are relaxed a
bit.

Some performance measurements: on a scale test deployment simulating an
ovn-kubernetes deployment with 120 nodes and a large load balancer
with 16K VIPs associated to each node's logical switch, the event
processing loop time in ovn-controller, when adding a new VIP, is
reduced from ~39 seconds to ~8 seconds.

There's no need to change the northd DDlog implementation.

Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=1978605
Acked-by: Mark Michelson 
Signed-off-by: Dumitru Ceara 
---
v3: Update SB schema version.
v2: Address Han's comments and add Mark's ack.
---
 controller/lflow.c  |  6 --
 northd/ovn-northd.c | 14 ++
 ovn-sb.ovsschema|  8 +++-
 3 files changed, 13 insertions(+), 15 deletions(-)

diff --git a/controller/lflow.c b/controller/lflow.c
index 60aa011ff..c58c4f25c 100644
--- a/controller/lflow.c
+++ b/controller/lflow.c
@@ -1744,8 +1744,10 @@ lflow_processing_end:
 /* Add load balancer hairpin flows if the datapath has any load balancers
  * associated. */
 for (size_t i = 0; i < dp->n_load_balancers; i++) {
-consider_lb_hairpin_flows(dp->load_balancers[i],
-  l_ctx_in->local_datapaths,
+const struct sbrec_load_balancer *lb =
+sbrec_load_balancer_table_get_for_uuid(l_ctx_in->lb_table,
+   &dp->load_balancers[i]);
+consider_lb_hairpin_flows(lb, l_ctx_in->local_datapaths,
   l_ctx_out->flow_table);
 }
 
diff --git a/northd/ovn-northd.c b/northd/ovn-northd.c
index 562dc62b2..999c3f482 100644
--- a/northd/ovn-northd.c
+++ b/northd/ovn-northd.c
@@ -3635,19 +3635,17 @@ build_ovn_lbs(struct northd_context *ctx, struct hmap *datapaths,
 continue;
 }
 
-const struct sbrec_load_balancer **sbrec_lbs =
-xmalloc(od->nbs->n_load_balancer * sizeof *sbrec_lbs);
+struct uuid *lb_uuids =
+xmalloc(od->nbs->n_load_balancer * sizeof 

Re: [ovs-dev] [v9 01/12] dpif-netdev: Add command line and function pointer for miniflow extract

2021-07-12 Thread Van Haaren, Harry
> -Original Message-
> From: Ilya Maximets 
> Sent: Monday, July 12, 2021 2:25 PM
> To: Amber, Kumar ; ovs-dev@openvswitch.org
> Cc: f...@sysclose.org; echau...@redhat.com; i.maxim...@ovn.org; Van Haaren,
> Harry ; Ferriter, Cian ;
> Stokes, Ian 
> Subject: Re: [v9 01/12] dpif-netdev: Add command line and function pointer for
> miniflow extract
> 
> On 7/12/21 7:51 AM, kumar Amber wrote:
> > From: Kumar Amber 
> >
> > This patch introduces the MFEX function pointers which allows
> > the user to switch between different miniflow extract implementations
> > which are provided by the OVS based on optimized ISA CPU.
> >
> > The user can query the miniflow extract variants available
> > for that CPU with the following command:
> >
> > $ ovs-appctl dpif-netdev/miniflow-parser-get
> >
> > Similarly, a user can set the miniflow implementation with the following
> > command:
> >
> > $ ovs-appctl dpif-netdev/miniflow-parser-set name
> >
> > This allows for more performance and flexibility to the user to choose
> > the miniflow implementation according to the needs.
> >
> > Signed-off-by: Kumar Amber 
> > Co-authored-by: Harry van Haaren 
> > Signed-off-by: Harry van Haaren 
> >
> > ---
> > v9:
> > - fix review comments from Flavio
> > v7:
> > - fix review comments(Eelco, Flavio)
> > v5:
> > - fix review comments(Ian, Flavio, Eelco)
> > - add enum to hold mfex indexes
> > - add new get and set implementations
> > - add Atomic set and get
> > ---
> 
> ovsrobot has issues with reporting the status right now, but this
> patch fails the build in GHA:
>   https://github.com/ovsrobot/ovs/actions/runs/1021787643

Thanks for linking the results.

I've spot-checked a bunch of the failing builds, and found 2 fixable code
issues. For a few of the CI runs I can't find or explain the error, and I
don't know of a good way to "jump to the error" line. Am I missing a trick,
or is scrolling the whole compiler output and checking errors the best method?

ISSUES:
#1 : OVS Requires Mutex issue (Linux clang test dpdk build)
../../lib/dpif-netdev-private-extract.h:87:53: error: use of undeclared identifier 'dp_netdev_mutex'; did you mean 'dp_netdev_input'?
 size_t pmd_list_size) OVS_REQUIRES(dp_netdev_mutex);

#2 : Unused Argument (As from mailing list review comment too, linux gcc dpdk --enable-shared)
lib/dpif-netdev.c:1079:63: error: unused parameter ‘argc’ [-Werror=unused-parameter]
 dpif_miniflow_extract_impl_set(struct unixctl_conn *conn, int argc,

#3 : Distcheck directory not valid? (linux gcc test 3.16 build. I cannot explain this?)
make: *** [distcheck] Error 1
Makefile:5298: recipe for target 'distcheck' failed
+ cat '*/_build/sub/tests/testsuite.log'
cat: '*/_build/sub/tests/testsuite.log': No such file or directory
Error: Process completed with exit code 1.

SOLUTIONS:
#1, likely forward-declare "dp_netdev_mutex" to make it available
in the extract header file, and remove the "static" keyword so it's no longer
limited to the dpif-netdev.c compilation unit.

#2 is a simple OVS_UNUSED as Eelco suggested during review.

#3, I'm not sure where the DistCheck issue arises from; it seems to be missing
directories during the test run? Input appreciated, as pushing & hoping tends
to be a tiresome and long process.
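
As a rough illustration of solutions #1 and #2, here is a minimal sketch. All
names in it are hypothetical (sketch_mutex, SKETCH_UNUSED, sketch_impl_set);
it is not the OVS code, and the attribute spelling assumes GCC or Clang. It
shows a declaration that a shared header could carry, the matching non-static
definition that exactly one .c file would carry, and a handler whose unused
"argc" parameter is annotated so that -Werror=unused-parameter does not
trigger.

```c
#include <pthread.h>
#include <string.h>

/* Hypothetical stand-in for an "unused" annotation macro such as
 * OVS_UNUSED; this spelling works with GCC and Clang. */
#define SKETCH_UNUSED __attribute__((__unused__))

/* What the shared header would carry: a declaration only, so every
 * compilation unit that includes it can name the mutex. */
extern pthread_mutex_t sketch_mutex;

/* What exactly one .c file would carry: the definition, without 'static',
 * so the symbol has external linkage. */
pthread_mutex_t sketch_mutex = PTHREAD_MUTEX_INITIALIZER;

/* A command handler with a fixed callback signature that does not need
 * 'argc'.  Returns 0 when the requested name is known, -1 otherwise. */
static int
sketch_impl_set(int argc SKETCH_UNUSED, const char *argv[])
{
    if (!strcmp(argv[1], "scalar") || !strcmp(argv[1], "autovalidator")) {
        return 0;
    }
    return -1;
}
```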


> Best regards, Ilya Maximets.

Regards, -Harry


Re: [ovs-dev] [v9 01/12] dpif-netdev: Add command line and function pointer for miniflow extract

2021-07-12 Thread Ilya Maximets
On 7/12/21 7:51 AM, kumar Amber wrote:
> From: Kumar Amber 
> 
> This patch introduces the MFEX function pointers which allows
> the user to switch between different miniflow extract implementations
> which are provided by the OVS based on optimized ISA CPU.
> 
> The user can query the miniflow extract variants available
> for that CPU with the following command:
> 
> $ ovs-appctl dpif-netdev/miniflow-parser-get
> 
> Similarly, a user can set the miniflow implementation with the following
> command:
> 
> $ ovs-appctl dpif-netdev/miniflow-parser-set name
> 
> This allows for more performance and flexibility to the user to choose
> the miniflow implementation according to the needs.
> 
> Signed-off-by: Kumar Amber 
> Co-authored-by: Harry van Haaren 
> Signed-off-by: Harry van Haaren 
> 
> ---
> v9:
> - fix review comments from Flavio
> v7:
> - fix review comments(Eelco, Flavio)
> v5:
> - fix review comments(Ian, Flavio, Eelco)
> - add enum to hold mfex indexes
> - add new get and set implementations
> - add Atomic set and get
> ---

ovsrobot has issues with reporting the status right now, but this
patch fails the build in GHA:
  https://github.com/ovsrobot/ovs/actions/runs/1021787643

Best regards, Ilya Maximets.


Re: [ovs-dev] [v9 05/12] dpif-netdev: Add configure to enable autovalidator at build time.

2021-07-12 Thread Eelco Chaudron



On 12 Jul 2021, at 7:51, kumar Amber wrote:

> From: Kumar Amber 
>
> This commit adds a new command to allow the user to enable
> autovalidatior by default at build time thus allowing for
> runnig unit test by default.
>
>  $ ./configure --enable-mfex-default-autovalidator
>
> Signed-off-by: Kumar Amber 
> Co-authored-by: Harry van Haaren 
> Signed-off-by: Harry van Haaren 
>
> ---
> v9:
> - fix review comments Flavio
> v7:
> - fix review commens(Eelco, Flavio)
> v5:
> - fix review comments(Ian, Flavio, Eelco)
> ---
> ---
>  Documentation/topics/dpdk/bridge.rst |  5 +
>  NEWS |  3 ++-
>  acinclude.m4 | 16 
>  configure.ac |  1 +
>  lib/dpif-netdev-private-extract.c|  8 ++--
>  5 files changed, 30 insertions(+), 3 deletions(-)
>
> diff --git a/Documentation/topics/dpdk/bridge.rst b/Documentation/topics/dpdk/bridge.rst
> index 7c618cf1f..4db416ddd 100644
> --- a/Documentation/topics/dpdk/bridge.rst
> +++ b/Documentation/topics/dpdk/bridge.rst
> @@ -307,3 +307,8 @@ implementations provide the same results.
>  To set the Miniflow autovalidator, use this command ::
>
>  $ ovs-appctl dpif-netdev/miniflow-parser-set autovalidator
> +
> +A compile time option is available in order to test it with the OVS unit
> +test suite. Use the following configure option ::
> +
> +$ ./configure --enable-mfex-default-autovalidator
> diff --git a/NEWS b/NEWS
> index 4a7b89409..581bff225 100644
> --- a/NEWS
> +++ b/NEWS
> @@ -38,6 +38,8 @@ Post-v2.15.0
>   * Add study function to miniflow function table which studies packet
> and automatically chooses the best miniflow implementation for that
> traffic.
> + * Add build time configure command to enable auto-validatior as default
> +   miniflow implementation at build time.
> - ovs-ctl:
>   * New option '--no-record-hostname' to disable hostname configuration
> in ovsdb on startup.
> @@ -57,7 +59,6 @@ Post-v2.15.0
>   whether the SNAT with all-zero IP address is supported.
>   See ovs-vswitchd.conf.db(5) for details.
>
> -

You are removing a white space here unrelated to your changes. Please leave
it in.

>  v2.15.0 - 15 Feb 2021
>  -
> - OVSDB:
> diff --git a/acinclude.m4 b/acinclude.m4
> index 343303447..5a48f0335 100644
> --- a/acinclude.m4
> +++ b/acinclude.m4
> @@ -14,6 +14,22 @@
>  # See the License for the specific language governing permissions and
>  # limitations under the License.
>
> +dnl Set OVS MFEX Autovalidator as default miniflow extract at compile time?
> +dnl This enables automatically running all unit tests with all MFEX
> +dnl implementations.
> +AC_DEFUN([OVS_CHECK_MFEX_AUTOVALIDATOR], [
> +  AC_ARG_ENABLE([mfex-default-autovalidator],
> +[AC_HELP_STRING([--enable-mfex-default-autovalidator], [Enable MFEX autovalidator as default miniflow_extract implementation.])],
> +[autovalidator=yes],[autovalidator=no])
> +  AC_MSG_CHECKING([whether MFEX Autovalidator is default implementation])
> +  if test "$autovalidator" != yes; then
> +AC_MSG_RESULT([no])
> +  else
> +OVS_CFLAGS="$OVS_CFLAGS -DMFEX_AUTOVALIDATOR_DEFAULT"
> +AC_MSG_RESULT([yes])
> +  fi
> +])
> +
>  dnl Set OVS DPCLS Autovalidator as default subtable search at compile time?
>  dnl This enables automatically running all unit tests with all DPCLS
>  dnl implementations.
> diff --git a/configure.ac b/configure.ac
> index e45685a6c..46c402892 100644
> --- a/configure.ac
> +++ b/configure.ac
> @@ -186,6 +186,7 @@ OVS_ENABLE_SPARSE
>  OVS_CTAGS_IDENTIFIERS
>  OVS_CHECK_DPCLS_AUTOVALIDATOR
>  OVS_CHECK_DPIF_AVX512_DEFAULT
> +OVS_CHECK_MFEX_AUTOVALIDATOR
>  OVS_CHECK_BINUTILS_AVX512
>
>  AC_ARG_VAR(KARCH, [Kernel Architecture String])
> diff --git a/lib/dpif-netdev-private-extract.c b/lib/dpif-netdev-private-extract.c
> index 4ea111f94..ad71f238e 100644
> --- a/lib/dpif-netdev-private-extract.c
> +++ b/lib/dpif-netdev-private-extract.c
> @@ -77,20 +77,24 @@ dp_mfex_impl_get_default(void)
>  {
>  atomic_uintptr_t *mfex_func = (void *)&default_mfex_func;
>  static bool default_mfex_func_set = false;
> +#ifdef MFEX_AUTOVALIDATOR_DEFAULT
> +int mfex_idx = MFEX_IMPL_AUTOVALIDATOR;
> +#else
>  int mfex_idx = MFEX_IMPL_SCALAR;
> +#endif
>
>  /* For the first call, this will be choosen based on the
>   * compile time flag and if nor flag is set it is set to
>   * default scalar.
>   */
>  if (OVS_UNLIKELY(!default_mfex_func_set)) {
> -VLOG_INFO("Default MFEX implementation is %s.\n",
> +
> +VLOG_INFO("Default miniflow extract implementation%s.\n",

Guess the text should have been updated in the patch introducing it.

>mfex_impls[mfex_idx].name);
>  atomic_store_relaxed(mfex_func, (uintptr_t) mfex_impls
>   [mfex_idx].extract_func);
>  default_mfex_func_set = true;

Re: [ovs-dev] [v9 01/12] dpif-netdev: Add command line and function pointer for miniflow extract

2021-07-12 Thread Flavio Leitner
On Mon, Jul 12, 2021 at 02:22:46PM +0200, Eelco Chaudron wrote:
> See some comments below…
> 
> For this patch series, I’m only looking at the diff from v6..v9, not a
> full review.
> I will do basic compilation and some tests at the end.
> 
> Cheers,
> 
> Eelco
> 
> 
> On 12 Jul 2021, at 7:51, kumar Amber wrote:
> 
> > From: Kumar Amber 
> >
> > This patch introduces the MFEX function pointers which allows
> > the user to switch between different miniflow extract implementations
> > which are provided by the OVS based on optimized ISA CPU.
> >
> > The user can query the miniflow extract variants available
> > for that CPU with the following command:
> >
> > $ ovs-appctl dpif-netdev/miniflow-parser-get
> >
> > Similarly, a user can set the miniflow implementation with the following
> > command:
> >
> > $ ovs-appctl dpif-netdev/miniflow-parser-set name
> >
> > This allows for more performance and flexibility to the user to choose
> > the miniflow implementation according to the needs.
> >
> > Signed-off-by: Kumar Amber 
> > Co-authored-by: Harry van Haaren 
> > Signed-off-by: Harry van Haaren 
> >
> > ---
> > v9:
> > - fix review comments from Flavio
> > v7:
> > - fix review comments(Eelco, Flavio)
> > v5:
> > - fix review comments(Ian, Flavio, Eelco)
> > - add enum to hold mfex indexes
> > - add new get and set implementations
> > - add Atomic set and get
> > ---
> > ---
> >  NEWS  |   1 +
> >  lib/automake.mk   |   2 +
> >  lib/dpif-netdev-avx512.c  |  31 +-
> >  lib/dpif-netdev-private-extract.c | 162 ++
> >  lib/dpif-netdev-private-extract.h | 111 
> >  lib/dpif-netdev-private-thread.h  |   8 ++
> >  lib/dpif-netdev.c | 105 +++
> >  7 files changed, 416 insertions(+), 4 deletions(-)
> >  create mode 100644 lib/dpif-netdev-private-extract.c
> >  create mode 100644 lib/dpif-netdev-private-extract.h
> >
> > diff --git a/NEWS b/NEWS
> > index 6cdccc715..b0f08e96d 100644
> > --- a/NEWS
> > +++ b/NEWS
> > @@ -32,6 +32,7 @@ Post-v2.15.0
> >   * Enable the AVX512 DPCLS implementation to use VPOPCNT instruction 
> > if the
> > CPU supports it. This enhances performance by using the native 
> > vpopcount
> > instructions, instead of the emulated version of vpopcount.
> > + * Add command line option to switch between MFEX function pointers.
> > - ovs-ctl:
> >   * New option '--no-record-hostname' to disable hostname configuration
> > in ovsdb on startup.
> > diff --git a/lib/automake.mk b/lib/automake.mk
> > index 3c9523c1a..53b8abc0f 100644
> > --- a/lib/automake.mk
> > +++ b/lib/automake.mk
> > @@ -118,6 +118,8 @@ lib_libopenvswitch_la_SOURCES = \
> > lib/dpif-netdev-private-dpcls.h \
> > lib/dpif-netdev-private-dpif.c \
> > lib/dpif-netdev-private-dpif.h \
> > +   lib/dpif-netdev-private-extract.c \
> > +   lib/dpif-netdev-private-extract.h \
> > lib/dpif-netdev-private-flow.h \
> > lib/dpif-netdev-private-thread.h \
> > lib/dpif-netdev-private.h \
> > diff --git a/lib/dpif-netdev-avx512.c b/lib/dpif-netdev-avx512.c
> > index 6f9aa8284..7772b7abf 100644
> > --- a/lib/dpif-netdev-avx512.c
> > +++ b/lib/dpif-netdev-avx512.c
> > @@ -149,6 +149,15 @@ dp_netdev_input_outer_avx512(struct 
> > dp_netdev_pmd_thread *pmd,
> >   * // do all processing (HWOL->MFEX->EMC->SMC)
> >   * }
> >   */
> > +
> > > +/* Do a batch miniflow extract into keys. */
> > > +uint32_t mf_mask = 0;
> > > +miniflow_extract_func mfex_func;
> > > +atomic_read_relaxed(&pmd->miniflow_extract_opt, &mfex_func);
> > > +if (mfex_func) {
> > > +mf_mask = mfex_func(packets, keys, batch_size, in_port, pmd);
> > > +}
> > +
> >  uint32_t lookup_pkts_bitmask = (1ULL << batch_size) - 1;
> >  uint32_t iter = lookup_pkts_bitmask;
> >  while (iter) {
> > @@ -167,6 +176,13 @@ dp_netdev_input_outer_avx512(struct 
> > dp_netdev_pmd_thread *pmd,
> > >  pkt_metadata_init(&packet->md, in_port);
> >
> >  struct dp_netdev_flow *f = NULL;
> > > +struct netdev_flow_key *key = &keys[i];
> > > +
> > > +/* Check the miniflow mask to see if the packet was correctly
> > > + * classified by vector mfex else do a scalar miniflow extract
> > > + * for that packet.
> > > + */
> > > +bool mfex_hit = !!(mf_mask & (1 << i));
> >
> >  /* Check for a partial hardware offload match. */
> >  if (hwol_enabled) {
> > @@ -177,7 +193,13 @@ dp_netdev_input_outer_avx512(struct 
> > dp_netdev_pmd_thread *pmd,
> >  }
> >  if (f) {
> > >  rules[i] = &f->cr;
> > > -pkt_meta[i].tcp_flags = parse_tcp_flags(packet);
> > > +/* If AVX512 MFEX already classified the packet, use it. */
> > > +if (mfex_hit) {
> > > +pkt_meta[i].tcp_flags = miniflow_get_tcp_flags(&key->mf);
> > +} else {
> > + 

Re: [ovs-dev] [v9 04/12] docs/dpdk/bridge: add miniflow extract section.

2021-07-12 Thread Eelco Chaudron


On 12 Jul 2021, at 7:51, kumar Amber wrote:

> From: Kumar Amber 
>
> This commit adds a section to the dpdk/bridge.rst netdev documentation,
> detailing the added miniflow functionality. The newly added commands are
> documented, and sample output is provided.
>
> The use of auto-validator and special study function is also described
> in detail as well as running fuzzy tests.
>
> Signed-off-by: Kumar Amber 
> Co-authored-by: Cian Ferriter 
> Signed-off-by: Cian Ferriter 
> Co-authored-by: Harry van Haaren 
> Signed-off-by: Harry van Haaren 
> Acked-by: Flavio Leitner 
>
> ---
> v7:
> - fix review comments(Eelco)
> v5:
> - fix review comments(Ian, Flavio, Eelco)
> ---
> ---
>  Documentation/topics/dpdk/bridge.rst | 51 
>  1 file changed, 51 insertions(+)
>
> diff --git a/Documentation/topics/dpdk/bridge.rst 
> b/Documentation/topics/dpdk/bridge.rst
> index 2d0850836..7c618cf1f 100644
> --- a/Documentation/topics/dpdk/bridge.rst
> +++ b/Documentation/topics/dpdk/bridge.rst
> @@ -256,3 +256,54 @@ The following line should be seen in the configure 
> output when the above option
>  is used ::
>
>  checking whether DPIF AVX512 is default implementation... yes
> +
> +Miniflow Extract
> +----------------
> +
> +Miniflow extract (MFEX) performs parsing of the raw packets and extracts the
> +important header information into a compressed miniflow. This miniflow is
> +composed of bits and blocks where the bits signify which blocks are set or
> +have values, whereas the blocks hold the metadata, ip, udp, vlan, etc. These
> +values are used by the datapath for switching decisions later. The optimized
> +miniflow extract is traffic specific to speed up the lookup, whereas the
> +scalar works for ALL traffic patterns.
> +
> +Most modern CPUs have SIMD capabilities. These SIMD instructions are able
> +to process a vector rather than act on one single data.

This sounds odd “rather than act on one single data.”?

> OVS provides multiple
> +implementations of miniflow extract. This allows the user to take advantage
> +of SIMD instructions like AVX512 to gain additional performance.
> +
> +A list of implementations can be obtained by the following command. The
> +command also shows whether the CPU supports each implementation ::
> +
> +$ ovs-appctl dpif-netdev/miniflow-parser-get
> +Available Optimized Miniflow Extracts:
> +autovalidator (available: True, pmds: none)
> +scalar (available: True, pmds: 1,15)
> +study (available: True, pmds: none)
> +
> +An implementation can be selected manually by the following command ::
> +
> +$ ovs-appctl dpif-netdev/miniflow-parser-set study
> +
> +Also user can select the study implementation which studies the traffic for
> +a specific number of packets by applying all available implementaions of

implementations

> +miniflow extract and than chooses the one with most optimal result for that

than -> then

most optimal -> the most optimal

> +traffic pattern.
> +
> +Miniflow Extract Validation
> +~~~~~~~~~~~~~~~~~~~~~~~~~~~
> +
> +As multiple versions of miniflow extract can co-exist, each with different
> +CPU ISA optimizations, it is important to validate that they all give the
> +exact same results. To easily test all miniflow implementations, an
> +``autovalidator`` implementation of the miniflow exists. This implementation
> +runs all other available miniflow extract implementations, and verifies that
> +the results are identical.
> +
> +Running the OVS unit tests with the autovalidator enabled ensures all
> +implementations provide the same results.
> +
> +To set the Miniflow autovalidator, use this command ::
> +
> +$ ovs-appctl dpif-netdev/miniflow-parser-set autovalidator
> -- 
> 2.25.1
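
For readers of the archive, the "bits and blocks" layout described in the quoted documentation can be sketched as below. This is a simplified illustration only, not the OVS struct miniflow: the type sketch_miniflow and the helpers sketch_rank(), sketch_push() and sketch_get() are hypothetical names, and a single 64-bit map is assumed.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for a miniflow; NOT the OVS layout. */
#define SKETCH_MAX_BLOCKS 64

struct sketch_miniflow {
    uint64_t map;                       /* Bit i set => block i is present. */
    uint64_t values[SKETCH_MAX_BLOCKS]; /* Present blocks, packed in bit order. */
};

/* Number of set bits below position 'idx', i.e. the packed slot of block idx. */
static inline unsigned
sketch_rank(uint64_t map, unsigned idx)
{
    uint64_t below = map & ((UINT64_C(1) << idx) - 1);
    unsigned n = 0;

    while (below) {
        below &= below - 1;   /* Clear the lowest set bit. */
        n++;
    }
    return n;
}

/* Append block 'idx'.  Blocks must be pushed in increasing index order. */
static inline void
sketch_push(struct sketch_miniflow *mf, unsigned idx, uint64_t value)
{
    assert(idx < SKETCH_MAX_BLOCKS);
    assert((mf->map >> idx) == 0);      /* Nothing at or above idx yet. */

    mf->values[sketch_rank(mf->map, idx)] = value;
    mf->map |= UINT64_C(1) << idx;
}

/* Return block 'idx', or 0 when the map says the block is absent. */
static inline uint64_t
sketch_get(const struct sketch_miniflow *mf, unsigned idx)
{
    assert(idx < SKETCH_MAX_BLOCKS);

    if (!(mf->map & (UINT64_C(1) << idx))) {
        return 0;
    }
    return mf->values[sketch_rank(mf->map, idx)];
}
```

Only the blocks that are actually present take up space, which is the point of the compressed form; the map alone answers whether a field was extracted.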



Re: [ovs-dev] [v9 03/12] dpif-netdev: Add study function to select the best mfex function

2021-07-12 Thread Eelco Chaudron


On 12 Jul 2021, at 7:51, kumar Amber wrote:

> From: Kumar Amber 
>
> The study function runs all the available implementations
> of miniflow_extract, selects the one whose hitmask has the
> most hits, and sets the mfex function pointer to it.
>
> Study can be run at runtime using the following command:
>
> $ ovs-appctl dpif-netdev/miniflow-parser-set study
>
> Signed-off-by: Kumar Amber 
> Co-authored-by: Harry van Haaren 
> Signed-off-by: Harry van Haaren 
>
> ---
> v9:
> - fix comments Flavio
> v8:
> - fix review comments Flavio
> v7:
> - fix review comments(Eelco)
> v5:
> - fix review comments(Ian, Flavio, Eelco)
> - add Atomic set in study
> ---
> ---
>  NEWS  |   3 +
>  lib/automake.mk   |   1 +
>  lib/dpif-netdev-extract-study.c   | 136 ++
>  lib/dpif-netdev-private-extract.c |  11 +++
>  lib/dpif-netdev-private-extract.h |  23 +
>  5 files changed, 174 insertions(+)
>  create mode 100644 lib/dpif-netdev-extract-study.c
>
> diff --git a/NEWS b/NEWS
> index cf254bcfe..4a7b89409 100644
> --- a/NEWS
> +++ b/NEWS
> @@ -35,6 +35,9 @@ Post-v2.15.0
>   * Add command line option to switch between MFEX function pointers.
>   * Add miniflow extract auto-validator function to compare different
> miniflow extract implementations against default implementation.
> + * Add study function to miniflow function table which studies packet
> +   and automatically chooses the best miniflow implementation for that
> +   traffic.
> - ovs-ctl:
>   * New option '--no-record-hostname' to disable hostname configuration
> in ovsdb on startup.
> diff --git a/lib/automake.mk b/lib/automake.mk
> index 53b8abc0f..f4f36325e 100644
> --- a/lib/automake.mk
> +++ b/lib/automake.mk
> @@ -107,6 +107,7 @@ lib_libopenvswitch_la_SOURCES = \
>   lib/dp-packet.h \
>   lib/dp-packet.c \
>   lib/dpdk.h \
> + lib/dpif-netdev-extract-study.c \
>   lib/dpif-netdev-lookup.h \
>   lib/dpif-netdev-lookup.c \
>   lib/dpif-netdev-lookup-autovalidator.c \
> diff --git a/lib/dpif-netdev-extract-study.c b/lib/dpif-netdev-extract-study.c
> new file mode 100644
> index 0..a19759bd9
> --- /dev/null
> +++ b/lib/dpif-netdev-extract-study.c
> @@ -0,0 +1,136 @@
> +/*
> + * Copyright (c) 2021 Intel.
> + *
> + * Licensed under the Apache License, Version 2.0 (the "License");
> + * you may not use this file except in compliance with the License.
> + * You may obtain a copy of the License at:
> + *
> + * http://www.apache.org/licenses/LICENSE-2.0
> + *
> + * Unless required by applicable law or agreed to in writing, software
> + * distributed under the License is distributed on an "AS IS" BASIS,
> + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
> + * See the License for the specific language governing permissions and
> + * limitations under the License.
> + */
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +#include "dpif-netdev-private-thread.h"
> +#include "openvswitch/vlog.h"
> +#include "ovs-thread.h"
> +
> +VLOG_DEFINE_THIS_MODULE(dpif_mfex_extract_study);
> +
> +static uint32_t mfex_study_pkts_count = 0;
> +
> +/* Struct to hold miniflow study stats. */
> +struct study_stats {
> +uint32_t pkt_count;
> +uint32_t impl_hitcount[MFEX_IMPL_MAX];
> +};
> +
> +/* Define per thread data to hold the study stats. */
> +DEFINE_PER_THREAD_MALLOCED_DATA(struct study_stats *, study_stats);
> +
> +/* Allocate per thread PMD pointer space for study_stats. */
> +static inline struct study_stats *
> +mfex_study_get_study_stats_ptr(void)
> +{
> +struct study_stats *stats = study_stats_get();
> +if (OVS_UNLIKELY(!stats)) {
> +   stats = xzalloc(sizeof *stats);
> +   study_stats_set_unsafe(stats);
> +}
> +return stats;
> +}
> +
> +uint32_t
> +mfex_study_traffic(struct dp_packet_batch *packets,
> +   struct netdev_flow_key *keys,
> +   uint32_t keys_size, odp_port_t in_port,
> +   struct dp_netdev_pmd_thread *pmd_handle)
> +{
> +uint32_t hitmask = 0;
> +uint32_t mask = 0;
> +struct dp_netdev_pmd_thread *pmd = pmd_handle;
> +struct dpif_miniflow_extract_impl *miniflow_funcs;
> +struct study_stats *stats = mfex_study_get_study_stats_ptr();
> +dpif_mfex_impl_info_get(&miniflow_funcs);
> +
> +/* Run traffic optimized miniflow_extract to collect the hitmask
> + * to be compared after certain packets have been hit to choose
> + * the best miniflow_extract version for that traffic.
> + */
> +for (int i = MFEX_IMPL_START_IDX; i < MFEX_IMPL_MAX; i++) {
> +if (!miniflow_funcs[i].available) {
> +continue;
> +}
> +
> +hitmask = miniflow_funcs[i].extract_func(packets, keys, keys_size,
> + in_port, pmd_handle);
> +stats->impl_hitcount[i] += count_1bits(hitmask);
> +
> +/* If traffic is 

Re: [ovs-dev] [PATCH ovn] controller: instrument ovn-controller loop with stopwatch

2021-07-12 Thread 0-day Robot
Bleep bloop.  Greetings Lorenzo Bianconi, I am a robot and I have tried out 
your patch.
Thanks for your contribution.

I encountered some error that I wasn't expecting.  See the details below.


git-am:
error: Failed to merge in the changes.
hint: Use 'git am --show-current-patch' to see the failed patch
Patch failed at 0001 controller: instrument ovn-controller loop with stopwatch
When you have resolved this problem, run "git am --continue".
If you prefer to skip this patch, run "git am --skip" instead.
To restore the original branch and stop patching, run "git am --abort".


Please check this out.  If you feel there has been an error, please email 
acon...@redhat.com

Thanks,
0-day Robot


Re: [ovs-dev] [PATCH] netdev-offload-tc: verify the flower rule installed

2021-07-12 Thread Marcelo Ricardo Leitner
On Mon, Jul 12, 2021 at 10:28:15AM +0200, Eelco Chaudron wrote:
>
>
> On 9 Jul 2021, at 20:23, Ilya Maximets wrote:
>
> > On 7/9/21 10:35 AM, Eelco Chaudron wrote:
> >>
> >>
> >> On 8 Jul 2021, at 22:18, Ilya Maximets wrote:
> >>
> >>> On 5/17/21 3:20 PM, Eelco Chaudron wrote:
>  When OVS installs the flower rule, it only checks for the OK from the
>  kernel. It does not check if the rule requested matches the one
>  actually programmed. This change adds this check and warns the
>  user if this is not the case.
> 
>  Signed-off-by: Eelco Chaudron 
>  ---
>   lib/tc.c |   59 
>  +++
>   1 file changed, 59 insertions(+)
> 
>  diff --git a/lib/tc.c b/lib/tc.c
>  index a27cca2cc..e134f6a06 100644
>  --- a/lib/tc.c
>  +++ b/lib/tc.c
>  @@ -2979,6 +2979,50 @@ nl_msg_put_flower_options(struct ofpbuf *request, 
>  struct tc_flower *flower)
>   return 0;
>   }
> 
>  +static bool
>  +cmp_tc_flower_match_action(const struct tc_flower *a,
>  +   const struct tc_flower *b)
>  +{
>  +if (memcmp(&a->mask, &b->mask, sizeof a->mask)) {
>  +VLOG_DBG_RL(&error_rl, "tc flower compare failed mask compare");
>  +return false;
>  +}
>  +
>  +/* We can not memcmp() the key as some keys might be set while the mask
>  + * is not.*/
>  +
>  +for (int i = 0; i < sizeof a->key; i++) {
>  +uint8_t mask = ((uint8_t *)&a->mask)[i];
>  +uint8_t key_a = ((uint8_t *)&a->key)[i] & mask;
>  +uint8_t key_b = ((uint8_t *)&b->key)[i] & mask;
>  +
>  +if (key_a != key_b) {
>  +VLOG_DBG_RL(&error_rl, "tc flower compare failed key compare at "
>  +"%d", i);
>  +return false;
>  +}
>  +}
>  +
>  +/* Compare the actions. */
>  +const struct tc_action *action_a = a->actions;
>  +const struct tc_action *action_b = b->actions;
>  +
>  +if (a->action_count != b->action_count) {
>  +VLOG_DBG_RL(&error_rl, "tc flower compare failed action length check");
>  +return false;
>  +}
>  +
>  +for (int i = 0; i < a->action_count; i++, action_a++, action_b++) {
>  +if (memcmp(action_a, action_b, sizeof *action_a)) {
>  +VLOG_DBG_RL(&error_rl, "tc flower compare failed action compare "
>  +"for %d", i);
>  +return false;
>  +}
>  +}
>  +
>  +return true;
>  +}
>  +
>   int
>   tc_replace_flower(struct tcf_id *id, struct tc_flower *flower)
>   {
>  @@ -3010,6 +3054,21 @@ tc_replace_flower(struct tcf_id *id, struct 
>  tc_flower *flower)
> 
>   id->prio = tc_get_major(tc->tcm_info);
>   id->handle = tc->tcm_handle;
>  +
>  +if (id->prio != TC_RESERVED_PRIORITY_POLICE) {
>  +struct tc_flower flower_out;
>  +struct tcf_id id_out;
>  +int ret;
>  +
>  +ret = parse_netlink_to_tc_flower(reply, &id_out, &flower_out,
>  + false);
>  +
>  +if (ret || !cmp_tc_flower_match_action(flower, &flower_out)) {
>  +VLOG_WARN_RL(&error_rl, "Kernel flower acknowledgment does "
>  + "not match request!\n Set dpif_netlink to dbg to "
>  + "see which rule caused this error.");
> >>>
> >>> So we're only printing the warning and not reverting the change
> >>> and not returning an error, right?  So, OVS will continue to
> >>> work with the incorrect rule installed?
> >>> I think, we should revert the incorrect change and return the
> >>> error, so the flow could be installed to the OVS kernel datapath,
> >>> but maybe this is a task for a separate change.
> >>>
> >>> What do you think?
> >>
> >> The goal was to make sure we do not break anything in case there is an
> >> existing kernel bug, as unfortunately we are missing a good set of TC
> >> unit tests.
> >>
> >> With the "warning only" option, we can backport this. And if in the field 
> >> we do not see any (false) reports, a follow-up patch can do as you 
> >> suggested.
> >
> > Makes sense.  I removed '\n' from the warning (these don't look good in the
> > log) and applied to master.
>
> Thanks!
>
> > You and Marcelo are talking about backporting, do you think it makes sense to
> > backport to stable branches?
>
> If it applies cleanly, I would suggest backporting it all the way to 2.13. 
> Marcelo?

I don't know how the support for 2.13 and 2.15 differs. I mean,
whether 2.13 is only for critical patches or so. Anyhow, I'd say 2.15 yes
and 2.13 on best effort. :)

>
> //Eelco

Re: [ovs-dev] [v9 03/12] dpif-netdev: Add study function to select the best mfex function

2021-07-12 Thread Eelco Chaudron



On 12 Jul 2021, at 7:51, kumar Amber wrote:

> From: Kumar Amber 
>
> The study function runs all the available implementations
> of miniflow_extract, selects the one whose hitmask has the
> most hits, and sets the mfex function pointer to it.
>
> Study can be run at runtime using the following command:
>
> $ ovs-appctl dpif-netdev/miniflow-parser-set study
>
> Signed-off-by: Kumar Amber 
> Co-authored-by: Harry van Haaren 
> Signed-off-by: Harry van Haaren 
>
> ---
> v9:
> - fix comments Flavio
> v8:
> - fix review comments Flavio
> v7:
> - fix review comments(Eelco)
> v5:
> - fix review comments(Ian, Flavio, Eelco)
> - add Atomic set in study
> ---

Acked-by: Eelco Chaudron 
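
As a reading aid for the selection step described in the commit message above, here is a minimal sketch of "count hits per implementation, then pick the maximum". It is not the code from lib/dpif-netdev-extract-study.c: the names sketch_study_stats, sketch_study_record() and sketch_study_best(), the constant SKETCH_IMPL_MAX, and the tie-breaking rule (first implementation wins) are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_IMPL_MAX 4   /* Hypothetical number of implementations. */

/* Per-thread study statistics, loosely modelled on the quoted patch. */
struct sketch_study_stats {
    uint32_t pkt_count;
    uint32_t impl_hitcount[SKETCH_IMPL_MAX];
};

/* Portable popcount for a 32-bit hitmask. */
static inline uint32_t
sketch_count_1bits(uint32_t x)
{
    uint32_t n = 0;

    while (x) {
        x &= x - 1;   /* Clear the lowest set bit. */
        n++;
    }
    return n;
}

/* Record one batch: hitmasks[i] is the hitmask implementation i returned. */
static inline void
sketch_study_record(struct sketch_study_stats *s,
                    const uint32_t hitmasks[SKETCH_IMPL_MAX],
                    uint32_t batch_size)
{
    assert(s != NULL);

    for (int i = 0; i < SKETCH_IMPL_MAX; i++) {
        s->impl_hitcount[i] += sketch_count_1bits(hitmasks[i]);
    }
    s->pkt_count += batch_size;
}

/* Index of the implementation with the most hits, or -1 if none ever hit.
 * On a tie the lowest index wins (an assumption of this sketch). */
static inline int
sketch_study_best(const struct sketch_study_stats *s)
{
    uint32_t max_hits = 0;
    int best = -1;

    for (int i = 0; i < SKETCH_IMPL_MAX; i++) {
        if (s->impl_hitcount[i] > max_hits) {
            max_hits = s->impl_hitcount[i];
            best = i;
        }
    }
    return best;
}
```

A caller would record batches until pkt_count reaches its study threshold and then switch the function pointer to the returned index, falling back to scalar when the result is -1.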



[ovs-dev] [PATCH ovn] controller: instrument ovn-controller loop with stopwatch

2021-07-12 Thread Lorenzo Bianconi
Introduce stopwatch instrumentation to the following ovn-controller
routines:
- commit_ct_zones
- bfd_run
- patch_run
- pinctrl_run
- if_status_mgr_update
- if_status_mgr_run
- ofctrl_seqno_run

Signed-off-by: Lorenzo Bianconi 
---
This patch is based on controller:
"Add stopwatch to measure OF update duration"
http://patchwork.ozlabs.org/project/ovn/patch/20210706144149.3553-1-dce...@redhat.com/
---
 controller/ovn-controller.c | 38 +
 1 file changed, 38 insertions(+)

diff --git a/controller/ovn-controller.c b/controller/ovn-controller.c
index 6a9c25f28..57a0bf393 100644
--- a/controller/ovn-controller.c
+++ b/controller/ovn-controller.c
@@ -95,6 +95,13 @@ static unixctl_cb_func debug_delay_nb_cfg_report;
 
 #define CONTROLLER_LOOP_STOPWATCH_NAME "flow-generation"
 #define OFCTRL_PUT_STOPWATCH_NAME "flow-installation"
+#define PINCTRL_RUN_STOPWATCH_NAME "pinctrl-run"
+#define PATCH_RUN_STOPWATCH_NAME "patch-run"
+#define CT_ZONE_COMMIT_STOPWATCH_NAME "ct-zone-commit"
+#define IF_STATUS_MGR_RUN_STOPWATCH_NAME "if-status-mgr-run"
+#define IF_STATUS_MGR_UPDATE_STOPWATCH_NAME "if-status-mgr-update"
+#define OFCTRL_SEQNO_RUN_STOPWATCH_NAME "ofctrl-seqno-run"
+#define BFD_RUN_STOPWATCH_NAME "bfd-run"
 
 #define OVS_NB_CFG_NAME "ovn-nb-cfg"
 
@@ -2847,6 +2854,13 @@ main(int argc, char *argv[])
 
 stopwatch_create(CONTROLLER_LOOP_STOPWATCH_NAME, SW_MS);
 stopwatch_create(OFCTRL_PUT_STOPWATCH_NAME, SW_MS);
+stopwatch_create(PINCTRL_RUN_STOPWATCH_NAME, SW_MS);
+stopwatch_create(PATCH_RUN_STOPWATCH_NAME, SW_MS);
+stopwatch_create(CT_ZONE_COMMIT_STOPWATCH_NAME, SW_MS);
+stopwatch_create(IF_STATUS_MGR_RUN_STOPWATCH_NAME, SW_MS);
+stopwatch_create(IF_STATUS_MGR_UPDATE_STOPWATCH_NAME, SW_MS);
+stopwatch_create(OFCTRL_SEQNO_RUN_STOPWATCH_NAME, SW_MS);
+stopwatch_create(BFD_RUN_STOPWATCH_NAME, SW_MS);
 
 /* Define inc-proc-engine nodes. */
 ENGINE_NODE_CUSTOM_DATA(ct_zones, "ct_zones");
@@ -3231,23 +3245,33 @@ main(int argc, char *argv[])
 ct_zones_data = engine_get_data(&en_ct_zones);
 if (ovs_idl_txn) {
 if (ct_zones_data) {
+stopwatch_start(CT_ZONE_COMMIT_STOPWATCH_NAME,
+time_msec());
 commit_ct_zones(br_int, &ct_zones_data->pending);
+stopwatch_stop(CT_ZONE_COMMIT_STOPWATCH_NAME,
+   time_msec());
 }
+stopwatch_start(BFD_RUN_STOPWATCH_NAME, time_msec());
 bfd_run(ovsrec_interface_table_get(ovs_idl_loop.idl),
 br_int, chassis,
 sbrec_ha_chassis_group_table_get(
 ovnsb_idl_loop.idl),
 sbrec_sb_global_table_get(ovnsb_idl_loop.idl));
+stopwatch_stop(BFD_RUN_STOPWATCH_NAME, time_msec());
 }
 
 runtime_data = engine_get_data(&en_runtime_data);
 if (runtime_data) {
+stopwatch_start(PATCH_RUN_STOPWATCH_NAME, time_msec());
 patch_run(ovs_idl_txn,
 sbrec_port_binding_by_type,
 ovsrec_bridge_table_get(ovs_idl_loop.idl),
 ovsrec_open_vswitch_table_get(ovs_idl_loop.idl),
 ovsrec_port_table_get(ovs_idl_loop.idl),
 br_int, chassis, &runtime_data->local_datapaths);
+stopwatch_stop(PATCH_RUN_STOPWATCH_NAME, time_msec());
+stopwatch_start(PINCTRL_RUN_STOPWATCH_NAME,
+time_msec());
 pinctrl_run(ovnsb_idl_txn,
 sbrec_datapath_binding_by_key,
 sbrec_port_binding_by_datapath,
@@ -3266,6 +3290,8 @@ main(int argc, char *argv[])
 br_int, chassis,
 &runtime_data->local_datapaths,
 &runtime_data->active_tunnels);
+stopwatch_stop(PINCTRL_RUN_STOPWATCH_NAME,
+   time_msec());
 /* Updating monitor conditions if runtime data or
  * logical datapath goups changed. */
 if (engine_node_changed(&en_runtime_data)
@@ -3288,7 +3314,11 @@ main(int argc, char *argv[])
 
 struct local_binding_data *binding_data =
 runtime_data ? &runtime_data->lbinding_data : NULL;
+stopwatch_start(IF_STATUS_MGR_UPDATE_STOPWATCH_NAME,
+time_msec());
 if_status_mgr_update(if_mgr, binding_data);
+   

Re: [ovs-dev] [PATCH ovs v1] tunnel: Remove the padding from packet when encapsulating.

2021-07-12 Thread Tonghao Zhang
On Mon, Jun 28, 2021 at 10:07 AM Tonghao Zhang 
wrote:

> On Thu, Apr 1, 2021 at 9:34 PM Tonghao Zhang 
> wrote:
> >
> > On Mon, Dec 14, 2020 at 11:11 AM  wrote:
> > >
> > > From: Tonghao Zhang 
> > >
> > > > The root cause is that old versions of openvswitch do not
> > > > remove the padding from a packet before L3+ conntrack processing,
> > > > so the packets are dropped in the Linux kernel stack. The patch [1]
> > > > fixes the issue. We fix this issue on the gateway, which runs ovs-dpdk,
> > > > as a quick workaround. Padding should be removed because tunnel size
> > > > + inner size > 64B. For more details, see [1].
> > >
> > > [1] -
> https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git/commit/?id=9382fe71c0058465e942a633869629929102843d
> > > Signed-off-by: Tonghao Zhang 
> > ping :)
> friendly ping.

Hi Ilya
Can you help me to review this patch?


> > > ---
> > >  lib/netdev-native-tnl.c | 4 
> > >  1 file changed, 4 insertions(+)
> > >
> > > diff --git a/lib/netdev-native-tnl.c b/lib/netdev-native-tnl.c
> > > index b89dfdd52..acfbb13c4 100644
> > > --- a/lib/netdev-native-tnl.c
> > > +++ b/lib/netdev-native-tnl.c
> > > @@ -149,11 +149,15 @@ void *
> > >  netdev_tnl_push_ip_header(struct dp_packet *packet,
> > > const void *header, int size, int *ip_tot_size)
> > >  {
> > > +int padding = dp_packet_l2_pad_size(packet);
> > >  struct eth_header *eth;
> > >  struct ip_header *ip;
> > >  struct ovs_16aligned_ip6_hdr *ip6;
> > >
> > >  eth = dp_packet_push_uninit(packet, size);
> > > +if (padding) {
> > > +dp_packet_set_size(packet, dp_packet_size(packet) - padding);
> > > +}
> > >  *ip_tot_size = dp_packet_size(packet) - sizeof (struct eth_header);
> > >
> > >  memcpy(eth, header, size);
> > > --
> > > 2.14.1
> > >
> >
> >
> > --
> > Best regards, Tonghao
>
>
>
> --
> Best regards, Tonghao
>
-- 
Best regards, Tonghao
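
To make the arithmetic in the quoted patch easier to follow, here is a small stand-alone model of it. It does not use the real dp_packet API: sketch_packet, sketch_push_tunnel_header() and SKETCH_ETH_HEADER_LEN are hypothetical, and resetting the pad field after trimming is bookkeeping added for this sketch only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_ETH_HEADER_LEN 14   /* Ethernet header without VLAN tags. */

/* Hypothetical model of a packet: total size and trailing L2 padding. */
struct sketch_packet {
    uint32_t size;          /* Bytes currently in the packet. */
    uint32_t l2_pad_size;   /* Trailing pad bytes added to reach 60 bytes. */
};

/* Prepend a tunnel header of 'hdr_size' bytes (outer Ethernet included),
 * drop the inner frame's trailing padding, and return the value the outer
 * IP total length should be computed from. */
static inline uint32_t
sketch_push_tunnel_header(struct sketch_packet *p, uint32_t hdr_size)
{
    uint32_t padding;

    assert(p != NULL);
    assert(hdr_size >= SKETCH_ETH_HEADER_LEN);
    assert(p->l2_pad_size <= p->size);

    padding = p->l2_pad_size;
    p->size += hdr_size;            /* Room for the outer headers. */
    if (padding) {
        p->size -= padding;         /* Padding is no longer needed. */
        p->l2_pad_size = 0;         /* Sketch-only bookkeeping. */
    }
    return p->size - SKETCH_ETH_HEADER_LEN;
}
```

For example, a 60-byte inner frame carrying 6 pad bytes behind a 50-byte outer header ends up 104 bytes long with an outer IP total length of 90, instead of 110 and 96 when the padding is left in place.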


Re: [ovs-dev] [v9 01/12] dpif-netdev: Add command line and function pointer for miniflow extract

2021-07-12 Thread Eelco Chaudron
See some comments below…

For this patch series, I’m only looking at the diff from v6..v9, not a full 
review.
I will do basic compilation and some tests at the end.

Cheers,

Eelco


On 12 Jul 2021, at 7:51, kumar Amber wrote:

> From: Kumar Amber 
>
> This patch introduces the MFEX function pointers, which allow
> the user to switch between the different miniflow extract
> implementations that OVS provides, each optimized for a specific CPU ISA.
>
> The user can query the miniflow extract variants available
> for that CPU with the following command:
>
> $ ovs-appctl dpif-netdev/miniflow-parser-get
>
> Similarly, a user can set the miniflow implementation with the
> following command:
>
> $ ovs-appctl dpif-netdev/miniflow-parser-set name
>
> This gives the user more performance and the flexibility to choose
> the miniflow implementation according to their needs.
>
> Signed-off-by: Kumar Amber 
> Co-authored-by: Harry van Haaren 
> Signed-off-by: Harry van Haaren 
>
> ---
> v9:
> - fix review comments from Flavio
> v7:
> - fix review comments(Eelco, Flavio)
> v5:
> - fix review comments(Ian, Flavio, Eelco)
> - add enum to hold mfex indexes
> - add new get and set implementations
> - add Atomic set and get
> ---
> ---
>  NEWS  |   1 +
>  lib/automake.mk   |   2 +
>  lib/dpif-netdev-avx512.c  |  31 +-
>  lib/dpif-netdev-private-extract.c | 162 ++
>  lib/dpif-netdev-private-extract.h | 111 
>  lib/dpif-netdev-private-thread.h  |   8 ++
>  lib/dpif-netdev.c | 105 +++
>  7 files changed, 416 insertions(+), 4 deletions(-)
>  create mode 100644 lib/dpif-netdev-private-extract.c
>  create mode 100644 lib/dpif-netdev-private-extract.h
>
> diff --git a/NEWS b/NEWS
> index 6cdccc715..b0f08e96d 100644
> --- a/NEWS
> +++ b/NEWS
> @@ -32,6 +32,7 @@ Post-v2.15.0
>   * Enable the AVX512 DPCLS implementation to use VPOPCNT instruction if 
> the
> CPU supports it. This enhances performance by using the native 
> vpopcount
> instructions, instead of the emulated version of vpopcount.
> + * Add command line option to switch between MFEX function pointers.
> - ovs-ctl:
>   * New option '--no-record-hostname' to disable hostname configuration
> in ovsdb on startup.
> diff --git a/lib/automake.mk b/lib/automake.mk
> index 3c9523c1a..53b8abc0f 100644
> --- a/lib/automake.mk
> +++ b/lib/automake.mk
> @@ -118,6 +118,8 @@ lib_libopenvswitch_la_SOURCES = \
>   lib/dpif-netdev-private-dpcls.h \
>   lib/dpif-netdev-private-dpif.c \
>   lib/dpif-netdev-private-dpif.h \
> + lib/dpif-netdev-private-extract.c \
> + lib/dpif-netdev-private-extract.h \
>   lib/dpif-netdev-private-flow.h \
>   lib/dpif-netdev-private-thread.h \
>   lib/dpif-netdev-private.h \
> diff --git a/lib/dpif-netdev-avx512.c b/lib/dpif-netdev-avx512.c
> index 6f9aa8284..7772b7abf 100644
> --- a/lib/dpif-netdev-avx512.c
> +++ b/lib/dpif-netdev-avx512.c
> @@ -149,6 +149,15 @@ dp_netdev_input_outer_avx512(struct dp_netdev_pmd_thread 
> *pmd,
>   * // do all processing (HWOL->MFEX->EMC->SMC)
>   * }
>   */
> +
> +/* Do a batch miniflow extract into keys. */
> +uint32_t mf_mask = 0;
> +miniflow_extract_func mfex_func;
> +atomic_read_relaxed(&pmd->miniflow_extract_opt, &mfex_func);
> +if (mfex_func) {
> +mf_mask = mfex_func(packets, keys, batch_size, in_port, pmd);
> +}
> +
>  uint32_t lookup_pkts_bitmask = (1ULL << batch_size) - 1;
>  uint32_t iter = lookup_pkts_bitmask;
>  while (iter) {
> @@ -167,6 +176,13 @@ dp_netdev_input_outer_avx512(struct dp_netdev_pmd_thread 
> *pmd,
>  pkt_metadata_init(&packet->md, in_port);
>
>  struct dp_netdev_flow *f = NULL;
> +struct netdev_flow_key *key = &keys[i];
> +
> +/* Check the miniflow mask to see if the packet was correctly
> + * classified by vector mfex else do a scalar miniflow extract
> + * for that packet.
> + */
> +bool mfex_hit = !!(mf_mask & (1 << i));
>
>  /* Check for a partial hardware offload match. */
>  if (hwol_enabled) {
> @@ -177,7 +193,13 @@ dp_netdev_input_outer_avx512(struct dp_netdev_pmd_thread 
> *pmd,
>  }
>  if (f) {
>  rules[i] = &f->cr;
> -pkt_meta[i].tcp_flags = parse_tcp_flags(packet);
> +/* If AVX512 MFEX already classified the packet, use it. */
> +if (mfex_hit) {
> +pkt_meta[i].tcp_flags = miniflow_get_tcp_flags(&key->mf);
> +} else {
> +pkt_meta[i].tcp_flags = parse_tcp_flags(packet);
> +}
> +
>  pkt_meta[i].bytes = dp_packet_size(packet);
>  phwol_hits++;
>  hwol_emc_smc_hitmask |= (1 << i);
> @@ -185,9 +207,10 @@ dp_netdev_input_outer_avx512(struct dp_netdev_pmd_thread 
> *pmd,

Re: [ovs-dev] [PATCH 2/2] netdev-offload-dpdk: Fix vxlan vni cast-align warnings

2021-07-12 Thread Van Haaren, Harry
> -Original Message-
> From: Eli Britstein 
> Sent: Sunday, July 11, 2021 6:15 AM
> To: d...@openvswitch.org; Ilya Maximets ; Van Haaren,
> Harry 
> Cc: Gaetan Rivet ; Majd Dibbiny ; Eli
> Britstein 
> Subject: [PATCH 2/2] netdev-offload-dpdk: Fix vxlan vni cast-align warnings
> 
> Compiling with -Werror and -Wcast-align has errors like:
> 
> lib/netdev-offload-dpdk.c: In function 'dump_flow_pattern':
> lib/netdev-offload-dpdk.c:385:38: error: cast increases required alignment
> of target type [-Werror=cast-align]
>   385 |ntohl(*(ovs_be32 *) vxlan_spec->vni) >> 8,
>   |   ^
> 
> Fix them.
> 
> Reported-by: Harry Van Haaren 
> Fixes: 4e432d6f8128 ("netdev-offload-dpdk: Support tnl/push using vxlan encap
> attribute.")
> Fixes: e098c2f966cb ("netdev-dpdk-offload: Add vxlan pattern matching 
> function.")
> Signed-off-by: Eli Britstein 

Thanks Eli, tested compilation, and the cast-align issue is resolved for the
master branch with the patches in this series. I cannot test functionality
here, so this is compile tested only.

Series-Tested-by: Harry van Haaren 
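
For context on the warning quoted above: it is triggered by reading a 3-byte VNI field through an ovs_be32 pointer. The actual fix is in the patch, which is not shown in this message; the sketch below is only one common way to avoid such a cast, assuming the VNI is stored as three bytes in network order. The helper name sketch_vni_from_bytes() is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assemble a 24-bit VNI from three network-order bytes.  Reading byte by
 * byte needs no alignment, so no pointer cast to a wider type is involved. */
static inline uint32_t
sketch_vni_from_bytes(const uint8_t vni[3])
{
    assert(vni != NULL);

    return ((uint32_t) vni[0] << 16)
           | ((uint32_t) vni[1] << 8)
           | (uint32_t) vni[2];
}
```

The result is already in host byte order, so the ntohl() and the ">> 8" from the original expression are not needed with this approach.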



Re: [ovs-dev] [PATCH] netdev-offload-tc: verify the flower rule installed

2021-07-12 Thread Eelco Chaudron



On 9 Jul 2021, at 20:23, Ilya Maximets wrote:

> On 7/9/21 10:35 AM, Eelco Chaudron wrote:
>>
>>
>> On 8 Jul 2021, at 22:18, Ilya Maximets wrote:
>>
>>> On 5/17/21 3:20 PM, Eelco Chaudron wrote:
 When OVS installs the flower rule, it only checks for the OK from the
 kernel. It does not check if the rule requested matches the one
 actually programmed. This change adds this check and warns the
 user if this is not the case.

 Signed-off-by: Eelco Chaudron 
 ---
  lib/tc.c |   59 
 +++
  1 file changed, 59 insertions(+)

 diff --git a/lib/tc.c b/lib/tc.c
 index a27cca2cc..e134f6a06 100644
 --- a/lib/tc.c
 +++ b/lib/tc.c
 @@ -2979,6 +2979,50 @@ nl_msg_put_flower_options(struct ofpbuf *request, 
 struct tc_flower *flower)
  return 0;
  }

 +static bool
 +cmp_tc_flower_match_action(const struct tc_flower *a,
 +   const struct tc_flower *b)
 +{
 +if (memcmp(&a->mask, &b->mask, sizeof a->mask)) {
 +VLOG_DBG_RL(&error_rl, "tc flower compare failed mask compare");
 +return false;
 +}
 +
 +/* We can not memcmp() the key as some keys might be set while the mask
 + * is not.*/
 +
 +for (int i = 0; i < sizeof a->key; i++) {
 +uint8_t mask = ((uint8_t *)&a->mask)[i];
 +uint8_t key_a = ((uint8_t *)&a->key)[i] & mask;
 +uint8_t key_b = ((uint8_t *)&b->key)[i] & mask;
 +
 +if (key_a != key_b) {
 +VLOG_DBG_RL(&error_rl, "tc flower compare failed key compare at "
 +"%d", i);
 +return false;
 +}
 +}
 +
 +/* Compare the actions. */
 +const struct tc_action *action_a = a->actions;
 +const struct tc_action *action_b = b->actions;
 +
 +if (a->action_count != b->action_count) {
 +VLOG_DBG_RL(&error_rl, "tc flower compare failed action length check");
 +return false;
 +}
 +
 +for (int i = 0; i < a->action_count; i++, action_a++, action_b++) {
 +if (memcmp(action_a, action_b, sizeof *action_a)) {
 +VLOG_DBG_RL(&error_rl, "tc flower compare failed action compare "
 +"for %d", i);
 +return false;
 +}
 +}
 +
 +return true;
 +}
 +
  int
  tc_replace_flower(struct tcf_id *id, struct tc_flower *flower)
  {
 @@ -3010,6 +3054,21 @@ tc_replace_flower(struct tcf_id *id, struct 
 tc_flower *flower)

  id->prio = tc_get_major(tc->tcm_info);
  id->handle = tc->tcm_handle;
 +
 +if (id->prio != TC_RESERVED_PRIORITY_POLICE) {
 +struct tc_flower flower_out;
 +struct tcf_id id_out;
 +int ret;
 +
 +ret = parse_netlink_to_tc_flower(reply, &id_out, &flower_out,
 + false);
 +
 +if (ret || !cmp_tc_flower_match_action(flower, &flower_out)) {
 +VLOG_WARN_RL(&error_rl, "Kernel flower acknowledgment does "
 + "not match request!\n Set dpif_netlink to dbg to "
 + "see which rule caused this error.");
>>>
>>> So we're only printing the warning and not reverting the change
>>> and not returning an error, right?  So, OVS will continue to
>>> work with the incorrect rule installed?
>>> I think, we should revert the incorrect change and return the
>>> error, so the flow could be installed to the OVS kernel datapath,
>>> but maybe this is a task for a separate change.
>>>
>>> What do you think?
>>
>> The goal was to make sure we do not break anything in case there is an
>> existing kernel bug, as unfortunately we are missing a good set of TC
>> unit tests.
>>
>> With the "warning only" option, we can backport this. And if in the field we 
>> do not see any (false) reports, a follow-up patch can do as you suggested.
>
> Makes sense.  I removed '\n' from the warning (these don't look good in the
> log) and applied to master.

Thanks!

> You and Marcelo are talking about backporting, do you think it makes sense to
> backport to stable branches?

If it applies cleanly, I would suggest backporting it all the way to 2.13. 
Marcelo?

//Eelco
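
As a reading aid for the key comparison discussed in this thread (keys cannot simply be memcmp()'d because bits outside the mask may differ), here is a stand-alone sketch of a byte-wise masked compare. It is not the code from lib/tc.c; the function name sketch_masked_mismatch() and its return convention are assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Compare two keys byte by byte, looking only at the bits selected by
 * 'mask'.  Returns the offset of the first mismatching byte, or -1 when
 * the keys are equal under the mask. */
static inline ptrdiff_t
sketch_masked_mismatch(const void *key_a, const void *key_b,
                       const void *mask, size_t len)
{
    const uint8_t *a = key_a;
    const uint8_t *b = key_b;
    const uint8_t *m = mask;

    assert(a != NULL && b != NULL && m != NULL);

    for (size_t i = 0; i < len; i++) {
        if ((a[i] & m[i]) != (b[i] & m[i])) {
            return (ptrdiff_t) i;
        }
    }
    return -1;
}
```

Returning the offset rather than a bool mirrors the debug log in the patch, which reports the byte position at which the comparison failed.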


