[ovs-dev] [v10 12/12] dpif-netdev: add mfex options to scalar dpif
From: kumar Amber This commit adds the optimized MFEX options so that they are executed as part of the scalar DPIF. Signed-off-by: kumar Amber Acked-by: Flavio Leitner --- lib/dpif-netdev.c | 23 ++- 1 file changed, 22 insertions(+), 1 deletion(-) diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c index cca211837..14c98e450 100644 --- a/lib/dpif-netdev.c +++ b/lib/dpif-netdev.c @@ -7029,6 +7029,7 @@ dfc_processing(struct dp_netdev_pmd_thread *pmd, size_t n_missed = 0, n_emc_hit = 0, n_phwol_hit = 0, n_mfex_opt_hit = 0; struct dfc_cache *cache = &pmd->flow_cache; struct dp_packet *packet; +struct dp_packet_batch single_packet; const size_t cnt = dp_packet_batch_size(packets_); uint32_t cur_min = pmd->ctx.emc_insert_min; const uint32_t recirc_depth = *recirc_depth_get(); @@ -7039,6 +7040,11 @@ dfc_processing(struct dp_netdev_pmd_thread *pmd, size_t map_cnt = 0; bool batch_enable = true; +single_packet.count = 1; + +miniflow_extract_func mfex_func; +atomic_read_relaxed(&pmd->miniflow_extract_opt, &mfex_func); + atomic_read_relaxed(&pmd->dp->smc_enable_db, &smc_enable_db); pmd_perf_update_counter(&pmd->perf_stats, md_is_valid ? PMD_STAT_RECIRC : PMD_STAT_RECV, @@ -7089,7 +7095,22 @@ dfc_processing(struct dp_netdev_pmd_thread *pmd, } } -miniflow_extract(packet, &key->mf); +/* Set the count and packet for miniflow_opt with batch_size 1. */ +if ((mfex_func) && (!md_is_valid)) { +single_packet.packets[0] = packet; +int mf_ret; + +mf_ret = mfex_func(&single_packet, key, 1, port_no, pmd); +/* Fall back to the original miniflow_extract if there is a miss. */ +if (mf_ret) { +n_mfex_opt_hit++; +} else { +miniflow_extract(packet, &key->mf); +} +} else { +miniflow_extract(packet, &key->mf); +} + key->len = 0; /* Not computed yet. */ key->hash = (md_is_valid == false) -- 2.25.1 ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
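The scalar DPIF change above boils down to one rule: if an optimized MFEX function is set and the packet is not recirculated, parse it as a one-packet batch, count a hit on success, and fall back to the scalar miniflow_extract() on a miss. A minimal self-contained sketch of that control flow only; `struct toy_pkt`, `struct toy_key`, `opt_extract_fn`, `scalar_extract()`, `ipv4_only_extract()` and `extract_one()` are hypothetical stand-ins, not OVS types or functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for struct dp_packet and struct netdev_flow_key. */
struct toy_pkt { uint16_t eth_type; };
struct toy_key { uint16_t eth_type; bool parsed_by_opt; };

/* Optimized parser for a batch of one: a nonzero return means "hit". */
typedef uint32_t (*opt_extract_fn)(const struct toy_pkt *p,
                                   struct toy_key *k);

/* Scalar parser: handles every packet (stand-in for miniflow_extract()). */
static void
scalar_extract(const struct toy_pkt *p, struct toy_key *k)
{
    k->eth_type = p->eth_type;
    k->parsed_by_opt = false;
}

/* Example optimized parser that only understands IPv4 (EtherType 0x0800). */
static uint32_t
ipv4_only_extract(const struct toy_pkt *p, struct toy_key *k)
{
    if (p->eth_type != 0x0800) {
        return 0;                       /* Miss: not this traffic profile. */
    }
    k->eth_type = p->eth_type;
    k->parsed_by_opt = true;
    return 1;
}

/* Use 'opt' only when it is set and the packet metadata is not valid yet
 * (i.e. not a recirculated packet); otherwise, or on a miss, fall back to
 * the scalar parser. */
static void
extract_one(opt_extract_fn opt, bool md_is_valid, const struct toy_pkt *p,
            struct toy_key *k, size_t *n_opt_hits)
{
    assert(p != NULL && k != NULL && n_opt_hits != NULL);
    if (opt && !md_is_valid && opt(p, k)) {
        (*n_opt_hits)++;
        return;
    }
    scalar_extract(p, k);
}
```

For an IPv6 packet, `ipv4_only_extract` misses, the hit counter stays unchanged, and the key is filled by the scalar path, which is the same fallback behaviour the patch relies on.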
[ovs-dev] [v10 11/12] dpif-netdev/mfex: add more AVX512 traffic profiles
From: Harry van Haaren This commit adds 3 new traffic profile implementations to the existing avx512 miniflow extract infrastructure. The profiles added are: - Ether()/IP()/TCP() - Ether()/Dot1Q()/IP()/UDP() - Ether()/Dot1Q()/IP()/TCP() The avx512 code here is designed to scale, both to more traffic profiles and to additional CPU ISA levels. Note that adding an implementation primarily means adding static const data, which the compiler then specializes away when the profile-specific function is declared below. As a result, the code is relatively maintainable and scalable for new traffic profiles as well as new ISA levels, and does not lower performance compared with manually written code for each profile/ISA. Note that confidence in the correctness of each implementation is achieved through autovalidation, unit tests with known packets, and fuzz tested packets. Signed-off-by: Harry van Haaren Acked-by: Eelco Chaudron Acked-by: Flavio Leitner --- Hi Readers, If you have a traffic profile you'd like to see accelerated using avx512 code, please send me an email and we can collaborate on adding support for it! Regards, -Harry --- v5: - fix review comments(Ian, Flavio, Eelco) --- --- NEWS | 2 + lib/dpif-netdev-extract-avx512.c | 152 ++ lib/dpif-netdev-private-extract.c | 30 ++ lib/dpif-netdev-private-extract.h | 10 ++ 4 files changed, 194 insertions(+) diff --git a/NEWS b/NEWS index 26cd85978..849008a80 100644 --- a/NEWS +++ b/NEWS @@ -41,6 +41,8 @@ Post-v2.15.0 * Add build time configure command to enable auto-validator as default miniflow implementation at build time. * Cache results for CPU ISA checks, reduces overhead on repeated lookups. + * Add AVX512 based optimized miniflow extract function for traffic types + IPv4/UDP, IPv4/TCP, VLAN/IPv4/UDP and VLAN/IPv4/TCP. - ovs-ctl: * New option '--no-record-hostname' to disable hostname configuration in ovsdb on startup. 
diff --git a/lib/dpif-netdev-extract-avx512.c b/lib/dpif-netdev-extract-avx512.c index c06e53582..ecb0be70d 100644 --- a/lib/dpif-netdev-extract-avx512.c +++ b/lib/dpif-netdev-extract-avx512.c @@ -136,6 +136,13 @@ _mm512_maskz_permutexvar_epi8_wrap(__mmask64 kmask, __m512i idx, __m512i a) #define PATTERN_ETHERTYPE_MASK PATTERN_ETHERTYPE_GEN(0xFF, 0xFF) #define PATTERN_ETHERTYPE_IPV4 PATTERN_ETHERTYPE_GEN(0x08, 0x00) +#define PATTERN_ETHERTYPE_DT1Q PATTERN_ETHERTYPE_GEN(0x81, 0x00) + +/* VLAN (Dot1Q) patterns and masks. */ +#define PATTERN_DT1Q_MASK \ + 0x00, 0x00, 0xFF, 0xFF, +#define PATTERN_DT1Q_IPV4 \ + 0x00, 0x00, 0x08, 0x00, /* Generator for checking IPv4 ver, ihl, and proto */ #define PATTERN_IPV4_GEN(VER_IHL, FLAG_OFF_B0, FLAG_OFF_B1, PROTO) \ @@ -161,6 +168,29 @@ _mm512_maskz_permutexvar_epi8_wrap(__mmask64 kmask, __m512i idx, __m512i a) 34, 35, 36, 37, NU, NU, NU, NU, NU, NU, NU, NU, NU, NU, NU, NU, /* UDP */ \ NU, NU, NU, NU, NU, NU, NU, NU, NU, NU, NU, NU, NU, NU, NU, NU, /* Unused. */ +/* TCP shuffle: tcp_ctl bits require mask/processing, not included here. */ +#define PATTERN_IPV4_TCP_SHUFFLE \ + 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, NU, NU, /* Ether */ \ + 26, 27, 28, 29, 30, 31, 32, 33, NU, NU, NU, NU, 20, 15, 22, 23, /* IPv4 */ \ + NU, NU, NU, NU, NU, NU, NU, NU, 34, 35, 36, 37, NU, NU, NU, NU, /* TCP */ \ + NU, NU, NU, NU, NU, NU, NU, NU, NU, NU, NU, NU, NU, NU, NU, NU, /* Unused. */ + +#define PATTERN_DT1Q_IPV4_UDP_SHUFFLE \ + /* Ether (2 blocks): Note that *VLAN* type is written here. */ \ + 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 16, 17, 0, 0, \ + /* VLAN (1 block): Note that the *EtherHdr->Type* is written here. */ \ + 12, 13, 14, 15, 0, 0, 0, 0, \ + 30, 31, 32, 33, 34, 35, 36, 37, 0, 0, 0, 0, 24, 19, 26, 27, /* IPv4 */ \ + 38, 39, 40, 41, NU, NU, NU, NU, /* UDP */ + +#define PATTERN_DT1Q_IPV4_TCP_SHUFFLE \ + /* Ether (2 blocks): Note that *VLAN* type is written here. 
*/ \ + 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 16, 17, 0, 0, \ + /* VLAN (1 block): Note that the *EtherHdr->Type* is written here. */ \ + 12, 13, 14, 15, 0, 0, 0, 0, \ + 30, 31, 32, 33, 34, 35, 36, 37, 0, 0, 0, 0, 24, 19, 26, 27, /* IPv4 */ \ + NU, NU, NU, NU, NU, NU, NU, NU, 38, 39, 40, 41, NU, NU, NU, NU, /* TCP */ \ + NU, NU, NU, NU, NU, NU, NU, NU, /* Unused. */ /* Generation of K-mask bitmask values, to zero out data in result. Note that * these correspond 1:1 to the above "*_SHUFFLE" values, and bit used must be @@ -170,12 +200,22 @@ _mm512_maskz_permutexvar_epi8_wrap(__mmask64 kmask, __m512i idx, __m512i a) *
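The `*_SHUFFLE` tables above list, for each output byte of the miniflow layout, which packet byte to copy, and the matching K-mask zeroes the bytes that are not used. As an illustration only, here is a scalar C model of the semantics of `_mm512_maskz_permutexvar_epi8()` (the VBMI byte permute with a zeroing mask): output byte i is input byte `idx[i] & 63` when bit i of the mask is set, and zero otherwise. `maskz_shuffle_u8()` and `SHUF_LANES` are hypothetical names, not code from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SHUF_LANES 64   /* Bytes in one 512-bit register. */

/* Scalar model of _mm512_maskz_permutexvar_epi8(k, idx, a): each output
 * byte i copies input byte idx[i] (low 6 bits only) when bit i of k is
 * set, and is zeroed otherwise. */
static void
maskz_shuffle_u8(uint64_t k, const uint8_t idx[SHUF_LANES],
                 const uint8_t a[SHUF_LANES], uint8_t out[SHUF_LANES])
{
    assert(idx != NULL && a != NULL && out != NULL);
    for (int i = 0; i < SHUF_LANES; i++) {
        out[i] = ((k >> i) & 1) ? a[idx[i] & (SHUF_LANES - 1)] : 0;
    }
}
```

In the AVX512 implementation this loop is a single instruction, and since the index table and mask are compile-time constants per traffic profile, a new profile mostly means new tables rather than new control flow.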
[ovs-dev] [v10 10/12] dpif-netdev/mfex: Add AVX512 based optimized miniflow extract
From: Harry van Haaren This commit adds AVX512 implementations of miniflow extract. By using the 64 bytes available in an AVX512 register, it is possible to convert a packet to a miniflow data-structure in a small number of instructions. The implementation here probes for Ether()/IP()/UDP() traffic, and builds the appropriate miniflow data-structure for packets that match the probe. The implementation here is auto-validated by the miniflow extract autovalidator, hence its correctness can be easily tested and verified. Note that this commit is designed to easily allow addition of new traffic profiles in a scalable way, without code duplication for each traffic profile. Signed-off-by: Harry van Haaren --- v9: - include comments from flavio v8: - include documentation on AVX512 MFEX as per Eelco's suggestion v7: - fix minor review sentences (Eelco) v5: - fix review comments(Ian, Flavio, Eelco) - include assert for flow abi change - include assert for offset changes --- --- lib/automake.mk | 1 + lib/dpif-netdev-extract-avx512.c | 478 ++ lib/dpif-netdev-private-extract.c | 13 + lib/dpif-netdev-private-extract.h | 30 ++ 4 files changed, 522 insertions(+) create mode 100644 lib/dpif-netdev-extract-avx512.c diff --git a/lib/automake.mk b/lib/automake.mk index f4f36325e..299f81939 100644 --- a/lib/automake.mk +++ b/lib/automake.mk @@ -39,6 +39,7 @@ lib_libopenvswitchavx512_la_CFLAGS = \ $(AM_CFLAGS) lib_libopenvswitchavx512_la_SOURCES = \ lib/dpif-netdev-lookup-avx512-gather.c \ + lib/dpif-netdev-extract-avx512.c \ lib/dpif-netdev-avx512.c lib_libopenvswitchavx512_la_LDFLAGS = \ -static diff --git a/lib/dpif-netdev-extract-avx512.c b/lib/dpif-netdev-extract-avx512.c new file mode 100644 index 0..c06e53582 --- /dev/null +++ b/lib/dpif-netdev-extract-avx512.c @@ -0,0 +1,478 @@ +/* + * Copyright (c) 2021 Intel. + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. 
+ * You may obtain a copy of the License at: + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +/* + * AVX512 Miniflow Extract. + * + * This file contains optimized implementations of miniflow_extract() + * for specific common traffic patterns. The optimizations allow for + * quick probing of a specific packet type, and if a match with a specific + * type is found, a shuffle-like procedure builds up the required miniflow. + * + * Process + * - + * + * The procedure is to classify the packet based on the traffic type + * using predefined bit-masks and arrange the packet header data using shuffle + * instructions to a pre-defined place as required by the miniflow. + * This eliminates the if-else ladder to identify the packet data and add data + * as per protocol which is present. + */ + +#ifdef __x86_64__ +/* Sparse cannot handle the AVX512 instructions. */ +#if !defined(__CHECKER__) + +#include +#include +#include +#include +#include + +#include "flow.h" +#include "dpdk.h" + +#include "dpif-netdev-private-dpcls.h" +#include "dpif-netdev-private-extract.h" +#include "dpif-netdev-private-flow.h" + +/* AVX512-BW level permutex2var_epi8 emulation. */ +static inline __m512i +__attribute__((target("avx512bw"))) +_mm512_maskz_permutex2var_epi8_skx(__mmask64 k_mask, + __m512i v_data_0, + __m512i v_shuf_idxs, + __m512i v_data_1) +{ +/* Manipulate shuffle indexes for u16 size. */ +__mmask64 k_mask_odd_lanes = 0x; +/* Clear away ODD lane bytes. Cannot be done above due to no u8 shift. 
*/ +__m512i v_shuf_idx_evn = _mm512_mask_blend_epi8(k_mask_odd_lanes, +v_shuf_idxs, +_mm512_setzero_si512()); +v_shuf_idx_evn = _mm512_srli_epi16(v_shuf_idx_evn, 1); + +__m512i v_shuf_idx_odd = _mm512_srli_epi16(v_shuf_idxs, 9); + +/* Shuffle each half at 16-bit width. */ +__m512i v_shuf1 = _mm512_permutex2var_epi16(v_data_0, v_shuf_idx_evn, +v_data_1); +__m512i v_shuf2 = _mm512_permutex2var_epi16(v_data_0, v_shuf_idx_odd, +v_data_1); + +/* Find if the shuffle index was odd, via mask and compare. */ +uint16_t index_odd_mask = 0x1; +const __m512i v_index_mask_u16 = _mm512_set1_epi16(index_odd_mask); + +/* EVEN lanes, find if u8 index was odd, result as u16 bitmask. */ +__m512i
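`_mm512_maskz_permutex2var_epi8_skx()` emulates a byte-granular two-register permute on CPUs that have AVX512-BW but not VBMI, using 16-bit permutes. The indexing idea can be checked with a scalar C model: byte index b (0 to 127 across the two registers) lives in 16-bit word `b >> 1`, in its low byte if b is even and its high byte if b is odd. The helpers below are hypothetical and only model that trick; the patch itself performs two 16-bit permutes and then selects per byte depending on whether the index was odd.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Reference: byte-granular two-source permute, out[i] = src[idx[i] & 127],
 * where src is data_0 (bytes 0..63) followed by data_1 (bytes 64..127). */
static void
permute2_u8_ref(const uint8_t src[128], const uint8_t idx[64],
                uint8_t out[64])
{
    assert(src != NULL && idx != NULL && out != NULL);
    for (int i = 0; i < 64; i++) {
        out[i] = src[idx[i] & 127];
    }
}

/* Same result using only 16-bit lookups: fetch the word that holds the
 * wanted byte (index >> 1), then keep its low byte for an even index or
 * its high byte for an odd index. */
static void
permute2_u8_via_u16(const uint8_t src[128], const uint8_t idx[64],
                    uint8_t out[64])
{
    uint16_t words[64];

    assert(src != NULL && idx != NULL && out != NULL);
    for (int w = 0; w < 64; w++) {
        /* Little-endian word assembly, matching x86 lane layout. */
        words[w] = (uint16_t) (src[2 * w] | (src[2 * w + 1] << 8));
    }
    for (int i = 0; i < 64; i++) {
        uint8_t b = idx[i] & 127;
        uint16_t word = words[b >> 1];
        out[i] = (b & 1) ? (uint8_t) (word >> 8) : (uint8_t) (word & 0xFF);
    }
}
```

Comparing the two functions on arbitrary indexes is a cheap way to convince yourself the emulation is exact, which mirrors how the autovalidator compares the AVX512 paths against the scalar one.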
[ovs-dev] [v10 09/12] dpdk: add additional CPU ISA detection strings
From: Harry van Haaren This commit enables OVS to check at runtime for more detailed AVX512 capabilities, specifically the Byte and Word (BW) extensions and the Vector Bit Manipulation Instructions (VBMI). These instructions will be used in the CPU ISA optimized implementations of traffic-profile-aware miniflow extract. Signed-off-by: Harry van Haaren Acked-by: Eelco Chaudron Acked-by: Flavio Leitner --- NEWS | 1 + lib/dpdk.c | 2 ++ 2 files changed, 3 insertions(+) diff --git a/NEWS b/NEWS index 581bff225..26cd85978 100644 --- a/NEWS +++ b/NEWS @@ -40,6 +40,7 @@ Post-v2.15.0 traffic. * Add build time configure command to enable auto-validator as default miniflow implementation at build time. + * Cache results for CPU ISA checks, reduces overhead on repeated lookups. - ovs-ctl: * New option '--no-record-hostname' to disable hostname configuration in ovsdb on startup. diff --git a/lib/dpdk.c b/lib/dpdk.c index 9de2af58e..1b8f8e55b 100644 --- a/lib/dpdk.c +++ b/lib/dpdk.c @@ -706,6 +706,8 @@ dpdk_get_cpu_has_isa(const char *arch, const char *feature) #if __x86_64__ /* CPU flags only defined for the architecture that support it. */ CHECK_CPU_FEATURE(feature, "avx512f", RTE_CPUFLAG_AVX512F); +CHECK_CPU_FEATURE(feature, "avx512bw", RTE_CPUFLAG_AVX512BW); +CHECK_CPU_FEATURE(feature, "avx512vbmi", RTE_CPUFLAG_AVX512VBMI); CHECK_CPU_FEATURE(feature, "avx512vpopcntdq", RTE_CPUFLAG_AVX512VPOPCNTDQ); CHECK_CPU_FEATURE(feature, "bmi2", RTE_CPUFLAG_BMI2); #endif -- 2.25.1
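OVS maps these feature names to DPDK's `RTE_CPUFLAG_AVX512BW` and `RTE_CPUFLAG_AVX512VBMI` inside `dpdk_get_cpu_has_isa()`. For readers without DPDK, a comparable runtime check can be sketched with the GCC/Clang builtin `__builtin_cpu_supports()`. The caching loosely mirrors the NEWS line about caching ISA check results, but `cached_check()`, `cpu_has_avx512bw()` and `cpu_has_avx512vbmi()` are hypothetical helpers, not OVS code, and the cache as written is not thread-safe.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Cached probe results: -1 = not probed yet, 0 = absent, 1 = present.
 * Not thread-safe as written; concurrent callers would need atomics or a
 * one-time initialization. */
static int avx512bw_cache = -1;
static int avx512vbmi_cache = -1;

static bool
probe_avx512bw(void)
{
#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
    return __builtin_cpu_supports("avx512bw");
#else
    return false;       /* Not x86-64: the feature cannot be present. */
#endif
}

static bool
probe_avx512vbmi(void)
{
#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
    return __builtin_cpu_supports("avx512vbmi");
#else
    return false;
#endif
}

/* Run the probe once and reuse its answer on later calls. */
static bool
cached_check(int *cache, bool (*probe)(void))
{
    assert(cache != NULL && probe != NULL);
    if (*cache < 0) {
        *cache = probe() ? 1 : 0;
    }
    return *cache == 1;
}

static bool
cpu_has_avx512bw(void)
{
    return cached_check(&avx512bw_cache, probe_avx512bw);
}

static bool
cpu_has_avx512vbmi(void)
{
    return cached_check(&avx512vbmi_cache, probe_avx512vbmi);
}
```

The answer depends on the machine, so only its consistency across calls can be relied on, not a particular value.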
[ovs-dev] [v10 08/12] dpif/stats: add miniflow extract opt hits counter
From: Harry van Haaren This commit adds a new counter to be displayed to the user when requesting datapath packet statistics. It counts the number of packets that are parsed, and have a miniflow built from them, by the optimized miniflow extract parsers. The ovs-appctl command "dpif-netdev/pmd-perf-show" now has an extra entry indicating how many packets hit the optimized MFEX: - MFEX Opt hits:6786432 (100.0 %) Signed-off-by: Harry van Haaren Acked-by: Flavio Leitner --- v7: - fix review comments(Eelco) v5: - fix review comments(Ian, Flavio, Eelco) --- --- lib/dpif-netdev-avx512.c| 3 +++ lib/dpif-netdev-perf.c | 3 +++ lib/dpif-netdev-perf.h | 1 + lib/dpif-netdev-unixctl.man | 4 lib/dpif-netdev.c | 12 +++- tests/pmd.at| 6 -- 6 files changed, 22 insertions(+), 7 deletions(-) diff --git a/lib/dpif-netdev-avx512.c b/lib/dpif-netdev-avx512.c index 7772b7abf..544d36903 100644 --- a/lib/dpif-netdev-avx512.c +++ b/lib/dpif-netdev-avx512.c @@ -310,8 +310,11 @@ dp_netdev_input_outer_avx512(struct dp_netdev_pmd_thread *pmd, } /* At this point we don't return error anymore, so commit stats here. 
*/ +uint32_t mfex_hit_cnt = __builtin_popcountll(mf_mask); pmd_perf_update_counter(&pmd->perf_stats, PMD_STAT_RECV, batch_size); pmd_perf_update_counter(&pmd->perf_stats, PMD_STAT_PHWOL_HIT, phwol_hits); +pmd_perf_update_counter(&pmd->perf_stats, PMD_STAT_MFEX_OPT_HIT, +mfex_hit_cnt); pmd_perf_update_counter(&pmd->perf_stats, PMD_STAT_EXACT_HIT, emc_hits); pmd_perf_update_counter(&pmd->perf_stats, PMD_STAT_SMC_HIT, smc_hits); pmd_perf_update_counter(&pmd->perf_stats, PMD_STAT_MASKED_HIT, diff --git a/lib/dpif-netdev-perf.c b/lib/dpif-netdev-perf.c index 7103a2d4d..d7676ea2b 100644 --- a/lib/dpif-netdev-perf.c +++ b/lib/dpif-netdev-perf.c @@ -247,6 +247,7 @@ pmd_perf_format_overall_stats(struct ds *str, struct pmd_perf_stats *s, " Rx packets:%12"PRIu64" (%.0f Kpps, %.0f cycles/pkt)\n" " Datapath passes: %12"PRIu64" (%.2f passes/pkt)\n" " - PHWOL hits: %12"PRIu64" (%5.1f %%)\n" +" - MFEX Opt hits: %12"PRIu64" (%5.1f %%)\n" " - EMC hits:%12"PRIu64" (%5.1f %%)\n" " - SMC hits:%12"PRIu64" (%5.1f %%)\n" " - Megaflow hits: %12"PRIu64" (%5.1f %%, %.2f " @@ -258,6 +259,8 @@ pmd_perf_format_overall_stats(struct ds *str, struct pmd_perf_stats *s, passes, rx_packets ? 1.0 * passes / rx_packets : 0, stats[PMD_STAT_PHWOL_HIT], 100.0 * stats[PMD_STAT_PHWOL_HIT] / passes, +stats[PMD_STAT_MFEX_OPT_HIT], +100.0 * stats[PMD_STAT_MFEX_OPT_HIT] / passes, stats[PMD_STAT_EXACT_HIT], 100.0 * stats[PMD_STAT_EXACT_HIT] / passes, stats[PMD_STAT_SMC_HIT], diff --git a/lib/dpif-netdev-perf.h b/lib/dpif-netdev-perf.h index 8b1a52387..834c26260 100644 --- a/lib/dpif-netdev-perf.h +++ b/lib/dpif-netdev-perf.h @@ -57,6 +57,7 @@ extern "C" { enum pmd_stat_type { PMD_STAT_PHWOL_HIT, /* Packets that had a partial HWOL hit (phwol). */ +PMD_STAT_MFEX_OPT_HIT, /* Packets that had miniflow optimized match. */ PMD_STAT_EXACT_HIT, /* Packets that had an exact match (emc). */ PMD_STAT_SMC_HIT, /* Packets that had a sig match hit (SMC). */ PMD_STAT_MASKED_HIT,/* Packets that matched in the flow table. 
*/ diff --git a/lib/dpif-netdev-unixctl.man b/lib/dpif-netdev-unixctl.man index 83ce4f1c5..f34758416 100644 --- a/lib/dpif-netdev-unixctl.man +++ b/lib/dpif-netdev-unixctl.man @@ -16,6 +16,9 @@ packet lookups performed by the datapath. Beware that a recirculated packet experiences one additional lookup per recirculation, so there may be more lookups than forwarded packets in the datapath. +The MFEX Opt hits counter displays the number of packets that are processed +by the optimized miniflow extract implementations. + Cycles are counted using the TSC or similar facilities (when available on the platform). The duration of one cycle depends on the processing platform. @@ -136,6 +139,7 @@ pmd thread numa_id 0 core_id 1: Rx packets: 2399607 (2381 Kpps, 848 cycles/pkt) Datapath passes:3599415 (1.50 passes/pkt) - PHWOL hits: 0 ( 0.0 %) + - MFEX Opt hits:3570133 ( 99.2 %) - EMC hits: 336472 ( 9.3 %) - SMC hits: 0 ( 0.0 %) - Megaflow hits:3262943 ( 90.7 %, 1.00 subtbl lookups/hit) diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c index 1132a0ad5..cca211837 100644 --- a/lib/dpif-netdev.c +++ b/lib/dpif-netdev.c @@ -648,6 +648,7 @@ pmd_info_show_stats(struct ds *reply, " packet recirculations: %"PRIu64"\n" " avg. datapath passes per packet: %.02f\n" " phwol hits: %"PRIu64"\n" + " mfex opt hits:
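The arithmetic behind the new counter is small: the AVX512 DPIF counts hits per batch as the population count of the per-packet hit bitmask (`__builtin_popcountll(mf_mask)`), and `pmd-perf-show` prints each counter as a percentage of datapath passes. A sketch of both steps with hypothetical helper names; the zero-pass guard is a choice made in this sketch.

```c
#include <assert.h>
#include <stdint.h>

/* One bit per packet of a batch; a set bit means the optimized MFEX parsed
 * that packet, so the batch's hit count is the number of set bits. */
static uint32_t
batch_mfex_hits(uint64_t mf_mask)
{
    return (uint32_t) __builtin_popcountll(mf_mask);
}

/* Counter value as a percentage of datapath passes, as shown by
 * pmd-perf-show; returns 0 when there were no passes at all. */
static double
pct_of_passes(uint64_t hits, uint64_t passes)
{
    return passes ? 100.0 * (double) hits / (double) passes : 0.0;
}
```

Because the denominator is datapath passes rather than received packets, recirculated packets (which take extra passes) lower every percentage on that line, including MFEX Opt hits.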
[ovs-dev] [v10 07/12] test/system-dpdk: Add unit test for mfex autovalidator
Tests: 6: OVS-DPDK - MFEX Autovalidator 7: OVS-DPDK - MFEX Autovalidator Fuzzy This commit adds a new directory to store the PCAP file used in the tests, and a script to generate the fuzzy traffic pcap used in the fuzzy unit test. Signed-off-by: Kumar Amber Acked-by: Flavio Leitner --- v7: - fix review comments(Eelco) v5: - fix review comments(Ian, Flavio, Eelco) - remove sleep from first test and added minor 5 sec sleep to fuzzy --- --- Documentation/topics/dpdk/bridge.rst | 55 +++ tests/.gitignore | 1 + tests/automake.mk| 5 +++ tests/mfex_fuzzy.py | 31 +++ tests/pcap/mfex_test.pcap| Bin 0 -> 416 bytes tests/system-dpdk.at | 49 6 files changed, 141 insertions(+) create mode 100755 tests/mfex_fuzzy.py create mode 100644 tests/pcap/mfex_test.pcap diff --git a/Documentation/topics/dpdk/bridge.rst b/Documentation/topics/dpdk/bridge.rst index 662446401..7b81d0305 100644 --- a/Documentation/topics/dpdk/bridge.rst +++ b/Documentation/topics/dpdk/bridge.rst @@ -345,3 +345,58 @@ A compile time option is available in order to test it with the OVS unit test suite. Use the following configure option :: $ ./configure --enable-mfex-default-autovalidator + +Unit Test Miniflow Extract +++ + +The unit tests can also be used to test the workflow mentioned above by running +the following test-case in tests/system-dpdk.at :: + +make check-dpdk TESTSUITEFLAGS='-k MFEX' +OVS-DPDK - MFEX Autovalidator + +The unit test uses multiple traffic types to test the correctness of the +implementations. + +Running Fuzzy test with Autovalidator ++ + +Fuzzy tests can also be done on miniflow extract with the help of +auto-validator and Scapy. The steps below describe how to +reproduce the setup with IP fields being fuzzed to generate packets. 
+ +Scapy is used to create fuzzy IP packets and save them into a PCAP :: + +pkt = fuzz(Ether()/IP()/TCP()) + +Set the miniflow extract to autovalidator using :: + +$ ovs-appctl dpif-netdev/miniflow-parser-set autovalidator + +OVS is configured to receive the generated packets :: + +$ ovs-vsctl add-port br0 pcap0 -- \ +set Interface pcap0 type=dpdk \ +options:dpdk-devargs=net_pcap0,rx_pcap=fuzzy.pcap + +With this workflow, the autovalidator will ensure that all MFEX +implementations are classifying each packet in exactly the same way. +If an optimized MFEX implementation causes a different miniflow to be +generated, the autovalidator has ovs_assert and logging statements that +will report the issue. + +Unit Fuzzy test with Autovalidator ++ + +The prerequisite before running the unit test is to run the script provided :: + +tests/mfex_fuzzy.py + +This script generates a pcap with multiple types of fuzzed packets to be used +in the below unit test-case. + +The unit tests can also be used to test the workflow mentioned above by running +the following test-case in tests/system-dpdk.at :: + +make check-dpdk TESTSUITEFLAGS='-k MFEX' +OVS-DPDK - MFEX Autovalidator Fuzzy diff --git a/tests/.gitignore b/tests/.gitignore index 45b4f67b2..a3d927e5d 100644 --- a/tests/.gitignore +++ b/tests/.gitignore @@ -11,6 +11,7 @@ /ovsdb-cluster-testsuite /ovsdb-cluster-testsuite.dir/ /ovsdb-cluster-testsuite.log +/pcap/ /pki/ /system-afxdp-testsuite /system-afxdp-testsuite.dir/ diff --git a/tests/automake.mk b/tests/automake.mk index f45f8d76c..2bcf054b0 100644 --- a/tests/automake.mk +++ b/tests/automake.mk @@ -143,6 +143,11 @@ $(srcdir)/tests/fuzz-regression-list.at: tests/automake.mk echo "TEST_FUZZ_REGRESSION([$$basename])"; \ done > $@.tmp && mv $@.tmp $@ +EXTRA_DIST += $(MFEX_AUTOVALIDATOR_TESTS) +MFEX_AUTOVALIDATOR_TESTS = \ + tests/pcap/mfex_test.pcap \ + tests/mfex_fuzzy.py + OVSDB_CLUSTER_TESTSUITE_AT = \ tests/ovsdb-cluster-testsuite.at \ tests/ovsdb-execution.at \ diff --git 
a/tests/mfex_fuzzy.py b/tests/mfex_fuzzy.py new file mode 100755 index 0..395158b0d --- /dev/null +++ b/tests/mfex_fuzzy.py @@ -0,0 +1,31 @@ +#!/usr/bin/python3 +try: + from scapy.all import * +except ModuleNotFoundError as err: + print(str(err) + ": Scapy") +import sys + +path = str(sys.argv[1]) + "/pcap/fuzzy.pcap" +pktdump = PcapWriter(path, append=False, sync=True) + +for i in range(0, 2000): + + # Generate random protocol bases, use a fuzz() over the combined packet for full fuzzing. + eth = Ether(src=RandMAC(), dst=RandMAC()) + vlan = Dot1Q() + ipv4 = IP(src=RandIP(), dst=RandIP()) + ipv6 = IPv6(src=RandIP6(), dst=RandIP6()) + udp = UDP(dport=RandShort(), sport=RandShort()) + tcp = TCP(dport=RandShort(), sport=RandShort()) + + # IPv4 packets with fuzzing + pktdump.write(fuzz(eth/ipv4/udp)) + pktdump.write(fuzz(eth/ipv4/tcp)) +
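The autovalidator that these tests exercise runs every available implementation on the same input and checks each result against the scalar reference. A reduced C sketch of that comparison on toy Ethernet/IPv4 frames; `struct toy_fields`, `toy_parser`, the three parsers and `validate_all()` are hypothetical, and in this sketch a candidate's miss is simply skipped, since the datapath would fall back to the reference parser for that packet.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Toy parse result: the fields a parser extracts from an Ethernet frame. */
struct toy_fields { uint16_t eth_type; uint8_t ip_proto; };

/* Returns 1 if the parser handled the frame, 0 on a miss. */
typedef int (*toy_parser)(const uint8_t *f, size_t len,
                          struct toy_fields *out);

/* Reference parser: handles every frame (plays the scalar role). */
static int
ref_parse(const uint8_t *f, size_t len, struct toy_fields *out)
{
    memset(out, 0, sizeof *out);
    if (len >= 14) {
        out->eth_type = (uint16_t) ((f[12] << 8) | f[13]);
    }
    if (out->eth_type == 0x0800 && len >= 24) {
        out->ip_proto = f[23];          /* IPv4 header byte 9: protocol. */
    }
    return 1;
}

/* "Optimized" parser that only handles IPv4 frames. */
static int
ipv4_parse(const uint8_t *f, size_t len, struct toy_fields *out)
{
    memset(out, 0, sizeof *out);
    if (len < 24 || f[12] != 0x08 || f[13] != 0x00) {
        return 0;
    }
    out->eth_type = 0x0800;
    out->ip_proto = f[23];
    return 1;
}

/* Deliberately broken: reads the TTL (byte 22) instead of the protocol. */
static int
buggy_parse(const uint8_t *f, size_t len, struct toy_fields *out)
{
    int hit = ipv4_parse(f, len, out);
    if (hit) {
        out->ip_proto = f[22];
    }
    return hit;
}

/* Runs every candidate and compares each hit with the reference result.
 * Returns the index of the first mismatching candidate, or -1 if all
 * candidates agree. */
static int
validate_all(const toy_parser *cands, size_t n, const uint8_t *f, size_t len)
{
    struct toy_fields ref, got;

    assert(cands != NULL && f != NULL);
    ref_parse(f, len, &ref);
    for (size_t i = 0; i < n; i++) {
        if (cands[i](f, len, &got)
            && (got.eth_type != ref.eth_type
                || got.ip_proto != ref.ip_proto)) {
            return (int) i;
        }
    }
    return -1;
}
```

Fuzzed packets are useful here precisely because they reach header layouts that hand-written test packets rarely cover, so a parser like `buggy_parse` is flagged as soon as any frame exposes the difference.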
[ovs-dev] [v10 06/12] dpif-netdev: Add packet count and core id parameters for study
This commit introduces an additional command-line parameter for the MFEX study function. If the user provides a packet count, study uses it as the minimum number of packets that must be processed before choosing an implementation; otherwise a default value is chosen. It also introduces a parameter for choosing a particular PMD core. $ ovs-appctl dpif-netdev/miniflow-parser-set -pmd 3 study 500 Signed-off-by: Kumar Amber --- v10: - fix review comments Eelco v9: - fix review comments Flavio v7: - change the command parameters for core_id and study_pkt_cnt v5: - fix review comments(Ian, Flavio, Eelco) - introduce pmd core id parameter --- --- Documentation/topics/dpdk/bridge.rst | 37 +++- lib/dpif-netdev-extract-study.c | 25 +- lib/dpif-netdev-private-extract.h| 9 ++ lib/dpif-netdev.c| 128 +-- 4 files changed, 187 insertions(+), 12 deletions(-) diff --git a/Documentation/topics/dpdk/bridge.rst b/Documentation/topics/dpdk/bridge.rst index 0fa9341ac..662446401 100644 --- a/Documentation/topics/dpdk/bridge.rst +++ b/Documentation/topics/dpdk/bridge.rst @@ -284,12 +284,45 @@ command also shows whether the CPU supports each implementation :: An implementation can be selected manually by the following command :: -$ ovs-appctl dpif-netdev/miniflow-parser-set study +$ ovs-appctl dpif-netdev/miniflow-parser-set [-pmd core_id] [name] + [study_cnt] + +The above command has two optional parameters: core_id and study_cnt. +The core_id sets the miniflow extract function only for the PMD thread +running on that core. The study_cnt parameter, which is specific +to study and ignored by other implementations, sets how many packets +are needed to choose the best implementation. The user can also select the study implementation, which studies the traffic for a specific number of packets by applying all available implementations of miniflow extract and then chooses the one with the most optimal result for -that traffic pattern. 
+that traffic pattern.
The user can optionally provide a packet count +[study_cnt] parameter, which is the minimum number of packets that OVS must +study before choosing an optimal implementation. If no packet count is +provided, the default value of 128 is used. Also, as there is no +synchronization point between threads, one PMD thread might still be running +a previous round, and can now decide on earlier data. + +The packet count is a global value, and parallel study() executions with +differing packet counts will use the most recent count value provided by the user. + +Study can be selected with a packet count by the following command :: + +$ ovs-appctl dpif-netdev/miniflow-parser-set study 1024 + +Study can be selected with a packet count and explicit PMD selection +by the following command :: + +$ ovs-appctl dpif-netdev/miniflow-parser-set -pmd 3 study 1024 + +In the above command, the value following -pmd is the core ID of the PMD +thread. This option can also be used to explicitly set the miniflow +extraction function pointer on different PMD threads. + +Scalar can be selected on core 3 by the following command, where the +study count can be any arbitrary number or left blank:: + +$ ovs-appctl dpif-netdev/miniflow-parser-set -pmd 3 scalar Miniflow Extract Validation ~~~ diff --git a/lib/dpif-netdev-extract-study.c b/lib/dpif-netdev-extract-study.c index eddb35682..61260cb70 100644 --- a/lib/dpif-netdev-extract-study.c +++ b/lib/dpif-netdev-extract-study.c @@ -25,7 +25,7 @@ VLOG_DEFINE_THIS_MODULE(dpif_mfex_extract_study); -static atomic_uint32_t mfex_study_pkts_count = 0; +static atomic_uint32_t mfex_study_pkts_count = MFEX_MAX_PKT_COUNT; /* Struct to hold miniflow study stats. 
*/ struct study_stats { @@ -48,6 +48,27 @@ mfex_study_get_study_stats_ptr(void) return stats; } +int +mfex_set_study_pkt_cnt(uint32_t pkt_cmp_count, const char *name) +{ +struct dpif_miniflow_extract_impl *miniflow_funcs; +miniflow_funcs = dpif_mfex_impl_info_get(); + +/* If the packet count is set and implementation called is study then + * set packet counter to requested number else set the packet counter + * to default number. + */ +if ((strcmp(miniflow_funcs[MFEX_IMPL_STUDY].name, name) == 0) && +(pkt_cmp_count != 0)) { + +mfex_study_pkts_count = pkt_cmp_count; + +return 0; +} + +return -EINVAL; +} + uint32_t mfex_study_traffic(struct dp_packet_batch *packets, struct netdev_flow_key *keys, @@ -86,7 +107,7 @@ mfex_study_traffic(struct dp_packet_batch *packets, /* Choose the best implementation after a minimum packets have been * processed. */ -if (stats->pkt_count >= MFEX_MAX_PKT_COUNT) { +if (stats->pkt_count >= mfex_study_pkts_count) { uint32_t best_func_index = MFEX_IMPL_START_IDX; uint32_t max_hits = 0; for (int i = MFEX_IMPL_START_IDX; i
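The documented syntax `[-pmd core_id] name [study_cnt]` can be illustrated with a small argument parser. This is a hypothetical sketch, not the OVS unixctl handler: `struct toy_mfex_set_args`, `parse_u32()` and `parse_mfex_set_args()` are invented names, the sketch requires the implementation name, it ignores the count for implementations other than study (as the documentation describes), and it rejects a zero count for study, as `mfex_set_study_pkt_cnt()` in the patch also does.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define TOY_DEFAULT_STUDY_CNT 128   /* Default documented for study. */

struct toy_mfex_set_args {
    bool has_pmd;          /* True when "-pmd <core_id>" was given. */
    uint32_t core_id;
    const char *name;      /* Implementation name, e.g. "study". */
    uint32_t study_cnt;    /* Only used by "study". */
};

/* Strict decimal parse: rejects signs, trailing junk and overflow. */
static bool
parse_u32(const char *s, uint32_t *out)
{
    char *end;
    unsigned long v;

    if (s[0] < '0' || s[0] > '9') {
        return false;
    }
    errno = 0;
    v = strtoul(s, &end, 10);
    if (errno != 0 || *end != '\0' || v > UINT32_MAX) {
        return false;
    }
    *out = (uint32_t) v;
    return true;
}

/* Parses the arguments that follow the command name. Returns 0 on success
 * or -EINVAL on malformed input. */
static int
parse_mfex_set_args(int argc, const char *argv[],
                    struct toy_mfex_set_args *a)
{
    int i = 0;
    uint32_t cnt;

    assert(a != NULL && argc >= 0);
    memset(a, 0, sizeof *a);
    a->study_cnt = TOY_DEFAULT_STUDY_CNT;

    if (i < argc && strcmp(argv[i], "-pmd") == 0) {
        if (i + 1 >= argc || !parse_u32(argv[i + 1], &a->core_id)) {
            return -EINVAL;
        }
        a->has_pmd = true;
        i += 2;
    }
    if (i >= argc) {
        return -EINVAL;                 /* This sketch requires a name. */
    }
    a->name = argv[i++];
    if (i < argc) {
        if (!parse_u32(argv[i++], &cnt)) {
            return -EINVAL;
        }
        if (strcmp(a->name, "study") == 0) {
            if (cnt == 0) {
                return -EINVAL;         /* Study needs a positive count. */
            }
            a->study_cnt = cnt;
        }                               /* Other implementations ignore it. */
    }
    return i == argc ? 0 : -EINVAL;     /* Reject extra arguments. */
}
```

For example, the arguments `-pmd 3 study 500` yield core 3 with a study count of 500, while `-pmd 3 scalar 77` selects scalar on core 3 and leaves the count at its default.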
[ovs-dev] [v10 05/12] dpif-netdev: Add configure to enable autovalidator at build time.
This commit adds a new configure option that allows the user to enable the autovalidator by default at build time, so that the unit tests run with it by default. $ ./configure --enable-mfex-default-autovalidator Signed-off-by: Kumar Amber Co-authored-by: Harry van Haaren Signed-off-by: Harry van Haaren --- v10: - rework default set v9: - fix review comments Flavio v7: - fix review comments(Eelco, Flavio) v5: - fix review comments(Ian, Flavio, Eelco) --- --- Documentation/topics/dpdk/bridge.rst | 5 + NEWS | 3 ++- acinclude.m4 | 16 configure.ac | 1 + lib/dpif-netdev-private-extract.c| 4 5 files changed, 28 insertions(+), 1 deletion(-) diff --git a/Documentation/topics/dpdk/bridge.rst b/Documentation/topics/dpdk/bridge.rst index 6f37f2a75..0fa9341ac 100644 --- a/Documentation/topics/dpdk/bridge.rst +++ b/Documentation/topics/dpdk/bridge.rst @@ -307,3 +307,8 @@ implementations provide the same results. To set the Miniflow autovalidator, use this command :: $ ovs-appctl dpif-netdev/miniflow-parser-set autovalidator + +A compile time option is available in order to test it with the OVS unit +test suite. Use the following configure option :: + +$ ./configure --enable-mfex-default-autovalidator diff --git a/NEWS b/NEWS index 4a7b89409..581bff225 100644 --- a/NEWS +++ b/NEWS @@ -38,6 +38,8 @@ Post-v2.15.0 * Add study function to miniflow function table which studies packet and automatically chooses the best miniflow implementation for that traffic. + * Add build time configure command to enable auto-validator as default + miniflow implementation at build time. - ovs-ctl: * New option '--no-record-hostname' to disable hostname configuration in ovsdb on startup. @@ -57,7 +59,6 @@ Post-v2.15.0 whether the SNAT with all-zero IP address is supported. See ovs-vswitchd.conf.db(5) for details. 
- v2.15.0 - 15 Feb 2021 - - OVSDB: diff --git a/acinclude.m4 b/acinclude.m4 index 343303447..5a48f0335 100644 --- a/acinclude.m4 +++ b/acinclude.m4 @@ -14,6 +14,22 @@ # See the License for the specific language governing permissions and # limitations under the License. +dnl Set OVS MFEX Autovalidator as default miniflow extract at compile time? +dnl This enables automatically running all unit tests with all MFEX +dnl implementations. +AC_DEFUN([OVS_CHECK_MFEX_AUTOVALIDATOR], [ + AC_ARG_ENABLE([mfex-default-autovalidator], +[AC_HELP_STRING([--enable-mfex-default-autovalidator], [Enable MFEX autovalidator as default miniflow_extract implementation.])], +[autovalidator=yes],[autovalidator=no]) + AC_MSG_CHECKING([whether MFEX Autovalidator is default implementation]) + if test "$autovalidator" != yes; then +AC_MSG_RESULT([no]) + else +OVS_CFLAGS="$OVS_CFLAGS -DMFEX_AUTOVALIDATOR_DEFAULT" +AC_MSG_RESULT([yes]) + fi +]) + dnl Set OVS DPCLS Autovalidator as default subtable search at compile time? dnl This enables automatically running all unit tests with all DPCLS dnl implementations. diff --git a/configure.ac b/configure.ac index e45685a6c..46c402892 100644 --- a/configure.ac +++ b/configure.ac @@ -186,6 +186,7 @@ OVS_ENABLE_SPARSE OVS_CTAGS_IDENTIFIERS OVS_CHECK_DPCLS_AUTOVALIDATOR OVS_CHECK_DPIF_AVX512_DEFAULT +OVS_CHECK_MFEX_AUTOVALIDATOR OVS_CHECK_BINUTILS_AVX512 AC_ARG_VAR(KARCH, [Kernel Architecture String]) diff --git a/lib/dpif-netdev-private-extract.c b/lib/dpif-netdev-private-extract.c index 64745f66c..f007a7a80 100644 --- a/lib/dpif-netdev-private-extract.c +++ b/lib/dpif-netdev-private-extract.c @@ -60,7 +60,11 @@ void dpif_miniflow_extract_init(void) { atomic_uintptr_t *mfex_func = (void *)_mfex_func; +#ifdef MFEX_AUTOVALIDATOR_DEFAULT +int mfex_idx = MFEX_IMPL_AUTOVALIDATOR; +#else int mfex_idx = MFEX_IMPL_SCALAR; +#endif /* Call probe on each impl, and cache the result. 
*/ for (int i = 0; i < MFEX_IMPL_MAX; i++) { -- 2.25.1
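With the new configure option, the only change in the C code is which implementation index the build selects as the default. A reduced sketch of that pattern; the enum values are hypothetical stand-ins for OVS's `MFEX_IMPL_*` indices, while `MFEX_AUTOVALIDATOR_DEFAULT` is the macro that the patch's `acinclude.m4` adds to `OVS_CFLAGS`.

```c
#include <assert.h>

/* Hypothetical stand-ins for the MFEX_IMPL_* indices. */
enum toy_mfex_impl {
    TOY_IMPL_AUTOVALIDATOR,
    TOY_IMPL_SCALAR,
    TOY_IMPL_STUDY,
    TOY_IMPL_MAX
};

/* Build-time default: configuring with
 * --enable-mfex-default-autovalidator adds -DMFEX_AUTOVALIDATOR_DEFAULT. */
static enum toy_mfex_impl
toy_default_mfex_impl(void)
{
#ifdef MFEX_AUTOVALIDATOR_DEFAULT
    return TOY_IMPL_AUTOVALIDATOR;
#else
    return TOY_IMPL_SCALAR;
#endif
}
```

Compiling the same file with `-DMFEX_AUTOVALIDATOR_DEFAULT` flips the returned default, which is what lets the existing unit tests run every packet through all MFEX implementations without modification.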
[ovs-dev] [v10 04/12] docs/dpdk/bridge: add miniflow extract section.
This commit adds a section to the dpdk/bridge.rst netdev documentation, detailing the added miniflow functionality. The newly added commands are documented, and sample output is provided. The use of the auto-validator and the special study function, as well as running fuzzy tests, is also described in detail. Signed-off-by: Kumar Amber Co-authored-by: Cian Ferriter Signed-off-by: Cian Ferriter Co-authored-by: Harry van Haaren Signed-off-by: Harry van Haaren Acked-by: Flavio Leitner --- v10: - fix minor typos. v7: - fix review comments(Eelco) v5: - fix review comments(Ian, Flavio, Eelco) --- --- Documentation/topics/dpdk/bridge.rst | 51 1 file changed, 51 insertions(+) diff --git a/Documentation/topics/dpdk/bridge.rst b/Documentation/topics/dpdk/bridge.rst index 2d0850836..6f37f2a75 100644 --- a/Documentation/topics/dpdk/bridge.rst +++ b/Documentation/topics/dpdk/bridge.rst @@ -256,3 +256,54 @@ The following line should be seen in the configure output when the above option is used :: checking whether DPIF AVX512 is default implementation... yes + +Miniflow Extract + + +Miniflow extract (MFEX) performs parsing of the raw packets and extracts the +important header information into a compressed miniflow. This miniflow is +composed of bits and blocks, where the bits signify which blocks are set or +have values, whereas the blocks hold the metadata, IP, UDP, VLAN, etc. These +values are used by the datapath for switching decisions later. The optimized +miniflow extract implementations are traffic-specific to speed up the lookup, +whereas the scalar implementation works for all traffic patterns. + +Most modern CPUs have SIMD capabilities. These SIMD instructions are able +to process a vector rather than act on a single data element. OVS provides multiple +implementations of miniflow extract. This allows the user to take advantage +of SIMD instructions like AVX512 to gain additional performance. + +A list of implementations can be obtained by the following command. 
The +command also shows whether the CPU supports each implementation :: + +$ ovs-appctl dpif-netdev/miniflow-parser-get +Available Optimized Miniflow Extracts: +autovalidator (available: True, pmds: none) +scalar (available: True, pmds: 1,15) +study (available: True, pmds: none) + +An implementation can be selected manually by the following command :: + +$ ovs-appctl dpif-netdev/miniflow-parser-set study + +The user can also select the study implementation, which studies the traffic for +a specific number of packets by applying all available implementations of +miniflow extract and then chooses the one with the best result for +that traffic pattern. + +Miniflow Extract Validation +~~~ + +As multiple versions of miniflow extract can co-exist, each with different +CPU ISA optimizations, it is important to validate that they all give the +exact same results. To easily test all miniflow implementations, an +``autovalidator`` implementation of the miniflow exists. This implementation +runs all other available miniflow extract implementations, and verifies that +the results are identical. + +Running the OVS unit tests with the autovalidator enabled ensures all +implementations provide the same results. + +To set the Miniflow autovalidator, use this command :: + +$ ovs-appctl dpif-netdev/miniflow-parser-set autovalidator -- 2.25.1
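The bits-and-blocks layout described in the bridge.rst text can be illustrated with a toy structure. This is a minimal sketch under simplifying assumptions: `struct toy_miniflow`, `toy_mf_push()` and `toy_mf_get()` are hypothetical names, and a single 64-bit map stands in for OVS's real multi-unit flowmap. It shows why a present block is located by counting the set map bits below its index, and why absent blocks cost no storage.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified miniflow: bit i of 'map' says whether block i
 * is present; present blocks are packed, in index order, into 'values'.
 * Absent blocks are implicitly zero and take no storage. */
#define TOY_MF_MAX_BLOCKS 8

struct toy_miniflow {
    uint64_t map;
    uint64_t values[TOY_MF_MAX_BLOCKS];
};

/* Appends block 'idx'; blocks must be pushed in ascending index order. */
static void
toy_mf_push(struct toy_miniflow *mf, unsigned int idx, uint64_t value)
{
    unsigned int n = (unsigned int) __builtin_popcountll(mf->map);

    assert(idx < 64 && n < TOY_MF_MAX_BLOCKS);
    assert((mf->map >> idx) == 0);   /* No block at or above 'idx' yet. */
    mf->values[n] = value;
    mf->map |= UINT64_C(1) << idx;
}

/* Looks up block 'idx'; returns false (and a zero value) if absent. */
static bool
toy_mf_get(const struct toy_miniflow *mf, unsigned int idx, uint64_t *value)
{
    uint64_t bit;

    assert(idx < 64);
    bit = UINT64_C(1) << idx;
    if (!(mf->map & bit)) {
        *value = 0;
        return false;
    }
    /* Slot in 'values' = number of present blocks with a lower index. */
    *value = mf->values[__builtin_popcountll(mf->map & (bit - 1))];
    return true;
}
```

In OVS the miniflow is embedded in `struct netdev_flow_key` (the `key->mf` seen in the patches); the sketch only mirrors the indexing idea, not the real data layout.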
[ovs-dev] [v10 03/12] dpif-netdev: Add study function to select the best mfex function
The study function runs all the available implementations of miniflow_extract, chooses the one whose hitmask has the most hits, and sets the mfex to that function. Study can be run at runtime using the following command: $ ovs-appctl dpif-netdev/miniflow-parser-set study Signed-off-by: Kumar Amber Co-authored-by: Harry van Haaren Signed-off-by: Harry van Haaren Acked-by: Eelco Chaudron --- v10: - fix minor comments from Eelco v9: - fix comments Flavio v8: - fix review comments Flavio v7: - fix review comments(Eelco) v5: - fix review comments(Ian, Flavio, Eelco) - add Atomic set in study --- --- NEWS | 3 + lib/automake.mk | 1 + lib/dpif-netdev-extract-study.c | 136 ++ lib/dpif-netdev-private-extract.c | 12 +++ lib/dpif-netdev-private-extract.h | 19 + 5 files changed, 171 insertions(+) create mode 100644 lib/dpif-netdev-extract-study.c diff --git a/NEWS b/NEWS index cf254bcfe..4a7b89409 100644 --- a/NEWS +++ b/NEWS @@ -35,6 +35,9 @@ Post-v2.15.0 * Add command line option to switch between MFEX function pointers. * Add miniflow extract auto-validator function to compare different miniflow extract implementations against default implementation. + * Add study function to miniflow function table which studies packets + and automatically chooses the best miniflow implementation for that + traffic. - ovs-ctl: * New option '--no-record-hostname' to disable hostname configuration in ovsdb on startup. diff --git a/lib/automake.mk b/lib/automake.mk index 53b8abc0f..f4f36325e 100644 --- a/lib/automake.mk +++ b/lib/automake.mk @@ -107,6 +107,7 @@ lib_libopenvswitch_la_SOURCES = \ lib/dp-packet.h \ lib/dp-packet.c \ lib/dpdk.h \ + lib/dpif-netdev-extract-study.c \ lib/dpif-netdev-lookup.h \ lib/dpif-netdev-lookup.c \ lib/dpif-netdev-lookup-autovalidator.c \ diff --git a/lib/dpif-netdev-extract-study.c b/lib/dpif-netdev-extract-study.c new file mode 100644 index 0..eddb35682 --- /dev/null +++ b/lib/dpif-netdev-extract-study.c @@ -0,0 +1,136 @@ +/* + * Copyright (c) 2021 Intel.
+ * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at: + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +#include +#include +#include +#include + +#include "dpif-netdev-private-thread.h" +#include "openvswitch/vlog.h" +#include "ovs-thread.h" + +VLOG_DEFINE_THIS_MODULE(dpif_mfex_extract_study); + +static atomic_uint32_t mfex_study_pkts_count = 0; + +/* Struct to hold miniflow study stats. */ +struct study_stats { +uint32_t pkt_count; +uint32_t impl_hitcount[MFEX_IMPL_MAX]; +}; + +/* Define per thread data to hold the study stats. */ +DEFINE_PER_THREAD_MALLOCED_DATA(struct study_stats *, study_stats); + +/* Allocate per thread PMD pointer space for study_stats. 
*/ +static inline struct study_stats * +mfex_study_get_study_stats_ptr(void) +{ +struct study_stats *stats = study_stats_get(); +if (OVS_UNLIKELY(!stats)) { + stats = xzalloc(sizeof *stats); + study_stats_set_unsafe(stats); +} +return stats; +} + +uint32_t +mfex_study_traffic(struct dp_packet_batch *packets, + struct netdev_flow_key *keys, + uint32_t keys_size, odp_port_t in_port, + struct dp_netdev_pmd_thread *pmd_handle) +{ +uint32_t hitmask = 0; +uint32_t mask = 0; +struct dp_netdev_pmd_thread *pmd = pmd_handle; +struct dpif_miniflow_extract_impl *miniflow_funcs; +struct study_stats *stats = mfex_study_get_study_stats_ptr(); +miniflow_funcs = dpif_mfex_impl_info_get(); + +/* Run traffic optimized miniflow_extract to collect the hitmask + * to be compared after certain packets have been hit to choose + * the best miniflow_extract version for that traffic. + */ +for (int i = MFEX_IMPL_START_IDX; i < MFEX_IMPL_MAX; i++) { +if (!miniflow_funcs[i].available) { +continue; +} + +hitmask = miniflow_funcs[i].extract_func(packets, keys, keys_size, + in_port, pmd_handle); +stats->impl_hitcount[i] += count_1bits(hitmask); + +/* If traffic is not classified then we don't overwrite the keys + * array in miniflow implementations, so it's safe to create a + * mask for all those packets whose miniflows have been created. + */ +mask |= hitmask; +} + +
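The selection rule described in the study commit message (accumulate each implementation's hit count, then pick the one with the most hits) can be sketched in isolation. This is a simplified illustration, not the code from `dpif-netdev-extract-study.c`: the `toy_*` names, the fixed `TOY_IMPL_MAX` table size, using `__builtin_popcount()` in place of OVS's `count_1bits()`, and returning -1 when nothing hit are all assumptions of the sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_IMPL_MAX 8   /* Hypothetical stand-in for MFEX_IMPL_MAX. */

struct toy_study_stats {
    uint32_t pkt_count;
    uint32_t impl_hitcount[TOY_IMPL_MAX];
};

/* Accounts one batch: hitmasks[i] is what implementation i returned. */
static void
toy_study_account(struct toy_study_stats *s, const uint32_t *hitmasks,
                  const bool *available, int n_impls, uint32_t batch_size)
{
    assert(n_impls <= TOY_IMPL_MAX);
    for (int i = 0; i < n_impls; i++) {
        if (available[i]) {
            s->impl_hitcount[i] += (uint32_t) __builtin_popcount(hitmasks[i]);
        }
    }
    s->pkt_count += batch_size;
}

/* Returns the implementation with the most hits, or -1 if none hit. */
static int
toy_study_pick_best(const struct toy_study_stats *s, int n_impls)
{
    int best = -1;
    uint32_t best_hits = 0;

    for (int i = 0; i < n_impls; i++) {
        if (s->impl_hitcount[i] > best_hits) {
            best_hits = s->impl_hitcount[i];
            best = i;
        }
    }
    return best;
}
```

Ties keep the lower index here, since only a strictly larger count replaces the current best; the real study code may break ties differently.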
[ovs-dev] [v10 02/12] dpif-netdev: Add auto validation function for miniflow extract
This patch introduces the auto-validation function, which allows users to compare the batch of packets obtained from different miniflow implementations against the linear miniflow extract and returns a hitmask. The autovalidator function can be triggered at runtime using the following command: $ ovs-appctl dpif-netdev/miniflow-parser-set autovalidator Signed-off-by: Kumar Amber Co-authored-by: Harry van Haaren Signed-off-by: Harry van Haaren --- v9: - fix review comments Flavio v6: -fix review comments(Eelco) v5: - fix review comments(Ian, Flavio, Eelco) - remove ovs assert and switch to default after a batch of packets is processed - Atomic set and get introduced - fix raw_ctz for windows build --- --- NEWS | 2 + lib/dpif-netdev-private-extract.c | 150 ++ lib/dpif-netdev-private-extract.h | 22 + lib/dpif-netdev.c | 2 +- 4 files changed, 175 insertions(+), 1 deletion(-) diff --git a/NEWS b/NEWS index b0f08e96d..cf254bcfe 100644 --- a/NEWS +++ b/NEWS @@ -33,6 +33,8 @@ Post-v2.15.0 CPU supports it. This enhances performance by using the native vpopcount instructions, instead of the emulated version of vpopcount. * Add command line option to switch between MFEX function pointers. + * Add miniflow extract auto-validator function to compare different + miniflow extract implementations against default implementation. - ovs-ctl: * New option '--no-record-hostname' to disable hostname configuration in ovsdb on startup.
diff --git a/lib/dpif-netdev-private-extract.c b/lib/dpif-netdev-private-extract.c index 11d2ed2ec..6c5afd13d 100644 --- a/lib/dpif-netdev-private-extract.c +++ b/lib/dpif-netdev-private-extract.c @@ -38,6 +38,11 @@ static ATOMIC(miniflow_extract_func) default_mfex_func = NULL; */ static struct dpif_miniflow_extract_impl mfex_impls[] = { +[MFEX_IMPL_AUTOVALIDATOR] = { +.probe = NULL, +.extract_func = dpif_miniflow_extract_autovalidator, +.name = "autovalidator", }, + [MFEX_IMPL_SCALAR] = { .probe = NULL, .extract_func = NULL, @@ -155,3 +160,148 @@ dp_mfex_impl_get_by_name(const char *name, miniflow_extract_func *out_func) return -ENOENT; } + +uint32_t +dpif_miniflow_extract_autovalidator(struct dp_packet_batch *packets, +struct netdev_flow_key *keys, +uint32_t keys_size, odp_port_t in_port, +struct dp_netdev_pmd_thread *pmd_handle) +{ +const size_t cnt = dp_packet_batch_size(packets); +uint16_t good_l2_5_ofs[NETDEV_MAX_BURST]; +uint16_t good_l3_ofs[NETDEV_MAX_BURST]; +uint16_t good_l4_ofs[NETDEV_MAX_BURST]; +uint16_t good_l2_pad_size[NETDEV_MAX_BURST]; +struct dp_packet *packet; +struct dp_netdev_pmd_thread *pmd = pmd_handle; +struct netdev_flow_key test_keys[NETDEV_MAX_BURST]; + +if (keys_size < cnt) { +miniflow_extract_func default_func = NULL; +atomic_uintptr_t *pmd_func = (void *) &pmd->miniflow_extract_opt; +atomic_store_relaxed(pmd_func, (uintptr_t) default_func); +VLOG_ERR("Invalid key size supplied, Key_size: %d less than " + "batch_size: %" PRIuSIZE"\n", keys_size, cnt); +VLOG_ERR("Autovalidator is disabled.\n"); +return 0; +} + +/* Run scalar miniflow_extract to get default result. */ +DP_PACKET_BATCH_FOR_EACH (i, packet, packets) { +pkt_metadata_init(&packet->md, in_port); +miniflow_extract(packet, &keys[i].mf); + +/* Store known good metadata to compare with optimized metadata.
*/ +good_l2_5_ofs[i] = packet->l2_5_ofs; +good_l3_ofs[i] = packet->l3_ofs; +good_l4_ofs[i] = packet->l4_ofs; +good_l2_pad_size[i] = packet->l2_pad_size; +} + +uint32_t batch_failed = 0; +/* Iterate through each version of miniflow implementations. */ +for (int j = MFEX_IMPL_START_IDX; j < MFEX_IMPL_MAX; j++) { +if (!mfex_impls[j].available) { +continue; +} +/* Reset keys and offsets before each implementation. */ +memset(test_keys, 0, keys_size * sizeof(struct netdev_flow_key)); +DP_PACKET_BATCH_FOR_EACH (i, packet, packets) { +dp_packet_reset_offsets(packet); +} +/* Call optimized miniflow for each batch of packet. */ +uint32_t hit_mask = mfex_impls[j].extract_func(packets, test_keys, + keys_size, in_port, + pmd_handle); + +/* Do a miniflow compare for bits, blocks and offsets for all the + * classified packets in the hitmask marked by set bits. */ +while (hit_mask) { +/* Index for the set bit. */ +uint32_t i = raw_ctz(hit_mask); +/* Set the index in hitmask to Zero. */ +
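The autovalidator only compares packets whose bit is set in an implementation's hitmask, walking the set bits with `raw_ctz()` and clearing each one in turn. A minimal sketch of that loop, assuming a hypothetical `struct toy_result` that holds just two offsets (the real code also compares miniflow bits and blocks) and using the GCC/Clang builtin `__builtin_ctz()` in place of OVS's `raw_ctz()`:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical per-packet result holding two of the offsets the real
 * autovalidator checks; it has no padding, so memcmp() is safe here. */
struct toy_result {
    uint16_t l3_ofs;
    uint16_t l4_ofs;
};

/* Returns a bitmask of packets whose optimized result differs from the
 * scalar ("good") result. Packets missing from 'hit_mask' were not
 * classified by the optimized implementation and are skipped. */
static uint32_t
toy_validate(const struct toy_result *good, const struct toy_result *opt,
             uint32_t hit_mask)
{
    uint32_t failed = 0;

    while (hit_mask) {
        uint32_t i = (uint32_t) __builtin_ctz(hit_mask); /* Lowest set bit. */

        hit_mask &= hit_mask - 1;                        /* Clear it. */
        if (memcmp(&good[i], &opt[i], sizeof good[i])) {
            failed |= UINT32_C(1) << i;
        }
    }
    return failed;
}
```

`hit_mask &= hit_mask - 1` clears the lowest set bit in one step, so the loop runs once per classified packet rather than once per batch slot.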
[ovs-dev] [v10 01/12] dpif-netdev: Add command line and function pointer for miniflow extract
This patch introduces the MFEX function pointers, which allow the user to switch between different miniflow extract implementations provided by OVS, based on the optimized ISA of the CPU. The user can query the miniflow extract variants available for that CPU with the following command: $ ovs-appctl dpif-netdev/miniflow-parser-get Similarly, a user can set the miniflow implementation with the following command: $ ovs-appctl dpif-netdev/miniflow-parser-set name This gives the user more performance and the flexibility to choose the miniflow implementation according to their needs. Signed-off-by: Kumar Amber Co-authored-by: Harry van Haaren Signed-off-by: Harry van Haaren --- v10: - fix build errors - rework default set and atomic global variable v9: - fix review comments from Flavio v7: - fix review comments(Eelco, Flavio) v5: - fix review comments(Ian, Flavio, Eelco) - add enum to hold mfex indexes - add new get and set implementations - add Atomic set and get --- --- NEWS | 1 + lib/automake.mk | 2 + lib/dpif-netdev-avx512.c | 31 +- lib/dpif-netdev-private-extract.c | 157 ++ lib/dpif-netdev-private-extract.h | 113 + lib/dpif-netdev-private-thread.h | 8 ++ lib/dpif-netdev.c | 108 +++- 7 files changed, 415 insertions(+), 5 deletions(-) create mode 100644 lib/dpif-netdev-private-extract.c create mode 100644 lib/dpif-netdev-private-extract.h diff --git a/NEWS b/NEWS index 6cdccc715..b0f08e96d 100644 --- a/NEWS +++ b/NEWS @@ -32,6 +32,7 @@ Post-v2.15.0 * Enable the AVX512 DPCLS implementation to use VPOPCNT instruction if the CPU supports it. This enhances performance by using the native vpopcount instructions, instead of the emulated version of vpopcount. + * Add command line option to switch between MFEX function pointers. - ovs-ctl: * New option '--no-record-hostname' to disable hostname configuration in ovsdb on startup.
diff --git a/lib/automake.mk b/lib/automake.mk index 3c9523c1a..53b8abc0f 100644 --- a/lib/automake.mk +++ b/lib/automake.mk @@ -118,6 +118,8 @@ lib_libopenvswitch_la_SOURCES = \ lib/dpif-netdev-private-dpcls.h \ lib/dpif-netdev-private-dpif.c \ lib/dpif-netdev-private-dpif.h \ + lib/dpif-netdev-private-extract.c \ + lib/dpif-netdev-private-extract.h \ lib/dpif-netdev-private-flow.h \ lib/dpif-netdev-private-thread.h \ lib/dpif-netdev-private.h \ diff --git a/lib/dpif-netdev-avx512.c b/lib/dpif-netdev-avx512.c index 6f9aa8284..7772b7abf 100644 --- a/lib/dpif-netdev-avx512.c +++ b/lib/dpif-netdev-avx512.c @@ -149,6 +149,15 @@ dp_netdev_input_outer_avx512(struct dp_netdev_pmd_thread *pmd, * // do all processing (HWOL->MFEX->EMC->SMC) * } */ + +/* Do a batch minfilow extract into keys. */ +uint32_t mf_mask = 0; +miniflow_extract_func mfex_func; +atomic_read_relaxed(>miniflow_extract_opt, _func); +if (mfex_func) { +mf_mask = mfex_func(packets, keys, batch_size, in_port, pmd); +} + uint32_t lookup_pkts_bitmask = (1ULL << batch_size) - 1; uint32_t iter = lookup_pkts_bitmask; while (iter) { @@ -167,6 +176,13 @@ dp_netdev_input_outer_avx512(struct dp_netdev_pmd_thread *pmd, pkt_metadata_init(>md, in_port); struct dp_netdev_flow *f = NULL; +struct netdev_flow_key *key = [i]; + +/* Check the minfiflow mask to see if the packet was correctly + * classifed by vector mfex else do a scalar miniflow extract + * for that packet. + */ +bool mfex_hit = !!(mf_mask & (1 << i)); /* Check for a partial hardware offload match. */ if (hwol_enabled) { @@ -177,7 +193,13 @@ dp_netdev_input_outer_avx512(struct dp_netdev_pmd_thread *pmd, } if (f) { rules[i] = >cr; -pkt_meta[i].tcp_flags = parse_tcp_flags(packet); +/* If AVX512 MFEX already classified the packet, use it. 
*/ +if (mfex_hit) { +pkt_meta[i].tcp_flags = miniflow_get_tcp_flags(&key->mf); +} else { +pkt_meta[i].tcp_flags = parse_tcp_flags(packet); +} + pkt_meta[i].bytes = dp_packet_size(packet); phwol_hits++; hwol_emc_smc_hitmask |= (1 << i); @@ -185,9 +207,10 @@ dp_netdev_input_outer_avx512(struct dp_netdev_pmd_thread *pmd, } } -/* Do miniflow extract into keys. */ -struct netdev_flow_key *key = &keys[i]; -miniflow_extract(packet, &key->mf); +if (!mfex_hit) { +/* Do a scalar miniflow extract into keys. */ +miniflow_extract(packet, &key->mf); +} /* Cache TCP and byte values for all packets. */ pkt_meta[i].bytes =
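In the AVX512 DPIF change, the optimized parser runs once per batch and returns `mf_mask`; each packet whose bit is clear falls back to scalar `miniflow_extract()`. A self-contained sketch of that control flow, with hypothetical `toy_*` functions and packets modelled as plain `int` values. It assumes batches of at most 32 packets, matching OVS's `NETDEV_MAX_BURST` of 32, so a `uint32_t` mask suffices.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for miniflow_extract_func: returns a bitmask of
 * the packets it parsed into 'keys'. */
typedef uint32_t (*toy_batch_extract_fn)(const int *pkts, int *keys,
                                         size_t n);

/* Scalar path: handles every packet type. */
static void
toy_scalar_extract(int pkt, int *key)
{
    *key = pkt;
}

/* Toy "optimized" parser that only recognizes even-valued packets,
 * mimicking a traffic-profile-specific implementation. */
static uint32_t
toy_even_only_extract(const int *pkts, int *keys, size_t n)
{
    uint32_t hits = 0;

    for (size_t i = 0; i < n; i++) {
        if (pkts[i] % 2 == 0) {
            keys[i] = pkts[i];
            hits |= UINT32_C(1) << i;
        }
    }
    return hits;
}

/* Runs the optimized parser (if any) on the whole batch, then falls back
 * to the scalar parser per packet. Returns the optimized hit count. */
static size_t
toy_process_batch(toy_batch_extract_fn opt, const int *pkts, int *keys,
                  size_t n)
{
    uint32_t mf_mask = 0;

    assert(n <= 32);
    if (opt) {
        mf_mask = opt(pkts, keys, n);
    }
    for (size_t i = 0; i < n; i++) {
        if (!(mf_mask & (UINT32_C(1) << i))) {
            toy_scalar_extract(pkts[i], &keys[i]);
        }
    }
    return (size_t) __builtin_popcount(mf_mask);
}
```

Because the autovalidator checks that both paths build identical keys, every key ends up the same whichever path filled it; only the hit count differs.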
[ovs-dev] [109 12/12] dpif-netdev: add mfex options to scalar dpif
This commit adds the optimized MFEX options to be executed as part of the scalar DPIF. Signed-off-by: kumar Amber Acked-by: Flavio Leitner --- lib/dpif-netdev.c | 23 ++- 1 file changed, 22 insertions(+), 1 deletion(-) diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c index cca211837..14c98e450 100644 --- a/lib/dpif-netdev.c +++ b/lib/dpif-netdev.c @@ -7029,6 +7029,7 @@ dfc_processing(struct dp_netdev_pmd_thread *pmd, size_t n_missed = 0, n_emc_hit = 0, n_phwol_hit = 0, n_mfex_opt_hit = 0; struct dfc_cache *cache = &pmd->flow_cache; struct dp_packet *packet; +struct dp_packet_batch single_packet; const size_t cnt = dp_packet_batch_size(packets_); uint32_t cur_min = pmd->ctx.emc_insert_min; const uint32_t recirc_depth = *recirc_depth_get(); @@ -7039,6 +7040,11 @@ dfc_processing(struct dp_netdev_pmd_thread *pmd, size_t map_cnt = 0; bool batch_enable = true; +single_packet.count = 1; + +miniflow_extract_func mfex_func; +atomic_read_relaxed(&pmd->miniflow_extract_opt, &mfex_func); + atomic_read_relaxed(&pmd->dp->smc_enable_db, &smc_enable_db); pmd_perf_update_counter(&pmd->perf_stats, md_is_valid ? PMD_STAT_RECIRC : PMD_STAT_RECV, @@ -7089,7 +7095,22 @@ dfc_processing(struct dp_netdev_pmd_thread *pmd, } } -miniflow_extract(packet, &key->mf); +/* Set the count and packet for miniflow_opt with batch_size 1. */ +if ((mfex_func) && (!md_is_valid)) { +single_packet.packets[0] = packet; +int mf_ret; + +mf_ret = mfex_func(&single_packet, key, 1, port_no, pmd); +/* Fallback to original miniflow_extract if there is a miss. */ +if (mf_ret) { +n_mfex_opt_hit++; +} else { +miniflow_extract(packet, &key->mf); +} +} else { +miniflow_extract(packet, &key->mf); +} + key->len = 0; /* Not computed yet. */ key->hash = (md_is_valid == false) -- 2.25.1
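The scalar DPIF change wraps each packet in a one-element batch so that it can reuse the batch-oriented MFEX function pointer. It only does so for packets whose metadata is not yet valid (`!md_is_valid`, i.e. not recirculated). A minimal sketch of that decision, assuming a hypothetical `struct toy_batch` and `toy_extract_one()`, and an `int` packet where the value 4 merely pretends to mean "IPv4":

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_MAX_BURST 32

/* Hypothetical cut-down dp_packet_batch. */
struct toy_batch {
    size_t count;
    int *packets[TOY_MAX_BURST];
};

typedef uint32_t (*toy_extract_fn)(struct toy_batch *batch, int *keys,
                                   size_t keys_size);

/* Toy profile parser: only packets whose value is 4 ("IPv4") match. */
static uint32_t
toy_ipv4_only(struct toy_batch *batch, int *keys, size_t keys_size)
{
    uint32_t hits = 0;

    assert(keys_size >= batch->count);
    for (size_t i = 0; i < batch->count; i++) {
        if (*batch->packets[i] == 4) {
            keys[i] = *batch->packets[i];
            hits |= UINT32_C(1) << i;
        }
    }
    return hits;
}

/* Returns true when the optimized parser handled the packet; otherwise
 * fills 'key' via the scalar fallback and returns false. */
static bool
toy_extract_one(toy_extract_fn opt, bool md_is_valid, int *pkt, int *key)
{
    if (opt && !md_is_valid) {
        struct toy_batch single = { .count = 1 };

        single.packets[0] = pkt;
        if (opt(&single, key, 1)) {
            return true;
        }
    }
    *key = *pkt;   /* Scalar path: no opt function, recirculation, or miss. */
    return false;
}
```

The patch instead declares `single_packet` once per `dfc_processing()` call and sets `count = 1` up front, which avoids re-initializing the batch for every packet.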
[ovs-dev] [109 11/12] dpif-netdev/mfex: add more AVX512 traffic profiles
From: Harry van Haaren This commit adds 3 new traffic profile implementations to the existing avx512 miniflow extract infrastructure. The profiles added are: - Ether()/IP()/TCP() - Ether()/Dot1Q()/IP()/UDP() - Ether()/Dot1Q()/IP()/TCP() The design of the avx512 code here is for scalability to add more traffic profiles, as well as enabling CPU ISA. Note that an implementation is primarily adding static const data, which the compiler then specializes away when the profile specific function is declared below. As a result, the code is relatively maintainable, and scalable for new traffic profiles as well as new ISA, and does not lower performance compared with manually written code for each profile/ISA. Note that confidence in the correctness of each implementation is achieved through autovalidation, unit tests with known packets, and fuzz tested packets. Signed-off-by: Harry van Haaren Acked-by: Eelco Chaudron Acked-by: Flavio Leitner --- Hi Readers, If you have a traffic profile you'd like to see accelerated using avx512 code, please send me an email and we can collaborate on adding support for it! Regards, -Harry --- v5: - fix review comments(Ian, Flavio, Eelco) --- --- NEWS | 2 + lib/dpif-netdev-extract-avx512.c | 152 ++ lib/dpif-netdev-private-extract.c | 30 ++ lib/dpif-netdev-private-extract.h | 10 ++ 4 files changed, 194 insertions(+) diff --git a/NEWS b/NEWS index 26cd85978..849008a80 100644 --- a/NEWS +++ b/NEWS @@ -41,6 +41,8 @@ Post-v2.15.0 * Add build time configure command to enable auto-validatior as default miniflow implementation at build time. * Cache results for CPU ISA checks, reduces overhead on repeated lookups. + * Add AVX512 based optimized miniflow extract function for traffic type + IPv4/UDP, IPv4/TCP, Vlan/IPv4/UDP and Vlan/Ipv4/TCP. - ovs-ctl: * New option '--no-record-hostname' to disable hostname configuration in ovsdb on startup. 
diff --git a/lib/dpif-netdev-extract-avx512.c b/lib/dpif-netdev-extract-avx512.c index c06e53582..ecb0be70d 100644 --- a/lib/dpif-netdev-extract-avx512.c +++ b/lib/dpif-netdev-extract-avx512.c @@ -136,6 +136,13 @@ _mm512_maskz_permutexvar_epi8_wrap(__mmask64 kmask, __m512i idx, __m512i a) #define PATTERN_ETHERTYPE_MASK PATTERN_ETHERTYPE_GEN(0xFF, 0xFF) #define PATTERN_ETHERTYPE_IPV4 PATTERN_ETHERTYPE_GEN(0x08, 0x00) +#define PATTERN_ETHERTYPE_DT1Q PATTERN_ETHERTYPE_GEN(0x81, 0x00) + +/* VLAN (Dot1Q) patterns and masks. */ +#define PATTERN_DT1Q_MASK \ + 0x00, 0x00, 0xFF, 0xFF, +#define PATTERN_DT1Q_IPV4 \ + 0x00, 0x00, 0x08, 0x00, /* Generator for checking IPv4 ver, ihl, and proto */ #define PATTERN_IPV4_GEN(VER_IHL, FLAG_OFF_B0, FLAG_OFF_B1, PROTO) \ @@ -161,6 +168,29 @@ _mm512_maskz_permutexvar_epi8_wrap(__mmask64 kmask, __m512i idx, __m512i a) 34, 35, 36, 37, NU, NU, NU, NU, NU, NU, NU, NU, NU, NU, NU, NU, /* UDP */ \ NU, NU, NU, NU, NU, NU, NU, NU, NU, NU, NU, NU, NU, NU, NU, NU, /* Unused. */ +/* TCP shuffle: tcp_ctl bits require mask/processing, not included here. */ +#define PATTERN_IPV4_TCP_SHUFFLE \ + 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, NU, NU, /* Ether */ \ + 26, 27, 28, 29, 30, 31, 32, 33, NU, NU, NU, NU, 20, 15, 22, 23, /* IPv4 */ \ + NU, NU, NU, NU, NU, NU, NU, NU, 34, 35, 36, 37, NU, NU, NU, NU, /* TCP */ \ + NU, NU, NU, NU, NU, NU, NU, NU, NU, NU, NU, NU, NU, NU, NU, NU, /* Unused. */ + +#define PATTERN_DT1Q_IPV4_UDP_SHUFFLE \ + /* Ether (2 blocks): Note that *VLAN* type is written here. */ \ + 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 16, 17, 0, 0, \ + /* VLAN (1 block): Note that the *EtherHdr->Type* is written here. */ \ + 12, 13, 14, 15, 0, 0, 0, 0, \ + 30, 31, 32, 33, 34, 35, 36, 37, 0, 0, 0, 0, 24, 19, 26, 27, /* IPv4 */ \ + 38, 39, 40, 41, NU, NU, NU, NU, /* UDP */ + +#define PATTERN_DT1Q_IPV4_TCP_SHUFFLE \ + /* Ether (2 blocks): Note that *VLAN* type is written here. 
*/ \ + 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 16, 17, 0, 0, \ + /* VLAN (1 block): Note that the *EtherHdr->Type* is written here. */ \ + 12, 13, 14, 15, 0, 0, 0, 0, \ + 30, 31, 32, 33, 34, 35, 36, 37, 0, 0, 0, 0, 24, 19, 26, 27, /* IPv4 */ \ + NU, NU, NU, NU, NU, NU, NU, NU, 38, 39, 40, 41, NU, NU, NU, NU, /* TCP */ \ + NU, NU, NU, NU, NU, NU, NU, NU, /* Unused. */ /* Generation of K-mask bitmask values, to zero out data in result. Note that * these correspond 1:1 to the above "*_SHUFFLE" values, and bit used must be @@ -170,12 +200,22 @@ _mm512_maskz_permutexvar_epi8_wrap(__mmask64 kmask, __m512i idx, __m512i a) *
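The `PATTERN_*` and `*_MASK` macros encode a probe. A packet belongs to a traffic profile when each probed header byte, ANDed with the mask, equals the pattern. The AVX512 code performs this test across 64 bytes at once. The scalar sketch below performs the same test byte by byte. Several details are illustrative assumptions: the `toy_profile` struct, and the chosen byte offsets (EtherType at 12, IPv4 version/IHL at 14, IPv4 protocol at 23, or 4 bytes later behind a single 802.1Q tag). The real patterns also check the IPv4 flags/fragment-offset bytes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Enough bytes to reach the IPv4 protocol field behind one VLAN tag. */
#define TOY_PROBE_LEN 28

struct toy_profile {
    const char *name;
    uint8_t mask[TOY_PROBE_LEN];
    uint8_t pattern[TOY_PROBE_LEN];
};

/* Ether()/IP()/UDP(): EtherType 0x0800, IPv4 ver/IHL 0x45, proto 17. */
static const struct toy_profile toy_ipv4_udp = {
    .name = "Ether()/IP()/UDP()",
    .mask    = { [12] = 0xFF, [13] = 0xFF, [14] = 0xFF, [23] = 0xFF },
    .pattern = { [12] = 0x08, [13] = 0x00, [14] = 0x45, [23] = 17 },
};

/* Ether()/Dot1Q()/IP()/TCP(): TPID 0x8100, VLAN TCI ignored (mask 0),
 * inner EtherType 0x0800, IPv4 ver/IHL 0x45, proto 6. */
static const struct toy_profile toy_dt1q_ipv4_tcp = {
    .name = "Ether()/Dot1Q()/IP()/TCP()",
    .mask    = { [12] = 0xFF, [13] = 0xFF, [16] = 0xFF, [17] = 0xFF,
                 [18] = 0xFF, [27] = 0xFF },
    .pattern = { [12] = 0x81, [13] = 0x00, [16] = 0x08, [17] = 0x00,
                 [18] = 0x45, [27] = 6 },
};

static bool
toy_profile_match(const struct toy_profile *p, const uint8_t *pkt,
                  size_t len)
{
    if (len < TOY_PROBE_LEN) {
        return false;   /* Too short to contain the probed headers. */
    }
    for (size_t i = 0; i < TOY_PROBE_LEN; i++) {
        if ((pkt[i] & p->mask[i]) != p->pattern[i]) {
            return false;
        }
    }
    return true;
}
```

Adding a profile means adding data (mask and pattern), not code. This mirrors the commit message's point that new profiles are mostly static const data.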
[ovs-dev] [109 10/12] dpif-netdev/mfex: Add AVX512 based optimized miniflow extract
From: Harry van Haaren This commit adds AVX512 implementations of miniflow extract. By using the 64 bytes available in an AVX512 register, it is possible to convert a packet to a miniflow data-structure in a small number of instructions. The implementation here probes for Ether()/IP()/UDP() traffic, and builds the appropriate miniflow data-structure for packets that match the probe. The implementation here is auto-validated by the miniflow extract autovalidator, hence its correctness can be easily tested and verified. Note that this commit is designed to easily allow addition of new traffic profiles in a scalable way, without code duplication for each traffic profile. Signed-off-by: Harry van Haaren --- v9: - include comments from flavio v8: - include documentation on AVX512 MFEX as per Eelco's suggestion v7: - fix minor review sentences (Eelco) v5: - fix review comments(Ian, Flavio, Eelco) - include assert for flow abi change - include assert for offset changes --- --- lib/automake.mk | 1 + lib/dpif-netdev-extract-avx512.c | 478 ++ lib/dpif-netdev-private-extract.c | 13 + lib/dpif-netdev-private-extract.h | 30 ++ 4 files changed, 522 insertions(+) create mode 100644 lib/dpif-netdev-extract-avx512.c diff --git a/lib/automake.mk b/lib/automake.mk index f4f36325e..299f81939 100644 --- a/lib/automake.mk +++ b/lib/automake.mk @@ -39,6 +39,7 @@ lib_libopenvswitchavx512_la_CFLAGS = \ $(AM_CFLAGS) lib_libopenvswitchavx512_la_SOURCES = \ lib/dpif-netdev-lookup-avx512-gather.c \ + lib/dpif-netdev-extract-avx512.c \ lib/dpif-netdev-avx512.c lib_libopenvswitchavx512_la_LDFLAGS = \ -static diff --git a/lib/dpif-netdev-extract-avx512.c b/lib/dpif-netdev-extract-avx512.c new file mode 100644 index 0..c06e53582 --- /dev/null +++ b/lib/dpif-netdev-extract-avx512.c @@ -0,0 +1,478 @@ +/* + * Copyright (c) 2021 Intel. + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at: + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +/* + * AVX512 Miniflow Extract. + * + * This file contains optimized implementations of miniflow_extract() + * for specific common traffic patterns. The optimizations allow for + * quick probing of a specific packet type, and if a match with a specific + * type is found, a shuffle-like procedure builds up the required miniflow. + * + * Process + * - + * + * The procedure is to classify the packet based on the traffic type + * using predefined bit-masks and arrange the packet header data using shuffle + * instructions to a pre-defined place as required by the miniflow. + * This eliminates the if-else ladder to identify the packet data and add data + * as per protocol which is present. + */ + +#ifdef __x86_64__ +/* Sparse cannot handle the AVX512 instructions. */ +#if !defined(__CHECKER__) + +#include +#include +#include +#include +#include + +#include "flow.h" +#include "dpdk.h" + +#include "dpif-netdev-private-dpcls.h" +#include "dpif-netdev-private-extract.h" +#include "dpif-netdev-private-flow.h" + +/* AVX512-BW level permutex2var_epi8 emulation. */ +static inline __m512i +__attribute__((target("avx512bw"))) +_mm512_maskz_permutex2var_epi8_skx(__mmask64 k_mask, + __m512i v_data_0, + __m512i v_shuf_idxs, + __m512i v_data_1) +{ +/* Manipulate shuffle indexes for u16 size. */ +__mmask64 k_mask_odd_lanes = 0x; +/* Clear away ODD lane bytes. Cannot be done above due to no u8 shift.
*/ +__m512i v_shuf_idx_evn = _mm512_mask_blend_epi8(k_mask_odd_lanes, +v_shuf_idxs, +_mm512_setzero_si512()); +v_shuf_idx_evn = _mm512_srli_epi16(v_shuf_idx_evn, 1); + +__m512i v_shuf_idx_odd = _mm512_srli_epi16(v_shuf_idxs, 9); + +/* Shuffle each half at 16-bit width. */ +__m512i v_shuf1 = _mm512_permutex2var_epi16(v_data_0, v_shuf_idx_evn, +v_data_1); +__m512i v_shuf2 = _mm512_permutex2var_epi16(v_data_0, v_shuf_idx_odd, +v_data_1); + +/* Find if the shuffle index was odd, via mask and compare. */ +uint16_t index_odd_mask = 0x1; +const __m512i v_index_mask_u16 = _mm512_set1_epi16(index_odd_mask); + +/* EVEN lanes, find if u8 index was odd, result as u16 bitmask. */ +__m512i
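The `_skx` helper emulates a byte-granular, two-source permute using 16-bit permutes. A plain scalar reference is therefore a convenient oracle when testing it. The sketch below follows the documented semantics of `_mm512_maskz_permutex2var_epi8()`, an AVX512-VBMI intrinsic. For each of the 64 output bytes, bit 6 of the index selects the source register, bits 0-5 select the byte, and a clear mask bit forces zero. `toy_maskz_permutex2var_epi8()` is a hypothetical name, and plain arrays stand in for `__m512i` registers.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model of a masked, zeroing, two-source byte permute over
 * 64-byte vectors: out[i] = k[i] ? (idx[i] & 64 ? b : a)[idx[i] & 63] : 0.
 * Index bit 7 is ignored. */
static void
toy_maskz_permutex2var_epi8(uint64_t k, const uint8_t a[64],
                            const uint8_t idx[64], const uint8_t b[64],
                            uint8_t out[64])
{
    for (int i = 0; i < 64; i++) {
        const uint8_t *src = (idx[i] & 64) ? b : a;

        out[i] = ((k >> i) & 1) ? src[idx[i] & 63] : 0;
    }
}
```

A test harness could feed random `k`, `idx`, `a` and `b` to both this function and the AVX512-BW emulation, then compare the 64 output bytes.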
[ovs-dev] [109 09/12] dpdk: add additional CPU ISA detection strings
From: Harry van Haaren This commit enables OVS to at runtime check for more detailed AVX512 capabilities, specifically Byte and Word (BW) extensions, and Vector Bit Manipulation Instructions (VBMI). These instructions will be used in the CPU ISA optimized implementations of traffic profile aware miniflow extract. Signed-off-by: Harry van Haaren Acked-by: Eelco Chaudron Acked-by: Flavio Leitner --- NEWS | 1 + lib/dpdk.c | 2 ++ 2 files changed, 3 insertions(+) diff --git a/NEWS b/NEWS index 581bff225..26cd85978 100644 --- a/NEWS +++ b/NEWS @@ -40,6 +40,7 @@ Post-v2.15.0 traffic. * Add build time configure command to enable auto-validatior as default miniflow implementation at build time. + * Cache results for CPU ISA checks, reduces overhead on repeated lookups. - ovs-ctl: * New option '--no-record-hostname' to disable hostname configuration in ovsdb on startup. diff --git a/lib/dpdk.c b/lib/dpdk.c index 9de2af58e..1b8f8e55b 100644 --- a/lib/dpdk.c +++ b/lib/dpdk.c @@ -706,6 +706,8 @@ dpdk_get_cpu_has_isa(const char *arch, const char *feature) #if __x86_64__ /* CPU flags only defined for the architecture that support it. */ CHECK_CPU_FEATURE(feature, "avx512f", RTE_CPUFLAG_AVX512F); +CHECK_CPU_FEATURE(feature, "avx512bw", RTE_CPUFLAG_AVX512BW); +CHECK_CPU_FEATURE(feature, "avx512vbmi", RTE_CPUFLAG_AVX512VBMI); CHECK_CPU_FEATURE(feature, "avx512vpopcntdq", RTE_CPUFLAG_AVX512VPOPCNTDQ); CHECK_CPU_FEATURE(feature, "bmi2", RTE_CPUFLAG_BMI2); #endif -- 2.25.1 ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
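The patch extends `dpdk_get_cpu_has_isa()` with `avx512bw` and `avx512vbmi` strings backed by DPDK's `RTE_CPUFLAG_*` flags, and the NEWS entry mentions caching the results. Outside DPDK, GCC and Clang expose a similar runtime check through `__builtin_cpu_supports()`. The sketch below illustrates the same idea under several assumptions and is not the OVS code. It uses that builtin instead of DPDK, a hypothetical `enum toy_isa`, and a simple cache that is not thread-safe.

```c
#include <assert.h>
#include <stdbool.h>

enum toy_isa {
    TOY_ISA_AVX512F,
    TOY_ISA_AVX512BW,
    TOY_ISA_AVX512VBMI,
    TOY_ISA_MAX
};

/* Uncached probe. __builtin_cpu_supports() needs a string literal, hence
 * the explicit switch; non-x86 builds report every feature as absent. */
static bool
toy_isa_probe(enum toy_isa isa)
{
#if defined(__x86_64__) && defined(__GNUC__)
    switch (isa) {
    case TOY_ISA_AVX512F:
        return __builtin_cpu_supports("avx512f");
    case TOY_ISA_AVX512BW:
        return __builtin_cpu_supports("avx512bw");
    case TOY_ISA_AVX512VBMI:
        return __builtin_cpu_supports("avx512vbmi");
    default:
        break;
    }
#endif
    (void) isa;
    return false;
}

/* -1 = not probed yet; 0 or 1 = cached answer. Not thread-safe. */
static int toy_isa_cache[TOY_ISA_MAX] = { -1, -1, -1 };

static bool
toy_cpu_has_isa(enum toy_isa isa)
{
    assert(isa < TOY_ISA_MAX);
    if (toy_isa_cache[isa] < 0) {
        toy_isa_cache[isa] = toy_isa_probe(isa);
    }
    return toy_isa_cache[isa] == 1;
}
```

Multi-threaded callers such as PMD threads would need atomics or a once-only initializer around the cache.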
[ovs-dev] [109 08/12] dpif/stats: add miniflow extract opt hits counter
From: Harry van Haaren This commit adds a new counter to be displayed to the user when requesting datapath packet statistics. It counts the number of packets that the optimized miniflow extract parsers have parsed and built a miniflow from. The ovs-appctl command "dpif-netdev/pmd-perf-show" now has an extra entry indicating how often the optimized MFEX was hit: - MFEX Opt hits:6786432 (100.0 %) Signed-off-by: Harry van Haaren Acked-by: Flavio Leitner --- v7: - fix review comments(Eelco) v5: - fix review comments(Ian, Flavio, Eelco) --- --- lib/dpif-netdev-avx512.c| 3 +++ lib/dpif-netdev-perf.c | 3 +++ lib/dpif-netdev-perf.h | 1 + lib/dpif-netdev-unixctl.man | 4 lib/dpif-netdev.c | 12 +++- tests/pmd.at| 6 -- 6 files changed, 22 insertions(+), 7 deletions(-) diff --git a/lib/dpif-netdev-avx512.c b/lib/dpif-netdev-avx512.c index 7772b7abf..544d36903 100644 --- a/lib/dpif-netdev-avx512.c +++ b/lib/dpif-netdev-avx512.c @@ -310,8 +310,11 @@ dp_netdev_input_outer_avx512(struct dp_netdev_pmd_thread *pmd, } /* At this point we don't return error anymore, so commit stats here.
*/ +uint32_t mfex_hit_cnt = __builtin_popcountll(mf_mask); pmd_perf_update_counter(&pmd->perf_stats, PMD_STAT_RECV, batch_size); pmd_perf_update_counter(&pmd->perf_stats, PMD_STAT_PHWOL_HIT, phwol_hits); +pmd_perf_update_counter(&pmd->perf_stats, PMD_STAT_MFEX_OPT_HIT, +mfex_hit_cnt); pmd_perf_update_counter(&pmd->perf_stats, PMD_STAT_EXACT_HIT, emc_hits); pmd_perf_update_counter(&pmd->perf_stats, PMD_STAT_SMC_HIT, smc_hits); pmd_perf_update_counter(&pmd->perf_stats, PMD_STAT_MASKED_HIT, diff --git a/lib/dpif-netdev-perf.c b/lib/dpif-netdev-perf.c index 7103a2d4d..d7676ea2b 100644 --- a/lib/dpif-netdev-perf.c +++ b/lib/dpif-netdev-perf.c @@ -247,6 +247,7 @@ pmd_perf_format_overall_stats(struct ds *str, struct pmd_perf_stats *s, " Rx packets:%12"PRIu64" (%.0f Kpps, %.0f cycles/pkt)\n" " Datapath passes: %12"PRIu64" (%.2f passes/pkt)\n" " - PHWOL hits: %12"PRIu64" (%5.1f %%)\n" +" - MFEX Opt hits: %12"PRIu64" (%5.1f %%)\n" " - EMC hits:%12"PRIu64" (%5.1f %%)\n" " - SMC hits:%12"PRIu64" (%5.1f %%)\n" " - Megaflow hits: %12"PRIu64" (%5.1f %%, %.2f " @@ -258,6 +259,8 @@ pmd_perf_format_overall_stats(struct ds *str, struct pmd_perf_stats *s, passes, rx_packets ? 1.0 * passes / rx_packets : 0, stats[PMD_STAT_PHWOL_HIT], 100.0 * stats[PMD_STAT_PHWOL_HIT] / passes, +stats[PMD_STAT_MFEX_OPT_HIT], +100.0 * stats[PMD_STAT_MFEX_OPT_HIT] / passes, stats[PMD_STAT_EXACT_HIT], 100.0 * stats[PMD_STAT_EXACT_HIT] / passes, stats[PMD_STAT_SMC_HIT], diff --git a/lib/dpif-netdev-perf.h b/lib/dpif-netdev-perf.h index 8b1a52387..834c26260 100644 --- a/lib/dpif-netdev-perf.h +++ b/lib/dpif-netdev-perf.h @@ -57,6 +57,7 @@ extern "C" { enum pmd_stat_type { PMD_STAT_PHWOL_HIT, /* Packets that had a partial HWOL hit (phwol). */ +PMD_STAT_MFEX_OPT_HIT, /* Packets that had miniflow optimized match. */ PMD_STAT_EXACT_HIT, /* Packets that had an exact match (emc). */ PMD_STAT_SMC_HIT, /* Packets that had a sig match hit (SMC). */ PMD_STAT_MASKED_HIT,/* Packets that matched in the flow table.
*/ diff --git a/lib/dpif-netdev-unixctl.man b/lib/dpif-netdev-unixctl.man index 83ce4f1c5..f34758416 100644 --- a/lib/dpif-netdev-unixctl.man +++ b/lib/dpif-netdev-unixctl.man @@ -16,6 +16,9 @@ packet lookups performed by the datapath. Beware that a recirculated packet experiences one additional lookup per recirculation, so there may be more lookups than forwarded packets in the datapath. +The MFEX Opt hits counter displays the number of packets that are processed by the +optimized miniflow extract implementations. + Cycles are counted using the TSC or similar facilities (when available on the platform). The duration of one cycle depends on the processing platform. @@ -136,6 +139,7 @@ pmd thread numa_id 0 core_id 1: Rx packets: 2399607 (2381 Kpps, 848 cycles/pkt) Datapath passes:3599415 (1.50 passes/pkt) - PHWOL hits: 0 ( 0.0 %) + - MFEX Opt hits:3570133 ( 99.2 %) - EMC hits: 336472 ( 9.3 %) - SMC hits: 0 ( 0.0 %) - Megaflow hits:3262943 ( 90.7 %, 1.00 subtbl lookups/hit) diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c index 1132a0ad5..cca211837 100644 --- a/lib/dpif-netdev.c +++ b/lib/dpif-netdev.c @@ -648,6 +648,7 @@ pmd_info_show_stats(struct ds *reply, " packet recirculations: %"PRIu64"\n" " avg. datapath passes per packet: %.02f\n" " phwol hits: %"PRIu64"\n" + " mfex opt hits:
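The new counter is the population count of each batch's `mf_mask`, summed over time and shown as a share of datapath passes. The sketch below covers that arithmetic and formats a line in the style of the `pmd-perf-show` output. `toy_mfex_hits()`, `toy_pct()` and `toy_format_mfex_line()` are hypothetical helpers. Returning 0.0 when there were no passes is an assumption of the sketch.

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Packets in one batch that the optimized MFEX classified. */
static uint64_t
toy_mfex_hits(uint32_t mf_mask)
{
    return (uint64_t) __builtin_popcount(mf_mask);
}

/* Share of datapath passes, in percent; 0.0 if there were no passes. */
static double
toy_pct(uint64_t hits, uint64_t passes)
{
    return passes ? 100.0 * (double) hits / (double) passes : 0.0;
}

/* Formats one stats line; returns snprintf()'s result. */
static int
toy_format_mfex_line(char *buf, size_t len, uint64_t hits, uint64_t passes)
{
    return snprintf(buf, len, "  - MFEX Opt hits: %12" PRIu64 " (%5.1f %%)",
                    hits, toy_pct(hits, passes));
}
```

Because a recirculated packet makes extra datapath passes, percentages computed against passes can differ from percentages against received packets.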
[ovs-dev] [109 07/12] test/system-dpdk: Add unit test for mfex autovalidator
From: Kumar Amber Tests: 6: OVS-DPDK - MFEX Autovalidator 7: OVS-DPDK - MFEX Autovalidator Fuzzy Added a new directory to store the PCAP file used in the tests and a script to generate the fuzzy traffic type pcap to be used in the fuzzy unit test. Signed-off-by: Kumar Amber Acked-by: Flavio Leitner --- v7: - fix review comments(Eelco) v5: - fix review comments(Ian, Flavio, Eelco) - remove sleep from first test and added minor 5 sec sleep to fuzzy --- --- Documentation/topics/dpdk/bridge.rst | 55 +++ tests/.gitignore | 1 + tests/automake.mk| 5 +++ tests/mfex_fuzzy.py | 31 +++ tests/pcap/mfex_test.pcap| Bin 0 -> 416 bytes tests/system-dpdk.at | 49 6 files changed, 141 insertions(+) create mode 100755 tests/mfex_fuzzy.py create mode 100644 tests/pcap/mfex_test.pcap diff --git a/Documentation/topics/dpdk/bridge.rst b/Documentation/topics/dpdk/bridge.rst index 662446401..7b81d0305 100644 --- a/Documentation/topics/dpdk/bridge.rst +++ b/Documentation/topics/dpdk/bridge.rst @@ -345,3 +345,58 @@ A compile time option is available in order to test it with the OVS unit test suite. Use the following configure option :: $ ./configure --enable-mfex-default-autovalidator + +Unit Test Miniflow Extract +++ + +Unit test can also be used to test the workflow mentioned above by running +the following test-case in tests/system-dpdk.at :: + +make check-dpdk TESTSUITEFLAGS='-k MFEX' +OVS-DPDK - MFEX Autovalidator + +The unit test uses multiple traffic types to test the correctness of the +implementations. + +Running Fuzzy test with Autovalidator ++ + +Fuzzy tests can also be done on miniflow extract with the help of +auto-validator and Scapy. The steps below describe how to +reproduce the setup with IP being fuzzed to generate packets.
+ +Scapy is used to create fuzzy IP packets and save them into a PCAP :: + +pkt = fuzz(Ether()/IP()/TCP()) + +Set the miniflow extract to autovalidator using :: + +$ ovs-appctl dpif-netdev/miniflow-parser-set autovalidator + +OVS is configured to receive the generated packets :: + +$ ovs-vsctl add-port br0 pcap0 -- \ +set Interface pcap0 type=dpdk options:dpdk-devargs=net_pcap0 +"rx_pcap=fuzzy.pcap" + +With this workflow, the autovalidator will ensure that all MFEX +implementations are classifying each packet in exactly the same way. +If an optimized MFEX implementation causes a different miniflow to be +generated, the autovalidator has ovs_assert and logging statements that +report the issue. + +Unit Fuzzy test with Autovalidator ++ + +The prerequisite before running the unit test is to run the script provided :: + +tests/mfex_fuzzy.py + +This script generates a pcap with multiple types of fuzzed packets to be used +in the below unit test-case. + +The unit test can also be used to test the workflow mentioned above by running +the following test-case in tests/system-dpdk.at :: + +make check-dpdk TESTSUITEFLAGS='-k MFEX' +OVS-DPDK - MFEX Autovalidator Fuzzy diff --git a/tests/.gitignore b/tests/.gitignore index 45b4f67b2..a3d927e5d 100644 --- a/tests/.gitignore +++ b/tests/.gitignore @@ -11,6 +11,7 @@ /ovsdb-cluster-testsuite /ovsdb-cluster-testsuite.dir/ /ovsdb-cluster-testsuite.log +/pcap/ /pki/ /system-afxdp-testsuite /system-afxdp-testsuite.dir/ diff --git a/tests/automake.mk b/tests/automake.mk index f45f8d76c..2bcf054b0 100644 --- a/tests/automake.mk +++ b/tests/automake.mk @@ -143,6 +143,11 @@ $(srcdir)/tests/fuzz-regression-list.at: tests/automake.mk echo "TEST_FUZZ_REGRESSION([$$basename])"; \ done > $@.tmp && mv $@.tmp $@ +EXTRA_DIST += $(MFEX_AUTOVALIDATOR_TESTS) +MFEX_AUTOVALIDATOR_TESTS = \ + tests/pcap/mfex_test.pcap \ + tests/mfex_fuzzy.py + OVSDB_CLUSTER_TESTSUITE_AT = \ tests/ovsdb-cluster-testsuite.at \ tests/ovsdb-execution.at \ diff --git
a/tests/mfex_fuzzy.py b/tests/mfex_fuzzy.py new file mode 100755 index 0..395158b0d --- /dev/null +++ b/tests/mfex_fuzzy.py @@ -0,0 +1,31 @@ +#!/usr/bin/python3 +try: + from scapy.all import * +except ModuleNotFoundError as err: + print(str(err) + ": Scapy") +import sys + +path = str(sys.argv[1]) + "/pcap/fuzzy.pcap" +pktdump = PcapWriter(path, append=False, sync=True) + +for i in range(0, 2000): + + # Generate random protocol bases, use a fuzz() over the combined packet for full fuzzing. + eth = Ether(src=RandMAC(), dst=RandMAC()) + vlan = Dot1Q() + ipv4 = IP(src=RandIP(), dst=RandIP()) + ipv6 = IPv6(src=RandIP6(), dst=RandIP6()) + udp = UDP(dport=RandShort(), sport=RandShort()) + tcp = TCP(dport=RandShort(), sport=RandShort()) + + # IPv4 packets with fuzzing + pktdump.write(fuzz(eth/ipv4/udp)) + pktdump.write(fuzz(eth/ipv4/tcp)) +
[ovs-dev] [v10 06/12] dpif-netdev: Add packet count and core id parameters for study
From: Kumar Amber This commit introduces an additional command line parameter for the mfex study function. If the user provides a packet count, study uses it as the minimum number of packets that must be processed before choosing an implementation; otherwise a default value is chosen. It also introduces a parameter for choosing a particular pmd core. $ ovs-appctl dpif-netdev/miniflow-parser-set -pmd 3 study 500 Signed-off-by: Kumar Amber --- v10: - fix review comments Eelco v9: - fix review comments Flavio v7: - change the command parameters for core_id and study_pkt_cnt v5: - fix review comments(Ian, Flavio, Eelco) - introduce pmd core id parameter --- --- Documentation/topics/dpdk/bridge.rst | 37 +++- lib/dpif-netdev-extract-study.c | 25 +- lib/dpif-netdev-private-extract.h| 9 ++ lib/dpif-netdev.c| 128 +-- 4 files changed, 187 insertions(+), 12 deletions(-) diff --git a/Documentation/topics/dpdk/bridge.rst b/Documentation/topics/dpdk/bridge.rst index 0fa9341ac..662446401 100644 --- a/Documentation/topics/dpdk/bridge.rst +++ b/Documentation/topics/dpdk/bridge.rst @@ -284,12 +284,45 @@ command also shows whether the CPU supports each implementation :: An implementation can be selected manually by the following command :: -$ ovs-appctl dpif-netdev/miniflow-parser-set study +$ ovs-appctl dpif-netdev/miniflow-parser-set [-pmd core_id] [name] + [study_cnt] + +The above command has two optional parameters: study_cnt and core_id. +The core_id sets a particular miniflow extract function to a specific +pmd thread on that core. The study_cnt parameter, which is specific +to study and ignored by other implementations, sets how many packets +are needed to choose the best implementation. Also user can select the study implementation which studies the traffic for a specific number of packets by applying all available implementations of miniflow extract and then chooses the one with the most optimal result for -that traffic pattern. +that traffic pattern.
The user can optionally provide a packet count +[study_cnt] parameter, which is the minimum number of packets that OVS must +study before choosing an optimal implementation. If no packet count is +provided, then the default value of 128 is used. Also, as there is no +synchronization point between threads, one PMD thread might still be running +a previous round, and may decide based on earlier data. + +The packet count is a global value, and parallel study() executions with +differing packet counts will use the most recent count value provided by the user. + +Study can be selected with packet count by the following command :: + +$ ovs-appctl dpif-netdev/miniflow-parser-set study 1024 + +Study can be selected with packet count and explicit PMD selection +by the following command :: + +$ ovs-appctl dpif-netdev/miniflow-parser-set -pmd 3 study 1024 + +In the above command, the value following -pmd is the core ID of the PMD +thread, and this can also be used to explicitly set the miniflow +extraction function pointer on different PMD threads. + +Scalar can be selected on core 3 by the following command, where the +study count can be any arbitrary number or left blank:: + +$ ovs-appctl dpif-netdev/miniflow-parser-set -pmd 3 scalar Miniflow Extract Validation ~~~ diff --git a/lib/dpif-netdev-extract-study.c b/lib/dpif-netdev-extract-study.c index eddb35682..61260cb70 100644 --- a/lib/dpif-netdev-extract-study.c +++ b/lib/dpif-netdev-extract-study.c @@ -25,7 +25,7 @@ VLOG_DEFINE_THIS_MODULE(dpif_mfex_extract_study); -static atomic_uint32_t mfex_study_pkts_count = 0; +static atomic_uint32_t mfex_study_pkts_count = MFEX_MAX_PKT_COUNT; /* Struct to hold miniflow study stats.
*/ struct study_stats { @@ -48,6 +48,27 @@ mfex_study_get_study_stats_ptr(void) return stats; } +int +mfex_set_study_pkt_cnt(uint32_t pkt_cmp_count, const char *name) +{ +struct dpif_miniflow_extract_impl *miniflow_funcs; +miniflow_funcs = dpif_mfex_impl_info_get(); + +/* If the packet count is set and implementation called is study then + * set packet counter to requested number else set the packet counter + * to default number. + */ +if ((strcmp(miniflow_funcs[MFEX_IMPL_STUDY].name, name) == 0) && +(pkt_cmp_count != 0)) { + +mfex_study_pkts_count = pkt_cmp_count; + +return 0; +} + +return -EINVAL; +} + uint32_t mfex_study_traffic(struct dp_packet_batch *packets, struct netdev_flow_key *keys, @@ -86,7 +107,7 @@ mfex_study_traffic(struct dp_packet_batch *packets, /* Choose the best implementation after a minimum packets have been * processed. */ -if (stats->pkt_count >= MFEX_MAX_PKT_COUNT) { +if (stats->pkt_count >= mfex_study_pkts_count) { uint32_t best_func_index = MFEX_IMPL_START_IDX; uint32_t max_hits = 0; for (int i =
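The validation rule in `mfex_set_study_pkt_cnt()` (accept a count only for the study implementation and only when it is non-zero, otherwise return `-EINVAL`) can be sketched in standalone C11 as below. The names are hypothetical, the default of 128 is taken from the documentation above, and the sketch uses `<stdatomic.h>` with explicit relaxed loads and stores in place of OVS's own atomic wrappers.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdint.h>
#include <string.h>

#define STUDY_DEFAULT_PKT_COUNT 128   /* Default stated in the docs. */

/* Hypothetical global shared by all PMD threads running study. */
static _Atomic uint32_t study_pkt_count = STUDY_DEFAULT_PKT_COUNT;

/* Accepts the count only for "study" and only if non-zero. */
static int
set_study_pkt_count(uint32_t count, const char *impl_name)
{
    if (strcmp(impl_name, "study") != 0 || count == 0) {
        return -EINVAL;
    }
    atomic_store_explicit(&study_pkt_count, count, memory_order_relaxed);
    return 0;
}

static uint32_t
get_study_pkt_count(void)
{
    return atomic_load_explicit(&study_pkt_count, memory_order_relaxed);
}
```

A relaxed store is enough here because, as the documentation notes, there is no synchronization point between threads anyway; each PMD simply picks up the latest value on its next comparison.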
[ovs-dev] [v10 05/12] dpif-netdev: Add configure to enable autovalidator at build time.
From: Kumar Amber This commit adds a new configure option that allows the user to enable the autovalidator as the default miniflow extract implementation at build time, so that the unit tests run with it by default. $ ./configure --enable-mfex-default-autovalidator Signed-off-by: Kumar Amber Co-authored-by: Harry van Haaren Signed-off-by: Harry van Haaren --- v10: - rework default set v9: - fix review comments Flavio v7: - fix review comments(Eelco, Flavio) v5: - fix review comments(Ian, Flavio, Eelco) --- --- Documentation/topics/dpdk/bridge.rst | 5 + NEWS | 3 ++- acinclude.m4 | 16 configure.ac | 1 + lib/dpif-netdev-private-extract.c| 4 5 files changed, 28 insertions(+), 1 deletion(-) diff --git a/Documentation/topics/dpdk/bridge.rst b/Documentation/topics/dpdk/bridge.rst index 6f37f2a75..0fa9341ac 100644 --- a/Documentation/topics/dpdk/bridge.rst +++ b/Documentation/topics/dpdk/bridge.rst @@ -307,3 +307,8 @@ implementations provide the same results. To set the Miniflow autovalidator, use this command :: $ ovs-appctl dpif-netdev/miniflow-parser-set autovalidator + +A compile time option is available in order to test it with the OVS unit +test suite. Use the following configure option :: + +$ ./configure --enable-mfex-default-autovalidator diff --git a/NEWS b/NEWS index 4a7b89409..581bff225 100644 --- a/NEWS +++ b/NEWS @@ -38,6 +38,8 @@ Post-v2.15.0 * Add study function to miniflow function table which studies packet and automatically chooses the best miniflow implementation for that traffic. + * Add build time configure command to enable auto-validatior as default + miniflow implementation at build time. - ovs-ctl: * New option '--no-record-hostname' to disable hostname configuration in ovsdb on startup. @@ -57,7 +59,6 @@ Post-v2.15.0 whether the SNAT with all-zero IP address is supported. See ovs-vswitchd.conf.db(5) for details.
- v2.15.0 - 15 Feb 2021 - - OVSDB: diff --git a/acinclude.m4 b/acinclude.m4 index 343303447..5a48f0335 100644 --- a/acinclude.m4 +++ b/acinclude.m4 @@ -14,6 +14,22 @@ # See the License for the specific language governing permissions and # limitations under the License. +dnl Set OVS MFEX Autovalidator as default miniflow extract at compile time? +dnl This enables automatically running all unit tests with all MFEX +dnl implementations. +AC_DEFUN([OVS_CHECK_MFEX_AUTOVALIDATOR], [ + AC_ARG_ENABLE([mfex-default-autovalidator], +[AC_HELP_STRING([--enable-mfex-default-autovalidator], [Enable MFEX autovalidator as default miniflow_extract implementation.])], +[autovalidator=yes],[autovalidator=no]) + AC_MSG_CHECKING([whether MFEX Autovalidator is default implementation]) + if test "$autovalidator" != yes; then +AC_MSG_RESULT([no]) + else +OVS_CFLAGS="$OVS_CFLAGS -DMFEX_AUTOVALIDATOR_DEFAULT" +AC_MSG_RESULT([yes]) + fi +]) + dnl Set OVS DPCLS Autovalidator as default subtable search at compile time? dnl This enables automatically running all unit tests with all DPCLS dnl implementations. diff --git a/configure.ac b/configure.ac index e45685a6c..46c402892 100644 --- a/configure.ac +++ b/configure.ac @@ -186,6 +186,7 @@ OVS_ENABLE_SPARSE OVS_CTAGS_IDENTIFIERS OVS_CHECK_DPCLS_AUTOVALIDATOR OVS_CHECK_DPIF_AVX512_DEFAULT +OVS_CHECK_MFEX_AUTOVALIDATOR OVS_CHECK_BINUTILS_AVX512 AC_ARG_VAR(KARCH, [Kernel Architecture String]) diff --git a/lib/dpif-netdev-private-extract.c b/lib/dpif-netdev-private-extract.c index 64745f66c..f007a7a80 100644 --- a/lib/dpif-netdev-private-extract.c +++ b/lib/dpif-netdev-private-extract.c @@ -60,7 +60,11 @@ void dpif_miniflow_extract_init(void) { atomic_uintptr_t *mfex_func = (void *)&default_mfex_func; +#ifdef MFEX_AUTOVALIDATOR_DEFAULT +int mfex_idx = MFEX_IMPL_AUTOVALIDATOR; +#else int mfex_idx = MFEX_IMPL_SCALAR; +#endif /* Call probe on each impl, and cache the result.
*/ for (int i = 0; i < MFEX_IMPL_MAX; i++) { -- 2.25.1 ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
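The effect of `--enable-mfex-default-autovalidator` is a plain preprocessor switch: configure appends `-DMFEX_AUTOVALIDATOR_DEFAULT` to `OVS_CFLAGS`, and the `#ifdef` in `dpif_miniflow_extract_init()` picks the default index. A minimal sketch of that pattern follows; the enum names and their order are hypothetical and do not reflect the real `MFEX_IMPL_*` values.

```c
#include <assert.h>

/* Hypothetical indices for illustration only; the real enum and its
 * ordering live in lib/dpif-netdev-private-extract.h. */
enum sketch_mfex_idx {
    SKETCH_MFEX_SCALAR,
    SKETCH_MFEX_AUTOVALIDATOR,
};

/* Mirrors the #ifdef pattern: the macro is only defined when configure
 * was run with --enable-mfex-default-autovalidator. */
static enum sketch_mfex_idx
sketch_default_mfex(void)
{
#ifdef MFEX_AUTOVALIDATOR_DEFAULT
    return SKETCH_MFEX_AUTOVALIDATOR;
#else
    return SKETCH_MFEX_SCALAR;
#endif
}
```

Compiling the same file with `-DMFEX_AUTOVALIDATOR_DEFAULT` would make `sketch_default_mfex()` return the autovalidator index instead.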
[ovs-dev] [v10 04/12] docs/dpdk/bridge: add miniflow extract section.
From: Kumar Amber This commit adds a section to the dpdk/bridge.rst netdev documentation, detailing the added miniflow functionality. The newly added commands are documented, and sample output is provided. The use of the auto-validator and the special study function is also described in detail, as well as running fuzzy tests. Signed-off-by: Kumar Amber Co-authored-by: Cian Ferriter Signed-off-by: Cian Ferriter Co-authored-by: Harry van Haaren Signed-off-by: Harry van Haaren Acked-by: Flavio Leitner --- v10: - fix minor typos. v7: - fix review comments(Eelco) v5: - fix review comments(Ian, Flavio, Eelco) --- --- Documentation/topics/dpdk/bridge.rst | 51 1 file changed, 51 insertions(+) diff --git a/Documentation/topics/dpdk/bridge.rst b/Documentation/topics/dpdk/bridge.rst index 2d0850836..6f37f2a75 100644 --- a/Documentation/topics/dpdk/bridge.rst +++ b/Documentation/topics/dpdk/bridge.rst @@ -256,3 +256,54 @@ The following line should be seen in the configure output when the above option is used :: checking whether DPIF AVX512 is default implementation... yes + +Miniflow Extract + + +Miniflow extract (MFEX) performs parsing of the raw packets and extracts the +important header information into a compressed miniflow. This miniflow is +composed of bits and blocks, where the bits signify which blocks are set or +have values, whereas the blocks hold the metadata, IP, UDP, VLAN, etc. These +values are used by the datapath for switching decisions later. The optimized +miniflow extract implementations are traffic specific to speed up the lookup, whereas the +scalar implementation works for all traffic patterns. + +Most modern CPUs have SIMD capabilities. These SIMD instructions are able +to process a vector rather than act on a single data element. OVS provides multiple +implementations of miniflow extract. This allows the user to take advantage +of SIMD instructions like AVX512 to gain additional performance. + +A list of implementations can be obtained by the following command.
The +command also shows whether the CPU supports each implementation :: + +$ ovs-appctl dpif-netdev/miniflow-parser-get +Available Optimized Miniflow Extracts: +autovalidator (available: True, pmds: none) +scalar (available: True, pmds: 1,15) +study (available: True, pmds: none) + +An implementation can be selected manually by the following command :: + +$ ovs-appctl dpif-netdev/miniflow-parser-set study + +Also user can select the study implementation which studies the traffic for +a specific number of packets by applying all available implementations of +miniflow extract and then chooses the one with the most optimal result for +that traffic pattern. + +Miniflow Extract Validation +~~~ + +As multiple versions of miniflow extract can co-exist, each with different +CPU ISA optimizations, it is important to validate that they all give the +exact same results. To easily test all miniflow implementations, an +``autovalidator`` implementation of the miniflow exists. This implementation +runs all other available miniflow extract implementations, and verifies that +the results are identical. + +Running the OVS unit tests with the autovalidator enabled ensures all +implementations provide the same results. + +To set the Miniflow autovalidator, use this command :: + +$ ovs-appctl dpif-netdev/miniflow-parser-set autovalidator -- 2.25.1 ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
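The bits-and-blocks description of a miniflow can be illustrated with a simplified sketch: a 64-bit map records which logical blocks are present, only present blocks are stored (packed in index order), and a block's position in the packed array is the number of map bits set below it. This is a hypothetical toy rather than OVS's `struct miniflow`, whose map is wider; it relies on the GCC/Clang builtin `__builtin_popcountll`.

```c
#include <assert.h>
#include <stdint.h>

/* Toy miniflow (hypothetical): bit i of 'map' is set when logical block i
 * is present; present blocks are stored packed in ascending index order. */
struct mini_sketch {
    uint64_t map;
    uint64_t values[64];
};

/* Returns 1 and stores the value if block 'idx' is present, else 0
 * (an absent block is implicitly zero). */
static int
mini_sketch_get(const struct mini_sketch *mf, unsigned idx, uint64_t *out)
{
    assert(idx < 64);
    uint64_t bit = UINT64_C(1) << idx;

    if (!(mf->map & bit)) {
        return 0;
    }
    /* Packed position = number of present blocks with a lower index. */
    unsigned pos = (unsigned) __builtin_popcountll(mf->map & (bit - 1));
    *out = mf->values[pos];
    return 1;
}
```

Because absent blocks take no space, a packet that only sets a handful of header fields produces a small miniflow regardless of how many fields the full flow structure has.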
[ovs-dev] [v10 03/12] dpif-netdev: Add study function to select the best mfex function
From: Kumar Amber The study function runs all the available implementations of miniflow_extract, chooses the one whose hitmask has the most hits, and sets the mfex function to that implementation. Study can be run at runtime using the following command: $ ovs-appctl dpif-netdev/miniflow-parser-set study Signed-off-by: Kumar Amber Co-authored-by: Harry van Haaren Signed-off-by: Harry van Haaren Acked-by: Eelco Chaudron --- v10: - fix minor comments from Eelco v9: - fix comments Flavio v8: - fix review comments Flavio v7: - fix review comments(Eelco) v5: - fix review comments(Ian, Flavio, Eelco) - add Atomic set in study --- --- NEWS | 3 + lib/automake.mk | 1 + lib/dpif-netdev-extract-study.c | 136 ++ lib/dpif-netdev-private-extract.c | 12 +++ lib/dpif-netdev-private-extract.h | 19 + 5 files changed, 171 insertions(+) create mode 100644 lib/dpif-netdev-extract-study.c diff --git a/NEWS b/NEWS index cf254bcfe..4a7b89409 100644 --- a/NEWS +++ b/NEWS @@ -35,6 +35,9 @@ Post-v2.15.0 * Add command line option to switch between MFEX function pointers. * Add miniflow extract auto-validator function to compare different miniflow extract implementations against default implementation. + * Add study function to miniflow function table which studies packet + and automatically chooses the best miniflow implementation for that + traffic. - ovs-ctl: * New option '--no-record-hostname' to disable hostname configuration in ovsdb on startup.
diff --git a/lib/automake.mk b/lib/automake.mk index 53b8abc0f..f4f36325e 100644 --- a/lib/automake.mk +++ b/lib/automake.mk @@ -107,6 +107,7 @@ lib_libopenvswitch_la_SOURCES = \ lib/dp-packet.h \ lib/dp-packet.c \ lib/dpdk.h \ + lib/dpif-netdev-extract-study.c \ lib/dpif-netdev-lookup.h \ lib/dpif-netdev-lookup.c \ lib/dpif-netdev-lookup-autovalidator.c \ diff --git a/lib/dpif-netdev-extract-study.c b/lib/dpif-netdev-extract-study.c new file mode 100644 index 0..eddb35682 --- /dev/null +++ b/lib/dpif-netdev-extract-study.c @@ -0,0 +1,136 @@ +/* + * Copyright (c) 2021 Intel. + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at: + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +#include +#include +#include +#include + +#include "dpif-netdev-private-thread.h" +#include "openvswitch/vlog.h" +#include "ovs-thread.h" + +VLOG_DEFINE_THIS_MODULE(dpif_mfex_extract_study); + +static atomic_uint32_t mfex_study_pkts_count = 0; + +/* Struct to hold miniflow study stats. */ +struct study_stats { +uint32_t pkt_count; +uint32_t impl_hitcount[MFEX_IMPL_MAX]; +}; + +/* Define per thread data to hold the study stats. */ +DEFINE_PER_THREAD_MALLOCED_DATA(struct study_stats *, study_stats); + +/* Allocate per thread PMD pointer space for study_stats. 
*/ +static inline struct study_stats * +mfex_study_get_study_stats_ptr(void) +{ +struct study_stats *stats = study_stats_get(); +if (OVS_UNLIKELY(!stats)) { + stats = xzalloc(sizeof *stats); + study_stats_set_unsafe(stats); +} +return stats; +} + +uint32_t +mfex_study_traffic(struct dp_packet_batch *packets, + struct netdev_flow_key *keys, + uint32_t keys_size, odp_port_t in_port, + struct dp_netdev_pmd_thread *pmd_handle) +{ +uint32_t hitmask = 0; +uint32_t mask = 0; +struct dp_netdev_pmd_thread *pmd = pmd_handle; +struct dpif_miniflow_extract_impl *miniflow_funcs; +struct study_stats *stats = mfex_study_get_study_stats_ptr(); +miniflow_funcs = dpif_mfex_impl_info_get(); + +/* Run traffic optimized miniflow_extract to collect the hitmask + * to be compared after certain packets have been hit to choose + * the best miniflow_extract version for that traffic. + */ +for (int i = MFEX_IMPL_START_IDX; i < MFEX_IMPL_MAX; i++) { +if (!miniflow_funcs[i].available) { +continue; +} + +hitmask = miniflow_funcs[i].extract_func(packets, keys, keys_size, + in_port, pmd_handle); +stats->impl_hitcount[i] += count_1bits(hitmask); + +/* If traffic is not classified then we don't overwrite the keys + * array in miniflow implementations, so it's safe to create a + * mask for all those packets whose miniflows have been created. + */ +mask |= hitmask; +
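The study loop above accumulates a per-implementation hit count and, once enough packets have been seen, picks the implementation with the most hits. A standalone sketch of that selection step follows; `struct study_sketch`, `study_batch()` and `N_IMPLS` are hypothetical, resetting the counters after a decision is an assumption, ties go to the lowest index, and the GCC/Clang builtin `__builtin_popcount` stands in for OVS's `count_1bits()`.

```c
#include <assert.h>
#include <stdint.h>

#define N_IMPLS 4   /* Hypothetical number of implementations. */

struct study_sketch {
    uint32_t pkt_count;
    uint32_t hitcount[N_IMPLS];
};

/* Records one batch: hitmasks[i] is the per-packet hitmask returned by
 * implementation i (0 if unavailable). Returns the chosen index once
 * 'threshold' packets have been studied, or -1 while still studying. */
static int
study_batch(struct study_sketch *s, const uint32_t hitmasks[N_IMPLS],
            uint32_t batch_size, uint32_t threshold)
{
    for (int i = 0; i < N_IMPLS; i++) {
        s->hitcount[i] += (uint32_t) __builtin_popcount(hitmasks[i]);
    }
    s->pkt_count += batch_size;
    if (s->pkt_count < threshold) {
        return -1;
    }

    int best = 0;
    for (int i = 1; i < N_IMPLS; i++) {
        if (s->hitcount[i] > s->hitcount[best]) {
            best = i;               /* Strictly more hits wins. */
        }
    }
    *s = (struct study_sketch) {0};  /* Start a fresh round (assumption). */
    return best;
}
```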
[ovs-dev] [v10 01/12] dpif-netdev: Add command line and function pointer for miniflow extract
From: Kumar Amber This patch introduces MFEX function pointers, which allow the user to switch between the different miniflow extract implementations that OVS provides, each optimized for a particular CPU ISA. The user can query the miniflow extract variants available on the CPU with the following command: $ ovs-appctl dpif-netdev/miniflow-parser-get Similarly, the user can set the miniflow extract implementation with the following command: $ ovs-appctl dpif-netdev/miniflow-parser-set name This gives the user more performance and the flexibility to choose the miniflow extract implementation that suits their needs. Signed-off-by: Kumar Amber Co-authored-by: Harry van Haaren Signed-off-by: Harry van Haaren --- v10: - fix build errors - rework default set and atomic global variable v9: - fix review comments from Flavio v7: - fix review comments(Eelco, Flavio) v5: - fix review comments(Ian, Flavio, Eelco) - add enum to hold mfex indexes - add new get and set implementations - add Atomic set and get --- --- NEWS | 1 + lib/automake.mk | 2 + lib/dpif-netdev-avx512.c | 31 +- lib/dpif-netdev-private-extract.c | 157 ++ lib/dpif-netdev-private-extract.h | 113 + lib/dpif-netdev-private-thread.h | 8 ++ lib/dpif-netdev.c | 108 +++- 7 files changed, 415 insertions(+), 5 deletions(-) create mode 100644 lib/dpif-netdev-private-extract.c create mode 100644 lib/dpif-netdev-private-extract.h diff --git a/NEWS b/NEWS index 6cdccc715..b0f08e96d 100644 --- a/NEWS +++ b/NEWS @@ -32,6 +32,7 @@ Post-v2.15.0 * Enable the AVX512 DPCLS implementation to use VPOPCNT instruction if the CPU supports it. This enhances performance by using the native vpopcount instructions, instead of the emulated version of vpopcount. + * Add command line option to switch between MFEX function pointers. - ovs-ctl: * New option '--no-record-hostname' to disable hostname configuration in ovsdb on startup.
diff --git a/lib/automake.mk b/lib/automake.mk index 3c9523c1a..53b8abc0f 100644 --- a/lib/automake.mk +++ b/lib/automake.mk @@ -118,6 +118,8 @@ lib_libopenvswitch_la_SOURCES = \ lib/dpif-netdev-private-dpcls.h \ lib/dpif-netdev-private-dpif.c \ lib/dpif-netdev-private-dpif.h \ + lib/dpif-netdev-private-extract.c \ + lib/dpif-netdev-private-extract.h \ lib/dpif-netdev-private-flow.h \ lib/dpif-netdev-private-thread.h \ lib/dpif-netdev-private.h \ diff --git a/lib/dpif-netdev-avx512.c b/lib/dpif-netdev-avx512.c index 6f9aa8284..7772b7abf 100644 --- a/lib/dpif-netdev-avx512.c +++ b/lib/dpif-netdev-avx512.c @@ -149,6 +149,15 @@ dp_netdev_input_outer_avx512(struct dp_netdev_pmd_thread *pmd, * // do all processing (HWOL->MFEX->EMC->SMC) * } */ + +/* Do a batch miniflow extract into keys. */ +uint32_t mf_mask = 0; +miniflow_extract_func mfex_func; +atomic_read_relaxed(&pmd->miniflow_extract_opt, &mfex_func); +if (mfex_func) { +mf_mask = mfex_func(packets, keys, batch_size, in_port, pmd); +} + uint32_t lookup_pkts_bitmask = (1ULL << batch_size) - 1; uint32_t iter = lookup_pkts_bitmask; while (iter) { @@ -167,6 +176,13 @@ dp_netdev_input_outer_avx512(struct dp_netdev_pmd_thread *pmd, pkt_metadata_init(&packet->md, in_port); struct dp_netdev_flow *f = NULL; +struct netdev_flow_key *key = &keys[i]; + +/* Check the miniflow mask to see if the packet was correctly + * classified by vector mfex, else do a scalar miniflow extract + * for that packet. + */ +bool mfex_hit = !!(mf_mask & (1 << i)); /* Check for a partial hardware offload match. */ if (hwol_enabled) { @@ -177,7 +193,13 @@ dp_netdev_input_outer_avx512(struct dp_netdev_pmd_thread *pmd, } if (f) { rules[i] = &f->cr; -pkt_meta[i].tcp_flags = parse_tcp_flags(packet); +/* If AVX512 MFEX already classified the packet, use it.
*/ +if (mfex_hit) { +pkt_meta[i].tcp_flags = miniflow_get_tcp_flags(&key->mf); +} else { +pkt_meta[i].tcp_flags = parse_tcp_flags(packet); +} + pkt_meta[i].bytes = dp_packet_size(packet); phwol_hits++; hwol_emc_smc_hitmask |= (1 << i); @@ -185,9 +207,10 @@ dp_netdev_input_outer_avx512(struct dp_netdev_pmd_thread *pmd, } } -/* Do miniflow extract into keys. */ -struct netdev_flow_key *key = &keys[i]; -miniflow_extract(packet, &key->mf); +if (!mfex_hit) { +/* Do a scalar miniflow extract into keys. */ +miniflow_extract(packet, &key->mf); +} /* Cache TCP and byte values for all packets. */
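The control flow this patch adds to the AVX512 DPIF (run the optimized extractor once per batch, then fall back to scalar `miniflow_extract()` only for packets whose hitmask bit is clear) can be sketched in isolation as below. The callback types and demo functions are hypothetical stand-ins rather than OVS types, and the sketch uses an unsigned shift (`UINT32_C(1) << i`) for the per-packet bit test.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins: an optimized batch extractor that returns a
 * per-packet hitmask, and a scalar per-packet extractor. */
typedef uint32_t (*batch_extract_fn)(void *pkts[], uint32_t n);
typedef void (*scalar_extract_fn)(void *pkt);

/* Returns how many packets the optimized path handled. */
static uint32_t
extract_batch(void *pkts[], uint32_t n, batch_extract_fn opt,
              scalar_extract_fn scalar)
{
    assert(n <= 32);                   /* One hitmask bit per packet. */
    uint32_t hits = opt ? opt(pkts, n) : 0;
    uint32_t n_opt = 0;

    for (uint32_t i = 0; i < n; i++) {
        if (hits & (UINT32_C(1) << i)) {
            n_opt++;                   /* Key already filled by opt path. */
        } else {
            scalar(pkts[i]);           /* Miss: fall back to scalar. */
        }
    }
    return n_opt;
}

/* Demo callbacks: "classify" even-indexed packets, count scalar calls. */
static unsigned demo_scalar_calls;

static uint32_t
demo_even_hits(void *pkts[], uint32_t n)
{
    (void) pkts;
    uint32_t mask = 0;
    for (uint32_t i = 0; i < n; i += 2) {
        mask |= UINT32_C(1) << i;
    }
    return mask;
}

static void
demo_scalar(void *pkt)
{
    (void) pkt;
    demo_scalar_calls++;
}
```

With a five-packet batch, `demo_even_hits` covers packets 0, 2 and 4, so the scalar path runs only for packets 1 and 3.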
[ovs-dev] [v10 02/12] dpif-netdev: Add auto validation function for miniflow extract
From: Kumar Amber This patch introduces the auto-validation function, which allows users to compare the miniflows that different miniflow extract implementations produce for a batch of packets against those produced by the scalar (linear) miniflow extract, and returns a hitmask. The autovalidator can be enabled at runtime using the following command: $ ovs-appctl dpif-netdev/miniflow-parser-set autovalidator Signed-off-by: Kumar Amber Co-authored-by: Harry van Haaren Signed-off-by: Harry van Haaren --- v9: - fix review comments Flavio v6: -fix review comments(Eelco) v5: - fix review comments(Ian, Flavio, Eelco) - remove ovs assert and switch to default after a batch of packets is processed - Atomic set and get introduced - fix raw_ctz for windows build --- --- NEWS | 2 + lib/dpif-netdev-private-extract.c | 150 ++ lib/dpif-netdev-private-extract.h | 22 + lib/dpif-netdev.c | 2 +- 4 files changed, 175 insertions(+), 1 deletion(-) diff --git a/NEWS b/NEWS index b0f08e96d..cf254bcfe 100644 --- a/NEWS +++ b/NEWS @@ -33,6 +33,8 @@ Post-v2.15.0 CPU supports it. This enhances performance by using the native vpopcount instructions, instead of the emulated version of vpopcount. * Add command line option to switch between MFEX function pointers. + * Add miniflow extract auto-validator function to compare different + miniflow extract implementations against default implementation. - ovs-ctl: * New option '--no-record-hostname' to disable hostname configuration in ovsdb on startup.
diff --git a/lib/dpif-netdev-private-extract.c b/lib/dpif-netdev-private-extract.c index 11d2ed2ec..6c5afd13d 100644 --- a/lib/dpif-netdev-private-extract.c +++ b/lib/dpif-netdev-private-extract.c @@ -38,6 +38,11 @@ static ATOMIC(miniflow_extract_func) default_mfex_func = NULL; */ static struct dpif_miniflow_extract_impl mfex_impls[] = { +[MFEX_IMPL_AUTOVALIDATOR] = { +.probe = NULL, +.extract_func = dpif_miniflow_extract_autovalidator, +.name = "autovalidator", }, + [MFEX_IMPL_SCALAR] = { .probe = NULL, .extract_func = NULL, @@ -155,3 +160,148 @@ dp_mfex_impl_get_by_name(const char *name, miniflow_extract_func *out_func) return -ENOENT; } + +uint32_t +dpif_miniflow_extract_autovalidator(struct dp_packet_batch *packets, +struct netdev_flow_key *keys, +uint32_t keys_size, odp_port_t in_port, +struct dp_netdev_pmd_thread *pmd_handle) +{ +const size_t cnt = dp_packet_batch_size(packets); +uint16_t good_l2_5_ofs[NETDEV_MAX_BURST]; +uint16_t good_l3_ofs[NETDEV_MAX_BURST]; +uint16_t good_l4_ofs[NETDEV_MAX_BURST]; +uint16_t good_l2_pad_size[NETDEV_MAX_BURST]; +struct dp_packet *packet; +struct dp_netdev_pmd_thread *pmd = pmd_handle; +struct netdev_flow_key test_keys[NETDEV_MAX_BURST]; + +if (keys_size < cnt) { +miniflow_extract_func default_func = NULL; +atomic_uintptr_t *pmd_func = (void *)&pmd->miniflow_extract_opt; +atomic_store_relaxed(pmd_func, (uintptr_t) default_func); +VLOG_ERR("Invalid key size supplied, Key_size: %d less than " + "batch_size: %" PRIuSIZE"\n", keys_size, cnt); +VLOG_ERR("Autovalidator is disabled.\n"); +return 0; +} + +/* Run scalar miniflow_extract to get default result. */ +DP_PACKET_BATCH_FOR_EACH (i, packet, packets) { +pkt_metadata_init(&packet->md, in_port); +miniflow_extract(packet, &keys[i].mf); + +/* Store known good metadata to compare with optimized metadata.
*/ +good_l2_5_ofs[i] = packet->l2_5_ofs; +good_l3_ofs[i] = packet->l3_ofs; +good_l4_ofs[i] = packet->l4_ofs; +good_l2_pad_size[i] = packet->l2_pad_size; +} + +uint32_t batch_failed = 0; +/* Iterate through each version of miniflow implementations. */ +for (int j = MFEX_IMPL_START_IDX; j < MFEX_IMPL_MAX; j++) { +if (!mfex_impls[j].available) { +continue; +} +/* Reset keys and offsets before each implementation. */ +memset(test_keys, 0, keys_size * sizeof(struct netdev_flow_key)); +DP_PACKET_BATCH_FOR_EACH (i, packet, packets) { +dp_packet_reset_offsets(packet); +} +/* Call optimized miniflow for each batch of packet. */ +uint32_t hit_mask = mfex_impls[j].extract_func(packets, test_keys, + keys_size, in_port, + pmd_handle); + +/* Do a miniflow compare for bits, blocks and offsets for all the + * classified packets in the hitmask marked by set bits. */ +while (hit_mask) { +/* Index for the set bit. */ +uint32_t i = raw_ctz(hit_mask); +/* Set the index in hitmask to
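The autovalidator's comparison loop walks only the set bits of each implementation's hitmask, using count-trailing-zeros to find the next classified packet. A minimal sketch of that pattern follows; `struct key_sketch` is a hypothetical flattened key (the real code compares miniflow bits, blocks and packet offsets), the GCC/Clang builtin `__builtin_ctz` stands in for `raw_ctz()`, and fields are compared individually so the result does not depend on struct padding.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical flattened key used only for this illustration. */
struct key_sketch {
    uint64_t blocks[4];
    uint16_t l3_ofs;
    uint16_t l4_ofs;
};

/* Compares 'test' against the scalar reference 'good' only for packets
 * whose bit is set in 'hit_mask'; returns a mask of mismatching packets. */
static uint32_t
validate_hits(const struct key_sketch *good, const struct key_sketch *test,
              uint32_t hit_mask)
{
    uint32_t failed = 0;

    while (hit_mask) {
        unsigned i = (unsigned) __builtin_ctz(hit_mask); /* Lowest set bit. */
        hit_mask &= hit_mask - 1;                        /* Clear it. */

        if (memcmp(good[i].blocks, test[i].blocks, sizeof good[i].blocks)
            || good[i].l3_ofs != test[i].l3_ofs
            || good[i].l4_ofs != test[i].l4_ofs) {
            failed |= UINT32_C(1) << i;
        }
    }
    return failed;
}
```

Packets outside the hitmask are never compared, which matches the idea that an implementation is only held responsible for packets it claims to have classified.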
[ovs-dev] [v10 00/12] MFEX Infrastructure + Optimizations
v10 update: - re-worked the default implementation - fix comments from Flavio and Eelco - Include Acks from Eelco in study v9 update: - Include review comments from Flavio - Rebase onto Master - Include Acks from Flavio v8 updates: - Include documentation on AVX512 MFEX as per Eelco's suggestion on list v7 updates: - Rebase onto DPIF v15 - Changed commands to get and set MFEX - Fixed comments from Flavio, Eelco - Segregated addition of MFEX options to separate patch 12 for Scalar DPIF - Removed sleep from auto-validator and added frame counter check - Documentation updates - Minor bug fixes v6 updates: - Fix non-ssl build v5 updates: - rebase onto latest DPIF v14 - use Enum for mfex impls - add pmd core id set parameter in set command - get command modified to display the pmd thread for individual mfex functions - resolved comments from Eelco, Ian, Flavio - Use Atomic to get and set miniflow implementations - removed and reduced sleep in unit tests - fixed scalar miniflow perf degradation v4 updates: - rebase onto latest DPIF v13 - fix fuzzy.py script with random mac/ip v3 updates: - rebase onto latest DPIF v12 - add additional AVX512 traffic profiles for tcp and vlan - add new command line for study function to add packet count - add unit tests for fuzzy testing and auto-validation of mfex - add mfex opt hit stats to perf-show command v2 updates: - rebase onto latest DPIF v11 This patchset introduces miniflow extract infrastructure changes that allow the user to choose between different ISA-optimized miniflow extract variants, either selected explicitly by the user or chosen automatically by OVS after studying the packets, using different commands. The infrastructure also provides a way to check the correctness of the different ISA-optimized miniflow extract variants against the scalar version.
Harry van Haaren (4): dpif/stats: add miniflow extract opt hits counter dpdk: add additional CPU ISA detection strings dpif-netdev/mfex: Add AVX512 based optimized miniflow extract dpif-netdev/mfex: add more AVX512 traffic profiles Kumar Amber (7): dpif-netdev: Add command line and function pointer for miniflow extract dpif-netdev: Add auto validation function for miniflow extract dpif-netdev: Add study function to select the best mfex function docs/dpdk/bridge: add miniflow extract section. dpif-netdev: Add configure to enable autovalidator at build time. dpif-netdev: Add packet count and core id parameters for study test/system-dpdk: Add unit test for mfex autovalidator kumar Amber (1): dpif-netdev: add mfex options to scalar dpif Documentation/topics/dpdk/bridge.rst | 144 ++ NEWS | 12 +- acinclude.m4 | 16 + configure.ac | 1 + lib/automake.mk | 4 + lib/dpdk.c | 2 + lib/dpif-netdev-avx512.c | 34 +- lib/dpif-netdev-extract-avx512.c | 630 +++ lib/dpif-netdev-extract-study.c | 157 +++ lib/dpif-netdev-perf.c | 3 + lib/dpif-netdev-perf.h | 1 + lib/dpif-netdev-private-extract.c| 366 lib/dpif-netdev-private-extract.h| 203 + lib/dpif-netdev-private-thread.h | 8 + lib/dpif-netdev-unixctl.man | 4 + lib/dpif-netdev.c| 255 ++- tests/.gitignore | 1 + tests/automake.mk| 5 + tests/mfex_fuzzy.py | 31 ++ tests/pcap/mfex_test.pcap| Bin 0 -> 416 bytes tests/pmd.at | 6 +- tests/system-dpdk.at | 49 +++ 22 files changed, 1918 insertions(+), 14 deletions(-) create mode 100644 lib/dpif-netdev-extract-avx512.c create mode 100644 lib/dpif-netdev-extract-study.c create mode 100644 lib/dpif-netdev-private-extract.c create mode 100644 lib/dpif-netdev-private-extract.h create mode 100755 tests/mfex_fuzzy.py create mode 100644 tests/pcap/mfex_test.pcap -- 2.25.1 ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] [PATCH V2 2/2] netdev-offload-dpdk: Fix ethernet type for VLANs
For VLANs, the ethernet type match should be specified in the inner_type
field of the VLAN item, not in the type field of the ethernet item. Fix it.

Fixes: e8a2b5bf92bb ("netdev-dpdk: implement flow offload with rte flow")
Signed-off-by: Eli Britstein
Reviewed-by: Salem Sol
---
 lib/netdev-offload-dpdk.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/lib/netdev-offload-dpdk.c b/lib/netdev-offload-dpdk.c
index 3e0d0643b..65f9b3685 100644
--- a/lib/netdev-offload-dpdk.c
+++ b/lib/netdev-offload-dpdk.c
@@ -1106,12 +1106,13 @@ parse_flow_match(struct netdev *netdev,
         spec->tci = match->flow.vlans[0].tci & ~htons(VLAN_CFI);
         mask->tci = match->wc.masks.vlans[0].tci & ~htons(VLAN_CFI);
 
-        /* Match any protocols. */
-        mask->inner_type = 0;
-
         if (eth_spec && eth_mask) {
             eth_spec->has_vlan = 1;
             eth_mask->has_vlan = 1;
+            spec->inner_type = eth_spec->type;
+            mask->inner_type = eth_mask->type;
+            eth_spec->type = match->flow.vlans[0].tpid;
+            eth_mask->type = match->wc.masks.vlans[0].tpid;
         }
 
         add_flow_pattern(patterns, RTE_FLOW_ITEM_TYPE_VLAN, spec, mask);
--
2.28.0.2311.g225365fb51
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
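Taken together with patch 1/2, the ETH item's `has_vlan` says whether a tag must be present, the VLAN item's `inner_type` carries the L3 ethertype, and the ETH item's `type` carries the TPID. A minimal sketch of that field shuffle, using hypothetical host-order stand-in structs (`struct eth_item`, `struct vlan_item`) and a hypothetical helper `fixup_vlan_match()`, rather than the real DPDK `rte_flow_item_eth`/`rte_flow_item_vlan`, which use big-endian fields and have more members:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for the DPDK ETH and VLAN pattern items. */
struct eth_item  { uint16_t type; uint32_t has_vlan; };
struct vlan_item { uint16_t tci; uint16_t inner_type; };

/* For a tagged flow: require a VLAN tag on the ETH item, move the L3
 * ethertype into the VLAN item's inner_type, and put the TPID into the
 * ETH item's type field. */
static void
fixup_vlan_match(struct eth_item *eth_spec, struct eth_item *eth_mask,
                 struct vlan_item *vlan_spec, struct vlan_item *vlan_mask,
                 uint16_t tpid, uint16_t tpid_mask)
{
    eth_spec->has_vlan = 1;
    eth_mask->has_vlan = 1;

    vlan_spec->inner_type = eth_spec->type;
    vlan_mask->inner_type = eth_mask->type;

    eth_spec->type = tpid;
    eth_mask->type = tpid_mask;
}
```

Before this fix, `inner_type` was masked out entirely, so the L3 ethertype ended up in the outer ETH item where the hardware would see the TPID instead.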
[ovs-dev] [PATCH V2 1/2] netdev-offload-dpdk: Use has_vlan match attribute
DPDK 20.11 introduced the ability to specify the existence or non-existence
of a VLAN tag [1]. Use this attribute.

[1]: 09315fc83861 ("ethdev: add VLAN attributes to ethernet and VLAN items")

Signed-off-by: Eli Britstein
Reviewed-by: Salem Sol
---
 lib/netdev-offload-dpdk.c | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/lib/netdev-offload-dpdk.c b/lib/netdev-offload-dpdk.c
index e7913292e..3e0d0643b 100644
--- a/lib/netdev-offload-dpdk.c
+++ b/lib/netdev-offload-dpdk.c
@@ -210,6 +210,8 @@ dump_flow_pattern(struct ds *s,
 
         ds_put_cstr(s, "eth ");
         if (eth_spec) {
+            uint32_t has_vlan_mask;
+
             if (!eth_mask) {
                 eth_mask = &rte_flow_item_eth_mask;
             }
@@ -222,6 +224,9 @@ dump_flow_pattern(struct ds *s,
             DUMP_PATTERN_ITEM(eth_mask->type, "type", "0x%04"PRIx16,
                               ntohs(eth_spec->type),
                               ntohs(eth_mask->type));
+            has_vlan_mask = eth_mask->has_vlan ? UINT32_MAX : 0;
+            DUMP_PATTERN_ITEM(has_vlan_mask, "has_vlan", "%d",
+                              eth_spec->has_vlan, eth_mask->has_vlan);
         }
         ds_put_cstr(s, "/ ");
     } else if (item->type == RTE_FLOW_ITEM_TYPE_VLAN) {
@@ -1037,6 +1042,7 @@ parse_flow_match(struct netdev *netdev,
                  struct flow_patterns *patterns,
                  struct match *match)
 {
+    struct rte_flow_item_eth *eth_spec = NULL, *eth_mask = NULL;
     struct flow *consumed_masks;
     uint8_t proto = 0;
 
@@ -1082,6 +1088,11 @@ parse_flow_match(struct netdev *netdev,
         memset(&consumed_masks->dl_src, 0, sizeof consumed_masks->dl_src);
         consumed_masks->dl_type = 0;
 
+        spec->has_vlan = 0;
+        mask->has_vlan = 1;
+        eth_spec = spec;
+        eth_mask = mask;
+
         add_flow_pattern(patterns, RTE_FLOW_ITEM_TYPE_ETH, spec, mask);
     }
 
@@ -1098,6 +1109,11 @@ parse_flow_match(struct netdev *netdev,
         /* Match any protocols. */
         mask->inner_type = 0;
 
+        if (eth_spec && eth_mask) {
+            eth_spec->has_vlan = 1;
+            eth_mask->has_vlan = 1;
+        }
+
         add_flow_pattern(patterns, RTE_FLOW_ITEM_TYPE_VLAN, spec, mask);
     }
     /* For untagged matching match->wc.masks.vlans[0].tci is 0x and
--
2.28.0.2311.g225365fb51
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [v9 06/12] dpif-netdev: Add packet count and core id paramters for study
Hi Eelco, Pls find my comments inline. > -Original Message- > From: Eelco Chaudron > Sent: Monday, July 12, 2021 9:17 PM > To: Amber, Kumar > Cc: ovs-dev@openvswitch.org; f...@sysclose.org; i.maxim...@ovn.org; Van > Haaren, Harry ; Ferriter, Cian > ; Stokes, Ian > Subject: Re: [v9 06/12] dpif-netdev: Add packet count and core id paramters > for > study > > > > On 12 Jul 2021, at 7:51, kumar Amber wrote: > > > From: Kumar Amber > > > > This commit introduces additional command line paramter for mfex study > > function. If user provides additional packet out it is used in study > > to compare minimum packets which must be processed else a default > > value is choosen. > > Also introduces a third paramter for choosing a particular pmd core. > > > > $ ovs-appctl dpif-netdev/miniflow-parser-set study 500 3 > > > > Signed-off-by: Kumar Amber > > > > --- > > v9: > > - fix review comments Flavio > > v7: > > - change the command paramters for core_id and study_pkt_cnt > > v5: > > - fix review comments(Ian, Flavio, Eelco) > > - introucde pmd core id parameter > > --- > > --- > > Documentation/topics/dpdk/bridge.rst | 39 - > > lib/dpif-netdev-extract-study.c | 26 +- > > lib/dpif-netdev-private-extract.h| 9 ++ > > lib/dpif-netdev.c| 121 +-- > > 4 files changed, 181 insertions(+), 14 deletions(-) > > > > diff --git a/Documentation/topics/dpdk/bridge.rst > > b/Documentation/topics/dpdk/bridge.rst > > index 4db416ddd..c31067c51 100644 > > --- a/Documentation/topics/dpdk/bridge.rst > > +++ b/Documentation/topics/dpdk/bridge.rst > > @@ -284,12 +284,45 @@ command also shows whether the CPU supports > each implementation :: > > > > An implementation can be selected manually by the following command :: > > > > -$ ovs-appctl dpif-netdev/miniflow-parser-set study > > +$ ovs-appctl dpif-netdev/miniflow-parser-set [-pmd core_id] [name] > > + [study_cnt] > > > > -Also user can select the study implementation which studies the > > traffic for > > +The above command has two optional 
parameters: study_cnt and core_id. > > +The core_id set a particular miniflow extract function to a specific > > The core_id sets > > > +pmd thread on the core. Third parameter study_cnt, which is specific > > The third parameter > Fixed both typos. > > +to study and ignored by other implementations, means how many packets > > +are needed to choose the best implementation. > > + > > +The user can select the study implementation which studies the > > +traffic for > > a specific number of packets by applying all available implementaions > > of > > implementations > > > miniflow extract and than chooses the one with most optimal result > > for that > > and then chooses ... with the most optimal > Fixed these 2 as well. > > -traffic pattern. > > +traffic pattern. The user can optionally provide an packet count > > +[study_cnt] parameter which is the minimum number of packets that OVS > > +must study before choosing an optimal implementation. If no packet > > +count is provided, then the default value, 128 is chosen. Also, as > > +there is no synchronization point between threads, one PMD thread > > +might still be running a previous round, and can now decide on earlier > > data. > > + > > +The per packet count is a global value, and parallel `study()` > > +executions with > > Should study() just be study? > Changed > > +differing packet counts will use the most recent count value provided by > usser. > > + > > +Study can be selected with packet count by the following command :: > > + > > +$ ovs-appctl dpif-netdev/miniflow-parser-set study 1024 > > + > > +Study can be selected with packet count and explicit PMD selection by > > +the following command :: > > + > > +$ ovs-appctl dpif-netdev/miniflow-parser-set -pmd 3 study 1024 > > + > > +In the above command the last parameter is the CORE ID of the PMD > > +thread and this can also be used to explicitly set the miniflow > > +extraction function pointer on different PMD threads. 
> > + > > +Scalar can be selected on core 3 by the following command where study > > +count can be put as any arbitary number or left blank:: > > arbitrary > Fixed. > > + > > +$ ovs-appctl dpif-netdev/miniflow-parser-set -pmd 3 scalar > > > > Miniflow Extract Validation > > ~~~ > > diff --git a/lib/dpif-netdev-extract-study.c > > b/lib/dpif-netdev-extract-study.c index a19759bd9..2dc3faf83 100644 > > --- a/lib/dpif-netdev-extract-study.c > > +++ b/lib/dpif-netdev-extract-study.c > > @@ -25,7 +25,7 @@ > > > > VLOG_DEFINE_THIS_MODULE(dpif_mfex_extract_study); > > > > -static uint32_t mfex_study_pkts_count = 0; > > +static uint32_t mfex_study_pkts_count = MFEX_MAX_PKT_COUNT; > > > > /* Struct to hold miniflow study stats. */ struct study_stats { @@ > > -48,6 +48,28 @@ mfex_study_get_study_stats_ptr(void) > > return stats; >
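The documented syntax is `miniflow-parser-set [-pmd core_id] [name] [study_cnt]`, with 128 packets as the default study count. A sketch of how such arguments could be parsed, assuming a hypothetical `struct mfex_set_args` and `parse_mfex_set_args()`; the real OVS handler, its error messages, and whether it rejects a zero count (this sketch does) may differ:

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Mirrors the MFEX_MAX_PKT_COUNT default of 128 from the patch. */
#define DEFAULT_STUDY_PKT_CNT 128

/* Hypothetical parsed form of:
 *   miniflow-parser-set [-pmd core_id] name [study_cnt]  */
struct mfex_set_args {
    int has_core_id;
    unsigned int core_id;
    const char *name;
    uint32_t study_cnt;
};

/* Parses a plain decimal number; rejects signs, spaces and trailing junk. */
static int
parse_uint(const char *s, unsigned long *out)
{
    char *end;
    unsigned long v;

    if (*s < '0' || *s > '9') {
        return -1;
    }
    errno = 0;
    v = strtoul(s, &end, 10);
    if (errno || *end != '\0') {
        return -1;
    }
    *out = v;
    return 0;
}

/* Returns 0 on success, -1 on malformed input. */
static int
parse_mfex_set_args(int argc, const char *argv[], struct mfex_set_args *a)
{
    unsigned long v;
    int i = 0;

    *a = (struct mfex_set_args) { .study_cnt = DEFAULT_STUDY_PKT_CNT };

    if (i < argc && !strcmp(argv[i], "-pmd")) {
        if (i + 1 >= argc || parse_uint(argv[i + 1], &v)) {
            return -1;
        }
        a->has_core_id = 1;
        a->core_id = (unsigned int) v;
        i += 2;
    }

    if (i >= argc) {
        return -1;              /* The implementation name is required. */
    }
    a->name = argv[i++];

    if (i < argc) {
        /* Only "study" uses the count; other implementations ignore it. */
        if (parse_uint(argv[i], &v) || v == 0 || v > UINT32_MAX) {
            return -1;
        }
        a->study_cnt = (uint32_t) v;
        i++;
    }

    return i == argc ? 0 : -1;
}
```

For `-pmd 3 study 1024` this yields core 3, name `study` and a count of 1024; a bare `study` keeps the default of 128.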
Re: [ovs-dev] [v9 05/12] dpif-netdev: Add configure to enable autovalidator at build time.
Hi Eelco, Fixed all and reworked default. > -Original Message- > From: Eelco Chaudron > Sent: Monday, July 12, 2021 6:50 PM > To: Amber, Kumar > Cc: ovs-dev@openvswitch.org; f...@sysclose.org; i.maxim...@ovn.org; Van > Haaren, Harry ; Ferriter, Cian > ; Stokes, Ian > Subject: Re: [v9 05/12] dpif-netdev: Add configure to enable autovalidator at > build time. > > > > On 12 Jul 2021, at 7:51, kumar Amber wrote: > > > From: Kumar Amber > > > > This commit adds a new command to allow the user to enable > > autovalidatior by default at build time thus allowing for runnig unit > > test by default. > > > > $ ./configure --enable-mfex-default-autovalidator > > > > Signed-off-by: Kumar Amber > > Co-authored-by: Harry van Haaren > > Signed-off-by: Harry van Haaren > > > > --- > > v9: > > - fix review comments Flavio > > v7: > > - fix review commens(Eelco, Flavio) > > v5: > > - fix review comments(Ian, Flavio, Eelco) > > --- > > --- > > Documentation/topics/dpdk/bridge.rst | 5 + > > NEWS | 3 ++- > > acinclude.m4 | 16 > > configure.ac | 1 + > > lib/dpif-netdev-private-extract.c| 8 ++-- > > 5 files changed, 30 insertions(+), 3 deletions(-) > > > > diff --git a/Documentation/topics/dpdk/bridge.rst > > b/Documentation/topics/dpdk/bridge.rst > > index 7c618cf1f..4db416ddd 100644 > > --- a/Documentation/topics/dpdk/bridge.rst > > +++ b/Documentation/topics/dpdk/bridge.rst > > @@ -307,3 +307,8 @@ implementations provide the same results. > > To set the Miniflow autovalidator, use this command :: > > > > $ ovs-appctl dpif-netdev/miniflow-parser-set autovalidator > > + > > +A compile time option is available in order to test it with the OVS > > +unit test suite. 
Use the following configure option :: > > + > > +$ ./configure --enable-mfex-default-autovalidator > > diff --git a/NEWS b/NEWS > > index 4a7b89409..581bff225 100644 > > --- a/NEWS > > +++ b/NEWS > > @@ -38,6 +38,8 @@ Post-v2.15.0 > > * Add study function to miniflow function table which studies packet > > and automatically chooses the best miniflow implementation for that > > traffic. > > + * Add build time configure command to enable auto-validatior as > > default > > + miniflow implementation at build time. > > - ovs-ctl: > > * New option '--no-record-hostname' to disable hostname configuration > > in ovsdb on startup. > > @@ -57,7 +59,6 @@ Post-v2.15.0 > > whether the SNAT with all-zero IP address is supported. > > See ovs-vswitchd.conf.db(5) for details. > > > > - > > You are removing a white space here unrelated to your changes. Please leave it > in. > > > v2.15.0 - 15 Feb 2021 > > - > > - OVSDB: > > diff --git a/acinclude.m4 b/acinclude.m4 index 343303447..5a48f0335 > > 100644 > > --- a/acinclude.m4 > > +++ b/acinclude.m4 > > @@ -14,6 +14,22 @@ > > # See the License for the specific language governing permissions and > > # limitations under the License. > > > > +dnl Set OVS MFEX Autovalidator as default miniflow extract at compile time? > > +dnl This enables automatically running all unit tests with all MFEX > > +dnl implementations. > > +AC_DEFUN([OVS_CHECK_MFEX_AUTOVALIDATOR], [ > > + AC_ARG_ENABLE([mfex-default-autovalidator], > > +[AC_HELP_STRING([--enable-mfex-default-autovalidator], > > [Enable > MFEX autovalidator as default miniflow_extract implementation.])], > > +[autovalidator=yes],[autovalidator=no]) > > + AC_MSG_CHECKING([whether MFEX Autovalidator is default > > +implementation]) > > + if test "$autovalidator" != yes; then > > +AC_MSG_RESULT([no]) > > + else > > +OVS_CFLAGS="$OVS_CFLAGS -DMFEX_AUTOVALIDATOR_DEFAULT" > > +AC_MSG_RESULT([yes]) > > + fi > > +]) > > + > > dnl Set OVS DPCLS Autovalidator as default subtable search at compile time? 
> > dnl This enables automatically running all unit tests with all DPCLS > > dnl implementations. > > diff --git a/configure.ac b/configure.ac index e45685a6c..46c402892 > > 100644 > > --- a/configure.ac > > +++ b/configure.ac > > @@ -186,6 +186,7 @@ OVS_ENABLE_SPARSE > > OVS_CTAGS_IDENTIFIERS > > OVS_CHECK_DPCLS_AUTOVALIDATOR > > OVS_CHECK_DPIF_AVX512_DEFAULT > > +OVS_CHECK_MFEX_AUTOVALIDATOR > > OVS_CHECK_BINUTILS_AVX512 > > > > AC_ARG_VAR(KARCH, [Kernel Architecture String]) diff --git > > a/lib/dpif-netdev-private-extract.c > > b/lib/dpif-netdev-private-extract.c > > index 4ea111f94..ad71f238e 100644 > > --- a/lib/dpif-netdev-private-extract.c > > +++ b/lib/dpif-netdev-private-extract.c > > @@ -77,20 +77,24 @@ dp_mfex_impl_get_default(void) { > > atomic_uintptr_t *mfex_func = (void *)&default_mfex_func; > > static bool default_mfex_func_set = false; > > +#ifdef MFEX_AUTOVALIDATOR_DEFAULT > > +int mfex_idx = MFEX_IMPL_AUTOVALIDATOR; #else > > int mfex_idx = MFEX_IMPL_SCALAR; > > +#endif > > > >
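The configure option only adds `-DMFEX_AUTOVALIDATOR_DEFAULT` to `OVS_CFLAGS`; the choice itself is made by the preprocessor. A minimal sketch with a hypothetical enum and helper names (`default_impl_idx()`, `default_impl_name()`), omitting the atomic store and the once-only latch of the real `dp_mfex_impl_get_default()`:

```c
#include <assert.h>

/* Hypothetical index enum modeled on dpif_miniflow_extract_impl_idx. */
enum mfex_impl_idx {
    IMPL_AUTOVALIDATOR,
    IMPL_SCALAR,
    IMPL_STUDY,
    IMPL_MAX
};

static const char *const impl_names[IMPL_MAX] = {
    [IMPL_AUTOVALIDATOR] = "autovalidator",
    [IMPL_SCALAR] = "scalar",
    [IMPL_STUDY] = "study",
};

/* Resolved at compile time: building with -DMFEX_AUTOVALIDATOR_DEFAULT
 * (what --enable-mfex-default-autovalidator adds to OVS_CFLAGS) selects the
 * autovalidator; any other build falls back to scalar. */
static enum mfex_impl_idx
default_impl_idx(void)
{
#ifdef MFEX_AUTOVALIDATOR_DEFAULT
    return IMPL_AUTOVALIDATOR;
#else
    return IMPL_SCALAR;
#endif
}

static const char *
default_impl_name(void)
{
    return impl_names[default_impl_idx()];
}
```

Because the branch is fixed at build time, the whole unit-test suite can be run through the autovalidator without changing any test.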
Re: [ovs-dev] [v4] dpif/dpcls: limit count subtable search info logs
Hi Flavio, All fixed ready to merge > -Original Message- > From: Flavio Leitner > Sent: Monday, July 12, 2021 11:53 PM > To: Amber, Kumar > Cc: ovs-dev@openvswitch.org > Subject: Re: [ovs-dev] [v4] dpif/dpcls: limit count subtable search info logs > > > Hi Kumar, > > There is an issue with the signed-offs reported by 0-day Robot. > For additional info, please check the link below and look for the tag Co- > authored-by: > https://github.com/openvswitch/ovs/blob/master/Documentation/internals/co > ntributing/submitting-patches.rst#tags > > Otherwise the patch looks good time. > Thanks, > fbl > > On Mon, Jul 12, 2021 at 11:44:05AM +0530, kumar Amber wrote: > > From: Harry van Haaren > > > > This commit avoids many instances of "using subtable X for miniflow (x,y)" > > in the ovs-vswitchd log when using the DPCLS Autovalidator. This > > occurs when no specialized subtable is found, and the generic "_any" > > version of the avx512 subtable search implementation was used. This > > change logs the subtable usage once, avoiding duplicates. > > > > Signed-off-by: Harry van Haaren > > Signed-off-by: kumar Amber > > > > --- > > v4: > > - add doc updtae from Flavio > > v3: > > - add comments from Flavio > > - add documentation update > > --- > > Documentation/topics/dpdk/bridge.rst | 34 ++ > > lib/dpif-netdev-lookup-avx512-gather.c | 4 +-- > > 2 files changed, 36 insertions(+), 2 deletions(-) > > > > diff --git a/Documentation/topics/dpdk/bridge.rst > > b/Documentation/topics/dpdk/bridge.rst > > index 0f70a0cad..374e03eb0 100644 > > --- a/Documentation/topics/dpdk/bridge.rst > > +++ b/Documentation/topics/dpdk/bridge.rst > > @@ -182,6 +182,40 @@ chosen, and the 2nd occurance of that priority is > > not used. Put in logical terms, a subtable is chosen if its priority > > is greater than the previous best candidate. 
> > > > +Optimizing Specific Subtable Search > > +~~~ > > + > > +During the packet classification, the datapath can use specialized > > +lookup tables to optimize the search. However, not all situations are > > +optimized. If you see a message like the following one in the OVS > > +logs, it means that there is no specialized implementation available > > +for the current networking traffic. In this case, OVS will continue > > +to process the traffic normally using a more generic lookup table." > > + > > +"Using non-specialized AVX512 lookup for subtable (4,1) and possibly > > others." > > + > > +(Note that the numbers 4 and 1 will likely be different in your logs) > > + > > +Additional specialized lookups can be added to OVS if the user > > +provides that log message along with the command output as show below > > +to the OVS mailing list. Note that the numbers in the log message > > +("subtable (X,Y)") need to match with the numbers in the provided > > +command output ("dp-extra-info:miniflow_bits(X,Y)"). > > + > > +"ovs-appctl dpctl/dump-flows -m", which results in output like this: > > + > > +ufid:82770b5d-ca38-44ff-8283-74ba36bd1ca5, > skb_priority(0/0),skb_mark(0/0) > > +,ct_state(0/0),ct_zone(0/0),ct_mark(0/0),ct_label(0/0),recirc_id(0), > > + > dp_hash(0/0),in_port(pcap0),packet_type(ns=0,id=0),eth(src=00:00:00:00:00: > > +00/00:00:00:00:00:00,dst=ff:ff:ff:ff:ff:ff/00:00:00:00:00:00),eth_type( > > + > 0x8100),vlan(vid=1,pcp=0),encap(eth_type(0x0800),ipv4(src=127.0.0.1/0.0.0.0 > > + > ,dst=127.0.0.1/0.0.0.0,proto=17/0,tos=0/0,ttl=64/0,frag=no),udp(src=53/0, > > +dst=53/0)), packets:77072681, bytes:3545343326, used:0.000s, dp:ovs, > > +actions:vhostuserclient0, dp-extra-info:miniflow_bits(4,1) > > + > > +Please send an email to the OVS mailing list ovs-dev@openvswitch.org > > +with the output of the "dp-extra-info:miniflow_bits(4,1)" values. 
> > + > > CPU ISA Testing and Validation > > ~~ > > > > diff --git a/lib/dpif-netdev-lookup-avx512-gather.c > > b/lib/dpif-netdev-lookup-avx512-gather.c > > index bc359dc4a..ced846aa7 100644 > > --- a/lib/dpif-netdev-lookup-avx512-gather.c > > +++ b/lib/dpif-netdev-lookup-avx512-gather.c > > @@ -411,8 +411,8 @@ dpcls_subtable_avx512_gather_probe(uint32_t > u0_bits, uint32_t u1_bits) > > */ > > if (!f && (u0_bits + u1_bits) < (NUM_U64_IN_ZMM_REG * 2)) { > > f = dpcls_avx512_gather_mf_any; > > -VLOG_INFO("Using avx512_gather_mf_any for subtable (%d,%d)\n", > > - u0_bits, u1_bits); > > +VLOG_INFO_ONCE("Using non-specialized AVX512 lookup for subtable" > > + " (%d,%d) and possibly others.", u0_bits, > > + u1_bits); > > } > > > > return f; > > -- > > 2.25.1 > > > > ___ > > dev mailing list > > d...@openvswitch.org > > https://mail.openvswitch.org/mailman/listinfo/ovs-dev > > -- > fbl ___ dev mailing list d...@openvswitch.org
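`VLOG_INFO_ONCE` prints the message only the first time the call site is reached, so later subtables that also fall back to the generic lookup stay silent. A standalone sketch of that once-only behaviour using C11 `atomic_flag` (this is not the OVS macro itself; `note_generic_lookup()` and `log_count` are hypothetical):

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdio.h>

/* Returns nonzero only for the very first caller, even under concurrency. */
static int
first_time(atomic_flag *flag)
{
    return !atomic_flag_test_and_set(flag);
}

static atomic_flag generic_lookup_logged = ATOMIC_FLAG_INIT;
static int log_count;   /* Number of messages emitted, for illustration. */

static void
note_generic_lookup(unsigned int u0_bits, unsigned int u1_bits)
{
    if (first_time(&generic_lookup_logged)) {
        log_count++;
        fprintf(stderr, "Using non-specialized AVX512 lookup for subtable"
                " (%u,%u) and possibly others.\n", u0_bits, u1_bits);
    }
}
```

This is why the new message says "and possibly others": only the first (u0,u1) pair is ever reported.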
Re: [ovs-dev] [v9 01/12] dpif-netdev: Add command line and function pointer for miniflow extract
Hi Eelco, Flavio , Pls find my comments Inline. > > +/* For the first call, this will be choosen based on the > > + * compile time flag and if nor flag is set it is set to > > + * default scalar. > > + */ > > +if (OVS_UNLIKELY(!default_mfex_func_set)) { > > +VLOG_INFO("Default MFEX implementation is %s.\n", > > + mfex_impls[mfex_idx].name); > > +atomic_store_relaxed(mfex_func, (uintptr_t) mfex_impls > > + [mfex_idx].extract_func); > > +default_mfex_func_set = true; > > If this only needs to be done once, why not move it to > dpif_miniflow_extract_init() as suggested during the v6 review (in a later > patch)? > This will remove this check every time dp_mfex_impl_get_default() gets called. > Yes it does make sense. > > +} > > + > > +return default_mfex_func; > > +} > > + > > +int > > +dp_mfex_impl_set_default_by_name(const char *name) { > > +miniflow_extract_func new_default; > > +atomic_uintptr_t *mfex_func = (void *)&default_mfex_func; > > + > > +int err = dp_mfex_impl_get_by_name(name, &new_default); > > + > > +if (!err) { > > +atomic_store_relaxed(mfex_func, (uintptr_t) new_default); > > +} > > + > > +return err; > > + > > +} > > + > > +void > > +dp_mfex_impl_get(struct ds *reply, struct dp_netdev_pmd_thread > **pmd_list, > > + size_t pmd_list_size) { > > +/* Add all MFEX functions to reply string. */ > > +ds_put_cstr(reply, "Available MFEX implementations:\n"); > > + > > +for (int i = 0; i < MFEX_IMPL_MAX; i++) { > > +ds_put_format(reply, " %s (available: %s pmds: ", > > + mfex_impls[i].name, mfex_impls[i].available ? > > + "True" : "False"); > > Flavio mentioned that True/False did not make sense to an end-user, not sure > if > he has the same feeling here? > Maybe yes/no make more sense here? Flavio? > Changes to available same as previous comments.
> > + > > +for (size_t j = 0; j < pmd_list_size; j++) { > > +struct dp_netdev_pmd_thread *pmd = pmd_list[j]; > > +if (pmd->core_id == NON_PMD_CORE_ID) { > > +continue; > > +} > > + > > +if (pmd->miniflow_extract_opt == mfex_impls[i].extract_func) { > > +ds_put_format(reply, "%u,", pmd->core_id); > > +} > > +} > > + > > +ds_chomp(reply, ','); > > + > > +if (ds_last(reply) == ' ') { > > +ds_put_cstr(reply, "none"); > > +} > > + > > +ds_put_cstr(reply, ")\n"); > > +} > > + > > +} > > + > > +/* This function checks all available MFEX implementations, and > > +selects and > > + * returns the function pointer to the one requested by "name". If > > +nothing > > + * is found it reutrns error. > > reutrns -> returns > Fixed. > > + */ > > +int > > +dp_mfex_impl_get_by_name(const char *name, miniflow_extract_func > > +*out_func) { > > +if ((name == NULL) || (out_func == NULL)) { > > +return -EINVAL; > > +} > > + > > +for (int i = 0; i < MFEX_IMPL_MAX; i++) { > > +if (strcmp(mfex_impls[i].name, name) == 0) { > > +/* Check available is set before exec. */ > > +if (!mfex_impls[i].available) { > > +*out_func = NULL; > > +return -ENODEV; > > +} > > + > > +*out_func = mfex_impls[i].extract_func; > > +return 0; > > +} > > +} > > + > > +return -ENOENT; > > +} > > diff --git a/lib/dpif-netdev-private-extract.h > > b/lib/dpif-netdev-private-extract.h > > new file mode 100644 > > index 0..ddf2e2845 > > --- /dev/null > > +++ b/lib/dpif-netdev-private-extract.h > > @@ -0,0 +1,111 @@ > > +/* > > + * Copyright (c) 2021 Intel. > > + * > > + * Licensed under the Apache License, Version 2.0 (the "License"); > > + * you may not use this file except in compliance with the License. 
> > + * You may obtain a copy of the License at: > > + * > > + * http://www.apache.org/licenses/LICENSE-2.0 > > + * > > + * Unless required by applicable law or agreed to in writing, > > +software > > + * distributed under the License is distributed on an "AS IS" BASIS, > > + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or > implied. > > + * See the License for the specific language governing permissions > > +and > > + * limitations under the License. > > + */ > > + > > +#ifndef MFEX_AVX512_EXTRACT > > +#define MFEX_AVX512_EXTRACT 1 > > + > > +#include > > + > > +/* Forward declarations. */ > > +struct dp_packet; > > +struct miniflow; > > +struct dp_netdev_pmd_thread; > > +struct dp_packet_batch; > > +struct netdev_flow_key; > > + > > +/* Function pointer prototype to be implemented in the optimized > > +miniflow > > + * extract code. > > + * returns the hitmask of the processed packets on success. > > + * returns zero on failure. > > + */ > > +typedef uint32_t
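The lookup reviewed above distinguishes three failures: bad arguments (`-EINVAL`), a known implementation that the CPU cannot run (`-ENODEV`), and an unknown name (`-ENOENT`). A sketch of that table-driven lookup with probing at init, using a placeholder function signature and a hypothetical `avx512_example` entry whose probe always fails (the names `impls_init()` and `impl_get_by_name()` are also hypothetical):

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Placeholder signature; the real miniflow_extract_func takes a packet
 * batch, keys, an in_port and a PMD handle. */
typedef int (*extract_fn)(void);

static int scalar_fn(void) { return 1; }
static int avx512_fn(void) { return 2; }

/* Hypothetical probe: pretend this CPU lacks the required ISA. */
static int probe_avx512(void) { return -1; }

struct impl {
    int (*probe)(void);     /* NULL means "always available". */
    extract_fn func;
    const char *name;
    bool available;
};

static struct impl impls[] = {
    { NULL, scalar_fn, "scalar", false },
    { probe_avx512, avx512_fn, "avx512_example", false },
};

#define N_IMPLS (sizeof impls / sizeof impls[0])

/* Probe once at startup; runtime CPU ISA support does not change. */
static void
impls_init(void)
{
    for (size_t i = 0; i < N_IMPLS; i++) {
        impls[i].available = !impls[i].probe || impls[i].probe() == 0;
    }
}

/* 0 on success, -EINVAL for bad arguments, -ENODEV if the implementation
 * exists but is unavailable on this CPU, -ENOENT if the name is unknown. */
static int
impl_get_by_name(const char *name, extract_fn *out)
{
    if (!name || !out) {
        return -EINVAL;
    }
    for (size_t i = 0; i < N_IMPLS; i++) {
        if (!strcmp(impls[i].name, name)) {
            if (!impls[i].available) {
                *out = NULL;
                return -ENODEV;
            }
            *out = impls[i].func;
            return 0;
        }
    }
    return -ENOENT;
}
```

Probing in an init step, rather than on every lookup, matches the review's suggestion to do one-time work in `dpif_miniflow_extract_init()`.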
Re: [ovs-dev] [v9 04/12] docs/dpdk/bridge: add miniflow extract section.
Hi Eelco fixed all typos in v10 > -Original Message- > From: Eelco Chaudron > Sent: Monday, July 12, 2021 6:37 PM > To: Amber, Kumar > Cc: ovs-dev@openvswitch.org; f...@sysclose.org; i.maxim...@ovn.org; Van > Haaren, Harry ; Ferriter, Cian > ; Stokes, Ian > Subject: Re: [v9 04/12] docs/dpdk/bridge: add miniflow extract section. > > > > On 12 Jul 2021, at 7:51, kumar Amber wrote: > > > From: Kumar Amber > > > > This commit adds a section to the dpdk/bridge.rst netdev > > documentation, detailing the added miniflow functionality. The newly > > added commands are documented, and sample output is provided. > > > > The use of auto-validator and special study function is also described > > in detail as well as running fuzzy tests. > > > > Signed-off-by: Kumar Amber > > Co-authored-by: Cian Ferriter > > Signed-off-by: Cian Ferriter > > Co-authored-by: Harry van Haaren > > Signed-off-by: Harry van Haaren > > Acked-by: Flavio Leitner > > > > --- > > v7: > > - fix review comments(Eelco) > > v5: > > - fix review comments(Ian, Flavio, Eelco) > > --- > > --- > > Documentation/topics/dpdk/bridge.rst | 51 > > > > 1 file changed, 51 insertions(+) > > > > diff --git a/Documentation/topics/dpdk/bridge.rst > > b/Documentation/topics/dpdk/bridge.rst > > index 2d0850836..7c618cf1f 100644 > > --- a/Documentation/topics/dpdk/bridge.rst > > +++ b/Documentation/topics/dpdk/bridge.rst > > @@ -256,3 +256,54 @@ The following line should be seen in the > > configure output when the above option is used :: > > > > checking whether DPIF AVX512 is default implementation... yes > > + > > +Miniflow Extract > > + > > + > > +Miniflow extract (MFEX) performs parsing of the raw packets and > > +extracts the important header information into a compressed miniflow. > > +This miniflow is composed of bits and blocks where the bits signify > > +which blocks are set or have values where as the blocks hold the > > +metadata, ip, udp, vlan, etc. 
These values are used by the datapath > > +for switching decisions later.The Optimized miniflow extract is > > +traffic specific to speed up the lookup, whereas the scalar works for > > +ALL traffic patterns > > + > > +Most modern CPUs have SIMD capabilities. These SIMD instructions are > > +able to process a vector rather than act on one single data. > > This sounds odd “rather than act on one single data.”? > > > OVS provides multiple > > +implementations of miniflow extract. This allows the user to take > > +advantage of SIMD instructions like AVX512 to gain additional performance. > > + > > +A list of implementations can be obtained by the following command. > > +The command also shows whether the CPU supports each implementation :: > > + > > +$ ovs-appctl dpif-netdev/miniflow-parser-get > > +Available Optimized Miniflow Extracts: > > +autovalidator (available: True, pmds: none) > > +scalar (available: True, pmds: 1,15) > > +study (available: True, pmds: none) > > + > > +An implementation can be selected manually by the following command :: > > + > > +$ ovs-appctl dpif-netdev/miniflow-parser-set study > > + > > +Also user can select the study implementation which studies the > > +traffic for a specific number of packets by applying all available > > +implementaions of > > implementations > > > +miniflow extract and than chooses the one with most optimal result > > +for that > > than -> then > > most optimal -> the most optimal > > > +traffic pattern. > > + > > +Miniflow Extract Validation > > +~~~ > > + > > +As multiple versions of miniflow extract can co-exist, each with > > +different CPU ISA optimizations, it is important to validate that > > +they all give the exact same results. To easily test all miniflow > > +implementations, an ``autovalidator`` implementation of the miniflow > > +exists. This implementation runs all other available miniflow extract > > +implementations, and verifies that the results are identical. 
> > + > > +Running the OVS unit tests with the autovalidator enabled ensures all > > +implementations provide the same results. > > + > > +To set the Miniflow autovalidator, use this command :: > > + > > +$ ovs-appctl dpif-netdev/miniflow-parser-set autovalidator > > -- > > 2.25.1 ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
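The autovalidator idea is to run every implementation on the same packets and compare each result with the scalar reference. A per-packet sketch, assuming hypothetical two-field key structs, a deliberately buggy candidate, and hypothetical helpers (`keys_equal()`, `autovalidate()`); treating a declined packet as "not a mismatch" is an assumption of this sketch, not a statement about the OVS autovalidator:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified packet and parsed-key types. */
struct pkt { uint16_t eth_type; uint8_t ip_proto; };
struct key { uint16_t eth_type; uint8_t ip_proto; };

/* Returns true if the implementation handled (parsed) the packet. */
typedef bool (*extract_fn)(const struct pkt *, struct key *);

/* Reference parser: handles everything. */
static bool
scalar_extract(const struct pkt *p, struct key *k)
{
    k->eth_type = p->eth_type;
    k->ip_proto = p->ip_proto;
    return true;
}

/* Hypothetical specialized parser with a deliberate bug: it only handles
 * IPv4 and forgets to copy ip_proto. */
static bool
buggy_ipv4_extract(const struct pkt *p, struct key *k)
{
    if (p->eth_type != 0x0800) {
        return false;           /* Not this implementation's profile. */
    }
    k->eth_type = p->eth_type;
    k->ip_proto = 0;
    return true;
}

/* Field-by-field comparison avoids memcmp() over struct padding. */
static bool
keys_equal(const struct key *a, const struct key *b)
{
    return a->eth_type == b->eth_type && a->ip_proto == b->ip_proto;
}

/* Run every candidate on the packet and compare against the scalar
 * reference.  Returns the number of candidates whose output differed. */
static size_t
autovalidate(const struct pkt *p, const extract_fn *cands, size_t n)
{
    struct key ref, got;
    size_t mismatches = 0;

    scalar_extract(p, &ref);
    for (size_t i = 0; i < n; i++) {
        if (cands[i](p, &got) && !keys_equal(&ref, &got)) {
            mismatches++;
        }
    }
    return mismatches;
}
```

Feeding known and fuzzed packets through such a check is what lets a single-profile bug surface without a dedicated test per implementation.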
Re: [ovs-dev] [v9 03/12] dpif-netdev: Add study function to select the best mfex function
Hi Eelco , Thanks for Ack. Will also fix the minor changes in next patch > > +for (int i = MFEX_IMPL_START_IDX; i < MFEX_IMPL_MAX; i++) { > > +VLOG_DBG("MFEX study results for implementation %s:" > > + " (hits %d/%d pkts)", > > + miniflow_funcs[i].name, stats->impl_hitcount[i], > > + stats->pkt_count); > > +} > > For all logs above “hits %d/%d pkts” should be “hits %u/%u pkts” > Sure. > > +} > > + > > +/* Reset stats so that study function can be called again > > + * for next traffic type and optimal function ptr can be > > + * chosen. > > + */ > > +memset(stats, 0, sizeof(struct study_stats)); > > +} > > +return mask; > > +} > > diff --git a/lib/dpif-netdev-private-extract.c > > b/lib/dpif-netdev-private-extract.c > > index 19f350349..4ea111f94 100644 > > --- a/lib/dpif-netdev-private-extract.c > > +++ b/lib/dpif-netdev-private-extract.c > > @@ -47,6 +47,11 @@ static struct dpif_miniflow_extract_impl mfex_impls[] = > { > > .probe = NULL, > > .extract_func = NULL, > > .name = "scalar", }, > > + > > +[MFEX_IMPL_STUDY] = { > > +.probe = NULL, > > +.extract_func = mfex_study_traffic, > > +.name = "study", }, > > }; > > > > BUILD_ASSERT_DECL(MFEX_IMPL_MAX == ARRAY_SIZE(mfex_impls)); @@ - > 166,6 > > +171,12 @@ dp_mfex_impl_get_by_name(const char *name, > miniflow_extract_func *out_func) > > return -ENOENT; > > } > > > > +void > > +dpif_mfex_impl_info_get(struct dpif_miniflow_extract_impl **out_ptr) > > +{ > > +*out_ptr = mfex_impls; > > +} > > If we are only interested in getting a pointer, why not just return it: > > struct dpif_miniflow_extract_impl *dpif_mfex_impl_info_get(void) { > return mfex_impls; > } > True. 
> > uint32_t > > dpif_miniflow_extract_autovalidator(struct dp_packet_batch *packets, > > struct netdev_flow_key *keys, > > diff --git a/lib/dpif-netdev-private-extract.h > > b/lib/dpif-netdev-private-extract.h > > index de3270c88..32b7ccbb3 100644 > > --- a/lib/dpif-netdev-private-extract.h > > +++ b/lib/dpif-netdev-private-extract.h > > @@ -80,6 +80,7 @@ struct dpif_miniflow_extract_impl { enum > > dpif_miniflow_extract_impl_idx { > > MFEX_IMPL_AUTOVALIDATOR, > > MFEX_IMPL_SCALAR, > > +MFEX_IMPL_STUDY, > > MFEX_IMPL_MAX > > }; > > > > @@ -89,6 +90,9 @@ enum dpif_miniflow_extract_impl_idx { > > > > #define MFEX_IMPL_START_IDX MFEX_IMPL_MAX > > > > +/* Max count of packets to be compared. */ #define > MFEX_MAX_PKT_COUNT > > +(128) > > + > > /* This function returns all available implementations to the caller. The > > * quantity of implementations is returned by the int return value. > > */ > > @@ -109,6 +113,13 @@ miniflow_extract_func > > dp_mfex_impl_get_default(void); > > /* Overrides the default MFEX with the user set MFEX. */ int > > dp_mfex_impl_set_default_by_name(const char *name); > > > > +/* Retrieve the array of miniflow implementations for iteration. > > + * On error, returns a negative number. > > + * On success, returns the size of the arrays pointed to by the out > > parameter. > > + */ > > +void > > +dpif_mfex_impl_info_get(struct dpif_miniflow_extract_impl **out_ptr); > > + > > > > /* Initializes the available miniflow extract implementations by probing > > for > > * the CPU ISA requirements. As the runtime available CPU ISA does > > not change @@ -130,4 +141,16 @@ > dpif_miniflow_extract_autovalidator(struct dp_packet_batch *batch, > > uint32_t keys_size, odp_port_t in_port, > > struct dp_netdev_pmd_thread > > *pmd_handle); > > > > +/* Retrieve the number of packets by studying packets using different > > +miniflow > > + * implementations to choose the best implementation using the > > +maximum hitmask > > + * count. 
> > + * On error, returns a zero for no packets. > > + * On success, returns mask of the packets hit. > > + */ > > +uint32_t > > +mfex_study_traffic(struct dp_packet_batch *packets, > > + struct netdev_flow_key *keys, > > + uint32_t keys_size, odp_port_t in_port, > > + struct dp_netdev_pmd_thread *pmd_handle); > > + > > #endif /* MFEX_AVX512_EXTRACT */ > > -- > > 2.25.1 Regards Amber ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
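`mfex_study_traffic()` accumulates per-implementation hit counts until enough packets have been seen, then picks the best implementation and resets its stats. A sketch of that bookkeeping, assuming hypothetical names (`study_batch()`, `count_bits()`, `N_IMPLS`) and that ties go to the lowest index; the hit counts are `uint32_t`, which is why the review asks for `%u` rather than `%d` in the log format:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define N_IMPLS 3

struct study_stats {
    uint32_t pkt_count;
    uint32_t impl_hitcount[N_IMPLS];
};

/* Population count of a hitmask. */
static unsigned int
count_bits(uint32_t m)
{
    unsigned int n = 0;

    while (m) {
        m &= m - 1;             /* Clear the lowest set bit. */
        n++;
    }
    return n;
}

/* Feed one batch: hitmasks[i] is the hitmask implementation i returned for
 * this batch of batch_size packets.  Returns the chosen implementation's
 * index once at least study_cnt packets have been seen, otherwise -1.
 * Stats are reset after a decision so that study can run again. */
static int
study_batch(struct study_stats *s, const uint32_t hitmasks[N_IMPLS],
            uint32_t batch_size, uint32_t study_cnt)
{
    int best = 0;

    s->pkt_count += batch_size;
    for (int i = 0; i < N_IMPLS; i++) {
        s->impl_hitcount[i] += count_bits(hitmasks[i]);
    }
    if (s->pkt_count < study_cnt) {
        return -1;
    }

    for (int i = 1; i < N_IMPLS; i++) {
        if (s->impl_hitcount[i] > s->impl_hitcount[best]) {
            best = i;
        }
    }
    memset(s, 0, sizeof *s);
    return best;
}
```

With a study count of 8 and batches of 4, the decision is made on the second batch; the real function's handling of ties or zero-hit results may differ.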
[ovs-dev] [v5] dpif/dpcls: limit count subtable search info logs
From: Harry van Haaren This commit avoids many instances of "using subtable X for miniflow (x,y)" in the ovs-vswitchd log when using the DPCLS Autovalidator. This occurs when no specialized subtable is found, and the generic "_any" version of the avx512 subtable search implementation was used. This change logs the subtable usage once, avoiding duplicates. Signed-off-by: Harry van Haaren Signed-off-by: kumar Amber Co-authored-by: kumar Amber --- v5: - fix checkpatch error v4: - add doc updtae from Flavio v3: - add comments from Flavio - add documentation update --- Documentation/topics/dpdk/bridge.rst | 34 ++ lib/dpif-netdev-lookup-avx512-gather.c | 4 +-- 2 files changed, 36 insertions(+), 2 deletions(-) diff --git a/Documentation/topics/dpdk/bridge.rst b/Documentation/topics/dpdk/bridge.rst index 0f70a0cad..374e03eb0 100644 --- a/Documentation/topics/dpdk/bridge.rst +++ b/Documentation/topics/dpdk/bridge.rst @@ -182,6 +182,40 @@ chosen, and the 2nd occurance of that priority is not used. Put in logical terms, a subtable is chosen if its priority is greater than the previous best candidate. +Optimizing Specific Subtable Search +~~~ + +During the packet classification, the datapath can use specialized +lookup tables to optimize the search. However, not all situations +are optimized. If you see a message like the following one in the OVS +logs, it means that there is no specialized implementation available +for the current networking traffic. In this case, OVS will continue +to process the traffic normally using a more generic lookup table." + +"Using non-specialized AVX512 lookup for subtable (4,1) and possibly others." + +(Note that the numbers 4 and 1 will likely be different in your logs) + +Additional specialized lookups can be added to OVS if the user +provides that log message along with the command output as show +below to the OVS mailing list. 
Note that the numbers in the log +message ("subtable (X,Y)") need to match with the numbers in +the provided command output ("dp-extra-info:miniflow_bits(X,Y)"). + +"ovs-appctl dpctl/dump-flows -m", which results in output like this: + +ufid:82770b5d-ca38-44ff-8283-74ba36bd1ca5, skb_priority(0/0),skb_mark(0/0) +,ct_state(0/0),ct_zone(0/0),ct_mark(0/0),ct_label(0/0),recirc_id(0), +dp_hash(0/0),in_port(pcap0),packet_type(ns=0,id=0),eth(src=00:00:00:00:00: +00/00:00:00:00:00:00,dst=ff:ff:ff:ff:ff:ff/00:00:00:00:00:00),eth_type( +0x8100),vlan(vid=1,pcp=0),encap(eth_type(0x0800),ipv4(src=127.0.0.1/0.0.0.0 +,dst=127.0.0.1/0.0.0.0,proto=17/0,tos=0/0,ttl=64/0,frag=no),udp(src=53/0, +dst=53/0)), packets:77072681, bytes:3545343326, used:0.000s, dp:ovs, +actions:vhostuserclient0, dp-extra-info:miniflow_bits(4,1) + +Please send an email to the OVS mailing list ovs-dev@openvswitch.org with +the output of the "dp-extra-info:miniflow_bits(4,1)" values. + CPU ISA Testing and Validation ~~ diff --git a/lib/dpif-netdev-lookup-avx512-gather.c b/lib/dpif-netdev-lookup-avx512-gather.c index bc359dc4a..ced846aa7 100644 --- a/lib/dpif-netdev-lookup-avx512-gather.c +++ b/lib/dpif-netdev-lookup-avx512-gather.c @@ -411,8 +411,8 @@ dpcls_subtable_avx512_gather_probe(uint32_t u0_bits, uint32_t u1_bits) */ if (!f && (u0_bits + u1_bits) < (NUM_U64_IN_ZMM_REG * 2)) { f = dpcls_avx512_gather_mf_any; -VLOG_INFO("Using avx512_gather_mf_any for subtable (%d,%d)\n", - u0_bits, u1_bits); +VLOG_INFO_ONCE("Using non-specialized AVX512 lookup for subtable" + " (%d,%d) and possibly others.", u0_bits, u1_bits); } return f; -- 2.25.1 ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH ovn] ovn-nb.xml: Fix the description for LB's skip_snat option.
On Mon, Jul 12, 2021 at 12:42 PM Numan Siddique wrote: > > On Fri, Jul 9, 2021 at 2:49 PM Han Zhou wrote: > > > > force_snat_for_lb is a flag set in logical flow pipeline, while > > lb_force_snat_ip is the option configured in NB DB. In NB document we > > should mention the actual option configured in NB instead of the flow > > details. > > > > Signed-off-by: Han Zhou > > Acked-by: Numan Siddique > > Numan Thanks Numan! Applied. > > > --- > > ovn-nb.xml | 5 +++-- > > 1 file changed, 3 insertions(+), 2 deletions(-) > > > > diff --git a/ovn-nb.xml b/ovn-nb.xml > > index b6a0d1f43..d5efbb33e 100644 > > --- a/ovn-nb.xml > > +++ b/ovn-nb.xml > > @@ -1712,8 +1712,9 @@ > > > > > > If the load balancing rule is configured with skip_snat > > -option, the force_snat_for_lb option configured for the router > > -pipeline will not be applied for this load balancer. > > +option, the option lb_force_snat_ip configured for the logical router > > +that references this load balancer will not be applied for this load > > +balancer. > > > > > > > > -- > > 2.30.2
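The corrected documentation states a simple precedence rule between a router-level option and a per-load-balancer option. A minimal sketch of that rule follows; `lb_uses_force_snat_ip()` is a hypothetical helper, not ovn-northd code, and the router option is modelled as a possibly empty address string.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper modelling the documented rule: the lb_force_snat_ip
 * address configured on a logical router is used for traffic to a load
 * balancer it references, unless that load balancer sets skip_snat. */
static bool
lb_uses_force_snat_ip(const char *router_lb_force_snat_ip, bool lb_skip_snat)
{
    if (router_lb_force_snat_ip == NULL || router_lb_force_snat_ip[0] == '\0') {
        return false;   /* Nothing configured on the router. */
    }
    return !lb_skip_snat;
}
```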
Re: [ovs-dev] [PATCH ovn 1/2] system-test: Fix flake in ECMP IPv6 symmetric reply test
On Mon, Jul 12, 2021 at 10:28 AM Mark Gray wrote: > > Statically add IPv6 neighbor MAC addresses to avoid NS messages > evicting datapath flows causing occasional test failures. > > We also configure all interfaces to send only one IPv6 router > solicitation message. These messages can cause datapath flows > to be unexpectedly evicted causing test failures. > > Fixes: 7c927c0c0be1 ("ovn-northd: Fix IPv6 ECMP symmetric reply flows") > Signed-off-by: Mark Gray > --- > tests/system-ovn.at | 51 + > 1 file changed, 28 insertions(+), 23 deletions(-) > > diff --git a/tests/system-ovn.at b/tests/system-ovn.at > index 79879c6e003b..fc377bbd1a47 100644 > --- a/tests/system-ovn.at > +++ b/tests/system-ovn.at > @@ -5833,17 +5833,35 @@ ovn-nbctl lr-route-add R3 fd01::/64 fd02::1 > > # Logical port 'alice1' in switch 'alice'. > ADD_NAMESPACES(alice1) > +# Only send 1 router solicitation as any additional ones can cause datapath > +# flows to get evicted, causing unexpected failures below. > +NS_CHECK_EXEC([alice1], [sysctl -w net.ipv6.conf.default.router_solicitations=1], [0], [dnl > +net.ipv6.conf.default.router_solicitations = 1 > +]) > ADD_VETH(alice1, alice1, br-int, "fd01::2/64", "f0:00:00:01:02:04", \ > "fd01::1") > OVS_WAIT_UNTIL([test "$(ip netns exec alice1 ip a | grep fd01::2 | grep tentative)" = ""]) > ovn-nbctl lsp-add alice alice1 \ > -- lsp-set-addresses alice1 "f0:00:00:01:02:04 fd01::2" > +# Add neighbour MAC address to avoid sending IPv6 NS messages which could > +# cause datapath flows to be evicted > +NS_CHECK_EXEC([alice1], [ip -6 neigh add fd01::1 lladdr 00:00:01:01:02:03 dev alice1], [0]) > > # Logical port 'bob1' in switch 'bob'. > ADD_NAMESPACES(bob1) > +# Only send 1 router solicitation as any additional ones can cause datapath > +# flows to get evicted, causing unexpected failures below. 
> +NS_CHECK_EXEC([bob1], [sysctl -w net.ipv6.conf.default.router_solicitations=1], [0], [dnl > +net.ipv6.conf.default.router_solicitations = 1 > +]) > ADD_VETH(bob1, bob1, br-int, "fd07::1/64", "f0:00:00:01:02:06", \ > "fd07::2") > OVS_WAIT_UNTIL([test "$(ip netns exec bob1 ip a | grep fd07::1 | grep tentative)" = ""]) > +# Add neighbour MAC addresses to avoid sending IPv6 NS messages which could > +# cause datapath flows to be evicted > +NS_CHECK_EXEC([bob1], [ip -6 neigh add fd07::2 lladdr 00:00:02:01:02:03 dev bob1], [0]) > +NS_CHECK_EXEC([bob1], [ip -6 neigh add fd07::3 lladdr 00:00:01:01:02:04 dev bob1], [0]) > + > ovn-nbctl lsp-add bob bob1 \ > -- lsp-set-addresses bob1 "f0:00:00:01:02:06 fd07::1" > > @@ -5852,45 +5870,32 @@ ovn-nbctl --wait=hv sync > > on_exit 'ovs-ofctl dump-flows br-int' > > -# Later in this test we will check for a datapath flow that matches: > -# "ct_state(+new-est-rpl+trk).*ct(.*label=0x204010204/.*)". Due > -# to the way OVS generates datapath flows with wildcards, ICMPv6 NS flows will > -# evict this datapath flow. In order to ensure that the flow does not > -# get evicted, we send one ping packet in order to carry out neighbor > -# discovery. We then flush the datpath to remove the NS flows so that the flow > -# "ct_state(+new-est-rpl+trk).*ct(.*label=0x204010204/.*)" will > -# be present when we check for it. > -NS_CHECK_EXEC([bob1], [ping -q -c 2 -i 0.3 -w 15 fd01::2 | FORMAT_PING], \ > -[0], [dnl > -2 packets transmitted, 2 received, 0% packet loss, time 0ms > -]) > -ovs-appctl dpctl/del-flows > - > # 'bob1' should be able to ping 'alice1' directly. > NS_CHECK_EXEC([bob1], [ping -q -c 20 -i 0.3 -w 15 fd01::2 | FORMAT_PING], \ > [0], [dnl > 20 packets transmitted, 20 received, 0% packet loss, time 0ms > ]) > > -# Ensure conntrack entry is present. We should not try to predict > -# the tunnel key for the output port, so we strip it from the labels > -# and just ensure that the known ethernet address is present. 
> -AT_CHECK([ovs-appctl dpctl/dump-conntrack | FORMAT_CT(fd01::2) | \ > -sed -e 's/zone=[[0-9]]*/zone=/' | > -sed -e 's/labels=0x[[0-9a-f]]*04010204/labels=0x04010204/'], [0], [dnl > -icmpv6,orig=(src=fd07::1,dst=fd01::2,id=,type=128,code=0),reply=(src=fd01::2,dst=fd07::1,id=,type=129,code=0),zone=,labels=0x04010204 > -]) > - > # Ensure datapaths show conntrack states as expected > # Like with conntrack entries, we shouldn't try to predict > # port binding tunnel keys. So omit them from expected labels. > AT_CHECK([ovs-appctl dpctl/dump-flows | grep 'ct_state(+new-est-rpl+trk).*ct(.*label=0x204010204/.*)' -c], [0], [dnl > 1 > ]) > + > AT_CHECK([ovs-appctl dpctl/dump-flows | grep 'ct_state(-new+est+rpl+trk).*ct_label(0x.*04010204/.*)' -c], [0], [dnl > 1 > ]) > > +# Ensure conntrack entry is present. We should not try to predict > +# the tunnel key for the output port, so we strip it from the labels > +# and just ensure that the known ethernet address is present. > +AT_CHECK([ovs-appctl
Re: [ovs-dev] [PATCH v2 0/9] OVSDB Relay Service Model. (Was: OVSDB 2-Tier deployment)
On 6/25/21 3:33 PM, Dumitru Ceara wrote: > On 6/12/21 3:59 AM, Ilya Maximets wrote: >> Replication can be used to scale out read-only access to the database. >> But there are clients that are not read-only, but read-mostly. >> One of the main examples is ovn-controller that mostly monitors >> updates from the Southbound DB, but needs to claim ports by sending >> transactions that change some database tables. >> >> The Southbound database serves lots of connections: all connections >> from ovn-controllers and some service connections from cloud >> infrastructure, e.g. some OpenStack agents are monitoring updates. >> At a high scale and with a big database, ovsdb-server >> spends too much time processing monitor updates and it's required >> to move this load somewhere else. This patch-set aims to introduce >> the required functionality to scale out read-mostly connections by >> introducing a new OVSDB 'relay' service model. >> >> In this new service model ovsdb-server connects to an existing OVSDB >> server and maintains an in-memory copy of the database. It serves >> read-only transactions and monitor requests on its own, but forwards >> write transactions to the relay source. >> >> Key differences from the active-backup replication: >> - support for "write" transactions. >> - no on-disk storage. (probably, faster operation) >> - support for multiple remotes (connect to the clustered db). >> - doesn't try to keep connection as long as possible, but >> reconnects to other remotes faster to avoid missing updates. >> - No need to know the complete database schema beforehand, >> only the schema name. >> - can be used along with other standalone and clustered databases >> by the same ovsdb-server process. (doesn't turn the whole >> jsonrpc server into read-only mode) >> - supports modern version of monitors (monitor_cond_since), >> because it is based on ovsdb-cs. >> - could be chained, i.e. multiple relays could be connected >> one to another in a row or in a tree-like form.
>> >> Bringing all the above functionality to the existing active-backup >> replication doesn't look right as it will make it less reliable >> for the actual backup use case, and this also would be much >> harder from the implementation point of view, because current >> replication code is not based on ovsdb-cs or idl and all the required >> features would likely be duplicated or replication would be fully >> re-written on top of ovsdb-cs with severe modifications of the former. >> >> Relay is somewhere in the middle between active-backup replication and >> the clustered model, taking a lot from both, and is therefore hard to >> implement on top of either of them. >> >> To run ovsdb-server in relay mode, the user simply needs to run: >> >> ovsdb-server --remote=punix:db.sock relay:: >> >> e.g. >> >> ovsdb-server --remote=punix:db.sock relay:OVN_Southbound:tcp:127.0.0.1:6642 >> >> More details and examples are in the documentation in the last patch >> of the series. >> >> I actually tried to implement transaction forwarding on top of >> active-backup replication in v1 of this series, but it required >> a lot of tricky changes, including schema format changes in order >> to bring required information to the end clients, so I decided >> to fully rewrite the functionality in v2 with a different approach. >> >> Future work: >> - Add support for transaction history (it could be just inherited >> from the transaction ids received from the relay source). This >> will allow clients to utilize monitor_cond_since while working >> with relay. > > Hi Ilya, > > I acked most of the patches in the series (except 7/9 which I think > might need a rather straightforward change) and I saw Mark also left > some comments. > > I wonder though if the lack of monitor_cond_since will be a show stopper > for deploying this in production? Or do you expect reconnects to happen > less often due to the multi-tier nature of new deployments?
I do expect that relays will hide most of the reconnections, so clients will have more stable connections. In this case it should be fine for clients not to have monitor_cond_since. Still, I'll work on adding support for it. Another factor is that deployments will likely have more relays than main servers, so it should be easier to handle the extra load of downloading the whole database, if required. > > I guess we need some scale test data with this deployed to have a better > idea. Sure, I collected some data from the scale tests and will include it in the cover letter for v3. > > In any case, very nice work! Thanks! > > Regards, > Dumitru >
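The cover letter's core rule is that a relay answers read-only transactions and monitor requests from its in-memory copy but forwards anything that writes to the relay source. A minimal sketch of that routing decision follows, classifying operations by their RFC 7047 names. The function names, and the choice to treat `commit` and unknown operations as writes (the conservative option), are assumptions of this sketch, not taken from the series.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* RFC 7047 operations that this sketch treats as read-only.  Anything
 * else, including "commit" and unknown names, is treated as a write. */
static bool
op_is_read_only(const char *op)
{
    static const char *const read_only_ops[] = {
        "select", "wait", "comment", "abort",
    };

    for (size_t i = 0; i < sizeof read_only_ops / sizeof read_only_ops[0];
         i++) {
        if (!strcmp(op, read_only_ops[i])) {
            return true;
        }
    }
    return false;
}

enum relay_action { RELAY_SERVE_LOCALLY, RELAY_FORWARD_TO_SOURCE };

/* A transaction is answered from the relay's in-memory copy only if every
 * operation in it is read-only; otherwise it is forwarded as a whole. */
static enum relay_action
relay_route_transaction(const char *const ops[], size_t n_ops)
{
    for (size_t i = 0; i < n_ops; i++) {
        if (!op_is_read_only(ops[i])) {
            return RELAY_FORWARD_TO_SOURCE;
        }
    }
    return RELAY_SERVE_LOCALLY;
}
```

Forwarding the whole transaction, rather than splitting it, keeps its atomicity on the relay source; the relay never has to modify its own copy except by applying updates it receives.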
Re: [ovs-dev] [PATCH v2 9/9] docs: Add documentation for ovsdb relay mode.
On 7/2/21 1:05 PM, Mark Gray wrote: > On 12/06/2021 03:00, Ilya Maximets wrote: >> Main documentation for the service model and tutorial with the use case >> and configuration examples. >> >> Signed-off-by: Ilya Maximets >> --- >> Documentation/automake.mk| 1 + >> Documentation/ref/ovsdb.7.rst| 62 -- >> Documentation/topics/index.rst | 1 + >> Documentation/topics/ovsdb-relay.rst | 124 +++ >> NEWS | 3 + >> ovsdb/ovsdb-server.1.in | 27 +++--- >> 6 files changed, 200 insertions(+), 18 deletions(-) >> create mode 100644 Documentation/topics/ovsdb-relay.rst >> >> diff --git a/Documentation/automake.mk b/Documentation/automake.mk >> index bc30f94c5..213d9c867 100644 >> --- a/Documentation/automake.mk >> +++ b/Documentation/automake.mk >> @@ -52,6 +52,7 @@ DOC_SOURCE = \ >> Documentation/topics/networking-namespaces.rst \ >> Documentation/topics/openflow.rst \ >> Documentation/topics/ovs-extensions.rst \ >> +Documentation/topics/ovsdb-relay.rst \ >> Documentation/topics/ovsdb-replication.rst \ >> Documentation/topics/porting.rst \ >> Documentation/topics/record-replay.rst \ >> diff --git a/Documentation/ref/ovsdb.7.rst b/Documentation/ref/ovsdb.7.rst >> index e4f1bf766..a5b8a9c33 100644 >> --- a/Documentation/ref/ovsdb.7.rst >> +++ b/Documentation/ref/ovsdb.7.rst >> @@ -121,13 +121,14 @@ schema checksum from a schema or database file, >> respectively. >> Service Models >> == >> >> -OVSDB supports three service models for databases: **standalone**, >> -**active-backup**, and **clustered**. The service models provide different >> -compromises among consistency, availability, and partition tolerance. They >> -also differ in the number of servers required and in terms of performance. >> The >> -standalone and active-backup database service models share one on-disk >> format, >> -and clustered databases use a different format, but the OVSDB programs work >> -with both formats. ``ovsdb(5)`` documents these file formats. 
>> +OVSDB supports four service models for databases: **standalone**, >> +**active-backup**, **relay** and **clustered**. The service models provide >> +different compromises among consistency, availability, and partition >> tolerance. >> +They also differ in the number of servers required and in terms of >> performance. >> +The standalone and active-backup database service models share one on-disk >> +format, and clustered databases use a different format, but the OVSDB >> programs >> +work with both formats. ``ovsdb(5)`` documents these file formats. Relay >> +databases has no on-disk storage. > > s/has/have OK. > >> >> RFC 7047, which specifies the OVSDB protocol, does not mandate or specify >> any particular service model. >> @@ -406,6 +407,50 @@ following consequences: >>that the client previously read. The OVSDB client library in Open vSwitch >>uses this feature to avoid servers with stale data. >> >> +Relay Service Model >> +--- >> + >> +A **relay** database is a way to scale out read-mostly access to the >> +existing database working in any service model including relay. >> + >> +Relay database creates and maintains an OVSDB connection with other OVSDB > > s/other/another OK. > >> +server. It uses this connection to maintain in-memory copy of the remote > > s/maintain/maintain an/ OK. > >> +database (a.k.a. the ``relay source``) keeping the copy up-to-date as the >> +database content changes on relay source in the real time. > > s/on/on the/ OK. > >> + >> +The purpose of relay server is to scale out the number of database clients. >> +Read-only transactions and monitor requests are fully handled by the relay >> +server itself. For the transactions that requests database modifications, >> +relay works as a proxy between the client and the relay source, i.e. it >> +forwards transactions and replies between them. 
>> + >> +Compared to a clustered and active-backup models, relay service model >> provides >> +read and write access to the database similarly to a clustered database (and >> +even more scalable), but with generally insignificant performance overhead >> of >> +an active-backup model. At the same time it doesn't increase availability >> that >> +needs to be covered by the service model of the relay source. >> + >> +Relay database has no on-disk storage and therefore cannot be converted to >> +any other service model. >> + >> +If there is already a database started in any service model, to start a >> relay >> +database server use ``ovsdb-server relay::``, where >> + is the database name as specified in the schema of the >> database >> +that existing server runs, and is an OVSDB connection >> method >> +(see `Connection Methods`_ below) that connects to the existing database >> +server. could contain a comma-separated list of >>
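A relay is started by passing a `relay:` argument that names the database and then the relay source, as in the cover letter's example `relay:OVN_Southbound:tcp:127.0.0.1:6642`, and the documentation notes that the relay source may be a comma-separated list (the cover letter mentions multiple remotes for connecting to a clustered database). The sketch below splits such an argument into a database name and a list of remotes. `struct relay_spec`, `parse_relay_spec()` and the fixed size limits are hypothetical; this is not ovsdb-server's parser.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_REMOTES 8

/* Hypothetical parsed form of a "relay:<db>:<remote>[,<remote>...]" arg. */
struct relay_spec {
    char db_name[64];
    char remotes[MAX_REMOTES][128];
    size_t n_remotes;
};

static bool
parse_relay_spec(const char *arg, struct relay_spec *spec)
{
    static const char prefix[] = "relay:";
    const size_t prefix_len = sizeof prefix - 1;

    memset(spec, 0, sizeof *spec);       /* Also NUL-terminates fields. */
    if (strncmp(arg, prefix, prefix_len)) {
        return false;
    }
    arg += prefix_len;

    /* The database name runs up to the next ':'. */
    const char *colon = strchr(arg, ':');
    size_t name_len = colon ? (size_t) (colon - arg) : 0;
    if (!colon || name_len == 0 || name_len >= sizeof spec->db_name) {
        return false;
    }
    memcpy(spec->db_name, arg, name_len);

    /* Everything after it is one or more comma-separated remotes; the
     * remotes themselves contain ':' (e.g. "tcp:IP:PORT"). */
    const char *p = colon + 1;
    while (*p) {
        const char *comma = strchr(p, ',');
        size_t len = comma ? (size_t) (comma - p) : strlen(p);

        if (len == 0 || len >= sizeof spec->remotes[0]
            || spec->n_remotes == MAX_REMOTES) {
            return false;
        }
        memcpy(spec->remotes[spec->n_remotes++], p, len);
        p += len;
        if (*p == ',') {
            p++;
        }
    }
    return spec->n_remotes > 0;
}
```

Splitting only at the first `:` after `relay:` matters because every remote itself contains colons.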
Re: [ovs-dev] [PATCH v2 9/9] docs: Add documentation for ovsdb relay mode.
On 6/25/21 3:35 PM, Dumitru Ceara wrote: > On 6/12/21 4:00 AM, Ilya Maximets wrote: >> Main documentation for the service model and tutorial with the use case >> and configuration examples. >> >> Signed-off-by: Ilya Maximets >> --- > > I left a few minor comments below. With them addressed: > > Acked-by: Dumitru Ceara > > Thanks! > >> Documentation/automake.mk| 1 + >> Documentation/ref/ovsdb.7.rst| 62 -- >> Documentation/topics/index.rst | 1 + >> Documentation/topics/ovsdb-relay.rst | 124 +++ >> NEWS | 3 + >> ovsdb/ovsdb-server.1.in | 27 +++--- >> 6 files changed, 200 insertions(+), 18 deletions(-) >> create mode 100644 Documentation/topics/ovsdb-relay.rst >> >> diff --git a/Documentation/automake.mk b/Documentation/automake.mk >> index bc30f94c5..213d9c867 100644 >> --- a/Documentation/automake.mk >> +++ b/Documentation/automake.mk >> @@ -52,6 +52,7 @@ DOC_SOURCE = \ >> Documentation/topics/networking-namespaces.rst \ >> Documentation/topics/openflow.rst \ >> Documentation/topics/ovs-extensions.rst \ >> +Documentation/topics/ovsdb-relay.rst \ >> Documentation/topics/ovsdb-replication.rst \ >> Documentation/topics/porting.rst \ >> Documentation/topics/record-replay.rst \ >> diff --git a/Documentation/ref/ovsdb.7.rst b/Documentation/ref/ovsdb.7.rst >> index e4f1bf766..a5b8a9c33 100644 >> --- a/Documentation/ref/ovsdb.7.rst >> +++ b/Documentation/ref/ovsdb.7.rst >> @@ -121,13 +121,14 @@ schema checksum from a schema or database file, >> respectively. >> Service Models >> == >> >> -OVSDB supports three service models for databases: **standalone**, >> -**active-backup**, and **clustered**. The service models provide different >> -compromises among consistency, availability, and partition tolerance. They >> -also differ in the number of servers required and in terms of performance. 
>> The >> -standalone and active-backup database service models share one on-disk >> format, >> -and clustered databases use a different format, but the OVSDB programs work >> -with both formats. ``ovsdb(5)`` documents these file formats. >> +OVSDB supports four service models for databases: **standalone**, >> +**active-backup**, **relay** and **clustered**. The service models provide >> +different compromises among consistency, availability, and partition >> tolerance. >> +They also differ in the number of servers required and in terms of >> performance. >> +The standalone and active-backup database service models share one on-disk >> +format, and clustered databases use a different format, but the OVSDB >> programs >> +work with both formats. ``ovsdb(5)`` documents these file formats. Relay >> +databases has no on-disk storage. > > s/has/have OK. > >> >> RFC 7047, which specifies the OVSDB protocol, does not mandate or specify >> any particular service model. >> @@ -406,6 +407,50 @@ following consequences: >>that the client previously read. The OVSDB client library in Open vSwitch >>uses this feature to avoid servers with stale data. >> >> +Relay Service Model >> +--- >> + >> +A **relay** database is a way to scale out read-mostly access to the >> +existing database working in any service model including relay. >> + >> +Relay database creates and maintains an OVSDB connection with other OVSDB >> +server. It uses this connection to maintain in-memory copy of the remote >> +database (a.k.a. the ``relay source``) keeping the copy up-to-date as the >> +database content changes on relay source in the real time. >> + >> +The purpose of relay server is to scale out the number of database clients. >> +Read-only transactions and monitor requests are fully handled by the relay >> +server itself. For the transactions that requests database modifications, > > s/requests/request OK. > >> +relay works as a proxy between the client and the relay source, i.e. 
it >> +forwards transactions and replies between them. >> + >> +Compared to a clustered and active-backup models, relay service model >> provides > > s/Compared to a/Compared to the OK. > >> +read and write access to the database similarly to a clustered database (and >> +even more scalable), but with generally insignificant performance overhead >> of > > Joke: citation needed :) > >> +an active-backup model. At the same time it doesn't increase availability >> that >> +needs to be covered by the service model of the relay source. >> + >> +Relay database has no on-disk storage and therefore cannot be converted to >> +any other service model. >> + >> +If there is already a database started in any service model, to start a >> relay >> +database server use ``ovsdb-server relay::``, where >> + is the database name as specified in the schema of the >> database >> +that existing server runs, and is an OVSDB connection >> method >> +(see
Re: [ovs-dev] [PATCH ovn] northd: Process load balancer defrag flows once for all routers.
Hi Dumitru. Can you please rebase this? There's a conflict due to 384a7c6237da8f88ab68a9abd0982f92d7d8c2d2 (northd: Refactor Logical Flows for routers with DNAT/Load Balancers). On 7/6/21 6:45 AM, Lorenzo Bianconi wrote: This allows creating the match strings for each LB VIP exactly once, instead of once per datapath as it was before this change, reducing CPU usage in the ovn-northd event processing loop. On a scaled ovn-kubernetes-like deployment for 120 nodes, with 120 gateway logical routers and 16K load balancer VIPs attached to each gateway router, this reduces event processing loop times in ovn-northd from ~9.5 seconds to ~8.5 seconds. Reported-at: https://bugzilla.redhat.com/1962833 Signed-off-by: Dumitru Ceara Acked-by: Lorenzo Bianconi --- northd/ovn-northd.c | 98 ++--- 1 file changed, 48 insertions(+), 50 deletions(-) diff --git a/northd/ovn-northd.c b/northd/ovn-northd.c index 570c6a3ef..0b043edec 100644 --- a/northd/ovn-northd.c +++ b/northd/ovn-northd.c @@ -8993,6 +8993,48 @@ build_lswitch_flows_for_lb(struct ovn_northd_lb *lb, struct hmap *lflows, build_lb_rules(lflows, lb, match, action); } +/* If there are any load balancing rules, we should send the packet to + * conntrack for defragmentation and tracking. This helps with two things. + * + * 1. With tracking, we can send only new connections to pick a DNAT ip address + *from a group. + * 2. If there are L4 ports in load balancing rules, we need the + *defragmentation to match on L4 ports. + */ +static void +build_lrouter_defrag_flows_for_lb(struct ovn_northd_lb *lb, + struct hmap *lflows, + struct ds *match) +{ +if (!lb->n_nb_lr) { +return; +} + +/* A set to hold all ips that need defragmentation and tracking. 
*/ +struct sset all_ips = SSET_INITIALIZER(_ips); +for (size_t i = 0; i < lb->n_vips; i++) { +struct ovn_lb_vip *lb_vip = >vips[i]; + +if (!sset_add(_ips, lb_vip->vip_str)) { +continue; +} + +ds_clear(match); +if (IN6_IS_ADDR_V4MAPPED(_vip->vip)) { +ds_put_format(match, "ip && ip4.dst == %s", lb_vip->vip_str); +} else { +ds_put_format(match, "ip && ip6.dst == %s", lb_vip->vip_str); +} +for (size_t j = 0; j < lb->n_nb_lr; j++) { +ovn_lflow_add_with_hint(lflows, lb->nb_lr[j], +S_ROUTER_IN_DEFRAG, 100, +ds_cstr(match), "ct_next;", +>nlb->header_); +} +} +sset_destroy(_ips); +} + static void build_lrouter_flows_for_lb(struct ovn_northd_lb *lb, struct hmap *lflows, struct shash *meter_groups, @@ -9027,49 +9069,6 @@ build_lrouter_flows_for_lb(struct ovn_northd_lb *lb, struct hmap *lflows, } } -static void -build_lrouter_lb_flows(struct hmap *lflows, struct ovn_datapath *od, - struct hmap *lbs, struct ds *match) -{ -/* A set to hold all ips that need defragmentation and tracking. */ -struct sset all_ips = SSET_INITIALIZER(_ips); - -for (int i = 0; i < od->nbr->n_load_balancer; i++) { -struct nbrec_load_balancer *nb_lb = od->nbr->load_balancer[i]; -struct ovn_northd_lb *lb = -ovn_northd_lb_find(lbs, _lb->header_.uuid); -ovs_assert(lb); - -for (size_t j = 0; j < lb->n_vips; j++) { -struct ovn_lb_vip *lb_vip = >vips[j]; - -if (!sset_contains(_ips, lb_vip->vip_str)) { -sset_add(_ips, lb_vip->vip_str); -/* If there are any load balancing rules, we should send - * the packet to conntrack for defragmentation and - * tracking. This helps with two things. - * - * 1. With tracking, we can send only new connections to - *pick a DNAT ip address from a group. - * 2. If there are L4 ports in load balancing rules, we - *need the defragmentation to match on L4 ports. 
*/ -ds_clear(match); -if (IN6_IS_ADDR_V4MAPPED(_vip->vip)) { -ds_put_format(match, "ip && ip4.dst == %s", - lb_vip->vip_str); -} else { -ds_put_format(match, "ip && ip6.dst == %s", - lb_vip->vip_str); -} -ovn_lflow_add_with_hint(lflows, od, S_ROUTER_IN_DEFRAG, -100, ds_cstr(match), "ct_next;", -_lb->header_); -} -} -} -sset_destroy(_ips); -} - #define ND_RA_MAX_INTERVAL_MAX 1800 #define ND_RA_MAX_INTERVAL_MIN 4 @@ -11810,9 +11809,7 @@ lrouter_check_nat_entry(struct ovn_datapath *od, const struct
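The speed-up comes from moving the loop: the match string for a VIP address is now formatted once per load balancer and reused for every router that references it, instead of being rebuilt inside each router datapath's loop. A minimal sketch of that structure follows, with hypothetical names. A linear duplicate check stands in for the patch's `sset`, `add_defrag_flow()` stands in for `ovn_lflow_add_with_hint()`, and a VIP containing `:` is treated as IPv6 as a simplification of the `IN6_IS_ADDR_V4MAPPED()` test.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for ovn_lflow_add_with_hint(): counts calls. */
static int n_flows_added;

static void
add_defrag_flow(int router, const char *match)
{
    (void) router;
    (void) match;
    n_flows_added++;
}

/* Builds each distinct VIP's match string once and adds it for every
 * router referencing the load balancer.  Returns the number of match
 * strings formatted. */
static int
build_defrag_flows_for_lb(const char *const vips[], int n_vips, int n_routers)
{
    int n_matches = 0;

    if (n_routers == 0) {
        return 0;           /* Mirrors the early "!lb->n_nb_lr" return. */
    }
    for (int i = 0; i < n_vips; i++) {
        bool dup = false;

        for (int k = 0; k < i; k++) {
            if (!strcmp(vips[k], vips[i])) {
                dup = true;
                break;
            }
        }
        if (dup) {
            continue;
        }

        char match[128];
        snprintf(match, sizeof match, "ip && %s.dst == %s",
                 strchr(vips[i], ':') ? "ip6" : "ip4", vips[i]);
        n_matches++;
        for (int r = 0; r < n_routers; r++) {
            add_defrag_flow(r, match);
        }
    }
    return n_matches;
}
```

The number of flows is unchanged (one per distinct VIP per router); what shrinks is the string formatting and duplicate-tracking work, which previously scaled with the number of routers as well.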
Re: [ovs-dev] ovn-northd-ddlog - high mem and cpu usage when started with an existing DB
On Thu, Jul 08, 2021 at 08:59:24PM +0200, Dumitru Ceara wrote: > Hi Ben, > > As discussed earlier, during the OVN meeting, I've noticed a new > performance issue with ovn-northd-ddlog when running it against a > database from one of our more recent scale tests: > > http://people.redhat.com/~dceara/ovn-northd-ddlog-tests/20210708/ovnnb_db.db > > ovn-northd-ddlog uses 100% CPU and never really reaches the point of > performing the first transaction to the Southbound. Memory usage is also > very high; I stopped it at 45GB RSS. > > To test I did: > SANDBOXFLAGS="--nbdb-source=/tmp/ovnnb_db.db --ddlog" make sandbox Thanks. I've spent a lot of time on this on Friday and today. It is a bit different from the other issues I've looked at. The previous ones involved inefficient production of relatively small output. This one is inefficient production (and storage) of rather large output (millions of flows). I'm trying to get help from Leonid on how to reduce the memory usage. Thanks, Ben.
Re: [ovs-dev] [PATCH v2 7/9] ovsdb: relay: Reflect connection status in _Server database.
On 7/2/21 12:48 PM, Mark Gray wrote: > On 12/06/2021 03:00, Ilya Maximets wrote: >> It might be important for clients to know that relay lost connection >> with the relay remote, so they could re-connect to other relay. > > Yeah this makes sense. I guess there are some deployment scenarios we > should think about. For example: > > * Are there any special considerations for upgrades? The relays seem to > be quite ephemeral and could be stopped/started without much risk? It should be easy to just stop them and start new ones, e.g. to scale up and down according to the current load. So, upgrades should not be a problem. They are also able to update the database schema on the fly, if needed. > * What about raft leader elections? Relays do not use 'leader-only', so they can connect to any raft member. Hence, elections should not be a problem. If one of the raft members falls out of the cluster, the relay will just reconnect to a different server. We may also extend relays in the future to support leader-only connections, if needed. > * I suppose there is a risk that inconsistencies could build up between > relays in case of error? If the relay detects a data inconsistency, it will flag it, reconnect, and download a fresh copy of the database. If servers are sending different, but internally consistent, data to different relays, well... we can't really do anything about this. Something must be seriously wrong with the main servers. > * Everything will be eventually consistent but I guess that is OK for OVN. Yep. All relays get the same updates, but possibly at slightly different times. The same is true for raft members, so it should be fine for OVN. > > I don't know a whole lot about this so I am just prompting you to see > have you considered these? Sure, I thought about this. And these are good questions to ask.
> >> >> Signed-off-by: Ilya Maximets >> --- >> ovsdb/_server.xml| 17 + >> ovsdb/ovsdb-server.c | 3 ++- >> ovsdb/relay.c| 34 ++ >> ovsdb/relay.h| 4 >> 4 files changed, 49 insertions(+), 9 deletions(-) >> >> diff --git a/ovsdb/_server.xml b/ovsdb/_server.xml >> index 414be6715..b4606b25b 100644 >> --- a/ovsdb/_server.xml >> +++ b/ovsdb/_server.xml >> @@ -70,6 +70,15 @@ >>case of a relay database - until it connects to the relay source. >> >> >> + >> + True if the database is connected to its storage. A standalone >> database >> + is always connected. A clustered database is connected if the server >> is >> + in contact with a majority of its cluster. A relay database is >> connected >> + if the server is in contact with the relay source, i.e. is connected >> to >> + the server it syncs from. An unconnected database cannot be modified >> and >> + its data might be unavailable or stale. >> + >> + >> >> >> These columns are most interesting and in some cases only relevant >> for >> @@ -77,14 +86,6 @@ >> column is clustered. >> >> >> - >> -True if the database is connected to its storage. A standalone or >> -active-backup database is always connected. A clustered database is >> -connected if the server is in contact with a majority of its >> cluster. >> -An unconnected database cannot be modified and its data might be >> -unavailable or stale. >> - >> - >> >> True if the database is the leader in its cluster. For a >> standalone or >> active-backup database, this is always true. Always false for >> relay. >> diff --git a/ovsdb/ovsdb-server.c b/ovsdb/ovsdb-server.c >> index 77b1fbe40..cdd6cf7fd 100644 >> --- a/ovsdb/ovsdb-server.c >> +++ b/ovsdb/ovsdb-server.c >> @@ -1190,7 +1190,8 @@ update_database_status(struct ovsdb_row *row, struct >> db *db) >> ovsdb_util_write_string_column(row, "model", >> db->db->is_relay ? 
"relay" : >> ovsdb_storage_get_model(db->db->storage)); >> ovsdb_util_write_bool_column(row, "connected", >> - >> ovsdb_storage_is_connected(db->db->storage)); >> +db->db->is_relay ? ovsdb_relay_is_connected(db->db) >> + : ovsdb_storage_is_connected(db->db->storage)); >> ovsdb_util_write_bool_column(row, "leader", >> db->db->is_relay ? false : >> ovsdb_storage_is_leader(db->db->storage)); >> ovsdb_util_write_uuid_column(row, "cid", >> diff --git a/ovsdb/relay.c b/ovsdb/relay.c >> index ef689c649..4a8f5c206 100644 >> --- a/ovsdb/relay.c >> +++ b/ovsdb/relay.c >> @@ -31,6 +31,7 @@ >> #include "ovsdb-error.h" >> #include "row.h" >> #include "table.h" >> +#include "timeval.h" >> #include "transaction.h" >> #include "transaction-forward.h" >> #include "util.h" >> @@ -47,8 +48,36 @@ struct relay_ctx { >> struct ovsdb_schema *new_schema; >> schema_change_callback schema_change_cb; >> void
Re: [ovs-dev] [PATCH v2 7/9] ovsdb: relay: Reflect connection status in _Server database.
On 6/25/21 3:34 PM, Dumitru Ceara wrote:
> On 6/12/21 4:00 AM, Ilya Maximets wrote:
>> It might be important for clients to know that relay lost connection
>> with the relay remote, so they could re-connect to other relay.
>>
>> Signed-off-by: Ilya Maximets
>> ---
>
> [...]
>
>>
>> +#define RELAY_MAX_RECONNECTION_MS 30000
>
> 30 seconds of relay "incorrectly" reporting that it is connected to the
> source seems quite long. Also, should we make this configurable?

We can make it configurable in the future. However, relays are meant to
have multiple remotes, i.e. all servers of the main ovsdb cluster, and they
will re-connect between them as soon as a disconnection is detected (by the
inactivity probe or in some other way). So, there are only two cases where
a relay stays disconnected from the source for a long time:

1. All main servers are down. We can't really do anything in this case, and
   it doesn't matter whether clients know about it or not, as they have no
   place to re-connect to anyway.

2. Our relay, for some reason, cannot reach any of the main servers, but
   still has connections with clients. This case seems to be rare, and it's
   likely that the clients are split from the rest of the network along
   with their relay. It also seems unlikely that re-connecting to a
   different relay will make any difference in this scenario.

All in all, I don't think it's necessarily a bad thing to keep clients
connected for an extra 30 seconds, because if the relay is not able to
re-connect, then it's unlikely that the clients will be able to do that
either. As I said, we can make this value configurable in the future if
the need arises.

What do you think?

Best regards, Ilya Maximets.
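To make the trade-off concrete, here is a minimal sketch of a "connected" flag with a grace period, assuming the relay records when it last had a usable session with its source. The names (`struct relay_state`, `relay_reports_connected()`, `RELAY_GRACE_MS`) are hypothetical and this is not the code from the patch; it only shows how a 30-second window keeps reporting "connected" while the relay fails over between remotes.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical grace period, matching the 30-second value discussed above. */
#define RELAY_GRACE_MS 30000LL

/* Hypothetical per-relay bookkeeping. */
struct relay_state {
    bool session_up;          /* Currently synced with a relay source. */
    bool ever_connected;      /* False until the first successful sync. */
    long long last_up_ms;     /* Last time 'session_up' was observed. */
};

/* Records the session state observed at 'now_ms'. */
static void
relay_state_update(struct relay_state *rs, bool session_up, long long now_ms)
{
    assert(now_ms >= rs->last_up_ms);   /* Timestamps must be monotonic. */
    rs->session_up = session_up;
    if (session_up) {
        rs->ever_connected = true;
        rs->last_up_ms = now_ms;
    }
}

/* Value to publish as "connected": true while synced, and for up to
 * RELAY_GRACE_MS after losing the source, giving the relay time to fail
 * over to another remote before clients are told to move elsewhere. */
static bool
relay_reports_connected(const struct relay_state *rs, long long now_ms)
{
    if (rs->session_up) {
        return true;
    }
    return rs->ever_connected && now_ms - rs->last_up_ms < RELAY_GRACE_MS;
}
```

With this shape, a relay last seen connected at t = 1 s keeps reporting "connected" until just before t = 31 s. Making the window configurable would only mean turning the constant into a field.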
Re: [ovs-dev] [PATCH v2 6/9] ovsdb: relay: Add support for transaction forwarding.
On 6/25/21 3:34 PM, Dumitru Ceara wrote: > On 6/12/21 4:00 AM, Ilya Maximets wrote: >> Current version of ovsdb relay allows to scale out read-only >> access to the primary database. However, many clients are not >> read-only but read-mostly. For example, ovn-controller. >> >> In order to scale out database access for this case ovsdb-server >> need to process transactions that are not read-only. Relay is not >> allowed to do that, i.e. not allowed to modify the database, but it >> can act like a proxy and forward transactions that includes database >> modifications to the primary server and forward replies back to a >> client. At the same time it may serve read-only transactions and >> monitor requests by itself greatly reducing the load on primary >> server. >> >> This configuration will slightly increase transaction latency, but >> it's not very important for read-mostly use cases. >> >> Implementation details: >> With this change instead of creating a trigger to commit the >> transaction, ovsdb-server will create a trigger for transaction >> forwarding. Later, ovsdb_relay_run() will send all new transactions >> to the relay source. Once transaction reply received from the >> relay source, ovsdb-relay module will update the state of the >> transaction forwarding with the reply. After that, trigger_run() >> will complete the trigger and jsonrpc_server_run() will send the >> reply back to the client. Since transaction reply from the relay >> source will be received after all the updates, client will receive >> all the updates before receiving the transaction reply as it is in >> a normal scenario with other database models. >> >> Signed-off-by: Ilya Maximets >> --- > > I have a tiny nit below, otherwise: > > Acked-by: Dumitru Ceara > Thanks! > [...] 
> >> diff --git a/ovsdb/relay.c b/ovsdb/relay.c >> index 5f423a0b9..ef689c649 100644 >> --- a/ovsdb/relay.c >> +++ b/ovsdb/relay.c >> @@ -32,6 +32,7 @@ >> #include "row.h" >> #include "table.h" >> #include "transaction.h" >> +#include "transaction-forward.h" >> #include "util.h" >> >> VLOG_DEFINE_THIS_MODULE(relay); >> @@ -298,6 +299,7 @@ ovsdb_relay_run(void) >> struct relay_ctx *ctx = node->data; >> struct ovs_list events; >> >> +ovsdb_txn_forward_run(ctx->db, ctx->cs); >> ovsdb_cs_run(ctx->cs, &events); >> >> struct ovsdb_cs_event *event; >> @@ -309,7 +311,9 @@ ovsdb_relay_run(void) >> >> switch (event->type) { >> case OVSDB_CS_EVENT_TYPE_RECONNECT: >> -/* Nothing to do. */ >> +/* Cancelling all the transactions that was already sent but > > Nit: s/was/were/ > OK.
Re: [ovs-dev] [PATCH v2 6/9] ovsdb: relay: Add support for transaction forwarding.
On 7/2/21 12:24 PM, Mark Gray wrote: > On 12/06/2021 03:00, Ilya Maximets wrote: >> Current version of ovsdb relay allows to scale out read-only >> access to the primary database. However, many clients are not >> read-only but read-mostly. For example, ovn-controller. >> >> In order to scale out database access for this case ovsdb-server >> need to process transactions that are not read-only. Relay is not >> allowed to do that, i.e. not allowed to modify the database, but it >> can act like a proxy and forward transactions that includes database >> modifications to the primary server and forward replies back to a >> client. At the same time it may serve read-only transactions and >> monitor requests by itself greatly reducing the load on primary >> server. >> >> This configuration will slightly increase transaction latency, but >> it's not very important for read-mostly use cases. >> >> Implementation details: >> With this change instead of creating a trigger to commit the >> transaction, ovsdb-server will create a trigger for transaction >> forwarding. Later, ovsdb_relay_run() will send all new transactions >> to the relay source. Once transaction reply received from the >> relay source, ovsdb-relay module will update the state of the >> transaction forwarding with the reply. After that, trigger_run() >> will complete the trigger and jsonrpc_server_run() will send the >> reply back to the client. Since transaction reply from the relay >> source will be received after all the updates, client will receive >> all the updates before receiving the transaction reply as it is in >> a normal scenario with other database models. >> >> Signed-off-by: Ilya Maximets >> --- >> @@ -188,7 +196,7 @@ static bool >> ovsdb_trigger_try(struct ovsdb_trigger *t, long long int now) >> { >> /* Handle "initialized" state. 
*/ >> -if (!t->reply) { >> +if (!t->reply && !t->txn_forward) { >> ovs_assert(!t->progress); >> >> struct ovsdb_txn *txn = NULL; >> @@ -198,13 +206,14 @@ ovsdb_trigger_try(struct ovsdb_trigger *t, long long >> int now) >> return false; >> } >> >> -bool durable; >> +bool durable, forwarding_needed; >> >> struct json *result; >> +/* Trying to compose transaction. */ >> txn = ovsdb_execute_compose( >> t->db, t->session, t->request->params, t->read_only, >> t->role, t->id, now - t->created, &t->timeout_msec, >> -&durable, &result); >> +&durable, &forwarding_needed, &result); >> if (!txn) { >> if (result) { >> /* Complete. There was an error but we still represent >> it >> @@ -217,9 +226,20 @@ ovsdb_trigger_try(struct ovsdb_trigger *t, long long >> int now) >> return false; >> } >> >> -/* Transition to "committing" state. */ >> -t->reply = jsonrpc_create_reply(result, t->request->id); >> -t->progress = ovsdb_txn_propose_commit(txn, durable); >> +if (forwarding_needed) { >> +/* Transaction is good, but we don't need it. */ >> +ovsdb_txn_abort(txn); >> +json_destroy(result); >> +/* Transition to "forwarding" state. */ >> +t->txn_forward = ovsdb_txn_forward_create(t->db, >> t->request); >> +/* Forward will not be completed immediately. Will check >> + * next time. */ >> +return false; >> +} else { >> +/* Transition to "committing" state. */ >> +t->reply = jsonrpc_create_reply(result, t->request->id); >> +t->progress = ovsdb_txn_propose_commit(txn, durable); >> +} >> } else if (!strcmp(t->request->method, "convert")) { >> /* Permission check. */ >> if (t->role && *t->role) { >> @@ -348,6 +368,18 @@ ovsdb_trigger_try(struct ovsdb_trigger *t, long long >> int now) >> ovsdb_trigger_complete(t); >> } >> >> +return false; >> +} else if (t->txn_forward) { >> +/* Handle "forwarding" state. */ > > Should we assert that reply == NULL and progress == NULL? Should not be necessary. We're here from the else branch of if (t->progress), so progress is definitely NULL. But we can check for 'reply'. I'll add the check.
> >> +if (!ovsdb_txn_forward_is_complete(t->txn_forward)) { >> +return false; >> +} >> + >> +/* Transition to "complete". */ But I'll add a check here, to be sure that we're not leaking the reply. >> +t->reply = ovsdb_txn_forward_steal_reply(t->txn_forward); >> +ovsdb_txn_forward_destroy(t->db, t->txn_forward); >> +t->txn_forward = NULL; >> +ovsdb_trigger_complete(t); >> return false; >> } >> ___ dev mailing list
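The trigger transitions discussed in this thread can be summarized as a small state machine. The sketch below is a simplified, hypothetical model (`struct trigger`, `trigger_try()` and the enum values are invented for illustration, and the committing path is not modeled). It shows a write transaction on a relay moving from "initialized" to "forwarding", staying there until the forwarded reply arrives, and asserting that no earlier reply is leaked before taking ownership of the new one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified trigger lifecycle (not ovsdb/trigger.c). */
enum trigger_state {
    TRIGGER_INITIALIZED,
    TRIGGER_COMMITTING,     /* Local commit path, not modeled here. */
    TRIGGER_FORWARDING,
    TRIGGER_COMPLETE,
};

/* Stand-in for a transaction forwarded to the relay source. */
struct forward {
    bool complete;          /* Set once the source's reply arrived. */
    const char *reply;
};

struct trigger {
    enum trigger_state state;
    bool needs_forwarding;  /* Result of composing the txn on a relay. */
    struct forward fwd;     /* Meaningful only while forwarding. */
    const char *reply;
};

/* Advances the trigger by one step; returns true once a reply is ready. */
static bool
trigger_try(struct trigger *t)
{
    switch (t->state) {
    case TRIGGER_INITIALIZED:
        if (t->needs_forwarding) {
            /* The locally composed txn would be aborted here. */
            t->fwd.complete = false;
            t->fwd.reply = NULL;
            t->state = TRIGGER_FORWARDING;
        } else {
            t->state = TRIGGER_COMMITTING;
        }
        return false;       /* Not complete yet; check on a later run. */

    case TRIGGER_FORWARDING:
        assert(!t->reply);  /* Taking the new reply must not leak one. */
        if (!t->fwd.complete) {
            return false;
        }
        t->reply = t->fwd.reply;    /* Take ownership ("steal"). */
        t->fwd.reply = NULL;
        t->state = TRIGGER_COMPLETE;
        return true;

    case TRIGGER_COMMITTING:
        return false;

    case TRIGGER_COMPLETE:
        return true;
    }
    return false;
}
```

Keeping "forwarding" as its own state, checked on every run, is what lets the reply from the relay source arrive after all of the preceding updates.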
Re: [ovs-dev] [PATCH v2 5/9] ovsdb: New ovsdb 'relay' service model.
On 6/25/21 3:34 PM, Dumitru Ceara wrote: > On 6/12/21 4:00 AM, Ilya Maximets wrote: >> New database service model 'relay' that is needed to scale out >> read-mostly database access, e.g. ovn-controller connections to >> OVN_Southbound. >> >> In this service model ovsdb-server connects to existing OVSDB >> server and maintains in-memory copy of the database. It serves >> read-only transactions and monitor requests by its own, but >> forwards write transactions to the relay source. >> >> Key differences from the active-backup replication: >> - support for "write" transactions (next commit). >> - no on-disk storage. (probably, faster operation) >> - support for multiple remotes (connect to the clustered db). >> - doesn't try to keep connection as long as possible, but >> faster reconnects to other remotes to avoid missing updates. >> - No need to know the complete database schema beforehand, >> only the schema name. >> - can be used along with other standalone and clustered databases >> by the same ovsdb-server process. (doesn't turn the whole >> jsonrpc server to read-only mode) >> - supports modern version of monitors (monitor_cond_since), >> because based on ovsdb-cs. >> - could be chained, i.e. multiple relays could be connected >> one to another in a row or in a tree-like form. >> - doesn't increase availability. >> - cannot be converted to other service models or become a main >> active server. 
>> >> Signed-off-by: Ilya Maximets >> --- > > I have some very nitpicky comments below (and an unrelated bug report), > nevertheless: > > Acked-by: Dumitru Ceara > >> ovsdb/_server.ovsschema | 7 +- >> ovsdb/_server.xml | 16 +- >> ovsdb/automake.mk | 2 + >> ovsdb/execution.c | 5 + >> ovsdb/ovsdb-server.c| 97 >> ovsdb/ovsdb.c | 2 + >> ovsdb/ovsdb.h | 3 + >> ovsdb/relay.c | 339 >> ovsdb/relay.h | 34 >> 9 files changed, 464 insertions(+), 41 deletions(-) >> create mode 100644 ovsdb/relay.c >> create mode 100644 ovsdb/relay.h >> >> diff --git a/ovsdb/_server.ovsschema b/ovsdb/_server.ovsschema >> index a867e5cbf..e3d9d893b 100644 >> --- a/ovsdb/_server.ovsschema >> +++ b/ovsdb/_server.ovsschema >> @@ -1,13 +1,14 @@ >> {"name": "_Server", >> - "version": "1.1.0", >> - "cksum": "3236486585 698", >> + "version": "1.2.0", >> + "cksum": "3009684573 744", >> "tables": { >> "Database": { >> "columns": { >> "name": {"type": "string"}, >> "model": { >> "type": {"key": {"type": "string", >> - "enum": ["set", ["standalone", "clustered"]]}}}, >> + "enum": ["set", >> + ["standalone", "clustered", >> "relay"]]}}}, >> "connected": {"type": "boolean"}, >> "leader": {"type": "boolean"}, >> "schema": { >> diff --git a/ovsdb/_server.xml b/ovsdb/_server.xml >> index 70cd22db7..414be6715 100644 >> --- a/ovsdb/_server.xml >> +++ b/ovsdb/_server.xml >> @@ -60,12 +60,14 @@ >> >> >>The storage model: standalone for a standalone or >> - active-backup database, clustered for a clustered >> database. >> + active-backup database, clustered for a clustered >> database, >> + relay for a relay database. >> >> >> >>The database schema, as a JSON string. In the case of a clustered >> - database, this is empty until it finishes joining its cluster. >> + database, this is empty until it finishes joining its cluster. In the >> + case of a relay database - until it connects to the relay source. >> >> >> >> @@ -85,20 +87,20 @@ >> >> >> True if the database is the leader in its cluster. 
For a >> standalone or >> -active-backup database, this is always true. >> +active-backup database, this is always true. Always false for >> relay. >> >> >> >> The cluster ID for this database, which is the same for all of the >> -servers that host this particular clustered database. For a >> standalone >> -or active-backup database, this is empty. >> +servers that host this particular clustered database. For a >> +standalone, active-backup or relay database, this is empty. >> >> >> >> The server ID for this database, different for each server that >> hosts a >> particular clustered database. A server that hosts more than one >> clustered database will have a different sid in each >> one. >> -For a standalone or active-backup database, this is empty. >> +For a standalone, active-backup or relay database, this is empty. >> >> >> >> @@ -112,7 +114,7 @@ >> >> >> >> - For a standalone or active-backup database, this is empty. >> + For a standalone, active-backup or
Re: [ovs-dev] [PATCH v2 5/9] ovsdb: New ovsdb 'relay' service model.
On 6/19/21 8:30 PM, Mark Gray wrote:
> On 12/06/2021 03:00, Ilya Maximets wrote:
>> New database service model 'relay' that is needed to scale out
>> read-mostly database access, e.g. ovn-controller connections to
>> OVN_Southbound.
>>
>> In this service model ovsdb-server connects to existing OVSDB
>> server and maintains in-memory copy of the database. It serves
>> read-only transactions and monitor requests by its own, but
>> forwards write transactions to the relay source.
>>
>> Key differences from the active-backup replication:
>> - support for "write" transactions (next commit).
>> - no on-disk storage. (probably, faster operation)
>
> Any data to back this up?

Nope. That's why "probably". :)

It's hard to compare active-backup with relay directly in this respect,
because they use different types of monitors to receive updates, so
performance might vary. OTOH, my reasoning here is that both
implementations use the ovsdb_txn_propose_commit_block() function to apply
received changes to the database. This function involves
ovsdb_storage_write(), which is a no-op in the case of a relay, but an
actual write to the file in the case of a backup server. Therefore, the
relay should be faster.

>
>> - support for multiple remotes (connect to the clustered db).
>> - doesn't try to keep connection as long as possible, but
>> faster reconnects to other remotes to avoid missing updates.
>> - No need to know the complete database schema beforehand,
>> only the schema name.
>> - can be used along with other standalone and clustered databases
>> by the same ovsdb-server process. (doesn't turn the whole
>> jsonrpc server to read-only mode)
>> - supports modern version of monitors (monitor_cond_since),
>> because based on ovsdb-cs.
>> - could be chained, i.e. multiple relays could be connected
>> one to another in a row or in a tree-like form.
>
> Cool!
>
>> - doesn't increase availability.
>> - cannot be converted to other service models or become a main
>> active server.
>> >> Signed-off-by: Ilya Maximets >> --- >> ovsdb/_server.ovsschema | 7 +- >> ovsdb/_server.xml | 16 +- >> ovsdb/automake.mk | 2 + >> ovsdb/execution.c | 5 + >> ovsdb/ovsdb-server.c| 97 >> ovsdb/ovsdb.c | 2 + >> ovsdb/ovsdb.h | 3 + >> ovsdb/relay.c | 339 >> ovsdb/relay.h | 34 >> 9 files changed, 464 insertions(+), 41 deletions(-) >> create mode 100644 ovsdb/relay.c >> create mode 100644 ovsdb/relay.h >> >> diff --git a/ovsdb/_server.ovsschema b/ovsdb/_server.ovsschema >> index a867e5cbf..e3d9d893b 100644 >> --- a/ovsdb/_server.ovsschema >> +++ b/ovsdb/_server.ovsschema >> @@ -1,13 +1,14 @@ >> {"name": "_Server", >> - "version": "1.1.0", >> - "cksum": "3236486585 698", >> + "version": "1.2.0", >> + "cksum": "3009684573 744", >> "tables": { >> "Database": { >> "columns": { >> "name": {"type": "string"}, >> "model": { >> "type": {"key": {"type": "string", >> - "enum": ["set", ["standalone", "clustered"]]}}}, >> + "enum": ["set", >> + ["standalone", "clustered", >> "relay"]]}}}, >> "connected": {"type": "boolean"}, >> "leader": {"type": "boolean"}, >> "schema": { >> diff --git a/ovsdb/_server.xml b/ovsdb/_server.xml >> index 70cd22db7..414be6715 100644 >> --- a/ovsdb/_server.xml >> +++ b/ovsdb/_server.xml >> @@ -60,12 +60,14 @@ >> >> >>The storage model: standalone for a standalone or >> - active-backup database, clustered for a clustered >> database. >> + active-backup database, clustered for a clustered >> database, >> + relay for a relay database. >> >> >> >>The database schema, as a JSON string. In the case of a clustered >> - database, this is empty until it finishes joining its cluster. >> + database, this is empty until it finishes joining its cluster. In the >> + case of a relay database - until it connects to the relay source. > > suggestion: "In the case of a relay database, this is empty until it > connects to the relay source" OK. >> >> >> >> @@ -85,20 +87,20 @@ >> >> >> True if the database is the leader in its cluster. 
For a >> standalone or >> -active-backup database, this is always true. >> +active-backup database, this is always true. Always false for >> relay. > > suggestion: "For a relay database, this is always false" OK. >> >> >> >> The cluster ID for this database, which is the same for all of the >> -servers that host this particular clustered database. For a >> standalone >> -or active-backup database, this is empty. >> +servers that host this particular clustered database. For a >> +standalone, active-backup or relay database, this
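The performance argument earlier in this reply reduces to "same in-memory commit, minus one write". A minimal sketch under that assumption, with hypothetical names (`struct storage`, `storage_write()`, `apply_received_change()`) and an in-memory buffer standing in for the on-disk log. It measures nothing and is not the ovsdb code; it only shows where the relay skips work.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical storage: a relay has no on-disk log (is_relay == true);
 * a backup server appends every applied change to its log.  A fixed
 * in-memory buffer stands in for the file here. */
struct storage {
    bool is_relay;
    char log[256];
    size_t log_len;
};

/* Stand-in for the storage write step: a no-op for a relay. */
static bool
storage_write(struct storage *s, const char *change)
{
    if (s->is_relay) {
        return true;            /* Nothing to persist. */
    }
    size_t n = strlen(change);
    if (s->log_len + n > sizeof s->log) {
        return false;           /* Stand-in for an I/O error. */
    }
    memcpy(s->log + s->log_len, change, n);
    s->log_len += n;
    return true;
}

/* Stand-in for applying a received update: both models run the same
 * in-memory commit, but only the backup pays for the write. */
static bool
apply_received_change(struct storage *s, int *db_value, int new_value,
                      const char *change)
{
    assert(db_value != NULL);
    if (!storage_write(s, change)) {
        return false;
    }
    *db_value = new_value;
    return true;
}
```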
Re: [ovs-dev] [PATCH v3 ovn 2/2] controller: incrementally create ras port_binding list
On 7/12/21 2:18 PM, Lorenzo Bianconi wrote: Incrementally manage local_active_ports_ras map for interfaces where periodic router advertisement has been enabled. This patch allows to avoid looping over all local interfaces to check if periodic RA is running on the current port binding. Acked-by: Mark Michelson Signed-off-by: Lorenzo Bianconi --- controller/binding.c| 7 +++ controller/binding.h| 1 + controller/ovn-controller.c | 10 +++- controller/pinctrl.c| 93 - controller/pinctrl.h| 3 +- 5 files changed, 69 insertions(+), 45 deletions(-) diff --git a/controller/binding.c b/controller/binding.c index 9711ac850..09793a6f6 100644 --- a/controller/binding.c +++ b/controller/binding.c @@ -1672,6 +1672,9 @@ binding_run(struct binding_ctx_in *b_ctx_in, struct binding_ctx_out *b_ctx_out) update_active_pb_ras_pd(pb, b_ctx_out->local_datapaths, b_ctx_out->local_active_ports_ipv6_pd, "ipv6_prefix_delegation"); +update_active_pb_ras_pd(pb, b_ctx_out->local_datapaths, +b_ctx_out->local_active_ports_ras, +"ipv6_ra_send_periodic"); enum en_lport_type lport_type = get_lport_type(pb); @@ -2514,6 +2517,10 @@ delete_done: b_ctx_out->local_active_ports_ipv6_pd, "ipv6_prefix_delegation"); +update_active_pb_ras_pd(pb, b_ctx_out->local_datapaths, +b_ctx_out->local_active_ports_ras, +"ipv6_ra_send_periodic"); + enum en_lport_type lport_type = get_lport_type(pb); struct binding_lport *b_lport = diff --git a/controller/binding.h b/controller/binding.h index 60ad49da0..77197e742 100644 --- a/controller/binding.h +++ b/controller/binding.h @@ -73,6 +73,7 @@ void related_lports_destroy(struct related_lports *); struct binding_ctx_out { struct hmap *local_datapaths; struct shash *local_active_ports_ipv6_pd; +struct shash *local_active_ports_ras; struct local_binding_data *lbinding_data; /* sset of (potential) local lports. 
*/ diff --git a/controller/ovn-controller.c b/controller/ovn-controller.c index c4eb54755..34134e87b 100644 --- a/controller/ovn-controller.c +++ b/controller/ovn-controller.c @@ -1031,6 +1031,7 @@ struct ed_type_runtime_data { struct hmap tracked_dp_bindings; struct shash local_active_ports_ipv6_pd; +struct shash local_active_ports_ras; }; /* struct ed_type_runtime_data has the below members for tracking the @@ -1119,6 +1120,7 @@ en_runtime_data_init(struct engine_node *node OVS_UNUSED, smap_init(>local_iface_ids); local_binding_data_init(>lbinding_data); shash_init(>local_active_ports_ipv6_pd); +shash_init(>local_active_ports_ras); /* Init the tracked data. */ hmap_init(>tracked_dp_bindings); @@ -1145,6 +1147,7 @@ en_runtime_data_cleanup(void *data) } hmap_destroy(_data->local_datapaths); shash_destroy(_data->local_active_ports_ipv6_pd); +shash_destroy(_data->local_active_ports_ras); Like in patch 1, this should be shash_destroy_free_data() local_binding_data_destroy(_data->lbinding_data); } @@ -1225,6 +1228,8 @@ init_binding_ctx(struct engine_node *node, b_ctx_out->local_datapaths = _data->local_datapaths; b_ctx_out->local_active_ports_ipv6_pd = _data->local_active_ports_ipv6_pd; +b_ctx_out->local_active_ports_ras = +_data->local_active_ports_ras; b_ctx_out->local_lports = _data->local_lports; b_ctx_out->local_lports_changed = false; b_ctx_out->related_lports = _data->related_lports; @@ -1243,6 +1248,7 @@ en_runtime_data_run(struct engine_node *node, void *data) struct ed_type_runtime_data *rt_data = data; struct hmap *local_datapaths = _data->local_datapaths; struct shash *local_active_ipv6_pd = _data->local_active_ports_ipv6_pd; +struct shash *local_active_ras = _data->local_active_ports_ras; struct sset *local_lports = _data->local_lports; struct sset *active_tunnels = _data->active_tunnels; @@ -1259,6 +1265,7 @@ en_runtime_data_run(struct engine_node *node, void *data) } hmap_clear(local_datapaths); shash_clear(local_active_ipv6_pd); 
+shash_clear(local_active_ras); Use shash_clear_free_data(). local_binding_data_destroy(_data->lbinding_data); sset_destroy(local_lports); related_lports_destroy(_data->related_lports); @@ -3274,7 +3281,8 @@ main(int argc, char *argv[]) br_int, chassis, _data->local_datapaths, _data->active_tunnels, -
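The reason for the *_free_data() variants: shash owns its nodes and key copies, but not the values, so clearing or destroying a map whose values were heap-allocated (as the pb_ld_binding records are) leaks them. A sketch with a hypothetical minimal map (`struct map`, `map_add()`, `map_clear()`; not the OVS shash API) showing the difference:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical minimal string-keyed map standing in for OVS's shash.  Like
 * shash, it owns its nodes and key copies but not the values. */
struct node {
    char *name;
    void *data;
    struct node *next;
};

struct map {
    struct node *head;
    size_t values_freed;    /* Bookkeeping for illustration only. */
};

static void
map_add(struct map *m, const char *name, void *data)
{
    struct node *n = malloc(sizeof *n);
    size_t len = strlen(name) + 1;

    assert(n);
    n->name = malloc(len);
    assert(n->name);
    memcpy(n->name, name, len);
    n->data = data;
    n->next = m->head;
    m->head = n;
}

/* Frees every node and key copy.  With 'free_data', also frees each value
 * (the *_free_data() behavior); without it, the values leak unless the
 * caller still holds pointers to them. */
static void
map_clear(struct map *m, bool free_data)
{
    struct node *n = m->head;

    while (n) {
        struct node *next = n->next;
        if (free_data) {
            free(n->data);
            m->values_freed++;
        }
        free(n->name);
        free(n);
        n = next;
    }
    m->head = NULL;
}
```

In runtime data, nothing else keeps a pointer to the pb_ld_binding records, so only the free-data form avoids the leak.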
Re: [ovs-dev] [PATCH v3 ovn 1/2] controller: incrementally create ipv6 prefix delegation port_binding list
Sorry Lorenzo but I found one more issue. Sorry for not noticing it during an earlier review. On 7/12/21 2:18 PM, Lorenzo Bianconi wrote: Incrementally manage local_active_ports_ipv6_pd map for interfaces where IPv6 prefix-delegation has been enabled. This patch allows to avoid looping over all local interfaces to check if prefix-delegation is running on the current port binding. Acked-by: Mark Michelson Signed-off-by: Lorenzo Bianconi --- controller/binding.c| 32 +++ controller/binding.h| 1 + controller/ovn-controller.c | 11 +++- controller/ovn-controller.h | 6 ++ controller/pinctrl.c| 107 +--- controller/pinctrl.h| 4 +- 6 files changed, 103 insertions(+), 58 deletions(-) diff --git a/controller/binding.c b/controller/binding.c index 594babc98..9711ac850 100644 --- a/controller/binding.c +++ b/controller/binding.c @@ -574,6 +574,30 @@ remove_related_lport(const struct sbrec_port_binding *pb, } } +static void +update_active_pb_ras_pd(const struct sbrec_port_binding *pb, +struct hmap *local_datapaths, +struct shash *map, const char *conf) +{ +bool ras_pd_conf = smap_get_bool(>options, conf, false); +struct shash_node *iter = shash_find(map, pb->logical_port); + +if (iter && !ras_pd_conf) { +shash_delete(map, iter); There's a memory leak here. iter->data needs to be freed. +return; +} +struct pb_ld_binding *ras_pd = NULL; +if (!iter && ras_pd_conf) { +ras_pd = xzalloc(sizeof *ras_pd); +ras_pd->pb = pb; +shash_add(map, pb->logical_port, ras_pd); +} +if (ras_pd) { The logic here has changed since the first version of the patch, and I think it's wrong now. This now will only update ras_pd->ld if ras_pd was allocated during this function call. Previously, ld would be updated when the pb_ld_binding was found in the map. I think this is a bit confusing since you're dealing both with shash_node and pb_ld_binding types in this function. 
I think you can do something like this: if (iter && !ras_pd_conf) { /* delete iter from map */ return; } struct pb_ld_binding *ras_pd = NULL; if (ras_pd_conf) { if (iter) { ras_pd = iter->data; } else { /* allocate ras_pd and add it to map */ } ovs_assert(ras_pd); ras_pd->ld = get_local_datapath(...); } +ras_pd->ld = get_local_datapath(local_datapaths, +pb->datapath->tunnel_key); +} +} + /* Corresponds to each Port_Binding.type. */ enum en_lport_type { LP_UNKNOWN, @@ -1645,6 +1669,10 @@ binding_run(struct binding_ctx_in *b_ctx_in, struct binding_ctx_out *b_ctx_out) const struct sbrec_port_binding *pb; SBREC_PORT_BINDING_TABLE_FOR_EACH (pb, b_ctx_in->port_binding_table) { +update_active_pb_ras_pd(pb, b_ctx_out->local_datapaths, +b_ctx_out->local_active_ports_ipv6_pd, +"ipv6_prefix_delegation"); + enum en_lport_type lport_type = get_lport_type(pb); switch (lport_type) { @@ -2482,6 +2510,10 @@ delete_done: continue; } +update_active_pb_ras_pd(pb, b_ctx_out->local_datapaths, +b_ctx_out->local_active_ports_ipv6_pd, +"ipv6_prefix_delegation"); + enum en_lport_type lport_type = get_lport_type(pb); struct binding_lport *b_lport = diff --git a/controller/binding.h b/controller/binding.h index a08011ae2..60ad49da0 100644 --- a/controller/binding.h +++ b/controller/binding.h @@ -72,6 +72,7 @@ void related_lports_destroy(struct related_lports *); struct binding_ctx_out { struct hmap *local_datapaths; +struct shash *local_active_ports_ipv6_pd; struct local_binding_data *lbinding_data; /* sset of (potential) local lports. 
*/ diff --git a/controller/ovn-controller.c b/controller/ovn-controller.c index 6a9c25f28..c4eb54755 100644 --- a/controller/ovn-controller.c +++ b/controller/ovn-controller.c @@ -1029,6 +1029,8 @@ struct ed_type_runtime_data { bool tracked; bool local_lports_changed; struct hmap tracked_dp_bindings; + +struct shash local_active_ports_ipv6_pd; }; /* struct ed_type_runtime_data has the below members for tracking the @@ -1116,6 +1118,7 @@ en_runtime_data_init(struct engine_node *node OVS_UNUSED, sset_init(>egress_ifaces); smap_init(>local_iface_ids); local_binding_data_init(>lbinding_data); +shash_init(>local_active_ports_ipv6_pd); /* Init the tracked data. */ hmap_init(>tracked_dp_bindings); @@ -1141,6 +1144,7 @@ en_runtime_data_cleanup(void *data) free(cur_node); } hmap_destroy(_data->local_datapaths); +
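One way to read the suggested restructuring of update_active_pb_ras_pd(), sketched with hypothetical stand-in types: a fixed-size array replaces shash, and an integer key replaces the get_local_datapath() lookup. The point is the control flow. The value is freed when an entry is deleted, and the datapath is refreshed both for an existing entry and for a newly allocated one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for the OVN types. */
struct port_binding {
    const char *name;
    bool enabled;       /* e.g. the "ipv6_prefix_delegation" option. */
    int dp_key;
};

struct pb_ld_binding {
    const struct port_binding *pb;
    int ld_key;         /* Stand-in for the local datapath pointer. */
};

/* Tiny fixed-capacity map from port name to pb_ld_binding.  Keys borrow
 * pb->name, so the port binding must outlive its map entry. */
#define MAP_CAP 8
struct map {
    const char *keys[MAP_CAP];
    struct pb_ld_binding *vals[MAP_CAP];
};

static int
map_find(const struct map *m, const char *key)
{
    for (int i = 0; i < MAP_CAP; i++) {
        if (m->keys[i] && !strcmp(m->keys[i], key)) {
            return i;
        }
    }
    return -1;
}

static int
map_free_slot(const struct map *m)
{
    for (int i = 0; i < MAP_CAP; i++) {
        if (!m->keys[i]) {
            return i;
        }
    }
    return -1;
}

/* Follows the shape of the review suggestion: free the value on delete,
 * and refresh 'ld_key' whether the entry already existed or is new. */
static void
update_active_pb(struct map *m, const struct port_binding *pb)
{
    int idx = map_find(m, pb->name);

    if (idx >= 0 && !pb->enabled) {
        free(m->vals[idx]);             /* Avoids the reported leak. */
        m->keys[idx] = NULL;
        m->vals[idx] = NULL;
        return;
    }

    if (pb->enabled) {
        struct pb_ld_binding *b = NULL;

        if (idx >= 0) {
            b = m->vals[idx];
        } else {
            idx = map_free_slot(m);
            assert(idx >= 0);
            b = calloc(1, sizeof *b);
            assert(b);
            b->pb = pb;
            m->keys[idx] = pb->name;
            m->vals[idx] = b;
        }
        assert(b);
        b->ld_key = pb->dp_key;         /* Stand-in for get_local_datapath(). */
    }
}
```

The second update of an already-tracked port is the case the v3 logic misses: the entry exists, so `ld` would not be refreshed.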
Re: [ovs-dev] [PATCH v2 4/9] ovsdb: row: Add support for xor-based row updates.
On 6/19/21 8:01 PM, Mark Gray wrote: > On 12/06/2021 03:00, Ilya Maximets wrote: >> This will be used to apply update3 type updates to ovsdb tables >> while processing updates for future ovsdb 'relay' service model. >> >> 'ovsdb_datum_apply_diff' is allowed to fail, so adding support >> to return this error. >> >> Signed-off-by: Ilya Maximets >> --- >> ovsdb/execution.c | 5 +++-- >> ovsdb/replication.c | 2 +- >> ovsdb/row.c | 30 +- >> ovsdb/row.h | 6 -- >> ovsdb/table.c | 9 + >> ovsdb/table.h | 2 +- >> 6 files changed, 39 insertions(+), 15 deletions(-) >> >> diff --git a/ovsdb/execution.c b/ovsdb/execution.c >> index 3a0dad5d0..f6150e944 100644 >> --- a/ovsdb/execution.c >> +++ b/ovsdb/execution.c >> @@ -483,8 +483,9 @@ update_row_cb(const struct ovsdb_row *row, void *ur_) >> >> ur->n_matches++; >> if (!ovsdb_row_equal_columns(row, ur->row, ur->columns)) { >> -ovsdb_row_update_columns(ovsdb_txn_row_modify(ur->txn, row), >> - ur->row, ur->columns); >> +ovsdb_error_assert(ovsdb_row_update_columns( >> + ovsdb_txn_row_modify(ur->txn, row), >> + ur->row, ur->columns, false)); >> } >> >> return true; >> diff --git a/ovsdb/replication.c b/ovsdb/replication.c >> index b755976b0..d8b56d813 100644 >> --- a/ovsdb/replication.c >> +++ b/ovsdb/replication.c >> @@ -677,7 +677,7 @@ process_table_update(struct json *table_update, const >> char *table_name, >> struct ovsdb_error *error; >> error = (!new ? ovsdb_table_execute_delete(txn, , table) >> : !old ? 
ovsdb_table_execute_insert(txn, , table, new) >> - : ovsdb_table_execute_update(txn, , table, new)); >> + : ovsdb_table_execute_update(txn, , table, new, >> false)); >> if (error) { >> if (!strcmp(ovsdb_error_get_tag(error), "consistency >> violation")) { >> ovsdb_error_assert(error); >> diff --git a/ovsdb/row.c b/ovsdb/row.c >> index 755ab91a8..65a054621 100644 >> --- a/ovsdb/row.c >> +++ b/ovsdb/row.c >> @@ -163,20 +163,40 @@ ovsdb_row_equal_columns(const struct ovsdb_row *a, >> return true; >> } >> >> -void >> +struct ovsdb_error * >> ovsdb_row_update_columns(struct ovsdb_row *dst, >> const struct ovsdb_row *src, >> - const struct ovsdb_column_set *columns) >> + const struct ovsdb_column_set *columns, >> + bool xor) >> { >> size_t i; >> >> for (i = 0; i < columns->n_columns; i++) { >> const struct ovsdb_column *column = columns->columns[i]; >> +struct ovsdb_datum xor_datum; >> +struct ovsdb_error *error; >> + >> +if (xor) { >> +error = ovsdb_datum_apply_diff(_datum, >> + >fields[column->index], >> + >fields[column->index], >> + >type); >> +if (error) { >> +return error; >> +} >> +} >> + >> ovsdb_datum_destroy(>fields[column->index], >type); >> -ovsdb_datum_clone(>fields[column->index], >> - >fields[column->index], >> - >type); > > Could you move ovsdb_datum_destroy(>fields[column->index], > >type) into the "else" clause below and then merge the "if" > clause below into the "if" clause above? We still need to destroy for both branches, so what I can do is something like this: @@ -184,13 +185,11 @@ ovsdb_row_update_columns(struct ovsdb_row *dst, if (error) { return error; } -} -ovsdb_datum_destroy(>fields[column->index], >type); - -if (xor) { +ovsdb_datum_destroy(>fields[column->index], >type); ovsdb_datum_swap(>fields[column->index], _datum); } else { +ovsdb_datum_destroy(>fields[column->index], >type); ovsdb_datum_clone(>fields[column->index], >fields[column->index], >type); --- i.e. 
copy the ovsdb_datum_destroy() call into both branches and merge the "if"s, but I found this harder to read. I'll keep it as is for now, but if you think the above version is better, I can use it.

Best regards, Ilya Maximets.
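For readers following the xor discussion, here is a simplified sketch of the control flow. It assumes each column is a set of small integers stored as a bitmask, an invented representation. Real ovsdb_datum values are heap-allocated, which is why the destroy/swap versus destroy/clone distinction matters there. For sets, applying an update3-style diff toggles membership, i.e. a symmetric difference. The diff is computed into a temporary first, so a column whose diff fails is left untouched. All names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical simplified row: each column is a set of integers 0..63
 * stored as a bitmask. */
#define N_COLUMNS 3
struct row {
    uint64_t cols[N_COLUMNS];
};

static int
popcount64(uint64_t x)
{
    int n = 0;
    while (x) {
        x &= x - 1;     /* Clear the lowest set bit. */
        n++;
    }
    return n;
}

/* Applying a set diff toggles membership.  Fails if the result would
 * exceed 'max_n' elements (stand-in for a type constraint error). */
static bool
datum_apply_diff(uint64_t *out, uint64_t old, uint64_t diff, int max_n)
{
    uint64_t result = old ^ diff;
    if (popcount64(result) > max_n) {
        return false;
    }
    *out = result;
    return true;
}

/* With 'xor', each source column is a diff applied to the destination;
 * otherwise it is copied as-is. */
static bool
row_update_columns(struct row *dst, const struct row *src,
                   const size_t *columns, size_t n_columns,
                   bool xor, int max_n)
{
    for (size_t i = 0; i < n_columns; i++) {
        size_t c = columns[i];
        if (xor) {
            uint64_t tmp;
            if (!datum_apply_diff(&tmp, dst->cols[c], src->cols[c], max_n)) {
                return false;               /* dst->cols[c] untouched. */
            }
            dst->cols[c] = tmp;             /* "destroy + swap" */
        } else {
            dst->cols[c] = src->cols[c];    /* "destroy + clone" */
        }
    }
    return true;
}
```

As in the patch, columns updated before a failing one keep their new values; only the failing column is guaranteed to be unchanged.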
Re: [ovs-dev] [PATCH] reconnect: Add graceful reconnect.
On 6/29/21 1:20 PM, Dumitru Ceara wrote:
> Until now clients that needed to reconnect immediately could only use
> reconnect_force_reconnect(). However, reconnect_force_reconnect()
> doesn't reset the backoff for connections that were alive long enough
> (more than backoff seconds).
>
> Moreover, the reconnect library cannot determine the exact reason why a
> client wishes to initiate a reconnection. In most cases reconnection
> happens because of a fatal error when communicating with the remote,
> e.g., in the ovsdb-cs layer, when invalid messages are received from
> ovsdb-server. In such cases it makes sense to not reset the backoff
> because the remote seems to be unhealthy.
>
> There are however cases when reconnection is needed for other reasons.
> One such example is when ovsdb-clients require "leader-only" connections
> to clustered ovsdb-server databases. Whenever the client determines
> that the remote is not a leader anymore, it decides to reconnect to a
> new remote from its list, searching for the new leader. Using
> jsonrpc_force_reconnect() (which calls reconnect_force_reconnect()) will
> not reset backoff even though the former leader is still likely in good
> shape.
>
> Since 3c2d6274bcee ("raft: Transfer leadership before creating
> snapshots.") leadership changes inside the clustered database happen
> more often and therefore "leader-only" clients need to reconnect more
> often too. Not resetting the backoff every time a leadership change
> happens will cause all reconnections to happen with the maximum backoff
> (8 seconds) resulting in significant latency.
>
> This commit also updates the Python reconnect and IDL implementations
> and adds tests for force-reconnect and graceful-reconnect.
>
> Reported-at: https://bugzilla.redhat.com/1977264
> Signed-off-by: Dumitru Ceara
> ---

Hi, Dumitru.

Thanks for working on this issue. I've seen it in practice while running
OVN tests, but I still don't quite understand why it happens.
Could you please describe how the state transitions work here for the ovsdb-idl case? > +# Forcefully reconnect. > +force-reconnect > + in RECONNECT for 0 ms (2000 ms backoff) > + 1 successful connections out of 3 attempts, seqno 2 > + disconnected > +run > + should disconnect > +connecting > + in CONNECTING for 0 ms (2000 ms backoff) This part in particular seems wrong to me, because after 'should disconnect' there should be 'disconnect' or 'connect-fail', not 'connecting'. We literally should disconnect here; otherwise it's a violation of the reconnect API. My concern is that ovsdb-cs or jsonrpc violates the API somewhere by not calling reconnect_disconnected() when it is required, or that some other bug makes the 'reconnect' module skip a few states in the FSM. From what I see in the code, the logical workflow for force-reconnect should be: 1. force-reconnect --> transition to S_RECONNECT 2. run -> in S_RECONNECT, so returning RECONNECT_DISCONNECT 3. disconnect -> check the state, update backoff and transition to S_BACKOFF 4. run -> in S_BACKOFF, so returning RECONNECT_CONNECT 5. connected Something is fishy here, because ovsdb-cs somehow jumps over step #3 and maybe also #4. Best regards, Ilya Maximets.
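The five-step force-reconnect workflow above can be modeled as a tiny state machine. This is a hypothetical, heavily simplified sketch, not lib/reconnect.c: the names `struct fsm`, `fsm_run()` and friends are invented, the 1000 ms minimum backoff is an assumption, and the real backoff rules have more conditions. The `graceful` flag models the patch's proposal to reset the backoff instead of growing it.

```c
#include <stdbool.h>

/* Hypothetical, simplified model of the reconnect FSM discussed above.
 * It is NOT lib/reconnect.c: names are invented and real backoff rules
 * have more conditions. */
enum fsm_state { S_BACKOFF, S_CONNECTING, S_ACTIVE, S_RECONNECT };
enum fsm_action { ACT_NONE, ACT_CONNECT, ACT_DISCONNECT };

struct fsm {
    enum fsm_state state;
    int backoff_ms;             /* Current backoff. */
    int min_backoff_ms;         /* Assumed 1000 ms in the usage below. */
    int max_backoff_ms;         /* Assumed 8000 ms (the 8 s cap). */
};

/* Step 1: force-reconnect moves a live connection to S_RECONNECT. */
static void
fsm_force_reconnect(struct fsm *f)
{
    if (f->state == S_CONNECTING || f->state == S_ACTIVE) {
        f->state = S_RECONNECT;
    }
}

/* Steps 2 and 4: "run" tells the client what to do next.  For brevity
 * this sketch assumes any backoff interval has already expired. */
static enum fsm_action
fsm_run(const struct fsm *f)
{
    switch (f->state) {
    case S_RECONNECT:
        return ACT_DISCONNECT;
    case S_BACKOFF:
        return ACT_CONNECT;
    default:
        return ACT_NONE;
    }
}

/* Step 3: the client must report the disconnection.  'graceful' models
 * the proposed graceful reconnect, which resets the backoff instead of
 * doubling it up to the maximum. */
static void
fsm_disconnected(struct fsm *f, bool graceful)
{
    if (graceful) {
        f->backoff_ms = f->min_backoff_ms;
    } else {
        f->backoff_ms *= 2;
        if (f->backoff_ms > f->max_backoff_ms) {
            f->backoff_ms = f->max_backoff_ms;
        }
    }
    f->state = S_BACKOFF;
}

/* Step 5: connection attempt started, then succeeded. */
static void fsm_connecting(struct fsm *f) { f->state = S_CONNECTING; }
static void fsm_connected(struct fsm *f) { f->state = S_ACTIVE; }
```

In this model, calling `fsm_connecting()` straight after `ACT_DISCONNECT`, without `fsm_disconnected()`, moves from S_RECONNECT to S_CONNECTING with the backoff untouched, which resembles the transcript quoted in the review.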
Re: [ovs-dev] [PATCH] rhel: use /run instead of /var/run
On Wed, May 12, 2021 at 05:08:08PM +0200, Timothy Redaelli wrote: > Systemd unit file generates warnings about PID file path since /var/run > is a legacy path so just use /run instead of /var/run. > > /var/run is a symlink to /run starting from RHEL7 (and any other distribution > that uses systemd). > > Reported-at: https://bugzilla.redhat.com/1952081 > Signed-off-by: Timothy Redaelli > --- Reproduced on F34: Jul 12 17:03:28 p50 systemd[1]: /usr/lib/systemd/system/ovs-vswitchd.service:12: PIDFile= references a path below legacy directory /var/run/, updating /var/run/openvswitch/ovs-vswitchd.pid → /run/openvswitch/ovs-vswitchd.pid; please update the unit file accordingly. Acked-by: Flavio Leitner Thanks Timothy, fbl
Re: [ovs-dev] [PATCH ovn] ovn-nb.xml: Fix the description for LB's skip_snat option.
On Fri, Jul 9, 2021 at 2:49 PM Han Zhou wrote: > > force_snat_for_lb is a flag set in logical flow pipeline, while > lb_force_snat_ip is the option configured in NB DB. In NB document we > should mention the actual option configured in NB instead of the flow > details. > > Signed-off-by: Han Zhou Acked-by: Numan Siddique Numan > --- > ovn-nb.xml | 5 +++-- > 1 file changed, 3 insertions(+), 2 deletions(-) > > diff --git a/ovn-nb.xml b/ovn-nb.xml > index b6a0d1f43..d5efbb33e 100644 > --- a/ovn-nb.xml > +++ b/ovn-nb.xml > @@ -1712,8 +1712,9 @@ > > > If the load balancing rule is configured with skip_snat > -option, the force_snat_for_lb option configured for the router > -pipeline will not be applied for this load balancer. > +option, the option lb_force_snat_ip configured for the logical router > +that references this load balancer will not be applied for this load > +balancer. > > > > -- > 2.30.2 >
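The documented rule reduces to a single decision: a router's `lb_force_snat_ip` applies to load-balanced traffic unless the load balancer sets `skip_snat`. The sketch below is a hypothetical model of that rule only; `struct lr_model`, `struct lb_model` and `lb_snat_ip()` are invented, and it ignores other values `lb_force_snat_ip` may take.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the documented rule; these are not OVN types. */
struct lb_model {
    bool skip_snat;                 /* LB options:skip_snat=true */
};

struct lr_model {
    const char *lb_force_snat_ip;   /* Router options:lb_force_snat_ip, or NULL. */
};

/* Returns the forced SNAT address for traffic hitting 'lb' through router
 * 'lr', or NULL when no forced SNAT applies. */
static const char *
lb_snat_ip(const struct lr_model *lr, const struct lb_model *lb)
{
    if (!lr->lb_force_snat_ip || lb->skip_snat) {
        return NULL;                /* Not configured, or LB opted out. */
    }
    return lr->lb_force_snat_ip;
}
```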
Re: [ovs-dev] [ovn] branch-20.09 tests fail with OVS higher than 2.14.0
On Fri, Jul 9, 2021 at 9:01 AM Dumitru Ceara wrote: > > On 7/8/21 6:34 PM, Vladislav Odintsov wrote: > > Hi, > > Hi Vladislav, > > > > > I see constantly failing test while OVN branch-20.09 against OVS higher > > than 2.14.0 (2.14.1, 2.14.2, branch-2.14): > > ovn -- ensure one gw controller restart in HA doesn't bounce the master > > > > ## ## > > ## Tested programs. ## > > ## ## > > ./testsuite.at:1: > > /builddir/build/BUILD/ovn-20.09.1/openvswitch-2.14.1/vswitchd/ovs-vswitchd > > --version > > ovs-vswitchd (Open vSwitch) 2.14.1 > > ./testsuite.at:1: > > /builddir/build/BUILD/ovn-20.09.1/openvswitch-2.14.1/utilities/ovs-vsctl > > --version > > ovs-vsctl (Open vSwitch) 2.14.1 > > DB Schema 8.2.0 > > ## -- ## > > ## Running the tests. ## > > ## -- ## > > testsuite: starting at: Thu Jul 8 17:44:21 MSK 2021 > > testsuite: ending at: Thu Jul 8 17:44:52 MSK 2021 > > testsuite: test suite duration: 0h 0m 31s > > ## - ## > > ## Test results. ## > > ## - ## > > ERROR: 1 test was run, > > 1 failed unexpectedly. > > ## ## > > ## Summary of the failures. ## > > ## ## > > Failed tests: > > ovn 20.09.1 test suite test groups: > > NUM: FILE-NAME:LINE TEST-GROUP-NAME > > KEYWORDS > > 91: ovn.at:12245 ovn -- ensure one gw controller restart in HA > > doesn't bounce the master > > ## -- ## > > ## Detailed failed tests. ## > > ## -- ## > > # -*- compilation -*- > > 91. ovn.at:12245: testing ovn -- ensure one gw controller restart in HA > > doesn't bounce the master ... > > creating ovn-sb database > > creating ovn-nb database > > starting ovn-northd > > starting backup ovn-northd > > adding simulator 'main' > > adding simulator 'gw1' > > adding simulator 'gw2' > > adding simulator 'hv1' > > ./ovn.at:12277: ovn_populate_arp__ > > stdout: > > OK > > OK > > OK > > OK > > OK > > OK > > 194ab858-5fe5-448c-9600-f00a52a120e6 > > 511c7f52-8f85-4193-872b-f87c23420dfd > > dc46bb8e-35f5-420c-8874-b493c843fd31 > > Waiting until 1 rows in sb Chassis with name=gw2... 
> > ovn-macros.at:346: waiting until test $count = $(count_rows $db:$table $a > > $b $c)... > > ovn-macros.at:346: wait failed after 30 seconds > > sb table Chassis has the following rows. 0 rows match instead of expected 1: > > _uuid : a74cd080-1302-4224-9590-462017d88783 > > encaps : [393b112b-329d-4db1-9926-cd06124a6f2b, > > df12c311-1563-4202-a197-02686d835867] > > external_ids: {datapath-type="", > > iface-types="dummy,dummy-internal,dummy-pmd,erspan,geneve,gre,gtpu,internal,ip6erspan,ip6gre,lisp,patch,stt,system,tap,vxlan", > > is-interconn="false", ovn-bridge-mappings="", ovn-chassis-mac-mappings="", > > ovn-cms-options="", ovn-enable-lflow-cache="true", ovn-monitor-all="false"} > > hostname: bldrvm02 > > name: hv1 > > nb_cfg : 0 > > other_config: {datapath-type="", > > iface-types="dummy,dummy-internal,dummy-pmd,erspan,geneve,gre,gtpu,internal,ip6erspan,ip6gre,lisp,patch,stt,system,tap,vxlan", > > is-interconn="false", ovn-bridge-mappings="", ovn-chassis-mac-mappings="", > > ovn-cms-options="", ovn-enable-lflow-cache="true", ovn-monitor-all="false"} > > transport_zones : [] > > vtep_logical_switches: [] > > _uuid : 780ad589-47d9-4658-b1fb-e0ec96f96ad0 > > encaps : [2bed8f4c-dc55-444d-a259-b12c04a63b62, > > 752086e4-6372-4840-9f92-fd4a8df3ba21] > > external_ids: {datapath-type="", > > iface-types="dummy,dummy-internal,dummy-pmd,erspan,geneve,gre,gtpu,internal,ip6erspan,ip6gre,lisp,patch,stt,system,tap,vxlan", > > is-interconn="false", ovn-bridge-mappings="phys:br-phys", > > ovn-chassis-mac-mappings="", ovn-cms-options="", > > ovn-enable-lflow-cache="true", ovn-monitor-all="false"} > > hostname: bldrvm02 > > name: gw1 > > nb_cfg : 0 > > other_config: {datapath-type="", > > iface-types="dummy,dummy-internal,dummy-pmd,erspan,geneve,gre,gtpu,internal,ip6erspan,ip6gre,lisp,patch,stt,system,tap,vxlan", > > is-interconn="false", ovn-bridge-mappings="phys:br-phys", > > ovn-chassis-mac-mappings="", ovn-cms-options="", > > ovn-enable-lflow-cache="true", 
ovn-monitor-all="false"} > > transport_zones : [] > > vtep_logical_switches: [] > > ./ovs-macros.at:222: hard failure > > 91. ovn.at:12245: 91. ovn -- ensure one gw controller restart in HA doesn't > > bounce the master (ovn.at:12245): FAILED (ovs-macros.at:222) > > > > > > I also tried to build OVN with OVS 2.15 and it doesn’t build at all because > > of renaming "slave" to "member" in OVS. > > > > Some questions here: > > I'm not a maintainer (maintainers in cc) but I'll try to answer some of > your questions. > > > > > 1. As I understand, changes in OVS branch brought regression in OVN. > >
Re: [ovs-dev] [PATCH ovn branch-20.09] ovn-controller: Monitor chassis_private by chassis name.
On Fri, Jul 9, 2021 at 11:54 AM Dumitru Ceara wrote: > > The backport looks good to me, thanks! Thanks. I applied this patch to branch-20.09. Numan > > On 7/9/21 4:00 PM, Vladislav Odintsov wrote: > > Signed-off-by: Vladislav Odintsov > > > > Regards, > > Vladislav Odintsov > > > >> On 9 Jul 2021, at 16:55, Vladislav Odintsov wrote: > >> > >> Acked-by: Vladislav Odintsov > >> > >> Regards, > >> Vladislav Odintsov > >> > >>> On 9 Jul 2021, at 16:10, Vladislav Odintsov wrote: > >>> > >>> From: Dumitru Ceara > >>> > >>> Remove the use of sbrec_chassis_is_new() for uncommitted records. This > >>> is not the way IDL *_is_new() functions are supposed to be used. > >>> > >>> Note: With this change if the system-id changes there will be a > >>> transient error in ovn-controller due to ovn-controller trying to insert > >>> a new chassis_private record. This is due to the fact that the view of > >>> the chassis_private table changes and only chassis_private records > >>> matching the new chassis name are sent to ovn-controller. This gets > >>> corrected though in the next iteration of the ovn-controller processing > >>> loop. 
> >>> > >>> Suggested-by: Han Zhou > >>> Reported-at: > >>> https://mail.openvswitch.org/pipermail/ovs-dev/2020-October/376339.html > >>> Fixes: dce1af31b550 ("chassis: Fix chassis_private record updates when > >>> the system-id changes.") > >>> Signed-off-by: Dumitru Ceara > >>> Acked-by: Mark Gray > >>> Signed-off-by: Han Zhou > >>> (cherry picked from commit 1f915da95dc725131b7df094d494af9fda88ea92) > >>> --- > >>> controller/ovn-controller.c | 6 +++--- > >>> 1 file changed, 3 insertions(+), 3 deletions(-) > >>> > >>> diff --git a/controller/ovn-controller.c b/controller/ovn-controller.c > >>> index 3665c7b4e..b154a8486 100644 > >>> --- a/controller/ovn-controller.c > >>> +++ b/controller/ovn-controller.c > >>> @@ -181,7 +181,7 @@ update_sb_monitors(struct ovsdb_idl *ovnsb_idl, > >>> * chassis */ > >>>sbrec_port_binding_add_clause_type(, OVSDB_F_EQ, "chassisredirect"); > >>>sbrec_port_binding_add_clause_type(, OVSDB_F_EQ, "external"); > >>> -if (chassis && !sbrec_chassis_is_new(chassis)) { > >>> +if (chassis) { > >>>/* This should be mostly redundant with the other clauses for port > >>> * bindings, but it allows us to catch any ports that are assigned > >>> to > >>> * us but should not be. That way, we can clear their chassis > >>> @@ -205,8 +205,8 @@ update_sb_monitors(struct ovsdb_idl *ovnsb_idl, > >>>>header_.uuid); > >>> > >>>/* Monitors Chassis_Private record for current chassis only. */ > >>> -sbrec_chassis_private_add_clause_chassis(, OVSDB_F_EQ, > >>> - >header_.uuid); > >>> +sbrec_chassis_private_add_clause_name(, OVSDB_F_EQ, > >>> + chassis->name); > >>>} else { > >>>/* During initialization, we monitor all records in > >>> Chassis_Private so > >>> * that we don't try to recreate existing ones. 
*/ > >>> -- > >>> 2.30.0 > >>>
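The backport switches the Chassis_Private monitor condition from matching the Chassis row's UUID to matching the chassis name. One plausible reading is that the name is known from local configuration even before the Chassis row is committed, whereas the UUID of an uncommitted row is not a server-side value. The sketch below models the condition as a client-side predicate purely for illustration; in reality ovsdb-server evaluates monitor conditions, and `struct chassis_private_row` and `count_matching_rows()` are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a Chassis_Private row. */
struct chassis_private_row {
    const char *name;
};

/* Counts the rows a "name == chassis_name" condition would select. */
static size_t
count_matching_rows(const struct chassis_private_row *rows, size_t n,
                    const char *chassis_name)
{
    size_t count = 0;

    for (size_t i = 0; i < n; i++) {
        if (rows[i].name && !strcmp(rows[i].name, chassis_name)) {
            count++;
        }
    }
    return count;
}
```

When the system-id changes, a condition on the new name initially selects zero rows, so ovn-controller tries to insert a new Chassis_Private record; that matches the transient error described in the commit message.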
Re: [ovs-dev] [PATCH ovn v2] ovn-controller: Propagate nb-cfg-ts to local OVSDB.
On Thu, Jun 17, 2021 at 3:36 AM Dumitru Ceara wrote: > > Also store the timestamp when ovn-controller started up. This helps > implement alerts on the CMS side to detect whether ovn-controller is > still alive and functioning well. > > Reported-at: https://bugzilla.redhat.com/1924751 > Reported-by: Casey Callendrello > Signed-off-by: Dumitru Ceara > --- Thanks Dumitru and Mark. I applied this patch to the main branch. Numan > v2: > - Addressed Mark's comments: > - added units to documentation of timestamp fields. > - rephrased test comment. > - did *not* implement the micro optimization suggestion because > there's a chance the local ovsdb gets out of sync (e.g., txns fail > or values are changed externally) and ovn-controller should > reconcile the database. > --- > controller/ovn-controller.8.xml | 25 + > controller/ovn-controller.c | 29 +++-- > tests/ovn-controller.at | 11 +++ > 3 files changed, 59 insertions(+), 6 deletions(-) > > diff --git a/controller/ovn-controller.8.xml b/controller/ovn-controller.8.xml > index 8886df568..77067c3a3 100644 > --- a/controller/ovn-controller.8.xml > +++ b/controller/ovn-controller.8.xml > @@ -418,6 +418,18 @@ > > > > + > +external-ids:ovn-startup-ts in the Bridge > +table > + > + > + > + > + This key represents the timestamp (in milliseconds) at which > + ovn-controller process was started. > + > + > + > > external-ids:ovn-nb-cfg in the Bridge table > > @@ -429,6 +441,19 @@ >flows have been successfully installed in OVS. > > > + > + > +external-ids:ovn-nb-cfg-ts in the Bridge > +table > + > + > + > + > + This key represents the timestamp (in milliseconds) of the last > known > + OVN_Southbound.SB_Global.nb_cfg value for which all > + flows have been successfully installed in OVS.
> + > + > > > OVN Southbound Database Usage > diff --git a/controller/ovn-controller.c b/controller/ovn-controller.c > index addb08755..2f8ceff9f 100644 > --- a/controller/ovn-controller.c > +++ b/controller/ovn-controller.c > @@ -94,6 +94,8 @@ static unixctl_cb_func debug_delay_nb_cfg_report; > #define CONTROLLER_LOOP_STOPWATCH_NAME "ovn-controller-flow-generation" > > #define OVS_NB_CFG_NAME "ovn-nb-cfg" > +#define OVS_NB_CFG_TS_NAME "ovn-nb-cfg-ts" > +#define OVS_STARTUP_TS_NAME "ovn-startup-ts" > > static char *parse_options(int argc, char *argv[]); > OVS_NO_RETURN static void usage(void); > @@ -788,19 +790,30 @@ static void > store_nb_cfg(struct ovsdb_idl_txn *sb_txn, struct ovsdb_idl_txn *ovs_txn, > const struct sbrec_chassis_private *chassis, > const struct ovsrec_bridge *br_int, > - unsigned int delay_nb_cfg_report) > + unsigned int delay_nb_cfg_report, int64_t startup_ts) > { > struct ofctrl_acked_seqnos *acked_nb_cfg_seqnos = > ofctrl_acked_seqnos_get(ofctrl_seq_type_nb_cfg); > uint64_t cur_cfg = acked_nb_cfg_seqnos->last_acked; > > +if (ovs_txn && br_int > +&& startup_ts != smap_get_ullong(_int->external_ids, > + OVS_STARTUP_TS_NAME, 0)) { > +char *startup_ts_str = xasprintf("%"PRId64, startup_ts); > +ovsrec_bridge_update_external_ids_setkey(br_int, OVS_STARTUP_TS_NAME, > + startup_ts_str); > +free(startup_ts_str); > +} > + > if (!cur_cfg) { > goto done; > } > > +long long ts_now = time_wall_msec(); > + > if (sb_txn && chassis && cur_cfg != chassis->nb_cfg) { > sbrec_chassis_private_set_nb_cfg(chassis, cur_cfg); > -sbrec_chassis_private_set_nb_cfg_timestamp(chassis, > time_wall_msec()); > +sbrec_chassis_private_set_nb_cfg_timestamp(chassis, ts_now); > > if (delay_nb_cfg_report) { > VLOG_INFO("Sleep for %u sec", delay_nb_cfg_report); > @@ -808,12 +821,15 @@ store_nb_cfg(struct ovsdb_idl_txn *sb_txn, struct > ovsdb_idl_txn *ovs_txn, > } > } > > -if (ovs_txn && br_int && > -cur_cfg != smap_get_ullong(_int->external_ids, > - OVS_NB_CFG_NAME, 0)) { > +if 
(ovs_txn && br_int && cur_cfg != > smap_get_ullong(_int->external_ids, > +OVS_NB_CFG_NAME, 0)) > { > +char *cur_cfg_ts_str = xasprintf("%lld", ts_now); > char *cur_cfg_str = xasprintf("%"PRId64, cur_cfg); > ovsrec_bridge_update_external_ids_setkey(br_int, OVS_NB_CFG_NAME, > cur_cfg_str); > +ovsrec_bridge_update_external_ids_setkey(br_int, OVS_NB_CFG_TS_NAME, > +
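As a hypothetical illustration of the CMS-side alerting these keys enable, a monitor could read `ovn-nb-cfg-ts` and `ovn-startup-ts` from the integration bridge's external_ids and apply two checks. The function names, the string parsing and the age threshold are all assumptions. The staleness check also assumes the CMS bumps nb_cfg periodically, because the timestamp only advances when a new nb_cfg value is acknowledged.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical check: true if the last acknowledged nb_cfg timestamp
 * (milliseconds, as a decimal string) is missing, malformed, or older
 * than 'max_age_ms' relative to 'now_ms'. */
static bool
controller_looks_stale(const char *nb_cfg_ts_str, int64_t now_ms,
                       int64_t max_age_ms)
{
    if (!nb_cfg_ts_str) {
        return true;                /* Key not written yet. */
    }

    char *end;
    long long ts = strtoll(nb_cfg_ts_str, &end, 10);
    if (end == nb_cfg_ts_str || *end != '\0') {
        return true;                /* Not a valid integer. */
    }
    return now_ms - ts > max_age_ms;
}

/* Hypothetical check: a changed ovn-startup-ts between two polls suggests
 * that ovn-controller was restarted.  0 means "not seen before". */
static bool
controller_restarted(int64_t prev_startup_ts, int64_t cur_startup_ts)
{
    return prev_startup_ts != 0 && prev_startup_ts != cur_startup_ts;
}
```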
Re: [ovs-dev] [v4] dpif/dpcls: limit count subtable search info logs
Hi Kumar, There is an issue with the signed-offs reported by 0-day Robot. For additional info, please check the link below and look for the tag Co-authored-by: https://github.com/openvswitch/ovs/blob/master/Documentation/internals/contributing/submitting-patches.rst#tags Otherwise the patch looks good this time. Thanks, fbl On Mon, Jul 12, 2021 at 11:44:05AM +0530, kumar Amber wrote: > From: Harry van Haaren > > This commit avoids many instances of "using subtable X for miniflow (x,y)" > in the ovs-vswitchd log when using the DPCLS Autovalidator. This occurs > when no specialized subtable is found, and the generic "_any" version of > the avx512 subtable search implementation was used. This change logs the > subtable usage once, avoiding duplicates. > > Signed-off-by: Harry van Haaren > Signed-off-by: kumar Amber > > --- > v4: > - add doc update from Flavio > v3: > - add comments from Flavio > - add documentation update > --- > Documentation/topics/dpdk/bridge.rst | 34 ++ > lib/dpif-netdev-lookup-avx512-gather.c | 4 +-- > 2 files changed, 36 insertions(+), 2 deletions(-) > > diff --git a/Documentation/topics/dpdk/bridge.rst > b/Documentation/topics/dpdk/bridge.rst > index 0f70a0cad..374e03eb0 100644 > --- a/Documentation/topics/dpdk/bridge.rst > +++ b/Documentation/topics/dpdk/bridge.rst > @@ -182,6 +182,40 @@ chosen, and the 2nd occurance of that priority is not > used. Put in logical > terms, a subtable is chosen if its priority is greater than the previous > best candidate. > > +Optimizing Specific Subtable Search > +~~~ > + > +During the packet classification, the datapath can use specialized > +lookup tables to optimize the search. However, not all situations > +are optimized. If you see a message like the following one in the OVS > +logs, it means that there is no specialized implementation available > +for the current networking traffic. In this case, OVS will continue > +to process the traffic normally using a more generic lookup table.
> + > +"Using non-specialized AVX512 lookup for subtable (4,1) and possibly others." > + > +(Note that the numbers 4 and 1 will likely be different in your logs) > + > +Additional specialized lookups can be added to OVS if the user > +provides that log message along with the command output as shown > +below to the OVS mailing list. Note that the numbers in the log > +message ("subtable (X,Y)") need to match with the numbers in > +the provided command output ("dp-extra-info:miniflow_bits(X,Y)"). > + > +"ovs-appctl dpctl/dump-flows -m", which results in output like this: > + > +ufid:82770b5d-ca38-44ff-8283-74ba36bd1ca5, > skb_priority(0/0),skb_mark(0/0) > +,ct_state(0/0),ct_zone(0/0),ct_mark(0/0),ct_label(0/0),recirc_id(0), > + > dp_hash(0/0),in_port(pcap0),packet_type(ns=0,id=0),eth(src=00:00:00:00:00: > +00/00:00:00:00:00:00,dst=ff:ff:ff:ff:ff:ff/00:00:00:00:00:00),eth_type( > + > 0x8100),vlan(vid=1,pcp=0),encap(eth_type(0x0800),ipv4(src=127.0.0.1/0.0.0.0 > +,dst=127.0.0.1/0.0.0.0,proto=17/0,tos=0/0,ttl=64/0,frag=no),udp(src=53/0, > +dst=53/0)), packets:77072681, bytes:3545343326, used:0.000s, dp:ovs, > +actions:vhostuserclient0, dp-extra-info:miniflow_bits(4,1) > + > +Please send an email to the OVS mailing list ovs-dev@openvswitch.org with > +the output of the "dp-extra-info:miniflow_bits(4,1)" values.
> + > CPU ISA Testing and Validation > ~~ > > diff --git a/lib/dpif-netdev-lookup-avx512-gather.c > b/lib/dpif-netdev-lookup-avx512-gather.c > index bc359dc4a..ced846aa7 100644 > --- a/lib/dpif-netdev-lookup-avx512-gather.c > +++ b/lib/dpif-netdev-lookup-avx512-gather.c > @@ -411,8 +411,8 @@ dpcls_subtable_avx512_gather_probe(uint32_t u0_bits, > uint32_t u1_bits) > */ > if (!f && (u0_bits + u1_bits) < (NUM_U64_IN_ZMM_REG * 2)) { > f = dpcls_avx512_gather_mf_any; > -VLOG_INFO("Using avx512_gather_mf_any for subtable (%d,%d)\n", > - u0_bits, u1_bits); > +VLOG_INFO_ONCE("Using non-specialized AVX512 lookup for subtable" > + " (%d,%d) and possibly others.", u0_bits, u1_bits); > } > > return f; > -- > 2.25.1 > -- fbl
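The switch to VLOG_INFO_ONCE means only the first non-specialized subtable is reported, which is why the message says "and possibly others". The following is a minimal sketch of a log-once helper in that spirit. It is not OVS's macro, which may be implemented differently. This version uses a C11 `atomic_flag` so that concurrent threads cannot both print, and returns whether it printed so the behaviour can be checked. The helper name is hypothetical.

```c
#include <stdatomic.h>
#include <stdio.h>

/* Hypothetical helper: prints the message only on the first call in the
 * process lifetime.  Returns 1 if it printed, 0 if it stayed quiet. */
static int
log_nonspecialized_once(int u0_bits, int u1_bits)
{
    static atomic_flag logged = ATOMIC_FLAG_INIT;

    if (atomic_flag_test_and_set(&logged)) {
        return 0;               /* Already logged once. */
    }
    printf("Using non-specialized AVX512 lookup for subtable (%d,%d) "
           "and possibly others.\n", u0_bits, u1_bits);
    return 1;
}
```

Because later subtables are silently skipped, the `dp-extra-info:miniflow_bits(X,Y)` field from `ovs-appctl dpctl/dump-flows -m` remains the way to see every subtable shape, as the documentation hunk above suggests.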
[ovs-dev] [PATCH v3 ovn 1/2] controller: incrementally create ipv6 prefix delegation port_binding list
Incrementally manage local_active_ports_ipv6_pd map for interfaces where IPv6 prefix-delegation has been enabled. This patch avoids looping over all local interfaces to check if prefix-delegation is running on the current port binding. Acked-by: Mark Michelson Signed-off-by: Lorenzo Bianconi --- controller/binding.c| 32 +++ controller/binding.h| 1 + controller/ovn-controller.c | 11 +++- controller/ovn-controller.h | 6 ++ controller/pinctrl.c| 107 +--- controller/pinctrl.h| 4 +- 6 files changed, 103 insertions(+), 58 deletions(-) diff --git a/controller/binding.c b/controller/binding.c index 594babc98..9711ac850 100644 --- a/controller/binding.c +++ b/controller/binding.c @@ -574,6 +574,30 @@ remove_related_lport(const struct sbrec_port_binding *pb, } } +static void +update_active_pb_ras_pd(const struct sbrec_port_binding *pb, +struct hmap *local_datapaths, +struct shash *map, const char *conf) +{ +bool ras_pd_conf = smap_get_bool(>options, conf, false); +struct shash_node *iter = shash_find(map, pb->logical_port); + +if (iter && !ras_pd_conf) { +shash_delete(map, iter); +return; +} +struct pb_ld_binding *ras_pd = NULL; +if (!iter && ras_pd_conf) { +ras_pd = xzalloc(sizeof *ras_pd); +ras_pd->pb = pb; +shash_add(map, pb->logical_port, ras_pd); +} +if (ras_pd) { +ras_pd->ld = get_local_datapath(local_datapaths, +pb->datapath->tunnel_key); +} +} + /* Corresponds to each Port_Binding.type.
*/ enum en_lport_type { LP_UNKNOWN, @@ -1645,6 +1669,10 @@ binding_run(struct binding_ctx_in *b_ctx_in, struct binding_ctx_out *b_ctx_out) const struct sbrec_port_binding *pb; SBREC_PORT_BINDING_TABLE_FOR_EACH (pb, b_ctx_in->port_binding_table) { +update_active_pb_ras_pd(pb, b_ctx_out->local_datapaths, +b_ctx_out->local_active_ports_ipv6_pd, +"ipv6_prefix_delegation"); + enum en_lport_type lport_type = get_lport_type(pb); switch (lport_type) { @@ -2482,6 +2510,10 @@ delete_done: continue; } +update_active_pb_ras_pd(pb, b_ctx_out->local_datapaths, +b_ctx_out->local_active_ports_ipv6_pd, +"ipv6_prefix_delegation"); + enum en_lport_type lport_type = get_lport_type(pb); struct binding_lport *b_lport = diff --git a/controller/binding.h b/controller/binding.h index a08011ae2..60ad49da0 100644 --- a/controller/binding.h +++ b/controller/binding.h @@ -72,6 +72,7 @@ void related_lports_destroy(struct related_lports *); struct binding_ctx_out { struct hmap *local_datapaths; +struct shash *local_active_ports_ipv6_pd; struct local_binding_data *lbinding_data; /* sset of (potential) local lports. */ diff --git a/controller/ovn-controller.c b/controller/ovn-controller.c index 6a9c25f28..c4eb54755 100644 --- a/controller/ovn-controller.c +++ b/controller/ovn-controller.c @@ -1029,6 +1029,8 @@ struct ed_type_runtime_data { bool tracked; bool local_lports_changed; struct hmap tracked_dp_bindings; + +struct shash local_active_ports_ipv6_pd; }; /* struct ed_type_runtime_data has the below members for tracking the @@ -1116,6 +1118,7 @@ en_runtime_data_init(struct engine_node *node OVS_UNUSED, sset_init(>egress_ifaces); smap_init(>local_iface_ids); local_binding_data_init(>lbinding_data); +shash_init(>local_active_ports_ipv6_pd); /* Init the tracked data. 
*/ hmap_init(>tracked_dp_bindings); @@ -1141,6 +1144,7 @@ en_runtime_data_cleanup(void *data) free(cur_node); } hmap_destroy(_data->local_datapaths); +shash_destroy(_data->local_active_ports_ipv6_pd); local_binding_data_destroy(_data->lbinding_data); } @@ -1219,6 +1223,8 @@ init_binding_ctx(struct engine_node *node, b_ctx_in->ovs_table = ovs_table; b_ctx_out->local_datapaths = _data->local_datapaths; +b_ctx_out->local_active_ports_ipv6_pd = +_data->local_active_ports_ipv6_pd; b_ctx_out->local_lports = _data->local_lports; b_ctx_out->local_lports_changed = false; b_ctx_out->related_lports = _data->related_lports; @@ -1236,6 +1242,7 @@ en_runtime_data_run(struct engine_node *node, void *data) { struct ed_type_runtime_data *rt_data = data; struct hmap *local_datapaths = _data->local_datapaths; +struct shash *local_active_ipv6_pd = _data->local_active_ports_ipv6_pd; struct sset *local_lports = _data->local_lports; struct sset *active_tunnels = _data->active_tunnels; @@ -1251,6 +1258,7 @@ en_runtime_data_run(struct engine_node *node, void *data) free(cur_node); }
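The add/delete logic in `update_active_pb_ras_pd()` keeps a port in the map exactly when its option is true, touching only the one port binding being processed instead of rescanning every local interface. Below is a minimal sketch of the same rule, assuming a small fixed-size array in place of OVN's `shash`. `struct active_set` and its functions are hypothetical, and the local-datapath lookup the real function performs is omitted.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical array-backed stand-in for the shash in the patch. */
#define MAX_PORTS 16

struct active_set {
    const char *ports[MAX_PORTS];
    size_t n;
};

/* Returns the index of 'port', or -1 if it is not in the set. */
static ptrdiff_t
active_set_find(const struct active_set *s, const char *port)
{
    for (size_t i = 0; i < s->n; i++) {
        if (!strcmp(s->ports[i], port)) {
            return (ptrdiff_t) i;
        }
    }
    return -1;
}

/* Same rule as the patch: delete if present and the option is now false;
 * add if absent and the option is now true; otherwise leave it alone. */
static void
active_set_update(struct active_set *s, const char *port, bool enabled)
{
    ptrdiff_t idx = active_set_find(s, port);

    if (idx >= 0 && !enabled) {
        s->ports[idx] = s->ports[--s->n];   /* Swap-remove. */
    } else if (idx < 0 && enabled && s->n < MAX_PORTS) {
        s->ports[s->n++] = port;
    }
}
```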
[ovs-dev] [PATCH v3 ovn 2/2] controller: incrementally create ras port_binding list
Incrementally manage local_active_ports_ras map for interfaces where periodic router advertisement has been enabled. This patch avoids looping over all local interfaces to check if periodic RA is running on the current port binding. Acked-by: Mark Michelson Signed-off-by: Lorenzo Bianconi --- controller/binding.c| 7 +++ controller/binding.h| 1 + controller/ovn-controller.c | 10 +++- controller/pinctrl.c| 93 - controller/pinctrl.h| 3 +- 5 files changed, 69 insertions(+), 45 deletions(-) diff --git a/controller/binding.c b/controller/binding.c index 9711ac850..09793a6f6 100644 --- a/controller/binding.c +++ b/controller/binding.c @@ -1672,6 +1672,9 @@ binding_run(struct binding_ctx_in *b_ctx_in, struct binding_ctx_out *b_ctx_out) update_active_pb_ras_pd(pb, b_ctx_out->local_datapaths, b_ctx_out->local_active_ports_ipv6_pd, "ipv6_prefix_delegation"); +update_active_pb_ras_pd(pb, b_ctx_out->local_datapaths, +b_ctx_out->local_active_ports_ras, +"ipv6_ra_send_periodic"); enum en_lport_type lport_type = get_lport_type(pb); @@ -2514,6 +2517,10 @@ delete_done: b_ctx_out->local_active_ports_ipv6_pd, "ipv6_prefix_delegation"); +update_active_pb_ras_pd(pb, b_ctx_out->local_datapaths, +b_ctx_out->local_active_ports_ras, +"ipv6_ra_send_periodic"); + enum en_lport_type lport_type = get_lport_type(pb); struct binding_lport *b_lport = diff --git a/controller/binding.h b/controller/binding.h index 60ad49da0..77197e742 100644 --- a/controller/binding.h +++ b/controller/binding.h @@ -73,6 +73,7 @@ void related_lports_destroy(struct related_lports *); struct binding_ctx_out { struct hmap *local_datapaths; struct shash *local_active_ports_ipv6_pd; +struct shash *local_active_ports_ras; struct local_binding_data *lbinding_data; /* sset of (potential) local lports.
*/ diff --git a/controller/ovn-controller.c b/controller/ovn-controller.c index c4eb54755..34134e87b 100644 --- a/controller/ovn-controller.c +++ b/controller/ovn-controller.c @@ -1031,6 +1031,7 @@ struct ed_type_runtime_data { struct hmap tracked_dp_bindings; struct shash local_active_ports_ipv6_pd; +struct shash local_active_ports_ras; }; /* struct ed_type_runtime_data has the below members for tracking the @@ -1119,6 +1120,7 @@ en_runtime_data_init(struct engine_node *node OVS_UNUSED, smap_init(>local_iface_ids); local_binding_data_init(>lbinding_data); shash_init(>local_active_ports_ipv6_pd); +shash_init(>local_active_ports_ras); /* Init the tracked data. */ hmap_init(>tracked_dp_bindings); @@ -1145,6 +1147,7 @@ en_runtime_data_cleanup(void *data) } hmap_destroy(_data->local_datapaths); shash_destroy(_data->local_active_ports_ipv6_pd); +shash_destroy(_data->local_active_ports_ras); local_binding_data_destroy(_data->lbinding_data); } @@ -1225,6 +1228,8 @@ init_binding_ctx(struct engine_node *node, b_ctx_out->local_datapaths = _data->local_datapaths; b_ctx_out->local_active_ports_ipv6_pd = _data->local_active_ports_ipv6_pd; +b_ctx_out->local_active_ports_ras = +_data->local_active_ports_ras; b_ctx_out->local_lports = _data->local_lports; b_ctx_out->local_lports_changed = false; b_ctx_out->related_lports = _data->related_lports; @@ -1243,6 +1248,7 @@ en_runtime_data_run(struct engine_node *node, void *data) struct ed_type_runtime_data *rt_data = data; struct hmap *local_datapaths = _data->local_datapaths; struct shash *local_active_ipv6_pd = _data->local_active_ports_ipv6_pd; +struct shash *local_active_ras = _data->local_active_ports_ras; struct sset *local_lports = _data->local_lports; struct sset *active_tunnels = _data->active_tunnels; @@ -1259,6 +1265,7 @@ en_runtime_data_run(struct engine_node *node, void *data) } hmap_clear(local_datapaths); shash_clear(local_active_ipv6_pd); +shash_clear(local_active_ras); local_binding_data_destroy(_data->lbinding_data); 
sset_destroy(local_lports); related_lports_destroy(_data->related_lports); @@ -3274,7 +3281,8 @@ main(int argc, char *argv[]) br_int, chassis, _data->local_datapaths, _data->active_tunnels, -_data->local_active_ports_ipv6_pd); +_data->local_active_ports_ipv6_pd, +_data->local_active_ports_ras); /* Updating monitor conditions
[ovs-dev] [PATCH v3 ovn 0/2] incrementally process ras-ipv6 pd router ports
https://bugzilla.redhat.com/show_bug.cgi?id=1944220 Changes since v2: - use smap_get_bool instead of smap_get in update_active_pb_ras_pd routine Changes since v1: - use shash instead of hmap - always remove the entry from shash if the user removed the ipv6_pd/ras info in port_binding option column Lorenzo Bianconi (2): controller: incrementally create ipv6 prefix delegation port_binding list controller: incrementally create ras port_binding list controller/binding.c| 39 +++ controller/binding.h| 2 + controller/ovn-controller.c | 19 +++- controller/ovn-controller.h | 6 ++ controller/pinctrl.c| 198 ++-- controller/pinctrl.h| 5 +- 6 files changed, 169 insertions(+), 100 deletions(-) -- 2.31.1
Re: [ovs-dev] [PATCH net-next] openvswitch: Introduce per-cpu upcall dispatch
Hi Joe, Maybe you can take a look... Thanks, fbl On Thu, Jul 08, 2021 at 11:40:12AM -0300, Flavio Leitner wrote: > > Hi Pravin, > > Any thoughts on this patch? We are closing OVS 2.16, so it would > be nice to know if it looks okay or needs changes, especially > changes related to the userspace interface. > > Thanks, > fbl > > On Wed, Jun 30, 2021 at 05:53:49AM -0400, Mark Gray wrote: > > The Open vSwitch kernel module uses the upcall mechanism to send > > packets from kernel space to user space when it misses in the kernel > > space flow table. The upcall sends packets via a Netlink socket. > > Currently, a Netlink socket is created for every vport. In this way, > > there is a 1:1 mapping between a vport and a Netlink socket. > > When a packet is received by a vport, if it needs to be sent to > > user space, it is sent via the corresponding Netlink socket. > > > > This mechanism, with various iterations of the corresponding user > > space code, has seen some limitations and issues: > > > > * On systems with a large number of vports, there is a correspondingly > > large number of Netlink sockets which can limit scaling. > > (https://bugzilla.redhat.com/show_bug.cgi?id=1526306) > > * Packet reordering on upcalls. > > (https://bugzilla.redhat.com/show_bug.cgi?id=1844576) > > * A thundering herd issue. > > (https://bugzilla.redhat.com/show_bug.cgi?id=183) > > > > This patch introduces an alternative, feature-negotiated, upcall > > mode using a per-cpu dispatch rather than a per-vport dispatch. > > > > In this mode, the Netlink socket to be used for the upcall is > > selected based on the CPU of the thread that is executing the upcall. > > In this way, it resolves the issues above as: > > > > a) The number of Netlink sockets scales with the number of CPUs > > rather than the number of vports. > > b) Ordering per-flow is maintained as packets are distributed to > > CPUs based on mechanisms such as RSS and flows are distributed > > to a single user space thread.
> > c) Packets from a flow can only wake up one user space thread. > > > > The corresponding user space code can be found at: > > https://mail.openvswitch.org/pipermail/ovs-dev/2021-April/382618.html > > > > Bugzilla: https://bugzilla.redhat.com/1844576 > > Signed-off-by: Mark Gray > > --- > > > > Notes: > > v1 - Reworked based on Flavio's comments: > > * Fixed handling of userspace action case > > * Renamed 'struct dp_portids' > > * Fixed handling of return from kmalloc() > > * Removed check for dispatch type from ovs_dp_get_upcall_portid() > >- Reworked based on Dan's comments: > > * Fixed handling of return from kmalloc() > >- Reworked based on Pravin's comments: > > * Fixed handling of userspace action case > >- Added kfree() in destroy_dp_rcu() to cleanup netlink port ids > > > > include/uapi/linux/openvswitch.h | 8 > > net/openvswitch/actions.c| 6 ++- > > net/openvswitch/datapath.c | 70 +++- > > net/openvswitch/datapath.h | 20 + > > 4 files changed, 101 insertions(+), 3 deletions(-) > > > > diff --git a/include/uapi/linux/openvswitch.h > > b/include/uapi/linux/openvswitch.h > > index 8d16744edc31..6571b57b2268 100644 > > --- a/include/uapi/linux/openvswitch.h > > +++ b/include/uapi/linux/openvswitch.h > > @@ -70,6 +70,8 @@ enum ovs_datapath_cmd { > > * set on the datapath port (for OVS_ACTION_ATTR_MISS). Only valid on > > * %OVS_DP_CMD_NEW requests. A value of zero indicates that upcalls should > > * not be sent. > > + * OVS_DP_ATTR_PER_CPU_PIDS: Per-cpu array of PIDs for upcalls when > > + * OVS_DP_F_DISPATCH_UPCALL_PER_CPU feature is set. > > * @OVS_DP_ATTR_STATS: Statistics about packets that have passed through > > the > > * datapath. Always present in notifications. 
> > * @OVS_DP_ATTR_MEGAFLOW_STATS: Statistics about mega flow masks usage for > > the > > @@ -87,6 +89,9 @@ enum ovs_datapath_attr { > > OVS_DP_ATTR_USER_FEATURES, /* OVS_DP_F_* */ > > OVS_DP_ATTR_PAD, > > OVS_DP_ATTR_MASKS_CACHE_SIZE, > > + OVS_DP_ATTR_PER_CPU_PIDS, /* Netlink PIDS to receive upcalls in > > per-cpu > > +* dispatch mode > > +*/ > > __OVS_DP_ATTR_MAX > > }; > > > > @@ -127,6 +132,9 @@ struct ovs_vport_stats { > > /* Allow tc offload recirc sharing */ > > #define OVS_DP_F_TC_RECIRC_SHARING (1 << 2) > > > > +/* Allow per-cpu dispatch of upcalls */ > > +#define OVS_DP_F_DISPATCH_UPCALL_PER_CPU (1 << 3) > > + > > /* Fixed logical ports. */ > > #define OVSP_LOCAL ((__u32)0) > > > > diff --git a/net/openvswitch/actions.c b/net/openvswitch/actions.c > > index ef15d9eb4774..f79679746c62 100644 > > --- a/net/openvswitch/actions.c > > +++ b/net/openvswitch/actions.c > > @@ -924,7 +924,11 @@ static int output_userspace(struct datapath *dp, > > struct sk_buff *skb, > >
Re: [ovs-dev] [PATCH v2 1/2] Optimize the poll loop for poll_immediate_wake()
Bleep bloop.  Greetings Anton Ivanov, I am a robot and I have tried out your patch.
Thanks for your contribution.

I encountered some error that I wasn't expecting.  See the details below.

checkpatch:
WARNING: Line has trailing whitespace
#171 FILE: lib/timeval.c:327:
    * shortcut. Otherwise there is at least one fd in it for

Lines checked: 192, Warnings: 1, Errors: 0

Please check this out.  If you feel there has been an error, please email acon...@redhat.com

Thanks,
0-day Robot
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
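The warning above flags a line whose last character is whitespace. The following is a minimal C sketch of that test, assuming each line is passed in without its newline. `has_trailing_whitespace()` is a hypothetical helper, not checkpatch's implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Returns true if 'line' (newline already stripped) ends in a space or tab. */
static bool
has_trailing_whitespace(const char *line)
{
    size_t len = strlen(line);

    return len > 0 && (line[len - 1] == ' ' || line[len - 1] == '\t');
}
```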
[ovs-dev] [PATCH ovn 2/2] ipsec.at: Fix ipsec test flake
Change order of command execution and add `ovn-nbctl --wait=hv sync`.
This ensures the vswitchd ovsdb instance is updated by the time it
is checked.

Fixes: ff2b6ff69740 ("ovn-controller: Add 'local_ip' option to tunnel ports for IPsec case")
Signed-off-by: Mark Gray
---
 tests/ovn-ipsec.at | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/tests/ovn-ipsec.at b/tests/ovn-ipsec.at
index 389ccff5836a..4c600a9f2779 100644
--- a/tests/ovn-ipsec.at
+++ b/tests/ovn-ipsec.at
@@ -14,9 +14,6 @@ ovn-nbctl lsp-set-addresses lp2 "f0:00:00:00:00:02 10.1.1.2"
 
 net_add n1 # Network to connect hv1 and hv2
 
-# Enable IPsec
-ovn-nbctl set nb_global . ipsec=true
-
 # Create hypervisor hv1 connected to n1
 sim_add hv1
 as hv1
@@ -45,6 +42,11 @@ ovs-vsctl \
 -- set Open_vSwitch . other_config:private_key=dummy-privkey.pem \
 -- set Open_vSwitch . other_config:ca_cert=dummy-cacert.pem
 
+# Enable IPsec
+ovn-nbctl set nb_global . ipsec=true
+
+check ovn-nbctl --wait=hv sync
+
 AT_CHECK([as hv2 ovs-vsctl get Interface ovn-hv1-0 options:remote_ip | tr -d '"\n'], [0], [192.168.0.1])
 AT_CHECK([as hv2 ovs-vsctl get Interface ovn-hv1-0 options:local_ip | tr -d '"\n'], [0], [192.168.0.2])
 AT_CHECK([as hv2 ovs-vsctl get Interface ovn-hv1-0 options:remote_name | tr -d '\n'], [0], [hv1])
-- 
2.27.0
[ovs-dev] [PATCH ovn 0/2] tests: Fix test flakes
Fix 2 test flakes that have been observed in the OVS CI.

Mark Gray (2):
  system-test: Fix flake in ECMP IPv6 symmetric reply test
  ipsec.at: Fix ipsec test flake

 tests/ovn-ipsec.at  |  8
 tests/system-ovn.at | 51
 2 files changed, 33 insertions(+), 26 deletions(-)

-- 
2.27.0
[ovs-dev] [PATCH ovn 1/2] system-test: Fix flake in ECMP IPv6 symmetric reply test
Statically add IPv6 neighbor MAC addresses to avoid NS messages evicting datapath flows causing occasional test failures. We also configure all interfaces to send only one IPv6 router solicitation message. These messages can cause datapath flows to be unexpectedly evicted causing test failures. Fixes: 7c927c0c0be1 ("ovn-northd: Fix IPv6 ECMP symmetric reply flows") Signed-off-by: Mark Gray --- tests/system-ovn.at | 51 + 1 file changed, 28 insertions(+), 23 deletions(-) diff --git a/tests/system-ovn.at b/tests/system-ovn.at index 79879c6e003b..fc377bbd1a47 100644 --- a/tests/system-ovn.at +++ b/tests/system-ovn.at @@ -5833,17 +5833,35 @@ ovn-nbctl lr-route-add R3 fd01::/64 fd02::1 # Logical port 'alice1' in switch 'alice'. ADD_NAMESPACES(alice1) +# Only send 1 router solicitation as any additional ones can cause datapath +# flows to get evicted, causing unexpected failures below. +NS_CHECK_EXEC([alice1], [sysctl -w net.ipv6.conf.default.router_solicitations=1], [0], [dnl +net.ipv6.conf.default.router_solicitations = 1 +]) ADD_VETH(alice1, alice1, br-int, "fd01::2/64", "f0:00:00:01:02:04", \ "fd01::1") OVS_WAIT_UNTIL([test "$(ip netns exec alice1 ip a | grep fd01::2 | grep tentative)" = ""]) ovn-nbctl lsp-add alice alice1 \ -- lsp-set-addresses alice1 "f0:00:00:01:02:04 fd01::2" +# Add neighbour MAC address to avoid sending IPv6 NS messages which could +# cause datapath flows to be evicted +NS_CHECK_EXEC([alice1], [ip -6 neigh add fd01::1 lladdr 00:00:01:01:02:03 dev alice1], [0]) # Logical port 'bob1' in switch 'bob'. ADD_NAMESPACES(bob1) +# Only send 1 router solicitation as any additional ones can cause datapath +# flows to get evicted, causing unexpected failures below. 
+NS_CHECK_EXEC([bob1], [sysctl -w net.ipv6.conf.default.router_solicitations=1], [0], [dnl +net.ipv6.conf.default.router_solicitations = 1 +]) ADD_VETH(bob1, bob1, br-int, "fd07::1/64", "f0:00:00:01:02:06", \ "fd07::2") OVS_WAIT_UNTIL([test "$(ip netns exec bob1 ip a | grep fd07::1 | grep tentative)" = ""]) +# Add neighbour MAC addresses to avoid sending IPv6 NS messages which could +# cause datapath flows to be evicted +NS_CHECK_EXEC([bob1], [ip -6 neigh add fd07::2 lladdr 00:00:02:01:02:03 dev bob1], [0]) +NS_CHECK_EXEC([bob1], [ip -6 neigh add fd07::3 lladdr 00:00:01:01:02:04 dev bob1], [0]) + ovn-nbctl lsp-add bob bob1 \ -- lsp-set-addresses bob1 "f0:00:00:01:02:06 fd07::1" @@ -5852,45 +5870,32 @@ ovn-nbctl --wait=hv sync on_exit 'ovs-ofctl dump-flows br-int' -# Later in this test we will check for a datapath flow that matches: -# "ct_state(+new-est-rpl+trk).*ct(.*label=0x204010204/.*)". Due -# to the way OVS generates datapath flows with wildcards, ICMPv6 NS flows will -# evict this datapath flow. In order to ensure that the flow does not -# get evicted, we send one ping packet in order to carry out neighbor -# discovery. We then flush the datpath to remove the NS flows so that the flow -# "ct_state(+new-est-rpl+trk).*ct(.*label=0x204010204/.*)" will -# be present when we check for it. -NS_CHECK_EXEC([bob1], [ping -q -c 2 -i 0.3 -w 15 fd01::2 | FORMAT_PING], \ -[0], [dnl -2 packets transmitted, 2 received, 0% packet loss, time 0ms -]) -ovs-appctl dpctl/del-flows - # 'bob1' should be able to ping 'alice1' directly. NS_CHECK_EXEC([bob1], [ping -q -c 20 -i 0.3 -w 15 fd01::2 | FORMAT_PING], \ [0], [dnl 20 packets transmitted, 20 received, 0% packet loss, time 0ms ]) -# Ensure conntrack entry is present. We should not try to predict -# the tunnel key for the output port, so we strip it from the labels -# and just ensure that the known ethernet address is present. 
-AT_CHECK([ovs-appctl dpctl/dump-conntrack | FORMAT_CT(fd01::2) | \ -sed -e 's/zone=[[0-9]]*/zone=/' | -sed -e 's/labels=0x[[0-9a-f]]*04010204/labels=0x04010204/'], [0], [dnl -icmpv6,orig=(src=fd07::1,dst=fd01::2,id=,type=128,code=0),reply=(src=fd01::2,dst=fd07::1,id=,type=129,code=0),zone=,labels=0x04010204 -]) - # Ensure datapaths show conntrack states as expected # Like with conntrack entries, we shouldn't try to predict # port binding tunnel keys. So omit them from expected labels. AT_CHECK([ovs-appctl dpctl/dump-flows | grep 'ct_state(+new-est-rpl+trk).*ct(.*label=0x204010204/.*)' -c], [0], [dnl 1 ]) + AT_CHECK([ovs-appctl dpctl/dump-flows | grep 'ct_state(-new+est+rpl+trk).*ct_label(0x.*04010204/.*)' -c], [0], [dnl 1 ]) +# Ensure conntrack entry is present. We should not try to predict +# the tunnel key for the output port, so we strip it from the labels +# and just ensure that the known ethernet address is present. +AT_CHECK([ovs-appctl dpctl/dump-conntrack | FORMAT_CT(fd01::2) | \ +sed -e 's/zone=[[0-9]]*/zone=/' | +sed -e 's/labels=0x[[0-9a-f]]*04010204/labels=0x04010204/'], [0], [dnl
[ovs-dev] [PATCH v2 1/2] Optimize the poll loop for poll_immediate_wake()
From: Anton Ivanov If we are not obtaining any useful information out of the poll(), such as is a fd busy or not, we do not need to do a poll() if an immediate_wake() has been requested. This cuts out all the pollfd hash additions, forming the poll arguments and the actual poll() after a call to poll_immediate_wake() Signed-off-by: Anton Ivanov --- lib/poll-loop.c | 69 - lib/timeval.c | 11 +++- 2 files changed, 56 insertions(+), 24 deletions(-) diff --git a/lib/poll-loop.c b/lib/poll-loop.c index 4e751ff2c..09bc4f5c4 100644 --- a/lib/poll-loop.c +++ b/lib/poll-loop.c @@ -53,6 +53,7 @@ struct poll_loop { * wake up immediately, or LLONG_MAX to wait forever. */ long long int timeout_when; /* In msecs as returned by time_msec(). */ const char *timeout_where; /* Where 'timeout_when' was set. */ +bool immediate_wake; }; static struct poll_loop *poll_loop(void); @@ -107,6 +108,13 @@ poll_create_node(int fd, HANDLE wevent, short int events, const char *where) COVERAGE_INC(poll_create_node); +if (loop->immediate_wake) { +/* We have been asked to bail out of this poll loop. + * There is no point to engage in yack shaving a poll hmap. + */ +return; +} + /* Both 'fd' and 'wevent' cannot be set. */ ovs_assert(!fd != !wevent); @@ -181,8 +189,15 @@ poll_wevent_wait_at(HANDLE wevent, const char *where) void poll_timer_wait_at(long long int msec, const char *where) { -long long int now = time_msec(); +long long int now; long long int when; +struct poll_loop *loop = poll_loop(); + +if (loop->immediate_wake) { +return; +} + +now = time_msec(); if (msec <= 0) { /* Wake up immediately. 
*/ @@ -229,7 +244,9 @@ poll_timer_wait_until_at(long long int when, const char *where) void poll_immediate_wake_at(const char *where) { +struct poll_loop *loop = poll_loop(); poll_timer_wait_at(0, where); +loop->immediate_wake = true; } /* Logs, if appropriate, that the poll loop was awakened by an event @@ -320,10 +337,10 @@ poll_block(void) { struct poll_loop *loop = poll_loop(); struct poll_node *node; -struct pollfd *pollfds; +struct pollfd *pollfds = NULL; HANDLE *wevents = NULL; int elapsed; -int retval; +int retval = 0; int i; /* Register fatal signal events before actually doing any real work for @@ -335,34 +352,38 @@ poll_block(void) } timewarp_run(); -pollfds = xmalloc(hmap_count(>poll_nodes) * sizeof *pollfds); +if (!loop->immediate_wake) { +pollfds = xmalloc(hmap_count(>poll_nodes) * sizeof *pollfds); #ifdef _WIN32 -wevents = xmalloc(hmap_count(>poll_nodes) * sizeof *wevents); +wevents = xmalloc(hmap_count(>poll_nodes) * sizeof *wevents); #endif -/* Populate with all the fds and events. */ -i = 0; -HMAP_FOR_EACH (node, hmap_node, >poll_nodes) { -pollfds[i] = node->pollfd; +/* Populate with all the fds and events. 
*/ +i = 0; +HMAP_FOR_EACH (node, hmap_node, >poll_nodes) { +pollfds[i] = node->pollfd; #ifdef _WIN32 -wevents[i] = node->wevent; -if (node->pollfd.fd && node->wevent) { -short int wsa_events = 0; -if (node->pollfd.events & POLLIN) { -wsa_events |= FD_READ | FD_ACCEPT | FD_CLOSE; -} -if (node->pollfd.events & POLLOUT) { -wsa_events |= FD_WRITE | FD_CONNECT | FD_CLOSE; +wevents[i] = node->wevent; +if (node->pollfd.fd && node->wevent) { +short int wsa_events = 0; +if (node->pollfd.events & POLLIN) { +wsa_events |= FD_READ | FD_ACCEPT | FD_CLOSE; +} +if (node->pollfd.events & POLLOUT) { +wsa_events |= FD_WRITE | FD_CONNECT | FD_CLOSE; +} +WSAEventSelect(node->pollfd.fd, node->wevent, wsa_events); } -WSAEventSelect(node->pollfd.fd, node->wevent, wsa_events); -} #endif -i++; -} +i++; +} -retval = time_poll(pollfds, hmap_count(>poll_nodes), wevents, - loop->timeout_when, ); +retval = time_poll(pollfds, hmap_count(>poll_nodes), wevents, + loop->timeout_when, ); +} else { +retval = time_poll(NULL, 0, NULL, loop->timeout_when, ); +} if (retval < 0) { static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(1, 5); VLOG_ERR_RL(, "poll: %s", ovs_strerror(-retval)); @@ -381,6 +402,7 @@ poll_block(void) free_poll_nodes(loop); loop->timeout_when = LLONG_MAX; loop->timeout_where = NULL; +loop->immediate_wake = false; free(pollfds); free(wevents); @@ -417,6 +439,7 @@ poll_loop(void) loop = xzalloc(sizeof *loop);
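The effect of the new `immediate_wake` flag can be modeled in a few lines. This is a heavily simplified, hypothetical stand-in for `struct poll_loop`. It only counts registered fds instead of building the pollfd hmap, and every `sketch_*` name is invented for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct sketch_loop {
    size_t n_nodes;       /* Stand-in for hmap_count(&loop->poll_nodes). */
    bool immediate_wake;  /* Set once an immediate wakeup is requested. */
};

/* Like poll_create_node(): once an immediate wakeup is pending, poll()
 * will be skipped, so there is no point tracking further fds. */
static void
sketch_fd_wait(struct sketch_loop *loop)
{
    if (loop->immediate_wake) {
        return;
    }
    loop->n_nodes++;
}

static void
sketch_immediate_wake(struct sketch_loop *loop)
{
    loop->immediate_wake = true;
}

/* Returns how many pollfds would be handed to poll(); 0 means poll() is
 * skipped.  Resets the state for the next iteration, as poll_block() does. */
static size_t
sketch_block(struct sketch_loop *loop)
{
    size_t n_pollfds = loop->immediate_wake ? 0 : loop->n_nodes;

    loop->n_nodes = 0;
    loop->immediate_wake = false;
    return n_pollfds;
}
```

Fds registered before the wakeup request are also ignored at block time, which matches the `time_poll(NULL, 0, NULL, ...)` branch in the patch.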
[ovs-dev] [PATCH v2 2/2] Minimize the number of time calls in time_poll()
From: Anton Ivanov time_poll() makes an excessive number of time_msec() calls which incur a performance penalty. 1. Avoid time_msec() call for timeout calculation when time_poll() is asked to skip poll() 2. Reuse the time_msec() result from deadline calculation for last_wakeup and timeout calculation. Signed-off-by: Anton Ivanov --- lib/timeval.c | 36 +--- 1 file changed, 21 insertions(+), 15 deletions(-) diff --git a/lib/timeval.c b/lib/timeval.c index c6ac87376..64ab22e05 100644 --- a/lib/timeval.c +++ b/lib/timeval.c @@ -287,7 +287,7 @@ time_poll(struct pollfd *pollfds, int n_pollfds, HANDLE *handles OVS_UNUSED, long long int timeout_when, int *elapsed) { long long int *last_wakeup = last_wakeup_get(); -long long int start; +long long int start, now; bool quiescent; int retval = 0; @@ -297,28 +297,31 @@ time_poll(struct pollfd *pollfds, int n_pollfds, HANDLE *handles OVS_UNUSED, if (*last_wakeup && !thread_is_pmd()) { log_poll_interval(*last_wakeup); } -start = time_msec(); +now = start = time_msec(); timeout_when = MIN(timeout_when, deadline); quiescent = ovsrcu_is_quiescent(); for (;;) { -long long int now = time_msec(); int time_left; -if (now >= timeout_when) { +if (n_pollfds == 0) { time_left = 0; -} else if ((unsigned long long int) timeout_when - now > INT_MAX) { -time_left = INT_MAX; } else { -time_left = timeout_when - now; -} - -if (!quiescent) { -if (!time_left) { -ovsrcu_quiesce(); +if (now >= timeout_when) { +time_left = 0; +} else if ((unsigned long long int) timeout_when - now > INT_MAX) { +time_left = INT_MAX; } else { -ovsrcu_quiesce_start(); +time_left = timeout_when - now; +} + +if (!quiescent) { +if (!time_left) { +ovsrcu_quiesce(); +} else { +ovsrcu_quiesce_start(); +} } } @@ -329,6 +332,8 @@ time_poll(struct pollfd *pollfds, int n_pollfds, HANDLE *handles OVS_UNUSED, */ if (n_pollfds != 0) { retval = poll(pollfds, n_pollfds, time_left); +} else { +retval = 0; } if (retval < 0) { retval = -errno; @@ -355,7 +360,8 @@ time_poll(struct pollfd 
*pollfds, int n_pollfds, HANDLE *handles OVS_UNUSED, ovsrcu_quiesce_end(); } -if (deadline <= time_msec()) { +now = time_msec(); +if (deadline <= now) { #ifndef _WIN32 fatal_signal_handler(SIGALRM); #else @@ -372,7 +378,7 @@ time_poll(struct pollfd *pollfds, int n_pollfds, HANDLE *handles OVS_UNUSED, break; } } -*last_wakeup = time_msec(); +*last_wakeup = now; refresh_rusage(); *elapsed = *last_wakeup - start; return retval; -- 2.20.1 ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
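The timeout arithmetic in `time_poll()` can be isolated into a pure function that takes an already-sampled `now`. This is the idea behind the second patch: one clock read feeds the timeout, the deadline check, and `last_wakeup`. The function name is hypothetical. The clamping logic mirrors the hunk above, including the shortcut when there are no fds to poll.

```c
#include <assert.h>
#include <limits.h>

/* Milliseconds to pass to poll(), computed without reading the clock. */
static int
sketch_poll_timeout(int n_pollfds, long long int now,
                    long long int timeout_when)
{
    if (n_pollfds == 0) {
        return 0;                    /* poll() is skipped entirely. */
    }
    if (now >= timeout_when) {
        return 0;                    /* Already expired. */
    } else if ((unsigned long long int) timeout_when - now > INT_MAX) {
        return INT_MAX;              /* Clamp, e.g. for LLONG_MAX ("forever"). */
    } else {
        return (int) (timeout_when - now);
    }
}
```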
Re: [ovs-dev] [v9 01/12] dpif-netdev: Add command line and function pointer for miniflow extract
Aaron Conole writes: > Ilya Maximets writes: > >> On 7/12/21 5:10 PM, David Marchand wrote: >>> On Mon, Jul 12, 2021 at 4:43 PM Ilya Maximets wrote: >> ovsrobot has issues with reporting the status right now, but this >> patch fails the build in GHA: >> https://github.com/ovsrobot/ovs/actions/runs/1021787643 > > Thanks for linking on results. > > I've spot-checked a bunch of the failing builds, and found 2 fixable code > issues. > A few of the CI run's I can't find/explain the error, but I don't know of > a good > way to "jump to the error" line, am I missing a trick, or is scrolling > the whole > compiler output and checking errors the best method? typing 'error:' in the 'Search logs' field, usually gets you to the actual error faster, but, unfortunately, scrolling is the most reliable option. >>> >>> GHA ui jumps at the last line of a failing step, but the problem is >>> that, in OVS, we dump all logs which adds a lot of noise. >>> >>> We could stop dumping them, since those logs are attached to the job >>> as an archive. >>> Like what is done in DPDK. >>> http://git.dpdk.org/dpdk/tree/.ci/linux-build.sh#n3 >>> >>> WDYT? >> >> Yes, that is good thing to do. We didn't do that because of >> Travis CI, where we have no artifacts collected. > > +1 - we should bend over backwards to make things easier on Travis CI to > the detriment of other platforms. And by this, I mean the opposite - we should *NOT* bend over backwards to make things easier on Travis CI. >> But yes, checking for [ -n "$GITHUB_WORKFLOW" ] is a solution. >> >> Best regards, Ilya Maximets. ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [v9 01/12] dpif-netdev: Add command line and function pointer for miniflow extract
Ilya Maximets writes: > On 7/12/21 5:10 PM, David Marchand wrote: >> On Mon, Jul 12, 2021 at 4:43 PM Ilya Maximets wrote: > ovsrobot has issues with reporting the status right now, but this > patch fails the build in GHA: > https://github.com/ovsrobot/ovs/actions/runs/1021787643 Thanks for linking on results. I've spot-checked a bunch of the failing builds, and found 2 fixable code issues. A few of the CI run's I can't find/explain the error, but I don't know of a good way to "jump to the error" line, am I missing a trick, or is scrolling the whole compiler output and checking errors the best method? >>> >>> typing 'error:' in the 'Search logs' field, usually gets you >>> to the actual error faster, but, unfortunately, scrolling is >>> the most reliable option. >> >> GHA ui jumps at the last line of a failing step, but the problem is >> that, in OVS, we dump all logs which adds a lot of noise. >> >> We could stop dumping them, since those logs are attached to the job >> as an archive. >> Like what is done in DPDK. >> http://git.dpdk.org/dpdk/tree/.ci/linux-build.sh#n3 >> >> WDYT? > > Yes, that is good thing to do. We didn't do that because of > Travis CI, where we have no artifacts collected. +1 - we should bend over backwards to make things easier on Travis CI to the detriment of other platforms. > But yes, checking for [ -n "$GITHUB_WORKFLOW" ] is a solution. > > Best regards, Ilya Maximets. ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [PATCH ovn] controller: Add stopwatch to measure OF update duration.
I pushed this to master. On 7/8/21 4:27 PM, Mark Michelson wrote: Acked-by: Mark Michelson On 7/6/21 10:41 AM, Dumitru Ceara wrote: Also, shorten the CONTROLLER_LOOP_STOPWATCH_NAME name as there is a bug in lib/stopwatch.c which fails to report an error when the stopwatch name is longer than 32 characters. CONTROLLER_LOOP_STOPWATCH_NAME was getting very close to that and future commits might mimic the long name and happen to go over the limit. Signed-off-by: Dumitru Ceara --- Note: The OVS lib/stopwatch.c implementation should also be fixed to report an error (or even assert) if the name supplied to stopwatch_create() is longer than 32 characters. But that's out of the scope of this patch. --- controller/ovn-controller.c | 7 ++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/controller/ovn-controller.c b/controller/ovn-controller.c index 9050380f3..6a9c25f28 100644 --- a/controller/ovn-controller.c +++ b/controller/ovn-controller.c @@ -93,7 +93,8 @@ static unixctl_cb_func debug_delay_nb_cfg_report; #define DEFAULT_PROBE_INTERVAL_MSEC 5000 #define OFCTRL_DEFAULT_PROBE_INTERVAL_SEC 0 -#define CONTROLLER_LOOP_STOPWATCH_NAME "ovn-controller-flow-generation" +#define CONTROLLER_LOOP_STOPWATCH_NAME "flow-generation" +#define OFCTRL_PUT_STOPWATCH_NAME "flow-installation" #define OVS_NB_CFG_NAME "ovn-nb-cfg" @@ -2845,6 +2846,7 @@ main(int argc, char *argv[]) update_sb_monitors(ovnsb_idl_loop.idl, NULL, NULL, NULL, false); stopwatch_create(CONTROLLER_LOOP_STOPWATCH_NAME, SW_MS); + stopwatch_create(OFCTRL_PUT_STOPWATCH_NAME, SW_MS); /* Define inc-proc-engine nodes. 
*/ ENGINE_NODE_CUSTOM_DATA(ct_zones, "ct_zones"); @@ -3292,6 +3294,8 @@ main(int argc, char *argv[]) pflow_output_data = engine_get_data(_pflow_output); if (lflow_output_data && pflow_output_data && ct_zones_data) { + stopwatch_start(OFCTRL_PUT_STOPWATCH_NAME, + time_msec()); ofctrl_put(_output_data->flow_table, _output_data->flow_table, _zones_data->pending, @@ -3299,6 +3303,7 @@ main(int argc, char *argv[]) ofctrl_seqno_get_req_cfg(), engine_node_changed(_lflow_output), engine_node_changed(_pflow_output)); + stopwatch_stop(OFCTRL_PUT_STOPWATCH_NAME, time_msec()); } ofctrl_seqno_run(ofctrl_get_cur_cfg()); if_status_mgr_run(if_mgr, binding_data, !ovnsb_idl_txn, ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
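Until `stopwatch_create()` reports over-long names, a caller-side guard could catch them. This minimal sketch assumes the limit is 32 characters excluding the terminator, as the note states. `sketch_stopwatch_name_ok()` is a hypothetical helper and is not part of lib/stopwatch.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Limit quoted in the patch note; treat the exact bound as an assumption. */
#define SKETCH_STOPWATCH_NAME_MAX 32

/* Returns true if 'name' fits within the stopwatch name limit. */
static bool
sketch_stopwatch_name_ok(const char *name)
{
    return name != NULL && strlen(name) <= SKETCH_STOPWATCH_NAME_MAX;
}
```

The old name, "ovn-controller-flow-generation", is 30 characters long. That is why the note calls it "very close" to the limit.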
Re: [ovs-dev] [PATCH ovn] northd-ddlog: Fix IP family match for DNAT flows.
I pushed this to master and branch-21.06. On 7/8/21 4:19 PM, Mark Michelson wrote: Acked-by: Mark Michelson On 7/7/21 10:09 AM, Dumitru Ceara wrote: This was causing some IPv6 system tests to fail when run with ovn-northd-ddlog. Also fix cleanup of the northd process in system-ovn.at. A few tests were trying to stop ovn-northd (C version) even when run with ovn-northd-ddlog. Signed-off-by: Dumitru Ceara --- Note: There are some system-ovn.at tests that still fail with ovn-northd-ddlog and need more investigation to see if it's a test issue or a real bug. --- northd/ovn_northd.dl | 2 +- tests/system-ovn.at | 24 2 files changed, 13 insertions(+), 13 deletions(-) diff --git a/northd/ovn_northd.dl b/northd/ovn_northd.dl index e27c944a0..dea13a91f 100644 --- a/northd/ovn_northd.dl +++ b/northd/ovn_northd.dl @@ -5687,7 +5687,7 @@ for (r in (._uuid = lr_uuid, } in if (nat.nat.__type == "dnat" or nat.nat.__type == "dnat_and_snat") { None = l3dgw_port in - var __match = "ip && ip4.dst == ${nat.nat.external_ip}" in + var __match = "ip && ${ipX}.dst == ${nat.nat.external_ip}" in (var ext_ip_match, var ext_flow) = lrouter_nat_add_ext_ip_match( r, nat, __match, ipX, true, mask) in { diff --git a/tests/system-ovn.at b/tests/system-ovn.at index f42cfc0db..c01fde131 100644 --- a/tests/system-ovn.at +++ b/tests/system-ovn.at @@ -1348,7 +1348,7 @@ as ovn-nb OVS_APP_EXIT_AND_WAIT([ovsdb-server]) as northd -OVS_APP_EXIT_AND_WAIT([ovn-northd]) +OVS_APP_EXIT_AND_WAIT([NORTHD_TYPE]) as OVS_TRAFFIC_VSWITCHD_STOP(["/failed to query port patch-.*/d @@ -3121,7 +3121,7 @@ as ovn-nb OVS_APP_EXIT_AND_WAIT([ovsdb-server]) as northd -OVS_APP_EXIT_AND_WAIT([ovn-northd]) +OVS_APP_EXIT_AND_WAIT([NORTHD_TYPE]) as OVS_TRAFFIC_VSWITCHD_STOP(["/failed to query port patch-.*/d @@ -4577,7 +4577,7 @@ as ovn-nb OVS_APP_EXIT_AND_WAIT([ovsdb-server]) as northd -OVS_APP_EXIT_AND_WAIT([ovn-northd]) +OVS_APP_EXIT_AND_WAIT([NORTHD_TYPE]) as OVS_TRAFFIC_VSWITCHD_STOP(["/failed to query port patch-.*/d @@ -4663,7 
+4663,7 @@ as ovn-nb OVS_APP_EXIT_AND_WAIT([ovsdb-server]) as northd -OVS_APP_EXIT_AND_WAIT([ovn-northd]) +OVS_APP_EXIT_AND_WAIT([NORTHD_TYPE]) as OVS_TRAFFIC_VSWITCHD_STOP(["/failed to query port patch-.*/d @@ -4903,7 +4903,7 @@ as ovn-nb OVS_APP_EXIT_AND_WAIT([ovsdb-server]) as northd -OVS_APP_EXIT_AND_WAIT([ovn-northd]) +OVS_APP_EXIT_AND_WAIT([NORTHD_TYPE]) as OVS_TRAFFIC_VSWITCHD_STOP(["/failed to query port patch-.*/d @@ -5287,7 +5287,7 @@ as ovn-nb OVS_APP_EXIT_AND_WAIT([ovsdb-server]) as northd -OVS_APP_EXIT_AND_WAIT([ovn-northd]) +OVS_APP_EXIT_AND_WAIT([NORTHD_TYPE]) as OVS_TRAFFIC_VSWITCHD_STOP(["/failed to query port patch-.*/d @@ -5717,7 +5717,7 @@ as ovn-nb OVS_APP_EXIT_AND_WAIT([ovsdb-server]) as northd -OVS_APP_EXIT_AND_WAIT([ovn-northd]) +OVS_APP_EXIT_AND_WAIT([NORTHD_TYPE]) as OVS_TRAFFIC_VSWITCHD_STOP(["/failed to query port patch-.*/d @@ -5879,7 +5879,7 @@ as ovn-nb OVS_APP_EXIT_AND_WAIT([ovsdb-server]) as northd -OVS_APP_EXIT_AND_WAIT([ovn-northd]) +OVS_APP_EXIT_AND_WAIT([NORTHD_TYPE]) as OVS_TRAFFIC_VSWITCHD_STOP(["/failed to query port patch-.*/d @@ -5928,7 +5928,7 @@ as ovn-nb OVS_APP_EXIT_AND_WAIT([ovsdb-server]) as northd -OVS_APP_EXIT_AND_WAIT([ovn-northd]) +OVS_APP_EXIT_AND_WAIT([NORTHD_TYPE]) as OVS_TRAFFIC_VSWITCHD_STOP(["/failed to query port patch-.*/d @@ -6021,7 +6021,7 @@ as ovn-nb OVS_APP_EXIT_AND_WAIT([ovsdb-server]) as northd -OVS_APP_EXIT_AND_WAIT([ovn-northd]) +OVS_APP_EXIT_AND_WAIT([NORTHD_TYPE]) as OVS_TRAFFIC_VSWITCHD_STOP(["/.*error receiving.*/d @@ -6083,7 +6083,7 @@ as ovn-nb OVS_APP_EXIT_AND_WAIT([ovsdb-server]) as northd -OVS_APP_EXIT_AND_WAIT([ovn-northd]) +OVS_APP_EXIT_AND_WAIT([NORTHD_TYPE]) as OVS_TRAFFIC_VSWITCHD_STOP(["/.*error receiving.*/d @@ -6234,7 +6234,7 @@ as ovn-nb OVS_APP_EXIT_AND_WAIT([ovsdb-server]) as northd -OVS_APP_EXIT_AND_WAIT([ovn-northd]) +OVS_APP_EXIT_AND_WAIT([NORTHD_TYPE]) as OVS_TRAFFIC_VSWITCHD_STOP(["/.*error receiving.*/d ___ dev mailing list d...@openvswitch.org 
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
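The bug was a hard-coded `ip4` in the DNAT match, so IPv6 external IPs produced a match that could never hit. The same idea is shown below in C as an illustration only. `sketch_dnat_match()` is hypothetical and assumes that any ':' in the external IP marks an IPv6 literal, whereas the DDlog code uses its own `ipX` value.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Writes "ip && ip4.dst == X" or "ip && ip6.dst == X" into 'buf'.
 * Returns the snprintf() result (length the full string would need). */
static int
sketch_dnat_match(char *buf, size_t size, const char *external_ip)
{
    const char *ipx = strchr(external_ip, ':') ? "ip6" : "ip4";

    return snprintf(buf, size, "ip && %s.dst == %s", ipx, external_ip);
}
```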
Re: [ovs-dev] [v9 06/12] dpif-netdev: Add packet count and core id paramters for study
On 12 Jul 2021, at 7:51, kumar Amber wrote: > From: Kumar Amber > > This commit introduces additional command line paramter > for mfex study function. If user provides additional packet out > it is used in study to compare minimum packets which must be processed > else a default value is choosen. > Also introduces a third paramter for choosing a particular pmd core. > > $ ovs-appctl dpif-netdev/miniflow-parser-set study 500 3 > > Signed-off-by: Kumar Amber > > --- > v9: > - fix review comments Flavio > v7: > - change the command paramters for core_id and study_pkt_cnt > v5: > - fix review comments(Ian, Flavio, Eelco) > - introucde pmd core id parameter > --- > --- > Documentation/topics/dpdk/bridge.rst | 39 - > lib/dpif-netdev-extract-study.c | 26 +- > lib/dpif-netdev-private-extract.h| 9 ++ > lib/dpif-netdev.c| 121 +-- > 4 files changed, 181 insertions(+), 14 deletions(-) > > diff --git a/Documentation/topics/dpdk/bridge.rst > b/Documentation/topics/dpdk/bridge.rst > index 4db416ddd..c31067c51 100644 > --- a/Documentation/topics/dpdk/bridge.rst > +++ b/Documentation/topics/dpdk/bridge.rst > @@ -284,12 +284,45 @@ command also shows whether the CPU supports each > implementation :: > > An implementation can be selected manually by the following command :: > > -$ ovs-appctl dpif-netdev/miniflow-parser-set study > +$ ovs-appctl dpif-netdev/miniflow-parser-set [-pmd core_id] [name] > + [study_cnt] > > -Also user can select the study implementation which studies the traffic for > +The above command has two optional parameters: study_cnt and core_id. > +The core_id set a particular miniflow extract function to a specific The core_id sets > +pmd thread on the core. Third parameter study_cnt, which is specific The third parameter > +to study and ignored by other implementations, means how many packets > +are needed to choose the best implementation. 
> + > +The user can select the study implementation which studies the traffic for > a specific number of packets by applying all available implementaions of implementations > miniflow extract and than chooses the one with most optimal result for that and then chooses ... with the most optimal > -traffic pattern. > +traffic pattern. The user can optionally provide an packet count [study_cnt] > +parameter which is the minimum number of packets that OVS must study before > +choosing an optimal implementation. If no packet count is provided, then the > +default value, 128 is chosen. Also, as there is no synchronization point > +between threads, one PMD thread might still be running a previous round, > +and can now decide on earlier data. > + > +The per packet count is a global value, and parallel `study()` executions > with Should study() just be study? > +differing packet counts will use the most recent count value provided by > usser. > + > +Study can be selected with packet count by the following command :: > + > +$ ovs-appctl dpif-netdev/miniflow-parser-set study 1024 > + > +Study can be selected with packet count and explicit PMD selection > +by the following command :: > + > +$ ovs-appctl dpif-netdev/miniflow-parser-set -pmd 3 study 1024 > + > +In the above command the last parameter is the CORE ID of the PMD > +thread and this can also be used to explicitly set the miniflow > +extraction function pointer on different PMD threads. 
> + > +Scalar can be selected on core 3 by the following command where > +study count can be put as any arbitary number or left blank:: arbitrary > + > +$ ovs-appctl dpif-netdev/miniflow-parser-set -pmd 3 scalar > > Miniflow Extract Validation > ~~~ > diff --git a/lib/dpif-netdev-extract-study.c b/lib/dpif-netdev-extract-study.c > index a19759bd9..2dc3faf83 100644 > --- a/lib/dpif-netdev-extract-study.c > +++ b/lib/dpif-netdev-extract-study.c > @@ -25,7 +25,7 @@ > > VLOG_DEFINE_THIS_MODULE(dpif_mfex_extract_study); > > -static uint32_t mfex_study_pkts_count = 0; > +static uint32_t mfex_study_pkts_count = MFEX_MAX_PKT_COUNT; > > /* Struct to hold miniflow study stats. */ > struct study_stats { > @@ -48,6 +48,28 @@ mfex_study_get_study_stats_ptr(void) > return stats; > } > > +uint32_t mfex_set_study_pkt_cnt(uint32_t pkt_cmp_count, > +const char *name) This needs to be int, not uint32_t as you return a negative value on error. > +{ > +struct dpif_miniflow_extract_impl *miniflow_funcs; > +dpif_mfex_impl_info_get(_funcs); > + > +/* If the packet count is set and implementation called is study then > + * set packet counter to requested number else set the packet counter > + * to default number. > + */ > +if ((strcmp(miniflow_funcs[MFEX_IMPL_STUDY].name, name) == 0) && > +(pkt_cmp_count != 0)) { > + > +atomic_uintptr_t *study_pck_cnt = (void *)_study_pkts_count; > +
Re: [ovs-dev] [PATCH v2 ovn 2/2] controller: incrementally create ras port_binding list
Acked-by: Mark Michelson On 7/10/21 6:13 AM, Lorenzo Bianconi wrote: Incrementally manage local_active_ports_ras map for interfaces where periodic router advertisement has been enabled. This patch allows to avoid looping over all local interfaces to check if periodic RA is running on the current port binding. Signed-off-by: Lorenzo Bianconi --- controller/binding.c| 7 +++ controller/binding.h| 1 + controller/ovn-controller.c | 10 +++- controller/pinctrl.c| 93 - controller/pinctrl.h| 3 +- 5 files changed, 69 insertions(+), 45 deletions(-) diff --git a/controller/binding.c b/controller/binding.c index f87eaec0c..b1b1e3b84 100644 --- a/controller/binding.c +++ b/controller/binding.c @@ -1672,6 +1672,9 @@ binding_run(struct binding_ctx_in *b_ctx_in, struct binding_ctx_out *b_ctx_out) update_active_pb_ras_pd(pb, b_ctx_out->local_datapaths, b_ctx_out->local_active_ports_ipv6_pd, "ipv6_prefix_delegation"); +update_active_pb_ras_pd(pb, b_ctx_out->local_datapaths, +b_ctx_out->local_active_ports_ras, +"ipv6_ra_send_periodic"); enum en_lport_type lport_type = get_lport_type(pb); @@ -2514,6 +2517,10 @@ delete_done: b_ctx_out->local_active_ports_ipv6_pd, "ipv6_prefix_delegation"); +update_active_pb_ras_pd(pb, b_ctx_out->local_datapaths, +b_ctx_out->local_active_ports_ras, +"ipv6_ra_send_periodic"); + enum en_lport_type lport_type = get_lport_type(pb); struct binding_lport *b_lport = diff --git a/controller/binding.h b/controller/binding.h index 60ad49da0..77197e742 100644 --- a/controller/binding.h +++ b/controller/binding.h @@ -73,6 +73,7 @@ void related_lports_destroy(struct related_lports *); struct binding_ctx_out { struct hmap *local_datapaths; struct shash *local_active_ports_ipv6_pd; +struct shash *local_active_ports_ras; struct local_binding_data *lbinding_data; /* sset of (potential) local lports. 
*/ diff --git a/controller/ovn-controller.c b/controller/ovn-controller.c index 2bd402ab2..db2d82035 100644 --- a/controller/ovn-controller.c +++ b/controller/ovn-controller.c @@ -1030,6 +1030,7 @@ struct ed_type_runtime_data { struct hmap tracked_dp_bindings; struct shash local_active_ports_ipv6_pd; +struct shash local_active_ports_ras; }; /* struct ed_type_runtime_data has the below members for tracking the @@ -1118,6 +1119,7 @@ en_runtime_data_init(struct engine_node *node OVS_UNUSED, smap_init(>local_iface_ids); local_binding_data_init(>lbinding_data); shash_init(>local_active_ports_ipv6_pd); +shash_init(>local_active_ports_ras); /* Init the tracked data. */ hmap_init(>tracked_dp_bindings); @@ -1144,6 +1146,7 @@ en_runtime_data_cleanup(void *data) } hmap_destroy(_data->local_datapaths); shash_destroy(_data->local_active_ports_ipv6_pd); +shash_destroy(_data->local_active_ports_ras); local_binding_data_destroy(_data->lbinding_data); } @@ -1224,6 +1227,8 @@ init_binding_ctx(struct engine_node *node, b_ctx_out->local_datapaths = _data->local_datapaths; b_ctx_out->local_active_ports_ipv6_pd = _data->local_active_ports_ipv6_pd; +b_ctx_out->local_active_ports_ras = +_data->local_active_ports_ras; b_ctx_out->local_lports = _data->local_lports; b_ctx_out->local_lports_changed = false; b_ctx_out->related_lports = _data->related_lports; @@ -1242,6 +1247,7 @@ en_runtime_data_run(struct engine_node *node, void *data) struct ed_type_runtime_data *rt_data = data; struct hmap *local_datapaths = _data->local_datapaths; struct shash *local_active_ipv6_pd = _data->local_active_ports_ipv6_pd; +struct shash *local_active_ras = _data->local_active_ports_ras; struct sset *local_lports = _data->local_lports; struct sset *active_tunnels = _data->active_tunnels; @@ -1258,6 +1264,7 @@ en_runtime_data_run(struct engine_node *node, void *data) } hmap_clear(local_datapaths); shash_clear(local_active_ipv6_pd); +shash_clear(local_active_ras); local_binding_data_destroy(_data->lbinding_data); 
sset_destroy(local_lports); related_lports_destroy(_data->related_lports); @@ -3272,7 +3279,8 @@ main(int argc, char *argv[]) br_int, chassis, _data->local_datapaths, _data->active_tunnels, -_data->local_active_ports_ipv6_pd); +_data->local_active_ports_ipv6_pd, +
Re: [ovs-dev] [PATCH v2 ovn 1/2] controller: incrementally create ipv6 prefix delegation port_binding list
For the approach, Acked-by: Mark Michelson I have one final suggestion down below. On 7/10/21 6:13 AM, Lorenzo Bianconi wrote: Incrementally manage local_active_ports_ipv6_pd map for interfaces where IPv6 prefix-delegation has been enabled. This patch allows to avoid looping over all local interfaces to check if prefix-delegation is running on the current port binding. Signed-off-by: Lorenzo Bianconi --- controller/binding.c| 32 +++ controller/binding.h| 1 + controller/ovn-controller.c | 11 +++- controller/ovn-controller.h | 6 ++ controller/pinctrl.c| 107 +--- controller/pinctrl.h| 4 +- 6 files changed, 103 insertions(+), 58 deletions(-) diff --git a/controller/binding.c b/controller/binding.c index 594babc98..f87eaec0c 100644 --- a/controller/binding.c +++ b/controller/binding.c @@ -574,6 +574,30 @@ remove_related_lport(const struct sbrec_port_binding *pb, } } +static void +update_active_pb_ras_pd(const struct sbrec_port_binding *pb, +struct hmap *local_datapaths, +struct shash *map, const char *conf) +{ +const char *ras_pd_conf = smap_get(>options, conf); Since ras_pd_conf being "false" is the same as if it did not exist in the configuration, you could change this to: bool ras_pd_conf = smap_get_bool(>options, conf, false); Then you can just do boolean comparisons on ras_pd_conf instead of string comparisons. +struct shash_node *iter = shash_find(map, pb->logical_port); + +if (iter && (!ras_pd_conf || !strcmp(ras_pd_conf, "false"))) { +shash_delete(map, iter); +return; +} +struct pb_ld_binding *ras_pd = NULL; +if (!iter && ras_pd_conf && !strcmp(ras_pd_conf, "true")) { +ras_pd = xzalloc(sizeof *ras_pd); +ras_pd->pb = pb; +shash_add(map, pb->logical_port, ras_pd); +} +if (ras_pd) { +ras_pd->ld = get_local_datapath(local_datapaths, +pb->datapath->tunnel_key); +} +} + /* Corresponds to each Port_Binding.type. 
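Mark's suggestion can be sketched as a standalone decision function. This is a minimal sketch, not OVN code. `option_enabled()` is a hypothetical stand-in that mimics how OVS's `smap_get_bool(..., false)` interprets a value: only a case-insensitive "true" enables the option, so a missing key and "false" behave the same. `ras_pd_decide()` is a hypothetical name for the add/delete decision made in `update_active_pb_ras_pd()`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <strings.h>

/* Hypothetical stand-in for smap_get_bool(options, key, false).
 * 'value' is NULL when the key is absent.  Only "true" (compared
 * case-insensitively) enables the option, so NULL and "false" are
 * treated identically. */
static bool
option_enabled(const char *value)
{
    return value && !strcasecmp(value, "true");
}

enum ras_pd_action {
    RAS_PD_KEEP,      /* The map already matches the configuration. */
    RAS_PD_ADD,       /* Enabled, but the port is not tracked yet. */
    RAS_PD_DELETE,    /* Tracked, but the option is now off. */
};

/* Decide how to update the per-port map from two booleans instead of
 * repeated string comparisons. */
static enum ras_pd_action
ras_pd_decide(bool tracked, bool enabled)
{
    if (tracked && !enabled) {
        return RAS_PD_DELETE;
    }
    if (!tracked && enabled) {
        return RAS_PD_ADD;
    }
    return RAS_PD_KEEP;
}
```

In the patch, `tracked` would correspond to `shash_find(map, pb->logical_port)` returning a non-NULL node.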
*/ enum en_lport_type { LP_UNKNOWN, @@ -1645,6 +1669,10 @@ binding_run(struct binding_ctx_in *b_ctx_in, struct binding_ctx_out *b_ctx_out) const struct sbrec_port_binding *pb; SBREC_PORT_BINDING_TABLE_FOR_EACH (pb, b_ctx_in->port_binding_table) { +update_active_pb_ras_pd(pb, b_ctx_out->local_datapaths, +b_ctx_out->local_active_ports_ipv6_pd, +"ipv6_prefix_delegation"); + enum en_lport_type lport_type = get_lport_type(pb); switch (lport_type) { @@ -2482,6 +2510,10 @@ delete_done: continue; } +update_active_pb_ras_pd(pb, b_ctx_out->local_datapaths, +b_ctx_out->local_active_ports_ipv6_pd, +"ipv6_prefix_delegation"); + enum en_lport_type lport_type = get_lport_type(pb); struct binding_lport *b_lport = diff --git a/controller/binding.h b/controller/binding.h index a08011ae2..60ad49da0 100644 --- a/controller/binding.h +++ b/controller/binding.h @@ -72,6 +72,7 @@ void related_lports_destroy(struct related_lports *); struct binding_ctx_out { struct hmap *local_datapaths; +struct shash *local_active_ports_ipv6_pd; struct local_binding_data *lbinding_data; /* sset of (potential) local lports. */ diff --git a/controller/ovn-controller.c b/controller/ovn-controller.c index 9050380f3..2bd402ab2 100644 --- a/controller/ovn-controller.c +++ b/controller/ovn-controller.c @@ -1028,6 +1028,8 @@ struct ed_type_runtime_data { bool tracked; bool local_lports_changed; struct hmap tracked_dp_bindings; + +struct shash local_active_ports_ipv6_pd; }; /* struct ed_type_runtime_data has the below members for tracking the @@ -1115,6 +1117,7 @@ en_runtime_data_init(struct engine_node *node OVS_UNUSED, sset_init(>egress_ifaces); smap_init(>local_iface_ids); local_binding_data_init(>lbinding_data); +shash_init(>local_active_ports_ipv6_pd); /* Init the tracked data. 
*/ hmap_init(>tracked_dp_bindings); @@ -1140,6 +1143,7 @@ en_runtime_data_cleanup(void *data) free(cur_node); } hmap_destroy(_data->local_datapaths); +shash_destroy(_data->local_active_ports_ipv6_pd); local_binding_data_destroy(_data->lbinding_data); } @@ -1218,6 +1222,8 @@ init_binding_ctx(struct engine_node *node, b_ctx_in->ovs_table = ovs_table; b_ctx_out->local_datapaths = _data->local_datapaths; +b_ctx_out->local_active_ports_ipv6_pd = +_data->local_active_ports_ipv6_pd; b_ctx_out->local_lports = _data->local_lports; b_ctx_out->local_lports_changed = false; b_ctx_out->related_lports = _data->related_lports; @@
Re: [ovs-dev] [v9 01/12] dpif-netdev: Add command line and function pointer for miniflow extract
On 7/12/21 4:57 PM, Van Haaren, Harry wrote: >> -Original Message- >> From: Ilya Maximets >> Sent: Monday, July 12, 2021 3:43 PM >> To: Van Haaren, Harry ; Ilya Maximets >> ; Amber, Kumar ; ovs- >> d...@openvswitch.org >> Cc: f...@sysclose.org; echau...@redhat.com; Ferriter, Cian >> ; Stokes, Ian >> Subject: Re: [v9 01/12] dpif-netdev: Add command line and function pointer >> for >> miniflow extract >> >> On 7/12/21 4:02 PM, Van Haaren, Harry wrote: -Original Message- From: Ilya Maximets Sent: Monday, July 12, 2021 2:25 PM To: Amber, Kumar ; ovs-dev@openvswitch.org Cc: f...@sysclose.org; echau...@redhat.com; i.maxim...@ovn.org; Van Haaren, Harry ; Ferriter, Cian ; Stokes, Ian Subject: Re: [v9 01/12] dpif-netdev: Add command line and function pointer for miniflow extract On 7/12/21 7:51 AM, kumar Amber wrote: > From: Kumar Amber > > This patch introduces the MFEX function pointers which allows > the user to switch between different miniflow extract implementations > which are provided by the OVS based on optimized ISA CPU. > > The user can query for the available minflow extract variants available > for that CPU by following commands: > > $ovs-appctl dpif-netdev/miniflow-parser-get > > Similarly an user can set the miniflow implementation by the following > command : > > $ ovs-appctl dpif-netdev/miniflow-parser-set name > > This allows for more performance and flexibility to the user to choose > the miniflow implementation according to the needs. 
> > Signed-off-by: Kumar Amber > Co-authored-by: Harry van Haaren > Signed-off-by: Harry van Haaren > > --- > v9: > - fix review comments from Flavio > v7: > - fix review comments(Eelco, Flavio) > v5: > - fix review comments(Ian, Flavio, Eelco) > - add enum to hold mfex indexes > - add new get and set implemenatations > - add Atomic set and get > --- ovsrobot has issues with reporting the status right now, but this patch fails the build in GHA: https://github.com/ovsrobot/ovs/actions/runs/1021787643 >>> >>> Thanks for linking on results. >>> >>> I've spot-checked a bunch of the failing builds, and found 2 fixable code >>> issues. >>> A few of the CI run's I can't find/explain the error, but I don't know of a >>> good >>> way to "jump to the error" line, am I missing a trick, or is scrolling the >>> whole >>> compiler output and checking errors the best method? >> >> typing 'error:' in the 'Search logs' field, usually gets you >> to the actual error faster, but, unfortunately, scrolling is >> the most reliable option. > > Okay, thanks. > > >>> ISSUES: >>> #1 : OVS Requires Mutex issue (Linux clang test dpdk build) >>> 1291../../lib/dpif-netdev-private-extract.h:87:53: error: use of undeclared >> identifier 'dp_netdev_mutex'; did you mean 'dp_netdev_input'? >>> 1292 size_t pmd_list_size) OVS_REQUIRES(dp_netdev_mutex); >>> >>> #2 : Unused Argument (As from mailing list review comment too, linux gcc >>> dpdk -- >> enable-shared) >>> 2353lib/dpif-netdev.c:1079:63: error: unused parameter ‘argc’ >>> [-Werror=unused- >> parameter] >>> 2354 dpif_miniflow_extract_impl_set(struct unixctl_conn *conn, int argc, >>> >>> #3 : Distcheck directory not valid? (linux gcc test 3.16 build. I cannot >>> explain this?) 
>>> make: *** [distcheck] Error 1
>>> Makefile:5298: recipe for target 'distcheck' failed
>>> + cat '*/_build/sub/tests/testsuite.log'
>>> cat: '*/_build/sub/tests/testsuite.log': No such file or directory
>>> Error: Process completed with exit code 1.
>>>
>>> SOLUTIONS:
>>> #1, likely to forward-decl the "dp_netdev_mutex" to make it available
>>> in the extract header file, and remove the "static" keyword so its no
>>> longer limited to the dpif-netdev.c compilation unit.
>>>
>>> #2 is a simple OVS_UNUSED as Eelco suggested during review.
>>>
>>> #3, I'm not sure where the DistCheck issue arise from, it seems to be
>>> missing directories during the test run? Input appreciated, as pushing
>>> & hoping tends to be a tiresome and long process.
>>
>> This is just a result of the previous build failure. Build
>> never reached the testsuite phase, so there are no testsuite
>> logs there. You should not see this problem once build is
>> fixed.
>
> Aha, good to know. Then a respin with the fixes for the above issues is
> our next step, will arrive on the mailing list soon.

If you have a GitHub account, it might be good to push patches one by one
there to be sure that everything is fine before sending them to the mailing
list, to avoid re-spins due to build issues.
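Fix #2 above marks the parameter as intentionally unused rather than removing it. As a hedged sketch: OVS provides `OVS_UNUSED` for this, and the `SKETCH_UNUSED` macro below is a hypothetical stand-in that assumes a GCC- or clang-compatible compiler, where `__attribute__((__unused__))` suppresses `-Wunused-parameter` (the warning that `-Werror` turned into the build failure). `parser_set_cb()` and the name "scalar" are hypothetical; this is not the real `dpif_miniflow_extract_impl_set()`.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for OVS_UNUSED (assumes GCC or clang, both of
 * which define __GNUC__). */
#if defined(__GNUC__)
#define SKETCH_UNUSED __attribute__((__unused__))
#else
#define SKETCH_UNUSED
#endif

/* The callback type dictates an 'argc' parameter, but this handler only
 * inspects argv[1], so 'argc' is annotated instead of being removed.
 * Returns 0 if argv[1] names the (hypothetical) known implementation,
 * -1 otherwise. */
static int
parser_set_cb(int argc SKETCH_UNUSED, const char *argv[])
{
    const char *name = argv[1];

    return (name && !strcmp(name, "scalar")) ? 0 : -1;
}
```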
Re: [ovs-dev] [v9 01/12] dpif-netdev: Add command line and function pointer for miniflow extract
On 7/12/21 5:10 PM, David Marchand wrote:
> On Mon, Jul 12, 2021 at 4:43 PM Ilya Maximets wrote:
>>>> ovsrobot has issues with reporting the status right now, but this
>>>> patch fails the build in GHA:
>>>> https://github.com/ovsrobot/ovs/actions/runs/1021787643
>>>
>>> Thanks for linking on results.
>>>
>>> I've spot-checked a bunch of the failing builds, and found 2 fixable code
>>> issues.
>>> A few of the CI run's I can't find/explain the error, but I don't know of a
>>> good way to "jump to the error" line, am I missing a trick, or is
>>> scrolling the whole compiler output and checking errors the best method?
>>
>> typing 'error:' in the 'Search logs' field, usually gets you
>> to the actual error faster, but, unfortunately, scrolling is
>> the most reliable option.
>
> GHA ui jumps at the last line of a failing step, but the problem is
> that, in OVS, we dump all logs which adds a lot of noise.
>
> We could stop dumping them, since those logs are attached to the job
> as an archive.
> Like what is done in DPDK.
> http://git.dpdk.org/dpdk/tree/.ci/linux-build.sh#n3
>
> WDYT?

Yes, that is a good thing to do. We didn't do that because of Travis CI,
where we have no artifacts collected. But yes, checking for
[ -n "$GITHUB_WORKFLOW" ] is a solution.

Best regards, Ilya Maximets.
Re: [ovs-dev] [PATCH ovn v3] controller: Avoid unnecessary load balancer flow processing.
Since this addressed Han's findings in v1 and I had already ACKed it, I pushed this change to the main branch. Thanks, Dumitru and Han! On 7/12/21 10:14 AM, Dumitru Ceara wrote: Whenever a Load_Balancer is updated, e.g., a VIP is added, the following sequence of events happens: 1. The Southbound Load_Balancer record is updated. 2. The Southbound Datapath_Binding records on which the Load_Balancer is applied are updated. 3. Southbound ovsdb-server sends updates about the Load_Balancer and Datapath_Binding records to ovn-controller. 4. The IDL layer in ovn-controller processes the updates at #3, but because of the SB schema references between tables [0] all logical flows referencing the updated Datapath_Binding are marked as "updated". The same is true for Logical_DP_Group records referencing the Datapath_Binding, and also for all logical flows pointing to the new "updated" datapath groups. 5. ovn-controller ends up recomputing (removing/readding) all flows for all these tracked updates. From the SB Schema: "Datapath_Binding": { "columns": { [...] "load_balancers": {"type": {"key": {"type": "uuid", "refTable": "Load_Balancer", "refType": "weak"}, "min": 0, "max": "unlimited"}}, [...] "Load_Balancer": { "columns": { "datapaths": { [...] "type": {"key": {"type": "uuid", "refTable": "Datapath_Binding"}, "min": 0, "max": "unlimited"}}, [...] "Logical_DP_Group": { "columns": { "datapaths": {"type": {"key": {"type": "uuid", "refTable": "Datapath_Binding", "refType": "weak"}, "min": 0, "max": "unlimited"}}}, [...] "Logical_Flow": { "columns": { "logical_datapath": {"type": {"key": {"type": "uuid", "refTable": "Datapath_Binding"}, "min": 0, "max": 1}}, "logical_dp_group": {"type": {"key": {"type": "uuid", "refTable": "Logical_DP_Group"}, In order to avoid this unnecessary Logical_Flow notification storm we now remove the explicit reference from Datapath_Binding to Load_Balancer and instead store raw UUIDs. 
This means that on the ovn-controller side we need to perform a Load_Balancer table lookup by UUID whenever a new datapath is added, but that doesn't happen too often and the cost of the lookup is negligible compared to the huge cost of processing the unnecessary logical flow updates. This change is backwards compatible because the contents stored in the database are not changed, just that the schema constraints are relaxed a bit. Some performance measurements, on a scale test deployment simulating an ovn-kubernetes deployment with 120 nodes and a large load balancer with 16K VIPs associated to each node's logical switch, the event processing loop time in ovn-controller, when adding a new VIP, is reduced from ~39 seconds to ~8 seconds. There's no need to change the northd DDlog implementation. Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=1978605 Acked-by: Mark Michelson Signed-off-by: Dumitru Ceara --- v3: Update SB schema version. v2: Address Han's comments and add Mark's ack. --- controller/lflow.c | 6 -- northd/ovn-northd.c | 14 ++ ovn-sb.ovsschema| 8 +++- 3 files changed, 13 insertions(+), 15 deletions(-) diff --git a/controller/lflow.c b/controller/lflow.c index 60aa011ff..c58c4f25c 100644 --- a/controller/lflow.c +++ b/controller/lflow.c @@ -1744,8 +1744,10 @@ lflow_processing_end: /* Add load balancer hairpin flows if the datapath has any load balancers * associated. */ for (size_t i = 0; i < dp->n_load_balancers; i++) { -consider_lb_hairpin_flows(dp->load_balancers[i], - l_ctx_in->local_datapaths, +const struct sbrec_load_balancer *lb = +sbrec_load_balancer_table_get_for_uuid(l_ctx_in->lb_table, + >load_balancers[i]); +consider_lb_hairpin_flows(lb, l_ctx_in->local_datapaths, l_ctx_out->flow_table); } diff --git a/northd/ovn-northd.c b/northd/ovn-northd.c index 562dc62b2..999c3f482 100644 --- a/northd/ovn-northd.c +++ b/northd/ovn-northd.c @@ -3635,19 +3635,17 @@ build_ovn_lbs(struct northd_context *ctx, struct hmap *datapaths,
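The trade-off described above (the datapath keeps raw UUIDs, and ovn-controller resolves them on demand) can be modeled in a few lines. This is a simplified sketch, not the OVSDB IDL: `struct sketch_uuid`, `struct lb_row`, and `lb_lookup()` are hypothetical, and the linear scan merely stands in for `sbrec_load_balancer_table_get_for_uuid()`, whose internals are not shown here. Because the column now holds plain UUIDs, the database no longer removes a UUID automatically when the Load_Balancer row is deleted, as it would for a weak reference, so the sketch returns NULL when a UUID has no matching row.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical 128-bit UUID, compared bytewise. */
struct sketch_uuid {
    unsigned char bytes[16];
};

/* Hypothetical Load_Balancer row: only the fields the sketch needs. */
struct lb_row {
    struct sketch_uuid uuid;
    const char *name;
};

/* Resolve one raw UUID stored on a datapath to a Load_Balancer row.
 * Returns NULL if no row has that UUID. */
static const struct lb_row *
lb_lookup(const struct lb_row *rows, size_t n_rows,
          const struct sketch_uuid *uuid)
{
    for (size_t i = 0; i < n_rows; i++) {
        if (!memcmp(&rows[i].uuid, uuid, sizeof *uuid)) {
            return &rows[i];
        }
    }
    return NULL;
}
```

In this model, a caller walking a datapath's `load_balancers` UUIDs would skip entries that resolve to NULL.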
Re: [ovs-dev] [v9 01/12] dpif-netdev: Add command line and function pointer for miniflow extract
On Mon, Jul 12, 2021 at 4:43 PM Ilya Maximets wrote:
> >> ovsrobot has issues with reporting the status right now, but this
> >> patch fails the build in GHA:
> >> https://github.com/ovsrobot/ovs/actions/runs/1021787643
> >
> > Thanks for linking on results.
> >
> > I've spot-checked a bunch of the failing builds, and found 2 fixable code
> > issues.
> > A few of the CI run's I can't find/explain the error, but I don't know of a
> > good way to "jump to the error" line, am I missing a trick, or is scrolling
> > the whole compiler output and checking errors the best method?
>
> typing 'error:' in the 'Search logs' field, usually gets you
> to the actual error faster, but, unfortunately, scrolling is
> the most reliable option.

GHA ui jumps at the last line of a failing step, but the problem is
that, in OVS, we dump all logs which adds a lot of noise.

We could stop dumping them, since those logs are attached to the job
as an archive.
Like what is done in DPDK.
http://git.dpdk.org/dpdk/tree/.ci/linux-build.sh#n3

WDYT?

-- 
David Marchand
[ovs-dev] [PATCH V2 2/3] dpif-netdev: Fix offloads of modified flows
Association of a mark to a flow is done as part of its offload handling,
in the offloading thread. However, the PMD thread decides whether an
offload request is an "add" or a "modify" based on whether a mark is
associated with the flow. This is prone to a race condition.

A flow might be created with actions that cannot be fully offloaded, for
example flooding (before MAC learning), and later modified to have
actions that can be fully offloaded. If both requests are queued before
the offload thread handles either of them, they are both marked as "add".
When the offload thread handles them, the first request is partially
offloaded, and the second one is ignored because the flow is already
considered offloaded.

Fix it by deciding add/modify of an offload request from the actual flow
state change, without relying on the mark.

Fixes: 3c7330ebf036 ("netdev-offload-dpdk: Support offload of output action.")
Signed-off-by: Eli Britstein
Reviewed-by: Gaetan Rivet
---
 lib/dpif-netdev.c | 16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index 21b0e025d..9b2b8d6d9 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -2451,7 +2451,8 @@ static void
 queue_netdev_flow_put(struct dp_netdev_pmd_thread *pmd,
                       struct dp_netdev_flow *flow, struct match *match,
                       const struct nlattr *actions, size_t actions_len,
-                      odp_port_t orig_in_port)
+                      odp_port_t orig_in_port,
+                      const struct dp_netdev_actions *old_actions)
 {
     struct dp_flow_offload_item *offload;
     int op;
@@ -2467,11 +2468,9 @@ queue_netdev_flow_put(struct dp_netdev_pmd_thread *pmd,
         ovsthread_once_done(&offload_thread_once);
     }
 
-    if (flow->mark != INVALID_FLOW_MARK) {
-        op = DP_NETDEV_FLOW_OFFLOAD_OP_MOD;
-    } else {
-        op = DP_NETDEV_FLOW_OFFLOAD_OP_ADD;
-    }
+    op = old_actions
+        ? DP_NETDEV_FLOW_OFFLOAD_OP_MOD
+        : DP_NETDEV_FLOW_OFFLOAD_OP_ADD;
     offload = dp_netdev_alloc_flow_offload(pmd, flow, op);
     offload->match = *match;
     offload->actions = xmalloc(actions_len);
@@ -3323,7 +3322,7 @@ dp_netdev_flow_add(struct dp_netdev_pmd_thread *pmd,
                 dp_netdev_flow_hash(&flow->ufid));
 
     queue_netdev_flow_put(pmd, flow, match, actions, actions_len,
-                          orig_in_port);
+                          orig_in_port, NULL);
 
     if (OVS_UNLIKELY(!VLOG_DROP_DBG((&upcall_rl)))) {
         struct ds ds = DS_EMPTY_INITIALIZER;
@@ -3410,7 +3409,8 @@ flow_put_on_pmd(struct dp_netdev_pmd_thread *pmd,
             ovsrcu_set(&netdev_flow->actions, new_actions);
 
             queue_netdev_flow_put(pmd, netdev_flow, match,
-                                  put->actions, put->actions_len, ODPP_NONE);
+                                  put->actions, put->actions_len, ODPP_NONE,
+                                  old_actions);
 
             if (stats) {
                 get_dpif_flow_status(pmd->dp, netdev_flow, stats, NULL);
-- 
2.28.0.2311.g225365fb51
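The race fixed by this patch can be reproduced with a small single-threaded model. This is a hedged sketch, not dpif-netdev code: `struct flow_model`, `SKETCH_INVALID_MARK`, and the helper functions are hypothetical. It queues a create followed by a modify before the offload thread has assigned a mark, and compares the old rule (decide from the mark) with the patched rule (decide from `old_actions`).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_INVALID_MARK 0u   /* Hypothetical "no mark yet" value. */

enum offload_op { OFFLOAD_OP_ADD, OFFLOAD_OP_MOD };

/* Hypothetical flow: 'mark' is only set later, by the offload thread. */
struct flow_model {
    unsigned int mark;
};

/* Old rule: MOD only if a mark is already associated with the flow. */
static enum offload_op
op_from_mark(const struct flow_model *flow)
{
    return flow->mark != SKETCH_INVALID_MARK ? OFFLOAD_OP_MOD
                                             : OFFLOAD_OP_ADD;
}

/* Patched rule: MOD if the PMD replaced existing actions. */
static enum offload_op
op_from_old_actions(const void *old_actions)
{
    return old_actions ? OFFLOAD_OP_MOD : OFFLOAD_OP_ADD;
}

/* Queue a create and then a modify while the offload thread has not yet
 * run, so the flow's mark is still invalid for both requests. */
static void
queue_create_then_modify(bool use_old_actions, enum offload_op ops[2])
{
    struct flow_model flow = { SKETCH_INVALID_MARK };
    int replaced_actions = 0;    /* Stands in for the replaced actions. */

    ops[0] = use_old_actions ? op_from_old_actions(NULL)
                             : op_from_mark(&flow);
    ops[1] = use_old_actions ? op_from_old_actions(&replaced_actions)
                             : op_from_mark(&flow);
}
```

With the old rule both requests come out as ADD, so the second one would be ignored as already offloaded; with the patched rule the second request is a MOD.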
[ovs-dev] [PATCH V2 0/3] dpif-netdev offload transitions
This patch set improves the behavior of offload transitions.

Patch #1 avoids flushing PMD offloads unnecessarily.
Patch #2 fixes a race condition with flow modifications.
Patch #3 improves debuggability of flow modifications.

v2-v1:
- Rebase.

Travis:
v1: https://travis-ci.org/github/elibritstein/OVS/builds/767839987

GitHub Actions:
v1: https://github.com/elibritstein/OVS/actions/runs/769805954
  - This run has encountered some internal GitHub problems.
  - A previous good run, with the same code, only changed commit messages since:
    https://github.com/elibritstein/OVS/actions/runs/70787
v2: https://github.com/elibritstein/OVS/actions/runs/1023045302

Eli Britstein (3):
  dpif-netdev: Do not flush PMD offloads on reload
  dpif-netdev: Fix offloads of modified flows
  dpif-netdev: Log flow modification in debug level

 lib/dpif-netdev.c | 130
 1 file changed, 63 insertions(+), 67 deletions(-)

-- 
2.28.0.2311.g225365fb51
[ovs-dev] [PATCH V2 3/3] dpif-netdev: Log flow modification in debug level
Log flow modifications to help debugging. Signed-off-by: Eli Britstein Reviewed-by: Gaetan Rivet --- lib/dpif-netdev.c | 101 +- 1 file changed, 55 insertions(+), 46 deletions(-) diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c index 9b2b8d6d9..caed3e7f2 100644 --- a/lib/dpif-netdev.c +++ b/lib/dpif-netdev.c @@ -2457,6 +2457,61 @@ queue_netdev_flow_put(struct dp_netdev_pmd_thread *pmd, struct dp_flow_offload_item *offload; int op; +if (OVS_UNLIKELY(!VLOG_DROP_DBG((_rl { +struct ds ds = DS_EMPTY_INITIALIZER; +struct ofpbuf key_buf, mask_buf; +struct odp_flow_key_parms odp_parms = { +.flow = >flow, +.mask = >wc.masks, +.support = dp_netdev_support, +}; + +ofpbuf_init(_buf, 0); +ofpbuf_init(_buf, 0); + +odp_flow_key_from_flow(_parms, _buf); +odp_parms.key_buf = _buf; +odp_flow_key_from_mask(_parms, _buf); + +if (old_actions) { +ds_put_cstr(, "flow_mod: "); +} else { +ds_put_cstr(, "flow_add: "); +} +odp_format_ufid(>ufid, ); +ds_put_cstr(, " mega_"); +odp_format_ufid(>mega_ufid, ); +ds_put_cstr(, " "); +odp_flow_format(key_buf.data, key_buf.size, +mask_buf.data, mask_buf.size, +NULL, , false); +if (old_actions) { +ds_put_cstr(, ", old_actions:"); +format_odp_actions(, old_actions->actions, old_actions->size, + NULL); +} +ds_put_cstr(, ", actions:"); +format_odp_actions(, actions, actions_len, NULL); + +VLOG_DBG("%s", ds_cstr()); + +ofpbuf_uninit(_buf); +ofpbuf_uninit(_buf); + +/* Add a printout of the actual match installed. 
*/ +struct match m; +ds_clear(); +ds_put_cstr(, "flow match: "); +miniflow_expand(>cr.flow.mf, ); +miniflow_expand(>cr.mask->mf, ); +memset(_md, 0, sizeof m.tun_md); +match_format(, NULL, , OFP_DEFAULT_PRIORITY); + +VLOG_DBG("%s", ds_cstr()); + +ds_destroy(); +} + if (!netdev_is_flow_api_enabled()) { return; } @@ -3324,52 +3379,6 @@ dp_netdev_flow_add(struct dp_netdev_pmd_thread *pmd, queue_netdev_flow_put(pmd, flow, match, actions, actions_len, orig_in_port, NULL); -if (OVS_UNLIKELY(!VLOG_DROP_DBG((_rl { -struct ds ds = DS_EMPTY_INITIALIZER; -struct ofpbuf key_buf, mask_buf; -struct odp_flow_key_parms odp_parms = { -.flow = >flow, -.mask = >wc.masks, -.support = dp_netdev_support, -}; - -ofpbuf_init(_buf, 0); -ofpbuf_init(_buf, 0); - -odp_flow_key_from_flow(_parms, _buf); -odp_parms.key_buf = _buf; -odp_flow_key_from_mask(_parms, _buf); - -ds_put_cstr(, "flow_add: "); -odp_format_ufid(ufid, ); -ds_put_cstr(, " mega_"); -odp_format_ufid(>mega_ufid, ); -ds_put_cstr(, " "); -odp_flow_format(key_buf.data, key_buf.size, -mask_buf.data, mask_buf.size, -NULL, , false); -ds_put_cstr(, ", actions:"); -format_odp_actions(, actions, actions_len, NULL); - -VLOG_DBG("%s", ds_cstr()); - -ofpbuf_uninit(_buf); -ofpbuf_uninit(_buf); - -/* Add a printout of the actual match installed. */ -struct match m; -ds_clear(); -ds_put_cstr(, "flow match: "); -miniflow_expand(>cr.flow.mf, ); -miniflow_expand(>cr.mask->mf, ); -memset(_md, 0, sizeof m.tun_md); -match_format(, NULL, , OFP_DEFAULT_PRIORITY); - -VLOG_DBG("%s", ds_cstr()); - -ds_destroy(); -} - return flow; } -- 2.28.0.2311.g225365fb51 ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
[ovs-dev] [PATCH V2 1/3] dpif-netdev: Do not flush PMD offloads on reload
Before flushing offloads of a removed port was supported by [1], it was
necessary to flush the 'marks'. In doing so, all offloads of the PMD are
removed, including the ones that are not related to the removed port and
that are not modified following this removal. As a result, such flows are
evicted from being offloaded, and won't resume offloading.

As PMD offload flush is not necessary, avoid it.

[1] 62d1c28e9ce0 ("dpif-netdev: Flush offload rules upon port deletion.")

Signed-off-by: Eli Britstein
Reviewed-by: Gaetan Rivet
---
 lib/dpif-netdev.c | 13 -------------
 1 file changed, 13 deletions(-)

diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index 610949f36..21b0e025d 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -2232,18 +2232,6 @@ mark_to_flow_disassociate(struct dp_netdev_pmd_thread *pmd,
     return ret;
 }
 
-static void
-flow_mark_flush(struct dp_netdev_pmd_thread *pmd)
-{
-    struct dp_netdev_flow *flow;
-
-    CMAP_FOR_EACH (flow, mark_node, &flow_mark.mark_to_flow) {
-        if (flow->pmd_id == pmd->core_id) {
-            queue_netdev_flow_del(pmd, flow);
-        }
-    }
-}
-
 static struct dp_netdev_flow *
 mark_to_flow_find(const struct dp_netdev_pmd_thread *pmd,
                   const uint32_t mark)
@@ -4811,7 +4799,6 @@ reload_affected_pmds(struct dp_netdev *dp)
 
     CMAP_FOR_EACH (pmd, node, &dp->poll_threads) {
         if (pmd->need_reload) {
-            flow_mark_flush(pmd);
             dp_netdev_reload_pmd__(pmd);
         }
     }
-- 
2.28.0.2311.g225365fb51
Re: [ovs-dev] [v9 01/12] dpif-netdev: Add command line and function pointer for miniflow extract
> -Original Message- > From: Ilya Maximets > Sent: Monday, July 12, 2021 3:43 PM > To: Van Haaren, Harry ; Ilya Maximets > ; Amber, Kumar ; ovs- > d...@openvswitch.org > Cc: f...@sysclose.org; echau...@redhat.com; Ferriter, Cian > ; Stokes, Ian > Subject: Re: [v9 01/12] dpif-netdev: Add command line and function pointer for > miniflow extract > > On 7/12/21 4:02 PM, Van Haaren, Harry wrote: > >> -Original Message- > >> From: Ilya Maximets > >> Sent: Monday, July 12, 2021 2:25 PM > >> To: Amber, Kumar ; ovs-dev@openvswitch.org > >> Cc: f...@sysclose.org; echau...@redhat.com; i.maxim...@ovn.org; Van Haaren, > >> Harry ; Ferriter, Cian > >> ; > >> Stokes, Ian > >> Subject: Re: [v9 01/12] dpif-netdev: Add command line and function pointer > >> for > >> miniflow extract > >> > >> On 7/12/21 7:51 AM, kumar Amber wrote: > >>> From: Kumar Amber > >>> > >>> This patch introduces the MFEX function pointers which allows > >>> the user to switch between different miniflow extract implementations > >>> which are provided by the OVS based on optimized ISA CPU. > >>> > >>> The user can query for the available minflow extract variants available > >>> for that CPU by following commands: > >>> > >>> $ovs-appctl dpif-netdev/miniflow-parser-get > >>> > >>> Similarly an user can set the miniflow implementation by the following > >>> command : > >>> > >>> $ ovs-appctl dpif-netdev/miniflow-parser-set name > >>> > >>> This allows for more performance and flexibility to the user to choose > >>> the miniflow implementation according to the needs. 
> >>> > >>> Signed-off-by: Kumar Amber > >>> Co-authored-by: Harry van Haaren > >>> Signed-off-by: Harry van Haaren > >>> > >>> --- > >>> v9: > >>> - fix review comments from Flavio > >>> v7: > >>> - fix review comments(Eelco, Flavio) > >>> v5: > >>> - fix review comments(Ian, Flavio, Eelco) > >>> - add enum to hold mfex indexes > >>> - add new get and set implemenatations > >>> - add Atomic set and get > >>> --- > >> > >> ovsrobot has issues with reporting the status right now, but this > >> patch fails the build in GHA: > >> https://github.com/ovsrobot/ovs/actions/runs/1021787643 > > > > Thanks for linking on results. > > > > I've spot-checked a bunch of the failing builds, and found 2 fixable code > > issues. > > A few of the CI run's I can't find/explain the error, but I don't know of a > > good > > way to "jump to the error" line, am I missing a trick, or is scrolling the > > whole > > compiler output and checking errors the best method? > > typing 'error:' in the 'Search logs' field, usually gets you > to the actual error faster, but, unfortunately, scrolling is > the most reliable option. Okay, thanks. > > ISSUES: > > #1 : OVS Requires Mutex issue (Linux clang test dpdk build) > > 1291../../lib/dpif-netdev-private-extract.h:87:53: error: use of undeclared > identifier 'dp_netdev_mutex'; did you mean 'dp_netdev_input'? > > 1292 size_t pmd_list_size) OVS_REQUIRES(dp_netdev_mutex); > > > > #2 : Unused Argument (As from mailing list review comment too, linux gcc > > dpdk -- > enable-shared) > > 2353lib/dpif-netdev.c:1079:63: error: unused parameter ‘argc’ > > [-Werror=unused- > parameter] > > 2354 dpif_miniflow_extract_impl_set(struct unixctl_conn *conn, int argc, > > > > #3 : Distcheck directory not valid? (linux gcc test 3.16 build. I cannot > > explain this?) 
> > make: *** [distcheck] Error 1 > > 4490Makefile:5298: recipe for target 'distcheck' failed > > 4491+ cat '*/_build/sub/tests/testsuite.log' > > 4492cat: '*/_build/sub/tests/testsuite.log': No such file or directory > > 4493Error: Process completed with exit code 1.> > > SOLUTIONS: > > #1, likely to forward-decl the "dp_netdev_mutex" to make it available > > in the extract header file, and remove the "static" keyword so its no > > longer limited > > to the dpif-netdev.c compilation unit. > > > > #2 is a simple OVS_UNUSED as Eelco suggested during review. > > > > #3, I'm not sure where the DistCheck issue arise from, it seems to be > > missing > directories > > during the test run? Input appreciated, as pushing & hoping tends to be a > > tiresome > > and long process. > > This is just a result of the previous build failure. Build > never reached the testsuite phase, so there are no testsuite > logs there. You should not see this problem once build is > fixed. Aha, good to know. Then a respin with the fixes for the above issues is our next step, will arrive on the mailing list soon. ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [ovs-dev] [v9 01/12] dpif-netdev: Add command line and function pointer for miniflow extract
On 7/12/21 4:02 PM, Van Haaren, Harry wrote: >> -Original Message- >> From: Ilya Maximets >> Sent: Monday, July 12, 2021 2:25 PM >> To: Amber, Kumar ; ovs-dev@openvswitch.org >> Cc: f...@sysclose.org; echau...@redhat.com; i.maxim...@ovn.org; Van Haaren, >> Harry ; Ferriter, Cian ; >> Stokes, Ian >> Subject: Re: [v9 01/12] dpif-netdev: Add command line and function pointer >> for >> miniflow extract >> >> On 7/12/21 7:51 AM, kumar Amber wrote: >>> From: Kumar Amber >>> >>> This patch introduces the MFEX function pointers which allows >>> the user to switch between different miniflow extract implementations >>> which are provided by the OVS based on optimized ISA CPU. >>> >>> The user can query for the available minflow extract variants available >>> for that CPU by following commands: >>> >>> $ovs-appctl dpif-netdev/miniflow-parser-get >>> >>> Similarly an user can set the miniflow implementation by the following >>> command : >>> >>> $ ovs-appctl dpif-netdev/miniflow-parser-set name >>> >>> This allows for more performance and flexibility to the user to choose >>> the miniflow implementation according to the needs. >>> >>> Signed-off-by: Kumar Amber >>> Co-authored-by: Harry van Haaren >>> Signed-off-by: Harry van Haaren >>> >>> --- >>> v9: >>> - fix review comments from Flavio >>> v7: >>> - fix review comments(Eelco, Flavio) >>> v5: >>> - fix review comments(Ian, Flavio, Eelco) >>> - add enum to hold mfex indexes >>> - add new get and set implemenatations >>> - add Atomic set and get >>> --- >> >> ovsrobot has issues with reporting the status right now, but this >> patch fails the build in GHA: >> https://github.com/ovsrobot/ovs/actions/runs/1021787643 > > Thanks for linking on results. > > I've spot-checked a bunch of the failing builds, and found 2 fixable code > issues. 
> A few of the CI run's I can't find/explain the error, but I don't know of a > good > way to "jump to the error" line, am I missing a trick, or is scrolling the > whole > compiler output and checking errors the best method? typing 'error:' in the 'Search logs' field, usually gets you to the actual error faster, but, unfortunately, scrolling is the most reliable option. > > ISSUES: > #1 : OVS Requires Mutex issue (Linux clang test dpdk build) > 1291../../lib/dpif-netdev-private-extract.h:87:53: error: use of undeclared > identifier 'dp_netdev_mutex'; did you mean 'dp_netdev_input'? > 1292 size_t pmd_list_size) OVS_REQUIRES(dp_netdev_mutex); > > #2 : Unused Argument (As from mailing list review comment too, linux gcc dpdk > --enable-shared) > 2353lib/dpif-netdev.c:1079:63: error: unused parameter ‘argc’ > [-Werror=unused-parameter] > 2354 dpif_miniflow_extract_impl_set(struct unixctl_conn *conn, int argc, > > #3 : Distcheck directory not valid? (linux gcc test 3.16 build. I cannot > explain this?) > make: *** [distcheck] Error 1 > 4490Makefile:5298: recipe for target 'distcheck' failed > 4491+ cat '*/_build/sub/tests/testsuite.log' > 4492cat: '*/_build/sub/tests/testsuite.log': No such file or directory > 4493Error: Process completed with exit code 1.> > SOLUTIONS: > #1, likely to forward-decl the "dp_netdev_mutex" to make it available > in the extract header file, and remove the "static" keyword so its no longer > limited > to the dpif-netdev.c compilation unit. > > #2 is a simple OVS_UNUSED as Eelco suggested during review. > > #3, I'm not sure where the DistCheck issue arise from, it seems to be missing > directories > during the test run? Input appreciated, as pushing & hoping tends to be a > tiresome > and long process. This is just a result of the previous build failure. Build never reached the testsuite phase, so there are no testsuite logs there. You should not see this problem once build is fixed. > > >> Best regards, Ilya Maximets. 
> > Regards, -Harry
Re: [ovs-dev] [PATCH ovn] controller: Avoid unnecessary load balancer flow processing.
On 7/12/21 10:11 AM, Dumitru Ceara wrote: > On 7/9/21 6:11 PM, Han Zhou wrote: >>> To avoid this potentially expensive table walk, we use the load_balancer >>> uuids stored in the datapath record itself (it's probably best to see >>> those as hints I guess). >>> >> Thanks for the explanation. What you described is indeed a dependency between >> lflow and sb_load_balancer because in lflow's compute/change handlers >> sb_load_balancer data is required (otherwise we would not need to call >> sbrec_load_balancer_get_for_uuid()). >> >> However, since this dependency is already captured in the I-P, it is easy >> for this use case. We should simply use >> sbrec_load_balancer_table_get_for_uuid() instead, which takes struct >> sbrec_load_balancer_table* as argument and we already have it in the >> lflow_ctx_in.lb_table as the input to lflow engine node. >> > > You're right, it's simpler like this, thanks for pointing out the > sbrec_*table_get_for_uuid() variant. > > I sent a v2: > > http://patchwork.ozlabs.org/project/ovn/list/?series=253029 > Sorry for the noise; Ilya mentioned offline that I forgot to update the schema version number. I sent a v3 taking care of that: http://patchwork.ozlabs.org/project/ovn/list/?series=253094 Regards, Dumitru
[ovs-dev] [PATCH ovn v3] controller: Avoid unnecessary load balancer flow processing.
Whenever a Load_Balancer is updated, e.g., a VIP is added, the following sequence of events happens:
1. The Southbound Load_Balancer record is updated.
2. The Southbound Datapath_Binding records on which the Load_Balancer is applied are updated.
3. Southbound ovsdb-server sends updates about the Load_Balancer and Datapath_Binding records to ovn-controller.
4. The IDL layer in ovn-controller processes the updates at #3, but because of the SB schema references between tables [0], all logical flows referencing the updated Datapath_Binding are marked as "updated". The same is true for Logical_DP_Group records referencing the Datapath_Binding, and also for all logical flows pointing to the new "updated" datapath groups.
5. ovn-controller ends up recomputing (removing/re-adding) all flows for all these tracked updates.

From the SB Schema:
    "Datapath_Binding": {
        "columns": {
            [...]
            "load_balancers": {"type": {"key": {"type": "uuid", "refTable": "Load_Balancer", "refType": "weak"}, "min": 0, "max": "unlimited"}},
    [...]
    "Load_Balancer": {
        "columns": {
            "datapaths": {
            [...]
                "type": {"key": {"type": "uuid", "refTable": "Datapath_Binding"}, "min": 0, "max": "unlimited"}},
    [...]
    "Logical_DP_Group": {
        "columns": {
            "datapaths": {"type": {"key": {"type": "uuid", "refTable": "Datapath_Binding", "refType": "weak"}, "min": 0, "max": "unlimited"}}},
    [...]
    "Logical_Flow": {
        "columns": {
            "logical_datapath": {"type": {"key": {"type": "uuid", "refTable": "Datapath_Binding"}, "min": 0, "max": 1}},
            "logical_dp_group": {"type": {"key": {"type": "uuid", "refTable": "Logical_DP_Group"},

In order to avoid this unnecessary Logical_Flow notification storm, we now remove the explicit reference from Datapath_Binding to Load_Balancer and instead store raw UUIDs.
This means that on the ovn-controller side we need to perform a Load_Balancer table lookup by UUID whenever a new datapath is added, but that doesn't happen too often and the cost of the lookup is negligible compared to the huge cost of processing the unnecessary logical flow updates. This change is backwards compatible because the contents stored in the database are not changed; only the schema constraints are relaxed a bit. Some performance measurements: on a scale test deployment simulating an ovn-kubernetes deployment with 120 nodes and a large load balancer with 16K VIPs associated with each node's logical switch, the event processing loop time in ovn-controller, when adding a new VIP, is reduced from ~39 seconds to ~8 seconds. There's no need to change the northd DDlog implementation. Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=1978605 Acked-by: Mark Michelson Signed-off-by: Dumitru Ceara --- v3: Update SB schema version. v2: Address Han's comments and add Mark's ack. --- controller/lflow.c | 6 -- northd/ovn-northd.c | 14 ++ ovn-sb.ovsschema| 8 +++- 3 files changed, 13 insertions(+), 15 deletions(-) diff --git a/controller/lflow.c b/controller/lflow.c index 60aa011ff..c58c4f25c 100644 --- a/controller/lflow.c +++ b/controller/lflow.c @@ -1744,8 +1744,10 @@ lflow_processing_end: /* Add load balancer hairpin flows if the datapath has any load balancers * associated.
*/ for (size_t i = 0; i < dp->n_load_balancers; i++) { -consider_lb_hairpin_flows(dp->load_balancers[i], - l_ctx_in->local_datapaths, +const struct sbrec_load_balancer *lb = +sbrec_load_balancer_table_get_for_uuid(l_ctx_in->lb_table, + &dp->load_balancers[i]); +consider_lb_hairpin_flows(lb, l_ctx_in->local_datapaths, l_ctx_out->flow_table); } diff --git a/northd/ovn-northd.c b/northd/ovn-northd.c index 562dc62b2..999c3f482 100644 --- a/northd/ovn-northd.c +++ b/northd/ovn-northd.c @@ -3635,19 +3635,17 @@ build_ovn_lbs(struct northd_context *ctx, struct hmap *datapaths, continue; } -const struct sbrec_load_balancer **sbrec_lbs = -xmalloc(od->nbs->n_load_balancer * sizeof *sbrec_lbs); +struct uuid *lb_uuids = +xmalloc(od->nbs->n_load_balancer * sizeof
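The patch above replaces the weak Datapath_Binding-to-Load_Balancer reference with raw UUIDs, so ovn-controller has to resolve each UUID itself. A minimal sketch of that pattern, under stated assumptions: the `example_*` types and functions are hypothetical stand-ins, not the OVSDB IDL API, and a linear scan replaces whatever lookup `sbrec_load_balancer_table_get_for_uuid()` really performs. Since a raw UUID is no longer a database reference, the sketch also assumes a UUID may fail to resolve and skips it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical 128-bit UUID stored as four 32-bit words. */
struct example_uuid {
    uint32_t parts[4];
};

/* Hypothetical stand-in for a Southbound Load_Balancer row. */
struct example_lb {
    struct example_uuid uuid;
    const char *name;
};

static bool
example_uuid_equals(const struct example_uuid *a, const struct example_uuid *b)
{
    return !memcmp(a->parts, b->parts, sizeof a->parts);
}

/* Returns the row whose UUID matches 'uuid', or NULL if there is none. */
static const struct example_lb *
example_lb_lookup(const struct example_lb *table, size_t n_rows,
                  const struct example_uuid *uuid)
{
    assert(uuid != NULL);
    for (size_t i = 0; i < n_rows; i++) {
        if (example_uuid_equals(&table[i].uuid, uuid)) {
            return &table[i];
        }
    }
    return NULL;
}

/* Resolves a datapath's raw load balancer UUIDs into rows, skipping UUIDs
 * that do not resolve.  Returns the number of rows written to 'out'. */
static size_t
example_resolve_lbs(const struct example_lb *table, size_t n_rows,
                    const struct example_uuid *uuids, size_t n_uuids,
                    const struct example_lb **out)
{
    size_t n_out = 0;

    for (size_t i = 0; i < n_uuids; i++) {
        const struct example_lb *lb = example_lb_lookup(table, n_rows,
                                                         &uuids[i]);
        if (lb) {
            out[n_out++] = lb;
        }
    }
    return n_out;
}
```

A real implementation would use an indexed lookup rather than a linear scan; the point here is only that each stored UUID is resolved on demand and that an unresolved UUID must be handled.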
Re: [ovs-dev] [v9 01/12] dpif-netdev: Add command line and function pointer for miniflow extract
> -----Original Message----- > From: Ilya Maximets > Sent: Monday, July 12, 2021 2:25 PM > To: Amber, Kumar ; ovs-dev@openvswitch.org > Cc: f...@sysclose.org; echau...@redhat.com; i.maxim...@ovn.org; Van Haaren, > Harry ; Ferriter, Cian ; > Stokes, Ian > Subject: Re: [v9 01/12] dpif-netdev: Add command line and function pointer for > miniflow extract > > On 7/12/21 7:51 AM, kumar Amber wrote: > > From: Kumar Amber > > > > This patch introduces the MFEX function pointers which allows > > the user to switch between different miniflow extract implementations > > which are provided by the OVS based on optimized ISA CPU. > > > > The user can query for the available minflow extract variants available > > for that CPU by following commands: > > > > $ovs-appctl dpif-netdev/miniflow-parser-get > > > > Similarly an user can set the miniflow implementation by the following > > command : > > > > $ ovs-appctl dpif-netdev/miniflow-parser-set name > > > > This allows for more performance and flexibility to the user to choose > > the miniflow implementation according to the needs. > > > > Signed-off-by: Kumar Amber > > Co-authored-by: Harry van Haaren > > Signed-off-by: Harry van Haaren > > > > --- > > v9: > > - fix review comments from Flavio > > v7: > > - fix review comments(Eelco, Flavio) > > v5: > > - fix review comments(Ian, Flavio, Eelco) > > - add enum to hold mfex indexes > > - add new get and set implemenatations > > - add Atomic set and get > > --- > > ovsrobot has issues with reporting the status right now, but this > patch fails the build in GHA: > https://github.com/ovsrobot/ovs/actions/runs/1021787643 Thanks for linking the results. I've spot-checked a bunch of the failing builds, and found 2 fixable code issues. For a few of the CI runs I can't find or explain the error, but I don't know of a good way to "jump to the error" line. Am I missing a trick, or is scrolling the whole compiler output and checking errors the best method?
ISSUES:
#1 : OVS Requires Mutex issue (Linux clang test dpdk build)
../../lib/dpif-netdev-private-extract.h:87:53: error: use of undeclared identifier 'dp_netdev_mutex'; did you mean 'dp_netdev_input'?
size_t pmd_list_size) OVS_REQUIRES(dp_netdev_mutex);
#2 : Unused Argument (also raised in mailing list review; linux gcc dpdk --enable-shared)
lib/dpif-netdev.c:1079:63: error: unused parameter ‘argc’ [-Werror=unused-parameter]
dpif_miniflow_extract_impl_set(struct unixctl_conn *conn, int argc,
#3 : Distcheck directory not valid? (linux gcc test 3.16 build. I cannot explain this.)
make: *** [distcheck] Error 1
Makefile:5298: recipe for target 'distcheck' failed
+ cat '*/_build/sub/tests/testsuite.log'
cat: '*/_build/sub/tests/testsuite.log': No such file or directory
Error: Process completed with exit code 1.
SOLUTIONS:
#1, likely to forward-declare "dp_netdev_mutex" to make it available in the extract header file, and remove the "static" keyword so it's no longer limited to the dpif-netdev.c compilation unit.
#2 is a simple OVS_UNUSED as Eelco suggested during review.
#3, I'm not sure where the DistCheck issue arises from; it seems to be missing directories during the test run? Input appreciated, as pushing & hoping tends to be a tiresome and long process.
> Best regards, Ilya Maximets.
Regards, -Harry
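Fix #2 amounts to marking the parameter as intentionally unused so that `-Werror=unused-parameter` no longer fires. A minimal sketch, assuming a GCC/Clang-compatible compiler: `EXAMPLE_UNUSED` is a hypothetical stand-in for OVS's `OVS_UNUSED` macro, and `example_impl_set()` is an illustrative handler, not the real `dpif_miniflow_extract_impl_set()`.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for OVS_UNUSED: tells GCC/Clang the parameter is
 * unused on purpose, which silences -Wunused-parameter. */
#if defined(__GNUC__) || defined(__clang__)
#define EXAMPLE_UNUSED __attribute__((__unused__))
#else
#define EXAMPLE_UNUSED
#endif

/* Illustrative command handler: argv[1] names the implementation to select.
 * 'argc' is required by the callback signature but not needed here; argv is
 * assumed to be NULL-terminated. */
static int
example_impl_set(int argc EXAMPLE_UNUSED, const char *argv[])
{
    assert(argv != NULL);
    return argv[1] != NULL && strcmp(argv[1], "study") == 0;
}
```

The attribute goes after the parameter name, the same placement OVS code uses for `OVS_UNUSED`.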
Re: [ovs-dev] [v9 01/12] dpif-netdev: Add command line and function pointer for miniflow extract
On 7/12/21 7:51 AM, kumar Amber wrote: > From: Kumar Amber > > This patch introduces the MFEX function pointers which allows > the user to switch between different miniflow extract implementations > which are provided by the OVS based on optimized ISA CPU. > > The user can query for the available minflow extract variants available > for that CPU by following commands: > > $ovs-appctl dpif-netdev/miniflow-parser-get > > Similarly an user can set the miniflow implementation by the following > command : > > $ ovs-appctl dpif-netdev/miniflow-parser-set name > > This allows for more performance and flexibility to the user to choose > the miniflow implementation according to the needs. > > Signed-off-by: Kumar Amber > Co-authored-by: Harry van Haaren > Signed-off-by: Harry van Haaren > > --- > v9: > - fix review comments from Flavio > v7: > - fix review comments(Eelco, Flavio) > v5: > - fix review comments(Ian, Flavio, Eelco) > - add enum to hold mfex indexes > - add new get and set implemenatations > - add Atomic set and get > --- ovsrobot has issues with reporting the status right now, but this patch fails the build in GHA: https://github.com/ovsrobot/ovs/actions/runs/1021787643 Best regards, Ilya Maximets.
Re: [ovs-dev] [v9 05/12] dpif-netdev: Add configure to enable autovalidator at build time.
On 12 Jul 2021, at 7:51, kumar Amber wrote: > From: Kumar Amber > > This commit adds a new command to allow the user to enable > autovalidatior by default at build time thus allowing for > runnig unit test by default. > > $ ./configure --enable-mfex-default-autovalidator > > Signed-off-by: Kumar Amber > Co-authored-by: Harry van Haaren > Signed-off-by: Harry van Haaren > > --- > v9: > - fix review comments Flavio > v7: > - fix review commens(Eelco, Flavio) > v5: > - fix review comments(Ian, Flavio, Eelco) > --- > --- > Documentation/topics/dpdk/bridge.rst | 5 + > NEWS | 3 ++- > acinclude.m4 | 16 > configure.ac | 1 + > lib/dpif-netdev-private-extract.c| 8 ++-- > 5 files changed, 30 insertions(+), 3 deletions(-) > > diff --git a/Documentation/topics/dpdk/bridge.rst > b/Documentation/topics/dpdk/bridge.rst > index 7c618cf1f..4db416ddd 100644 > --- a/Documentation/topics/dpdk/bridge.rst > +++ b/Documentation/topics/dpdk/bridge.rst > @@ -307,3 +307,8 @@ implementations provide the same results. > To set the Miniflow autovalidator, use this command :: > > $ ovs-appctl dpif-netdev/miniflow-parser-set autovalidator > + > +A compile time option is available in order to test it with the OVS unit > +test suite. Use the following configure option :: > + > +$ ./configure --enable-mfex-default-autovalidator > diff --git a/NEWS b/NEWS > index 4a7b89409..581bff225 100644 > --- a/NEWS > +++ b/NEWS > @@ -38,6 +38,8 @@ Post-v2.15.0 > * Add study function to miniflow function table which studies packet > and automatically chooses the best miniflow implementation for that > traffic. > + * Add build time configure command to enable auto-validatior as default > + miniflow implementation at build time. > - ovs-ctl: > * New option '--no-record-hostname' to disable hostname configuration > in ovsdb on startup. > @@ -57,7 +59,6 @@ Post-v2.15.0 > whether the SNAT with all-zero IP address is supported. > See ovs-vswitchd.conf.db(5) for details. 
> > - You are removing a white space here unrelated to your changes. Please leave it in. > v2.15.0 - 15 Feb 2021 > - > - OVSDB: > diff --git a/acinclude.m4 b/acinclude.m4 > index 343303447..5a48f0335 100644 > --- a/acinclude.m4 > +++ b/acinclude.m4 > @@ -14,6 +14,22 @@ > # See the License for the specific language governing permissions and > # limitations under the License. > > +dnl Set OVS MFEX Autovalidator as default miniflow extract at compile time? > +dnl This enables automatically running all unit tests with all MFEX > +dnl implementations. > +AC_DEFUN([OVS_CHECK_MFEX_AUTOVALIDATOR], [ > + AC_ARG_ENABLE([mfex-default-autovalidator], > +[AC_HELP_STRING([--enable-mfex-default-autovalidator], > [Enable MFEX autovalidator as default miniflow_extract implementation.])], > +[autovalidator=yes],[autovalidator=no]) > + AC_MSG_CHECKING([whether MFEX Autovalidator is default implementation]) > + if test "$autovalidator" != yes; then > +AC_MSG_RESULT([no]) > + else > +OVS_CFLAGS="$OVS_CFLAGS -DMFEX_AUTOVALIDATOR_DEFAULT" > +AC_MSG_RESULT([yes]) > + fi > +]) > + > dnl Set OVS DPCLS Autovalidator as default subtable search at compile time? > dnl This enables automatically running all unit tests with all DPCLS > dnl implementations. 
> diff --git a/configure.ac b/configure.ac > index e45685a6c..46c402892 100644 > --- a/configure.ac > +++ b/configure.ac > @@ -186,6 +186,7 @@ OVS_ENABLE_SPARSE > OVS_CTAGS_IDENTIFIERS > OVS_CHECK_DPCLS_AUTOVALIDATOR > OVS_CHECK_DPIF_AVX512_DEFAULT > +OVS_CHECK_MFEX_AUTOVALIDATOR > OVS_CHECK_BINUTILS_AVX512 > > AC_ARG_VAR(KARCH, [Kernel Architecture String]) > diff --git a/lib/dpif-netdev-private-extract.c > b/lib/dpif-netdev-private-extract.c > index 4ea111f94..ad71f238e 100644 > --- a/lib/dpif-netdev-private-extract.c > +++ b/lib/dpif-netdev-private-extract.c > @@ -77,20 +77,24 @@ dp_mfex_impl_get_default(void) > { > atomic_uintptr_t *mfex_func = (void *)_mfex_func; > static bool default_mfex_func_set = false; > +#ifdef MFEX_AUTOVALIDATOR_DEFAULT > +int mfex_idx = MFEX_IMPL_AUTOVALIDATOR; > +#else > int mfex_idx = MFEX_IMPL_SCALAR; > +#endif > > /* For the first call, this will be choosen based on the > * compile time flag and if nor flag is set it is set to > * default scalar. > */ > if (OVS_UNLIKELY(!default_mfex_func_set)) { > -VLOG_INFO("Default MFEX implementation is %s.\n", > + > +VLOG_INFO("Default miniflow extract implementation%s.\n", Guess the text should have been updated in the patch introducing it. >mfex_impls[mfex_idx].name); > atomic_store_relaxed(mfex_func, (uintptr_t) mfex_impls > [mfex_idx].extract_func); > default_mfex_func_set = true;
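The configure option above only changes which implementation is the default; the checking itself is done by the autovalidator, which, per the documentation patch in this series, runs the available implementations and verifies that their results are identical. A minimal sketch of that idea, assuming a toy two-field key: all `example_*` names are hypothetical, and a miss from an optimized implementation is treated as acceptable (the caller would fall back to scalar), while a hit with a different key counts as a mismatch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical parsed key, standing in for struct netdev_flow_key. */
struct example_key {
    uint16_t eth_type;
    uint8_t ip_proto;
};

/* An extract implementation parses 'pkt' into 'key' and returns true on a
 * hit, or false if it does not handle this packet type. */
typedef bool (*example_extract_fn)(const uint8_t *pkt, size_t len,
                                   struct example_key *key);

/* Reference ("scalar") parser: handles any Ethernet frame. */
static bool
example_scalar(const uint8_t *pkt, size_t len, struct example_key *key)
{
    key->eth_type = 0;
    key->ip_proto = 0;
    if (len < 14) {
        return false;
    }
    key->eth_type = (uint16_t) ((pkt[12] << 8) | pkt[13]);
    if (key->eth_type == 0x0800 && len >= 34) {
        key->ip_proto = pkt[23];   /* Protocol field of the IPv4 header. */
    }
    return true;
}

/* "Optimized" parser that only handles untagged IPv4 frames. */
static bool
example_ipv4_only(const uint8_t *pkt, size_t len, struct example_key *key)
{
    if (len < 34 || pkt[12] != 0x08 || pkt[13] != 0x00) {
        return false;              /* Miss: caller falls back to scalar. */
    }
    key->eth_type = 0x0800;
    key->ip_proto = pkt[23];
    return true;
}

/* Runs every implementation on the same packet and counts those that report
 * a hit but produce a key different from the scalar reference. */
static size_t
example_autovalidate(const uint8_t *pkt, size_t len,
                     const example_extract_fn *impls, size_t n_impls)
{
    struct example_key ref;
    size_t mismatches = 0;

    assert(impls != NULL || n_impls == 0);
    example_scalar(pkt, len, &ref);
    for (size_t i = 0; i < n_impls; i++) {
        struct example_key key = { 0, 0 };

        if (impls[i](pkt, len, &key)
            && (key.eth_type != ref.eth_type
                || key.ip_proto != ref.ip_proto)) {
            mismatches++;
        }
    }
    return mismatches;
}
```

Fields are compared one by one rather than with `memcmp()`, because struct padding bytes are not guaranteed to be equal.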
Re: [ovs-dev] [v9 01/12] dpif-netdev: Add command line and function pointer for miniflow extract
On Mon, Jul 12, 2021 at 02:22:46PM +0200, Eelco Chaudron wrote: > See some comments below… > > For this patch series, I’m only looking at the diff from v6..v9, not a full > review. > I will do basic compilation and some tests at the end. > > Cheers, > > Eelco > > > On 12 Jul 2021, at 7:51, kumar Amber wrote: > > > From: Kumar Amber > > > > This patch introduces the MFEX function pointers which allows > > the user to switch between different miniflow extract implementations > > which are provided by the OVS based on optimized ISA CPU. > > > > The user can query for the available minflow extract variants available > > for that CPU by following commands: > > > > $ovs-appctl dpif-netdev/miniflow-parser-get > > > > Similarly an user can set the miniflow implementation by the following > > command : > > > > $ ovs-appctl dpif-netdev/miniflow-parser-set name > > > > This allows for more performance and flexibility to the user to choose > > the miniflow implementation according to the needs. > > > > Signed-off-by: Kumar Amber > > Co-authored-by: Harry van Haaren > > Signed-off-by: Harry van Haaren > > > > --- > > v9: > > - fix review comments from Flavio > > v7: > > - fix review comments(Eelco, Flavio) > > v5: > > - fix review comments(Ian, Flavio, Eelco) > > - add enum to hold mfex indexes > > - add new get and set implemenatations > > - add Atomic set and get > > --- > > --- > > NEWS | 1 + > > lib/automake.mk | 2 + > > lib/dpif-netdev-avx512.c | 31 +- > > lib/dpif-netdev-private-extract.c | 162 ++ > > lib/dpif-netdev-private-extract.h | 111 > > lib/dpif-netdev-private-thread.h | 8 ++ > > lib/dpif-netdev.c | 105 +++ > > 7 files changed, 416 insertions(+), 4 deletions(-) > > create mode 100644 lib/dpif-netdev-private-extract.c > > create mode 100644 lib/dpif-netdev-private-extract.h > > > > diff --git a/NEWS b/NEWS > > index 6cdccc715..b0f08e96d 100644 > > --- a/NEWS > > +++ b/NEWS > > @@ -32,6 +32,7 @@ Post-v2.15.0 > > * Enable the AVX512 DPCLS implementation to use 
VPOPCNT instruction > > if the > > CPU supports it. This enhances performance by using the native > > vpopcount > > instructions, instead of the emulated version of vpopcount. > > + * Add command line option to switch between MFEX function pointers. > > - ovs-ctl: > > * New option '--no-record-hostname' to disable hostname configuration > > in ovsdb on startup. > > diff --git a/lib/automake.mk b/lib/automake.mk > > index 3c9523c1a..53b8abc0f 100644 > > --- a/lib/automake.mk > > +++ b/lib/automake.mk > > @@ -118,6 +118,8 @@ lib_libopenvswitch_la_SOURCES = \ > > lib/dpif-netdev-private-dpcls.h \ > > lib/dpif-netdev-private-dpif.c \ > > lib/dpif-netdev-private-dpif.h \ > > + lib/dpif-netdev-private-extract.c \ > > + lib/dpif-netdev-private-extract.h \ > > lib/dpif-netdev-private-flow.h \ > > lib/dpif-netdev-private-thread.h \ > > lib/dpif-netdev-private.h \ > > diff --git a/lib/dpif-netdev-avx512.c b/lib/dpif-netdev-avx512.c > > index 6f9aa8284..7772b7abf 100644 > > --- a/lib/dpif-netdev-avx512.c > > +++ b/lib/dpif-netdev-avx512.c > > @@ -149,6 +149,15 @@ dp_netdev_input_outer_avx512(struct > > dp_netdev_pmd_thread *pmd, > > * // do all processing (HWOL->MFEX->EMC->SMC) > > * } > > */ > > + > > +/* Do a batch minfilow extract into keys. */ > > +uint32_t mf_mask = 0; > > +miniflow_extract_func mfex_func; > > +atomic_read_relaxed(&pmd->miniflow_extract_opt, &mfex_func); > > +if (mfex_func) { > > +mf_mask = mfex_func(packets, keys, batch_size, in_port, pmd); > > +} > > + > > uint32_t lookup_pkts_bitmask = (1ULL << batch_size) - 1; > > uint32_t iter = lookup_pkts_bitmask; > > while (iter) { > > @@ -167,6 +176,13 @@ dp_netdev_input_outer_avx512(struct > > dp_netdev_pmd_thread *pmd, > > pkt_metadata_init(&packet->md, in_port); > > > > struct dp_netdev_flow *f = NULL; > > +struct netdev_flow_key *key = &keys[i]; > > + > > +/* Check the minfiflow mask to see if the packet was correctly > > + * classifed by vector mfex else do a scalar miniflow extract > > + * for that packet.
> > + */ > > +bool mfex_hit = !!(mf_mask & (1 << i)); > > > > /* Check for a partial hardware offload match. */ > > if (hwol_enabled) { > > @@ -177,7 +193,13 @@ dp_netdev_input_outer_avx512(struct > > dp_netdev_pmd_thread *pmd, > > } > > if (f) { > > rules[i] = &f->cr; > > -pkt_meta[i].tcp_flags = parse_tcp_flags(packet); > > +/* If AVX512 MFEX already classified the packet, use it. */ > > +if (mfex_hit) { > > +pkt_meta[i].tcp_flags = > > miniflow_get_tcp_flags(&key->mf); > > +} else { > > +
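The AVX512 DPIF hunk runs the batch extract once and then tests one bit of `mf_mask` per packet to decide whether the scalar `miniflow_extract()` fallback is needed. A minimal sketch of that bookkeeping, assuming batches of at most 32 packets; the `example_*` helpers are hypothetical. It shifts the mask rather than a signed `1`, since with a 32-bit `int`, `1 << 31` shifts into the sign bit, which is undefined behavior in C.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Returns true if the batch-level extract handled packet 'i'. */
static bool
example_mfex_hit(uint32_t mf_mask, unsigned int i)
{
    assert(i < 32);
    return (mf_mask >> i) & 1u;
}

/* Returns a mask of the packets in a batch of 'batch_size' (<= 32) that
 * still need a scalar miniflow extract. */
static uint32_t
example_fallback_mask(uint32_t mf_mask, unsigned int batch_size)
{
    uint32_t all;

    assert(batch_size <= 32);
    all = batch_size == 32 ? UINT32_MAX : (UINT32_C(1) << batch_size) - 1;
    return all & ~mf_mask;
}

/* Per-packet loop mirroring "use the vector result on a hit, otherwise fall
 * back"; it only counts how many packets take the scalar path. */
static unsigned int
example_count_scalar_fallbacks(uint32_t mf_mask, unsigned int batch_size)
{
    unsigned int n = 0;

    for (unsigned int i = 0; i < batch_size; i++) {
        if (!example_mfex_hit(mf_mask, i)) {
            n++;    /* The real code would call miniflow_extract() here. */
        }
    }
    return n;
}
```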
Re: [ovs-dev] [v9 04/12] docs/dpdk/bridge: add miniflow extract section.
On 12 Jul 2021, at 7:51, kumar Amber wrote: > From: Kumar Amber > > This commit adds a section to the dpdk/bridge.rst netdev documentation, > detailing the added miniflow functionality. The newly added commands are > documented, and sample output is provided. > > The use of auto-validator and special study function is also described > in detail as well as running fuzzy tests. > > Signed-off-by: Kumar Amber > Co-authored-by: Cian Ferriter > Signed-off-by: Cian Ferriter > Co-authored-by: Harry van Haaren > Signed-off-by: Harry van Haaren > Acked-by: Flavio Leitner > > --- > v7: > - fix review comments(Eelco) > v5: > - fix review comments(Ian, Flavio, Eelco) > --- > --- > Documentation/topics/dpdk/bridge.rst | 51 > 1 file changed, 51 insertions(+) > > diff --git a/Documentation/topics/dpdk/bridge.rst > b/Documentation/topics/dpdk/bridge.rst > index 2d0850836..7c618cf1f 100644 > --- a/Documentation/topics/dpdk/bridge.rst > +++ b/Documentation/topics/dpdk/bridge.rst > @@ -256,3 +256,54 @@ The following line should be seen in the configure > output when the above option > is used :: > > checking whether DPIF AVX512 is default implementation... yes > + > +Miniflow Extract > + > + > +Miniflow extract (MFEX) performs parsing of the raw packets and extracts the > +important header information into a compressed miniflow. This miniflow is > +composed of bits and blocks where the bits signify which blocks are set or > +have values where as the blocks hold the metadata, ip, udp, vlan, etc. These > +values are used by the datapath for switching decisions later.The Optimized > +miniflow extract is traffic specific to speed up the lookup, whereas the > +scalar works for ALL traffic patterns > + > +Most modern CPUs have SIMD capabilities. These SIMD instructions are able > +to process a vector rather than act on one single data. This sounds odd “rather than act on one single data.”? > OVS provides multiple > +implementations of miniflow extract. 
This allows the user to take advantage > +of SIMD instructions like AVX512 to gain additional performance. > + > +A list of implementations can be obtained by the following command. The > +command also shows whether the CPU supports each implementation :: > + > +$ ovs-appctl dpif-netdev/miniflow-parser-get > +Available Optimized Miniflow Extracts: > +autovalidator (available: True, pmds: none) > +scalar (available: True, pmds: 1,15) > +study (available: True, pmds: none) > + > +An implementation can be selected manually by the following command :: > + > +$ ovs-appctl dpif-netdev/miniflow-parser-set study > + > +Also user can select the study implementation which studies the traffic for > +a specific number of packets by applying all available implementaions of implementations > +miniflow extract and than chooses the one with most optimal result for that than -> then most optimal -> the most optimal > +traffic pattern. > + > +Miniflow Extract Validation > +~~~ > + > +As multiple versions of miniflow extract can co-exist, each with different > +CPU ISA optimizations, it is important to validate that they all give the > +exact same results. To easily test all miniflow implementations, an > +``autovalidator`` implementation of the miniflow exists. This implementation > +runs all other available miniflow extract implementations, and verifies that > +the results are identical. > + > +Running the OVS unit tests with the autovalidator enabled ensures all > +implementations provide the same results. > + > +To set the Miniflow autovalidator, use this command :: > + > +$ ovs-appctl dpif-netdev/miniflow-parser-set autovalidator > -- > 2.25.1
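The documentation patch describes a miniflow as a set of bits saying which blocks are present, plus the packed blocks themselves. A minimal sketch of that layout, assuming a single 64-bit map and 64-bit blocks: OVS's real `struct miniflow` layout differs, and all `example_*` names are hypothetical. The position of a present block in the packed array is the number of map bits set below it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define EXAMPLE_MAX_BLOCKS 64

/* Compressed flow: bit i of 'map' is set iff block i is present, and the
 * present blocks are stored contiguously in increasing index order. */
struct example_miniflow {
    uint64_t map;
    uint64_t values[EXAMPLE_MAX_BLOCKS];
};

static unsigned int
example_popcount64(uint64_t x)
{
    unsigned int n = 0;

    while (x) {
        x &= x - 1;       /* Clear the lowest set bit. */
        n++;
    }
    return n;
}

/* Appends block 'idx'; blocks must be pushed in strictly increasing order. */
static void
example_miniflow_push(struct example_miniflow *mf, unsigned int idx,
                      uint64_t value)
{
    assert(idx < EXAMPLE_MAX_BLOCKS);
    assert((mf->map >> idx) == 0);   /* No block at or above 'idx' yet. */
    mf->values[example_popcount64(mf->map)] = value;
    mf->map |= UINT64_C(1) << idx;
}

/* Looks up block 'idx'; returns false if it is not present. */
static bool
example_miniflow_get(const struct example_miniflow *mf, unsigned int idx,
                     uint64_t *value)
{
    uint64_t bit;

    assert(idx < EXAMPLE_MAX_BLOCKS);
    bit = UINT64_C(1) << idx;
    if (!(mf->map & bit)) {
        return false;
    }
    *value = mf->values[example_popcount64(mf->map & (bit - 1))];
    return true;
}
```

Absent blocks cost no storage, which is what makes the representation "compressed".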
Re: [ovs-dev] [v9 03/12] dpif-netdev: Add study function to select the best mfex function
On 12 Jul 2021, at 7:51, kumar Amber wrote: > From: Kumar Amber > > The study function runs all the available implementations > of miniflow_extract and makes a choice whose hitmask has > maximum hits and sets the mfex to that function. > > Study can be run at runtime using the following command: > > $ ovs-appctl dpif-netdev/miniflow-parser-set study > > Signed-off-by: Kumar Amber > Co-authored-by: Harry van Haaren > Signed-off-by: Harry van Haaren > > --- > v9: > - fix comments Flavio > v8: > - fix review comments Flavio > v7: > - fix review comments(Eelco) > v5: > - fix review comments(Ian, Flavio, Eelco) > - add Atomic set in study > --- > --- > NEWS | 3 + > lib/automake.mk | 1 + > lib/dpif-netdev-extract-study.c | 136 ++ > lib/dpif-netdev-private-extract.c | 11 +++ > lib/dpif-netdev-private-extract.h | 23 + > 5 files changed, 174 insertions(+) > create mode 100644 lib/dpif-netdev-extract-study.c > > diff --git a/NEWS b/NEWS > index cf254bcfe..4a7b89409 100644 > --- a/NEWS > +++ b/NEWS > @@ -35,6 +35,9 @@ Post-v2.15.0 > * Add command line option to switch between MFEX function pointers. > * Add miniflow extract auto-validator function to compare different > miniflow extract implementations against default implementation. > + * Add study function to miniflow function table which studies packet > + and automatically chooses the best miniflow implementation for that > + traffic. > - ovs-ctl: > * New option '--no-record-hostname' to disable hostname configuration > in ovsdb on startup. 
> diff --git a/lib/automake.mk b/lib/automake.mk > index 53b8abc0f..f4f36325e 100644 > --- a/lib/automake.mk > +++ b/lib/automake.mk > @@ -107,6 +107,7 @@ lib_libopenvswitch_la_SOURCES = \ > lib/dp-packet.h \ > lib/dp-packet.c \ > lib/dpdk.h \ > + lib/dpif-netdev-extract-study.c \ > lib/dpif-netdev-lookup.h \ > lib/dpif-netdev-lookup.c \ > lib/dpif-netdev-lookup-autovalidator.c \ > diff --git a/lib/dpif-netdev-extract-study.c b/lib/dpif-netdev-extract-study.c > new file mode 100644 > index 0..a19759bd9 > --- /dev/null > +++ b/lib/dpif-netdev-extract-study.c > @@ -0,0 +1,136 @@ > +/* > + * Copyright (c) 2021 Intel. > + * > + * Licensed under the Apache License, Version 2.0 (the "License"); > + * you may not use this file except in compliance with the License. > + * You may obtain a copy of the License at: > + * > + * http://www.apache.org/licenses/LICENSE-2.0 > + * > + * Unless required by applicable law or agreed to in writing, software > + * distributed under the License is distributed on an "AS IS" BASIS, > + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. > + * See the License for the specific language governing permissions and > + * limitations under the License. > + */ > + > +#include > +#include > +#include > +#include > + > +#include "dpif-netdev-private-thread.h" > +#include "openvswitch/vlog.h" > +#include "ovs-thread.h" > + > +VLOG_DEFINE_THIS_MODULE(dpif_mfex_extract_study); > + > +static uint32_t mfex_study_pkts_count = 0; > + > +/* Struct to hold miniflow study stats. */ > +struct study_stats { > +uint32_t pkt_count; > +uint32_t impl_hitcount[MFEX_IMPL_MAX]; > +}; > + > +/* Define per thread data to hold the study stats. */ > +DEFINE_PER_THREAD_MALLOCED_DATA(struct study_stats *, study_stats); > + > +/* Allocate per thread PMD pointer space for study_stats. 
*/ > +static inline struct study_stats * > +mfex_study_get_study_stats_ptr(void) > +{ > +struct study_stats *stats = study_stats_get(); > +if (OVS_UNLIKELY(!stats)) { > + stats = xzalloc(sizeof *stats); > + study_stats_set_unsafe(stats); > +} > +return stats; > +} > + > +uint32_t > +mfex_study_traffic(struct dp_packet_batch *packets, > + struct netdev_flow_key *keys, > + uint32_t keys_size, odp_port_t in_port, > + struct dp_netdev_pmd_thread *pmd_handle) > +{ > +uint32_t hitmask = 0; > +uint32_t mask = 0; > +struct dp_netdev_pmd_thread *pmd = pmd_handle; > +struct dpif_miniflow_extract_impl *miniflow_funcs; > +struct study_stats *stats = mfex_study_get_study_stats_ptr(); > +dpif_mfex_impl_info_get(&miniflow_funcs); > + > +/* Run traffic optimized miniflow_extract to collect the hitmask > + * to be compared after certain packets have been hit to choose > + * the best miniflow_extract version for that traffic. > + */ > +for (int i = MFEX_IMPL_START_IDX; i < MFEX_IMPL_MAX; i++) { > +if (!miniflow_funcs[i].available) { > +continue; > +} > + > +hitmask = miniflow_funcs[i].extract_func(packets, keys, keys_size, > + in_port, pmd_handle); > +stats->impl_hitcount[i] += count_1bits(hitmask); > + > +/* If traffic is
Re: [ovs-dev] [PATCH ovn] controller: instrument ovn-controller loop with stopwatch
Bleep bloop. Greetings Lorenzo Bianconi, I am a robot and I have tried out your patch. Thanks for your contribution. I encountered some error that I wasn't expecting. See the details below. git-am: error: Failed to merge in the changes. hint: Use 'git am --show-current-patch' to see the failed patch Patch failed at 0001 controller: instrument ovn-controller loop with stopwatch When you have resolved this problem, run "git am --continue". If you prefer to skip this patch, run "git am --skip" instead. To restore the original branch and stop patching, run "git am --abort". Please check this out. If you feel there has been an error, please email acon...@redhat.com Thanks, 0-day Robot
Re: [ovs-dev] [PATCH] netdev-offload-tc: verify the flower rule installed
On Mon, Jul 12, 2021 at 10:28:15AM +0200, Eelco Chaudron wrote: > > > On 9 Jul 2021, at 20:23, Ilya Maximets wrote: > > > On 7/9/21 10:35 AM, Eelco Chaudron wrote: > >> > >> > >> On 8 Jul 2021, at 22:18, Ilya Maximets wrote: > >> > >>> On 5/17/21 3:20 PM, Eelco Chaudron wrote: > When OVs installs the flower rule, it only checks for the OK from the > kernel. It does not check if the rule requested matches the one > actually programmed. This change will add this check and warns the > user if this is not the case. > > Signed-off-by: Eelco Chaudron > --- > lib/tc.c | 59 > +++ > 1 file changed, 59 insertions(+) > > diff --git a/lib/tc.c b/lib/tc.c > index a27cca2cc..e134f6a06 100644 > --- a/lib/tc.c > +++ b/lib/tc.c > @@ -2979,6 +2979,50 @@ nl_msg_put_flower_options(struct ofpbuf *request, > struct tc_flower *flower) > return 0; > } > > +static bool > +cmp_tc_flower_match_action(const struct tc_flower *a, > + const struct tc_flower *b) > +{ > +if (memcmp(&a->mask, &b->mask, sizeof a->mask)) { > +VLOG_DBG_RL(_rl, "tc flower compare failed mask compare"); > +return false; > +} > + > +/* We can not memcmp() the key as some keys might be set while the > mask > + * is not.*/ > + > +for (int i = 0; i < sizeof a->key; i++) { > +uint8_t mask = ((uint8_t *)&a->mask)[i]; > +uint8_t key_a = ((uint8_t *)&a->key)[i] & mask; > +uint8_t key_b = ((uint8_t *)&b->key)[i] & mask; > + > +if (key_a != key_b) { > +VLOG_DBG_RL(_rl, "tc flower compare failed key > compare at " > +"%d", i); > +return false; > +} > +} > + > +/* Compare the actions.
*/ > +const struct tc_action *action_a = a->actions; > +const struct tc_action *action_b = b->actions; > + > +if (a->action_count != b->action_count) { > +VLOG_DBG_RL(_rl, "tc flower compare failed action length > check"); > +return false; > +} > + > +for (int i = 0; i < a->action_count; i++, action_a++, action_b++) { > +if (memcmp(action_a, action_b, sizeof *action_a)) { > +VLOG_DBG_RL(_rl, "tc flower compare failed action > compare " > +"for %d", i); > +return false; > +} > +} > + > +return true; > +} > + > int > tc_replace_flower(struct tcf_id *id, struct tc_flower *flower) > { > @@ -3010,6 +3054,21 @@ tc_replace_flower(struct tcf_id *id, struct > tc_flower *flower) > > id->prio = tc_get_major(tc->tcm_info); > id->handle = tc->tcm_handle; > + > +if (id->prio != TC_RESERVED_PRIORITY_POLICE) { > +struct tc_flower flower_out; > +struct tcf_id id_out; > +int ret; > + > +ret = parse_netlink_to_tc_flower(reply, &id_out, > &flower_out, > + false); > + > +if (ret || !cmp_tc_flower_match_action(flower, > &flower_out)) { > +VLOG_WARN_RL(_rl, "Kernel flower acknowledgment > does " > + "not match request!\n Set dpif_netlink to > dbg to " > + "see which rule caused this error."); > >>> > >>> So we're only printing the warning and not reverting the change > >>> and not returning an error, right? So, OVS will continue to > >>> work with the incorrect rule installed? > >>> I think we should revert the incorrect change and return the > >>> error, so the flow could be installed to the OVS kernel datapath, > >>> but maybe this is a task for a separate change. > >>> > >>> What do you think? > >> > >> The goal was to make sure we do not break anything, in case there is an > >> existing kernel bug. Unfortunately, we are missing a good set of TC > >> unit tests. > >> > >> With the "warning only" option, we can backport this. And if in the field > >> we do not see any (false) reports, a follow-up patch can do as you > >> suggested. > > > > Makes sense.
I removed '\n' from the warning (these don't look good in the > > log) > > and applied to master. > > Thanks! > > > You and Marcelo are talking about backporting; do you think it makes sense to > > backport to stable branches? > > If it applies cleanly, I would suggest backporting it all the way to 2.13. > Marcelo? I don't know how different the support for 2.13 and 2.15 is, i.e., whether 2.13 only takes critical patches. Anyhow, I'd say yes for 2.15 and best effort for 2.13. :) > > //Eelco
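The patch comment explains that the key cannot simply be memcmp()'d, because bytes outside the mask may legitimately differ between the requested rule and the one parsed back from the kernel. A minimal sketch of that masked comparison in isolation follows; it is an illustration, not the lib/tc.c code, and `struct demo_key` and `masked_match_equal()` are hypothetical stand-ins for the tc_flower key/mask layout.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a tc_flower key or mask (12 bytes, no padding). */
struct demo_key {
    uint8_t dst_mac[6];
    uint16_t vlan_id;
    uint32_t ipv4_dst;
};

/* Returns true if both matches have identical masks and their keys agree on
 * every bit selected by the mask.  Key bits outside the mask are ignored,
 * which is why a plain memcmp() of the keys is not enough. */
static bool
masked_match_equal(const struct demo_key *key_a, const struct demo_key *mask_a,
                   const struct demo_key *key_b, const struct demo_key *mask_b)
{
    assert(key_a && mask_a && key_b && mask_b);

    /* The masks themselves must match exactly. */
    if (memcmp(mask_a, mask_b, sizeof *mask_a)) {
        return false;
    }

    const uint8_t *ka = (const uint8_t *) key_a;
    const uint8_t *kb = (const uint8_t *) key_b;
    const uint8_t *m = (const uint8_t *) mask_a;

    /* Compare byte by byte, keeping only the masked bits of each key. */
    for (size_t i = 0; i < sizeof *key_a; i++) {
        if ((ka[i] & m[i]) != (kb[i] & m[i])) {
            return false;
        }
    }
    return true;
}
```

With a mask that selects only `ipv4_dst`, two keys whose `vlan_id` differs still compare equal here, while a plain memcmp() of the two keys would report a mismatch.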
Re: [ovs-dev] [v9 03/12] dpif-netdev: Add study function to select the best mfex function
On 12 Jul 2021, at 7:51, kumar Amber wrote: > From: Kumar Amber > > The study function runs all the available implementations > of miniflow_extract, chooses the one whose hitmask has the > most hits, and sets the mfex function to that implementation. > > Study can be run at runtime using the following command: > > $ ovs-appctl dpif-netdev/miniflow-parser-set study > > Signed-off-by: Kumar Amber > Co-authored-by: Harry van Haaren > Signed-off-by: Harry van Haaren > > --- > v9: > - fix comments from Flavio > v8: > - fix review comments from Flavio > v7: > - fix review comments (Eelco) > v5: > - fix review comments (Ian, Flavio, Eelco) > - add Atomic set in study > --- Acked-by: Eelco Chaudron ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
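As the quoted commit message describes, study runs every miniflow extract implementation and keeps the one whose hitmask has the most hits. A minimal sketch of that selection step follows, assuming a single batch and a hypothetical function-pointer type `demo_mfex_func` whose return value has bit i set when packet i was parsed. The real study code is not shown in this thread and may accumulate hits over several batches; `demo_parse_udp()`/`demo_parse_tcp()` are toy implementations for illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical signature: returns a bitmask with bit i set when packet i
 * of the batch was parsed successfully by this implementation. */
typedef uint32_t (*demo_mfex_func)(const int *pkt_types, size_t n_pkts);

/* Toy helper: "parse" only packets of one type (1 = IPv4/UDP, 2 = IPv4/TCP). */
static uint32_t
match_type(const int *pkt_types, size_t n_pkts, int type)
{
    uint32_t mask = 0;
    for (size_t i = 0; i < n_pkts; i++) {
        if (pkt_types[i] == type) {
            mask |= UINT32_C(1) << i;
        }
    }
    return mask;
}

static uint32_t
demo_parse_udp(const int *pkt_types, size_t n_pkts)
{
    return match_type(pkt_types, n_pkts, 1);
}

static uint32_t
demo_parse_tcp(const int *pkt_types, size_t n_pkts)
{
    return match_type(pkt_types, n_pkts, 2);
}

/* Portable population count: clear the lowest set bit until none remain. */
static unsigned
count_bits(uint32_t v)
{
    unsigned n = 0;
    while (v) {
        v &= v - 1;
        n++;
    }
    return n;
}

/* Run every implementation on the same batch and return the index of the
 * one with the most hits.  Ties keep the lowest index. */
static size_t
study_best_impl(const demo_mfex_func *impls, size_t n_impls,
                const int *pkt_types, size_t n_pkts)
{
    assert(n_impls > 0 && n_pkts <= 32);

    size_t best = 0;
    unsigned best_hits = 0;
    for (size_t i = 0; i < n_impls; i++) {
        unsigned hits = count_bits(impls[i](pkt_types, n_pkts));
        if (hits > best_hits) {
            best_hits = hits;
            best = i;
        }
    }
    return best;
}
```

For a batch of three TCP packets and one UDP packet, the TCP implementation wins with three hits against one.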
[ovs-dev] [PATCH ovn] controller: instrument ovn-controller loop with stopwatch
Introduce stopwatch instrumentation to the following ovn-controller routines: - commit_ct_zones - bfd_run - patch_run - pinctrl_run - if_status_mgr_update - if_status_mgr_run - ofctrl_seqno_run Signed-off-by: Lorenzo Bianconi --- This patch is based on controller: "Add stopwatch to measure OF update duration" http://patchwork.ozlabs.org/project/ovn/patch/20210706144149.3553-1-dce...@redhat.com/ --- controller/ovn-controller.c | 38 + 1 file changed, 38 insertions(+) diff --git a/controller/ovn-controller.c b/controller/ovn-controller.c index 6a9c25f28..57a0bf393 100644 --- a/controller/ovn-controller.c +++ b/controller/ovn-controller.c @@ -95,6 +95,13 @@ static unixctl_cb_func debug_delay_nb_cfg_report; #define CONTROLLER_LOOP_STOPWATCH_NAME "flow-generation" #define OFCTRL_PUT_STOPWATCH_NAME "flow-installation" +#define PINCTRL_RUN_STOPWATCH_NAME "pinctrl-run" +#define PATCH_RUN_STOPWATCH_NAME "patch-run" +#define CT_ZONE_COMMIT_STOPWATCH_NAME "ct-zone-commit" +#define IF_STATUS_MGR_RUN_STOPWATCH_NAME "if-status-mgr-run" +#define IF_STATUS_MGR_UPDATE_STOPWATCH_NAME "if-status-mgr-update" +#define OFCTRL_SEQNO_RUN_STOPWATCH_NAME "ofctrl-seqno-run" +#define BFD_RUN_STOPWATCH_NAME "bfd-run" #define OVS_NB_CFG_NAME "ovn-nb-cfg" @@ -2847,6 +2854,13 @@ main(int argc, char *argv[]) stopwatch_create(CONTROLLER_LOOP_STOPWATCH_NAME, SW_MS); stopwatch_create(OFCTRL_PUT_STOPWATCH_NAME, SW_MS); +stopwatch_create(PINCTRL_RUN_STOPWATCH_NAME, SW_MS); +stopwatch_create(PATCH_RUN_STOPWATCH_NAME, SW_MS); +stopwatch_create(CT_ZONE_COMMIT_STOPWATCH_NAME, SW_MS); +stopwatch_create(IF_STATUS_MGR_RUN_STOPWATCH_NAME, SW_MS); +stopwatch_create(IF_STATUS_MGR_UPDATE_STOPWATCH_NAME, SW_MS); +stopwatch_create(OFCTRL_SEQNO_RUN_STOPWATCH_NAME, SW_MS); +stopwatch_create(BFD_RUN_STOPWATCH_NAME, SW_MS); /* Define inc-proc-engine nodes. 
*/ ENGINE_NODE_CUSTOM_DATA(ct_zones, "ct_zones"); @@ -3231,23 +3245,33 @@ main(int argc, char *argv[]) ct_zones_data = engine_get_data(_ct_zones); if (ovs_idl_txn) { if (ct_zones_data) { +stopwatch_start(CT_ZONE_COMMIT_STOPWATCH_NAME, +time_msec()); commit_ct_zones(br_int, &ct_zones_data->pending); +stopwatch_stop(CT_ZONE_COMMIT_STOPWATCH_NAME, + time_msec()); } +stopwatch_start(BFD_RUN_STOPWATCH_NAME, time_msec()); bfd_run(ovsrec_interface_table_get(ovs_idl_loop.idl), br_int, chassis, sbrec_ha_chassis_group_table_get( ovnsb_idl_loop.idl), sbrec_sb_global_table_get(ovnsb_idl_loop.idl)); +stopwatch_stop(BFD_RUN_STOPWATCH_NAME, time_msec()); } runtime_data = engine_get_data(_runtime_data); if (runtime_data) { +stopwatch_start(PATCH_RUN_STOPWATCH_NAME, time_msec()); patch_run(ovs_idl_txn, sbrec_port_binding_by_type, ovsrec_bridge_table_get(ovs_idl_loop.idl), ovsrec_open_vswitch_table_get(ovs_idl_loop.idl), ovsrec_port_table_get(ovs_idl_loop.idl), br_int, chassis, &runtime_data->local_datapaths); +stopwatch_stop(PATCH_RUN_STOPWATCH_NAME, time_msec()); +stopwatch_start(PINCTRL_RUN_STOPWATCH_NAME, +time_msec()); pinctrl_run(ovnsb_idl_txn, sbrec_datapath_binding_by_key, sbrec_port_binding_by_datapath, @@ -3266,6 +3290,8 @@ main(int argc, char *argv[]) br_int, chassis, &runtime_data->local_datapaths, &runtime_data->active_tunnels); +stopwatch_stop(PINCTRL_RUN_STOPWATCH_NAME, + time_msec()); /* Updating monitor conditions if runtime data or * logical datapath goups changed. */ if (engine_node_changed(_runtime_data) @@ -3288,7 +3314,11 @@ main(int argc, char *argv[]) struct local_binding_data *binding_data = runtime_data ? &runtime_data->lbinding_data : NULL; +stopwatch_start(IF_STATUS_MGR_UPDATE_STOPWATCH_NAME, +time_msec()); if_status_mgr_update(if_mgr, binding_data); +
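Each instrumented routine in the patch is bracketed by stopwatch_start() and stopwatch_stop() calls for the same name, both fed the current time from time_msec(). The sketch below is a hypothetical, single-threaded model of what one such start/stop pair records per sample; `struct demo_stopwatch` and its functions are illustrative only and are not the OVS stopwatch API, which is more elaborate than this.

```c
#include <assert.h>

/* Hypothetical stand-in for one named stopwatch.  As in the patch, the
 * caller supplies the current time in milliseconds. */
struct demo_stopwatch {
    long long start_ms;             /* Start of the running sample, or -1. */
    unsigned long long n_samples;   /* Completed start/stop pairs. */
    long long total_ms;             /* Sum of all sample durations. */
    long long max_ms;               /* Longest sample seen so far. */
};

static void
demo_sw_init(struct demo_stopwatch *sw)
{
    sw->start_ms = -1;
    sw->n_samples = 0;
    sw->total_ms = 0;
    sw->max_ms = 0;
}

static void
demo_sw_start(struct demo_stopwatch *sw, long long now_ms)
{
    sw->start_ms = now_ms;
}

/* Record one sample: the time elapsed since the matching start. */
static void
demo_sw_stop(struct demo_stopwatch *sw, long long now_ms)
{
    assert(sw->start_ms >= 0);      /* stop() without start() is a bug. */

    long long elapsed = now_ms - sw->start_ms;
    sw->n_samples++;
    sw->total_ms += elapsed;
    if (elapsed > sw->max_ms) {
        sw->max_ms = elapsed;
    }
    sw->start_ms = -1;
}
```

Keeping one stopwatch per routine, as the patch does with its separate names, lets a slow pinctrl_run() be told apart from a slow bfd_run() instead of both disappearing into the overall loop time.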
Re: [ovs-dev] [PATCH ovs v1] tunnel: Remove the padding from packet when encapsulating.
On Mon, Jun 28, 2021 at 10:07 AM Tonghao Zhang wrote: > On Thu, Apr 1, 2021 at 9:34 PM Tonghao Zhang > wrote: > > > > On Mon, Dec 14, 2020 at 11:11 AM wrote: > > > > > > From: Tonghao Zhang > > > > > > The root cause is that older versions of openvswitch don't > > > remove the padding from the packet before L3+ conntrack processing, > > > so the packets are dropped in the Linux kernel stack. The patch [1] > > > fixes the issue. As a quick workaround, we fix this issue on the > > > gateway, which runs ovs-dpdk. The padding should be removed because tunnel size > > > + inner size > 64B. For more details, see [1]. > > > > > > [1] - > https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git/commit/?id=9382fe71c0058465e942a633869629929102843d > > > Signed-off-by: Tonghao Zhang > > ping :) > friendly ping. Hi Ilya, can you help me review this patch? > > > --- > > > lib/netdev-native-tnl.c | 4 > > > 1 file changed, 4 insertions(+) > > > > > > diff --git a/lib/netdev-native-tnl.c b/lib/netdev-native-tnl.c > > > index b89dfdd52..acfbb13c4 100644 > > > --- a/lib/netdev-native-tnl.c > > > +++ b/lib/netdev-native-tnl.c > > > @@ -149,11 +149,15 @@ void * > > > netdev_tnl_push_ip_header(struct dp_packet *packet, > > > const void *header, int size, int *ip_tot_size) > > > { > > > +int padding = dp_packet_l2_pad_size(packet); > > > struct eth_header *eth; > > > struct ip_header *ip; > > > struct ovs_16aligned_ip6_hdr *ip6; > > > > > > eth = dp_packet_push_uninit(packet, size); > > > +if (padding) { > > > +dp_packet_set_size(packet, dp_packet_size(packet) - padding); > > > +} > > > *ip_tot_size = dp_packet_size(packet) - sizeof (struct eth_header); > > > > > > memcpy(eth, header, size); > > > -- > > > 2.14.1 > > > > > > > > > -- > > Best regards, Tonghao > > > > -- > Best regards, Tonghao > -- Best regards, Tonghao
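The arithmetic behind the fix: the L2 pad size is sampled before the tunnel header is pushed, and after the push the frame is shortened by that many bytes, so the outer IP total length no longer counts the inner frame's padding. A minimal sketch follows, with a hypothetical `struct demo_packet` in place of struct dp_packet, assuming a 14-byte outer Ethernet header and, for the worked numbers, a 50-byte VXLAN-over-IPv4 header (14 + 20 + 8 + 8).

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_ETH_HEADER_LEN 14

/* Hypothetical packet model: frame length plus the number of trailing L2
 * padding bytes (e.g. added to reach the minimum Ethernet frame size). */
struct demo_packet {
    size_t size;        /* Bytes currently in the frame. */
    size_t l2_pad;      /* Trailing padding bytes after the L3 payload. */
};

/* Push a tunnel header of 'hdr_len' bytes, drop the inner frame's L2
 * padding, and return the outer IP total length, following the same
 * order of operations as the quoted netdev_tnl_push_ip_header() change. */
static size_t
demo_push_tunnel_header(struct demo_packet *pkt, size_t hdr_len)
{
    size_t padding = pkt->l2_pad;   /* Sampled before the push. */

    assert(padding <= pkt->size);
    pkt->size += hdr_len;           /* Models dp_packet_push_uninit(). */
    if (padding) {
        pkt->size -= padding;       /* Trim the now-useless pad bytes. */
        pkt->l2_pad = 0;
    }
    return pkt->size - DEMO_ETH_HEADER_LEN;
}
```

For a 46-byte inner frame padded to 60 bytes (14 pad bytes), pushing a 50-byte header gives a 96-byte frame and an outer IP total length of 82; without the trim, both would be 14 bytes larger (110 and 96).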
Re: [ovs-dev] [v9 01/12] dpif-netdev: Add command line and function pointer for miniflow extract
See some comments below… For this patch series, I'm only looking at the diff from v6..v9, not doing a full review. I will do basic compilation and some tests at the end. Cheers, Eelco On 12 Jul 2021, at 7:51, kumar Amber wrote: > From: Kumar Amber > > This patch introduces the MFEX function pointers, which allow > the user to switch between the different miniflow extract implementations > provided by OVS, including ones optimized for specific CPU ISAs. > > The user can query the miniflow extract variants available > on the CPU with the following command: > > $ ovs-appctl dpif-netdev/miniflow-parser-get > > Similarly, a user can set the miniflow implementation with the following > command: > > $ ovs-appctl dpif-netdev/miniflow-parser-set name > > This gives the user better performance and the flexibility to choose > the miniflow implementation that fits their needs. > > Signed-off-by: Kumar Amber > Co-authored-by: Harry van Haaren > Signed-off-by: Harry van Haaren > > --- > v9: > - fix review comments from Flavio > v7: > - fix review comments (Eelco, Flavio) > v5: > - fix review comments (Ian, Flavio, Eelco) > - add enum to hold mfex indexes > - add new get and set implementations > - add Atomic set and get > --- > --- > NEWS | 1 + > lib/automake.mk | 2 + > lib/dpif-netdev-avx512.c | 31 +- > lib/dpif-netdev-private-extract.c | 162 ++ > lib/dpif-netdev-private-extract.h | 111 > lib/dpif-netdev-private-thread.h | 8 ++ > lib/dpif-netdev.c | 105 +++ > 7 files changed, 416 insertions(+), 4 deletions(-) > create mode 100644 lib/dpif-netdev-private-extract.c > create mode 100644 lib/dpif-netdev-private-extract.h > > diff --git a/NEWS b/NEWS > index 6cdccc715..b0f08e96d 100644 > --- a/NEWS > +++ b/NEWS > @@ -32,6 +32,7 @@ Post-v2.15.0 > * Enable the AVX512 DPCLS implementation to use VPOPCNT instruction if the > CPU supports it. This enhances performance by using the native vpopcount > instructions, instead of the emulated version of vpopcount.
> + * Add command line option to switch between MFEX function pointers. > - ovs-ctl: > * New option '--no-record-hostname' to disable hostname configuration > in ovsdb on startup. > diff --git a/lib/automake.mk b/lib/automake.mk > index 3c9523c1a..53b8abc0f 100644 > --- a/lib/automake.mk > +++ b/lib/automake.mk > @@ -118,6 +118,8 @@ lib_libopenvswitch_la_SOURCES = \ > lib/dpif-netdev-private-dpcls.h \ > lib/dpif-netdev-private-dpif.c \ > lib/dpif-netdev-private-dpif.h \ > + lib/dpif-netdev-private-extract.c \ > + lib/dpif-netdev-private-extract.h \ > lib/dpif-netdev-private-flow.h \ > lib/dpif-netdev-private-thread.h \ > lib/dpif-netdev-private.h \ > diff --git a/lib/dpif-netdev-avx512.c b/lib/dpif-netdev-avx512.c > index 6f9aa8284..7772b7abf 100644 > --- a/lib/dpif-netdev-avx512.c > +++ b/lib/dpif-netdev-avx512.c > @@ -149,6 +149,15 @@ dp_netdev_input_outer_avx512(struct dp_netdev_pmd_thread *pmd, > * // do all processing (HWOL->MFEX->EMC->SMC) > * } > */ > + > +/* Do a batch minfilow extract into keys. */ > +uint32_t mf_mask = 0; > +miniflow_extract_func mfex_func; > +atomic_read_relaxed(&pmd->miniflow_extract_opt, &mfex_func); > +if (mfex_func) { > +mf_mask = mfex_func(packets, keys, batch_size, in_port, pmd); > +} > + > uint32_t lookup_pkts_bitmask = (1ULL << batch_size) - 1; > uint32_t iter = lookup_pkts_bitmask; > while (iter) { > @@ -167,6 +176,13 @@ dp_netdev_input_outer_avx512(struct dp_netdev_pmd_thread *pmd, > pkt_metadata_init(&packet->md, in_port); > > struct dp_netdev_flow *f = NULL; > +struct netdev_flow_key *key = &keys[i]; > + > +/* Check the minfiflow mask to see if the packet was correctly > + * classifed by vector mfex else do a scalar miniflow extract > + * for that packet. > + */ > +bool mfex_hit = !!(mf_mask & (1 << i)); > > /* Check for a partial hardware offload match.
*/ > if (hwol_enabled) { > @@ -177,7 +193,13 @@ dp_netdev_input_outer_avx512(struct dp_netdev_pmd_thread *pmd, > } > if (f) { > rules[i] = &f->cr; > -pkt_meta[i].tcp_flags = parse_tcp_flags(packet); > +/* If AVX512 MFEX already classified the packet, use it. */ > +if (mfex_hit) { > +pkt_meta[i].tcp_flags = miniflow_get_tcp_flags(&key->mf); > +} else { > +pkt_meta[i].tcp_flags = parse_tcp_flags(packet); > +} > + > pkt_meta[i].bytes = dp_packet_size(packet); > phwol_hits++; > hwol_emc_smc_hitmask |= (1 << i); > @@ -185,9 +207,10 @@ dp_netdev_input_outer_avx512(struct dp_netdev_pmd_thread *pmd,
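The batch flow in this hunk calls the optimized extractor once per batch, gets back a bitmask with one bit per packet, and falls back to the scalar miniflow_extract() for any packet whose bit is clear. A self-contained sketch of that control flow follows; the packet/key types, `demo_mfex_type1()` (a toy extractor that only recognizes one packet type) and `demo_scalar_extract()` are all hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical packet and key types for illustration only. */
struct demo_pkt { int type; };
struct demo_key { int parsed_type; bool from_vector; };

/* Hypothetical optimized extractor signature: fills keys[i] and sets bit i
 * in the returned mask for each packet it could handle. */
typedef uint32_t (*demo_mfex_fn)(const struct demo_pkt *pkts,
                                 struct demo_key *keys, size_t n);

/* Toy "optimized" extractor that only understands packet type 1. */
static uint32_t
demo_mfex_type1(const struct demo_pkt *pkts, struct demo_key *keys, size_t n)
{
    uint32_t hits = 0;
    for (size_t i = 0; i < n; i++) {
        if (pkts[i].type == 1) {
            keys[i].parsed_type = 1;
            keys[i].from_vector = true;
            hits |= UINT32_C(1) << i;
        }
    }
    return hits;
}

/* Scalar fallback that handles every packet. */
static void
demo_scalar_extract(const struct demo_pkt *pkt, struct demo_key *key)
{
    key->parsed_type = pkt->type;
    key->from_vector = false;
}

/* Call the optimized extractor once (if one is configured), then check each
 * packet's bit and fall back to the scalar path on a miss.  Returns the
 * number of packets handled by the optimized path. */
static size_t
demo_extract_batch(demo_mfex_fn mfex, const struct demo_pkt *pkts,
                   struct demo_key *keys, size_t n)
{
    assert(n <= 32);    /* One bit per packet in a 32-bit mask. */

    uint32_t mf_mask = mfex ? mfex(pkts, keys, n) : 0;
    size_t n_hits = 0;

    for (size_t i = 0; i < n; i++) {
        bool mfex_hit = !!(mf_mask & (UINT32_C(1) << i));
        if (mfex_hit) {
            n_hits++;
        } else {
            demo_scalar_extract(&pkts[i], &keys[i]);
        }
    }
    return n_hits;
}
```

The sketch builds each bit with `UINT32_C(1) << i` so that shifting into bit 31 stays well-defined for a full 32-packet batch; shifting a plain signed `1` into the sign bit is undefined behavior in C.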
Re: [ovs-dev] [PATCH 2/2] netdev-offload-dpdk: Fix vxlan vni cast-align warnings
> -Original Message- > From: Eli Britstein > Sent: Sunday, July 11, 2021 6:15 AM > To: d...@openvswitch.org; Ilya Maximets ; Van Haaren, > Harry > Cc: Gaetan Rivet ; Majd Dibbiny ; Eli > Britstein > Subject: [PATCH 2/2] netdev-offload-dpdk: Fix vxlan vni cast-align warnings > > Compiling with -Werror and -Wcast-align produces errors like: > > lib/netdev-offload-dpdk.c: In function 'dump_flow_pattern': > lib/netdev-offload-dpdk.c:385:38: error: cast increases required alignment of target type [-Werror=cast-align] > 385 |ntohl(*(ovs_be32 *) vxlan_spec->vni) >> 8, > | ^ > > Fix them. > > Reported-by: Harry Van Haaren > Fixes: 4e432d6f8128 ("netdev-offload-dpdk: Support tnl/push using vxlan encap attribute.") > Fixes: e098c2f966cb ("netdev-dpdk-offload: Add vxlan pattern matching function.") > Signed-off-by: Eli Britstein Thanks Eli. I tested compilation, and the cast-align issue is resolved on the master branch with the patches in this series applied. I cannot test functionality here, so this is compile-tested only. Series-Tested-by: Harry van Haaren
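The compiler error comes from casting the VNI byte array to `ovs_be32 *`, which raises the required alignment. This thread does not show Eli's actual change; the sketch below is one portable alternative, not the patch itself, under the assumption that `vni` is a three-byte array in network byte order. The helper names are hypothetical. Assembling the bytes individually avoids the cast and yields the same value as the quoted `ntohl(...) >> 8` expression, which reads four bytes and shifts the fourth one out.

```c
#include <assert.h>
#include <stdint.h>

/* Decode a 24-bit VXLAN Network Identifier stored as three bytes in
 * network (big-endian) order, without casting to a wider pointer type. */
static uint32_t
demo_vni_from_bytes(const uint8_t vni[3])
{
    return ((uint32_t) vni[0] << 16) | ((uint32_t) vni[1] << 8) | vni[2];
}

/* Inverse: store a VNI (must fit in 24 bits) as three big-endian bytes. */
static void
demo_vni_to_bytes(uint32_t vni, uint8_t out[3])
{
    assert(vni <= 0xffffff);
    out[0] = (uint8_t) (vni >> 16);
    out[1] = (uint8_t) (vni >> 8);
    out[2] = (uint8_t) vni;
}
```

Byte-wise access has no alignment requirement, so it is safe on every architecture and never touches memory beyond the three-byte field.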
Re: [ovs-dev] [PATCH] netdev-offload-tc: verify the flower rule installed
On 9 Jul 2021, at 20:23, Ilya Maximets wrote: > On 7/9/21 10:35 AM, Eelco Chaudron wrote: >> >> >> On 8 Jul 2021, at 22:18, Ilya Maximets wrote: >> >>> On 5/17/21 3:20 PM, Eelco Chaudron wrote: When OVS installs the flower rule, it only checks for the OK from the kernel. It does not check whether the rule requested matches the one actually programmed. This change adds this check and warns the user if this is not the case. Signed-off-by: Eelco Chaudron --- lib/tc.c | 59 +++ 1 file changed, 59 insertions(+) diff --git a/lib/tc.c b/lib/tc.c index a27cca2cc..e134f6a06 100644 --- a/lib/tc.c +++ b/lib/tc.c @@ -2979,6 +2979,50 @@ nl_msg_put_flower_options(struct ofpbuf *request, struct tc_flower *flower) return 0; } +static bool +cmp_tc_flower_match_action(const struct tc_flower *a, + const struct tc_flower *b) +{ +if (memcmp(&a->mask, &b->mask, sizeof a->mask)) { +VLOG_DBG_RL(_rl, "tc flower compare failed mask compare"); +return false; +} + +/* We can not memcmp() the key as some keys might be set while the mask + * is not.*/ + +for (int i = 0; i < sizeof a->key; i++) { +uint8_t mask = ((uint8_t *)&a->mask)[i]; +uint8_t key_a = ((uint8_t *)&a->key)[i] & mask; +uint8_t key_b = ((uint8_t *)&b->key)[i] & mask; + +if (key_a != key_b) { +VLOG_DBG_RL(_rl, "tc flower compare failed key compare at " +"%d", i); +return false; +} +} + +/* Compare the actions.
*/ +const struct tc_action *action_a = a->actions; +const struct tc_action *action_b = b->actions; + +if (a->action_count != b->action_count) { +VLOG_DBG_RL(_rl, "tc flower compare failed action length check"); +return false; +} + +for (int i = 0; i < a->action_count; i++, action_a++, action_b++) { +if (memcmp(action_a, action_b, sizeof *action_a)) { +VLOG_DBG_RL(_rl, "tc flower compare failed action compare " +"for %d", i); +return false; +} +} + +return true; +} + int tc_replace_flower(struct tcf_id *id, struct tc_flower *flower) { @@ -3010,6 +3054,21 @@ tc_replace_flower(struct tcf_id *id, struct tc_flower *flower) id->prio = tc_get_major(tc->tcm_info); id->handle = tc->tcm_handle; + +if (id->prio != TC_RESERVED_PRIORITY_POLICE) { +struct tc_flower flower_out; +struct tcf_id id_out; +int ret; + +ret = parse_netlink_to_tc_flower(reply, &id_out, &flower_out, + false); + +if (ret || !cmp_tc_flower_match_action(flower, &flower_out)) { +VLOG_WARN_RL(_rl, "Kernel flower acknowledgment does " + "not match request!\n Set dpif_netlink to dbg to " + "see which rule caused this error."); >>> >>> So we're only printing the warning and not reverting the change >>> and not returning an error, right? So OVS will continue to >>> work with the incorrect rule installed? >>> I think we should revert the incorrect change and return the >>> error, so the flow could be installed to the OVS kernel datapath, >>> but maybe this is a task for a separate change. >>> >>> What do you think? >> >> The goal was to make sure we do not break anything in case there is an existing kernel bug, since, unfortunately, we are missing a good set of TC unit tests. >> >> With the "warning only" option, we can backport this. And if in the field we do not see any (false) reports, a follow-up patch can do as you suggested. > > Makes sense. I removed '\n' from the warning (these don't look good in the > log) > and applied to master. Thanks!
> You and Marcelo are talking about backporting; do you think it makes sense to > backport to stable branches? If it applies cleanly, I would suggest backporting it all the way to 2.13. Marcelo? //Eelco
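Besides the masked key check, the quoted patch compares the action lists: the counts first, then each entry with memcmp(). A minimal sketch of that step follows, with a hypothetical `struct demo_action` standing in for struct tc_action. Because memcmp() also compares any padding bytes inside a struct, the usage zero-initializes both arrays before filling them in.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical action record standing in for struct tc_action. */
struct demo_action {
    int type;
    unsigned int out_port;
};

/* Returns true when both lists have the same length and every entry is
 * byte-identical, mirroring the action check in the quoted patch. */
static bool
demo_actions_equal(const struct demo_action *a, size_t n_a,
                   const struct demo_action *b, size_t n_b)
{
    /* Cheap length check first, like the action_count comparison. */
    if (n_a != n_b) {
        return false;
    }
    assert(n_a == 0 || (a && b));

    for (size_t i = 0; i < n_a; i++) {
        if (memcmp(&a[i], &b[i], sizeof a[i])) {
            return false;
        }
    }
    return true;
}
```

Comparing whole entries with memcmp() keeps the check short, at the cost of treating any difference in unused or padding bytes as a mismatch; that is one reason a warning-only first version, as discussed in the thread, is a cautious way to roll out such a check.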