Re: [ovs-dev] [PATCH v3 4/7] dpcls: enable cpu feature detection

2020-06-16 Thread William Tu
btw, remember to add "." at the end of the commit title.
so
"dpcls: enable cpu feature detection."

On Wed, Jun 10, 2020 at 3:47 AM Harry van Haaren
 wrote:
>
> This commit implements a method to retrieve the CPU ISA capabilities.
> These ISA capabilities can be used in OVS to select a function
> implementation that uses the best ISA available on the CPU being used.
>
> Signed-off-by: Harry van Haaren 
> ---
>  lib/dpdk-stub.c | 13 +
>  lib/dpdk.c  | 27 +++
>  lib/dpdk.h  |  2 ++
>  3 files changed, 42 insertions(+)
>
> diff --git a/lib/dpdk-stub.c b/lib/dpdk-stub.c
> index c332c217c..9935f3d2b 100644
> --- a/lib/dpdk-stub.c
> +++ b/lib/dpdk-stub.c
> @@ -79,6 +79,19 @@ print_dpdk_version(void)
>  {
>  }
>
> +int
> +dpdk_get_cpu_has_isa(const char *arch OVS_UNUSED,
> + const char *feature OVS_UNUSED)
> +{
> +static struct ovsthread_once once = OVSTHREAD_ONCE_INITIALIZER;
> +if (ovsthread_once_start(&once)) {
> +VLOG_ERR("DPDK not supported in this version of Open vSwitch, "
> + "cannot use CPU flag based optimizations");
> +ovsthread_once_done(&once);
> +}
> +return 0;
> +}
> +
>  void
>  dpdk_status(const struct ovsrec_open_vswitch *cfg)
>  {
> diff --git a/lib/dpdk.c b/lib/dpdk.c
> index 31450d470..3bea65859 100644
> --- a/lib/dpdk.c
> +++ b/lib/dpdk.c
> @@ -22,6 +22,7 @@
>  #include 
>  #include 
>
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -513,6 +514,32 @@ print_dpdk_version(void)
>  puts(rte_version());
>  }
>
> +#define CHECK_CPU_FEATURE(feature, name_str, RTE_CPUFLAG)   \
> +do {\
> +if (strncmp(feature, name_str, strlen(name_str)) == 0) {\
> +int has_isa = rte_cpu_get_flag_enabled(RTE_CPUFLAG);\
> +VLOG_DBG("CPU flag %s, available %s\n", name_str,   \
> +  has_isa ? "yes" : "no");  \
> +return has_isa; \
> +}   \
> +} while (0)
> +
> +int
> +dpdk_get_cpu_has_isa(const char *arch, const char *feature)
> +{
> +/* Ensure Arch is x86_64 */
> +if (strncmp(arch, "x86_64", 6) != 0) {
> +return 0;
> +}
> +
> +CHECK_CPU_FEATURE(feature, "avx512f", RTE_CPUFLAG_AVX512F);
> +CHECK_CPU_FEATURE(feature, "bmi2", RTE_CPUFLAG_BMI2);

why are "avx512f" and "bmi2" hard-coded here?
I thought this function "dpdk_get_cpu_has_isa" allows you to check any
cpu feature.

Regards,
William


> +
> +VLOG_WARN("Unknown CPU arch,feature: %s,%s. Returning not supported.\n",
> +  arch, feature);
> +return 0;
> +}
> +
>  void
>  dpdk_status(const struct ovsrec_open_vswitch *cfg)
>  {
> diff --git a/lib/dpdk.h b/lib/dpdk.h
> index 736a64279..818dfcbba 100644
> --- a/lib/dpdk.h
> +++ b/lib/dpdk.h
> @@ -44,4 +44,6 @@ bool dpdk_per_port_memory(void);
>  bool dpdk_available(void);
>  void print_dpdk_version(void);
>  void dpdk_status(const struct ovsrec_open_vswitch *);
> +int dpdk_get_cpu_has_isa(const char * arch, const char *feature);
> +
>  #endif /* dpdk.h */
> --
> 2.17.1
>
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v3 6/7] dpif-lookup: add avx512 gather implementation

2020-06-16 Thread William Tu
On Wed, Jun 10, 2020 at 3:47 AM Harry van Haaren
 wrote:
>
> This commit adds an AVX-512 dpcls lookup implementation.
> It uses the AVX-512 SIMD ISA to perform multiple miniflow
> operations in parallel.
>
> To run this implementation, the "avx512f" and "bmi2" ISAs are
> required. These ISA checks are performed at runtime while
> probing the subtable implementation. If a CPU does not provide
> both "avx512f" and "bmi2", then this code does not execute.
>
> The avx512 code is built as a seperate static library, with added
> CFLAGS to enable the required ISA features. By building only this
> static library with avx512 enabled, it is ensured that the main OVS
> core library is *not* using avx512, and that OVS continues to run
> as before on CPUs that do not support avx512.
>
> The approach taken in this implementation is to use the
> gather instruction to access the packet miniflow, allowing
> any miniflow blocks to be loaded into an AVX-512 register.
> This maximises the usefulness of the register, and hence this
> implementation handles any subtable with up to miniflow 8 bits.
>
> Note that specialization of these avx512 lookup routines
> still provides performance value, as the hashing of the
> resulting data is performed in scalar code, and compile-time
> loop unrolling occurs when specialized to miniflow bits.
>
> Signed-off-by: Harry van Haaren 
>
> ---
>
> v3:
> - Improve function name for _any subtable lookup
> - Use "" include not <> for immintrin.h
> - Add checks for SSE42 instructions in core OVS for CRC32 based hashing
>   If not available, disable AVX512 lookup implementation as it requires
>   uses CRC32 for hashing, and the hashing algorithm must match core OVS.
>   Issue a #warning when building x86_64 without SSE42 for core OVS.

Where did you add this warning?

> - Rework ovs_asserts() into function selection time check
> - Add #define for magic number 8, number of u64 blocks in AVX512 register
> - Add #if CHECKER around AVX code, sparse doesn't like checking it
> - Remove #warning if SSE42 isn't available. There is now no message if
>   the AVX512 routines are not being compiled into the binary, however
>   the "subtable-lookup-get" command will not return it in the list.
>
> hvh: comment #warning for crc32 sse42 isa
>
> Signed-off-by: Harry van Haaren 
>
> hvh: avx512 add #if __CHECKER__
>
> Signed-off-by: Harry van Haaren 
> ---
>  lib/automake.mk|  16 ++
>  lib/dpif-netdev-lookup-avx512-gather.c | 265 +
>  lib/dpif-netdev-lookup.c   |  15 ++
>  lib/dpif-netdev-lookup.h   |   7 +
>  lib/dpif-netdev.c  |   4 +
>  5 files changed, 307 insertions(+)
>  create mode 100644 lib/dpif-netdev-lookup-avx512-gather.c
>
> diff --git a/lib/automake.mk b/lib/automake.mk
> index 19e454c4b..d8a05b384 100644
> --- a/lib/automake.mk
> +++ b/lib/automake.mk
> @@ -8,13 +8,16 @@
>  # libopenvswitch.la is the library to link against for binaries like 
> vswitchd.
>  # The code itself is built as two seperate static libraries;
>  # - core: Core files, always compiled with distro provided CFLAGS
> +# - lookupavx512: ISA optimized routines that require CPUID checks at runtime
>  lib_LTLIBRARIES += lib/libopenvswitch.la
>  lib_LTLIBRARIES += lib/libopenvswitchcore.la
> +lib_LTLIBRARIES += lib/libopenvswitchlookupavx512.la
>
>  # Dummy library to link against doesn't have any sources, but does
>  # depend on libopenvswitchcore static library
>  lib_libopenvswitch_la_SOURCES =
>  lib_libopenvswitch_la_LIBADD = lib/libopenvswitchcore.la
> +lib_libopenvswitch_la_LIBADD += lib/libopenvswitchlookupavx512.la
>
>  # Dummy library continues to depend on external libraries as before
>  lib_libopenvswitch_la_LIBADD += $(SSL_LIBS)
> @@ -31,6 +34,19 @@ lib_libopenvswitch_la_LDFLAGS = \
>  $(lib_libopenvswitchcore_la_LIBS) \
>  $(AM_LDFLAGS)
>
> +
> +# Build lookupavx512 library with extra CFLAGS enabled. This allows the
> +# compiler to use the ISA features required for the ISA optimized code-paths.
> +lib_libopenvswitchlookupavx512_la_CFLAGS = \
> +   -mavx512f \
> +   -mavx512bw \
> +   -mavx512dq \
> +   -mbmi2 \
> +   $(AM_CFLAGS)
> +lib_libopenvswitchlookupavx512_la_SOURCES = \
> +   lib/dpif-netdev-lookup-avx512-gather.c
> +
the robot is showing error
gcc: error: unrecognized command line option '-mavx512f'
looks like the older version of gcc doesn't have the option.
I don't know a better way to check gcc flags support, maybe add
a check at acinclude.m4, the _OVS_CHECK_CC_OPTION?


William
> +
>  # Build core vswitch libraries as before
>  lib_libopenvswitchcore_la_SOURCES = \
> lib/aes128.c \
> diff --git a/lib/dpif-netdev-lookup-avx512-gather.c 
> b/lib/dpif-netdev-lookup-avx512-gather.c
> new file mode 100644
> index 0..754cd0e3c
> --- /dev/null
> +++ b/lib/dpif-netdev-lookup-avx512-gather.c
> @@ -0,0 +1,265 @@
> +/*
> + * Copyright (c) 2020, Intel Corperation.
> + *
> + *

Re: [ovs-dev] [PATCH v3 5/7] lib/automake: split build multiple static library

2020-06-16 Thread William Tu
On Wed, Jun 10, 2020 at 3:47 AM Harry van Haaren
 wrote:
>
> This commit changes the way the core lib/* code is built.
> Before this commit, the lib/libopenvswitch_la target contains
> all the code, and is directly linked against by executable targets
> like ovs-vswitchd, ovsdb, tests etc.
>
> This commit splits the building of the code and the linking to
> that code into two seperate static libraries, providing more
> flexibility in building of each individual static library.
>
> A new library lib/libopenvswitchcore_la represents the lib/*
> code. The previous library lib/libopenvswitch_la remains intact,
> and is used by executable targets to link against. The core
> library is listed as a dependency for the linked against library.
>
> This approach requires no changes for executable targets, and
> provides the required flexibility for future ISA optimized static
> libraries to be built individually, and later combined into a
> single static library.
>
> Signed-off-by: Harry van Haaren 
> ---
I tested it and it looks good to me.

William
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v3 7/7] docs/dpdk/bridge: add datapath performance section

2020-06-16 Thread William Tu
On Wed, Jun 10, 2020 at 3:47 AM Harry van Haaren
 wrote:
>
> This commit adds a section to the dpdk/bridge.rst netdev documentation,
> detailing the added DPCLS functionality. The newely added commands are
typo: newly

> documented, and sample output is provided.
>
> Signed-off-by: Harry van Haaren 
> ---
>  Documentation/topics/dpdk/bridge.rst | 63 
>  1 file changed, 63 insertions(+)
>
> diff --git a/Documentation/topics/dpdk/bridge.rst 
> b/Documentation/topics/dpdk/bridge.rst
> index f0ef42ecc..2ada76571 100644
> --- a/Documentation/topics/dpdk/bridge.rst
> +++ b/Documentation/topics/dpdk/bridge.rst
> @@ -137,3 +137,66 @@ currently turned off by default.
>  To turn on SMC::
>
>  $ ovs-vsctl --no-wait set Open_vSwitch . other_config:smc-enable=true
> +
> +Datapath Classifier Performance
> +---
> +
> +The datapath classifier (dpcls) performs wildcard rule matching, a compute
> +intensive process of matching a packet ``miniflow`` to a rule ``miniflow``. 
> The
> +code that does this compute work impacts datapath performance, and optimizing
> +it can provide higher switching performance.
> +
> +Modern CPUs provide extensive SIMD instructions which can be used to get 
> higher
> +performance. The CPU OVS is being deployed on must be capable of running 
> these
> +SIMD instructions in order to take advantage of the performance benefits.
> +In OVS v2.14 runtime CPU detection was introduced to enable identifing if 
> these
typo: identifying

Thanks
William
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH] dpif-netdev: Add miniflow bits to dump-flows.

2020-06-08 Thread William Tu
On Mon, Jun 8, 2020 at 7:01 AM Ilya Maximets  wrote:
>
> On 5/14/20 4:11 PM, William Tu wrote:
> > The 'dpctl/dump-flows -m' only shows the number of 1-bit in the
> > miniflow map, the patch outputs additional miniflow bits after it.
> > The format will be
> >   dp-extra-info:miniflow_bits(count_1bit(unit0):unit0,
> >   count_1bit(unit1):unit1)
> > Example:
> >   dp-extra-info:miniflow_bits(4:0x30c0,1:0x400)
> >
> > By searching the unique miniflow bits, we know the number of subtables,
> > and for earch subtables, the fields it matches on.
>
> Hi.
>
> Beside the curiosity what is the purpose of printing this information?
> How can it be used?
>
So from the bitmap we can know which field in the 'struct flow' this
subtable is matching on. And collecting all the bitmaps from dpctl/dump-flow,
we can know which fields are used to match more frequently than others.

William
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH ovs] doc: Fix a typo in ovs-actions man page.

2020-06-01 Thread William Tu
On Fri, May 29, 2020 at 2:39 AM  wrote:
>
> From: Numan Siddique 
>
> Signed-off-by: Numan Siddique 
> ---
Acked-by: William Tu 
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v2 5/5] dpif-lookup: add avx512 gather implementation

2020-05-29 Thread William Tu
On Fri, May 29, 2020 at 4:47 AM Van Haaren, Harry
 wrote:
>
> > -Original Message-
> > From: William Tu 
> > Sent: Friday, May 29, 2020 2:19 AM
> > To: Van Haaren, Harry 
> > Cc: ovs-dev@openvswitch.org; i.maxim...@ovn.org
> > Subject: Re: [ovs-dev] [PATCH v2 5/5] dpif-lookup: add avx512 gather
> > implementation
> >
> > On Wed, May 27, 2020 at 12:21:43PM +, Van Haaren, Harry wrote:
> 
> > > As a result, hashing identical data in different .c files produces a 
> > > different hash
> > values.
> > >
> > > From OVS docs 
> > > (http://docs.openvswitch.org/en/latest/intro/install/general/)
> > the following
> > > enables native ISA for your build, or else just enable SSE4.2 and 
> > > popcount:
> > > ./configure CFLAGS="-g -O2 -march=native"
> > > ./configure CFLAGS="-g -O2 -march=nehalem"
> >
> > Hi Harry,
> > Thanks for the info!
> > I can make it work now, with
> > ./configure CFLAGS="-g -O2 -msse4.2 -march=native"
>
> OK - that's good - the root cause of the bug/hash-mismatch is confirmed!
>
>
> > using similar setup
> > ovs-ofctl add-flow br0 'actions=drop'
> > ovs-appctl dpif-netdev/subtable-lookup-set avx512_gather 5
> > ovs-vsctl add-port br0 tg0 -- set int tg0 type=dpdk \
> >   options:dpdk-
> > devargs=vdev:net_pcap0,rx_pcap=/root/ovs/p0.pcap,infinite_rx=1
> >
> > The performance seems a little worse (9.7Mpps -> 8.7Mpps).
> > I wonder whether it's due to running it in VM (however I don't
> > have physical machine).
>
> Performance degradations are not expected, let me try understand
> the below performance data posted, and work through it.
>
> Agree that isolating the hardware and being able to verify
> environment would help in removing potential noise.. but
> let us work with the setup you have. Do you know what CPU
> it is you're running on?

Thanks! I think it's skylake
root@instance-3:~/ovs# lscpu
Architecture:x86_64
CPU op-mode(s):  32-bit, 64-bit
Byte Order:  Little Endian
CPU(s):  4
On-line CPU(s) list: 0-3
Thread(s) per core:  2
Core(s) per socket:  2
Socket(s):   1
NUMA node(s):1
Vendor ID:   GenuineIntel
CPU family:  6
Model:   85
Model name:  Intel(R) Xeon(R) CPU @ 2.00GHz
Stepping:3
CPU MHz: 2000.176
BogoMIPS:4000.35
Hypervisor vendor:   KVM
Virtualization type: full
L1d cache:   32K
L1i cache:   32K
L2 cache:1024K
L3 cache:39424K
NUMA node0 CPU(s):   0-3
Flags:   fpu vme de pse tsc msr pae mce cx8 apic sep mtrr
pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx
pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology nonstop_tsc
cpuid tsc_known_freq pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2
x2apic movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm abm
3dnowprefetch invpcid_single pti ssbd ibrs ibpb stibp fsgsbase
tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx avx512f
avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl
xsaveopt xsavec xgetbv1 xsaves arat md_clear arch_capabilities

lspci
00:00.0 Host bridge: Intel Corporation 440FX - 82441FX PMC [Natoma] (rev 02)
00:01.0 ISA bridge: Intel Corporation 82371AB/EB/MB PIIX4 ISA (rev 03)
00:01.3 Bridge: Intel Corporation 82371AB/EB/MB PIIX4 ACPI (rev 03)

>
> It seems you have EMC enabled (as per OVS defaults). The stats posted show
> an approx 10:1 ratio on hits in EMC and DPCLS. This likely adds noise to the
> measurements - as only 10% of the packets hit the changes in DPCLS.
>
> Also in the perf top profile dp_netdev_input__ takes more cycles than
> miniflow_extract, and the memcmp() is present, indicating EMC is consuming
> CPU cycles to perform its duties.
>
> I guess our simple test case is failing to show what we're trying to measure,
> as you know a EMC likes low flow counts, all explaining why DPCLS is
> only ~2% of CPU time.
>
> 
> Removed details of CPU profiles & PMD stats for AVX512 and Generic DPCLS
> removed to trim conversation. Very helpful to see into your system, and I'm
> a big fan of perf top and friends - so this was useful to see, thanks!
> (Future readers, check the mailing list "thread" view for previous post's 
> details).
>
>
> > Is there any thing I should double check?
>
> Would you mind re-testing with EMC disabled? Likely DPCLS will show up as a
> much larger % in the CPU profile, and this might provide some new insights.
>
OK, with EMC disabled, the performance gap is a little better.
Now we don't see memcmp.

===

Re: [ovs-dev] [PATCH v2 5/5] dpif-lookup: add avx512 gather implementation

2020-05-28 Thread William Tu
On Wed, May 27, 2020 at 12:21:43PM +, Van Haaren, Harry wrote:
> > -Original Message-
> > From: dev  On Behalf Of Van Haaren, Harry
> > Sent: Tuesday, May 26, 2020 3:52 PM
> > To: William Tu 
> > Cc: ovs-dev@openvswitch.org; i.maxim...@ovn.org
> > Subject: Re: [ovs-dev] [PATCH v2 5/5] dpif-lookup: add avx512 gather
> > implementation
> 
> 
> 
> > > Why ukey is related here? Does you avx512 patch make any change to ukey?
> > 
> > No AVX512 doesn't make any ukey changes - but issues in the hashing of the
> > miniflow data blocks cause ukeys to be installed in different locations than
> > where they are looked up - hence "ukey install fail" == "issue in miniflow 
> > iteration" in
> > this context.
> 
> The ukey install fails are due to a mismatch in compile flags (with/without 
> SSE 4.2),
> combined with the fact that the hashing in OVS changes its implementation 
> depending
> on the availability of the SSE 4.2  ISA (and other defines for other 
> architectures).
> 
> The mismatch comes from upcall code being compiled without SSE4.2 (so using 
> mhash hash code)
> while the AVX512 lookup hash routines have SSE4.2 enabled (so using CRC32 
> hash code).
> As a result, hashing identical data in different .c files produces a 
> different hash values.
> 
> From OVS docs (http://docs.openvswitch.org/en/latest/intro/install/general/) 
> the following
> enables native ISA for your build, or else just enable SSE4.2 and popcount:
> ./configure CFLAGS="-g -O2 -march=native"
> ./configure CFLAGS="-g -O2 -march=nehalem"

Hi Harry,
Thanks for the info!
I can make it work now, with 
./configure CFLAGS="-g -O2 -msse4.2 -march=native"

using similar setup
ovs-ofctl add-flow br0 'actions=drop'
ovs-appctl dpif-netdev/subtable-lookup-set avx512_gather 5
ovs-vsctl add-port br0 tg0 -- set int tg0 type=dpdk \
  options:dpdk-devargs=vdev:net_pcap0,rx_pcap=/root/ovs/p0.pcap,infinite_rx=1

The performance seems a little worse (9.7Mpps -> 8.7Mpps).
I wonder whether it's due to running it in VM (however I don't
have physical machine).

=== Enable AVX512 ===
Drop rate: 8.7Mpps
2020-05-29T01:03:15.740Z|00049|dpif_netdev_lookup|INFO|Subtable function 
'avx512_gather' set priority to 5
  21.93%  pmd-c00/id:10  ovs-vswitchd[.] dp_netdev_input__
  19.38%  pmd-c00/id:10  ovs-vswitchd[.] miniflow_extract
  19.08%  pmd-c00/id:10  ovs-vswitchd[.] eth_pcap_rx_infinite
  10.24%  pmd-c00/id:10  ovs-vswitchd[.] miniflow_hash_5tuple
   9.63%  pmd-c00/id:10  libc-2.27.so[.] __memcmp_avx2_movbe
   8.46%  pmd-c00/id:10  ovs-vswitchd[.] free_dpdk_buf
   1.83%  pmd-c00/id:10  ovs-vswitchd[.] dpcls_avx512_gather_skx_mf_4_1
   1.65%  pmd-c00/id:10  ovs-vswitchd[.] odp_execute_actions
   1.17%  pmd-c00/id:10  ovs-vswitchd[.] fast_path_processing
   1.12%  pmd-c00/id:10  ovs-vswitchd[.] netdev_dpdk_rxq_recv
   0.99%  pmd-c00/id:10  ovs-vswitchd[.] pmd_perf_end_iteration
   0.87%  pmd-c00/id:10  ovs-vswitchd[.] dp_netdev_process_rxq_port
   0.51%  pmd-c00/id:10  ovs-vswitchd[.] cmap_find_batch

root@instance-3:~/ovs# ovs-appctl dpif-netdev/pmd-stats-show
pmd thread numa_id 0 core_id 0:
  packets received: 167704800
  packet recirculations: 0
  avg. datapath passes per packet: 1.00
  emc hits: 152452853
  smc hits: 0
  megaflow hits: 15251600
  avg. subtable lookups per megaflow hit: 1.00
  miss with success upcall: 1
  miss with failed upcall: 346 
  avg. packets per output batch: 0.00
  idle cycles: 0 (0.00%)
  processing cycles: 38399744430 (100.00%)
  avg cycles per packet: 228.97 (38399744430/167704800)
  avg processing cycles per packet: 228.97 (38399744430/167704800)

=== Generic lookup ===
Drop rate: 9.7Mpps
2020-05-29T01:07:05.781Z|00049|dpif_netdev_lookup|INFO|Subtable function 
'generic' set priority to 5

pmd thread numa_id 0 core_id 1:
  packets received: 332413344
  packet recirculations: 0
  avg. datapath passes per packet: 1.00
  emc hits: 302178098
  smc hits: 0
  megaflow hits: 30234893
  avg. subtable lookups per megaflow hit: 1.00
  miss with success upcall: 1
  miss with failed upcall: 320 
  avg. packets per output batch: 0.00
  idle cycles: 0 (0.00%)
  processing cycles: 68605925782 (100.00%)
  avg cycles per packet: 206.39 (68605925782/332413344)
  avg processing cycles per packet: 206.39 (68605925782/332413344)

  22.04%  pmd-c01/id:10  ovs-vswitchd[.] dp_netdev_input__
  19.87%  pmd-c01/id:10  ovs-vswitchd[.] miniflow_extract
  18.24%  pmd-c01/id:10  ovs-vswitchd[.] eth_pcap_rx_infinite
   9.84%  pmd-c01/id:10  libc-2.27.so[.] __memcmp_avx2_movbe
   9.76%  pmd-c01/id:10  ovs-vswitchd[.] miniflow_hash_5tuple

Re: [ovs-dev] [PATCH] meta-flow: Document that constituents of conjunctive flows may overlap.

2020-05-28 Thread William Tu
On Wed, May 27, 2020 at 12:24:31PM -0700, Ben Pfaff wrote:
> Suggested-by: Antonin Bas 
> Signed-off-by: Ben Pfaff 
> ---
>  lib/meta-flow.xml | 2 ++
>  manpages.mk   | 3 ---
>  2 files changed, 2 insertions(+), 3 deletions(-)
> 
> diff --git a/lib/meta-flow.xml b/lib/meta-flow.xml
> index d4495552b3c2..1546758744b4 100644
> --- a/lib/meta-flow.xml
> +++ b/lib/meta-flow.xml
> @@ -1240,6 +1240,8 @@ tcp,tp_src=0x07c0/0xfff0
>  priority, that is, any given packet must be able to match at most one
>  conjunctive flow at a given priority.  Overlapping conjunctive flows
>  yield unpredictable results.
> +(The flows that constitute a conjunctive flow may overlap with those
> +that constitute the same or another conjunctive flow.)
>
>
>  Following a conjunctive flow match, the search for the flow with
LGTM above.
Acked-by: William Tu 

But the blow is unrelated.

> diff --git a/manpages.mk b/manpages.mk
> index dc201484c637..54a3a82ad963 100644
> --- a/manpages.mk
> +++ b/manpages.mk
> @@ -104,7 +104,6 @@ utilities/bugtool/ovs-bugtool.8: \
>  utilities/bugtool/ovs-bugtool.8.in:
>  lib/ovs.tmac:
>  
> -
>  utilities/ovs-dpctl-top.8: \
>   utilities/ovs-dpctl-top.8.in \
>   lib/ovs.tmac
> @@ -155,8 +154,6 @@ lib/common-syn.man:
>  lib/common.man:
>  lib/ovs.tmac:
>  
> -lib/ovs.tmac:
> -
>  utilities/ovs-testcontroller.8: \
>   utilities/ovs-testcontroller.8.in \
>   lib/common.man \
> -- 
> 2.25.3
> 
> ___
> dev mailing list
> d...@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH] compat: Backport ipv6_stub change

2020-05-26 Thread William Tu
On Mon, May 25, 2020 at 10:50 AM Ilya Maximets  wrote:
>
> On 5/25/20 7:01 PM, Ilya Maximets wrote:
> > On 5/24/20 7:59 PM, William Tu wrote:
> >> On Thu, May 21, 2020 at 02:54:03PM -0700, Greg Rose wrote:
> >>> A patch backported to the Linux stable 4.14 tree and present in the
> >>> latest stable 4.14.181 kernel breaks ipv6_stub usage.
> >>>
> >>> The commit is
> >>> 8ab8786f78c3 ("net ipv6_stub: use ip6_dst_lookup_flow instead of 
> >>> ip6_dst_lookup").
> >>>
> >>> Create the compat layer define to check for it and fixup usage in vxlan
> >>> and geneve modules.
> >>>
> >>> Passes Travis here:
> >>> https://travis-ci.org/github/gvrose8192/ovs-experimental/builds/689798733
> >>>
> >>> Signed-off-by: Greg Rose 
> >> Thanks for fixing the travis failure.
> >> Applied to master.
> >> William
> >
> > We need to backport this to 2.13 to avoid TravisCI failure on this branch.
>
> And 2.12.
>
Done! Thank you.
William
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH] compat: Backport ipv6_stub change

2020-05-24 Thread William Tu
On Thu, May 21, 2020 at 02:54:03PM -0700, Greg Rose wrote:
> A patch backported to the Linux stable 4.14 tree and present in the
> latest stable 4.14.181 kernel breaks ipv6_stub usage.
> 
> The commit is
> 8ab8786f78c3 ("net ipv6_stub: use ip6_dst_lookup_flow instead of 
> ip6_dst_lookup").
> 
> Create the compat layer define to check for it and fixup usage in vxlan
> and geneve modules.
> 
> Passes Travis here:
> https://travis-ci.org/github/gvrose8192/ovs-experimental/builds/689798733
> 
> Signed-off-by: Greg Rose 
Thanks for fixing the travis failure.
Applied to master.
William

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v2 5/5] dpif-lookup: add avx512 gather implementation

2020-05-21 Thread William Tu
On Thu, May 21, 2020 at 6:04 AM Van Haaren, Harry
 wrote:
>
> > -Original Message-
> > From: William Tu 
> > Sent: Wednesday, May 20, 2020 4:15 PM
> > To: Van Haaren, Harry 
> > Cc: ovs-dev@openvswitch.org; i.maxim...@ovn.org
> > Subject: Re: [ovs-dev] [PATCH v2 5/5] dpif-lookup: add avx512 gather
> > implementation
>
> 
>
> > > 2020-05-20T14:15:20.184Z|00378|dpif_netdev(pmd-c00/id:9)|INFO|reprobing
> > > sub func, 4 1
> > > 2020-05-20T14:15:21.219Z|00379|dpif_netdev(pmd-c00/id:9)|INFO|reprobing
> > > sub func, 4 1
> > >
> > btw, looking at
> > ovs-appctl coverage/show, this counter is very high when enabling the avx512
> >   handler_duplicate_upcall 459645.4/sec 434475.500/sec
> > 17300.5372/sec   total: 64120526
>
> This counter seems to post some garbage to me if I run it before any traffic?
> Tested using OVS Master @ 48b1c7642 (not including any AVX512 patches):
>
> ./utilities/ovs-appctl coverage/show | grep duplicate_upcall
> 21:handler_duplicate_upcall   0.0/sec 0.000/sec0./sec   
> total: 10272710751479363764
>
> # re-runs show different numbers - indicates a garbage-initialized counter 
> perhaps?
> 21:handler_duplicate_upcall   0.0/sec 0.000/sec0./sec   
> total: 1049338714623956653
> 21:handler_duplicate_upcall   0.0/sec 0.000/sec0./sec   
> total: 18343161283719775679
>

using the same pcap traffic (p0.pcap) on current master, I did not see
the above issue:
datapath_drop_upcall_error  57.4/sec 4.783/sec0.0797/sec
total: 287
drop_action_of_pipeline  5909696.2/sec 492474.683/sec
8207.9114/sec   total: 52399553

William
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v2 5/5] dpif-lookup: add avx512 gather implementation

2020-05-21 Thread William Tu
> > ovs-vsctl -- add-br br0 -- set Bridge br0 datapath_type=netdev
> > ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=0x3
> > ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-init=true
> > ovs-ofctl add-flow br0 'actions=drop'
> > ovs-appctl dpif-netdev/subtable-lookup-set avx512_gather 5
> > ovs-vsctl add-port br0 tg0 -- set int tg0 type=dpdk \
> >   options:dpdk-
> > devargs=vdev:net_pcap0,rx_pcap=/root/ovs/p0.pcap,infinite_rx=1
>
> I use Ether/VLAN/IPv4 to achieve a subtable with (4,1), is that the same as 
> you?
> Just trying to remove variables between our setups.
>
btw I have only one OpenFlow rule, 'actions=drop'
The pcap file as input is a random one I just capture in my machine's interface
root@instance-3:~/ovs# tcpdump -n -r p0.pcap  | wc -l
reading from file p0.pcap, link-type EN10MB (Ethernet)
22
root@instance-3:~/ovs# tcpdump -n -r p0.pcap
reading from file p0.pcap, link-type EN10MB (Ethernet)
22:30:10.471943 IP 10.182.0.2.22 > 76.21.95.192.62190: Flags [P.], seq
3532581039:3532581163, ack 2971134033, win 501, options [nop,nop,TS
val 521819346 ecr 304440082], length 124
22:30:10.499759 IP 76.21.95.192.62190 > 10.182.0.2.22: Flags [.], ack
124, win 4092, options [nop,nop,TS val 304440141 ecr 521819346],
length 0
22:30:13.242821 IP 76.21.95.192.62190 > 10.182.0.2.22: Flags [P.], seq
1:37, ack 124, win 4096, options [nop,nop,TS val 304442869 ecr
521819346], length 36
22:30:13.243113 IP 10.182.0.2.22 > 76.21.95.192.62190: Flags [P.], seq
124:160, ack 37, win 501, options [nop,nop,TS val 521822117 ecr
304442869], length 36
22:30:13.271718 IP 76.21.95.192.62190 > 10.182.0.2.22: Flags [.], ack
160, win 4094, options [nop,nop,TS val 304442900 ecr 521822117],
length 0
22:30:13.415212 IP 76.21.95.192.62190 > 10.182.0.2.22: Flags [P.], seq
37:73, ack 160, win 4096, options [nop,nop,TS val 304443043 ecr
521822117], length 36
22:30:13.415479 IP 10.182.0.2.22 > 76.21.95.192.62190: Flags [P.], seq
160:196, ack 73, win 501, options [nop,nop,TS val 521822289 ecr
304443043], length 36
22:30:13.442371 IP 76.21.95.192.62190 > 10.182.0.2.22: Flags [.], ack
196, win 4094, options [nop,nop,TS val 304443069 ecr 521822289],
length 0
22:30:13.577866 IP 76.21.95.192.62190 > 10.182.0.2.22: Flags [P.], seq
73:109, ack 196, win 4096, options [nop,nop,TS val 304443208 ecr
521822289], length 36
22:30:13.578123 IP 10.182.0.2.22 > 76.21.95.192.62190: Flags [P.], seq
196:232, ack 109, win 501, options [nop,nop,TS val 521822452 ecr
304443208], length 36
22:30:13.608249 IP 76.21.95.192.62190 > 10.182.0.2.22: Flags [.], ack
232, win 4094, options [nop,nop,TS val 304443230 ecr 521822452],
length 0
22:30:16.932478 IP 169.254.169.254.80 > 10.182.0.2.51642: Flags [P.],
seq 1150154089:1150154672, ack 1477571123, win 65535, length 583:
HTTP: HTTP/1.1 200 OK
22:30:16.932540 IP 10.182.0.2.51642 > 169.254.169.254.80: Flags [.],
ack 583, win 64737, length 0
22:30:16.932547 IP 169.254.169.254.80 > 10.182.0.2.51642: Flags [F.],
seq 583, ack 1, win 65535, length 0
22:30:16.933193 IP 10.182.0.2.51642 > 169.254.169.254.80: Flags [F.],
seq 1, ack 584, win 64736, length 0
22:30:16.933280 IP 169.254.169.254.80 > 10.182.0.2.51642: Flags [.],
ack 2, win 65535, length 0
22:30:16.936976 IP 10.182.0.2.51650 > 169.254.169.254.80: Flags [S],
seq 1944213115, win 65320, options [mss 1420,sackOK,TS val 2204263930
ecr 0,nop,wscale 7], length 0
22:30:16.937201 IP 169.254.169.254.80 > 10.182.0.2.51650: Flags [S.],
seq 4118061879, ack 1944213116, win 65535, options [mss 1420,eol],
length 0
22:30:16.937234 IP 10.182.0.2.51650 > 169.254.169.254.80: Flags [.],
ack 1, win 65320, length 0
22:30:16.937297 IP 10.182.0.2.51650 > 169.254.169.254.80: Flags [P.],
seq 1:287, ack 1, win 65320, length 286: HTTP: GET
/computeMetadata/v1/instance/network-interfaces/?alt=json&last_etag=7c556bc02e6331f4&recursive=True&timeout_sec=72&wait_for_change=True
HTTP/1.1
22:30:16.937374 IP 169.254.169.254.80 > 10.182.0.2.51650: Flags [.],
ack 287, win 65249, length 0
22:30:16.937428 IP 169.254.169.254.80 > 10.182.0.2.51650: Flags [.],
ack 287, win 65535, length 0

I also attached the pcap file.
https://drive.google.com/file/d/1CR5pMebrtjzShF9bpXJcr9GAQY_6Og44/view?usp=sharing

> > LOG:
> > 2020-05-20T13:49:26.542Z|00047|dpdk|INFO|DPDK Enabled - initialized
> > 2020-05-20T13:49:26.544Z|00048|connmgr|INFO|br0<->unix#2: 1 flow_mods
> > in the last 0 s (1 adds)
> > 2020-05-20T13:49:26.547Z|00049|dpif_netdev_lookup|INFO|Subtable
> > function 'avx512_gather' set priority to 5
> > 2020-05-20T13:49:26.553Z|00050|netdev_dpdk|INFO|Device
> > 'vdev:net_pcap0,rx_pcap=/root/ovs/p0.pcap,infinite_rx=1' attached to
> > DPDK
> > 2020-05-20T13:49:26.553Z|00051|dpif_netdev|INFO|PMD thread on numa_id:
> > 0, core id:  0 created.
> > 2020-05-20T13:49:26.554Z|00052|dpif_netdev|INFO|PMD thread on numa_id:
> > 0, core id:  1 created.
> > 2020-05-20T13:49:26.554Z|00053|dpif_netdev|INFO|There are 2 pmd
> > threads on numa node 0
> > 2020-05-20T13:49:26.554Z|00054|dpdk|INFO|Device with port_id=0 al

Re: [ovs-dev] [RFC v2 PATCH 0/4] XDP offload using flow API provider

2020-05-20 Thread William Tu
On Sat, May 16, 2020 at 7:43 AM Toshiaki Makita
 wrote:
>
> Hi William,
>
> On 2020/05/08 0:08, Toshiaki Makita wrote:
> > On 2020/05/05 23:29, William Tu wrote:
> >> On Tue, Apr 21, 2020 at 11:47:00PM +0900, Toshiaki Makita wrote:
> >>> This patch adds an XDP-based flow cache using the OVS netdev-offload
> >>> flow API provider.  When an OVS device with XDP offload enabled,
> >>> packets first are processed in the XDP flow cache (with parse, and
> >>> table lookup implemented in eBPF) and if hits, the action processing
> >>> are also done in the context of XDP, which has the minimum overhead.
> >>>
> >>> This provider is based on top of William's recently posted patch for
> >>> custom XDP load.  When a custom XDP is loaded, the provider detects if
> >>> the program supports classifier, and if supported it starts offloading
> >>> flows to the XDP program.
> >>>
> >>> The patches are derived from xdp_flow[1], which is a mechanism similar to
> >>> this but implemented in kernel.
> >>>
> >>>
> >>> * Motivation
> >>>
> >>> While userspace datapath using netdev-afxdp or netdev-dpdk shows good
> >>> performance, there are use cases where packets better to be processed in
> >>> kernel, for example, TCP/IP connections, or container to container
> >>> connections.  Current solution is to use tap device or af_packet with
> >>> extra kernel-to/from-userspace overhead.  But with XDP, a better solution
> >>> is to steer packets earlier in the XDP program, and decides to send to
> >>> userspace datapath or stay in kernel.
> >>>
> >>> One problem with current netdev-afxdp is that it forwards all packets to
> >>> userspace, The first patch from William (netdev-afxdp: Enable loading XDP
> >>> program.) only provides the interface to load XDP program, howerver users
> >>> usually don't know how to write their own XDP program.
> >>>
> >>> XDP also supports HW-offload so it may be possible to offload flows to
> >>> HW through this provider in the future, although not currently.
> >>> The reason is that map-in-map is required for our program to support
> >>> classifier with subtables in XDP, but map-in-map is not offloadable.
> >>> If map-in-map becomes offloadable, HW-offload of our program will also
> >>> be doable.
> >>>
> >>>
> >>> * How to use
> >>>
> >>> 1. Install clang/llvm >= 9, libbpf >= 0.0.4, and kernel >= 5.3.
> >>>
> >>> 2. make with --enable-afxdp --enable-bpf
> >>> --enable-bpf will generate XDP program "bpf/flowtable_afxdp.o".  Note that
> >>> the BPF object will not be installed anywhere by "make install" at this 
> >>> point.
> >>>
> >>> 3. Load custom XDP program
> >>> E.g.
> >>> $ ovs-vsctl add-port ovsbr0 veth0 -- set int veth0 
> >>> options:xdp-mode=native \
> >>>options:xdp-obj="path/to/ovs/bpf/flowtable_afxdp.o"
> >>> $ ovs-vsctl add-port ovsbr0 veth1 -- set int veth1 
> >>> options:xdp-mode=native \
> >>>options:xdp-obj="path/to/ovs/bpf/flowtable_afxdp.o"
> >>>
> >>> 4. Enable XDP_REDIRECT
> >>> If you use veth devices, make sure to load some (possibly dummy) programs
> >>> on the peers of veth devices.
> >>
> >> Hi Toshiaki,
> >>
> >> What kind of dummy program to put at the other side of veth?
> >
> > A program which just returns XDP_PASS should be sufficient.
> > e.g.
> > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/testing/selftests/bpf/progs/xdp_dummy.c
> >
> >
> >> I'm trying to create a end-to-end test using veth, similar to
> >> the ping test in tests/system-traffic.at
> >>
> >> At the other side of veth, I use
> >> $/bpf-next/samples/bpf# ./xdp_rxq_info -d p0 -S -a XDP_PASS
> >>
> >> but somehow around 90% of the icmp packets are dropped, I'm still
> >> debugging the reason.
> >
> > I'm going to ping test based off of current master in a couple of days.
>
> Sorry for the delay.
> I can successfully ping between veth devices with the current master.
>
> veth0---veth1---ovs---veth2---veth3(in netns)
>
> ping between veth0 and veth3 succeeded without packet loss and with 
> debug_stats[2]
> counted.
>
> Do you still have the problem?
>
Thanks, I will test it again tomorrow and get back to you!
William
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH] ovs-bugtool: Add -m option to dump-flows.

2020-05-20 Thread William Tu
On Tue, May 19, 2020 at 09:33:05AM -0700, Gregory Rose wrote:
> 
> On 5/14/2020 7:02 AM, William Tu wrote:
> >This patch adds 'ovs-appctl dpctl/dump-flows -m' to bugtool,
> >the output will include wildcarded fields and the miniflow bits,
> >such as 'dp-extra-info:miniflow_bits(4,1)'.
> >
> >Cc: Emma Finn 
> >Signed-off-by: William Tu 
> >---
> >  utilities/bugtool/plugins/network-status/openvswitch.xml | 1 +
> >  1 file changed, 1 insertion(+)
> >
> >diff --git a/utilities/bugtool/plugins/network-status/openvswitch.xml 
> >b/utilities/bugtool/plugins/network-status/openvswitch.xml
> >index e6fa4fd15fff..56e091feb45f 100644
> >--- a/utilities/bugtool/plugins/network-status/openvswitch.xml
> >+++ b/utilities/bugtool/plugins/network-status/openvswitch.xml
> >@@ -32,6 +32,7 @@
> >   > filters="ovs">/usr/share/openvswitch/scripts/ovs-bugtool-ovs-appctl-dpif
> >  ovs-appctl -t 
> > ovsdb-server ovsdb-server/list-dbs
> >   > repeat="2">ovs-appctl dpctl/dump-flows netdev@ovs-netdev
> >+ >repeat="2">ovs-appctl dpctl/dump-flows -m netdev@ovs-netdev
> >   > repeat="2">ovs-appctl dpctl/dump-flows system@ovs-system
> >   > repeat="2">ovs-appctl dpctl/show -s
> >   > filters="ovs">/usr/share/openvswitch/scripts/ovs-bugtool-ovs-ofctl-loop-over-bridges
> >  "show"
> >
> 
> Seems fine to me.
> 
> Acked-by: Greg Rose 
Applied to master, thanks.
William
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCHv2] ovsdb-idl: Fix NULL deref reported by Coverity.

2020-05-20 Thread William Tu
On Mon, May 18, 2020 at 02:14:36PM -0700, Yifeng Sun wrote:
> Thanks William.
> 
> Reviewed-by: Yifeng Sun 
> 
Thanks, applied to master.
William

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH] Documentation: Fix kernel support matrix

2020-05-20 Thread William Tu
On Tue, May 19, 2020 at 04:00:07PM -0700, Han Zhou wrote:
> On Tue, May 19, 2020 at 3:01 PM Greg Rose  wrote:
> >
> > The documentation matrix for OVS branches and which kernels they support
> > is out of date.  Update it to show that since 2.10 the lowest kernel
> > that we test and support is Linux 3.16.
> >
> > RHEL and CentOS kernels based upon the original 3.10 kernel are still
> > supported.
> >
> > Reported-by: Han Zhou 
> > Reported-at:
> https://mail.openvswitch.org/pipermail/ovs-dev/2020-May/370742.html
> > Signed-off-by: Greg Rose 
> > ---
Applied to master, thanks
William

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v2 5/5] dpif-lookup: add avx512 gather implementation

2020-05-20 Thread William Tu


> >0x561040e9 <+313>:   kmovb  %eax,%k4
> >0x561040ed <+317>:   vpsrlq $0x4,%zmm0,%zmm2
> >0x561040f4 <+324>:   vpandd %zmm3,%zmm0,%zmm0
> >0x561040fa <+330>:   vpandd %zmm2,%zmm3,%zmm2
> >0x56104100 <+336>:   vpshufb %zmm0,%zmm4,%zmm0
> >0x56104106 <+342>:   vpshufb %zmm2,%zmm4,%zmm2
> >0x5610410c <+348>:   vpaddb %zmm2,%zmm0,%zmm0
> >0x56104112 <+354>:   vpsadbw %zmm7,%zmm0,%zmm0
> >0x56104118 <+360>:   vpaddq %zmm1,%zmm0,%zmm0
> >0x5610411e <+366>:   vmovdqa64 %zmm8,%zmm1
> >0x56104124 <+372>:   vpgatherqq 0x18(%r9,%zmm0,8),%zmm1{%k3}
> >0x5610412c <+380>:   vpandq %zmm6,%zmm1,%zmm0{%k4}{z}
> >
> > Would you try some of the above and see can it be reproduced?
>
> btw, I saw every second ovs is doing reprobing
> 2020-05-20T14:15:15.113Z|00373|dpif_netdev(pmd-c00/id:9)|INFO|reprobing
> sub func, 4 1
> 2020-05-20T14:15:16.129Z|00374|dpif_netdev(pmd-c00/id:9)|INFO|reprobing
> sub func, 4 1
> 2020-05-20T14:15:17.138Z|00375|dpif_netdev(pmd-c00/id:9)|INFO|reprobing
> sub func, 4 1
> 2020-05-20T14:15:18.150Z|00376|dpif_netdev(pmd-c00/id:9)|INFO|reprobing
> sub func, 4 1
> 2020-05-20T14:15:19.170Z|00377|dpif_netdev(pmd-c00/id:9)|INFO|reprobing
> sub func, 4 1
> 2020-05-20T14:15:20.184Z|00378|dpif_netdev(pmd-c00/id:9)|INFO|reprobing
> sub func, 4 1
> 2020-05-20T14:15:21.219Z|00379|dpif_netdev(pmd-c00/id:9)|INFO|reprobing
> sub func, 4 1
>
btw, looking at
ovs-appctl coverage/show, this counter is very high when enabling the avx512
  handler_duplicate_upcall 459645.4/sec 434475.500/sec
17300.5372/sec   total: 64120526

other counters look OK.
William
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v2 5/5] dpif-lookup: add avx512 gather implementation

2020-05-20 Thread William Tu
Hi Harry,

Thanks for your feedback!

> > > > On Wed, May 06, 2020 at 02:06:09PM +0100, Harry van Haaren wrote:
> > > > > This commit adds an AVX-512 dpcls lookup implementation.
> > > > > It uses the AVX-512 SIMD ISA to perform multiple miniflow
> > > > > operations in parallel.
>
> 
>
> > Hi Harry,
> >
> > I managed to find a machine with avx512 in google cloud and did some
> > performance testing. I saw lower performance when enabling avx512,
> > I believe I did something wrong. Do you mind having a look:
> >

> >
> > 3) start ovs and set avx and traffic gen
> >  ovs-appctl dpif-netdev/subtable-lookup-set avx512_gather 5
> >  ovs-vsctl add-port br0 tg0 -- set int tg0 type=dpdk
> > options:dpdk-devargs=vdev:net_pcap0,rx_pcap=/root/ovs/p0.pcap,infinite_rx=1
>
> The output of the first command (enabling the AVX512 lookup) posts some 
> output to Log INFO, please ensure its there?
>
> 2020-05-20T09:39:09Z|00262|dpif_netdev_lookup|INFO|Subtable function 
> 'avx512_gather' set priority to 4
> 2020-05-20T09:39:09Z|6|dpif_netdev(pmd-c15/id:99)|INFO|reprobing sub 
> func, 5 1
>
Yes, got these info log.
ovs-vsctl -- add-br br0 -- set Bridge br0 datapath_type=netdev
ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=0x3
ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-init=true
ovs-ofctl add-flow br0 'actions=drop'
ovs-appctl dpif-netdev/subtable-lookup-set avx512_gather 5
ovs-vsctl add-port br0 tg0 -- set int tg0 type=dpdk \
  options:dpdk-devargs=vdev:net_pcap0,rx_pcap=/root/ovs/p0.pcap,infinite_rx=1

LOG:
2020-05-20T13:49:26.542Z|00047|dpdk|INFO|DPDK Enabled - initialized
2020-05-20T13:49:26.544Z|00048|connmgr|INFO|br0<->unix#2: 1 flow_mods
in the last 0 s (1 adds)
2020-05-20T13:49:26.547Z|00049|dpif_netdev_lookup|INFO|Subtable
function 'avx512_gather' set priority to 5
2020-05-20T13:49:26.553Z|00050|netdev_dpdk|INFO|Device
'vdev:net_pcap0,rx_pcap=/root/ovs/p0.pcap,infinite_rx=1' attached to
DPDK
2020-05-20T13:49:26.553Z|00051|dpif_netdev|INFO|PMD thread on numa_id:
0, core id:  0 created.
2020-05-20T13:49:26.554Z|00052|dpif_netdev|INFO|PMD thread on numa_id:
0, core id:  1 created.
2020-05-20T13:49:26.554Z|00053|dpif_netdev|INFO|There are 2 pmd
threads on numa node 0
2020-05-20T13:49:26.554Z|00054|dpdk|INFO|Device with port_id=0 already stopped
2020-05-20T13:49:26.648Z|00055|netdev_dpdk|WARN|Rx checksum offload is
not supported on port 0
2020-05-20T13:49:26.648Z|00056|netdev_dpdk|WARN|Interface tg0 does not
support MTU configuration, max packet size supported is 1500.
2020-05-20T13:49:26.648Z|00057|netdev_dpdk|INFO|Port 0: 02:70:63:61:70:00
2020-05-20T13:49:26.648Z|00058|dpif_netdev|INFO|Core 0 on numa node 0
assigned port 'tg0' rx queue 0 (measured processing cycles 0).
2020-05-20T13:49:26.648Z|00059|bridge|INFO|bridge br0: added interface
tg0 on port 1
2020-05-20T13:49:26.648Z|1|ofproto_dpif_upcall(pmd-c00/id:9)|WARN|upcall_cb
failure: ukey installation fails
2020-05-20T13:49:27.562Z|2|dpif_netdev(pmd-c00/id:9)|INFO|reprobing
sub func, 4 1

>
> > 4) dp flows with miniflow info

> It seems the "packets:0, bytes:0,used:never" tags indicate that there is no 
> traffic hitting these rules at all?
> Output here (with traffic running for a while) shows:
> packets:621588543, bytes:37295312580, used:0.000s, dp:ovs, actions:dpdk1, 
> dp-extra-info:miniflow_bits(4,1)
>
Thanks, this is the hit rules:
root@instance-3:~/ovs# ovs-appctl dpctl/dump-flows -m | grep -v never
flow-dump from pmd on cpu core: 0
ufid:f06998a0-9ff8-4ee5-b12f-5d7e2fcc7f0f,
skb_priority(0/0),skb_mark(0/0),ct_state(0/0),ct_zone(0/0),ct_mark(0/0),ct_label(0/0),recirc_id(0),dp_hash(0/0),in_port(tg0),packet_type(ns=0,id=0),eth(src=42:01:0a:b6:00:01/00:00:00:00:00:00,dst=42:01:0a:b6:00:02/00:00:00:00:00:00),eth_type(0x0800),ipv4(src=169.254.169.254/0.0.0.0,dst=10.182.0.2/0.0.0.0,proto=6/0,tos=0/0,ttl=64/0,frag=no),tcp(src=80/0,dst=51642/0),tcp_flags(0/0),
packets:3942096, bytes:255152, used:0.001s, flags:P., dp:ovs,
actions:drop, dp-extra-info:miniflow_bits(4,1)
ufid:cb3a6eac-3a7d-4e0d-a145-414dd482b4b9,
skb_priority(0/0),skb_mark(0/0),ct_state(0/0),ct_zone(0/0),ct_mark(0/0),ct_label(0/0),recirc_id(0),dp_hash(0/0),in_port(tg0),packet_type(ns=0,id=0),eth(src=42:01:0a:b6:00:01/00:00:00:00:00:00,dst=42:01:0a:b6:00:02/00:00:00:00:00:00),eth_type(0x0800),ipv4(src=169.254.169.254/0.0.0.0,dst=10.182.0.2/0.0.0.0,proto=6/0,tos=0/0,ttl=64/0,frag=no),tcp(src=80/0,dst=51650/0),tcp_flags(0/0),
packets:2779552, bytes:172332224, used:0.000s, flags:S., dp:ovs,
actions:drop, dp-extra-info:miniflow_bits(4,1)
ufid:781f3f48-ffd7-424f-ae99-62158ba05cbd,
skb_priority(0/0),skb_mark(0/0),ct_state(0/0),ct_zone(0/0),ct_mark(0/0),ct_label(0/0),recirc_id(0),dp_hash(0/0),in_port(tg0),packet_type(ns=0,id=0),eth(src=42:01:0a:b6:00:02/00:00:00:00:00:00,dst=42:01:0a:b6:00:01/00:00:00:00:00:00),eth_type(0x0800),ipv4(src=10.182.0.2/0.0.0.0,dst=169.254.169.254/0.0.0.0,proto=6/0,tos=0/0,ttl=64/0,frag=no),tcp(src=51650/0,dst=80/0),tcp_flags(0/0),
packets:637373, 

Re: [ovs-dev] [PATCH v2 5/5] dpif-lookup: add avx512 gather implementation

2020-05-20 Thread William Tu
On Wed, May 20, 2020 at 3:35 AM Federico Iezzi  wrote:
>
>
>
>
>
> On Wed, 20 May 2020 at 12:20, Van Haaren, Harry  
> wrote:
>>
>> > -----Original Message-
>> > From: William Tu 
>> > Sent: Wednesday, May 20, 2020 1:12 AM
>> > To: Van Haaren, Harry 
>> > Cc: ovs-dev@openvswitch.org; i.maxim...@ovn.org
>> > Subject: Re: [ovs-dev] [PATCH v2 5/5] dpif-lookup: add avx512 gather
>> > implementation
>> >
>> > On Mon, May 18, 2020 at 9:12 AM Van Haaren, Harry
>> >  wrote:
>> > >
>> > > > -Original Message-
>> > > > From: William Tu 
>> > > > Sent: Monday, May 18, 2020 3:58 PM
>> > > > To: Van Haaren, Harry 
>> > > > Cc: ovs-dev@openvswitch.org; i.maxim...@ovn.org
>> > > > Subject: Re: [ovs-dev] [PATCH v2 5/5] dpif-lookup: add avx512 gather
>> > > > implementation
>> > > >
>> > > > On Wed, May 06, 2020 at 02:06:09PM +0100, Harry van Haaren wrote:
>> > > > > This commit adds an AVX-512 dpcls lookup implementation.
>> > > > > It uses the AVX-512 SIMD ISA to perform multiple miniflow
>> > > > > operations in parallel.
>>
>> 
>>
>> > Hi Harry,
>> >
>> > I managed to find a machine with avx512 in google cloud and did some
>> > performance testing. I saw lower performance when enabling avx512,
>
>
> AVX512 instruction path lowers the clock speed well below the base frequency 
> [1].
> Aren't you killing the PMD performance while improving the lookup ones?
>
> [1] 
> https://www.intel.com/content/dam/www/public/us/en/documents/specification-updates/2nd-gen-xeon-scalable-spec-update.pdf
>  (see page 20)
>

Hi Federico,

Thanks for sharing the link.
Does that mean if OVS PMD uses avx512 on one core, then all the other cores's
frequency will be lower?

There are some discussion here:
https://lemire.me/blog/2018/09/07/avx-512-when-and-how-to-use-these-new-instructions/
My take is that overall down clocking will happen, but application
will get better performance.
William
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v2 5/5] dpif-lookup: add avx512 gather implementation

2020-05-19 Thread William Tu
On Mon, May 18, 2020 at 9:12 AM Van Haaren, Harry
 wrote:
>
> > -Original Message-
> > From: William Tu 
> > Sent: Monday, May 18, 2020 3:58 PM
> > To: Van Haaren, Harry 
> > Cc: ovs-dev@openvswitch.org; i.maxim...@ovn.org
> > Subject: Re: [ovs-dev] [PATCH v2 5/5] dpif-lookup: add avx512 gather
> > implementation
> >
> > On Wed, May 06, 2020 at 02:06:09PM +0100, Harry van Haaren wrote:
> > > This commit adds an AVX-512 dpcls lookup implementation.
> > > It uses the AVX-512 SIMD ISA to perform multiple miniflow
> > > operations in parallel.
> > >
> > > To run this implementation, the "avx512f" and "bmi2" ISAs are
> > > required. These ISA checks are performed at runtime while
> > > probing the subtable implementation. If a CPU does not provide
> > > both "avx512f" and "bmi2", then this code does not execute.
> > >
> > > The avx512 code is built as a seperate static library, with added
> > > CFLAGS to enable the required ISA features. By building only this
> > > static library with avx512 enabled, it is ensured that the main OVS
> > > core library is *not* using avx512, and that OVS continues to run
> > > as before on CPUs that do not support avx512.
> > >
> > > The approach taken in this implementation is to use the
> > > gather instruction to access the packet miniflow, allowing
> > > any miniflow blocks to be loaded into an AVX-512 register.
> > > This maximises the usefulness of the register, and hence this
> > > implementation handles any subtable with up to miniflow 8 bits.
> > >
> > > Note that specialization of these avx512 lookup routines
> > > still provides performance value, as the hashing of the
> > > resulting data is performed in scalar code, and compile-time
> > > loop unrolling occurs when specialized to miniflow bits.
> > >
> >
> > Hi Harry,
> >
> > I haven't tried running the code due to my machine only
> > support avx2. There are some minor issues such as indentation.
> > But I read through it with example below and I think it's correct.
>
> Thanks for the review! I'll post replies inline for context.
>
> Note, the Software Development Emulator (SDE) tool enables emulation of 
> AVX512 ISA.
> Full details provided at the link below, using this would enable running 
> AVX512 DPCLS
> implementation itself, should you want to test it locally:
> https://software.intel.com/content/www/us/en/develop/articles/intel-software-development-emulator.html
>
>
> > Given that you have to do a lot of preparation (ex: popcount, creating
> > bit_masks, broadcast, ... etc) before using avx instructions, do you
> > have some performance number? I didn't see any from ovsconf 18 or 19.
> > Is using avx512 much better than avx2?
>
> Correct there is some "pre-work" to do before the miniflow manipulation 
> itself.
> Note that much of the more complex work (e.g. miniflow bitmask generation for 
> the subtable)
> is done at subtable instantiation time, instead of on the critical path. Also 
> the popcount
> lookup table is "static const", which will turn into a single AVX512 load at 
> runtime.
>
> AVX512 provides some very useful features, which are used throughout the code
> below. In particular, the AVX512 "k-mask" feature allows the developer to 
> switch-off
> a lane in the SIMD register (this is sometimes referred to as a predication 
> mask).
> Using these "k-masks" solves requiring more instructions later to "merge" 
> results
> back together (as SSE or AVX2 code would have to do).
> Example : "mask_set1_epi64" allows setting a specific value into the "lanes" 
> as
> given by the k-mask, and results in an AVX512 register with those contents.
>
> There are also new instructions in AVX512 which provide even more powerful 
> ISA, for example
> the "AVX512VPOPCNTDQ" CPUID provides a vectorized popcount which can be used 
> instead of
> the "_mm512_popcnt_epi64_manual()" helper function. Enabling of the AVX512 
> VPOPCNT instruction
> is planned in future patches to OVS. Details of the instruction are available 
> on the intrinsics guide:
> https://software.intel.com/sites/landingpage/IntrinsicsGuide/#text=_mm512_popcnt_epi64&expand=4368
>
> Finally, although the code can seem a bit verbose, most _mm512_xxx_yyy() 
> intrinsics result in a single
> instruction. This means that although the code looks "big", however the 
> resulting instruction stream often

Re: [ovs-dev] [PATCH v2 5/5] dpif-lookup: add avx512 gather implementation

2020-05-18 Thread William Tu
On Wed, May 06, 2020 at 02:06:09PM +0100, Harry van Haaren wrote:
> This commit adds an AVX-512 dpcls lookup implementation.
> It uses the AVX-512 SIMD ISA to perform multiple miniflow
> operations in parallel.
> 
> To run this implementation, the "avx512f" and "bmi2" ISAs are
> required. These ISA checks are performed at runtime while
> probing the subtable implementation. If a CPU does not provide
> both "avx512f" and "bmi2", then this code does not execute.
> 
> The avx512 code is built as a seperate static library, with added
> CFLAGS to enable the required ISA features. By building only this
> static library with avx512 enabled, it is ensured that the main OVS
> core library is *not* using avx512, and that OVS continues to run
> as before on CPUs that do not support avx512.
> 
> The approach taken in this implementation is to use the
> gather instruction to access the packet miniflow, allowing
> any miniflow blocks to be loaded into an AVX-512 register.
> This maximises the usefulness of the register, and hence this
> implementation handles any subtable with up to miniflow 8 bits.
> 
> Note that specialization of these avx512 lookup routines
> still provides performance value, as the hashing of the
> resulting data is performed in scalar code, and compile-time
> loop unrolling occurs when specialized to miniflow bits.
> 

Hi Harry,

I haven't tried running the code due to my machine only
support avx2. There are some minor issues such as indentation.
But I read through it with example below and I think it's correct.

Given that you have to do a lot of preparation (ex: popcount, creating
bit_masks, broadcast, ... etc) before using avx instructions, do you
have some performance number? I didn't see any from ovsconf 18 or 19.
Is using avx512 much better than avx2?

> Signed-off-by: Harry van Haaren 
> ---
>  lib/automake.mk|  16 ++
>  lib/dpif-netdev-lookup-avx512-gather.c | 255 +
>  lib/dpif-netdev-lookup.c   |   7 +
>  lib/dpif-netdev-lookup.h   |   7 +
>  lib/dpif-netdev.c  |   4 +
>  5 files changed, 289 insertions(+)
>  create mode 100644 lib/dpif-netdev-lookup-avx512-gather.c
> 
> diff --git a/lib/automake.mk b/lib/automake.mk
> index 19e454c4b..d8a05b384 100644
> --- a/lib/automake.mk
> +++ b/lib/automake.mk
> @@ -8,13 +8,16 @@
>  # libopenvswitch.la is the library to link against for binaries like 
> vswitchd.
>  # The code itself is built as two seperate static libraries;
>  # - core: Core files, always compiled with distro provided CFLAGS
> +# - lookupavx512: ISA optimized routines that require CPUID checks at runtime
>  lib_LTLIBRARIES += lib/libopenvswitch.la
>  lib_LTLIBRARIES += lib/libopenvswitchcore.la
> +lib_LTLIBRARIES += lib/libopenvswitchlookupavx512.la
>  
>  # Dummy library to link against doesn't have any sources, but does
>  # depend on libopenvswitchcore static library
>  lib_libopenvswitch_la_SOURCES =
>  lib_libopenvswitch_la_LIBADD = lib/libopenvswitchcore.la
> +lib_libopenvswitch_la_LIBADD += lib/libopenvswitchlookupavx512.la
>  
>  # Dummy library continues to depend on external libraries as before
>  lib_libopenvswitch_la_LIBADD += $(SSL_LIBS)
> @@ -31,6 +34,19 @@ lib_libopenvswitch_la_LDFLAGS = \
>  $(lib_libopenvswitchcore_la_LIBS) \
>  $(AM_LDFLAGS)
>  
> +
> +# Build lookupavx512 library with extra CFLAGS enabled. This allows the
> +# compiler to use the ISA features required for the ISA optimized code-paths.
> +lib_libopenvswitchlookupavx512_la_CFLAGS = \
> + -mavx512f \
> + -mavx512bw \
> + -mavx512dq \
> + -mbmi2 \
> + $(AM_CFLAGS)
> +lib_libopenvswitchlookupavx512_la_SOURCES = \
> + lib/dpif-netdev-lookup-avx512-gather.c
> +
> +
>  # Build core vswitch libraries as before
>  lib_libopenvswitchcore_la_SOURCES = \
>   lib/aes128.c \
> diff --git a/lib/dpif-netdev-lookup-avx512-gather.c 
> b/lib/dpif-netdev-lookup-avx512-gather.c
> new file mode 100644
> index 0..52348041b
> --- /dev/null
> +++ b/lib/dpif-netdev-lookup-avx512-gather.c
> @@ -0,0 +1,255 @@
> +/*
> + * Copyright (c) 2020, Intel Corperation.
> + *
> + * Licensed under the Apache License, Version 2.0 (the "License");
> + * you may not use this file except in compliance with the License.
> + * You may obtain a copy of the License at:
> + *
> + * http://www.apache.org/licenses/LICENSE-2.0
> + *
> + * Unless required by applicable law or agreed to in writing, software
> + * distributed under the License is distributed on an "AS IS" BASIS,
> + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
> + * See the License for the specific language governing permissions and
> + * limitations under the License.
> + */
> +
> +#ifdef __x86_64__
> +
> +#include 
> +
> +#include "dpif-netdev.h"
> +#include "dpif-netdev-lookup.h"
> +#include "dpif-netdev-private.h"
> +#include "cmap.h"
> +#include "flow.h"
> +#include "pvector.h"
> +#include "openvswitch/vl

Re: [ovs-dev] [PATCH v2 0/5] DPCLS Subtable ISA Optimization

2020-05-18 Thread William Tu
On Mon, May 18, 2020 at 4:34 AM Van Haaren, Harry
 wrote:
>
> > -Original Message-
> > From: William Tu 
> > Sent: Saturday, May 16, 2020 5:01 AM
> > To: Van Haaren, Harry 
> > Cc: ovs-dev ; Ilya Maximets 
> > Subject: Re: [ovs-dev] [PATCH v2 0/5] DPCLS Subtable ISA Optimization
> >
> > Hi Harry,
>
> Hey William,
>
> > Thanks for the patch, I learn a lot from them.
>
> Cool, yeah it's been fun for me learning about the OVS datapath at this level.
>
> > On Wed, May 6, 2020 at 6:05 AM Harry van Haaren
> >  wrote:
> > >
> > > This patchset implements the changes as proposed during the
> > > OVS Conf '19, in the talk "Next steps for SW Datapath".
> > > Youtube link: https://youtu.be/x0bOpojnpmU
> 
> > > Patch 5/5:
> > > Actual AVX-512 implementation for DPCLS subtable search. This is the
> > > actual SIMD vector code, which performs DPCLS miniflow iteration in
> > > parallel.
> > >
> > From your previous slides and patch5, I roughly understand the avx code 
> > logic.
>
> Any questions feel free to ask! The SIMD design & implementation can be 
> difficult
> to understand, I'd be happy to help if you're curious about specific aspects.
>
> > I'm also thinking about a very rough idea.
> > I wonder if it is possible to use avx scatter function to implement 
> > miniflow_expand.
>
> Is miniflow expand a significant amount of cycles in your use-case? I know 
> it's used to decompress
> a miniflow as required for OF updates etc, but on the datapath it shouldn't 
> matter? If there's a
> benchmark to run that shows mf expand to be a hotspot that would be very 
> interesting!
>
> You're right that AVX scatter could be used to perform the writes from a 
> single AVX register.
>
> > And for lookup a subtable, we can expand to the origin "struct flow" memory
> > layouts for both packets and subtable->mf.
> > So each field for each packet is at a fixed offset from the mf values.
> > This wastes some memory due to expand but makes rule match keys easier?
>
> My concern here is that "miniflow" has this very nice attribute that it is 
> compressed, and
> hence requires fewer cache lines than the full "struct flow". Particularly, 
> the miniflow
> is contiguous, meaning utilization of the cache lines is 100%. Typical 
> miniflow sizes for
> outer packets are ~6 or so miniflow blocks, so ~6*8bytes (uint64_t) + 2 bytes 
> for "bits".
> That means simple packets are resident in a single cache-line, and many 
> tunneled packets
> can be represented by 2 cache-lines.
>
> Matching on "struct flow" would imply a sparsely populated region of 672 
> bytes, and depending
> on the exact contents being matched on, could be anywhere from 2-X cache 
> lines? Generally
> compute is more performant than memory-accesses that aren't cache local, I'm 
> not sure is really
> going to give performance benefits in the bigger picture.
>
Hi Harry,
Thanks for your explanation! And yes, the cache line miss overhead is definitely
more important. Now I understood the design.
William
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] Support for GTP-U

2020-05-18 Thread William Tu
On Mon, May 18, 2020 at 4:48 AM Rohit kamble  wrote:
>
> Hi All,
>
> Do we have  GTP-U(GPRS Tunneling Protocol) support in OVS ?
> Current I am using lastest OVS version(v-2.12.0), I am not able to see
> GTP-U support.
> Is there any plan to support GTP in a future releases ?
>
Hi Rohit,

Yes, however it's not on 2.12.
You have to use the latest code on master.
http://docs.openvswitch.org/en/latest/faq/releases/
and only for userspace datapath.
William
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v2 0/5] DPCLS Subtable ISA Optimization

2020-05-15 Thread William Tu
Hi Harry,

Thanks for the patch, I learn a lot from them.

On Wed, May 6, 2020 at 6:05 AM Harry van Haaren
 wrote:
>
> This patchset implements the changes as proposed during the
> OVS Conf '19, in the talk "Next steps for SW Datapath".
> Youtube link: https://youtu.be/x0bOpojnpmU
>
> The talk raises 3 main requirements for CPU ISA Optimizations,
> each of which is addressed in some of the patches below.
> - Test & Validation (video @ 2:20)
> - Usabiliity & Debug (video @ 6:00)
> - Package & Deploy (video @ 8:45)
>
> Patch 1/5:
> The test and validation requirements proposed above are implemented,
> with the refactor of the subtable function pointer registration,
> and the autovalidator implementation is added.
>
> Patch 2/5:
> Adds the commands for usability & debug.
>
> Patch 3/5:
> Enable CPU ISA detection at runtime, providing information for future
> ISA optimized functions. v1 for CPU ISA:
> https://patchwork.ozlabs.org/series/160427/mbox/
>
> Patch 4/5:
> Build system changes to enable the Package & Deploy requirements,
> allowing a single OVS binary to run on all CPUs, but also gain best
> performance from CPU specific ISA optimizations.
>
> Patch 5/5:
> Actual AVX-512 implementation for DPCLS subtable search. This is the
> actual SIMD vector code, which performs DPCLS miniflow iteration in
> parallel.
>
>From your previous slides and patch5, I roughly understand the avx code logic.

I'm also thinking about a very rough idea.
I wonder if it is possible to use avx scatter function to implement
miniflow_expand.
And for lookup a subtable, we can expand to the origin "struct flow" memory
layouts for both packets and subtable->mf.
So each field for each packet is at a fixed offset from the mf values.
This wastes some memory due to expand but makes rule match keys easier?

Regards,
William
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCHv2] ovsdb-idl: Fix NULL deref reported by Coverity.

2020-05-15 Thread William Tu
When 'datum.values' or 'datum.keys' is NULL, some code path calling
into ovsdb_idl_txn_write__ triggers NULL deref.

An example:
ovsrec_open_vswitch_set_cur_cfg(const struct ovsrec_open_vswitch
{
struct ovsdb_datum datum;
union ovsdb_atom key;

datum.n = 1;
datum.keys = &key;

key.integer = cur_cfg;
//  1. assign_zero: Assigning: datum.values = NULL.
datum.values = NULL;
//  CID 1421356 (#1 of 1): Explicit null dereferenced (FORWARD_NULL)
//  2. var_deref_model: Passing &datum to ovsdb_idl_txn_write_clone,\
// which dereferences null datum.values.
ovsdb_idl_txn_write_clone(&row->header_, &ovsrec_open_vswitch_col
}

And with the following calls:
ovsdb_idl_txn_write_clone
  ovsdb_idl_txn_write__
6. deref_parm_in_call: Function ovsdb_datum_destroy dereferences
   datum->values
ovsdb_datum_destroy

Signed-off-by: William Tu 
---
v2:
   - I applied previous patch e398275024e815b52with yifeng's comments,
 but accidently remove this chunk.  With this fix, the Coverity
 shows around 133 defects. (now it's around 300)
---
 lib/ovsdb-idl.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/lib/ovsdb-idl.c b/lib/ovsdb-idl.c
index 1535ad7b5197..6614ea1617ef 100644
--- a/lib/ovsdb-idl.c
+++ b/lib/ovsdb-idl.c
@@ -4449,7 +4449,8 @@ ovsdb_idl_txn_write__(const struct ovsdb_idl_row *row_,
  * transaction only does writes of existing values, without making any real
  * changes, we will drop the whole transaction later in
  * ovsdb_idl_txn_commit().) */
-if (write_only && ovsdb_datum_equals(ovsdb_idl_read(row, column),
+if (datum->keys && datum->values &&
+write_only && ovsdb_datum_equals(ovsdb_idl_read(row, column),
  datum, &column->type)) {
 goto discard_datum;
 }
-- 
2.7.4

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH] metaflow: Fix maskable conntrack orig tuple fields

2020-05-14 Thread William Tu
On Thu, May 14, 2020 at 10:21:23AM -0700, Yi-Hung Wei wrote:
> On Thu, May 14, 2020 at 8:37 AM William Tu  wrote:
> >
> > On Wed, May 13, 2020 at 01:11:17PM -0700, Yi-Hung Wei wrote:
> > > From man ovs-fields(7), the conntrack origin tuple fields
> > > ct_nw_src/dst, ct_ipv6_src/dst, and ct_tp_src/dst are supposed
> > > to be bitwise maskable, but they are not.  This patch enables
> > > those fields to be maskable, and adds a regression test.
> > >
> > > Fixes: daf4d3c18da4 ("odp: Support conntrack orig tuple key.")
> > > Reported-by: Wenying Dong 
> > > Signed-off-by: Yi-Hung Wei 
> > > ---
> > > Travis CI: https://travis-ci.org/github/YiHungWei/ovs/builds/686707703
> > > ---
> >
> > Thanks for fixing it and adding tests! Applied to master.
> > William
> 
> 
> Thanks William for review.  Can we backport it to older branches (as
> far as we can cleanly apply) ?
> 

I applied to 2.10, 2.11, 2.12, and 2.13.
William

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH 2/2] oss-fuzz: Fix miniflow_target.c.

2020-05-14 Thread William Tu
On Tue, May 12, 2020 at 04:31:58PM -0700, Yifeng Sun wrote:
> LGTM, thanks.
> 
> Reviewed-by: Yifeng Sun 
> 
> 
Applied, thanks.
William

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH] metaflow: Fix maskable conntrack orig tuple fields

2020-05-14 Thread William Tu
On Wed, May 13, 2020 at 01:11:17PM -0700, Yi-Hung Wei wrote:
> From man ovs-fields(7), the conntrack origin tuple fields
> ct_nw_src/dst, ct_ipv6_src/dst, and ct_tp_src/dst are supposed
> to be bitwise maskable, but they are not.  This patch enables
> those fields to be maskable, and adds a regression test.
> 
> Fixes: daf4d3c18da4 ("odp: Support conntrack orig tuple key.")
> Reported-by: Wenying Dong 
> Signed-off-by: Yi-Hung Wei 
> ---
> Travis CI: https://travis-ci.org/github/YiHungWei/ovs/builds/686707703
> ---

Thanks for fixing it and adding tests! Applied to master.
William

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH ovs v2 0/4] expand the meter table and fix bug

2020-05-14 Thread William Tu
On Wed, May 13, 2020 at 09:31:31PM +0800, xiangxia.m@gmail.com wrote:
> From: Tonghao Zhang 
> 
> The patch set expand or shrink the meter table when necessary.
> and other patches fix bug or improve codes.
> 
> Tonghao Zhang (4):
>   dpif-netdev: Expand the meters supported number
>   dpif-netdev: Add burst size to buckets
>   dpif-netdev: Use the u64 instead of u32 for buckets
>   Revert "dpif-netdev: includes microsecond delta in meter bucket
> calculation"
> 
>  include/openvswitch/ofp-meter.h |   2 +-
>  lib/dpif-netdev.c   | 336 
>  lib/ofp-meter.c |   4 +-
>  3 files changed, 257 insertions(+), 85 deletions(-)
> 
Thanks!
The series looks good to me. See if others have more comments.
William
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH] ofproto: report coverage on hitting datapath flow limit

2020-05-14 Thread William Tu
On Mon, Apr 20, 2020 at 07:13:42PM +0530, Gowrishankar Muthukrishnan wrote:
> Whenever the number of flows in the datapath crosses above
> the flow limit set/autoconfigured, it is helpful to report
> this event through coverage counter for an operator/devops
> engineer to know and take proactive corrections in the
> switch configuration.
> 
> Today, these events are reported in ovs vswitch log when
> a new flow can not be inserted in upcall processing in which
> case ovs writes a warning, otherwise an auto correction
> made by ovs to flush old flows without any intimation at all.
> 
> Signed-off-by: Gowrishankar Muthukrishnan 
> ---

Thanks, the patch looks good to me.

I thought logging to ovs-vswitchd.log is good enough, because
that's usually the first file we look, then if necessary we check
the coverage log. Just curious, do you have some case where you
keep seeing the flow_limit_hit frequently?

Acked-by: William Tu 

>  ofproto/ofproto-dpif-upcall.c | 6 ++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/ofproto/ofproto-dpif-upcall.c b/ofproto/ofproto-dpif-upcall.c
> index 5e08ef10d..a76532ec7 100644
> --- a/ofproto/ofproto-dpif-upcall.c
> +++ b/ofproto/ofproto-dpif-upcall.c
> @@ -56,6 +56,7 @@ COVERAGE_DEFINE(handler_duplicate_upcall);
>  COVERAGE_DEFINE(upcall_ukey_contention);
>  COVERAGE_DEFINE(upcall_ukey_replace);
>  COVERAGE_DEFINE(revalidate_missed_dp_flow);
> +COVERAGE_DEFINE(upcall_flow_limit_hit);
>  
>  /* A thread that reads upcalls from dpif, forwards each upcall's packet,
>   * and possibly sets up a kernel flow as a cache. */
> @@ -1281,6 +1282,7 @@ should_install_flow(struct udpif *udpif, struct upcall 
> *upcall)
>  
>  atomic_read_relaxed(&udpif->flow_limit, &flow_limit);
>  if (udpif_get_n_flows(udpif) >= flow_limit) {
> +COVERAGE_INC(upcall_flow_limit_hit);
>  VLOG_WARN_RL(&rl, "upcall: datapath flow limit reached");
>  return false;
>  }
> @@ -2624,6 +2626,10 @@ revalidate(struct revalidator *revalidator)
>   *   datapath flows, so we will recover before all the flows are
>   *   gone.) */
>  n_dp_flows = udpif_get_n_flows(udpif);
> +if (n_dp_flows >= flow_limit) {
> +COVERAGE_INC(upcall_flow_limit_hit);
> +}
> +
>  kill_them_all = n_dp_flows > flow_limit * 2;
>  max_idle = n_dp_flows > flow_limit ? 100 : ofproto_max_idle;
>  
> -- 
> 2.21.1
> 
> ___
> dev mailing list
> d...@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCHv9 1/2] userspace: Enable TSO support for non-DPDK.

2020-05-14 Thread William Tu
On Thu, May 14, 2020 at 01:42:21PM +0200, Ilya Maximets wrote:
> On 3/24/20 11:10 PM, William Tu wrote:
> > This patch enables TSO support for non-DPDK use cases, and
> > also add check-system-tso testsuite. Before TSO, we have to
> > disable checksum offload, allowing the kernel to calculate the
> > TCP/UDP packet checsum. With TSO, we can skip the checksum
> > validation by enabling checksum offload, and with large packet
> > size, we see better performance.
> > 
> > Consider container to container use cases:
> >   iperf3 -c (ns0) -> veth peer -> OVS -> veth peer -> iperf3 -s (ns1)
> > And I got around 6Gbps, similar to TSO with DPDK-enabled.
> > 
> > Signed-off-by: William Tu 
> > Acked-by: Flavio Leitner 
> > 
> 
> This version looks good except the '(DPDK doesn't support)' comment.
> It's misleading.  DPDK supports partial offloading.
> 
> And a couple of minor style/grammar suggestions:
> ---
> diff --git a/lib/dp-packet.h b/lib/dp-packet.h
> index 4c127e759..0430cca8e 100644
> --- a/lib/dp-packet.h
> +++ b/lib/dp-packet.h
> @@ -47,6 +47,7 @@ enum OVS_PACKED_ENUM dp_packet_source {
>  };
>  
>  #define DP_PACKET_CONTEXT_SIZE 64
> +
>  #ifdef DPDK_NETDEV
>  #define DEF_OL_FLAG(NAME, DPDK_DEF, GENERIC_DEF) NAME = DPDK_DEF
>  #else
> @@ -58,7 +59,7 @@ enum dp_packet_offload_mask {
>  /* Value 0 is not used. */
>  /* Is the 'rss_hash' valid? */
>  DEF_OL_FLAG(DP_PACKET_OL_RSS_HASH, PKT_RX_RSS_HASH, 0x1),
> -/* Is the 'flow_mark' valid? (DPDK does not support) */
> +/* Is the 'flow_mark' valid? */
>  DEF_OL_FLAG(DP_PACKET_OL_FLOW_MARK, PKT_RX_FDIR_ID, 0x2),
>  /* Bad L4 checksum in the packet. */
>  DEF_OL_FLAG(DP_PACKET_OL_RX_L4_CKSUM_BAD, PKT_RX_L4_CKSUM_BAD, 0x4),
> @@ -80,20 +81,20 @@ enum dp_packet_offload_mask {
>  DEF_OL_FLAG(DP_PACKET_OL_TX_UDP_CKSUM, PKT_TX_UDP_CKSUM, 0x400),
>  /* Offload SCTP checksum. */
>  DEF_OL_FLAG(DP_PACKET_OL_TX_SCTP_CKSUM, PKT_TX_SCTP_CKSUM, 0x800),
> -/* Adding new field requres adding to DP_PACKET_OL_SUPPORTED_MASK */
> +/* Adding new field requires adding to DP_PACKET_OL_SUPPORTED_MASK. */
>  };
>  
> -#define DP_PACKET_OL_SUPPORTED_MASK (DP_PACKET_OL_RSS_HASH  |   \
> - DP_PACKET_OL_FLOW_MARK |   \
> - DP_PACKET_OL_RX_L4_CKSUM_BAD | \
> - DP_PACKET_OL_RX_IP_CKSUM_BAD | \
> - DP_PACKET_OL_RX_L4_CKSUM_GOOD| \
> - DP_PACKET_OL_RX_IP_CKSUM_GOOD| \
> - DP_PACKET_OL_TX_TCP_SEG |  \
> - DP_PACKET_OL_TX_IPV4 | \
> - DP_PACKET_OL_TX_IPV6 | \
> - DP_PACKET_OL_TX_TCP_CKSUM |\
> - DP_PACKET_OL_TX_UDP_CKSUM |\
> +#define DP_PACKET_OL_SUPPORTED_MASK (DP_PACKET_OL_RSS_HASH | \
> + DP_PACKET_OL_FLOW_MARK| \
> + DP_PACKET_OL_RX_L4_CKSUM_BAD  | \
> + DP_PACKET_OL_RX_IP_CKSUM_BAD  | \
> + DP_PACKET_OL_RX_L4_CKSUM_GOOD | \
> + DP_PACKET_OL_RX_IP_CKSUM_GOOD | \
> + DP_PACKET_OL_TX_TCP_SEG   | \
> + DP_PACKET_OL_TX_IPV4  | \
> + DP_PACKET_OL_TX_IPV6  | \
> + DP_PACKET_OL_TX_TCP_CKSUM | \
> + DP_PACKET_OL_TX_UDP_CKSUM | \
>   DP_PACKET_OL_TX_SCTP_CKSUM)
>  
>  #define DP_PACKET_OL_TX_L4_MASK (DP_PACKET_OL_TX_TCP_CKSUM | \
> @@ -888,7 +889,7 @@ dp_packet_batch_reset_cutlen(struct dp_packet_batch 
> *batch)
>  }
>  
>  /* Returns the RSS hash of the packet 'p'.  Note that the returned value is
> - * correct only if 'dp_packet_rss_valid(p)' returns true */
> + * correct only if 'dp_packet_rss_valid(p)' returns 'true'. */
>  static inline uint32_t
>  dp_packet_get_rss_hash(const struct dp_packet *p)
>  {
> ---
> 
> 
> With above diff applied for the series:
> Acked-by: Ilya Maximets 

Applied to master, thanks!
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH] dpif-netdev: Add miniflow bits to dump-flows.

2020-05-14 Thread William Tu
The 'dpctl/dump-flows -m' only shows the number of 1-bit in the
miniflow map, the patch outputs additional miniflow bits after it.
The format will be
  dp-extra-info:miniflow_bits(count_1bit(unit0):unit0,
  count_1bit(unit1):unit1)
Example:
  dp-extra-info:miniflow_bits(4:0x30c0,1:0x400)

By searching the unique miniflow bits, we know the number of subtables,
and for earch subtables, the fields it matches on.

Cc: Emma Finn 
Cc: Ian Stokes 
Signed-off-by: William Tu 
---
 lib/dpif-netdev.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index 51c888501bdf..b618b07be0c8 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -3352,8 +3352,9 @@ dp_netdev_flow_add(struct dp_netdev_pmd_thread *pmd,
 if (unit) {
 ds_put_char(&extra_info, ',');
 }
-ds_put_format(&extra_info, "%d",
-  count_1bits(flow->cr.mask->mf.map.bits[unit]));
+ds_put_format(&extra_info, "%d:0x%llx",
+  count_1bits(flow->cr.mask->mf.map.bits[unit]),
+  flow->cr.mask->mf.map.bits[unit]);
 }
 ds_put_char(&extra_info, ')');
 flow->dp_extra_info = ds_steal_cstr(&extra_info);
-- 
2.7.4

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH] ovs-bugtool: Add -m option to dump-flows.

2020-05-14 Thread William Tu
This patch adds 'ovs-appctl dpctl/dump-flows -m' to bugtool,
the output will include wildcarded fields and the miniflow bits,
such as 'dp-extra-info:miniflow_bits(4,1)'.

Cc: Emma Finn 
Signed-off-by: William Tu 
---
 utilities/bugtool/plugins/network-status/openvswitch.xml | 1 +
 1 file changed, 1 insertion(+)

diff --git a/utilities/bugtool/plugins/network-status/openvswitch.xml 
b/utilities/bugtool/plugins/network-status/openvswitch.xml
index e6fa4fd15fff..56e091feb45f 100644
--- a/utilities/bugtool/plugins/network-status/openvswitch.xml
+++ b/utilities/bugtool/plugins/network-status/openvswitch.xml
@@ -32,6 +32,7 @@
 /usr/share/openvswitch/scripts/ovs-bugtool-ovs-appctl-dpif
 ovs-appctl -t 
ovsdb-server ovsdb-server/list-dbs
 ovs-appctl dpctl/dump-flows netdev@ovs-netdev
+ovs-appctl dpctl/dump-flows -m netdev@ovs-netdev
 ovs-appctl dpctl/dump-flows system@ovs-system
 ovs-appctl dpctl/show -s
 /usr/share/openvswitch/scripts/ovs-bugtool-ovs-ofctl-loop-over-bridges
 "show"
-- 
2.7.4

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [RFC v2 PATCH 4/4] bpf: Add reference XDP program implementation for netdev-offload-xdp

2020-05-13 Thread William Tu
On Mon, May 11, 2020 at 7:53 AM Toshiaki Makita
 wrote:
>
> On 2020/05/09 0:05, William Tu wrote:
> > On Thu, May 7, 2020 at 7:56 AM Toshiaki Makita
> >  wrote:
> >>
> >> On 2020/05/06 0:18, William Tu wrote:
> >>> On Tue, May 05, 2020 at 11:43:58AM +0900, Toshiaki Makita wrote:
> >>>> On 2020/05/04 1:22, William Tu wrote:
> >>>>> On Tue, Apr 21, 2020 at 11:47:04PM +0900, Toshiaki Makita wrote:
> >>>>>> This adds a reference program, flowtable_afxdp.o, which can be used to
> >>>>>> offload flows to XDP through netdev-offload-xdp.
> >>>>>> The program will be compiled when using --enable-bpf switch.
> >>>>>>
> >>>>>> Signed-off-by: Toshiaki Makita 
> >>>>>
> >>>>> Hi Toshiaki,
> >>>>>
> >>>>> Thanks for your patch, I haven't tried to run it but I have
> >>>>> questions about miniflow.
> >>>>>
> >>>>>> ---
> >>>> ...
> >>>>>> +SEC("xdp") int
> >>>>>> +flowtable_afxdp(struct xdp_md *ctx)
> >>>>>> +{
> >>>>>> +struct xdp_miniflow *pkt_mf;
> >>>>>> +struct xdp_subtable_mask *subtable_mask;
> >>>>>> +int *head;
> >>>>>> +struct xdp_flow_actions *xdp_actions = NULL;
> >>>>>> +struct nlattr *a;
> >>>>>> +unsigned int left;
> >>>>>> +int cnt, idx, zero = 0;
> >>>>>> +
> >>>>>> +account_debug(0);
> >>>>>> +
> >>>>>> +head = bpf_map_lookup_elem(&subtbl_masks_hd, &zero);
> >>>>>> +if (!head) {
> >>>>>> +return XDP_ABORTED;
> >>>>>> +}
> >>>>>> +if (*head == XDP_SUBTABLES_TAIL) {
> >>>>>> +/* Offload not enabled */
> >>>>>> +goto upcall;
> >>>>>> +}
> >>>>>> +
> >>>>>> +/* Get temporary storage for storing packet miniflow */
> >>>>>> +pkt_mf = bpf_map_lookup_elem(&pkt_mf_tbl, &zero);
> >>>>>> +if (!pkt_mf) {
> >>>>>> +return XDP_ABORTED;
> >>>>>> +}
> >>>>>> +
> >>>>>
> >>>>> I start to wonder what's the benefit of using miniflow in XDP?
> >>>>> miniflow tries to compress the large flow key into smaller memory,
> >>>>> and with a flowmap indicating which offset in bits are valid.
> >>>>> And when adding a new subtable at flow_put, the subtable's key
> >>>>> size can be smaller with only the needed fields.
> >>>>>
> >>>>> But in the case of XDP/eBPF, we have to statically allocated fixed
> >>>>> key size (the subtbl_template has key as struct xdp_flow_key), so
> >>>>> each subtable is always having key with full key fields. (not saving
> >>>>> memory space) Moreover with miniflow, it makes the 'mask_key' function
> >>>>> below pretty complicated.
> >>>>
> >>>> Fixed sized subtable is restriction of map-in-map.
> >>>> The benefit to use miniflow I envisioned initially was
> >>>> - compress the key
> >>>> - use existing function to convert flows for XDP
> >>>>
> >>>> The first one is actually not doable due to map-in-map. I hope someday 
> >>>> the
> >>>> restriction gets loosen...
> >>>
> >>> On my second thought this might be a good idea.
> >>>
> >>> One problem I hit in my previous prototype is that the flow key is too 
> >>> big.
> >>> ex: struct sw_flow_key in ovs kernel is more than 500B, and moving it to 
> >>> bpf
> >>> some BPF limitation. If we continue adding more fields to struct xdp_flow,
> >>> ex: ipv6, vxlan tunnel, then with miniflow extract, the actually memory 
> >>> usage
> >>> in key is smaller. The key size in subtbl_template can be a value smaller 
> >>> the
> >>> the size of struct xdp_flow.
> >>
> >> Nice idea!
> >> So you mean we can have less sized subtbl_template key because when adding 
> >> more and
> >&g

Re: [ovs-dev] [PATCH 1/2] oss-fuzz: Fix fuzzer flags in CFLAGS.

2020-05-13 Thread William Tu
On Tue, May 12, 2020 at 8:44 AM William Tu  wrote:
>
> Thanks for taking a look.
>
> On Tue, May 12, 2020 at 8:41 AM Ilya Maximets  wrote:
> >
> > On 5/12/20 5:22 PM, William Tu wrote:
> > > When running fuzzer locally by doing
> > >  $ ./configure CC=clang CFLAGS="-g -O2 -fsanitize=fuzzer-no-link -Werror"
> >
> > Hmm... Why passing 'fuzzer-no-link' to replace it inside the make script?
>
> the configure fails i`f passing as "-fsanitize=fuzzer"
> configure: error: C compiler cannot create executables
> See `config.log' for more details
>
> configure:4079: checking whether the C compiler works
> configure:4101: clang -g -O2 -fsanitize=fuzzer -Werror   conftest.c  >&5
> /tmp/conftest-67bd49.o: In function `main':
> /root/ovs/conftest.c:14: multiple definition of `main'
> /usr/lib/llvm-8/lib/clang/8.0.0/lib/linux/libclang_rt.fuzzer-x86_64.a(FuzzerMain.cpp.o):FuzzerMain.cpp:(.text.main+0x0):
> first defined here
> /usr/lib/llvm-8/lib/clang/8.0.0/lib/linux/libclang_rt.fuzzer-x86_64.a(FuzzerMain.cpp.o):
> In function `main':
> FuzzerMain.cpp:(.text.main+0x12): undefined reference to
> `LLVMFuzzerTestOneInput'
> clang: error: linker command failed with exit code 1 (use -v to see 
> invocation)
> configure:4105: $? = 1
> configure:4143: result: no
> configure: failed program was:
>
> >
> > >  $ make oss-fuzz-targets
> >
> > AFAIK, oss-fuzz-targets are for Google's oss-fuzz project and not for
> > local usage.
>
> We're thinking about adding more tests under tests/oss-fuzz/
> so need to make it work locally first.
>
Some context:
People told me that using fuzzer can detect integer wraparound/overflow
bug, something like this meter bucket issue.
https://mail.openvswitch.org/pipermail/ovs-dev/2020-May/370291.html

So I start to look at how clang fuzzer works in OVS. It requires us to pick
the fuzzer target function, and in this meter bucket case, it's pretty hard to
do it.

Regards,
William
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCHv9 1/2] userspace: Enable TSO support for non-DPDK.

2020-05-12 Thread William Tu
On Mon, Apr 27, 2020 at 8:54 AM William Tu  wrote:
>
> On Tue, Mar 24, 2020 at 3:11 PM William Tu  wrote:
> >
> > This patch enables TSO support for non-DPDK use cases, and
> > also add check-system-tso testsuite. Before TSO, we have to
> > disable checksum offload, allowing the kernel to calculate the
> > TCP/UDP packet checsum. With TSO, we can skip the checksum
> > validation by enabling checksum offload, and with large packet
> > size, we see better performance.
> >
> > Consider container to container use cases:
> >   iperf3 -c (ns0) -> veth peer -> OVS -> veth peer -> iperf3 -s (ns1)
> > And I got around 6Gbps, similar to TSO with DPDK-enabled.
> >
> > Signed-off-by: William Tu 
> > Acked-by: Flavio Leitner 
> >
> > ---
> > v9:
> >   - make naming of flags more clear
> >   - I couldn't think of any smart MACRO
> >   - travis: 
> > https://travis-ci.org/github/williamtu/ovs-travis/builds/666513254

Hi Ilya,
I'm thinking about applying this patches since we have a couple of
non-dpdk use cases. Do you have more comments I should work on?
Thanks

William
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH 1/2] oss-fuzz: Fix fuzzer flags in CFLAGS.

2020-05-12 Thread William Tu
Thanks for taking a look.

On Tue, May 12, 2020 at 8:41 AM Ilya Maximets  wrote:
>
> On 5/12/20 5:22 PM, William Tu wrote:
> > When running fuzzer locally by doing
> >  $ ./configure CC=clang CFLAGS="-g -O2 -fsanitize=fuzzer-no-link -Werror"
>
> Hmm... Why passing 'fuzzer-no-link' to replace it inside the make script?

the configure fails i`f passing as "-fsanitize=fuzzer"
configure: error: C compiler cannot create executables
See `config.log' for more details

configure:4079: checking whether the C compiler works
configure:4101: clang -g -O2 -fsanitize=fuzzer -Werror   conftest.c  >&5
/tmp/conftest-67bd49.o: In function `main':
/root/ovs/conftest.c:14: multiple definition of `main'
/usr/lib/llvm-8/lib/clang/8.0.0/lib/linux/libclang_rt.fuzzer-x86_64.a(FuzzerMain.cpp.o):FuzzerMain.cpp:(.text.main+0x0):
first defined here
/usr/lib/llvm-8/lib/clang/8.0.0/lib/linux/libclang_rt.fuzzer-x86_64.a(FuzzerMain.cpp.o):
In function `main':
FuzzerMain.cpp:(.text.main+0x12): undefined reference to
`LLVMFuzzerTestOneInput'
clang: error: linker command failed with exit code 1 (use -v to see invocation)
configure:4105: $? = 1
configure:4143: result: no
configure: failed program was:

>
> >  $ make oss-fuzz-targets
>
> AFAIK, oss-fuzz-targets are for Google's oss-fuzz project and not for
> local usage.

We're thinking about adding more tests under tests/oss-fuzz/
so need to make it work locally first.

William
>
> > fails due to "(.text+0x20): undefined reference to `main'"
> >
> > The patch fixes it by replacing "fuzzer-no-link" to "fuzzer" so the
> > binary under tests/oss-fuzz/ can be generated.
> >
> > Cc: Bhargava Shastry 
> > Cc: Yifeng Sun 
> > Signed-off-by: William Tu 
> > ---
> >  tests/oss-fuzz/automake.mk | 3 +++
> >  1 file changed, 3 insertions(+)
> >
> > diff --git a/tests/oss-fuzz/automake.mk b/tests/oss-fuzz/automake.mk
> > index 2b116e7a51a1..66a5c3037f5d 100644
> > --- a/tests/oss-fuzz/automake.mk
> > +++ b/tests/oss-fuzz/automake.mk
> > @@ -8,6 +8,9 @@ OSS_FUZZ_TARGETS = \
> >  EXTRA_PROGRAMS += $(OSS_FUZZ_TARGETS)
> >  oss-fuzz-targets: $(OSS_FUZZ_TARGETS)
> >
> > +CFLAGS_FUZZER:=$(CFLAGS:fuzzer-no-link=fuzzer)
> > +override CFLAGS=$(CFLAGS_FUZZER)
> > +
> >  tests_oss_fuzz_flow_extract_target_SOURCES = \
> >   tests/oss-fuzz/flow_extract_target.c \
> >   tests/oss-fuzz/fuzzer.h
> >
>
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH 3/3] ovsdb-idl: Fix NULL deref reported by Coverity.

2020-05-12 Thread William Tu
On Sat, May 9, 2020 at 11:02 AM Yifeng Sun  wrote:
>
> Thanks William, this patch looks good to me.
> Maybe code will be a little neater with the fixes below:
>
> @@ -1017,6 +1017,9 @@ static void
>  free_data(enum ovsdb_atomic_type type,
>union ovsdb_atom *atoms, size_t n_atoms)
>  {
> +if (!atoms) {
> +return;
> +}
>  if (ovsdb_atom_needs_destruction(type)) {
>
>
>
> Reviewed-by: Yifeng Sun 
>
Thank you.
I applied the series to master.
William
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH 2/2] oss-fuzz: Fix miniflow_target.c.

2020-05-12 Thread William Tu
Clang reports:
tests/oss-fuzz/miniflow_target.c:209:26: error: suggest braces around \
initialization of subobject
  [-Werror,-Wmissing-braces]
  struct flow flow2 = {0};

Fix it by using memset.

Cc: Bhargava Shastry 
Cc: Yifeng Sun 
Signed-off-by: William Tu 
---
 tests/oss-fuzz/miniflow_target.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/tests/oss-fuzz/miniflow_target.c b/tests/oss-fuzz/miniflow_target.c
index 393443061e85..50b8b0e64237 100644
--- a/tests/oss-fuzz/miniflow_target.c
+++ b/tests/oss-fuzz/miniflow_target.c
@@ -206,8 +206,9 @@ test_minimask_combine(struct flow *flow)
 struct minimask minicombined;
 uint64_t storage[FLOW_U64S];
 } m;
-struct flow flow2 = {0};
+struct flow flow2;
 
+memset(&flow2, 0, sizeof flow2);
 mask.masks = *flow;
 minimask = minimask_create(&mask);
 
-- 
2.7.4

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH 1/2] oss-fuzz: Fix fuzzer flags in CFLAGS.

2020-05-12 Thread William Tu
When running fuzzer locally by doing
 $ ./configure CC=clang CFLAGS="-g -O2 -fsanitize=fuzzer-no-link -Werror"
 $ make oss-fuzz-targets
fails due to "(.text+0x20): undefined reference to `main'"

The patch fixes it by replacing "fuzzer-no-link" to "fuzzer" so the
binary under tests/oss-fuzz/ can be generated.

Cc: Bhargava Shastry 
Cc: Yifeng Sun 
Signed-off-by: William Tu 
---
 tests/oss-fuzz/automake.mk | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/tests/oss-fuzz/automake.mk b/tests/oss-fuzz/automake.mk
index 2b116e7a51a1..66a5c3037f5d 100644
--- a/tests/oss-fuzz/automake.mk
+++ b/tests/oss-fuzz/automake.mk
@@ -8,6 +8,9 @@ OSS_FUZZ_TARGETS = \
 EXTRA_PROGRAMS += $(OSS_FUZZ_TARGETS)
 oss-fuzz-targets: $(OSS_FUZZ_TARGETS)
 
+CFLAGS_FUZZER:=$(CFLAGS:fuzzer-no-link=fuzzer)
+override CFLAGS=$(CFLAGS_FUZZER)
+
 tests_oss_fuzz_flow_extract_target_SOURCES = \
tests/oss-fuzz/flow_extract_target.c \
tests/oss-fuzz/fuzzer.h
-- 
2.7.4

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH OVS 2/4] dpif-netdev: Add burst size to buckets

2020-05-12 Thread William Tu
On Sun, May 10, 2020 at 7:12 PM Tonghao Zhang  wrote:
>
> On Sat, May 9, 2020 at 11:26 PM William Tu  wrote:
> >
> > On Sat, May 09, 2020 at 09:54:10AM +0800, Tonghao Zhang wrote:
> > > On Sat, May 9, 2020 at 7:23 AM William Tu  wrote:
> > > >
> > > > On Thu, Apr 30, 2020 at 07:00:37PM +0800, xiangxia.m@gmail.com 
> > > > wrote:
> > > > > From: Tonghao Zhang 
> > > > >
> > > > > For now, the meter of the userspace datapath, don't include
> > > > > the bucket burst size to buckets. This patch includes it now.
> > > > >
> > > > > Cc: Ilya Maximets 
> > > > > Cc: William Tu 
> > > > > Cc: Jarno Rajahalme 
> > > > > Cc: Ben Pfaff 
> > > > > Cc: Andy Zhou 
> > > > > Signed-off-by: Tonghao Zhang 
> > > > > ---
> > > > >  lib/dpif-netdev.c | 7 +--
> > > > >  1 file changed, 1 insertion(+), 6 deletions(-)
> > > > >
> > > > > diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
> > > > > index 17c0241aa2e2..59546db6a2a2 100644
> > > > > --- a/lib/dpif-netdev.c
> > > > > +++ b/lib/dpif-netdev.c
> > > > > @@ -6092,15 +6092,10 @@ dpif_netdev_meter_set(struct dpif *dpif, 
> > > > > ofproto_meter_id meter_id,
> > > > >  for (i = 0; i < config->n_bands; ++i) {
> > > > >  uint32_t band_max_delta_t;
> > > > >
> > > > > -/* Set burst size to a workable value if none specified. */
> > > > > -if (config->bands[i].burst_size == 0) {
> > > > > -config->bands[i].burst_size = config->bands[i].rate;
> > > > > -}
> > > > > -
> > > > >  meter->bands[i].up = config->bands[i];
> > > > >  /* Convert burst size to the bucket units: */
> > > > >  /* pkts => 1/1000 packets, kilobits => bits. */
> > > > > -meter->bands[i].up.burst_size *= 1000;
> > > > > +meter->bands[i].up.burst_size += config->bands[i].rate * 
> > > > > 1000ULL;
> > > >
> > > > I don't quite understand.
> > > > Isn't this remove the setting of burst_size and always use
> > > > 'config->bands[i].rate * 1000ULL;'?
> > > Hi William, thanks for you reviews,
> > > meter->bands[i].up.burst_size += config->bands[i].rate * 1000ULL;
> > > burst_size  will plus the config->bands[i].rate * 1000ULL  and then
> > > assigned to burst_size again.
> > > so if user don't set the burst_size, burst_size is 0, and only plus
> > > the config->bands[i].rate * 1000ULL.
> > > Before the patch, if user don't set the burst_sze, burst_size = 0, and
> > > will the rate *1000.
> > > Here, burst_size is different from kernel datapath. burst_size in
> > > netdev will be used as bucket. so buckets shoud be "burst_size" + rate
> > >
> > > > Ex: When user set
> > > > ovs-ofctl -O OpenFlow13 add-meter br0 'meter=1 pktps burst stats 
> > > > bands=type=drop rate=1 burst_size=123
> > > > does 123 get set?
> > > burst_size(used for bucket size )should be (burst_size + rate) *1000
> > > my patch should be: because burst_size uint kilobits
> > > -/* Set burst size to a workable value if none specified. */
> > > -if (config->bands[i].burst_size == 0) {
> > > -config->bands[i].burst_size = config->bands[i].rate;
> > > -}
> > > -
> > >  meter->bands[i].up = config->bands[i];
> > >  /* Convert burst size to the bucket units: */
> > >  /* pkts => 1/1000 packets, kilobits => bits. */
> > > -meter->bands[i].up.burst_size *= 1000;
> > > +meter->bands[i].up.burst_size += config->bands[i].rate;
> > > +meter->bands[i].up.burst_size *= 1000ULL;
> >
> >
> > OK, thanks.
> > btw, why should we include bucket to burst_size?
> In netdev datapath, up.burst_size will be used as buckets, the
> "burst_size" in the command:
> ovs-ofctl -O OpenFlow13 add-meter br0 'meter=1 pktps burst stats
> bands=type=drop rate=1 burst_size=1024'
>
> should be included to buckets, ("up.burst_size"). Think about tbf in kernel:
> $ tc qdisc add dev enp130s0f0 handle 10: root tbf rate 10mbit burst
> 2mb latency 70ms
> the command above, allow 2mb burst, and ovs kernel datapath also do that:
> in dp_meter_create function:
> band->bucket = (band->burst_size + band->rate) * 1000ULL;

Thanks, now I understand.
William
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH OVS 1/4] dpif-netdev: Expand the meters supported number

2020-05-09 Thread William Tu
On Thu, Apr 30, 2020 at 07:00:36PM +0800, xiangxia.m@gmail.com wrote:
> From: Tonghao Zhang 
> 
> For now, ovs-vswitchd use the array of the dp_meter struct
> to store meter's data, and at most, there are only 65536
> (defined by MAX_METERS) meters that can be used. But in some
> case, for example, in the edge gateway, we should use 200,000,
> at least, meters for IP address bandwidth limitation.
> Every one IP address will use two meters for its rx and tx
> path[1]. In other way, ovs-vswitchd should support meter-offload
> (rte_mtr_xxx api introduced by dpdk.), but there are more than
> 65536 meters in the hardware, such as Mellanox ConnectX-6.
> 
> This patch use array to manage the meter, but it can ben expanded.
> 
> [1].
> $ in_port=p0,ip,ip_dst=1.1.1.x action=meter:n,output:p1
> $ in_port=p1,ip,ip_src=1.1.1.x action=meter:m,output:p0

I'm just curious why you need so many 'unique' meters?
can you share the meter id if their settings are the same?

ex:
$ in_port=p0,ip,ip_dst=1.1.1.x action=meter:X ,output:p1
$ in_port=p1,ip,ip_src=1.1.1.x action=meter:X, output:p0

if both flows have the same meter setup in X.

> 
> Cc: Ilya Maximets 
> Cc: William Tu 
> Cc: Jarno Rajahalme 
> Cc: Ben Pfaff 
> Cc: Andy Zhou 
> Signed-off-by: Tonghao Zhang 
> ---

Thanks for working on both kernel and userspace datapath!
Also add Pravin in the loop.

>  lib/dpif-netdev.c | 320 
> ++
>  1 file changed, 251 insertions(+), 69 deletions(-)
> 
> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
> index ef14e83b5f06..17c0241aa2e2 100644
> --- a/lib/dpif-netdev.c
> +++ b/lib/dpif-netdev.c
> @@ -98,9 +98,12 @@ DEFINE_STATIC_PER_THREAD_DATA(uint32_t, recirc_depth, 0)
>  
>  /* Configuration parameters. */
>  enum { MAX_FLOWS = 65536 }; /* Maximum number of flows in flow table. */
> -enum { MAX_METERS = 65536 };/* Maximum number of meters. */
> -enum { MAX_BANDS = 8 }; /* Maximum number of bands / meter. */
> -enum { N_METER_LOCKS = 64 };/* Maximum number of meters. */
> +
> +/* Maximum number of meters in the table. */
> +#define METER_ENTRY_MAX (20ULL)
> +/* Maximum number of bands / meter. */
> +#define METER_BAND_MAX  (8)
> +#define DP_METER_ARRAY_SIZE_MIN (1ULL << 10)
>  
>  COVERAGE_DEFINE(datapath_drop_meter);
>  COVERAGE_DEFINE(datapath_drop_upcall_error);
> @@ -283,12 +286,25 @@ struct dp_meter {
>  uint16_t flags;
>  uint16_t n_bands;
>  uint32_t max_delta_t;
> +uint32_t id;
> +struct ovs_mutex lock;
>  uint64_t used;
>  uint64_t packet_count;
>  uint64_t byte_count;
>  struct dp_meter_band bands[];
>  };
>  
> +struct dp_meter_instance {
> +uint32_t n_meters;
> +OVSRCU_TYPE(struct dp_meter *) dp_meters[];
usually we add a comments here, saying
/* Followed by
 *struct dp_meter[n];
 * where n is the n_meters.
 */

> +};
> +
> +struct dp_meter_table {
> +OVSRCU_TYPE(struct dp_meter_instance *) ti;
> +uint32_t count;
> +struct ovs_mutex lock;
> +};
> +
>  struct pmd_auto_lb {
>  bool auto_lb_requested; /* Auto load balancing requested by user. */
>  bool is_enabled;/* Current status of Auto load balancing. */
> @@ -329,8 +345,7 @@ struct dp_netdev {
>  atomic_uint32_t tx_flush_interval;
>  
>  /* Meters. */
> -struct ovs_mutex meter_locks[N_METER_LOCKS];

Why removing the multiple locks and using only one lock?
Do you see any performance overhead when switching to sinlge lock?

> -struct dp_meter *meters[MAX_METERS]; /* Meter bands. */
> +struct dp_meter_table meter_tbl;
>  
>  /* Probability of EMC insertions is a factor of 'emc_insert_min'.*/
>  OVS_ALIGNED_VAR(CACHE_LINE_SIZE) atomic_uint32_t emc_insert_min;
> @@ -378,19 +393,6 @@ struct dp_netdev {
>  struct pmd_auto_lb pmd_alb;
>  };
>  
> -static void meter_lock(const struct dp_netdev *dp, uint32_t meter_id)
> -OVS_ACQUIRES(dp->meter_locks[meter_id % N_METER_LOCKS])
> -{
> -ovs_mutex_lock(&dp->meter_locks[meter_id % N_METER_LOCKS]);
> -}
> -
> -static void meter_unlock(const struct dp_netdev *dp, uint32_t meter_id)
> -OVS_RELEASES(dp->meter_locks[meter_id % N_METER_LOCKS])
> -{
> -ovs_mutex_unlock(&dp->meter_locks[meter_id % N_METER_LOCKS]);
> -}
> -
> -
>  static struct dp_netdev_port *dp_netdev_lookup_port(const struct dp_netdev 
> *dp,
>  odp_port_t)
>  OVS_REQUIRES(dp->port_mutex);
> @@ -1523,6 +1525,9 @@ choose_port(struct dp_netdev *dp, const char *name)
>  return ODPP_NONE;
>  }
>  

Re: [ovs-dev] [PATCH OVS 2/4] dpif-netdev: Add burst size to buckets

2020-05-09 Thread William Tu
On Sat, May 09, 2020 at 09:54:10AM +0800, Tonghao Zhang wrote:
> On Sat, May 9, 2020 at 7:23 AM William Tu  wrote:
> >
> > On Thu, Apr 30, 2020 at 07:00:37PM +0800, xiangxia.m@gmail.com wrote:
> > > From: Tonghao Zhang 
> > >
> > > For now, the meter of the userspace datapath, don't include
> > > the bucket burst size to buckets. This patch includes it now.
> > >
> > > Cc: Ilya Maximets 
> > > Cc: William Tu 
> > > Cc: Jarno Rajahalme 
> > > Cc: Ben Pfaff 
> > > Cc: Andy Zhou 
> > > Signed-off-by: Tonghao Zhang 
> > > ---
> > >  lib/dpif-netdev.c | 7 +--
> > >  1 file changed, 1 insertion(+), 6 deletions(-)
> > >
> > > diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
> > > index 17c0241aa2e2..59546db6a2a2 100644
> > > --- a/lib/dpif-netdev.c
> > > +++ b/lib/dpif-netdev.c
> > > @@ -6092,15 +6092,10 @@ dpif_netdev_meter_set(struct dpif *dpif, 
> > > ofproto_meter_id meter_id,
> > >  for (i = 0; i < config->n_bands; ++i) {
> > >  uint32_t band_max_delta_t;
> > >
> > > -/* Set burst size to a workable value if none specified. */
> > > -if (config->bands[i].burst_size == 0) {
> > > -config->bands[i].burst_size = config->bands[i].rate;
> > > -}
> > > -
> > >  meter->bands[i].up = config->bands[i];
> > >  /* Convert burst size to the bucket units: */
> > >  /* pkts => 1/1000 packets, kilobits => bits. */
> > > -meter->bands[i].up.burst_size *= 1000;
> > > +meter->bands[i].up.burst_size += config->bands[i].rate * 1000ULL;
> >
> > I don't quite understand.
> > Isn't this remove the setting of burst_size and always use
> > 'config->bands[i].rate * 1000ULL;'?
> Hi William, thanks for you reviews,
> meter->bands[i].up.burst_size += config->bands[i].rate * 1000ULL;
> burst_size  will plus the config->bands[i].rate * 1000ULL  and then
> assigned to burst_size again.
> so if user don't set the burst_size, burst_size is 0, and only plus
> the config->bands[i].rate * 1000ULL.
> Before the patch, if user don't set the burst_sze, burst_size = 0, and
> will the rate *1000.
> Here, burst_size is different from kernel datapath. burst_size in
> netdev will be used as bucket. so buckets shoud be "burst_size" + rate
> 
> > Ex: When user set
> > ovs-ofctl -O OpenFlow13 add-meter br0 'meter=1 pktps burst stats 
> > bands=type=drop rate=1 burst_size=123
> > does 123 get set?
> burst_size(used for bucket size )should be (burst_size + rate) *1000
> my patch should be: because burst_size uint kilobits
> -/* Set burst size to a workable value if none specified. */
> -if (config->bands[i].burst_size == 0) {
> -config->bands[i].burst_size = config->bands[i].rate;
> -}
> -
>  meter->bands[i].up = config->bands[i];
>  /* Convert burst size to the bucket units: */
>  /* pkts => 1/1000 packets, kilobits => bits. */
> -meter->bands[i].up.burst_size *= 1000;
> +meter->bands[i].up.burst_size += config->bands[i].rate;
> +meter->bands[i].up.burst_size *= 1000ULL;


OK, thanks.
btw, why should we include bucket to burst_size?

William

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH OVS 3/4] dpif-netdev: Use the u32 instead of u64 for buckets

2020-05-09 Thread William Tu
On Sat, May 09, 2020 at 08:46:56AM +0800, Tonghao Zhang wrote:
> On Sat, May 9, 2020 at 7:12 AM William Tu  wrote:
> >
> > On Thu, Apr 30, 2020 at 07:00:38PM +0800, xiangxia.m@gmail.com wrote:
> > > From: Tonghao Zhang 
> > >
> > > When setting the meter rate to 4+Gbps, there is an overflow, the
> > > meters don't work as expected.
> > >
> > > Cc: Ilya Maximets 
> > > Cc: William Tu 
> > > Cc: Jarno Rajahalme 
> > > Cc: Ben Pfaff 
> > > Cc: Andy Zhou 
> > > Signed-off-by: Tonghao Zhang 
> > > ---
> > >  include/openvswitch/ofp-meter.h | 2 +-
> > >  lib/dpif-netdev.c   | 4 ++--
> > >  lib/ofp-meter.c | 4 ++--
> > >  3 files changed, 5 insertions(+), 5 deletions(-)
> > >
> > > diff --git a/include/openvswitch/ofp-meter.h 
> > > b/include/openvswitch/ofp-meter.h
> > > index 6776eae87e26..f55f89ac1a71 100644
> > > --- a/include/openvswitch/ofp-meter.h
> > > +++ b/include/openvswitch/ofp-meter.h
> > > @@ -37,7 +37,7 @@ struct ofputil_meter_band {
> > >  uint16_t type;
> > >  uint8_t prec_level; /* Non-zero if type == 
> > > OFPMBT_DSCP_REMARK. */
> > >  uint32_t rate;
> > > -uint32_t burst_size;
> > > +uint64_t burst_size;
> > >  };
> > >
> > >  void ofputil_format_meter_band(struct ds *, enum ofp13_meter_flags,
> > > diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
> > > index 59546db6a2a2..104347d8b251 100644
> > > --- a/lib/dpif-netdev.c
> > > +++ b/lib/dpif-netdev.c
> > > @@ -277,7 +277,7 @@ static bool dpcls_lookup(struct dpcls *cls,
> > >
> > >  struct dp_meter_band {
> > >  struct ofputil_meter_band up; /* type, prec_level, pad, rate, 
> > > burst_size */
> > > -uint32_t bucket; /* In 1/1000 packets (for PKTPS), or in bits (for 
> > > KBPS) */
> > > +uint64_t bucket; /* In 1/1000 packets (for PKTPS), or in bits (for 
> > > KBPS) */
> >
> > why setting to 4Gbps will overflow?
> > Each unit in bucket is 1kbpf, so 4Gbps is around 4M, around 2^20.
> Hi William, thanks for your review,
> If we set the rate to 430kbps == 4.3Gbps
> ovs-ofctl -O OpenFlow13 add-meter br-int "meter=104 kbps stats
> bands=type=drop rate=430"
> 
> In the dpif_netdev_meter_set function:
> meter->bands[i].up.burst_size *= 1000;
> burst_size should be 43 but the max u32 is 4294967296
> 
Now I understand, thanks.

Acked-by: William Tu 

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH OVS 2/4] dpif-netdev: Add burst size to buckets

2020-05-08 Thread William Tu
On Thu, Apr 30, 2020 at 07:00:37PM +0800, xiangxia.m@gmail.com wrote:
> From: Tonghao Zhang 
> 
> For now, the meter of the userspace datapath, don't include
> the bucket burst size to buckets. This patch includes it now.
> 
> Cc: Ilya Maximets 
> Cc: William Tu 
> Cc: Jarno Rajahalme 
> Cc: Ben Pfaff 
> Cc: Andy Zhou 
> Signed-off-by: Tonghao Zhang 
> ---
>  lib/dpif-netdev.c | 7 +--
>  1 file changed, 1 insertion(+), 6 deletions(-)
> 
> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
> index 17c0241aa2e2..59546db6a2a2 100644
> --- a/lib/dpif-netdev.c
> +++ b/lib/dpif-netdev.c
> @@ -6092,15 +6092,10 @@ dpif_netdev_meter_set(struct dpif *dpif, 
> ofproto_meter_id meter_id,
>  for (i = 0; i < config->n_bands; ++i) {
>  uint32_t band_max_delta_t;
>  
> -/* Set burst size to a workable value if none specified. */
> -if (config->bands[i].burst_size == 0) {
> -config->bands[i].burst_size = config->bands[i].rate;
> -}
> -
>  meter->bands[i].up = config->bands[i];
>  /* Convert burst size to the bucket units: */
>  /* pkts => 1/1000 packets, kilobits => bits. */
> -meter->bands[i].up.burst_size *= 1000;
> +meter->bands[i].up.burst_size += config->bands[i].rate * 1000ULL;

I don't quite understand.
Isn't this remove the setting of burst_size and always use
'config->bands[i].rate * 1000ULL;'?

Ex: When user set 
ovs-ofctl -O OpenFlow13 add-meter br0 'meter=1 pktps burst stats 
bands=type=drop rate=1 burst_size=123
does 123 get set?
William
>  /* Initialize bucket to empty. */
>  meter->bands[i].bucket = 0;
>  
> -- 
> 1.8.3.1
> 
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH OVS 3/4] dpif-netdev: Use the u32 instead of u64 for buckets

2020-05-08 Thread William Tu
On Thu, Apr 30, 2020 at 07:00:38PM +0800, xiangxia.m@gmail.com wrote:
> From: Tonghao Zhang 
> 
> When setting the meter rate to 4+Gbps, there is an overflow, the
> meters don't work as expected.
> 
> Cc: Ilya Maximets 
> Cc: William Tu 
> Cc: Jarno Rajahalme 
> Cc: Ben Pfaff 
> Cc: Andy Zhou 
> Signed-off-by: Tonghao Zhang 
> ---
>  include/openvswitch/ofp-meter.h | 2 +-
>  lib/dpif-netdev.c   | 4 ++--
>  lib/ofp-meter.c | 4 ++--
>  3 files changed, 5 insertions(+), 5 deletions(-)
> 
> diff --git a/include/openvswitch/ofp-meter.h b/include/openvswitch/ofp-meter.h
> index 6776eae87e26..f55f89ac1a71 100644
> --- a/include/openvswitch/ofp-meter.h
> +++ b/include/openvswitch/ofp-meter.h
> @@ -37,7 +37,7 @@ struct ofputil_meter_band {
>  uint16_t type;
>  uint8_t prec_level; /* Non-zero if type == OFPMBT_DSCP_REMARK. */
>  uint32_t rate;
> -uint32_t burst_size;
> +uint64_t burst_size;
>  };
>  
>  void ofputil_format_meter_band(struct ds *, enum ofp13_meter_flags,
> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
> index 59546db6a2a2..104347d8b251 100644
> --- a/lib/dpif-netdev.c
> +++ b/lib/dpif-netdev.c
> @@ -277,7 +277,7 @@ static bool dpcls_lookup(struct dpcls *cls,
>  
>  struct dp_meter_band {
>  struct ofputil_meter_band up; /* type, prec_level, pad, rate, burst_size 
> */
> -uint32_t bucket; /* In 1/1000 packets (for PKTPS), or in bits (for KBPS) 
> */
> +uint64_t bucket; /* In 1/1000 packets (for PKTPS), or in bits (for KBPS) 
> */

why setting to 4Gbps will overflow?
Each unit in bucket is 1kbpf, so 4Gbps is around 4M, around 2^20.

William

>  uint64_t packet_count;
>  uint64_t byte_count;
>  };
> @@ -5970,7 +5970,7 @@ dp_netdev_run_meter(struct dp_netdev *dp, struct 
> dp_packet_batch *packets_,
>  band = &meter->bands[m];
>  
>  /* Update band's bucket. */
> -band->bucket += delta_t * band->up.rate;
> +band->bucket += (uint64_t)delta_t * band->up.rate;
>  band->bucket += delta_in_us * band->up.rate / 1000;
>  if (band->bucket > band->up.burst_size) {
>  band->bucket = band->up.burst_size;
> diff --git a/lib/ofp-meter.c b/lib/ofp-meter.c
> index 9ea40a0bfb63..1ac993bb028b 100644
> --- a/lib/ofp-meter.c
> +++ b/lib/ofp-meter.c
> @@ -72,7 +72,7 @@ ofputil_format_meter_band(struct ds *s, enum 
> ofp13_meter_flags flags,
>  ds_put_format(s, " rate=%"PRIu32, mb->rate);
>  
>  if (flags & OFPMF13_BURST) {
> -ds_put_format(s, " burst_size=%"PRIu32, mb->burst_size);
> +ds_put_format(s, " burst_size=%"PRIu64, mb->burst_size);
>  }
>  if (mb->type == OFPMBT13_DSCP_REMARK) {
>  ds_put_format(s, " prec_level=%"PRIu8, mb->prec_level);
> @@ -703,7 +703,7 @@ parse_ofp_meter_mod_str__(struct ofputil_meter_mod *mm, 
> char *string,
>  return error;
>  }
>  } else if (!strcmp(name, "burst_size")) {
> -char *error = str_to_u32(value, &band->burst_size);
> +char *error = str_to_u64(value, &band->burst_size);
>  if (error) {
>  return error;
>  }
> -- 
> 1.8.3.1
> 
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH OVS 4/4] revert: dpif-netdev: includes microsecond delta in meter bucket calculation

2020-05-08 Thread William Tu
On Fri, May 8, 2020 at 1:11 AM 姜立东  wrote:
>
> Oh, this is due to patch porting from 2.10.
> Between latest and 2.10, commit 42697ca77 is introduced to fix millisecond 
> token insertion rate as below.
>  /* All packets will hit the meter at the same time. */
> -long_delta_t = (now - meter->used) / 1000; /* msec */
> +long_delta_t = now / 1000 - meter->used / 1000; /* msec */
>
> While my patch is ported from 2.10 base line, so additional token is counted 
> incorrectly.
> Commit 42697ca77 is good enough to fix the loss of token in delta computation.
> So we are looking into if higher token insertion rate in micro second is 
> needed in some cases.
>

Thanks, I forgot this is already fixed.
I will revert the patch if no other comments.

William
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] OVS Userspace: Usecase with multiple vxlan tunnels

2020-05-08 Thread William Tu
On Wed, May 6, 2020 at 1:41 PM Vasu Dasari  wrote:
>
> Thanks William. Does your statement mean, by network design, vxlan tunnel 
> ports and underlay bridge ports should not be part of a single bridge domain?
>
I saw your second email is using two bridges and usually that's how
people use it.
http://docs.openvswitch.org/en/latest/howto/userspace-tunneling/
Using single bridge also doesn't work for me, I'm still thinking about
the reason...
William
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [RFC v2 PATCH 4/4] bpf: Add reference XDP program implementation for netdev-offload-xdp

2020-05-08 Thread William Tu
On Thu, May 7, 2020 at 7:56 AM Toshiaki Makita
 wrote:
>
> On 2020/05/06 0:18, William Tu wrote:
> > On Tue, May 05, 2020 at 11:43:58AM +0900, Toshiaki Makita wrote:
> >> On 2020/05/04 1:22, William Tu wrote:
> >>> On Tue, Apr 21, 2020 at 11:47:04PM +0900, Toshiaki Makita wrote:
> >>>> This adds a reference program, flowtable_afxdp.o, which can be used to
> >>>> offload flows to XDP through netdev-offload-xdp.
> >>>> The program will be compiled when using --enable-bpf switch.
> >>>>
> >>>> Signed-off-by: Toshiaki Makita 
> >>>
> >>> Hi Toshiaki,
> >>>
> >>> Thanks for your patch, I haven't tried to run it but I have
> >>> questions about miniflow.
> >>>
> >>>> ---
> >> ...
> >>>> +SEC("xdp") int
> >>>> +flowtable_afxdp(struct xdp_md *ctx)
> >>>> +{
> >>>> +struct xdp_miniflow *pkt_mf;
> >>>> +struct xdp_subtable_mask *subtable_mask;
> >>>> +int *head;
> >>>> +struct xdp_flow_actions *xdp_actions = NULL;
> >>>> +struct nlattr *a;
> >>>> +unsigned int left;
> >>>> +int cnt, idx, zero = 0;
> >>>> +
> >>>> +account_debug(0);
> >>>> +
> >>>> +head = bpf_map_lookup_elem(&subtbl_masks_hd, &zero);
> >>>> +if (!head) {
> >>>> +return XDP_ABORTED;
> >>>> +}
> >>>> +if (*head == XDP_SUBTABLES_TAIL) {
> >>>> +/* Offload not enabled */
> >>>> +goto upcall;
> >>>> +}
> >>>> +
> >>>> +/* Get temporary storage for storing packet miniflow */
> >>>> +pkt_mf = bpf_map_lookup_elem(&pkt_mf_tbl, &zero);
> >>>> +if (!pkt_mf) {
> >>>> +return XDP_ABORTED;
> >>>> +}
> >>>> +
> >>>
> >>> I start to wonder what's the benefit of using miniflow in XDP?
> >>> miniflow tries to compress the large flow key into smaller memory,
> >>> and with a flowmap indicating which offset in bits are valid.
> >>> And when adding a new subtable at flow_put, the subtable's key
> >>> size can be smaller with only the needed fields.
> >>>
> >>> But in the case of XDP/eBPF, we have to statically allocated fixed
> >>> key size (the subtbl_template has key as struct xdp_flow_key), so
> >>> each subtable is always having key with full key fields. (not saving
> >>> memory space) Moreover with miniflow, it makes the 'mask_key' function
> >>> below pretty complicated.
> >>
> >> Fixed sized subtable is restriction of map-in-map.
> >> The benefit to use miniflow I envisioned initially was
> >> - compress the key
> >> - use existing function to convert flows for XDP
> >>
> >> The first one is actually not doable due to map-in-map. I hope someday the
> >> restriction gets loosen...
> >
> > On my second thought this might be a good idea.
> >
> > One problem I hit in my previous prototype is that the flow key is too big.
> > ex: struct sw_flow_key in ovs kernel is more than 500B, and moving it to bpf
> > some BPF limitation. If we continue adding more fields to struct xdp_flow,
> > ex: ipv6, vxlan tunnel, then with miniflow extract, the actually memory 
> > usage
> > in key is smaller. The key size in subtbl_template can be a value smaller 
> > the
> > the size of struct xdp_flow.
>
> Nice idea!
> So you mean we can have less sized subtbl_template key because when adding 
> more and
> more keys, we probably don't use *all* of keys at the same time?

Now I'm a little confused.
If you see 'struct flow' in ovs, all the fields have its own space
(ex: no union is used for
ipv4 and ipv6). So using miniflow extract saves space by only use
fields on the packet header.
But if you see 'struct sw_flow_key' in ovs kernel, same layer of
protocols are using union,
(ex: ipv4 and ipv6 are in a union). So no extra space is wasted.
So how we define the struct xdp_flow_key makes difference.

Or here is another idea.
We can also do on-demand parsing/key_extract here.
1) keep a list of key_fields = {empty}
2) when xdp_flow_put, we know which fields are needed to extract
ex: key_fields = {src_mac, dst_mac}
3) then at the xdp_miniflow_extract we can skip/return early after
the L2 fields

Re: [ovs-dev] [RFC v2 PATCH 3/4] netdev-offload: Add xdp flow api provider

2020-05-08 Thread William Tu
On Thu, May 7, 2020 at 7:40 AM Toshiaki Makita
 wrote:
>
> On 2020/05/06 0:37, William Tu wrote:
> > On Tue, Apr 21, 2020 at 11:47:03PM +0900, Toshiaki Makita wrote:...
> >> +/* Convert odp_port to devmap_idx in output action */
> >> +static int
> >> +convert_port_to_devmap_idx(struct nlattr *actions, size_t actions_len)
> >> +{
> >> +struct nlattr *a;
> >> +unsigned int left;
> >> +bool output_seen = false;
> >> +
> >> +NL_ATTR_FOR_EACH_UNSAFE(a, left, actions, actions_len) {
> >> +int type = nl_attr_type(a);
> >> +
> >> +if (output_seen) {
> >> +VLOG_DBG("XDP does not support packet copy");
> >> +return EOPNOTSUPP;
> >> +}
> >> +
> >> +if (type == OVS_ACTION_ATTR_OUTPUT) {
> >> +odp_port_t *port;
> >> +struct netdev_info *netdev_info;
> >> +
> >> +port = CONST_CAST(odp_port_t *,
> >> +  nl_attr_get_unspec(a, sizeof(odp_port_t)));
> >> +netdev_info = find_netdev_info(*port);
> >> +if (!netdev_info) {
> >> +VLOG_DBG("Cannot output to port %u without XDP prog 
> >> attached",
> >> + *port);
> >
> > Hi Toshiaki,
> >
> > Does this mean all of my port attached to ovs need to have XDP prog 
> > attached?
>
> Yes.
>
> > Why can't a port with xdp offload receive a packet and forward to a 
> > non-xdp-offload
> > port? Is this a limitation in kernel?
>
> Yes. In many cases XDP_REDIRECT does not work when redirect target device 
> does not
> have XDP prog attached. Attaching an XDP prog is just a workaround and that's 
> not
> sufficient, e.g. veth does not work even when doing that, but we don't have 
> any
> other (right) way to workaround it. Linux netdev community is working on this
> problem. Once we have a sane way to fix it, we no longer need to attach an 
> XDP prog.
>
I see, thank you!
William
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] netdev-afxdp on rhel8.0 rhel8.1 rhel8.2

2020-05-08 Thread William Tu
Hi Eelco,

Thanks for your suggestion. The xdpsock on rhel8.0 works ok, so I
guess there are some difference
in netdev-afxdp.c. We will debug it more and find a solution.

William

On Thu, May 7, 2020 at 12:26 AM Eelco Chaudron  wrote:
>
> Hi William,
>
> I have not tried running XDP on RHEL for a while. But for AF_XDP and
> OVS, we kept on using the latest XDP features, which might not all be
> present in the RHEL8 kernels.
> Also, note that AF_XDP is marked Technology Preview in the release
> notes.
>
> Take a look at the specific xdpsock version included in the kernel
> source, and see what OVS does differently that would cause the failure.
>
> //Eelco
>
>
> On 7 May 2020, at 2:27, William Tu wrote:
>
> > Hi,
> >
> > We're testing afxdp on rhel8.x. We simply try to run 'make
> > check-afxdp' and see if it works.
> > We are doing:
> > $ yum install libbpf-devel numactl-devel
> > $ cd ovs; ./boot.sh ; ./configure --enable-afxdp; make check-afxdp
> >
> > On rhel8.2, the basic ping works.
> > However on rhel8.0 and 8.1, all test cases failed due to creating xsk
> > socket failed.
> > ovs-vswitchd.log shows:
> > 2020-05-07T00:08:54.891Z|00052|netdev_afxdp|INFO|ovs-p0: Setting XDP
> > mode to best-effort.
> > 2020-05-07T00:08:55.099Z|00053|netdev_afxdp|ERR|xsk_socket__create
> > failed (Operation not supported) mode: generic, use-need-wakeup:
> > false, qid: 0
> >
> > Has anyone tried it before? Or any kernel difference?
> > rhel8.1: Linux rhel-8-1 4.18.0-147.5.1.el8_1.x86_64
> > rhel8.2: Linux instance-2 4.18.0-193.el8.x86_64
> >
> > Thanks
> > William
>
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] netdev-afxdp on rhel8.0 rhel8.1 rhel8.2

2020-05-06 Thread William Tu
Hi,

We're testing afxdp on rhel8.x. We simply try to run 'make
check-afxdp' and see if it works.
We are doing:
$ yum install libbpf-devel numactl-devel
$ cd ovs; ./boot.sh ; ./configure --enable-afxdp; make check-afxdp

On rhel8.2, the basic ping works.
However on rhel8.0 and 8.1, all test cases failed due to creating xsk
socket failed.
ovs-vswitchd.log shows:
2020-05-07T00:08:54.891Z|00052|netdev_afxdp|INFO|ovs-p0: Setting XDP
mode to best-effort.
2020-05-07T00:08:55.099Z|00053|netdev_afxdp|ERR|xsk_socket__create
failed (Operation not supported) mode: generic, use-need-wakeup:
false, qid: 0

Has anyone tried it before? Or any kernel difference?
rhel8.1: Linux rhel-8-1 4.18.0-147.5.1.el8_1.x86_64
rhel8.2: Linux instance-2 4.18.0-193.el8.x86_64

Thanks
William
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] OVS Userspace: Usecase with multiple vxlan tunnels

2020-05-06 Thread William Tu
On Sun, May 03, 2020 at 11:12:04PM -0400, Vasu Dasari wrote:
> Hi,
> 
> I am trying a userspace vxlan test case scenario as follows. ap0, fp0, ap1
> and fp1 are namespaces where in packets from ap0 are sent to fp0 over vxlan
> tunnel and packets from ap1 to fp1 are sent over second vxlan tunnel. And
> br0 for the bridge is handling ARP, route, etc. Note that br0 has two
> addresses on assigned to it.
> 
> On executing the script with following command:
> sudo make -s -C _build-gcc/ check-system-userspace TESTSUITEFLAGS='-k
> "modified ping over vxlan tunnel"'
> 
> I see that OVS times out and leaving behind br0 and ovs-netdev interfaces.
> But, if I comment out the one of the addresses on br0 at least the script
> will be successful but the ping on second tunnel will not be successful.
> 
> My question is:
> 1. Is this use case supported in userspace OVS mode?
> 2. Am I missing something in configuration?
> 
> Thanks
> -Vasu
> 
> 
>   +--+
>+---+  |  |  +---+
>|ap0+--+  +--+fp0|
>+---+  |  |  +---+
>   | ovs  |
>+---+  | br0  |  +---+
>|ap1+--+  +--+fp1|
>+---+  |  |  +---+
>   +--+
> 
> 
> 

We usually have an underlay bridage and overlay bridge.
And the vxlan device is attached to the overlay, and the
underlay bridge has the outer ip assigned.
William
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH] rhel: Support RHEL8.0 build and packaging

2020-05-06 Thread William Tu
On Mon, Apr 27, 2020 at 01:48:42PM -0700, Yifeng Sun wrote:
> This patch provides essential fixes for OVS to build and package on RHEL8.0.
> 
> The required package python3-sphinx can be installed by:
> $ ARCH=$( /bin/arch )
> $ subscription-manager repos --enable 
> "codeready-builder-for-rhel-8-${ARCH}-rpms"
> $ yum install python3-sphinx
> 
> Signed-off-by: Yifeng Sun 
> ---
>  rhel/openvswitch-fedora.spec.in   | 10 --
>  rhel/usr_share_openvswitch_scripts_ovs-kmod-manage.sh |  5 +
>  2 files changed, 13 insertions(+), 2 deletions(-)
> 
> diff --git a/rhel/openvswitch-fedora.spec.in b/rhel/openvswitch-fedora.spec.in
> index 7bc8c34b80af..02504f05f9b7 100644
> --- a/rhel/openvswitch-fedora.spec.in
> +++ b/rhel/openvswitch-fedora.spec.in
> @@ -60,9 +60,15 @@ BuildRequires: autoconf automake libtool
>  BuildRequires: systemd-units openssl openssl-devel
>  BuildRequires: python3-devel
>  BuildRequires: desktop-file-utils
> -BuildRequires: groff graphviz
> -BuildRequires: checkpolicy, selinux-policy-devel
> +%if 0%{?rhel} >= 8
> +BuildRequires: groff-base
> +BuildRequires: python3-sphinx
> +%else
> +BuildRequires: groff
>  BuildRequires: /usr/bin/sphinx-build-3
> +%endif
> +BuildRequires: graphviz
> +BuildRequires: checkpolicy, selinux-policy-devel
>  # make check dependencies
>  BuildRequires: procps-ng
>  %if %{with libcapng}
> diff --git a/rhel/usr_share_openvswitch_scripts_ovs-kmod-manage.sh 
> b/rhel/usr_share_openvswitch_scripts_ovs-kmod-manage.sh
> index a9b5cdd817da..43dcc73fd3c5 100644
> --- a/rhel/usr_share_openvswitch_scripts_ovs-kmod-manage.sh
> +++ b/rhel/usr_share_openvswitch_scripts_ovs-kmod-manage.sh
> @@ -75,6 +75,11 @@ IFS='.\|-' read mainline_major mainline_minor 
> mainline_patch major_rev \
>  # echo mainline_major=$mainline_major mainline_minor=$mainline_minor \
>  # mainline_patch=$mainline_patch major_rev=$major_rev minor_rev=$minor_rev
>  
> +if [ "$mainline_major" = "4" ]; then
> +# Skip this script on rhel8
> +exit 0
> +fi
> +

This makes the check for 4.4 (SLES12 SP3) and 4.12 (SLES12 SP4)
below unnecessary. Do you want to consider these two cases?


William
>  expected_rhel_base_minor="el7"
>  if [ "$mainline_major" = "3" ] && [ "$mainline_minor" = "10" ]; then
>  if [ "$major_rev" = "327" ]; then
> -- 
> 2.7.4
> 
> ___
> dev mailing list
> d...@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [RFC v2 PATCH 3/4] netdev-offload: Add xdp flow api provider

2020-05-05 Thread William Tu
On Tue, Apr 21, 2020 at 11:47:03PM +0900, Toshiaki Makita wrote:
> This provider offloads classifier to software XDP.
> 
> It works only when a custom XDP object is loaded by afxdp netdev.
> The BPF program needs to support classifier with array-of-maps for
> subtable hashmaps and arraymap for subtable masks. The flow api
> provider detects classifier support in the custom XDP program when
> loading it.
> (More explanation is TBD)
> 
> In the future it may be possible to offload classifier to SmartNIC
> through XDP, but it will require map-in-map support in NIC HW or
> TCAM-like map support in BPF and HW which currently does not exist.
> 
> Signed-off-by: Toshiaki Makita 
> ---
>  lib/automake.mk   |6 +-
>  lib/bpf-util.c|   38 ++
>  lib/bpf-util.h|   22 +
>  lib/netdev-afxdp.c|  191 +-
>  lib/netdev-afxdp.h|3 +
>  lib/netdev-linux-private.h|2 +
>  lib/netdev-offload-provider.h |6 +
>  lib/netdev-offload-xdp.c  | 1143 +
>  lib/netdev-offload-xdp.h  |   49 ++
>  lib/netdev-offload.c  |3 +
>  10 files changed, 1461 insertions(+), 2 deletions(-)
>  create mode 100644 lib/bpf-util.c
>  create mode 100644 lib/bpf-util.h
>  create mode 100644 lib/netdev-offload-xdp.c
>  create mode 100644 lib/netdev-offload-xdp.h
> 
> diff --git a/lib/automake.mk b/lib/automake.mk
> index 95925b57c..c088ed142 100644
> --- a/lib/automake.mk
> +++ b/lib/automake.mk
> @@ -419,10 +419,14 @@ endif
>  
>  if HAVE_AF_XDP
>  lib_libopenvswitch_la_SOURCES += \
> + lib/bpf-util.c \
> + lib/bpf-util.h \
>   lib/netdev-afxdp-pool.c \
>   lib/netdev-afxdp-pool.h \
>   lib/netdev-afxdp.c \
> - lib/netdev-afxdp.h
> + lib/netdev-afxdp.h \
> + lib/netdev-offload-xdp.c \
> + lib/netdev-offload-xdp.h
>  endif
>  
>  if DPDK_NETDEV
> diff --git a/lib/bpf-util.c b/lib/bpf-util.c
> new file mode 100644
> index 0..324cfbe1d
> --- /dev/null
> +++ b/lib/bpf-util.c
> @@ -0,0 +1,38 @@
> +/*
> + * Copyright (c) 2020 NTT Corp.
> + *
> + * Licensed under the Apache License, Version 2.0 (the "License");
> + * you may not use this file except in compliance with the License.
> + * You may obtain a copy of the License at:
> + *
> + * http://www.apache.org/licenses/LICENSE-2.0
> + *
> + * Unless required by applicable law or agreed to in writing, software
> + * distributed under the License is distributed on an "AS IS" BASIS,
> + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
> + * See the License for the specific language governing permissions and
> + * limitations under the License.
> + */
> +
> +#include 
> +
> +#include "bpf-util.h"
> +
> +#include 
> +
> +#include "ovs-thread.h"
> +
> +DEFINE_STATIC_PER_THREAD_DATA(struct { char s[128]; },
> +  libbpf_strerror_buffer,
> +  { "" });
> +
> +const char *
> +ovs_libbpf_strerror(int err)
> +{
> +enum { BUFSIZE = sizeof libbpf_strerror_buffer_get()->s };
> +char *buf = libbpf_strerror_buffer_get()->s;
> +
> +libbpf_strerror(err, buf, BUFSIZE);
> +
> +return buf;
> +}
> diff --git a/lib/bpf-util.h b/lib/bpf-util.h
> new file mode 100644
> index 0..6346935b3
> --- /dev/null
> +++ b/lib/bpf-util.h
> @@ -0,0 +1,22 @@
> +/*
> + * Copyright (c) 2020 NTT Corp.
> + *
> + * Licensed under the Apache License, Version 2.0 (the "License");
> + * you may not use this file except in compliance with the License.
> + * You may obtain a copy of the License at:
> + *
> + * http://www.apache.org/licenses/LICENSE-2.0
> + *
> + * Unless required by applicable law or agreed to in writing, software
> + * distributed under the License is distributed on an "AS IS" BASIS,
> + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
> + * See the License for the specific language governing permissions and
> + * limitations under the License.
> + */
> +
> +#ifndef BPF_UTIL_H
> +#define BPF_UTIL_H 1
> +
> +const char *ovs_libbpf_strerror(int err);
> +
> +#endif /* bpf-util.h */
> diff --git a/lib/netdev-afxdp.c b/lib/netdev-afxdp.c
> index a526c8339..aa2b0c32c 100644
> --- a/lib/netdev-afxdp.c
> +++ b/lib/netdev-afxdp.c
> @@ -37,10 +37,13 @@
>  #include 
>  #include 
>  
> +#include "bpf-util.h"
>  #include "coverage.h"
>  #include "dp-packet.h"
>  #include "dpif-netdev.h"
>  #include "fatal-signal.h"
> +#include "netdev-offload-provider.h"
> +#include "netdev-offload-xdp.h"
>  #include "openvswitch/compiler.h"
>  #include "openvswitch/dynamic-string.h"
>  #include "openvswitch/list.h"
> @@ -261,10 +264,190 @@ netdev_afxdp_sweep_unused_pools(void *aux OVS_UNUSED)
>  ovs_mutex_unlock(&unused_pools_mutex);
>  }
>  
> +bool
> +has_xdp_flowtable(struct netdev *netdev)
> +{
> +struct netdev_linux *dev = netdev_linux_cast(netdev);
> +
> +return dev->has_xdp_flowtable;
> +}
> +
> +struct bpf_object *
> +get_xdp_object(str

Re: [ovs-dev] [RFC v2 PATCH 4/4] bpf: Add reference XDP program implementation for netdev-offload-xdp

2020-05-05 Thread William Tu
On Tue, May 05, 2020 at 11:43:58AM +0900, Toshiaki Makita wrote:
> On 2020/05/04 1:22, William Tu wrote:
> >On Tue, Apr 21, 2020 at 11:47:04PM +0900, Toshiaki Makita wrote:
> >>This adds a reference program, flowtable_afxdp.o, which can be used to
> >>offload flows to XDP through netdev-offload-xdp.
> >>The program will be compiled when using --enable-bpf switch.
> >>
> >>Signed-off-by: Toshiaki Makita 
> >
> >Hi Toshiaki,
> >
> >Thanks for your patch, I haven't tried to run it but I have
> >questions about miniflow.
> >
> >>---
> ...
> >>+SEC("xdp") int
> >>+flowtable_afxdp(struct xdp_md *ctx)
> >>+{
> >>+struct xdp_miniflow *pkt_mf;
> >>+struct xdp_subtable_mask *subtable_mask;
> >>+int *head;
> >>+struct xdp_flow_actions *xdp_actions = NULL;
> >>+struct nlattr *a;
> >>+unsigned int left;
> >>+int cnt, idx, zero = 0;
> >>+
> >>+account_debug(0);
> >>+
> >>+head = bpf_map_lookup_elem(&subtbl_masks_hd, &zero);
> >>+if (!head) {
> >>+return XDP_ABORTED;
> >>+}
> >>+if (*head == XDP_SUBTABLES_TAIL) {
> >>+/* Offload not enabled */
> >>+goto upcall;
> >>+}
> >>+
> >>+/* Get temporary storage for storing packet miniflow */
> >>+pkt_mf = bpf_map_lookup_elem(&pkt_mf_tbl, &zero);
> >>+if (!pkt_mf) {
> >>+return XDP_ABORTED;
> >>+}
> >>+
> >
> >I start to wonder what's the benefit of using miniflow in XDP?
> >miniflow tries to compress the large flow key into smaller memory,
> >and with a flowmap indicating which offset in bits are valid.
> >And when adding a new subtable at flow_put, the subtable's key
> >size can be smaller with only the needed fields.
> >
> >But in the case of XDP/eBPF, we have to statically allocated fixed
> >key size (the subtbl_template has key as struct xdp_flow_key), so
> >each subtable is always having key with full key fields. (not saving
> >memory space) Moreover with miniflow, it makes the 'mask_key' function
> >below pretty complicated.
> 
> Fixed sized subtable is restriction of map-in-map.
> The benefit to use miniflow I envisioned initially was
> - compress the key
> - use existing function to convert flows for XDP
> 
> The first one is actually not doable due to map-in-map. I hope someday the
> restriction gets loosen...

On my second thought this might be a good idea.

One problem I hit in my previous prototype is that the flow key is too big.
ex: struct sw_flow_key in ovs kernel is more than 500B, and moving it to bpf
some BPF limitation. If we continue adding more fields to struct xdp_flow,
ex: ipv6, vxlan tunnel, then with miniflow extract, the actually memory usage
in key is smaller. The key size in subtbl_template can be a value smaller the
the size of struct xdp_flow.

But when adding more fields, I'm not sure whether we will hit some limitation
of BPF at xdp_miniflow_extract.
So miniflow can save key size but complicate the key extraction.
Without miniflow, key size is large but key extract and match are simpler.
What do you think?

> 
> The second one is doable and it's good for userspace offload driver.
> But I ended up with rewriting most helper functions of miniflow for BPF due
> to some restriction.
> 
> >What if we don't compress the flow key?
> >unlike to OVS userspace, more like the kernel's implementation of OVS
> >megaflow (ex: masked_flow_lookup in net/openvswitch/flow_table.c)
> 
> Without miniflow the BPF program will be simple like the original xdp_flow
> patch for Linux kernel.
> I can try it.
> 
> >Where it has a list of masks (like your 'subtbl_masks' map) but
> >only use 1 hash table (so we don't need map-in-map).
> >We lookup the hash table multiple times, each
> >time applied different masks in the mask list until finding a match.
> 
> For HW offload one big hash table may help. Fow software it will be slower
> as key size and entries are larger.
> Maybe we can support both types, one big hash table and multiple subtables,
> in the future?
> 

For one big hash table, do you mean adding another cache before megaflow
like EMC (exact match cache)?

William
> As I don't have BPF offloadable NICs I prefer to keep current multiple
> subtables for now.
> 
> Toshiaki Makita
> 
> >
> >However, I'm not sure whether this will hit some limitations of verifier.
> >
> >Please correct me if I understand wrong here.
> >Thanks
> >William
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH] raft: Fix leak of the incomplete command.

2020-05-05 Thread William Tu
On Tue, May 5, 2020 at 1:48 AM Ilya Maximets  wrote:
>
> On 5/5/20 3:08 AM, William Tu wrote:
> > On Mon, May 04, 2020 at 04:55:24PM -0700, Han Zhou wrote:
> >> On Mon, May 4, 2020 at 4:10 PM William Tu  wrote:
> >>>
> >>> On Mon, May 04, 2020 at 09:55:41PM +0200, Ilya Maximets wrote:
> >>>> Function raft_command_initiate() returns correctly referenced command
> >>>> instance.  'n_ref' equals 1 for complete commands and 2 for incomplete
> >>>> commands because one more reference is in raft->commands list.
> >>>> raft_handle_execute_command_request__() leaks the reference by not
> >>>> returning pointer anywhere and not unreferencing incomplete commands.
> >>>>
> >>>>  792 bytes in 11 blocks are definitely lost in loss record 258 of 262
> >>>> at 0x483BB1A: calloc (vg_replace_malloc.c:762)
> >>>> by 0x44BA32: xcalloc (util.c:121)
> >>>> by 0x422E5F: raft_command_create_incomplete (raft.c:2038)
> >>>> by 0x422E5F: raft_command_initiate (raft.c:2061)
> >>>> by 0x428651: raft_handle_execute_command_request__ (raft.c:4161)
> >>>> by 0x428651: raft_handle_execute_command_request (raft.c:4177)
> >>>> by 0x428651: raft_handle_rpc (raft.c:4230)
> >>>> by 0x428651: raft_conn_run (raft.c:1445)
> >>>> by 0x428DEA: raft_run (raft.c:1803)
> >>>> by 0x407392: main_loop (ovsdb-server.c:226)
> >>>> by 0x407392: main (ovsdb-server.c:469)
> >>>>
> >>>> Fixes: 1b1d2e6daa56 ("ovsdb: Introduce experimental support for
> >> clustered databases.")
> >>>> Signed-off-by: Ilya Maximets 
> >>>
> >>> Looks good to me, Coverity also detects this leak.
> >>> Acked-by: William Tu 
> >>>
> >>
> >> Thanks for the fix.
> >> Acked-by: Han Zhou 
> >
> > Thanks.
> > Applied to master and 2.13.
>
> Thanks.  Could you, please, backport down to 2.9?

done.
William
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [RFC v2 PATCH 0/4] XDP offload using flow API provider

2020-05-05 Thread William Tu
On Tue, May 05, 2020 at 11:50:38AM +0900, Toshiaki Makita wrote:
> On 2020/05/05 1:24, William Tu wrote:
> >On Tue, Apr 21, 2020 at 11:47:00PM +0900, Toshiaki Makita wrote:
> >>This patch adds an XDP-based flow cache using the OVS netdev-offload
> >>flow API provider.  When an OVS device with XDP offload enabled,
> >>packets first are processed in the XDP flow cache (with parse, and
> >>table lookup implemented in eBPF) and if hits, the action processing
> >>are also done in the context of XDP, which has the minimum overhead.
> >>
> >>This provider is based on top of William's recently posted patch for
> >>custom XDP load.  When a custom XDP is loaded, the provider detects if
> >>the program supports classifier, and if supported it starts offloading
> >>flows to the XDP program.
> >>
> >>The patches are derived from xdp_flow[1], which is a mechanism similar to
> >>this but implemented in kernel.
> >>
> >>
> >>* Motivation
> >>
> >>While userspace datapath using netdev-afxdp or netdev-dpdk shows good
> >>performance, there are use cases where packets better to be processed in
> >>kernel, for example, TCP/IP connections, or container to container
> >>connections.  Current solution is to use tap device or af_packet with
> >>extra kernel-to/from-userspace overhead.  But with XDP, a better solution
> >>is to steer packets earlier in the XDP program, and decides to send to
> >>userspace datapath or stay in kernel.
> >>
> >>One problem with current netdev-afxdp is that it forwards all packets to
> >>userspace, The first patch from William (netdev-afxdp: Enable loading XDP
> >>program.) only provides the interface to load XDP program, howerver users
> >>usually don't know how to write their own XDP program.
> >>
> >>XDP also supports HW-offload so it may be possible to offload flows to
> >>HW through this provider in the future, although not currently.
> >>The reason is that map-in-map is required for our program to support
> >>classifier with subtables in XDP, but map-in-map is not offloadable.
> >>If map-in-map becomes offloadable, HW-offload of our program will also
> >>be doable.
> >>
> >>
> >>* How to use
> >>
> >>1. Install clang/llvm >= 9, libbpf >= 0.0.4, and kernel >= 5.3.
> >>
> >>2. make with --enable-afxdp --enable-bpf
> >>--enable-bpf will generate XDP program "bpf/flowtable_afxdp.o".  Note that
> >>the BPF object will not be installed anywhere by "make install" at this 
> >>point.
> >
> >When configure, there is a missing include, causing error due to __u64
> >checking bpf/bpf_helpers.h usability... no
> >checking bpf/bpf_helpers.h presence... yes
> >configure: WARNING: bpf/bpf_helpers.h: present but cannot be compiled
> >configure: WARNING: bpf/bpf_helpers.h: check for missing prerequisite 
> >headers?
> >
> >configure:18876: gcc -c -g -O2  conftest.c >&5
> >In file included from /usr/local/include/bpf/bpf_helpers.h:5:0,
> >  from conftest.c:73:
> >/usr/local/include/bpf/bpf_helper_defs.h:55:82: error: unknown type name 
> >'__u64'
> >  static int (*bpf_map_update_elem)(void *map, const void *key, const void 
> > *value, __u64 flags) = (void *) 2;
> > 
> >   ^
> >/usr/local/include/bpf/bpf_helper_defs.h:79:41: error: unknown type name 
> >'__u32'
> >  static int (*bpf_probe_read)(void *dst, __u32 size, const void 
> > *unsafe_ptr) = (void *) 4;
> >
> >I applied this to fix it:
> >diff --git a/acinclude.m4 b/acinclude.m4
> >index 5eeab6feb9cc..39dfce565182 100644
> >--- a/acinclude.m4
> >+++ b/acinclude.m4
> >@@ -326,7 +326,8 @@ AC_DEFUN([OVS_CHECK_LINUX_BPF], [
> >[AC_MSG_ERROR([unable to find llc to compile BPF program])])
> >  AC_CHECK_HEADER([bpf/bpf_helpers.h], [],
> >-  [AC_MSG_ERROR([unable to find bpf/bpf_helpers.h to compile BPF 
> >program])])
> >+  [AC_MSG_ERROR([unable to find bpf/bpf_helpers.h to compile BPF 
> >program])],
> >+[#include ])
> >  AC_CHECK_HEADER([linux/bpf.h], [],
> >[AC_MSG_ERROR([unable to find linux/bpf.h to compile BPF program])])
> 
> Oh thanks, not sure why I didn't hit this problem.
> 

I think if you start with ./boot.sh, then this should show up.
William

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [RFC v2 PATCH 0/4] XDP offload using flow API provider

2020-05-05 Thread William Tu
On Tue, Apr 21, 2020 at 11:47:00PM +0900, Toshiaki Makita wrote:
> This patch adds an XDP-based flow cache using the OVS netdev-offload
> flow API provider.  When an OVS device with XDP offload enabled,
> packets first are processed in the XDP flow cache (with parse, and
> table lookup implemented in eBPF) and if hits, the action processing
> are also done in the context of XDP, which has the minimum overhead.
> 
> This provider is based on top of William's recently posted patch for
> custom XDP load.  When a custom XDP is loaded, the provider detects if
> the program supports classifier, and if supported it starts offloading
> flows to the XDP program.
> 
> The patches are derived from xdp_flow[1], which is a mechanism similar to
> this but implemented in kernel.
> 
> 
> * Motivation
> 
> While userspace datapath using netdev-afxdp or netdev-dpdk shows good
> performance, there are use cases where packets better to be processed in
> kernel, for example, TCP/IP connections, or container to container
> connections.  Current solution is to use tap device or af_packet with
> extra kernel-to/from-userspace overhead.  But with XDP, a better solution
> is to steer packets earlier in the XDP program, and decides to send to
> userspace datapath or stay in kernel.
> 
> One problem with current netdev-afxdp is that it forwards all packets to
> userspace, The first patch from William (netdev-afxdp: Enable loading XDP
> program.) only provides the interface to load XDP program, howerver users
> usually don't know how to write their own XDP program.
> 
> XDP also supports HW-offload so it may be possible to offload flows to
> HW through this provider in the future, although not currently.
> The reason is that map-in-map is required for our program to support
> classifier with subtables in XDP, but map-in-map is not offloadable.
> If map-in-map becomes offloadable, HW-offload of our program will also
> be doable.
> 
> 
> * How to use
> 
> 1. Install clang/llvm >= 9, libbpf >= 0.0.4, and kernel >= 5.3.
> 
> 2. make with --enable-afxdp --enable-bpf
> --enable-bpf will generate XDP program "bpf/flowtable_afxdp.o".  Note that
> the BPF object will not be installed anywhere by "make install" at this 
> point. 
> 
> 3. Load custom XDP program
> E.g.
> $ ovs-vsctl add-port ovsbr0 veth0 -- set int veth0 options:xdp-mode=native \
>   options:xdp-obj="path/to/ovs/bpf/flowtable_afxdp.o"
> $ ovs-vsctl add-port ovsbr0 veth1 -- set int veth1 options:xdp-mode=native \
>   options:xdp-obj="path/to/ovs/bpf/flowtable_afxdp.o"
> 
> 4. Enable XDP_REDIRECT
> If you use veth devices, make sure to load some (possibly dummy) programs
> on the peers of veth devices.

Hi Toshiaki,

What kind of dummy program to put at the other side of veth?
I'm trying to create a end-to-end test using veth, similar to
the ping test in tests/system-traffic.at

At the other side of veth, I use 
$/bpf-next/samples/bpf# ./xdp_rxq_info -d p0 -S -a XDP_PASS

but somehow around 90% of the icmp packets are dropped, I'm still
debugging the reason.

William

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH] raft: Fix leak of the incomplete command.

2020-05-04 Thread William Tu
On Mon, May 04, 2020 at 04:55:24PM -0700, Han Zhou wrote:
> On Mon, May 4, 2020 at 4:10 PM William Tu  wrote:
> >
> > On Mon, May 04, 2020 at 09:55:41PM +0200, Ilya Maximets wrote:
> > > Function raft_command_initiate() returns correctly referenced command
> > > instance.  'n_ref' equals 1 for complete commands and 2 for incomplete
> > > commands because one more reference is in raft->commands list.
> > > raft_handle_execute_command_request__() leaks the reference by not
> > > returning pointer anywhere and not unreferencing incomplete commands.
> > >
> > >  792 bytes in 11 blocks are definitely lost in loss record 258 of 262
> > > at 0x483BB1A: calloc (vg_replace_malloc.c:762)
> > > by 0x44BA32: xcalloc (util.c:121)
> > > by 0x422E5F: raft_command_create_incomplete (raft.c:2038)
> > > by 0x422E5F: raft_command_initiate (raft.c:2061)
> > > by 0x428651: raft_handle_execute_command_request__ (raft.c:4161)
> > > by 0x428651: raft_handle_execute_command_request (raft.c:4177)
> > > by 0x428651: raft_handle_rpc (raft.c:4230)
> > > by 0x428651: raft_conn_run (raft.c:1445)
> > > by 0x428DEA: raft_run (raft.c:1803)
> > > by 0x407392: main_loop (ovsdb-server.c:226)
> > > by 0x407392: main (ovsdb-server.c:469)
> > >
> > > Fixes: 1b1d2e6daa56 ("ovsdb: Introduce experimental support for
> clustered databases.")
> > > Signed-off-by: Ilya Maximets 
> >
> > Looks good to me, Coverity also detects this leak.
> > Acked-by: William Tu 
> >
> 
> Thanks for the fix.
> Acked-by: Han Zhou 

Thanks.
Applied to master and 2.13.
William
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH] netdev-afxdp: Fix missing init.

2020-05-04 Thread William Tu
On Mon, May 04, 2020 at 10:58:06AM -0700, Gregory Rose wrote:
> 
> On 5/4/2020 9:28 AM, William Tu wrote:
> >When introducing the interrupt mode for netdev-afxdp, the netdev
> >init function is accidentally removed.  Fix it by adding it back.
> >
> >Fixes: 5bfc519fee499 ("netdev-afxdp: Add interrupt mode netdev class.")
> >Signed-off-by: William Tu 
> >---
> >  lib/netdev-linux.c | 1 +
> >  1 file changed, 1 insertion(+)
> >
> >diff --git a/lib/netdev-linux.c b/lib/netdev-linux.c
> >index 40d0cc1105ea..b52071e92ec7 100644
> >--- a/lib/netdev-linux.c
> >+++ b/lib/netdev-linux.c
> >@@ -3588,6 +3588,7 @@ const struct netdev_class netdev_internal_class = {
> >  #ifdef HAVE_AF_XDP
> >  #define NETDEV_AFXDP_CLASS_COMMON   \
> >+.init = netdev_afxdp_init,  \
> >  .construct = netdev_afxdp_construct,\
> >  .destruct = netdev_afxdp_destruct,  \
> >  .get_stats = netdev_afxdp_get_stats,\
> >
> 
> Acked-by: Greg Rose 
Applied to master, thanks
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH] raft: Fix leak of the incomplete command.

2020-05-04 Thread William Tu
On Mon, May 04, 2020 at 09:55:41PM +0200, Ilya Maximets wrote:
> Function raft_command_initiate() returns correctly referenced command
> instance.  'n_ref' equals 1 for complete commands and 2 for incomplete
> commands because one more reference is in raft->commands list.
> raft_handle_execute_command_request__() leaks the reference by not
> returning pointer anywhere and not unreferencing incomplete commands.
> 
>  792 bytes in 11 blocks are definitely lost in loss record 258 of 262
> at 0x483BB1A: calloc (vg_replace_malloc.c:762)
> by 0x44BA32: xcalloc (util.c:121)
> by 0x422E5F: raft_command_create_incomplete (raft.c:2038)
> by 0x422E5F: raft_command_initiate (raft.c:2061)
> by 0x428651: raft_handle_execute_command_request__ (raft.c:4161)
> by 0x428651: raft_handle_execute_command_request (raft.c:4177)
> by 0x428651: raft_handle_rpc (raft.c:4230)
> by 0x428651: raft_conn_run (raft.c:1445)
> by 0x428DEA: raft_run (raft.c:1803)
> by 0x407392: main_loop (ovsdb-server.c:226)
> by 0x407392: main (ovsdb-server.c:469)
> 
> Fixes: 1b1d2e6daa56 ("ovsdb: Introduce experimental support for clustered 
> databases.")
> Signed-off-by: Ilya Maximets 

Looks good to me, Coverity also detects this leak.
Acked-by: William Tu 

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] tunneling: RFC: Handle fully specified VxLAN tunnel port

2020-05-04 Thread William Tu
On Fri, May 01, 2020 at 02:40:41PM -0400, Vasu Dasari wrote:
> Thanks William for your comments.
> 
> 
> *Vasu Dasari*
> 
> 
> On Fri, May 1, 2020 at 11:52 AM William Tu  wrote:
> 
> > On Thu, Apr 30, 2020 at 08:42:04AM -0400, Vasu Dasari wrote:
> > > Hi,
> > >
> > > I am trying to implement a functionality, where in if user specifies port
> > > through which a VxLAN encapsulated packet can be sent out, then use that
> > > port rather than going through routing procedure.
> > >
> > > ovs-vsctl add-port br0 at_vxlan_fp1 -- \
> > > set int at_vxlan_fp1 type=vxlan \
> > > options:remote_ip=172.32.2.1 options:local_ip=172.32.2.100 \
> > > options:dst_mac=00:00:00:00:01:02
> > options:src_mac=00:00:00:00:01:01
> > > \
> > > options:out_port=1
> > >
> > Why do you need to add dst_mac and src_mac?
> > Usually in the OVS kernel datapath case, OVS will consult the Linux
> > kernel's
> > arp table and get the src/dst mac address.
> > What if the manually set dst_mac here is different than the entry in
> > kernel?
> >
> >
> The goal is not to use Linux kernel's capabilities to determine which path
> the tunnel to take.
> 
> Imagine a set of switches in a fabric controlled by a controller and it is
> used primarily for L2 services and some L3 services. If I were to use Linux
> kernel to dictate which path to take, it would lead to using one bond or a
> physical interface all the time for a particular destination switch. By
> doing so, there is no fabric diversity. Hence, as controller has view of
> the network, it can dictate which path/topology to take for a particular L2
> or L3 service. By using virtualized the fabric, I do not have to worry
> about about Mac pollution and loops (I can detail this separately if
> needed). And hence wanted to use VxLAN.
> 
I see your use case, thanks!
William

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] tunneling: RFC: Handle fully specified VxLAN tunnel port

2020-05-04 Thread William Tu
On Fri, May 01, 2020 at 02:58:11PM -0400, Vasu Dasari wrote:
> William, My comments are inline. Thanks.
> 
> On Fri, May 1, 2020 at 12:01 PM William Tu  wrote:
> 
> > On Thu, Apr 30, 2020 at 08:52:38AM -0400, Vasu Dasari wrote:
> > > This email is with the technical difficulty I am having with supporting
> > > above feature.
> > >
> > > I have implemented all infrastructure necessary to support the
> > > CLI, netdev-vport, netdev-native-tnl, etc, and currently debugging my way
> > > through this.
> > >
> > > I am stuck in ofproto-dpif-xlate::native_tunnel_output(). What I see is
> > > that, although this function has all the parameters needed to create
> > > encapsulation header and know which odp_port to send it out of, it would
> > > still rely on "NORMAL" flow to send out the packet. And "NORMAL" flow
> > > relies on Mac learning table to figure out whether to flood or send it
> > out
> > > of a learned port.
> > >
> > > In this new case I am trying out, encap-dst-mac is not programmed in
> > > Mac-learning table(as the encap-dst-mac and out_port are explicitly
> > > specified and can be retrieved from netdev directly. And hence,
> > > xlate_normal() would flood the packet out of all ports and would never
> > > resolve dst-mac address as IP infrastructure on local machine is not
> > > configured for the source-ip address.
> > >
> > > My question is:
> > > 1. How can I accomplish sending out encapsulated frame without going
> > > through "NORMAL" processing?
> >
> > I don't think you need NORMAL flow.
> > You can always add OpenFlow rules to redirect packets to your tunnel port.
> >
> 
> I am using a flow of this sort to direct access side traffic to a vxlan
> tunnel port.
> 
> ovs-ofctl add-flow br0 priority=1,in_port=ovs-ap0,actions=at_vxlan_fp1
> 
> But, at_vxlan_fp1 is a virtual-port riding on a physical port "out_port".
> There is a lot of logic in native_tunnel_output() which I do not claim to
> understand completely but understanding that part with experimentation. For
> the packets to move from at_vxlan_fp to "out_port" leave the switch from
> "out_port" I think we need "NORMAL" flow.
> 
> To prove out the idea, I have added a new field called "out_ofp_port" to
> xlate_ctx data structure in ofproto-dpif-xlate.c, and this field would be
> used to carry the out_ofp_port as passed by the caller, regardless of what
> Mac-learning table says. And this solved the transmit problem.

I see. For xmit, the reason might be around
3652 err = tnl_neigh_lookup(out_dev->xbridge->name, &d_ip6, &dmac);
3653 if (err) {
3654 xlate_report(ctx, OFT_DETAIL,
3655  "neighbor cache miss for %s on bridge %s, "
3656  "sending %s request",
3657  buf_dip6, out_dev->xbridge->name, d_ip ? "ARP" : 
"ND");

Where neighbor lookup fails to find the remote_ip's MAC address.

> 
> Now I am have problem with receiving, that is, being able to classify the
> packet arriving on a physical port and map it to vxlan virtual port. Looks
> like this determination is done in classifier_rules. Currently trying to
> understand that logic. Any pointers to help me get there, it would be great.
> 

Can you aslo manually insert a flow for rx?
ex: in_port=, action=vxlan vport
William

> Thanks,
> -Vasu
> 
> > > 2. Any suggestions on how can I go about getting this done?
> > >
> > > Thanks
> > > -Vasu
> > >
> > > *Vasu Dasari*
> > >
> > >
> > > On Thu, Apr 30, 2020 at 8:42 AM Vasu Dasari  wrote:
> > >
> > > > Hi,
> > > >
> > > > I am trying to implement a functionality, where in if user specifies
> > port
> > > > through which a VxLAN encapsulated packet can be sent out, then use
> > that
> > > > port rather than going through routing procedure.
> > > >
> > > > ovs-vsctl add-port br0 at_vxlan_fp1 -- \
> > > > set int at_vxlan_fp1 type=vxlan \
> > > > options:remote_ip=172.32.2.1 options:local_ip=172.32.2.100 \
> > > > options:dst_mac=00:00:00:00:01:02
> > > > options:src_mac=00:00:00:00:01:01 \
> > > > options:out_port=1
> > > >
> > > > This would create a fully specified tunnel port, it includes all L2
> > and L3
> > > > parameters needed to create encapsulated frame. This 

[ovs-dev] [PATCH] netdev-afxdp: Fix missing init.

2020-05-04 Thread William Tu
When introducing the interrupt mode for netdev-afxdp, the netdev
init function is accidentally removed.  Fix it by adding it back.

Fixes: 5bfc519fee499 ("netdev-afxdp: Add interrupt mode netdev class.")
Signed-off-by: William Tu 
---
 lib/netdev-linux.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/lib/netdev-linux.c b/lib/netdev-linux.c
index 40d0cc1105ea..b52071e92ec7 100644
--- a/lib/netdev-linux.c
+++ b/lib/netdev-linux.c
@@ -3588,6 +3588,7 @@ const struct netdev_class netdev_internal_class = {
 
 #ifdef HAVE_AF_XDP
 #define NETDEV_AFXDP_CLASS_COMMON   \
+.init = netdev_afxdp_init,  \
 .construct = netdev_afxdp_construct,\
 .destruct = netdev_afxdp_destruct,  \
 .get_stats = netdev_afxdp_get_stats,\
-- 
2.7.4

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [RFC v2 PATCH 0/4] XDP offload using flow API provider

2020-05-04 Thread William Tu
ux.c
@@ -3588,6 +3588,7 @@ const struct netdev_class netdev_internal_class = {
 
 #ifdef HAVE_AF_XDP
 #define NETDEV_AFXDP_CLASS_COMMON   \
+.init = netdev_afxdp_init,  \
 .construct = netdev_afxdp_construct,\
 .destruct = netdev_afxdp_destruct,  \
 .get_stats = netdev_afxdp_get_stats,\


Other part works fine.
I'm planning to play with more rules and performance.

William

> You should be able to see some maps installed, including "debug_stats".
> $ bpftool map
> 
> If packets are successfully redirected by the XDP program,
> debug_stats[2] will be counted.
> $ bpftool map dump id 
> 
> Currently only very limited keys and output actions are supported.
> For example NORMAL action entry and IP based matching work with current
> key support.
> 
> 
> * Performance
> 
> Tested 2 cases. 1) i40e to veth, 2) i40e to i40e.
> Test 1 Measured drop rate at veth interface with redirect action from
> physical interface (i40e 25G NIC, XXV 710) to veth. The CPU is Xeon
> Silver 4114 (2.20 GHz).
>XDP_DROP
> +--+  +---++---+
>  pktgen -- wire --> | eth0 | -- NORMAL ACTION --> | veth0 || veth2 |
> +--+  +---++---+
> 
> Test 2 uses i40e instead of veth, and measured tx packet rate at output
> device.
> 
> Single-flow performance test results:
> 
> 1) i40e-veth
> 
>   a) no-zerocopy in i40e
> 
> - xdp   3.7 Mpps
> - afxdp 820 kpps
> 
>   b) zerocopy in i40e (veth does not have zc)
> 
> - xdp   1.8 Mpps
> - afxdp 800 Kpps
> 
> 2) i40e-i40e
> 
>   a) no-zerocopy
> 
> - xdp   3.0 Mpps
> - afxdp 1.1 Mpps
> 
>   b) zerocopy
> 
> - xdp   1.7 Mpps
> - afxdp 4.0 Mpps
> 
> ** xdp is better when zc is disabled. The reason of poor performance on zc
>is that xdp_frame requires packet memory allocation and memcpy on
>XDP_REDIRECT to other devices iff zc is enabled.
> 
> ** afxdp with zc is better than xdp without zc, but afxdp is using 2 cores
>in this case, one is pmd and the other is softirq. When pmd and softirq
>were running on the same core, the performance was extremely poor as
>pmd consumes cpus.
>When offloading to xdp, xdp only uses softirq while pmd is still
>consuming 100% cpu.  This means we need probably only one pmd for xdp
>even when we want to use more cores for multi-flow.
>I'll also test afxdp-nonpmd when it is applied.
> 
> 
> This patch set is based on top of commit 82b7e6d19 ("compat: Fix broken
> partial backport of extack op parameter").
> 
> [1] https://lwn.net/Articles/802653/
> 
> v2:
> - Add uninit callback of netdev-offload-xdp.
> - Introduce "offload-driver" other_config to specify offload driver.
> - Add --enable-bpf (HAVE_BPF) config option to build bpf programs.
> - Workaround incorrect UINTPTR_MAX in x64 clang bpf build.
> - Fix boot.sh autoconf warning.
> 
> TODO:
> - CI fails due to missing function "bpf_program__get_type" which is not
>   provided by libbpf from linux 5.3. Although we can use linux >= 5.5 to
>   fix it, maybe it's time to switch to using libbpf standalone repository?
> - Fix a crash bug in patch 1 which has been reported by Eelco Chaudron.
> - Add test for XDP offload driver.
> - Add documentation.
> - Implement more actions like vlan push/pop.
> 
> Toshiaki Makita (3):
>   netdev-offload: Add "offload-driver" other_config to specify offload
> driver
>   netdev-offload: Add xdp flow api provider
>   bpf: Add reference XDP program implementation for netdev-offload-xdp
> 
> William Tu (1):
>   netdev-afxdp: Enable loading XDP program.
> 
>  Documentation/intro/install/afxdp.rst |   59 ++
>  Makefile.am   |9 +-
>  NEWS  |2 +
>  acinclude.m4  |   56 ++
>  bpf/.gitignore|4 +
>  bpf/Makefile.am   |   59 ++
>  bpf/bpf_miniflow.h|  179 
>  bpf/bpf_netlink.h |   34 +
>  bpf/bpf_workaround.h  |   28 +
>  bpf/flowtable_afxdp.c |  515 +++
>  configure.ac  |2 +
>  lib/automake.mk   |6 +-
>  lib/bpf-util.c|   38 +
>  lib/bpf-util.h|   22 +
>  lib/netdev-afxdp.c|  342 +++-
>  lib/netdev-

Re: [ovs-dev] [RFC v2 PATCH 4/4] bpf: Add reference XDP program implementation for netdev-offload-xdp

2020-05-03 Thread William Tu
On Tue, Apr 21, 2020 at 11:47:04PM +0900, Toshiaki Makita wrote:
> This adds a reference program, flowtable_afxdp.o, which can be used to
> offload flows to XDP through netdev-offload-xdp.
> The program will be compiled when using --enable-bpf switch.
> 
> Signed-off-by: Toshiaki Makita 

Hi Toshiaki,

Thanks for your patch, I haven't tried to run it but I have
questions about miniflow.

> ---
>  Makefile.am   |   9 +-
>  acinclude.m4  |  56 +
>  bpf/.gitignore|   4 +
>  bpf/Makefile.am   |  59 +
>  bpf/bpf_miniflow.h| 179 +++
>  bpf/bpf_netlink.h |  34 +++
>  bpf/bpf_workaround.h  |  28 +++
>  bpf/flowtable_afxdp.c | 515 ++
>  configure.ac  |   2 +
>  9 files changed, 884 insertions(+), 2 deletions(-)
>  create mode 100644 bpf/.gitignore
>  create mode 100644 bpf/Makefile.am
>  create mode 100644 bpf/bpf_miniflow.h
>  create mode 100644 bpf/bpf_netlink.h
>  create mode 100644 bpf/bpf_workaround.h
>  create mode 100644 bpf/flowtable_afxdp.c
> 
> diff --git a/Makefile.am b/Makefile.am
> index b279303d1..f18bfefde 100644
> --- a/Makefile.am
> +++ b/Makefile.am
> @@ -8,6 +8,9 @@
>  AUTOMAKE_OPTIONS = foreign subdir-objects
>  ACLOCAL_AMFLAGS = -I m4
>  SUBDIRS = datapath
> +if HAVE_BPF
> +SUBDIRS += bpf
> +endif
>  
>  AM_CPPFLAGS = $(SSL_CFLAGS)
>  AM_LDFLAGS = $(SSL_LDFLAGS)
> @@ -198,7 +201,9 @@ ALL_LOCAL += dist-hook-git
>  dist-hook-git: distfiles
>   @if test -e $(srcdir)/.git && (git --version) >/dev/null 2>&1; then \
> (cd datapath && $(MAKE) distfiles); \
> -   (cat distfiles; sed 's|^|datapath/|' datapath/distfiles) | \
> +   (cd bpf && $(MAKE) distfiles); \
> +   (cat distfiles; sed 's|^|datapath/|' datapath/distfiles; \
> +sed 's|^|bpf/|' bpf/distfiles) | \
>   LC_ALL=C sort -u > all-distfiles; \
> (cd $(srcdir) && git ls-files) | grep -v '\.gitignore$$' | \
>   grep -v '\.gitattributes$$' | \
> @@ -234,7 +239,7 @@ config-h-check:
>   @cd $(srcdir); \
>   if test -e .git && (git --version) >/dev/null 2>&1 && \
> git --no-pager grep -L '#include ' `git ls-files | grep 
> '\.c$$' | \
> - grep -vE 
> '^datapath|^lib/sflow|^third-party|^datapath-windows|^python'`; \
> + grep -vE 
> '^datapath|^lib/sflow|^third-party|^datapath-windows|^python|^bpf'`; \
>   then \
> echo "See above for list of violations of the rule that"; \
> echo "every C source file must #include ."; \
> diff --git a/acinclude.m4 b/acinclude.m4
> index 0901f2870..2fb2f385f 100644
> --- a/acinclude.m4
> +++ b/acinclude.m4
> @@ -301,6 +301,62 @@ AC_DEFUN([OVS_CHECK_LINUX_AF_XDP], [
>AM_CONDITIONAL([HAVE_AF_XDP], test "$AF_XDP_ENABLE" = true)
>  ])
>  
> +dnl OVS_CHECK_LINUX_BPF
> +dnl
> +dnl Check both llvm and libbpf support
> +AC_DEFUN([OVS_CHECK_LINUX_BPF], [
> +  AC_ARG_ENABLE([bpf],
> +[AC_HELP_STRING([--enable-bpf],
> +[Compile reference eBPF programs for XDP])],
> +[], [enable_bpf=no])
> +  AC_MSG_CHECKING([whether BPF is enabled])
> +  if test "$enable_bpf" != yes; then
> +AC_MSG_RESULT([no])
> +BPF_ENABLE=false
> +  else
> +AC_MSG_RESULT([yes])
> +BPF_ENABLE=true
> +
> +AC_CHECK_PROG(CLANG_CHECK, clang, yes)
> +AS_IF([test X"$CLANG_CHECK" != X"yes"],
> +  [AC_MSG_ERROR([unable to find clang to compile BPF program])])
> +
> +AC_CHECK_PROG(LLC_CHECK, llc, yes)
> +AS_IF([test X"$LLC_CHECK" != X"yes"],
> +  [AC_MSG_ERROR([unable to find llc to compile BPF program])])
> +
> +AC_CHECK_HEADER([bpf/bpf_helpers.h], [],
> +  [AC_MSG_ERROR([unable to find bpf/bpf_helpers.h to compile BPF 
> program])])
> +
> +AC_CHECK_HEADER([linux/bpf.h], [],
> +  [AC_MSG_ERROR([unable to find linux/bpf.h to compile BPF program])])
> +
> +AC_MSG_CHECKING([for LLVM bpf target support])
> +if llc -march=bpf -mattr=help >/dev/null 2>&1; then
> +  AC_MSG_RESULT([yes])
> +else
> +  AC_MSG_RESULT([no])
> +  AC_MSG_ERROR([LLVM does not support bpf target])
> +fi
> +
> +AC_MSG_CHECKING([for BTF DATASEC support])
> +AC_LANG_CONFTEST(
> +  [AC_LANG_SOURCE([__attribute__((section("_x"), used)) int x;])])
> +if clang -g -O2 -S -target bpf -emit-llvm -c conftest.c -o conftest.ll 
> && \
> +   llc -march=bpf -filetype=obj -o conftest.o conftest.ll && \
> +   readelf -p.BTF conftest.o 2>/dev/null | grep -q -w _x; then
> +  AC_MSG_RESULT([yes])
> +else
> +  AC_MSG_RESULT([no])
> +  AC_MSG_ERROR([LLVM does not support BTF DATASEC])
> +fi
> +
> +AC_DEFINE([HAVE_BPF], [1],
> +  [Define to 1 if BPF compilation is available and enabled.])
> +  fi
> +  AM_CONDITIONAL([HAVE_BPF], test "$BPF_ENABLE" = true)
> +])
> +
>  dnl OVS_CHECK_DPDK
>  dnl
>  dnl Configure DPDK source tree
> diff --git a/bpf/.gitignore b/bpf/.gitignore
> new file mode 100644
> index 0

[ovs-dev] [PATCH 2/3] ovsdb-idlc: Fix memory leak reported by Coverity.

2020-05-02 Thread William Tu
An exmplae pattern shown below:
void
ovsrec_ct_zone_index_set_external_ids(const struct ovsrec_ct_zone...
{
//  1. alloc_fn: Storage is returned from allocation function xmalloc.
//  2. var_assign: Assigning: datum = storage returned from xmalloc(24UL).
struct ovsdb_datum *datum = xmalloc(sizeof(struct ovsdb_datum));

//  3. Condition external_ids, taking false branch.
if (external_ids) {
...
} else {
//  4. noescape: Resource datum is not freed or pointed-to in 
ovsdb_datum_init_empty.
ovsdb_datum_init_empty(datum);
}
//  5. noescape: Resource datum is not freed or pointed-to in 
ovsdb_idl_index_write.
ovsdb_idl_index_write(CONST_CAST(struct ovsdb_idl_row *, &row->header_),
  
&ovsrec_ct_zone_columns[OVSREC_CT_ZONE_COL_EXTERNAL_IDS],
  datum,
  &ovsrec_table_classes[OVSREC_TABLE_CT_ZONE]);

// CID 1420856 (#1 of 1): Resource leak (RESOURCE_LEAK)
// 6. leaked_storage: Variable datum going out of scope leaks the storage it
  points to.
Fix it by freeing the datum.

Signed-off-by: William Tu 
---
 ovsdb/ovsdb-idlc.in | 1 +
 1 file changed, 1 insertion(+)

diff --git a/ovsdb/ovsdb-idlc.in b/ovsdb/ovsdb-idlc.in
index 1d385e15c1e5..698fe25f3095 100755
--- a/ovsdb/ovsdb-idlc.in
+++ b/ovsdb/ovsdb-idlc.in
@@ -1306,6 +1306,7 @@ struct %(s)s *
   &%(s)s_columns[%(S)s_COL_%(C)s],
   datum,
   &%(p)stable_classes[%(P)sTABLE_%(T)s]);
+free(datum);
 }
 """ % {'t': tableName,
'p': prefix,
-- 
2.7.4

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH 1/3] ovsdb-idlc: Fix memory leak reported by Coverity.

2020-05-02 Thread William Tu
Coverity shows the following memory leak in this code pattern:

void
ovsrec_ipfix_index_set_obs_domain_id(
{
struct ovsdb_datum datum;
//  1. alloc_fn: Storage is returned from allocation function xmalloc.
//  2. var_assign: Assigning: key = storage returned from xmalloc(16UL).
union ovsdb_atom *key = xmalloc(sizeof(union ovsdb_atom));

//  3. Condition n_obs_domain_id, taking false branch.
if (n_obs_domain_id) {
datum.n = 1;
datum.keys = key;
key->integer = *obs_domain_id;
} else {
datum.n = 0;
datum.keys = NULL;
}
datum.values = NULL;
ovsdb_idl_index_write(CONST_CAST(struct ovsdb_idl_row *,
//  CID 1420891 (#1 of 1): Resource leak (RESOURCE_LEAK)

Fix it by moving the xmalloc to the true branch.

Signed-off-by: William Tu 
---
 ovsdb/ovsdb-idlc.in | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/ovsdb/ovsdb-idlc.in b/ovsdb/ovsdb-idlc.in
index c285ee4b3c10..1d385e15c1e5 100755
--- a/ovsdb/ovsdb-idlc.in
+++ b/ovsdb/ovsdb-idlc.in
@@ -1351,9 +1351,10 @@ struct %(s)s *
 print("datum.values = NULL;")
 txn_write_func = "ovsdb_idl_index_write"
 elif type.is_optional_pointer():
-print("union ovsdb_atom *key = xmalloc(sizeof (union 
ovsdb_atom));")
+print("union ovsdb_atom *key;")
 print()
 print("if (%s) {" % keyVar)
+print("key = xmalloc(sizeof (union ovsdb_atom));")
 print("datum.n = 1;")
 print("datum.keys = key;")
 print("" + 
type.key.assign_c_value_casting_away_const("key->%s" % 
type.key.type.to_string(), keyVar))
@@ -1364,9 +1365,10 @@ struct %(s)s *
 print("datum.values = NULL;")
 txn_write_func = "ovsdb_idl_index_write"
 elif type.n_max == 1:
-print("union ovsdb_atom *key = xmalloc(sizeof(union 
ovsdb_atom));")
+print("union ovsdb_atom *key;")
 print()
 print("if (%s) {" % nVar)
+print("key = xmalloc(sizeof(union ovsdb_atom));")
 print("datum.n = 1;")
 print("datum.keys = key;")
 print("" + 
type.key.assign_c_value_casting_away_const("key->%s" % 
type.key.type.to_string(), "*" + keyVar))
-- 
2.7.4

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH 3/3] ovsdb-idl: Fix NULL deref reported by Coverity.

2020-05-02 Thread William Tu
When 'datum.values' or 'datum.keys' is NULL, some code path calling
into ovsdb_idl_txn_write__ triggers NULL deref.  An example is below:

ovsrec_open_vswitch_set_cur_cfg(const struct ovsrec_open_vswitch
{
struct ovsdb_datum datum;
union ovsdb_atom key;

datum.n = 1;
datum.keys = &key;

key.integer = cur_cfg;
//  1. assign_zero: Assigning: datum.values = NULL.
datum.values = NULL;
//  CID 1421356 (#1 of 1): Explicit null dereferenced (FORWARD_NULL)
//  2. var_deref_model: Passing &datum to ovsdb_idl_txn_write_clone,\
// which dereferences null datum.values.
ovsdb_idl_txn_write_clone(&row->header_, &ovsrec_open_vswitch_col
}

And with the following calls:
ovsdb_idl_txn_write_clone
  ovsdb_idl_txn_write__
6. deref_parm_in_call: Function ovsdb_datum_destroy dereferences
   datum->values
ovsdb_datum_destroy

And another possible NULL deref is at ovsdb_datum_equals(). Fix the
two by adding additional checks.

Signed-off-by: William Tu 
---
 lib/ovsdb-data.c | 8 ++--
 lib/ovsdb-idl.c  | 3 ++-
 2 files changed, 8 insertions(+), 3 deletions(-)

diff --git a/lib/ovsdb-data.c b/lib/ovsdb-data.c
index 4828624f658d..9ce3cdeca28a 100644
--- a/lib/ovsdb-data.c
+++ b/lib/ovsdb-data.c
@@ -1033,8 +1033,12 @@ free_data(enum ovsdb_atomic_type type,
 void
 ovsdb_datum_destroy(struct ovsdb_datum *datum, const struct ovsdb_type *type)
 {
-free_data(type->key.type, datum->keys, datum->n);
-free_data(type->value.type, datum->values, datum->n);
+if (datum->keys) {
+free_data(type->key.type, datum->keys, datum->n);
+}
+if (datum->values) {
+free_data(type->value.type, datum->values, datum->n);
+}
 }
 
 /* Swaps the contents of 'a' and 'b', which need not have the same type. */
diff --git a/lib/ovsdb-idl.c b/lib/ovsdb-idl.c
index 1535ad7b5197..6614ea1617ef 100644
--- a/lib/ovsdb-idl.c
+++ b/lib/ovsdb-idl.c
@@ -4449,7 +4449,8 @@ ovsdb_idl_txn_write__(const struct ovsdb_idl_row *row_,
  * transaction only does writes of existing values, without making any real
  * changes, we will drop the whole transaction later in
  * ovsdb_idl_txn_commit().) */
-if (write_only && ovsdb_datum_equals(ovsdb_idl_read(row, column),
+if (datum->keys && datum->values &&
+write_only && ovsdb_datum_equals(ovsdb_idl_read(row, column),
  datum, &column->type)) {
 goto discard_datum;
 }
-- 
2.7.4

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH 0/3] ovsdb-idl: Fix memory leak and NULL deref

2020-05-02 Thread William Tu
The three patches fix around 350 memory-related defects reported
by Coverity.  With this series applied, Coverity still shows
around 130 defetcs.

Travis: https://travis-ci.org/github/williamtu/ovs-travis/builds/682347632

William Tu (3):
  ovsdb-idlc: Fix memory leak reported by Coverity.
  ovsdb-idlc: Fix memory leak reported by Coverity.
  ovsdb-idl: Fix NULL deref reported by Coverity.

 lib/ovsdb-data.c| 8 ++--
 lib/ovsdb-idl.c | 3 ++-
 ovsdb/ovsdb-idlc.in | 7 +--
 3 files changed, 13 insertions(+), 5 deletions(-)

-- 
2.7.4

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] tunneling: RFC: Handle fully specified VxLAN tunnel port

2020-05-01 Thread William Tu
On Thu, Apr 30, 2020 at 08:52:38AM -0400, Vasu Dasari wrote:
> This email is with the technical difficulty I am having with supporting
> above feature.
> 
> I have implemented all infrastructure necessary to support the
> CLI, netdev-vport, netdev-native-tnl, etc, and currently debugging my way
> through this.
> 
> I am stuck in ofproto-dpif-xlate::native_tunnel_output(). What I see is
> that, although this function has all the parameters needed to create
> encapsulation header and know which odp_port to send it out of, it would
> still rely on "NORMAL" flow to send out the packet. And "NORMAL" flow
> relies on Mac learning table to figure out whether to flood or send it out
> of a learned port.
> 
> In this new case I am trying out, encap-dst-mac is not programmed in
> Mac-learning table(as the encap-dst-mac and out_port are explicitly
> specified and can be retrieved from netdev directly. And hence,
> xlate_normal() would flood the packet out of all ports and would never
> resolve dst-mac address as IP infrastructure on local machine is not
> configured for the source-ip address.
>
> My question is:
> 1. How can I accomplish sending out encapsulated frame without going
> through "NORMAL" processing?

I don't think you need NORMAL flow.
You can always add OpenFlow rules to redirect packets to your tunnel port.

> 2. Any suggestions on how can I go about getting this done?
> 
> Thanks
> -Vasu
> 
> *Vasu Dasari*
> 
> 
> On Thu, Apr 30, 2020 at 8:42 AM Vasu Dasari  wrote:
> 
> > Hi,
> >
> > I am trying to implement a functionality, where in if user specifies port
> > through which a VxLAN encapsulated packet can be sent out, then use that
> > port rather than going through routing procedure.
> >
> > ovs-vsctl add-port br0 at_vxlan_fp1 -- \
> > set int at_vxlan_fp1 type=vxlan \
> > options:remote_ip=172.32.2.1 options:local_ip=172.32.2.100 \
> > options:dst_mac=00:00:00:00:01:02
> > options:src_mac=00:00:00:00:01:01 \
> > options:out_port=1
> >
> > This would create a fully specified tunnel port, it includes all L2 and L3
> > parameters needed to create encapsulated frame. This kind of syntax would
> > mimic what is supported by off the shelf hardware like Broadcom. I also
> > noticed that pica8's Openflow switch supports this kind of syntax as well 
> > (Configuring
> > VXLAN )
> >
> > And the user could create flows of this sort to transport user packets
> > with VxLAN payload:
> >
> > ovs-ofctl add-flow br0 priority=1,in_port=ovs-ap0,actions=at_vxlan_fp1
> > ovs-ofctl add-flow br0 priority=1,in_port=at_vxlan_fp1,actions=ovs-ap0
> >
> >
> > I have initiated a discussion for this kind of request in June, 2019 at, 
> > ovs-discuss
> > thread
> > .
> > And would like to use this thread for design and any other comments.
> >
> >
> > Please let me know what you think.
> >
> >
> > Thanks
> >
> > -Vasu
> >
> >
> > *Vasu Dasari*
> >
> ___
> dev mailing list
> d...@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] tunneling: RFC: Handle fully specified VxLAN tunnel port

2020-05-01 Thread William Tu
On Thu, Apr 30, 2020 at 08:42:04AM -0400, Vasu Dasari wrote:
> Hi,
> 
> I am trying to implement a functionality, where in if user specifies port
> through which a VxLAN encapsulated packet can be sent out, then use that
> port rather than going through routing procedure.
> 
> ovs-vsctl add-port br0 at_vxlan_fp1 -- \
> set int at_vxlan_fp1 type=vxlan \
> options:remote_ip=172.32.2.1 options:local_ip=172.32.2.100 \
> options:dst_mac=00:00:00:00:01:02 options:src_mac=00:00:00:00:01:01
> \
> options:out_port=1
> 
Why do you need to add dst_mac and src_mac?
Usually in the OVS kernel datapath case, OVS will consult the Linux kernel's
arp table and get the src/dst mac address.
What if the manually set dst_mac here is different than the entry in kernel?

> This would create a fully specified tunnel port, it includes all L2 and L3
> parameters needed to create encapsulated frame. This kind of syntax would
> mimic what is supported by off the shelf hardware like Broadcom. I also
> noticed that pica8's Openflow switch supports this kind of syntax as
> well (Configuring
> VXLAN )
> 
> And the user could create flows of this sort to transport user packets with
> VxLAN payload:
> 
> ovs-ofctl add-flow br0 priority=1,in_port=ovs-ap0,actions=at_vxlan_fp1
> ovs-ofctl add-flow br0 priority=1,in_port=at_vxlan_fp1,actions=ovs-ap0
> 
> 
> I have initiated a discussion for this kind of request in June, 2019
> at, ovs-discuss
> thread
> .
> And would like to use this thread for design and any other comments.
> 
> 
> Please let me know what you think.
> 
> 
> Thanks
> 
> -Vasu
> 
> 
> *Vasu Dasari*
> ___
> dev mailing list
> d...@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH] docs: Document check_pkt_len action.

2020-05-01 Thread William Tu
Cc: Numan Siddique 
Signed-off-by: William Tu 
---
 Documentation/faq/releases.rst | 1 +
 1 file changed, 1 insertion(+)

diff --git a/Documentation/faq/releases.rst b/Documentation/faq/releases.rst
index dbc1706dec57..3903e5922489 100644
--- a/Documentation/faq/releases.rst
+++ b/Documentation/faq/releases.rst
@@ -140,6 +140,7 @@ Q: Are all features available with all datapaths?
 NIC Bonding YES1.0  1.0  YES
 Multiple VTEPs  YES1.10 1.10 YES
 Meter action4.15   2.10 2.7  NO
+check_pkt_len action5.22.12 2.12 NO
 == == == = ===
 
 Do note, however:
-- 
2.7.4

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCHv5] userspace: Add conntrack timeout policy support.

2020-05-01 Thread William Tu
On Thu, Apr 30, 2020 at 03:29:45PM -0700, Yi-Hung Wei wrote:
> On Wed, Apr 29, 2020 at 12:25 PM William Tu  wrote:
> >
> > Commit 1f1613183733 ("ct-dpif, dpif-netlink: Add conntrack timeout
> > policy support") adds conntrack timeout policy for kernel datapath.
> > This patch enables support for the userspace datapath.  I tested
> > using the 'make check-system-userspace' which checks the timeout
> > policies for ICMP and UDP cases.
> >
> > Signed-off-by: William Tu 
> > ---
> > v5: address feedback from Yi-Hung
> >   - couple improvement for error handling
> >   - travis: 
> > https://travis-ci.org/github/williamtu/ovs-travis/builds/681160951
> >   - currently failed due to kernel issue
> >
> > v4: address feedback from Yi-Hung
> >   - move default policy value to lib/conntrack-tp.c
> >   - separate icmp bug patch
> >   - refactor and fix include issues
> >   - fix the clang lock analysis annotation
> >   - keep clean interval to 5 seconds
> >   - improve tests in system-traffic.at
> >   - travis: 
> > https://travis-ci.org/github/williamtu/ovs-travis/builds/680158645
> >
> > v3:
> >   - address feedback from Yi-Hung
> >   - use ID 0 as default policy
> >   - move conn_{init,update}_expiration to lib/conntrack-tp.c
> >   - s/tpid/tp_id/
> >   - add default timeout value to CT_DPIF_TP_*_ATTRs
> >   - reduce the CT_CLEAN_INTERVAL from 5 to 3s, to make the tests
> > run faster.
> >   - add more tests to system-traffic.at
> >   - code refactoring and renaming
> > ---
> 
> Thanks for this new version. It looks good to me.
> 
> Acked-by: Yi-Hung Wei 

Applied to master, thanks.

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH] compat: Fix ipv6_dst_lookup build error

2020-04-30 Thread William Tu
On Thu, Apr 30, 2020 at 10:36:20AM -0700, Gregory Rose wrote:
> 
> 
> On 4/29/2020 5:53 PM, William Tu wrote:
> >On Wed, Apr 29, 2020 at 2:41 PM Yi-Hung Wei  wrote:
> >>
> >>The geneve/vxlan compat code base invokes ipv6_dst_lookup() which is
> >>recently replaced by ipv6_dst_lookup_flow() in the stable kernel tree.
> >>
> >>This causes travis build failure:
> >> * https://travis-ci.org/github/openvswitch/ovs/builds/681084038
> >>
> >>This patch updates the backport logic to invoke the right function.
> >>
> >>Related patch in
> >> git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git
> >>
> >>b9f3e457098e ("net: ipv6_stub: use ip6_dst_lookup_flow instead of
> >>ip6_dst_lookup")
> >>
> >>Signed-off-by: Yi-Hung Wei 
> >>---
> >Looks good to me, thanks for fixing the issue.
> >CC Greg to see if he has more comments.
> 
> A backport all the way to 3.16.  That's not often.
> 
> LGTM.  Thanks Yi-Hung!

Applied to master, thanks.
William
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [RFC v2 PATCH 2/4] netdev-offload: Add "offload-driver" other_config to specify offload driver

2020-04-30 Thread William Tu
On Tue, Apr 21, 2020 at 11:47:02PM +0900, Toshiaki Makita wrote:
> The following commit will introduce another offload driver using XDP.
> When using afxdp netdev, either of TC or XDP will be supported, so let's
> add an other_config to specify which offload driver is preferable.
> When not specified, TC will be used if netdev supports it.
> 
> Signed-off-by: Toshiaki Makita 
> ---
>  lib/netdev-offload.c | 37 +
>  1 file changed, 29 insertions(+), 8 deletions(-)
> 
> diff --git a/lib/netdev-offload.c b/lib/netdev-offload.c
> index 32eab5910..ba9ebc3cd 100644
> --- a/lib/netdev-offload.c
> +++ b/lib/netdev-offload.c
> @@ -60,6 +60,9 @@ VLOG_DEFINE_THIS_MODULE(netdev_offload);
>  
>  static bool netdev_flow_api_enabled = false;
>  
> +#define FLOW_API_DRIVER_DEFAULT "linux_tc"
> +static const char *netdev_flow_api_driver = NULL;
> +
>  /* Protects 'netdev_flow_apis'.  */
>  static struct ovs_mutex netdev_flow_api_provider_mutex = 
> OVS_MUTEX_INITIALIZER;
>  
> @@ -171,18 +174,30 @@ netdev_flow_api_equals(const struct netdev *netdev1,
>  static int
>  netdev_assign_flow_api(struct netdev *netdev)
>  {
> -struct netdev_registered_flow_api *rfa;
> +struct netdev_registered_flow_api *rfa, *current_rfa = NULL;
>  
>  CMAP_FOR_EACH (rfa, cmap_node, &netdev_flow_apis) {
> +if (netdev_flow_api_driver &&
> +strcmp(netdev_flow_api_driver, rfa->flow_api->type)) {
> +continue;
> +}
>  if (!rfa->flow_api->init_flow_api(netdev)) {
> -ovs_refcount_ref(&rfa->refcnt);
> -ovsrcu_set(&netdev->flow_api, rfa->flow_api);
> -VLOG_INFO("%s: Assigned flow API '%s'.",
> -  netdev_get_name(netdev), rfa->flow_api->type);
> -return 0;
> +if (!current_rfa ||
> +(!netdev_flow_api_driver &&
> + !strcmp(FLOW_API_DRIVER_DEFAULT, rfa->flow_api->type))) {
> +current_rfa = rfa;
Question:

When using netdev-afxdp, can we still enable tc-offload?
af_xdp hook gets packets before the tc hook, unless we have hw tc-offload,
then it by-passes the linux tc.

William

> +}
> +} else {
> +VLOG_DBG("%s: flow API '%s' is not suitable.",
> + netdev_get_name(netdev), rfa->flow_api->type);
>  }
> -VLOG_DBG("%s: flow API '%s' is not suitable.",
> - netdev_get_name(netdev), rfa->flow_api->type);
> +}
> +if (current_rfa) {
> +ovs_refcount_ref(¤t_rfa->refcnt);
> +ovsrcu_set(&netdev->flow_api, current_rfa->flow_api);
> +VLOG_INFO("%s: Assigned flow API '%s'.",
> +  netdev_get_name(netdev), current_rfa->flow_api->type);
> +return 0;
>  }
>  VLOG_INFO("%s: No suitable flow API found.", netdev_get_name(netdev));
>  
> @@ -647,6 +662,8 @@ netdev_set_flow_api_enabled(const struct smap 
> *ovs_other_config)
>  static struct ovsthread_once once = OVSTHREAD_ONCE_INITIALIZER;
>  
>  if (ovsthread_once_start(&once)) {
> +const char *offload_driver;
> +
>  netdev_flow_api_enabled = true;
>  
>  VLOG_INFO("netdev: Flow API Enabled");
> @@ -660,6 +677,10 @@ netdev_set_flow_api_enabled(const struct smap 
> *ovs_other_config)
>  netdev_offload_rebalance_policy = true;
>  }
>  
> +offload_driver = smap_get_def(ovs_other_config, "offload-driver",
> +  NULL);
> +netdev_flow_api_driver = nullable_xstrdup(offload_driver);
> +
>  netdev_ports_flow_init();
>  
>  ovsthread_once_done(&once);
> -- 
> 2.25.1
> 
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH] netdev-linux: remove sum of vport stats and kernel netdev stats

2020-04-30 Thread William Tu
On Thu, Apr 23, 2020 at 05:35:14AM +, 姜立东 via dev wrote:
> From df9ff3b67f11e467928ca0873031d81b87f3d0c5 Mon Sep 17 00:00:00 2001
> From: Jiang Lidong 
> Date: Thu, 23 Apr 2020 11:07:28 +0800
> Subject: [PATCH] netdev-linux: remove sum of vport stats and kernel netdev 
> stats
> When using kernel veth as OVS interface, doubled drop counter
> value is shown when veth drops packets due to traffic overrun.
> 
> In netdev_linux_get_stats, it reads both vport stats and kernel
> netdev stats, in case vport stats retrieve failure. If both of
> them success, error counters are added to include errors from
> different layers. But implementation of ovs_vport_get_stats in
> kernel data path has included kernel netdev stats by calling
> dev_get_stats. When drop or other error counters is not zero,
> its value is doubled by netdev_linux_get_stats.
> 
> In this change, adding kernel netdev stats into vport stats
> is removed, since vport stats includes all information of
> kernel netdev stats.
> 
> Signed-off-by: Jiang Lidong 
> ---
>  lib/netdev-linux.c | 27 +--
>  1 file changed, 1 insertion(+), 26 deletions(-)
> 
> diff --git a/lib/netdev-linux.c b/lib/netdev-linux.c
> index ff045cb..6d139d0 100644
> --- a/lib/netdev-linux.c
> +++ b/lib/netdev-linux.c
> @@ -2207,33 +2207,8 @@ netdev_linux_get_stats(const struct netdev *netdev_,
>  } else if (netdev->vport_stats_error) {
>  /* stats not available from OVS then use netdev stats. */
>  *stats = dev_stats;
> -} else {
> -/* Use kernel netdev's packet and byte counts since vport's counters
> - * do not reflect packet counts on the wire when GSO, TSO or GRO are
> - * enabled. */
> -stats->rx_packets = dev_stats.rx_packets;
> -stats->rx_bytes = dev_stats.rx_bytes;
> -stats->tx_packets = dev_stats.tx_packets;
> -stats->tx_bytes = dev_stats.tx_bytes;
> -
> -stats->rx_errors   += dev_stats.rx_errors;
> -stats->tx_errors   += dev_stats.tx_errors;
> -stats->rx_dropped  += dev_stats.rx_dropped;
> -stats->tx_dropped  += dev_stats.tx_dropped;
> -stats->multicast   += dev_stats.multicast;
> -stats->collisions  += dev_stats.collisions;
> -stats->rx_length_errors+= dev_stats.rx_length_errors;
> -stats->rx_over_errors  += dev_stats.rx_over_errors;
> -stats->rx_crc_errors   += dev_stats.rx_crc_errors;
> -stats->rx_frame_errors += dev_stats.rx_frame_errors;
> -stats->rx_fifo_errors  += dev_stats.rx_fifo_errors;
> -stats->rx_missed_errors+= dev_stats.rx_missed_errors;
> -stats->tx_aborted_errors   += dev_stats.tx_aborted_errors;
> -stats->tx_carrier_errors   += dev_stats.tx_carrier_errors;
> -stats->tx_fifo_errors  += dev_stats.tx_fifo_errors;
> -stats->tx_heartbeat_errors += dev_stats.tx_heartbeat_errors;
> -stats->tx_window_errors+= dev_stats.tx_window_errors;
>  }
> +
>  ovs_mutex_unlock(&netdev->mutex);
>  
>  return error;
> -- 
> 1.8.3.1
btw, I always had a hard time applying your patch.
Maybe your git is too old, can you upgrade to a newer version?
Thanks
William

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] 答复: 答复: [PATCH] netdev-linux: remove sum of vport stats and kernel netdev stats

2020-04-30 Thread William Tu
On Wed, Apr 29, 2020 at 10:42:41PM -0700, Pravin Shelar wrote:
> On Wed, Apr 29, 2020 at 11:07 AM William Tu  wrote:
> >
> > On Tue, Apr 28, 2020 at 03:29:10AM +, 姜立东 wrote:
> > > Hi William,
> > >
> > > > -/* Use kernel netdev's packet and byte counts since vport's 
> > > > counters
> > > > - * do not reflect packet counts on the wire when GSO, TSO or 
> > > > GRO are
> > > > - * enabled. */
> > >
> > > Actually I think it should be moved to netdev_stats_from_ovs_vport_stats 
> > > :), that explains what netdev_stats_from_ovs_vport_stats is doing.
> > > In fact, I think better solution is copying all physical abnormal 
> > > counters in netdev_stats_from_ovs_vport_stats ,
> > > and remove this copy from netdev_linux_get_stats.
> > > but netdev_stats_from_ovs_vport_stats is in kernel module that may not be 
> > > upgraded in some application scenarios.
> > > So I moved to remove unnecessary copies, such as RX/TX bytes, packets, 
> > > drops.
> > >
> > > Regards,
> > > Lidong
> >
> > oh, I see your point. Then this looks good to me.
> > Add Pravin to see if he has some comments.
> >
> 
> The patch looks fine to me.
> Thanks,
> Pravin.

Thanks.
I applied to master.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH] compat: Fix ipv6_dst_lookup build error

2020-04-29 Thread William Tu
On Wed, Apr 29, 2020 at 2:41 PM Yi-Hung Wei  wrote:
>
> The geneve/vxlan compat code base invokes ipv6_dst_lookup() which is
> recently replaced by ipv6_dst_lookup_flow() in the stable kernel tree.
>
> This causes travis build failure:
> * https://travis-ci.org/github/openvswitch/ovs/builds/681084038
>
> This patch updates the backport logic to invoke the right function.
>
> Related patch in
> git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git
>
> b9f3e457098e ("net: ipv6_stub: use ip6_dst_lookup_flow instead of
>ip6_dst_lookup")
>
> Signed-off-by: Yi-Hung Wei 
> ---
Looks good to me, thanks for fixing the issue.
CC Greg to see if he has more comments.

William

> Travis test: https://travis-ci.org/github/YiHungWei/ovs/builds/681179784
> ---
>  acinclude.m4   |  3 +++
>  datapath/linux/compat/geneve.c | 11 +++
>  datapath/linux/compat/vxlan.c  | 14 --
>  3 files changed, 18 insertions(+), 10 deletions(-)
>
> diff --git a/acinclude.m4 b/acinclude.m4
> index 0e90c333211e..dabbffd01cf7 100644
> --- a/acinclude.m4
> +++ b/acinclude.m4
> @@ -589,7 +589,10 @@ AC_DEFUN([OVS_CHECK_LINUX_COMPAT], [
>
>OVS_GREP_IFELSE([$KSRC/include/net/addrconf.h], [ipv6_dst_lookup.*net],
>[OVS_DEFINE([HAVE_IPV6_DST_LOOKUP_NET])])
> +  OVS_GREP_IFELSE([$KSRC/include/net/addrconf.h], 
> [ipv6_dst_lookup_flow.*net],
> +  [OVS_DEFINE([HAVE_IPV6_DST_LOOKUP_FLOW_NET])])
>OVS_GREP_IFELSE([$KSRC/include/net/addrconf.h], [ipv6_stub])
> +  OVS_GREP_IFELSE([$KSRC/include/net/addrconf.h], [ipv6_dst_lookup_flow])
>
>OVS_GREP_IFELSE([$KSRC/include/linux/err.h], [ERR_CAST])
>OVS_GREP_IFELSE([$KSRC/include/linux/err.h], [IS_ERR_OR_NULL])
> diff --git a/datapath/linux/compat/geneve.c b/datapath/linux/compat/geneve.c
> index 1551a37217ec..7bfc6d8822e5 100644
> --- a/datapath/linux/compat/geneve.c
> +++ b/datapath/linux/compat/geneve.c
> @@ -962,15 +962,18 @@ static struct dst_entry *geneve_get_v6_dst(struct 
> sk_buff *skb,
> return dst;
> }
>
> -#ifdef HAVE_IPV6_DST_LOOKUP_NET
> +#if defined(HAVE_IPV6_DST_LOOKUP_FLOW_NET)
> +   if (ipv6_stub->ipv6_dst_lookup_flow(geneve->net, gs6->sock->sk, &dst,
> +fl6)) {
> +#elif defined(HAVE_IPV6_DST_LOOKUP_FLOW)
> +   if (ipv6_stub->ipv6_dst_lookup_flow(gs6->sock->sk, &dst, fl6)) {
> +#elif defined(HAVE_IPV6_DST_LOOKUP_NET)
> if (ipv6_stub->ipv6_dst_lookup(geneve->net, gs6->sock->sk, &dst, 
> fl6)) {
> -#else
> -#ifdef HAVE_IPV6_STUB
> +#elif defined(HAVE_IPV6_STUB)
> if (ipv6_stub->ipv6_dst_lookup(gs6->sock->sk, &dst, fl6)) {
>  #else
> if (ip6_dst_lookup(gs6->sock->sk, &dst, fl6)) {
>  #endif
> -#endif
> netdev_dbg(dev, "no route to %pI6\n", &fl6->daddr);
> return ERR_PTR(-ENETUNREACH);
> }
> diff --git a/datapath/linux/compat/vxlan.c b/datapath/linux/compat/vxlan.c
> index f8f667e9748b..b334870b768e 100644
> --- a/datapath/linux/compat/vxlan.c
> +++ b/datapath/linux/compat/vxlan.c
> @@ -990,18 +990,20 @@ static struct dst_entry *vxlan6_get_route(struct 
> vxlan_dev *vxlan,
> fl6.fl6_dport = dport;
> fl6.fl6_sport = sport;
>
> -#ifdef HAVE_IPV6_DST_LOOKUP_NET
> -   err = ipv6_stub->ipv6_dst_lookup(vxlan->net,
> -sock6->sock->sk,
> +#if defined(HAVE_IPV6_DST_LOOKUP_FLOW_NET)
> +   err = ipv6_stub->ipv6_dst_lookup_flow(vxlan->net, sock6->sock->sk,
> + &ndst, &fl6);
> +#elif defined(HAVE_IPV6_DST_LOOKUP_FLOW)
> +   err = ipv6_stub->ipv6_dst_lookup_flow(sock6->sock->sk, &ndst, &fl6);
> +#elif defined(HAVE_IPV6_DST_LOOKUP_NET)
> +   err = ipv6_stub->ipv6_dst_lookup(vxlan->net, sock6->sock->sk,
>  &ndst, &fl6);
> -#else
> -#ifdef HAVE_IPV6_STUB
> +#elif defined(HAVE_IPV6_STUB)
> err = ipv6_stub->ipv6_dst_lookup(vxlan->vn6_sock->sock->sk,
>  &ndst, &fl6);
>  #else
> err = ip6_dst_lookup(vxlan->vn6_sock->sock->sk, &ndst, &fl6);
>  #endif
> -#endif
> if (err < 0)
> return ERR_PTR(err);
>
> --
> 2.7.4
>
> ___
> dev mailing list
> d...@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCHv5] userspace: Add conntrack timeout policy support.

2020-04-29 Thread William Tu
Commit 1f1613183733 ("ct-dpif, dpif-netlink: Add conntrack timeout
policy support") adds conntrack timeout policy for kernel datapath.
This patch enables support for the userspace datapath.  I tested
using the 'make check-system-userspace' which checks the timeout
policies for ICMP and UDP cases.

Signed-off-by: William Tu 
---
v5: address feedback from Yi-Hung
  - couple improvement for error handling
  - travis: https://travis-ci.org/github/williamtu/ovs-travis/builds/681160951
  - currently failed due to kernel issue

v4: address feedback from Yi-Hung
  - move default policy value to lib/conntrack-tp.c
  - separate icmp bug patch
  - refactor and fix include issues
  - fix the clang lock analysis annotation
  - keep clean interval to 5 seconds
  - improve tests in system-traffic.at
  - travis: https://travis-ci.org/github/williamtu/ovs-travis/builds/680158645

v3:
  - address feedback from Yi-Hung
  - use ID 0 as default policy
  - move conn_{init,update}_expiration to lib/conntrack-tp.c
  - s/tpid/tp_id/
  - add default timeout value to CT_DPIF_TP_*_ATTRs
  - reduce the CT_CLEAN_INTERVAL from 5 to 3s, to make the tests
run faster.
  - add more tests to system-traffic.at
  - code refactoring and renaming
---
 Documentation/faq/releases.rst   |   2 +-
 NEWS |   2 +
 lib/automake.mk  |   2 +
 lib/conntrack-icmp.c |   6 +-
 lib/conntrack-other.c|   4 +-
 lib/conntrack-private.h  |  70 ++---
 lib/conntrack-tcp.c  |   5 +-
 lib/conntrack-tp.c   | 308 +++
 lib/conntrack-tp.h   |  30 
 lib/conntrack.c  |  37 +++--
 lib/conntrack.h  |   8 +-
 lib/ct-dpif.h|   2 +
 lib/dpif-netdev.c|  75 +-
 ofproto/ofproto-dpif.c   |   3 +-
 tests/system-traffic.at  |  29 +++-
 tests/system-userspace-macros.at |   6 +-
 tests/test-conntrack.c   |   6 +-
 17 files changed, 500 insertions(+), 95 deletions(-)
 create mode 100644 lib/conntrack-tp.c
 create mode 100644 lib/conntrack-tp.h

diff --git a/Documentation/faq/releases.rst b/Documentation/faq/releases.rst
index f170ebd3ff22..dbc1706dec57 100644
--- a/Documentation/faq/releases.rst
+++ b/Documentation/faq/releases.rst
@@ -118,7 +118,7 @@ Q: Are all features available with all datapaths?
 == == == = ===
 Connection tracking 4.32.5  2.6  YES
 Conntrack Fragment Reass.   4.32.6  2.12 YES
-Conntrack Timeout Policies  5.22.12 NO   NO
+Conntrack Timeout Policies  5.22.12 2.14 NO
 Conntrack Zone Limit4.18   2.10 2.13 YES
 Conntrack NAT   4.62.6  2.8  YES
 Tunnel - LISP   NO 2.11 NO   NO
diff --git a/NEWS b/NEWS
index b61a6027234e..3dbd8ec0e244 100644
--- a/NEWS
+++ b/NEWS
@@ -14,6 +14,8 @@ Post-v2.13.0
- AF_XDP:
  * New netdev class 'afxdp-nonpmd' for netdev-afxdp to save CPU cycles
by enabling interrupt mode.
+   - Userspace datapath:
+ * Add support for conntrack zone-based timeout policy.
 
 
 v2.13.0 - 14 Feb 2020
diff --git a/lib/automake.mk b/lib/automake.mk
index 95925b57c351..86940ccd2f9e 100644
--- a/lib/automake.mk
+++ b/lib/automake.mk
@@ -53,6 +53,8 @@ lib_libopenvswitch_la_SOURCES = \
lib/conntrack-icmp.c \
lib/conntrack-private.h \
lib/conntrack-tcp.c \
+   lib/conntrack-tp.c \
+   lib/conntrack-tp.h \
lib/conntrack-other.c \
lib/conntrack.c \
lib/conntrack.h \
diff --git a/lib/conntrack-icmp.c b/lib/conntrack-icmp.c
index 6cbf9656dd93..bf49f9a9fa93 100644
--- a/lib/conntrack-icmp.c
+++ b/lib/conntrack-icmp.c
@@ -22,6 +22,7 @@
 #include 
 
 #include "conntrack-private.h"
+#include "conntrack-tp.h"
 #include "dp-packet.h"
 
 enum OVS_PACKED_ENUM icmp_state {
@@ -79,12 +80,13 @@ icmp6_valid_new(struct dp_packet *pkt)
 
 static struct conn *
 icmp_new_conn(struct conntrack *ct, struct dp_packet *pkt OVS_UNUSED,
-  long long now)
+  long long now, uint32_t tp_id)
 {
 struct conn_icmp *conn = xzalloc(sizeof *conn);
 conn->state = ICMPS_FIRST;
-conn_init_expiration(ct, &conn->up, icmp_timeouts[conn->state], now);
+conn->up.tp_id = tp_id;
 
+conn_init_expiration(ct, &conn->up, icmp_timeouts[conn->state], now);
 return &conn->up;
 }
 
diff --git a/lib/conntrack-other.c b/lib/conntrack-other.c
index de22ef87cc19..d3b46018586c 100644
--- a/lib/conntrack-other.c
+++ b/lib/conntrack-other.c
@@ -17,6 +17,7 @@
 #include 
 
 #include "conntrack-private.h"
+#include "conntrack-tp

Re: [ovs-dev] [PATCHv4 2/2] userspace: Add conntrack timeout policy support.

2020-04-29 Thread William Tu
On Tue, Apr 28, 2020 at 11:11:00AM -0700, Yi-Hung Wei wrote:
> On Mon, Apr 27, 2020 at 8:42 AM William Tu  wrote:
> >
> > Commit 1f1613183733 ("ct-dpif, dpif-netlink: Add conntrack timeout
> > policy support") adds conntrack timeout policy for kernel datapath.
> > This patch enables support for the userspace datapath.  I tested
> > using the 'make check-system-userspace' which checks the timeout
> > policies for ICMP and UDP cases.
> >
> > Signed-off-by: William Tu 
> > ---
> > v4: address feedback from Yi-Hung
> >   - move default policy value to lib/conntrack-tp.c
> >   - separate icmp bug patch
> >   - refactor and fix include issues
> >   - fix the clang lock analysis annotation
> >   - keep clean interval to 5 seconds
> >   - improve tests in system-traffic.at
> >   - travis: 
> > https://travis-ci.org/github/williamtu/ovs-travis/builds/680158645
> > ---
> 
> Thanks for v4. I only have a few minor comments below.
> 
> Thanks,
> 
> -Yi-Hung
> 
> 
> > +++ b/lib/conntrack-tp.c
> > @@ -0,0 +1,301 @@
> > +/*
> > + * Copyright (c) 2020 VMware, Inc.
> > + *
> > + * Licensed under the Apache License, Version 2.0 (the "License");
> > + * you may not use this file except in compliance with the License.
> > + * You may obtain a copy of the License at:
> > + *
> > + * http://www.apache.org/licenses/LICENSE-2.0
> > + *
> > + * Unless required by applicable law or agreed to in writing, software
> > + * distributed under the License is distributed on an "AS IS" BASIS,
> > + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
> > + * See the License for the specific language governing permissions and
> > + * limitations under the License.
> > + */
> > +
> > +#include 
> > +
> > +#include "conntrack-private.h"
> > +#include "conntrack-tp.h"
> > +#include "ct-dpif.h"
> > +#include "openvswitch/vlog.h"
> > +
> > +VLOG_DEFINE_THIS_MODULE(conntrack_tp);
> > +static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(1, 5);
> > +
> > +static const char *ct_timeout_str[] = {
> > +#define CT_TIMEOUT(NAME) #NAME,
> > +CT_TIMEOUTS
> > +#undef CT_TIMEOUT
> > +};
> > +
> > +/* Default timeout policy in seconds. */
> > +static unsigned int ct_dpif_netdev_tp_def[] = {
> > +[CT_DPIF_TP_ATTR_TCP_SYN_SENT] = 30,
> > +[CT_DPIF_TP_ATTR_TCP_SYN_RECV] = 30,
> > +[CT_DPIF_TP_ATTR_TCP_ESTABLISHED] = 24 * 60 * 60,
> > +[CT_DPIF_TP_ATTR_TCP_FIN_WAIT] = 15 * 60,
> > +[CT_DPIF_TP_ATTR_TCP_TIME_WAIT] = 45,
> > +[CT_DPIF_TP_ATTR_TCP_CLOSE] = 30,
> > +[CT_DPIF_TP_ATTR_UDP_FIRST] = 60,
> > +[CT_DPIF_TP_ATTR_UDP_SINGLE] = 60,
> > +[CT_DPIF_TP_ATTR_UDP_MULTIPLE] = 30,
> > +[CT_DPIF_TP_ATTR_ICMP_FIRST] = 60,
> > +[CT_DPIF_TP_ATTR_ICMP_REPLY] = 30,
> > +};
> > +
> > +static struct timeout_policy *
> > +timeout_policy_lookup(struct conntrack *ct, int32_t tp_id)
> > +OVS_REQUIRES(ct->ct_lock)
> > +{
> > +struct timeout_policy *tp;
> > +uint32_t hash;
> > +
> > +hash = hash_int(tp_id, ct->hash_basis);
> > +HMAP_FOR_EACH_IN_BUCKET (tp, node, hash, &ct->timeout_policies) {
> > +if (tp->policy.id == tp_id) {
> > +return tp;
> > +}
> > +}
> > +return NULL;
> > +}
> > +
> > +struct timeout_policy *
> > +timeout_policy_get(struct conntrack *ct, int32_t tp_id)
> > +{
> > +struct timeout_policy *tp;
> > +
> > +ovs_mutex_lock(&ct->ct_lock);
> > +tp = timeout_policy_lookup(ct, tp_id);
> > +if (!tp) {
> > +ovs_mutex_unlock(&ct->ct_lock);
> > +return NULL;
> > +}
> > +
> > +ovs_mutex_unlock(&ct->ct_lock);
> > +return tp;
> > +}
> > +
> > +static void
> > +update_existing_tp(struct timeout_policy *tp_dst,
> > +   const struct timeout_policy *tp_src)
> > +{
> > +struct ct_dpif_timeout_policy *dst;
> > +const struct ct_dpif_timeout_policy *src;
> > +int i;
> > +
> > +dst = &tp_dst->policy;
> > +src = &tp_src->policy;
> > +
> > +/* Set the value and present bit to dst if present
> > + * bit in src is set.
> > + */
> > +for (i = 0; i < ARRAY_SIZE(dst->attrs); i++) {
> > +if (src->present &am

Re: [ovs-dev] 答复: 答复: [PATCH] netdev-linux: remove sum of vport stats and kernel netdev stats

2020-04-29 Thread William Tu
On Tue, Apr 28, 2020 at 03:29:10AM +, 姜立东 wrote:
> Hi William, 
> 
> > -/* Use kernel netdev's packet and byte counts since vport's 
> > counters
> > - * do not reflect packet counts on the wire when GSO, TSO or GRO 
> > are
> > - * enabled. */
> 
> Actually I think it should be moved to netdev_stats_from_ovs_vport_stats :), 
> that explains what netdev_stats_from_ovs_vport_stats is doing.
> In fact, I think better solution is copying all physical abnormal counters in 
> netdev_stats_from_ovs_vport_stats , 
> and remove this copy from netdev_linux_get_stats. 
> but netdev_stats_from_ovs_vport_stats is in kernel module that may not be 
> upgraded in some application scenarios. 
> So I moved to remove unnecessary copies, such as RX/TX bytes, packets, drops.
> 
> Regards,
> Lidong

oh, I see your point. Then this looks good to me.
Add Pravin to see if he has some comments.

William
> 
> -邮件原件-
> 发件人: William Tu  
> 发送时间: 2020年4月27日 12:20
> 收件人: 姜立东 
> 抄送: d...@openvswitch.org
> 主题: Re: 答复: [ovs-dev] [PATCH] netdev-linux: remove sum of vport stats and 
> kernel netdev stats
> 
> On Mon, Apr 27, 2020 at 02:00:38AM +, 姜立东 wrote:
> > Hi William,
> > 
> > Thanks for your comments. 
> > Agree with you, those counters such as collisions and multicast should 
> > be kept as current implementation, since they are don't provided by vport 
> > stats.
> > 
> > I also suggest to remove rx/tx bytes and packets, rx/tx error and 
> > drops as well, only count physical or hardware stats in, because it is 
> > confusing when checking with netdev_stats_from_ovs_vport_stats, if 
> > netdev_stats_from_ovs_vport_stats has provided them, why they are copied 
> > again.
> > 
> > What do you think about change as below? 
> > 
> > diff --git a/lib/netdev-linux.c b/lib/netdev-linux.c index 
> > ff045cb..b851236 100644
> > --- a/lib/netdev-linux.c
> > +++ b/lib/netdev-linux.c
> > @@ -2208,18 +2208,6 @@ netdev_linux_get_stats(const struct netdev *netdev_,
> >  /* stats not available from OVS then use netdev stats. */
> >  *stats = dev_stats;
> >  } else {
> > -/* Use kernel netdev's packet and byte counts since vport's 
> > counters
> > - * do not reflect packet counts on the wire when GSO, TSO or GRO 
> > are
> > - * enabled. */
> But the comments above says the reason using the counter below.
> > -stats->rx_packets = dev_stats.rx_packets;
> > -stats->rx_bytes = dev_stats.rx_bytes;
> > -stats->tx_packets = dev_stats.tx_packets;
> > -stats->tx_bytes = dev_stats.tx_bytes;
> > -
> 
> How about we just remove the below 4 stats?
> > -stats->rx_errors   += dev_stats.rx_errors;
> > -stats->tx_errors   += dev_stats.tx_errors;
> > -stats->rx_dropped  += dev_stats.rx_dropped;
> > -stats->tx_dropped  += dev_stats.tx_dropped;
> >  stats->multicast   += dev_stats.multicast;
> >  stats->collisions  += dev_stats.collisions;
> >  stats->rx_length_errors+= dev_stats.rx_length_errors;
> > 
> > BR,
> > Lidong
> > 
> > -邮件原件-
> > 发件人: William Tu 
> > 发送时间: 2020年4月25日 22:31
> > 收件人: 姜立东 
> > 抄送: d...@openvswitch.org
> > 主题: Re: [ovs-dev] [PATCH] netdev-linux: remove sum of vport stats and 
> > kernel netdev stats
> > 
> > On Thu, Apr 23, 2020 at 05:35:14AM +, 姜立东 via dev wrote:
> > > From df9ff3b67f11e467928ca0873031d81b87f3d0c5 Mon Sep 17 00:00:00 
> > > 2001
> > > From: Jiang Lidong 
> > > Date: Thu, 23 Apr 2020 11:07:28 +0800
> > > Subject: [PATCH] netdev-linux: remove sum of vport stats and kernel 
> > > netdev stats
> > > When using kernel veth as OVS interface, doubled drop counter
> > > value is shown when veth drops packets due to traffic overrun.
> > > 
> > > In netdev_linux_get_stats, it reads both vport stats and kernel
> > > netdev stats, in case vport stats retrieve failure. If both of
> > > them success, error counters are added to include errors from
> > > different layers. But implementation of ovs_vport_get_stats in
> > > kernel data path has included kernel netdev stats by calling
> > > dev_get_stats. When drop or other error counters is not zero,
> > > its value is doubled by netdev_linux_get_stats.
> > > 
> > > In this change, a

[ovs-dev] [PATCH] ovs-bugtool: Add ethtool -l for combined channel.

2020-04-29 Thread William Tu
Users of netdev-afxdp has to setup the combined channel
on physical NIC. This helps debugging related issues.
Example output:
  $ ethtool -l enp3s0f0
  Channel parameters for enp3s0f0:
  Pre-set maximums:
  RX:0
  TX:0
  Other: 1
  Combined:  63
  Current hardware settings:
  RX:0
  TX:0
  Other: 1
  Combined:  1

Some previous discussion:
https://mail.openvswitch.org/pipermail/ovs-dev/2020-January/366631.html

Signed-off-by: William Tu 
---
 utilities/bugtool/ovs-bugtool.in | 1 +
 1 file changed, 1 insertion(+)

diff --git a/utilities/bugtool/ovs-bugtool.in b/utilities/bugtool/ovs-bugtool.in
index 47f3c4629f70..1a5170d8c78b 100755
--- a/utilities/bugtool/ovs-bugtool.in
+++ b/utilities/bugtool/ovs-bugtool.in
@@ -628,6 +628,7 @@ exclude those logs from the archive.
 cmd_output(CAP_NETWORK_INFO, [ETHTOOL, '-k', p])
 cmd_output(CAP_NETWORK_INFO, [ETHTOOL, '-i', p])
 cmd_output(CAP_NETWORK_INFO, [ETHTOOL, '-c', p])
+cmd_output(CAP_NETWORK_INFO, [ETHTOOL, '-l', p])
 if int(t) == 1:
 cmd_output(CAP_NETWORK_INFO,
[TC, '-s', '-d', 'class', 'show', 'dev', p])
-- 
2.7.4

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH] bugtool: Add dump-tlv-map.

2020-04-29 Thread William Tu
This helps debugging the tlv map issues.

Signed-off-by: William Tu 
---
 utilities/bugtool/plugins/network-status/openvswitch.xml | 1 +
 1 file changed, 1 insertion(+)

diff --git a/utilities/bugtool/plugins/network-status/openvswitch.xml 
b/utilities/bugtool/plugins/network-status/openvswitch.xml
index 72aa449302b8..e6fa4fd15fff 100644
--- a/utilities/bugtool/plugins/network-status/openvswitch.xml
+++ b/utilities/bugtool/plugins/network-status/openvswitch.xml
@@ -39,6 +39,7 @@
 /usr/share/openvswitch/scripts/ovs-bugtool-ovs-ofctl-loop-over-bridges
 "dump-ports"
 /usr/share/openvswitch/scripts/ovs-bugtool-ovs-ofctl-loop-over-bridges
 "dump-groups"
 /usr/share/openvswitch/scripts/ovs-bugtool-ovs-ofctl-loop-over-bridges
 "dump-group-stats"
+/usr/share/openvswitch/scripts/ovs-bugtool-ovs-ofctl-loop-over-bridges
 "dump-tlv-map"
 /usr/share/openvswitch/scripts/ovs-bugtool-get-dpdk-nic-numa
 ip -s -s link 
show
 /usr/share/openvswitch/scripts/ovs-bugtool-get-port-stats
-- 
2.7.4

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH] docs: Fix GTP-U release version.

2020-04-29 Thread William Tu
On Tue, Apr 28, 2020 at 10:59:31PM +0200, Ilya Maximets wrote:
> On 4/27/20 5:45 PM, William Tu wrote:
> > GTP-U support should be at OVS-2.14.
> > 
> > Signed-off-by: William Tu 
> > ---
> 
> Acked-by: Ilya Maximets 
Applied, thanks.

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH v2] ofp-actions: Add delete field action

2020-04-29 Thread William Tu
On Sun, Apr 19, 2020 at 07:17:57AM -0700, William Tu wrote:
> On Tue, Apr 14, 2020 at 01:33:28PM -0700, Yi-Hung Wei wrote:
> > This patch adds a new OpenFlow action, delete field, to delete a
> > field in packets.  Currently, only the tun_metadata fields are
> > supported.
> > 
> > One use case to add this action is to support multiple versions
> > of geneve tunnel metadatas to be exchanged among different versions
> > of networks.  For example, we may introduce tun_metadata2 to
> > replace old tun_metadata1, but still want to provide backward
> > compatibility to the older release.  In this case, in the new
> > OpenFlow pipeline, we would like to support the case to receive a
> > packet with tun_metadata1, do some processing.  And if the packet
> > is going to a switch in the newer release, we would like to delete
> > the value in tun_metadata1 and set a value into tun_metadata2.
> > 
> > Currently, ovs does not provide an action to remove a value in
> > tun_metadata if the value is present.  This patch fulfills the gap
> > by adding the delete_field action.  For example, the OpenFlow
> > syntax to delete tun_metadata1 is:
> > 
> > actions=delete_field:tun_metadata1
> > 
> > Signed-off-by: Yi-Hung Wei 
> > ---
> LGTM.
> Acked-by: William Tu 
> 
> Let's see Ben or others have more comments.
> William
>  
Applied to master, thanks.
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] Failed to execute unit test: system-traffic.at: conntrack - floating IP

2020-04-27 Thread William Tu
On Mon, Apr 27, 2020 at 11:54 AM Vasu Dasari  wrote:
>
> Hi,
>
> I am running into an error when I try to run the system-traffic test case -
> "conntrack - floating IP". Actually, any test case which is using this in
> the testsuite is failing:
> ADD_VETH(p0, at_ns0, br0, "10.1.1.1/24", "f0:00:00:01:01:01")
>
> Command used to run the test:
> sudo make -C _build-gcc/ check-system-userspace TESTSUITEFLAGS='-k
> "conntrack - floating IP"'
> Ububtu Kernel: 5.3.0-46-generic
> OVS Version: Latest master
>
> Relevant supporting log from system-userspace-testsuite.log
>
> =
> ../../tests/system-traffic.at:5757: ip netns exec at_ns0 sh <<
> NS_EXEC_HEREDOC
> ip link set dev p0 address "f0:00:00:01:01:01"
> NS_EXEC_HEREDOC
> --- /dev/null   2020-04-27 14:44:02.898140777 -0400
> +++
> /opt/vdasari/Developer/ovs/_build-gcc/tests/system-userspace-testsuite.dir/at-groups/120/stderr
> 2020-04-27 14:44:33.432320411 -0400
> @@ -0,0 +1 @@
> +Invalid address length 6 - must be 42401 bytes

Hi Vasu,

This is due to a bug in iproute2.
Can you update to the latest version?
see
Documentation/topics/testing.rst
"
Many of the kernel tests are dependent on the utilities present in the
  iproute2 package, especially the 'ip' command.  If there are many
  otherwise unexplained errors it may be necessary to update the iproute2
  package utilities on the system.  It is beyond the scope of this
  documentation to explain all that is necessary to build and install
  an updated iproute2 utilities package.  The package is available from
  the Linux kernel organization open source git repositories.
https://git.kernel.org/pub/scm/linux/kernel/git/shemminger/iproute2.git
"
> ../../tests/system-traffic.at:5757: exit code was 1, expected 0
> =
>
> Found a reference to this kind of error at
> https://bugzilla.redhat.com/show_bug.cgi?id=1550097
> This was from last April.
>
> If I execute the command from bash, the same is successful:
> 
> $ sudo ip netns exec at_ns0 bash
> $ ip link set dev p0 address "f0:00:00:01:01:01"
> $ ip link show p0
> 11: p0@if10:  mtu 1500 qdisc noqueue state
> UP mode DEFAULT group default qlen 1000
> link/ether f0:00:00:01:01:01 brd ff:ff:ff:ff:ff:ff link-netnsid 0
> 
>
> Any idea why this command is failing only in OVS environment?
>
> Thanks
> -Vasu
>
> *Vasu Dasari*
> ___
> dev mailing list
> d...@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCHv9 1/2] userspace: Enable TSO support for non-DPDK.

2020-04-27 Thread William Tu
On Tue, Mar 24, 2020 at 3:11 PM William Tu  wrote:
>
> This patch enables TSO support for non-DPDK use cases, and
> also add check-system-tso testsuite. Before TSO, we have to
> disable checksum offload, allowing the kernel to calculate the
> TCP/UDP packet checsum. With TSO, we can skip the checksum
> validation by enabling checksum offload, and with large packet
> size, we see better performance.
>
> Consider container to container use cases:
>   iperf3 -c (ns0) -> veth peer -> OVS -> veth peer -> iperf3 -s (ns1)
> And I got around 6Gbps, similar to TSO with DPDK-enabled.
>
> Signed-off-by: William Tu 
> Acked-by: Flavio Leitner 
>
> ---
> v9:
>   - make naming of flags more clear
>   - I couldn't think of any smart MACRO
>   - travis: https://travis-ci.org/github/williamtu/ovs-travis/builds/666513254
> v8:
>   - make some namings more clear
>
> v7: more refactoring on functions
>   - rss and flow mark related functions.
>   - dp_packet_clone_with_headroom
>   - fix definitions of DP_PACKET_OL_FLOW_MARK_MASK
>   - travis: https://travis-ci.org/github/williamtu/ovs-travis/builds/663658338
>
> v6: fix indentation
>
> v5: feedback from Flavio
>   - move some code around, break the long line
>   - travis is now working
> https://travis-ci.org/github/williamtu/ovs-travis/builds/661607017
>
> v4:
>   - Avoid duplications of DPDK and non-DPDK code by
> refactoring the definition of DP_PACKET_OL flags
> and relevant functions
>   - I got weird error in travis (I think this is unrelated)
> https://travis-ci.org/github/williamtu/ovs-travis/builds/661313463
> sindex.c:378:26: error: unknown type name ‘sqlite3_str’
> static int query_appendf(sqlite3_str *query, const char *fmt, ...)
>   - test compile ok on dpdk and non-dpdk, make check-system-tso still
> has a couple fails
>
> v3:
>   - fix comments and some coding style
>   - add valgrind check
>   - travis: https://travis-ci.org/williamtu/ovs-travis/builds/660394007
> v2:
>   - add make check-system-tso test
>   - combine logging for dpdk and non-dpdk
>   - I'm surprised that most of the test cases passed.
> This is due to few tests using tcp/udp, so it does not trigger
> TSO.  I saw only geneve/vxlan fails randomly, maybe we can
> check it later.
> ---
>  lib/dp-packet.c   |   6 +-
>  lib/dp-packet.h   | 572 
> +++---
>  lib/userspace-tso.c   |   5 -
>  tests/.gitignore  |   3 +
>  tests/automake.mk |  21 ++
>  tests/system-tso-macros.at|  31 +++
>  tests/system-tso-testsuite.at |  26 ++
>  7 files changed, 339 insertions(+), 325 deletions(-)
>  create mode 100644 tests/system-tso-macros.at
>  create mode 100644 tests/system-tso-testsuite.at
>
> diff --git a/lib/dp-packet.c b/lib/dp-packet.c
> index cd2623500e3d..72f6d09ac7f3 100644
> --- a/lib/dp-packet.c
> +++ b/lib/dp-packet.c
> @@ -192,10 +192,8 @@ dp_packet_clone_with_headroom(const struct dp_packet 
> *buffer, size_t headroom)
>  sizeof(struct dp_packet) -
>  offsetof(struct dp_packet, l2_pad_size));
>
> -#ifdef DPDK_NETDEV
> -new_buffer->mbuf.ol_flags = buffer->mbuf.ol_flags;
> -new_buffer->mbuf.ol_flags &= ~DPDK_MBUF_NON_OFFLOADING_FLAGS;
> -#endif
> +*dp_packet_ol_flags_ptr(new_buffer) = *dp_packet_ol_flags_ptr(buffer);
> +*dp_packet_ol_flags_ptr(new_buffer) &= DP_PACKET_OL_SUPPORTED_MASK;
>
>  if (dp_packet_rss_valid(buffer)) {
>  dp_packet_set_rss_hash(new_buffer, dp_packet_get_rss_hash(buffer));
> diff --git a/lib/dp-packet.h b/lib/dp-packet.h
> index 9f8991faad52..4c127e759e6d 100644
> --- a/lib/dp-packet.h
> +++ b/lib/dp-packet.h
> @@ -47,19 +47,62 @@ enum OVS_PACKED_ENUM dp_packet_source {
>  };
>
>  #define DP_PACKET_CONTEXT_SIZE 64
> +#ifdef DPDK_NETDEV
> +#define DEF_OL_FLAG(NAME, DPDK_DEF, GENERIC_DEF) NAME = DPDK_DEF
> +#else
> +#define DEF_OL_FLAG(NAME, DPDK_DEF, GENERIC_DEF) NAME = GENERIC_DEF
> +#endif
>
> -#ifndef DPDK_NETDEV
>  /* Bit masks for the 'ol_flags' member of the 'dp_packet' structure. */
>  enum dp_packet_offload_mask {
> -DP_PACKET_OL_RSS_HASH_MASK  = 0x1, /* Is the 'rss_hash' valid? */
> -DP_PACKET_OL_FLOW_MARK_MASK = 0x2, /* Is the 'flow_mark' valid? */
> +/* Value 0 is not used. */
> +/* Is the 'rss_hash' valid? */
> +DEF_OL_FLAG(DP_PACKET_OL_RSS_HASH, PKT_RX_RSS_HASH, 0x1),
> +/* Is the 'flow_mark' valid? (DPDK does not support) */
> +DEF_OL_FLAG(DP_PACKET_OL_FLOW_MARK, PKT_RX_FDIR_ID, 0x2),
> +/* Bad L4 c

Re: [ovs-dev] [PATCHv5] netdev-afxdp: Add interrupt mode netdev class.

2020-04-27 Thread William Tu
On Tue, Apr 14, 2020 at 6:23 AM William Tu  wrote:
>
> The patch adds a new netdev class 'afxdp-nonpmd' to enable afxdp
> interrupt mode. This is similar to 'type=afxdp', except that the
> is_pmd field is set to false. As a result, the packet processing
> is handled by main thread, not pmd thread. This avoids burning
> the CPU to always 100% when there is no traffic.
>
> Signed-off-by: William Tu 
> ---
> v5:
>   - add NETDEV_AFXDP_CLASS_COMMON
> v4:
>   - Previously crash and fix it with qid = qid % netdev_n_txq(netdev)
> Now remove it because Ilya's fix:
>   "dpif-netdev: Force port reconfiguration to change dynamic_txqs."
> ---
>  NEWS  |  3 +++
>  lib/netdev-linux.c| 37 +++--
>  lib/netdev-provider.h |  1 +
>  lib/netdev.c  |  1 +
>  tests/system-afxdp.at | 23 +++
>  5 files changed, 51 insertions(+), 14 deletions(-)
>
> diff --git a/NEWS b/NEWS
> index 70bd17584594..6db2d993ffdb 100644
> --- a/NEWS
> +++ b/NEWS
> @@ -10,6 +10,9 @@ Post-v2.13.0
>   * Deprecated DPDK ring ports (dpdkr) are no longer supported.
> - Linux datapath:
>   * Support for kernel versions up to 5.5.x.
> +   - AF_XDP:
> + * New netdev class 'afxdp-nonpmd' for netdev-afxdp to save CPU cycles
> +   by enabling interrupt mode.
>
>
>  v2.13.0 - 14 Feb 2020
> diff --git a/lib/netdev-linux.c b/lib/netdev-linux.c
> index ff045cb1290b..1d7ed0145c48 100644
> --- a/lib/netdev-linux.c
> +++ b/lib/netdev-linux.c
> @@ -3599,24 +3599,33 @@ const struct netdev_class netdev_internal_class = {
>  };
>
>  #ifdef HAVE_AF_XDP
> +#define NETDEV_AFXDP_CLASS_COMMON   \
> +.construct = netdev_afxdp_construct,\
> +.destruct = netdev_afxdp_destruct,  \
> +.get_stats = netdev_afxdp_get_stats,\
> +.get_custom_stats = netdev_afxdp_get_custom_stats,  \
> +.get_status = netdev_linux_get_status,  \
> +.set_config = netdev_afxdp_set_config,  \
> +.get_config = netdev_afxdp_get_config,  \
> +.reconfigure = netdev_afxdp_reconfigure,\
> +.get_numa_id = netdev_linux_get_numa_id,\
> +.send = netdev_afxdp_batch_send,\
> +.rxq_construct = netdev_afxdp_rxq_construct,\
> +.rxq_destruct = netdev_afxdp_rxq_destruct,  \
> +.rxq_recv = netdev_afxdp_rxq_recv
> +
>  const struct netdev_class netdev_afxdp_class = {
>  NETDEV_LINUX_CLASS_COMMON,
> +NETDEV_AFXDP_CLASS_COMMON,
>  .type = "afxdp",
>  .is_pmd = true,
> -.init = netdev_afxdp_init,
> -.construct = netdev_afxdp_construct,
> -.destruct = netdev_afxdp_destruct,
> -.get_stats = netdev_afxdp_get_stats,
> -.get_custom_stats = netdev_afxdp_get_custom_stats,
> -.get_status = netdev_linux_get_status,
> -.set_config = netdev_afxdp_set_config,
> -.get_config = netdev_afxdp_get_config,
> -.reconfigure = netdev_afxdp_reconfigure,
> -.get_numa_id = netdev_linux_get_numa_id,
> -.send = netdev_afxdp_batch_send,
> -.rxq_construct = netdev_afxdp_rxq_construct,
> -.rxq_destruct = netdev_afxdp_rxq_destruct,
> -.rxq_recv = netdev_afxdp_rxq_recv,
> +};
> +
> +const struct netdev_class netdev_afxdp_nonpmd_class = {
> +NETDEV_LINUX_CLASS_COMMON,
> +NETDEV_AFXDP_CLASS_COMMON,
> +.type = "afxdp-nonpmd",
> +.is_pmd = false,
>  };
>  #endif
>
> diff --git a/lib/netdev-provider.h b/lib/netdev-provider.h
> index 6f509424bc81..d9503adb0fb6 100644
> --- a/lib/netdev-provider.h
> +++ b/lib/netdev-provider.h
> @@ -850,6 +850,7 @@ extern const struct netdev_class netdev_tap_class;
>
>  #ifdef HAVE_AF_XDP
>  extern const struct netdev_class netdev_afxdp_class;
> +extern const struct netdev_class netdev_afxdp_nonpmd_class;
>  #endif
>  #ifdef  __cplusplus
>  }
> diff --git a/lib/netdev.c b/lib/netdev.c
> index 8c44eee8e98a..90962eec66cf 100644
> --- a/lib/netdev.c
> +++ b/lib/netdev.c
> @@ -154,6 +154,7 @@ netdev_initialize(void)
>  netdev_register_flow_api_provider(&netdev_offload_tc);
>  #ifdef HAVE_AF_XDP
>  netdev_register_provider(&netdev_afxdp_class);
> +netdev_register_provider(&netdev_afxdp_nonpmd_class);
>  #endif
>  #endif
>  #if defined(__FreeBSD__) || defined(__NetBSD__)
> diff --git a/tests/system-afxdp.at b/tests/system-afxdp.at
> index e4451624f882..0d09906fb6c8 100644
> --- a/tests/system-

[ovs-dev] [PATCH] AUTHORS: Add Anton Ivanov.

2020-04-27 Thread William Tu
Signed-off-by: William Tu 
---
 AUTHORS.rst | 1 +
 1 file changed, 1 insertion(+)

diff --git a/AUTHORS.rst b/AUTHORS.rst
index 61a3f6117900..5d83d309ccd5 100644
--- a/AUTHORS.rst
+++ b/AUTHORS.rst
@@ -60,6 +60,7 @@ Andy Zhou  az...@ovn.org
 Ankur Sharma   ankursha...@vmware.com
 Anoob Somananoob.so...@citrix.com
 Ansis Atteka   aatt...@vmware.com
+Anton Ivanov   anton.iva...@cambridgegreys.com
 Antonio Fischetti  antonio.fische...@intel.com
 Anupam Chanda
 Ariel Tubaltsevatubalt...@vmware.com
-- 
2.7.4

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCH] docs: Fix GTP-U release version.

2020-04-27 Thread William Tu
GTP-U support should be at OVS-2.14.

Signed-off-by: William Tu 
---
 Documentation/faq/releases.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Documentation/faq/releases.rst b/Documentation/faq/releases.rst
index b3507bd1c7fa..f170ebd3ff22 100644
--- a/Documentation/faq/releases.rst
+++ b/Documentation/faq/releases.rst
@@ -131,7 +131,7 @@ Q: Are all features available with all datapaths?
 Tunnel - Geneve-IPv64.42.6  2.6  NO
 Tunnel - ERSPAN 4.18   2.10 2.10 NO
 Tunnel - ERSPAN-IPv64.18   2.10 2.10 NO
-Tunnel - GTP-U  NO NO   2.13 NO
+Tunnel - GTP-U  NO NO   2.14 NO
 QoS - Policing  YES1.1  2.6  NO
 QoS - Shaping   YES1.1  NO   NO
 sFlow   YES1.0  1.0  NO
-- 
2.7.4

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


[ovs-dev] [PATCHv4 2/2] userspace: Add conntrack timeout policy support.

2020-04-27 Thread William Tu
Commit 1f1613183733 ("ct-dpif, dpif-netlink: Add conntrack timeout
policy support") adds conntrack timeout policy for kernel datapath.
This patch enables support for the userspace datapath.  I tested
using the 'make check-system-userspace' which checks the timeout
policies for ICMP and UDP cases.

Signed-off-by: William Tu 
---
v4: address feedback from Yi-Hung
  - move default policy value to lib/conntrack-tp.c
  - separate icmp bug patch
  - refactor and fix include issues
  - fix the clang lock analysis annotation
  - keep clean interval to 5 seconds
  - improve tests in system-traffic.at
  - travis: https://travis-ci.org/github/williamtu/ovs-travis/builds/680158645

v3:
  - address feedback from Yi-Hung
  - use ID 0 as default policy
  - move conn_{init,update}_expiration to lib/conntrack-tp.c
  - s/tpid/tp_id/
  - add default timeout value to CT_DPIF_TP_*_ATTRs
  - reduce the CT_CLEAN_INTERVAL from 5 to 3s, to make the tests
run faster.
  - add more tests to system-traffic.at
  - code refactoring and renaming
---
 Documentation/faq/releases.rst   |   2 +-
 NEWS |   2 +
 lib/automake.mk  |   2 +
 lib/conntrack-icmp.c |   6 +-
 lib/conntrack-other.c|   4 +-
 lib/conntrack-private.h  |  70 +++--
 lib/conntrack-tcp.c  |   5 +-
 lib/conntrack-tp.c   | 301 +++
 lib/conntrack-tp.h   |  30 
 lib/conntrack.c  |  37 ++---
 lib/conntrack.h  |   8 +-
 lib/ct-dpif.h|   2 +
 lib/dpif-netdev.c|  75 +-
 ofproto/ofproto-dpif.c   |   3 +-
 tests/system-traffic.at  |  29 +++-
 tests/system-userspace-macros.at |   6 +-
 tests/test-conntrack.c   |   6 +-
 17 files changed, 493 insertions(+), 95 deletions(-)
 create mode 100644 lib/conntrack-tp.c
 create mode 100644 lib/conntrack-tp.h

diff --git a/Documentation/faq/releases.rst b/Documentation/faq/releases.rst
index b3507bd1c7fa..4884515446d7 100644
--- a/Documentation/faq/releases.rst
+++ b/Documentation/faq/releases.rst
@@ -118,7 +118,7 @@ Q: Are all features available with all datapaths?
 == == == = ===
 Connection tracking 4.32.5  2.6  YES
 Conntrack Fragment Reass.   4.32.6  2.12 YES
-Conntrack Timeout Policies  5.22.12 NO   NO
+Conntrack Timeout Policies  5.22.12 2.14 NO
 Conntrack Zone Limit4.18   2.10 2.13 YES
 Conntrack NAT   4.62.6  2.8  YES
 Tunnel - LISP   NO 2.11 NO   NO
diff --git a/NEWS b/NEWS
index 70bd17584594..1e6af8f57bdd 100644
--- a/NEWS
+++ b/NEWS
@@ -10,6 +10,8 @@ Post-v2.13.0
  * Deprecated DPDK ring ports (dpdkr) are no longer supported.
- Linux datapath:
  * Support for kernel versions up to 5.5.x.
+   - Userspace datapath:
+ * Add support for conntrack zone-based timeout policy.
 
 
 v2.13.0 - 14 Feb 2020
diff --git a/lib/automake.mk b/lib/automake.mk
index 95925b57c351..86940ccd2f9e 100644
--- a/lib/automake.mk
+++ b/lib/automake.mk
@@ -53,6 +53,8 @@ lib_libopenvswitch_la_SOURCES = \
lib/conntrack-icmp.c \
lib/conntrack-private.h \
lib/conntrack-tcp.c \
+   lib/conntrack-tp.c \
+   lib/conntrack-tp.h \
lib/conntrack-other.c \
lib/conntrack.c \
lib/conntrack.h \
diff --git a/lib/conntrack-icmp.c b/lib/conntrack-icmp.c
index 6cbf9656dd93..bf49f9a9fa93 100644
--- a/lib/conntrack-icmp.c
+++ b/lib/conntrack-icmp.c
@@ -22,6 +22,7 @@
 #include 
 
 #include "conntrack-private.h"
+#include "conntrack-tp.h"
 #include "dp-packet.h"
 
 enum OVS_PACKED_ENUM icmp_state {
@@ -79,12 +80,13 @@ icmp6_valid_new(struct dp_packet *pkt)
 
 static struct conn *
 icmp_new_conn(struct conntrack *ct, struct dp_packet *pkt OVS_UNUSED,
-  long long now)
+  long long now, uint32_t tp_id)
 {
 struct conn_icmp *conn = xzalloc(sizeof *conn);
 conn->state = ICMPS_FIRST;
-conn_init_expiration(ct, &conn->up, icmp_timeouts[conn->state], now);
+conn->up.tp_id = tp_id;
 
+conn_init_expiration(ct, &conn->up, icmp_timeouts[conn->state], now);
 return &conn->up;
 }
 
diff --git a/lib/conntrack-other.c b/lib/conntrack-other.c
index de22ef87cc19..d3b46018586c 100644
--- a/lib/conntrack-other.c
+++ b/lib/conntrack-other.c
@@ -17,6 +17,7 @@
 #include 
 
 #include "conntrack-private.h"
+#include "conntrack-tp.h"
 #include "dp-packet.h"
 
 enum OVS_PACKED_ENUM other_state {
@@ -69,12 +70,13 @@ other_valid_new(struct dp_packet *pkt OVS_UNUSED)
 
 static struct conn *
 other_new_conn(str

[ovs-dev] [PATCHv4 1/2] conntrack: Fix icmp conntrack state.

2020-04-27 Thread William Tu
ICMP conntrack state should be ICMPS_REPLY after seeing both
side of ICMP traffic.

Signed-off-by: William Tu 
---
 lib/conntrack-icmp.c | 7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/lib/conntrack-icmp.c b/lib/conntrack-icmp.c
index 63246f0124d0..6cbf9656dd93 100644
--- a/lib/conntrack-icmp.c
+++ b/lib/conntrack-icmp.c
@@ -50,9 +50,12 @@ icmp_conn_update(struct conntrack *ct, struct conn *conn_,
  struct dp_packet *pkt OVS_UNUSED, bool reply, long long now)
 {
 struct conn_icmp *conn = conn_icmp_cast(conn_);
-conn->state = reply ? ICMPS_REPLY : ICMPS_FIRST;
-conn_update_expiration(ct, &conn->up, icmp_timeouts[conn->state], now);
 
+if (reply && conn->state == ICMPS_FIRST) {
+   conn->state = ICMPS_REPLY;
+}
+
+conn_update_expiration(ct, &conn->up, icmp_timeouts[conn->state], now);
 return CT_UPDATE_VALID;
 }
 
-- 
2.7.4

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH] ovsdb: Remove duplicated function defintions

2020-04-27 Thread William Tu
On Tue, Apr 21, 2020 at 03:09:05PM -0700, Yi-Hung Wei wrote:
> ovsdb_function_from_string() and ovsdb_function_to_string() are defined
> both in ovsdb/condition.c and lib/ovsdb-condidtion.c with the same function
> definition.  Remove the one in ovsdb/condition.c to avoid duplication.
> 
> This also resolves the following bazel building error.
> 
> ./libopenvswitch.lo(ovsdb-condition.pic.o): In function 
> `ovsdb_function_from_string':
> /lib/ovsdb-condition.c:24: multiple definition of `ovsdb_function_from_string'
> ./libovsdb.a(condition.pic.o):/proc/self/cwd/external/openvswitch_repo/ovsdb/condition.c:34:
>  first defined here
> ./libopenvswitch.lo(ovsdb-condition.pic.o): In function 
> `ovsdb_function_from_string':
> ./lib/ovsdb-condition.c:24: multiple definition of `ovsdb_function_to_string'
> ./libovsdb.a(condition.pic.o):/proc/self/cwd/external/openvswitch_repo/ovsdb/condition.c:335
> 
> Reported-by: Harold Lim 
> Signed-off-by: Yi-Hung Wei 
> ---
> Travis-CI: https://travis-ci.org/github/YiHungWei/ovs/builds/677890120
Applied to master, thanks
William

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH] Fast path in vlog

2020-04-27 Thread William Tu
On Tue, Apr 21, 2020 at 08:57:04AM -0700, William Tu wrote:
> On Tue, Apr 21, 2020 at 09:24:38AM +0100, anton.iva...@cambridgegreys.com 
> wrote:
> > From: Anton Ivanov 
> > 
> > Avoid grabbing any mutexes if the log levels specify that
> > no logging is to take place.
> > 
> > Signed-off-by: Anton Ivanov 
> > ---
> Looks good to me, thanks
> Acked-by: William Tu 
Applied to master, thanks.

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH] Switch ovsdb log fsync to data only

2020-04-27 Thread William Tu
On Tue, Apr 21, 2020 at 05:24:01PM +0100, Anton Ivanov wrote:
> 
> On 21/04/2020 17:04, William Tu wrote:
> >On Tue, Apr 21, 2020 at 09:23:57AM +0100, anton.iva...@cambridgegreys.com 
> >wrote:
> >>From: Anton Ivanov 
> >>
> >>We do not check metadata - mtime, atime, anywhere, so we
> >>do not need to update it every time we sync the log.
> >>if the system supports it, the log update should be
> >>data only
> >>
> >>Signed-off-by: Anton Ivanov 
> >LGTM,
> >But how do you know we do not check mtime or atime of the ovsdb log file?
> 
> By searching through the code :)
> 
> stat and stat64 are fairly easy to grep for and so is their use. They are 
> mostly confined to ssl-stream.c
> 
> >If there isn't a lot of performance overhead updating the metadata,
> >why not keep it as it is now?
> 
> The performance overhead on a spinning rust disk is massive - it is several 
> times. In fact, you can hear it. My source trees are on a SAS 7.2K RPM array 
> and with the current upstream the testsuite sounds like a hectic Call of Duty 
> bout.
> 
> With this, the sound becomes much "smoother" - you can hear that the disks 
> are no longer searching like crazy.
> 
> On a non-rotational - not so much, but still in the 10s of percent. The 
> kernel has to issue at least two sets of requests instead of one and there is 
> quite a bit of locking and barriers on metadata updates.
> 
> A.
Applied to master, thanks.

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] [PATCH] Bareudp Tunnel Support

2020-04-26 Thread William Tu
On Mon, Apr 27, 2020 at 09:02:16AM +0530, Martin Varghese wrote:
> On Sat, Apr 25, 2020 at 06:40:18AM -0700, William Tu wrote:
> > On Fri, Apr 24, 2020 at 02:54:00PM +0530, Martin Varghese wrote:
> > > On Fri, Apr 24, 2020 at 01:06:21AM -0700, Pravin Shelar wrote:
> > > > On Sun, Apr 19, 2020 at 8:11 PM Martin Varghese
> > > >  wrote:
> > > > >
> > > > > From: Martin Varghese 
> > > > >
> > > > > UDP tunnel encapsulation module for tunnelling different protocols 
> > > > > like
> > > > > MPLS, IP, NSH etc
> > > > >
> > > > > The Bareudp tunnel module provides a generic UDP L3 encapsulation
> > > > > tunnelling module for tunnelling different protocols like MPLS,IP,NSH 
> > > > > etc.
> > > > > inside a UDP tunnel.
> > > > >
> > > > > Signed-off-by: Martin Varghese 
> > > > > ---
> > > > >  Documentation/automake.mk  |   1 +
> > > > >  Documentation/faq/bareudp.rst  |  62 ++
> > > > >  Documentation/faq/index.rst|   1 +
> > > > >  Documentation/faq/releases.rst |   1 +
> > > > >  NEWS   |   3 +-
> > > > >  datapath/Modules.mk|   4 +-
> > > > >  datapath/linux/Modules.mk  |   2 +
> > > > >  datapath/linux/compat/bareudp.c| 820 
> > > > > +
> > > > >  datapath/linux/compat/include/linux/if_link.h  |  11 +
> > > > >  datapath/linux/compat/include/linux/openvswitch.h  |  11 +
> > > > >  datapath/linux/compat/include/net/bareudp.h|  59 ++
> > > > >  datapath/linux/compat/include/net/ip6_tunnel.h |   9 +
> > > > >  datapath/linux/compat/include/net/ip_tunnels.h |   7 +
> > > > >  datapath/linux/compat/ip6_tunnel.c |  60 ++
> > > > >  datapath/linux/compat/ip_tunnel.c  |  47 ++
> > > > >  datapath/vport-bareudp.c   | 202 +
> > > > >  datapath/vport.c   |  11 +-
> > > > >  lib/dpif-netlink-rtnl.c|  53 ++
> > > > >  lib/dpif-netlink.c |  10 +
> > > > >  lib/netdev-vport.c |  25 +-
> > > > >  lib/netdev.h   |   1 +
> > > > >  ofproto/ofproto-dpif-xlate.c   |   1 +
> > > > >  rhel/openvswitch-kmod-fedora.spec.in   |   2 +-
> > > > >  ...sr_share_openvswitch_scripts_ovs-kmod-manage.sh |   2 +-
> > > > >  tests/automake.mk  |   2 +-
> > > > >  tests/system-layer3-tunnels.at |  47 ++
> > > > >  utilities/ovs-dev.py   |   1 +
> > > > >  27 files changed, 1447 insertions(+), 8 deletions(-)
> > > > >  create mode 100644 Documentation/faq/bareudp.rst
> > > > >  create mode 100644 datapath/linux/compat/bareudp.c
> > > > >  create mode 100644 datapath/linux/compat/include/net/bareudp.h
> > > > >  create mode 100644 datapath/vport-bareudp.c
> > > > >
> > > > I do not see need to have vport-bareudp module. we can directly use
> > > > bareudp dev from upstream kernel or from ovs compat module. Current
> > > > vport modules are there due to legacy reasons. All new tunnel
> > > > implementation should follow new design in which all tunnel devices
> > > > are netdevices.
> > > >
> > > if flag ovs_tunnels_out_of_tree is true, the old genetlink interface is 
> > > used corret? 
> > Then it uses the kernel module under ovs/datapath/* , not the upstream 
> > kernel.
> > In your case, it load the code from datapath/linux/compat/bareudp.c
> >
> 
> The new convention is to move the ovs_vport_ops_register to bareudp.c and 
> hence there
> is no need of a seperate vport-bareudp module.But unlike devices from upstream
> the vport type of bareudp device in compact should be OVS_VPORT_TYPE_BAREDUP 
> instead
> of OVS_VPORT_TYPE_NETDEVICE in upstream device
> 
> Is the above undertanding correct ? 
Yes.

> > > How the rtnetlink interface will be used in that use .I am missing 
> > > something here.
> > > Is there any tunnel device which does the new  way with ovs-kernel tree 
> > > installed ?  
> > 
> > Yes, take a look at ERSPAN tunnel.
> > 
> unlike vxlan, & geneve the ERSPAN device regiters itself with rtnl with same
> name as the upstream device "ip6erspan" . Will it not create a conflict ?

Yes, so only one module can be loaded.
Either the upstream kernel one or the OVS compat.

William

___
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Re: [ovs-dev] 答复: [PATCH] netdev-linux: remove sum of vport stats and kernel netdev stats

2020-04-26 Thread William Tu
On Mon, Apr 27, 2020 at 02:00:38AM +, 姜立东 wrote:
> Hi William,
> 
> Thanks for your comments. 
> Agree with you, those counters such as collisions and multicast should be 
> kept as current implementation, 
> since they are don't provided by vport stats. 
> 
> I also suggest to remove rx/tx bytes and packets, rx/tx error and drops as 
> well, only count physical or hardware stats in,
> because it is confusing when checking with netdev_stats_from_ovs_vport_stats, 
> if netdev_stats_from_ovs_vport_stats 
> has provided them, why they are copied again.
> 
> What do you think about change as below? 
> 
> diff --git a/lib/netdev-linux.c b/lib/netdev-linux.c
> index ff045cb..b851236 100644
> --- a/lib/netdev-linux.c
> +++ b/lib/netdev-linux.c
> @@ -2208,18 +2208,6 @@ netdev_linux_get_stats(const struct netdev *netdev_,
>  /* stats not available from OVS then use netdev stats. */
>  *stats = dev_stats;
>  } else {
> -/* Use kernel netdev's packet and byte counts since vport's counters
> - * do not reflect packet counts on the wire when GSO, TSO or GRO are
> - * enabled. */
But the comments above says the reason using the counter below.
> -stats->rx_packets = dev_stats.rx_packets;
> -stats->rx_bytes = dev_stats.rx_bytes;
> -stats->tx_packets = dev_stats.tx_packets;
> -stats->tx_bytes = dev_stats.tx_bytes;
> -

How about we just remove the below 4 stats?
> -stats->rx_errors   += dev_stats.rx_errors;
> -stats->tx_errors   += dev_stats.tx_errors;
> -stats->rx_dropped  += dev_stats.rx_dropped;
> -stats->tx_dropped  += dev_stats.tx_dropped;
>  stats->multicast   += dev_stats.multicast;
>  stats->collisions      += dev_stats.collisions;
>  stats->rx_length_errors+= dev_stats.rx_length_errors;
> 
> BR,
> Lidong
> 
> -邮件原件-
> 发件人: William Tu  
> 发送时间: 2020年4月25日 22:31
> 收件人: 姜立东 
> 抄送: d...@openvswitch.org
> 主题: Re: [ovs-dev] [PATCH] netdev-linux: remove sum of vport stats and kernel 
> netdev stats
> 
> On Thu, Apr 23, 2020 at 05:35:14AM +, 姜立东 via dev wrote:
> > From df9ff3b67f11e467928ca0873031d81b87f3d0c5 Mon Sep 17 00:00:00 2001
> > From: Jiang Lidong 
> > Date: Thu, 23 Apr 2020 11:07:28 +0800
> > Subject: [PATCH] netdev-linux: remove sum of vport stats and kernel netdev 
> > stats
> > When using kernel veth as OVS interface, doubled drop counter
> > value is shown when veth drops packets due to traffic overrun.
> > 
> > In netdev_linux_get_stats, it reads both vport stats and kernel
> > netdev stats, in case vport stats retrieve failure. If both of
> > them success, error counters are added to include errors from
> > different layers. But implementation of ovs_vport_get_stats in
> > kernel data path has included kernel netdev stats by calling
> > dev_get_stats. When drop or other error counters is not zero,
> > its value is doubled by netdev_linux_get_stats.
> > 
> > In this change, adding kernel netdev stats into vport stats
> > is removed, since vport stats includes all information of
> > kernel netdev stats.
> > 
> > Signed-off-by: Jiang Lidong 
> > ---
> >  lib/netdev-linux.c | 27 +--
> >  1 file changed, 1 insertion(+), 26 deletions(-)
> > 
> > diff --git a/lib/netdev-linux.c b/lib/netdev-linux.c index 
> > ff045cb..6d139d0 100644
> > --- a/lib/netdev-linux.c
> > +++ b/lib/netdev-linux.c
> > @@ -2207,33 +2207,8 @@ netdev_linux_get_stats(const struct netdev *netdev_,
> >  } else if (netdev->vport_stats_error) {
> >  /* stats not available from OVS then use netdev stats. */
> >  *stats = dev_stats;
> > -} else {
> > -/* Use kernel netdev's packet and byte counts since vport's 
> > counters
> > - * do not reflect packet counts on the wire when GSO, TSO or GRO 
> > are
> > - * enabled. */
> > -stats->rx_packets = dev_stats.rx_packets;
> > -stats->rx_bytes = dev_stats.rx_bytes;
> > -stats->tx_packets = dev_stats.tx_packets;
> > -stats->tx_bytes = dev_stats.tx_bytes;
> > -
> > -stats->rx_errors   += dev_stats.rx_errors;
> > -stats->tx_errors   += dev_stats.tx_errors;
> > -stats->rx_dropped  += dev_stats.rx_dropped;
> > -stats->tx_dropped  += dev_stats.tx_dropped;
> 
> Thanks for re

<    1   2   3   4   5   6   7   8   9   10   >