> -----Original Message-----
> From: William Tu <[email protected]>
> Sent: Wednesday, May 20, 2020 1:12 AM
> To: Van Haaren, Harry <[email protected]>
> Cc: [email protected]; [email protected]
> Subject: Re: [ovs-dev] [PATCH v2 5/5] dpif-lookup: add avx512 gather
> implementation
>
> On Mon, May 18, 2020 at 9:12 AM Van Haaren, Harry
> <[email protected]> wrote:
> >
> > > -----Original Message-----
> > > From: William Tu <[email protected]>
> > > Sent: Monday, May 18, 2020 3:58 PM
> > > To: Van Haaren, Harry <[email protected]>
> > > Cc: [email protected]; [email protected]
> > > Subject: Re: [ovs-dev] [PATCH v2 5/5] dpif-lookup: add avx512 gather
> > > implementation
> > >
> > > On Wed, May 06, 2020 at 02:06:09PM +0100, Harry van Haaren wrote:
> > > > This commit adds an AVX-512 dpcls lookup implementation.
> > > > It uses the AVX-512 SIMD ISA to perform multiple miniflow
> > > > operations in parallel.
<snip lots of code/patch contents for readability>
> Hi Harry,
>
> I managed to find a machine with avx512 in google cloud and did some
> performance testing. I saw lower performance when enabling avx512,
> I believe I did something wrong. Do you mind having a look:
>
> 1) first a compile error
> diff --git a/lib/dpif-netdev-lookup.c b/lib/dpif-netdev-lookup.c
> index b22a26b8c8a2..5c71096c10c5 100644
> --- a/lib/dpif-netdev-lookup.c
> +++ b/lib/dpif-netdev-lookup.c
> @@ -1,5 +1,6 @@
>
> #include <config.h>
> +#include <errno.h>
> #include "dpif-netdev-lookup.h"
>
> #include "openvswitch/vlog.h"
Existing code compiles fine here - but I've added this in the v3, thanks for
flagging.
> 2) cpuinfo
> flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov
> pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm
> constant_tsc rep_good nopl xtopology nonstop_tsc cpuid tsc_known_freq
> pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt
> aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch
> invpcid_single pti ssbd ibrs ibpb stibp fsgsbase tsc_adjust bmi1 hle
> avx2 smep bmi2 erms invpcid rtm mpx avx512f avx512dq rdseed adx smap
> clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1
> xsaves arat md_clear arch_capabilities
The avx512f and dq/cd/bw/vl extensions indicate AVX512 is available on this
machine; all looks good so far.
>
> 3) start ovs and set avx and traffic gen
> ovs-appctl dpif-netdev/subtable-lookup-set avx512_gather 5
> ovs-vsctl add-port br0 tg0 -- set int tg0 type=dpdk
> options:dpdk-devargs=vdev:net_pcap0,rx_pcap=/root/ovs/p0.pcap,infinite_rx=1
The first command (enabling the AVX512 lookup) logs some output at INFO level;
please ensure it's there? Here it shows:
2020-05-20T09:39:09Z|00262|dpif_netdev_lookup|INFO|Subtable function
'avx512_gather' set priority to 4
2020-05-20T09:39:09Z|00006|dpif_netdev(pmd-c15/id:99)|INFO|reprobing sub func,
5 1
> 4) dp flows with miniflow info
> root@instance-3:~/ovs# ovs-appctl dpctl/dump-flows -m
> flow-dump from pmd on cpu core: 0
> ufid:caf11111-2e15-418c-a7d4-b4ec377593ca,
> skb_priority(0/0),skb_mark(0/0),ct_state(0/0),ct_zone(0/0),ct_mark(0/0),ct_label(
> 0/0),recirc_id(0),dp_hash(0/0),in_port(tg0),packet_type(ns=0,id=0),eth(src=42:01:
> 0a:b6:00:02,dst=42:01:0a:b6:00:01),eth_type(0x0800),ipv4(src=10.182.0.2/0.0.0.0,
> dst=76.21.95.192/0.0.0.0,proto=6/0,tos=0x10/0,ttl=64/0,frag=no),tcp(src=22/0,ds
> t=62190/0),tcp_flags(0/0),
> packets:0, bytes:0, used:never, dp:ovs, actions:drop,
> dp-extra-info:miniflow_bits(5,1)
> ufid:78cc1751-3a81-4dba-900c-b3507d965bdc,
> skb_priority(0/0),skb_mark(0/0),ct_state(0/0),ct_zone(0/0),ct_mark(0/0),ct_label(
> 0/0),recirc_id(0),dp_hash(0/0),in_port(tg0),packet_type(ns=0,id=0),eth(src=42:01:
> 0a:b6:00:02,dst=42:01:0a:b6:00:01),eth_type(0x0800),ipv4(src=10.182.0.2/0.0.0.0,
> dst=169.254.169.254/0.0.0.0,proto=6/0,tos=0/0,ttl=64/0,frag=no),tcp(src=51650/
> 0,dst=80/0),tcp_flags(0/0),
> packets:0, bytes:0, used:never, dp:ovs, actions:drop,
> dp-extra-info:miniflow_bits(5,1)
The "packets:0, bytes:0, used:never" fields seem to indicate that no traffic
is hitting these rules at all?
Output here (with traffic running for a while) shows:
packets:621588543, bytes:37295312580, used:0.000s, dp:ovs, actions:dpdk1,
dp-extra-info:miniflow_bits(4,1)
> 5) pmd-stat-show
> root@instance-3:~/ovs# ovs-appctl dpif-netdev/pmd-stats-show
> pmd thread numa_id 0 core_id 0:
> packets received: 19838528
> packet recirculations: 0
> avg. datapath passes per packet: 1.00
> emc hits: 0
> smc hits: 0
> megaflow hits: 0
> avg. subtable lookups per megaflow hit: 0.00 (---> this doesn't
> look right ....)
> miss with success upcall: 78
> miss with failed upcall: 19838418
> avg. packets per output batch: 2.00
> idle cycles: 0 (0.00%)
> processing cycles: 103431787838 (100.00%)
> avg cycles per packet: 5213.68 (103431787838/19838528)
> avg processing cycles per packet: 5213.68 (103431787838/19838528)
Would you try the pmd-stats-show command before setting the AVX512 lookup?
If the issue is still present, that would indicate it's not related to the
exact lookup implementation.
Running the same test on master, on the patchset using the scalar lookup, and
on the patchset using the AVX512 lookup all gives a valid megaflow hits count,
exactly equal to (packets_rx - 1), as the first packet goes to the upcall to
install the rule (and hence doesn't hit the rule :)
$ ./utilities/ovs-appctl dpif-netdev/pmd-stats-show
### Master
pmd thread numa_id 0 core_id 15:
packets received: 503095
packet recirculations: 0
avg. datapath passes per packet: 1.00
emc hits: 0
smc hits: 0
megaflow hits: 503094
avg. subtable lookups per megaflow hit: 1.00
### Scalar Lookup
$ ./utilities/ovs-appctl dpif-netdev/pmd-stats-show
pmd thread numa_id 0 core_id 15:
packets received: 508759
packet recirculations: 0
avg. datapath passes per packet: 1.00
emc hits: 0
smc hits: 0
megaflow hits: 508758
avg. subtable lookups per megaflow hit: 1.00
### AVX512 Lookup
./utilities/ovs-appctl dpif-netdev/pmd-stats-show
pmd thread numa_id 0 core_id 15:
packets received: 540311
packet recirculations: 0
avg. datapath passes per packet: 1.00
emc hits: 0
smc hits: 0
megaflow hits: 540310
avg. subtable lookups per megaflow hit: 1.00
> 6) gdb also looks not right..., I didn't see any avx512 instructions
> (gdb) b avx512_lookup_impl
> Breakpoint 2 at 0x55e92342a8df: avx512_lookup_impl. (4 locations)
> Dump of assembler code for function dpcls_avx512_gather_skx_mf_5_1:
> 96 const uint64_t tbl_u0 = subtable->mask.mf.map.bits[0];
> 0x000055e92342a8df <+31>: mov 0x30(%rdi),%r8
> 97 const uint64_t tbl_u1 = subtable->mask.mf.map.bits[1];
> 0x000055e92342a8e3 <+35>: mov 0x38(%rdi),%r9
> 98 ovs_assert(__builtin_popcountll(tbl_u0) == bit_count_u0);
<snip some ASM>
(gdb) disas dpcls_avx512_gather_skx_mf_5_1
<snip preamble>
0x0000555556103f34 <+724>: vmovdqu64 0x28(%rdi),%zmm2{%k1}{z}
0x0000555556103f3e <+734>: vmovdqu64 0x18(%rcx),%zmm0{%k1}{z}
0x0000555556103f48 <+744>: vpandd %zmm0,%zmm1,%zmm0
0x0000555556103f4e <+750>: vpcmpeqq %zmm2,%zmm0,%k7{%k1}
Disassembly here shows AVX512 register usage, as expected.
Note that "avx512_lookup_impl" is a static function in a .c file, so it is not
visible outside the compilation unit. Further, it is also marked
"ALWAYS_INLINE", so even inside the compilation unit there isn't a symbol with
that name. I'm surprised GDB let me set a breakpoint on it. Disassembling it
doesn't work:
(gdb) b avx512_lookup_impl
Breakpoint 2 at 0x5555561035af: avx512_lookup_impl. (4 locations)
(gdb) disas avx512_lookup_impl
No symbol "avx512_lookup_impl" in current context.
The functions it is inlined into are available for disassembly, as their
symbols do exist in the binary. (Sidenote: going to add the dpcls_ prefix to
the _any function, for consistency in naming with the others):
dpcls_avx512_gather_skx_mf_4_0
dpcls_avx512_gather_skx_mf_4_1
dpcls_avx512_gather_skx_mf_5_1
avx512_gather_any
Disassembling the _any version of the avx512 lookup function here
shows the AVX512 instructions, using ZMM registers and {k} masks.
(gdb) disas avx512_gather_mf_any
Dump of assembler code for function avx512_gather_mf_any:
0x0000555556103fb0 <+0>: lea 0x8(%rsp),%r10
0x0000555556103fb5 <+5>: and $0xffffffffffffffc0,%rsp
0x0000555556103fb9 <+9>: pushq -0x8(%r10)
<skipping preamble/pushes etc, to the fun AVX512 part>
0x00005555561040dd <+301>: vpandd %zmm0,%zmm5,%zmm0
0x00005555561040e3 <+307>: or %rdi,%rax
0x00005555561040e6 <+310>: test %r8,%r8
0x00005555561040e9 <+313>: kmovb %eax,%k4
0x00005555561040ed <+317>: vpsrlq $0x4,%zmm0,%zmm2
0x00005555561040f4 <+324>: vpandd %zmm3,%zmm0,%zmm0
0x00005555561040fa <+330>: vpandd %zmm2,%zmm3,%zmm2
0x0000555556104100 <+336>: vpshufb %zmm0,%zmm4,%zmm0
0x0000555556104106 <+342>: vpshufb %zmm2,%zmm4,%zmm2
0x000055555610410c <+348>: vpaddb %zmm2,%zmm0,%zmm0
0x0000555556104112 <+354>: vpsadbw %zmm7,%zmm0,%zmm0
0x0000555556104118 <+360>: vpaddq %zmm1,%zmm0,%zmm0
0x000055555610411e <+366>: vmovdqa64 %zmm8,%zmm1
0x0000555556104124 <+372>: vpgatherqq 0x18(%r9,%zmm0,8),%zmm1{%k3}
0x000055555610412c <+380>: vpandq %zmm6,%zmm1,%zmm0{%k4}{z}
Would you try some of the above and see if it can be reproduced?
Regards, -Harry
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev