> -----Original Message-----
> From: Flavio Leitner <[email protected]>
> Sent: Thursday 15 July 2021 19:58
> To: Ferriter, Cian <[email protected]>
> Cc: [email protected]; [email protected]
> Subject: Re: [ovs-dev] [v15 06/10] dpif-netdev: Add a partial HWOL PMD
> statistic.
>
> On Thu, Jul 15, 2021 at 01:39:04PM +0000, Ferriter, Cian wrote:
> >
> >
> > > -----Original Message-----
> > > From: Flavio Leitner <[email protected]>
> > > Sent: Friday 9 July 2021 18:54
> > > To: Ferriter, Cian <[email protected]>
> > > Cc: [email protected]; [email protected]
> > > Subject: Re: [ovs-dev] [v15 06/10] dpif-netdev: Add a partial HWOL PMD
> > > statistic.
> > >
> > >
> > >
> > > Hi,
> > >
> > > After rebasing, the performance of branch master boosted in my env
> > > from 12Mpps to 13Mpps. However, this specific patch brings down
> > > to 12Mpps. I am using dpif_scalar and generic lookup (no AVX512).
> > >
> >
> > Thanks for the investigation. Always great seeing perf numbers and details!
> >
> > I just want to check my understanding here with what you're seeing:
> >
> > Performance before DPIF patchset
> > 12Mpps
> >
> > Performance at this patch
> > 12Mpps
> >
> > Performance after DPIF patchset
> > 13Mpps
> >
> > So the performance recovers somewhere else in the patchset?
>
>
> Interesting, which flags are you passing to build OVS?
>
> Thanks for following up!
> fbl
>
>
My flags:
./configure CFLAGS="-g -Ofast -march=native" --with-dpdk=static
This is how I build OVS to get the performance numbers below.
> >
> > I've checked the performance behaviour in my case. I'm going to report
> > relative performance numbers. They are relative to master branch before
> > AVX512 DPIF was applied (c36c8e3).
> > I tried to run a similar testcase, I can see you are using EMC from the
> > memcmp in perf top output. I am also using the scalar DPIF in all the
> > below testcases.
> >
> > Master before AVX512 DPIF (c36c8e3)
> > 1.000x (0.0%)
> > DPIF patch 3 - dpif-avx512: Add ISA implementation of dpif.
> > 1.010x (1.0%)
> > DPIF patch 4 - dpif-netdev: Add command to switch dpif implementation.
> > 1.042x (4.2%)
> > DPIF patch 5 - dpif-netdev: Add command to get dpif implementations.
> > 1.063x (6.3%)
> > DPIF patch 6 - dpif-netdev: Add a partial HWOL PMD statistic.
> > 1.069x (6.9%)
> > Latest master which has AVX512 DPIF patches (d2e9703)
> > 1.075x (7.5%)
> > Master before AVX512 DPIF (c36c8e3), with prefetch change
> > 0.983x (-1.7%)
> > Latest master which has AVX512 DPIF patches (d2e9703), with prefetch change
> > 1.080x (8.0%)
> >
> > > (I don't think this report should block the patch because the
> > > counters are interesting and the analysis below doesn't point
> > > directly to the proposed changes.)
> > >
> > > This is a diff using all patches applied versus this patch reverted:
> > > 21.44% +6.08% ovs-vswitchd [.] miniflow_extract
> > > 8.94% -1.92% libc-2.28.so [.] __memcmp_avx2_movbe
> > > 14.62% +1.44% ovs-vswitchd [.] dp_netdev_input__
> > > 2.80% -1.08% ovs-vswitchd [.] dp_netdev_pmd_flush_output_on_port
> > > 3.44% -0.91% ovs-vswitchd [.] netdev_send
> > >
> > > This is the code side by side, patch applied on the right side:
> > > (sorry, long lines)
> > >
> >
> > My mail client has wrapped the below lines, sorry for mangling the output!
> >
> > <snip mangled perf diff output>
> > Please find it here:
> > https://mail.openvswitch.org/pipermail/ovs-dev/2021-July/385448.html
> >
> > >
> > >
> > > I don't see any relevant optimization difference in the code
> > > above, but the "mov %r15w,-0x2(%r13)" on the right side accounts
> > > for almost all the difference, though on the left side it seems
> > > a bit more spread.
> > >
> > > I applied the patch below and it helped to get to 12.7Mpps, so
> > > almost at the same levels. I wonder if you see the same result.
> > >
> >
> > Since I don't see the drop that you see with this patch, when I apply the
> > below patch to the latest master, I see a smaller benefit.
> > The relative performance after adding the below prefetch compared to before
> > (latest master):
> > 1.005x (0.5%)
> >
> > When I compare before/after performance (including the prefetch code, on
> > latest master), the overall performance difference is 0.5% here.
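
For anyone reproducing the relative figures in this thread, the arithmetic
can be sketched as below. This is only an illustration: the function names
are hypothetical, and the inputs can be raw throughput values or, as in the
usage note, two ratios taken against the same baseline.

```c
#include <assert.h>
#include <stdio.h>

/* Ratio of a measured throughput to a baseline throughput (e.g. 1.075).
 * Hypothetical helper, not part of OVS. */
static double
relative_perf(double baseline, double measured)
{
    assert(baseline > 0.0);
    return measured / baseline;
}

/* The same comparison expressed as a percentage change
 * (e.g. 7.5 for a ratio of 1.075). */
static double
percent_change(double baseline, double measured)
{
    return (relative_perf(baseline, measured) - 1.0) * 100.0;
}

/* Prints one line in the "1.075x (7.5%)" style used in this thread. */
static void
print_relative(const char *label, double baseline, double measured)
{
    printf("%s: %.3fx (%.1f%%)\n", label,
           relative_perf(baseline, measured),
           percent_change(baseline, measured));
}
```

Calling print_relative("prefetch on latest master", 1.075, 1.080) with the
two "latest master" ratios from the list above gives about 1.005x (0.5%),
which matches the 0.5% difference reported here.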
> >
> > > diff --git a/lib/flow.c b/lib/flow.c
> > > index 729d59b1b..4572e356b 100644
> > > --- a/lib/flow.c
> > > +++ b/lib/flow.c
> > > @@ -746,6 +746,9 @@ miniflow_extract(struct dp_packet *packet, struct miniflow *dst)
> > >      uint8_t *ct_nw_proto_p = NULL;
> > >      ovs_be16 ct_tp_src = 0, ct_tp_dst = 0;
> > >
> > > +    /* dltype will be updated later. */
> > > +    OVS_PREFETCH_WRITE(miniflow_pointer(mf, dl_type));
> > > +
> > >      /* Metadata. */
> > >      if (flow_tnl_dst_is_set(&md->tunnel)) {
> > >          miniflow_push_words(mf, tunnel, &md->tunnel,
> > >
> > >
> > > fbl
> > >
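
To illustrate the idea behind the prefetch in the patch above, here is a
minimal standalone sketch. It assumes GCC or Clang, where (as far as I know)
OVS_PREFETCH_WRITE maps to __builtin_prefetch(addr, 1). PREFETCH_WRITE,
struct parsed_hdr and parse_frame() are hypothetical stand-ins, not OVS code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for OVS_PREFETCH_WRITE; assumes a GCC-compatible
 * compiler and degrades to a no-op elsewhere. */
#if defined(__GNUC__)
#define PREFETCH_WRITE(addr) __builtin_prefetch((addr), 1)
#else
#define PREFETCH_WRITE(addr) ((void) (addr))
#endif

/* Hypothetical output record; the field that is stored last sits after
 * other data, loosely mirroring dl_type being written late. */
struct parsed_hdr {
    uint64_t metadata[8];
    uint16_t eth_type;
};

static void
parse_frame(const uint8_t *frame, size_t len, struct parsed_hdr *out)
{
    assert(frame != NULL && out != NULL);

    /* Hint that out->eth_type will be written soon, so its cache line can
     * be fetched while the work below runs. */
    PREFETCH_WRITE(&out->eth_type);

    /* Other work that does not touch eth_type. */
    for (size_t i = 0; i < 8; i++) {
        out->metadata[i] = i < len ? frame[i] : 0;
    }

    /* The late store: EtherType is big-endian at bytes 12..13. */
    out->eth_type = len >= 14
                    ? (uint16_t) ((frame[12] << 8) | frame[13])
                    : 0;
}
```

The prefetch is only a hint and does not change the result; whether it helps
depends on the machine, which would fit the different gains reported in this
thread.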
> >
> > <snip actual patch away>
> >
> > Thanks,
> > Cian
>
> --
> fbl