On 7/14/2021 5:58 PM, Ferriter, Cian wrote:

-----Original Message-----
From: Ilya Maximets <i.maxim...@ovn.org>
Sent: Friday 9 July 2021 21:53
To: Ferriter, Cian <cian.ferri...@intel.com>; Gaëtan Rivet <gr...@u256.net>; Eli Britstein <el...@nvidia.com>; d...@openvswitch.org; Van Haaren, Harry <harry.van.haa...@intel.com>
Cc: Majd Dibbiny <m...@nvidia.com>; Ilya Maximets <i.maxim...@ovn.org>; Stokes, Ian <ian.sto...@intel.com>; Flavio Leitner <f...@sysclose.org>
Subject: Re: [ovs-dev] [PATCH 2/2] dpif-netdev: Introduce netdev array cache

On 7/8/21 6:43 PM, Ferriter, Cian wrote:
Hi Gaetan, Eli and all,

Thanks for the patch and the info on how it affects performance in your case. I just wanted to post the performance we are seeing.
I've posted the numbers inline. Please note, I'll be away on leave till Tuesday.
Thanks,
Cian

-----Original Message-----
From: Gaëtan Rivet <gr...@u256.net>
Sent: Wednesday 7 July 2021 17:36
To: Eli Britstein <el...@nvidia.com>; <d...@openvswitch.org> <d...@openvswitch.org>; Van Haaren, Harry <harry.van.haa...@intel.com>; Ferriter, Cian <cian.ferri...@intel.com>
Cc: Majd Dibbiny <m...@nvidia.com>; Ilya Maximets <i.maxim...@ovn.org>
Subject: Re: [ovs-dev] [PATCH 2/2] dpif-netdev: Introduce netdev array cache

On Wed, Jul 7, 2021, at 17:05, Eli Britstein wrote:
Port numbers are usually small. Maintain an array of netdev handles indexed by port number. This accelerates looking them up for netdev_hw_miss_packet_recover().

Reported-by: Cian Ferriter <cian.ferri...@intel.com>
Signed-off-by: Eli Britstein <el...@nvidia.com>
Reviewed-by: Gaetan Rivet <gaet...@nvidia.com>
---
<snipped patch contents>
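
For readers without the patch at hand, here is a minimal sketch of the idea described in the commit message above. The names, the array bound, and the fallback behaviour are illustrative assumptions, not the actual patch (whose contents are snipped):

/* Illustrative sketch: a per-PMD array of netdev pointers indexed by port
 * number, so the per-packet lookup is a bounds check plus an array access
 * instead of a hash-map lookup.  This mirrors what pmd_netdev_cache_lookup()
 * in the patch is described to do; names and sizes here are assumptions. */
#include <stdint.h>

struct netdev;                          /* Opaque; defined in lib/netdev.h. */

#define NETDEV_CACHE_ENTRIES 1024       /* Assumed bound on odp port numbers. */

struct pmd_netdev_cache {
    struct netdev *entries[NETDEV_CACHE_ENTRIES];  /* NULL if not cached. */
};

static inline struct netdev *
netdev_cache_lookup(const struct pmd_netdev_cache *cache, uint32_t port_no)
{
    /* Ports outside the array fall back to the regular (slower) lookup. */
    return port_no < NETDEV_CACHE_ENTRIES ? cache->entries[port_no] : NULL;
}

The intent is that such a table stays cache-resident on the PMD thread, and a miss simply falls back to the existing lookup path.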


Hello,

I tested the performance impact of this patch with a partial offload setup.
The numbers below are average cycles per packet, as reported by pmd-stats-show:

Before vxlan-decap: 525 c/p
After vxlan-decap: 542 c/p
After this fix: 530 c/p

Without those fixes, vxlan-decap has a 3.2% negative impact on cycles; with the fixes, the impact is reduced to 0.95%.
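
For reference, the per-packet cycle counts above come from the PMD statistics. A typical sampling sequence looks like the following (the exact output wording may differ between OVS versions):

$ ovs-appctl dpif-netdev/pmd-stats-clear
  ... run traffic for a fixed interval ...
$ ovs-appctl dpif-netdev/pmd-stats-show | grep "avg cycles per packet"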

As I had to force partial offloads for our hardware, it would be good to have outside confirmation on a proper setup.

Kind regards,
--
Gaetan Rivet
I'm showing the performance relative to what we measured on OVS master directly before the VXLAN HWOL changes went in. All of the below results are using the scalar DPIF and partial HWOL.
Link to "Fixup patches": http://patchwork.ozlabs.org/project/openvswitch/list/?series=252356

Master before VXLAN HWOL changes (f0e4a73)
1.000x

Latest master after VXLAN HWOL changes (b780911)
0.918x (-8.2%)

After fixup patches on OVS ML are applied (with ALLOW_EXPERIMENTAL_API=off)
0.973x (-2.7%)

After fixup patches on OVS ML are applied and after ALLOW_EXPERIMENTAL_API is removed.
0.938x (-6.2%)

I ran the last set of results by applying the below diff. I did this because I'm assuming the plan is to remove the ALLOW_EXPERIMENTAL_API '#ifdef's at some point?

Yes, that is the plan.

Thanks for confirming this.

And thanks for testing, Gaetan and Cian!

Could you also provide more details on your test environment,
so someone else can reproduce?

Good idea, I'll add the details inline below; a setup sketch follows the list. These details apply to the performance I measured previously and to the numbers in this mail.

What is important to know:
- Test configuration: P2P, V2V, PVP, etc.

P2P
1 PHY port
1 RXQ

- Test type: max. throughput, zero packet loss.
Max throughput.

- OVS config: EMC, SMC, HWOL, AVX512 - on/off/type
In all tests, all packets hit a single datapath flow with "offloaded:partial". So all packets are partially offloaded, skipping miniflow_extract() and EMC/SMC/DPCLS lookups.

AVX512 is off.

- Installed OF rules.
$ $OVS_DIR/utilities/ovs-ofctl dump-flows br0
  cookie=0x0, duration=253.691s, table=0, n_packets=2993867136, n_bytes=179632028160, in_port=phy0 actions=IN_PORT

- Traffic pattern: Packet size, number of flows, packet type.
64B, 1 flow, ETH/IP packets.
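
A rough sketch of how a comparable single-port, single-rxq, partial-offload setup could be brought up. The PCI address, core mask and port name are placeholders, and whether "offloaded:partial" is actually reported depends on the NIC and its DPDK driver:

$ ovs-vsctl set Open_vSwitch . other_config:dpdk-init=true
$ ovs-vsctl set Open_vSwitch . other_config:hw-offload=true
$ ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=0x2
$ ovs-vsctl add-br br0 -- set bridge br0 datapath_type=netdev
$ ovs-vsctl add-port br0 phy0 -- set Interface phy0 type=dpdk options:dpdk-devargs=0000:01:00.0 options:n_rxq=1
$ ovs-ofctl del-flows br0
$ ovs-ofctl add-flow br0 "in_port=phy0,actions=IN_PORT"

With traffic looped back on phy0, all 64B single-flow packets then hit the one datapath flow shown above.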

These tests also didn't include the fix from Balazs, IIUC, because they were performed a bit before that patch got accepted.

Correct, the above tests didn't include the optimization from Balazs.

And Flavio reported what seems to be a noticeable performance drop due to the just-accepted AVX512 DPIF implementation for a non-HWOL, non-AVX512 setup:
https://mail.openvswitch.org/pipermail/ovs-dev/2021-July/385448.html

We are testing partial HWOL setups, so I don't think Flavio's mail is relevant to this.

So, it seems that everything will need to be re-tested anyway in order to understand what the current situation is.

Agreed, let's retest the performance. I've included the new numbers below:

I'm showing the performance relative to what we measured on OVS master directly before the VXLAN HWOL changes went in. All of the below results are using the scalar DPIF and partial HWOL.
Link to "Fixup patches" (v2): http://patchwork.ozlabs.org/project/openvswitch/list/?series=253488

Master before VXLAN HWOL changes. (f0e4a73)
1.000x

Master after VXLAN HWOL changes. (d53ea18)
0.964x (-3.6%)

After rebased fixup patches on OVS ML are applied (with ALLOW_EXPERIMENTAL_API=off).
0.993x (-0.7%)

After rebased fixup patches on OVS ML are applied and after ALLOW_EXPERIMENTAL_API is removed.
0.961x (-3.9%)

According to this, we are better off without the array cache commit, at least in this test (-3.6% vs. -3.9%). Isn't that so?

In our internal tests we saw that it actually improves performance, though it doesn't gain back all of the degradation.

What do you think?


So the performance is looking better now because of the optimization from Balazs. This, together with #ifdef'ing out the code, brings the performance almost back to where it was before.

I'm worried that the #ifdef is only a temporary solution for the partial HWOL case, since eventually it will be removed, leaving us with a -3.9% degradation.

Thanks,
Cian

Best regards, Ilya Maximets.

Diff:
diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index accb23a1a..0e29c609f 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -7132,7 +7132,6 @@ dp_netdev_hw_flow(const struct dp_netdev_pmd_thread *pmd,
      struct netdev *netdev OVS_UNUSED;
      uint32_t mark;

-#ifdef ALLOW_EXPERIMENTAL_API /* Packet restoration API required. */
      /* Restore the packet if HW processing was terminated before completion. */
      netdev = pmd_netdev_cache_lookup(pmd, port_no);
      if (OVS_LIKELY(netdev)) {
@@ -7143,7 +7142,6 @@ dp_netdev_hw_flow(const struct dp_netdev_pmd_thread *pmd,
              return -1;
          }
      }
-#endif

      /* If no mark, no flow to find. */
      if (!dp_packet_has_flow_mark(packet, &mark)) {
