Hi Ian, Thanks for the review. My responses are inline.
> -----Original Message----- > From: Stokes, Ian <[email protected]> > Sent: Tuesday 15 June 2021 18:04 > To: Ferriter, Cian <[email protected]>; [email protected]; > Ferriter, Cian <[email protected]>; Van Haaren, Harry > <[email protected]> > Cc: [email protected] > Subject: RE: [ovs-dev] [v12 15/16] dpif-netdev: Optimize dp output action. > > > From: Harry van Haaren <[email protected]> > > > > This commit optimizes the output action, by enabling the compiler to > > optimize the code better through reducing code complexity. > > > > The core concept of this optimization is that the array-length checks > > have already been performed above the copying code, so can be removed. > > Removing of the per-packet length checks allows the compiler to > > auto-vectorize > > the stores using SIMD registers. > > > > Signed-off-by: Harry van Haaren <[email protected]> > > > > Thanks for the patch Cian/Harry. > > Comments inline. > > --- > > > > v8: Add NEWS entry. > > --- > > NEWS | 1 + > > lib/dpif-netdev.c | 23 ++++++++++++++++++----- > > 2 files changed, 19 insertions(+), 5 deletions(-) > > > > diff --git a/NEWS b/NEWS > > index d04dac746..a7ec34af1 100644 > > --- a/NEWS > > +++ b/NEWS > > @@ -17,6 +17,7 @@ Post-v2.15.0 > > * Enable the AVX512 DPCLS implementation to use VPOPCNT instruction if > > the > > CPU supports it. This enhances performance by using the native > > vpopcount > > instructions, instead of the emulated version of vpopcount. > > + * Optimize dp_netdev_output by enhancing compiler optimization > > potential. > > - ovs-ctl: > > * New option '--no-record-hostname' to disable hostname configuration > > in ovsdb on startup. > > diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c > > index 04a4f71cb..f46269ae3 100644 > > --- a/lib/dpif-netdev.c > > +++ b/lib/dpif-netdev.c > > @@ -7290,12 +7290,25 @@ dp_execute_output_action(struct > > dp_netdev_pmd_thread *pmd, > > pmd->n_output_batches++; > > } > > > > - struct dp_packet *packet; > > - DP_PACKET_BATCH_FOR_EACH (i, packet, packets_) { > > - p->output_pkts_rxqs[dp_packet_batch_size(&p->output_pkts)] = > > - pmd->ctx.last_rxq; > > - dp_packet_batch_add(&p->output_pkts, packet); > > + /* The above checks ensure that there is enough space in the output > > batch. > > + * Using dp_packet_batch_add() has a branch to check if the batch is > > full. > > + * This branch reduces the compiler's ability to optimize efficiently. > > The > > + * below code implements packet movement between batches without > > checks, > > + * with the required semantics of output batch perhaps containing > > packets. > > + */ > > What is the performance gain recorded with this approach with compiler > optimizations? > The test was run using the scalar DPIF, 1 flow and EMC on, where the before/after cycle cost of the patch is 243 cycles before to 230 cycles per packet after. Hence this optimization saves 13 cycles per packet in my measurements. > > + int batch_size = dp_packet_batch_size(packets_); > > + int out_batch_idx = dp_packet_batch_size(&p->output_pkts); > > + struct dp_netdev_rxq *rxq = pmd->ctx.last_rxq; > > + struct dp_packet_batch *output_batch = &p->output_pkts; > > + > > + for (int i = 0; i < batch_size; i++) { > > + struct dp_packet *packet = packets_->packets[i]; > > + p->output_pkts_rxqs[out_batch_idx] = rxq; > > + output_batch->packets[out_batch_idx] = packet; > > + out_batch_idx++; > > } > > + output_batch->count += batch_size; > > So I guess the counter argument here is that the previous approach uses the > dp_packet standard functions. > > There has been work over time to ensure that dp_packets are handled in a > similar way using the dp_packet api. > > This I think reverses the position. That's why I'd like to get a sense of the > performance gain as I think it's important to justify breaking > from code conformance. > > Has there been thought to modifying the dp_packet_batch_add ? > We thought about adding an "unsafe" version of dp_packet_batch_add() which doesn't check whether the packet batch is full. Due to this optimization being dependent on other checks in the surrounding code, it isn't appropriate to have a generic function containing this optimization. > Regards > Ian > > + > > return true; > > } > > > > -- > > 2.31.1 > > > > _______________________________________________ > > dev mailing list > > [email protected] > > https://mail.openvswitch.org/mailman/listinfo/ovs-dev _______________________________________________ dev mailing list [email protected] https://mail.openvswitch.org/mailman/listinfo/ovs-dev
