> From: Harry van Haaren <[email protected]>
>
> This commit optimizes the output action, by enabling the compiler to
> optimize the code better through reducing code complexity.
>
> The core concept of this optimization is that the array-length checks
> have already been performed above the copying code, so can be removed.
> Removing of the per-packet length checks allows the compiler to auto-vectorize
> the stores using SIMD registers.
>
> Signed-off-by: Harry van Haaren <[email protected]>
>
Thanks for the patch Cian/Harry.
Comments inline.
> ---
>
> v8: Add NEWS entry.
> ---
> NEWS | 1 +
> lib/dpif-netdev.c | 23 ++++++++++++++++++-----
> 2 files changed, 19 insertions(+), 5 deletions(-)
>
> diff --git a/NEWS b/NEWS
> index d04dac746..a7ec34af1 100644
> --- a/NEWS
> +++ b/NEWS
> @@ -17,6 +17,7 @@ Post-v2.15.0
> * Enable the AVX512 DPCLS implementation to use VPOPCNT instruction if
> the
> CPU supports it. This enhances performance by using the native
> vpopcount
> instructions, instead of the emulated version of vpopcount.
> + * Optimize dp_netdev_output by enhancing compiler optimization
> potential.
> - ovs-ctl:
> * New option '--no-record-hostname' to disable hostname configuration
> in ovsdb on startup.
> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
> index 04a4f71cb..f46269ae3 100644
> --- a/lib/dpif-netdev.c
> +++ b/lib/dpif-netdev.c
> @@ -7290,12 +7290,25 @@ dp_execute_output_action(struct
> dp_netdev_pmd_thread *pmd,
> pmd->n_output_batches++;
> }
>
> - struct dp_packet *packet;
> - DP_PACKET_BATCH_FOR_EACH (i, packet, packets_) {
> - p->output_pkts_rxqs[dp_packet_batch_size(&p->output_pkts)] =
> - pmd->ctx.last_rxq;
> - dp_packet_batch_add(&p->output_pkts, packet);
> + /* The above checks ensure that there is enough space in the output
> batch.
> + * Using dp_packet_batch_add() has a branch to check if the batch is
> full.
> + * This branch reduces the compiler's ability to optimize efficiently.
> The
> + * below code implements packet movement between batches without
> checks,
> + * with the required semantics of output batch perhaps containing
> packets.
> + */
What is the performance gain recorded with this approach with compiler
optimizations?
> + int batch_size = dp_packet_batch_size(packets_);
> + int out_batch_idx = dp_packet_batch_size(&p->output_pkts);
> + struct dp_netdev_rxq *rxq = pmd->ctx.last_rxq;
> + struct dp_packet_batch *output_batch = &p->output_pkts;
> +
> + for (int i = 0; i < batch_size; i++) {
> + struct dp_packet *packet = packets_->packets[i];
> + p->output_pkts_rxqs[out_batch_idx] = rxq;
> + output_batch->packets[out_batch_idx] = packet;
> + out_batch_idx++;
> }
> + output_batch->count += batch_size;
So I guess the counter argument here is that the previous approach uses the
dp_packet standard functions.
There has been work over time to ensure that dp_packets are handled in a
similar way using the dp_packet api.
This I think reverses the position. That's why I'd like to get a sense of the
performance gain as I think it's important to justify breaking from code
conformance.
Has there been thought to modifying the dp_packet_batch_add ?
Regards
Ian
> +
> return true;
> }
>
> --
> 2.31.1
>
> _______________________________________________
> dev mailing list
> [email protected]
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
_______________________________________________
dev mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-dev