Stephen, thanks! That is it! Not sure if there is any workaround.
So, essentially, what I am doing is: core 0 gets a burst of my_packet(s)
from its pre-allocated mempool and then bulk-enqueues them into a rte_ring.
Core 1 then bulk-dequeues from this ring, and when it accesses the data
pointed to by a ring element (i.e. my_packet->tag1), this memory access
latency issue shows up. I cannot advance the prefetch any earlier.

Is there any clever workaround (or hack) to overcome this issue, other than
running all the functions on the same core? For example, can I prefetch the
packets on core 0 into core 1's cache (could be a dumb question!)?
(A rough sketch of my core-1 loop is at the bottom of this mail.)

Thanks,
Arvind

On Tue, Sep 11, 2018 at 1:07 PM Stephen Hemminger <[email protected]> wrote:

> On Tue, 11 Sep 2018 12:18:42 -0500
> Arvind Narayanan <[email protected]> wrote:
>
> > If I don't do any processing, I easily get 10G. It is only when I access
> > the tag that the throughput drops.
> > What confuses me is that if I use the following snippet, it works at
> > line rate.
> >
> > ```
> > int temp_key = 1; // declared outside of the for loop
> >
> > for (i = 0; i < pkt_count; i++) {
> >     if (rte_hash_lookup_data(rx_table, &(temp_key), (void **)&val[i]) < 0) {
> >     }
> > }
> > ```
> >
> > But as soon as I replace `temp_key` with `my_packet->tag1`, I see a drop
> > in throughput (which in a way confirms the issue is due to cache misses).
>
> Your packet data is not in cache.
>
> Doing prefetch can help, but it is very timing sensitive. If prefetch is
> done before data is available, it won't help. And if prefetch is done just
> before data is used, then there aren't enough cycles to get it from memory
> into the cache.
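P.S. In case a concrete pattern helps: below is a rough sketch of what core 1
does, with the prefetch issued a few entries ahead of the hash lookup. The
struct layout, the consumer_loop name, and the BURST_SIZE / PREFETCH_OFFSET
values are simplified placeholders rather than my exact code.

```
#include <stdint.h>
#include <rte_ring.h>
#include <rte_hash.h>
#include <rte_prefetch.h>

#define BURST_SIZE      32
#define PREFETCH_OFFSET 4   /* how far ahead to prefetch; needs tuning */

/* Placeholder for my per-packet metadata structure. */
struct my_packet {
        uint32_t tag1;
        /* ... other fields ... */
};

/* Core 1: dequeue a burst from the ring and look up each packet's tag in
 * rx_table, prefetching a few entries ahead so the tag1 cache line is
 * (hopefully) resident by the time rte_hash_lookup_data() reads it. */
static void
consumer_loop(struct rte_ring *ring, struct rte_hash *rx_table)
{
        struct my_packet *pkts[BURST_SIZE];
        void *val[BURST_SIZE];
        unsigned int i, n;

        for (;;) {
                n = rte_ring_dequeue_burst(ring, (void **)pkts,
                                           BURST_SIZE, NULL);
                if (n == 0)
                        continue;

                /* Prime the pipeline: prefetch the first few packets. */
                for (i = 0; i < PREFETCH_OFFSET && i < n; i++)
                        rte_prefetch0(&pkts[i]->tag1);

                for (i = 0; i < n; i++) {
                        /* Prefetch the packet PREFETCH_OFFSET entries ahead
                         * of the one being processed now. */
                        if (i + PREFETCH_OFFSET < n)
                                rte_prefetch0(&pkts[i + PREFETCH_OFFSET]->tag1);

                        if (rte_hash_lookup_data(rx_table, &pkts[i]->tag1,
                                                 &val[i]) < 0) {
                                /* lookup miss: handle or drop */
                        }
                }
        }
}
```

Even with the offset, the earliest the prefetch can be issued is right after
rte_ring_dequeue_burst() returns, which is what I meant by not being able to
advance it any earlier.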
