[dpdk-dev] TX performance regression caused by the mbuf cacheline split

2016-02-19 Thread Olivier MATZ
Hi Paul,

On 02/15/2016 08:15 PM, Paul Emmerich wrote:
> The bulk_alloc patch is great and helps. I'd love to see such a function
> in DPDK.

A patch has been submitted by Huawei. I guess it will be integrated soon. See http://dpdk.org/dev/patchwork/patch/10122/

Regards,
Olivier
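For illustration, here is a minimal sketch of how a bulk allocation helper along the lines of the rte_pktmbuf_alloc_bulk() proposed in that patch could be used on the TX path. The name, argument order, and return convention are assumed from the proposal and may differ in whatever version was eventually merged:

#include <rte_mbuf.h>
#include <rte_mempool.h>

/* Sketch: allocate a whole TX burst in one mempool operation instead of
 * calling rte_pktmbuf_alloc() once per packet. Assumes the proposed
 * rte_pktmbuf_alloc_bulk(pool, mbufs, count) helper, which returns 0 on
 * success and a negative value if the pool cannot supply 'count' mbufs. */
static int
alloc_tx_burst(struct rte_mempool *mp, struct rte_mbuf **pkts, unsigned int n)
{
    if (rte_pktmbuf_alloc_bulk(mp, pkts, n) != 0)
        return -1;    /* nothing was allocated, the caller can retry later */
    return (int)n;
}

The point of such a helper is that the mempool get (and its per-lcore cache handling) is paid once per burst rather than once per mbuf.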

[dpdk-dev] TX performance regression caused by the mbuf cacheline split

2016-02-15 Thread Paul Emmerich
Hi, here's a kind of late follow-up. I've only recently found the need (mostly for better support of XL710 NICs, which I still dislike, but people are using them...) to seriously address DPDK 2.x support in MoonGen. On 13.05.15 11:03, Ananyev, Konstantin wrote: > Before start to discuss [...]

[dpdk-dev] TX performance regression caused by the mbuf cacheline split

2015-05-13 Thread Ananyev, Konstantin
Hi Paul, > -----Original Message----- > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Paul Emmerich > Sent: Tuesday, May 12, 2015 12:19 AM > To: dev at dpdk.org > Subject: Re: [dpdk-dev] TX performance regression caused by the mbuf cacheline split > > Found [...]

[dpdk-dev] TX performance regression caused by the mbuf cacheline split

2015-05-12 Thread Marc Sune
On 12/05/15 02:28, Marc Sune wrote: > > > On 12/05/15 01:18, Paul Emmerich wrote: >> Found a really simple solution that almost restores the original >> performance: just add a prefetch on alloc. For some reason, I assumed >> that this was already done since the troublesome commit I >> [...]

[dpdk-dev] TX performance regression caused by the mbuf cacheline split

2015-05-12 Thread Marc Sune
On 12/05/15 01:18, Paul Emmerich wrote: > Found a really simple solution that almost restores the original > performance: just add a prefetch on alloc. For some reason, I assumed > that this was already done since the troublesome commit I investigated > mentioned something about [...]

[dpdk-dev] TX performance regression caused by the mbuf cacheline split

2015-05-12 Thread Paul Emmerich
Found a really simple solution that almost restores the original performance: just add a prefetch on alloc. For some reason, I assumed that this was already done since the troublesome commit I investigated mentioned something about prefetching... I guess the commit referred to the hardware [...]
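For reference, a minimal sketch of the "prefetch on alloc" idea, assuming the DPDK 2.x mbuf layout with its cacheline1 marker; this only illustrates the approach and is not necessarily the exact change that was tested:

#include <rte_mbuf.h>
#include <rte_mempool.h>
#include <rte_prefetch.h>

/* Sketch: allocate an mbuf and immediately prefetch its second cache line,
 * so that the fields moved there by the mbuf rework (pool, next, ...) are
 * already warm when the TX path reads them. */
static inline struct rte_mbuf *
alloc_with_prefetch(struct rte_mempool *mp)
{
    struct rte_mbuf *m = rte_pktmbuf_alloc(mp);

    if (m != NULL)
        rte_prefetch0(&m->cacheline1);
    return m;
}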

[dpdk-dev] TX performance regression caused by the mbuf cacheline split

2015-05-12 Thread Paul Emmerich
Paul Emmerich: > I naively tried to move the pool pointer into the first cache line in > the v2.0.0 tag and the performance actually decreased; I'm not yet sure > why this happens. There are probably assumptions about the cacheline > locations and prefetching in the code that would need to be [...]
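To make the constraint visible, here is a heavily simplified, purely illustrative sketch of the split layout. Field names follow rte_mbuf, but most fields are omitted and the real definition differs; the 64-byte figure assumes an x86 cache line:

#include <stdint.h>

struct rte_mempool;    /* opaque here; only the pointer matters */

/* Illustrative only, not the real struct rte_mbuf. The 2.x rework keeps
 * the RX/fast-path fields in the first cache line and pushes the fields
 * used on TX completion/free to the second one. */
struct split_mbuf_sketch {
    /* first cache line: touched on every RX and on every TX descriptor write */
    void *buf_addr;            /* virtual address of the data buffer */
    uint64_t buf_physaddr;     /* physical address of the data buffer */
    uint16_t data_off;
    uint16_t data_len;
    uint32_t pkt_len;
    uint64_t ol_flags;
    /* ... */

    /* second cache line: still needed when freeing/recycling on TX;
     * the alignment attribute forces these fields onto a new line */
    struct rte_mempool *pool __attribute__((__aligned__(64)));  /* where to return the mbuf */
    struct split_mbuf_sketch *next;    /* next segment of a chained packet */
};

Moving a single pointer such as pool into the first line only helps if whatever it displaces, and every prefetch that assumed the old offsets, is adjusted as well, which would be consistent with the observation that a naive move can make things worse.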

[dpdk-dev] TX performance regression caused by the mbuf cacheline split

2015-05-11 Thread Paul Emmerich
Hi Luke, thanks for your suggestion. I had actually looked at how your packet generator in SnabbSwitch works before, and it's quite clever. But unfortunately that's not what I'm looking for: I'm looking for a generic solution that works with whatever NIC is supported by DPDK, and I don't want to [...]

[dpdk-dev] TX performance regression caused by the mbuf cacheline split

2015-05-11 Thread Luke Gorrie
Hi Paul, On 11 May 2015 at 02:14, Paul Emmerich wrote: > Another possible solution would be a more dynamic approach to mbufs: Let me suggest a slightly more extreme idea for your consideration. This method can easily do more than 100 Mpps with one very lightly loaded core. I don't know if it works [...]

[dpdk-dev] TX performance regression caused by the mbuf cacheline split

2015-05-11 Thread Paul Emmerich
Hi, this is a follow-up to my post from 3 weeks ago [1]. I'm starting a new thread here since I now have a completely new test setup for improved reproducibility. Background for anyone who didn't catch my last post: I'm investigating a performance regression in my packet generator [2] that [...]