[PATCH mm] slab: implement bulking for SLAB allocator

2015-09-08 Thread Jesper Dangaard Brouer
(bulk:2048) 90 cycles(tsc) 22.585 ns (bulk:4096) [1] https://github.com/netoptimizer/prototype-kernel/blob/master/kernel/mm/slab_bulk_test01.c Signed-off-by: Jesper Dangaard Brouer <bro...@redhat.com> --- mm/slab.c | 87 +++--

Re: slab-nomerge (was Re: [git pull] device mapper changes for 4.3)

2015-09-07 Thread Jesper Dangaard Brouer
On Mon, 7 Sep 2015 13:22:13 -0700 Linus Torvalds <torva...@linux-foundation.org> wrote: > On Mon, Sep 7, 2015 at 2:30 AM, Jesper Dangaard Brouer > <bro...@redhat.com> wrote: > > > > The slub allocator has a faster "fastpath", if your workload is > >

Re: [RFC PATCH 1/3] net: introduce kfree_skb_bulk() user of kmem_cache_free_bulk()

2015-09-07 Thread Jesper Dangaard Brouer
an_tx_irq() 373 ns. At 10Gbit/s how many bytes can arrive in this period, only: 466 bytes. ((373/10^9)*(10000*10^6)/8) -- Best regards, Jesper Dangaard Brouer MSc.CS, Sr. Network Kernel Developer at Red Hat Author of http://www.iptv-analyzer.org LinkedIn: http://www.linkedin.com/in/br
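The 466-byte figure can be checked with a few lines of C. This is only a sketch of the arithmetic; the helper name `bytes_arriving` is made up for this example and does not appear in the thread.

```c
/*
 * Bytes that arrive on a link during a time window.
 *
 *   window_ns * 1e-9 s  *  gbit_per_s * 1e9 bit/s  /  8 bit/byte
 *     ==  window_ns * gbit_per_s / 8
 *
 * Hypothetical helper, named for this example only.
 */
double bytes_arriving(double window_ns, double gbit_per_s)
{
	return window_ns * gbit_per_s / 8.0;
}
```

For the 373 ns window at 10 Gbit/s, `bytes_arriving(373.0, 10.0)` evaluates to 466.25, which rounds to the 466 bytes quoted in the message.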

Re: [RFC PATCH 0/3] Network stack, first user of SLAB/kmem_cache bulk free API.

2015-09-09 Thread Jesper Dangaard Brouer
On Tue, 8 Sep 2015 12:32:40 -0500 (CDT) Christoph Lameter <c...@linux.com> wrote: > On Sat, 5 Sep 2015, Jesper Dangaard Brouer wrote: > > > The double_cmpxchg without lock prefix still cost 9 cycles, which is > > very fast but still a cost (add approx 19

Experiences with slub bulk use-case for network stack

2015-09-16 Thread Jesper Dangaard Brouer
the API of always returning the exact number of requested objects will not work... -- Best regards, Jesper Dangaard Brouer MSc.CS, Sr. Network Kernel Developer at Red Hat Author of http://www.iptv-analyzer.org LinkedIn: http://www.linkedin.com/in/brouer (related to http://t

Re: Experiences with slub bulk use-case for network stack

2015-09-17 Thread Jesper Dangaard Brouer
On Wed, 16 Sep 2015 10:13:25 -0500 (CDT) Christoph Lameter <c...@linux.com> wrote: > On Wed, 16 Sep 2015, Jesper Dangaard Brouer wrote: > > > > > Hint, this leads up to discussing if current bulk *ALLOC* API need to > > be changed... > > > > Alex and

Re: [PATCH 5/7] slub: support for bulk free with SLUB freelists

2015-09-29 Thread Jesper Dangaard Brouer
On Mon, 28 Sep 2015 11:30:00 -0500 (CDT) Christoph Lameter <c...@linux.com> wrote: > On Mon, 28 Sep 2015, Jesper Dangaard Brouer wrote: > > > Not knowing SLUB as well as you, it took me several hours to realize > > init_object() didn't overwrite the freepointer

Re: [PATCH 5/7] slub: support for bulk free with SLUB freelists

2015-09-29 Thread Jesper Dangaard Brouer
On Mon, 28 Sep 2015 11:28:15 -0500 (CDT) Christoph Lameter <c...@linux.com> wrote: > On Mon, 28 Sep 2015, Jesper Dangaard Brouer wrote: > > > > Do you really need separate parameters for freelist_head? If you just want > > > to deal with one object pass it as

[MM PATCH V4 3/6] slub: mark the dangling ifdef #else of CONFIG_SLUB_DEBUG

2015-09-29 Thread Jesper Dangaard Brouer
The #ifdef of CONFIG_SLUB_DEBUG is located very far from the associated #else. For readability mark it with a comment. Signed-off-by: Jesper Dangaard Brouer <bro...@redhat.com> Acked-by: Christoph Lameter <c...@linux.com> --- mm/slub.c |2 +- 1 file changed, 1 insertion(+)

[MM PATCH V4 0/6] Further optimizing SLAB/SLUB bulking

2015-09-29 Thread Jesper Dangaard Brouer
soon as I've cleaned it up, rebased it on net-next and re-run all the benchmarks. --- Christoph Lameter (2): slub: create new ___slab_alloc function that can be called with irqs disabled slub: Avoid irqoff/on in bulk allocation Jesper Dangaard Brouer (4): slub: mark the da

[MM PATCH V4 6/6] slub: optimize bulk slowpath free by detached freelist

2015-09-29 Thread Jesper Dangaard Brouer
https://github.com/netoptimizer/prototype-kernel/blob/master/kernel/mm/slab_bulk_test01.c Signed-off-by: Jesper Dangaard Brouer <bro...@redhat.com> Signed-off-by: Alexander Duyck <alexander.h.du...@redhat.com> Acked-by: Christoph Lameter <c.

Re: [MM PATCH V4 5/6] slub: support for bulk free with SLUB freelists

2015-09-29 Thread Jesper Dangaard Brouer
On Tue, 29 Sep 2015 09:38:30 -0700 Alexander Duyck <alexander.du...@gmail.com> wrote: > On 09/29/2015 08:48 AM, Jesper Dangaard Brouer wrote: > > Make it possible to free a freelist with several objects by adjusting > > API of slab_free() and __slab_free() to have head

[MM PATCH V4 4/6] slab: implement bulking for SLAB allocator

2015-09-29 Thread Jesper Dangaard Brouer
://github.com/netoptimizer/prototype-kernel/blob/master/kernel/mm/slab_bulk_test01.c Signed-off-by: Jesper Dangaard Brouer <bro...@redhat.com> Acked-by: Christoph Lameter <c...@linux.com> --- mm/slab.c | 87 +++-- 1 file changed, 6

[MM PATCH V4 5/6] slub: support for bulk free with SLUB freelists

2015-09-29 Thread Jesper Dangaard Brouer
no performance reduction due to this change, when debugging is turned off (compiled with CONFIG_SLUB_DEBUG). Signed-off-by: Jesper Dangaard Brouer <bro...@redhat.com> Signed-off-by: Alexander Duyck <alexander.h.du...@redhat.com> --- V4: - Change API per req of Christoph Lameter - Rem

[MM PATCH V4 2/6] slub: Avoid irqoff/on in bulk allocation

2015-09-29 Thread Jesper Dangaard Brouer
ter <c...@linux.com> Signed-off-by: Jesper Dangaard Brouer <bro...@redhat.com> --- mm/slub.c | 24 +++- 1 file changed, 11 insertions(+), 13 deletions(-) diff --git a/mm/slub.c b/mm/slub.c index 02cfb3a5983e..024eed32da2c 100644 --- a/mm/slub.c +++ b/mm/slub.c @@

[MM PATCH V4 1/6] slub: create new ___slab_alloc function that can be called with irqs disabled

2015-09-29 Thread Jesper Dangaard Brouer
which promptly disables them again using the expensive local_irq_save(). Signed-off-by: Christoph Lameter <c...@linux.com> Signed-off-by: Jesper Dangaard Brouer <bro...@redhat.com> --- mm/slub.c | 44 +--- 1 file changed, 29 insertions(+), 15

[MM PATCH V4.1 5/6] slub: support for bulk free with SLUB freelists

2015-09-30 Thread Jesper Dangaard Brouer
no performance reduction due to this change, when debugging is turned off (compiled with CONFIG_SLUB_DEBUG). Signed-off-by: Jesper Dangaard Brouer <bro...@redhat.com> Signed-off-by: Alexander Duyck <alexander.h.du...@redhat.com> --- V4: - Change API per req of Christoph Lameter - Rem

Re: [PATCH 5/7] slub: support for bulk free with SLUB freelists

2015-09-28 Thread Jesper Dangaard Brouer
On Mon, 28 Sep 2015 10:16:49 -0500 (CDT) Christoph Lameter <c...@linux.com> wrote: > On Mon, 28 Sep 2015, Jesper Dangaard Brouer wrote: > > > diff --git a/mm/slub.c b/mm/slub.c > > index 1cf98d89546d..13b5f53e4840 100644 > > --- a/mm/slub.c > > +++ b/mm/slu

Re: [PATCH 7/7] slub: do prefetching in kmem_cache_alloc_bulk()

2015-09-28 Thread Jesper Dangaard Brouer
On Mon, 28 Sep 2015 07:53:16 -0700 Alexander Duyck <alexander.du...@gmail.com> wrote: > On 09/28/2015 05:26 AM, Jesper Dangaard Brouer wrote: > > For practical use-cases it is beneficial to prefetch the next freelist > > object in bulk allocation loop. > > > >

Re: [MM PATCH V4.1 5/6] slub: support for bulk free with SLUB freelists

2015-10-02 Thread Jesper Dangaard Brouer
On Fri, 2 Oct 2015 11:41:18 +0200 Jesper Dangaard Brouer <bro...@redhat.com> wrote: > On Thu, 1 Oct 2015 15:10:15 -0700 > Andrew Morton <a...@linux-foundation.org> wrote: > > > On Wed, 30 Sep 2015 13:44:19 +0200 Jesper Dangaard Brouer > > <bro...@redhat.com

Re: [MM PATCH V4.1 5/6] slub: support for bulk free with SLUB freelists

2015-10-02 Thread Jesper Dangaard Brouer
On Fri, 2 Oct 2015 05:10:02 -0500 (CDT) Christoph Lameter <c...@linux.com> wrote: > On Fri, 2 Oct 2015, Jesper Dangaard Brouer wrote: > > > Thus, I need introducing new code like this patch and at the same time > > have to reduce the number of instruction-cache misses/usa

Re: [MM PATCH V4.1 5/6] slub: support for bulk free with SLUB freelists

2015-10-02 Thread Jesper Dangaard Brouer
On Thu, 1 Oct 2015 15:10:15 -0700 Andrew Morton <a...@linux-foundation.org> wrote: > On Wed, 30 Sep 2015 13:44:19 +0200 Jesper Dangaard Brouer <bro...@redhat.com> > wrote: > > > Make it possible to free a freelist with several objects by adjusting > > A

bisect: bug X11 not working for SSH on net-next commit 6ae459bda

2015-10-02 Thread Jesper Dangaard Brouer
n git tree "net" by commit 31b33dfb0a14 ("skbuff: Fix skb checksum partial check."). -- Best regards, Jesper Dangaard Brouer MSc.CS, Principal Kernel Engineer at Red Hat Author of http://www.iptv-analyzer.org LinkedIn: http://www.linkedin.com/in/brouer keywords:

[PATCH 1/7] slub: create new ___slab_alloc function that can be called with irqs disabled

2015-09-28 Thread Jesper Dangaard Brouer
which promptly disables them again using the expensive local_irq_save(). Signed-off-by: Christoph Lameter <c...@linux.com> Signed-off-by: Jesper Dangaard Brouer <bro...@redhat.com> --- mm/slub.c | 44 +--- 1 file changed, 29 insertions(+), 15

[PATCH 0/7] Further optimizing SLAB/SLUB bulking

2015-09-28 Thread Jesper Dangaard Brouer
mem_cache_alloc_bulk() mm/slab.c | 87 ++- mm/slub.c | 276 + 2 files changed, 267 insertions(+), 96 deletions(-) -- Best regards, Jesper Dangaard Brouer MSc.CS, Principal Kernel Engineer at Red Hat Autho

[PATCH 2/7] slub: Avoid irqoff/on in bulk allocation

2015-09-28 Thread Jesper Dangaard Brouer
ter <c...@linux.com> Signed-off-by: Jesper Dangaard Brouer <bro...@redhat.com> --- mm/slub.c | 24 +++- 1 file changed, 11 insertions(+), 13 deletions(-) diff --git a/mm/slub.c b/mm/slub.c index 02cfb3a5983e..024eed32da2c 100644 --- a/mm/slub.c +++ b/mm/slub.c @@

[PATCH 7/7] slub: do prefetching in kmem_cache_alloc_bulk()

2015-09-28 Thread Jesper Dangaard Brouer
cycles(tsc) - 27 cycles(tsc) - increase in cycles:0 158 - 30 cycles(tsc) - 30 cycles(tsc) - increase in cycles:0 250 - 37 cycles(tsc) - 37 cycles(tsc) - increase in cycles:0 Note, benchmark done with slab_nomerge to keep it stable enough for accurate comparison. Signed-off-by: Jesper Dangaard

[PATCH 3/7] slub: mark the dangling ifdef #else of CONFIG_SLUB_DEBUG

2015-09-28 Thread Jesper Dangaard Brouer
The #ifdef of CONFIG_SLUB_DEBUG is located very far from the associated #else. For readability mark it with a comment. Signed-off-by: Jesper Dangaard Brouer <bro...@redhat.com> --- mm/slub.c |2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/mm/slub.c b/mm/slub.c

[PATCH 4/7] slab: implement bulking for SLAB allocator

2015-09-28 Thread Jesper Dangaard Brouer
(bulk:2048) 90 cycles(tsc) 22.585 ns (bulk:4096) [1] https://github.com/netoptimizer/prototype-kernel/blob/master/kernel/mm/slab_bulk_test01.c Signed-off-by: Jesper Dangaard Brouer <bro...@redhat.com> --- mm/slab.c | 87 +++--

[PATCH 5/7] slub: support for bulk free with SLUB freelists

2015-09-28 Thread Jesper Dangaard Brouer
Dangaard Brouer <bro...@redhat.com> Signed-off-by: Alexander Duyck <alexander.h.du...@redhat.com> --- mm/slub.c | 97 + 1 file changed, 84 insertions(+), 13 deletions(-) diff --git a/mm/slub.c b/mm/slub.c index

[PATCH 6/7] slub: optimize bulk slowpath free by detached freelist

2015-09-28 Thread Jesper Dangaard Brouer
https://github.com/netoptimizer/prototype-kernel/blob/master/kernel/mm/slab_bulk_test01.c Signed-off-by: Jesper Dangaard Brouer <bro...@redhat.com> Signed-off-by: Alexander Duyck <alexander.h.du...@redhat.com> --- mm/slub.c | 109

[net-next PATCH] net: help compiler generate better code in eth_get_headlen

2015-09-28 Thread Jesper Dangaard Brouer
Dangaard Brouer <bro...@redhat.com> --- net/ethernet/eth.c |2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/net/ethernet/eth.c b/net/ethernet/eth.c index d850fdc828f9..9e63f252a89e 100644 --- a/net/ethernet/eth.c +++ b/net/ethernet/eth.c @@ -127,7 +127,7 @@ u32 eth_get_h

Re: [MM PATCH V4 5/6] slub: support for bulk free with SLUB freelists

2015-09-29 Thread Jesper Dangaard Brouer
On Tue, 29 Sep 2015 10:20:20 -0700 Alexander Duyck <alexander.du...@gmail.com> wrote: > On 09/29/2015 10:00 AM, Jesper Dangaard Brouer wrote: > > On Tue, 29 Sep 2015 09:38:30 -0700 > > Alexander Duyck <alexander.du...@gmail.com> wrote: > > > >> On 09/29/2

Re: [MM PATCH V4.1 5/6] slub: support for bulk free with SLUB freelists

2015-10-05 Thread Jesper Dangaard Brouer
mpling happens. So you may > have large skid and the sampling points may be far away. Skylake has new > special FRONTEND_* PEBS events for this, but before it was often difficult. This testlab CPU is i7-4790K @ 4.00GHz. Maybe I should get a Skylake... p.s. thanks for your pmu-tools[1], eve

Re: [PATCH][iproute2] tc/q_htb.c: Fix the MPU value output in 'tc -d class show dev ' command

2015-12-16 Thread Jesper Dangaard Brouer
nt_size(hopt->rate.mpu, b2)); > + fprintf(f, "cburst %s/%u mpu %s ", > sprint_size(cbuffer, b1), > 1<<hopt->ceil.cell_log, > - sprint_size(hopt->ceil.mpu&0xFF, b2)

Re: [RFC PATCH 05/12] net: sched: per cpu gso handlers

2015-12-30 Thread Jesper Dangaard Brouer
is point the > skb has already been popped off the qdisc so it has to be handled > by the infrastructure. I generally like this idea of resolving this per cpu. (I stalled here, on the requeue issue, last time I implemented a lockless qdisc approach). -- Best regards, Jesper Dangaard Broue

Re: [PATCH 2/2] [iproute2] tc/q_htb.c: rename b4 buffer to b3 to make its name more consistent

2015-12-18 Thread Jesper Dangaard Brouer
On Fri, 18 Dec 2015 16:16:39 +0300 Dmitrii Shcherbakov <fw.dmit...@yandex.com> wrote: > b3 buffer has been deleted previously so b2 is followed by b4 which is not > consistent > > Signed-off-by: Dmitrii Shcherbakov <fw.dmit...@yandex.com> > --- Acked-by:

Re: [PATCH 1/2] [iproute2] tc/q_htb.c: remove printing of a deprecated overhead value previously encoded as a part of mpu field

2015-12-18 Thread Jesper Dangaard Brouer
overhead' field in the ratespec structure has been > introduced. > > Signed-off-by: Dmitrii Shcherbakov <fw.dmit...@yandex.com> > --- Acked-by: Jesper Dangaard Brouer <bro...@redhat.com> Thank you Dmitrii for cleaning this up :-) -- Best regards, Jesper Dangaard Brouer MSc.C

Re: WARN due to local_bh_disable called with interrupts disabled

2015-11-19 Thread Jesper Dangaard Brouer
)) __dev_kfree_skb_irq(skb, reason); else dev_kfree_skb(skb); } > -- > Employee of Qualcomm Innovation Center, Inc. > Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux > Foundation Collaborative Project -- Best reg

Re: [PATCH RFC v7 3/5] skb_array: array based FIFO for skbs

2016-06-03 Thread Jesper Dangaard Brouer
> + > +static inline int skb_array_peek_len(struct skb_array *a) > +{ > + return PTR_RING_PEEK_CALL(>ring, __skb_array_len_with_tag); > +} > + [...] -- Best regards, Jesper Dangaard Brouer MSc.CS, Principal Kernel Engineer at Red Hat Author of http://www.iptv-analyzer.org LinkedIn: http://www.linkedin.com/in/brouer

Re: [PATCH RFC v7 0/5] skb_array: array based FIFO for skbs

2016-06-03 Thread Jesper Dangaard Brouer
array_XXX APIs with a spinlock, > so this should not be an issue for them. I would like to see some bulking support... As my experiments[1] show that alf_queue (primarily) can beat skb_array due to bulking support. It seems like an obvious optimization for the virt tun use-case to bulk dequeue SK

Re: [PATCH RFC v7 1/5] ptr_ring: array based FIFO for pointers

2016-06-03 Thread Jesper Dangaard Brouer
in_lock_bh() > + ptr = __ptr_ring_consume(r); > + spin_unlock(&r->consumer_lock); and spin_unlock_bh() > + > + return ptr; > +} -- Best regards, Jesper Dangaard Brouer MSc.CS, Principal Kernel Engineer at Red Hat Author of http://www.iptv-analyzer.org LinkedIn: http://www.linkedin.com/in/brouer

Re: [PATCH v5 2/2] skb_array: ring test

2016-06-03 Thread Jesper Dangaard Brouer
On Thu, 2 Jun 2016 20:47:25 +0200 Jesper Dangaard Brouer <bro...@redhat.com> wrote: > On Tue, 24 May 2016 23:34:14 +0300 > "Michael S. Tsirkin" <m...@redhat.com> wrote: > > > On Tue, May 24, 2016 at 07:03:20PM +0200, Jesper Dangaard Brouer wrote: > >

Re: [PATCH RFC v7 3/5] skb_array: array based FIFO for skbs

2016-06-03 Thread Jesper Dangaard Brouer
ppers around ptr_array. ^ It is called "ptr_ring" not "ptr_array". -- Best regards, Jesper Dangaard Brouer MSc.CS, Principal Kernel Engineer at Red Hat Author of http://www.iptv-analyzer.org LinkedIn: http://www.linkedin.com/in/brouer

Re: [PATCH v8 0/5] skb_array: array based FIFO for skbs

2016-06-14 Thread Jesper Dangaard Brouer
> destroy now scans the array and frees all queued skbs > > changes since v5 > implemented a generic ptr_ring api, and > made skb_array a type-safe wrapper > apis for taking the spinlock in different contexts > following expected usecase

Re: [PATCH v8 5/5] skb_array: resize support

2016-06-14 Thread Jesper Dangaard Brouer
On Mon, 13 Jun 2016 23:54:50 +0300 "Michael S. Tsirkin" <m...@redhat.com> wrote: > Update skb_array after ptr_ring API changes. > > Signed-off-by: Michael S. Tsirkin <m...@redhat.com> Acked-by: Jesper Dangaard Brouer <bro...@redhat.com> Tested-by: Jesp

Re: [PATCH v8 4/5] ptr_ring: resize support

2016-06-14 Thread Jesper Dangaard Brouer
ains destructor callback such that > all pointers in queue can be cleaned up. > > This changes some APIs but we don't have any users yet, > so it won't break bisect. > > Signed-off-by: Michael S. Tsirkin <m...@redhat.com> Acked-by: Jesper Dangaard Brouer <bro...@redhat

Re: [PATCH v8 1/5] ptr_ring: array based FIFO for pointers

2016-06-14 Thread Jesper Dangaard Brouer
On Mon, 13 Jun 2016 23:54:31 +0300 "Michael S. Tsirkin" <m...@redhat.com> wrote: > A simple array based FIFO of pointers. Intended for net stack which > commonly has a single consumer/producer. > > Signed-off-by: Michael S. Tsirkin <m...@redhat.com> Ack

Re: [PATCH v8 3/5] skb_array: array based FIFO for skbs

2016-06-14 Thread Jesper Dangaard Brouer
kin <m...@redhat.com> Acked-by: Jesper Dangaard Brouer <bro...@redhat.com> Tested-by: Jesper Dangaard Brouer <bro...@redhat.com> -- Best regards, Jesper Dangaard Brouer MSc.CS, Principal Kernel Engineer at Red Hat Author of http://www.iptv-analyzer.org LinkedIn: http://www.linkedin.com/in/brouer

Re: [PATCH net-next 01/10] net_sched: add the ability to defer skb freeing

2016-06-15 Thread Jesper Dangaard Brouer
_list = NULL; > + > mutex_unlock(_mutex); > + > + while (head) { > + struct sk_buff *next = head->next; > + > + kfree_skb(head); > + cond_resched(); > + head = next; > + } > } This looks a lot like kfree_skb_list() What about bulk free'ing SKBs here? -- Best regards, Jesper Dangaard Brouer MSc.CS, Principal Kernel Engineer at Red Hat Author of http://www.iptv-analyzer.org LinkedIn: http://www.linkedin.com/in/brouer
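The bulk-free suggestion can be illustrated with a userspace toy: walk the list once, collect pointers into a small array, and release each full array with a single call. This is a sketch under stated assumptions, not the kernel code; `toy_node`, `toy_bulk_free()`, `toy_free_list_bulk()` and the batch size of 16 are all hypothetical, and the `cond_resched()` call from the quoted loop is left out.

```c
#include <stddef.h>
#include <stdlib.h>

#define TOY_BATCH 16	/* arbitrary batch size chosen for the example */

struct toy_node {
	struct toy_node *next;
};

/* Counts how many times the bulk primitive was invoked. */
size_t toy_bulk_calls;

/* Stand-in for a bulk-free primitive: releases n objects in one call. */
static void toy_bulk_free(size_t n, void **objs)
{
	toy_bulk_calls++;
	for (size_t i = 0; i < n; i++)
		free(objs[i]);
}

/* Walk a singly linked list and free its nodes in batches.
 * Returns the number of nodes freed. */
size_t toy_free_list_bulk(struct toy_node *head)
{
	void *batch[TOY_BATCH];
	size_t n = 0, total = 0;

	while (head) {
		struct toy_node *next = head->next; /* read before freeing */

		batch[n++] = head;
		if (n == TOY_BATCH) {		/* batch full: flush it */
			toy_bulk_free(n, batch);
			total += n;
			n = 0;
		}
		head = next;
	}
	if (n) {				/* flush the partial batch */
		toy_bulk_free(n, batch);
		total += n;
	}
	return total;
}
```

With a 40-node list the toy calls the bulk primitive three times (16 + 16 + 8) instead of making 40 individual free calls, which is the effect the question above is asking for.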

Re: [PATCH v5 2/2] skb_array: ring test

2016-06-02 Thread Jesper Dangaard Brouer
On Tue, 24 May 2016 23:34:14 +0300 "Michael S. Tsirkin" <m...@redhat.com> wrote: > On Tue, May 24, 2016 at 07:03:20PM +0200, Jesper Dangaard Brouer wrote: > > > > On Tue, 24 May 2016 12:28:09 +0200 > > Jesper Dangaard Brouer <bro...@redhat.com> wrot

Re: [PATCH net-next 4/4] net_sched: generalize bulk dequeue

2016-06-22 Thread Jesper Dangaard Brouer
pps > > Now we should work to add batches on the enqueue() side ;) Yes, please! :-))) That will be the next big step! > Signed-off-by: Eric Dumazet <eduma...@google.com> > Cc: John Fastabend <john.r.fastab...@intel.com> > Cc: Jesper Dangaard Brouer <bro...@redhat.co

Re: [PATCH net-next 1/4] net_sched: drop packets after root qdisc lock is released

2016-06-22 Thread Jesper Dangaard Brouer
uct sk_buff *skb, > struct Qdisc *q, > } > } > spin_unlock(root_lock); > + if (unlikely(to_free)) > + kfree_skb_list(to_free); Great, now there is a good argument for implementing kmem_cache bulk freeing inside kfree_skb_list(). I did an ugly

Re: [PATCH net-next 0/4] net_sched: bulk dequeue and deferred drops

2016-06-22 Thread Jesper Dangaard Brouer
series brings a nice qdisc performance increase (more than 80 % > in some cases). Thanks for working on this Eric! this is great work! :-) -- Best regards, Jesper Dangaard Brouer MSc.CS, Principal Kernel Engineer at Red Hat Author of http://www.iptv-analyzer.org LinkedIn: http://www.linkedin.com/in/brouer

Re: [PATCH net-next 0/4] net_sched: bulk dequeue and deferred drops

2016-06-22 Thread Jesper Dangaard Brouer
On Wed, 22 Jun 2016 07:55:43 -0700 Eric Dumazet <eric.duma...@gmail.com> wrote: > On Wed, 2016-06-22 at 16:47 +0200, Jesper Dangaard Brouer wrote: > > On Tue, 21 Jun 2016 23:16:48 -0700 > > Eric Dumazet <eduma...@google.com> wrote: > > > > >

Re: [PATCH net-next 0/4] net_sched: bulk dequeue and deferred drops

2016-06-23 Thread Jesper Dangaard Brouer
On Wed, 22 Jun 2016 09:49:48 -0700 Eric Dumazet <eric.duma...@gmail.com> wrote: > On Wed, 2016-06-22 at 17:44 +0200, Jesper Dangaard Brouer wrote: > > On Wed, 22 Jun 2016 07:55:43 -0700 > > Eric Dumazet <eric.duma...@gmail.com> wrote: > > > > >

[net-next PATCH V2 3/3] ixgbe: bulk free SKBs during TX completion cleanup cycle

2016-02-08 Thread Jesper Dangaard Brouer
1.1-4)) Joint work with Alexander Duyck. Signed-off-by: Alexander Duyck <alexander.h.du...@redhat.com> Signed-off-by: Jesper Dangaard Brouer <bro...@redhat.com> --- drivers/net/ethernet/intel/ixgbe/ixgbe_main.c |6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --gi

[net-next PATCH V2 2/3] net: bulk free SKBs that were delay free'ed due to IRQ context

2016-02-08 Thread Jesper Dangaard Brouer
needed. This is due to netpoll, which can call from IRQ context. Signed-off-by: Alexander Duyck <alexander.h.du...@redhat.com> Signed-off-by: Jesper Dangaard Brouer <bro...@redhat.com> --- include/linux/skbuff.h |1 + net/core/dev.c |8 +++- net/core/skbuff.c |8 +

[net-next PATCH V2 0/3] net: mitigating kmem_cache free slowpath

2016-02-08 Thread Jesper Dangaard Brouer
, e.g. replacing their calls to dev_kfree_skb() / dev_consume_skb_any(). Driver ixgbe is the first user of this new API. [1] http://thread.gmane.org/gmane.linux.network/384302/focus=397373 --- Jesper Dangaard Brouer (3): net: bulk free infrastructure for NAPI context, use napi_consume_skb

[net-next PATCH V2 1/3] net: bulk free infrastructure for NAPI context, use napi_consume_skb

2016-02-08 Thread Jesper Dangaard Brouer
is to see if budget is 0. In that case, we need to invoke dev_consume_skb_irq(). Joint work with Alexander Duyck. Signed-off-by: Jesper Dangaard Brouer <bro...@redhat.com> Signed-off-by: Alexander Duyck <alexander.h.du...@redhat.com> --- include/linux/skbuff.h |3 ++ ne
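The budget check described in this snippet can be modelled in userspace. The sketch below is a toy, not the kernel implementation: `toy_cache`, `toy_consume()`, `toy_flush()` and the cache size are hypothetical names and values, and the `irq_frees` counter merely stands in for the point where the message says dev_consume_skb_irq() is invoked.

```c
#include <stddef.h>

#define TOY_CACHE_SIZE 64	/* arbitrary size chosen for the example */

struct toy_cache {
	void *objs[TOY_CACHE_SIZE];
	size_t count;		/* objects queued for the next bulk free */
	size_t bulk_frees;	/* number of bulk flushes performed */
	size_t irq_frees;	/* objects handed to the IRQ-safe path */
};

/* Release everything queued so far with one (imaginary) bulk call. */
void toy_flush(struct toy_cache *c)
{
	if (c->count == 0)
		return;
	c->bulk_frees++;
	c->count = 0;
}

/* Queue an object for bulk free, unless the caller signals via a zero
 * budget that it is not running in the normal polling context. */
void toy_consume(struct toy_cache *c, void *obj, int budget)
{
	if (budget == 0) {
		c->irq_frees++;	/* stand-in for the IRQ-safe free path */
		return;
	}
	c->objs[c->count++] = obj;
	if (c->count == TOY_CACHE_SIZE)
		toy_flush(c);
}
```

The design point is that only callers with a non-zero budget touch the per-context array, so the array needs no locking of its own; a zero budget diverts the object before it can be queued.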

Re: [net-next PATCH 06/11] RFC: mlx5: RX bulking or bundling of packets before calling network stack

2016-02-10 Thread Jesper Dangaard Brouer
On Tue, 9 Feb 2016 13:57:41 +0200 Saeed Mahameed <sae...@dev.mellanox.co.il> wrote: > On Tue, Feb 2, 2016 at 11:13 PM, Jesper Dangaard Brouer > <bro...@redhat.com> wrote: > > There are several techniques/concepts combined in this optimization. > > It is both a da

Re: [patch net-next RFC 0/6] Introduce devlink interface and first drivers to use it

2016-02-05 Thread Jesper Dangaard Brouer
pecially it has been difficult to get people to really adopt "ip", which is also the worst search term on the Internet today... -- Best regards, Jesper Dangaard Brouer MSc.CS, Principal Kernel Engineer at Red Hat Author of http://www.iptv-analyzer.org LinkedIn: http://www.linkedin.com/in/brouer

Re: Bypass at packet-page level (Was: Optimizing instruction-cache, more packets at each stage)

2016-01-28 Thread Jesper Dangaard Brouer
On Thu, 28 Jan 2016 08:37:07 -0800 Tom Herbert <t...@herbertland.com> wrote: > On Thu, Jan 28, 2016 at 4:45 AM, Eric Dumazet <eric.duma...@gmail.com> wrote: > > On Thu, 2016-01-28 at 10:25 +0100, Jesper Dangaard Brouer wrote: > > > >> Yes, th

Re: Bypass at packet-page level (Was: Optimizing instruction-cache, more packets at each stage)

2016-01-28 Thread Jesper Dangaard Brouer
On Wed, 27 Jan 2016 18:50:27 -0800 Tom Herbert <t...@herbertland.com> wrote: > On Wed, Jan 27, 2016 at 12:47 PM, Jesper Dangaard Brouer > <bro...@redhat.com> wrote: > > On Mon, 25 Jan 2016 23:10:16 +0100 > > Jesper Dangaard Brouer <bro...@redhat.com> wrote: >

Re: Bypass at packet-page level (Was: Optimizing instruction-cache, more packets at each stage)

2016-01-28 Thread Jesper Dangaard Brouer
On Wed, 27 Jan 2016 13:56:03 -0800 Alexei Starovoitov <alexei.starovoi...@gmail.com> wrote: > On Wed, Jan 27, 2016 at 09:47:50PM +0100, Jesper Dangaard Brouer wrote: > > Sum: 18.75 % => calc: 30.0 ns (sum: 30.0 ns) => Total: 159.9 ns > > > > To get around the

[net-next PATCH 11/11] RFC: net: RPS bulk enqueue to backlog

2016-02-02 Thread Jesper Dangaard Brouer
NEED TO CLEAN UP PATCH (likely still contains bugs...) When enabling Receive Packet Steering (RPS) like : echo 32768 > /proc/sys/net/core/rps_sock_flow_entries for N in $(seq 0 7) ; do echo 4096 > /sys/class/net/${DEV}/queues/rx-$N/rps_flow_cnt echo f >

[net-next PATCH 10/11] RFC: net: API for RX handover of multiple SKBs to stack

2016-02-02 Thread Jesper Dangaard Brouer
maintains a qlen, which is unnecessary in this hotpath code. A simple list within the first SKB could be a minimum solution. Signed-off-by: Jesper Dangaard Brouer <bro...@redhat.com> --- drivers/net/ethernet/mellanox/mlx5/core/en_rx.c | 12 +--- include/linux/netde

[net-next PATCH 09/11] RFC: dummy: bulk free SKBs

2016-02-02 Thread Jesper Dangaard Brouer
Normal TX completion uses napi_consume_skb(), thus also make the dummy driver use this, as it makes it easier to see the effect of bulk freeing SKBs. --- drivers/net/dummy.c |3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/net/dummy.c b/drivers/net/dummy.c index

[net-next PATCH 08/11] mlx5: hint the NAPI alloc skb API about the expected bulk size

2016-02-02 Thread Jesper Dangaard Brouer
Use the newly introduced napi_alloc_skb_hint() API, to get the underlying slab bulk allocation sizes to align with what the mlx5 driver needs for refilling its RX ring queue. Signed-off-by: Jesper Dangaard Brouer <bro...@redhat.com> --- drivers/net/ethernet/mellanox/mlx5/core/en.h

[net-next PATCH 07/11] net: introduce napi_alloc_skb_hint() for more use-cases

2016-02-02 Thread Jesper Dangaard Brouer
is the mlx5 driver, which bulk re-populates its RX ring with both SKBs and pages. Thus, it would like to work with bigger bulk alloc chunks. Signed-off-by: Jesper Dangaard Brouer <bro...@redhat.com> --- include/linux/skbuff.h | 19 +++ net/core/skbuff.c |8 +++--

[net-next PATCH 06/11] RFC: mlx5: RX bulking or bundling of packets before calling network stack

2016-02-02 Thread Jesper Dangaard Brouer
packet from the RX ring and starting the prefetching, and the second loop calling eth_type_trans() and invoking the stack via napi_gro_receive(). Signed-off-by: Jesper Dangaard Brouer <bro...@redhat.com> Notes: This is the patch that gave a speed up of 6.2Mpps to 12Mpps, when trying to m
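The two-loop structure described in this snippet can be sketched as a userspace toy. Everything here is an assumption made for illustration: `toy_pkt`, `toy_deliver()`, `toy_rx_poll()` and the bundle size of 8 are hypothetical, `toy_deliver()` merely stands in for the per-packet work that the patch does with eth_type_trans() and napi_gro_receive(), and `__builtin_prefetch` is the GCC/Clang builtin rather than the kernel's prefetch helper.

```c
#include <stddef.h>

#define TOY_BUNDLE 8	/* arbitrary bundle size chosen for the example */

struct toy_pkt {
	int len;
	int handled;
};

/* Stage 2 stand-in for the per-packet stack work. */
static void toy_deliver(struct toy_pkt *pkt)
{
	pkt->handled = 1;
}

/* Process up to `budget` packets from ring[0..avail) in bundles.
 * Returns the number of packets processed. */
size_t toy_rx_poll(struct toy_pkt *ring, size_t avail, size_t budget)
{
	struct toy_pkt *bundle[TOY_BUNDLE];
	size_t done = 0;

	if (budget < avail)
		avail = budget;

	while (done < avail) {
		size_t n = 0;

		/* Loop 1: pull packets off the ring and start prefetching. */
		while (n < TOY_BUNDLE && done + n < avail) {
			bundle[n] = &ring[done + n];
			__builtin_prefetch(bundle[n]);
			n++;
		}

		/* Loop 2: do the per-packet work on the whole bundle. */
		for (size_t i = 0; i < n; i++)
			toy_deliver(bundle[i]);

		done += n;
	}
	return done;
}
```

Splitting the work this way gives each prefetch time to complete before the second loop touches the packet, and keeps the second loop's code hot in the instruction cache for the whole bundle.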

[net-next PATCH 05/11] mlx5: use napi_*_skb APIs to get bulk alloc and free

2016-02-02 Thread Jesper Dangaard Brouer
allocation, knowing the size of objects it needs to refill. For now, just use the default bulk size hidden inside napi_alloc_skb(). Signed-off-by: Jesper Dangaard Brouer <bro...@redhat.com> --- drivers/net/ethernet/mellanox/mlx5/core/en.h |4 ++-- drivers/net/ethernet/mellanox/mlx

[net-next PATCH 01/11] net: bulk free infrastructure for NAPI context, use napi_consume_skb

2016-02-02 Thread Jesper Dangaard Brouer
is to see if budget is 0. In that case, we need to invoke dev_consume_skb_irq(). Joint work with Alexander Duyck. Signed-off-by: Jesper Dangaard Brouer <bro...@redhat.com> Signed-off-by: Alexander Duyck <alexander.h.du...@redhat.com> --- include/linux/skbuff.h |3 ++ ne

[net-next PATCH 02/11] net: bulk free SKBs that were delay free'ed due to IRQ context

2016-02-02 Thread Jesper Dangaard Brouer
needed. This is due to netpoll, which can call from IRQ context. Signed-off-by: Alexander Duyck <alexander.h.du...@redhat.com> Signed-off-by: Jesper Dangaard Brouer <bro...@redhat.com> --- include/linux/skbuff.h |1 + net/core/dev.c |8 +++- net/core/skbuff.c |8 +

[net-next PATCH 00/11] net: mitigating kmem_cache slowpath and BoF discussion patches

2016-02-02 Thread Jesper Dangaard Brouer
oF [1] [1] http://netdevconf.org/1.1/bof-network-performance-bof-jesper-dangaard-brouer.html [2] http://thread.gmane.org/gmane.linux.network/384302/ --- Jesper Dangaard Brouer (11): net: bulk free infrastructure for NAPI context, use napi_consume_skb net: bulk free SKBs that were del

[net-next PATCH 04/11] net: bulk alloc and reuse of SKBs in NAPI context

2016-02-02 Thread Jesper Dangaard Brouer
t (normally hidden by prefetch) * In case RX queue is not full, alloc and free more SKBs than needed More testing is needed with more real life benchmarks. Joint work with Alexander Duyck. Signed-off-by: Jesper Dangaard Brouer <bro...@redhat.com> Signed-off-by: Alexander Duyck <alexander.

[net-next PATCH 03/11] ixgbe: bulk free SKBs during TX completion cleanup cycle

2016-02-02 Thread Jesper Dangaard Brouer
1.1-4)) Joint work with Alexander Duyck. Signed-off-by: Alexander Duyck <alexander.h.du...@redhat.com> Signed-off-by: Jesper Dangaard Brouer <bro...@redhat.com> --- drivers/net/ethernet/intel/ixgbe/ixgbe_main.c |6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --gi

Re: [net-next PATCH 04/11] net: bulk alloc and reuse of SKBs in NAPI context

2016-02-03 Thread Jesper Dangaard Brouer
On Tue, 2 Feb 2016 16:52:50 -0800 Alexei Starovoitov <alexei.starovoi...@gmail.com> wrote: > On Tue, Feb 02, 2016 at 10:12:01PM +0100, Jesper Dangaard Brouer wrote: > > Think twice before applying > > - This patch can potentially introduce added latency in some workload

Re: [patch net-next RFC 0/6] Introduce devlink interface and first drivers to use it

2016-02-03 Thread Jesper Dangaard Brouer
port set DEV/PORT_INDEX [ type { eth | ib | auto} ] > > butter:~$ dl port show > devlink0/1: type ib ibdev mlx4_0 > devlink0/2: type ib ibdev mlx4_0 > > butter:~$ sudo dl port set devlink0/1 type eth > > butter:~$ dl port show > devlink0/1: type eth netdev ens4 > d

Re: Bypass at packet-page level (Was: Optimizing instruction-cache, more packets at each stage)

2016-01-27 Thread Jesper Dangaard Brouer
On Mon, 25 Jan 2016 23:10:16 +0100 Jesper Dangaard Brouer <bro...@redhat.com> wrote: > On Mon, 25 Jan 2016 09:50:16 -0800 John Fastabend <john.fastab...@gmail.com> > wrote: > > > On 16-01-25 09:09 AM, Tom Herbert wrote: > > > On Mon, Jan 25, 2016 at 5:15 A

Re: Optimizing instruction-cache, more packets at each stage

2016-01-21 Thread Jesper Dangaard Brouer
On Thu, 21 Jan 2016 14:49:25 +0200 Or Gerlitz <gerlitz...@gmail.com> wrote: > On Thu, Jan 21, 2016 at 1:27 PM, Jesper Dangaard Brouer > <bro...@redhat.com> wrote: > > On Wed, 20 Jan 2016 15:27:38 -0800 Tom Herbert <t...@herbertland.com> wrote: > >

Re: Optimizing instruction-cache, more packets at each stage

2016-01-21 Thread Jesper Dangaard Brouer
between packets: * 10 Gbit/s -> 67.2 nanosec * 40 Gbit/s -> 16.8 nanosec * 100 Gbit/s -> 6.7 nanosec Adding such a per packet cost is not going to fly. -- Best regards, Jesper Dangaard Brouer MSc.CS, Principal Kernel Engineer at Red Hat Author of http://www.iptv-analy
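The per-packet time budgets listed here are consistent with minimum-size Ethernet frames. The snippet does not say so, which makes this an assumption: 84 bytes on the wire, i.e. a 64-byte frame plus 20 bytes of preamble, start-of-frame delimiter and inter-frame gap. A minimal sketch of the arithmetic, with a hypothetical helper name:

```c
/* Assumption: minimum-size Ethernet frame, 64 bytes plus 20 bytes of
 * preamble, start-of-frame delimiter and inter-frame gap. */
#define MIN_WIRE_BYTES 84

/*
 * Nanoseconds between back-to-back minimum-size packets at a link rate.
 *
 *   bits / (gbit_per_s * 1e9 bit/s) * 1e9 ns/s  ==  bits / gbit_per_s
 *
 * Hypothetical helper, named for this example only.
 */
double ns_per_packet(double gbit_per_s)
{
	return (MIN_WIRE_BYTES * 8) / gbit_per_s;
}
```

Under that assumption the helper gives 67.2 ns at 10 Gbit/s, 16.8 ns at 40 Gbit/s and 6.72 ns at 100 Gbit/s, matching the listed values after rounding.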

Re: Optimizing instruction-cache, more packets at each stage

2016-01-21 Thread Jesper Dangaard Brouer
st match the devices dev->dev_addr (else a SW compare is required). Is that doable in hardware? -- Best regards, Jesper Dangaard Brouer MSc.CS, Principal Kernel Engineer at Red Hat Author of http://www.iptv-analyzer.org LinkedIn: http://www.linkedin.com/in/brouer

Re: Tools for sampling ethtool --statistics

2016-01-21 Thread Jesper Dangaard Brouer
On Thu, 21 Jan 2016 00:34:16 +0200 Or Gerlitz <gerlitz...@gmail.com> wrote: > On Wed, Jan 20, 2016 at 11:13 AM, Jesper Dangaard Brouer > <bro...@redhat.com> wrote: > > Hi All, > > > > I wrote a small tool[1] to extract ethtool --statistics|-S, sample and >

Re: Optimizing instruction-cache, more packets at each stage

2016-01-22 Thread Jesper Dangaard Brouer
"slowpath" case: SLUB => 117 cycles(tsc) 29.276 ns SLAB => 101 cycles(tsc) 25.342 ns I've addressed this "slowpath" problem in the SLUB and SLAB allocators, by introducing a bulk API, which amortize the needed sync-mechanisms. Kmem_cache using bulk API: SLUB =>

Bypass at packet-page level (Was: Optimizing instruction-cache, more packets at each stage)

2016-01-25 Thread Jesper Dangaard Brouer
6-01-24 06:44 AM, Michael S. Tsirkin wrote: > > On Sun, Jan 24, 2016 at 03:28:14PM +0100, Jesper Dangaard Brouer wrote: > >> On Thu, 21 Jan 2016 10:54:01 -0800 (PST) > >> David Miller <da...@davemloft.net> wrote: > >> > >>> From: Jesper Danga

Re: Bypass at packet-page level (Was: Optimizing instruction-cache, more packets at each stage)

2016-01-25 Thread Jesper Dangaard Brouer
On Mon, 25 Jan 2016 09:50:16 -0800 John Fastabend <john.fastab...@gmail.com> wrote: > On 16-01-25 09:09 AM, Tom Herbert wrote: > > On Mon, Jan 25, 2016 at 5:15 AM, Jesper Dangaard Brouer > > <bro...@redhat.com> wrote: > >> [...] > >> > &g

Re: Optimizing instruction-cache, more packets at each stage

2016-01-24 Thread Jesper Dangaard Brouer
On Thu, 21 Jan 2016 10:54:01 -0800 (PST) David Miller <da...@davemloft.net> wrote: > From: Jesper Dangaard Brouer <bro...@redhat.com> > Date: Thu, 21 Jan 2016 12:27:30 +0100 > > > eth_type_trans() does two things: > > > > 1) determine skb->protocol >

Re: Optimizing instruction-cache, more packets at each stage

2016-01-22 Thread Jesper Dangaard Brouer
On Fri, 22 Jan 2016 09:07:43 -0800 Tom Herbert <t...@herbertland.com> wrote: > On Fri, Jan 22, 2016 at 4:33 AM, Jesper Dangaard Brouer > <bro...@redhat.com> wrote: > > On Thu, 21 Jan 2016 09:48:36 -0800 > > Eric Dumazet <eric.duma...@gmail.com> wrote: > >

[net-next PATCH V2 0/3] net: bulk free adjustment and two driver use-cases

2016-03-10 Thread Jesper Dangaard Brouer
.gmane.org/gmane.linux.network/402503/focus=403386 Patchset based on net-next at commit 3ebeac1d0295 --- Jesper Dangaard Brouer (3): net: adjust napi_consume_skb to handle none-NAPI callers mlx4: use napi_consume_skb API to get bulk free operations mlx5: use napi_consume_skb API t

[net-next PATCH V2 3/3] mlx5: use napi_consume_skb API to get bulk free operations

2016-03-10 Thread Jesper Dangaard Brouer
Bulk free of SKBs happens transparently by the API call napi_consume_skb(). The napi budget parameter is needed by napi_consume_skb() to detect if called from netpoll. Signed-off-by: Jesper Dangaard Brouer <bro...@redhat.com> --- drivers/net/ethernet/mellanox/mlx5/core/en.h

[net-next PATCH V2 1/3] net: adjust napi_consume_skb to handle none-NAPI callers

2016-03-10 Thread Jesper Dangaard Brouer
not originating from NAPI/softirq. Simply handled by using dev_consume_skb_any(). This adds an extra branch+call for the netpoll case (checking in_irq() + irqs_disabled()), but that is okay as this is a slowpath. Suggested-by: Alexander Duyck <adu...@mirantis.com> Signed-off-by: Jesper Dangaard Broue

[net-next PATCH V2 2/3] mlx4: use napi_consume_skb API to get bulk free operations

2016-03-10 Thread Jesper Dangaard Brouer
for the function call that needed this distinction. Signed-off-by: Jesper Dangaard Brouer <bro...@redhat.com> --- drivers/net/ethernet/mellanox/mlx4/en_tx.c | 16 ++-- 1 file changed, 10 insertions(+), 6 deletions(-) diff --git a/drivers/net/ethernet/mellanox/mlx4/en_tx.c b/d

[net-next PATCH V3 1/3] net: adjust napi_consume_skb to handle none-NAPI callers

2016-03-10 Thread Jesper Dangaard Brouer
not originating from NAPI/softirq. Simply handled by using dev_consume_skb_any(). This adds an extra branch+call for the netpoll case (checking in_irq() + irqs_disabled()), but that is okay as this is a slowpath. Suggested-by: Alexander Duyck <adu...@mirantis.com> Signed-off-by: Jesper Dangaard Broue

[net-next PATCH V3 0/3] net: bulk free adjustment and two driver use-cases

2016-03-10 Thread Jesper Dangaard Brouer
.gmane.org/gmane.linux.network/402503/focus=403386 Patchset based on net-next at commit 3ebeac1d0295 V3: spelling fixes from Sergei --- Jesper Dangaard Brouer (3): net: adjust napi_consume_skb to handle none-NAPI callers mlx4: use napi_consume_skb API to get bulk free operations

[net-next PATCH V3 3/3] mlx5: use napi_consume_skb API to get bulk free operations

2016-03-10 Thread Jesper Dangaard Brouer
Bulk free of SKBs happens transparently by the API call napi_consume_skb(). The napi budget parameter is needed by napi_consume_skb() to detect if called from netpoll. Signed-off-by: Jesper Dangaard Brouer <bro...@redhat.com> --- drivers/net/ethernet/mellanox/mlx5/core/en.h

[net-next PATCH V3 2/3] mlx4: use napi_consume_skb API to get bulk free operations

2016-03-10 Thread Jesper Dangaard Brouer
for the function call that needed this distinction. Signed-off-by: Jesper Dangaard Brouer <bro...@redhat.com> --- drivers/net/ethernet/mellanox/mlx4/en_tx.c | 16 ++-- 1 file changed, 10 insertions(+), 6 deletions(-) diff --git a/drivers/net/ethernet/mellanox/mlx4/en_tx.c b/d

[net-next PATCH V4 1/3] net: adjust napi_consume_skb to handle non-NAPI callers

2016-03-11 Thread Jesper Dangaard Brouer
from NAPI/softirq. Simply handled by using dev_consume_skb_any(). This adds an extra branch+call for the netpoll case (checking in_irq() + irqs_disabled()), but that is okay as this is a slowpath. Suggested-by: Alexander Duyck <adu...@mirantis.com> Signed-off-by: Jesper Dangaard Broue

[net-next PATCH V4 3/3] mlx5: use napi_consume_skb API to get bulk free operations

2016-03-11 Thread Jesper Dangaard Brouer
Bulk free of SKBs happens transparently by the API call napi_consume_skb(). The napi budget parameter is needed by napi_consume_skb() to detect if called from netpoll. Signed-off-by: Jesper Dangaard Brouer <bro...@redhat.com> --- drivers/net/ethernet/mellanox/mlx5/core/en.h

[net-next PATCH V4 2/3] mlx4: use napi_consume_skb API to get bulk free operations

2016-03-11 Thread Jesper Dangaard Brouer
for the function call that needed this distinction. Signed-off-by: Jesper Dangaard Brouer <bro...@redhat.com> --- drivers/net/ethernet/mellanox/mlx4/en_tx.c | 15 +-- 1 file changed, 9 insertions(+), 6 deletions(-) diff --git a/drivers/net/ethernet/mellanox/mlx4/en_tx.c b/drive
