Re: [PATCH 0/7] slub: Fastpath optimization (especially for RT) V1

2014-12-18 Joonsoo Kim
2014-12-18 23:57 GMT+09:00 Christoph Lameter:
>
> On Thu, 18 Dec 2014, Joonsoo Kim wrote:
>> > Good idea. How does this affect the !CONFIG_PREEMPT case?
>>
>> One more this_cpu_xxx makes fastpath slow if !CONFIG_PREEMPT.
>> Roughly 3~5%.
>>
>> We can deal with each case separately although it

Re: [PATCH 0/7] slub: Fastpath optimization (especially for RT) V1

2014-12-18 Christoph Lameter
On Thu, 18 Dec 2014, Joonsoo Kim wrote:

> > Good idea. How does this affect the !CONFIG_PREEMPT case?
>
> One more this_cpu_xxx makes fastpath slow if !CONFIG_PREEMPT.
> Roughly 3~5%.
>
> We can deal with each case separately although it looks dirty.

Ok maybe you can come up with a solution that is as

Re: [PATCH 0/7] slub: Fastpath optimization (especially for RT) V1

2014-12-18 Joonsoo Kim
2014-12-18 1:10 GMT+09:00 Christoph Lameter:
> On Wed, 17 Dec 2014, Joonsoo Kim wrote:
>
>> + do {
>> + tid = this_cpu_read(s->cpu_slab->tid);
>> + c = this_cpu_ptr(s->cpu_slab);
>> + } while (IS_ENABLED(CONFIG_PREEMPT) && unlikely(tid != c->tid));
>
>

Re: [PATCH 0/7] slub: Fastpath optimization (especially for RT) V1

2014-12-18 Joonsoo Kim
2014-12-18 0:36 GMT+09:00 Christoph Lameter:
> On Wed, 17 Dec 2014, Joonsoo Kim wrote:
>
>> Ping... and I found another way to remove preempt_disable/enable
>> without complex changes.
>>
>> What we want to ensure is getting tid and kmem_cache_cpu
>> on the same cpu. We can achieve that goal with

Re: [PATCH 0/7] slub: Fastpath optimization (especially for RT) V1

2014-12-18 Joonsoo Kim
2014-12-17 21:08 GMT+09:00 Jesper Dangaard Brouer:
> On Wed, 17 Dec 2014 16:13:49 +0900 Joonsoo Kim wrote:
>
>> Ping... and I found another way to remove preempt_disable/enable
>> without complex changes.
>>
>> What we want to ensure is getting tid and kmem_cache_cpu
>> on the same cpu. We can

Re: [PATCH 0/7] slub: Fastpath optimization (especially for RT) V1

2014-12-17 Christoph Lameter
On Wed, 17 Dec 2014, Christoph Lameter wrote:

> On Wed, 17 Dec 2014, Joonsoo Kim wrote:
>
> > + do {
> > + tid = this_cpu_read(s->cpu_slab->tid);
> > + c = this_cpu_ptr(s->cpu_slab);
> > + } while (IS_ENABLED(CONFIG_PREEMPT) && unlikely(tid != c->tid));

Here is another

Re: [PATCH 0/7] slub: Fastpath optimization (especially for RT) V1

2014-12-17 Christoph Lameter
On Wed, 17 Dec 2014, Joonsoo Kim wrote:

> + do {
> + tid = this_cpu_read(s->cpu_slab->tid);
> + c = this_cpu_ptr(s->cpu_slab);
> + } while (IS_ENABLED(CONFIG_PREEMPT) && unlikely(tid != c->tid));

Assembly code produced is a bit weird. I think the compiler undoes

Re: [PATCH 0/7] slub: Fastpath optimization (especially for RT) V1

2014-12-17 Christoph Lameter
On Wed, 17 Dec 2014, Joonsoo Kim wrote:

> Ping... and I found another way to remove preempt_disable/enable
> without complex changes.
>
> What we want to ensure is getting tid and kmem_cache_cpu
> on the same cpu. We can achieve that goal with the condition loop below.
>
> I ran Jesper's benchmark and saw

Re: [PATCH 0/7] slub: Fastpath optimization (especially for RT) V1

2014-12-17 Jesper Dangaard Brouer
On Wed, 17 Dec 2014 16:13:49 +0900 Joonsoo Kim wrote:

> Ping... and I found another way to remove preempt_disable/enable
> without complex changes.
>
> What we want to ensure is getting tid and kmem_cache_cpu
> on the same cpu. We can achieve that goal with the condition loop below.
>
> I ran

Re: [PATCH 0/7] slub: Fastpath optimization (especially for RT) V1

2014-12-16 Joonsoo Kim
2014-12-15 16:59 GMT+09:00 Joonsoo Kim:
> On Wed, Dec 10, 2014 at 10:30:17AM -0600, Christoph Lameter wrote:
>> We had to insert a preempt enable/disable in the fastpath a while ago. This
>> was mainly due to a lot of state that is kept to be allocating from the per
>> cpu freelist. In particular

Re: [PATCH 0/7] slub: Fastpath optimization (especially for RT) V1

2014-12-14 Joonsoo Kim
On Wed, Dec 10, 2014 at 10:30:17AM -0600, Christoph Lameter wrote:
> We had to insert a preempt enable/disable in the fastpath a while ago. This
> was mainly due to a lot of state that is kept to be allocating from the per
> cpu freelist. In particular the page field is not covered by
>

Re: [PATCH 0/7] slub: Fastpath optimization (especially for RT) V1

2014-12-12 Christoph Lameter
On Fri, 12 Dec 2014, Jesper Dangaard Brouer wrote:

> Crash/OOM during IP-forwarding network overload test[1] with pktgen,
> single flow thus activating a single CPU on target (device under test).

Hmmm... Bisected it and the patch that removes the page pointer from
kmem_cache_cpu causes in a memory

Re: [PATCH 0/7] slub: Fastpath optimization (especially for RT) V1

2014-12-12 Jesper Dangaard Brouer
On Thu, 11 Dec 2014 18:37:58 +0100 Jesper Dangaard Brouer wrote:

> Warning, I'm getting crashes with this patchset, during my network load
> testing.
> I don't have a nice crash dump to show, yet, but it is in the slub code.

Crash/OOM during IP-forwarding network overload test[1] with pktgen,

Re: [PATCH 0/7] slub: Fastpath optimization (especially for RT) V1

2014-12-11 Jesper Dangaard Brouer
On Thu, 11 Dec 2014 11:18:31 -0600 (CST) Christoph Lameter wrote:

> On Thu, 11 Dec 2014, Jesper Dangaard Brouer wrote:
>
> > I was expecting to see at least (specifically) 4.291 ns improvement, as
> > this is the measured[1] cost of preempt_{disable,enable} on my system.
>
> Right. Those calls

Re: [PATCH 0/7] slub: Fastpath optimization (especially for RT) V1

2014-12-11 Jesper Dangaard Brouer
Warning, I'm getting crashes with this patchset, during my network load
testing.
I don't have a nice crash dump to show, yet, but it is in the slub code.

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Sr. Network Kernel Developer at Red Hat
  Author of http://www.iptv-analyzer.org

Re: [PATCH 0/7] slub: Fastpath optimization (especially for RT) V1

2014-12-11 Christoph Lameter
On Thu, 11 Dec 2014, Jesper Dangaard Brouer wrote:

> I was expecting to see at least (specifically) 4.291 ns improvement, as
> this is the measured[1] cost of preempt_{disable,enable} on my system.

Right. Those calls are taken out of the fastpaths by this patchset for the
CONFIG_PREEMPT case. So

Re: [PATCH 0/7] slub: Fastpath optimization (especially for RT) V1

2014-12-11 Jesper Dangaard Brouer
On Thu, 11 Dec 2014 09:03:24 -0600 (CST) Christoph Lameter wrote:

> On Thu, 11 Dec 2014, Jesper Dangaard Brouer wrote:
>
> > It looks like an impressive saving 116 -> 60 cycles. I just don't see
> > the same kind of improvements with my similar tests[1][2].
>
> This is particularly for a

Re: [PATCH 0/7] slub: Fastpath optimization (especially for RT) V1

2014-12-11 Christoph Lameter
On Thu, 11 Dec 2014, Jesper Dangaard Brouer wrote:

> It looks like an impressive saving 116 -> 60 cycles. I just don't see
> the same kind of improvements with my similar tests[1][2].

This is particularly for a CONFIG_PREEMPT kernel. There will be no effect
on !CONFIG_PREEMPT I hope.

> I do see
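For reference, the 116 -> 60 cycle figures quoted here work out to the following relative saving. Assuming they come from the cover letter's CONFIG_PREEMPT fastpath benchmark, this sits at the upper end of the 20%-50% improvement that message states:

```latex
\frac{116 - 60}{116} = \frac{56}{116} \approx 0.483 \quad (\approx 48\%)
```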

Re: [PATCH 0/7] slub: Fastpath optimization (especially for RT) V1

2014-12-11 Jesper Dangaard Brouer
On Wed, 10 Dec 2014 10:30:17 -0600 Christoph Lameter wrote:
[...]
>
> Slab Benchmarks on a kernel with CONFIG_PREEMPT show an improvement of
> 20%-50% of fastpath latency:
>
> Before:
>
> Single thread testing
[...]
> 2. Kmalloc: alloc/free test
[...]
> 1 times kmalloc(256)/kfree -> 116

[PATCH 0/7] slub: Fastpath optimization (especially for RT) V1

2014-12-10 Christoph Lameter
We had to insert a preempt enable/disable in the fastpath a while ago. This was mainly due to a lot of state that is kept to be allocating from the per cpu freelist. In particular the page field is not covered by this_cpu_cmpxchg used in the fastpath to do the necessary atomic state change for
