Re: [PATCH v2 1/2] mm/slub: optimize alloc/free fastpath by removing preemption on/off

2015-01-16 Thread Steven Rostedt
On Fri, 16 Jan 2015 05:40:59 -0800 Eric Dumazet wrote: > I made same observation about 3 years ago, on old cpus. > Thank you for letting me know. I was thinking I was going insane! (yeah yeah, there's lots of people who will still say that I've already gone insane, but at least I know my

Re: [PATCH v2 1/2] mm/slub: optimize alloc/free fastpath by removing preemption on/off

2015-01-16 Thread Eric Dumazet
On Thu, 2015-01-15 at 23:07 -0500, Steven Rostedt wrote: > On Thu, 15 Jan 2015 21:57:58 -0600 (CST) > Christoph Lameter wrote: > > > > I get: > > > > > > mov %gs:0x18(%rax),%rdx > > > > > > Looks to me that %gs is used. > > > > %gs is used as a segment prefix. That does not add

Re: [PATCH v2 1/2] mm/slub: optimize alloc/free fastpath by removing preemption on/off

2015-01-15 Thread Steven Rostedt
On Thu, 15 Jan 2015 21:57:58 -0600 (CST) Christoph Lameter wrote: > > I get: > > > > mov %gs:0x18(%rax),%rdx > > > > Looks to me that %gs is used. > > %gs is used as a segment prefix. That does not add significant cycles. > Retrieving the content of %gs and loading it into

Re: [PATCH v2 1/2] mm/slub: optimize alloc/free fastpath by removing preemption on/off

2015-01-15 Thread Steven Rostedt
On Thu, 15 Jan 2015 22:51:30 -0500 Steven Rostedt wrote: > > I haven't done benchmarks in a while, so perhaps accessing the %gs > segment isn't as expensive as I saw it before. I'll have to profile > function tracing on my i7 and see where things are slow again. I just ran it on my i7, and

Re: [PATCH v2 1/2] mm/slub: optimize alloc/free fastpath by removing preemption on/off

2015-01-15 Thread Christoph Lameter
> I get: > > mov %gs:0x18(%rax),%rdx > > Looks to me that %gs is used. %gs is used as a segment prefix. That does not add significant cycles. Retrieving the content of %gs and loading it into another register would be expensive in terms of cpu cycles.

Re: [PATCH v2 1/2] mm/slub: optimize alloc/free fastpath by removing preemption on/off

2015-01-15 Thread Steven Rostedt
On Thu, 15 Jan 2015 21:27:14 -0600 (CST) Christoph Lameter wrote: > > The %gs register is not used since the address of the per cpu area is > available as one of the first fields in the per cpu areas. Have you disassembled your code? Looking at put_cpu_partial() from 3.19-rc3 where it does:

Re: [PATCH v2 1/2] mm/slub: optimize alloc/free fastpath by removing preemption on/off

2015-01-15 Thread Christoph Lameter
On Thu, 15 Jan 2015, Andrew Morton wrote: > > I saw roughly 5% win in a fast-path loop over kmem_cache_alloc/free > > in CONFIG_PREEMPT. (14.821 ns -> 14.049 ns) > > I'm surprised. preempt_disable/enable are pretty fast. I wonder why > this makes a measurable difference. Perhaps

Re: [PATCH v2 1/2] mm/slub: optimize alloc/free fastpath by removing preemption on/off

2015-01-15 Thread Christoph Lameter
On Thu, 15 Jan 2015, Steven Rostedt wrote: > profiling function tracing I discovered that accessing preempt_count > was actually quite expensive, even just to read. But it may not be as > bad since Peter Zijlstra converted preempt_count to a per_cpu variable. > Although, IIRC, the perf profiling

Re: [PATCH v2 1/2] mm/slub: optimize alloc/free fastpath by removing preemption on/off

2015-01-15 Thread Steven Rostedt
On Thu, 15 Jan 2015 17:16:34 -0800 Andrew Morton wrote: > > I saw roughly 5% win in a fast-path loop over kmem_cache_alloc/free > > in CONFIG_PREEMPT. (14.821 ns -> 14.049 ns) > > I'm surprised. preempt_disable/enable are pretty fast. I wonder why > this makes a measurable difference.

Re: [PATCH v2 1/2] mm/slub: optimize alloc/free fastpath by removing preemption on/off

2015-01-15 Thread Andrew Morton
On Thu, 15 Jan 2015 16:40:32 +0900 Joonsoo Kim wrote: > We had to insert a preempt enable/disable in the fastpath a while ago > in order to guarantee that tid and kmem_cache_cpu are retrieved on the > same cpu. It is the problem only for CONFIG_PREEMPT in which scheduler > can move the process

Re: [PATCH v2 1/2] mm/slub: optimize alloc/free fastpath by removing preemption on/off

2015-01-15 Thread Jesper Dangaard Brouer
On Thu, 15 Jan 2015 16:40:32 +0900 Joonsoo Kim wrote: [...] > > I saw roughly 5% win in a fast-path loop over kmem_cache_alloc/free > in CONFIG_PREEMPT. (14.821 ns -> 14.049 ns) > > Below is the result of Christoph's slab_test reported by > Jesper Dangaard Brouer. > [...] Acked-by: Jesper

[PATCH v2 1/2] mm/slub: optimize alloc/free fastpath by removing preemption on/off

2015-01-14 Thread Joonsoo Kim
We had to insert a preempt enable/disable pair in the fastpath a while ago in order to guarantee that tid and kmem_cache_cpu are retrieved on the same cpu. This is a problem only for CONFIG_PREEMPT, where the scheduler can move the process to another cpu while the data is being retrieved. Now, I have reached the solution
