On Fri, 16 Jan 2015 05:40:59 -0800
Eric Dumazet wrote:
> I made the same observation about 3 years ago, on old cpus.
>
Thank you for letting me know. I was thinking I was going insane!
(yeah yeah, there's lots of people who will still say that I've already
gone insane, but at least I know my
On Thu, 2015-01-15 at 23:07 -0500, Steven Rostedt wrote:
> On Thu, 15 Jan 2015 21:57:58 -0600 (CST)
> Christoph Lameter wrote:
>
> > > I get:
> > >
> > > mov %gs:0x18(%rax),%rdx
> > >
> > > Looks to me that %gs is used.
> >
> > %gs is used as a segment prefix. That does not add
On Thu, 15 Jan 2015 21:57:58 -0600 (CST)
Christoph Lameter wrote:
> > I get:
> >
> > mov %gs:0x18(%rax),%rdx
> >
> > Looks to me that %gs is used.
>
> %gs is used as a segment prefix. That does not add significant cycles.
> Retrieving the content of %gs and loading it into
On Thu, 15 Jan 2015 22:51:30 -0500
Steven Rostedt wrote:
>
> I haven't done benchmarks in a while, so perhaps accessing the %gs
> segment isn't as expensive as I saw it before. I'll have to profile
> function tracing on my i7 and see where things are slow again.
I just ran it on my i7, and
> I get:
>
> mov %gs:0x18(%rax),%rdx
>
> Looks to me that %gs is used.
%gs is used as a segment prefix. That does not add significant cycles.
Retrieving the content of %gs and loading it into another register would
be expensive in terms of cpu cycles.
On Thu, 15 Jan 2015 21:27:14 -0600 (CST)
Christoph Lameter wrote:
>
> The %gs register is not used since the address of the per cpu area is
> available as one of the first fields in the per cpu areas.
Have you disassembled your code?
Looking at put_cpu_partial() from 3.19-rc3 where it does:
On Thu, 15 Jan 2015, Andrew Morton wrote:
> > I saw roughly 5% win in a fast-path loop over kmem_cache_alloc/free
> > in CONFIG_PREEMPT. (14.821 ns -> 14.049 ns)
>
> I'm surprised. preempt_disable/enable are pretty fast. I wonder why
> this makes a measurable difference. Perhaps
On Thu, 15 Jan 2015, Steven Rostedt wrote:
> profiling function tracing I discovered that accessing preempt_count
> was actually quite expensive, even just to read. But it may not be as
> bad since Peter Zijlstra converted preempt_count to a per_cpu variable.
> Although, IIRC, the perf profiling
On Thu, 15 Jan 2015 17:16:34 -0800
Andrew Morton wrote:
> > I saw roughly 5% win in a fast-path loop over kmem_cache_alloc/free
> > in CONFIG_PREEMPT. (14.821 ns -> 14.049 ns)
>
> I'm surprised. preempt_disable/enable are pretty fast. I wonder why
> this makes a measurable difference.
On Thu, 15 Jan 2015 16:40:32 +0900 Joonsoo Kim wrote:
> We had to insert a preempt enable/disable in the fastpath a while ago
> in order to guarantee that tid and kmem_cache_cpu are retrieved on the
> same cpu. It is the problem only for CONFIG_PREEMPT in which scheduler
> can move the process
On Thu, 15 Jan 2015 16:40:32 +0900
Joonsoo Kim wrote:
[...]
>
> I saw roughly 5% win in a fast-path loop over kmem_cache_alloc/free
> in CONFIG_PREEMPT. (14.821 ns -> 14.049 ns)
>
> Below is the result of Christoph's slab_test reported by
> Jesper Dangaard Brouer.
>
[...]
Acked-by: Jesper
We had to insert a preempt enable/disable pair in the fastpath a while ago
in order to guarantee that the tid and kmem_cache_cpu are retrieved on the
same cpu. This is a problem only for CONFIG_PREEMPT, where the scheduler
can move the process to another cpu while the data is being retrieved.
Now, I reach the solution