From: Hiro Yoshioka <[EMAIL PROTECTED]>
Subject: Re: [RFC] [PATCH] cache pollution aware __copy_from_user_ll()
Date: Fri, 02 Sep 2005 13:37:16 +0900 (JST)
Message-ID: <[EMAIL PROTECTED]>
Hiro Yoshioka <[EMAIL PROTECTED]> wrote:
>
> --- linux-2.6.12.4.orig/arch/i386/lib/usercopy.c 2005-08-05
> 16:04:37.0 +0900
> +++ linux-2.6.12.4.nt/arch/i386/lib/usercopy.c 2005-09-01
> 17:09:41.0 +0900
Really. Please redo and retest the patch against a current kerne
On Friday 02 September 2005 04:08, Andrew Morton wrote:
> I suppose I'll queue it up in -mm for a while, although I'm a bit dubious
> about the whole idea... We'll gain some and we'll lose some - how do we
> know it's a net gain?
I suspect it'll gain more than it loses. The only case where it mi
On Thursday 01 September 2005 11:07, Hiro Yoshioka wrote:
> The following is the almost final version of the
> cache pollution aware __copy_from_user_ll() patch.
Looks good to me.
Once the filemap.c hunk is in I'll probably do something
similar for x86-64.
-Andi
Hi,
> The following patch does not use MMX registers, so we don't have
> to worry about saving and restoring the FPU/MMX state.
>
> What do you think?
I think __copy_user_zeroing_intel_nocache() should be followed by sfence
or mfence instruction to flush the data.
Hiro Yoshioka <[EMAIL PROTECTED]> writes:
> Hi,
>
> The following patch does not use MMX registers, so we don't have
> to worry about saving and restoring the FPU/MMX state.
>
> What do you think?
Performance will probably be bad on K7 Athlons - those have a microcoded
movnti which is quite slow.
On Wed, 2005-08-24 at 23:11 +0900, Hiro Yoshioka wrote:
> Hi,
>
> The following patch does not use MMX registers, so we don't have
> to worry about saving and restoring the FPU/MMX state.
>
> What do you think?
excellent!
Hi,
The following patch does not use MMX registers, so we don't have
to worry about saving and restoring the FPU/MMX state.
What do you think?
Some performance data:

Total of GLOBAL_POWER_EVENTS (CPU cycle samples)
  2.6.12.4.orig  1921587
  2.6.12.4.nt    1688900
  1688900/1921587 = 87.89% (a 12.1% reduction)
> On 8/18/05, Hiro Yoshioka <[EMAIL PROTECTED]> wrote:
> > 1) using stack to save/restore MMX registers
>
> It seems to me that it has some regression.
> I'd like to rollback it and use kernel_fpu_begin() and kernel_fpu_end().
The following is a current version of cache aware copy_from_user_ll.
> 2) low latency version of cache aware copy
Having a low latency version that is only active with CONFIG_PREEMPT
is bad - non preempt kernels need good latency too.
-Andi
Hi,
On 8/18/05, Hiro Yoshioka <[EMAIL PROTECTED]> wrote:
> 1) using stack to save/restore MMX registers
It seems to me that it has some regression.
I'd like to rollback it and use kernel_fpu_begin() and kernel_fpu_end().
Regards,
Hiro
--
Hiro Yoshioka
mailto:hyoshiok at miraclelinux.com
On Thu, 2005-08-18 at 00:27 +0900, Akira Tsukamoto wrote:
> My computer with Athlon K7 was faster with manually prefetching,
> but I did not know it is already becoming obsolete.
>
Don't listen to people who tell you $FOO hardware is obsolete; they have
a very narrow view. "Obsolete" is meaningl
> So I make two APIs.
> __copy_user_zeroing_nocache()
> __copy_user_zeroing_inatomic_nocache()
>
> The former is a low latency version and the other is a throughput version.
1) using stack to save/restore MMX registers
2) low latency version of cache aware copy
3) __copy_user*_nocache APIs so if
On 16 Aug 2005 15:15:35 +0200, Andi Kleen <[EMAIL PROTECTED]> wrote:
> However it disables preemption, which especially for bigger
> copies will probably make the low latency people unhappy.
In the copy loop,
+#ifdef CONFIG_PREEMPT
+ if ( (i%64)==0 ) {
+ MMX_RESTORE
On Wed, 17 Aug 2005 23:30:13 +0900
Akira Tsukamoto <[EMAIL PROTECTED]> mentioned:
> > I'm trying to understand this mechanism but I don't
> > understand very well.
>
> My explanation was a bit ambiguous; see the code below.
> Where is the fp register saved? It saves the fp registers *inside* task_struct
On Wed, 17 Aug 2005 at 13:50:22 +0900 (JST), Hiro Yoshioka wrote:
> 3) page faults/exceptions/...
> 3-1 TS flag is set by the CPU (Am I right?)
TS will _not_ be set if a trap/fault or interrupt occurs. The only
way that could happen automatically would be to use a separate hardware
task with
I am resubmitting this because it seems to have been lost when I posted it
the day before yesterday.
Arjan van de Ven mentioned:
> The only comment/question I have is about the use of prefetchnta; that
> might have cache-evicting properties as well (eg evict the cache of t
Akira,
Thanks for your suggestions.
On 8/17/05, Akira Tsukamoto <[EMAIL PROTECTED]> wrote:
> Anyway, going back to copy_user topic,
> big remaining issues are
> 1)store/restore floating point register (80/64bytes) twice every time by
> surrounding with kernel_fpu_begin()/kernel_fpu_end() i
Chuck,
From: Chuck Ebbert <[EMAIL PROTECTED]>
> On Tue, 16 Aug 2005 at 19:16:17 +0900 (JST), Hiro Yoshioka wrote:
> > oh, really? Does the linux kernel take care of
> > SSE save/restore on a task switch?
>
> Check out XMMS_SAVE and XMMS_RESTORE in include/asm-i386/xor.h
Thanks for your suggesti
On Tue, 16 Aug 2005 at 19:16:17 +0900 (JST), Hiro Yoshioka wrote:
> oh, really? Does the linux kernel take care of
> SSE save/restore on a task switch?
Check out XMMS_SAVE and XMMS_RESTORE in include/asm-i386/xor.h
__
Chuck
Arjan van de Ven <[EMAIL PROTECTED]> writes:
>
> not on kernel entry afaik.
> However just save the register on the stack and put it back at the
> end...
You need to do more than that, like disabling lazy FPU mode.
That is what kernel_fpu_begin/end takes care of.
However it disables preemption
From: Arjan van de Ven <[EMAIL PROTECTED]>
> > My code does nothing about it.
> >
> > I need a volunteer to implement it.
>
> it's actually not too hard; all you need is to use SSE and not MMX; and
> then just store sse register you're overwriting on the stack or so...
oh, really? Does the linux ke
On Tue, 2005-08-16 at 12:30 +0900, Hiro Yoshioka wrote:
> The following example shows the L3 cache miss is reduced from 37410 to 107.
most impressive; it seems the approach to do this selectively is paying
off very well!
The only comment/question I have is about the use of prefetchnta; that
mig
Takahashi san,
I appreciate your comments.
> Hi,
>
> BTW, what are you going to do with the page-faults which may happen
> during __copy_user_zeroing_nocache()? The current process may be blocked
> in the handler for a while and get FPU registers polluted.
> kernel_fpu_begin() won't help the cas
Hi,
BTW, what are you going to do with the page-faults which may happen
during __copy_user_zeroing_nocache()? The current process may be blocked
in the handler for a while and get FPU registers polluted.
kernel_fpu_begin() won't help the case. This is another issue, though.
From: Hiro Yoshioka <[EMAIL PROTECTED]>
Date: Tue, 16 Aug 2005 08:33:59 +0900
> Thanks.
>
> filemap_copy_from_user() calls __copy_from_user_inatomic() calls
> __copy_from_user_ll().
>
> I'll look at the code.
The following is a quick hack of cache aware implementation
of __copy_from_user_ll() a
On 8/15/05, Arjan van de Ven <[EMAIL PROTECTED]> wrote:
> > copy_from_user_nocache() is fine.
> >
> > But I don't know where I can use it. (I'm not so
> > familiar with the linux kernel file system yet.)
>
> I suspect the few cases where it will make the most difference will be
> in the VFS for t
On Mon, 2005-08-15 at 08:15 -0400, [EMAIL PROTECTED] wrote:
> Actually, is there any place *other* than write() to the page cache that
> warrants a non-temporal store? Network sockets with scatter/gather and
> hardware checksum, maybe?
afaik those use zero copy already, eg straight pagecache copy
Actually, is there any place *other* than write() to the page cache that
warrants a non-temporal store? Network sockets with scatter/gather and
hardware checksum, maybe?
This is pretty much synonymous with what is allowed to go into high
memory, no?
While we're on the subject, for the copy_from
> Anyway we could not find the cache aware version of __copy_from_user_ll
> has a big regression yet.
that is because you spread the cache misses out from one place to all
over the place, so that no one single point sticks out anymore.
Do you agree that your copy is less optimal for the case wh
Hi, all
I might be misunderstanding things but...
First of all, machines with long pipelines will suffer from cache misses
(p4 in this case).
Depending on the size copied, (i don't know how large they are so..)
can't one run out of cachelines and/or evict more useful cache data?
Ie, if it's ca
> the problem is that the pay elsewhere is far more spread out, but not
> less. At least generally
>
> I can see the point of a copy_from_user_nocache() or something, for
> those cases where we *know* we are not going to use the copied data in
> the cpu (but say, only do DMA).
> But that shoul
Hi,
The following is a patch to reduce a cache pollution
of __copy_from_user_ll().
When I ran a simple iozone benchmark to find a performance bottleneck in
the linux kernel, I found that __copy_from_user_ll() consumed the most
CPU cycles and caused many cache misses.
The following was profiled with oprofile.