Re: [PATCH] i386: optimize memset of 6 and 8 bytes

2007-08-17 Thread David Miller
From: Stephen Hemminger <[EMAIL PROTECTED]>
Date: Fri, 17 Aug 2007 20:31:39 -0700

> On Fri, 17 Aug 2007 18:57:00 -0700
> Arjan van de Ven <[EMAIL PROTECTED]> wrote:
> 
> > 
> > On Fri, 2007-08-17 at 18:54 -0700, Stephen Hemminger wrote:
> > > On Fri, 17 Aug 2007 18:49:34 -0700
> > > Arjan van de Ven <[EMAIL PROTECTED]> wrote:
> > > 
> > > > 
> > > > On Fri, 2007-08-17 at 16:50 -0700, Stephen Hemminger wrote:
> > > > > Tne network code does memset for 6 and 8 byte values, that can easily
> > > > > be optimized into simple assignments without string instructions.
> > > > 
> > > > 
> > > > so... question.
> > > > Why are we doing this by hand? Wouldn't gcc just generate this code in
> > > > the first place (when using __builtin_memset)? I very much suspect it
> > > > would (and if some version doesn't we really ought to get that
> > > > fixed)
> > > 
> > > i386 and x86_64 are not using __builtin_memset, as least from the
> > > code that I see generated.
> > 
> > .. maybe we should just fix it that way then?
> > 
> There probably is history behind the decision, like gcc problems
> on some old compiler version.

Yes, but those reasons are very likely no longer true.

In fact, just removing the memcpy macro altogether is the best
thing to do.  GCC will do inline memcpy when appropriate.
Then put all of the rep; movsl; etc. code in an external
assembler file and name the routine memcpy.

The inlining is senseless even if it all gets optimized away
into the bare necessary instructions.  All the x86 registers
get clobbered in most of those rep; movsl; code paths, so
it's hardly going to be more expensive to extern the thing
and it will make the kernel smaller to boot.

Anyways the point is to make the real C symbol called by "memcpy"
because that's what makes all the automatic gcc inline memcpy logic
kick in.  If you define the "memcpy" as a macro which calls
differently named functions, you bypass all of that.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] i386: optimize memset of 6 and 8 bytes

2007-08-17 Thread Stephen Hemminger
On Fri, 17 Aug 2007 18:57:00 -0700
Arjan van de Ven <[EMAIL PROTECTED]> wrote:

> 
> On Fri, 2007-08-17 at 18:54 -0700, Stephen Hemminger wrote:
> > On Fri, 17 Aug 2007 18:49:34 -0700
> > Arjan van de Ven <[EMAIL PROTECTED]> wrote:
> > 
> > > 
> > > On Fri, 2007-08-17 at 16:50 -0700, Stephen Hemminger wrote:
> > > > Tne network code does memset for 6 and 8 byte values, that can easily
> > > > be optimized into simple assignments without string instructions.
> > > 
> > > 
> > > so... question.
> > > Why are we doing this by hand? Wouldn't gcc just generate this code in
> > > the first place (when using __builtin_memset)? I very much suspect it
> > > would (and if some version doesn't we really ought to get that
> > > fixed)
> > 
> > i386 and x86_64 are not using __builtin_memset, as least from the
> > code that I see generated.
> 
> .. maybe we should just fix it that way then?
> 
There probably is history behind the decision, like gcc problems
on some old compiler version.

-- 
Stephen Hemminger <[EMAIL PROTECTED]>
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] i386: optimize memset of 6 and 8 bytes

2007-08-17 Thread Arjan van de Ven

On Fri, 2007-08-17 at 18:54 -0700, Stephen Hemminger wrote:
> On Fri, 17 Aug 2007 18:49:34 -0700
> Arjan van de Ven <[EMAIL PROTECTED]> wrote:
> 
> > 
> > On Fri, 2007-08-17 at 16:50 -0700, Stephen Hemminger wrote:
> > > Tne network code does memset for 6 and 8 byte values, that can easily
> > > be optimized into simple assignments without string instructions.
> > 
> > 
> > so... question.
> > Why are we doing this by hand? Wouldn't gcc just generate this code in
> > the first place (when using __builtin_memset)? I very much suspect it
> > would (and if some version doesn't we really ought to get that
> > fixed)
> 
> i386 and x86_64 are not using __builtin_memset, as least from the
> code that I see generated.

.. maybe we should just fix it that way then?

-- 
if you want to mail me at work (you don't), use arjan (at) linux.intel.com
Test the interaction between Linux and your BIOS via 
http://www.linuxfirmwarekit.org

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] i386: optimize memset of 6 and 8 bytes

2007-08-17 Thread Stephen Hemminger
On Fri, 17 Aug 2007 18:49:34 -0700
Arjan van de Ven <[EMAIL PROTECTED]> wrote:

> 
> On Fri, 2007-08-17 at 16:50 -0700, Stephen Hemminger wrote:
> > Tne network code does memset for 6 and 8 byte values, that can easily
> > be optimized into simple assignments without string instructions.
> 
> 
> so... question.
> Why are we doing this by hand? Wouldn't gcc just generate this code in
> the first place (when using __builtin_memset)? I very much suspect it
> would (and if some version doesn't we really ought to get that
> fixed)

i386 and x86_64 are not using __builtin_memset, as least from the
code that I see generated.

-- 
Stephen Hemminger <[EMAIL PROTECTED]>
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] i386: optimize memset of 6 and 8 bytes

2007-08-17 Thread Arjan van de Ven

On Fri, 2007-08-17 at 16:50 -0700, Stephen Hemminger wrote:
> Tne network code does memset for 6 and 8 byte values, that can easily
> be optimized into simple assignments without string instructions.


so... question.
Why are we doing this by hand? Wouldn't gcc just generate this code in
the first place (when using __builtin_memset)? I very much suspect it
would (and if some version doesn't we really ought to get that
fixed)

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH] i386: optimize memset of 6 and 8 bytes

2007-08-17 Thread Stephen Hemminger
Tne network code does memset for 6 and 8 byte values, that can easily
be optimized into simple assignments without string instructions.

Signed-off-by: Stephen Hemminger <[EMAIL PROTECTED]>


--- a/include/asm-i386/string.h 2007-08-17 15:14:37.0 -0700
+++ b/include/asm-i386/string.h 2007-08-17 16:49:10.0 -0700
@@ -228,6 +228,14 @@ static __always_inline void * __constant
case 4:
*(unsigned long *)s = pattern;
return s;
+   case 6:
+   *(unsigned long *)s = pattern;
+   *(2+(unsigned short *)s) = pattern;
+   return s;
+   case 8:
+   *(unsigned long *)s = pattern;
+   *(1+(unsigned long *)s) = pattern;
+   return s;
}
 #define COMMON(x) \
 __asm__  __volatile__( \
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH] i386: optimize memset of 6 and 8 bytes

2007-08-17 Thread Stephen Hemminger
Tne network code does memset for 6 and 8 byte values, that can easily
be optimized into simple assignments without string instructions.

Signed-off-by: Stephen Hemminger [EMAIL PROTECTED]


--- a/include/asm-i386/string.h 2007-08-17 15:14:37.0 -0700
+++ b/include/asm-i386/string.h 2007-08-17 16:49:10.0 -0700
@@ -228,6 +228,14 @@ static __always_inline void * __constant
case 4:
*(unsigned long *)s = pattern;
return s;
+   case 6:
+   *(unsigned long *)s = pattern;
+   *(2+(unsigned short *)s) = pattern;
+   return s;
+   case 8:
+   *(unsigned long *)s = pattern;
+   *(1+(unsigned long *)s) = pattern;
+   return s;
}
 #define COMMON(x) \
 __asm__  __volatile__( \
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] i386: optimize memset of 6 and 8 bytes

2007-08-17 Thread Arjan van de Ven

On Fri, 2007-08-17 at 16:50 -0700, Stephen Hemminger wrote:
 Tne network code does memset for 6 and 8 byte values, that can easily
 be optimized into simple assignments without string instructions.


so... question.
Why are we doing this by hand? Wouldn't gcc just generate this code in
the first place (when using __builtin_memset)? I very much suspect it
would (and if some version doesn't we really ought to get that
fixed)

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] i386: optimize memset of 6 and 8 bytes

2007-08-17 Thread Stephen Hemminger
On Fri, 17 Aug 2007 18:49:34 -0700
Arjan van de Ven [EMAIL PROTECTED] wrote:

 
 On Fri, 2007-08-17 at 16:50 -0700, Stephen Hemminger wrote:
  Tne network code does memset for 6 and 8 byte values, that can easily
  be optimized into simple assignments without string instructions.
 
 
 so... question.
 Why are we doing this by hand? Wouldn't gcc just generate this code in
 the first place (when using __builtin_memset)? I very much suspect it
 would (and if some version doesn't we really ought to get that
 fixed)

i386 and x86_64 are not using __builtin_memset, as least from the
code that I see generated.

-- 
Stephen Hemminger [EMAIL PROTECTED]
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] i386: optimize memset of 6 and 8 bytes

2007-08-17 Thread Arjan van de Ven

On Fri, 2007-08-17 at 18:54 -0700, Stephen Hemminger wrote:
 On Fri, 17 Aug 2007 18:49:34 -0700
 Arjan van de Ven [EMAIL PROTECTED] wrote:
 
  
  On Fri, 2007-08-17 at 16:50 -0700, Stephen Hemminger wrote:
   Tne network code does memset for 6 and 8 byte values, that can easily
   be optimized into simple assignments without string instructions.
  
  
  so... question.
  Why are we doing this by hand? Wouldn't gcc just generate this code in
  the first place (when using __builtin_memset)? I very much suspect it
  would (and if some version doesn't we really ought to get that
  fixed)
 
 i386 and x86_64 are not using __builtin_memset, as least from the
 code that I see generated.

.. maybe we should just fix it that way then?

-- 
if you want to mail me at work (you don't), use arjan (at) linux.intel.com
Test the interaction between Linux and your BIOS via 
http://www.linuxfirmwarekit.org

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] i386: optimize memset of 6 and 8 bytes

2007-08-17 Thread Stephen Hemminger
On Fri, 17 Aug 2007 18:57:00 -0700
Arjan van de Ven [EMAIL PROTECTED] wrote:

 
 On Fri, 2007-08-17 at 18:54 -0700, Stephen Hemminger wrote:
  On Fri, 17 Aug 2007 18:49:34 -0700
  Arjan van de Ven [EMAIL PROTECTED] wrote:
  
   
   On Fri, 2007-08-17 at 16:50 -0700, Stephen Hemminger wrote:
Tne network code does memset for 6 and 8 byte values, that can easily
be optimized into simple assignments without string instructions.
   
   
   so... question.
   Why are we doing this by hand? Wouldn't gcc just generate this code in
   the first place (when using __builtin_memset)? I very much suspect it
   would (and if some version doesn't we really ought to get that
   fixed)
  
  i386 and x86_64 are not using __builtin_memset, as least from the
  code that I see generated.
 
 .. maybe we should just fix it that way then?
 
There probably is history behind the decision, like gcc problems
on some old compiler version.

-- 
Stephen Hemminger [EMAIL PROTECTED]
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] i386: optimize memset of 6 and 8 bytes

2007-08-17 Thread David Miller
From: Stephen Hemminger [EMAIL PROTECTED]
Date: Fri, 17 Aug 2007 20:31:39 -0700

 On Fri, 17 Aug 2007 18:57:00 -0700
 Arjan van de Ven [EMAIL PROTECTED] wrote:
 
  
  On Fri, 2007-08-17 at 18:54 -0700, Stephen Hemminger wrote:
   On Fri, 17 Aug 2007 18:49:34 -0700
   Arjan van de Ven [EMAIL PROTECTED] wrote:
   

On Fri, 2007-08-17 at 16:50 -0700, Stephen Hemminger wrote:
 Tne network code does memset for 6 and 8 byte values, that can easily
 be optimized into simple assignments without string instructions.


so... question.
Why are we doing this by hand? Wouldn't gcc just generate this code in
the first place (when using __builtin_memset)? I very much suspect it
would (and if some version doesn't we really ought to get that
fixed)
   
   i386 and x86_64 are not using __builtin_memset, as least from the
   code that I see generated.
  
  .. maybe we should just fix it that way then?
  
 There probably is history behind the decision, like gcc problems
 on some old compiler version.

Yes, but those reasons are very likely no longer true.

In fact, just removing the memcpy macro altogether is the best
thing to do.  GCC will do inline memcpy when appropriate.
Then put all of the rep; movsl; etc. code in an external
assembler file and name the routine memcpy.

The inlining is senseless even if it all gets optimized away
into the bare necessary instructions.  All the x86 registers
get clobbered in most of those rep; movsl; code paths, so
it's hardly going to be more expensive to extern the thing
and it will make the kernel smaller to boot.

Anyways the point is to make the real C symbol called by memcpy
because that's what makes all the automatic gcc inline memcpy logic
kick in.  If you define the memcpy as a macro which calls
differently named functions, you bypass all of that.
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/