[PATCH 3/6] x86: Add support for the clwb instruction

2014-11-13 Thread Ross Zwisler
On Wed, 2014-11-12 at 15:12 +0100, Borislav Petkov wrote:
> On Wed, Nov 12, 2014 at 01:38:45PM +, Anvin, H Peter wrote:
> > No, it doesn't.  x86 requires 3.4+ at a minimum.
> 
> The only test I see is:
> 
> #if GCC_VERSION < 30200
> # error Sorry, your compiler is too old - please upgrade it.
> #endif
> 
> And even if we do require 3.4, the build fails with 4.1+ so...

Ah, dang, you're right.  Okay, I'll figure out how to do this without
using xsaveopt.

Thank you for pointing this out.

- Ross



[PATCH 3/6] x86: Add support for the clwb instruction

2014-11-12 Thread Borislav Petkov
On Wed, Nov 12, 2014 at 01:38:45PM +, Anvin, H Peter wrote:
> No, it doesn't.  x86 requires 3.4+ at a minimum.

The only test I see is:

#if GCC_VERSION < 30200
# error Sorry, your compiler is too old - please upgrade it.
#endif

And even if we do require 3.4, the build fails with 4.1+ so...

-- 
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.
--


[PATCH 3/6] x86: Add support for the clwb instruction

2014-11-12 Thread Borislav Petkov
On Tue, Nov 11, 2014 at 11:43:13AM -0700, Ross Zwisler wrote:
> +static inline void clwb(volatile void *__p)
> +{
> + alternative_io_2(".byte " __stringify(NOP_DS_PREFIX) "; clflush %P0",
> +  ".byte 0x66; clflush %P0",
> +  X86_FEATURE_CLFLUSHOPT,
> +  ".byte 0x66; xsaveopt %P0",

Btw, I'm afraid you're going to have to spell out those new instruction
mnemonics as they aren't supported by older compilers:

  CC  drivers/gpu/drm/drm_cache.o
{standard input}: Assembler messages:
{standard input}:63: Error: no such instruction: `xsaveopt (%rax)'
{standard input}:249: Error: no such instruction: `xsaveopt (%rdi)'
{standard input}:283: Error: no such instruction: `xsaveopt -1(%rdx)'
make[1]: *** [drivers/gpu/drm/drm_cache.o] Error 1
make: *** [drivers/gpu/drm/drm_cache.o] Error 2
[boris at etch:14:35:40:k:14)->  gcc --version
gcc (GCC) 4.1.2 20061115 (prerelease) (Debian 4.1.1-21)

This is on a very old guest I have which has this gcc in it and the
kernel supports everything gcc >= 3.2.

-- 
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.
--


[PATCH 3/6] x86: Add support for the clwb instruction

2014-11-12 Thread Anvin, H Peter
No, it doesn't.  x86 requires 3.4+ at a minimum.


From: Borislav Petkov
Sent: Wednesday, November 12, 2014 4:39:09 AM
To: Ross Zwisler
Cc: linux-kernel at vger.kernel.org; Anvin, H Peter; Ingo Molnar; Thomas 
Gleixner; David Airlie; dri-devel at lists.freedesktop.org; x86 at kernel.org
Subject: Re: [PATCH 3/6] x86: Add support for the clwb instruction

On Tue, Nov 11, 2014 at 11:43:13AM -0700, Ross Zwisler wrote:
> +static inline void clwb(volatile void *__p)
> +{
> + alternative_io_2(".byte " __stringify(NOP_DS_PREFIX) "; clflush %P0",
> +  ".byte 0x66; clflush %P0",
> +  X86_FEATURE_CLFLUSHOPT,
> +  ".byte 0x66; xsaveopt %P0",

Btw, I'm afraid you're going to have to spell out those new instruction
mnemonics as they aren't supported by older compilers:

  CC  drivers/gpu/drm/drm_cache.o
{standard input}: Assembler messages:
{standard input}:63: Error: no such instruction: `xsaveopt (%rax)'
{standard input}:249: Error: no such instruction: `xsaveopt (%rdi)'
{standard input}:283: Error: no such instruction: `xsaveopt -1(%rdx)'
make[1]: *** [drivers/gpu/drm/drm_cache.o] Error 1
make: *** [drivers/gpu/drm/drm_cache.o] Error 2
[boris at etch:14:35:40:k:14)->  gcc --version
gcc (GCC) 4.1.2 20061115 (prerelease) (Debian 4.1.1-21)

This is on a very old guest I have which has this gcc in it and the
kernel supports everything gcc >= 3.2.

--
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.
--


[PATCH 3/6] x86: Add support for the clwb instruction

2014-11-11 Thread Borislav Petkov
On Tue, Nov 11, 2014 at 12:48:52PM -0700, Ross Zwisler wrote:
> Essentially we need one additional byte at the beginning of the clflush so
> that we can flip it into a clflushopt by changing that byte into a 0x66
> prefix.  Two options are to either insert a 1 byte ASM_NOP1, or to add a 1
> byte NOP_DS_PREFIX.  Both have no functional effect with the plain clflush,
> but I've been told that executing a clflush + prefix should be faster than
> executing a clflush + NOP.

I see. :)

Thanks.

-- 
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.
--


[PATCH 3/6] x86: Add support for the clwb instruction

2014-11-11 Thread Borislav Petkov
On Tue, Nov 11, 2014 at 12:40:00PM -0700, Ross Zwisler wrote:
> Yep, it's weird, I know.  :)

But sure, saving opcode space, makes sense to me.

Btw, I'd still be interested about this:

> +static inline void clwb(volatile void *__p)
> +{
> + alternative_io_2(".byte " __stringify(NOP_DS_PREFIX) "; clflush %P0",

Any particular reason for using 0x3e as a prefix to have the insns be
the same size or is it simply because CLFLUSH can stomach it?

Thanks.

-- 
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.
--


[PATCH 3/6] x86: Add support for the clwb instruction

2014-11-11 Thread Borislav Petkov
On Tue, Nov 11, 2014 at 08:12:39PM +0100, Borislav Petkov wrote:
> > +".byte 0x66; xsaveopt %P0",
> 
> Huh, XSAVEOPT?!? Shouldn't that be CLWB??

Bah, the same opcodes, only 0x66 prefix makes it into CLWB. Could use a
comment I guess.

-- 
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.
--


[PATCH 3/6] x86: Add support for the clwb instruction

2014-11-11 Thread Borislav Petkov
On Tue, Nov 11, 2014 at 11:43:13AM -0700, Ross Zwisler wrote:
> Add support for the new clwb instruction.  This instruction was
> announced in the document "Intel Architecture Instruction Set Extensions
> Programming Reference" with reference number 319433-022.
> 
> https://software.intel.com/sites/default/files/managed/0d/53/319433-022.pdf
> 
> Here are some things of note:
> 
>  - As with the clflushopt patches before this, I'm assuming that the 
> addressing
>mode generated by the original clflush instruction will match the new
>clflush instruction with the 0x66 prefix for clflushopt, and for the
>xsaveopt instruction with the 0x66 prefix for clwb.  For all the test cases
>that I've come up with and for the new clwb code generated by this patch
>series, this has proven to be true on my test machine.
> 
>  - According to the SDM, xsaveopt has a form where it has a REX.W prefix.  I
>believe that this prefix will not be generated by gcc in x86_64 kernel 
> code.
>Based on this, I don't believe I need to account for this extra prefix when
>dealing with the assembly language created for clwb.  Please correct me if
>I'm wrong.
> 
> Signed-off-by: Ross Zwisler 
> Cc: H Peter Anvin 
> Cc: Ingo Molnar 
> Cc: Thomas Gleixner 
> Cc: David Airlie 
> Cc: dri-devel at lists.freedesktop.org
> Cc: x86 at kernel.org
> ---
>  arch/x86/include/asm/cpufeature.h|  1 +
>  arch/x86/include/asm/special_insns.h | 10 ++
>  2 files changed, 11 insertions(+)
> 
> diff --git a/arch/x86/include/asm/cpufeature.h 
> b/arch/x86/include/asm/cpufeature.h
> index b3e6b89..fbbed34 100644
> --- a/arch/x86/include/asm/cpufeature.h
> +++ b/arch/x86/include/asm/cpufeature.h
> @@ -227,6 +227,7 @@
>  #define X86_FEATURE_SMAP ( 9*32+20) /* Supervisor Mode Access Prevention 
> */
>  #define X86_FEATURE_PCOMMIT  ( 9*32+22) /* PCOMMIT instruction */
>  #define X86_FEATURE_CLFLUSHOPT   ( 9*32+23) /* CLFLUSHOPT instruction */
> +#define X86_FEATURE_CLWB ( 9*32+24) /* CLWB instruction */
>  #define X86_FEATURE_AVX512PF ( 9*32+26) /* AVX-512 Prefetch */
>  #define X86_FEATURE_AVX512ER ( 9*32+27) /* AVX-512 Exponential and 
> Reciprocal */
>  #define X86_FEATURE_AVX512CD ( 9*32+28) /* AVX-512 Conflict Detection */
> diff --git a/arch/x86/include/asm/special_insns.h 
> b/arch/x86/include/asm/special_insns.h
> index 1709a2e..a328460 100644
> --- a/arch/x86/include/asm/special_insns.h
> +++ b/arch/x86/include/asm/special_insns.h
> @@ -199,6 +199,16 @@ static inline void clflushopt(volatile void *__p)
>  "+m" (*(volatile char __force *)__p));
>  }
>  
> +static inline void clwb(volatile void *__p)
> +{
> + alternative_io_2(".byte " __stringify(NOP_DS_PREFIX) "; clflush %P0",

Any particular reason for using 0x3e as a prefix to have the insns be
the same size or is it simply because CLFLUSH can stomach it?

:-)

> +  ".byte 0x66; clflush %P0",
> +  X86_FEATURE_CLFLUSHOPT,
> +  ".byte 0x66; xsaveopt %P0",

Huh, XSAVEOPT?!? Shouldn't that be CLWB??

> +  X86_FEATURE_CLWB,
> +  "+m" (*(volatile char __force *)__p));
> +}
> +
>  static inline void pcommit(void)
>  {
>   alternative(ASM_NOP4, ".byte 0x66, 0x0f, 0xae, 0xf8",
> -- 
> 1.9.3
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo at vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
> 

-- 
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.
--


[PATCH 3/6] x86: Add support for the clwb instruction

2014-11-11 Thread Ross Zwisler
On Tue, 2014-11-11 at 20:46 +0100, Borislav Petkov wrote:
> On Tue, Nov 11, 2014 at 12:40:00PM -0700, Ross Zwisler wrote:
> > Yep, it's weird, I know.  :)
> 
> But sure, saving opcode space, makes sense to me.
> 
> Btw, I'd still be interested about this:
> 
> > +static inline void clwb(volatile void *__p)
> > +{
> > + alternative_io_2(".byte " __stringify(NOP_DS_PREFIX) "; clflush %P0",
> 
> Any particular reason for using 0x3e as a prefix to have the insns be
> the same size or is it simply because CLFLUSH can stomach it?

Ah, sorry, I was still responding to your first mail.  :)  Response
copied here to save searching:

Essentially we need one additional byte at the beginning of the
clflush so that we can flip it into a clflushopt by changing that byte
into a 0x66 prefix.  Two options are to either insert a 1 byte
ASM_NOP1, or to add a 1 byte NOP_DS_PREFIX.  Both have no functional
effect with the plain clflush, but I've been told that executing a
clflush + prefix should be faster than executing a clflush + NOP.

I agree, this is useful info - I'll add it to the patch comments for v2.

Thank you for the feedback.

- Ross



[PATCH 3/6] x86: Add support for the clwb instruction

2014-11-11 Thread Ross Zwisler
On Tue, 2014-11-11 at 20:12 +0100, Borislav Petkov wrote:
> On Tue, Nov 11, 2014 at 11:43:13AM -0700, Ross Zwisler wrote:
> > Add support for the new clwb instruction.  This instruction was
> > announced in the document "Intel Architecture Instruction Set Extensions
> > Programming Reference" with reference number 319433-022.
> > 
> > https://software.intel.com/sites/default/files/managed/0d/53/319433-022.pdf
> > 
> > Here are some things of note:
> > 
> >  - As with the clflushopt patches before this, I'm assuming that the 
> > addressing
> >mode generated by the original clflush instruction will match the new
> >clflush instruction with the 0x66 prefix for clflushopt, and for the
> >xsaveopt instruction with the 0x66 prefix for clwb.  For all the test 
> > cases
> >that I've come up with and for the new clwb code generated by this patch
> >series, this has proven to be true on my test machine.
> > 
> >  - According to the SDM, xsaveopt has a form where it has a REX.W prefix.  I
> >believe that this prefix will not be generated by gcc in x86_64 kernel 
> > code.
> >Based on this, I don't believe I need to account for this extra prefix 
> > when
> >dealing with the assembly language created for clwb.  Please correct me 
> > if
> >I'm wrong.
> > 
> > Signed-off-by: Ross Zwisler 
> > Cc: H Peter Anvin 
> > Cc: Ingo Molnar 
> > Cc: Thomas Gleixner 
> > Cc: David Airlie 
> > Cc: dri-devel at lists.freedesktop.org
> > Cc: x86 at kernel.org
> > ---
> >  arch/x86/include/asm/cpufeature.h|  1 +
> >  arch/x86/include/asm/special_insns.h | 10 ++
> >  2 files changed, 11 insertions(+)
> > 
> > diff --git a/arch/x86/include/asm/cpufeature.h 
> > b/arch/x86/include/asm/cpufeature.h
> > index b3e6b89..fbbed34 100644
> > --- a/arch/x86/include/asm/cpufeature.h
> > +++ b/arch/x86/include/asm/cpufeature.h
> > @@ -227,6 +227,7 @@
> >  #define X86_FEATURE_SMAP   ( 9*32+20) /* Supervisor Mode Access Prevention 
> > */
> >  #define X86_FEATURE_PCOMMIT( 9*32+22) /* PCOMMIT instruction */
> >  #define X86_FEATURE_CLFLUSHOPT ( 9*32+23) /* CLFLUSHOPT instruction */
> > +#define X86_FEATURE_CLWB   ( 9*32+24) /* CLWB instruction */
> >  #define X86_FEATURE_AVX512PF   ( 9*32+26) /* AVX-512 Prefetch */
> >  #define X86_FEATURE_AVX512ER   ( 9*32+27) /* AVX-512 Exponential and 
> > Reciprocal */
> >  #define X86_FEATURE_AVX512CD   ( 9*32+28) /* AVX-512 Conflict 
> > Detection */
> > diff --git a/arch/x86/include/asm/special_insns.h 
> > b/arch/x86/include/asm/special_insns.h
> > index 1709a2e..a328460 100644
> > --- a/arch/x86/include/asm/special_insns.h
> > +++ b/arch/x86/include/asm/special_insns.h
> > @@ -199,6 +199,16 @@ static inline void clflushopt(volatile void *__p)
> >"+m" (*(volatile char __force *)__p));
> >  }
> >  
> > +static inline void clwb(volatile void *__p)
> > +{
> > +   alternative_io_2(".byte " __stringify(NOP_DS_PREFIX) "; clflush %P0",
> 
> Any particular reason for using 0x3e as a prefix to have the insns be
> the same size or is it simply because CLFLUSH can stomach it?
> 
> :-)

Essentially we need one additional byte at the beginning of the clflush so
that we can flip it into a clflushopt by changing that byte into a 0x66
prefix.  Two options are to either insert a 1 byte ASM_NOP1, or to add a 1
byte NOP_DS_PREFIX.  Both have no functional effect with the plain clflush,
but I've been told that executing a clflush + prefix should be faster than
executing a clflush + NOP.




[PATCH 3/6] x86: Add support for the clwb instruction

2014-11-11 Thread Ross Zwisler
On Tue, 2014-11-11 at 20:19 +0100, Borislav Petkov wrote:
> On Tue, Nov 11, 2014 at 08:12:39PM +0100, Borislav Petkov wrote:
> > > +  ".byte 0x66; xsaveopt %P0",
> > 
> > Huh, XSAVEOPT?!? Shouldn't that be CLWB??
> 
> Bah, the same opcodes, only 0x66 prefix makes it into CLWB. Could use a
> comment I guess.

Yep, it's weird, I know.  :)  I'll add a comment.

Thanks,
- Ross



[PATCH 3/6] x86: Add support for the clwb instruction

2014-11-11 Thread Ross Zwisler
Add support for the new clwb instruction.  This instruction was
announced in the document "Intel Architecture Instruction Set Extensions
Programming Reference" with reference number 319433-022.

https://software.intel.com/sites/default/files/managed/0d/53/319433-022.pdf

Here are some things of note:

 - As with the clflushopt patches before this, I'm assuming that the addressing
   mode generated by the original clflush instruction will match the new
   clflush instruction with the 0x66 prefix for clflushopt, and for the
   xsaveopt instruction with the 0x66 prefix for clwb.  For all the test cases
   that I've come up with and for the new clwb code generated by this patch
   series, this has proven to be true on my test machine.

 - According to the SDM, xsaveopt has a form where it has a REX.W prefix.  I
   believe that this prefix will not be generated by gcc in x86_64 kernel code.
   Based on this, I don't believe I need to account for this extra prefix when
   dealing with the assembly language created for clwb.  Please correct me if
   I'm wrong.

Signed-off-by: Ross Zwisler 
Cc: H Peter Anvin 
Cc: Ingo Molnar 
Cc: Thomas Gleixner 
Cc: David Airlie 
Cc: dri-devel at lists.freedesktop.org
Cc: x86 at kernel.org
---
 arch/x86/include/asm/cpufeature.h|  1 +
 arch/x86/include/asm/special_insns.h | 10 ++
 2 files changed, 11 insertions(+)

diff --git a/arch/x86/include/asm/cpufeature.h 
b/arch/x86/include/asm/cpufeature.h
index b3e6b89..fbbed34 100644
--- a/arch/x86/include/asm/cpufeature.h
+++ b/arch/x86/include/asm/cpufeature.h
@@ -227,6 +227,7 @@
 #define X86_FEATURE_SMAP   ( 9*32+20) /* Supervisor Mode Access Prevention 
*/
 #define X86_FEATURE_PCOMMIT( 9*32+22) /* PCOMMIT instruction */
 #define X86_FEATURE_CLFLUSHOPT ( 9*32+23) /* CLFLUSHOPT instruction */
+#define X86_FEATURE_CLWB   ( 9*32+24) /* CLWB instruction */
 #define X86_FEATURE_AVX512PF   ( 9*32+26) /* AVX-512 Prefetch */
 #define X86_FEATURE_AVX512ER   ( 9*32+27) /* AVX-512 Exponential and 
Reciprocal */
 #define X86_FEATURE_AVX512CD   ( 9*32+28) /* AVX-512 Conflict Detection */
diff --git a/arch/x86/include/asm/special_insns.h 
b/arch/x86/include/asm/special_insns.h
index 1709a2e..a328460 100644
--- a/arch/x86/include/asm/special_insns.h
+++ b/arch/x86/include/asm/special_insns.h
@@ -199,6 +199,16 @@ static inline void clflushopt(volatile void *__p)
   "+m" (*(volatile char __force *)__p));
 }

+static inline void clwb(volatile void *__p)
+{
+   alternative_io_2(".byte " __stringify(NOP_DS_PREFIX) "; clflush %P0",
+".byte 0x66; clflush %P0",
+X86_FEATURE_CLFLUSHOPT,
+".byte 0x66; xsaveopt %P0",
+X86_FEATURE_CLWB,
+"+m" (*(volatile char __force *)__p));
+}
+
 static inline void pcommit(void)
 {
alternative(ASM_NOP4, ".byte 0x66, 0x0f, 0xae, 0xf8",
-- 
1.9.3