Re: The killing of ideal_nops[]

2021-03-14 Thread Maciej W. Rozycki
On Wed, 10 Mar 2021, Peter Zijlstra wrote:

> Below is the latest version which I just pushed out to my git tree so
> that the robots can have a go at it.

 Do you want me to quickly check it with a real i486, or is it already 
covered by said robots (NB I wouldn't trust QEMU with such stuff)?

  Maciej


RE: The killing of ideal_nops[]

2021-03-11 Thread David Laight
From: Peter Zijlstra 
...
> Below is the latest version which I just pushed out to my git tree so
> that the robots can have a go at it.

Why not delete the indirection table?
So you end up with:

> +#ifndef CONFIG_64BIT
> +
> +/*
> + * Generic 32bit nops from GAS:
> + *
> + * 1: nop
> + * 2: movl %esi,%esi
> + * 3: leal 0x00(%esi),%esi
> + * 4: leal 0x00(,%esi,1),%esi
> + * 5: leal %ds:0x00(,%esi,1),%esi
> + * 6: leal 0x00000000(%esi),%esi
> + * 7: leal 0x00000000(,%esi,1),%esi
> + * 8: leal %ds:0x00000000(,%esi,1),%esi
>   *
> - * *_NOP5_ATOMIC must be a single instruction.
> + * Except 5 and 8, which are DS prefixed 4 and 7 resp, where GAS would emit 2
> + * nop instructions.
>   */
> +#define BYTES_NOP1   0x90
> +#define BYTES_NOP2   0x89,0xf6
> +#define BYTES_NOP3   0x8d,0x76,0x00
> +#define BYTES_NOP4   0x8d,0x74,0x26,0x00
> +#define BYTES_NOP5   0x3e,BYTES_NOP4
> +#define BYTES_NOP6   0x8d,0xb6,0x00,0x00,0x00,0x00
> +#define BYTES_NOP7   0x8d,0xb4,0x26,0x00,0x00,0x00,0x00
> +#define BYTES_NOP8   0x3e,BYTES_NOP7

const unsigned char x86_nops[8][8] = {
{ BYTES_NOP1 },
{ BYTES_NOP2 },
{ BYTES_NOP3 },
{ BYTES_NOP4 },
{ BYTES_NOP5 },
{ BYTES_NOP6 },
{ BYTES_NOP7 },
{ BYTES_NOP8 }
};

The rest of the patch may not need changing.

David




Re: The killing of ideal_nops[]

2021-03-10 Thread Peter Zijlstra
On Wed, Mar 10, 2021 at 07:48:08AM -0800, Alexei Starovoitov wrote:
> Ack for bpf bits.

Thanks!

> I think the cleanup is good from the point of having one way to do things.

Right, that was the motivation. Currently x86 is the sole architecture
(and thus weird and more complicated) where NOPs vary at runtime.

> Though I won't be surprised if somebody comes along with a patch
> to use different nops eventually.

I don't particularly care about the exact NOP encoding, as long as we
can all agree on it at build time. Having to either do boot time NOP
rewrites of everything or runtime compare against multiple NOPs is
complexity we don't need.

(and ideally the kernel and gas/clang .nops directives should agree with
one another, although that's not the case with the below patch for
32bit).

> When I first looked at it years ago I was wondering why segment selector
> prefix is not used. afaik windows was using it, because having cs: ds: nop
> makes intel use only one of the instruction decoders (the big and slow one)
> which allegedly saves power, since the pipeline has bubbles.
> Things could be completely different now in modern u-arches.

So we do use the DS prefix for NOP5 and NOP8 on 32bit. 64bit seems to prefer
OSP (operand-size, 0x66) prefixes.

Below is the latest version which I just pushed out to my git tree so
that the robots can have a go at it.

---
Subject: x86: Remove dynamic NOP selection
From: Peter Zijlstra 
Date: Wed Mar 10 11:43:35 CET 2021

This ensures that a NOP is a NOP and not a random other instruction
that is also a NOP. It allows simplification of dynamic code patching
that wants to verify existing code before writing new instructions
(ftrace, jump_label, static_call, etc.).

Differentiating on NOPs is not a 'feature'.

After this FEATURE_NOPL is unused except for required-features for
x86_64. FEATURE_K8 is only used for PTI and FEATURE_K7 is unused.

AFAICT this negatively affects lots of 32bit (DONTCARE) and 32bit on
64bit CPUs (CARELESS) and early AMD K8 which is from 2003 and almost
2 decades old by now (SHRUG).

Everything x86_64 since AMD K10 (2007) is using p6_nops.

And per FEATURE_NOPL being required for x86_64, all those CPUs can use
p6_nops. So stop caring about NOPs, simplify things and get on with
life.

(more cleanups possible)

Signed-off-by: Peter Zijlstra (Intel) 
---
 arch/x86/include/asm/jump_label.h    |   12 --
 arch/x86/include/asm/nops.h          |  180 ++-
 arch/x86/include/asm/special_insns.h |    4 
 arch/x86/kernel/alternative.c        |  198 +++
 arch/x86/kernel/ftrace.c             |    4 
 arch/x86/kernel/jump_label.c         |   32 +
 arch/x86/kernel/kprobes/core.c       |    2 
 arch/x86/kernel/setup.c              |    1 
 arch/x86/kernel/static_call.c        |    4 
 arch/x86/net/bpf_jit_comp.c          |    8 -
 10 files changed, 98 insertions(+), 347 deletions(-)

--- a/arch/x86/include/asm/jump_label.h
+++ b/arch/x86/include/asm/jump_label.h
@@ -6,12 +6,6 @@
 
 #define JUMP_LABEL_NOP_SIZE 5
 
-#ifdef CONFIG_X86_64
-# define STATIC_KEY_INIT_NOP P6_NOP5_ATOMIC
-#else
-# define STATIC_KEY_INIT_NOP GENERIC_NOP5_ATOMIC
-#endif
-
 #include 
 #include 
 
@@ -23,7 +17,7 @@
 static __always_inline bool arch_static_branch(struct static_key * const key, const bool branch)
 {
asm_volatile_goto("1:"
-   ".byte " __stringify(STATIC_KEY_INIT_NOP) "\n\t"
+   ".byte " __stringify(BYTES_NOP5) "\n\t"
".pushsection __jump_table,  \"aw\" \n\t"
_ASM_ALIGN "\n\t"
".long 1b - ., %l[l_yes] - . \n\t"
@@ -63,7 +57,7 @@ static __always_inline bool arch_static_
.long   \target - .Lstatic_jump_after_\@
 .Lstatic_jump_after_\@:
.else
-   .byte   STATIC_KEY_INIT_NOP
+   .byte   BYTES_NOP5
.endif
.pushsection __jump_table, "aw"
_ASM_ALIGN
@@ -75,7 +69,7 @@ static __always_inline bool arch_static_
 .macro STATIC_JUMP_IF_FALSE target, key, def
 .Lstatic_jump_\@:
.if \def
-   .byte   STATIC_KEY_INIT_NOP
+   .byte   BYTES_NOP5
.else
/* Equivalent to "jmp.d32 \target" */
.byte   0xe9
--- a/arch/x86/include/asm/nops.h
+++ b/arch/x86/include/asm/nops.h
@@ -4,89 +4,58 @@
 
 /*
  * Define nops for use with alternative() and for tracing.
+ */
+
+#ifndef CONFIG_64BIT
+
+/*
+ * Generic 32bit nops from GAS:
+ *
+ * 1: nop
+ * 2: movl %esi,%esi
+ * 3: leal 0x00(%esi),%esi
+ * 4: leal 0x00(,%esi,1),%esi
+ * 5: leal %ds:0x00(,%esi,1),%esi
+ * 6: leal 0x00000000(%esi),%esi
+ * 7: leal 0x00000000(,%esi,1),%esi
+ * 8: leal %ds:0x00000000(,%esi,1),%esi
  *
- * *_NOP5_ATOMIC must be a single instruction.
+ * Except 5 and 8, which are DS prefixed 4 and 7 resp, where GAS would emit 2
+ * nop instructions.
  */
+#define BYTES_NOP1 0x90
+#define BYTES_NOP2 0x89,0xf6
+#define BYTES_NOP3 0x8d,0x76,0x00
+#define BYTES_NOP4 0x8d,0x74,0x26,0x00

Re: The killing of ideal_nops[]

2021-03-10 Thread Alexei Starovoitov
On Wed, Mar 10, 2021 at 6:29 AM Peter Zijlstra  wrote:
>
> On Wed, Mar 10, 2021 at 09:13:24AM -0500, Steven Rostedt wrote:
> > On Wed, 10 Mar 2021 11:22:48 +0100
> > Peter Zijlstra  wrote:
> >
> > > After this FEATURE_NOPL is unused except for required-features for
> > > x86_64. FEATURE_K8 is only used for PTI and FEATURE_K7 is unused.
> > >
> > > AFAICT this negatively affects lots of 32bit (DONTCARE) and 32bit on
> > > 64bit CPUs (CARELESS) and early AMD (K8) which is from 2003 and almost
> > > 2 decades old by now (SHRUG).
> > >
> > > Everything x86_64 since AMD K10 (2007) was using p6_nops.
> > >
> > > And per FEATURE_NOPL being required for x86_64, all those CPUs can use
> > > p6_nops. So stop caring about NOPs, simplify things and get on with life
> > > :-)
> >
> > Before ripping out all the ideal_nop logic, I wonder if we should just
> > force the nops you want now (that is, don't change the selected
> > ideal_nops, just "pretend" that the CPU wants p6_nops), and see if anyone
> > complains. After a few releases, if there's no complaints, then we can
> > rip out the ideal_nop logic.
>
> Nah, just rip the entire thing out. You should be happy about
> deterministic NOPs :-)

Ack for bpf bits.
I think the cleanup is good from the point of having one way to do things.
Though I won't be surprised if somebody comes along with a patch
to use different nops eventually.
When I first looked at it years ago I was wondering why segment selector
prefix is not used. afaik windows was using it, because having cs: ds: nop
makes intel use only one of the instruction decoders (the big and slow one)
which allegedly saves power, since the pipeline has bubbles.
Things could be completely different now in modern u-arches.


Re: The killing of ideal_nops[]

2021-03-10 Thread Peter Zijlstra
On Wed, Mar 10, 2021 at 03:24:47PM +0100, Peter Zijlstra wrote:
> On Wed, Mar 10, 2021 at 09:13:24AM -0500, Steven Rostedt wrote:
> > On Wed, 10 Mar 2021 11:22:48 +0100
> > Peter Zijlstra  wrote:
> > 
> > > After this FEATURE_NOPL is unused except for required-features for
> > > x86_64. FEATURE_K8 is only used for PTI and FEATURE_K7 is unused.
> > > 
> > > AFAICT this negatively affects lots of 32bit (DONTCARE) and 32bit on
> > > 64bit CPUs (CARELESS) and early AMD (K8) which is from 2003 and almost
> > > 2 decades old by now (SHRUG).
> > > 
> > > Everything x86_64 since AMD K10 (2007) was using p6_nops.
> > > 
> > > And per FEATURE_NOPL being required for x86_64, all those CPUs can use
> > > p6_nops. So stop caring about NOPs, simplify things and get on with life
> > > :-)
> > 
> > Before ripping out all the ideal_nop logic, I wonder if we should just
> > force the nops you want now (that is, don't change the selected
> > ideal_nops, just "pretend" that the CPU wants p6_nops), and see if anyone
> > complains. After a few releases, if there's no complaints, then we can
> > rip out the ideal_nop logic.
> 
> Nah, just rip the entire thing out. You should be happy about
> deterministic NOPs :-)
> 
> NOP encoding is not something CPUs should differentiate on, that's just
> bollocks.

Also, you seem to have fallen off of IRC. Anyway, weren't you the one
that was complaining x86 was 'special' for having different NOPs the
other day?

Fixed it ;-)


Re: The killing of ideal_nops[]

2021-03-10 Thread Peter Zijlstra
On Wed, Mar 10, 2021 at 09:13:24AM -0500, Steven Rostedt wrote:
> On Wed, 10 Mar 2021 11:22:48 +0100
> Peter Zijlstra  wrote:
> 
> > After this FEATURE_NOPL is unused except for required-features for
> > x86_64. FEATURE_K8 is only used for PTI and FEATURE_K7 is unused.
> > 
> > AFAICT this negatively affects lots of 32bit (DONTCARE) and 32bit on
> > 64bit CPUs (CARELESS) and early AMD (K8) which is from 2003 and almost
> > 2 decades old by now (SHRUG).
> > 
> > Everything x86_64 since AMD K10 (2007) was using p6_nops.
> > 
> > And per FEATURE_NOPL being required for x86_64, all those CPUs can use
> > p6_nops. So stop caring about NOPs, simplify things and get on with life
> > :-)
> 
> Before ripping out all the ideal_nop logic, I wonder if we should just
> force the nops you want now (that is, don't change the selected
> ideal_nops, just "pretend" that the CPU wants p6_nops), and see if anyone
> complains. After a few releases, if there's no complaints, then we can
> rip out the ideal_nop logic.

Nah, just rip the entire thing out. You should be happy about
deterministic NOPs :-)

NOP encoding is not something CPUs should differentiate on, that's just
bollocks.


Re: The killing of ideal_nops[]

2021-03-10 Thread Steven Rostedt
On Wed, 10 Mar 2021 11:22:48 +0100
Peter Zijlstra  wrote:

> After this FEATURE_NOPL is unused except for required-features for
> x86_64. FEATURE_K8 is only used for PTI and FEATURE_K7 is unused.
> 
> AFAICT this negatively affects lots of 32bit (DONTCARE) and 32bit on
> 64bit CPUs (CARELESS) and early AMD (K8) which is from 2003 and almost
> 2 decades old by now (SHRUG).
> 
> Everything x86_64 since AMD K10 (2007) was using p6_nops.
> 
> And per FEATURE_NOPL being required for x86_64, all those CPUs can use
> p6_nops. So stop caring about NOPs, simplify things and get on with life
> :-)

Before ripping out all the ideal_nop logic, I wonder if we should just
force the nops you want now (that is, don't change the selected
ideal_nops, just "pretend" that the CPU wants p6_nops), and see if anyone
complains. After a few releases, if there's no complaints, then we can
rip out the ideal_nop logic.

-- Steve


Re: The killing of ideal_nops[]

2021-03-10 Thread Peter Zijlstra
On Wed, Mar 10, 2021 at 11:03:10AM +0100, Peter Zijlstra wrote:
> -void __init arch_init_ideal_nops(void)
> -{
> - switch (boot_cpu_data.x86_vendor) {
> - case X86_VENDOR_INTEL:
> - /*
> -  * Due to a decoder implementation quirk, some
> -  * specific Intel CPUs actually perform better with
> -  * the "k8_nops" than with the SDM-recommended NOPs.
> -  */
> - if (boot_cpu_data.x86 == 6 &&
> - boot_cpu_data.x86_model >= 0x0f &&
> - boot_cpu_data.x86_model != 0x1c &&
> - boot_cpu_data.x86_model != 0x26 &&
> - boot_cpu_data.x86_model != 0x27 &&
> - boot_cpu_data.x86_model < 0x30) {
> - ideal_nops = k8_nops;
> - } else if (boot_cpu_has(X86_FEATURE_NOPL)) {
> -ideal_nops = p6_nops;
> - } else {
> -#ifdef CONFIG_X86_64
> - ideal_nops = k8_nops;
> -#else
> - ideal_nops = intel_nops;
> -#endif
> - }
> - break;
> -
> - case X86_VENDOR_HYGON:
> - ideal_nops = p6_nops;
> - return;
> -
> - case X86_VENDOR_AMD:
> - if (boot_cpu_data.x86 > 0xf) {
> - ideal_nops = p6_nops;
> - return;
> - }
> -
> - fallthrough;
> -
> - default:
> -#ifdef CONFIG_X86_64
> - ideal_nops = k8_nops;
> -#else
> - if (boot_cpu_has(X86_FEATURE_K8))
> - ideal_nops = k8_nops;
> - else if (boot_cpu_has(X86_FEATURE_K7))
> - ideal_nops = k7_nops;
> - else
> - ideal_nops = intel_nops;
> -#endif
> - }
> -}

After this FEATURE_NOPL is unused except for required-features for
x86_64. FEATURE_K8 is only used for PTI and FEATURE_K7 is unused.

AFAICT this negatively affects lots of 32bit (DONTCARE) and 32bit on
64bit CPUs (CARELESS) and early AMD (K8) which is from 2003 and almost
2 decades old by now (SHRUG).

Everything x86_64 since AMD K10 (2007) was using p6_nops.

And per FEATURE_NOPL being required for x86_64, all those CPUs can use
p6_nops. So stop caring about NOPs, simplify things and get on with life
:-)


Re: The killing of ideal_nops[]

2021-03-10 Thread Peter Zijlstra
On Wed, Mar 10, 2021 at 10:35:45AM +0100, Peter Zijlstra wrote:
> On Wed, Mar 10, 2021 at 10:14:35AM +0100, Peter Zijlstra wrote:
> > Sure, but we can have one set on 32bit and another set on 64bit.
> 
> > Although I would use DS prefix nops for 32bit nop5/nop8 to keep them
> > single instructions.
> > 
> > Then we can do away with runtime nop selection and special atomic nops
> > and simplify things.
> > 
> > All this runtime faffing about nops is tedious and causes complications
> > we can do without.
> 
> Something like so, except I think there's more cleanups possible.

This one builds and boots x86_64-defconfig. Still, more cleanups are
possible.

---
 arch/x86/include/asm/jump_label.h    |  12 +--
 arch/x86/include/asm/nops.h          | 176 ++-
 arch/x86/include/asm/special_insns.h |   4 +-
 arch/x86/kernel/alternative.c        | 198 ---
 arch/x86/kernel/ftrace.c             |   4 +-
 arch/x86/kernel/jump_label.c         |  23 +---
 arch/x86/kernel/kprobes/core.c       |   2 +-
 arch/x86/kernel/setup.c              |   1 -
 arch/x86/kernel/static_call.c        |   4 +-
 arch/x86/net/bpf_jit_comp.c          |   8 +-
 10 files changed, 93 insertions(+), 339 deletions(-)

diff --git a/arch/x86/include/asm/jump_label.h b/arch/x86/include/asm/jump_label.h
index 7f2006645d84..610a05374c02 100644
--- a/arch/x86/include/asm/jump_label.h
+++ b/arch/x86/include/asm/jump_label.h
@@ -6,12 +6,6 @@
 
 #define JUMP_LABEL_NOP_SIZE 5
 
-#ifdef CONFIG_X86_64
-# define STATIC_KEY_INIT_NOP P6_NOP5_ATOMIC
-#else
-# define STATIC_KEY_INIT_NOP GENERIC_NOP5_ATOMIC
-#endif
-
 #include 
 #include 
 
@@ -23,7 +17,7 @@
 static __always_inline bool arch_static_branch(struct static_key * const key, const bool branch)
 {
asm_volatile_goto("1:"
-   ".byte " __stringify(STATIC_KEY_INIT_NOP) "\n\t"
+   ".byte " __stringify(BYTES_NOP5) "\n\t"
".pushsection __jump_table,  \"aw\" \n\t"
_ASM_ALIGN "\n\t"
".long 1b - ., %l[l_yes] - . \n\t"
@@ -63,7 +57,7 @@ static __always_inline bool arch_static_branch_jump(struct static_key * const ke
.long   \target - .Lstatic_jump_after_\@
 .Lstatic_jump_after_\@:
.else
-   .byte   STATIC_KEY_INIT_NOP
+   .byte   BYTES_NOP5
.endif
.pushsection __jump_table, "aw"
_ASM_ALIGN
@@ -75,7 +69,7 @@ static __always_inline bool arch_static_branch_jump(struct static_key * const ke
 .macro STATIC_JUMP_IF_FALSE target, key, def
 .Lstatic_jump_\@:
.if \def
-   .byte   STATIC_KEY_INIT_NOP
+   .byte   BYTES_NOP5
.else
/* Equivalent to "jmp.d32 \target" */
.byte   0xe9
diff --git a/arch/x86/include/asm/nops.h b/arch/x86/include/asm/nops.h
index 12f12b5cf2ca..c6bdf8d8e79d 100644
--- a/arch/x86/include/asm/nops.h
+++ b/arch/x86/include/asm/nops.h
@@ -4,89 +4,58 @@
 
 /*
  * Define nops for use with alternative() and for tracing.
- *
- * *_NOP5_ATOMIC must be a single instruction.
  */
 
-#define NOP_DS_PREFIX 0x3e
+#ifndef CONFIG_64BIT
 
-/* generic versions from gas
-   1: nop
-   the following instructions are NOT nops in 64-bit mode,
-   for 64-bit mode use K8 or P6 nops instead
-   2: movl %esi,%esi
-   3: leal 0x00(%esi),%esi
-   4: leal 0x00(,%esi,1),%esi
-   6: leal 0x00000000(%esi),%esi
-   7: leal 0x00000000(,%esi,1),%esi
-*/
-#define GENERIC_NOP1 0x90
-#define GENERIC_NOP2 0x89,0xf6
-#define GENERIC_NOP3 0x8d,0x76,0x00
-#define GENERIC_NOP4 0x8d,0x74,0x26,0x00
-#define GENERIC_NOP5 GENERIC_NOP1,GENERIC_NOP4
-#define GENERIC_NOP6 0x8d,0xb6,0x00,0x00,0x00,0x00
-#define GENERIC_NOP7 0x8d,0xb4,0x26,0x00,0x00,0x00,0x00
-#define GENERIC_NOP8 GENERIC_NOP1,GENERIC_NOP7
-#define GENERIC_NOP5_ATOMIC NOP_DS_PREFIX,GENERIC_NOP4
+/*
+ * Generic 32bit nops from GAS:
+ *
+ * 1: nop
+ * 2: movl %esi,%esi
+ * 3: leal 0x00(%esi),%esi
+ * 4: leal 0x00(,%esi,1),%esi
+ * 5: leal %ds:0x00(,%esi,1),%esi
+ * 6: leal 0x00000000(%esi),%esi
+ * 7: leal 0x00000000(,%esi,1),%esi
+ * 8: leal %ds:0x00000000(,%esi,1),%esi
+ *
+ * Except 5 and 8, which are DS prefixed 4 and 7 resp, where GAS would emit 2
+ * nop instructions.
+ */
+#define BYTES_NOP1 0x90
+#define BYTES_NOP2 0x89,0xf6
+#define BYTES_NOP3 0x8d,0x76,0x00
+#define BYTES_NOP4 0x8d,0x74,0x26,0x00
+#define BYTES_NOP5 0x3e,BYTES_NOP4
+#define BYTES_NOP6 0x8d,0xb6,0x00,0x00,0x00,0x00
+#define BYTES_NOP7 0x8d,0xb4,0x26,0x00,0x00,0x00,0x00
+#define BYTES_NOP8 0x3e,BYTES_NOP7
 
-/* Opteron 64bit nops
-   1: nop
-   2: osp nop
-   3: osp osp nop
-   4: osp osp osp nop
-*/
-#define K8_NOP1 GENERIC_NOP1
-#define K8_NOP2 0x66,K8_NOP1
-#define K8_NOP3 0x66,K8_NOP2
-#define K8_NOP4 0x66,K8_NOP3
-#define K8_NOP5 K8_NOP3,K8_NOP2
-#define K8_NOP6 K8_NOP3,K8_NOP3
-#define K8_NOP7 K8_NOP4,K8_NOP3
-#define K8_NOP8 K8_NOP4,K8_NOP4
-#define 

Re: The killing of ideal_nops[]

2021-03-10 Thread Peter Zijlstra
On Wed, Mar 10, 2021 at 10:14:35AM +0100, Peter Zijlstra wrote:
> Sure, but we can have one set on 32bit and another set on 64bit.

> Although I would use DS prefix nops for 32bit nop5/nop8 to keep them
> single instructions.
> 
> Then we can do away with runtime nop selection and special atomic nops
> and simplify things.
> 
> All this runtime faffing about nops is tedious and causes complications
> we can do without.

Something like so, except I think there's more cleanups possible.

(*completely* untested)

---
 arch/x86/include/asm/nops.h    | 176 
 arch/x86/kernel/alternative.c  | 199 +
 arch/x86/kernel/ftrace.c       |   4 +-
 arch/x86/kernel/jump_label.c   |  21 +
 arch/x86/kernel/kprobes/core.c |   2 +-
 arch/x86/kernel/setup.c        |   1 -
 arch/x86/kernel/static_call.c  |   4 +-
 arch/x86/net/bpf_jit_comp.c    |   8 +-
 8 files changed, 88 insertions(+), 327 deletions(-)

diff --git a/arch/x86/include/asm/nops.h b/arch/x86/include/asm/nops.h
index 12f12b5cf2ca..aec7b092d3dc 100644
--- a/arch/x86/include/asm/nops.h
+++ b/arch/x86/include/asm/nops.h
@@ -4,89 +4,58 @@
 
 /*
  * Define nops for use with alternative() and for tracing.
- *
- * *_NOP5_ATOMIC must be a single instruction.
  */
 
-#define NOP_DS_PREFIX 0x3e
+#ifndef CONFIG_64BIT
 
-/* generic versions from gas
-   1: nop
-   the following instructions are NOT nops in 64-bit mode,
-   for 64-bit mode use K8 or P6 nops instead
-   2: movl %esi,%esi
-   3: leal 0x00(%esi),%esi
-   4: leal 0x00(,%esi,1),%esi
-   6: leal 0x00000000(%esi),%esi
-   7: leal 0x00000000(,%esi,1),%esi
-*/
-#define GENERIC_NOP1 0x90
-#define GENERIC_NOP2 0x89,0xf6
-#define GENERIC_NOP3 0x8d,0x76,0x00
-#define GENERIC_NOP4 0x8d,0x74,0x26,0x00
-#define GENERIC_NOP5 GENERIC_NOP1,GENERIC_NOP4
-#define GENERIC_NOP6 0x8d,0xb6,0x00,0x00,0x00,0x00
-#define GENERIC_NOP7 0x8d,0xb4,0x26,0x00,0x00,0x00,0x00
-#define GENERIC_NOP8 GENERIC_NOP1,GENERIC_NOP7
-#define GENERIC_NOP5_ATOMIC NOP_DS_PREFIX,GENERIC_NOP4
+/*
+ * Generic 32bit nops from GAS:
+ *
+ * 1: nop
+ * 2: movl %esi,%esi
+ * 3: leal 0x00(%esi),%esi
+ * 4: leal 0x00(,%esi,1),%esi
+ * 5: leal %ds:0x00(,%esi,1),%esi
+ * 6: leal 0x00000000(%esi),%esi
+ * 7: leal 0x00000000(,%esi,1),%esi
+ * 8: leal %ds:0x00000000(,%esi,1),%esi
+ *
+ * Except 5 and 8, which are DS prefixed 4 and 7 resp, where GAS would emit 2
+ * nop instructions.
+ */
+#define BYTES_NOP1 0x90
+#define BYTES_NOP2 0x89,0xf6
+#define BYTES_NOP3 0x8d,0x76,0x00
+#define BYTES_NOP4 0x8d,0x74,0x26,0x00
+#define BYTES_NOP5 0x3e,BYTES_NOP4
+#define BYTES_NOP6 0x8d,0xb6,0x00,0x00,0x00,0x00
+#define BYTES_NOP7 0x8d,0xb4,0x26,0x00,0x00,0x00,0x00
+#define BYTES_NOP8 0x3e,BYTES_NOP7
 
-/* Opteron 64bit nops
-   1: nop
-   2: osp nop
-   3: osp osp nop
-   4: osp osp osp nop
-*/
-#define K8_NOP1 GENERIC_NOP1
-#define K8_NOP2 0x66,K8_NOP1
-#define K8_NOP3 0x66,K8_NOP2
-#define K8_NOP4 0x66,K8_NOP3
-#define K8_NOP5 K8_NOP3,K8_NOP2
-#define K8_NOP6 K8_NOP3,K8_NOP3
-#define K8_NOP7 K8_NOP4,K8_NOP3
-#define K8_NOP8 K8_NOP4,K8_NOP4
-#define K8_NOP5_ATOMIC 0x66,K8_NOP4
+#else
 
-/* K7 nops
-   uses eax dependencies (arbitrary choice)
-   1: nop
-   2: movl %eax,%eax
-   3: leal (,%eax,1),%eax
-   4: leal 0x00(,%eax,1),%eax
-   6: leal 0x00000000(%eax),%eax
-   7: leal 0x00000000(,%eax,1),%eax
-*/
-#define K7_NOP1 GENERIC_NOP1
-#define K7_NOP2 0x8b,0xc0
-#define K7_NOP3 0x8d,0x04,0x20
-#define K7_NOP4 0x8d,0x44,0x20,0x00
-#define K7_NOP5 K7_NOP4,K7_NOP1
-#define K7_NOP6 0x8d,0x80,0,0,0,0
-#define K7_NOP7 0x8D,0x04,0x05,0,0,0,0
-#define K7_NOP8 K7_NOP7,K7_NOP1
-#define K7_NOP5_ATOMIC NOP_DS_PREFIX,K7_NOP4
+/*
+ * Generic 64bit nops from GAS:
+ *
+ * 1: nop
+ * 2: osp nop
+ * 3: nopl (%eax)
+ * 4: nopl 0x00(%eax)
+ * 5: nopl 0x00(%eax,%eax,1)
+ * 6: osp nopl 0x00(%eax,%eax,1)
+ * 7: nopl 0x00000000(%eax)
+ * 8: nopl 0x00000000(%eax,%eax,1)
+ */
+#define BYTES_NOP1 0x90
+#define BYTES_NOP2 0x66,0x90
+#define BYTES_NOP3 0x0f,0x1f,0x00
+#define BYTES_NOP4 0x0f,0x1f,0x40,0
+#define BYTES_NOP5 0x0f,0x1f,0x44,0x00,0
+#define BYTES_NOP6 0x66,0x0f,0x1f,0x44,0x00,0
+#define BYTES_NOP7 0x0f,0x1f,0x80,0,0,0,0
+#define BYTES_NOP8 0x0f,0x1f,0x84,0x00,0,0,0,0
 
-/* P6 nops
-   uses eax dependencies (Intel-recommended choice)
-   1: nop
-   2: osp nop
-   3: nopl (%eax)
-   4: nopl 0x00(%eax)
-   5: nopl 0x00(%eax,%eax,1)
-   6: osp nopl 0x00(%eax,%eax,1)
-   7: nopl 0x00000000(%eax)
-   8: nopl 0x00000000(%eax,%eax,1)
-   Note: All the above are assumed to be a single instruction.
-   There is kernel code that depends on this.
-*/
-#define P6_NOP1 GENERIC_NOP1
-#define P6_NOP2 0x66,0x90
-#define P6_NOP3 0x0f,0x1f,0x00
-#define P6_NOP4 0x0f,0x1f,0x40,0
-#define P6_NOP5 0x0f,0x1f,0x44,0x00,0
-#define P6_NOP6   

Re: The killing of ideal_nops[]

2021-03-10 Thread Peter Zijlstra
On Tue, Mar 09, 2021 at 04:33:45PM -0800, h...@zytor.com wrote:
> On March 9, 2021 1:24:44 PM PST, Peter Zijlstra  wrote:
> >On Tue, Mar 09, 2021 at 12:05:19PM -0500, Steven Rostedt wrote:
> >> On Tue, 9 Mar 2021 17:58:17 +0100
> >> Peter Zijlstra  wrote:
> >> 
> >> > Hi,
> >> > 
> >> > AFAICT everything made in the past 10 years ends up using p6_nops. Is it
> >> > time to kill off ideal_nops[] and simplify life?
> >> > 
> >> 
> >> Well, the one bug that was reported recently was due to a box that uses a
> >> different "ideal_nops" than p6_nops. Perhaps we should ask him if there's
> >> any noticeable difference between using p6_nops for every function and the
> >> ideal_nops that was found for that box.
> >
> >If the machine is more than a decade old, I'm not really caring about
> >optimal performance. If it is 32bit, I really couldn't be arsed as long
> >as it boots.
> 
> p6_nops don't boot on all 32-bit chips.

Sure, but we can have one set on 32bit and another set on 64bit.

$ cat nops.s
.section .text
nop1: .nops 1
nop2: .nops 2
nop3: .nops 3
nop4: .nops 4
nop5: .nops 5
nop6: .nops 6
nop7: .nops 7
nop8: .nops 8

$ as --32 nops.s -o nops.o ; objdump -wd nops.o

nops.o: file format elf32-i386


Disassembly of section .text:

00000000 <nop1>:
   0:   90                      nop

00000001 <nop2>:
   1:   66 90                   xchg   %ax,%ax

00000003 <nop3>:
   3:   8d 76 00                lea    0x0(%esi),%esi

00000006 <nop4>:
   6:   8d 74 26 00             lea    0x0(%esi,%eiz,1),%esi

0000000a <nop5>:
   a:   8d 74 26 00             lea    0x0(%esi,%eiz,1),%esi
   e:   90                      nop

0000000f <nop6>:
   f:   8d b6 00 00 00 00       lea    0x0(%esi),%esi

00000015 <nop7>:
  15:   8d b4 26 00 00 00 00    lea    0x0(%esi,%eiz,1),%esi

0000001c <nop8>:
  1c:   8d b4 26 00 00 00 00    lea    0x0(%esi,%eiz,1),%esi
  23:   90                      nop

$ as --64 nops.s -o nops.o ; objdump -wd nops.o

nops.o: file format elf64-x86-64


Disassembly of section .text:

0000000000000000 <nop1>:
   0:   90                      nop

0000000000000001 <nop2>:
   1:   66 90                   xchg   %ax,%ax

0000000000000003 <nop3>:
   3:   0f 1f 00                nopl   (%rax)

0000000000000006 <nop4>:
   6:   0f 1f 40 00             nopl   0x0(%rax)

000000000000000a <nop5>:
   a:   0f 1f 44 00 00          nopl   0x0(%rax,%rax,1)

000000000000000f <nop6>:
   f:   66 0f 1f 44 00 00       nopw   0x0(%rax,%rax,1)

0000000000000015 <nop7>:
  15:   0f 1f 80 00 00 00 00    nopl   0x0(%rax)

000000000000001c <nop8>:
  1c:   0f 1f 84 00 00 00 00 00 nopl   0x0(%rax,%rax,1)


---

Although I would use DS prefix nops for 32bit nop5/nop8 to keep them
single instructions.

Then we can do away with runtime nop selection and special atomic nops
and simplify things.

All this runtime faffing about nops is tedious and causes complications
we can do without.


Re: The killing of ideal_nops[]

2021-03-09 Thread hpa
On March 9, 2021 1:24:44 PM PST, Peter Zijlstra  wrote:
>On Tue, Mar 09, 2021 at 12:05:19PM -0500, Steven Rostedt wrote:
>> On Tue, 9 Mar 2021 17:58:17 +0100
>> Peter Zijlstra  wrote:
>> 
>> > Hi,
>> > 
>> > AFAICT everything made in the past 10 years ends up using p6_nops. Is it
>> > time to kill off ideal_nops[] and simplify life?
>> > 
>> 
>> Well, the one bug that was reported recently was due to a box that uses a
>> different "ideal_nops" than p6_nops. Perhaps we should ask him if there's
>> any noticeable difference between using p6_nops for every function and the
>> ideal_nops that was found for that box.
>
>If the machine is more than a decade old, I'm not really caring about
>optimal performance. If it is 32bit, I really couldn't be arsed as long
>as it boots.

p6_nops don't boot on all 32-bit chips.


Re: The killing of ideal_nops[]

2021-03-09 Thread Peter Zijlstra
On Tue, Mar 09, 2021 at 12:05:19PM -0500, Steven Rostedt wrote:
> On Tue, 9 Mar 2021 17:58:17 +0100
> Peter Zijlstra  wrote:
> 
> > Hi,
> > 
> > AFAICT everything made in the past 10 years ends up using p6_nops. Is it
> > time to kill off ideal_nops[] and simplify life?
> > 
> 
> Well, the one bug that was reported recently was due to a box that uses a
> different "ideal_nops" than p6_nops. Perhaps we should ask him if there's
> any noticeable difference between using p6_nops for every function and the
> ideal_nops that was found for that box.

If the machine is more than a decade old, I'm not really caring about
optimal performance. If it is 32bit, I really couldn't be arsed as long
as it boots.


Re: The killing of ideal_nops[]

2021-03-09 Thread Steven Rostedt
On Tue, 9 Mar 2021 17:58:17 +0100
Peter Zijlstra  wrote:

> Hi,
> 
> AFAICT everything made in the past 10 years ends up using p6_nops. Is it
> time to kill off ideal_nops[] and simplify life?
> 

Well, the one bug that was reported recently was due to a box that uses a
different "ideal_nops" than p6_nops. Perhaps we should ask him if there's
any noticeable difference between using p6_nops for every function and the
ideal_nops that was found for that box.

-- Steve