Re: [Xen-devel] [PATCH v2 7/7] x86/build: Use new .nop directive when available

2018-03-05 Thread Jan Beulich
>>> On 02.03.18 at 20:34,  wrote:
> On 02/03/18 07:10, Jan Beulich wrote:
> On 01.03.18 at 17:58,  wrote:
>>> The pont of having the toolchain put out optimised nops is to avoid the
>>> need for us to patch the site at all.  I.e. calling optimise_nops() on a
>>> set of toolchain nops defeats the purpose in the overwhelming common
>>> case of running on a system which prefers P6 nops.
>>>
>>> The problem of working out when to optimise is that, when we come to
>>> apply an individual alternative, we don't know if we've already patched
>>> this site before.  Even the unoptimised algorithm has a corner case
>>> which explodes, if there is a stream of 0x90's on the end of a
>>> replacement e.g. in a imm or disp field.
>>>
>>> Put simply, we cannot determine, by peeking at the patchsite, whether it
>>> has been patched or not (other than keeping a full copy of the origin
>>> site as a reference).  As soon as we chose to optimise the nops of the
>>> origin site, we cannot determine anything at all.
>>>
>>> Thinking out loud, we could perhaps have a section containing one byte
>>> per origin site, which we use to track whether we've already optimised
>>> the padding bytes, and whether the contents have been replaced.  This
>>> would also add an extra long into struct altentry, but its all cold data
>>> after boot.
>> What about alternatively simply updating the struct alt_instr
>> instances to describe the code _after_ a patch that was applied?
>> That'll allow to always know how much padding there is.
> 
> There are multiple alt_instr pointing to the same origin sites when
> using ALTERNATIVE_2, meaning you keep all the safety problems with the
> current setup.

Oh, right, it was rubbish what I said - we don't ever look a 2nd time
at any particular struct alt_instr instance. We'd have to settle on
multiple changes to the same patch site to always be adjacent, such
that apply_alternatives() could have local state forwarding (between
loop iterations) added.

For the approach you describe I can't see how you would safely
glue together multiple patches to the same origin site, unless we
would outright forbid any use of lower level infrastructure than
ALTERNATIVE_2(). But if we demand that, we could easily flag
the first of the alternatives with a "continued" indicator, used to
trigger state forwarding in apply_alternatives() (perhaps no
more state than "avoid the NOP optimization if the controlling
feature is unavailable"). No need for an extra array and pointer
then.

Jan


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v2 7/7] x86/build: Use new .nop directive when available

2018-03-02 Thread Andrew Cooper
On 02/03/18 07:10, Jan Beulich wrote:
 On 01.03.18 at 17:58,  wrote:
>> On 01/03/18 10:54, Jan Beulich wrote:
>> On 01.03.18 at 11:36,  wrote:
 On Thu, Mar 01, 2018 at 12:28:27AM -0700, Jan Beulich wrote:
 Andrew Cooper  02/28/18 7:20 PM >>>
>> On 28/02/18 16:22, Jan Beulich wrote:
>> On 26.02.18 at 12:35,  wrote:
 --- a/xen/include/asm-x86/alternative-asm.h
 +++ b/xen/include/asm-x86/alternative-asm.h
 @@ -1,6 +1,8 @@
  #ifndef _ASM_X86_ALTERNATIVE_ASM_H_
  #define _ASM_X86_ALTERNATIVE_ASM_H_
  
 +#include 
 +
  #ifdef __ASSEMBLY__
  
  /*
 @@ -18,6 +20,14 @@
  .byte \pad_len
  .endm
  
 +.macro mknops nr_bytes
 +#ifdef HAVE_AS_NOP_DIRECTIVE
 +.nop \nr_bytes, ASM_NOP_MAX
>>> And do you really need to specify ASM_NOP_MAX here? What's
>>> wrong with letting the assembler pick what it wants as the longest
>>> NOP?
>> I don't want a toolchain change to cause us to go beyond 11 byte nops,
>> because of the associated decode stall on almost all hardware.  Using
>> ASM_NOP_MAX seemed like the easiest way to keep the end result
>> consistent, irrespective of toolchain support.
> I don't understand - an earlier patch takes care of runtime replacing them
> anyway. What stalls can then result?
 The runtime replacement won't happen when using the .nops directive
 AFAICT, because the original padding section will likely be filled
 with opcodes different than 0x90, and thus the runtime nop
 optimization won't be performed.
>>> Oh, indeed. That puts under question the whole idea of using
>>> .nops in favor of .skip. Andrew, I'm sorry, but with this I prefer
>>> to withdraw my ack.
>>>
 I also agree that using the default (not proving a second argument)
 seems like a better solution. Why would the toolstack switch to
 something that leads to worse performance? That would certainly be
 considered a bug.
>>> Why? They may change it based on data available for newer /
>>> older / whatever hardware. Any build-time choice is going to be
>>> suboptimal somewhere, so I think we absolutely should not
>>> bypass runtime replacing these NOPs, the more that now we
>>> may have quite large sequences of them.
>> The pont of having the toolchain put out optimised nops is to avoid the
>> need for us to patch the site at all.  I.e. calling optimise_nops() on a
>> set of toolchain nops defeats the purpose in the overwhelming common
>> case of running on a system which prefers P6 nops.
>>
>> The problem of working out when to optimise is that, when we come to
>> apply an individual alternative, we don't know if we've already patched
>> this site before.  Even the unoptimised algorithm has a corner case
>> which explodes, if there is a stream of 0x90's on the end of a
>> replacement e.g. in a imm or disp field.
>>
>> Put simply, we cannot determine, by peeking at the patchsite, whether it
>> has been patched or not (other than keeping a full copy of the origin
>> site as a reference).  As soon as we chose to optimise the nops of the
>> origin site, we cannot determine anything at all.
>>
>> Thinking out loud, we could perhaps have a section containing one byte
>> per origin site, which we use to track whether we've already optimised
>> the padding bytes, and whether the contents have been replaced.  This
>> would also add an extra long into struct altentry, but its all cold data
>> after boot.
> What about alternatively simply updating the struct alt_instr
> instances to describe the code _after_ a patch that was applied?
> That'll allow to always know how much padding there is.

There are multiple alt_instr pointing to the same origin sites when
using ALTERNATIVE_2, meaning you keep all the safety problems with the
current setup.

~Andrew

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v2 7/7] x86/build: Use new .nop directive when available

2018-03-01 Thread Jan Beulich
>>> On 01.03.18 at 17:58,  wrote:
> On 01/03/18 10:54, Jan Beulich wrote:
> On 01.03.18 at 11:36,  wrote:
>>> On Thu, Mar 01, 2018 at 12:28:27AM -0700, Jan Beulich wrote:
>>> Andrew Cooper  02/28/18 7:20 PM >>>
> On 28/02/18 16:22, Jan Beulich wrote:
> On 26.02.18 at 12:35,  wrote:
>>> --- a/xen/include/asm-x86/alternative-asm.h
>>> +++ b/xen/include/asm-x86/alternative-asm.h
>>> @@ -1,6 +1,8 @@
>>>  #ifndef _ASM_X86_ALTERNATIVE_ASM_H_
>>>  #define _ASM_X86_ALTERNATIVE_ASM_H_
>>>  
>>> +#include 
>>> +
>>>  #ifdef __ASSEMBLY__
>>>  
>>>  /*
>>> @@ -18,6 +20,14 @@
>>>  .byte \pad_len
>>>  .endm
>>>  
>>> +.macro mknops nr_bytes
>>> +#ifdef HAVE_AS_NOP_DIRECTIVE
>>> +.nop \nr_bytes, ASM_NOP_MAX
>> And do you really need to specify ASM_NOP_MAX here? What's
>> wrong with letting the assembler pick what it wants as the longest
>> NOP?
> I don't want a toolchain change to cause us to go beyond 11 byte nops,
> because of the associated decode stall on almost all hardware.  Using
> ASM_NOP_MAX seemed like the easiest way to keep the end result
> consistent, irrespective of toolchain support.
 I don't understand - an earlier patch takes care of runtime replacing them
 anyway. What stalls can then result?
>>> The runtime replacement won't happen when using the .nops directive
>>> AFAICT, because the original padding section will likely be filled
>>> with opcodes different than 0x90, and thus the runtime nop
>>> optimization won't be performed.
>> Oh, indeed. That puts under question the whole idea of using
>> .nops in favor of .skip. Andrew, I'm sorry, but with this I prefer
>> to withdraw my ack.
>>
>>> I also agree that using the default (not proving a second argument)
>>> seems like a better solution. Why would the toolstack switch to
>>> something that leads to worse performance? That would certainly be
>>> considered a bug.
>> Why? They may change it based on data available for newer /
>> older / whatever hardware. Any build-time choice is going to be
>> suboptimal somewhere, so I think we absolutely should not
>> bypass runtime replacing these NOPs, the more that now we
>> may have quite large sequences of them.
> 
> The pont of having the toolchain put out optimised nops is to avoid the
> need for us to patch the site at all.  I.e. calling optimise_nops() on a
> set of toolchain nops defeats the purpose in the overwhelming common
> case of running on a system which prefers P6 nops.
> 
> The problem of working out when to optimise is that, when we come to
> apply an individual alternative, we don't know if we've already patched
> this site before.  Even the unoptimised algorithm has a corner case
> which explodes, if there is a stream of 0x90's on the end of a
> replacement e.g. in a imm or disp field.
> 
> Put simply, we cannot determine, by peeking at the patchsite, whether it
> has been patched or not (other than keeping a full copy of the origin
> site as a reference).  As soon as we chose to optimise the nops of the
> origin site, we cannot determine anything at all.
> 
> Thinking out loud, we could perhaps have a section containing one byte
> per origin site, which we use to track whether we've already optimised
> the padding bytes, and whether the contents have been replaced.  This
> would also add an extra long into struct altentry, but its all cold data
> after boot.

What about alternatively simply updating the struct alt_instr
instances to describe the code _after_ a patch that was applied?
That'll allow to always know how much padding there is.

Jan


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v2 7/7] x86/build: Use new .nop directive when available

2018-03-01 Thread Andrew Cooper
On 01/03/18 10:54, Jan Beulich wrote:
 On 01.03.18 at 11:36,  wrote:
>> On Thu, Mar 01, 2018 at 12:28:27AM -0700, Jan Beulich wrote:
>> Andrew Cooper  02/28/18 7:20 PM >>>
 On 28/02/18 16:22, Jan Beulich wrote:
 On 26.02.18 at 12:35,  wrote:
>> --- a/xen/include/asm-x86/alternative-asm.h
>> +++ b/xen/include/asm-x86/alternative-asm.h
>> @@ -1,6 +1,8 @@
>>  #ifndef _ASM_X86_ALTERNATIVE_ASM_H_
>>  #define _ASM_X86_ALTERNATIVE_ASM_H_
>>  
>> +#include 
>> +
>>  #ifdef __ASSEMBLY__
>>  
>>  /*
>> @@ -18,6 +20,14 @@
>>  .byte \pad_len
>>  .endm
>>  
>> +.macro mknops nr_bytes
>> +#ifdef HAVE_AS_NOP_DIRECTIVE
>> +.nop \nr_bytes, ASM_NOP_MAX
> And do you really need to specify ASM_NOP_MAX here? What's
> wrong with letting the assembler pick what it wants as the longest
> NOP?
 I don't want a toolchain change to cause us to go beyond 11 byte nops,
 because of the associated decode stall on almost all hardware.  Using
 ASM_NOP_MAX seemed like the easiest way to keep the end result
 consistent, irrespective of toolchain support.
>>> I don't understand - an earlier patch takes care of runtime replacing them
>>> anyway. What stalls can then result?
>> The runtime replacement won't happen when using the .nops directive
>> AFAICT, because the original padding section will likely be filled
>> with opcodes different than 0x90, and thus the runtime nop
>> optimization won't be performed.
> Oh, indeed. That puts under question the whole idea of using
> .nops in favor of .skip. Andrew, I'm sorry, but with this I prefer
> to withdraw my ack.
>
>> I also agree that using the default (not proving a second argument)
>> seems like a better solution. Why would the toolstack switch to
>> something that leads to worse performance? That would certainly be
>> considered a bug.
> Why? They may change it based on data available for newer /
> older / whatever hardware. Any build-time choice is going to be
> suboptimal somewhere, so I think we absolutely should not
> bypass runtime replacing these NOPs, the more that now we
> may have quite large sequences of them.

The pont of having the toolchain put out optimised nops is to avoid the
need for us to patch the site at all.  I.e. calling optimise_nops() on a
set of toolchain nops defeats the purpose in the overwhelming common
case of running on a system which prefers P6 nops.

The problem of working out when to optimise is that, when we come to
apply an individual alternative, we don't know if we've already patched
this site before.  Even the unoptimised algorithm has a corner case
which explodes, if there is a stream of 0x90's on the end of a
replacement e.g. in a imm or disp field.

Put simply, we cannot determine, by peeking at the patchsite, whether it
has been patched or not (other than keeping a full copy of the origin
site as a reference).  As soon as we chose to optimise the nops of the
origin site, we cannot determine anything at all.

Thinking out loud, we could perhaps have a section containing one byte
per origin site, which we use to track whether we've already optimised
the padding bytes, and whether the contents have been replaced.  This
would also add an extra long into struct altentry, but its all cold data
after boot.

~Andrew

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v2 7/7] x86/build: Use new .nop directive when available

2018-03-01 Thread Jan Beulich
>>> On 01.03.18 at 11:36,  wrote:
> On Thu, Mar 01, 2018 at 12:28:27AM -0700, Jan Beulich wrote:
>> >>> Andrew Cooper  02/28/18 7:20 PM >>>
>> >On 28/02/18 16:22, Jan Beulich wrote:
>> > On 26.02.18 at 12:35,  wrote:
>> >>> --- a/xen/include/asm-x86/alternative-asm.h
>> >>> +++ b/xen/include/asm-x86/alternative-asm.h
>> >>> @@ -1,6 +1,8 @@
>> >>>  #ifndef _ASM_X86_ALTERNATIVE_ASM_H_
>> >>>  #define _ASM_X86_ALTERNATIVE_ASM_H_
>> >>>  
>> >>> +#include 
>> >>> +
>> >>>  #ifdef __ASSEMBLY__
>> >>>  
>> >>>  /*
>> >>> @@ -18,6 +20,14 @@
>> >>>  .byte \pad_len
>> >>>  .endm
>> >>>  
>> >>> +.macro mknops nr_bytes
>> >>> +#ifdef HAVE_AS_NOP_DIRECTIVE
>> >>> +.nop \nr_bytes, ASM_NOP_MAX
>> >> And do you really need to specify ASM_NOP_MAX here? What's
>> >> wrong with letting the assembler pick what it wants as the longest
>> >> NOP?
>> >
>> >I don't want a toolchain change to cause us to go beyond 11 byte nops,
>> >because of the associated decode stall on almost all hardware.  Using
>> >ASM_NOP_MAX seemed like the easiest way to keep the end result
>> >consistent, irrespective of toolchain support.
>> 
>> I don't understand - an earlier patch takes care of runtime replacing them
>> anyway. What stalls can then result?
> 
> The runtime replacement won't happen when using the .nops directive
> AFAICT, because the original padding section will likely be filled
> with opcodes different than 0x90, and thus the runtime nop
> optimization won't be performed.

Oh, indeed. That puts under question the whole idea of using
.nops in favor of .skip. Andrew, I'm sorry, but with this I prefer
to withdraw my ack.

> I also agree that using the default (not proving a second argument)
> seems like a better solution. Why would the toolstack switch to
> something that leads to worse performance? That would certainly be
> considered a bug.

Why? They may change it based on data available for newer /
older / whatever hardware. Any build-time choice is going to be
suboptimal somewhere, so I think we absolutely should not
bypass runtime replacing these NOPs, the more that now we
may have quite large sequences of them.

Jan


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v2 7/7] x86/build: Use new .nop directive when available

2018-03-01 Thread Roger Pau Monné
On Thu, Mar 01, 2018 at 12:28:27AM -0700, Jan Beulich wrote:
> >>> Andrew Cooper  02/28/18 7:20 PM >>>
> >On 28/02/18 16:22, Jan Beulich wrote:
> > On 26.02.18 at 12:35,  wrote:
> >>> --- a/xen/include/asm-x86/alternative-asm.h
> >>> +++ b/xen/include/asm-x86/alternative-asm.h
> >>> @@ -1,6 +1,8 @@
> >>>  #ifndef _ASM_X86_ALTERNATIVE_ASM_H_
> >>>  #define _ASM_X86_ALTERNATIVE_ASM_H_
> >>>  
> >>> +#include 
> >>> +
> >>>  #ifdef __ASSEMBLY__
> >>>  
> >>>  /*
> >>> @@ -18,6 +20,14 @@
> >>>  .byte \pad_len
> >>>  .endm
> >>>  
> >>> +.macro mknops nr_bytes
> >>> +#ifdef HAVE_AS_NOP_DIRECTIVE
> >>> +.nop \nr_bytes, ASM_NOP_MAX
> >> And do you really need to specify ASM_NOP_MAX here? What's
> >> wrong with letting the assembler pick what it wants as the longest
> >> NOP?
> >
> >I don't want a toolchain change to cause us to go beyond 11 byte nops,
> >because of the associated decode stall on almost all hardware.  Using
> >ASM_NOP_MAX seemed like the easiest way to keep the end result
> >consistent, irrespective of toolchain support.
> 
> I don't understand - an earlier patch takes care of runtime replacing them
> anyway. What stalls can then result?

The runtime replacement won't happen when using the .nops directive
AFAICT, because the original padding section will likely be filled
with opcodes different than 0x90, and thus the runtime nop
optimization won't be performed.

I also agree that using the default (not proving a second argument)
seems like a better solution. Why would the toolstack switch to
something that leads to worse performance? That would certainly be
considered a bug.

Roger.

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v2 7/7] x86/build: Use new .nop directive when available

2018-02-28 Thread Jan Beulich
>>> Andrew Cooper  02/28/18 7:20 PM >>>
>On 28/02/18 16:22, Jan Beulich wrote:
> On 26.02.18 at 12:35,  wrote:
>>> --- a/xen/include/asm-x86/alternative-asm.h
>>> +++ b/xen/include/asm-x86/alternative-asm.h
>>> @@ -1,6 +1,8 @@
>>>  #ifndef _ASM_X86_ALTERNATIVE_ASM_H_
>>>  #define _ASM_X86_ALTERNATIVE_ASM_H_
>>>  
>>> +#include 
>>> +
>>>  #ifdef __ASSEMBLY__
>>>  
>>>  /*
>>> @@ -18,6 +20,14 @@
>>>  .byte \pad_len
>>>  .endm
>>>  
>>> +.macro mknops nr_bytes
>>> +#ifdef HAVE_AS_NOP_DIRECTIVE
>>> +.nop \nr_bytes, ASM_NOP_MAX
>> And do you really need to specify ASM_NOP_MAX here? What's
>> wrong with letting the assembler pick what it wants as the longest
>> NOP?
>
>I don't want a toolchain change to cause us to go beyond 11 byte nops,
>because of the associated decode stall on almost all hardware.  Using
>ASM_NOP_MAX seemed like the easiest way to keep the end result
>consistent, irrespective of toolchain support.

I don't understand - an earlier patch takes care of runtime replacing them
anyway. What stalls can then result?

Jan



___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v2 7/7] x86/build: Use new .nop directive when available

2018-02-28 Thread Andrew Cooper
On 28/02/18 16:22, Jan Beulich wrote:
 On 26.02.18 at 12:35,  wrote:
>> Newer versions of binutils are capable of emitting an exact number bytes 
>> worth
>> of optimised nops.  Use this in preference to .skip when available.
>>
>> Signed-off-by: Andrew Cooper 
> In principle
> Reviewed-by: Jan Beulich 
> However, ...
>
>> RFC until support is actually committed to binutils mainline.
> ... upstream looks to have switched to .nops now, so the prereq
> to the R-b is that you consistently switch over.

Yes.  I need to pull, rebuild and retest.  Even if this patch gets
committed, it is still semi-RFC and up for changes until the next
binutils release.

That said, given the explicit probe, it is at least safe to have a stale
ABI in use, and we can change it later if needs be.

>
>> --- a/xen/arch/x86/Rules.mk
>> +++ b/xen/arch/x86/Rules.mk
>> @@ -28,6 +28,10 @@ $(call as-option-add,CFLAGS,CC,".equ \"x\"$$(comma)1", \
>>  $(call as-option-add,CFLAGS,CC,\
>>  ".if ((1 > 0) < 0); .error \"\";.endif",,-DHAVE_AS_NEGATIVE_TRUE)
>>  
>> +# Check to see whether the assmbler supports the .nop directive.
>> +$(call as-option-add,CFLAGS,CC,\
>> +".L1: .L2: .nop (.L2 - .L1)$$(comma)9",-DHAVE_AS_NOP_DIRECTIVE)
> Do you really need the (arbitrary) second argument here?

I want the check to match the form we actually use.

>
>> --- a/xen/include/asm-x86/alternative-asm.h
>> +++ b/xen/include/asm-x86/alternative-asm.h
>> @@ -1,6 +1,8 @@
>>  #ifndef _ASM_X86_ALTERNATIVE_ASM_H_
>>  #define _ASM_X86_ALTERNATIVE_ASM_H_
>>  
>> +#include 
>> +
>>  #ifdef __ASSEMBLY__
>>  
>>  /*
>> @@ -18,6 +20,14 @@
>>  .byte \pad_len
>>  .endm
>>  
>> +.macro mknops nr_bytes
>> +#ifdef HAVE_AS_NOP_DIRECTIVE
>> +.nop \nr_bytes, ASM_NOP_MAX
> And do you really need to specify ASM_NOP_MAX here? What's
> wrong with letting the assembler pick what it wants as the longest
> NOP?

I don't want a toolchain change to cause us to go beyond 11 byte nops,
because of the associated decode stall on almost all hardware.  Using
ASM_NOP_MAX seemed like the easiest way to keep the end result
consistent, irrespective of toolchain support.

~Andrew

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v2 7/7] x86/build: Use new .nop directive when available

2018-02-28 Thread Jan Beulich
>>> On 26.02.18 at 12:35,  wrote:
> Newer versions of binutils are capable of emitting an exact number bytes worth
> of optimised nops.  Use this in preference to .skip when available.
> 
> Signed-off-by: Andrew Cooper 

In principle
Reviewed-by: Jan Beulich 
However, ...

> RFC until support is actually committed to binutils mainline.

... upstream looks to have switched to .nops now, so the prereq
to the R-b is that you consistently switch over.

> --- a/xen/arch/x86/Rules.mk
> +++ b/xen/arch/x86/Rules.mk
> @@ -28,6 +28,10 @@ $(call as-option-add,CFLAGS,CC,".equ \"x\"$$(comma)1", \
>  $(call as-option-add,CFLAGS,CC,\
>  ".if ((1 > 0) < 0); .error \"\";.endif",,-DHAVE_AS_NEGATIVE_TRUE)
>  
> +# Check to see whether the assmbler supports the .nop directive.
> +$(call as-option-add,CFLAGS,CC,\
> +".L1: .L2: .nop (.L2 - .L1)$$(comma)9",-DHAVE_AS_NOP_DIRECTIVE)

Do you really need the (arbitrary) second argument here?

> --- a/xen/include/asm-x86/alternative-asm.h
> +++ b/xen/include/asm-x86/alternative-asm.h
> @@ -1,6 +1,8 @@
>  #ifndef _ASM_X86_ALTERNATIVE_ASM_H_
>  #define _ASM_X86_ALTERNATIVE_ASM_H_
>  
> +#include 
> +
>  #ifdef __ASSEMBLY__
>  
>  /*
> @@ -18,6 +20,14 @@
>  .byte \pad_len
>  .endm
>  
> +.macro mknops nr_bytes
> +#ifdef HAVE_AS_NOP_DIRECTIVE
> +.nop \nr_bytes, ASM_NOP_MAX

And do you really need to specify ASM_NOP_MAX here? What's
wrong with letting the assembler pick what it wants as the longest
NOP?

Jan


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v2 7/7] x86/build: Use new .nop directive when available

2018-02-26 Thread Roger Pau Monné
On Mon, Feb 26, 2018 at 01:08:05PM +, Andrew Cooper wrote:
> On 26/02/18 12:31, Roger Pau Monné wrote:
> > On Mon, Feb 26, 2018 at 11:35:04AM +, Andrew Cooper wrote:
> >> Newer versions of binutils are capable of emitting an exact number bytes 
> >> worth
> >> of optimised nops.  Use this in preference to .skip when available.
> >>
> >> Signed-off-by: Andrew Cooper 
> >> ---
> >> CC: Jan Beulich 
> >> CC: Konrad Rzeszutek Wilk 
> >> CC: Roger Pau Monné 
> >> CC: Wei Liu 
> >>
> >> RFC until support is actually committed to binutils mainline.
> > Since RFC has been dropped from the subject, is this now committed?
> 
> https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;a=commitdiff;h=62a02d25b6e5d9f92c205260daa11355d0c62532

Thanks for the reference.

> >
> >> ---
> >>  xen/arch/x86/Rules.mk |  4 
> >>  xen/include/asm-x86/alternative-asm.h | 14 --
> >>  xen/include/asm-x86/alternative.h | 13 ++---
> >>  3 files changed, 26 insertions(+), 5 deletions(-)
> >>
> >> diff --git a/xen/arch/x86/Rules.mk b/xen/arch/x86/Rules.mk
> >> index e169d67..bf5047f 100644
> >> --- a/xen/arch/x86/Rules.mk
> >> +++ b/xen/arch/x86/Rules.mk
> >> @@ -28,6 +28,10 @@ $(call as-option-add,CFLAGS,CC,".equ \"x\"$$(comma)1", \
> >>  $(call as-option-add,CFLAGS,CC,\
> >>  ".if ((1 > 0) < 0); .error \"\";.endif",,-DHAVE_AS_NEGATIVE_TRUE)
> >>  
> >> +# Check to see whether the assmbler supports the .nop directive.
> >> +$(call as-option-add,CFLAGS,CC,\
> >> +".L1: .L2: .nop (.L2 - .L1)$$(comma)9",-DHAVE_AS_NOP_DIRECTIVE)
> >> +
> >>  CFLAGS += -mno-red-zone -fpic -fno-asynchronous-unwind-tables
> >>  
> >>  # Xen doesn't use SSE interally.  If the compiler supports it, also skip 
> >> the
> >> diff --git a/xen/include/asm-x86/alternative-asm.h 
> >> b/xen/include/asm-x86/alternative-asm.h
> >> index 25f79fe..9e46bed 100644
> >> --- a/xen/include/asm-x86/alternative-asm.h
> >> +++ b/xen/include/asm-x86/alternative-asm.h
> >> @@ -1,6 +1,8 @@
> >>  #ifndef _ASM_X86_ALTERNATIVE_ASM_H_
> >>  #define _ASM_X86_ALTERNATIVE_ASM_H_
> >>  
> >> +#include 
> >> +
> >>  #ifdef __ASSEMBLY__
> >>  
> >>  /*
> >> @@ -18,6 +20,14 @@
> >>  .byte \pad_len
> >>  .endm
> >>  
> >> +.macro mknops nr_bytes
> >> +#ifdef HAVE_AS_NOP_DIRECTIVE
> >> +.nop \nr_bytes, ASM_NOP_MAX
> > I'm not able to find any online document about the .nop directive, and
> > I cannot really figure out the purpose of the second parameter.
> 
> Its a bit woolly, named "control".  In practice, it is the maximum
> length of an individual nop.  Beyond 11 bytes (iirc), most pipelines
> take a decode stall.

Oh, I've looked at the commit above, and since control is an optional
parameter, why not leave the assembler chose the default value? AFAICT
in our case this should be 11 because it's 64bit code.

> >
> > Also, after this patch is applied it seems like the padding is not
> > going to be 0x90, because as will already emit optimized nops. Are
> > those nops more optimized than the ones added by the alternatives
> > framework?
> 
> They are the same nops (by and large), although arranged differently. 
> For example, GAS fills backwards rather than forwards, which is as
> recommended in the Intel ORM.

So that 'larger' nop instructions are going to be placed at the end of
the region instead of the beginning of it?

> > I would expect the nops added at runtime would be more optimized than
> > the ones added at build time, because the runtime ones could take into
> > account the specific CPU model Xen is running on.
> 
> There are only a very few 64-bit capable CPUs which prefer K8 nops over
> P6 nops, and they are all old.  The build-time nops are correct for the
> overwhelming majority of hardware.

OK, I have no idea about this, but if you think filling with .nop is
not going to introduce a performance penalty over the run-time filling
then that's fine for me.

Roger.

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v2 7/7] x86/build: Use new .nop directive when available

2018-02-26 Thread Andrew Cooper
On 26/02/18 12:31, Roger Pau Monné wrote:
> On Mon, Feb 26, 2018 at 11:35:04AM +, Andrew Cooper wrote:
>> Newer versions of binutils are capable of emitting an exact number bytes 
>> worth
>> of optimised nops.  Use this in preference to .skip when available.
>>
>> Signed-off-by: Andrew Cooper 
>> ---
>> CC: Jan Beulich 
>> CC: Konrad Rzeszutek Wilk 
>> CC: Roger Pau Monné 
>> CC: Wei Liu 
>>
>> RFC until support is actually committed to binutils mainline.
> Since RFC has been dropped from the subject, is this now committed?

https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;a=commitdiff;h=62a02d25b6e5d9f92c205260daa11355d0c62532

>
>> ---
>>  xen/arch/x86/Rules.mk |  4 
>>  xen/include/asm-x86/alternative-asm.h | 14 --
>>  xen/include/asm-x86/alternative.h | 13 ++---
>>  3 files changed, 26 insertions(+), 5 deletions(-)
>>
>> diff --git a/xen/arch/x86/Rules.mk b/xen/arch/x86/Rules.mk
>> index e169d67..bf5047f 100644
>> --- a/xen/arch/x86/Rules.mk
>> +++ b/xen/arch/x86/Rules.mk
>> @@ -28,6 +28,10 @@ $(call as-option-add,CFLAGS,CC,".equ \"x\"$$(comma)1", \
>>  $(call as-option-add,CFLAGS,CC,\
>>  ".if ((1 > 0) < 0); .error \"\";.endif",,-DHAVE_AS_NEGATIVE_TRUE)
>>  
>> +# Check to see whether the assmbler supports the .nop directive.
>> +$(call as-option-add,CFLAGS,CC,\
>> +".L1: .L2: .nop (.L2 - .L1)$$(comma)9",-DHAVE_AS_NOP_DIRECTIVE)
>> +
>>  CFLAGS += -mno-red-zone -fpic -fno-asynchronous-unwind-tables
>>  
>>  # Xen doesn't use SSE interally.  If the compiler supports it, also skip the
>> diff --git a/xen/include/asm-x86/alternative-asm.h 
>> b/xen/include/asm-x86/alternative-asm.h
>> index 25f79fe..9e46bed 100644
>> --- a/xen/include/asm-x86/alternative-asm.h
>> +++ b/xen/include/asm-x86/alternative-asm.h
>> @@ -1,6 +1,8 @@
>>  #ifndef _ASM_X86_ALTERNATIVE_ASM_H_
>>  #define _ASM_X86_ALTERNATIVE_ASM_H_
>>  
>> +#include 
>> +
>>  #ifdef __ASSEMBLY__
>>  
>>  /*
>> @@ -18,6 +20,14 @@
>>  .byte \pad_len
>>  .endm
>>  
>> +.macro mknops nr_bytes
>> +#ifdef HAVE_AS_NOP_DIRECTIVE
>> +.nop \nr_bytes, ASM_NOP_MAX
> I'm not able to find any online document about the .nop directive, and
> I cannot really figure out the purpose of the second parameter.

Its a bit woolly, named "control".  In practice, it is the maximum
length of an individual nop.  Beyond 11 bytes (iirc), most pipelines
take a decode stall.

>
> Also, after this patch is applied it seems like the padding is not
> going to be 0x90, because as will already emit optimized nops. Are
> those nops more optimized than the ones added by the alternatives
> framework?

They are the same nops (by and large), although arranged differently. 
For example, GAS fills backwards rather than forwards, which is as
recommended in the Intel ORM.

> I would expect the nops added at runtime would be more optimized than
> the ones added at build time, because the runtime ones could take into
> account the specific CPU model Xen is running on.

There are only a very few 64-bit capable CPUs which prefer K8 nops over
P6 nops, and they are all old.  The build-time nops are correct for the
overwhelming majority of hardware.

~Andrew

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v2 7/7] x86/build: Use new .nop directive when available

2018-02-26 Thread Roger Pau Monné
On Mon, Feb 26, 2018 at 11:35:04AM +, Andrew Cooper wrote:
> Newer versions of binutils are capable of emitting an exact number bytes worth
> of optimised nops.  Use this in preference to .skip when available.
> 
> Signed-off-by: Andrew Cooper 
> ---
> CC: Jan Beulich 
> CC: Konrad Rzeszutek Wilk 
> CC: Roger Pau Monné 
> CC: Wei Liu 
> 
> RFC until support is actually committed to binutils mainline.

Since RFC has been dropped from the subject, is this now committed?

> ---
>  xen/arch/x86/Rules.mk |  4 
>  xen/include/asm-x86/alternative-asm.h | 14 --
>  xen/include/asm-x86/alternative.h | 13 ++---
>  3 files changed, 26 insertions(+), 5 deletions(-)
> 
> diff --git a/xen/arch/x86/Rules.mk b/xen/arch/x86/Rules.mk
> index e169d67..bf5047f 100644
> --- a/xen/arch/x86/Rules.mk
> +++ b/xen/arch/x86/Rules.mk
> @@ -28,6 +28,10 @@ $(call as-option-add,CFLAGS,CC,".equ \"x\"$$(comma)1", \
>  $(call as-option-add,CFLAGS,CC,\
>  ".if ((1 > 0) < 0); .error \"\";.endif",,-DHAVE_AS_NEGATIVE_TRUE)
>  
> +# Check to see whether the assmbler supports the .nop directive.
> +$(call as-option-add,CFLAGS,CC,\
> +".L1: .L2: .nop (.L2 - .L1)$$(comma)9",-DHAVE_AS_NOP_DIRECTIVE)
> +
>  CFLAGS += -mno-red-zone -fpic -fno-asynchronous-unwind-tables
>  
>  # Xen doesn't use SSE interally.  If the compiler supports it, also skip the
> diff --git a/xen/include/asm-x86/alternative-asm.h 
> b/xen/include/asm-x86/alternative-asm.h
> index 25f79fe..9e46bed 100644
> --- a/xen/include/asm-x86/alternative-asm.h
> +++ b/xen/include/asm-x86/alternative-asm.h
> @@ -1,6 +1,8 @@
>  #ifndef _ASM_X86_ALTERNATIVE_ASM_H_
>  #define _ASM_X86_ALTERNATIVE_ASM_H_
>  
> +#include 
> +
>  #ifdef __ASSEMBLY__
>  
>  /*
> @@ -18,6 +20,14 @@
>  .byte \pad_len
>  .endm
>  
> +.macro mknops nr_bytes
> +#ifdef HAVE_AS_NOP_DIRECTIVE
> +.nop \nr_bytes, ASM_NOP_MAX

I'm not able to find any online document about the .nop directive, and
I cannot really figure out the purpose of the second parameter.

Also, after this patch is applied it seems like the padding is not
going to be 0x90, because as will already emit optimized nops. Are
those nops more optimized than the ones added by the alternatives
framework?

I would expect the nops added at runtime would be more optimized than
the ones added at build time, because the runtime ones could take into
account the specific CPU model Xen is running on.

Thanks, Roger.

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [PATCH v2 7/7] x86/build: Use new .nop directive when available

2018-02-26 Thread Andrew Cooper
Newer versions of binutils are capable of emitting an exact number bytes worth
of optimised nops.  Use this in preference to .skip when available.

Signed-off-by: Andrew Cooper 
---
CC: Jan Beulich 
CC: Konrad Rzeszutek Wilk 
CC: Roger Pau Monné 
CC: Wei Liu 

RFC until support is actually committed to binutils mainline.
---
 xen/arch/x86/Rules.mk |  4 
 xen/include/asm-x86/alternative-asm.h | 14 --
 xen/include/asm-x86/alternative.h | 13 ++---
 3 files changed, 26 insertions(+), 5 deletions(-)

diff --git a/xen/arch/x86/Rules.mk b/xen/arch/x86/Rules.mk
index e169d67..bf5047f 100644
--- a/xen/arch/x86/Rules.mk
+++ b/xen/arch/x86/Rules.mk
@@ -28,6 +28,10 @@ $(call as-option-add,CFLAGS,CC,".equ \"x\"$$(comma)1", \
 $(call as-option-add,CFLAGS,CC,\
 ".if ((1 > 0) < 0); .error \"\";.endif",,-DHAVE_AS_NEGATIVE_TRUE)
 
+# Check to see whether the assmbler supports the .nop directive.
+$(call as-option-add,CFLAGS,CC,\
+".L1: .L2: .nop (.L2 - .L1)$$(comma)9",-DHAVE_AS_NOP_DIRECTIVE)
+
 CFLAGS += -mno-red-zone -fpic -fno-asynchronous-unwind-tables
 
 # Xen doesn't use SSE interally.  If the compiler supports it, also skip the
diff --git a/xen/include/asm-x86/alternative-asm.h 
b/xen/include/asm-x86/alternative-asm.h
index 25f79fe..9e46bed 100644
--- a/xen/include/asm-x86/alternative-asm.h
+++ b/xen/include/asm-x86/alternative-asm.h
@@ -1,6 +1,8 @@
 #ifndef _ASM_X86_ALTERNATIVE_ASM_H_
 #define _ASM_X86_ALTERNATIVE_ASM_H_
 
+#include 
+
 #ifdef __ASSEMBLY__
 
 /*
@@ -18,6 +20,14 @@
 .byte \pad_len
 .endm
 
+.macro mknops nr_bytes
+#ifdef HAVE_AS_NOP_DIRECTIVE
+.nop \nr_bytes, ASM_NOP_MAX
+#else
+.skip \nr_bytes, 0x90
+#endif
+.endm
+
 #define orig_len   (.L\@_orig_e   - .L\@_orig_s)
 #define pad_len(.L\@_orig_p   - .L\@_orig_e)
 #define total_len  (.L\@_orig_p   - .L\@_orig_s)
@@ -43,7 +53,7 @@
  */
 .L\@_diff = repl_len(1) - orig_len
 
-.skip as_true(.L\@_diff > 0) * .L\@_diff, 0x90
+ mknops (as_true(.L\@_diff > 0) * .L\@_diff)
 .L\@_orig_p:
 
 .pushsection .altinstructions, "a", @progbits
@@ -76,7 +86,7 @@
  */
 .L\@_diff = as_max(repl_len(1), repl_len(2)) - orig_len
 
- .skip as_true(.L\@_diff > 0) * .L\@_diff, 0x90
+ mknops (as_true(.L\@_diff > 0) * .L\@_diff)
 .L\@_orig_p:
 
 .pushsection .altinstructions, "a", @progbits
diff --git a/xen/include/asm-x86/alternative.h 
b/xen/include/asm-x86/alternative.h
index d53cea0..51538b6 100644
--- a/xen/include/asm-x86/alternative.h
+++ b/xen/include/asm-x86/alternative.h
@@ -2,7 +2,6 @@
 #define __X86_ALTERNATIVE_H__
 
 #include 
-#include 
 
 #ifndef __ASSEMBLY__
 #include 
@@ -27,6 +26,14 @@ extern void apply_alternatives(const struct alt_instr *start,
const struct alt_instr *end);
 extern void alternative_instructions(void);
 
+asm ( ".macro mknops nr_bytes\n\t"
+#ifdef HAVE_AS_NOP_DIRECTIVE
+  ".nop \\nr_bytes, " __stringify(ASM_NOP_MAX) "\n\t"
+#else
+  ".skip \\nr_bytes, 0x90\n\t"
+#endif
+  ".endm\n\t" );
+
 #define alt_orig_len   "(.LXEN%=_orig_e - .LXEN%=_orig_s)"
 #define alt_pad_len"(.LXEN%=_orig_p - .LXEN%=_orig_e)"
 #define alt_total_len  "(.LXEN%=_orig_p - .LXEN%=_orig_s)"
@@ -46,7 +53,7 @@ extern void alternative_instructions(void);
 #define OLDINSTR_1(oldinstr, n1) \
 ".LXEN%=_orig_s:\n\t" oldinstr "\n .LXEN%=_orig_e:\n\t"  \
 ".LXEN%=_diff = "alt_repl_len(n1)"-"alt_orig_len"\n\t"   \
-".skip "AS_TRUE"(.LXEN%=_diff > 0) * .LXEN%=_diff, 0x90\n\t" \
+"mknops ("AS_TRUE"(.LXEN%=_diff > 0) * .LXEN%=_diff)\n\t"\
 ".LXEN%=_orig_p:\n\t"
 
 #define ALT_PADDING_LEN(n1, n2) \
@@ -55,7 +62,7 @@ extern void alternative_instructions(void);
 #define OLDINSTR_2(oldinstr, n1, n2) \
 ".LXEN%=_orig_s:\n\t" oldinstr "\n .LXEN%=_orig_e:\n\t"  \
 ".LXEN%=_diff = "ALT_PADDING_LEN(n1, n2)"\n\t"   \
-".skip "AS_TRUE"(.LXEN%=_diff > 0) * .LXEN%=_diff, 0x90\n\t" \
+"mknops ("AS_TRUE"(.LXEN%=_diff > 0) * .LXEN%=_diff)\n\t"\
 ".LXEN%=_orig_p:\n\t"
 
 #define ALTINSTR_ENTRY(feature, num)\
-- 
2.1.4


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel