Re: [Xen-devel] [PATCH v2 7/7] x86/build: Use new .nop directive when available
>>> On 02.03.18 at 20:34, wrote: > On 02/03/18 07:10, Jan Beulich wrote: > On 01.03.18 at 17:58, wrote: >>> The pont of having the toolchain put out optimised nops is to avoid the >>> need for us to patch the site at all. I.e. calling optimise_nops() on a >>> set of toolchain nops defeats the purpose in the overwhelming common >>> case of running on a system which prefers P6 nops. >>> >>> The problem of working out when to optimise is that, when we come to >>> apply an individual alternative, we don't know if we've already patched >>> this site before. Even the unoptimised algorithm has a corner case >>> which explodes, if there is a stream of 0x90's on the end of a >>> replacement e.g. in a imm or disp field. >>> >>> Put simply, we cannot determine, by peeking at the patchsite, whether it >>> has been patched or not (other than keeping a full copy of the origin >>> site as a reference). As soon as we chose to optimise the nops of the >>> origin site, we cannot determine anything at all. >>> >>> Thinking out loud, we could perhaps have a section containing one byte >>> per origin site, which we use to track whether we've already optimised >>> the padding bytes, and whether the contents have been replaced. This >>> would also add an extra long into struct altentry, but its all cold data >>> after boot. >> What about alternatively simply updating the struct alt_instr >> instances to describe the code _after_ a patch that was applied? >> That'll allow to always know how much padding there is. > > There are multiple alt_instr pointing to the same origin sites when > using ALTERNATIVE_2, meaning you keep all the safety problems with the > current setup. Oh, right, it was rubbish what I said - we don't ever look a 2nd time at any particular struct alt_instr instance. We'd have to settle on multiple changes to the same patch site to always be adjacent, such that apply_alternatives() could have local state forwarding (between loop iterations) added. For the approach you describe I can't see how you would safely glue together multiple patches to the same origin site, unless we would outright forbid any use of lower level infrastructure than ALTERNATIVE_2(). But if we demand that, we could easily flag the first of the alternatives with a "continued" indicator, used to trigger state forwarding in apply_alternatives() (perhaps no more state than "avoid the NOP optimization if the controlling feature is unavailable"). No need for an extra array and pointer then. Jan ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH v2 7/7] x86/build: Use new .nop directive when available
On 02/03/18 07:10, Jan Beulich wrote: On 01.03.18 at 17:58, wrote: >> On 01/03/18 10:54, Jan Beulich wrote: >> On 01.03.18 at 11:36, wrote: On Thu, Mar 01, 2018 at 12:28:27AM -0700, Jan Beulich wrote: Andrew Cooper 02/28/18 7:20 PM >>> >> On 28/02/18 16:22, Jan Beulich wrote: >> On 26.02.18 at 12:35, wrote: --- a/xen/include/asm-x86/alternative-asm.h +++ b/xen/include/asm-x86/alternative-asm.h @@ -1,6 +1,8 @@ #ifndef _ASM_X86_ALTERNATIVE_ASM_H_ #define _ASM_X86_ALTERNATIVE_ASM_H_ +#include + #ifdef __ASSEMBLY__ /* @@ -18,6 +20,14 @@ .byte \pad_len .endm +.macro mknops nr_bytes +#ifdef HAVE_AS_NOP_DIRECTIVE +.nop \nr_bytes, ASM_NOP_MAX >>> And do you really need to specify ASM_NOP_MAX here? What's >>> wrong with letting the assembler pick what it wants as the longest >>> NOP? >> I don't want a toolchain change to cause us to go beyond 11 byte nops, >> because of the associated decode stall on almost all hardware. Using >> ASM_NOP_MAX seemed like the easiest way to keep the end result >> consistent, irrespective of toolchain support. > I don't understand - an earlier patch takes care of runtime replacing them > anyway. What stalls can then result? The runtime replacement won't happen when using the .nops directive AFAICT, because the original padding section will likely be filled with opcodes different than 0x90, and thus the runtime nop optimization won't be performed. >>> Oh, indeed. That puts under question the whole idea of using >>> .nops in favor of .skip. Andrew, I'm sorry, but with this I prefer >>> to withdraw my ack. >>> I also agree that using the default (not proving a second argument) seems like a better solution. Why would the toolstack switch to something that leads to worse performance? That would certainly be considered a bug. >>> Why? They may change it based on data available for newer / >>> older / whatever hardware. Any build-time choice is going to be >>> suboptimal somewhere, so I think we absolutely should not >>> bypass runtime replacing these NOPs, the more that now we >>> may have quite large sequences of them. >> The pont of having the toolchain put out optimised nops is to avoid the >> need for us to patch the site at all. I.e. calling optimise_nops() on a >> set of toolchain nops defeats the purpose in the overwhelming common >> case of running on a system which prefers P6 nops. >> >> The problem of working out when to optimise is that, when we come to >> apply an individual alternative, we don't know if we've already patched >> this site before. Even the unoptimised algorithm has a corner case >> which explodes, if there is a stream of 0x90's on the end of a >> replacement e.g. in a imm or disp field. >> >> Put simply, we cannot determine, by peeking at the patchsite, whether it >> has been patched or not (other than keeping a full copy of the origin >> site as a reference). As soon as we chose to optimise the nops of the >> origin site, we cannot determine anything at all. >> >> Thinking out loud, we could perhaps have a section containing one byte >> per origin site, which we use to track whether we've already optimised >> the padding bytes, and whether the contents have been replaced. This >> would also add an extra long into struct altentry, but its all cold data >> after boot. > What about alternatively simply updating the struct alt_instr > instances to describe the code _after_ a patch that was applied? > That'll allow to always know how much padding there is. There are multiple alt_instr pointing to the same origin sites when using ALTERNATIVE_2, meaning you keep all the safety problems with the current setup. ~Andrew ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH v2 7/7] x86/build: Use new .nop directive when available
>>> On 01.03.18 at 17:58, wrote: > On 01/03/18 10:54, Jan Beulich wrote: > On 01.03.18 at 11:36, wrote: >>> On Thu, Mar 01, 2018 at 12:28:27AM -0700, Jan Beulich wrote: >>> Andrew Cooper 02/28/18 7:20 PM >>> > On 28/02/18 16:22, Jan Beulich wrote: > On 26.02.18 at 12:35, wrote: >>> --- a/xen/include/asm-x86/alternative-asm.h >>> +++ b/xen/include/asm-x86/alternative-asm.h >>> @@ -1,6 +1,8 @@ >>> #ifndef _ASM_X86_ALTERNATIVE_ASM_H_ >>> #define _ASM_X86_ALTERNATIVE_ASM_H_ >>> >>> +#include >>> + >>> #ifdef __ASSEMBLY__ >>> >>> /* >>> @@ -18,6 +20,14 @@ >>> .byte \pad_len >>> .endm >>> >>> +.macro mknops nr_bytes >>> +#ifdef HAVE_AS_NOP_DIRECTIVE >>> +.nop \nr_bytes, ASM_NOP_MAX >> And do you really need to specify ASM_NOP_MAX here? What's >> wrong with letting the assembler pick what it wants as the longest >> NOP? > I don't want a toolchain change to cause us to go beyond 11 byte nops, > because of the associated decode stall on almost all hardware. Using > ASM_NOP_MAX seemed like the easiest way to keep the end result > consistent, irrespective of toolchain support. I don't understand - an earlier patch takes care of runtime replacing them anyway. What stalls can then result? >>> The runtime replacement won't happen when using the .nops directive >>> AFAICT, because the original padding section will likely be filled >>> with opcodes different than 0x90, and thus the runtime nop >>> optimization won't be performed. >> Oh, indeed. That puts under question the whole idea of using >> .nops in favor of .skip. Andrew, I'm sorry, but with this I prefer >> to withdraw my ack. >> >>> I also agree that using the default (not proving a second argument) >>> seems like a better solution. Why would the toolstack switch to >>> something that leads to worse performance? That would certainly be >>> considered a bug. >> Why? They may change it based on data available for newer / >> older / whatever hardware. Any build-time choice is going to be >> suboptimal somewhere, so I think we absolutely should not >> bypass runtime replacing these NOPs, the more that now we >> may have quite large sequences of them. > > The pont of having the toolchain put out optimised nops is to avoid the > need for us to patch the site at all. I.e. calling optimise_nops() on a > set of toolchain nops defeats the purpose in the overwhelming common > case of running on a system which prefers P6 nops. > > The problem of working out when to optimise is that, when we come to > apply an individual alternative, we don't know if we've already patched > this site before. Even the unoptimised algorithm has a corner case > which explodes, if there is a stream of 0x90's on the end of a > replacement e.g. in a imm or disp field. > > Put simply, we cannot determine, by peeking at the patchsite, whether it > has been patched or not (other than keeping a full copy of the origin > site as a reference). As soon as we chose to optimise the nops of the > origin site, we cannot determine anything at all. > > Thinking out loud, we could perhaps have a section containing one byte > per origin site, which we use to track whether we've already optimised > the padding bytes, and whether the contents have been replaced. This > would also add an extra long into struct altentry, but its all cold data > after boot. What about alternatively simply updating the struct alt_instr instances to describe the code _after_ a patch that was applied? That'll allow to always know how much padding there is. Jan ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH v2 7/7] x86/build: Use new .nop directive when available
On 01/03/18 10:54, Jan Beulich wrote: On 01.03.18 at 11:36, wrote: >> On Thu, Mar 01, 2018 at 12:28:27AM -0700, Jan Beulich wrote: >> Andrew Cooper 02/28/18 7:20 PM >>> On 28/02/18 16:22, Jan Beulich wrote: On 26.02.18 at 12:35, wrote: >> --- a/xen/include/asm-x86/alternative-asm.h >> +++ b/xen/include/asm-x86/alternative-asm.h >> @@ -1,6 +1,8 @@ >> #ifndef _ASM_X86_ALTERNATIVE_ASM_H_ >> #define _ASM_X86_ALTERNATIVE_ASM_H_ >> >> +#include >> + >> #ifdef __ASSEMBLY__ >> >> /* >> @@ -18,6 +20,14 @@ >> .byte \pad_len >> .endm >> >> +.macro mknops nr_bytes >> +#ifdef HAVE_AS_NOP_DIRECTIVE >> +.nop \nr_bytes, ASM_NOP_MAX > And do you really need to specify ASM_NOP_MAX here? What's > wrong with letting the assembler pick what it wants as the longest > NOP? I don't want a toolchain change to cause us to go beyond 11 byte nops, because of the associated decode stall on almost all hardware. Using ASM_NOP_MAX seemed like the easiest way to keep the end result consistent, irrespective of toolchain support. >>> I don't understand - an earlier patch takes care of runtime replacing them >>> anyway. What stalls can then result? >> The runtime replacement won't happen when using the .nops directive >> AFAICT, because the original padding section will likely be filled >> with opcodes different than 0x90, and thus the runtime nop >> optimization won't be performed. > Oh, indeed. That puts under question the whole idea of using > .nops in favor of .skip. Andrew, I'm sorry, but with this I prefer > to withdraw my ack. > >> I also agree that using the default (not proving a second argument) >> seems like a better solution. Why would the toolstack switch to >> something that leads to worse performance? That would certainly be >> considered a bug. > Why? They may change it based on data available for newer / > older / whatever hardware. Any build-time choice is going to be > suboptimal somewhere, so I think we absolutely should not > bypass runtime replacing these NOPs, the more that now we > may have quite large sequences of them. The pont of having the toolchain put out optimised nops is to avoid the need for us to patch the site at all. I.e. calling optimise_nops() on a set of toolchain nops defeats the purpose in the overwhelming common case of running on a system which prefers P6 nops. The problem of working out when to optimise is that, when we come to apply an individual alternative, we don't know if we've already patched this site before. Even the unoptimised algorithm has a corner case which explodes, if there is a stream of 0x90's on the end of a replacement e.g. in a imm or disp field. Put simply, we cannot determine, by peeking at the patchsite, whether it has been patched or not (other than keeping a full copy of the origin site as a reference). As soon as we chose to optimise the nops of the origin site, we cannot determine anything at all. Thinking out loud, we could perhaps have a section containing one byte per origin site, which we use to track whether we've already optimised the padding bytes, and whether the contents have been replaced. This would also add an extra long into struct altentry, but its all cold data after boot. ~Andrew ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH v2 7/7] x86/build: Use new .nop directive when available
>>> On 01.03.18 at 11:36, wrote: > On Thu, Mar 01, 2018 at 12:28:27AM -0700, Jan Beulich wrote: >> >>> Andrew Cooper 02/28/18 7:20 PM >>> >> >On 28/02/18 16:22, Jan Beulich wrote: >> > On 26.02.18 at 12:35, wrote: >> >>> --- a/xen/include/asm-x86/alternative-asm.h >> >>> +++ b/xen/include/asm-x86/alternative-asm.h >> >>> @@ -1,6 +1,8 @@ >> >>> #ifndef _ASM_X86_ALTERNATIVE_ASM_H_ >> >>> #define _ASM_X86_ALTERNATIVE_ASM_H_ >> >>> >> >>> +#include >> >>> + >> >>> #ifdef __ASSEMBLY__ >> >>> >> >>> /* >> >>> @@ -18,6 +20,14 @@ >> >>> .byte \pad_len >> >>> .endm >> >>> >> >>> +.macro mknops nr_bytes >> >>> +#ifdef HAVE_AS_NOP_DIRECTIVE >> >>> +.nop \nr_bytes, ASM_NOP_MAX >> >> And do you really need to specify ASM_NOP_MAX here? What's >> >> wrong with letting the assembler pick what it wants as the longest >> >> NOP? >> > >> >I don't want a toolchain change to cause us to go beyond 11 byte nops, >> >because of the associated decode stall on almost all hardware. Using >> >ASM_NOP_MAX seemed like the easiest way to keep the end result >> >consistent, irrespective of toolchain support. >> >> I don't understand - an earlier patch takes care of runtime replacing them >> anyway. What stalls can then result? > > The runtime replacement won't happen when using the .nops directive > AFAICT, because the original padding section will likely be filled > with opcodes different than 0x90, and thus the runtime nop > optimization won't be performed. Oh, indeed. That puts under question the whole idea of using .nops in favor of .skip. Andrew, I'm sorry, but with this I prefer to withdraw my ack. > I also agree that using the default (not proving a second argument) > seems like a better solution. Why would the toolstack switch to > something that leads to worse performance? That would certainly be > considered a bug. Why? They may change it based on data available for newer / older / whatever hardware. Any build-time choice is going to be suboptimal somewhere, so I think we absolutely should not bypass runtime replacing these NOPs, the more that now we may have quite large sequences of them. Jan ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH v2 7/7] x86/build: Use new .nop directive when available
On Thu, Mar 01, 2018 at 12:28:27AM -0700, Jan Beulich wrote: > >>> Andrew Cooper 02/28/18 7:20 PM >>> > >On 28/02/18 16:22, Jan Beulich wrote: > > On 26.02.18 at 12:35, wrote: > >>> --- a/xen/include/asm-x86/alternative-asm.h > >>> +++ b/xen/include/asm-x86/alternative-asm.h > >>> @@ -1,6 +1,8 @@ > >>> #ifndef _ASM_X86_ALTERNATIVE_ASM_H_ > >>> #define _ASM_X86_ALTERNATIVE_ASM_H_ > >>> > >>> +#include > >>> + > >>> #ifdef __ASSEMBLY__ > >>> > >>> /* > >>> @@ -18,6 +20,14 @@ > >>> .byte \pad_len > >>> .endm > >>> > >>> +.macro mknops nr_bytes > >>> +#ifdef HAVE_AS_NOP_DIRECTIVE > >>> +.nop \nr_bytes, ASM_NOP_MAX > >> And do you really need to specify ASM_NOP_MAX here? What's > >> wrong with letting the assembler pick what it wants as the longest > >> NOP? > > > >I don't want a toolchain change to cause us to go beyond 11 byte nops, > >because of the associated decode stall on almost all hardware. Using > >ASM_NOP_MAX seemed like the easiest way to keep the end result > >consistent, irrespective of toolchain support. > > I don't understand - an earlier patch takes care of runtime replacing them > anyway. What stalls can then result? The runtime replacement won't happen when using the .nops directive AFAICT, because the original padding section will likely be filled with opcodes different than 0x90, and thus the runtime nop optimization won't be performed. I also agree that using the default (not proving a second argument) seems like a better solution. Why would the toolstack switch to something that leads to worse performance? That would certainly be considered a bug. Roger. ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH v2 7/7] x86/build: Use new .nop directive when available
>>> Andrew Cooper 02/28/18 7:20 PM >>> >On 28/02/18 16:22, Jan Beulich wrote: > On 26.02.18 at 12:35, wrote: >>> --- a/xen/include/asm-x86/alternative-asm.h >>> +++ b/xen/include/asm-x86/alternative-asm.h >>> @@ -1,6 +1,8 @@ >>> #ifndef _ASM_X86_ALTERNATIVE_ASM_H_ >>> #define _ASM_X86_ALTERNATIVE_ASM_H_ >>> >>> +#include >>> + >>> #ifdef __ASSEMBLY__ >>> >>> /* >>> @@ -18,6 +20,14 @@ >>> .byte \pad_len >>> .endm >>> >>> +.macro mknops nr_bytes >>> +#ifdef HAVE_AS_NOP_DIRECTIVE >>> +.nop \nr_bytes, ASM_NOP_MAX >> And do you really need to specify ASM_NOP_MAX here? What's >> wrong with letting the assembler pick what it wants as the longest >> NOP? > >I don't want a toolchain change to cause us to go beyond 11 byte nops, >because of the associated decode stall on almost all hardware. Using >ASM_NOP_MAX seemed like the easiest way to keep the end result >consistent, irrespective of toolchain support. I don't understand - an earlier patch takes care of runtime replacing them anyway. What stalls can then result? Jan ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH v2 7/7] x86/build: Use new .nop directive when available
On 28/02/18 16:22, Jan Beulich wrote: On 26.02.18 at 12:35, wrote: >> Newer versions of binutils are capable of emitting an exact number bytes >> worth >> of optimised nops. Use this in preference to .skip when available. >> >> Signed-off-by: Andrew Cooper > In principle > Reviewed-by: Jan Beulich > However, ... > >> RFC until support is actually committed to binutils mainline. > ... upstream looks to have switched to .nops now, so the prereq > to the R-b is that you consistently switch over. Yes. I need to pull, rebuild and retest. Even if this patch gets committed, it is still semi-RFC and up for changes until the next binutils release. That said, given the explicit probe, it is at least safe to have a stale ABI in use, and we can change it later if needs be. > >> --- a/xen/arch/x86/Rules.mk >> +++ b/xen/arch/x86/Rules.mk >> @@ -28,6 +28,10 @@ $(call as-option-add,CFLAGS,CC,".equ \"x\"$$(comma)1", \ >> $(call as-option-add,CFLAGS,CC,\ >> ".if ((1 > 0) < 0); .error \"\";.endif",,-DHAVE_AS_NEGATIVE_TRUE) >> >> +# Check to see whether the assmbler supports the .nop directive. >> +$(call as-option-add,CFLAGS,CC,\ >> +".L1: .L2: .nop (.L2 - .L1)$$(comma)9",-DHAVE_AS_NOP_DIRECTIVE) > Do you really need the (arbitrary) second argument here? I want the check to match the form we actually use. > >> --- a/xen/include/asm-x86/alternative-asm.h >> +++ b/xen/include/asm-x86/alternative-asm.h >> @@ -1,6 +1,8 @@ >> #ifndef _ASM_X86_ALTERNATIVE_ASM_H_ >> #define _ASM_X86_ALTERNATIVE_ASM_H_ >> >> +#include >> + >> #ifdef __ASSEMBLY__ >> >> /* >> @@ -18,6 +20,14 @@ >> .byte \pad_len >> .endm >> >> +.macro mknops nr_bytes >> +#ifdef HAVE_AS_NOP_DIRECTIVE >> +.nop \nr_bytes, ASM_NOP_MAX > And do you really need to specify ASM_NOP_MAX here? What's > wrong with letting the assembler pick what it wants as the longest > NOP? I don't want a toolchain change to cause us to go beyond 11 byte nops, because of the associated decode stall on almost all hardware. Using ASM_NOP_MAX seemed like the easiest way to keep the end result consistent, irrespective of toolchain support. ~Andrew ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH v2 7/7] x86/build: Use new .nop directive when available
>>> On 26.02.18 at 12:35, wrote: > Newer versions of binutils are capable of emitting an exact number bytes worth > of optimised nops. Use this in preference to .skip when available. > > Signed-off-by: Andrew Cooper In principle Reviewed-by: Jan Beulich However, ... > RFC until support is actually committed to binutils mainline. ... upstream looks to have switched to .nops now, so the prereq to the R-b is that you consistently switch over. > --- a/xen/arch/x86/Rules.mk > +++ b/xen/arch/x86/Rules.mk > @@ -28,6 +28,10 @@ $(call as-option-add,CFLAGS,CC,".equ \"x\"$$(comma)1", \ > $(call as-option-add,CFLAGS,CC,\ > ".if ((1 > 0) < 0); .error \"\";.endif",,-DHAVE_AS_NEGATIVE_TRUE) > > +# Check to see whether the assmbler supports the .nop directive. > +$(call as-option-add,CFLAGS,CC,\ > +".L1: .L2: .nop (.L2 - .L1)$$(comma)9",-DHAVE_AS_NOP_DIRECTIVE) Do you really need the (arbitrary) second argument here? > --- a/xen/include/asm-x86/alternative-asm.h > +++ b/xen/include/asm-x86/alternative-asm.h > @@ -1,6 +1,8 @@ > #ifndef _ASM_X86_ALTERNATIVE_ASM_H_ > #define _ASM_X86_ALTERNATIVE_ASM_H_ > > +#include > + > #ifdef __ASSEMBLY__ > > /* > @@ -18,6 +20,14 @@ > .byte \pad_len > .endm > > +.macro mknops nr_bytes > +#ifdef HAVE_AS_NOP_DIRECTIVE > +.nop \nr_bytes, ASM_NOP_MAX And do you really need to specify ASM_NOP_MAX here? What's wrong with letting the assembler pick what it wants as the longest NOP? Jan ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH v2 7/7] x86/build: Use new .nop directive when available
On Mon, Feb 26, 2018 at 01:08:05PM +, Andrew Cooper wrote: > On 26/02/18 12:31, Roger Pau Monné wrote: > > On Mon, Feb 26, 2018 at 11:35:04AM +, Andrew Cooper wrote: > >> Newer versions of binutils are capable of emitting an exact number bytes > >> worth > >> of optimised nops. Use this in preference to .skip when available. > >> > >> Signed-off-by: Andrew Cooper > >> --- > >> CC: Jan Beulich > >> CC: Konrad Rzeszutek Wilk > >> CC: Roger Pau Monné > >> CC: Wei Liu > >> > >> RFC until support is actually committed to binutils mainline. > > Since RFC has been dropped from the subject, is this now committed? > > https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;a=commitdiff;h=62a02d25b6e5d9f92c205260daa11355d0c62532 Thanks for the reference. > > > >> --- > >> xen/arch/x86/Rules.mk | 4 > >> xen/include/asm-x86/alternative-asm.h | 14 -- > >> xen/include/asm-x86/alternative.h | 13 ++--- > >> 3 files changed, 26 insertions(+), 5 deletions(-) > >> > >> diff --git a/xen/arch/x86/Rules.mk b/xen/arch/x86/Rules.mk > >> index e169d67..bf5047f 100644 > >> --- a/xen/arch/x86/Rules.mk > >> +++ b/xen/arch/x86/Rules.mk > >> @@ -28,6 +28,10 @@ $(call as-option-add,CFLAGS,CC,".equ \"x\"$$(comma)1", \ > >> $(call as-option-add,CFLAGS,CC,\ > >> ".if ((1 > 0) < 0); .error \"\";.endif",,-DHAVE_AS_NEGATIVE_TRUE) > >> > >> +# Check to see whether the assmbler supports the .nop directive. > >> +$(call as-option-add,CFLAGS,CC,\ > >> +".L1: .L2: .nop (.L2 - .L1)$$(comma)9",-DHAVE_AS_NOP_DIRECTIVE) > >> + > >> CFLAGS += -mno-red-zone -fpic -fno-asynchronous-unwind-tables > >> > >> # Xen doesn't use SSE interally. If the compiler supports it, also skip > >> the > >> diff --git a/xen/include/asm-x86/alternative-asm.h > >> b/xen/include/asm-x86/alternative-asm.h > >> index 25f79fe..9e46bed 100644 > >> --- a/xen/include/asm-x86/alternative-asm.h > >> +++ b/xen/include/asm-x86/alternative-asm.h > >> @@ -1,6 +1,8 @@ > >> #ifndef _ASM_X86_ALTERNATIVE_ASM_H_ > >> #define _ASM_X86_ALTERNATIVE_ASM_H_ > >> > >> +#include > >> + > >> #ifdef __ASSEMBLY__ > >> > >> /* > >> @@ -18,6 +20,14 @@ > >> .byte \pad_len > >> .endm > >> > >> +.macro mknops nr_bytes > >> +#ifdef HAVE_AS_NOP_DIRECTIVE > >> +.nop \nr_bytes, ASM_NOP_MAX > > I'm not able to find any online document about the .nop directive, and > > I cannot really figure out the purpose of the second parameter. > > Its a bit woolly, named "control". In practice, it is the maximum > length of an individual nop. Beyond 11 bytes (iirc), most pipelines > take a decode stall. Oh, I've looked at the commit above, and since control is an optional parameter, why not leave the assembler chose the default value? AFAICT in our case this should be 11 because it's 64bit code. > > > > Also, after this patch is applied it seems like the padding is not > > going to be 0x90, because as will already emit optimized nops. Are > > those nops more optimized than the ones added by the alternatives > > framework? > > They are the same nops (by and large), although arranged differently. > For example, GAS fills backwards rather than forwards, which is as > recommended in the Intel ORM. So that 'larger' nop instructions are going to be placed at the end of the region instead of the beginning of it? > > I would expect the nops added at runtime would be more optimized than > > the ones added at build time, because the runtime ones could take into > > account the specific CPU model Xen is running on. > > There are only a very few 64-bit capable CPUs which prefer K8 nops over > P6 nops, and they are all old. The build-time nops are correct for the > overwhelming majority of hardware. OK, I have no idea about this, but if you think filling with .nop is not going to introduce a performance penalty over the run-time filling then that's fine for me. Roger. ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH v2 7/7] x86/build: Use new .nop directive when available
On 26/02/18 12:31, Roger Pau Monné wrote: > On Mon, Feb 26, 2018 at 11:35:04AM +, Andrew Cooper wrote: >> Newer versions of binutils are capable of emitting an exact number bytes >> worth >> of optimised nops. Use this in preference to .skip when available. >> >> Signed-off-by: Andrew Cooper >> --- >> CC: Jan Beulich >> CC: Konrad Rzeszutek Wilk >> CC: Roger Pau Monné >> CC: Wei Liu >> >> RFC until support is actually committed to binutils mainline. > Since RFC has been dropped from the subject, is this now committed? https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;a=commitdiff;h=62a02d25b6e5d9f92c205260daa11355d0c62532 > >> --- >> xen/arch/x86/Rules.mk | 4 >> xen/include/asm-x86/alternative-asm.h | 14 -- >> xen/include/asm-x86/alternative.h | 13 ++--- >> 3 files changed, 26 insertions(+), 5 deletions(-) >> >> diff --git a/xen/arch/x86/Rules.mk b/xen/arch/x86/Rules.mk >> index e169d67..bf5047f 100644 >> --- a/xen/arch/x86/Rules.mk >> +++ b/xen/arch/x86/Rules.mk >> @@ -28,6 +28,10 @@ $(call as-option-add,CFLAGS,CC,".equ \"x\"$$(comma)1", \ >> $(call as-option-add,CFLAGS,CC,\ >> ".if ((1 > 0) < 0); .error \"\";.endif",,-DHAVE_AS_NEGATIVE_TRUE) >> >> +# Check to see whether the assmbler supports the .nop directive. >> +$(call as-option-add,CFLAGS,CC,\ >> +".L1: .L2: .nop (.L2 - .L1)$$(comma)9",-DHAVE_AS_NOP_DIRECTIVE) >> + >> CFLAGS += -mno-red-zone -fpic -fno-asynchronous-unwind-tables >> >> # Xen doesn't use SSE interally. If the compiler supports it, also skip the >> diff --git a/xen/include/asm-x86/alternative-asm.h >> b/xen/include/asm-x86/alternative-asm.h >> index 25f79fe..9e46bed 100644 >> --- a/xen/include/asm-x86/alternative-asm.h >> +++ b/xen/include/asm-x86/alternative-asm.h >> @@ -1,6 +1,8 @@ >> #ifndef _ASM_X86_ALTERNATIVE_ASM_H_ >> #define _ASM_X86_ALTERNATIVE_ASM_H_ >> >> +#include >> + >> #ifdef __ASSEMBLY__ >> >> /* >> @@ -18,6 +20,14 @@ >> .byte \pad_len >> .endm >> >> +.macro mknops nr_bytes >> +#ifdef HAVE_AS_NOP_DIRECTIVE >> +.nop \nr_bytes, ASM_NOP_MAX > I'm not able to find any online document about the .nop directive, and > I cannot really figure out the purpose of the second parameter. Its a bit woolly, named "control". In practice, it is the maximum length of an individual nop. Beyond 11 bytes (iirc), most pipelines take a decode stall. > > Also, after this patch is applied it seems like the padding is not > going to be 0x90, because as will already emit optimized nops. Are > those nops more optimized than the ones added by the alternatives > framework? They are the same nops (by and large), although arranged differently. For example, GAS fills backwards rather than forwards, which is as recommended in the Intel ORM. > I would expect the nops added at runtime would be more optimized than > the ones added at build time, because the runtime ones could take into > account the specific CPU model Xen is running on. There are only a very few 64-bit capable CPUs which prefer K8 nops over P6 nops, and they are all old. The build-time nops are correct for the overwhelming majority of hardware. ~Andrew ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH v2 7/7] x86/build: Use new .nop directive when available
On Mon, Feb 26, 2018 at 11:35:04AM +, Andrew Cooper wrote: > Newer versions of binutils are capable of emitting an exact number bytes worth > of optimised nops. Use this in preference to .skip when available. > > Signed-off-by: Andrew Cooper > --- > CC: Jan Beulich > CC: Konrad Rzeszutek Wilk > CC: Roger Pau Monné > CC: Wei Liu > > RFC until support is actually committed to binutils mainline. Since RFC has been dropped from the subject, is this now committed? > --- > xen/arch/x86/Rules.mk | 4 > xen/include/asm-x86/alternative-asm.h | 14 -- > xen/include/asm-x86/alternative.h | 13 ++--- > 3 files changed, 26 insertions(+), 5 deletions(-) > > diff --git a/xen/arch/x86/Rules.mk b/xen/arch/x86/Rules.mk > index e169d67..bf5047f 100644 > --- a/xen/arch/x86/Rules.mk > +++ b/xen/arch/x86/Rules.mk > @@ -28,6 +28,10 @@ $(call as-option-add,CFLAGS,CC,".equ \"x\"$$(comma)1", \ > $(call as-option-add,CFLAGS,CC,\ > ".if ((1 > 0) < 0); .error \"\";.endif",,-DHAVE_AS_NEGATIVE_TRUE) > > +# Check to see whether the assmbler supports the .nop directive. > +$(call as-option-add,CFLAGS,CC,\ > +".L1: .L2: .nop (.L2 - .L1)$$(comma)9",-DHAVE_AS_NOP_DIRECTIVE) > + > CFLAGS += -mno-red-zone -fpic -fno-asynchronous-unwind-tables > > # Xen doesn't use SSE interally. If the compiler supports it, also skip the > diff --git a/xen/include/asm-x86/alternative-asm.h > b/xen/include/asm-x86/alternative-asm.h > index 25f79fe..9e46bed 100644 > --- a/xen/include/asm-x86/alternative-asm.h > +++ b/xen/include/asm-x86/alternative-asm.h > @@ -1,6 +1,8 @@ > #ifndef _ASM_X86_ALTERNATIVE_ASM_H_ > #define _ASM_X86_ALTERNATIVE_ASM_H_ > > +#include > + > #ifdef __ASSEMBLY__ > > /* > @@ -18,6 +20,14 @@ > .byte \pad_len > .endm > > +.macro mknops nr_bytes > +#ifdef HAVE_AS_NOP_DIRECTIVE > +.nop \nr_bytes, ASM_NOP_MAX I'm not able to find any online document about the .nop directive, and I cannot really figure out the purpose of the second parameter. Also, after this patch is applied it seems like the padding is not going to be 0x90, because as will already emit optimized nops. Are those nops more optimized than the ones added by the alternatives framework? I would expect the nops added at runtime would be more optimized than the ones added at build time, because the runtime ones could take into account the specific CPU model Xen is running on. Thanks, Roger. ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel