Re: [PATCH v0 1/5] x86_64: march=native support
On 12/8/17, H. Peter Anvin wrote:
> One more thing: you HAVE to make
> arch/x86/include/asm/required-features.h aware of any features that the
> kernel unconditionally depends on.

Yes, this is the foolproof part I have to think through.

> Again, using the gcc cpp macros that reflect what bits gcc itself
> broadcasts. However, this is perhaps where CONFIG flags become
> important, since required-features.h has to be able to be compiled in
> the bootcode environment, which is different from the normal kernel
> compiler environment.
>
> We could, however, automagically generate a reflection of these as a
> header file:
>
> echo '#ifndef __LINUX_CC_DEFINES__'
> echo '#define __LINUX_CC_DEFINES__'
> $(CC) $(c_flags) -x c -E -Wp,-dM /dev/null | sort | \
> sed -nr -e 's/^#define __([^[:space:]]+)__[[:space:]]+(.*)/#define __KERNEL_CC_\1 \2/p'
> echo '#endif'

A lot of them aren't interesting and duplicate each other.

Another thing: clang. It detects the machine I'm typing this on as
__corei7__, while gcc detects it as __core_avx2__.
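Both points above, the foolproof check and the gcc/clang naming mismatch, suggest keying on per-extension macros such as __SSE4_2__, __AVX2__ and __BMI2__, which both gcc and clang predefine when -march enables the extension, rather than on CPU-name macros. A minimal user-space sketch of such a check, assuming x86-64 and a recent gcc or clang <cpuid.h> (for __get_cpuid_count): it compares the extensions the compiler was allowed to use with what CPUID reports on the running CPU. The FEAT_* bits and function names are hypothetical and are not taken from the patch or from required-features.h.

```c
/*
 * Sketch of a "foolproof" check for a -march=native build.
 * x86-64, gcc or clang. FEAT_* and the function names are hypothetical.
 */
#include <cpuid.h>
#include <stdint.h>

#define FEAT_SSE2	(1u << 0)
#define FEAT_SSE4_2	(1u << 1)
#define FEAT_POPCNT	(1u << 2)
#define FEAT_AVX	(1u << 3)
#define FEAT_AVX2	(1u << 4)
#define FEAT_BMI2	(1u << 5)

/*
 * Extensions the compiler was allowed to use, fixed at compile time from
 * per-extension macros. gcc and clang agree on these even when their
 * CPU-name macros (__corei7__ vs __core_avx2__) do not.
 */
uint32_t compiler_assumed_features(void)
{
	uint32_t m = 0;
#ifdef __SSE2__
	m |= FEAT_SSE2;
#endif
#ifdef __SSE4_2__
	m |= FEAT_SSE4_2;
#endif
#ifdef __POPCNT__
	m |= FEAT_POPCNT;
#endif
#ifdef __AVX__
	m |= FEAT_AVX;
#endif
#ifdef __AVX2__
	m |= FEAT_AVX2;
#endif
#ifdef __BMI2__
	m |= FEAT_BMI2;
#endif
	return m;
}

/* Extensions the running CPU advertises (raw CPUID feature bits only). */
uint32_t cpu_reported_features(void)
{
	unsigned int eax, ebx, ecx, edx;
	uint32_t m = 0;

	if (__get_cpuid(1, &eax, &ebx, &ecx, &edx)) {
		if (edx & bit_SSE2)
			m |= FEAT_SSE2;
		if (ecx & bit_SSE4_2)
			m |= FEAT_SSE4_2;
		if (ecx & bit_POPCNT)
			m |= FEAT_POPCNT;
		if (ecx & bit_AVX)
			m |= FEAT_AVX;
	}
	if (__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx)) {
		if (ebx & bit_AVX2)
			m |= FEAT_AVX2;
		if (ebx & bit_BMI2)
			m |= FEAT_BMI2;
	}
	return m;
}

/* Extensions the code may use but this CPU lacks; 0 means safe to run. */
uint32_t missing_features(void)
{
	return compiler_assumed_features() & ~cpu_reported_features();
}
```

A non-zero missing_features() would mean the image was built for a different CPU. For AVX, a complete check would also look at OSXSAVE/XGETBV; this sketch only reads the CPUID feature bits.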
Re: [PATCH v0 1/5] x86_64: march=native support
One more thing: you HAVE to make
arch/x86/include/asm/required-features.h aware of any features that the
kernel unconditionally depends on.

Again, using the gcc cpp macros that reflect what bits gcc itself
broadcasts. However, this is perhaps where CONFIG flags become
important, since required-features.h has to be able to be compiled in
the bootcode environment, which is different from the normal kernel
compiler environment.

We could, however, automagically generate a reflection of these as a
header file:

echo '#ifndef __LINUX_CC_DEFINES__'
echo '#define __LINUX_CC_DEFINES__'
$(CC) $(c_flags) -x c -E -Wp,-dM /dev/null | sort | \
sed -nr -e 's/^#define __([^[:space:]]+)__[[:space:]]+(.*)/#define __KERNEL_CC_\1 \2/p'
echo '#endif'

	-hpa
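The header reflection described above can be sketched in C to make the transformation explicit: each "#define __NAME__ VALUE" line printed by the -dM preprocessor run becomes "#define __KERNEL_CC_NAME VALUE", everything else is dropped, and the result is wrapped in the __LINUX_CC_DEFINES__ guard. This is a hypothetical stand-in for the sed stage, not part of any patch; the function names are made up, and keeping each macro's value (so that e.g. __GNUC__ keeps its number) is an assumption of this sketch.

```c
#include <stdio.h>
#include <string.h>

/*
 * If @line has the form "#define __NAME__ VALUE", write
 * "#define __KERNEL_CC_NAME VALUE" into @out and return 1; otherwise 0.
 */
int reflect_define(const char *line, char *out, size_t outsz)
{
	static const char prefix[] = "#define __";
	const char *name, *end, *value;
	size_t len;

	if (strncmp(line, prefix, sizeof(prefix) - 1) != 0)
		return 0;
	name = line + sizeof(prefix) - 1;
	len = strcspn(name, " \t\n");		/* length of "NAME__" */
	end = name + len;
	/* Need at least one NAME character, a trailing "__", then blanks. */
	if (len < 3 || strncmp(end - 2, "__", 2) != 0)
		return 0;
	if (*end != ' ' && *end != '\t')
		return 0;				/* function-like or valueless */
	value = end + strspn(end, " \t");
	snprintf(out, outsz, "#define __KERNEL_CC_%.*s %.*s",
		 (int)(len - 2), name, (int)strcspn(value, "\n"), value);
	return 1;
}

/* Turn "-dM" output read from @in into a guarded header on @out. */
int reflect_header(FILE *in, FILE *out)
{
	char line[1024], buf[1100];
	int kept = 0;

	fputs("#ifndef __LINUX_CC_DEFINES__\n", out);
	fputs("#define __LINUX_CC_DEFINES__\n", out);
	while (fgets(line, sizeof(line), in)) {
		if (reflect_define(line, buf, sizeof(buf))) {
			fprintf(out, "%s\n", buf);
			kept++;
		}
	}
	fputs("#endif\n", out);
	return kept;
}
```

Fed the output of `$(CC) $(c_flags) -x c -E -Wp,-dM /dev/null | sort` on stdin, reflect_header(stdin, stdout) would print the guarded header and return how many macros it kept.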
[PATCH v0 1/5] x86_64: march=native support
Being a Gentoo user, part of me died every time I compiled the kernel with
plain -O2 while userspace had been running with "-march=native -O2" for years.

This patch implements building the kernel with "-march=native", at last.
So far, the resulting kernel is good enough to boot in a VM.

Benchmarks:
No serious benchmarking has been done yet. :-(
Random microbenchmarking indicates that
a) SHA-1 using SHLX et al. can be ~10% faster than the regular version,
   as there are no carry flag dependencies, and
b) clear_page() using REP STOSB can be ~15% faster than the REP STOSQ
   version on CPUs that advertise fast REP STOSB.
This actually matters because clear_page()/copy_page() are regularly seen
at the top of kernel profiles.

Code size:
SHLX et al. bloat the kernel quite a lot, as these new instructions live
in the extended opcode space. However, this is compensated for by telling
gcc to use REP STOSB/MOVSB; gcc loves to unroll memset/memcpy to ungodly
amounts. These two effects roughly offset each other, since shifts and
memset/memcpy are everywhere.
Regardless, code size is not the objective of this patch; performance is.

Support status:
x86_64 only (haven't run 386 for a long time)
Intel only (never owned an AMD box)

TODO:
foolproof protection
SSE2/AVX/AVX2/AVX-512 disabling (-mno-...)
.config injection
BMI2 for %08x/%016lx
faster clear_user()
RAID functions (ungodly unrolling, requires lots of courage)
BPF JIT
and of course more instructions which the kernel is forced to ignore
because of generic kernels

If you want to try it out:
* make sure this kernel is only used on the machine it is compiled on
* grab a gcc with "-march=native" support (modern ones have it)
* select CONFIG_MARCH_NATIVE in the CPU choice menu
* add "unexpected options" to scripts/march-native.sh until the checks pass
* verify the CONFIG_MARCH_NATIVE options in .config, include/config/auto.conf
  and include/generated/autoconf.h
* cross your fingers, recompile and reboot

Signed-off-by: Alexey Dobriyan
---
 Makefile                   |  4 ++
 arch/x86/Kconfig.cpu       |  8
 scripts/kconfig/.gitignore |  1 +
 scripts/kconfig/Makefile   |  9 -
 scripts/kconfig/cpuid.c    | 76
 scripts/march-native.sh    | 96 ++
 6 files changed, 193 insertions(+), 1 deletion(-)
 create mode 100644 scripts/kconfig/cpuid.c
 create mode 100755 scripts/march-native.sh

diff --git a/Makefile b/Makefile
index 86bb80540cbd..c1cc730b81a8 100644
--- a/Makefile
+++ b/Makefile
@@ -587,6 +587,10 @@ ifeq ($(dot-config),1)
 # Read in config
 -include include/config/auto.conf
 
+ifdef CONFIG_MARCH_NATIVE
+KBUILD_CFLAGS += -march=native
+endif
+
 ifeq ($(KBUILD_EXTMOD),)
 # Read in dependencies to all Kconfig* files, make sure to run
 # oldconfig if changes are detected.
diff --git a/arch/x86/Kconfig.cpu b/arch/x86/Kconfig.cpu
index 4493e8c5d1ea..2e4750b6b891 100644
--- a/arch/x86/Kconfig.cpu
+++ b/arch/x86/Kconfig.cpu
@@ -287,6 +287,12 @@ config GENERIC_CPU
 	  Generic x86-64 CPU.
 	  Run equally well on all x86-64 CPUs.
 
+config MARCH_NATIVE
+	bool "-march=native"
+	depends on X86_64
+	---help---
+	  -march=native support.
+
 endchoice
 
 config X86_GENERIC
@@ -307,6 +313,7 @@ config X86_INTERNODE_CACHE_SHIFT
 	int
 	default "12" if X86_VSMP
 	default X86_L1_CACHE_SHIFT
+	depends on !MARCH_NATIVE
 
 config X86_L1_CACHE_SHIFT
 	int
@@ -314,6 +321,7 @@ config X86_L1_CACHE_SHIFT
 	default "6" if MK7 || MK8 || MPENTIUMM || MCORE2 || MATOM || MVIAC7 || X86_GENERIC || GENERIC_CPU
 	default "4" if MELAN || M486 || MGEODEGX1
 	default "5" if MWINCHIP3D || MWINCHIPC6 || MCRUSOE || MEFFICEON || MCYRIXIII || MK6 || MPENTIUMIII || MPENTIUMII || M686 || M586MMX || M586TSC || M586 || MVIAC3_2 || MGEODE_LX
+	depends on !MARCH_NATIVE
 
 config X86_PPRO_FENCE
 	bool "PentiumPro memory ordering errata workaround"
diff --git a/scripts/kconfig/.gitignore b/scripts/kconfig/.gitignore
index 51f1c877b543..73ebca4b1888 100644
--- a/scripts/kconfig/.gitignore
+++ b/scripts/kconfig/.gitignore
@@ -14,6 +14,7 @@ gconf.glade.h
 # configuration programs
 #
 conf
+cpuid
 mconf
 nconf
 qconf
diff --git a/scripts/kconfig/Makefile b/scripts/kconfig/Makefile
index 297c1bf35140..7b43b66d4efa 100644
--- a/scripts/kconfig/Makefile
+++ b/scripts/kconfig/Makefile
@@ -21,24 +21,30 @@ unexport CONFIG_
 
 xconfig: $(obj)/qconf
 	$< $(silent) $(Kconfig)
+	$(Q)$(srctree)/scripts/march-native.sh $(CC) $(obj)/cpuid
 
 gconfig: $(obj)/gconf
 	$< $(silent) $(Kconfig)
+	$(Q)$(srctree)/scripts/march-native.sh $(CC) $(obj)/cpuid
 
 menuconfig: $(obj)/mconf
 	$< $(silent) $(Kconfig)
+	$(Q)$(srctree)/scripts/march-native.sh $(CC) $(obj)/cpuid
 
 config: $(obj)/conf
 	$< $(silent) --oldaskconfig $(Kconfig)
+	$(Q)$(srctree)/scripts/march-native.sh $(CC) $(obj)/cpuid
 
 nconfig:
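The clear_page() claim in the description contrasts two string-store strategies. A user-space sketch of both, assuming x86-64 and GNU C inline asm: REP STOSQ stores 512 quadwords, REP STOSB stores 4096 bytes, and "fast REP STOSB is advertised" is read here as the ERMS bit, CPUID leaf 7 (subleaf 0) EBX bit 9. The function names are hypothetical; this is not the kernel's clear_page() code and implies no timing numbers.

```c
/*
 * User-space sketch of two ways to zero a 4 KiB page on x86-64 (GNU C).
 * Names are hypothetical; this is not the kernel's clear_page().
 */
#include <cpuid.h>
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096

/* CPUID.(EAX=7,ECX=0):EBX bit 9 advertises ERMS (fast REP MOVSB/STOSB). */
int cpu_has_erms(void)
{
	unsigned int eax, ebx, ecx, edx;

	if (!__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx))
		return 0;
	return (ebx >> 9) & 1;
}

/* REP STOSQ: RCX = 512 quadword stores of RAX (= 0) starting at RDI. */
void clear_page_stosq(void *page)
{
	void *dst = page;
	size_t count = SKETCH_PAGE_SIZE / 8;

	__asm__ __volatile__("rep stosq"
			     : "+D" (dst), "+c" (count)
			     : "a" (0UL)
			     : "memory");
}

/* REP STOSB: RCX = 4096 byte stores of AL (= 0); fast on ERMS CPUs. */
void clear_page_stosb(void *page)
{
	void *dst = page;
	size_t count = SKETCH_PAGE_SIZE;

	__asm__ __volatile__("rep stosb"
			     : "+D" (dst), "+c" (count)
			     : "a" (0)
			     : "memory");
}

/*
 * Runtime choice for the demo; a -march=native build could instead
 * hard-wire whichever variant suits the build machine.
 */
void clear_page_sketch(void *page)
{
	if (cpu_has_erms())
		clear_page_stosb(page);
	else
		clear_page_stosq(page);
}
```

Timing the two variants over many pages on a given machine is how one would check the ~15% figure locally; results will differ between microarchitectures.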
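The Kconfig hunks make X86_L1_CACHE_SHIFT and X86_INTERNODE_CACHE_SHIFT depend on !MARCH_NATIVE, so a native build has to obtain the cache-line size some other way; the hunks shown here don't include that part. As an illustration only, the value could be read from CPUID leaf 1, where EBX bits 15:8 give the CLFLUSH line size in 8-byte units (on current x86 parts this usually equals the L1 line size). The function name is hypothetical, and falling back to 6, the value the Kconfig defaults to for GENERIC_CPU, is an assumption of this sketch.

```c
#include <cpuid.h>

/*
 * Hypothetical: log2 of the cache-line size reported by CPUID leaf 1.
 * EBX[15:8] is the CLFLUSH line size in 8-byte units; it is only
 * meaningful when EDX bit 19 (CLFSH) is set.
 */
int native_l1_cache_shift(void)
{
	unsigned int eax, ebx, ecx, edx;
	unsigned int line;
	int shift = 0;

	if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx) || !(edx & (1u << 19)))
		return 6;		/* assumed fallback: 64-byte lines */

	line = ((ebx >> 8) & 0xff) * 8;
	if (line == 0 || (line & (line - 1)) != 0)
		return 6;		/* not a power of two: don't trust it */

	while ((1u << shift) < line)
		shift++;
	return shift;
}
```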