Re: [PATCH 13/17] mm: move numa_distance and related code from x86 to numa_memblks
On 2024-07-16 6:13 AM, Mike Rapoport wrote:
> From: "Mike Rapoport (Microsoft)"
>
> Move code dealing with numa_distance array from arch/x86 to
> mm/numa_memblks.c
>
> This code will be later reused by arch_numa.
>
> No functional changes.
>
> Signed-off-by: Mike Rapoport (Microsoft)
> ---
>  arch/x86/mm/numa.c                   | 101 ---
>  arch/x86/mm/numa_internal.h          |   2 -
>  include/linux/numa_memblks.h         |   4 ++
>  {arch/x86/mm => mm}/numa_emulation.c |   0
>  mm/numa_memblks.c                    | 101 +++
>  5 files changed, 105 insertions(+), 103 deletions(-)
>  rename {arch/x86/mm => mm}/numa_emulation.c (100%)

The numa_emulation.c rename looks like it should be part of the next commit,
not this one.
[PATCH] powerpc: Limit ARCH_HAS_KERNEL_FPU_SUPPORT to PPC64
When building a 32-bit kernel, some toolchains do not allow mixing soft
float and hard float object files:

    LD      vmlinux.o
  powerpc64le-unknown-linux-musl-ld: lib/test_fpu_impl.o uses hard float, arch/powerpc/kernel/udbg.o uses soft float
  powerpc64le-unknown-linux-musl-ld: failed to merge target specific data of file lib/test_fpu_impl.o
  make[2]: *** [scripts/Makefile.vmlinux_o:62: vmlinux.o] Error 1
  make[1]: *** [Makefile:1152: vmlinux_o] Error 2
  make: *** [Makefile:240: __sub-make] Error 2

This is not an issue when building a 64-bit kernel. To unbreak the
build, limit ARCH_HAS_KERNEL_FPU_SUPPORT to 64-bit kernels. This is okay
because the only real user of this option, amdgpu, was previously
limited to PPC64 anyway; see commit a28e4b672f04 ("drm/amd/display: use
ARCH_HAS_KERNEL_FPU_SUPPORT").

Fixes: 01db473e1aa3 ("powerpc: implement ARCH_HAS_KERNEL_FPU_SUPPORT")
Reported-by: kernel test robot
Closes: https://lore.kernel.org/oe-kbuild-all/202405250851.z4dayswg-...@intel.com/
Reported-by: Guenter Roeck
Closes: https://lore.kernel.org/lkml/eeffaec3-df63-4e55-ab7a-064a65c00...@roeck-us.net/
Signed-off-by: Samuel Holland
---
 arch/powerpc/Kconfig | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 3c968f2f4ac4..c88c6d46a5bc 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -137,7 +137,7 @@ config PPC
 	select ARCH_HAS_GCOV_PROFILE_ALL
 	select ARCH_HAS_HUGEPD	if HUGETLB_PAGE
 	select ARCH_HAS_KCOV
-	select ARCH_HAS_KERNEL_FPU_SUPPORT	if PPC_FPU
+	select ARCH_HAS_KERNEL_FPU_SUPPORT	if PPC64 && PPC_FPU
 	select ARCH_HAS_MEMBARRIER_CALLBACKS
 	select ARCH_HAS_MEMBARRIER_SYNC_CORE
 	select ARCH_HAS_MEMREMAP_COMPAT_ALIGN	if PPC_64S_HASH_MMU
-- 
2.44.1
Re: [PATCH v4 13/15] drm/amd/display: Use ARCH_HAS_KERNEL_FPU_SUPPORT
Hi Thiago,

On 2024-04-10 8:02 PM, Thiago Jung Bauermann wrote:
> Samuel Holland writes:
>> On 2024-04-10 5:21 PM, Thiago Jung Bauermann wrote:
>>>
>>> Unfortunately this patch causes build failures on arm with allyesconfig
>>> and allmodconfig. Tested with next-20240410.
>
>> In both cases, the issue is that the toolchain requires runtime support to
>> convert between `unsigned long long` and `double`, even when hardware FP is
>> enabled. There was some past discussion about GCC inlining some of these
>> conversions[1], but that did not get implemented.
>
> Thank you for the explanation and the bugzilla reference. I added a
> comment there mentioning that the problem came up again with this patch
> series.
>
>> The short-term fix would be to drop the `select ARCH_HAS_KERNEL_FPU_SUPPORT`
>> for 32-bit arm until we can provide these runtime library functions.
>
> Does this mean that patch 2 in this series:
>
> [PATCH v4 02/15] ARM: Implement ARCH_HAS_KERNEL_FPU_SUPPORT
>
> will be dropped?

No, because later patches in the series (3, 6) depend on the definition of
CC_FLAGS_FPU from that patch. I will need to send a fixup patch unless I can
find a GPL-2 compatible implementation of the runtime library functions.

Regards,
Samuel
Re: [PATCH v4 13/15] drm/amd/display: Use ARCH_HAS_KERNEL_FPU_SUPPORT
Hi Thiago,

On 2024-04-10 5:21 PM, Thiago Jung Bauermann wrote:
> Samuel Holland writes:
>
>> Now that all previously-supported architectures select
>> ARCH_HAS_KERNEL_FPU_SUPPORT, this code can depend on that symbol instead
>> of the existing list of architectures. It can also take advantage of the
>> common kernel-mode FPU API and method of adjusting CFLAGS.
>>
>> Acked-by: Alex Deucher
>> Reviewed-by: Christoph Hellwig
>> Signed-off-by: Samuel Holland
>
> Unfortunately this patch causes build failures on arm with allyesconfig
> and allmodconfig. Tested with next-20240410.
>
> Error with allyesconfig:
>
> $ make -j 8 \
>     O=$HOME/.cache/builds/linux-cross-arm \
>     ARCH=arm \
>     CROSS_COMPILE=arm-linux-gnueabihf-
> make[1]: Entering directory '/home/bauermann/.cache/builds/linux-cross-arm'
>     ⋮
> arm-linux-gnueabihf-ld: drivers/gpu/drm/amd/display/dc/dml/dcn20/dcn20_fpu.o: in function `dcn20_populate_dml_pipes_from_context':
> dcn20_fpu.c:(.text+0x20f4): undefined reference to `__aeabi_l2d'
> arm-linux-gnueabihf-ld: dcn20_fpu.c:(.text+0x210c): undefined reference to `__aeabi_l2d'
> arm-linux-gnueabihf-ld: dcn20_fpu.c:(.text+0x2124): undefined reference to `__aeabi_l2d'
> arm-linux-gnueabihf-ld: dcn20_fpu.c:(.text+0x213c): undefined reference to `__aeabi_l2d'
> arm-linux-gnueabihf-ld: drivers/gpu/drm/amd/display/dc/dml/calcs/dcn_calcs.o: in function `pipe_ctx_to_e2e_pipe_params':
> dcn_calcs.c:(.text+0x390): undefined reference to `__aeabi_l2d'
> arm-linux-gnueabihf-ld: drivers/gpu/drm/amd/display/dc/dml/calcs/dcn_calcs.o:dcn_calcs.c:(.text+0x3a4): more undefined references to `__aeabi_l2d' follow
> arm-linux-gnueabihf-ld: drivers/gpu/drm/amd/display/dc/dml2/dml2_wrapper.o: in function `optimize_configuration':
> dml2_wrapper.c:(.text+0xcbc): undefined reference to `__aeabi_d2ulz'
> arm-linux-gnueabihf-ld: drivers/gpu/drm/amd/display/dc/dml2/dml2_translation_helper.o: in function `populate_dml_plane_cfg_from_plane_state':
> dml2_translation_helper.c:(.text+0x9e4): undefined reference to `__aeabi_l2d'
> arm-linux-gnueabihf-ld: dml2_translation_helper.c:(.text+0xa20): undefined reference to `__aeabi_l2d'
> arm-linux-gnueabihf-ld: dml2_translation_helper.c:(.text+0xa58): undefined reference to `__aeabi_l2d'
> arm-linux-gnueabihf-ld: dml2_translation_helper.c:(.text+0xa90): undefined reference to `__aeabi_l2d'
> make[3]: *** [/home/bauermann/src/linux/scripts/Makefile.vmlinux:37: vmlinux] Error 1
> make[2]: *** [/home/bauermann/src/linux/Makefile:1165: vmlinux] Error 2
> make[1]: *** [/home/bauermann/src/linux/Makefile:240: __sub-make] Error 2
> make[1]: Leaving directory '/home/bauermann/.cache/builds/linux-cross-arm'
> make: *** [Makefile:240: __sub-make] Error 2
>
> The error with allmodconfig is slightly different:
>
> $ make -j 8 \
>     O=$HOME/.cache/builds/linux-cross-arm \
>     ARCH=arm \
>     CROSS_COMPILE=arm-linux-gnueabihf-
> make[1]: Entering directory '/home/bauermann/.cache/builds/linux-cross-arm'
>     ⋮
> ERROR: modpost: "__aeabi_d2ulz" [drivers/gpu/drm/amd/amdgpu/amdgpu.ko] undefined!
> ERROR: modpost: "__aeabi_l2d" [drivers/gpu/drm/amd/amdgpu/amdgpu.ko] undefined!
> make[3]: *** [/home/bauermann/src/linux/scripts/Makefile.modpost:145: Module.symvers] Error 1
> make[2]: *** [/home/bauermann/src/linux/Makefile:1876: modpost] Error 2
> make[1]: *** [/home/bauermann/src/linux/Makefile:240: __sub-make] Error 2
> make[1]: Leaving directory '/home/bauermann/.cache/builds/linux-cross-arm'
> make: *** [Makefile:240: __sub-make] Error 2

In both cases, the issue is that the toolchain requires runtime support to
convert between `unsigned long long` and `double`, even when hardware FP is
enabled. There was some past discussion about GCC inlining some of these
conversions[1], but that did not get implemented.

The short-term fix would be to drop the `select ARCH_HAS_KERNEL_FPU_SUPPORT`
for 32-bit arm until we can provide these runtime library functions.

Regards,
Samuel

[1]: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91970
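
For readers following along, here is a minimal userspace sketch of the kind of C code that leads to the undefined symbols in the log above. The function names are hypothetical and are not taken from amdgpu. The assumption, based on the ARM run-time ABI helper names, is that a 32-bit arm compiler lowers the first conversion to a call to `__aeabi_l2d` and the second to `__aeabi_d2ulz`, neither of which the kernel links against.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: signed 64-bit integer to double. On 32-bit arm this
 * conversion is expected to become a library call (__aeabi_l2d) rather than
 * a single hardware instruction.
 */
double ticks_to_double(int64_t ticks)
{
	return (double)ticks;
}

/*
 * Hypothetical helper: double to unsigned 64-bit integer, truncating toward
 * zero. On 32-bit arm this is expected to become a call to __aeabi_d2ulz.
 */
uint64_t double_to_u64(double value)
{
	return (uint64_t)value;
}

/* Simple round-trip check that can be called from a userspace main(). */
void check_conversions(void)
{
	assert(ticks_to_double(3) == 3.0);
	assert(double_to_u64(2.75) == 2);
}
```

On a 64-bit target both functions compile to ordinary conversion instructions, which is consistent with the series limiting support to 64-bit configurations on some architectures.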
Re: [PATCH v4 10/15] x86: Implement ARCH_HAS_KERNEL_FPU_SUPPORT
On 2024-03-29 12:28 PM, Dave Hansen wrote:
> On 3/29/24 00:18, Samuel Holland wrote:
>> +#
>> +# CFLAGS for compiling floating point code inside the kernel.
>> +#
>> +CC_FLAGS_FPU := -msse -msse2
>> +ifdef CONFIG_CC_IS_GCC
>> +# Stack alignment mismatch, proceed with caution.
>> +# GCC < 7.1 cannot compile code using `double` and -mpreferred-stack-boundary=3
>> +# (8B stack alignment).
>> +# See https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53383
>> +#
>> +# The "-msse" in the first argument is there so that the
>> +# -mpreferred-stack-boundary=3 build error:
>> +#
>> +#  -mpreferred-stack-boundary=3 is not between 4 and 12
>> +#
>> +# can be triggered. Otherwise gcc doesn't complain.
>> +CC_FLAGS_FPU += -mhard-float
>> +CC_FLAGS_FPU += $(call cc-option,-msse -mpreferred-stack-boundary=3,-mpreferred-stack-boundary=4)
>> +endif
>
> I was expecting to see this (now duplicate) hunk come _out_ of
> lib/Makefile somewhere in the series.
>
> Did I miss that, or is there something keeping the duplicate there?

This hunk is removed in patch 15/15, after the conversion of lib/test_fpu.c:

https://lore.kernel.org/linux-kernel/20240329072441.591471-16-samuel.holl...@sifive.com/

Regards,
Samuel
[PATCH v4 15/15] selftests/fpu: Allow building on other architectures
Now that ARCH_HAS_KERNEL_FPU_SUPPORT provides a common way to compile and run floating-point code, this test is no longer x86-specific. Reviewed-by: Christoph Hellwig Signed-off-by: Samuel Holland --- (no changes since v1) lib/Kconfig.debug | 2 +- lib/Makefile| 25 ++--- lib/test_fpu_glue.c | 5 - 3 files changed, 7 insertions(+), 25 deletions(-) diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug index c63a5fbf1f1c..f93e778e0405 100644 --- a/lib/Kconfig.debug +++ b/lib/Kconfig.debug @@ -2890,7 +2890,7 @@ config TEST_FREE_PAGES config TEST_FPU tristate "Test floating point operations in kernel space" - depends on X86 && !KCOV_INSTRUMENT_ALL + depends on ARCH_HAS_KERNEL_FPU_SUPPORT && !KCOV_INSTRUMENT_ALL help Enable this option to add /sys/kernel/debug/selftest_helpers/test_fpu which will trigger a sequence of floating point operations. This is used diff --git a/lib/Makefile b/lib/Makefile index fcb35bf50979..e44ad11f77b5 100644 --- a/lib/Makefile +++ b/lib/Makefile @@ -110,31 +110,10 @@ CFLAGS_test_fprobe.o += $(CC_FLAGS_FTRACE) obj-$(CONFIG_FPROBE_SANITY_TEST) += test_fprobe.o obj-$(CONFIG_TEST_OBJPOOL) += test_objpool.o -# -# CFLAGS for compiling floating point code inside the kernel. x86/Makefile turns -# off the generation of FPU/SSE* instructions for kernel proper but FPU_FLAGS -# get appended last to CFLAGS and thus override those previous compiler options. -# -FPU_CFLAGS := -msse -msse2 -ifdef CONFIG_CC_IS_GCC -# Stack alignment mismatch, proceed with caution. -# GCC < 7.1 cannot compile code using `double` and -mpreferred-stack-boundary=3 -# (8B stack alignment). -# See https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53383 -# -# The "-msse" in the first argument is there so that the -# -mpreferred-stack-boundary=3 build error: -# -# -mpreferred-stack-boundary=3 is not between 4 and 12 -# -# can be triggered. Otherwise gcc doesn't complain. 
-FPU_CFLAGS += -mhard-float -FPU_CFLAGS += $(call cc-option,-msse -mpreferred-stack-boundary=3,-mpreferred-stack-boundary=4) -endif - obj-$(CONFIG_TEST_FPU) += test_fpu.o test_fpu-y := test_fpu_glue.o test_fpu_impl.o -CFLAGS_test_fpu_impl.o += $(FPU_CFLAGS) +CFLAGS_test_fpu_impl.o += $(CC_FLAGS_FPU) +CFLAGS_REMOVE_test_fpu_impl.o += $(CC_FLAGS_NO_FPU) # Some KUnit files (hooks.o) need to be built-in even when KUnit is a module, # so we can't just use obj-$(CONFIG_KUNIT). diff --git a/lib/test_fpu_glue.c b/lib/test_fpu_glue.c index 85963d7be826..eef282a2715f 100644 --- a/lib/test_fpu_glue.c +++ b/lib/test_fpu_glue.c @@ -17,7 +17,7 @@ #include #include #include -#include +#include #include "test_fpu.h" @@ -38,6 +38,9 @@ static struct dentry *selftest_dir; static int __init test_fpu_init(void) { + if (!kernel_fpu_available()) + return -EINVAL; + selftest_dir = debugfs_create_dir("selftest_helpers", NULL); if (!selftest_dir) return -ENOMEM; -- 2.44.0
[PATCH v4 14/15] selftests/fpu: Move FP code to a separate translation unit
This ensures no compiler-generated floating-point code can appear outside kernel_fpu_{begin,end}() sections, and some architectures enforce this separation. Reviewed-by: Christoph Hellwig Signed-off-by: Samuel Holland --- (no changes since v2) Changes in v2: - Declare test_fpu() in a header lib/Makefile| 3 ++- lib/test_fpu.h | 8 +++ lib/{test_fpu.c => test_fpu_glue.c} | 32 + lib/test_fpu_impl.c | 37 + 4 files changed, 48 insertions(+), 32 deletions(-) create mode 100644 lib/test_fpu.h rename lib/{test_fpu.c => test_fpu_glue.c} (71%) create mode 100644 lib/test_fpu_impl.c diff --git a/lib/Makefile b/lib/Makefile index ffc6b2341b45..fcb35bf50979 100644 --- a/lib/Makefile +++ b/lib/Makefile @@ -133,7 +133,8 @@ FPU_CFLAGS += $(call cc-option,-msse -mpreferred-stack-boundary=3,-mpreferred-st endif obj-$(CONFIG_TEST_FPU) += test_fpu.o -CFLAGS_test_fpu.o += $(FPU_CFLAGS) +test_fpu-y := test_fpu_glue.o test_fpu_impl.o +CFLAGS_test_fpu_impl.o += $(FPU_CFLAGS) # Some KUnit files (hooks.o) need to be built-in even when KUnit is a module, # so we can't just use obj-$(CONFIG_KUNIT). diff --git a/lib/test_fpu.h b/lib/test_fpu.h new file mode 100644 index ..4459807084bc --- /dev/null +++ b/lib/test_fpu.h @@ -0,0 +1,8 @@ +/* SPDX-License-Identifier: GPL-2.0+ */ + +#ifndef _LIB_TEST_FPU_H +#define _LIB_TEST_FPU_H + +int test_fpu(void); + +#endif diff --git a/lib/test_fpu.c b/lib/test_fpu_glue.c similarity index 71% rename from lib/test_fpu.c rename to lib/test_fpu_glue.c index e82db19fed84..85963d7be826 100644 --- a/lib/test_fpu.c +++ b/lib/test_fpu_glue.c @@ -19,37 +19,7 @@ #include #include -static int test_fpu(void) -{ - /* -* This sequence of operations tests that rounding mode is -* to nearest and that denormal numbers are supported. -* Volatile variables are used to avoid compiler optimizing -* the calculations away. 
-*/ - volatile double a, b, c, d, e, f, g; - - a = 4.0; - b = 1e-15; - c = 1e-310; - - /* Sets precision flag */ - d = a + b; - - /* Result depends on rounding mode */ - e = a + b / 2; - - /* Denormal and very large values */ - f = b / c; - - /* Depends on denormal support */ - g = a + c * f; - - if (d > a && e > a && g > a) - return 0; - else - return -EINVAL; -} +#include "test_fpu.h" static int test_fpu_get(void *data, u64 *val) { diff --git a/lib/test_fpu_impl.c b/lib/test_fpu_impl.c new file mode 100644 index ..777894dbbe86 --- /dev/null +++ b/lib/test_fpu_impl.c @@ -0,0 +1,37 @@ +// SPDX-License-Identifier: GPL-2.0+ + +#include + +#include "test_fpu.h" + +int test_fpu(void) +{ + /* +* This sequence of operations tests that rounding mode is +* to nearest and that denormal numbers are supported. +* Volatile variables are used to avoid compiler optimizing +* the calculations away. +*/ + volatile double a, b, c, d, e, f, g; + + a = 4.0; + b = 1e-15; + c = 1e-310; + + /* Sets precision flag */ + d = a + b; + + /* Result depends on rounding mode */ + e = a + b / 2; + + /* Denormal and very large values */ + f = b / c; + + /* Depends on denormal support */ + g = a + c * f; + + if (d > a && e > a && g > a) + return 0; + else + return -EINVAL; +} -- 2.44.0
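
The split described in this patch can be modelled in userspace as follows. This is only a sketch under stated assumptions: the `stub_` functions are stand-ins invented for illustration (they are not the kernel API), and `run_test_fpu()` is a hypothetical name for the glue-side caller. The point is that the glue side touches no `double` at all, while the implementation side only runs between the begin and end calls.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Userspace stand-ins for the kernel-mode FPU API (assumptions only). */
int fpu_section_depth;

bool stub_kernel_fpu_available(void) { return true; }
void stub_kernel_fpu_begin(void) { fpu_section_depth++; }
void stub_kernel_fpu_end(void) { fpu_section_depth--; }

/*
 * Corresponds to test_fpu_impl.c: the only unit that would be built with
 * the floating-point CFLAGS. Same arithmetic as in the patch.
 */
int test_fpu(void)
{
	volatile double a = 4.0, b = 1e-15, c = 1e-310, d, e, f, g;

	/* FP code is only legal inside a begin/end section. */
	assert(fpu_section_depth == 1);

	d = a + b;       /* sets the precision flag */
	e = a + b / 2;   /* result depends on rounding mode */
	f = b / c;       /* denormal divisor, very large quotient */
	g = a + c * f;   /* depends on denormal support */

	return (d > a && e > a && g > a) ? 0 : -EINVAL;
}

/*
 * Corresponds to test_fpu_glue.c: built without FP flags, so it contains
 * integer code only and brackets the call into the FP unit.
 */
int run_test_fpu(void)
{
	int status;

	if (!stub_kernel_fpu_available())
		return -EINVAL;

	stub_kernel_fpu_begin();
	status = test_fpu();
	stub_kernel_fpu_end();

	return status;
}
```

On hardware with IEEE 754 round-to-nearest and denormal support, `run_test_fpu()` is expected to report success; with denormals flushed to zero, `c * f` becomes NaN and the final comparison fails.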
[PATCH v4 13/15] drm/amd/display: Use ARCH_HAS_KERNEL_FPU_SUPPORT
Now that all previously-supported architectures select ARCH_HAS_KERNEL_FPU_SUPPORT, this code can depend on that symbol instead of the existing list of architectures. It can also take advantage of the common kernel-mode FPU API and method of adjusting CFLAGS. Acked-by: Alex Deucher Reviewed-by: Christoph Hellwig Signed-off-by: Samuel Holland --- (no changes since v2) Changes in v2: - Split altivec removal to a separate patch - Use linux/fpu.h instead of asm/fpu.h in consumers drivers/gpu/drm/amd/display/Kconfig | 2 +- .../gpu/drm/amd/display/amdgpu_dm/dc_fpu.c| 27 ++ drivers/gpu/drm/amd/display/dc/dml/Makefile | 36 ++- drivers/gpu/drm/amd/display/dc/dml2/Makefile | 36 ++- 4 files changed, 7 insertions(+), 94 deletions(-) diff --git a/drivers/gpu/drm/amd/display/Kconfig b/drivers/gpu/drm/amd/display/Kconfig index 901d1961b739..5fcd4f778dc3 100644 --- a/drivers/gpu/drm/amd/display/Kconfig +++ b/drivers/gpu/drm/amd/display/Kconfig @@ -8,7 +8,7 @@ config DRM_AMD_DC depends on BROKEN || !CC_IS_CLANG || ARM64 || RISCV || SPARC64 || X86_64 select SND_HDA_COMPONENT if SND_HDA_CORE # !CC_IS_CLANG: https://github.com/ClangBuiltLinux/linux/issues/1752 - select DRM_AMD_DC_FP if (X86 || LOONGARCH || (PPC64 && ALTIVEC) || (ARM64 && KERNEL_MODE_NEON && !CC_IS_CLANG)) + select DRM_AMD_DC_FP if ARCH_HAS_KERNEL_FPU_SUPPORT && (!ARM64 || !CC_IS_CLANG) help Choose this option if you want to use the new display engine support for AMDGPU. 
This adds required support for Vega and diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/dc_fpu.c b/drivers/gpu/drm/amd/display/amdgpu_dm/dc_fpu.c index 0de16796466b..e46f8ce41d87 100644 --- a/drivers/gpu/drm/amd/display/amdgpu_dm/dc_fpu.c +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/dc_fpu.c @@ -26,16 +26,7 @@ #include "dc_trace.h" -#if defined(CONFIG_X86) -#include -#elif defined(CONFIG_PPC64) -#include -#include -#elif defined(CONFIG_ARM64) -#include -#elif defined(CONFIG_LOONGARCH) -#include -#endif +#include /** * DOC: DC FPU manipulation overview @@ -87,16 +78,9 @@ void dc_fpu_begin(const char *function_name, const int line) WARN_ON_ONCE(!in_task()); preempt_disable(); depth = __this_cpu_inc_return(fpu_recursion_depth); - if (depth == 1) { -#if defined(CONFIG_X86) || defined(CONFIG_LOONGARCH) + BUG_ON(!kernel_fpu_available()); kernel_fpu_begin(); -#elif defined(CONFIG_PPC64) - if (!cpu_has_feature(CPU_FTR_FPU_UNAVAILABLE)) - enable_kernel_fp(); -#elif defined(CONFIG_ARM64) - kernel_neon_begin(); -#endif } TRACE_DCN_FPU(true, function_name, line, depth); @@ -118,14 +102,7 @@ void dc_fpu_end(const char *function_name, const int line) depth = __this_cpu_dec_return(fpu_recursion_depth); if (depth == 0) { -#if defined(CONFIG_X86) || defined(CONFIG_LOONGARCH) kernel_fpu_end(); -#elif defined(CONFIG_PPC64) - if (!cpu_has_feature(CPU_FTR_FPU_UNAVAILABLE)) - disable_kernel_fp(); -#elif defined(CONFIG_ARM64) - kernel_neon_end(); -#endif } else { WARN_ON_ONCE(depth < 0); } diff --git a/drivers/gpu/drm/amd/display/dc/dml/Makefile b/drivers/gpu/drm/amd/display/dc/dml/Makefile index 59d3972341d2..a94b6d546cd1 100644 --- a/drivers/gpu/drm/amd/display/dc/dml/Makefile +++ b/drivers/gpu/drm/amd/display/dc/dml/Makefile @@ -25,40 +25,8 @@ # It provides the general basic services required by other DAL # subcomponents. 
-ifdef CONFIG_X86 -dml_ccflags-$(CONFIG_CC_IS_GCC) := -mhard-float -dml_ccflags := $(dml_ccflags-y) -msse -endif - -ifdef CONFIG_PPC64 -dml_ccflags := -mhard-float -endif - -ifdef CONFIG_ARM64 -dml_rcflags := -mgeneral-regs-only -endif - -ifdef CONFIG_LOONGARCH -dml_ccflags := -mfpu=64 -dml_rcflags := -msoft-float -endif - -ifdef CONFIG_CC_IS_GCC -ifneq ($(call gcc-min-version, 70100),y) -IS_OLD_GCC = 1 -endif -endif - -ifdef CONFIG_X86 -ifdef IS_OLD_GCC -# Stack alignment mismatch, proceed with caution. -# GCC < 7.1 cannot compile code using `double` and -mpreferred-stack-boundary=3 -# (8B stack alignment). -dml_ccflags += -mpreferred-stack-boundary=4 -else -dml_ccflags += -msse2 -endif -endif +dml_ccflags := $(CC_FLAGS_FPU) +dml_rcflags := $(CC_FLAGS_NO_FPU) ifneq ($(CONFIG_FRAME_WARN),0) ifeq ($(filter y,$(CONFIG_KASAN)$(CONFIG_KCSAN)),y) diff --git a/drivers/gpu/drm/amd/display/dc/dml2/Makefile b/drivers/gpu/drm/amd/display/dc/dml2/Makefile index 7b51364084b5..4f6c804a26ad 100644 --- a/drivers/gpu/drm/amd/display/dc/dml2/Makefile +++ b/drivers/gpu/drm/amd/display/dc/dml2/Makefile @@ -24,40 +24,8 @@ # # Makefile for dml2. -ifdef CONFIG_X86 -dml2_ccflags-$(CONFIG_CC_IS_GCC) := -mhard-float -dml2_ccflags := $(dml2_ccflags-y)
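
The nesting behaviour that dc_fpu_begin() and dc_fpu_end() keep after this conversion can be sketched in userspace. This is a simplified model, not the driver code: the `stub_` functions and the `_model` suffixes are hypothetical, the per-CPU counter is replaced by a plain global, and the preempt_disable(), tracing and BUG_ON() calls are left out.

```c
#include <assert.h>

/* Userspace stand-ins (assumptions): count real FPU section transitions. */
int fpu_sections_entered;
int fpu_sections_left;

void stub_kernel_fpu_begin(void) { fpu_sections_entered++; }
void stub_kernel_fpu_end(void) { fpu_sections_left++; }

/* Models the per-CPU fpu_recursion_depth counter from dc_fpu.c. */
int fpu_recursion_depth;

/* Only the outermost begin actually enters the FPU section. */
void dc_fpu_begin_model(void)
{
	int depth = ++fpu_recursion_depth;

	if (depth == 1)
		stub_kernel_fpu_begin();
}

/* Only the matching outermost end actually leaves the FPU section. */
void dc_fpu_end_model(void)
{
	int depth = --fpu_recursion_depth;

	assert(depth >= 0);
	if (depth == 0)
		stub_kernel_fpu_end();
}
```

With two nested begin calls followed by two end calls, the model enters and leaves the FPU section exactly once, which is why the single common kernel_fpu_begin()/kernel_fpu_end() pair is enough to replace the per-architecture branches.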
[PATCH v4 12/15] drm/amd/display: Only use hard-float, not altivec on powerpc
From: Michael Ellerman

The compiler flags enable altivec, but that is not required; hard-float
is sufficient for the code to build and function.

Drop altivec from the compiler flags and adjust the enable/disable code
to only enable FPU use.

Signed-off-by: Michael Ellerman
Acked-by: Alex Deucher
Signed-off-by: Samuel Holland
---

(no changes since v2)

Changes in v2:
 - New patch for v2

 drivers/gpu/drm/amd/display/amdgpu_dm/dc_fpu.c | 12 ++--
 drivers/gpu/drm/amd/display/dc/dml/Makefile    |  2 +-
 drivers/gpu/drm/amd/display/dc/dml2/Makefile   |  2 +-
 3 files changed, 4 insertions(+), 12 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/dc_fpu.c b/drivers/gpu/drm/amd/display/amdgpu_dm/dc_fpu.c
index 4ae4720535a5..0de16796466b 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/dc_fpu.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/dc_fpu.c
@@ -92,11 +92,7 @@ void dc_fpu_begin(const char *function_name, const int line)
 #if defined(CONFIG_X86) || defined(CONFIG_LOONGARCH)
 		kernel_fpu_begin();
 #elif defined(CONFIG_PPC64)
-		if (cpu_has_feature(CPU_FTR_VSX_COMP))
-			enable_kernel_vsx();
-		else if (cpu_has_feature(CPU_FTR_ALTIVEC_COMP))
-			enable_kernel_altivec();
-		else if (!cpu_has_feature(CPU_FTR_FPU_UNAVAILABLE))
+		if (!cpu_has_feature(CPU_FTR_FPU_UNAVAILABLE))
 			enable_kernel_fp();
 #elif defined(CONFIG_ARM64)
 		kernel_neon_begin();
@@ -125,11 +121,7 @@ void dc_fpu_end(const char *function_name, const int line)
 #if defined(CONFIG_X86) || defined(CONFIG_LOONGARCH)
 		kernel_fpu_end();
 #elif defined(CONFIG_PPC64)
-		if (cpu_has_feature(CPU_FTR_VSX_COMP))
-			disable_kernel_vsx();
-		else if (cpu_has_feature(CPU_FTR_ALTIVEC_COMP))
-			disable_kernel_altivec();
-		else if (!cpu_has_feature(CPU_FTR_FPU_UNAVAILABLE))
+		if (!cpu_has_feature(CPU_FTR_FPU_UNAVAILABLE))
 			disable_kernel_fp();
 #elif defined(CONFIG_ARM64)
 		kernel_neon_end();
diff --git a/drivers/gpu/drm/amd/display/dc/dml/Makefile b/drivers/gpu/drm/amd/display/dc/dml/Makefile
index c4a5efd2dda5..59d3972341d2 100644
--- a/drivers/gpu/drm/amd/display/dc/dml/Makefile
+++ b/drivers/gpu/drm/amd/display/dc/dml/Makefile
@@ -31,7 +31,7 @@ dml_ccflags := $(dml_ccflags-y) -msse
 endif
 
 ifdef CONFIG_PPC64
-dml_ccflags := -mhard-float -maltivec
+dml_ccflags := -mhard-float
 endif
 
 ifdef CONFIG_ARM64
diff --git a/drivers/gpu/drm/amd/display/dc/dml2/Makefile b/drivers/gpu/drm/amd/display/dc/dml2/Makefile
index acff3449b8d7..7b51364084b5 100644
--- a/drivers/gpu/drm/amd/display/dc/dml2/Makefile
+++ b/drivers/gpu/drm/amd/display/dc/dml2/Makefile
@@ -30,7 +30,7 @@ dml2_ccflags := $(dml2_ccflags-y) -msse
 endif
 
 ifdef CONFIG_PPC64
-dml2_ccflags := -mhard-float -maltivec
+dml2_ccflags := -mhard-float
 endif
 
 ifdef CONFIG_ARM64
-- 
2.44.0
[PATCH v4 11/15] riscv: Add support for kernel-mode FPU
This is motivated by the amdgpu DRM driver, which needs floating-point code to support recent hardware. That code is not performance-critical, so only provide a minimal non-preemptible implementation for now. Support is limited to riscv64 because riscv32 requires runtime (libgcc) assistance to convert between doubles and 64-bit integers. Acked-by: Palmer Dabbelt Reviewed-by: Palmer Dabbelt Reviewed-by: Christoph Hellwig Signed-off-by: Samuel Holland --- (no changes since v3) Changes in v3: - Limit riscv ARCH_HAS_KERNEL_FPU_SUPPORT to 64BIT Changes in v2: - Remove RISC-V architecture-specific preprocessor check arch/riscv/Kconfig | 1 + arch/riscv/Makefile | 3 +++ arch/riscv/include/asm/fpu.h| 16 arch/riscv/kernel/Makefile | 1 + arch/riscv/kernel/kernel_mode_fpu.c | 28 5 files changed, 49 insertions(+) create mode 100644 arch/riscv/include/asm/fpu.h create mode 100644 arch/riscv/kernel/kernel_mode_fpu.c diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig index be09c8836d56..3bcd0d250810 100644 --- a/arch/riscv/Kconfig +++ b/arch/riscv/Kconfig @@ -27,6 +27,7 @@ config RISCV select ARCH_HAS_GCOV_PROFILE_ALL select ARCH_HAS_GIGANTIC_PAGE select ARCH_HAS_KCOV + select ARCH_HAS_KERNEL_FPU_SUPPORT if 64BIT && FPU select ARCH_HAS_MEMBARRIER_CALLBACKS select ARCH_HAS_MEMBARRIER_SYNC_CORE select ARCH_HAS_MMIOWB diff --git a/arch/riscv/Makefile b/arch/riscv/Makefile index 252d63942f34..76ff4033c854 100644 --- a/arch/riscv/Makefile +++ b/arch/riscv/Makefile @@ -84,6 +84,9 @@ KBUILD_CFLAGS += -march=$(shell echo $(riscv-march-y) | sed -E 's/(rv32ima|rv64i KBUILD_AFLAGS += -march=$(riscv-march-y) +# For C code built with floating-point support, exclude V but keep F and D. 
+CC_FLAGS_FPU := -march=$(shell echo $(riscv-march-y) | sed -E 's/(rv32ima|rv64ima)([^v_]*)v?/\1\2/') + KBUILD_CFLAGS += -mno-save-restore KBUILD_CFLAGS += -DCONFIG_PAGE_OFFSET=$(CONFIG_PAGE_OFFSET) diff --git a/arch/riscv/include/asm/fpu.h b/arch/riscv/include/asm/fpu.h new file mode 100644 index ..91c04c244e12 --- /dev/null +++ b/arch/riscv/include/asm/fpu.h @@ -0,0 +1,16 @@ +/* SPDX-License-Identifier: GPL-2.0-only */ +/* + * Copyright (C) 2023 SiFive + */ + +#ifndef _ASM_RISCV_FPU_H +#define _ASM_RISCV_FPU_H + +#include + +#define kernel_fpu_available() has_fpu() + +void kernel_fpu_begin(void); +void kernel_fpu_end(void); + +#endif /* ! _ASM_RISCV_FPU_H */ diff --git a/arch/riscv/kernel/Makefile b/arch/riscv/kernel/Makefile index 81d94a8ee10f..5b243d46f4b1 100644 --- a/arch/riscv/kernel/Makefile +++ b/arch/riscv/kernel/Makefile @@ -67,6 +67,7 @@ obj-$(CONFIG_RISCV_MISALIGNED)+= unaligned_access_speed.o obj-$(CONFIG_RISCV_PROBE_UNALIGNED_ACCESS) += copy-unaligned.o obj-$(CONFIG_FPU) += fpu.o +obj-$(CONFIG_FPU) += kernel_mode_fpu.o obj-$(CONFIG_RISCV_ISA_V) += vector.o obj-$(CONFIG_RISCV_ISA_V) += kernel_mode_vector.o obj-$(CONFIG_SMP) += smpboot.o diff --git a/arch/riscv/kernel/kernel_mode_fpu.c b/arch/riscv/kernel/kernel_mode_fpu.c new file mode 100644 index ..0ac8348876c4 --- /dev/null +++ b/arch/riscv/kernel/kernel_mode_fpu.c @@ -0,0 +1,28 @@ +// SPDX-License-Identifier: GPL-2.0-only +/* + * Copyright (C) 2023 SiFive + */ + +#include +#include + +#include +#include +#include +#include + +void kernel_fpu_begin(void) +{ + preempt_disable(); + fstate_save(current, task_pt_regs(current)); + csr_set(CSR_SSTATUS, SR_FS); +} +EXPORT_SYMBOL_GPL(kernel_fpu_begin); + +void kernel_fpu_end(void) +{ + csr_clear(CSR_SSTATUS, SR_FS); + fstate_restore(current, task_pt_regs(current)); + preempt_enable(); +} +EXPORT_SYMBOL_GPL(kernel_fpu_end); -- 2.44.0
[PATCH v4 10/15] x86: Implement ARCH_HAS_KERNEL_FPU_SUPPORT
x86 already provides kernel_fpu_begin() and kernel_fpu_end(), but in a different header. Add a wrapper header, and export the CFLAGS adjustments as found in lib/Makefile. Reviewed-by: Christoph Hellwig Signed-off-by: Samuel Holland --- (no changes since v1) arch/x86/Kconfig | 1 + arch/x86/Makefile | 20 arch/x86/include/asm/fpu.h | 13 + 3 files changed, 34 insertions(+) create mode 100644 arch/x86/include/asm/fpu.h diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index 39886bab943a..7c9d032ee675 100644 --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -83,6 +83,7 @@ config X86 select ARCH_HAS_FORTIFY_SOURCE select ARCH_HAS_GCOV_PROFILE_ALL select ARCH_HAS_KCOVif X86_64 + select ARCH_HAS_KERNEL_FPU_SUPPORT select ARCH_HAS_MEM_ENCRYPT select ARCH_HAS_MEMBARRIER_SYNC_CORE select ARCH_HAS_NMI_SAFE_THIS_CPU_OPS diff --git a/arch/x86/Makefile b/arch/x86/Makefile index 662d9d4033e6..5a5f5999c505 100644 --- a/arch/x86/Makefile +++ b/arch/x86/Makefile @@ -74,6 +74,26 @@ KBUILD_CFLAGS += -mno-sse -mno-mmx -mno-sse2 -mno-3dnow -mno-avx KBUILD_RUSTFLAGS += --target=$(objtree)/scripts/target.json KBUILD_RUSTFLAGS += -Ctarget-feature=-sse,-sse2,-sse3,-ssse3,-sse4.1,-sse4.2,-avx,-avx2 +# +# CFLAGS for compiling floating point code inside the kernel. +# +CC_FLAGS_FPU := -msse -msse2 +ifdef CONFIG_CC_IS_GCC +# Stack alignment mismatch, proceed with caution. +# GCC < 7.1 cannot compile code using `double` and -mpreferred-stack-boundary=3 +# (8B stack alignment). +# See https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53383 +# +# The "-msse" in the first argument is there so that the +# -mpreferred-stack-boundary=3 build error: +# +# -mpreferred-stack-boundary=3 is not between 4 and 12 +# +# can be triggered. Otherwise gcc doesn't complain. 
+CC_FLAGS_FPU += -mhard-float +CC_FLAGS_FPU += $(call cc-option,-msse -mpreferred-stack-boundary=3,-mpreferred-stack-boundary=4) +endif + ifeq ($(CONFIG_X86_KERNEL_IBT),y) # # Kernel IBT has S_CET.NOTRACK_EN=0, as such the compilers must not generate diff --git a/arch/x86/include/asm/fpu.h b/arch/x86/include/asm/fpu.h new file mode 100644 index ..b2743fe19339 --- /dev/null +++ b/arch/x86/include/asm/fpu.h @@ -0,0 +1,13 @@ +/* SPDX-License-Identifier: GPL-2.0-only */ +/* + * Copyright (C) 2023 SiFive + */ + +#ifndef _ASM_X86_FPU_H +#define _ASM_X86_FPU_H + +#include + +#define kernel_fpu_available() true + +#endif /* ! _ASM_X86_FPU_H */ -- 2.44.0
[PATCH v4 09/15] x86/fpu: Fix asm/fpu/types.h include guard
The include guard should match the filename, or it will conflict with
the newly-added asm/fpu.h.

Signed-off-by: Samuel Holland
---

Changes in v4:
 - New patch for v4

 arch/x86/include/asm/fpu/types.h | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/x86/include/asm/fpu/types.h b/arch/x86/include/asm/fpu/types.h
index ace9aa3b78a3..eb17f31b06d2 100644
--- a/arch/x86/include/asm/fpu/types.h
+++ b/arch/x86/include/asm/fpu/types.h
@@ -2,8 +2,8 @@
 /*
  * FPU data structures:
  */
-#ifndef _ASM_X86_FPU_H
-#define _ASM_X86_FPU_H
+#ifndef _ASM_X86_FPU_TYPES_H
+#define _ASM_X86_FPU_TYPES_H
 
 #include
 
@@ -596,4 +596,4 @@ struct fpu_state_config {
 /* FPU state configuration information */
 extern struct fpu_state_config fpu_kernel_cfg, fpu_user_cfg;
 
-#endif /* _ASM_X86_FPU_H */
+#endif /* _ASM_X86_FPU_TYPES_H */
-- 
2.44.0
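
To illustrate the conflict this patch avoids, here is a small self-contained sketch that simulates both headers inside one file. The two guarded regions are stand-ins for asm/fpu/types.h (before the fix) and the new asm/fpu.h; the marker macro `SEEN_FPU_TYPES_H` and the helper functions are hypothetical and exist only for this demonstration.

```c
#include <assert.h>

/* Stand-in for asm/fpu/types.h before the fix: it claims _ASM_X86_FPU_H. */
#ifndef _ASM_X86_FPU_H
#define _ASM_X86_FPU_H
#define SEEN_FPU_TYPES_H 1
#endif /* _ASM_X86_FPU_H */

/*
 * Stand-in for the new asm/fpu.h: it uses the same guard, so once the
 * region above has been processed, everything in here is skipped.
 */
#ifndef _ASM_X86_FPU_H
#define _ASM_X86_FPU_H
#define kernel_fpu_available() 1
#endif /* _ASM_X86_FPU_H */

/* Reports whether the second header's contents survived preprocessing. */
int fpu_h_contents_visible(void)
{
#ifdef kernel_fpu_available
	return 1;
#else
	return 0;
#endif
}

/* With colliding guards, the second header is silently dropped. */
void check_guard_collision(void)
{
	assert(SEEN_FPU_TYPES_H == 1);
	assert(fpu_h_contents_visible() == 0);
}
```

Renaming the first guard to `_ASM_X86_FPU_TYPES_H`, as the patch does, lets both regions be processed and makes `kernel_fpu_available()` visible regardless of include order.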
[PATCH v4 08/15] powerpc: Implement ARCH_HAS_KERNEL_FPU_SUPPORT
PowerPC provides an equivalent to the common kernel-mode FPU API, but in a different header and using different function names. The PowerPC API also requires a non-preemptible context. Add a wrapper header, and export the CFLAGS adjustments. Acked-by: Michael Ellerman (powerpc) Reviewed-by: Christoph Hellwig Signed-off-by: Samuel Holland --- (no changes since v1) arch/powerpc/Kconfig | 1 + arch/powerpc/Makefile | 5 - arch/powerpc/include/asm/fpu.h | 28 3 files changed, 33 insertions(+), 1 deletion(-) create mode 100644 arch/powerpc/include/asm/fpu.h diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig index 1c4be3373686..c42a57b6839d 100644 --- a/arch/powerpc/Kconfig +++ b/arch/powerpc/Kconfig @@ -137,6 +137,7 @@ config PPC select ARCH_HAS_GCOV_PROFILE_ALL select ARCH_HAS_HUGEPD if HUGETLB_PAGE select ARCH_HAS_KCOV + select ARCH_HAS_KERNEL_FPU_SUPPORT if PPC_FPU select ARCH_HAS_MEMBARRIER_CALLBACKS select ARCH_HAS_MEMBARRIER_SYNC_CORE select ARCH_HAS_MEMREMAP_COMPAT_ALIGN if PPC_64S_HASH_MMU diff --git a/arch/powerpc/Makefile b/arch/powerpc/Makefile index 65261cbe5bfd..93d89f055b70 100644 --- a/arch/powerpc/Makefile +++ b/arch/powerpc/Makefile @@ -153,6 +153,9 @@ CFLAGS-$(CONFIG_PPC32) += $(call cc-option, $(MULTIPLEWORD)) CFLAGS-$(CONFIG_PPC32) += $(call cc-option,-mno-readonly-in-sdata) +CC_FLAGS_FPU := $(call cc-option,-mhard-float) +CC_FLAGS_NO_FPU:= $(call cc-option,-msoft-float) + ifdef CONFIG_FUNCTION_TRACER ifdef CONFIG_ARCH_USING_PATCHABLE_FUNCTION_ENTRY KBUILD_CPPFLAGS+= -DCC_USING_PATCHABLE_FUNCTION_ENTRY @@ -174,7 +177,7 @@ asinstr := $(call as-instr,lis 9$(comma)foo@high,-DHAVE_AS_ATHIGH=1) KBUILD_CPPFLAGS+= -I $(srctree)/arch/powerpc $(asinstr) KBUILD_AFLAGS += $(AFLAGS-y) -KBUILD_CFLAGS += $(call cc-option,-msoft-float) +KBUILD_CFLAGS += $(CC_FLAGS_NO_FPU) KBUILD_CFLAGS += $(CFLAGS-y) CPP= $(CC) -E $(KBUILD_CFLAGS) diff --git a/arch/powerpc/include/asm/fpu.h b/arch/powerpc/include/asm/fpu.h new file mode 100644 index ..ca584e4bc40f --- /dev/null 
+++ b/arch/powerpc/include/asm/fpu.h @@ -0,0 +1,28 @@ +/* SPDX-License-Identifier: GPL-2.0-only */ +/* + * Copyright (C) 2023 SiFive + */ + +#ifndef _ASM_POWERPC_FPU_H +#define _ASM_POWERPC_FPU_H + +#include + +#include +#include + +#define kernel_fpu_available() (!cpu_has_feature(CPU_FTR_FPU_UNAVAILABLE)) + +static inline void kernel_fpu_begin(void) +{ + preempt_disable(); + enable_kernel_fp(); +} + +static inline void kernel_fpu_end(void) +{ + disable_kernel_fp(); + preempt_enable(); +} + +#endif /* ! _ASM_POWERPC_FPU_H */ -- 2.44.0
[PATCH v4 07/15] LoongArch: Implement ARCH_HAS_KERNEL_FPU_SUPPORT
LoongArch already provides kernel_fpu_begin() and kernel_fpu_end() in asm/fpu.h, so it only needs to add kernel_fpu_available() and export the CFLAGS adjustments. Acked-by: WANG Xuerui Reviewed-by: Christoph Hellwig Signed-off-by: Samuel Holland --- (no changes since v3) Changes in v3: - Rebase on v6.9-rc1 arch/loongarch/Kconfig | 1 + arch/loongarch/Makefile | 5 - arch/loongarch/include/asm/fpu.h | 1 + 3 files changed, 6 insertions(+), 1 deletion(-) diff --git a/arch/loongarch/Kconfig b/arch/loongarch/Kconfig index a5f300ec6f28..2266c6c41c38 100644 --- a/arch/loongarch/Kconfig +++ b/arch/loongarch/Kconfig @@ -18,6 +18,7 @@ config LOONGARCH select ARCH_HAS_CURRENT_STACK_POINTER select ARCH_HAS_FORTIFY_SOURCE select ARCH_HAS_KCOV + select ARCH_HAS_KERNEL_FPU_SUPPORT if CPU_HAS_FPU select ARCH_HAS_NMI_SAFE_THIS_CPU_OPS select ARCH_HAS_NON_OVERLAPPING_ADDRESS_SPACE select ARCH_HAS_PTE_SPECIAL diff --git a/arch/loongarch/Makefile b/arch/loongarch/Makefile index df6caf79537a..efb5440a43ec 100644 --- a/arch/loongarch/Makefile +++ b/arch/loongarch/Makefile @@ -26,6 +26,9 @@ endif 32bit-emul = elf32loongarch 64bit-emul = elf64loongarch +CC_FLAGS_FPU := -mfpu=64 +CC_FLAGS_NO_FPU:= -msoft-float + ifdef CONFIG_UNWINDER_ORC orc_hash_h := arch/$(SRCARCH)/include/generated/asm/orc_hash.h orc_hash_sh := $(srctree)/scripts/orc_hash.sh @@ -59,7 +62,7 @@ ld-emul = $(64bit-emul) cflags-y += -mabi=lp64s endif -cflags-y += -pipe -msoft-float +cflags-y += -pipe $(CC_FLAGS_NO_FPU) LDFLAGS_vmlinux+= -static -n -nostdlib # When the assembler supports explicit relocation hint, we must use it. diff --git a/arch/loongarch/include/asm/fpu.h b/arch/loongarch/include/asm/fpu.h index c2d8962fda00..3177674228f8 100644 --- a/arch/loongarch/include/asm/fpu.h +++ b/arch/loongarch/include/asm/fpu.h @@ -21,6 +21,7 @@ struct sigcontext; +#define kernel_fpu_available() cpu_has_fpu extern void kernel_fpu_begin(void); extern void kernel_fpu_end(void); -- 2.44.0
[PATCH v4 06/15] lib/raid6: Use CC_FLAGS_FPU for NEON CFLAGS
Now that CC_FLAGS_FPU is exported and can be used anywhere in the source tree, use it instead of duplicating the flags here. Reviewed-by: Christoph Hellwig Signed-off-by: Samuel Holland --- Changes in v4: - Add missed CFLAGS changes for recov_neon_inner.c (fixes arm build failures) lib/raid6/Makefile | 33 ++--- 1 file changed, 10 insertions(+), 23 deletions(-) diff --git a/lib/raid6/Makefile b/lib/raid6/Makefile index 385a94aa0b99..0e88bfe6445b 100644 --- a/lib/raid6/Makefile +++ b/lib/raid6/Makefile @@ -33,25 +33,6 @@ CFLAGS_REMOVE_vpermxor8.o += -msoft-float endif endif -# The GCC option -ffreestanding is required in order to compile code containing -# ARM/NEON intrinsics in a non C99-compliant environment (such as the kernel) -ifeq ($(CONFIG_KERNEL_MODE_NEON),y) -NEON_FLAGS := -ffreestanding -# Enable -NEON_FLAGS += -isystem $(shell $(CC) -print-file-name=include) -ifeq ($(ARCH),arm) -NEON_FLAGS += -march=armv7-a -mfloat-abi=softfp -mfpu=neon -endif -CFLAGS_recov_neon_inner.o += $(NEON_FLAGS) -ifeq ($(ARCH),arm64) -CFLAGS_REMOVE_recov_neon_inner.o += -mgeneral-regs-only -CFLAGS_REMOVE_neon1.o += -mgeneral-regs-only -CFLAGS_REMOVE_neon2.o += -mgeneral-regs-only -CFLAGS_REMOVE_neon4.o += -mgeneral-regs-only -CFLAGS_REMOVE_neon8.o += -mgeneral-regs-only -endif -endif - quiet_cmd_unroll = UNROLL $@ cmd_unroll = $(AWK) -v N=$* -f $(srctree)/$(src)/unroll.awk < $< > $@ @@ -75,10 +56,16 @@ targets += vpermxor1.c vpermxor2.c vpermxor4.c vpermxor8.c $(obj)/vpermxor%.c: $(src)/vpermxor.uc $(src)/unroll.awk FORCE $(call if_changed,unroll) -CFLAGS_neon1.o += $(NEON_FLAGS) -CFLAGS_neon2.o += $(NEON_FLAGS) -CFLAGS_neon4.o += $(NEON_FLAGS) -CFLAGS_neon8.o += $(NEON_FLAGS) +CFLAGS_neon1.o += $(CC_FLAGS_FPU) +CFLAGS_neon2.o += $(CC_FLAGS_FPU) +CFLAGS_neon4.o += $(CC_FLAGS_FPU) +CFLAGS_neon8.o += $(CC_FLAGS_FPU) +CFLAGS_recov_neon_inner.o += $(CC_FLAGS_FPU) +CFLAGS_REMOVE_neon1.o += $(CC_FLAGS_NO_FPU) +CFLAGS_REMOVE_neon2.o += $(CC_FLAGS_NO_FPU) +CFLAGS_REMOVE_neon4.o += 
$(CC_FLAGS_NO_FPU) +CFLAGS_REMOVE_neon8.o += $(CC_FLAGS_NO_FPU) +CFLAGS_REMOVE_recov_neon_inner.o += $(CC_FLAGS_NO_FPU) targets += neon1.c neon2.c neon4.c neon8.c $(obj)/neon%.c: $(src)/neon.uc $(src)/unroll.awk FORCE $(call if_changed,unroll) -- 2.44.0
[PATCH v4 05/15] arm64: crypto: Use CC_FLAGS_FPU for NEON CFLAGS
Now that CC_FLAGS_FPU is exported and can be used anywhere in the source tree, use it instead of duplicating the flags here. Reviewed-by: Christoph Hellwig Signed-off-by: Samuel Holland --- (no changes since v2) Changes in v2: - New patch for v2 arch/arm64/lib/Makefile | 6 ++ 1 file changed, 2 insertions(+), 4 deletions(-) diff --git a/arch/arm64/lib/Makefile b/arch/arm64/lib/Makefile index 29490be2546b..13e6a2829116 100644 --- a/arch/arm64/lib/Makefile +++ b/arch/arm64/lib/Makefile @@ -7,10 +7,8 @@ lib-y := clear_user.o delay.o copy_from_user.o \ ifeq ($(CONFIG_KERNEL_MODE_NEON), y) obj-$(CONFIG_XOR_BLOCKS) += xor-neon.o -CFLAGS_REMOVE_xor-neon.o += -mgeneral-regs-only -CFLAGS_xor-neon.o += -ffreestanding -# Enable -CFLAGS_xor-neon.o += -isystem $(shell $(CC) -print-file-name=include) +CFLAGS_xor-neon.o += $(CC_FLAGS_FPU) +CFLAGS_REMOVE_xor-neon.o += $(CC_FLAGS_NO_FPU) endif lib-$(CONFIG_ARCH_HAS_UACCESS_FLUSHCACHE) += uaccess_flushcache.o -- 2.44.0
[PATCH v4 04/15] arm64: Implement ARCH_HAS_KERNEL_FPU_SUPPORT
arm64 provides an equivalent to the common kernel-mode FPU API, but in a different header and using different function names. Add a wrapper header, and export CFLAGS adjustments as found in lib/raid6/Makefile. Reviewed-by: Christoph Hellwig Signed-off-by: Samuel Holland --- (no changes since v2) Changes in v2: - Remove file name from header comment arch/arm64/Kconfig | 1 + arch/arm64/Makefile | 9 - arch/arm64/include/asm/fpu.h | 15 +++ 3 files changed, 24 insertions(+), 1 deletion(-) create mode 100644 arch/arm64/include/asm/fpu.h diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig index 7b11c98b3e84..67f0d3b5b7df 100644 --- a/arch/arm64/Kconfig +++ b/arch/arm64/Kconfig @@ -30,6 +30,7 @@ config ARM64 select ARCH_HAS_GCOV_PROFILE_ALL select ARCH_HAS_GIGANTIC_PAGE select ARCH_HAS_KCOV + select ARCH_HAS_KERNEL_FPU_SUPPORT if KERNEL_MODE_NEON select ARCH_HAS_KEEPINITRD select ARCH_HAS_MEMBARRIER_SYNC_CORE select ARCH_HAS_NMI_SAFE_THIS_CPU_OPS diff --git a/arch/arm64/Makefile b/arch/arm64/Makefile index 0e075d3c546b..3e863e5b0169 100644 --- a/arch/arm64/Makefile +++ b/arch/arm64/Makefile @@ -36,7 +36,14 @@ ifeq ($(CONFIG_BROKEN_GAS_INST),y) $(warning Detected assembler with broken .inst; disassembly will be unreliable) endif -KBUILD_CFLAGS += -mgeneral-regs-only \ +# The GCC option -ffreestanding is required in order to compile code containing +# ARM/NEON intrinsics in a non C99-compliant environment (such as the kernel) +CC_FLAGS_FPU := -ffreestanding +# Enable +CC_FLAGS_FPU += -isystem $(shell $(CC) -print-file-name=include) +CC_FLAGS_NO_FPU:= -mgeneral-regs-only + +KBUILD_CFLAGS += $(CC_FLAGS_NO_FPU) \ $(compat_vdso) $(cc_has_k_constraint) KBUILD_CFLAGS += $(call cc-disable-warning, psabi) KBUILD_AFLAGS += $(compat_vdso) diff --git a/arch/arm64/include/asm/fpu.h b/arch/arm64/include/asm/fpu.h new file mode 100644 index ..2ae50bdce59b --- /dev/null +++ b/arch/arm64/include/asm/fpu.h @@ -0,0 +1,15 @@ +/* SPDX-License-Identifier: GPL-2.0-only */ +/* + * Copyright (C) 
2023 SiFive + */ + +#ifndef __ASM_FPU_H +#define __ASM_FPU_H + +#include + +#define kernel_fpu_available() cpu_has_neon() +#define kernel_fpu_begin() kernel_neon_begin() +#define kernel_fpu_end() kernel_neon_end() + +#endif /* ! __ASM_FPU_H */ -- 2.44.0
[PATCH v4 03/15] ARM: crypto: Use CC_FLAGS_FPU for NEON CFLAGS
Now that CC_FLAGS_FPU is exported and can be used anywhere in the source tree, use it instead of duplicating the flags here. Reviewed-by: Christoph Hellwig Signed-off-by: Samuel Holland --- (no changes since v1) arch/arm/lib/Makefile | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/arch/arm/lib/Makefile b/arch/arm/lib/Makefile index 650404be6768..0ca5aae1bcc3 100644 --- a/arch/arm/lib/Makefile +++ b/arch/arm/lib/Makefile @@ -40,8 +40,7 @@ $(obj)/csumpartialcopy.o: $(obj)/csumpartialcopygeneric.S $(obj)/csumpartialcopyuser.o: $(obj)/csumpartialcopygeneric.S ifeq ($(CONFIG_KERNEL_MODE_NEON),y) - NEON_FLAGS := -march=armv7-a -mfloat-abi=softfp -mfpu=neon - CFLAGS_xor-neon.o+= $(NEON_FLAGS) + CFLAGS_xor-neon.o+= $(CC_FLAGS_FPU) obj-$(CONFIG_XOR_BLOCKS) += xor-neon.o endif -- 2.44.0
[PATCH v4 00/15] Unified cross-architecture kernel-mode FPU API
This series unifies the kernel-mode FPU API across several architectures by wrapping the existing functions (where needed) in consistently-named functions placed in a consistent header location, with mostly the same semantics: they can be called from preemptible or non-preemptible task context, and are not assumed to be reentrant. Architectures are also expected to provide CFLAGS adjustments for compiling FPU-dependent code. For the moment, SIMD/vector units are out of scope for this common API.

This allows us to remove the ifdeffery and duplicated Makefile logic at each FPU user. It then implements the common API on RISC-V, and converts a couple of users to the new API: the AMDGPU DRM driver, and the FPU self test.

The underlying goal of this series is to allow using newer AMD GPUs (e.g. Navi) on RISC-V boards such as SiFive's HiFive Unmatched. Those GPUs need CONFIG_DRM_AMD_DC_FP to initialize, which requires kernel-mode FPU support.

Previous versions:
v3: https://lore.kernel.org/linux-kernel/20240327200157.1097089-1-samuel.holl...@sifive.com/
v2: https://lore.kernel.org/linux-kernel/20231228014220.3562640-1-samuel.holl...@sifive.com/
v1: https://lore.kernel.org/linux-kernel/20231208055501.2916202-1-samuel.holl...@sifive.com/
v0: https://lore.kernel.org/linux-kernel/20231122030621.3759313-1-samuel.holl...@sifive.com/

Changes in v4:
- Add missed CFLAGS changes for recov_neon_inner.c (fixes arm build failures)
- Fix x86 include guard issue (fixes x86 build failures)

Changes in v3:
- Rebase on v6.9-rc1
- Limit riscv ARCH_HAS_KERNEL_FPU_SUPPORT to 64BIT

Changes in v2:
- Add documentation explaining the build-time and runtime APIs
- Add a linux/fpu.h header for generic isolation enforcement
- Remove file name from header comment
- Clean up arch/arm64/lib/Makefile, like for arch/arm
- Remove RISC-V architecture-specific preprocessor check
- Split altivec removal to a separate patch
- Use linux/fpu.h instead of asm/fpu.h in consumers
- Declare test_fpu() in a header

Michael Ellerman (1): drm/amd/display: Only use hard-float, not altivec on powerpc Samuel Holland (14): arch: Add ARCH_HAS_KERNEL_FPU_SUPPORT ARM: Implement ARCH_HAS_KERNEL_FPU_SUPPORT ARM: crypto: Use CC_FLAGS_FPU for NEON CFLAGS arm64: Implement ARCH_HAS_KERNEL_FPU_SUPPORT arm64: crypto: Use CC_FLAGS_FPU for NEON CFLAGS lib/raid6: Use CC_FLAGS_FPU for NEON CFLAGS LoongArch: Implement ARCH_HAS_KERNEL_FPU_SUPPORT powerpc: Implement ARCH_HAS_KERNEL_FPU_SUPPORT x86/fpu: Fix asm/fpu/types.h include guard x86: Implement ARCH_HAS_KERNEL_FPU_SUPPORT riscv: Add support for kernel-mode FPU drm/amd/display: Use ARCH_HAS_KERNEL_FPU_SUPPORT selftests/fpu: Move FP code to a separate translation unit selftests/fpu: Allow building on other architectures Documentation/core-api/floating-point.rst | 78 +++ Documentation/core-api/index.rst | 1 + Makefile | 5 ++ arch/Kconfig | 6 ++ arch/arm/Kconfig | 1 + arch/arm/Makefile | 7 ++ arch/arm/include/asm/fpu.h| 15 arch/arm/lib/Makefile | 3 +- arch/arm64/Kconfig| 1 + arch/arm64/Makefile | 9 ++- arch/arm64/include/asm/fpu.h | 15 arch/arm64/lib/Makefile | 6 +- arch/loongarch/Kconfig| 1 + arch/loongarch/Makefile | 5 +- arch/loongarch/include/asm/fpu.h | 1 + arch/powerpc/Kconfig | 1 + arch/powerpc/Makefile | 5 +- arch/powerpc/include/asm/fpu.h| 28 +++ arch/riscv/Kconfig| 1 + arch/riscv/Makefile | 3 + arch/riscv/include/asm/fpu.h | 16 arch/riscv/kernel/Makefile| 1 + arch/riscv/kernel/kernel_mode_fpu.c | 28 +++ arch/x86/Kconfig | 1 + arch/x86/Makefile | 20 + arch/x86/include/asm/fpu.h| 13 arch/x86/include/asm/fpu/types.h | 6 +- drivers/gpu/drm/amd/display/Kconfig | 2 +- .../gpu/drm/amd/display/amdgpu_dm/dc_fpu.c| 35 + drivers/gpu/drm/amd/display/dc/dml/Makefile | 36 + drivers/gpu/drm/amd/display/dc/dml2/Makefile | 36 + include/linux/fpu.h | 12 +++ lib/Kconfig.debug | 2 +- lib/Makefile | 26 +-- lib/raid6/Makefile| 33 +++- lib/test_fpu.h| 8 ++ lib/{test_fpu.c => test_fpu_glue.c} | 37 ++--- lib/test_fpu_impl.c | 37 + 38 files chan
[PATCH v4 02/15] ARM: Implement ARCH_HAS_KERNEL_FPU_SUPPORT
ARM provides an equivalent to the common kernel-mode FPU API, but in a different header and using different function names. Add a wrapper header, and export CFLAGS adjustments as found in lib/raid6/Makefile. Reviewed-by: Christoph Hellwig Signed-off-by: Samuel Holland --- (no changes since v2) Changes in v2: - Remove file name from header comment arch/arm/Kconfig | 1 + arch/arm/Makefile | 7 +++ arch/arm/include/asm/fpu.h | 15 +++ 3 files changed, 23 insertions(+) create mode 100644 arch/arm/include/asm/fpu.h diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig index b14aed3a17ab..b1751c2cab87 100644 --- a/arch/arm/Kconfig +++ b/arch/arm/Kconfig @@ -15,6 +15,7 @@ config ARM select ARCH_HAS_FORTIFY_SOURCE select ARCH_HAS_KEEPINITRD select ARCH_HAS_KCOV + select ARCH_HAS_KERNEL_FPU_SUPPORT if KERNEL_MODE_NEON select ARCH_HAS_MEMBARRIER_SYNC_CORE select ARCH_HAS_NON_OVERLAPPING_ADDRESS_SPACE select ARCH_HAS_PTE_SPECIAL if ARM_LPAE diff --git a/arch/arm/Makefile b/arch/arm/Makefile index d82908b1b1bb..71afdd98ddf2 100644 --- a/arch/arm/Makefile +++ b/arch/arm/Makefile @@ -130,6 +130,13 @@ endif # Accept old syntax despite ".syntax unified" AFLAGS_NOWARN :=$(call as-option,-Wa$(comma)-mno-warn-deprecated,-Wa$(comma)-W) +# The GCC option -ffreestanding is required in order to compile code containing +# ARM/NEON intrinsics in a non C99-compliant environment (such as the kernel) +CC_FLAGS_FPU := -ffreestanding +# Enable +CC_FLAGS_FPU += -isystem $(shell $(CC) -print-file-name=include) +CC_FLAGS_FPU += -march=armv7-a -mfloat-abi=softfp -mfpu=neon + ifeq ($(CONFIG_THUMB2_KERNEL),y) CFLAGS_ISA :=-Wa,-mimplicit-it=always $(AFLAGS_NOWARN) AFLAGS_ISA :=$(CFLAGS_ISA) -Wa$(comma)-mthumb diff --git a/arch/arm/include/asm/fpu.h b/arch/arm/include/asm/fpu.h new file mode 100644 index ..2ae50bdce59b --- /dev/null +++ b/arch/arm/include/asm/fpu.h @@ -0,0 +1,15 @@ +/* SPDX-License-Identifier: GPL-2.0-only */ +/* + * Copyright (C) 2023 SiFive + */ + +#ifndef __ASM_FPU_H +#define __ASM_FPU_H + 
+#include + +#define kernel_fpu_available() cpu_has_neon() +#define kernel_fpu_begin() kernel_neon_begin() +#define kernel_fpu_end() kernel_neon_end() + +#endif /* ! __ASM_FPU_H */ -- 2.44.0
[PATCH v4 01/15] arch: Add ARCH_HAS_KERNEL_FPU_SUPPORT
Several architectures provide an API to enable the FPU and run floating-point SIMD code in kernel space. However, the function names, header locations, and semantics are inconsistent across architectures, and FPU support may be gated behind other Kconfig options. Provide a standard way for architectures to declare that kernel space FPU support is available. Architectures selecting this option must implement what is currently the most common API (kernel_fpu_begin() and kernel_fpu_end(), plus a new function kernel_fpu_available()) and provide the appropriate CFLAGS for compiling floating-point C code. Suggested-by: Christoph Hellwig Reviewed-by: Christoph Hellwig Signed-off-by: Samuel Holland --- (no changes since v2) Changes in v2: - Add documentation explaining the build-time and runtime APIs - Add a linux/fpu.h header for generic isolation enforcement Documentation/core-api/floating-point.rst | 78 +++ Documentation/core-api/index.rst | 1 + Makefile | 5 ++ arch/Kconfig | 6 ++ include/linux/fpu.h | 12 5 files changed, 102 insertions(+) create mode 100644 Documentation/core-api/floating-point.rst create mode 100644 include/linux/fpu.h diff --git a/Documentation/core-api/floating-point.rst b/Documentation/core-api/floating-point.rst new file mode 100644 index ..a8d0d4b05052 --- /dev/null +++ b/Documentation/core-api/floating-point.rst @@ -0,0 +1,78 @@ +.. SPDX-License-Identifier: GPL-2.0+ + +Floating-point API +== + +Kernel code is normally prohibited from using floating-point (FP) registers or +instructions, including the C float and double data types. This rule reduces +system call overhead, because the kernel does not need to save and restore the +userspace floating-point register state. + +However, occasionally drivers or library functions may need to include FP code. +This is supported by isolating the functions containing FP code to a separate +translation unit (a separate source file), and saving/restoring the FP register +state around calls to those functions. 
This creates "critical sections" of +floating-point usage. + +The reason for this isolation is to prevent the compiler from generating code +touching the FP registers outside these critical sections. Compilers sometimes +use FP registers to optimize inlined ``memcpy`` or variable assignment, as +floating-point registers may be wider than general-purpose registers. + +Usability of floating-point code within the kernel is architecture-specific. +Additionally, because a single kernel may be configured to support platforms +both with and without a floating-point unit, FPU availability must be checked +both at build time and at run time. + +Several architectures implement the generic kernel floating-point API from +``linux/fpu.h``, as described below. Some other architectures implement their +own unique APIs, which are documented separately. + +Build-time API +-- + +Floating-point code may be built if the option ``ARCH_HAS_KERNEL_FPU_SUPPORT`` +is enabled. For C code, such code must be placed in a separate file, and that +file must have its compilation flags adjusted using the following pattern:: + +CFLAGS_foo.o += $(CC_FLAGS_FPU) +CFLAGS_REMOVE_foo.o += $(CC_FLAGS_NO_FPU) + +Architectures are expected to define one or both of these variables in their +top-level Makefile as needed. For example:: + +CC_FLAGS_FPU := -mhard-float + +or:: + +CC_FLAGS_NO_FPU := -msoft-float + +Normal kernel code is assumed to use the equivalent of ``CC_FLAGS_NO_FPU``. + +Runtime API +--- + +The runtime API is provided in ``linux/fpu.h``. This header cannot be included +from files implementing FP code (those with their compilation flags adjusted as +above). Instead, it must be included when defining the FP critical sections. + +.. c:function:: bool kernel_fpu_available( void ) + +This function reports if floating-point code can be used on this CPU or +platform. 
The value returned by this function is not expected to change +at runtime, so it only needs to be called once, not before every +critical section. + +.. c:function:: void kernel_fpu_begin( void ) +void kernel_fpu_end( void ) + +These functions create a floating-point critical section. It is only +valid to call ``kernel_fpu_begin()`` after a previous call to +``kernel_fpu_available()`` returned ``true``. These functions are only +guaranteed to be callable from (preemptible or non-preemptible) process +context. + +Preemption may be disabled inside critical sections, so their size +should be minimized. They are *not* required to be reentrant. If the +caller expects to nest critical sections, it must implement its own +reference counting. diff --git a/Documentation/core-api/index.rst b/Doc
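The runtime rules in the documentation above (check availability, then bracket the FP call with begin/end) can be sketched roughly as follows. This is a userspace model under stated assumptions, not kernel code: the three kernel_fpu_*() functions here are stand-in stubs that only track state (the real ones come from linux/fpu.h), and compute_scaled() and driver_scale() are hypothetical names chosen for illustration. In a real build, compute_scaled() would live in its own source file compiled with $(CC_FLAGS_FPU).

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Userspace stand-ins for the linux/fpu.h API; they only track state. */
static bool fpu_present = true;	/* pretend the platform has an FPU */
static int fpu_section_depth;	/* 1 while inside a critical section */

static bool kernel_fpu_available(void)
{
	return fpu_present;
}

static void kernel_fpu_begin(void)
{
	assert(fpu_section_depth == 0);	/* sections are not reentrant */
	fpu_section_depth++;
}

static void kernel_fpu_end(void)
{
	assert(fpu_section_depth == 1);
	fpu_section_depth--;
}

/*
 * Hypothetical FP helper. In the kernel this would sit in a separate
 * translation unit built with CC_FLAGS_FPU, and that file would not
 * include linux/fpu.h.
 */
static int compute_scaled(int value)
{
	double scaled = value * 1.5;

	return (int)scaled;
}

/*
 * Hypothetical caller: the "glue" side that defines the critical section.
 * Availability is checked on every call here for simplicity; the
 * documentation notes that checking once is enough.
 */
static int driver_scale(int value, int *result)
{
	if (!kernel_fpu_available())
		return -ENODEV;

	kernel_fpu_begin();
	*result = compute_scaled(value);
	kernel_fpu_end();

	return 0;
}
```

The split between driver_scale() and compute_scaled() mirrors the glue/impl split used for the FPU selftest in this series: only the glue side touches the begin/end API, and only the impl side contains floating-point arithmetic.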
Re: [PATCH v3 12/14] drm/amd/display: Use ARCH_HAS_KERNEL_FPU_SUPPORT
On 2024-03-27 4:25 PM, Andrew Morton wrote: > On Wed, 27 Mar 2024 13:00:43 -0700 Samuel Holland > wrote: > >> Now that all previously-supported architectures select >> ARCH_HAS_KERNEL_FPU_SUPPORT, this code can depend on that symbol instead >> of the existing list of architectures. It can also take advantage of the >> common kernel-mode FPU API and method of adjusting CFLAGS. >> >> ... >> >> @@ -87,16 +78,9 @@ void dc_fpu_begin(const char *function_name, const int >> line) >> WARN_ON_ONCE(!in_task()); >> preempt_disable(); >> depth = __this_cpu_inc_return(fpu_recursion_depth); >> - >> if (depth == 1) { >> -#if defined(CONFIG_X86) || defined(CONFIG_LOONGARCH) >> +BUG_ON(!kernel_fpu_available()); >> kernel_fpu_begin(); > > For some reason kernel_fpu_available() was undefined in my x86_64 > allmodconfig build. I just removed the statement. This is because the include guard in asm/fpu.h conflicts with the existing one in asm/fpu/types.h (which doesn't match its filename), so the definition of kernel_fpu_available() is not seen. I can fix up the include guard in asm/fpu/types.h in the next version: diff --git a/arch/x86/include/asm/fpu/types.h b/arch/x86/include/asm/fpu/types.h index ace9aa3b78a3..75a3910d867a 100644 --- a/arch/x86/include/asm/fpu/types.h +++ b/arch/x86/include/asm/fpu/types.h @@ -2,8 +2,8 @@ /* * FPU data structures: */ -#ifndef _ASM_X86_FPU_H -#define _ASM_X86_FPU_H +#ifndef _ASM_X86_FPU_TYPES_H +#define _ASM_X86_FPU_TYPES_H #include @@ -596,4 +596,4 @@ struct fpu_state_config { /* FPU state configuration information */ extern struct fpu_state_config fpu_kernel_cfg, fpu_user_cfg; -#endif /* _ASM_X86_FPU_H */ +#endif /* _ASM_X86_FPU_TYPES_H */ Regards, Samuel
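The include guard collision described in this reply can be reproduced in a single file. The sketch below is only a model: both "headers" are inlined, struct fpu_state_config_model is a made-up placeholder for the contents of asm/fpu/types.h, the body of the second header is an assumption about what asm/fpu.h defines, and saw_kernel_fpu_available() is a hypothetical helper. Only the shared guard name _ASM_X86_FPU_H is taken from the message.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in for asm/fpu/types.h with its old, misnamed guard. */
#ifndef _ASM_X86_FPU_H
#define _ASM_X86_FPU_H
struct fpu_state_config_model {
	unsigned int max_size;
};
#endif /* _ASM_X86_FPU_H */

/* Stand-in for the new asm/fpu.h, which uses the same guard name. */
#ifndef _ASM_X86_FPU_H
#define _ASM_X86_FPU_H
#define kernel_fpu_available() true	/* assumed definition, for illustration */
#endif /* _ASM_X86_FPU_H */

/* Reports whether the second header's macro survived preprocessing. */
static bool saw_kernel_fpu_available(void)
{
#ifdef kernel_fpu_available
	return true;
#else
	return false;	/* guard already defined: second header was skipped */
#endif
}
```

A caller can check the effect with assert(!saw_kernel_fpu_available()). Because the first block already defined the guard, the preprocessor drops the entire second block, which matches the "undefined" symptom reported above. Renaming the first guard to _ASM_X86_FPU_TYPES_H, as the diff in the reply does, lets both blocks through.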
[PATCH v3 14/14] selftests/fpu: Allow building on other architectures
Now that ARCH_HAS_KERNEL_FPU_SUPPORT provides a common way to compile and run floating-point code, this test is no longer x86-specific. Reviewed-by: Christoph Hellwig Signed-off-by: Samuel Holland --- (no changes since v1) lib/Kconfig.debug | 2 +- lib/Makefile| 25 ++--- lib/test_fpu_glue.c | 5 - 3 files changed, 7 insertions(+), 25 deletions(-) diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug index c63a5fbf1f1c..f93e778e0405 100644 --- a/lib/Kconfig.debug +++ b/lib/Kconfig.debug @@ -2890,7 +2890,7 @@ config TEST_FREE_PAGES config TEST_FPU tristate "Test floating point operations in kernel space" - depends on X86 && !KCOV_INSTRUMENT_ALL + depends on ARCH_HAS_KERNEL_FPU_SUPPORT && !KCOV_INSTRUMENT_ALL help Enable this option to add /sys/kernel/debug/selftest_helpers/test_fpu which will trigger a sequence of floating point operations. This is used diff --git a/lib/Makefile b/lib/Makefile index fcb35bf50979..e44ad11f77b5 100644 --- a/lib/Makefile +++ b/lib/Makefile @@ -110,31 +110,10 @@ CFLAGS_test_fprobe.o += $(CC_FLAGS_FTRACE) obj-$(CONFIG_FPROBE_SANITY_TEST) += test_fprobe.o obj-$(CONFIG_TEST_OBJPOOL) += test_objpool.o -# -# CFLAGS for compiling floating point code inside the kernel. x86/Makefile turns -# off the generation of FPU/SSE* instructions for kernel proper but FPU_FLAGS -# get appended last to CFLAGS and thus override those previous compiler options. -# -FPU_CFLAGS := -msse -msse2 -ifdef CONFIG_CC_IS_GCC -# Stack alignment mismatch, proceed with caution. -# GCC < 7.1 cannot compile code using `double` and -mpreferred-stack-boundary=3 -# (8B stack alignment). -# See https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53383 -# -# The "-msse" in the first argument is there so that the -# -mpreferred-stack-boundary=3 build error: -# -# -mpreferred-stack-boundary=3 is not between 4 and 12 -# -# can be triggered. Otherwise gcc doesn't complain. 
-FPU_CFLAGS += -mhard-float -FPU_CFLAGS += $(call cc-option,-msse -mpreferred-stack-boundary=3,-mpreferred-stack-boundary=4) -endif - obj-$(CONFIG_TEST_FPU) += test_fpu.o test_fpu-y := test_fpu_glue.o test_fpu_impl.o -CFLAGS_test_fpu_impl.o += $(FPU_CFLAGS) +CFLAGS_test_fpu_impl.o += $(CC_FLAGS_FPU) +CFLAGS_REMOVE_test_fpu_impl.o += $(CC_FLAGS_NO_FPU) # Some KUnit files (hooks.o) need to be built-in even when KUnit is a module, # so we can't just use obj-$(CONFIG_KUNIT). diff --git a/lib/test_fpu_glue.c b/lib/test_fpu_glue.c index 85963d7be826..eef282a2715f 100644 --- a/lib/test_fpu_glue.c +++ b/lib/test_fpu_glue.c @@ -17,7 +17,7 @@ #include #include #include -#include +#include #include "test_fpu.h" @@ -38,6 +38,9 @@ static struct dentry *selftest_dir; static int __init test_fpu_init(void) { + if (!kernel_fpu_available()) + return -EINVAL; + selftest_dir = debugfs_create_dir("selftest_helpers", NULL); if (!selftest_dir) return -ENOMEM; -- 2.43.1
[PATCH v3 13/14] selftests/fpu: Move FP code to a separate translation unit
This ensures no compiler-generated floating-point code can appear outside kernel_fpu_{begin,end}() sections, and some architectures enforce this separation. Reviewed-by: Christoph Hellwig Signed-off-by: Samuel Holland --- (no changes since v2) Changes in v2: - Declare test_fpu() in a header lib/Makefile| 3 ++- lib/test_fpu.h | 8 +++ lib/{test_fpu.c => test_fpu_glue.c} | 32 + lib/test_fpu_impl.c | 37 + 4 files changed, 48 insertions(+), 32 deletions(-) create mode 100644 lib/test_fpu.h rename lib/{test_fpu.c => test_fpu_glue.c} (71%) create mode 100644 lib/test_fpu_impl.c diff --git a/lib/Makefile b/lib/Makefile index ffc6b2341b45..fcb35bf50979 100644 --- a/lib/Makefile +++ b/lib/Makefile @@ -133,7 +133,8 @@ FPU_CFLAGS += $(call cc-option,-msse -mpreferred-stack-boundary=3,-mpreferred-st endif obj-$(CONFIG_TEST_FPU) += test_fpu.o -CFLAGS_test_fpu.o += $(FPU_CFLAGS) +test_fpu-y := test_fpu_glue.o test_fpu_impl.o +CFLAGS_test_fpu_impl.o += $(FPU_CFLAGS) # Some KUnit files (hooks.o) need to be built-in even when KUnit is a module, # so we can't just use obj-$(CONFIG_KUNIT). diff --git a/lib/test_fpu.h b/lib/test_fpu.h new file mode 100644 index ..4459807084bc --- /dev/null +++ b/lib/test_fpu.h @@ -0,0 +1,8 @@ +/* SPDX-License-Identifier: GPL-2.0+ */ + +#ifndef _LIB_TEST_FPU_H +#define _LIB_TEST_FPU_H + +int test_fpu(void); + +#endif diff --git a/lib/test_fpu.c b/lib/test_fpu_glue.c similarity index 71% rename from lib/test_fpu.c rename to lib/test_fpu_glue.c index e82db19fed84..85963d7be826 100644 --- a/lib/test_fpu.c +++ b/lib/test_fpu_glue.c @@ -19,37 +19,7 @@ #include #include -static int test_fpu(void) -{ - /* -* This sequence of operations tests that rounding mode is -* to nearest and that denormal numbers are supported. -* Volatile variables are used to avoid compiler optimizing -* the calculations away. 
-*/ - volatile double a, b, c, d, e, f, g; - - a = 4.0; - b = 1e-15; - c = 1e-310; - - /* Sets precision flag */ - d = a + b; - - /* Result depends on rounding mode */ - e = a + b / 2; - - /* Denormal and very large values */ - f = b / c; - - /* Depends on denormal support */ - g = a + c * f; - - if (d > a && e > a && g > a) - return 0; - else - return -EINVAL; -} +#include "test_fpu.h" static int test_fpu_get(void *data, u64 *val) { diff --git a/lib/test_fpu_impl.c b/lib/test_fpu_impl.c new file mode 100644 index ..777894dbbe86 --- /dev/null +++ b/lib/test_fpu_impl.c @@ -0,0 +1,37 @@ +// SPDX-License-Identifier: GPL-2.0+ + +#include + +#include "test_fpu.h" + +int test_fpu(void) +{ + /* +* This sequence of operations tests that rounding mode is +* to nearest and that denormal numbers are supported. +* Volatile variables are used to avoid compiler optimizing +* the calculations away. +*/ + volatile double a, b, c, d, e, f, g; + + a = 4.0; + b = 1e-15; + c = 1e-310; + + /* Sets precision flag */ + d = a + b; + + /* Result depends on rounding mode */ + e = a + b / 2; + + /* Denormal and very large values */ + f = b / c; + + /* Depends on denormal support */ + g = a + c * f; + + if (d > a && e > a && g > a) + return 0; + else + return -EINVAL; +} -- 2.43.1
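For readers who want to see what the three comparisons in test_fpu() check, the same arithmetic can be built as an ordinary userspace function. This is a sketch, assuming IEEE 754 doubles with the default round-to-nearest mode and denormals enabled; test_fpu_userspace() is a hypothetical name, and the comments are one reading of the values rather than part of the patch.

```c
#include <assert.h>
#include <errno.h>

/* Same arithmetic as test_fpu() in lib/test_fpu_impl.c, for userspace. */
static int test_fpu_userspace(void)
{
	/* volatile keeps the compiler from folding the calculations away */
	volatile double a, b, c, d, e, f, g;

	a = 4.0;
	b = 1e-15;	/* slightly more than one ulp of 4.0 (about 8.9e-16) */
	c = 1e-310;	/* below the smallest normal double, so a denormal */

	/* Inexact sum: rounds to the next double above 4.0 */
	d = a + b;

	/* b / 2 is just over half an ulp, so only round-to-nearest rounds up */
	e = a + b / 2;

	/* Finite only if c was not flushed to zero */
	f = b / c;

	/* c * f is close to b again when denormals work */
	g = a + c * f;

	if (d > a && e > a && g > a)
		return 0;

	return -EINVAL;
}
```

Calling it as assert(test_fpu_userspace() == 0) from a small main() should pass on a typical hosted build without fast-math style options. If denormals were flushed to zero, f would become infinite and c * f would not compare greater than zero, so the last comparison would fail.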
[PATCH v3 12/14] drm/amd/display: Use ARCH_HAS_KERNEL_FPU_SUPPORT
Now that all previously-supported architectures select ARCH_HAS_KERNEL_FPU_SUPPORT, this code can depend on that symbol instead of the existing list of architectures. It can also take advantage of the common kernel-mode FPU API and method of adjusting CFLAGS. Acked-by: Alex Deucher Reviewed-by: Christoph Hellwig Signed-off-by: Samuel Holland --- (no changes since v2) Changes in v2: - Split altivec removal to a separate patch - Use linux/fpu.h instead of asm/fpu.h in consumers drivers/gpu/drm/amd/display/Kconfig | 2 +- .../gpu/drm/amd/display/amdgpu_dm/dc_fpu.c| 27 ++ drivers/gpu/drm/amd/display/dc/dml/Makefile | 36 ++- drivers/gpu/drm/amd/display/dc/dml2/Makefile | 36 ++- 4 files changed, 7 insertions(+), 94 deletions(-) diff --git a/drivers/gpu/drm/amd/display/Kconfig b/drivers/gpu/drm/amd/display/Kconfig index 901d1961b739..5fcd4f778dc3 100644 --- a/drivers/gpu/drm/amd/display/Kconfig +++ b/drivers/gpu/drm/amd/display/Kconfig @@ -8,7 +8,7 @@ config DRM_AMD_DC depends on BROKEN || !CC_IS_CLANG || ARM64 || RISCV || SPARC64 || X86_64 select SND_HDA_COMPONENT if SND_HDA_CORE # !CC_IS_CLANG: https://github.com/ClangBuiltLinux/linux/issues/1752 - select DRM_AMD_DC_FP if (X86 || LOONGARCH || (PPC64 && ALTIVEC) || (ARM64 && KERNEL_MODE_NEON && !CC_IS_CLANG)) + select DRM_AMD_DC_FP if ARCH_HAS_KERNEL_FPU_SUPPORT && (!ARM64 || !CC_IS_CLANG) help Choose this option if you want to use the new display engine support for AMDGPU. 
This adds required support for Vega and diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/dc_fpu.c b/drivers/gpu/drm/amd/display/amdgpu_dm/dc_fpu.c index 0de16796466b..e46f8ce41d87 100644 --- a/drivers/gpu/drm/amd/display/amdgpu_dm/dc_fpu.c +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/dc_fpu.c @@ -26,16 +26,7 @@ #include "dc_trace.h" -#if defined(CONFIG_X86) -#include -#elif defined(CONFIG_PPC64) -#include -#include -#elif defined(CONFIG_ARM64) -#include -#elif defined(CONFIG_LOONGARCH) -#include -#endif +#include /** * DOC: DC FPU manipulation overview @@ -87,16 +78,9 @@ void dc_fpu_begin(const char *function_name, const int line) WARN_ON_ONCE(!in_task()); preempt_disable(); depth = __this_cpu_inc_return(fpu_recursion_depth); - if (depth == 1) { -#if defined(CONFIG_X86) || defined(CONFIG_LOONGARCH) + BUG_ON(!kernel_fpu_available()); kernel_fpu_begin(); -#elif defined(CONFIG_PPC64) - if (!cpu_has_feature(CPU_FTR_FPU_UNAVAILABLE)) - enable_kernel_fp(); -#elif defined(CONFIG_ARM64) - kernel_neon_begin(); -#endif } TRACE_DCN_FPU(true, function_name, line, depth); @@ -118,14 +102,7 @@ void dc_fpu_end(const char *function_name, const int line) depth = __this_cpu_dec_return(fpu_recursion_depth); if (depth == 0) { -#if defined(CONFIG_X86) || defined(CONFIG_LOONGARCH) kernel_fpu_end(); -#elif defined(CONFIG_PPC64) - if (!cpu_has_feature(CPU_FTR_FPU_UNAVAILABLE)) - disable_kernel_fp(); -#elif defined(CONFIG_ARM64) - kernel_neon_end(); -#endif } else { WARN_ON_ONCE(depth < 0); } diff --git a/drivers/gpu/drm/amd/display/dc/dml/Makefile b/drivers/gpu/drm/amd/display/dc/dml/Makefile index 59d3972341d2..a94b6d546cd1 100644 --- a/drivers/gpu/drm/amd/display/dc/dml/Makefile +++ b/drivers/gpu/drm/amd/display/dc/dml/Makefile @@ -25,40 +25,8 @@ # It provides the general basic services required by other DAL # subcomponents. 
-ifdef CONFIG_X86 -dml_ccflags-$(CONFIG_CC_IS_GCC) := -mhard-float -dml_ccflags := $(dml_ccflags-y) -msse -endif - -ifdef CONFIG_PPC64 -dml_ccflags := -mhard-float -endif - -ifdef CONFIG_ARM64 -dml_rcflags := -mgeneral-regs-only -endif - -ifdef CONFIG_LOONGARCH -dml_ccflags := -mfpu=64 -dml_rcflags := -msoft-float -endif - -ifdef CONFIG_CC_IS_GCC -ifneq ($(call gcc-min-version, 70100),y) -IS_OLD_GCC = 1 -endif -endif - -ifdef CONFIG_X86 -ifdef IS_OLD_GCC -# Stack alignment mismatch, proceed with caution. -# GCC < 7.1 cannot compile code using `double` and -mpreferred-stack-boundary=3 -# (8B stack alignment). -dml_ccflags += -mpreferred-stack-boundary=4 -else -dml_ccflags += -msse2 -endif -endif +dml_ccflags := $(CC_FLAGS_FPU) +dml_rcflags := $(CC_FLAGS_NO_FPU) ifneq ($(CONFIG_FRAME_WARN),0) ifeq ($(filter y,$(CONFIG_KASAN)$(CONFIG_KCSAN)),y) diff --git a/drivers/gpu/drm/amd/display/dc/dml2/Makefile b/drivers/gpu/drm/amd/display/dc/dml2/Makefile index 7b51364084b5..4f6c804a26ad 100644 --- a/drivers/gpu/drm/amd/display/dc/dml2/Makefile +++ b/drivers/gpu/drm/amd/display/dc/dml2/Makefile @@ -24,40 +24,8 @@ # # Makefile for dml2. -ifdef CONFIG_X86 -dml2_ccflags-$(CONFIG_CC_IS_GCC) := -mhard-float -dml2_ccflags := $(dml2_ccflags-y)
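Since the common API is not required to be reentrant, dc_fpu_begin() and dc_fpu_end() keep their own recursion depth and only call into the kernel at the outermost level. A stripped-down userspace model of that reference counting might look like this. The stubs, the call counters, and the nested_fpu_*() names are assumptions for illustration; the real driver uses a per-CPU counter and disables preemption around it, which this sketch omits.

```c
#include <assert.h>

/* Stub API: counts how often the "hardware" begin/end paths run. */
static int hw_begin_calls;
static int hw_end_calls;

static void kernel_fpu_begin(void)
{
	hw_begin_calls++;
}

static void kernel_fpu_end(void)
{
	hw_end_calls++;
}

/* Per-CPU in the real driver; a single global is enough for the model. */
static int fpu_recursion_depth;

/* Enter a possibly nested section; only the outermost entry enables the FPU. */
static void nested_fpu_begin(void)
{
	if (++fpu_recursion_depth == 1)
		kernel_fpu_begin();
}

/* Leave a section; only the outermost exit releases the FPU. */
static void nested_fpu_end(void)
{
	assert(fpu_recursion_depth > 0);	/* unbalanced end is a caller bug */
	if (--fpu_recursion_depth == 0)
		kernel_fpu_end();
}
```

With two nested begin calls followed by two end calls, the stub kernel_fpu_begin() and kernel_fpu_end() each run exactly once, which is the behaviour the depth == 1 and depth == 0 checks in dc_fpu.c are there to guarantee.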
[PATCH v3 11/14] drm/amd/display: Only use hard-float, not altivec on powerpc
From: Michael Ellerman

The compiler flags enable altivec, but that is not required; hard-float
is sufficient for the code to build and function.

Drop altivec from the compiler flags and adjust the enable/disable code
to only enable FPU use.

Signed-off-by: Michael Ellerman
Acked-by: Alex Deucher
Signed-off-by: Samuel Holland
---

(no changes since v2)

Changes in v2:
 - New patch for v2

 drivers/gpu/drm/amd/display/amdgpu_dm/dc_fpu.c | 12 ++--
 drivers/gpu/drm/amd/display/dc/dml/Makefile    |  2 +-
 drivers/gpu/drm/amd/display/dc/dml2/Makefile   |  2 +-
 3 files changed, 4 insertions(+), 12 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/dc_fpu.c b/drivers/gpu/drm/amd/display/amdgpu_dm/dc_fpu.c
index 4ae4720535a5..0de16796466b 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/dc_fpu.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/dc_fpu.c
@@ -92,11 +92,7 @@ void dc_fpu_begin(const char *function_name, const int line)
 #if defined(CONFIG_X86) || defined(CONFIG_LOONGARCH)
 		kernel_fpu_begin();
 #elif defined(CONFIG_PPC64)
-		if (cpu_has_feature(CPU_FTR_VSX_COMP))
-			enable_kernel_vsx();
-		else if (cpu_has_feature(CPU_FTR_ALTIVEC_COMP))
-			enable_kernel_altivec();
-		else if (!cpu_has_feature(CPU_FTR_FPU_UNAVAILABLE))
+		if (!cpu_has_feature(CPU_FTR_FPU_UNAVAILABLE))
 			enable_kernel_fp();
 #elif defined(CONFIG_ARM64)
 		kernel_neon_begin();
@@ -125,11 +121,7 @@ void dc_fpu_end(const char *function_name, const int line)
 #if defined(CONFIG_X86) || defined(CONFIG_LOONGARCH)
 		kernel_fpu_end();
 #elif defined(CONFIG_PPC64)
-		if (cpu_has_feature(CPU_FTR_VSX_COMP))
-			disable_kernel_vsx();
-		else if (cpu_has_feature(CPU_FTR_ALTIVEC_COMP))
-			disable_kernel_altivec();
-		else if (!cpu_has_feature(CPU_FTR_FPU_UNAVAILABLE))
+		if (!cpu_has_feature(CPU_FTR_FPU_UNAVAILABLE))
 			disable_kernel_fp();
 #elif defined(CONFIG_ARM64)
 		kernel_neon_end();
diff --git a/drivers/gpu/drm/amd/display/dc/dml/Makefile b/drivers/gpu/drm/amd/display/dc/dml/Makefile
index c4a5efd2dda5..59d3972341d2 100644
--- a/drivers/gpu/drm/amd/display/dc/dml/Makefile
+++ b/drivers/gpu/drm/amd/display/dc/dml/Makefile
@@ -31,7 +31,7 @@ dml_ccflags := $(dml_ccflags-y) -msse
 endif
 
 ifdef CONFIG_PPC64
-dml_ccflags := -mhard-float -maltivec
+dml_ccflags := -mhard-float
 endif
 
 ifdef CONFIG_ARM64
diff --git a/drivers/gpu/drm/amd/display/dc/dml2/Makefile b/drivers/gpu/drm/amd/display/dc/dml2/Makefile
index acff3449b8d7..7b51364084b5 100644
--- a/drivers/gpu/drm/amd/display/dc/dml2/Makefile
+++ b/drivers/gpu/drm/amd/display/dc/dml2/Makefile
@@ -30,7 +30,7 @@ dml2_ccflags := $(dml2_ccflags-y) -msse
 endif
 
 ifdef CONFIG_PPC64
-dml2_ccflags := -mhard-float -maltivec
+dml2_ccflags := -mhard-float
 endif
 
 ifdef CONFIG_ARM64
-- 
2.43.1
[PATCH v3 10/14] riscv: Add support for kernel-mode FPU
This is motivated by the amdgpu DRM driver, which needs floating-point
code to support recent hardware. That code is not performance-critical,
so only provide a minimal non-preemptible implementation for now.

Support is limited to riscv64 because riscv32 requires runtime (libgcc)
assistance to convert between doubles and 64-bit integers.

Acked-by: Palmer Dabbelt
Reviewed-by: Palmer Dabbelt
Reviewed-by: Christoph Hellwig
Signed-off-by: Samuel Holland
---

Changes in v3:
 - Rebase on v6.9-rc1
 - Limit ARCH_HAS_KERNEL_FPU_SUPPORT to 64BIT

Changes in v2:
 - Remove RISC-V architecture-specific preprocessor check

 arch/riscv/Kconfig                  |  1 +
 arch/riscv/Makefile                 |  3 +++
 arch/riscv/include/asm/fpu.h        | 16
 arch/riscv/kernel/Makefile          |  1 +
 arch/riscv/kernel/kernel_mode_fpu.c | 28
 5 files changed, 49 insertions(+)
 create mode 100644 arch/riscv/include/asm/fpu.h
 create mode 100644 arch/riscv/kernel/kernel_mode_fpu.c

diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
index be09c8836d56..3bcd0d250810 100644
--- a/arch/riscv/Kconfig
+++ b/arch/riscv/Kconfig
@@ -27,6 +27,7 @@ config RISCV
 	select ARCH_HAS_GCOV_PROFILE_ALL
 	select ARCH_HAS_GIGANTIC_PAGE
 	select ARCH_HAS_KCOV
+	select ARCH_HAS_KERNEL_FPU_SUPPORT if 64BIT && FPU
 	select ARCH_HAS_MEMBARRIER_CALLBACKS
 	select ARCH_HAS_MEMBARRIER_SYNC_CORE
 	select ARCH_HAS_MMIOWB
diff --git a/arch/riscv/Makefile b/arch/riscv/Makefile
index 252d63942f34..76ff4033c854 100644
--- a/arch/riscv/Makefile
+++ b/arch/riscv/Makefile
@@ -84,6 +84,9 @@ KBUILD_CFLAGS += -march=$(shell echo $(riscv-march-y) | sed -E 's/(rv32ima|rv64i
 
 KBUILD_AFLAGS += -march=$(riscv-march-y)
 
+# For C code built with floating-point support, exclude V but keep F and D.
+CC_FLAGS_FPU  := -march=$(shell echo $(riscv-march-y) | sed -E 's/(rv32ima|rv64ima)([^v_]*)v?/\1\2/')
+
 KBUILD_CFLAGS += -mno-save-restore
 KBUILD_CFLAGS += -DCONFIG_PAGE_OFFSET=$(CONFIG_PAGE_OFFSET)
 
diff --git a/arch/riscv/include/asm/fpu.h b/arch/riscv/include/asm/fpu.h
new file mode 100644
index ..91c04c244e12
--- /dev/null
+++ b/arch/riscv/include/asm/fpu.h
@@ -0,0 +1,16 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Copyright (C) 2023 SiFive
+ */
+
+#ifndef _ASM_RISCV_FPU_H
+#define _ASM_RISCV_FPU_H
+
+#include
+
+#define kernel_fpu_available()	has_fpu()
+
+void kernel_fpu_begin(void);
+void kernel_fpu_end(void);
+
+#endif /* ! _ASM_RISCV_FPU_H */
diff --git a/arch/riscv/kernel/Makefile b/arch/riscv/kernel/Makefile
index 81d94a8ee10f..5b243d46f4b1 100644
--- a/arch/riscv/kernel/Makefile
+++ b/arch/riscv/kernel/Makefile
@@ -67,6 +67,7 @@ obj-$(CONFIG_RISCV_MISALIGNED)	+= unaligned_access_speed.o
 obj-$(CONFIG_RISCV_PROBE_UNALIGNED_ACCESS)	+= copy-unaligned.o
 
 obj-$(CONFIG_FPU)		+= fpu.o
+obj-$(CONFIG_FPU)		+= kernel_mode_fpu.o
 obj-$(CONFIG_RISCV_ISA_V)	+= vector.o
 obj-$(CONFIG_RISCV_ISA_V)	+= kernel_mode_vector.o
 obj-$(CONFIG_SMP)		+= smpboot.o
diff --git a/arch/riscv/kernel/kernel_mode_fpu.c b/arch/riscv/kernel/kernel_mode_fpu.c
new file mode 100644
index ..0ac8348876c4
--- /dev/null
+++ b/arch/riscv/kernel/kernel_mode_fpu.c
@@ -0,0 +1,28 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (C) 2023 SiFive
+ */
+
+#include
+#include
+
+#include
+#include
+#include
+#include
+
+void kernel_fpu_begin(void)
+{
+	preempt_disable();
+	fstate_save(current, task_pt_regs(current));
+	csr_set(CSR_SSTATUS, SR_FS);
+}
+EXPORT_SYMBOL_GPL(kernel_fpu_begin);
+
+void kernel_fpu_end(void)
+{
+	csr_clear(CSR_SSTATUS, SR_FS);
+	fstate_restore(current, task_pt_regs(current));
+	preempt_enable();
+}
+EXPORT_SYMBOL_GPL(kernel_fpu_end);
-- 
2.43.1
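To make the effect of the sed expression behind CC_FLAGS_FPU concrete, the
following userspace C sketch mimics the same substitution on a -march string:
keep the base ISA and the single-letter extensions, drop a "v" if one follows
them, and leave everything after it alone. The helper name strip_vector_ext()
and the sample strings are illustrative assumptions, not part of the patch or
of any kernel API.

```c
#include <stdio.h>
#include <string.h>

/*
 * Userspace illustration only. Mimics:
 *   sed -E 's/(rv32ima|rv64ima)([^v_]*)v?/\1\2/'
 * strip_vector_ext() is a hypothetical helper, not a kernel function.
 * The result is written to out (at most out_len bytes) and out is returned.
 */
static char *strip_vector_ext(const char *march, char *out, size_t out_len)
{
	const char *a = strstr(march, "rv32ima");
	const char *b = strstr(march, "rv64ima");
	/* sed replaces the leftmost match, so pick the earlier hit. */
	const char *base = (a && (!b || a < b)) ? a : b;
	const char *p;
	size_t kept;

	if (!base) {
		/* No match: sed leaves the input unchanged. */
		snprintf(out, out_len, "%s", march);
		return out;
	}

	/* \1\2: the base ISA plus every character up to the first 'v' or '_'. */
	p = base + strlen("rv32ima");
	while (*p && *p != 'v' && *p != '_')
		p++;
	kept = (size_t)(p - march);

	/* v?: drop one 'v' if it is the next character. */
	if (*p == 'v')
		p++;

	snprintf(out, out_len, "%.*s%s", (int)kept, march, p);
	return out;
}
```

With this sketch, "rv64imafdcv_zicsr_zifencei" becomes
"rv64imafdc_zicsr_zifencei", while a string without the vector extension such
as "rv64imafdc" is returned unchanged.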
[PATCH v3 09/14] x86: Implement ARCH_HAS_KERNEL_FPU_SUPPORT
x86 already provides kernel_fpu_begin() and kernel_fpu_end(), but in a
different header. Add a wrapper header, and export the CFLAGS
adjustments as found in lib/Makefile.

Reviewed-by: Christoph Hellwig
Signed-off-by: Samuel Holland
---

(no changes since v1)

 arch/x86/Kconfig           |  1 +
 arch/x86/Makefile          | 20
 arch/x86/include/asm/fpu.h | 13 +
 3 files changed, 34 insertions(+)
 create mode 100644 arch/x86/include/asm/fpu.h

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 39886bab943a..7c9d032ee675 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -83,6 +83,7 @@ config X86
 	select ARCH_HAS_FORTIFY_SOURCE
 	select ARCH_HAS_GCOV_PROFILE_ALL
 	select ARCH_HAS_KCOV			if X86_64
+	select ARCH_HAS_KERNEL_FPU_SUPPORT
 	select ARCH_HAS_MEM_ENCRYPT
 	select ARCH_HAS_MEMBARRIER_SYNC_CORE
 	select ARCH_HAS_NMI_SAFE_THIS_CPU_OPS
diff --git a/arch/x86/Makefile b/arch/x86/Makefile
index 662d9d4033e6..5a5f5999c505 100644
--- a/arch/x86/Makefile
+++ b/arch/x86/Makefile
@@ -74,6 +74,26 @@ KBUILD_CFLAGS += -mno-sse -mno-mmx -mno-sse2 -mno-3dnow -mno-avx
 KBUILD_RUSTFLAGS += --target=$(objtree)/scripts/target.json
 KBUILD_RUSTFLAGS += -Ctarget-feature=-sse,-sse2,-sse3,-ssse3,-sse4.1,-sse4.2,-avx,-avx2
 
+#
+# CFLAGS for compiling floating point code inside the kernel.
+#
+CC_FLAGS_FPU := -msse -msse2
+ifdef CONFIG_CC_IS_GCC
+# Stack alignment mismatch, proceed with caution.
+# GCC < 7.1 cannot compile code using `double` and -mpreferred-stack-boundary=3
+# (8B stack alignment).
+# See https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53383
+#
+# The "-msse" in the first argument is there so that the
+# -mpreferred-stack-boundary=3 build error:
+#
+#  -mpreferred-stack-boundary=3 is not between 4 and 12
+#
+# can be triggered. Otherwise gcc doesn't complain.
+CC_FLAGS_FPU += -mhard-float
+CC_FLAGS_FPU += $(call cc-option,-msse -mpreferred-stack-boundary=3,-mpreferred-stack-boundary=4)
+endif
+
 ifeq ($(CONFIG_X86_KERNEL_IBT),y)
 #
 # Kernel IBT has S_CET.NOTRACK_EN=0, as such the compilers must not generate
diff --git a/arch/x86/include/asm/fpu.h b/arch/x86/include/asm/fpu.h
new file mode 100644
index ..b2743fe19339
--- /dev/null
+++ b/arch/x86/include/asm/fpu.h
@@ -0,0 +1,13 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Copyright (C) 2023 SiFive
+ */
+
+#ifndef _ASM_X86_FPU_H
+#define _ASM_X86_FPU_H
+
+#include
+
+#define kernel_fpu_available()	true
+
+#endif /* ! _ASM_X86_FPU_H */
-- 
2.43.1
[PATCH v3 08/14] powerpc: Implement ARCH_HAS_KERNEL_FPU_SUPPORT
PowerPC provides an equivalent to the common kernel-mode FPU API, but in
a different header and using different function names. The PowerPC API
also requires a non-preemptible context. Add a wrapper header, and
export the CFLAGS adjustments.

Acked-by: Michael Ellerman (powerpc)
Reviewed-by: Christoph Hellwig
Signed-off-by: Samuel Holland
---

(no changes since v1)

 arch/powerpc/Kconfig           |  1 +
 arch/powerpc/Makefile          |  5 -
 arch/powerpc/include/asm/fpu.h | 28
 3 files changed, 33 insertions(+), 1 deletion(-)
 create mode 100644 arch/powerpc/include/asm/fpu.h

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 1c4be3373686..c42a57b6839d 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -137,6 +137,7 @@ config PPC
 	select ARCH_HAS_GCOV_PROFILE_ALL
 	select ARCH_HAS_HUGEPD			if HUGETLB_PAGE
 	select ARCH_HAS_KCOV
+	select ARCH_HAS_KERNEL_FPU_SUPPORT	if PPC_FPU
 	select ARCH_HAS_MEMBARRIER_CALLBACKS
 	select ARCH_HAS_MEMBARRIER_SYNC_CORE
 	select ARCH_HAS_MEMREMAP_COMPAT_ALIGN	if PPC_64S_HASH_MMU
diff --git a/arch/powerpc/Makefile b/arch/powerpc/Makefile
index 65261cbe5bfd..93d89f055b70 100644
--- a/arch/powerpc/Makefile
+++ b/arch/powerpc/Makefile
@@ -153,6 +153,9 @@ CFLAGS-$(CONFIG_PPC32)	+= $(call cc-option, $(MULTIPLEWORD))
 
 CFLAGS-$(CONFIG_PPC32)	+= $(call cc-option,-mno-readonly-in-sdata)
 
+CC_FLAGS_FPU		:= $(call cc-option,-mhard-float)
+CC_FLAGS_NO_FPU		:= $(call cc-option,-msoft-float)
+
 ifdef CONFIG_FUNCTION_TRACER
 ifdef CONFIG_ARCH_USING_PATCHABLE_FUNCTION_ENTRY
 KBUILD_CPPFLAGS	+= -DCC_USING_PATCHABLE_FUNCTION_ENTRY
@@ -174,7 +177,7 @@ asinstr := $(call as-instr,lis 9$(comma)foo@high,-DHAVE_AS_ATHIGH=1)
 
 KBUILD_CPPFLAGS	+= -I $(srctree)/arch/powerpc $(asinstr)
 KBUILD_AFLAGS	+= $(AFLAGS-y)
-KBUILD_CFLAGS	+= $(call cc-option,-msoft-float)
+KBUILD_CFLAGS	+= $(CC_FLAGS_NO_FPU)
 KBUILD_CFLAGS	+= $(CFLAGS-y)
 CPP		= $(CC) -E $(KBUILD_CFLAGS)
 
diff --git a/arch/powerpc/include/asm/fpu.h b/arch/powerpc/include/asm/fpu.h
new file mode 100644
index ..ca584e4bc40f
--- /dev/null
+++ b/arch/powerpc/include/asm/fpu.h
@@ -0,0 +1,28 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Copyright (C) 2023 SiFive
+ */
+
+#ifndef _ASM_POWERPC_FPU_H
+#define _ASM_POWERPC_FPU_H
+
+#include
+
+#include
+#include
+
+#define kernel_fpu_available()	(!cpu_has_feature(CPU_FTR_FPU_UNAVAILABLE))
+
+static inline void kernel_fpu_begin(void)
+{
+	preempt_disable();
+	enable_kernel_fp();
+}
+
+static inline void kernel_fpu_end(void)
+{
+	disable_kernel_fp();
+	preempt_enable();
+}
+
+#endif /* ! _ASM_POWERPC_FPU_H */
-- 
2.43.1
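The point of the PowerPC wrappers is the bracketing order: preemption is
disabled before the FPU is enabled, and re-enabled only after the FPU has been
disabled again. The userspace sketch below shows that ordering with stand-in
functions that merely record their names. The stubs and the trace buffer are
assumptions made for illustration; only the bodies of kernel_fpu_begin() and
kernel_fpu_end() follow the patch.

```c
#include <string.h>

/* Userspace stand-ins for the kernel primitives; they only record call order. */
static char trace[128];

static void record(const char *name)
{
	if (trace[0])
		strncat(trace, ",", sizeof(trace) - strlen(trace) - 1);
	strncat(trace, name, sizeof(trace) - strlen(trace) - 1);
}

static void preempt_disable(void)   { record("preempt_disable"); }
static void preempt_enable(void)    { record("preempt_enable"); }
static void enable_kernel_fp(void)  { record("enable_kernel_fp"); }
static void disable_kernel_fp(void) { record("disable_kernel_fp"); }

/* Same shape as the wrappers added in arch/powerpc/include/asm/fpu.h. */
static void kernel_fpu_begin(void)
{
	preempt_disable();
	enable_kernel_fp();
}

static void kernel_fpu_end(void)
{
	disable_kernel_fp();
	preempt_enable();
}
```

Calling kernel_fpu_begin() followed by kernel_fpu_end() leaves the trace
showing the two pairs nested like parentheses, so no FP instruction in between
can run while the task is still preemptible.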
[PATCH v3 07/14] LoongArch: Implement ARCH_HAS_KERNEL_FPU_SUPPORT
LoongArch already provides kernel_fpu_begin() and kernel_fpu_end() in
asm/fpu.h, so it only needs to add kernel_fpu_available() and export
the CFLAGS adjustments.

Acked-by: WANG Xuerui
Reviewed-by: Christoph Hellwig
Signed-off-by: Samuel Holland
---

Changes in v3:
 - Rebase on v6.9-rc1

 arch/loongarch/Kconfig           | 1 +
 arch/loongarch/Makefile          | 5 -
 arch/loongarch/include/asm/fpu.h | 1 +
 3 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/arch/loongarch/Kconfig b/arch/loongarch/Kconfig
index a5f300ec6f28..2266c6c41c38 100644
--- a/arch/loongarch/Kconfig
+++ b/arch/loongarch/Kconfig
@@ -18,6 +18,7 @@ config LOONGARCH
 	select ARCH_HAS_CURRENT_STACK_POINTER
 	select ARCH_HAS_FORTIFY_SOURCE
 	select ARCH_HAS_KCOV
+	select ARCH_HAS_KERNEL_FPU_SUPPORT if CPU_HAS_FPU
 	select ARCH_HAS_NMI_SAFE_THIS_CPU_OPS
 	select ARCH_HAS_NON_OVERLAPPING_ADDRESS_SPACE
 	select ARCH_HAS_PTE_SPECIAL
diff --git a/arch/loongarch/Makefile b/arch/loongarch/Makefile
index df6caf79537a..efb5440a43ec 100644
--- a/arch/loongarch/Makefile
+++ b/arch/loongarch/Makefile
@@ -26,6 +26,9 @@ endif
 32bit-emul		= elf32loongarch
 64bit-emul		= elf64loongarch
 
+CC_FLAGS_FPU		:= -mfpu=64
+CC_FLAGS_NO_FPU		:= -msoft-float
+
 ifdef CONFIG_UNWINDER_ORC
 orc_hash_h := arch/$(SRCARCH)/include/generated/asm/orc_hash.h
 orc_hash_sh := $(srctree)/scripts/orc_hash.sh
@@ -59,7 +62,7 @@ ld-emul			= $(64bit-emul)
 cflags-y		+= -mabi=lp64s
 endif
 
-cflags-y			+= -pipe -msoft-float
+cflags-y			+= -pipe $(CC_FLAGS_NO_FPU)
 LDFLAGS_vmlinux			+= -static -n -nostdlib
 
 # When the assembler supports explicit relocation hint, we must use it.
diff --git a/arch/loongarch/include/asm/fpu.h b/arch/loongarch/include/asm/fpu.h
index c2d8962fda00..3177674228f8 100644
--- a/arch/loongarch/include/asm/fpu.h
+++ b/arch/loongarch/include/asm/fpu.h
@@ -21,6 +21,7 @@
 
 struct sigcontext;
 
+#define kernel_fpu_available() cpu_has_fpu
 extern void kernel_fpu_begin(void);
 extern void kernel_fpu_end(void);
 
-- 
2.43.1
[PATCH v3 06/14] lib/raid6: Use CC_FLAGS_FPU for NEON CFLAGS
Now that CC_FLAGS_FPU is exported and can be used anywhere in the source
tree, use it instead of duplicating the flags here.

Reviewed-by: Christoph Hellwig
Signed-off-by: Samuel Holland
---

(no changes since v1)

 lib/raid6/Makefile | 31 ---
 1 file changed, 8 insertions(+), 23 deletions(-)

diff --git a/lib/raid6/Makefile b/lib/raid6/Makefile
index 385a94aa0b99..c71984e04c4d 100644
--- a/lib/raid6/Makefile
+++ b/lib/raid6/Makefile
@@ -33,25 +33,6 @@ CFLAGS_REMOVE_vpermxor8.o += -msoft-float
 endif
 endif
 
-# The GCC option -ffreestanding is required in order to compile code containing
-# ARM/NEON intrinsics in a non C99-compliant environment (such as the kernel)
-ifeq ($(CONFIG_KERNEL_MODE_NEON),y)
-NEON_FLAGS := -ffreestanding
-# Enable
-NEON_FLAGS += -isystem $(shell $(CC) -print-file-name=include)
-ifeq ($(ARCH),arm)
-NEON_FLAGS += -march=armv7-a -mfloat-abi=softfp -mfpu=neon
-endif
-CFLAGS_recov_neon_inner.o += $(NEON_FLAGS)
-ifeq ($(ARCH),arm64)
-CFLAGS_REMOVE_recov_neon_inner.o += -mgeneral-regs-only
-CFLAGS_REMOVE_neon1.o += -mgeneral-regs-only
-CFLAGS_REMOVE_neon2.o += -mgeneral-regs-only
-CFLAGS_REMOVE_neon4.o += -mgeneral-regs-only
-CFLAGS_REMOVE_neon8.o += -mgeneral-regs-only
-endif
-endif
-
 quiet_cmd_unroll = UNROLL  $@
       cmd_unroll = $(AWK) -v N=$* -f $(srctree)/$(src)/unroll.awk < $< > $@
 
@@ -75,10 +56,14 @@ targets += vpermxor1.c vpermxor2.c vpermxor4.c vpermxor8.c
 $(obj)/vpermxor%.c: $(src)/vpermxor.uc $(src)/unroll.awk FORCE
 	$(call if_changed,unroll)
 
-CFLAGS_neon1.o += $(NEON_FLAGS)
-CFLAGS_neon2.o += $(NEON_FLAGS)
-CFLAGS_neon4.o += $(NEON_FLAGS)
-CFLAGS_neon8.o += $(NEON_FLAGS)
+CFLAGS_neon1.o += $(CC_FLAGS_FPU)
+CFLAGS_neon2.o += $(CC_FLAGS_FPU)
+CFLAGS_neon4.o += $(CC_FLAGS_FPU)
+CFLAGS_neon8.o += $(CC_FLAGS_FPU)
+CFLAGS_REMOVE_neon1.o += $(CC_FLAGS_NO_FPU)
+CFLAGS_REMOVE_neon2.o += $(CC_FLAGS_NO_FPU)
+CFLAGS_REMOVE_neon4.o += $(CC_FLAGS_NO_FPU)
+CFLAGS_REMOVE_neon8.o += $(CC_FLAGS_NO_FPU)
 targets += neon1.c neon2.c neon4.c neon8.c
 $(obj)/neon%.c: $(src)/neon.uc $(src)/unroll.awk FORCE
 	$(call if_changed,unroll)
-- 
2.43.1
[PATCH v3 05/14] arm64: crypto: Use CC_FLAGS_FPU for NEON CFLAGS
Now that CC_FLAGS_FPU is exported and can be used anywhere in the source
tree, use it instead of duplicating the flags here.

Reviewed-by: Christoph Hellwig
Signed-off-by: Samuel Holland
---

(no changes since v2)

Changes in v2:
 - New patch for v2

 arch/arm64/lib/Makefile | 6 ++
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/arch/arm64/lib/Makefile b/arch/arm64/lib/Makefile
index 29490be2546b..13e6a2829116 100644
--- a/arch/arm64/lib/Makefile
+++ b/arch/arm64/lib/Makefile
@@ -7,10 +7,8 @@ lib-y		:= clear_user.o delay.o copy_from_user.o	\
 
 ifeq ($(CONFIG_KERNEL_MODE_NEON), y)
 obj-$(CONFIG_XOR_BLOCKS)	+= xor-neon.o
-CFLAGS_REMOVE_xor-neon.o	+= -mgeneral-regs-only
-CFLAGS_xor-neon.o		+= -ffreestanding
-# Enable
-CFLAGS_xor-neon.o		+= -isystem $(shell $(CC) -print-file-name=include)
+CFLAGS_xor-neon.o		+= $(CC_FLAGS_FPU)
+CFLAGS_REMOVE_xor-neon.o	+= $(CC_FLAGS_NO_FPU)
 endif
 
 lib-$(CONFIG_ARCH_HAS_UACCESS_FLUSHCACHE) += uaccess_flushcache.o
-- 
2.43.1
[PATCH v3 04/14] arm64: Implement ARCH_HAS_KERNEL_FPU_SUPPORT
arm64 provides an equivalent to the common kernel-mode FPU API, but in a
different header and using different function names. Add a wrapper
header, and export CFLAGS adjustments as found in lib/raid6/Makefile.

Reviewed-by: Christoph Hellwig
Signed-off-by: Samuel Holland
---

(no changes since v2)

Changes in v2:
 - Remove file name from header comment

 arch/arm64/Kconfig           |  1 +
 arch/arm64/Makefile          |  9 -
 arch/arm64/include/asm/fpu.h | 15 +++
 3 files changed, 24 insertions(+), 1 deletion(-)
 create mode 100644 arch/arm64/include/asm/fpu.h

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 7b11c98b3e84..67f0d3b5b7df 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -30,6 +30,7 @@ config ARM64
 	select ARCH_HAS_GCOV_PROFILE_ALL
 	select ARCH_HAS_GIGANTIC_PAGE
 	select ARCH_HAS_KCOV
+	select ARCH_HAS_KERNEL_FPU_SUPPORT if KERNEL_MODE_NEON
 	select ARCH_HAS_KEEPINITRD
 	select ARCH_HAS_MEMBARRIER_SYNC_CORE
 	select ARCH_HAS_NMI_SAFE_THIS_CPU_OPS
diff --git a/arch/arm64/Makefile b/arch/arm64/Makefile
index 0e075d3c546b..3e863e5b0169 100644
--- a/arch/arm64/Makefile
+++ b/arch/arm64/Makefile
@@ -36,7 +36,14 @@ ifeq ($(CONFIG_BROKEN_GAS_INST),y)
 $(warning Detected assembler with broken .inst; disassembly will be unreliable)
 endif
 
-KBUILD_CFLAGS	+= -mgeneral-regs-only	\
+# The GCC option -ffreestanding is required in order to compile code containing
+# ARM/NEON intrinsics in a non C99-compliant environment (such as the kernel)
+CC_FLAGS_FPU	:= -ffreestanding
+# Enable
+CC_FLAGS_FPU	+= -isystem $(shell $(CC) -print-file-name=include)
+CC_FLAGS_NO_FPU	:= -mgeneral-regs-only
+
+KBUILD_CFLAGS	+= $(CC_FLAGS_NO_FPU) \
 		   $(compat_vdso) $(cc_has_k_constraint)
 KBUILD_CFLAGS	+= $(call cc-disable-warning, psabi)
 KBUILD_AFLAGS	+= $(compat_vdso)
diff --git a/arch/arm64/include/asm/fpu.h b/arch/arm64/include/asm/fpu.h
new file mode 100644
index ..2ae50bdce59b
--- /dev/null
+++ b/arch/arm64/include/asm/fpu.h
@@ -0,0 +1,15 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Copyright (C) 2023 SiFive
+ */
+
+#ifndef __ASM_FPU_H
+#define __ASM_FPU_H
+
+#include
+
+#define kernel_fpu_available()	cpu_has_neon()
+#define kernel_fpu_begin()	kernel_neon_begin()
+#define kernel_fpu_end()	kernel_neon_end()
+
+#endif /* ! __ASM_FPU_H */
-- 
2.43.1
[PATCH v3 03/14] ARM: crypto: Use CC_FLAGS_FPU for NEON CFLAGS
Now that CC_FLAGS_FPU is exported and can be used anywhere in the source
tree, use it instead of duplicating the flags here.

Reviewed-by: Christoph Hellwig
Signed-off-by: Samuel Holland
---

(no changes since v1)

 arch/arm/lib/Makefile | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/arch/arm/lib/Makefile b/arch/arm/lib/Makefile
index 650404be6768..0ca5aae1bcc3 100644
--- a/arch/arm/lib/Makefile
+++ b/arch/arm/lib/Makefile
@@ -40,8 +40,7 @@ $(obj)/csumpartialcopy.o:	$(obj)/csumpartialcopygeneric.S
 $(obj)/csumpartialcopyuser.o:	$(obj)/csumpartialcopygeneric.S
 
 ifeq ($(CONFIG_KERNEL_MODE_NEON),y)
-  NEON_FLAGS			:= -march=armv7-a -mfloat-abi=softfp -mfpu=neon
-  CFLAGS_xor-neon.o		+= $(NEON_FLAGS)
+  CFLAGS_xor-neon.o		+= $(CC_FLAGS_FPU)
   obj-$(CONFIG_XOR_BLOCKS)	+= xor-neon.o
 endif
 
-- 
2.43.1
[PATCH v3 01/14] arch: Add ARCH_HAS_KERNEL_FPU_SUPPORT
Several architectures provide an API to enable the FPU and run
floating-point SIMD code in kernel space. However, the function names,
header locations, and semantics are inconsistent across architectures,
and FPU support may be gated behind other Kconfig options.

Provide a standard way for architectures to declare that kernel space
FPU support is available. Architectures selecting this option must
implement what is currently the most common API (kernel_fpu_begin() and
kernel_fpu_end(), plus a new function kernel_fpu_available()) and
provide the appropriate CFLAGS for compiling floating-point C code.

Suggested-by: Christoph Hellwig
Reviewed-by: Christoph Hellwig
Signed-off-by: Samuel Holland
---

(no changes since v2)

Changes in v2:
 - Add documentation explaining the build-time and runtime APIs
 - Add a linux/fpu.h header for generic isolation enforcement

 Documentation/core-api/floating-point.rst | 78 +++
 Documentation/core-api/index.rst          |  1 +
 Makefile                                  |  5 ++
 arch/Kconfig                              |  6 ++
 include/linux/fpu.h                       | 12
 5 files changed, 102 insertions(+)
 create mode 100644 Documentation/core-api/floating-point.rst
 create mode 100644 include/linux/fpu.h

diff --git a/Documentation/core-api/floating-point.rst b/Documentation/core-api/floating-point.rst
new file mode 100644
index ..a8d0d4b05052
--- /dev/null
+++ b/Documentation/core-api/floating-point.rst
@@ -0,0 +1,78 @@
+.. SPDX-License-Identifier: GPL-2.0+
+
+Floating-point API
+==================
+
+Kernel code is normally prohibited from using floating-point (FP) registers or
+instructions, including the C float and double data types. This rule reduces
+system call overhead, because the kernel does not need to save and restore the
+userspace floating-point register state.
+
+However, occasionally drivers or library functions may need to include FP code.
+This is supported by isolating the functions containing FP code to a separate
+translation unit (a separate source file), and saving/restoring the FP register
+state around calls to those functions. This creates "critical sections" of
+floating-point usage.
+
+The reason for this isolation is to prevent the compiler from generating code
+touching the FP registers outside these critical sections. Compilers sometimes
+use FP registers to optimize inlined ``memcpy`` or variable assignment, as
+floating-point registers may be wider than general-purpose registers.
+
+Usability of floating-point code within the kernel is architecture-specific.
+Additionally, because a single kernel may be configured to support platforms
+both with and without a floating-point unit, FPU availability must be checked
+both at build time and at run time.
+
+Several architectures implement the generic kernel floating-point API from
+``linux/fpu.h``, as described below. Some other architectures implement their
+own unique APIs, which are documented separately.
+
+Build-time API
+--------------
+
+Floating-point code may be built if the option ``ARCH_HAS_KERNEL_FPU_SUPPORT``
+is enabled. For C code, such code must be placed in a separate file, and that
+file must have its compilation flags adjusted using the following pattern::
+
+    CFLAGS_foo.o += $(CC_FLAGS_FPU)
+    CFLAGS_REMOVE_foo.o += $(CC_FLAGS_NO_FPU)
+
+Architectures are expected to define one or both of these variables in their
+top-level Makefile as needed. For example::
+
+    CC_FLAGS_FPU := -mhard-float
+
+or::
+
+    CC_FLAGS_NO_FPU := -msoft-float
+
+Normal kernel code is assumed to use the equivalent of ``CC_FLAGS_NO_FPU``.
+
+Runtime API
+-----------
+
+The runtime API is provided in ``linux/fpu.h``. This header cannot be included
+from files implementing FP code (those with their compilation flags adjusted as
+above). Instead, it must be included when defining the FP critical sections.
+
+.. c:function:: bool kernel_fpu_available( void )
+
+    This function reports if floating-point code can be used on this CPU or
+    platform. The value returned by this function is not expected to change
+    at runtime, so it only needs to be called once, not before every
+    critical section.
+
+.. c:function:: void kernel_fpu_begin( void )
+                void kernel_fpu_end( void )
+
+    These functions create a floating-point critical section. It is only
+    valid to call ``kernel_fpu_begin()`` after a previous call to
+    ``kernel_fpu_available()`` returned ``true``. These functions are only
+    guaranteed to be callable from (preemptible or non-preemptible) process
+    context.
+
+    Preemption may be disabled inside critical sections, so their size
+    should be minimized. They are *not* required to be reentrant. If the
+    caller expects to nest critical sections, it must implement its own
+    reference counting.
diff --git a/Documentation/core-api/index.rst b/Doc
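The documentation above leaves nesting to the caller ("it must implement its
own reference counting"). A minimal userspace sketch of such a counter follows,
similar in spirit to the fpu_recursion_depth counter in the amdgpu dc_fpu.c
hunks quoted in this thread. The kernel_fpu_*() functions here are stand-ins so
the example builds outside the kernel, and my_fpu_begin(), my_fpu_end() and
hw_sections_open are hypothetical names. A real driver would need a per-CPU
counter updated with preemption disabled rather than a plain global.

```c
#include <stdbool.h>

/* Stand-ins for the kernel API so the sketch builds in userspace. */
static int hw_sections_open;	/* times the underlying section was entered */

static bool kernel_fpu_available(void) { return true; }
static void kernel_fpu_begin(void)     { hw_sections_open++; }
static void kernel_fpu_end(void)       { hw_sections_open--; }

/* Hypothetical caller-side nesting depth (per-CPU in a real driver). */
static int fpu_depth;

/* Returns false if FP code cannot be used at all on this platform. */
static bool my_fpu_begin(void)
{
	if (!kernel_fpu_available())
		return false;
	if (fpu_depth++ == 0)
		kernel_fpu_begin();	/* only the outermost caller enters */
	return true;
}

static void my_fpu_end(void)
{
	if (--fpu_depth == 0)
		kernel_fpu_end();	/* only the outermost caller leaves */
}
```

With this wrapper, two nested my_fpu_begin() calls enter the underlying
critical section once, and it is left only when the matching outermost
my_fpu_end() runs.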
[PATCH v3 00/14] Unified cross-architecture kernel-mode FPU API
This series unifies the kernel-mode FPU API across several architectures
by wrapping the existing functions (where needed) in consistently-named
functions placed in a consistent header location, with mostly the same
semantics: they can be called from preemptible or non-preemptible task
context, and are not assumed to be reentrant. Architectures are also
expected to provide CFLAGS adjustments for compiling FPU-dependent code.
For the moment, SIMD/vector units are out of scope for this common API.

This allows us to remove the ifdeffery and duplicated Makefile logic at
each FPU user. It then implements the common API on RISC-V, and converts
a couple of users to the new API: the AMDGPU DRM driver, and the FPU
self test.

The underlying goal of this series is to allow using newer AMD GPUs
(e.g. Navi) on RISC-V boards such as SiFive's HiFive Unmatched. Those
GPUs need CONFIG_DRM_AMD_DC_FP to initialize, which requires kernel-mode
FPU support.

Previous versions:
v2: https://lore.kernel.org/linux-kernel/20231228014220.3562640-1-samuel.holl...@sifive.com/
v1: https://lore.kernel.org/linux-kernel/20231208055501.2916202-1-samuel.holl...@sifive.com/
v0: https://lore.kernel.org/linux-kernel/20231122030621.3759313-1-samuel.holl...@sifive.com/

Changes in v3:
 - Rebase on v6.9-rc1
 - Limit ARCH_HAS_KERNEL_FPU_SUPPORT to 64BIT

Changes in v2:
 - Add documentation explaining the build-time and runtime APIs
 - Add a linux/fpu.h header for generic isolation enforcement
 - Remove file name from header comment
 - Clean up arch/arm64/lib/Makefile, like for arch/arm
 - Remove RISC-V architecture-specific preprocessor check
 - Split altivec removal to a separate patch
 - Use linux/fpu.h instead of asm/fpu.h in consumers
 - Declare test_fpu() in a header

Michael Ellerman (1):
  drm/amd/display: Only use hard-float, not altivec on powerpc

Samuel Holland (13):
  arch: Add ARCH_HAS_KERNEL_FPU_SUPPORT
  ARM: Implement ARCH_HAS_KERNEL_FPU_SUPPORT
  ARM: crypto: Use CC_FLAGS_FPU for NEON CFLAGS
  arm64: Implement ARCH_HAS_KERNEL_FPU_SUPPORT
  arm64: crypto: Use CC_FLAGS_FPU for NEON CFLAGS
  lib/raid6: Use CC_FLAGS_FPU for NEON CFLAGS
  LoongArch: Implement ARCH_HAS_KERNEL_FPU_SUPPORT
  powerpc: Implement ARCH_HAS_KERNEL_FPU_SUPPORT
  x86: Implement ARCH_HAS_KERNEL_FPU_SUPPORT
  riscv: Add support for kernel-mode FPU
  drm/amd/display: Use ARCH_HAS_KERNEL_FPU_SUPPORT
  selftests/fpu: Move FP code to a separate translation unit
  selftests/fpu: Allow building on other architectures

 Documentation/core-api/floating-point.rst     | 78 +++
 Documentation/core-api/index.rst              |  1 +
 Makefile                                      |  5 ++
 arch/Kconfig                                  |  6 ++
 arch/arm/Kconfig                              |  1 +
 arch/arm/Makefile                             |  7 ++
 arch/arm/include/asm/fpu.h                    | 15
 arch/arm/lib/Makefile                         |  3 +-
 arch/arm64/Kconfig                            |  1 +
 arch/arm64/Makefile                           |  9 ++-
 arch/arm64/include/asm/fpu.h                  | 15
 arch/arm64/lib/Makefile                       |  6 +-
 arch/loongarch/Kconfig                        |  1 +
 arch/loongarch/Makefile                       |  5 +-
 arch/loongarch/include/asm/fpu.h              |  1 +
 arch/powerpc/Kconfig                          |  1 +
 arch/powerpc/Makefile                         |  5 +-
 arch/powerpc/include/asm/fpu.h                | 28 +++
 arch/riscv/Kconfig                            |  1 +
 arch/riscv/Makefile                           |  3 +
 arch/riscv/include/asm/fpu.h                  | 16
 arch/riscv/kernel/Makefile                    |  1 +
 arch/riscv/kernel/kernel_mode_fpu.c           | 28 +++
 arch/x86/Kconfig                              |  1 +
 arch/x86/Makefile                             | 20 +
 arch/x86/include/asm/fpu.h                    | 13
 drivers/gpu/drm/amd/display/Kconfig           |  2 +-
 .../gpu/drm/amd/display/amdgpu_dm/dc_fpu.c    | 35 +
 drivers/gpu/drm/amd/display/dc/dml/Makefile   | 36 +
 drivers/gpu/drm/amd/display/dc/dml2/Makefile  | 36 +
 include/linux/fpu.h                           | 12 +++
 lib/Kconfig.debug                             |  2 +-
 lib/Makefile                                  | 26 +--
 lib/raid6/Makefile                            | 31 ++--
 lib/test_fpu.h                                |  8 ++
 lib/{test_fpu.c => test_fpu_glue.c}           | 37 ++---
 lib/test_fpu_impl.c                           | 37 +
 37 files changed, 343 insertions(+), 190 deletions(-)
 create mode 100644 Documentation/core-api/floating-point.rst
 create mode 100644 arch/arm/include/asm/fpu.h
 create mode 100644 arch/arm64/include/asm/fpu.h
 create mode 100644 arch/powerpc/include/asm/fpu.h
 create mode 100644 arch/riscv/include/asm/fpu.h
 create mode 100644 arch/riscv/kernel/kernel_mode_fpu.c
cre
[PATCH v3 02/14] ARM: Implement ARCH_HAS_KERNEL_FPU_SUPPORT
ARM provides an equivalent to the common kernel-mode FPU API, but in a
different header and using different function names. Add a wrapper
header, and export CFLAGS adjustments as found in lib/raid6/Makefile.

Reviewed-by: Christoph Hellwig
Signed-off-by: Samuel Holland
---

(no changes since v2)

Changes in v2:
 - Remove file name from header comment

 arch/arm/Kconfig           |  1 +
 arch/arm/Makefile          |  7 +++
 arch/arm/include/asm/fpu.h | 15 +++
 3 files changed, 23 insertions(+)
 create mode 100644 arch/arm/include/asm/fpu.h

diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index b14aed3a17ab..b1751c2cab87 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -15,6 +15,7 @@ config ARM
 	select ARCH_HAS_FORTIFY_SOURCE
 	select ARCH_HAS_KEEPINITRD
 	select ARCH_HAS_KCOV
+	select ARCH_HAS_KERNEL_FPU_SUPPORT if KERNEL_MODE_NEON
 	select ARCH_HAS_MEMBARRIER_SYNC_CORE
 	select ARCH_HAS_NON_OVERLAPPING_ADDRESS_SPACE
 	select ARCH_HAS_PTE_SPECIAL if ARM_LPAE
diff --git a/arch/arm/Makefile b/arch/arm/Makefile
index d82908b1b1bb..71afdd98ddf2 100644
--- a/arch/arm/Makefile
+++ b/arch/arm/Makefile
@@ -130,6 +130,13 @@ endif
 # Accept old syntax despite ".syntax unified"
 AFLAGS_NOWARN	:=$(call as-option,-Wa$(comma)-mno-warn-deprecated,-Wa$(comma)-W)
 
+# The GCC option -ffreestanding is required in order to compile code containing
+# ARM/NEON intrinsics in a non C99-compliant environment (such as the kernel)
+CC_FLAGS_FPU	:= -ffreestanding
+# Enable
+CC_FLAGS_FPU	+= -isystem $(shell $(CC) -print-file-name=include)
+CC_FLAGS_FPU	+= -march=armv7-a -mfloat-abi=softfp -mfpu=neon
+
 ifeq ($(CONFIG_THUMB2_KERNEL),y)
 CFLAGS_ISA	:=-Wa,-mimplicit-it=always $(AFLAGS_NOWARN)
 AFLAGS_ISA	:=$(CFLAGS_ISA) -Wa$(comma)-mthumb
diff --git a/arch/arm/include/asm/fpu.h b/arch/arm/include/asm/fpu.h
new file mode 100644
index ..2ae50bdce59b
--- /dev/null
+++ b/arch/arm/include/asm/fpu.h
@@ -0,0 +1,15 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Copyright (C) 2023 SiFive
+ */
+
+#ifndef __ASM_FPU_H
+#define __ASM_FPU_H
+
+#include
+
+#define kernel_fpu_available()	cpu_has_neon()
+#define kernel_fpu_begin()	kernel_neon_begin()
+#define kernel_fpu_end()	kernel_neon_end()
+
+#endif /* ! __ASM_FPU_H */
-- 
2.43.1
Re: [PATCH 1/4] arch: consolidate existing CONFIG_PAGE_SIZE_*KB definitions
On 2024-02-26 10:14 AM, Arnd Bergmann wrote:
> From: Arnd Bergmann
>
> These four architectures define the same Kconfig symbols for configuring
> the page size. Move the logic into a common place where it can be shared
> with all other architectures.
>
> Signed-off-by: Arnd Bergmann
> ---
>  arch/Kconfig                      | 58 +--
>  arch/hexagon/Kconfig              | 25 +++--
>  arch/hexagon/include/asm/page.h   |  6 +---
>  arch/loongarch/Kconfig            | 21 ---
>  arch/loongarch/include/asm/page.h | 10 +-
>  arch/mips/Kconfig                 | 58 +++
>  arch/mips/include/asm/page.h      | 16 +
>  arch/sh/include/asm/page.h        | 13 +--
>  arch/sh/mm/Kconfig                | 42 +++---
>  9 files changed, 88 insertions(+), 161 deletions(-)
>
> diff --git a/arch/Kconfig b/arch/Kconfig
> index a5af0edd3eb8..237cea01ed9b 100644
> --- a/arch/Kconfig
> +++ b/arch/Kconfig
> @@ -1078,17 +1078,71 @@ config HAVE_ARCH_COMPAT_MMAP_BASES
>  	  and vice-versa 32-bit applications to call 64-bit mmap().
>  	  Required for applications doing different bitness syscalls.
>  
> +config HAVE_PAGE_SIZE_4KB
> +	bool
> +
> +config HAVE_PAGE_SIZE_8KB
> +	bool
> +
> +config HAVE_PAGE_SIZE_16KB
> +	bool
> +
> +config HAVE_PAGE_SIZE_32KB
> +	bool
> +
> +config HAVE_PAGE_SIZE_64KB
> +	bool
> +
> +config HAVE_PAGE_SIZE_256KB
> +	bool
> +
> +choice
> +	prompt "MMU page size"

Should this have some generic help text (at least a warning about compatibility)?

> +
> +config PAGE_SIZE_4KB
> +	bool "4KB pages"
> +	depends on HAVE_PAGE_SIZE_4KB
> +
> +config PAGE_SIZE_8KB
> +	bool "8KB pages"
> +	depends on HAVE_PAGE_SIZE_8KB
> +
> +config PAGE_SIZE_16KB
> +	bool "16KB pages"
> +	depends on HAVE_PAGE_SIZE_16KB
> +
> +config PAGE_SIZE_32KB
> +	bool "32KB pages"
> +	depends on HAVE_PAGE_SIZE_32KB
> +
> +config PAGE_SIZE_64KB
> +	bool "64KB pages"
> +	depends on HAVE_PAGE_SIZE_64KB
> +
> +config PAGE_SIZE_256KB
> +	bool "256KB pages"
> +	depends on HAVE_PAGE_SIZE_256KB
> +
> +endchoice
> +
>  config PAGE_SIZE_LESS_THAN_64KB
>  	def_bool y
> -	depends on !ARM64_64K_PAGES
>  	depends on !PAGE_SIZE_64KB
> -	depends on !PARISC_PAGE_SIZE_64KB
>  	depends on PAGE_SIZE_LESS_THAN_256KB
>  
>  config PAGE_SIZE_LESS_THAN_256KB
>  	def_bool y
>  	depends on !PAGE_SIZE_256KB
>  
> +config PAGE_SHIFT
> +	int
> +	default 12 if PAGE_SIZE_4KB
> +	default 13 if PAGE_SIZE_8KB
> +	default 14 if PAGE_SIZE_16KB
> +	default 15 if PAGE_SIZE_32KB
> +	default 16 if PAGE_SIZE_64KB
> +	default 18 if PAGE_SIZE_256KB
> +
>  # This allows to use a set of generic functions to determine mmap base
>  # address by giving priority to top-down scheme only if the process
>  # is not in legacy mode (compat task, unlimited stack size or
> diff --git a/arch/hexagon/Kconfig b/arch/hexagon/Kconfig
> index a880ee067d2e..aac46ee1a000 100644
> --- a/arch/hexagon/Kconfig
> +++ b/arch/hexagon/Kconfig
> @@ -8,6 +8,11 @@ config HEXAGON
>  	select ARCH_HAS_SYNC_DMA_FOR_DEVICE
>  	select ARCH_NO_PREEMPT
>  	select DMA_GLOBAL_POOL
> +	select FRAME_POINTER

Looks like a paste error.

> +	select HAVE_PAGE_SIZE_4KB
> +	select HAVE_PAGE_SIZE_16KB
> +	select HAVE_PAGE_SIZE_64KB
> +	select HAVE_PAGE_SIZE_256KB
>  	# Other pending projects/to-do items.
>  	# select HAVE_REGS_AND_STACK_ACCESS_API
>  	# select HAVE_HW_BREAKPOINT if PERF_EVENTS
> @@ -120,26 +125,6 @@ config NR_CPUS
>  	  This is purely to save memory - each supported CPU adds
>  	  approximately eight kilobytes to the kernel image.
>  
> -choice
> -	prompt "Kernel page size"
> -	default PAGE_SIZE_4KB
> -	help
> -	  Changes the default page size; use with caution.
> -
> -config PAGE_SIZE_4KB
> -	bool "4KB"
> -
> -config PAGE_SIZE_16KB
> -	bool "16KB"
> -
> -config PAGE_SIZE_64KB
> -	bool "64KB"
> -
> -config PAGE_SIZE_256KB
> -	bool "256KB"
> -
> -endchoice
> -
>  source "kernel/Kconfig.hz"
>  
>  endmenu
> diff --git a/arch/hexagon/include/asm/page.h b/arch/hexagon/include/asm/page.h
> index 10f1bc07423c..65c9bac639fa 100644
> --- a/arch/hexagon/include/asm/page.h
> +++ b/arch/hexagon/include/asm/page.h
> @@ -13,27 +13,22 @@
>  /* This is probably not the most graceful way to handle this. */
>  
>  #ifdef CONFIG_PAGE_SIZE_4KB
> -#define PAGE_SHIFT 12
>  #define HEXAGON_L1_PTE_SIZE __HVM_PDE_S_4KB
>  #endif
>  
>  #ifdef CONFIG_PAGE_SIZE_16KB
> -#define PAGE_SHIFT 14
>  #define HEXAGON_L1_PTE_SIZE __HVM_PDE_S_16KB
>  #endif
>  
>  #ifdef CONFIG_PAGE_SIZE_64KB
> -#define PAGE_SHIFT 16
>  #define HEXAGON_L1_PTE_SIZE __HVM_PDE_S_64KB
>  #endif
>  
>  #ifdef CONFIG_PAGE_SIZE_256KB
> -#define PAGE_SHIFT 18
>  #define HEXAGON_L1_PTE_SIZE __HVM_PDE_S_256KB
>  #endif
>  
>  #ifdef CONFIG_PAGE_SIZE_1MB
> -#
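For readers following the quoted PAGE_SHIFT hunk: each default is simply log2 of the page size in bytes, which is why the per-architecture "#define PAGE_SHIFT" lines can be dropped once the Kconfig symbol exists. The following is a minimal user-space sketch of that relationship, not kernel code; the helper names page_shift_for_kb() and page_shift_table_matches() are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper (not part of the patch): return log2 of the page
 * size in bytes for a size given in KiB, or -1 if the size is zero or
 * not a power of two.
 */
int page_shift_for_kb(unsigned long kb)
{
	unsigned long bytes = kb * 1024UL;
	int shift = 0;

	if (kb == 0 || (bytes & (bytes - 1)) != 0)
		return -1;

	while ((1UL << shift) < bytes)
		shift++;

	return shift;
}

/* Check the six sizes from the quoted hunk against their Kconfig defaults. */
int page_shift_table_matches(void)
{
	static const struct { unsigned long kb; int shift; } table[] = {
		{ 4, 12 }, { 8, 13 }, { 16, 14 },
		{ 32, 15 }, { 64, 16 }, { 256, 18 },
	};
	size_t i;

	for (i = 0; i < sizeof(table) / sizeof(table[0]); i++)
		assert(page_shift_for_kb(table[i].kb) == table[i].shift);

	return 1;
}
```

Calling page_shift_table_matches() from a small test program should pass silently. Note the jump from 16 to 18: there is no 128KB option in the quoted choice block.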
Re: [PATCH v2 07/14] LoongArch: Implement ARCH_HAS_KERNEL_FPU_SUPPORT
Hi Huacai,

On 2024-01-04 3:55 AM, Huacai Chen wrote:
> Hi, Samuel,
>
> On Thu, Dec 28, 2023 at 9:42 AM Samuel Holland wrote:
>>
>> LoongArch already provides kernel_fpu_begin() and kernel_fpu_end() in
>> asm/fpu.h, so it only needs to add kernel_fpu_available() and export
>> the CFLAGS adjustments.
>>
>> Acked-by: WANG Xuerui
>> Reviewed-by: Christoph Hellwig
>> Signed-off-by: Samuel Holland
>> ---
>>
>> (no changes since v1)
>>
>>  arch/loongarch/Kconfig           | 1 +
>>  arch/loongarch/Makefile          | 5 -
>>  arch/loongarch/include/asm/fpu.h | 1 +
>>  3 files changed, 6 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/loongarch/Kconfig b/arch/loongarch/Kconfig
>> index ee123820a476..65d4475565b8 100644
>> --- a/arch/loongarch/Kconfig
>> +++ b/arch/loongarch/Kconfig
>> @@ -15,6 +15,7 @@ config LOONGARCH
>>  	select ARCH_HAS_CPU_FINALIZE_INIT
>>  	select ARCH_HAS_FORTIFY_SOURCE
>>  	select ARCH_HAS_KCOV
>> +	select ARCH_HAS_KERNEL_FPU_SUPPORT if CPU_HAS_FPU
>>  	select ARCH_HAS_NMI_SAFE_THIS_CPU_OPS
>>  	select ARCH_HAS_NON_OVERLAPPING_ADDRESS_SPACE
>>  	select ARCH_HAS_PTE_SPECIAL
>> diff --git a/arch/loongarch/Makefile b/arch/loongarch/Makefile
>> index 4ba8d67ddb09..1afe28feaba5 100644
>> --- a/arch/loongarch/Makefile
>> +++ b/arch/loongarch/Makefile
>> @@ -25,6 +25,9 @@ endif
>>  32bit-emul = elf32loongarch
>>  64bit-emul = elf64loongarch
>>
>> +CC_FLAGS_FPU := -mfpu=64
>> +CC_FLAGS_NO_FPU	:= -msoft-float
> We will add LoongArch32 support later, maybe it should be -mfpu=32 in
> that case, and do other archs have the case that only support FP32?

Do you mean that LoongArch32 does not support double-precision FP in
hardware? At least both of the consumers in this series use
double-precision, so my first thought is that LoongArch32 could not
select ARCH_HAS_KERNEL_FPU_SUPPORT.

Regards,
Samuel
[PATCH v2 14/14] selftests/fpu: Allow building on other architectures
Now that ARCH_HAS_KERNEL_FPU_SUPPORT provides a common way to compile
and run floating-point code, this test is no longer x86-specific.

Reviewed-by: Christoph Hellwig
Signed-off-by: Samuel Holland
---

(no changes since v1)

 lib/Kconfig.debug   |  2 +-
 lib/Makefile        | 25 ++---
 lib/test_fpu_glue.c |  5 -
 3 files changed, 7 insertions(+), 25 deletions(-)

diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index 4405f81248fb..4596100eeb14 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -2918,7 +2918,7 @@ config TEST_FREE_PAGES
 
 config TEST_FPU
 	tristate "Test floating point operations in kernel space"
-	depends on X86 && !KCOV_INSTRUMENT_ALL
+	depends on ARCH_HAS_KERNEL_FPU_SUPPORT && !KCOV_INSTRUMENT_ALL
 	help
 	  Enable this option to add /sys/kernel/debug/selftest_helpers/test_fpu
 	  which will trigger a sequence of floating point operations. This is used
diff --git a/lib/Makefile b/lib/Makefile
index e7cbd54944a2..b9f28558c9bd 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -109,31 +109,10 @@ CFLAGS_test_fprobe.o += $(CC_FLAGS_FTRACE)
 obj-$(CONFIG_FPROBE_SANITY_TEST) += test_fprobe.o
 obj-$(CONFIG_TEST_OBJPOOL) += test_objpool.o
 
-#
-# CFLAGS for compiling floating point code inside the kernel. x86/Makefile turns
-# off the generation of FPU/SSE* instructions for kernel proper but FPU_FLAGS
-# get appended last to CFLAGS and thus override those previous compiler options.
-#
-FPU_CFLAGS := -msse -msse2
-ifdef CONFIG_CC_IS_GCC
-# Stack alignment mismatch, proceed with caution.
-# GCC < 7.1 cannot compile code using `double` and -mpreferred-stack-boundary=3
-# (8B stack alignment).
-# See https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53383
-#
-# The "-msse" in the first argument is there so that the
-# -mpreferred-stack-boundary=3 build error:
-#
-# -mpreferred-stack-boundary=3 is not between 4 and 12
-#
-# can be triggered. Otherwise gcc doesn't complain.
-FPU_CFLAGS += -mhard-float
-FPU_CFLAGS += $(call cc-option,-msse -mpreferred-stack-boundary=3,-mpreferred-stack-boundary=4)
-endif
-
 obj-$(CONFIG_TEST_FPU) += test_fpu.o
 test_fpu-y := test_fpu_glue.o test_fpu_impl.o
-CFLAGS_test_fpu_impl.o += $(FPU_CFLAGS)
+CFLAGS_test_fpu_impl.o += $(CC_FLAGS_FPU)
+CFLAGS_REMOVE_test_fpu_impl.o += $(CC_FLAGS_NO_FPU)
 
 obj-$(CONFIG_TEST_LIVEPATCH) += livepatch/
 
diff --git a/lib/test_fpu_glue.c b/lib/test_fpu_glue.c
index 85963d7be826..eef282a2715f 100644
--- a/lib/test_fpu_glue.c
+++ b/lib/test_fpu_glue.c
@@ -17,7 +17,7 @@
 #include
 #include
 #include
-#include
+#include
 
 #include "test_fpu.h"
 
@@ -38,6 +38,9 @@ static struct dentry *selftest_dir;
 
 static int __init test_fpu_init(void)
 {
+	if (!kernel_fpu_available())
+		return -EINVAL;
+
 	selftest_dir = debugfs_create_dir("selftest_helpers", NULL);
 	if (!selftest_dir)
 		return -ENOMEM;
-- 
2.42.0
[PATCH v2 13/14] selftests/fpu: Move FP code to a separate translation unit
This ensures no compiler-generated floating-point code can appear
outside kernel_fpu_{begin,end}() sections, and some architectures
enforce this separation.

Signed-off-by: Samuel Holland
---

Changes in v2:
 - Declare test_fpu() in a header

 lib/Makefile                        |  3 ++-
 lib/test_fpu.h                      |  8 +++
 lib/{test_fpu.c => test_fpu_glue.c} | 32 +
 lib/test_fpu_impl.c                 | 37 +
 4 files changed, 48 insertions(+), 32 deletions(-)
 create mode 100644 lib/test_fpu.h
 rename lib/{test_fpu.c => test_fpu_glue.c} (71%)
 create mode 100644 lib/test_fpu_impl.c

diff --git a/lib/Makefile b/lib/Makefile
index 6b09731d8e61..e7cbd54944a2 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -132,7 +132,8 @@ FPU_CFLAGS += $(call cc-option,-msse -mpreferred-stack-boundary=3,-mpreferred-st
 endif
 
 obj-$(CONFIG_TEST_FPU) += test_fpu.o
-CFLAGS_test_fpu.o += $(FPU_CFLAGS)
+test_fpu-y := test_fpu_glue.o test_fpu_impl.o
+CFLAGS_test_fpu_impl.o += $(FPU_CFLAGS)
 
 obj-$(CONFIG_TEST_LIVEPATCH) += livepatch/
 
diff --git a/lib/test_fpu.h b/lib/test_fpu.h
new file mode 100644
index ..4459807084bc
--- /dev/null
+++ b/lib/test_fpu.h
@@ -0,0 +1,8 @@
+/* SPDX-License-Identifier: GPL-2.0+ */
+
+#ifndef _LIB_TEST_FPU_H
+#define _LIB_TEST_FPU_H
+
+int test_fpu(void);
+
+#endif
diff --git a/lib/test_fpu.c b/lib/test_fpu_glue.c
similarity index 71%
rename from lib/test_fpu.c
rename to lib/test_fpu_glue.c
index e82db19fed84..85963d7be826 100644
--- a/lib/test_fpu.c
+++ b/lib/test_fpu_glue.c
@@ -19,37 +19,7 @@
 #include
 #include
 
-static int test_fpu(void)
-{
-	/*
-	 * This sequence of operations tests that rounding mode is
-	 * to nearest and that denormal numbers are supported.
-	 * Volatile variables are used to avoid compiler optimizing
-	 * the calculations away.
-	 */
-	volatile double a, b, c, d, e, f, g;
-
-	a = 4.0;
-	b = 1e-15;
-	c = 1e-310;
-
-	/* Sets precision flag */
-	d = a + b;
-
-	/* Result depends on rounding mode */
-	e = a + b / 2;
-
-	/* Denormal and very large values */
-	f = b / c;
-
-	/* Depends on denormal support */
-	g = a + c * f;
-
-	if (d > a && e > a && g > a)
-		return 0;
-	else
-		return -EINVAL;
-}
+#include "test_fpu.h"
 
 static int test_fpu_get(void *data, u64 *val)
 {
diff --git a/lib/test_fpu_impl.c b/lib/test_fpu_impl.c
new file mode 100644
index ..777894dbbe86
--- /dev/null
+++ b/lib/test_fpu_impl.c
@@ -0,0 +1,37 @@
+// SPDX-License-Identifier: GPL-2.0+
+
+#include
+
+#include "test_fpu.h"
+
+int test_fpu(void)
+{
+	/*
+	 * This sequence of operations tests that rounding mode is
+	 * to nearest and that denormal numbers are supported.
+	 * Volatile variables are used to avoid compiler optimizing
+	 * the calculations away.
+	 */
+	volatile double a, b, c, d, e, f, g;
+
+	a = 4.0;
+	b = 1e-15;
+	c = 1e-310;
+
+	/* Sets precision flag */
+	d = a + b;
+
+	/* Result depends on rounding mode */
+	e = a + b / 2;
+
+	/* Denormal and very large values */
+	f = b / c;
+
+	/* Depends on denormal support */
+	g = a + c * f;
+
+	if (d > a && e > a && g > a)
+		return 0;
+	else
+		return -EINVAL;
+}
-- 
2.42.0
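To make the arithmetic in test_fpu() easier to follow, here is a user-space rendition of the same checks with the reasoning spelled out in comments. It is only a sketch: the function name test_fpu_userspace() is hypothetical, it returns -1 where the kernel version returns -EINVAL, and it assumes IEEE 754 doubles and a build without fast-math style flags.

```c
/*
 * User-space rendition of the checks in test_fpu(). Returns 0 when the
 * FP environment rounds to nearest and keeps denormals, -1 otherwise.
 */
int test_fpu_userspace(void)
{
	/* volatile keeps the compiler from folding the arithmetic away */
	volatile double a, b, c, d, e, f, g;

	a = 4.0;
	b = 1e-15;
	c = 1e-310;	/* below the smallest normal double, so a denormal */

	/* One ulp of 4.0 is 2^-50 (about 8.9e-16); b is more than half of it. */
	d = a + b;	/* inexact, rounds to the next double above 4.0 */

	/* b / 2 = 5e-16 is just over half an ulp (about 4.4e-16). */
	e = a + b / 2;	/* rounds up under round-to-nearest, else may stay 4.0 */

	f = b / c;	/* about 1e295 when c is a real denormal, inf if c is 0 */

	g = a + c * f;	/* about 4 + 1e-15; becomes NaN (0 * inf) if c is 0 */

	return (d > a && e > a && g > a) ? 0 : -1;
}
```

On a typical hosted build with default compiler flags this is expected to return 0. Any comparison against NaN is false, which is how a flushed denormal makes the last check fail.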
[PATCH v2 12/14] drm/amd/display: Use ARCH_HAS_KERNEL_FPU_SUPPORT
Now that all previously-supported architectures select
ARCH_HAS_KERNEL_FPU_SUPPORT, this code can depend on that symbol instead
of the existing list of architectures. It can also take advantage of the
common kernel-mode FPU API and method of adjusting CFLAGS.

Signed-off-by: Samuel Holland
---

Changes in v2:
 - Split altivec removal to a separate patch
 - Use linux/fpu.h instead of asm/fpu.h in consumers

 drivers/gpu/drm/amd/display/Kconfig           |  2 +-
 .../gpu/drm/amd/display/amdgpu_dm/dc_fpu.c    | 27 ++
 drivers/gpu/drm/amd/display/dc/dml/Makefile   | 36 ++-
 drivers/gpu/drm/amd/display/dc/dml2/Makefile  | 36 ++-
 4 files changed, 7 insertions(+), 94 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/Kconfig b/drivers/gpu/drm/amd/display/Kconfig
index 901d1961b739..5fcd4f778dc3 100644
--- a/drivers/gpu/drm/amd/display/Kconfig
+++ b/drivers/gpu/drm/amd/display/Kconfig
@@ -8,7 +8,7 @@ config DRM_AMD_DC
 	depends on BROKEN || !CC_IS_CLANG || ARM64 || RISCV || SPARC64 || X86_64
 	select SND_HDA_COMPONENT if SND_HDA_CORE
 	# !CC_IS_CLANG: https://github.com/ClangBuiltLinux/linux/issues/1752
-	select DRM_AMD_DC_FP if (X86 || LOONGARCH || (PPC64 && ALTIVEC) || (ARM64 && KERNEL_MODE_NEON && !CC_IS_CLANG))
+	select DRM_AMD_DC_FP if ARCH_HAS_KERNEL_FPU_SUPPORT && (!ARM64 || !CC_IS_CLANG)
 	help
 	  Choose this option if you want to use the new display engine
 	  support for AMDGPU. This adds required support for Vega and
diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/dc_fpu.c b/drivers/gpu/drm/amd/display/amdgpu_dm/dc_fpu.c
index 0de16796466b..e46f8ce41d87 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/dc_fpu.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/dc_fpu.c
@@ -26,16 +26,7 @@
 
 #include "dc_trace.h"
 
-#if defined(CONFIG_X86)
-#include
-#elif defined(CONFIG_PPC64)
-#include
-#include
-#elif defined(CONFIG_ARM64)
-#include
-#elif defined(CONFIG_LOONGARCH)
-#include
-#endif
+#include
 
 /**
  * DOC: DC FPU manipulation overview
@@ -87,16 +78,9 @@ void dc_fpu_begin(const char *function_name, const int line)
 	WARN_ON_ONCE(!in_task());
 	preempt_disable();
 	depth = __this_cpu_inc_return(fpu_recursion_depth);
-
 	if (depth == 1) {
-#if defined(CONFIG_X86) || defined(CONFIG_LOONGARCH)
+		BUG_ON(!kernel_fpu_available());
 		kernel_fpu_begin();
-#elif defined(CONFIG_PPC64)
-		if (!cpu_has_feature(CPU_FTR_FPU_UNAVAILABLE))
-			enable_kernel_fp();
-#elif defined(CONFIG_ARM64)
-		kernel_neon_begin();
-#endif
 	}
 
 	TRACE_DCN_FPU(true, function_name, line, depth);
@@ -118,14 +102,7 @@ void dc_fpu_end(const char *function_name, const int line)
 
 	depth = __this_cpu_dec_return(fpu_recursion_depth);
 	if (depth == 0) {
-#if defined(CONFIG_X86) || defined(CONFIG_LOONGARCH)
 		kernel_fpu_end();
-#elif defined(CONFIG_PPC64)
-		if (!cpu_has_feature(CPU_FTR_FPU_UNAVAILABLE))
-			disable_kernel_fp();
-#elif defined(CONFIG_ARM64)
-		kernel_neon_end();
-#endif
 	} else {
 		WARN_ON_ONCE(depth < 0);
 	}
diff --git a/drivers/gpu/drm/amd/display/dc/dml/Makefile b/drivers/gpu/drm/amd/display/dc/dml/Makefile
index 554c39024a40..be15d366b786 100644
--- a/drivers/gpu/drm/amd/display/dc/dml/Makefile
+++ b/drivers/gpu/drm/amd/display/dc/dml/Makefile
@@ -25,40 +25,8 @@
 # It provides the general basic services required by other DAL
 # subcomponents.
 
-ifdef CONFIG_X86
-dml_ccflags-$(CONFIG_CC_IS_GCC) := -mhard-float
-dml_ccflags := $(dml_ccflags-y) -msse
-endif
-
-ifdef CONFIG_PPC64
-dml_ccflags := -mhard-float
-endif
-
-ifdef CONFIG_ARM64
-dml_rcflags := -mgeneral-regs-only
-endif
-
-ifdef CONFIG_LOONGARCH
-dml_ccflags := -mfpu=64
-dml_rcflags := -msoft-float
-endif
-
-ifdef CONFIG_CC_IS_GCC
-ifneq ($(call gcc-min-version, 70100),y)
-IS_OLD_GCC = 1
-endif
-endif
-
-ifdef CONFIG_X86
-ifdef IS_OLD_GCC
-# Stack alignment mismatch, proceed with caution.
-# GCC < 7.1 cannot compile code using `double` and -mpreferred-stack-boundary=3
-# (8B stack alignment).
-dml_ccflags += -mpreferred-stack-boundary=4
-else
-dml_ccflags += -msse2
-endif
-endif
+dml_ccflags := $(CC_FLAGS_FPU)
+dml_rcflags := $(CC_FLAGS_NO_FPU)
 
 ifneq ($(CONFIG_FRAME_WARN),0)
 ifeq ($(filter y,$(CONFIG_KASAN)$(CONFIG_KCSAN)),y)
diff --git a/drivers/gpu/drm/amd/display/dc/dml2/Makefile b/drivers/gpu/drm/amd/display/dc/dml2/Makefile
index 7b51364084b5..4f6c804a26ad 100644
--- a/drivers/gpu/drm/amd/display/dc/dml2/Makefile
+++ b/drivers/gpu/drm/amd/display/dc/dml2/Makefile
@@ -24,40 +24,8 @@
 #
 # Makefile for dml2.
 
-ifdef CONFIG_X86
-dml2_ccflags-$(CONFIG_CC_IS_GCC) := -mhard-float
-dml2_ccflags := $(dml2_ccflags-y) -msse
-endif
-
-ifdef CONFIG_PPC64
-dml2_ccflags := -mhard-float
-endif
-
-ifdef C
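The nesting logic in dc_fpu_begin()/dc_fpu_end() is easy to lose among the removed #ifdef lines, so here is a stand-alone sketch of just the depth counter. Everything in it is a stand-in and an assumption of this illustration: the fake_kernel_fpu_*() functions only count calls, a plain int replaces the per-CPU fpu_recursion_depth, and preemption handling, tracing and the kernel_fpu_available() check are left out.

```c
#include <assert.h>

/* Stand-ins for kernel_fpu_begin()/kernel_fpu_end(); they only count calls. */
int hw_begin_calls;
int hw_end_calls;

static void fake_kernel_fpu_begin(void) { hw_begin_calls++; }
static void fake_kernel_fpu_end(void)   { hw_end_calls++; }

/* Plain int instead of the driver's per-CPU fpu_recursion_depth. */
static int fpu_depth;

/* Only the outermost begin touches the (fake) FPU state. */
int fpu_section_begin(void)
{
	int depth = ++fpu_depth;

	if (depth == 1)
		fake_kernel_fpu_begin();

	return depth;
}

/* Only the end that brings the depth back to zero releases it. */
int fpu_section_end(void)
{
	int depth = --fpu_depth;

	assert(depth >= 0);	/* the driver uses WARN_ON_ONCE(depth < 0) */
	if (depth == 0)
		fake_kernel_fpu_end();

	return depth;
}
```

With two nested begin/end pairs, the fake begin and end hooks should each run exactly once, which is the property the driver relies on when DC_FP_START()-style sections nest.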
[PATCH v2 11/14] drm/amd/display: Only use hard-float, not altivec on powerpc
From: Michael Ellerman

The compiler flags enable altivec, but that is not required; hard-float
is sufficient for the code to build and function. Drop altivec from the
compiler flags and adjust the enable/disable code to only enable FPU
use.

Signed-off-by: Michael Ellerman
Signed-off-by: Samuel Holland
---

Changes in v2:
 - New patch for v2

 drivers/gpu/drm/amd/display/amdgpu_dm/dc_fpu.c | 12 ++--
 drivers/gpu/drm/amd/display/dc/dml/Makefile    |  2 +-
 drivers/gpu/drm/amd/display/dc/dml2/Makefile   |  2 +-
 3 files changed, 4 insertions(+), 12 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/dc_fpu.c b/drivers/gpu/drm/amd/display/amdgpu_dm/dc_fpu.c
index 4ae4720535a5..0de16796466b 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/dc_fpu.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/dc_fpu.c
@@ -92,11 +92,7 @@ void dc_fpu_begin(const char *function_name, const int line)
 #if defined(CONFIG_X86) || defined(CONFIG_LOONGARCH)
 		kernel_fpu_begin();
 #elif defined(CONFIG_PPC64)
-		if (cpu_has_feature(CPU_FTR_VSX_COMP))
-			enable_kernel_vsx();
-		else if (cpu_has_feature(CPU_FTR_ALTIVEC_COMP))
-			enable_kernel_altivec();
-		else if (!cpu_has_feature(CPU_FTR_FPU_UNAVAILABLE))
+		if (!cpu_has_feature(CPU_FTR_FPU_UNAVAILABLE))
 			enable_kernel_fp();
 #elif defined(CONFIG_ARM64)
 		kernel_neon_begin();
@@ -125,11 +121,7 @@ void dc_fpu_end(const char *function_name, const int line)
 #if defined(CONFIG_X86) || defined(CONFIG_LOONGARCH)
 		kernel_fpu_end();
 #elif defined(CONFIG_PPC64)
-		if (cpu_has_feature(CPU_FTR_VSX_COMP))
-			disable_kernel_vsx();
-		else if (cpu_has_feature(CPU_FTR_ALTIVEC_COMP))
-			disable_kernel_altivec();
-		else if (!cpu_has_feature(CPU_FTR_FPU_UNAVAILABLE))
+		if (!cpu_has_feature(CPU_FTR_FPU_UNAVAILABLE))
 			disable_kernel_fp();
 #elif defined(CONFIG_ARM64)
 		kernel_neon_end();
diff --git a/drivers/gpu/drm/amd/display/dc/dml/Makefile b/drivers/gpu/drm/amd/display/dc/dml/Makefile
index 6042a5a6a44f..554c39024a40 100644
--- a/drivers/gpu/drm/amd/display/dc/dml/Makefile
+++ b/drivers/gpu/drm/amd/display/dc/dml/Makefile
@@ -31,7 +31,7 @@ dml_ccflags := $(dml_ccflags-y) -msse
 endif
 
 ifdef CONFIG_PPC64
-dml_ccflags := -mhard-float -maltivec
+dml_ccflags := -mhard-float
 endif
 
 ifdef CONFIG_ARM64
diff --git a/drivers/gpu/drm/amd/display/dc/dml2/Makefile b/drivers/gpu/drm/amd/display/dc/dml2/Makefile
index acff3449b8d7..7b51364084b5 100644
--- a/drivers/gpu/drm/amd/display/dc/dml2/Makefile
+++ b/drivers/gpu/drm/amd/display/dc/dml2/Makefile
@@ -30,7 +30,7 @@ dml2_ccflags := $(dml2_ccflags-y) -msse
 endif
 
 ifdef CONFIG_PPC64
-dml2_ccflags := -mhard-float -maltivec
+dml2_ccflags := -mhard-float
 endif
 
 ifdef CONFIG_ARM64
-- 
2.42.0
[PATCH v2 10/14] riscv: Add support for kernel-mode FPU
This is motivated by the amdgpu DRM driver, which needs floating-point
code to support recent hardware. That code is not performance-critical,
so only provide a minimal non-preemptible implementation for now.

Signed-off-by: Samuel Holland
---

Changes in v2:
 - Remove RISC-V architecture-specific preprocessor check

 arch/riscv/Kconfig                  |  1 +
 arch/riscv/Makefile                 |  3 +++
 arch/riscv/include/asm/fpu.h        | 16 
 arch/riscv/kernel/Makefile          |  1 +
 arch/riscv/kernel/kernel_mode_fpu.c | 28 
 5 files changed, 49 insertions(+)
 create mode 100644 arch/riscv/include/asm/fpu.h
 create mode 100644 arch/riscv/kernel/kernel_mode_fpu.c

diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
index 24c1799e2ec4..4d4d1d64ce34 100644
--- a/arch/riscv/Kconfig
+++ b/arch/riscv/Kconfig
@@ -27,6 +27,7 @@ config RISCV
 	select ARCH_HAS_GCOV_PROFILE_ALL
 	select ARCH_HAS_GIGANTIC_PAGE
 	select ARCH_HAS_KCOV
+	select ARCH_HAS_KERNEL_FPU_SUPPORT if FPU
 	select ARCH_HAS_MMIOWB
 	select ARCH_HAS_NON_OVERLAPPING_ADDRESS_SPACE
 	select ARCH_HAS_PMEM_API
diff --git a/arch/riscv/Makefile b/arch/riscv/Makefile
index a74be78678eb..2e719c369210 100644
--- a/arch/riscv/Makefile
+++ b/arch/riscv/Makefile
@@ -81,6 +81,9 @@ KBUILD_CFLAGS += -march=$(shell echo $(riscv-march-y) | sed -E 's/(rv32ima|rv64i
 
 KBUILD_AFLAGS += -march=$(riscv-march-y)
 
+# For C code built with floating-point support, exclude V but keep F and D.
+CC_FLAGS_FPU := -march=$(shell echo $(riscv-march-y) | sed -E 's/(rv32ima|rv64ima)([^v_]*)v?/\1\2/')
+
 KBUILD_CFLAGS += -mno-save-restore
 KBUILD_CFLAGS += -DCONFIG_PAGE_OFFSET=$(CONFIG_PAGE_OFFSET)
 
diff --git a/arch/riscv/include/asm/fpu.h b/arch/riscv/include/asm/fpu.h
new file mode 100644
index ..91c04c244e12
--- /dev/null
+++ b/arch/riscv/include/asm/fpu.h
@@ -0,0 +1,16 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Copyright (C) 2023 SiFive
+ */
+
+#ifndef _ASM_RISCV_FPU_H
+#define _ASM_RISCV_FPU_H
+
+#include
+
+#define kernel_fpu_available()	has_fpu()
+
+void kernel_fpu_begin(void);
+void kernel_fpu_end(void);
+
+#endif /* ! _ASM_RISCV_FPU_H */
diff --git a/arch/riscv/kernel/Makefile b/arch/riscv/kernel/Makefile
index fee22a3d1b53..662c483e338d 100644
--- a/arch/riscv/kernel/Makefile
+++ b/arch/riscv/kernel/Makefile
@@ -62,6 +62,7 @@ obj-$(CONFIG_MMU) += vdso.o vdso/
 
 obj-$(CONFIG_RISCV_MISALIGNED) += traps_misaligned.o
 obj-$(CONFIG_FPU) += fpu.o
+obj-$(CONFIG_FPU) += kernel_mode_fpu.o
 obj-$(CONFIG_RISCV_ISA_V) += vector.o
 obj-$(CONFIG_SMP) += smpboot.o
 obj-$(CONFIG_SMP) += smp.o
diff --git a/arch/riscv/kernel/kernel_mode_fpu.c b/arch/riscv/kernel/kernel_mode_fpu.c
new file mode 100644
index ..0ac8348876c4
--- /dev/null
+++ b/arch/riscv/kernel/kernel_mode_fpu.c
@@ -0,0 +1,28 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (C) 2023 SiFive
+ */
+
+#include
+#include
+
+#include
+#include
+#include
+#include
+
+void kernel_fpu_begin(void)
+{
+	preempt_disable();
+	fstate_save(current, task_pt_regs(current));
+	csr_set(CSR_SSTATUS, SR_FS);
+}
+EXPORT_SYMBOL_GPL(kernel_fpu_begin);
+
+void kernel_fpu_end(void)
+{
+	csr_clear(CSR_SSTATUS, SR_FS);
+	fstate_restore(current, task_pt_regs(current));
+	preempt_enable();
+}
+EXPORT_SYMBOL_GPL(kernel_fpu_end);
-- 
2.42.0
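The sed expression behind CC_FLAGS_FPU in this patch is terse, so here is a hedged C sketch of what it does to a -march string: it keeps the base prefix and the single-letter extensions (including F and D) and drops a single-letter "v" that follows them, leaving any underscore-separated suffix alone. The function strip_vector_ext() is hypothetical and assumes the string starts with rv32ima or rv64ima; the real sed pattern is unanchored and the example -march values are illustrative, not taken from the patch.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper mirroring
 *   sed -E 's/(rv32ima|rv64ima)([^v_]*)v?/\1\2/'
 * for strings that start with the base prefix. Copies march into out
 * (outlen bytes including the NUL), dropping a single-letter 'v' that
 * directly follows the other single-letter extensions.
 * Returns 0 on success, -1 if out is too small.
 */
int strip_vector_ext(const char *march, char *out, size_t outlen)
{
	const size_t prefix_len = 7;	/* strlen("rv32ima") */
	size_t len = strlen(march);
	size_t i;

	if (len + 1 > outlen)
		return -1;

	if (strncmp(march, "rv32ima", prefix_len) != 0 &&
	    strncmp(march, "rv64ima", prefix_len) != 0) {
		memcpy(out, march, len + 1);	/* no match: copy unchanged */
		return 0;
	}

	/* [^v_]*: walk the single-letter extensions up to a 'v' or '_' */
	i = prefix_len;
	while (march[i] != '\0' && march[i] != 'v' && march[i] != '_')
		i++;

	memcpy(out, march, i);
	if (march[i] == 'v')		/* v?: drop the vector letter */
		strcpy(out + i, march + i + 1);
	else
		strcpy(out + i, march + i);

	return 0;
}
```

For example, an input of "rv64imafdcv_zicsr_zifencei" should come back as "rv64imafdc_zicsr_zifencei", while a string without the vector letter is returned unchanged.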
[PATCH v2 09/14] x86: Implement ARCH_HAS_KERNEL_FPU_SUPPORT
x86 already provides kernel_fpu_begin() and kernel_fpu_end(), but in a
different header. Add a wrapper header, and export the CFLAGS
adjustments as found in lib/Makefile.

Reviewed-by: Christoph Hellwig
Signed-off-by: Samuel Holland
---

(no changes since v1)

 arch/x86/Kconfig           |  1 +
 arch/x86/Makefile          | 20 
 arch/x86/include/asm/fpu.h | 13 +
 3 files changed, 34 insertions(+)
 create mode 100644 arch/x86/include/asm/fpu.h

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 3762f41bb092..1fe7f2d8d017 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -81,6 +81,7 @@ config X86
 	select ARCH_HAS_FORTIFY_SOURCE
 	select ARCH_HAS_GCOV_PROFILE_ALL
 	select ARCH_HAS_KCOV			if X86_64
+	select ARCH_HAS_KERNEL_FPU_SUPPORT
 	select ARCH_HAS_MEM_ENCRYPT
 	select ARCH_HAS_MEMBARRIER_SYNC_CORE
 	select ARCH_HAS_NMI_SAFE_THIS_CPU_OPS
diff --git a/arch/x86/Makefile b/arch/x86/Makefile
index 1a068de12a56..71576c8dbe79 100644
--- a/arch/x86/Makefile
+++ b/arch/x86/Makefile
@@ -70,6 +70,26 @@ export BITS
 KBUILD_CFLAGS += -mno-sse -mno-mmx -mno-sse2 -mno-3dnow -mno-avx
 KBUILD_RUSTFLAGS += -Ctarget-feature=-sse,-sse2,-sse3,-ssse3,-sse4.1,-sse4.2,-avx,-avx2
 
+#
+# CFLAGS for compiling floating point code inside the kernel.
+#
+CC_FLAGS_FPU := -msse -msse2
+ifdef CONFIG_CC_IS_GCC
+# Stack alignment mismatch, proceed with caution.
+# GCC < 7.1 cannot compile code using `double` and -mpreferred-stack-boundary=3
+# (8B stack alignment).
+# See https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53383
+#
+# The "-msse" in the first argument is there so that the
+# -mpreferred-stack-boundary=3 build error:
+#
+# -mpreferred-stack-boundary=3 is not between 4 and 12
+#
+# can be triggered. Otherwise gcc doesn't complain.
+CC_FLAGS_FPU += -mhard-float
+CC_FLAGS_FPU += $(call cc-option,-msse -mpreferred-stack-boundary=3,-mpreferred-stack-boundary=4)
+endif
+
 ifeq ($(CONFIG_X86_KERNEL_IBT),y)
 #
 # Kernel IBT has S_CET.NOTRACK_EN=0, as such the compilers must not generate
diff --git a/arch/x86/include/asm/fpu.h b/arch/x86/include/asm/fpu.h
new file mode 100644
index ..b2743fe19339
--- /dev/null
+++ b/arch/x86/include/asm/fpu.h
@@ -0,0 +1,13 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Copyright (C) 2023 SiFive
+ */
+
+#ifndef _ASM_X86_FPU_H
+#define _ASM_X86_FPU_H
+
+#include
+
+#define kernel_fpu_available()	true
+
+#endif /* ! _ASM_X86_FPU_H */
-- 
2.42.0
[PATCH v2 08/14] powerpc: Implement ARCH_HAS_KERNEL_FPU_SUPPORT
PowerPC provides an equivalent to the common kernel-mode FPU API, but in
a different header and using different function names. The PowerPC API
also requires a non-preemptible context. Add a wrapper header, and
export the CFLAGS adjustments.

Reviewed-by: Christoph Hellwig
Signed-off-by: Samuel Holland
---

(no changes since v1)

 arch/powerpc/Kconfig           |  1 +
 arch/powerpc/Makefile          |  5 -
 arch/powerpc/include/asm/fpu.h | 28 
 3 files changed, 33 insertions(+), 1 deletion(-)
 create mode 100644 arch/powerpc/include/asm/fpu.h

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 6f105ee4f3cf..e96cb5b7c571 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -137,6 +137,7 @@ config PPC
 	select ARCH_HAS_GCOV_PROFILE_ALL
 	select ARCH_HAS_HUGEPD if HUGETLB_PAGE
 	select ARCH_HAS_KCOV
+	select ARCH_HAS_KERNEL_FPU_SUPPORT if PPC_FPU
 	select ARCH_HAS_MEMBARRIER_CALLBACKS
 	select ARCH_HAS_MEMBARRIER_SYNC_CORE
 	select ARCH_HAS_MEMREMAP_COMPAT_ALIGN if PPC_64S_HASH_MMU
diff --git a/arch/powerpc/Makefile b/arch/powerpc/Makefile
index f19dbaa1d541..91106970a8c1 100644
--- a/arch/powerpc/Makefile
+++ b/arch/powerpc/Makefile
@@ -142,6 +142,9 @@ CFLAGS-$(CONFIG_PPC32) += $(call cc-option, $(MULTIPLEWORD))
 
 CFLAGS-$(CONFIG_PPC32) += $(call cc-option,-mno-readonly-in-sdata)
 
+CC_FLAGS_FPU := $(call cc-option,-mhard-float)
+CC_FLAGS_NO_FPU	:= $(call cc-option,-msoft-float)
+
 ifdef CONFIG_FUNCTION_TRACER
 ifdef CONFIG_ARCH_USING_PATCHABLE_FUNCTION_ENTRY
 KBUILD_CPPFLAGS	+= -DCC_USING_PATCHABLE_FUNCTION_ENTRY
@@ -163,7 +166,7 @@ asinstr := $(call as-instr,lis 9$(comma)foo@high,-DHAVE_AS_ATHIGH=1)
 
 KBUILD_CPPFLAGS	+= -I $(srctree)/arch/$(ARCH) $(asinstr)
 KBUILD_AFLAGS += $(AFLAGS-y)
-KBUILD_CFLAGS += $(call cc-option,-msoft-float)
+KBUILD_CFLAGS += $(CC_FLAGS_NO_FPU)
 KBUILD_CFLAGS += $(CFLAGS-y)
 CPP		= $(CC) -E $(KBUILD_CFLAGS)
 
diff --git a/arch/powerpc/include/asm/fpu.h b/arch/powerpc/include/asm/fpu.h
new file mode 100644
index ..ca584e4bc40f
--- /dev/null
+++ b/arch/powerpc/include/asm/fpu.h
@@ -0,0 +1,28 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Copyright (C) 2023 SiFive
+ */
+
+#ifndef _ASM_POWERPC_FPU_H
+#define _ASM_POWERPC_FPU_H
+
+#include
+
+#include
+#include
+
+#define kernel_fpu_available()	(!cpu_has_feature(CPU_FTR_FPU_UNAVAILABLE))
+
+static inline void kernel_fpu_begin(void)
+{
+	preempt_disable();
+	enable_kernel_fp();
+}
+
+static inline void kernel_fpu_end(void)
+{
+	disable_kernel_fp();
+	preempt_enable();
+}
+
+#endif /* ! _ASM_POWERPC_FPU_H */
-- 
2.42.0
[PATCH v2 07/14] LoongArch: Implement ARCH_HAS_KERNEL_FPU_SUPPORT
LoongArch already provides kernel_fpu_begin() and kernel_fpu_end() in
asm/fpu.h, so it only needs to add kernel_fpu_available() and export
the CFLAGS adjustments.

Acked-by: WANG Xuerui
Reviewed-by: Christoph Hellwig
Signed-off-by: Samuel Holland
---

(no changes since v1)

 arch/loongarch/Kconfig           | 1 +
 arch/loongarch/Makefile          | 5 -
 arch/loongarch/include/asm/fpu.h | 1 +
 3 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/arch/loongarch/Kconfig b/arch/loongarch/Kconfig
index ee123820a476..65d4475565b8 100644
--- a/arch/loongarch/Kconfig
+++ b/arch/loongarch/Kconfig
@@ -15,6 +15,7 @@ config LOONGARCH
 	select ARCH_HAS_CPU_FINALIZE_INIT
 	select ARCH_HAS_FORTIFY_SOURCE
 	select ARCH_HAS_KCOV
+	select ARCH_HAS_KERNEL_FPU_SUPPORT if CPU_HAS_FPU
 	select ARCH_HAS_NMI_SAFE_THIS_CPU_OPS
 	select ARCH_HAS_NON_OVERLAPPING_ADDRESS_SPACE
 	select ARCH_HAS_PTE_SPECIAL
diff --git a/arch/loongarch/Makefile b/arch/loongarch/Makefile
index 4ba8d67ddb09..1afe28feaba5 100644
--- a/arch/loongarch/Makefile
+++ b/arch/loongarch/Makefile
@@ -25,6 +25,9 @@ endif
 32bit-emul = elf32loongarch
 64bit-emul = elf64loongarch
 
+CC_FLAGS_FPU := -mfpu=64
+CC_FLAGS_NO_FPU	:= -msoft-float
+
 ifdef CONFIG_DYNAMIC_FTRACE
 KBUILD_CPPFLAGS += -DCC_USING_PATCHABLE_FUNCTION_ENTRY
 CC_FLAGS_FTRACE := -fpatchable-function-entry=2
@@ -46,7 +49,7 @@ ld-emul = $(64bit-emul)
 cflags-y += -mabi=lp64s
 endif
 
-cflags-y += -pipe -msoft-float
+cflags-y += -pipe $(CC_FLAGS_NO_FPU)
 LDFLAGS_vmlinux	+= -static -n -nostdlib
 
 # When the assembler supports explicit relocation hint, we must use it.
diff --git a/arch/loongarch/include/asm/fpu.h b/arch/loongarch/include/asm/fpu.h
index c2d8962fda00..3177674228f8 100644
--- a/arch/loongarch/include/asm/fpu.h
+++ b/arch/loongarch/include/asm/fpu.h
@@ -21,6 +21,7 @@
 
 struct sigcontext;
 
+#define kernel_fpu_available() cpu_has_fpu
 extern void kernel_fpu_begin(void);
 extern void kernel_fpu_end(void);
 
-- 
2.42.0
[PATCH v2 06/14] lib/raid6: Use CC_FLAGS_FPU for NEON CFLAGS
Now that CC_FLAGS_FPU is exported and can be used anywhere in the source
tree, use it instead of duplicating the flags here.

Reviewed-by: Christoph Hellwig
Signed-off-by: Samuel Holland
---

(no changes since v1)

 lib/raid6/Makefile | 31 ---
 1 file changed, 8 insertions(+), 23 deletions(-)

diff --git a/lib/raid6/Makefile b/lib/raid6/Makefile
index 1c5420ff254e..309fea97efc6 100644
--- a/lib/raid6/Makefile
+++ b/lib/raid6/Makefile
@@ -33,25 +33,6 @@ CFLAGS_REMOVE_vpermxor8.o += -msoft-float
 endif
 endif
 
-# The GCC option -ffreestanding is required in order to compile code containing
-# ARM/NEON intrinsics in a non C99-compliant environment (such as the kernel)
-ifeq ($(CONFIG_KERNEL_MODE_NEON),y)
-NEON_FLAGS := -ffreestanding
-# Enable
-NEON_FLAGS += -isystem $(shell $(CC) -print-file-name=include)
-ifeq ($(ARCH),arm)
-NEON_FLAGS += -march=armv7-a -mfloat-abi=softfp -mfpu=neon
-endif
-CFLAGS_recov_neon_inner.o += $(NEON_FLAGS)
-ifeq ($(ARCH),arm64)
-CFLAGS_REMOVE_recov_neon_inner.o += -mgeneral-regs-only
-CFLAGS_REMOVE_neon1.o += -mgeneral-regs-only
-CFLAGS_REMOVE_neon2.o += -mgeneral-regs-only
-CFLAGS_REMOVE_neon4.o += -mgeneral-regs-only
-CFLAGS_REMOVE_neon8.o += -mgeneral-regs-only
-endif
-endif
-
 quiet_cmd_unroll = UNROLL $@
       cmd_unroll = $(AWK) -v N=$* -f $(srctree)/$(src)/unroll.awk < $< > $@
 
@@ -75,10 +56,14 @@ targets += vpermxor1.c vpermxor2.c vpermxor4.c vpermxor8.c
 $(obj)/vpermxor%.c: $(src)/vpermxor.uc $(src)/unroll.awk FORCE
 	$(call if_changed,unroll)
 
-CFLAGS_neon1.o += $(NEON_FLAGS)
-CFLAGS_neon2.o += $(NEON_FLAGS)
-CFLAGS_neon4.o += $(NEON_FLAGS)
-CFLAGS_neon8.o += $(NEON_FLAGS)
+CFLAGS_neon1.o += $(CC_FLAGS_FPU)
+CFLAGS_neon2.o += $(CC_FLAGS_FPU)
+CFLAGS_neon4.o += $(CC_FLAGS_FPU)
+CFLAGS_neon8.o += $(CC_FLAGS_FPU)
+CFLAGS_REMOVE_neon1.o += $(CC_FLAGS_NO_FPU)
+CFLAGS_REMOVE_neon2.o += $(CC_FLAGS_NO_FPU)
+CFLAGS_REMOVE_neon4.o += $(CC_FLAGS_NO_FPU)
+CFLAGS_REMOVE_neon8.o += $(CC_FLAGS_NO_FPU)
 targets += neon1.c neon2.c neon4.c neon8.c
 $(obj)/neon%.c: $(src)/neon.uc $(src)/unroll.awk FORCE
 	$(call if_changed,unroll)
-- 
2.42.0
[PATCH v2 03/14] ARM: crypto: Use CC_FLAGS_FPU for NEON CFLAGS
Now that CC_FLAGS_FPU is exported and can be used anywhere in the source
tree, use it instead of duplicating the flags here.

Reviewed-by: Christoph Hellwig
Signed-off-by: Samuel Holland
---

(no changes since v1)

 arch/arm/lib/Makefile | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/arch/arm/lib/Makefile b/arch/arm/lib/Makefile
index 650404be6768..0ca5aae1bcc3 100644
--- a/arch/arm/lib/Makefile
+++ b/arch/arm/lib/Makefile
@@ -40,8 +40,7 @@ $(obj)/csumpartialcopy.o: $(obj)/csumpartialcopygeneric.S
 $(obj)/csumpartialcopyuser.o: $(obj)/csumpartialcopygeneric.S
 
 ifeq ($(CONFIG_KERNEL_MODE_NEON),y)
-  NEON_FLAGS := -march=armv7-a -mfloat-abi=softfp -mfpu=neon
-  CFLAGS_xor-neon.o	+= $(NEON_FLAGS)
+  CFLAGS_xor-neon.o	+= $(CC_FLAGS_FPU)
   obj-$(CONFIG_XOR_BLOCKS) += xor-neon.o
 endif
 
-- 
2.42.0
[PATCH v2 05/14] arm64: crypto: Use CC_FLAGS_FPU for NEON CFLAGS
Now that CC_FLAGS_FPU is exported and can be used anywhere in the source
tree, use it instead of duplicating the flags here.

Signed-off-by: Samuel Holland
---

Changes in v2:
 - New patch for v2

 arch/arm64/lib/Makefile | 6 ++
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/arch/arm64/lib/Makefile b/arch/arm64/lib/Makefile
index 29490be2546b..13e6a2829116 100644
--- a/arch/arm64/lib/Makefile
+++ b/arch/arm64/lib/Makefile
@@ -7,10 +7,8 @@ lib-y := clear_user.o delay.o copy_from_user.o \
 
 ifeq ($(CONFIG_KERNEL_MODE_NEON), y)
 obj-$(CONFIG_XOR_BLOCKS) += xor-neon.o
-CFLAGS_REMOVE_xor-neon.o += -mgeneral-regs-only
-CFLAGS_xor-neon.o += -ffreestanding
-# Enable
-CFLAGS_xor-neon.o += -isystem $(shell $(CC) -print-file-name=include)
+CFLAGS_xor-neon.o += $(CC_FLAGS_FPU)
+CFLAGS_REMOVE_xor-neon.o += $(CC_FLAGS_NO_FPU)
 endif
 
 lib-$(CONFIG_ARCH_HAS_UACCESS_FLUSHCACHE) += uaccess_flushcache.o
-- 
2.42.0
[PATCH v2 04/14] arm64: Implement ARCH_HAS_KERNEL_FPU_SUPPORT
arm64 provides an equivalent to the common kernel-mode FPU API, but in a different header and using different function names. Add a wrapper header, and export CFLAGS adjustments as found in lib/raid6/Makefile. Reviewed-by: Christoph Hellwig Signed-off-by: Samuel Holland --- Changes in v2: - Remove file name from header comment arch/arm64/Kconfig | 1 + arch/arm64/Makefile | 9 - arch/arm64/include/asm/fpu.h | 15 +++ 3 files changed, 24 insertions(+), 1 deletion(-) create mode 100644 arch/arm64/include/asm/fpu.h diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig index 7b071a00425d..485ac389ac11 100644 --- a/arch/arm64/Kconfig +++ b/arch/arm64/Kconfig @@ -30,6 +30,7 @@ config ARM64 select ARCH_HAS_GCOV_PROFILE_ALL select ARCH_HAS_GIGANTIC_PAGE select ARCH_HAS_KCOV + select ARCH_HAS_KERNEL_FPU_SUPPORT if KERNEL_MODE_NEON select ARCH_HAS_KEEPINITRD select ARCH_HAS_MEMBARRIER_SYNC_CORE select ARCH_HAS_NMI_SAFE_THIS_CPU_OPS diff --git a/arch/arm64/Makefile b/arch/arm64/Makefile index 9a2d3723cd0f..4a65f24c7998 100644 --- a/arch/arm64/Makefile +++ b/arch/arm64/Makefile @@ -36,7 +36,14 @@ ifeq ($(CONFIG_BROKEN_GAS_INST),y) $(warning Detected assembler with broken .inst; disassembly will be unreliable) endif -KBUILD_CFLAGS += -mgeneral-regs-only \ +# The GCC option -ffreestanding is required in order to compile code containing +# ARM/NEON intrinsics in a non C99-compliant environment (such as the kernel) +CC_FLAGS_FPU := -ffreestanding +# Enable +CC_FLAGS_FPU += -isystem $(shell $(CC) -print-file-name=include) +CC_FLAGS_NO_FPU:= -mgeneral-regs-only + +KBUILD_CFLAGS += $(CC_FLAGS_NO_FPU) \ $(compat_vdso) $(cc_has_k_constraint) KBUILD_CFLAGS += $(call cc-disable-warning, psabi) KBUILD_AFLAGS += $(compat_vdso) diff --git a/arch/arm64/include/asm/fpu.h b/arch/arm64/include/asm/fpu.h new file mode 100644 index ..2ae50bdce59b --- /dev/null +++ b/arch/arm64/include/asm/fpu.h @@ -0,0 +1,15 @@ +/* SPDX-License-Identifier: GPL-2.0-only */ +/* + * Copyright (C) 2023 SiFive + */ + 
+#ifndef __ASM_FPU_H +#define __ASM_FPU_H + +#include + +#define kernel_fpu_available() cpu_has_neon() +#define kernel_fpu_begin() kernel_neon_begin() +#define kernel_fpu_end() kernel_neon_end() + +#endif /* ! __ASM_FPU_H */ -- 2.42.0
[PATCH v2 02/14] ARM: Implement ARCH_HAS_KERNEL_FPU_SUPPORT
ARM provides an equivalent to the common kernel-mode FPU API, but in a different header and using different function names. Add a wrapper header, and export CFLAGS adjustments as found in lib/raid6/Makefile. Reviewed-by: Christoph Hellwig Signed-off-by: Samuel Holland --- Changes in v2: - Remove file name from header comment arch/arm/Kconfig | 1 + arch/arm/Makefile | 7 +++ arch/arm/include/asm/fpu.h | 15 +++ 3 files changed, 23 insertions(+) create mode 100644 arch/arm/include/asm/fpu.h diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig index f8567e95f98b..92e21a4a2903 100644 --- a/arch/arm/Kconfig +++ b/arch/arm/Kconfig @@ -14,6 +14,7 @@ config ARM select ARCH_HAS_FORTIFY_SOURCE select ARCH_HAS_KEEPINITRD select ARCH_HAS_KCOV + select ARCH_HAS_KERNEL_FPU_SUPPORT if KERNEL_MODE_NEON select ARCH_HAS_MEMBARRIER_SYNC_CORE select ARCH_HAS_NON_OVERLAPPING_ADDRESS_SPACE select ARCH_HAS_PTE_SPECIAL if ARM_LPAE diff --git a/arch/arm/Makefile b/arch/arm/Makefile index 5ba42f69f8ce..1dd860dba5f5 100644 --- a/arch/arm/Makefile +++ b/arch/arm/Makefile @@ -130,6 +130,13 @@ endif # Accept old syntax despite ".syntax unified" AFLAGS_NOWARN :=$(call as-option,-Wa$(comma)-mno-warn-deprecated,-Wa$(comma)-W) +# The GCC option -ffreestanding is required in order to compile code containing +# ARM/NEON intrinsics in a non C99-compliant environment (such as the kernel) +CC_FLAGS_FPU := -ffreestanding +# Enable +CC_FLAGS_FPU += -isystem $(shell $(CC) -print-file-name=include) +CC_FLAGS_FPU += -march=armv7-a -mfloat-abi=softfp -mfpu=neon + ifeq ($(CONFIG_THUMB2_KERNEL),y) CFLAGS_ISA :=-Wa,-mimplicit-it=always $(AFLAGS_NOWARN) AFLAGS_ISA :=$(CFLAGS_ISA) -Wa$(comma)-mthumb diff --git a/arch/arm/include/asm/fpu.h b/arch/arm/include/asm/fpu.h new file mode 100644 index ..2ae50bdce59b --- /dev/null +++ b/arch/arm/include/asm/fpu.h @@ -0,0 +1,15 @@ +/* SPDX-License-Identifier: GPL-2.0-only */ +/* + * Copyright (C) 2023 SiFive + */ + +#ifndef __ASM_FPU_H +#define __ASM_FPU_H + +#include + +#define 
kernel_fpu_available() cpu_has_neon() +#define kernel_fpu_begin() kernel_neon_begin() +#define kernel_fpu_end() kernel_neon_end() + +#endif /* ! __ASM_FPU_H */ -- 2.42.0
[PATCH v2 01/14] arch: Add ARCH_HAS_KERNEL_FPU_SUPPORT
Several architectures provide an API to enable the FPU and run floating-point SIMD code in kernel space. However, the function names, header locations, and semantics are inconsistent across architectures, and FPU support may be gated behind other Kconfig options. Provide a standard way for architectures to declare that kernel space FPU support is available. Architectures selecting this option must implement what is currently the most common API (kernel_fpu_begin() and kernel_fpu_end(), plus a new function kernel_fpu_available()) and provide the appropriate CFLAGS for compiling floating-point C code. Suggested-by: Christoph Hellwig Signed-off-by: Samuel Holland --- Changes in v2: - Add documentation explaining the built-time and runtime APIs - Add a linux/fpu.h header for generic isolation enforcement Documentation/core-api/floating-point.rst | 78 +++ Documentation/core-api/index.rst | 1 + Makefile | 5 ++ arch/Kconfig | 6 ++ include/linux/fpu.h | 12 5 files changed, 102 insertions(+) create mode 100644 Documentation/core-api/floating-point.rst create mode 100644 include/linux/fpu.h diff --git a/Documentation/core-api/floating-point.rst b/Documentation/core-api/floating-point.rst new file mode 100644 index ..a8d0d4b05052 --- /dev/null +++ b/Documentation/core-api/floating-point.rst @@ -0,0 +1,78 @@ +.. SPDX-License-Identifier: GPL-2.0+ + +Floating-point API +== + +Kernel code is normally prohibited from using floating-point (FP) registers or +instructions, including the C float and double data types. This rule reduces +system call overhead, because the kernel does not need to save and restore the +userspace floating-point register state. + +However, occasionally drivers or library functions may need to include FP code. +This is supported by isolating the functions containing FP code to a separate +translation unit (a separate source file), and saving/restoring the FP register +state around calls to those functions. 
This creates "critical sections" of +floating-point usage. + +The reason for this isolation is to prevent the compiler from generating code +touching the FP registers outside these critical sections. Compilers sometimes +use FP registers to optimize inlined ``memcpy`` or variable assignment, as +floating-point registers may be wider than general-purpose registers. + +Usability of floating-point code within the kernel is architecture-specific. +Additionally, because a single kernel may be configured to support platforms +both with and without a floating-point unit, FPU availability must be checked +both at build time and at run time. + +Several architectures implement the generic kernel floating-point API from +``linux/fpu.h``, as described below. Some other architectures implement their +own unique APIs, which are documented separately. + +Build-time API +-- + +Floating-point code may be built if the option ``ARCH_HAS_KERNEL_FPU_SUPPORT`` +is enabled. For C code, such code must be placed in a separate file, and that +file must have its compilation flags adjusted using the following pattern:: + +CFLAGS_foo.o += $(CC_FLAGS_FPU) +CFLAGS_REMOVE_foo.o += $(CC_FLAGS_NO_FPU) + +Architectures are expected to define one or both of these variables in their +top-level Makefile as needed. For example:: + +CC_FLAGS_FPU := -mhard-float + +or:: + +CC_FLAGS_NO_FPU := -msoft-float + +Normal kernel code is assumed to use the equivalent of ``CC_FLAGS_NO_FPU``. + +Runtime API +--- + +The runtime API is provided in ``linux/fpu.h``. This header cannot be included +from files implementing FP code (those with their compilation flags adjusted as +above). Instead, it must be included when defining the FP critical sections. + +.. c:function:: bool kernel_fpu_available( void ) + +This function reports if floating-point code can be used on this CPU or +platform. 
The value returned by this function is not expected to change +at runtime, so it only needs to be called once, not before every +critical section. + +.. c:function:: void kernel_fpu_begin( void ) +void kernel_fpu_end( void ) + +These functions create a floating-point critical section. It is only +valid to call ``kernel_fpu_begin()`` after a previous call to +``kernel_fpu_available()`` returned ``true``. These functions are only +guaranteed to be callable from (preemptible or non-preemptible) process +context. + +Preemption may be disabled inside critical sections, so their size +should be minimized. They are *not* required to be reentrant. If the +caller expects to nest critical sections, it must implement its own +reference counting. diff --git a/Documentation/core-api/index.rst b/Documentation/core-api/index.rst index 7a3a08d81f11..97
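
The documentation above says critical sections are not required to be reentrant and that a caller expecting to nest them must do its own reference counting. A minimal userspace sketch of that idea follows. The nest_* names are hypothetical stand-ins invented for illustration (they are not kernel functions), and a real driver would keep the depth per CPU with preemption disabled rather than in a plain int.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical stand-ins for kernel_fpu_available(), kernel_fpu_begin()
 * and kernel_fpu_end(). They only count calls so the nesting logic can
 * be observed outside the kernel.
 */
static int nest_hw_begin_calls;
static int nest_hw_end_calls;

static bool nest_fpu_available(void) { return true; }
static void nest_fpu_begin(void) { nest_hw_begin_calls++; }
static void nest_fpu_end(void) { nest_hw_end_calls++; }

/* Caller-side nesting depth (assumption: single-threaded demo). */
static int nest_depth;

static void nest_section_begin(void)
{
	/* Only the outermost caller opens the real critical section. */
	if (++nest_depth == 1) {
		assert(nest_fpu_available());
		nest_fpu_begin();
	}
}

static void nest_section_end(void)
{
	assert(nest_depth > 0);
	/* Only the outermost caller closes the real critical section. */
	if (--nest_depth == 0)
		nest_fpu_end();
}
```

Calling nest_section_begin() twice and nest_section_end() twice results in exactly one call to each of the underlying begin/end stand-ins, which is the property a non-reentrant API needs from its callers.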
[PATCH v2 00/14] Unified cross-architecture kernel-mode FPU API
This series unifies the kernel-mode FPU API across several architectures
by wrapping the existing functions (where needed) in consistently-named
functions placed in a consistent header location, with mostly the same
semantics: they can be called from preemptible or non-preemptible task
context, and are not assumed to be reentrant. Architectures are also
expected to provide CFLAGS adjustments for compiling FPU-dependent code.
For the moment, SIMD/vector units are out of scope for this common API.

This allows us to remove the ifdeffery and duplicated Makefile logic at
each FPU user. It then implements the common API on RISC-V, and converts
a couple of users to the new API: the AMDGPU DRM driver, and the FPU
self test.

The underlying goal of this series is to allow using newer AMD GPUs
(e.g. Navi) on RISC-V boards such as SiFive's HiFive Unmatched. Those
GPUs need CONFIG_DRM_AMD_DC_FP to initialize, which requires kernel-mode
FPU support.

Previous versions:
v1: https://lore.kernel.org/linux-kernel/20231208055501.2916202-1-samuel.holl...@sifive.com/
v0: https://lore.kernel.org/linux-kernel/20231122030621.3759313-1-samuel.holl...@sifive.com/

Changes in v2:
 - Add documentation explaining the build-time and runtime APIs
 - Add a linux/fpu.h header for generic isolation enforcement
 - Remove file name from header comment
 - Clean up arch/arm64/lib/Makefile, like for arch/arm
 - Remove RISC-V architecture-specific preprocessor check
 - Split altivec removal to a separate patch
 - Use linux/fpu.h instead of asm/fpu.h in consumers
 - Declare test_fpu() in a header

Michael Ellerman (1):
  drm/amd/display: Only use hard-float, not altivec on powerpc

Samuel Holland (13):
  arch: Add ARCH_HAS_KERNEL_FPU_SUPPORT
  ARM: Implement ARCH_HAS_KERNEL_FPU_SUPPORT
  ARM: crypto: Use CC_FLAGS_FPU for NEON CFLAGS
  arm64: Implement ARCH_HAS_KERNEL_FPU_SUPPORT
  arm64: crypto: Use CC_FLAGS_FPU for NEON CFLAGS
  lib/raid6: Use CC_FLAGS_FPU for NEON CFLAGS
  LoongArch: Implement ARCH_HAS_KERNEL_FPU_SUPPORT
  powerpc:
Implement ARCH_HAS_KERNEL_FPU_SUPPORT x86: Implement ARCH_HAS_KERNEL_FPU_SUPPORT riscv: Add support for kernel-mode FPU drm/amd/display: Use ARCH_HAS_KERNEL_FPU_SUPPORT selftests/fpu: Move FP code to a separate translation unit selftests/fpu: Allow building on other architectures Documentation/core-api/floating-point.rst | 78 +++ Documentation/core-api/index.rst | 1 + Makefile | 5 ++ arch/Kconfig | 6 ++ arch/arm/Kconfig | 1 + arch/arm/Makefile | 7 ++ arch/arm/include/asm/fpu.h| 15 arch/arm/lib/Makefile | 3 +- arch/arm64/Kconfig| 1 + arch/arm64/Makefile | 9 ++- arch/arm64/include/asm/fpu.h | 15 arch/arm64/lib/Makefile | 6 +- arch/loongarch/Kconfig| 1 + arch/loongarch/Makefile | 5 +- arch/loongarch/include/asm/fpu.h | 1 + arch/powerpc/Kconfig | 1 + arch/powerpc/Makefile | 5 +- arch/powerpc/include/asm/fpu.h| 28 +++ arch/riscv/Kconfig| 1 + arch/riscv/Makefile | 3 + arch/riscv/include/asm/fpu.h | 16 arch/riscv/kernel/Makefile| 1 + arch/riscv/kernel/kernel_mode_fpu.c | 28 +++ arch/x86/Kconfig | 1 + arch/x86/Makefile | 20 + arch/x86/include/asm/fpu.h| 13 drivers/gpu/drm/amd/display/Kconfig | 2 +- .../gpu/drm/amd/display/amdgpu_dm/dc_fpu.c| 35 + drivers/gpu/drm/amd/display/dc/dml/Makefile | 36 + drivers/gpu/drm/amd/display/dc/dml2/Makefile | 36 + include/linux/fpu.h | 12 +++ lib/Kconfig.debug | 2 +- lib/Makefile | 26 +-- lib/raid6/Makefile| 31 ++-- lib/test_fpu.h| 8 ++ lib/{test_fpu.c => test_fpu_glue.c} | 37 ++--- lib/test_fpu_impl.c | 37 + 37 files changed, 343 insertions(+), 190 deletions(-) create mode 100644 Documentation/core-api/floating-point.rst create mode 100644 arch/arm/include/asm/fpu.h create mode 100644 arch/arm64/include/asm/fpu.h create mode 100644 arch/powerpc/include/asm/fpu.h create mode 100644 arch/riscv/include/asm/fpu.h create mode 100644 arch/riscv/kernel/kernel_mode_fpu.c create mode 100644 arch/x86/include/asm/fpu.h create mode 100644 include/linux/fpu.h create mode 100644 lib/test_fpu.h rename lib/{test_fpu.c => test_fpu_glue.c} (71%) create
Re: [RFC PATCH 10/12] drm/amd/display: Use ARCH_HAS_KERNEL_FPU_SUPPORT
On 2023-12-11 6:23 AM, Michael Ellerman wrote: > Hi Samuel, > > Thanks for trying to clean all this up. > > One problem below. > > Samuel Holland writes: >> Now that all previously-supported architectures select >> ARCH_HAS_KERNEL_FPU_SUPPORT, this code can depend on that symbol instead >> of the existing list of architectures. It can also take advantage of the >> common kernel-mode FPU API and method of adjusting CFLAGS. >> >> Signed-off-by: Samuel Holland > ... >> diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/dc_fpu.c >> b/drivers/gpu/drm/amd/display/amdgpu_dm/dc_fpu.c >> index 4ae4720535a5..b64f917174ca 100644 >> --- a/drivers/gpu/drm/amd/display/amdgpu_dm/dc_fpu.c >> +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/dc_fpu.c >> @@ -87,20 +78,9 @@ void dc_fpu_begin(const char *function_name, const int >> line) >> WARN_ON_ONCE(!in_task()); >> preempt_disable(); >> depth = __this_cpu_inc_return(fpu_recursion_depth); >> - >> if (depth == 1) { >> -#if defined(CONFIG_X86) || defined(CONFIG_LOONGARCH) >> +BUG_ON(!kernel_fpu_available()); >> kernel_fpu_begin(); >> -#elif defined(CONFIG_PPC64) >> -if (cpu_has_feature(CPU_FTR_VSX_COMP)) >> -enable_kernel_vsx(); >> -else if (cpu_has_feature(CPU_FTR_ALTIVEC_COMP)) >> -enable_kernel_altivec(); > > Note altivec. > >> -else if (!cpu_has_feature(CPU_FTR_FPU_UNAVAILABLE)) >> -enable_kernel_fp(); >> -#elif defined(CONFIG_ARM64) >> -kernel_neon_begin(); >> -#endif >> } >> >> TRACE_DCN_FPU(true, function_name, line, depth); >> diff --git a/drivers/gpu/drm/amd/display/dc/dml/Makefile >> b/drivers/gpu/drm/amd/display/dc/dml/Makefile >> index ea7d60f9a9b4..5aad0f572ba3 100644 >> --- a/drivers/gpu/drm/amd/display/dc/dml/Makefile >> +++ b/drivers/gpu/drm/amd/display/dc/dml/Makefile >> @@ -25,40 +25,8 @@ >> # It provides the general basic services required by other DAL >> # subcomponents. 
>> >> -ifdef CONFIG_X86 >> -dml_ccflags-$(CONFIG_CC_IS_GCC) := -mhard-float >> -dml_ccflags := $(dml_ccflags-y) -msse >> -endif >> - >> -ifdef CONFIG_PPC64 >> -dml_ccflags := -mhard-float -maltivec >> -endif > > And altivec is enabled in the flags there. > > That doesn't match your implementation for powerpc in patch 7, which > only deals with float. > > I suspect the AMD driver actually doesn't need altivec enabled, but I > don't know that for sure. It compiles without it, but I don't have a GPU > to actually test. I've added Timothy on Cc who added the support for > powerpc to the driver originally, hopefully he has a test system. I tested this series on a POWER9 system with an AMD Radeon RX 6400 GPU (which requires this FPU code to initialize), and got functioning graphics output. > Anyway if that's true that it doesn't need altivec we should probably do > a lead-up patch that drops altivec from the AMD driver explicitly, eg. > as below. That makes sense to me. Do you want to provide your Signed-off-by so I can send this patch with your authorship? 
Regards, Samuel > cheers > > > diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/dc_fpu.c > b/drivers/gpu/drm/amd/display/amdgpu_dm/dc_fpu.c > index 4ae4720535a5..0de16796466b 100644 > --- a/drivers/gpu/drm/amd/display/amdgpu_dm/dc_fpu.c > +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/dc_fpu.c > @@ -92,11 +92,7 @@ void dc_fpu_begin(const char *function_name, const int > line) > #if defined(CONFIG_X86) || defined(CONFIG_LOONGARCH) > kernel_fpu_begin(); > #elif defined(CONFIG_PPC64) > - if (cpu_has_feature(CPU_FTR_VSX_COMP)) > - enable_kernel_vsx(); > - else if (cpu_has_feature(CPU_FTR_ALTIVEC_COMP)) > - enable_kernel_altivec(); > - else if (!cpu_has_feature(CPU_FTR_FPU_UNAVAILABLE)) > + if (!cpu_has_feature(CPU_FTR_FPU_UNAVAILABLE)) > enable_kernel_fp(); > #elif defined(CONFIG_ARM64) > kernel_neon_begin(); > @@ -125,11 +121,7 @@ void dc_fpu_end(const char *function_name, const int > line) > #if defined(CONFIG_X86) || defined(CONFIG_LOONGARCH) > kernel_fpu_end(); > #elif defined(CONFIG_PPC64) > - if (cpu_has_feature(CPU_FTR_VSX_COMP)) > - disable_kernel_vsx(); > - else if (cpu_has_feature(CPU_FTR_ALTIVEC_COMP)) > - disable_kernel_altivec(); > -
Re: [RFC PATCH 09/12] riscv: Add support for kernel-mode FPU
On 2023-12-11 10:11 AM, Christoph Hellwig wrote:
>> +#ifdef __riscv_f
>> +
>> +#define kernel_fpu_begin() \
>> +static_assert(false, "floating-point code must use a separate translation unit")
>> +#define kernel_fpu_end() kernel_fpu_begin()
>> +
>> +#else
>> +
>> +void kernel_fpu_begin(void);
>> +void kernel_fpu_end(void);
>> +
>> +#endif
>
> I'll assume this is related to trick that places code in a separate
> translation unit, but I fail to understand it. Can you add a comment
> explaining it?

Yes, I can add a comment. Here, __riscv_f refers to RISC-V's F extension
for single-precision floating point, which is enabled by CC_FLAGS_FPU.
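
To illustrate the trick being discussed, here is a minimal userspace sketch, not the kernel code itself. GUARD_FP_TU is a hypothetical macro standing in for a compiler-defined one such as __riscv_f, and the guard_* functions are invented for the example. When the file is built without the macro, begin/end are ordinary functions. When it is built with the macro (that is, with FP code generation enabled), any use of begin/end expands to a failing static_assert, so the file cannot both contain FP code and open a critical section.

```c
#include <assert.h>

#ifdef GUARD_FP_TU
/*
 * FP-enabled translation unit: opening or closing a critical section
 * here is a build error, because the compiler may already have emitted
 * FP instructions outside of it.
 */
#define guard_fpu_begin() \
	static_assert(0, "floating-point code must use a separate translation unit")
#define guard_fpu_end() guard_fpu_begin()
#else
/* Normal translation unit: stand-in implementations that track state. */
static int guard_fpu_active;

static void guard_fpu_begin(void) { guard_fpu_active = 1; }
static void guard_fpu_end(void) { guard_fpu_active = 0; }
#endif

/* Glue code: only builds when GUARD_FP_TU is not defined. */
static int guard_glue(void)
{
	int seen;

	guard_fpu_begin();
	seen = guard_fpu_active;	/* 1 while inside the section */
	guard_fpu_end();
	return seen;
}
```

Building the same file with -DGUARD_FP_TU should fail at the first guard_fpu_begin() call, which is the enforcement the header is aiming for.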
Re: [RFC PATCH 05/12] lib/raid6: Use CC_FLAGS_FPU for NEON CFLAGS
On 2023-12-11 10:07 AM, Christoph Hellwig wrote:
>> +CFLAGS_REMOVE_neon1.o += $(CC_FLAGS_NO_FPU)
>> +CFLAGS_REMOVE_neon2.o += $(CC_FLAGS_NO_FPU)
>> +CFLAGS_REMOVE_neon4.o += $(CC_FLAGS_NO_FPU)
>> +CFLAGS_REMOVE_neon8.o += $(CC_FLAGS_NO_FPU)
>
> Btw, do we even really need the extra variables for compiler flags
> to remove? Don't gcc/clang options work so that if you add a
> no-prefixed version of the option later it transparently gets removed?

Unfortunately, not all of the relevant options can be no-prefixed:

$ cat float.c
int main(void) { volatile float f = 123.456; return f / 10; }
$ aarch64-linux-musl-gcc float.c
$ aarch64-linux-musl-gcc -mgeneral-regs-only float.c
float.c: In function 'main':
float.c:1:33: error: '-mgeneral-regs-only' is incompatible with the use of floating-point types
    1 | int main(void) { volatile float f = 123.456; return f / 10; }
      |                                 ^
float.c:1:55: error: '-mgeneral-regs-only' is incompatible with the use of floating-point types
    1 | int main(void) { volatile float f = 123.456; return f / 10; }
      |                                                     ~~^~~~
float.c:1:55: error: '-mgeneral-regs-only' is incompatible with the use of floating-point types
float.c:1:55: error: '-mgeneral-regs-only' is incompatible with the use of floating-point types
float.c:1:55: error: '-mgeneral-regs-only' is incompatible with the use of floating-point types
float.c:1:55: error: '-mgeneral-regs-only' is incompatible with the use of floating-point types
float.c:1:55: error: '-mgeneral-regs-only' is incompatible with the use of floating-point types
float.c:1:55: error: '-mgeneral-regs-only' is incompatible with the use of floating-point types
float.c:1:55: error: '-mgeneral-regs-only' is incompatible with the use of floating-point types
float.c:1:55: error: '-mgeneral-regs-only' is incompatible with the use of floating-point types
$ aarch64-linux-musl-gcc -mgeneral-regs-only -mno-general-regs-only float.c
aarch64-linux-musl-gcc: error: unrecognized command-line option '-mno-general-regs-only'; did you mean '-mgeneral-regs-only'?
$
[RFC PATCH 12/12] selftests/fpu: Allow building on other architectures
Now that ARCH_HAS_KERNEL_FPU_SUPPORT provides a common way to compile and run floating-point code, this test is no longer x86-specific. Signed-off-by: Samuel Holland --- lib/Kconfig.debug | 2 +- lib/Makefile| 25 ++--- lib/test_fpu_glue.c | 5 - 3 files changed, 7 insertions(+), 25 deletions(-) diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug index cc7d53d9dc01..bbab0b054e09 100644 --- a/lib/Kconfig.debug +++ b/lib/Kconfig.debug @@ -2933,7 +2933,7 @@ config TEST_FREE_PAGES config TEST_FPU tristate "Test floating point operations in kernel space" - depends on X86 && !KCOV_INSTRUMENT_ALL + depends on ARCH_HAS_KERNEL_FPU_SUPPORT && !KCOV_INSTRUMENT_ALL help Enable this option to add /sys/kernel/debug/selftest_helpers/test_fpu which will trigger a sequence of floating point operations. This is used diff --git a/lib/Makefile b/lib/Makefile index e7cbd54944a2..b9f28558c9bd 100644 --- a/lib/Makefile +++ b/lib/Makefile @@ -109,31 +109,10 @@ CFLAGS_test_fprobe.o += $(CC_FLAGS_FTRACE) obj-$(CONFIG_FPROBE_SANITY_TEST) += test_fprobe.o obj-$(CONFIG_TEST_OBJPOOL) += test_objpool.o -# -# CFLAGS for compiling floating point code inside the kernel. x86/Makefile turns -# off the generation of FPU/SSE* instructions for kernel proper but FPU_FLAGS -# get appended last to CFLAGS and thus override those previous compiler options. -# -FPU_CFLAGS := -msse -msse2 -ifdef CONFIG_CC_IS_GCC -# Stack alignment mismatch, proceed with caution. -# GCC < 7.1 cannot compile code using `double` and -mpreferred-stack-boundary=3 -# (8B stack alignment). -# See https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53383 -# -# The "-msse" in the first argument is there so that the -# -mpreferred-stack-boundary=3 build error: -# -# -mpreferred-stack-boundary=3 is not between 4 and 12 -# -# can be triggered. Otherwise gcc doesn't complain. 
-FPU_CFLAGS += -mhard-float -FPU_CFLAGS += $(call cc-option,-msse -mpreferred-stack-boundary=3,-mpreferred-stack-boundary=4) -endif - obj-$(CONFIG_TEST_FPU) += test_fpu.o test_fpu-y := test_fpu_glue.o test_fpu_impl.o -CFLAGS_test_fpu_impl.o += $(FPU_CFLAGS) +CFLAGS_test_fpu_impl.o += $(CC_FLAGS_FPU) +CFLAGS_REMOVE_test_fpu_impl.o += $(CC_FLAGS_NO_FPU) obj-$(CONFIG_TEST_LIVEPATCH) += livepatch/ diff --git a/lib/test_fpu_glue.c b/lib/test_fpu_glue.c index 2761b51117b0..2e0b4027a5e3 100644 --- a/lib/test_fpu_glue.c +++ b/lib/test_fpu_glue.c @@ -17,7 +17,7 @@ #include #include #include -#include +#include int test_fpu(void); @@ -38,6 +38,9 @@ static struct dentry *selftest_dir; static int __init test_fpu_init(void) { + if (!kernel_fpu_available()) + return -EINVAL; + selftest_dir = debugfs_create_dir("selftest_helpers", NULL); if (!selftest_dir) return -ENOMEM; -- 2.42.0
[RFC PATCH 10/12] drm/amd/display: Use ARCH_HAS_KERNEL_FPU_SUPPORT
Now that all previously-supported architectures select ARCH_HAS_KERNEL_FPU_SUPPORT, this code can depend on that symbol instead of the existing list of architectures. It can also take advantage of the common kernel-mode FPU API and method of adjusting CFLAGS. Signed-off-by: Samuel Holland --- drivers/gpu/drm/amd/display/Kconfig | 2 +- .../gpu/drm/amd/display/amdgpu_dm/dc_fpu.c| 33 + drivers/gpu/drm/amd/display/dc/dml/Makefile | 36 ++- drivers/gpu/drm/amd/display/dc/dml2/Makefile | 36 ++- 4 files changed, 6 insertions(+), 101 deletions(-) diff --git a/drivers/gpu/drm/amd/display/Kconfig b/drivers/gpu/drm/amd/display/Kconfig index 901d1961b739..5fcd4f778dc3 100644 --- a/drivers/gpu/drm/amd/display/Kconfig +++ b/drivers/gpu/drm/amd/display/Kconfig @@ -8,7 +8,7 @@ config DRM_AMD_DC depends on BROKEN || !CC_IS_CLANG || ARM64 || RISCV || SPARC64 || X86_64 select SND_HDA_COMPONENT if SND_HDA_CORE # !CC_IS_CLANG: https://github.com/ClangBuiltLinux/linux/issues/1752 - select DRM_AMD_DC_FP if (X86 || LOONGARCH || (PPC64 && ALTIVEC) || (ARM64 && KERNEL_MODE_NEON && !CC_IS_CLANG)) + select DRM_AMD_DC_FP if ARCH_HAS_KERNEL_FPU_SUPPORT && (!ARM64 || !CC_IS_CLANG) help Choose this option if you want to use the new display engine support for AMDGPU. 
This adds required support for Vega and diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/dc_fpu.c b/drivers/gpu/drm/amd/display/amdgpu_dm/dc_fpu.c index 4ae4720535a5..b64f917174ca 100644 --- a/drivers/gpu/drm/amd/display/amdgpu_dm/dc_fpu.c +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/dc_fpu.c @@ -26,16 +26,7 @@ #include "dc_trace.h" -#if defined(CONFIG_X86) -#include -#elif defined(CONFIG_PPC64) -#include -#include -#elif defined(CONFIG_ARM64) -#include -#elif defined(CONFIG_LOONGARCH) #include -#endif /** * DOC: DC FPU manipulation overview @@ -87,20 +78,9 @@ void dc_fpu_begin(const char *function_name, const int line) WARN_ON_ONCE(!in_task()); preempt_disable(); depth = __this_cpu_inc_return(fpu_recursion_depth); - if (depth == 1) { -#if defined(CONFIG_X86) || defined(CONFIG_LOONGARCH) + BUG_ON(!kernel_fpu_available()); kernel_fpu_begin(); -#elif defined(CONFIG_PPC64) - if (cpu_has_feature(CPU_FTR_VSX_COMP)) - enable_kernel_vsx(); - else if (cpu_has_feature(CPU_FTR_ALTIVEC_COMP)) - enable_kernel_altivec(); - else if (!cpu_has_feature(CPU_FTR_FPU_UNAVAILABLE)) - enable_kernel_fp(); -#elif defined(CONFIG_ARM64) - kernel_neon_begin(); -#endif } TRACE_DCN_FPU(true, function_name, line, depth); @@ -122,18 +102,7 @@ void dc_fpu_end(const char *function_name, const int line) depth = __this_cpu_dec_return(fpu_recursion_depth); if (depth == 0) { -#if defined(CONFIG_X86) || defined(CONFIG_LOONGARCH) kernel_fpu_end(); -#elif defined(CONFIG_PPC64) - if (cpu_has_feature(CPU_FTR_VSX_COMP)) - disable_kernel_vsx(); - else if (cpu_has_feature(CPU_FTR_ALTIVEC_COMP)) - disable_kernel_altivec(); - else if (!cpu_has_feature(CPU_FTR_FPU_UNAVAILABLE)) - disable_kernel_fp(); -#elif defined(CONFIG_ARM64) - kernel_neon_end(); -#endif } else { WARN_ON_ONCE(depth < 0); } diff --git a/drivers/gpu/drm/amd/display/dc/dml/Makefile b/drivers/gpu/drm/amd/display/dc/dml/Makefile index ea7d60f9a9b4..5aad0f572ba3 100644 --- a/drivers/gpu/drm/amd/display/dc/dml/Makefile +++ 
b/drivers/gpu/drm/amd/display/dc/dml/Makefile @@ -25,40 +25,8 @@ # It provides the general basic services required by other DAL # subcomponents. -ifdef CONFIG_X86 -dml_ccflags-$(CONFIG_CC_IS_GCC) := -mhard-float -dml_ccflags := $(dml_ccflags-y) -msse -endif - -ifdef CONFIG_PPC64 -dml_ccflags := -mhard-float -maltivec -endif - -ifdef CONFIG_ARM64 -dml_rcflags := -mgeneral-regs-only -endif - -ifdef CONFIG_LOONGARCH -dml_ccflags := -mfpu=64 -dml_rcflags := -msoft-float -endif - -ifdef CONFIG_CC_IS_GCC -ifneq ($(call gcc-min-version, 70100),y) -IS_OLD_GCC = 1 -endif -endif - -ifdef CONFIG_X86 -ifdef IS_OLD_GCC -# Stack alignment mismatch, proceed with caution. -# GCC < 7.1 cannot compile code using `double` and -mpreferred-stack-boundary=3 -# (8B stack alignment). -dml_ccflags += -mpreferred-stack-boundary=4 -else -dml_ccflags += -msse2 -endif -endif +dml_ccflags := $(CC_FLAGS_FPU) +dml_rcflags := $(CC_FLAGS_NO_FPU) ifneq ($(CONFIG_FRAME_WARN),0) frame_warn_flag := -Wframe-larger-than=2048 diff --git a/drivers/gpu/drm/amd/display/dc/dml2/Makefile b/drivers/gpu/drm/amd/display/dc/dml2/Makefile index acff3449b8d7..4f6c804a26ad 100644 --- a/drivers/gp
[RFC PATCH 11/12] selftests/fpu: Move FP code to a separate translation unit
This ensures no compiler-generated floating-point code can appear outside kernel_fpu_{begin,end}() sections, and some architectures enforce this separation. Signed-off-by: Samuel Holland --- lib/Makefile| 3 ++- lib/{test_fpu.c => test_fpu_glue.c} | 32 +- lib/test_fpu_impl.c | 35 + 3 files changed, 38 insertions(+), 32 deletions(-) rename lib/{test_fpu.c => test_fpu_glue.c} (71%) create mode 100644 lib/test_fpu_impl.c diff --git a/lib/Makefile b/lib/Makefile index 6b09731d8e61..e7cbd54944a2 100644 --- a/lib/Makefile +++ b/lib/Makefile @@ -132,7 +132,8 @@ FPU_CFLAGS += $(call cc-option,-msse -mpreferred-stack-boundary=3,-mpreferred-st endif obj-$(CONFIG_TEST_FPU) += test_fpu.o -CFLAGS_test_fpu.o += $(FPU_CFLAGS) +test_fpu-y := test_fpu_glue.o test_fpu_impl.o +CFLAGS_test_fpu_impl.o += $(FPU_CFLAGS) obj-$(CONFIG_TEST_LIVEPATCH) += livepatch/ diff --git a/lib/test_fpu.c b/lib/test_fpu_glue.c similarity index 71% rename from lib/test_fpu.c rename to lib/test_fpu_glue.c index e82db19fed84..2761b51117b0 100644 --- a/lib/test_fpu.c +++ b/lib/test_fpu_glue.c @@ -19,37 +19,7 @@ #include #include -static int test_fpu(void) -{ - /* -* This sequence of operations tests that rounding mode is -* to nearest and that denormal numbers are supported. -* Volatile variables are used to avoid compiler optimizing -* the calculations away. 
-*/ - volatile double a, b, c, d, e, f, g; - - a = 4.0; - b = 1e-15; - c = 1e-310; - - /* Sets precision flag */ - d = a + b; - - /* Result depends on rounding mode */ - e = a + b / 2; - - /* Denormal and very large values */ - f = b / c; - - /* Depends on denormal support */ - g = a + c * f; - - if (d > a && e > a && g > a) - return 0; - else - return -EINVAL; -} +int test_fpu(void); static int test_fpu_get(void *data, u64 *val) { diff --git a/lib/test_fpu_impl.c b/lib/test_fpu_impl.c new file mode 100644 index ..2ff01980bc22 --- /dev/null +++ b/lib/test_fpu_impl.c @@ -0,0 +1,35 @@ +// SPDX-License-Identifier: GPL-2.0+ + +#include + +int test_fpu(void) +{ + /* +* This sequence of operations tests that rounding mode is +* to nearest and that denormal numbers are supported. +* Volatile variables are used to avoid compiler optimizing +* the calculations away. +*/ + volatile double a, b, c, d, e, f, g; + + a = 4.0; + b = 1e-15; + c = 1e-310; + + /* Sets precision flag */ + d = a + b; + + /* Result depends on rounding mode */ + e = a + b / 2; + + /* Denormal and very large values */ + f = b / c; + + /* Depends on denormal support */ + g = a + c * f; + + if (d > a && e > a && g > a) + return 0; + else + return -EINVAL; +} -- 2.42.0
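
The glue/impl split above can be sketched in one userspace file as follows. This is an illustration under assumptions, not the kernel code: the glue_* names are hypothetical stand-ins for the kernel-mode FPU calls, the FP function only performs the first check from test_fpu(), and in the kernel the two halves would live in separate files so that only the impl file is built with FP flags.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* ---- would live in the FP-enabled file (like test_fpu_impl.c) ---- */
static int glue_fp_impl(void)
{
	/* volatile keeps the compiler from folding the arithmetic away */
	volatile double a = 4.0, b = 1e-15;
	volatile double d = a + b;

	return d > a ? 0 : -EINVAL;
}

/* ---- would live in the normal file (like test_fpu_glue.c) ---- */

/* Hypothetical stand-ins for the kernel-mode FPU API. */
static bool glue_fpu_present = true;
static int glue_open_sections;

static bool glue_fpu_available(void) { return glue_fpu_present; }
static void glue_fpu_begin(void) { glue_open_sections++; }
static void glue_fpu_end(void) { glue_open_sections--; }

static int glue_run_test(void)
{
	int ret;

	/* Check availability first; begin is only valid if this is true. */
	if (!glue_fpu_available())
		return -EINVAL;

	glue_fpu_begin();
	ret = glue_fp_impl();	/* all FP work happens inside the section */
	glue_fpu_end();

	assert(glue_open_sections == 0);	/* sections must not leak */
	return ret;
}
```

The glue function itself never touches a float or double, so it can be compiled with the kernel's normal no-FPU flags while still bracketing the FP call.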
[RFC PATCH 09/12] riscv: Add support for kernel-mode FPU
This is motivated by the amdgpu DRM driver, which needs floating-point code to support recent hardware. That code is not performance-critical, so only provide a minimal non-preemptible implementation for now. Use a similar trick as ARM to force placing floating-point code in a separate translation unit, so it is not possible for compiler-generated floating-point code to appear outside kernel_fpu_{begin,end}(). Signed-off-by: Samuel Holland --- arch/riscv/Kconfig | 1 + arch/riscv/Makefile | 3 +++ arch/riscv/include/asm/fpu.h| 26 ++ arch/riscv/kernel/Makefile | 1 + arch/riscv/kernel/kernel_mode_fpu.c | 28 5 files changed, 59 insertions(+) create mode 100644 arch/riscv/include/asm/fpu.h create mode 100644 arch/riscv/kernel/kernel_mode_fpu.c diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig index 95a2a06acc6a..cf0967928e6d 100644 --- a/arch/riscv/Kconfig +++ b/arch/riscv/Kconfig @@ -27,6 +27,7 @@ config RISCV select ARCH_HAS_GCOV_PROFILE_ALL select ARCH_HAS_GIGANTIC_PAGE select ARCH_HAS_KCOV + select ARCH_HAS_KERNEL_FPU_SUPPORT if FPU select ARCH_HAS_MMIOWB select ARCH_HAS_NON_OVERLAPPING_ADDRESS_SPACE select ARCH_HAS_PMEM_API diff --git a/arch/riscv/Makefile b/arch/riscv/Makefile index a74be78678eb..2e719c369210 100644 --- a/arch/riscv/Makefile +++ b/arch/riscv/Makefile @@ -81,6 +81,9 @@ KBUILD_CFLAGS += -march=$(shell echo $(riscv-march-y) | sed -E 's/(rv32ima|rv64i KBUILD_AFLAGS += -march=$(riscv-march-y) +# For C code built with floating-point support, exclude V but keep F and D. 
+CC_FLAGS_FPU := -march=$(shell echo $(riscv-march-y) | sed -E 's/(rv32ima|rv64ima)([^v_]*)v?/\1\2/') + KBUILD_CFLAGS += -mno-save-restore KBUILD_CFLAGS += -DCONFIG_PAGE_OFFSET=$(CONFIG_PAGE_OFFSET) diff --git a/arch/riscv/include/asm/fpu.h b/arch/riscv/include/asm/fpu.h new file mode 100644 index ..8cd027acc015 --- /dev/null +++ b/arch/riscv/include/asm/fpu.h @@ -0,0 +1,26 @@ +/* SPDX-License-Identifier: GPL-2.0-only */ +/* + * Copyright (C) 2023 SiFive + */ + +#ifndef _ASM_RISCV_FPU_H +#define _ASM_RISCV_FPU_H + +#include + +#define kernel_fpu_available() has_fpu() + +#ifdef __riscv_f + +#define kernel_fpu_begin() \ + static_assert(false, "floating-point code must use a separate translation unit") +#define kernel_fpu_end() kernel_fpu_begin() + +#else + +void kernel_fpu_begin(void); +void kernel_fpu_end(void); + +#endif + +#endif /* ! _ASM_RISCV_FPU_H */ diff --git a/arch/riscv/kernel/Makefile b/arch/riscv/kernel/Makefile index fee22a3d1b53..662c483e338d 100644 --- a/arch/riscv/kernel/Makefile +++ b/arch/riscv/kernel/Makefile @@ -62,6 +62,7 @@ obj-$(CONFIG_MMU) += vdso.o vdso/ obj-$(CONFIG_RISCV_MISALIGNED) += traps_misaligned.o obj-$(CONFIG_FPU) += fpu.o +obj-$(CONFIG_FPU) += kernel_mode_fpu.o obj-$(CONFIG_RISCV_ISA_V) += vector.o obj-$(CONFIG_SMP) += smpboot.o obj-$(CONFIG_SMP) += smp.o diff --git a/arch/riscv/kernel/kernel_mode_fpu.c b/arch/riscv/kernel/kernel_mode_fpu.c new file mode 100644 index ..9b2024cc056b --- /dev/null +++ b/arch/riscv/kernel/kernel_mode_fpu.c @@ -0,0 +1,28 @@ +// SPDX-License-Identifier: GPL-2.0-or-later +/* + * Copyright (C) 2023 SiFive + */ + +#include +#include + +#include +#include +#include +#include + +void kernel_fpu_begin(void) +{ + preempt_disable(); + fstate_save(current, task_pt_regs(current)); + csr_set(CSR_SSTATUS, SR_FS); +} +EXPORT_SYMBOL_GPL(kernel_fpu_begin); + +void kernel_fpu_end(void) +{ + csr_clear(CSR_SSTATUS, SR_FS); + fstate_restore(current, task_pt_regs(current)); + preempt_enable(); +} 
+EXPORT_SYMBOL_GPL(kernel_fpu_end); -- 2.42.0
[RFC PATCH 08/12] x86: Implement ARCH_HAS_KERNEL_FPU_SUPPORT
x86 already provides kernel_fpu_begin() and kernel_fpu_end(), but in a different header. Add a wrapper header, and export the CFLAGS adjustments as found in lib/Makefile. Signed-off-by: Samuel Holland --- arch/x86/Kconfig | 1 + arch/x86/Makefile | 20 arch/x86/include/asm/fpu.h | 13 + 3 files changed, 34 insertions(+) create mode 100644 arch/x86/include/asm/fpu.h diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index 3762f41bb092..1fe7f2d8d017 100644 --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -81,6 +81,7 @@ config X86 select ARCH_HAS_FORTIFY_SOURCE select ARCH_HAS_GCOV_PROFILE_ALL select ARCH_HAS_KCOVif X86_64 + select ARCH_HAS_KERNEL_FPU_SUPPORT select ARCH_HAS_MEM_ENCRYPT select ARCH_HAS_MEMBARRIER_SYNC_CORE select ARCH_HAS_NMI_SAFE_THIS_CPU_OPS diff --git a/arch/x86/Makefile b/arch/x86/Makefile index 1a068de12a56..71576c8dbe79 100644 --- a/arch/x86/Makefile +++ b/arch/x86/Makefile @@ -70,6 +70,26 @@ export BITS KBUILD_CFLAGS += -mno-sse -mno-mmx -mno-sse2 -mno-3dnow -mno-avx KBUILD_RUSTFLAGS += -Ctarget-feature=-sse,-sse2,-sse3,-ssse3,-sse4.1,-sse4.2,-avx,-avx2 +# +# CFLAGS for compiling floating point code inside the kernel. +# +CC_FLAGS_FPU := -msse -msse2 +ifdef CONFIG_CC_IS_GCC +# Stack alignment mismatch, proceed with caution. +# GCC < 7.1 cannot compile code using `double` and -mpreferred-stack-boundary=3 +# (8B stack alignment). +# See https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53383 +# +# The "-msse" in the first argument is there so that the +# -mpreferred-stack-boundary=3 build error: +# +# -mpreferred-stack-boundary=3 is not between 4 and 12 +# +# can be triggered. Otherwise gcc doesn't complain. 
+CC_FLAGS_FPU += -mhard-float +CC_FLAGS_FPU += $(call cc-option,-msse -mpreferred-stack-boundary=3,-mpreferred-stack-boundary=4) +endif + ifeq ($(CONFIG_X86_KERNEL_IBT),y) # # Kernel IBT has S_CET.NOTRACK_EN=0, as such the compilers must not generate diff --git a/arch/x86/include/asm/fpu.h b/arch/x86/include/asm/fpu.h new file mode 100644 index ..b2743fe19339 --- /dev/null +++ b/arch/x86/include/asm/fpu.h @@ -0,0 +1,13 @@ +/* SPDX-License-Identifier: GPL-2.0-only */ +/* + * Copyright (C) 2023 SiFive + */ + +#ifndef _ASM_X86_FPU_H +#define _ASM_X86_FPU_H + +#include + +#define kernel_fpu_available() true + +#endif /* ! _ASM_X86_FPU_H */ -- 2.42.0
[RFC PATCH 07/12] powerpc: Implement ARCH_HAS_KERNEL_FPU_SUPPORT
PowerPC provides an equivalent to the common kernel-mode FPU API, but in a different header and using different function names. The PowerPC API also requires a non-preemptible context. Add a wrapper header, and export the CFLAGS adjustments. Signed-off-by: Samuel Holland --- arch/powerpc/Kconfig | 1 + arch/powerpc/Makefile | 5 - arch/powerpc/include/asm/fpu.h | 28 3 files changed, 33 insertions(+), 1 deletion(-) create mode 100644 arch/powerpc/include/asm/fpu.h diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig index 6f105ee4f3cf..e96cb5b7c571 100644 --- a/arch/powerpc/Kconfig +++ b/arch/powerpc/Kconfig @@ -137,6 +137,7 @@ config PPC select ARCH_HAS_GCOV_PROFILE_ALL select ARCH_HAS_HUGEPD if HUGETLB_PAGE select ARCH_HAS_KCOV + select ARCH_HAS_KERNEL_FPU_SUPPORT if PPC_FPU select ARCH_HAS_MEMBARRIER_CALLBACKS select ARCH_HAS_MEMBARRIER_SYNC_CORE select ARCH_HAS_MEMREMAP_COMPAT_ALIGN if PPC_64S_HASH_MMU diff --git a/arch/powerpc/Makefile b/arch/powerpc/Makefile index f19dbaa1d541..2d5f21baf6ff 100644 --- a/arch/powerpc/Makefile +++ b/arch/powerpc/Makefile @@ -142,6 +142,9 @@ CFLAGS-$(CONFIG_PPC32) += $(call cc-option, $(MULTIPLEWORD)) CFLAGS-$(CONFIG_PPC32) += $(call cc-option,-mno-readonly-in-sdata) +CC_FLAGS_FPU := $(call cc-option,-mhard-float) +CC_FLAGS_NO_FPU+= $(call cc-option,-msoft-float) + ifdef CONFIG_FUNCTION_TRACER ifdef CONFIG_ARCH_USING_PATCHABLE_FUNCTION_ENTRY KBUILD_CPPFLAGS+= -DCC_USING_PATCHABLE_FUNCTION_ENTRY @@ -163,7 +166,7 @@ asinstr := $(call as-instr,lis 9$(comma)foo@high,-DHAVE_AS_ATHIGH=1) KBUILD_CPPFLAGS+= -I $(srctree)/arch/$(ARCH) $(asinstr) KBUILD_AFLAGS += $(AFLAGS-y) -KBUILD_CFLAGS += $(call cc-option,-msoft-float) +KBUILD_CFLAGS += $(CC_FLAGS_NO_FPU) KBUILD_CFLAGS += $(CFLAGS-y) CPP= $(CC) -E $(KBUILD_CFLAGS) diff --git a/arch/powerpc/include/asm/fpu.h b/arch/powerpc/include/asm/fpu.h new file mode 100644 index ..ca584e4bc40f --- /dev/null +++ b/arch/powerpc/include/asm/fpu.h @@ -0,0 +1,28 @@ +/* SPDX-License-Identifier: 
GPL-2.0-only */ +/* + * Copyright (C) 2023 SiFive + */ + +#ifndef _ASM_POWERPC_FPU_H +#define _ASM_POWERPC_FPU_H + +#include + +#include +#include + +#define kernel_fpu_available() (!cpu_has_feature(CPU_FTR_FPU_UNAVAILABLE)) + +static inline void kernel_fpu_begin(void) +{ + preempt_disable(); + enable_kernel_fp(); +} + +static inline void kernel_fpu_end(void) +{ + disable_kernel_fp(); + preempt_enable(); +} + +#endif /* ! _ASM_POWERPC_FPU_H */ -- 2.42.0
[RFC PATCH 05/12] lib/raid6: Use CC_FLAGS_FPU for NEON CFLAGS
Now that CC_FLAGS_FPU is exported and can be used anywhere in the source tree, use it instead of duplicating the flags here. Signed-off-by: Samuel Holland --- lib/raid6/Makefile | 31 --- 1 file changed, 8 insertions(+), 23 deletions(-) diff --git a/lib/raid6/Makefile b/lib/raid6/Makefile index 1c5420ff254e..309fea97efc6 100644 --- a/lib/raid6/Makefile +++ b/lib/raid6/Makefile @@ -33,25 +33,6 @@ CFLAGS_REMOVE_vpermxor8.o += -msoft-float endif endif -# The GCC option -ffreestanding is required in order to compile code containing -# ARM/NEON intrinsics in a non C99-compliant environment (such as the kernel) -ifeq ($(CONFIG_KERNEL_MODE_NEON),y) -NEON_FLAGS := -ffreestanding -# Enable -NEON_FLAGS += -isystem $(shell $(CC) -print-file-name=include) -ifeq ($(ARCH),arm) -NEON_FLAGS += -march=armv7-a -mfloat-abi=softfp -mfpu=neon -endif -CFLAGS_recov_neon_inner.o += $(NEON_FLAGS) -ifeq ($(ARCH),arm64) -CFLAGS_REMOVE_recov_neon_inner.o += -mgeneral-regs-only -CFLAGS_REMOVE_neon1.o += -mgeneral-regs-only -CFLAGS_REMOVE_neon2.o += -mgeneral-regs-only -CFLAGS_REMOVE_neon4.o += -mgeneral-regs-only -CFLAGS_REMOVE_neon8.o += -mgeneral-regs-only -endif -endif - quiet_cmd_unroll = UNROLL $@ cmd_unroll = $(AWK) -v N=$* -f $(srctree)/$(src)/unroll.awk < $< > $@ @@ -75,10 +56,14 @@ targets += vpermxor1.c vpermxor2.c vpermxor4.c vpermxor8.c $(obj)/vpermxor%.c: $(src)/vpermxor.uc $(src)/unroll.awk FORCE $(call if_changed,unroll) -CFLAGS_neon1.o += $(NEON_FLAGS) -CFLAGS_neon2.o += $(NEON_FLAGS) -CFLAGS_neon4.o += $(NEON_FLAGS) -CFLAGS_neon8.o += $(NEON_FLAGS) +CFLAGS_neon1.o += $(CC_FLAGS_FPU) +CFLAGS_neon2.o += $(CC_FLAGS_FPU) +CFLAGS_neon4.o += $(CC_FLAGS_FPU) +CFLAGS_neon8.o += $(CC_FLAGS_FPU) +CFLAGS_REMOVE_neon1.o += $(CC_FLAGS_NO_FPU) +CFLAGS_REMOVE_neon2.o += $(CC_FLAGS_NO_FPU) +CFLAGS_REMOVE_neon4.o += $(CC_FLAGS_NO_FPU) +CFLAGS_REMOVE_neon8.o += $(CC_FLAGS_NO_FPU) targets += neon1.c neon2.c neon4.c neon8.c $(obj)/neon%.c: $(src)/neon.uc $(src)/unroll.awk FORCE $(call 
if_changed,unroll) -- 2.42.0
[RFC PATCH 06/12] LoongArch: Implement ARCH_HAS_KERNEL_FPU_SUPPORT
LoongArch already provides kernel_fpu_begin() and kernel_fpu_end() in asm/fpu.h, so it only needs to add kernel_fpu_available() and export the CFLAGS adjustments. Signed-off-by: Samuel Holland --- arch/loongarch/Kconfig | 1 + arch/loongarch/Makefile | 5 - arch/loongarch/include/asm/fpu.h | 1 + 3 files changed, 6 insertions(+), 1 deletion(-) diff --git a/arch/loongarch/Kconfig b/arch/loongarch/Kconfig index ee123820a476..65d4475565b8 100644 --- a/arch/loongarch/Kconfig +++ b/arch/loongarch/Kconfig @@ -15,6 +15,7 @@ config LOONGARCH select ARCH_HAS_CPU_FINALIZE_INIT select ARCH_HAS_FORTIFY_SOURCE select ARCH_HAS_KCOV + select ARCH_HAS_KERNEL_FPU_SUPPORT if CPU_HAS_FPU select ARCH_HAS_NMI_SAFE_THIS_CPU_OPS select ARCH_HAS_NON_OVERLAPPING_ADDRESS_SPACE select ARCH_HAS_PTE_SPECIAL diff --git a/arch/loongarch/Makefile b/arch/loongarch/Makefile index 204b94b2e6aa..f5c4f7e921db 100644 --- a/arch/loongarch/Makefile +++ b/arch/loongarch/Makefile @@ -25,6 +25,9 @@ endif 32bit-emul = elf32loongarch 64bit-emul = elf64loongarch +CC_FLAGS_FPU := -mfpu=64 +CC_FLAGS_NO_FPU:= -msoft-float + ifdef CONFIG_DYNAMIC_FTRACE KBUILD_CPPFLAGS += -DCC_USING_PATCHABLE_FUNCTION_ENTRY CC_FLAGS_FTRACE := -fpatchable-function-entry=2 @@ -46,7 +49,7 @@ ld-emul = $(64bit-emul) cflags-y += -mabi=lp64s endif -cflags-y += -pipe -msoft-float +cflags-y += -pipe $(CC_FLAGS_NO_FPU) LDFLAGS_vmlinux+= -static -n -nostdlib # When the assembler supports explicit relocation hint, we must use it. diff --git a/arch/loongarch/include/asm/fpu.h b/arch/loongarch/include/asm/fpu.h index c2d8962fda00..3177674228f8 100644 --- a/arch/loongarch/include/asm/fpu.h +++ b/arch/loongarch/include/asm/fpu.h @@ -21,6 +21,7 @@ struct sigcontext; +#define kernel_fpu_available() cpu_has_fpu extern void kernel_fpu_begin(void); extern void kernel_fpu_end(void); -- 2.42.0
[RFC PATCH 04/12] arm64: Implement ARCH_HAS_KERNEL_FPU_SUPPORT
arm64 provides an equivalent to the common kernel-mode FPU API, but in a different header and using different function names. Add a wrapper header, and export CFLAGS adjustments as found in lib/raid6/Makefile. Signed-off-by: Samuel Holland --- arch/arm64/Kconfig | 1 + arch/arm64/Makefile | 9 - arch/arm64/include/asm/fpu.h | 17 + 3 files changed, 26 insertions(+), 1 deletion(-) create mode 100644 arch/arm64/include/asm/fpu.h diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig index 7b071a00425d..485ac389ac11 100644 --- a/arch/arm64/Kconfig +++ b/arch/arm64/Kconfig @@ -30,6 +30,7 @@ config ARM64 select ARCH_HAS_GCOV_PROFILE_ALL select ARCH_HAS_GIGANTIC_PAGE select ARCH_HAS_KCOV + select ARCH_HAS_KERNEL_FPU_SUPPORT if KERNEL_MODE_NEON select ARCH_HAS_KEEPINITRD select ARCH_HAS_MEMBARRIER_SYNC_CORE select ARCH_HAS_NMI_SAFE_THIS_CPU_OPS diff --git a/arch/arm64/Makefile b/arch/arm64/Makefile index 9a2d3723cd0f..4a65f24c7998 100644 --- a/arch/arm64/Makefile +++ b/arch/arm64/Makefile @@ -36,7 +36,14 @@ ifeq ($(CONFIG_BROKEN_GAS_INST),y) $(warning Detected assembler with broken .inst; disassembly will be unreliable) endif -KBUILD_CFLAGS += -mgeneral-regs-only \ +# The GCC option -ffreestanding is required in order to compile code containing +# ARM/NEON intrinsics in a non C99-compliant environment (such as the kernel) +CC_FLAGS_FPU := -ffreestanding +# Enable +CC_FLAGS_FPU += -isystem $(shell $(CC) -print-file-name=include) +CC_FLAGS_NO_FPU:= -mgeneral-regs-only + +KBUILD_CFLAGS += $(CC_FLAGS_NO_FPU) \ $(compat_vdso) $(cc_has_k_constraint) KBUILD_CFLAGS += $(call cc-disable-warning, psabi) KBUILD_AFLAGS += $(compat_vdso) diff --git a/arch/arm64/include/asm/fpu.h b/arch/arm64/include/asm/fpu.h new file mode 100644 index ..664c0a192ab1 --- /dev/null +++ b/arch/arm64/include/asm/fpu.h @@ -0,0 +1,17 @@ +/* SPDX-License-Identifier: GPL-2.0-only */ +/* + * linux/arch/arm64/include/asm/fpu.h + * + * Copyright (C) 2023 SiFive + */ + +#ifndef __ASM_FPU_H +#define __ASM_FPU_H + 
+#include + +#define kernel_fpu_available() cpu_has_neon() +#define kernel_fpu_begin() kernel_neon_begin() +#define kernel_fpu_end() kernel_neon_end() + +#endif /* ! __ASM_FPU_H */ -- 2.42.0
[RFC PATCH 03/12] ARM: crypto: Use CC_FLAGS_FPU for NEON CFLAGS
Now that CC_FLAGS_FPU is exported and can be used anywhere in the source
tree, use it instead of duplicating the flags here.

Signed-off-by: Samuel Holland
---
 arch/arm/lib/Makefile | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/arch/arm/lib/Makefile b/arch/arm/lib/Makefile
index 650404be6768..0ca5aae1bcc3 100644
--- a/arch/arm/lib/Makefile
+++ b/arch/arm/lib/Makefile
@@ -40,8 +40,7 @@ $(obj)/csumpartialcopy.o: $(obj)/csumpartialcopygeneric.S
 $(obj)/csumpartialcopyuser.o: $(obj)/csumpartialcopygeneric.S
 
 ifeq ($(CONFIG_KERNEL_MODE_NEON),y)
-  NEON_FLAGS := -march=armv7-a -mfloat-abi=softfp -mfpu=neon
-  CFLAGS_xor-neon.o += $(NEON_FLAGS)
+  CFLAGS_xor-neon.o += $(CC_FLAGS_FPU)
   obj-$(CONFIG_XOR_BLOCKS) += xor-neon.o
 endif
 
-- 
2.42.0
[RFC PATCH 02/12] ARM: Implement ARCH_HAS_KERNEL_FPU_SUPPORT
ARM provides an equivalent to the common kernel-mode FPU API, but in a different header and using different function names. Add a wrapper header, and export CFLAGS adjustments as found in lib/raid6/Makefile. Signed-off-by: Samuel Holland --- arch/arm/Kconfig | 1 + arch/arm/Makefile | 7 +++ arch/arm/include/asm/fpu.h | 17 + 3 files changed, 25 insertions(+) create mode 100644 arch/arm/include/asm/fpu.h diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig index f8567e95f98b..92e21a4a2903 100644 --- a/arch/arm/Kconfig +++ b/arch/arm/Kconfig @@ -14,6 +14,7 @@ config ARM select ARCH_HAS_FORTIFY_SOURCE select ARCH_HAS_KEEPINITRD select ARCH_HAS_KCOV + select ARCH_HAS_KERNEL_FPU_SUPPORT if KERNEL_MODE_NEON select ARCH_HAS_MEMBARRIER_SYNC_CORE select ARCH_HAS_NON_OVERLAPPING_ADDRESS_SPACE select ARCH_HAS_PTE_SPECIAL if ARM_LPAE diff --git a/arch/arm/Makefile b/arch/arm/Makefile index 5ba42f69f8ce..1dd860dba5f5 100644 --- a/arch/arm/Makefile +++ b/arch/arm/Makefile @@ -130,6 +130,13 @@ endif # Accept old syntax despite ".syntax unified" AFLAGS_NOWARN :=$(call as-option,-Wa$(comma)-mno-warn-deprecated,-Wa$(comma)-W) +# The GCC option -ffreestanding is required in order to compile code containing +# ARM/NEON intrinsics in a non C99-compliant environment (such as the kernel) +CC_FLAGS_FPU := -ffreestanding +# Enable +CC_FLAGS_FPU += -isystem $(shell $(CC) -print-file-name=include) +CC_FLAGS_FPU += -march=armv7-a -mfloat-abi=softfp -mfpu=neon + ifeq ($(CONFIG_THUMB2_KERNEL),y) CFLAGS_ISA :=-Wa,-mimplicit-it=always $(AFLAGS_NOWARN) AFLAGS_ISA :=$(CFLAGS_ISA) -Wa$(comma)-mthumb diff --git a/arch/arm/include/asm/fpu.h b/arch/arm/include/asm/fpu.h new file mode 100644 index ..d01ca06e700a --- /dev/null +++ b/arch/arm/include/asm/fpu.h @@ -0,0 +1,17 @@ +/* SPDX-License-Identifier: GPL-2.0-only */ +/* + * linux/arch/arm/include/asm/fpu.h + * + * Copyright (C) 2023 SiFive + */ + +#ifndef __ASM_FPU_H +#define __ASM_FPU_H + +#include + +#define kernel_fpu_available() cpu_has_neon() +#define 
kernel_fpu_begin() kernel_neon_begin() +#define kernel_fpu_end() kernel_neon_end() + +#endif /* ! __ASM_FPU_H */ -- 2.42.0
[RFC PATCH 00/12] Unified cross-architecture kernel-mode FPU API
This series supersedes my earlier RISC-V specific series[1].

This series unifies the kernel-mode FPU API across several architectures by wrapping the existing functions (where needed) in consistently-named functions placed in a consistent header location, with mostly the same semantics: they can be called from preemptible or non-preemptible task context, and are not assumed to be reentrant. Architectures are also expected to provide CFLAGS adjustments for compiling FPU-dependent code. For the moment, SIMD/vector units are out of scope for this common API.

This allows us to remove the ifdeffery and duplicated Makefile logic at each FPU user. It then implements the common API on RISC-V, and converts a couple of users to the new API: the AMDGPU DRM driver, and the FPU self test.

The underlying goal of this series is to allow using newer AMD GPUs (e.g. Navi) on RISC-V boards such as SiFive's HiFive Unmatched. Those GPUs need CONFIG_DRM_AMD_DC_FP to initialize, which requires kernel-mode FPU support.

[1]: https://lore.kernel.org/linux-riscv/20231122030621.3759313-1-samuel.holl...@sifive.com/ Samuel Holland (12): arch: Add ARCH_HAS_KERNEL_FPU_SUPPORT ARM: Implement ARCH_HAS_KERNEL_FPU_SUPPORT ARM: crypto: Use CC_FLAGS_FPU for NEON CFLAGS arm64: Implement ARCH_HAS_KERNEL_FPU_SUPPORT lib/raid6: Use CC_FLAGS_FPU for NEON CFLAGS LoongArch: Implement ARCH_HAS_KERNEL_FPU_SUPPORT powerpc: Implement ARCH_HAS_KERNEL_FPU_SUPPORT x86: Implement ARCH_HAS_KERNEL_FPU_SUPPORT riscv: Add support for kernel-mode FPU drm/amd/display: Use ARCH_HAS_KERNEL_FPU_SUPPORT selftests/fpu: Move FP code to a separate translation unit selftests/fpu: Allow building on other architectures Makefile | 4 ++ arch/Kconfig | 9 + arch/arm/Kconfig | 1 + arch/arm/Makefile | 7 arch/arm/include/asm/fpu.h| 17 + arch/arm/lib/Makefile | 3 +- arch/arm64/Kconfig| 1 + arch/arm64/Makefile | 9 - arch/arm64/include/asm/fpu.h | 17 + arch/loongarch/Kconfig| 1 + arch/loongarch/Makefile | 5 ++- arch/loongarch/include/asm/fpu.h | 1 + arch/powerpc/Kconfig | 1 + arch/powerpc/Makefile | 5 ++- arch/powerpc/include/asm/fpu.h| 28 ++ arch/riscv/Kconfig| 1 + arch/riscv/Makefile | 3 ++ arch/riscv/include/asm/fpu.h | 26 + arch/riscv/kernel/Makefile| 1 + arch/riscv/kernel/kernel_mode_fpu.c | 28 ++ arch/x86/Kconfig | 1 + arch/x86/Makefile | 20 ++ arch/x86/include/asm/fpu.h| 13 +++ drivers/gpu/drm/amd/display/Kconfig | 2 +- .../gpu/drm/amd/display/amdgpu_dm/dc_fpu.c| 33 + drivers/gpu/drm/amd/display/dc/dml/Makefile | 36 +- drivers/gpu/drm/amd/display/dc/dml2/Makefile | 36 +- lib/Kconfig.debug | 2 +- lib/Makefile | 26 ++--- lib/raid6/Makefile| 31 lib/{test_fpu.c => test_fpu_glue.c} | 37 +++ lib/test_fpu_impl.c | 35 ++ 32 files changed, 255 insertions(+), 185 deletions(-) create mode 100644 arch/arm/include/asm/fpu.h create mode 100644 arch/arm64/include/asm/fpu.h create mode 100644 arch/powerpc/include/asm/fpu.h create mode 100644 arch/riscv/include/asm/fpu.h create mode 100644 arch/riscv/kernel/kernel_mode_fpu.c create mode 100644 
arch/x86/include/asm/fpu.h rename lib/{test_fpu.c => test_fpu_glue.c} (71%) create mode 100644 lib/test_fpu_impl.c -- 2.42.0
[RFC PATCH 01/12] arch: Add ARCH_HAS_KERNEL_FPU_SUPPORT
Several architectures provide an API to enable the FPU and run floating-point SIMD code in kernel space. However, the function names, header locations, and semantics are inconsistent across architectures, and FPU support may be gated behind other Kconfig options. Provide a standard way for architectures to declare that kernel space FPU support is available. Architectures selecting this option must implement what is currently the most common API (kernel_fpu_begin() and kernel_fpu_end(), plus a new function kernel_fpu_available()) and provide the appropriate CFLAGS for compiling floating-point C code. Suggested-by: Christoph Hellwig Signed-off-by: Samuel Holland --- Makefile | 4 arch/Kconfig | 9 + 2 files changed, 13 insertions(+) diff --git a/Makefile b/Makefile index 511b5616aa41..e65c186cf2c9 100644 --- a/Makefile +++ b/Makefile @@ -969,6 +969,10 @@ KBUILD_CFLAGS += $(CC_FLAGS_CFI) export CC_FLAGS_CFI endif +# Architectures can define flags to add/remove for floating-point support +export CC_FLAGS_FPU +export CC_FLAGS_NO_FPU + ifneq ($(CONFIG_FUNCTION_ALIGNMENT),0) KBUILD_CFLAGS += -falign-functions=$(CONFIG_FUNCTION_ALIGNMENT) endif diff --git a/arch/Kconfig b/arch/Kconfig index f4b210ab0612..6df834e18e9c 100644 --- a/arch/Kconfig +++ b/arch/Kconfig @@ -1478,6 +1478,15 @@ config ARCH_HAS_NONLEAF_PMD_YOUNG address translations. Page table walkers that clear the accessed bit may use this capability to reduce their search space. +config ARCH_HAS_KERNEL_FPU_SUPPORT + bool + help + An architecture should select this option if it supports running + floating-point code in kernel space. It must export the functions + kernel_fpu_available(), kernel_fpu_begin(), and kernel_fpu_end() from + , and define CC_FLAGS_FPU and/or CC_FLAGS_NO_FPU as + necessary in its Makefile. + source "kernel/gcov/Kconfig" source "scripts/gcc-plugins/Kconfig" -- 2.42.0
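For readers who want to see the calling pattern this option implies, here is a minimal userspace sketch in C. It is not kernel code: the three kernel_fpu_*() bodies are hypothetical stand-ins (in the kernel they would come from the architecture's asm/fpu.h), and ratio_permille() and ratio_permille_impl() are made-up names for illustration. The stand-ins only track nesting, so the "check availability, begin, call FP code, end" sequence can be exercised outside the kernel.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical userspace stand-ins for the three functions an
 * architecture selecting ARCH_HAS_KERNEL_FPU_SUPPORT must provide.
 * They only track nesting; they do not touch any FPU state.
 */
static int fpu_depth;
static bool fpu_present = true;

static bool kernel_fpu_available(void)
{
	return fpu_present;
}

static void kernel_fpu_begin(void)
{
	/* The common API is not assumed to be reentrant. */
	assert(fpu_depth == 0);
	fpu_depth++;
}

static void kernel_fpu_end(void)
{
	assert(fpu_depth == 1);
	fpu_depth--;
}

/*
 * Floating-point implementation. In a kernel build this would sit in
 * its own file compiled with $(CC_FLAGS_FPU); here it is in the same
 * file only to keep the sketch self-contained.
 */
static long ratio_permille_impl(int num, int den)
{
	return (long)((double)num / (double)den * 1000.0);
}

/*
 * Glue code without floating-point types: returns num/den in
 * thousandths, or -1 when no FPU is available.
 */
static long ratio_permille(int num, int den)
{
	long result;

	if (!kernel_fpu_available())
		return -1;

	kernel_fpu_begin();
	result = ratio_permille_impl(num, den);
	kernel_fpu_end();

	return result;
}
```

The split between a glue function and an FP implementation mirrors the "separate translation unit" arrangement the series uses for the FPU self test; the single-file layout here is only a convenience of the sketch.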
Re: [PATCH 3/3] drm/amd/display: Support DRM_AMD_DC_FP on RISC-V
Hi Christoph, On 2023-11-22 2:40 AM, Christoph Hellwig wrote: >> -select DRM_AMD_DC_FP if (X86 || LOONGARCH || (PPC64 && ALTIVEC) || >> (ARM64 && KERNEL_MODE_NEON && !CC_IS_CLANG)) >> +select DRM_AMD_DC_FP if ARM64 && KERNEL_MODE_NEON && !CC_IS_CLANG >> +select DRM_AMD_DC_FP if PPC64 && ALTIVEC >> +select DRM_AMD_DC_FP if RISCV && FPU >> +select DRM_AMD_DC_FP if LOONGARCH || X86 > > This really is a mess. Can you add a ARCH_HAS_KERNEL_FPU_SUPPORT > symbol that all architetures that have it select instead, and them > make DRM_AMD_DC_FP depend on it? Yes, I have done this for v2, which I will send shortly. >> -#if defined(CONFIG_X86) || defined(CONFIG_LOONGARCH) >> +#if defined(CONFIG_X86) || defined(CONFIG_LOONGARCH) || >> defined(CONFIG_RISCV) >> kernel_fpu_begin(); >> #elif defined(CONFIG_PPC64) >> if (cpu_has_feature(CPU_FTR_VSX_COMP)) >> @@ -122,7 +124,7 @@ void dc_fpu_end(const char *function_name, const int >> line) >> >> depth = __this_cpu_dec_return(fpu_recursion_depth); >> if (depth == 0) { >> -#if defined(CONFIG_X86) || defined(CONFIG_LOONGARCH) >> +#if defined(CONFIG_X86) || defined(CONFIG_LOONGARCH) || >> defined(CONFIG_RISCV) >> kernel_fpu_end(); >> #elif defined(CONFIG_PPC64) >> if (cpu_has_feature(CPU_FTR_VSX_COMP)) > > And then this mess can go away. We'll need to decide if we want to > cover all the in-kernel vector support as part of it, which would > seem reasonable to me, or have a separate generic kernel_vector_begin > with it's own option. I think we may want to keep vector separate for performance on architectures with separate FP and vector register files. For now, I have limited my changes to FPU support only, which means I have removed VSX/Altivec from here; the AMDGPU code doesn't need Altivec anyway. 
>> diff --git a/drivers/gpu/drm/amd/display/dc/dml/Makefile >> b/drivers/gpu/drm/amd/display/dc/dml/Makefile >> index ea7d60f9a9b4..5c8f840ef323 100644 >> --- a/drivers/gpu/drm/amd/display/dc/dml/Makefile >> +++ b/drivers/gpu/drm/amd/display/dc/dml/Makefile >> @@ -43,6 +43,12 @@ dml_ccflags := -mfpu=64 >> dml_rcflags := -msoft-float >> endif >> >> +ifdef CONFIG_RISCV >> +include $(srctree)/arch/riscv/Makefile.isa >> +# Remove V from the ISA string, like in arch/riscv/Makefile, but keep F and >> D. >> +dml_ccflags := -march=$(shell echo $(riscv-march-y) | sed -E >> 's/(rv32ima|rv64ima)([^v_]*)v?/\1\2/') >> +endif >> + >> ifdef CONFIG_CC_IS_GCC >> ifneq ($(call gcc-min-version, 70100),y) >> IS_OLD_GCC = 1 > > And this is again not really something we should be doing. > Instead we need a generic way in Kconfig to enable FPU support > for an object file or set of, that the arch support can hook > into. I've included this in v2 as well. > Btw, I'm also really worried about folks using the FPU instructions > outside the kernel_fpu_begin/end windows in general (not directly > related to the RISC-V support). Can we have objecttool checks > for that similar to only allowing the unsafe uaccess in the > uaccess begin/end pairs? ARM partially enforces this at compile time: it disallows calling kernel_neon_begin() inside a translation unit that has NEON enabled. That doesn't prevent the programmer from calling a FPU-enabled function from outside a begin/end section, but it does prevent the compiler from generating unexpected FPU usage behind your back. I implemented this same functionality for RISC-V. Actually tracking all possibly-FPU-tainted functions and their call sites is probably possible, but a much larger task. Regards, Samuel
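The compile-time enforcement described in the reply above can be sketched roughly as follows. This is a userspace approximation, not the actual ARM or RISC-V header: FP_ENABLED_TU is a hypothetical macro standing in for whatever the compiler predefines when floating-point code generation is enabled (the RISC-V patch in this series tests __riscv_f), and run_guarded() and fp_half() are illustrative names.

```c
#include <assert.h>

/*
 * FP_ENABLED_TU is a hypothetical stand-in for a compiler-predefined
 * macro that signals floating-point code generation is enabled for
 * this translation unit.
 */
#ifdef FP_ENABLED_TU

/* Any use of begin/end in an FP-enabled file fails the build. */
#define kernel_fpu_begin() \
	static_assert(0, "floating-point code must use a separate translation unit")
#define kernel_fpu_end() kernel_fpu_begin()

#else

/* Stand-in bodies: only record whether the FPU section is open. */
static int fpu_enabled;

static void kernel_fpu_begin(void)
{
	fpu_enabled = 1;
}

static void kernel_fpu_end(void)
{
	fpu_enabled = 0;
}

#endif

/*
 * Stand-in for a function that would live in the FP-enabled file.
 * It is here only so the sketch is self-contained.
 */
static int fp_half(int x)
{
	return (int)(x * 0.5);
}

/*
 * Glue helper: opens the FPU section, calls into the FP code, closes
 * the section. With FP_ENABLED_TU defined, this function would not
 * compile, which is the intended effect of the guard.
 */
static int run_guarded(int (*fn)(int), int arg)
{
	int ret;

	kernel_fpu_begin();
	ret = fn(arg);
	kernel_fpu_end();

	return ret;
}
```

As the reply notes, a guard like this only stops the compiler from emitting FP instructions in the file that opens the section; it does not stop someone from calling an FP-enabled function without a begin/end pair.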
Re: [PATCH v4 1/5] RISC-V: Add stubs for sbi_console_putchar/getchar()
Hi Anup,

On 2023-11-23 4:38 AM, Anup Patel wrote:
> On Wed, Nov 22, 2023 at 4:06 AM Samuel Holland wrote:
>> On 2023-11-17 9:38 PM, Anup Patel wrote:
>>> The functions sbi_console_putchar() and sbi_console_getchar() are
>>> not defined when CONFIG_RISCV_SBI_V01 is disabled so let us add
>>> stub of these functions to avoid "#ifdef" on user side.
>>>
>>> Signed-off-by: Anup Patel
>>> Reviewed-by: Andrew Jones
>>> ---
>>>  arch/riscv/include/asm/sbi.h | 5 +
>>>  1 file changed, 5 insertions(+)
>>>
>>> diff --git a/arch/riscv/include/asm/sbi.h b/arch/riscv/include/asm/sbi.h
>>> index 0892f4421bc4..66f3933c14f6 100644
>>> --- a/arch/riscv/include/asm/sbi.h
>>> +++ b/arch/riscv/include/asm/sbi.h
>>> @@ -271,8 +271,13 @@ struct sbiret sbi_ecall(int ext, int fid, unsigned long arg0,
>>>  unsigned long arg3, unsigned long arg4,
>>>  unsigned long arg5);
>>>
>>> +#ifdef CONFIG_RISCV_SBI_V01
>>>  void sbi_console_putchar(int ch);
>>>  int sbi_console_getchar(void);
>>> +#else
>>> +static inline void sbi_console_putchar(int ch) { }
>>> +static inline int sbi_console_getchar(void) { return -ENOENT; }
>>
>> "The SBI call returns the byte on success, or -1 for failure."
>>
>> So -ENOENT is not really an appropriate value to return here.
>
> Actually, I had -1 over here previously but based on GregKH's
> suggestion, we are now returning proper Linux error code here.
>
> Also, all users of sbi_console_getchar() only expect a negative
> value upon error so better to return proper Linux error code.

Alright, makes sense to me.

Regards,
Samuel
Re: [PATCH v4 5/5] RISC-V: Enable SBI based earlycon support
Hi Anup, On 2023-11-17 9:38 PM, Anup Patel wrote: > Let us enable SBI based earlycon support in defconfigs for both RV32 > and RV64 so that "earlycon=sbi" can be used again. > > Signed-off-by: Anup Patel > Reviewed-by: Andrew Jones > --- > arch/riscv/configs/defconfig | 1 + > arch/riscv/configs/rv32_defconfig | 1 + > 2 files changed, 2 insertions(+) > > diff --git a/arch/riscv/configs/defconfig b/arch/riscv/configs/defconfig > index 905881282a7c..eaf34e871e30 100644 > --- a/arch/riscv/configs/defconfig > +++ b/arch/riscv/configs/defconfig > @@ -149,6 +149,7 @@ CONFIG_SERIAL_8250_CONSOLE=y > CONFIG_SERIAL_8250_DW=y > CONFIG_SERIAL_OF_PLATFORM=y > CONFIG_SERIAL_SH_SCI=y > +CONFIG_SERIAL_EARLYCON_RISCV_SBI=y > CONFIG_VIRTIO_CONSOLE=y > CONFIG_HW_RANDOM=y > CONFIG_HW_RANDOM_VIRTIO=y > diff --git a/arch/riscv/configs/rv32_defconfig > b/arch/riscv/configs/rv32_defconfig > index 89b601e253a6..5721af39afd1 100644 > --- a/arch/riscv/configs/rv32_defconfig > +++ b/arch/riscv/configs/rv32_defconfig This file isn't used anymore since 72f045d19f25 ("riscv: Fixup difference with defconfig"), so there's no need to update it. I'll send a patch deleting it. Regards, Samuel > @@ -66,6 +66,7 @@ CONFIG_INPUT_MOUSEDEV=y > CONFIG_SERIAL_8250=y > CONFIG_SERIAL_8250_CONSOLE=y > CONFIG_SERIAL_OF_PLATFORM=y > +CONFIG_SERIAL_EARLYCON_RISCV_SBI=y > CONFIG_VIRTIO_CONSOLE=y > CONFIG_HW_RANDOM=y > CONFIG_HW_RANDOM_VIRTIO=y
Re: [PATCH v4 2/5] RISC-V: Add SBI debug console helper routines
Hi Anup, On 2023-11-17 9:38 PM, Anup Patel wrote: > Let us provide SBI debug console helper routines which can be > shared by serial/earlycon-riscv-sbi.c and hvc/hvc_riscv_sbi.c. > > Signed-off-by: Anup Patel > --- > arch/riscv/include/asm/sbi.h | 5 + > arch/riscv/kernel/sbi.c | 43 > 2 files changed, 48 insertions(+) > > diff --git a/arch/riscv/include/asm/sbi.h b/arch/riscv/include/asm/sbi.h > index 66f3933c14f6..ee7aef5f6233 100644 > --- a/arch/riscv/include/asm/sbi.h > +++ b/arch/riscv/include/asm/sbi.h > @@ -334,6 +334,11 @@ static inline unsigned long sbi_mk_version(unsigned long > major, > } > > int sbi_err_map_linux_errno(int err); > + > +extern bool sbi_debug_console_available; > +int sbi_debug_console_write(unsigned int num_bytes, phys_addr_t base_addr); > +int sbi_debug_console_read(unsigned int num_bytes, phys_addr_t base_addr); > + > #else /* CONFIG_RISCV_SBI */ > static inline int sbi_remote_fence_i(const struct cpumask *cpu_mask) { > return -1; } > static inline void sbi_init(void) {} > diff --git a/arch/riscv/kernel/sbi.c b/arch/riscv/kernel/sbi.c > index 5a62ed1da453..73a9c22c3945 100644 > --- a/arch/riscv/kernel/sbi.c > +++ b/arch/riscv/kernel/sbi.c > @@ -571,6 +571,44 @@ long sbi_get_mimpid(void) > } > EXPORT_SYMBOL_GPL(sbi_get_mimpid); > > +bool sbi_debug_console_available; > + > +int sbi_debug_console_write(unsigned int num_bytes, phys_addr_t base_addr) > +{ > + struct sbiret ret; > + > + if (!sbi_debug_console_available) > + return -EOPNOTSUPP; > + > + if (IS_ENABLED(CONFIG_32BIT)) > + ret = sbi_ecall(SBI_EXT_DBCN, SBI_EXT_DBCN_CONSOLE_WRITE, > + num_bytes, lower_32_bits(base_addr), > + upper_32_bits(base_addr), 0, 0, 0); > + else > + ret = sbi_ecall(SBI_EXT_DBCN, SBI_EXT_DBCN_CONSOLE_WRITE, > + num_bytes, base_addr, 0, 0, 0, 0); > + > + return ret.error ? 
sbi_err_map_linux_errno(ret.error) : ret.value; > +} > + > +int sbi_debug_console_read(unsigned int num_bytes, phys_addr_t base_addr) > +{ > + struct sbiret ret; > + > + if (!sbi_debug_console_available) > + return -EOPNOTSUPP; > + > + if (IS_ENABLED(CONFIG_32BIT)) > + ret = sbi_ecall(SBI_EXT_DBCN, SBI_EXT_DBCN_CONSOLE_READ, > + num_bytes, lower_32_bits(base_addr), > + upper_32_bits(base_addr), 0, 0, 0); > + else > + ret = sbi_ecall(SBI_EXT_DBCN, SBI_EXT_DBCN_CONSOLE_READ, > + num_bytes, base_addr, 0, 0, 0, 0); > + > + return ret.error ? sbi_err_map_linux_errno(ret.error) : ret.value; > +} Since every place that calls these functions will need to do the vmalloc lookup, would it make sense to do it here, and have these take a pointer instead? Regards, Samuel
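The CONFIG_32BIT branches in the quoted helpers pass the buffer address as two register-sized halves, presumably because a physical address may be wider than a register on a 32-bit kernel. A rough userspace sketch of that argument packing follows. The names dbcn_args, dbcn_pack_args(), lo32() and hi32() are invented for illustration (the kernel uses lower_32_bits() and upper_32_bits()), and no SBI call is made.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical container for the first three ecall arguments. */
struct dbcn_args {
	uint64_t a0;	/* number of bytes */
	uint64_t a1;	/* base address, or its low 32 bits */
	uint64_t a2;	/* high 32 bits of the base address, or 0 */
};

/* Illustrative equivalents of lower_32_bits() / upper_32_bits(). */
static uint32_t lo32(uint64_t v)
{
	return (uint32_t)v;
}

static uint32_t hi32(uint64_t v)
{
	return (uint32_t)(v >> 32);
}

/*
 * Pack the arguments the way the quoted helpers do: on a 32-bit
 * kernel the address is split across two registers, otherwise it
 * fits in one and the next argument is zero.
 */
static struct dbcn_args dbcn_pack_args(bool is_32bit, unsigned int num_bytes,
				       uint64_t base_addr)
{
	struct dbcn_args args;

	args.a0 = num_bytes;
	if (is_32bit) {
		args.a1 = lo32(base_addr);
		args.a2 = hi32(base_addr);
	} else {
		args.a1 = base_addr;
		args.a2 = 0;
	}

	return args;
}
```

This only illustrates the register layout; it does not address the question raised in the reply about where the virtual-to-physical lookup should happen.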
Re: [PATCH v4 3/5] tty/serial: Add RISC-V SBI debug console based earlycon
Hi Anup, On 2023-11-17 9:38 PM, Anup Patel wrote: > We extend the existing RISC-V SBI earlycon support to use the new > RISC-V SBI debug console extension. > > Signed-off-by: Anup Patel > Reviewed-by: Andrew Jones > --- > drivers/tty/serial/Kconfig | 2 +- > drivers/tty/serial/earlycon-riscv-sbi.c | 24 > 2 files changed, 21 insertions(+), 5 deletions(-) > > diff --git a/drivers/tty/serial/Kconfig b/drivers/tty/serial/Kconfig > index 732c893c8d16..1f2594b8ab9d 100644 > --- a/drivers/tty/serial/Kconfig > +++ b/drivers/tty/serial/Kconfig > @@ -87,7 +87,7 @@ config SERIAL_EARLYCON_SEMIHOST > > config SERIAL_EARLYCON_RISCV_SBI > bool "Early console using RISC-V SBI" > - depends on RISCV_SBI_V01 > + depends on RISCV_SBI > select SERIAL_CORE > select SERIAL_CORE_CONSOLE > select SERIAL_EARLYCON > diff --git a/drivers/tty/serial/earlycon-riscv-sbi.c > b/drivers/tty/serial/earlycon-riscv-sbi.c > index 27afb0b74ea7..5351e1e31f45 100644 > --- a/drivers/tty/serial/earlycon-riscv-sbi.c > +++ b/drivers/tty/serial/earlycon-riscv-sbi.c > @@ -15,17 +15,33 @@ static void sbi_putc(struct uart_port *port, unsigned > char c) > sbi_console_putchar(c); > } > > -static void sbi_console_write(struct console *con, > - const char *s, unsigned n) > +static void sbi_0_1_console_write(struct console *con, > + const char *s, unsigned int n) > { > struct earlycon_device *dev = con->data; > uart_console_write(&dev->port, s, n, sbi_putc); > } > > +static void sbi_dbcn_console_write(struct console *con, > +const char *s, unsigned int n) > +{ > + sbi_debug_console_write(n, __pa(s)); This only works for strings in the linear mapping or the kernel mapping (not vmalloc, which includes the stack). So I don't think we can use __pa() here. 
> +} > + > static int __init early_sbi_setup(struct earlycon_device *device, > const char *opt) > { > - device->con->write = sbi_console_write; > - return 0; > + int ret = 0; > + > + if (sbi_debug_console_available) { > + device->con->write = sbi_dbcn_console_write; > + } else { > + if (IS_ENABLED(CONFIG_RISCV_SBI_V01)) "else if", no need for the extra block/indentation. Regards, Samuel > + device->con->write = sbi_0_1_console_write; > + else > + ret = -ENODEV; > + } > + > + return ret; > } > EARLYCON_DECLARE(sbi, early_sbi_setup);
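The "else if" remark above amounts to flattening the nested fallback in early_sbi_setup(). A small userspace sketch of the resulting control flow is below; the enum and pick_write_backend() are hypothetical names, and the two booleans stand in for sbi_debug_console_available and IS_ENABLED(CONFIG_RISCV_SBI_V01).

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical labels for the two console write paths. */
enum sbi_write_backend {
	BACKEND_NONE,
	BACKEND_DBCN,	/* SBI debug console extension */
	BACKEND_V01,	/* legacy SBI v0.1 putchar */
};

/*
 * Same decision as the quoted setup function, written as a flat
 * if / else if / else chain: prefer the debug console, fall back to
 * the legacy call, otherwise report that no device is available.
 */
static int pick_write_backend(bool dbcn_available, bool v01_enabled,
			      enum sbi_write_backend *backend)
{
	if (dbcn_available)
		*backend = BACKEND_DBCN;
	else if (v01_enabled)
		*backend = BACKEND_V01;
	else
		return -ENODEV;

	return 0;
}
```

The behaviour is unchanged from the nested form; only one level of braces and indentation goes away.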
Re: [PATCH v4 1/5] RISC-V: Add stubs for sbi_console_putchar/getchar()
Hi Anup,

On 2023-11-17 9:38 PM, Anup Patel wrote:
> The functions sbi_console_putchar() and sbi_console_getchar() are
> not defined when CONFIG_RISCV_SBI_V01 is disabled so let us add
> stub of these functions to avoid "#ifdef" on user side.
>
> Signed-off-by: Anup Patel
> Reviewed-by: Andrew Jones
> ---
>  arch/riscv/include/asm/sbi.h | 5 +
>  1 file changed, 5 insertions(+)
>
> diff --git a/arch/riscv/include/asm/sbi.h b/arch/riscv/include/asm/sbi.h
> index 0892f4421bc4..66f3933c14f6 100644
> --- a/arch/riscv/include/asm/sbi.h
> +++ b/arch/riscv/include/asm/sbi.h
> @@ -271,8 +271,13 @@ struct sbiret sbi_ecall(int ext, int fid, unsigned long arg0,
>  unsigned long arg3, unsigned long arg4,
>  unsigned long arg5);
>
> +#ifdef CONFIG_RISCV_SBI_V01
>  void sbi_console_putchar(int ch);
>  int sbi_console_getchar(void);
> +#else
> +static inline void sbi_console_putchar(int ch) { }
> +static inline int sbi_console_getchar(void) { return -ENOENT; }

"The SBI call returns the byte on success, or -1 for failure."

So -ENOENT is not really an appropriate value to return here.

Regards,
Samuel

> +#endif
>  long sbi_get_mvendorid(void);
>  long sbi_get_marchid(void);
>  long sbi_get_mimpid(void);
Re: [PATCH] powerpc: Save AMR/IAMR when switching tasks
On 9/21/22 00:17, Christophe Leroy wrote:
> On 21/09/2022 at 05:33, Samuel Holland wrote:
>> On 9/19/22 07:37, Michael Ellerman wrote:
>>> Christophe Leroy writes:
>>>> On 16/09/2022 at 07:05, Samuel Holland wrote:
>>>>> With CONFIG_PREEMPT=y (involuntary preemption enabled), it is possible
>>>>> to switch away from a task inside copy_{from,to}_user. This left the CPU
>>>>> with userspace access enabled until after the next IRQ or privilege
>>>>> level switch, when AMR/IAMR got reset to AMR_KU[AE]P_BLOCKED. Then, when
>>>>> switching back to the original task, the userspace access would fault:
>>>>
>>>> This is not supposed to happen. You never switch away from a task
>>>> magically. Task switch will always happen in an interrupt, that means
>>>> copy_{from,to}_user() get interrupted.
>>>
>>> Unfortunately this isn't true when CONFIG_PREEMPT=y.
>>>
>>> We can switch away without an interrupt via:
>>>   __copy_tofrom_user()
>>>     -> __copy_tofrom_user_power7()
>>>        -> exit_vmx_usercopy()
>>>           -> preempt_enable()
>>>              -> __preempt_schedule()
>>>                 -> preempt_schedule()
>>>                    -> preempt_schedule_common()
>>>                       -> __schedule()
>>>
>>> I do some boot tests with CONFIG_PREEMPT=y, but I realise now those are
>>> all on Power8, which is a bit of an oversight on my part.
>>>
>>> And clearly no one else tests it, until now :)
>>>
>>> I think the root of our problem is that our KUAP lock/unlock is at too
>>> high a level, ie. we do it in C around the low-level copy to/from.
>>>
>>> eg:
>>>
>>> static inline unsigned long
>>> raw_copy_to_user(void __user *to, const void *from, unsigned long n)
>>> {
>>> 	unsigned long ret;
>>>
>>> 	allow_write_to_user(to, n);
>>> 	ret = __copy_tofrom_user(to, (__force const void __user *)from, n);
>>> 	prevent_write_to_user(to, n);
>>> 	return ret;
>>> }
>>>
>>> There's a reason we did that, which is that we have various different
>>> KUAP methods on different platforms, not a simple instruction like other
>>> arches.
>>>
>>> But that means we have that exit_vmx_usercopy() being called deep in the
>>> guts of __copy_tofrom_user(), with KUAP disabled, and then we call into
>>> the preempt machinery and eventually schedule.
>>>
>>> I don't see an easy way to fix that "properly", it would be a big change
>>> to all platforms to push the KUAP save/restore down into the low level
>>> asm code.
>>>
>>> But I think the patch below does fix it, although it abuses things a
>>> little. Namely it only works because the 64s KUAP code can handle a
>>> double call to prevent, and doesn't need the addresses or size for the
>>> allow.
>>>
>>> Still I think it might be our best option for an easy fix.
>>>
>>> Samuel, can you try this on your system and check it works for you?
>>
>> It looks like your patch works. Thanks for the correct fix!
>
> Instead of the patch from Michael, could you try by replacing
> preempt_enable() by preempt_enable_no_resched() in exit_vmx_usercopy() ?

I finally got a chance to test this, and the simpler fix of using
preempt_enable_no_resched() works as well.

>> I replaced my patch with the one below, and enabled
>> CONFIG_PPC_KUAP_DEBUG=y, and I was able to do several kernel builds
>> without any crashes or splats in dmesg.
>
> Did you try CONFIG_PPC_KUAP_DEBUG without the patch ? Did it detect any
> problem ?

I believe I did at one point, and it did not detect anything.

Regards,
Samuel
Re: [PATCH] powerpc: Save AMR/IAMR when switching tasks
On 9/19/22 07:37, Michael Ellerman wrote:
> Christophe Leroy writes:
>> On 16/09/2022 at 07:05, Samuel Holland wrote:
>>> With CONFIG_PREEMPT=y (involuntary preemption enabled), it is possible
>>> to switch away from a task inside copy_{from,to}_user. This left the CPU
>>> with userspace access enabled until after the next IRQ or privilege
>>> level switch, when AMR/IAMR got reset to AMR_KU[AE]P_BLOCKED. Then, when
>>> switching back to the original task, the userspace access would fault:
>>
>> This is not supposed to happen. You never switch away from a task
>> magically. Task switch will always happen in an interrupt, that means
>> copy_{from,to}_user() get interrupted.
>
> Unfortunately this isn't true when CONFIG_PREEMPT=y.
>
> We can switch away without an interrupt via:
>   __copy_tofrom_user()
>     -> __copy_tofrom_user_power7()
>        -> exit_vmx_usercopy()
>           -> preempt_enable()
>              -> __preempt_schedule()
>                 -> preempt_schedule()
>                    -> preempt_schedule_common()
>                       -> __schedule()
>
> I do some boot tests with CONFIG_PREEMPT=y, but I realise now those are
> all on Power8, which is a bit of an oversight on my part.
>
> And clearly no one else tests it, until now :)
>
> I think the root of our problem is that our KUAP lock/unlock is at too
> high a level, ie. we do it in C around the low-level copy to/from.
>
> eg:
>
> static inline unsigned long
> raw_copy_to_user(void __user *to, const void *from, unsigned long n)
> {
> 	unsigned long ret;
>
> 	allow_write_to_user(to, n);
> 	ret = __copy_tofrom_user(to, (__force const void __user *)from, n);
> 	prevent_write_to_user(to, n);
> 	return ret;
> }
>
> There's a reason we did that, which is that we have various different
> KUAP methods on different platforms, not a simple instruction like other
> arches.
>
> But that means we have that exit_vmx_usercopy() being called deep in the
> guts of __copy_tofrom_user(), with KUAP disabled, and then we call into
> the preempt machinery and eventually schedule.
>
> I don't see an easy way to fix that "properly", it would be a big change
> to all platforms to push the KUAP save/restore down into the low level
> asm code.
>
> But I think the patch below does fix it, although it abuses things a
> little. Namely it only works because the 64s KUAP code can handle a
> double call to prevent, and doesn't need the addresses or size for the
> allow.
>
> Still I think it might be our best option for an easy fix.
>
> Samuel, can you try this on your system and check it works for you?

It looks like your patch works. Thanks for the correct fix!

I replaced my patch with the one below, and enabled
CONFIG_PPC_KUAP_DEBUG=y, and I was able to do several kernel builds
without any crashes or splats in dmesg.

I suppose the other calls to exit_vmx_usercopy() in copyuser_power7.S
would not cause a crash, because there is no userspace memory access
afterward, but couldn't they still leave KUAP erroneously unlocked?

Regards,
Samuel

> diff --git a/arch/powerpc/include/asm/processor.h b/arch/powerpc/include/asm/processor.h
> index 97a77b37daa3..c50080c6a136 100644
> --- a/arch/powerpc/include/asm/processor.h
> +++ b/arch/powerpc/include/asm/processor.h
> @@ -432,6 +432,7 @@ int speround_handler(struct pt_regs *regs);
>  /* VMX copying */
>  int enter_vmx_usercopy(void);
>  int exit_vmx_usercopy(void);
> +void exit_vmx_usercopy_continue(void);
>  int enter_vmx_ops(void);
>  void *exit_vmx_ops(void *dest);
>
> diff --git a/arch/powerpc/lib/copyuser_power7.S b/arch/powerpc/lib/copyuser_power7.S
> index 28f0be523c06..77804860383c 100644
> --- a/arch/powerpc/lib/copyuser_power7.S
> +++ b/arch/powerpc/lib/copyuser_power7.S
> @@ -47,7 +47,7 @@
>  	ld	r15,STK_REG(R15)(r1)
>  	ld	r14,STK_REG(R14)(r1)
>  .Ldo_err3:
> -	bl	exit_vmx_usercopy
> +	bl	exit_vmx_usercopy_continue
>  	ld	r0,STACKFRAMESIZE+16(r1)
>  	mtlr	r0
>  	b	.Lexit
> diff --git a/arch/powerpc/lib/vmx-helper.c b/arch/powerpc/lib/vmx-helper.c
> index f76a50291fd7..78a18b8384ff 100644
> --- a/arch/powerpc/lib/vmx-helper.c
> +++ b/arch/powerpc/lib/vmx-helper.c
> @@ -8,6 +8,7 @@
>   */
>  #include
>  #include
> +#include
>  #include
>
>  int enter_vmx_usercopy(void)
> @@ -34,12 +35,19 @@ int enter_vmx_usercopy(void)
>   */
>  int exit_vmx_usercopy(void)
>  {
> +	prevent_user_access(KUAP_READ_WRITE);
>  	disable_kernel_altivec();
>  	pagefault_enable();
>  	preempt_enable();
>  	return 0;
>  }
>
> +void exit_vmx_usercopy_continue(void)
> +{
> +	exit_vmx_usercopy();
> +	allow_read_write_user(NULL, NULL, 0);
> +}
> +
>  int enter_vmx_ops(void)
>  {
>  	if (in_interrupt())
>
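
To make the sequence discussed in this thread easier to follow, here is a toy userspace model, not kernel code. It assumes a single CPU-wide flag in place of the AMR and a scheduler that neither saves nor restores it. The function names echo the kernel ones but are plain stand-ins, and schedule_away_and_back(), copy_with_plain_exit() and copy_with_exit_continue() are hypothetical helpers invented for this sketch.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for the AMR: true means user access is currently allowed. */
static bool cpu_user_access;
/* Records whether the "other task" started running with user access open. */
static bool other_task_saw_open;

static void allow_user_access(void)   { cpu_user_access = true; }
static void prevent_user_access(void) { cpu_user_access = false; }

/*
 * Models preempt_enable() -> __schedule() with no AMR save/restore:
 * another task runs, and its next interrupt entry re-locks user access
 * before we are switched back in.
 */
static void schedule_away_and_back(void)
{
	other_task_saw_open = cpu_user_access;
	prevent_user_access();
}

/* Current behaviour: schedule with user access still open. */
static bool copy_with_plain_exit(void)
{
	bool ok;

	allow_user_access();		/* allow_write_to_user() */
	schedule_away_and_back();	/* exit_vmx_usercopy() */
	ok = cpu_user_access;		/* copy resumes and touches user memory */
	prevent_user_access();		/* prevent_write_to_user() */
	return ok;
}

/* Shape of the proposed fix: lock before scheduling, re-allow after. */
static bool copy_with_exit_continue(void)
{
	bool ok;

	allow_user_access();
	prevent_user_access();		/* added at the top of exit_vmx_usercopy() */
	schedule_away_and_back();
	allow_user_access();		/* exit_vmx_usercopy_continue() */
	ok = cpu_user_access;
	prevent_user_access();
	return ok;
}
```

In this model the plain exit both breaks the resumed copy and lets the other task start with user access open, which is the concern raised above about the remaining exit_vmx_usercopy() callers; the lock-then-reallow variant avoids both.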
Re: [PATCH] powerpc: Save AMR/IAMR when switching tasks
On 9/17/22 03:16, Christophe Leroy wrote:
> On 16/09/2022 at 07:05, Samuel Holland wrote:
>> With CONFIG_PREEMPT=y (involuntary preemption enabled), it is possible
>> to switch away from a task inside copy_{from,to}_user. This left the CPU
>> with userspace access enabled until after the next IRQ or privilege
>> level switch, when AMR/IAMR got reset to AMR_KU[AE]P_BLOCKED. Then, when
>> switching back to the original task, the userspace access would fault:
>
> This is not supposed to happen. You never switch away from a task
> magically. Task switch will always happen in an interrupt, that means
> copy_{from,to}_user() get interrupted.

That makes sense, the interrupt handler is responsible for saving the
KUAP status. It looks like neither DEFINE_INTERRUPT_HANDLER_RAW nor any
of its users (performance_monitor_exception(), do_slb_fault()) do that.
Yet they still call one of the interrupt_return variants, which restores
the status from the stack.

> Whenever an interrupt is taken, kuap_save_amr_and_lock() macro is used
> to save KUAP status into the stack then lock KUAP access. At interrupt
> exit, kuap_kernel_restore() macro or function is used to restore KUAP
> access from the stack. At the time the task switch happens, KUAP access
> is expected to be locked. During task switch, the stack is switched so
> the KUAP status is taken back from the new task's stack.

What if another task calls schedule() from kernel process context, and
the scheduler switches to a task that had been preempted inside
copy_{from,to}_user()? Then there is no interrupt involved, and I don't
see where kuap_kernel_restore() would get called.

> Your fix suggests that there is some path where the KUAP status is not
> properly saved and/or restored. Did you try running with
> CONFIG_PPC_KUAP_DEBUG ? It should warn whenever a KUAP access is left
> unlocked.
>
>>
>>   Kernel attempted to write user page (3fff7ab68190) - exploit attempt? (uid: 65536)
>>   [ cut here ]
>>   Bug: Write fault blocked by KUAP!
>>   WARNING: CPU: 56 PID: 4939 at arch/powerpc/mm/fault.c:228 ___do_page_fault+0x7b4/0xaa0
>>   CPU: 56 PID: 4939 Comm: git Tainted: GW 5.19.8-5-gba424747260d #1
>>   NIP: c00555e4 LR: c00555e0 CTR: c079d9d0
>>   REGS: c0008f507370 TRAP: 0700 Tainted: GW (5.19.8-5-gba424747260d)
>>   MSR: 90021033 CR: 2804 XER: 2004
>>   CFAR: c0123780 IRQMASK: 3
>>   NIP [c00555e4] ___do_page_fault+0x7b4/0xaa0
>>   LR [c00555e0] ___do_page_fault+0x7b0/0xaa0
>>   Call Trace:
>>   [c0008f507610] [c00555e0] ___do_page_fault+0x7b0/0xaa0 (unreliable)
>>   [c0008f5076c0] [c0055938] do_page_fault+0x68/0x130
>>   [c0008f5076f0] [c0008914] data_access_common_virt+0x194/0x1f0
>>   --- interrupt: 300 at __copy_tofrom_user_base+0x9c/0x5a4
>
> ...
>
>>
>> Fix this by saving and restoring the kernel-side AMR/IAMR values when
>> switching tasks.
>
> As explained above, KUAP access should be locked at that time, so saving
> and restoring it should not have any effect. If it does, it means
> something goes wrong somewhere else.
>
>>
>> Fixes: 890274c2dc4c ("powerpc/64s: Implement KUAP for Radix MMU")
>> Signed-off-by: Samuel Holland
>> ---
>> I have no idea if this is the right change to make, and it could be
>> optimized, but my system has been stable with this patch for 5 days now.
>>
>> Without the patch, I hit the bug every few minutes when my load average
>> is <1, and I hit it immediately if I try to do a parallel kernel build.
>
> Great, then can you make a try with CONFIG_PPC_KUAP_DEBUG ?

Yes, I will try this out in the next few days.

Regards,
Samuel
[PATCH] powerpc: Save AMR/IAMR when switching tasks
With CONFIG_PREEMPT=y (involuntary preemption enabled), it is possible
to switch away from a task inside copy_{from,to}_user. This left the CPU
with userspace access enabled until after the next IRQ or privilege
level switch, when AMR/IAMR got reset to AMR_KU[AE]P_BLOCKED. Then, when
switching back to the original task, the userspace access would fault:

  Kernel attempted to write user page (3fff7ab68190) - exploit attempt? (uid: 65536)
  [ cut here ]
  Bug: Write fault blocked by KUAP!
  WARNING: CPU: 56 PID: 4939 at arch/powerpc/mm/fault.c:228 ___do_page_fault+0x7b4/0xaa0
  CPU: 56 PID: 4939 Comm: git Tainted: GW 5.19.8-5-gba424747260d #1
  NIP: c00555e4 LR: c00555e0 CTR: c079d9d0
  REGS: c0008f507370 TRAP: 0700 Tainted: GW (5.19.8-5-gba424747260d)
  MSR: 90021033 CR: 2804 XER: 2004
  CFAR: c0123780 IRQMASK: 3
  NIP [c00555e4] ___do_page_fault+0x7b4/0xaa0
  LR [c00555e0] ___do_page_fault+0x7b0/0xaa0
  Call Trace:
  [c0008f507610] [c00555e0] ___do_page_fault+0x7b0/0xaa0 (unreliable)
  [c0008f5076c0] [c0055938] do_page_fault+0x68/0x130
  [c0008f5076f0] [c0008914] data_access_common_virt+0x194/0x1f0
  --- interrupt: 300 at __copy_tofrom_user_base+0x9c/0x5a4
  NIP: c007b1a8 LR: c073f4d4 CTR: 0080
  REGS: c0008f507760 TRAP: 0300 Tainted: GW (5.19.8-5-gba424747260d)
  MSR: 9280b033 CR: 24002220 XER: 2004
  CFAR: c007b174 DAR: 3fff7ab68190 DSISR: 0a00 IRQMASK: 0
  NIP [c007b1a8] __copy_tofrom_user_base+0x9c/0x5a4
  LR [c073f4d4] copyout+0x74/0x150
  --- interrupt: 300
  [c0008f507a30] [c07430cc] copy_page_to_iter+0x12c/0x4b0
  [c0008f507ab0] [c02c7c20] filemap_read+0x200/0x460
  [c0008f507bf0] [c05f96f4] xfs_file_buffered_read+0x104/0x170
  [c0008f507c30] [c05f9800] xfs_file_read_iter+0xa0/0x150
  [c0008f507c70] [c03bddc8] new_sync_read+0x108/0x180
  [c0008f507d10] [c03c06b0] vfs_read+0x1d0/0x240
  [c0008f507d60] [c03c0ba4] ksys_read+0x84/0x140
  [c0008f507db0] [c002a3fc] system_call_exception+0x15c/0x300
  [c0008f507e10] [c000c63c] system_call_common+0xec/0x250
  --- interrupt: c00 at 0x3fff83aa7238
  NIP: 3fff83aa7238 LR: 3fff83a923b8 CTR:
  REGS: c0008f507e80 TRAP: 0c00 Tainted: GW (5.19.8-5-gba424747260d)
  MSR: 9280f033 CR: 80002482 XER:
  IRQMASK: 0
  NIP [3fff83aa7238] 0x3fff83aa7238
  LR [3fff83a923b8] 0x3fff83a923b8
  --- interrupt: c00
  Instruction dump:
  e87f0100 48101021 6000 2c23 4182fee8 408e0128 3c82ff80 3884e978
  3c62ff80 3863ea78 480ce13d 6000 <0fe0> fb010070 fb810090 e80100c0
  ---[ end trace ]---

Fix this by saving and restoring the kernel-side AMR/IAMR values when
switching tasks.

Fixes: 890274c2dc4c ("powerpc/64s: Implement KUAP for Radix MMU")
Signed-off-by: Samuel Holland
---
I have no idea if this is the right change to make, and it could be
optimized, but my system has been stable with this patch for 5 days now.

Without the patch, I hit the bug every few minutes when my load average
is <1, and I hit it immediately if I try to do a parallel kernel build.

Because of the instability (file I/O randomly raises SIGBUS), I don't
think anyone would run a system in this configuration, so I don't think
this bug is exploitable.

 arch/powerpc/kernel/process.c | 13 +
 1 file changed, 13 insertions(+)

diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index 0fbda89cd1bb..69b189d63124 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -1150,6 +1150,12 @@ static inline void save_sprs(struct thread_struct *t)
 		 */
 		t->tar = mfspr(SPRN_TAR);
 	}
+	if (t->regs) {
+		if (mmu_has_feature(MMU_FTR_BOOK3S_KUAP))
+			t->regs->amr = mfspr(SPRN_AMR);
+		if (mmu_has_feature(MMU_FTR_BOOK3S_KUEP))
+			t->regs->iamr = mfspr(SPRN_IAMR);
+	}
 #endif
 }
 
@@ -1228,6 +1234,13 @@ static inline void restore_sprs(struct thread_struct *old_thread,
 	if (cpu_has_feature(CPU_FTR_P9_TIDR) &&
 	    old_thread->tidr != new_thread->tidr)
 		mtspr(SPRN_TIDR, new_thread->tidr);
+	if (new_thread->regs) {
+		if (mmu_has_feature(MMU_FTR_BOOK3S_KUAP))
+			mtspr(SPRN_AMR, new_thread->regs->amr);
+		if (mmu_has_feature(MMU_FTR_BOOK3S_KUEP))
+			mtspr(SPRN_IAMR, new_thread->regs->iamr);
+		isync();
+	}
 #endif
 }
 
-- 
2.35.1
[PATCH] powerpc: Select HAVE_FUTEX_CMPXCHG
On powerpc, access_ok() succeeds for the NULL pointer. This breaks the
dynamic check in futex_detect_cmpxchg(), which expects -EFAULT. As a
result, robust futex operations are not functional on powerpc.

Since the architecture's futex_atomic_cmpxchg_inatomic() implementation
requires no runtime feature detection, we can select HAVE_FUTEX_CMPXCHG
to skip futex_detect_cmpxchg() and enable the use of robust futexes.

Signed-off-by: Samuel Holland
---
 arch/powerpc/Kconfig | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index ad620637cbd1..5ad1deb0c669 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -196,6 +196,7 @@ config PPC
 	select HAVE_FUNCTION_ERROR_INJECTION
 	select HAVE_FUNCTION_GRAPH_TRACER
 	select HAVE_FUNCTION_TRACER
+	select HAVE_FUTEX_CMPXCHG
 	select HAVE_GCC_PLUGINS if GCC_VERSION >= 50200 # plugin support on gcc <= 5.1 is buggy on PPC
 	select HAVE_HW_BREAKPOINT if PERF_EVENTS && (PPC_BOOK3S || PPC_8xx)
 	select HAVE_IDE
-- 
2.26.2
Re: [PATCH 1/3] powerpc/book3s64/hash/4k: 4k supports only 16TB linear mapping
Hello,

On 9/17/19 9:57 AM, Aneesh Kumar K.V wrote:
> With commit: 0034d395f89d ("powerpc/mm/hash64: Map all the kernel regions in the
> same 0xc range"), we now split the 64TB address range into 4 contexts each of
> 16TB. That implies we can do only 16TB linear mapping. Make sure we don't
> add physical memory above 16TB if that is present in the system.
>
> Fixes: 0034d395f89d ("powerpc/mm/hash64: Map all the kernel regions in the same 0xc range")
> Reported-by: Cameron Berkenpas
> Signed-off-by: Aneesh Kumar K.V
> ---
>  arch/powerpc/include/asm/book3s/64/mmu.h | 8 ++--
>  1 file changed, 6 insertions(+), 2 deletions(-)
>
> diff --git a/arch/powerpc/include/asm/book3s/64/mmu.h b/arch/powerpc/include/asm/book3s/64/mmu.h
> index bb3deb76c951..86cce8189240 100644
> --- a/arch/powerpc/include/asm/book3s/64/mmu.h
> +++ b/arch/powerpc/include/asm/book3s/64/mmu.h
> @@ -35,12 +35,16 @@ extern struct mmu_psize_def mmu_psize_defs[MMU_PAGE_COUNT];
>   * memory requirements with large number of sections.
>   * 51 bits is the max physical real address on POWER9
>   */
> -#if defined(CONFIG_SPARSEMEM_VMEMMAP) && defined(CONFIG_SPARSEMEM_EXTREME) && \
> -	defined(CONFIG_PPC_64K_PAGES)
> +
> +#if defined(CONFIG_PPC_64K_PAGES)
> +#if defined(CONFIG_SPARSEMEM_VMEMMAP) && defined(CONFIG_SPARSEMEM_EXTREME)

This prevents accessing physical memory over 16TB with 4k pages and
radix MMU as well. Was this intentional?

>  #define MAX_PHYSMEM_BITS 51
>  #else
>  #define MAX_PHYSMEM_BITS 46
>  #endif
> +#else /* CONFIG_PPC_64K_PAGES */
> +#define MAX_PHYSMEM_BITS 44
> +#endif
>
>  /* 64-bit classic hash table MMU */
>  #include
>

Cheers,
Samuel
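
As a quick check of the numbers in the patch description, the three MAX_PHYSMEM_BITS values can be converted to sizes with a few lines of C. physmem_tib() is a hypothetical helper written for this note, not a kernel function.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: amount of physical memory, in TiB, addressable
 * with the given number of physical address bits (1 TiB = 2^40 bytes).
 */
static uint64_t physmem_tib(unsigned int bits)
{
	assert(bits >= 40 && bits < 64);
	return (UINT64_C(1) << bits) >> 40;
}
```

So 44 bits gives 16 TiB, 46 bits gives 64 TiB (four contexts of 16 TiB each), and 51 bits gives 2048 TiB, which matches the limits described in the commit message.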
Re: [PATCH v5 1/2] powerpc/32: add stack protector support
Hello all,

On 09/27/18 02:05, Christophe Leroy wrote:
[..snip..]
> diff --git a/arch/powerpc/Makefile b/arch/powerpc/Makefile
> index 07d9dce7eda6..45b8eb4d8fe7 100644
> --- a/arch/powerpc/Makefile
> +++ b/arch/powerpc/Makefile
> @@ -112,6 +112,9 @@ KBUILD_LDFLAGS += -m elf$(BITS)$(LDEMULATION)
>  KBUILD_ARFLAGS += --target=elf$(BITS)-$(GNUTARGET)
>  endif
>
> +cflags-$(CONFIG_STACKPROTECTOR) += -mstack-protector-guard=tls
> +cflags-$(CONFIG_STACKPROTECTOR) += -mstack-protector-guard-reg=r2
> +
>  LDFLAGS_vmlinux-y := -Bstatic
>  LDFLAGS_vmlinux-$(CONFIG_RELOCATABLE) := -pie
>  LDFLAGS_vmlinux := $(LDFLAGS_vmlinux-y)
> @@ -404,6 +407,13 @@ archclean:
>
>  archprepare: checkbin
>
> +ifdef CONFIG_STACKPROTECTOR
> +prepare: stack_protector_prepare
> +
> +stack_protector_prepare: prepare0
> +	$(eval KBUILD_CFLAGS += -mstack-protector-guard-offset=$(shell awk '{if ($$2 == "TASK_CANARY") print $$3;}' include/generated/asm-offsets.h))
> +endif
> +

This breaks when building out-of-tree kernel modules. GCC is not getting
passed the -mstack-protector-guard-offset argument, so the default
offset is used. The kernel then panics the first time a function with
stack protector is called. I'm seeing this on powerpc64. It looks like
it was reported for powerpc on kernel bugzilla:
https://bugzilla.kernel.org/show_bug.cgi?id=201891

Linux 4.20 does not have a "prepare" target when KBUILD_EXTMOD is set.
One is added by:

commit e07db28eea38ed4e332b3a89f3995c86b713cb5b
Author: Masahiro Yamada
Date:   Thu Nov 22 08:11:54 2018 +0900

    kbuild: fix single target build for external module

However, after cherry-picking that patch, the build fails because it's
missing prepare0. I applied the patch below and I successfully built an
out-of-tree module with CONFIG_STACKPROTECTOR=y.

diff --git a/Makefile b/Makefile
index 826826553085..f0a93e1ba1b6 100644
--- a/Makefile
+++ b/Makefile
@@ -1596,9 +1596,10 @@ help:
 	@echo ''
 
 # Dummies...
-PHONY += prepare scripts
-prepare:
+PHONY += prepare prepare0 scripts
+prepare: prepare0
 	$(cmd_crmodverdir)
+prepare0: ;
 scripts: ;
 
 endif # KBUILD_EXTMOD

The context has changed somewhat in later patches, but I think a change
like this one should go into 5.0, and e07db28eea38 should go into
4.20.y.

Thanks,
Samuel