Re: [PATCH] x86/hyperv/vtl: Replace real_mode_header only under Hyper-V
On 08.09.23 17:02, Saurabh Singh Sengar wrote:
> On Fri, Sep 08, 2023 at 12:26:10PM +0200, Mathias Krause wrote:
>> Booting a CONFIG_HYPERV_VTL_MODE=y enabled kernel on bare metal or a
>> non-Hyper-V hypervisor leads to severe memory corruption as
>
> FWIW, CONFIG_HYPERV_VTL_MODE is not expected to be enabled for non VTL
> platforms.

Fair enough, but there's really no excuse for randomly crashing the
kernel if one forgot to RTFM, like I did. The code should (and easily
can) handle such situations, especially as it's just a matter of a two
line change.

> Referring to the Kconfig documentation:
> "A kernel built with this option must run at VTL2, and will not run as
> a normal guest."

So, maybe, the 'return 0' below should be a 'panic("Need to run on
Hyper-V!")' instead?

But then, looking at the code, most of the VTL specifics only run when
the Hyper-V hypervisor was actually detected during early boot. It's
just hv_vtl_early_init() that runs unconditionally and spoils the game.

Is there really a *hard* requirement / reason for having VTL support
disabled when not running under Hyper-V? At least I can't find any from
the code side. It'll all be fine with the below patch, also enabling
running the same kernel on multiple platforms -- bare metal, KVM,
Hyper-V, ...

Thanks,
Mathias

> - Saurabh
>
>> hv_vtl_early_init() will run even though hv_vtl_init_platform() did
>> not. This skips no-oping the 'realmode_reserve' and 'realmode_init'
>> platform hooks, making init_real_mode() -> setup_real_mode() try to
>> copy 'real_mode_blob' over 'real_mode_header' which we set to the
>> stub 'hv_vtl_real_mode_header'. However, as 'real_mode_blob' isn't
>> just a 'struct real_mode_header' -- it's the complete code! --
>> copying it over 'hv_vtl_real_mode_header' will corrupt quite some
>> memory following it.
>>
>> The real cause for this erroneous behaviour is that
>> hv_vtl_early_init() blindly assumes the kernel is running on
>> Hyper-V, which it may not.
>>
>> Fix this by making sure the code only replaces the real mode header
>> with the stub one iff the kernel is running under Hyper-V.
>>
>> Fixes: 3be1bc2fe9d2 ("x86/hyperv: VTL support for Hyper-V")
>> Cc: Saurabh Sengar
>> Cc: sta...@kernel.org
>> Signed-off-by: Mathias Krause
>> ---
>>  arch/x86/hyperv/hv_vtl.c | 4 ++++
>>  1 file changed, 4 insertions(+)
>>
>> diff --git a/arch/x86/hyperv/hv_vtl.c b/arch/x86/hyperv/hv_vtl.c
>> index 57df7821d66c..54c06f4b8b4c 100644
>> --- a/arch/x86/hyperv/hv_vtl.c
>> +++ b/arch/x86/hyperv/hv_vtl.c
>> @@ -12,6 +12,7 @@
>>  #include
>>  #include
>>  #include
>> +#include
>>  #include
>>
>>  extern struct boot_params boot_params;
>> @@ -214,6 +215,9 @@ static int hv_vtl_wakeup_secondary_cpu(int apicid,
>> unsigned long start_eip)
>>
>>  static int __init hv_vtl_early_init(void)
>>  {
>> +	if (!hypervisor_is_type(X86_HYPER_MS_HYPERV))
>> +		return 0;
>> +
>>  	/*
>>  	 * `boot_cpu_has` returns the runtime feature support,
>>  	 * and here is the earliest it can be used.
>> --
>> 2.30.2
>>
Re: [PATCH] vmlinux.lds.h: drop unused __vermagic
Hi Jessica,

On Mon, 21 Jan 2019 at 12:39, Mathias Krause wrote:
>
> On Wed, 2 Jan 2019 at 21:29, Jessica Yu wrote:
> >
> > +++ Mathias Krause [30/12/18 13:40 +0100]:
> > >The reference to '__vermagic' is a relic from v2.5 times. And even
> > >there it had a very short lifetime, from v2.5.59 (commit 1d411b80ee18
> > >("Module Sanity Check") in the historic tree) to v2.5.69 (commit
> > >67ac5b866bda ("[PATCH] complete modinfo section")).
> > >
> > >Neither current kernels nor modules contain a '__vermagic' section
> > >any more, so get rid of it.
> > >
> > >Signed-off-by: Mathias Krause
> > >Cc: Rusty Russell
> > >Cc: Jessica Yu
> >
> > Thanks for the cleanup.
> >
> > Reviewed-by: Jessica Yu
> >
>
> Arnd, will you carry this patch through your asm-generic tree or did
> the MAINTAINERS file mislead me here?
>

Looks like Arnd keeps ignoring my emails. :/

Can you carry this patch through your module tree? Or do you think I
should ask Andrew Morton instead?

Thanks,
Mathias

> > >---
> > > include/asm-generic/vmlinux.lds.h | 1 -
> > > 1 file changed, 1 deletion(-)
> > >
> > >diff --git a/include/asm-generic/vmlinux.lds.h
> > >b/include/asm-generic/vmlinux.lds.h
> > >index 3d7a6a9c2370..530ce1e7a1ec 100644
> > >--- a/include/asm-generic/vmlinux.lds.h
> > >+++ b/include/asm-generic/vmlinux.lds.h
> > >@@ -332,7 +332,6 @@
> > > 	__start_rodata = .;				\
> > > 	*(.rodata) *(.rodata.*)				\
> > > 	RO_AFTER_INIT_DATA	/* Read only after init */	\
> > >-	KEEP(*(__vermagic))	/* Kernel version magic */	\
> > > 	. = ALIGN(8);					\
> > > 	__start___tracepoints_ptrs = .;			\
> > > 	KEEP(*(__tracepoints_ptrs))	/* Tracepoints: pointer array */ \
> > >--
> > >2.19.2
> > >
Re: [PATCH] vmlinux.lds.h: drop unused __vermagic
On Wed, 2 Jan 2019 at 21:29, Jessica Yu wrote:
>
> +++ Mathias Krause [30/12/18 13:40 +0100]:
> >The reference to '__vermagic' is a relic from v2.5 times. And even
> >there it had a very short lifetime, from v2.5.59 (commit 1d411b80ee18
> >("Module Sanity Check") in the historic tree) to v2.5.69 (commit
> >67ac5b866bda ("[PATCH] complete modinfo section")).
> >
> >Neither current kernels nor modules contain a '__vermagic' section
> >any more, so get rid of it.
> >
> >Signed-off-by: Mathias Krause
> >Cc: Rusty Russell
> >Cc: Jessica Yu
>
> Thanks for the cleanup.
>
> Reviewed-by: Jessica Yu
>

Arnd, will you carry this patch through your asm-generic tree or did
the MAINTAINERS file mislead me here?

Thanks,
Mathias

> >---
> > include/asm-generic/vmlinux.lds.h | 1 -
> > 1 file changed, 1 deletion(-)
> >
> >diff --git a/include/asm-generic/vmlinux.lds.h
> >b/include/asm-generic/vmlinux.lds.h
> >index 3d7a6a9c2370..530ce1e7a1ec 100644
> >--- a/include/asm-generic/vmlinux.lds.h
> >+++ b/include/asm-generic/vmlinux.lds.h
> >@@ -332,7 +332,6 @@
> > 	__start_rodata = .;				\
> > 	*(.rodata) *(.rodata.*)				\
> > 	RO_AFTER_INIT_DATA	/* Read only after init */	\
> >-	KEEP(*(__vermagic))	/* Kernel version magic */	\
> > 	. = ALIGN(8);					\
> > 	__start___tracepoints_ptrs = .;			\
> > 	KEEP(*(__tracepoints_ptrs))	/* Tracepoints: pointer array */ \
> >--
> >2.19.2
> >
[PATCH] vmlinux.lds.h: drop unused __vermagic
The reference to '__vermagic' is a relic from v2.5 times. And even
there it had a very short lifetime, from v2.5.59 (commit 1d411b80ee18
("Module Sanity Check") in the historic tree) to v2.5.69 (commit
67ac5b866bda ("[PATCH] complete modinfo section")).

Neither current kernels nor modules contain a '__vermagic' section any
more, so get rid of it.

Signed-off-by: Mathias Krause
Cc: Rusty Russell
Cc: Jessica Yu
---
 include/asm-generic/vmlinux.lds.h | 1 -
 1 file changed, 1 deletion(-)

diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h
index 3d7a6a9c2370..530ce1e7a1ec 100644
--- a/include/asm-generic/vmlinux.lds.h
+++ b/include/asm-generic/vmlinux.lds.h
@@ -332,7 +332,6 @@
 	__start_rodata = .;				\
 	*(.rodata) *(.rodata.*)				\
 	RO_AFTER_INIT_DATA	/* Read only after init */	\
-	KEEP(*(__vermagic))	/* Kernel version magic */	\
 	. = ALIGN(8);					\
 	__start___tracepoints_ptrs = .;			\
 	KEEP(*(__tracepoints_ptrs))	/* Tracepoints: pointer array */ \
--
2.19.2
[PATCH] kallsyms: lower alignment on ARM
As mentioned in the info pages of gas, the '.align' pseudo-op's
interpretation of the alignment value is architecture specific. It may
either be a byte count or be taken as a power of two. On ARM it's
actually the latter, which leads to unnecessarily large alignments of
16 bytes for 32 bit builds or 256 bytes for 64 bit builds.

Fix this by switching to '.balign' instead, which is consistent across
all architectures.

Signed-off-by: Mathias Krause
Cc: Catalin Marinas
Cc: Will Deacon
---
 scripts/kallsyms.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/scripts/kallsyms.c b/scripts/kallsyms.c
index 109a1af7e444..77cebad0474e 100644
--- a/scripts/kallsyms.c
+++ b/scripts/kallsyms.c
@@ -334,10 +334,10 @@ static void write_src(void)
 	printf("#include \n");
 	printf("#if BITS_PER_LONG == 64\n");
 	printf("#define PTR .quad\n");
-	printf("#define ALGN .align 8\n");
+	printf("#define ALGN .balign 8\n");
 	printf("#else\n");
 	printf("#define PTR .long\n");
-	printf("#define ALGN .align 4\n");
+	printf("#define ALGN .balign 4\n");
 	printf("#endif\n");

 	printf("\t.section .rodata, \"a\"\n");
--
2.19.2
Re: [PATCH] net: ipv6: xfrm6_state: remove VLA usage
On 9 March 2018 at 13:21, Andreas Christoforou wrote:
> The kernel would like to have all stack VLA usage removed[1].
>
> Signed-off-by: Andreas Christoforou
> ---
>  net/ipv6/xfrm6_state.c | 8 +++-
>  1 file changed, 7 insertions(+), 1 deletion(-)
>
> diff --git a/net/ipv6/xfrm6_state.c b/net/ipv6/xfrm6_state.c
> index b15075a..45c0d98 100644
> --- a/net/ipv6/xfrm6_state.c
> +++ b/net/ipv6/xfrm6_state.c
> @@ -62,7 +62,12 @@ __xfrm6_sort(void **dst, void **src, int n, int
> (*cmp)(void *p), int maxclass)
>  {
>  	int i;
>  	int class[XFRM_MAX_DEPTH];
> -	int count[maxclass];
> +	int *count;
> +
> +	count = kcalloc(maxclass + 1, sizeof(*count), GFP_KERNEL);
> +
> +	if (!count)
> +		return -ENOMEM;
>
>  	memset(count, 0, sizeof(count));
>
> @@ -80,6 +85,7 @@ __xfrm6_sort(void **dst, void **src, int n, int (*cmp)(void
> *p), int maxclass)
>  		src[i] = NULL;
>  	}
>
> +	kfree(count);
>  	return 0;
>  }

Instead of dynamically allocating and freeing memory here, shouldn't we
just get rid of the maxclass parameter and use XFRM_MAX_DEPTH as size
for the count[] array, too?

Cheers,
Mathias
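[Not part of the original thread: a minimal user-space sketch of the alternative Mathias suggests -- a fixed-size count[] bounded by XFRM_MAX_DEPTH instead of a VLA or a heap allocation. The function name, the error convention, and the XFRM_MAX_DEPTH value of 6 are assumptions for illustration; the real value lives in include/net/xfrm.h. Note that with a real array, sizeof(count) covers the whole buffer again, which the kcalloc variant's memset would not.]

```c
#include <string.h>

#define XFRM_MAX_DEPTH 6	/* assumed; see include/net/xfrm.h */

/* Hypothetical sketch: count how often each class occurs, rejecting
 * out-of-range classes instead of sizing the array by a parameter. */
static int count_classes_sketch(const int *classes, int n, int *out_max)
{
	int count[XFRM_MAX_DEPTH];
	int i, max = 0;

	memset(count, 0, sizeof(count));	/* covers the full array */

	for (i = 0; i < n; i++) {
		if (classes[i] < 0 || classes[i] >= XFRM_MAX_DEPTH)
			return -1;	/* would be -EINVAL in kernel code */
		count[classes[i]]++;
		if (count[classes[i]] > max)
			max = count[classes[i]];
	}
	*out_max = max;
	return 0;
}
```

The bound check replaces the implicit trust in maxclass: any caller passing a class outside [0, XFRM_MAX_DEPTH) is a bug, not a reason to size the array dynamically.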
[tip:x86/urgent] x86/alternatives: Fix alt_max_short macro to really be a max()
Commit-ID:  6b32c126d33d5cb379bca280ab8acedc1ca978ff
Gitweb:     https://git.kernel.org/tip/6b32c126d33d5cb379bca280ab8acedc1ca978ff
Author:     Mathias Krause <mini...@googlemail.com>
AuthorDate: Thu, 5 Oct 2017 20:30:12 +0200
Committer:  Thomas Gleixner <t...@linutronix.de>
CommitDate: Mon, 9 Oct 2017 13:35:17 +0200

x86/alternatives: Fix alt_max_short macro to really be a max()

The alt_max_short() macro in asm/alternative.h does not work as
intended, leading to nasty bugs. E.g. alt_max_short("1", "3") evaluates
to 3, but alt_max_short("3", "1") evaluates to 1 -- not exactly the
maximum of 1 and 3.

In fact, I had to learn it the hard way by crashing my kernel in not so
funny ways by attempting to make use of the ALTERNATIVE_2 macro with
alternatives where the first one was larger than the second one.

According to [1] and commit dbe4058a6a44 ("x86/alternatives: Fix
ALTERNATIVE_2 padding generation properly") the right-hand side should
read "-(-(a < b))" not "-(-(a - b))". Fix that, to make the macro work
as intended.

While at it, fix up the comments regarding the additional "-", too.
It's not about gas' usage of s32 but the brain dead logic of having a
"true" value of -1 for the < operator ... *sigh*

Btw., the one in asm/alternative-asm.h is correct. And, apparently, all
current users of ALTERNATIVE_2() pass same sized alternatives, avoiding
to hit the bug.

[1] http://graphics.stanford.edu/~seander/bithacks.html#IntegerMinOrMax

Reviewed-and-tested-by: Borislav Petkov <b...@suse.de>
Fixes: dbe4058a6a44 ("x86/alternatives: Fix ALTERNATIVE_2 padding generation properly")
Signed-off-by: Mathias Krause <mini...@googlemail.com>
Signed-off-by: Thomas Gleixner <t...@linutronix.de>
Cc: Borislav Petkov <b...@suse.de>
Cc: sta...@vger.kernel.org
Link: https://lkml.kernel.org/r/1507228213-13095-1-git-send-email-mini...@googlemail.com
---
 arch/x86/include/asm/alternative-asm.h | 4 +++-
 arch/x86/include/asm/alternative.h     | 6 +++---
 2 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/arch/x86/include/asm/alternative-asm.h b/arch/x86/include/asm/alternative-asm.h
index e7636ba..6c98821 100644
--- a/arch/x86/include/asm/alternative-asm.h
+++ b/arch/x86/include/asm/alternative-asm.h
@@ -62,8 +62,10 @@
 #define new_len2		145f-144f

 /*
- * max without conditionals. Idea adapted from:
+ * gas compatible max based on the idea from:
  * http://graphics.stanford.edu/~seander/bithacks.html#IntegerMinOrMax
+ *
+ * The additional "-" is needed because gas uses a "true" value of -1.
  */
 #define alt_max_short(a, b)	((a) ^ (((a) ^ (b)) & -(-((a) < (b)))))

diff --git a/arch/x86/include/asm/alternative.h b/arch/x86/include/asm/alternative.h
index c096624..ccbe24e 100644
--- a/arch/x86/include/asm/alternative.h
+++ b/arch/x86/include/asm/alternative.h
@@ -103,12 +103,12 @@ static inline int alternatives_text_reserved(void *start, void *end)
 	alt_end_marker ":\n"

 /*
- * max without conditionals. Idea adapted from:
+ * gas compatible max based on the idea from:
  * http://graphics.stanford.edu/~seander/bithacks.html#IntegerMinOrMax
  *
- * The additional "-" is needed because gas works with s32s.
+ * The additional "-" is needed because gas uses a "true" value of -1.
 */
-#define alt_max_short(a, b)	"((" a ") ^ (((" a ") ^ (" b ")) & -(-((" a ") - (" b ")))))"
+#define alt_max_short(a, b)	"((" a ") ^ (((" a ") ^ (" b ")) & -(-((" a ") < (" b ")))))"

 /*
  * Pad the second replacement alternative with additional NOPs if it is
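[Not part of the original message: a user-space C sketch of the branchless max the patch fixes. The function names are hypothetical. In C, (a < b) is 0 or 1, so a single negation yields the all-ones/all-zeros mask; gas' "true" is -1, which is why the kernel macro needs the extra "-".]

```c
#include <assert.h>

/* Correct branchless max (bithacks page): when a < b, the mask is all
 * ones and the XOR chain yields b; otherwise the mask is 0 and we get a. */
static int alt_max(int a, int b)
{
	return a ^ ((a ^ b) & -(a < b));
}

/* The broken pre-patch variant: subtracting instead of comparing makes
 * the mask an arbitrary value, so the "max" is garbage whenever a > b
 * (and sometimes even when a < b, e.g. max(1, 6)). */
static int alt_max_broken(int a, int b)
{
	return a ^ ((a ^ b) & (a - b));	/* -(-(a - b)) == a - b */
}
```

The broken variant reproduces the nm output quoted later in the thread: max(3, 1) comes out as 1 and max(1, 6) as 2, while max(1, 3) happens to be right by accident.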
[PATCH v2] x86/alternatives: Fix alt_max_short macro to really be a max()
The alt_max_short() macro in asm/alternative.h does not work as
intended, leading to nasty bugs. E.g. alt_max_short("1", "3") evaluates
to 3, but alt_max_short("3", "1") evaluates to 1 -- not exactly the
maximum of 1 and 3.

In fact, I had to learn it the hard way by crashing my kernel in not so
funny ways by attempting to make use of the ALTERNATIVE_2 macro with
alternatives where the first one was larger than the second one.

According to [1] and commit dbe4058a6a44 ("x86/alternatives: Fix
ALTERNATIVE_2 padding generation properly") the right-hand side should
read "-(-(a < b))" not "-(-(a - b))". Fix that, to make the macro work
as intended.

While at it, fix up the comments regarding the additional "-", too.
It's not about gas' usage of s32 but the brain dead logic of having a
"true" value of -1 for the < operator ... *sigh*

Btw., the one in asm/alternative-asm.h is correct. And, apparently, all
current users of ALTERNATIVE_2() pass same sized alternatives, avoiding
to hit the bug.

[1] http://graphics.stanford.edu/~seander/bithacks.html#IntegerMinOrMax

Fixes: dbe4058a6a44 ("x86/alternatives: Fix ALTERNATIVE_2 padding generation properly")
Signed-off-by: Mathias Krause <mini...@googlemail.com>
Reviewed-and-tested-by: Borislav Petkov <b...@suse.de>
---
v2:
- fix the comments even further, as requested by Boris
- add Reviewed-and-tested-by for Boris

 arch/x86/include/asm/alternative-asm.h | 4 +++-
 arch/x86/include/asm/alternative.h     | 6 +++---
 2 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/arch/x86/include/asm/alternative-asm.h b/arch/x86/include/asm/alternative-asm.h
index e7636bac7372..6c98821fef5e 100644
--- a/arch/x86/include/asm/alternative-asm.h
+++ b/arch/x86/include/asm/alternative-asm.h
@@ -62,8 +62,10 @@
 #define new_len2		145f-144f

 /*
- * max without conditionals. Idea adapted from:
+ * gas compatible max based on the idea from:
  * http://graphics.stanford.edu/~seander/bithacks.html#IntegerMinOrMax
+ *
+ * The additional "-" is needed because gas uses a "true" value of -1.
 */
 #define alt_max_short(a, b)	((a) ^ (((a) ^ (b)) & -(-((a) < (b)))))

diff --git a/arch/x86/include/asm/alternative.h b/arch/x86/include/asm/alternative.h
index c096624137ae..ccbe24e697c4 100644
--- a/arch/x86/include/asm/alternative.h
+++ b/arch/x86/include/asm/alternative.h
@@ -103,12 +103,12 @@ static inline int alternatives_text_reserved(void *start, void *end)
 	alt_end_marker ":\n"

 /*
- * max without conditionals. Idea adapted from:
+ * gas compatible max based on the idea from:
  * http://graphics.stanford.edu/~seander/bithacks.html#IntegerMinOrMax
  *
- * The additional "-" is needed because gas works with s32s.
+ * The additional "-" is needed because gas uses a "true" value of -1.
 */
-#define alt_max_short(a, b)	"((" a ") ^ (((" a ") ^ (" b ")) & -(-((" a ") - (" b ")))))"
+#define alt_max_short(a, b)	"((" a ") ^ (((" a ") ^ (" b ")) & -(-((" a ") < (" b ")))))"

 /*
  * Pad the second replacement alternative with additional NOPs if it is
--
1.7.10.4
Re: [PATCH] x86/alternatives: Fix alt_max_short macro to really be a max()
On 5 October 2017 at 14:52, Michael Matz <m...@suse.de> wrote:
> On Thu, 5 Oct 2017, Borislav Petkov wrote:
>> On Thu, Oct 05, 2017 at 02:35:33PM +0200, Mathias Krause wrote:
>> > Note the "<"! ...comment is wrong, though the implementation works!
>>
>> I know, I realized that when I looked at alternative-asm.h. Wanted to
>> double-check it with Micha first.
>
> Yeah, for bit-twiddling the result of arithmetic would need to be
> booleanized first, or alternatively the boolean operators be used in
> the first place. So if '<' works then that's better in this context.

Ack!

> (In a different context, or in the same one there definitely was a
> problem with using '<', but I can't remember the details, it's too
> long ago we discussed about this; maybe it even was a problem only
> with some binutils versions. So I'd suggest using the more obvious
> way until problems reoccur, and then document why exactly using
> relational ops was a problem ;-) )

Might be because the "true" value of gas' < operator is non-obvious
with being -1. But, well, if you don't know, I don't know either ;)

Anyways, the alt_max_short() macro in alternative.h is plain wrong
while the one in alternative-asm.h works and is used in a few places,
even with varying lengths of the alternatives (e.g.
arch/x86/entry/vdso/vdso32/system_call.S,
arch/x86/lib/{clear_page_64.S,memcpy_64.S,memset_64.S}), therefore
proves to be functional.

Thanks,
Mathias
Re: [PATCH] x86/alternatives: Fix alt_max_short macro to really be a max()
On 5 October 2017 at 09:58, Mathias Krause <mini...@googlemail.com> wrote:
> On 5 October 2017 at 09:38, Borislav Petkov <b...@alien8.de> wrote:
>> On Wed, Oct 04, 2017 at 08:08:12PM +0200, Mathias Krause wrote:
>>> diff --git a/arch/x86/include/asm/alternative.h
>>> b/arch/x86/include/asm/alternative.h
>>> index c096624137ae..7c553f48f163 100644
>>> --- a/arch/x86/include/asm/alternative.h
>>> +++ b/arch/x86/include/asm/alternative.h
>>> @@ -106,9 +106,9 @@ static inline int alternatives_text_reserved(void
>>> *start, void *end)
>>>  * max without conditionals. Idea adapted from:
>>                              ^
>>
>> You did read this part, right?

Oh, btw., quoting its counterpart from
arch/x86/include/asm/alternative-asm.h:

/*
 * max without conditionals. Idea adapted from:
 * http://graphics.stanford.edu/~seander/bithacks.html#IntegerMinOrMax
 */
#define alt_max_short(a, b)	((a) ^ (((a) ^ (b)) & -(-((a) < (b)))))

Note the "<"! ...comment is wrong, though the implementation works!

Cheers,
Mathias
Re: [PATCH] x86/alternatives: Fix alt_max_short macro to really be a max()
On 5 October 2017 at 09:38, Borislav Petkov <b...@alien8.de> wrote:
> On Wed, Oct 04, 2017 at 08:08:12PM +0200, Mathias Krause wrote:
>> The alt_max_short() macro in asm/alternative.h does not work as
>> intended, leading to nasty bugs. E.g. alt_max_short("1", "3")
>> evaluates to 3, but alt_max_short("3", "1") evaluates to 1 -- not
>> exactly the maximum of 1 and 3.
>>
>> In fact, I had to learn it the hard way by crashing my kernel in not
>> so funny ways by attempting to make use of the ALTERNATIVE_2 macro
>> with alternatives where the first one was larger than the second
>> one.
>>
>> According to [1] and commit dbe4058a6a44 ("x86/alternatives: Fix
>> ALTERNATIVE_2 padding generation properly") the right-hand side
>> should read "-(-(a < b))" not "-(-(a - b))". Fix that, to make the
>> macro work as intended.
>>
>> While at it, fix up the comment regarding the additional "-", too.
>> It's not about gas' usage of s32 but brain dead logic of having a
>> "true" value of -1 for the < operator ... *sigh*
>>
>> Btw., the one in asm/alternative-asm.h is correct. And, apparently,
>> all current users of ALTERNATIVE_2() pass same sized alternatives,
>> avoiding to hit the bug.
>>
>> [1] http://graphics.stanford.edu/~seander/bithacks.html#IntegerMinOrMax
>>
>> Fixes: dbe4058a6a44 ("x86/alternatives: Fix ALTERNATIVE_2 padding
>> generation properly")
>> Signed-off-by: Mathias Krause <mini...@googlemail.com>
>> Cc: Borislav Petkov <b...@suse.de>
>> ---
>>  arch/x86/include/asm/alternative.h | 4 ++--
>>  1 file changed, 2 insertions(+), 2 deletions(-)
>>
>> diff --git a/arch/x86/include/asm/alternative.h
>> b/arch/x86/include/asm/alternative.h
>> index c096624137ae..7c553f48f163 100644
>> --- a/arch/x86/include/asm/alternative.h
>> +++ b/arch/x86/include/asm/alternative.h
>> @@ -106,9 +106,9 @@ static inline int alternatives_text_reserved(void
>> *start, void *end)
>>  * max without conditionals. Idea adapted from:
>                               ^
>
> You did read this part, right?

Yes. But it's just wrong the way it is right now. It's a crap number
generator instead of being a "max without conditionals".

> AFAIR, gas can't stomach conditionals but I don't remember the details
> anymore.

Can't be, as arch/x86/include/asm/alternative.h itself makes use of
them for the implementation of ALTERNATIVE[_2], see, e.g. __OLDINSTR()
and OLDINSTR_2().

> Could be that -1 representation of "true". Let me add Micha to CC.
>
> Anyway, how can I reproduce your observation? Code snippet and
> compiler pls.

Here you go:

$ cat alt_max.c
#ifdef VANILLA
# define alt_max_short(a, b) \
	"((" a ") ^ (((" a ") ^ (" b ")) & -(-((" a ") - (" b ")))))"
#else
# define alt_max_short(a, b) \
	"((" a ") ^ (((" a ") ^ (" b ")) & -(-((" a ") < (" b ")))))"
#endif

#define ams(a, b) \
	"ams_" #a "_" #b " = " alt_max_short(#a, #b) "\n\t"

asm(ams(1, 3) ams(3, 1) ams(1, 6));

$ gcc -DVANILLA -c alt_max.c && nm alt_max.o
0003 a ams_1_3
0002 a ams_1_6
0001 a ams_3_1

$ gcc -c alt_max.c && nm alt_max.o
0003 a ams_1_3
0006 a ams_1_6
0003 a ams_3_1

Note not only how it gets the max(3,1) case wrong but generates even
more insane crap for max(1,6).

$ as --version | head -1
GNU assembler (GNU Binutils for Debian) 2.25

Cheers,
Mathias

> Thx.
>
> --
> Regards/Gruss,
>     Boris.
>
> Good mailing practices for 400: avoid top-posting and trim the reply.
[PATCH] x86/alternatives: Fix alt_max_short macro to really be a max()
The alt_max_short() macro in asm/alternative.h does not work as intended, leading to nasty bugs. E.g. alt_max_short("1", "3") evaluates to 3, but alt_max_short("3", "1") evaluates to 1 -- not exactly the maximum of 1 and 3.

In fact, I had to learn it the hard way by crashing my kernel in not so funny ways by attempting to make use of the ALTERNATIVE_2 macro with alternatives where the first one was larger than the second one.

According to [1] and commit dbe4058a6a44 ("x86/alternatives: Fix ALTERNATIVE_2 padding generation properly") the right-hand side should read "-(-(a < b))" not "-(-(a - b))". Fix that, to make the macro work as intended.

While at it, fix up the comment regarding the additional "-", too. It's not about gas' usage of s32 but brain dead logic of having a "true" value of -1 for the < operator ... *sigh*

Btw., the one in asm/alternative-asm.h is correct. And, apparently, all current users of ALTERNATIVE_2() pass same sized alternatives, avoiding to hit the bug.

[1] http://graphics.stanford.edu/~seander/bithacks.html#IntegerMinOrMax

Fixes: dbe4058a6a44 ("x86/alternatives: Fix ALTERNATIVE_2 padding generation properly")
Signed-off-by: Mathias Krause <mini...@googlemail.com>
Cc: Borislav Petkov <b...@suse.de>
---
 arch/x86/include/asm/alternative.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/alternative.h b/arch/x86/include/asm/alternative.h
index c096624137ae..7c553f48f163 100644
--- a/arch/x86/include/asm/alternative.h
+++ b/arch/x86/include/asm/alternative.h
@@ -106,9 +106,9 @@ static inline int alternatives_text_reserved(void *start, void *end)
  * max without conditionals. Idea adapted from:
  * http://graphics.stanford.edu/~seander/bithacks.html#IntegerMinOrMax
  *
- * The additional "-" is needed because gas works with s32s.
+ * The additional "-" is needed because gas uses a "true" value of -1.
  */
-#define alt_max_short(a, b)	"((" a ") ^ (((" a ") ^ (" b ")) & -(-((" a ") - (" b ")))))"
+#define alt_max_short(a, b)	"((" a ") ^ (((" a ") ^ (" b ")) & -(-((" a ") < (" b ")))))"
 
 /*
  * Pad the second replacement alternative with additional NOPs if it is
-- 
1.7.10.4
[tip:x86/urgent] x86/vdso: Ensure vdso32_enabled gets set to valid values only
Commit-ID:  c06989da39cdb10604d572c8c7ea8c8c97f3c483
Gitweb:     http://git.kernel.org/tip/c06989da39cdb10604d572c8c7ea8c8c97f3c483
Author:     Mathias Krause <mini...@googlemail.com>
AuthorDate: Mon, 10 Apr 2017 17:14:27 +0200
Committer:  Thomas Gleixner <t...@linutronix.de>
CommitDate: Mon, 10 Apr 2017 18:31:41 +0200

x86/vdso: Ensure vdso32_enabled gets set to valid values only

vdso_enabled can be set to arbitrary integer values via the kernel command line 'vdso32=' parameter or via 'sysctl abi.vsyscall32'. load_vdso32() only maps VDSO if vdso_enabled == 1, but ARCH_DLINFO_IA32 merely checks for vdso_enabled != 0. As a consequence the AT_SYSINFO_EHDR auxiliary vector for the VDSO_ENTRY is emitted with a NULL pointer which causes a segfault when the application tries to use the VDSO.

Restrict the valid arguments on the command line and the sysctl to 0 and 1.

Fixes: b0b49f2673f0 ("x86, vdso: Remove compat vdso support")
Signed-off-by: Mathias Krause <mini...@googlemail.com>
Acked-by: Andy Lutomirski <l...@amacapital.net>
Cc: Peter Zijlstra <pet...@infradead.org>
Cc: sta...@vger.kernel.org
Cc: Roland McGrath <rol...@redhat.com>
Link: http://lkml.kernel.org/r/1491424561-7187-1-git-send-email-mini...@googlemail.com
Link: http://lkml.kernel.org/r/20170410151723.518412...@linutronix.de
Signed-off-by: Thomas Gleixner <t...@linutronix.de>
---
 arch/x86/entry/vdso/vdso32-setup.c | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/arch/x86/entry/vdso/vdso32-setup.c b/arch/x86/entry/vdso/vdso32-setup.c
index 7853b53..3f9d1a8 100644
--- a/arch/x86/entry/vdso/vdso32-setup.c
+++ b/arch/x86/entry/vdso/vdso32-setup.c
@@ -30,8 +30,10 @@ static int __init vdso32_setup(char *s)
 {
 	vdso32_enabled = simple_strtoul(s, NULL, 0);
 
-	if (vdso32_enabled > 1)
+	if (vdso32_enabled > 1) {
 		pr_warn("vdso32 values other than 0 and 1 are no longer allowed; vdso disabled\n");
+		vdso32_enabled = 0;
+	}
 
 	return 1;
 }
@@ -62,13 +64,18 @@ subsys_initcall(sysenter_setup);
 
 /* Register vsyscall32 into the ABI table */
 #include <linux/sysctl.h>
 
+static const int zero;
+static const int one = 1;
+
 static struct ctl_table abi_table2[] = {
 	{
 		.procname	= "vsyscall32",
 		.data		= &vdso32_enabled,
 		.maxlen		= sizeof(int),
 		.mode		= 0644,
-		.proc_handler	= proc_dointvec
+		.proc_handler	= proc_dointvec_minmax,
+		.extra1		= (int *)&zero,
+		.extra2		= (int *)&one,
 	},
 	{}
 };
Re: [kernel-hardening] Re: [RFC v2][PATCH 04/11] x86: Implement __arch_rare_write_begin/unmap()
On 7 April 2017 at 15:14, Thomas Gleixner <t...@linutronix.de> wrote:
> On Fri, 7 Apr 2017, Mathias Krause wrote:
>> On 7 April 2017 at 11:46, Thomas Gleixner <t...@linutronix.de> wrote:
>> > Whether protected by preempt_disable or local_irq_disable, to make that
>> > work it needs CR0 handling in the exception entry/exit at the lowest
>> > level. And that's just a nightmare maintenance wise as it's prone to be
>> > broken over time.
>>
>> It seems to be working fine for more than a decade now in PaX. So it
>> can't be such a big maintenance nightmare ;)
>
> I really do not care whether PaX wants to chase and verify that over and
> over. I certainly don't want to take the chance to leak CR0.WP ever and I
> very much care about extra stuff to check in the entry/exit path.

Fair enough. However, placing a BUG_ON(!(read_cr0() & X86_CR0_WP)) somewhere sensible should make those "leaks" visible fast -- and their exploitation impossible, i.e. fail hard.

>> The "proper solution" seems to be much slower compared to just
>> toggling CR0.WP (which is costly in itself, already) because of the
>> TLB invalidation / synchronisation involved.
>
> Why the heck should we care about rare writes being performant?

As soon as they stop being rare and people start extending the r/o protection to critical data structures accessed often. Then performance matters.

>> > It's valid (at least on x86) to have a shadow map with the same page
>> > attributes but write enabled. That does not require any fixups of CR0 and
>> > just works.
>>
>> "Just works", sure -- but it's not as tightly focused as the PaX
>> solution which is CPU local, while your proposed solution is globally
>> visible.
>
> Making the world and some more writeable hardly qualifies as tightly
> focussed. Making the mapping concept CPU local is not rocket science
> either. The question is whether it's worth the trouble.
No, the question is if the value of the concept is well understood and if people can see what could be done with such a strong primitive. Apparently not... Cheers, Mathias
Re: [kernel-hardening] Re: [RFC v2][PATCH 04/11] x86: Implement __arch_rare_write_begin/unmap()
On 7 April 2017 at 11:46, Thomas Gleixner <t...@linutronix.de> wrote:
> On Fri, 7 Apr 2017, Mathias Krause wrote:
>> On 6 April 2017 at 17:59, Andy Lutomirski <l...@amacapital.net> wrote:
>> > On Wed, Apr 5, 2017 at 5:14 PM, Kees Cook <keesc...@chromium.org> wrote:
>> >> static __always_inline rare_write_begin(void)
>> >> {
>> >>     preempt_disable();
>> >>     local_irq_disable();
>> >>     barrier();
>> >>     __arch_rare_write_begin();
>> >>     barrier();
>> >> }
>> >
>> > Looks good, except you don't need preempt_disable().
>> > local_irq_disable() also disables preemption. You might need to use
>> > local_irq_save(), though, depending on whether any callers already
>> > have IRQs off.
>>
>> Well, doesn't look good to me. NMIs will still be able to interrupt
>> this code and will run with CR0.WP = 0.
>>
>> Shouldn't you instead question yourself why PaX can do it "just" with
>> preempt_disable() instead?!
>
> That's silly. Just because PaX does it, doesn't mean it's correct. To be
> honest, playing games with the CR0.WP bit is outright stupid to begin with.

Why that? It allows fast and CPU local modifications of r/o memory. OTOH, an approach that needs to fiddle with page table entries requires global synchronization to keep the individual TLB states in sync. Hmm.. Not that fast, I'd say.

> Whether protected by preempt_disable or local_irq_disable, to make that
> work it needs CR0 handling in the exception entry/exit at the lowest
> level. And that's just a nightmare maintenance wise as it's prone to be
> broken over time.

It seems to be working fine for more than a decade now in PaX. So it can't be such a big maintenance nightmare ;)

> Aside of that it's pointless overhead for the normal case.
>
> The proper solution is:
>
> write_rare(ptr, val)
> {
>         mp = map_shadow_rw(ptr);
>         *mp = val;
>         unmap_shadow_rw(mp);
> }
>
> map_shadow_rw() is essentially the same thing as we do in the highmem case
> where the kernel creates a shadow mapping of the user space pages via
> kmap_atomic().

The "proper solution" seems to be much slower compared to just toggling CR0.WP (which is costly in itself, already) because of the TLB invalidation / synchronisation involved.

> It's valid (at least on x86) to have a shadow map with the same page
> attributes but write enabled. That does not require any fixups of CR0 and
> just works.

"Just works", sure -- but it's not as tightly focused as the PaX solution which is CPU local, while your proposed solution is globally visible.

Cheers,
Mathias
Re: [kernel-hardening] Re: [RFC v2][PATCH 04/11] x86: Implement __arch_rare_write_begin/unmap()
On 6 April 2017 at 17:59, Andy Lutomirski <l...@amacapital.net> wrote:
> On Wed, Apr 5, 2017 at 5:14 PM, Kees Cook <keesc...@chromium.org> wrote:
>> On Wed, Apr 5, 2017 at 4:57 PM, Andy Lutomirski wrote:
>>> On Wed, Mar 29, 2017 at 6:41 PM, Kees Cook wrote:
>>>> On Wed, Mar 29, 2017 at 3:38 PM, Andy Lutomirski wrote:
>>>>> On Wed, Mar 29, 2017 at 11:15 AM, Kees Cook wrote:
>>>>>> Based on PaX's x86 pax_{open,close}_kernel() implementation, this
>>>>>> allows HAVE_ARCH_RARE_WRITE to work on x86.
>>>>>>
>>>>>> +
>>>>>> +static __always_inline unsigned long __arch_rare_write_begin(void)
>>>>>> +{
>>>>>> +       unsigned long cr0;
>>>>>> +
>>>>>> +       preempt_disable();
>>>>>
>>>>> This looks wrong. DEBUG_LOCKS_WARN_ON(!irqs_disabled()) would work,
>>>>> as would local_irq_disable(). There's no way that just disabling
>>>>> preemption is enough.
>>>>>
>>>>> (Also, how does this interact with perf nmis?)
>>>>
>>>> Do you mean preempt_disable() isn't strong enough here? I'm open to
>>>> suggestions. The goal would be to make sure nothing between _begin
>>>> and _end would get executed without interruption...
>>>
>>> Sorry for the very slow response.
>>>
>>> preempt_disable() isn't strong enough to prevent interrupts, and an
>>> interrupt here would run with WP off, causing unknown havoc. I tend
>>> to think that the caller should be responsible for turning off
>>> interrupts.
>>
>> So, something like:
>>
>> Top-level functions:
>>
>> static __always_inline rare_write_begin(void)
>> {
>>     preempt_disable();
>>     local_irq_disable();
>>     barrier();
>>     __arch_rare_write_begin();
>>     barrier();
>> }
>
> Looks good, except you don't need preempt_disable().
> local_irq_disable() also disables preemption. You might need to use
> local_irq_save(), though, depending on whether any callers already
> have IRQs off.

Well, doesn't look good to me. NMIs will still be able to interrupt this code and will run with CR0.WP = 0.

Shouldn't you instead question yourself why PaX can do it "just" with preempt_disable() instead?!

Cheers,
Mathias
[PATCH] x86/vdso: ensure vdso32_enabled gets set to valid values only
If either via kernel command line 'vdso32=' or via 'sysctl abi.vsyscall32' vdso32_enabled gets set to a value below 0 or above 1, load_vdso32() won't map the vDSO but ARCH_DLINFO_IA32 would still pass an AT_SYSINFO_EHDR auxiliary vector, however with a NULL pointer. That'll make any program trying to make use of it fail with a segmentation fault. At least musl makes use of it if the kernel provides it.

Ensure vdso32_enabled gets set to valid values only to fix this corner case.

Fixes: b0b49f2673f0 ("x86, vdso: Remove compat vdso support")
Cc: Andy Lutomirski <l...@amacapital.net>
Cc: Thomas Gleixner <t...@linutronix.de>
Cc: Ingo Molnar <mi...@redhat.com>
Cc: "H. Peter Anvin" <h...@zytor.com>
Cc: Roland McGrath <rol...@redhat.com>
Signed-off-by: Mathias Krause <mini...@googlemail.com>
---
 arch/x86/entry/vdso/vdso32-setup.c | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/arch/x86/entry/vdso/vdso32-setup.c b/arch/x86/entry/vdso/vdso32-setup.c
index 7853b53959cd..ca312c174d6f 100644
--- a/arch/x86/entry/vdso/vdso32-setup.c
+++ b/arch/x86/entry/vdso/vdso32-setup.c
@@ -30,8 +30,10 @@ static int __init vdso32_setup(char *s)
 {
 	vdso32_enabled = simple_strtoul(s, NULL, 0);
 
-	if (vdso32_enabled > 1)
+	if (vdso32_enabled > 1) {
 		pr_warn("vdso32 values other than 0 and 1 are no longer allowed; vdso disabled\n");
+		vdso32_enabled = 0;
+	}
 
 	return 1;
 }
@@ -62,13 +64,18 @@ int __init sysenter_setup(void)
 
 /* Register vsyscall32 into the ABI table */
 #include <linux/sysctl.h>
 
+static const int zero;
+static const int one = 1;
+
 static struct ctl_table abi_table2[] = {
 	{
 		.procname	= "vsyscall32",
 		.data		= &vdso32_enabled,
 		.maxlen		= sizeof(int),
 		.mode		= 0644,
-		.proc_handler	= proc_dointvec
+		.proc_handler	= proc_dointvec_minmax,
+		.extra1		= (int *)&zero,
+		.extra2		= (int *)&one,
 	},
 	{}
 };
-- 
1.7.10.4
[PATCH v2] cris: remove unused wp_works_ok macro
It had no use since its introduction in v2.4.1.2. Get rid of it.

Cc: Mikael Starvik <star...@axis.com>
Signed-off-by: Mathias Krause <mini...@googlemail.com>
Acked-by: Jesper Nilsson <jesper.nils...@axis.com>
---
Same patch as v1 but as the tip folks took only the x86 parts, I think, this one should go through the CRIS tree.

Cheers,
Mathias

 arch/cris/include/arch-v10/arch/processor.h | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/arch/cris/include/arch-v10/arch/processor.h b/arch/cris/include/arch-v10/arch/processor.h
index 93feb2a487d8..58f75bee1d6c 100644
--- a/arch/cris/include/arch-v10/arch/processor.h
+++ b/arch/cris/include/arch-v10/arch/processor.h
@@ -7,9 +7,6 @@
  */
 #define current_text_addr() ({void *pc; __asm__ ("move.d $pc,%0" : "=rm" (pc)); pc; })
 
-/* CRIS has no problems with write protection */
-#define wp_works_ok 1
-
 /* CRIS thread_struct. this really has nothing to do with the processor itself, since
  * CRIS does not do any hardware task-switching, but it's here for legacy reasons.
  * The thread_struct here is used when task-switching using _resume defined in entry.S.
-- 
1.7.10.4
[PATCH v2] sparc: remove unused wp_works_ok macro
It's been unused for ages; it used to be required for ksyms.c back in the v1.1 times.

Signed-off-by: Mathias Krause <mini...@googlemail.com>
Acked-by: David S. Miller <da...@davemloft.net>
---
Hi Dave,

same patch as v1 but as the tip folks took only the x86 parts, I think, this one should go through the SPARC tree.

Cheers,
Mathias

 arch/sparc/include/asm/processor_32.h | 6 ------
 arch/sparc/include/asm/processor_64.h | 4 ----
 2 files changed, 10 deletions(-)

diff --git a/arch/sparc/include/asm/processor_32.h b/arch/sparc/include/asm/processor_32.h
index 365d4cb267b4..dd27159819eb 100644
--- a/arch/sparc/include/asm/processor_32.h
+++ b/arch/sparc/include/asm/processor_32.h
@@ -18,12 +18,6 @@
 #include
 #include
 
-/*
- * The sparc has no problems with write protection
- */
-#define wp_works_ok 1
-#define wp_works_ok__is_a_macro /* for versions in ksyms.c */
-
 /* Whee, this is STACK_TOP + PAGE_SIZE and the lowest kernel address too...
  * That one page is used to protect kernel from intruders, so that
  * we can make our access_ok test faster
diff --git a/arch/sparc/include/asm/processor_64.h b/arch/sparc/include/asm/processor_64.h
index 6448cfc8292f..b58ee9018433 100644
--- a/arch/sparc/include/asm/processor_64.h
+++ b/arch/sparc/include/asm/processor_64.h
@@ -18,10 +18,6 @@
 #include
 #include
 
-/* The sparc has no problems with write protection */
-#define wp_works_ok 1
-#define wp_works_ok__is_a_macro /* for versions in ksyms.c */
-
 /*
  * User lives in his very own context, and cannot reference us. Note
  * that TASK_SIZE is a misnomer, it really gives maximum user virtual
-- 
1.7.10.4
[tip:x86/cpu] x86/cpu: Drop wp_works_ok member of struct cpuinfo_x86
Commit-ID: 6415813bae75feba10b8ca3ed6634a72c2a4d313
Gitweb: http://git.kernel.org/tip/6415813bae75feba10b8ca3ed6634a72c2a4d313
Author: Mathias Krause
AuthorDate: Sun, 12 Feb 2017 22:12:08 +0100
Committer: Thomas Gleixner
CommitDate: Sat, 11 Mar 2017 14:30:24 +0100

x86/cpu: Drop wp_works_ok member of struct cpuinfo_x86

Remove the wp_works_ok member of struct cpuinfo_x86. It's an optimization back from Linux v0.99 times where we had no fixup support yet and did the CR0.WP test via special code in the page fault handler. The < 0 test was an optimization to not do the special casing for each NULL ptr access violation but just for the first one doing the WP test.

Today it serves no real purpose as the test no longer needs special code in the page fault handler and the only call site -- mem_init() -- calls it just once, anyway.

However, Xen pre-initializes it to 1, to skip the test. Doing the test again for Xen should be no issue at all, as even the commit introducing skipping the test (commit d560bc61575e ("x86, xen: Suppress WP test on Xen")) mentioned it being a band aid only. And, in fact, testing the patch on Xen showed nothing breaks.

The pre-fixup times are long gone and with the removal of the fallback handling code in commit a5c2a893dbd4 ("x86, 386 removal: Remove CONFIG_X86_WP_WORKS_OK") the kernel requires a working CR0.WP anyway. So just get rid of the "optimization" and do the test unconditionally.

Signed-off-by: Mathias Krause
Acked-by: Borislav Petkov
Cc: Jesper Nilsson
Cc: Jeremy Fitzhardinge
Cc: Arnd Hannemann
Cc: Mikael Starvik
Cc: Geert Uytterhoeven
Cc: Andrew Morton
Cc: "David S. Miller"
Link: http://lkml.kernel.org/r/1486933932-585-3-git-send-email-mini...@googlemail.com
Signed-off-by: Thomas Gleixner
---
 arch/x86/include/asm/processor.h | 4 +---
 arch/x86/kernel/cpu/proc.c       | 5 ++---
 arch/x86/kernel/setup.c          | 11 ++++-------
 arch/x86/mm/init_32.c            | 9 +++++----
 arch/x86/xen/enlighten.c         | 1 -
 5 files changed, 12 insertions(+), 18 deletions(-)

diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index 893f80e..4aa93b5 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -89,9 +89,7 @@ struct cpuinfo_x86 {
 	__u8	x86_vendor;	/* CPU vendor */
 	__u8	x86_model;
 	__u8	x86_mask;
-#ifdef CONFIG_X86_32
-	char	wp_works_ok;	/* It doesn't on 386's */
-#else
+#ifdef CONFIG_X86_64
 	/* Number of 4K pages in DTLB/ITLB combined(in pages): */
 	int	x86_tlbsize;
 #endif
diff --git a/arch/x86/kernel/cpu/proc.c b/arch/x86/kernel/cpu/proc.c
index 18ca99f..6df621a 100644
--- a/arch/x86/kernel/cpu/proc.c
+++ b/arch/x86/kernel/cpu/proc.c
@@ -31,14 +31,13 @@ static void show_cpuinfo_misc(struct seq_file *m, struct cpuinfo_x86 *c)
 		"fpu\t\t: %s\n"
 		"fpu_exception\t: %s\n"
 		"cpuid level\t: %d\n"
-		"wp\t\t: %s\n",
+		"wp\t\t: yes\n",
 		static_cpu_has_bug(X86_BUG_FDIV) ? "yes" : "no",
 		static_cpu_has_bug(X86_BUG_F00F) ? "yes" : "no",
 		static_cpu_has_bug(X86_BUG_COMA) ? "yes" : "no",
 		static_cpu_has(X86_FEATURE_FPU) ? "yes" : "no",
 		static_cpu_has(X86_FEATURE_FPU) ? "yes" : "no",
-		c->cpuid_level,
-		c->wp_works_ok ? "yes" : "no");
+		c->cpuid_level);
 }
 #else
 static void show_cpuinfo_misc(struct seq_file *m, struct cpuinfo_x86 *c)
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index 4bf0c89..7cd7bbe 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -173,14 +173,11 @@ static struct resource bss_resource = {
 #ifdef CONFIG_X86_32
-/* cpu data as detected by the assembly code in head.S */
-struct cpuinfo_x86 new_cpu_data = {
-	.wp_works_ok = -1,
-};
+/* cpu data as detected by the assembly code in head_32.S */
+struct cpuinfo_x86 new_cpu_data;
+
 /* common cpu data for all cpus */
-struct cpuinfo_x86 boot_cpu_data __read_mostly = {
-	.wp_works_ok = -1,
-};
+struct cpuinfo_x86 boot_cpu_data __read_mostly;
 EXPORT_SYMBOL(boot_cpu_data);

 unsigned int def_to_bigsmp;
diff --git a/arch/x86/mm/init_32.c b/arch/x86/mm/init_32.c
index 2b4b53e..4dddfaf 100644
--- a/arch/x86/mm/init_32.c
+++ b/arch/x86/mm/init_32.c
@@ -716,15 +716,17 @@ void __init paging_init(void)
  */
 static void __init test_wp_bit(void)
 {
+	int wp_works_ok;
+
 	printk(KERN_INFO "Checking if this processor honours the WP bit even in supervisor mode...");
[tip:x86/cpu] x86/cpu: Drop unneded members of struct cpuinfo_x86
Commit-ID: 04402116846f36adea9503d7cd5104a7ed27a1a6
Gitweb: http://git.kernel.org/tip/04402116846f36adea9503d7cd5104a7ed27a1a6
Author: Mathias Krause <mini...@googlemail.com>
AuthorDate: Sun, 12 Feb 2017 22:12:07 +0100
Committer: Thomas Gleixner <t...@linutronix.de>
CommitDate: Sat, 11 Mar 2017 14:30:23 +0100

x86/cpu: Drop unneded members of struct cpuinfo_x86

Those members serve no purpose -- not even fill padding for alignment or such. So just get rid of them.

Signed-off-by: Mathias Krause <mini...@googlemail.com>
Acked-by: Borislav Petkov <b...@alien8.de>
Cc: Jesper Nilsson <jesper.nils...@axis.com>
Cc: Mikael Starvik <star...@axis.com>
Cc: Geert Uytterhoeven <ge...@linux-m68k.org>
Cc: Andrew Morton <a...@linux-foundation.org>
Cc: "David S. Miller" <da...@davemloft.net>
Link: http://lkml.kernel.org/r/1486933932-585-2-git-send-email-mini...@googlemail.com
Signed-off-by: Thomas Gleixner <t...@linutronix.de>
---
 arch/x86/include/asm/processor.h | 7 +------
 1 file changed, 1 insertion(+), 6 deletions(-)

diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index f385eca..893f80e 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -80,7 +80,7 @@ extern u16 __read_mostly tlb_lld_1g[NR_INFO];

 /*
  * CPU type and hardware bug flags. Kept separately for each CPU.
- * Members of this structure are referenced in head.S, so think twice
+ * Members of this structure are referenced in head_32.S, so think twice
  * before touching them. [mj]
  */

@@ -91,11 +91,6 @@ struct cpuinfo_x86 {
 	__u8	x86_mask;
 #ifdef CONFIG_X86_32
 	char	wp_works_ok;	/* It doesn't on 386's */
-
-	/* Problems on some 486Dx4's and old 386's: */
-	char	rfu;
-	char	pad0;
-	char	pad1;
 #else
 	/* Number of 4K pages in DTLB/ITLB combined(in pages): */
 	int	x86_tlbsize;
Re: [PATCH 3/6] x86/cpu: proc - remove "wp" status line in cpuinfo
On 14 February 2017 at 22:42, Mathias Krause <mini...@googlemail.com> wrote:
> On 14 February 2017 at 19:13, H. Peter Anvin <h...@zytor.com> wrote:
>> On 02/12/17 13:12, Mathias Krause wrote:
>>> As of commit a5c2a893dbd4 ("x86, 386 removal: Remove
>>> CONFIG_X86_WP_WORKS_OK") the kernel won't boot if CR0.WP isn't working
>>> correctly. This makes a process reading this file always see "wp : yes"
>>> here -- otherwise there would be no process to begin with ;)
>>>
>>> As this status line in /proc/cpuinfo serves no purpose for quite some
>>> time now, get rid of it.
>>>
>>> Cc: Borislav Petkov <b...@alien8.de>
>>> Cc: H. Peter Anvin <h...@linux.intel.com>
>>> Signed-off-by: Mathias Krause <mini...@googlemail.com>
>>> ---
>>>  arch/x86/kernel/cpu/proc.c | 6 ++----
>>>  1 file changed, 2 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/arch/x86/kernel/cpu/proc.c b/arch/x86/kernel/cpu/proc.c
>>> index 6df621ae62a7..c6c5217a7980 100644
>>> --- a/arch/x86/kernel/cpu/proc.c
>>> +++ b/arch/x86/kernel/cpu/proc.c
>>> @@ -30,8 +30,7 @@ static void show_cpuinfo_misc(struct seq_file *m, struct cpuinfo_x86 *c)
>>> 		"coma_bug\t: %s\n"
>>> 		"fpu\t\t: %s\n"
>>> 		"fpu_exception\t: %s\n"
>>> -		"cpuid level\t: %d\n"
>>> -		"wp\t\t: yes\n",
>>> +		"cpuid level\t: %d\n",
>>> 		static_cpu_has_bug(X86_BUG_FDIV) ? "yes" : "no",
>>> 		static_cpu_has_bug(X86_BUG_F00F) ? "yes" : "no",
>>> 		static_cpu_has_bug(X86_BUG_COMA) ? "yes" : "no",
>>> @@ -45,8 +44,7 @@ static void show_cpuinfo_misc(struct seq_file *m, struct cpuinfo_x86 *c)
>>> 	seq_printf(m,
>>> 		"fpu\t\t: yes\n"
>>> 		"fpu_exception\t: yes\n"
>>> -		"cpuid level\t: %d\n"
>>> -		"wp\t\t: yes\n",
>>> +		"cpuid level\t: %d\n",
>>> 		c->cpuid_level);
>>>  }
>>>  #endif
>>>
>>
>> Potential userspace breakage, which is why the line was left in the
>> first place despite its value now being hard-coded. Removing it will
>> save a whopping 9 bytes of kernel rodata; this is a very small price to
>> pay for guaranteeing continued compatibility.
>
> Indeed. That's why I've separated the removal into an extra patch --
> to make it easier not to take it.
>
>> Nacked-by: H. Peter Anvin <h...@zytor.com>
>
> Do you want me to send the series again without this patch and patch
> #6 (Geert took it already) or are you okay with sorting them out
> yourself?
>

Ping... Peter, what's your preference here?

Mathias
Re: [PATCH 3/6] x86/cpu: proc - remove "wp" status line in cpuinfo
On 14 February 2017 at 19:13, H. Peter Anvin <h...@zytor.com> wrote:
> On 02/12/17 13:12, Mathias Krause wrote:
>> As of commit a5c2a893dbd4 ("x86, 386 removal: Remove
>> CONFIG_X86_WP_WORKS_OK") the kernel won't boot if CR0.WP isn't working
>> correctly. This makes a process reading this file always see "wp : yes"
>> here -- otherwise there would be no process to begin with ;)
>>
>> As this status line in /proc/cpuinfo serves no purpose for quite some
>> time now, get rid of it.
>>
>> Cc: Borislav Petkov <b...@alien8.de>
>> Cc: H. Peter Anvin <h...@linux.intel.com>
>> Signed-off-by: Mathias Krause <mini...@googlemail.com>
>> ---
>>  arch/x86/kernel/cpu/proc.c | 6 ++----
>>  1 file changed, 2 insertions(+), 4 deletions(-)
>>
>> diff --git a/arch/x86/kernel/cpu/proc.c b/arch/x86/kernel/cpu/proc.c
>> index 6df621ae62a7..c6c5217a7980 100644
>> --- a/arch/x86/kernel/cpu/proc.c
>> +++ b/arch/x86/kernel/cpu/proc.c
>> @@ -30,8 +30,7 @@ static void show_cpuinfo_misc(struct seq_file *m, struct cpuinfo_x86 *c)
>> 		"coma_bug\t: %s\n"
>> 		"fpu\t\t: %s\n"
>> 		"fpu_exception\t: %s\n"
>> -		"cpuid level\t: %d\n"
>> -		"wp\t\t: yes\n",
>> +		"cpuid level\t: %d\n",
>> 		static_cpu_has_bug(X86_BUG_FDIV) ? "yes" : "no",
>> 		static_cpu_has_bug(X86_BUG_F00F) ? "yes" : "no",
>> 		static_cpu_has_bug(X86_BUG_COMA) ? "yes" : "no",
>> @@ -45,8 +44,7 @@ static void show_cpuinfo_misc(struct seq_file *m, struct cpuinfo_x86 *c)
>> 	seq_printf(m,
>> 		"fpu\t\t: yes\n"
>> 		"fpu_exception\t: yes\n"
>> -		"cpuid level\t: %d\n"
>> -		"wp\t\t: yes\n",
>> +		"cpuid level\t: %d\n",
>> 		c->cpuid_level);
>>  }
>>  #endif
>>
>
> Potential userspace breakage, which is why the line was left in the
> first place despite its value now being hard-coded. Removing it will
> save a whopping 9 bytes of kernel rodata; this is a very small price to
> pay for guaranteeing continued compatibility.

Indeed. That's why I've separated the removal into an extra patch -- to make it easier not to take it.

> Nacked-by: H. Peter Anvin <h...@zytor.com>

Do you want me to send the series again without this patch and patch #6 (Geert took it already) or are you okay with sorting them out yourself?

Cheers, Mathias
Re: [PATCH 3/6] x86/cpu: proc - remove "wp" status line in cpuinfo
On 14 February 2017 at 17:20, Borislav Petkov <b...@alien8.de> wrote:
> On Sun, Feb 12, 2017 at 10:12:09PM +0100, Mathias Krause wrote:
>> As of commit a5c2a893dbd4 ("x86, 386 removal: Remove
>> CONFIG_X86_WP_WORKS_OK") the kernel won't boot if CR0.WP isn't working
>> correctly. This makes a process reading this file always see "wp : yes"
>> here -- otherwise there would be no process to begin with ;)
>>
>> As this status line in /proc/cpuinfo serves no purpose for quite some
>> time now, get rid of it.
>
> Right, sure, except /proc/cpuinfo's format is kind of an ABI and scripts
> rely on it, I'm being told.

That's the reason I haven't folded this change into patch 2. I had similar doubts but it's not documented in Documentation/ and kinda useless to test anyway -- what would a "wp : no" tell one?

> TBH, I'd remove that wp:-line too but this
> is just me. tip guys' call.
>
> FWIW, for all three:
>
> Acked-by: Borislav Petkov <b...@suse.de>

Thanks, Mathias
Re: [PATCH 1/6] x86: drop unneded members of struct cpuinfo_x86
On 14 February 2017 at 17:17, Borislav Petkov <b...@alien8.de> wrote:
> On Sun, Feb 12, 2017 at 10:12:07PM +0100, Mathias Krause wrote:
>> Those member serve no purpose -- not even fill padding for alignment or
>> such. So just get rid of them.
>
> Well, almost. You need the wp_works_ok removal patch too, otherwise you
> have the 3 bytes hole below.

Heh, indeed! But only since commit 79a8b9aa388b ("x86/CPU/AMD: Bring back Compute Unit ID") ;)

> But the wp_works_ok goes away too so I guess that's fine.
>
> $ pahole -C cpuinfo_x86 vmlinux
> struct cpuinfo_x86 {
>         __u8 x86;             /* 0 1 */
>         __u8 x86_vendor;      /* 1 1 */
>         __u8 x86_model;       /* 2 1 */
>         __u8 x86_mask;        /* 3 1 */
>         char wp_works_ok;     /* 4 1 */
>         __u8 x86_virt_bits;   /* 5 1 */
>         __u8 x86_phys_bits;   /* 6 1 */
>         __u8 x86_coreid_bits; /* 7 1 */
>         __u8 cu_id;           /* 8 1 */
>
>         /* XXX 3 bytes hole, try to pack */

The cu_id member is "new". Without it there would be no hole. But, yeah, without wp_works_ok everything's fine again.

Cheers, Mathias
[PATCH 2/6] x86/cpu: drop wp_works_ok member of struct cpuinfo_x86
Remove the wp_works_ok member of struct cpuinfo_x86. It's an optimization back from Linux v0.99 times where we had no fixup support yet and did the CR0.WP test via special code in the page fault handler. The < 0 test was an optimization to not do the special casing for each NULL ptr access violation but just for the first one doing the WP test.

Today it serves no real purpose as the test no longer needs special code in the page fault handler and the only call site -- mem_init() -- calls it just once, anyway.

However, Xen pre-initializes it to 1, to skip the test. Doing the test again for Xen should be no issue at all, as even the commit introducing skipping the test (commit d560bc61575e ("x86, xen: Suppress WP test on Xen")) mentioned it being a band aid only. And, in fact, testing the patch on Xen showed nothing breaks.

The pre-fixup times are long gone and with the removal of the fallback handling code in commit a5c2a893dbd4 ("x86, 386 removal: Remove CONFIG_X86_WP_WORKS_OK") the kernel requires a working CR0.WP anyway. So just get rid of the "optimization" and do the test unconditionally.

Cc: Borislav Petkov <b...@alien8.de>
Cc: H. Peter Anvin <h...@linux.intel.com>
Cc: Arnd Hannemann <hannem...@nets.rwth-aachen.de>
Cc: Jeremy Fitzhardinge <jeremy.fitzhardi...@citrix.com>
Signed-off-by: Mathias Krause <mini...@googlemail.com>
---
 arch/x86/include/asm/processor.h | 4 +---
 arch/x86/kernel/cpu/proc.c       | 5 ++---
 arch/x86/kernel/setup.c          | 11 ++++-------
 arch/x86/mm/init_32.c            | 9 +++++----
 arch/x86/xen/enlighten.c         | 1 -
 5 files changed, 12 insertions(+), 18 deletions(-)

diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index bf7cb1e00ce7..7b15b29e8a66 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -89,9 +89,7 @@ struct cpuinfo_x86 {
 	__u8	x86_vendor;	/* CPU vendor */
 	__u8	x86_model;
 	__u8	x86_mask;
-#ifdef CONFIG_X86_32
-	char	wp_works_ok;	/* It doesn't on 386's */
-#else
+#ifdef CONFIG_X86_64
 	/* Number of 4K pages in DTLB/ITLB combined(in pages): */
 	int	x86_tlbsize;
 #endif
diff --git a/arch/x86/kernel/cpu/proc.c b/arch/x86/kernel/cpu/proc.c
index 18ca99f2798b..6df621ae62a7 100644
--- a/arch/x86/kernel/cpu/proc.c
+++ b/arch/x86/kernel/cpu/proc.c
@@ -31,14 +31,13 @@ static void show_cpuinfo_misc(struct seq_file *m, struct cpuinfo_x86 *c)
 		"fpu\t\t: %s\n"
 		"fpu_exception\t: %s\n"
 		"cpuid level\t: %d\n"
-		"wp\t\t: %s\n",
+		"wp\t\t: yes\n",
 		static_cpu_has_bug(X86_BUG_FDIV) ? "yes" : "no",
 		static_cpu_has_bug(X86_BUG_F00F) ? "yes" : "no",
 		static_cpu_has_bug(X86_BUG_COMA) ? "yes" : "no",
 		static_cpu_has(X86_FEATURE_FPU) ? "yes" : "no",
 		static_cpu_has(X86_FEATURE_FPU) ? "yes" : "no",
-		c->cpuid_level,
-		c->wp_works_ok ? "yes" : "no");
+		c->cpuid_level);
 }
 #else
 static void show_cpuinfo_misc(struct seq_file *m, struct cpuinfo_x86 *c)
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index 4cfba947d774..ffc2791ab256 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -173,14 +173,11 @@ int default_check_phys_apicid_present(int phys_apicid)
 #ifdef CONFIG_X86_32
-/* cpu data as detected by the assembly code in head.S */
-struct cpuinfo_x86 new_cpu_data = {
-	.wp_works_ok = -1,
-};
+/* cpu data as detected by the assembly code in head_32.S */
+struct cpuinfo_x86 new_cpu_data;
+
 /* common cpu data for all cpus */
-struct cpuinfo_x86 boot_cpu_data __read_mostly = {
-	.wp_works_ok = -1,
-};
+struct cpuinfo_x86 boot_cpu_data __read_mostly;
 EXPORT_SYMBOL(boot_cpu_data);

 unsigned int def_to_bigsmp;
diff --git a/arch/x86/mm/init_32.c b/arch/x86/mm/init_32.c
index 928d657de829..e0fd0c8b9ad1 100644
--- a/arch/x86/mm/init_32.c
+++ b/arch/x86/mm/init_32.c
@@ -716,15 +716,17 @@ void __init paging_init(void)
  */
 static void __init test_wp_bit(void)
 {
+	int wp_works_ok;
+
 	printk(KERN_INFO "Checking if this processor honours the WP bit even in supervisor mode...");

 	/* Any page-aligned address will do, the test is non-destructive */
 	__set_fixmap(FIX_WP_TEST, __pa(_pg_dir), PAGE_KERNEL_RO);
-	boot_cpu_data.wp_works_ok = do_test_wp_bit();
+	wp_works_ok = do_test_wp_bit();
 	clear_fixmap(FIX_WP_TEST);

-	if (!boot_cpu_data.wp_works_ok) {
+	if (!wp_works_ok) {
[PATCH 4/6] sparc: remove unused wp_works_ok macro
It's unused for ages, used to be required for ksyms.c back in the v1.1 times. Cc: David S. Miller <da...@davemloft.net> Signed-off-by: Mathias Krause <mini...@googlemail.com> --- arch/sparc/include/asm/processor_32.h |6 -- arch/sparc/include/asm/processor_64.h |4 2 files changed, 10 deletions(-) diff --git a/arch/sparc/include/asm/processor_32.h b/arch/sparc/include/asm/processor_32.h index 365d4cb267b4..dd27159819eb 100644 --- a/arch/sparc/include/asm/processor_32.h +++ b/arch/sparc/include/asm/processor_32.h @@ -18,12 +18,6 @@ #include #include -/* - * The sparc has no problems with write protection - */ -#define wp_works_ok 1 -#define wp_works_ok__is_a_macro /* for versions in ksyms.c */ - /* Whee, this is STACK_TOP + PAGE_SIZE and the lowest kernel address too... * That one page is used to protect kernel from intruders, so that * we can make our access_ok test faster diff --git a/arch/sparc/include/asm/processor_64.h b/arch/sparc/include/asm/processor_64.h index 6448cfc8292f..b58ee9018433 100644 --- a/arch/sparc/include/asm/processor_64.h +++ b/arch/sparc/include/asm/processor_64.h @@ -18,10 +18,6 @@ #include #include -/* The sparc has no problems with write protection */ -#define wp_works_ok 1 -#define wp_works_ok__is_a_macro /* for versions in ksyms.c */ - /* * User lives in his very own context, and cannot reference us. Note * that TASK_SIZE is a misnomer, it really gives maximum user virtual -- 1.7.10.4
[PATCH 2/6] x86/cpu: drop wp_works_ok member of struct cpuinfo_x86
Remove the wp_works_ok member of struct cpuinfo_x86. It's an optimization back from Linux v0.99 times where we had no fixup support yet and did the CR0.WP test via special code in the page fault handler. The < 0 test was an optimization to not do the special casing for each NULL ptr access violation but just for the first one doing the WP test. Today it serves no real purpose as the test no longer needs special code in the page fault handler and the only call site -- mem_init() -- calls it just once, anyway. However, Xen pre-initializes it to 1, to skip the test. Doing the test again for Xen should be no issue at all, as even the commit introducing skipping the test (commit d560bc61575e ("x86, xen: Suppress WP test on Xen")) mentioned it being a band-aid only. And, in fact, testing the patch on Xen showed nothing breaks. The pre-fixup times are long gone and with the removal of the fallback handling code in commit a5c2a893dbd4 ("x86, 386 removal: Remove CONFIG_X86_WP_WORKS_OK") the kernel requires a working CR0.WP anyway. So just get rid of the "optimization" and do the test unconditionally. Cc: Borislav Petkov Cc: H. 
Peter Anvin Cc: Arnd Hannemann Cc: Jeremy Fitzhardinge Signed-off-by: Mathias Krause --- arch/x86/include/asm/processor.h |4 +--- arch/x86/kernel/cpu/proc.c |5 ++--- arch/x86/kernel/setup.c | 11 --- arch/x86/mm/init_32.c|9 + arch/x86/xen/enlighten.c |1 - 5 files changed, 12 insertions(+), 18 deletions(-) diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h index bf7cb1e00ce7..7b15b29e8a66 100644 --- a/arch/x86/include/asm/processor.h +++ b/arch/x86/include/asm/processor.h @@ -89,9 +89,7 @@ struct cpuinfo_x86 { __u8x86_vendor; /* CPU vendor */ __u8x86_model; __u8x86_mask; -#ifdef CONFIG_X86_32 - charwp_works_ok;/* It doesn't on 386's */ -#else +#ifdef CONFIG_X86_64 /* Number of 4K pages in DTLB/ITLB combined(in pages): */ int x86_tlbsize; #endif diff --git a/arch/x86/kernel/cpu/proc.c b/arch/x86/kernel/cpu/proc.c index 18ca99f2798b..6df621ae62a7 100644 --- a/arch/x86/kernel/cpu/proc.c +++ b/arch/x86/kernel/cpu/proc.c @@ -31,14 +31,13 @@ static void show_cpuinfo_misc(struct seq_file *m, struct cpuinfo_x86 *c) "fpu\t\t: %s\n" "fpu_exception\t: %s\n" "cpuid level\t: %d\n" - "wp\t\t: %s\n", + "wp\t\t: yes\n", static_cpu_has_bug(X86_BUG_FDIV) ? "yes" : "no", static_cpu_has_bug(X86_BUG_F00F) ? "yes" : "no", static_cpu_has_bug(X86_BUG_COMA) ? "yes" : "no", static_cpu_has(X86_FEATURE_FPU) ? "yes" : "no", static_cpu_has(X86_FEATURE_FPU) ? "yes" : "no", - c->cpuid_level, - c->wp_works_ok ? 
"yes" : "no"); + c->cpuid_level); } #else static void show_cpuinfo_misc(struct seq_file *m, struct cpuinfo_x86 *c) diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c index 4cfba947d774..ffc2791ab256 100644 --- a/arch/x86/kernel/setup.c +++ b/arch/x86/kernel/setup.c @@ -173,14 +173,11 @@ int default_check_phys_apicid_present(int phys_apicid) #ifdef CONFIG_X86_32 -/* cpu data as detected by the assembly code in head.S */ -struct cpuinfo_x86 new_cpu_data = { - .wp_works_ok = -1, -}; +/* cpu data as detected by the assembly code in head_32.S */ +struct cpuinfo_x86 new_cpu_data; + /* common cpu data for all cpus */ -struct cpuinfo_x86 boot_cpu_data __read_mostly = { - .wp_works_ok = -1, -}; +struct cpuinfo_x86 boot_cpu_data __read_mostly; EXPORT_SYMBOL(boot_cpu_data); unsigned int def_to_bigsmp; diff --git a/arch/x86/mm/init_32.c b/arch/x86/mm/init_32.c index 928d657de829..e0fd0c8b9ad1 100644 --- a/arch/x86/mm/init_32.c +++ b/arch/x86/mm/init_32.c @@ -716,15 +716,17 @@ void __init paging_init(void) */ static void __init test_wp_bit(void) { + int wp_works_ok; + printk(KERN_INFO "Checking if this processor honours the WP bit even in supervisor mode..."); /* Any page-aligned address will do, the test is non-destructive */ __set_fixmap(FIX_WP_TEST, __pa(_pg_dir), PAGE_KERNEL_RO); - boot_cpu_data.wp_works_ok = do_test_wp_bit(); + wp_works_ok = do_test_wp_bit(); clear_fixmap(FIX_WP_TEST); - if (!boot_cpu_data.wp_works_ok) { + if (!wp_works_ok) { printk(KERN_CONT "No.\n"); panic("Linux doesn't support CPUs with broken WP."); } el
[PATCH 3/6] x86/cpu: proc - remove "wp" status line in cpuinfo
As of commit a5c2a893dbd4 ("x86, 386 removal: Remove CONFIG_X86_WP_WORKS_OK") the kernel won't boot if CR0.WP isn't working correctly. This makes a process reading this file always see "wp : yes" here -- otherwise there would be no process to begin with ;) As this status line in /proc/cpuinfo has served no purpose for quite some time now, get rid of it. Cc: Borislav Petkov <b...@alien8.de> Cc: H. Peter Anvin <h...@linux.intel.com> Signed-off-by: Mathias Krause <mini...@googlemail.com> --- arch/x86/kernel/cpu/proc.c |6 ++ 1 file changed, 2 insertions(+), 4 deletions(-) diff --git a/arch/x86/kernel/cpu/proc.c b/arch/x86/kernel/cpu/proc.c index 6df621ae62a7..c6c5217a7980 100644 --- a/arch/x86/kernel/cpu/proc.c +++ b/arch/x86/kernel/cpu/proc.c @@ -30,8 +30,7 @@ static void show_cpuinfo_misc(struct seq_file *m, struct cpuinfo_x86 *c) "coma_bug\t: %s\n" "fpu\t\t: %s\n" "fpu_exception\t: %s\n" - "cpuid level\t: %d\n" - "wp\t\t: yes\n", + "cpuid level\t: %d\n", static_cpu_has_bug(X86_BUG_FDIV) ? "yes" : "no", static_cpu_has_bug(X86_BUG_F00F) ? "yes" : "no", static_cpu_has_bug(X86_BUG_COMA) ? "yes" : "no", @@ -45,8 +44,7 @@ static void show_cpuinfo_misc(struct seq_file *m, struct cpuinfo_x86 *c) seq_printf(m, "fpu\t\t: yes\n" "fpu_exception\t: yes\n" - "cpuid level\t: %d\n" - "wp\t\t: yes\n", + "cpuid level\t: %d\n", c->cpuid_level); } #endif -- 1.7.10.4
[PATCH 6/6] m68k: paging_init - remove dead code
The macro TEST_VERIFY_AREA can never be defined as there's no wp_works_ok variable. So just remove the dead code. Cc: Geert Uytterhoeven <ge...@linux-m68k.org> Signed-off-by: Mathias Krause <mini...@googlemail.com> --- arch/m68k/mm/sun3mmu.c |3 --- 1 file changed, 3 deletions(-) diff --git a/arch/m68k/mm/sun3mmu.c b/arch/m68k/mm/sun3mmu.c index b5b7d53f7283..177d776de1a0 100644 --- a/arch/m68k/mm/sun3mmu.c +++ b/arch/m68k/mm/sun3mmu.c @@ -44,9 +44,6 @@ void __init paging_init(void) unsigned long zones_size[MAX_NR_ZONES] = { 0, }; unsigned long size; -#ifdef TEST_VERIFY_AREA - wp_works_ok = 0; -#endif empty_zero_page = alloc_bootmem_pages(PAGE_SIZE); address = PAGE_OFFSET; -- 1.7.10.4
[PATCH 5/6] cris: remove unused wp_works_ok macro
It has had no use since its introduction in v2.4.1.2. Get rid of it. Cc: Jesper Nilsson <jesper.nils...@axis.com> Cc: Mikael Starvik <star...@axis.com> Signed-off-by: Mathias Krause <mini...@googlemail.com> --- arch/cris/include/arch-v10/arch/processor.h |3 --- 1 file changed, 3 deletions(-) diff --git a/arch/cris/include/arch-v10/arch/processor.h b/arch/cris/include/arch-v10/arch/processor.h index 93feb2a487d8..58f75bee1d6c 100644 --- a/arch/cris/include/arch-v10/arch/processor.h +++ b/arch/cris/include/arch-v10/arch/processor.h @@ -7,9 +7,6 @@ */ #define current_text_addr() ({void *pc; __asm__ ("move.d $pc,%0" : "=rm" (pc)); pc; }) -/* CRIS has no problems with write protection */ -#define wp_works_ok 1 - /* CRIS thread_struct. this really has nothing to do with the processor itself, since * CRIS does not do any hardware task-switching, but it's here for legacy reasons. * The thread_struct here is used when task-switching using _resume defined in entry.S. -- 1.7.10.4
[PATCH 0/6] struct cpuinfo_x86 related cleanups
This small series slims down the x86 specific struct cpuinfo_x86 (patches 1 and 2). It's kind of a continuation of Boris' work from 2013. Besides the x86 specific part it also cleans up other arches that had to have a wp_works_ok variable / define back in the old v1.1 times. But those times are long gone, so we can get rid of that ancient hackery. Therefore patches 4, 5 and 6 are independent of the x86 specific changes and could be taken by the individual arch maintainers if they prefer to. Please apply, Mathias Mathias Krause (6): x86: drop unneeded members of struct cpuinfo_x86 x86/cpu: drop wp_works_ok member of struct cpuinfo_x86 x86/cpu: proc - remove "wp" status line in cpuinfo sparc: remove unused wp_works_ok macro cris: remove unused wp_works_ok macro m68k: paging_init - remove dead code arch/cris/include/arch-v10/arch/processor.h |3 --- arch/m68k/mm/sun3mmu.c |3 --- arch/sparc/include/asm/processor_32.h |6 -- arch/sparc/include/asm/processor_64.h |4 arch/x86/include/asm/processor.h| 11 ++- arch/x86/kernel/cpu/proc.c |9 +++-- arch/x86/kernel/setup.c | 11 --- arch/x86/mm/init_32.c |9 + arch/x86/xen/enlighten.c|1 - 9 files changed, 14 insertions(+), 43 deletions(-) -- 1.7.10.4
[PATCH 1/6] x86: drop unneeded members of struct cpuinfo_x86
Those members serve no purpose -- not even as fill padding for alignment or such. So just get rid of them. Cc: Borislav Petkov <b...@alien8.de> Cc: H. Peter Anvin <h...@linux.intel.com> Signed-off-by: Mathias Krause <mini...@googlemail.com> --- arch/x86/include/asm/processor.h |7 +-- 1 file changed, 1 insertion(+), 6 deletions(-) diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h index e6cfe7ba2d65..bf7cb1e00ce7 100644 --- a/arch/x86/include/asm/processor.h +++ b/arch/x86/include/asm/processor.h @@ -80,7 +80,7 @@ enum tlb_infos { /* * CPU type and hardware bug flags. Kept separately for each CPU. - * Members of this structure are referenced in head.S, so think twice + * Members of this structure are referenced in head_32.S, so think twice * before touching them. [mj] */ @@ -91,11 +91,6 @@ struct cpuinfo_x86 { __u8x86_mask; #ifdef CONFIG_X86_32 charwp_works_ok;/* It doesn't on 386's */ - - /* Problems on some 486Dx4's and old 386's: */ - charrfu; - charpad0; - charpad1; #else /* Number of 4K pages in DTLB/ITLB combined(in pages): */ int x86_tlbsize; -- 1.7.10.4
Re: [PATCH v1 0/2] Introduce the initify gcc plugin
Hi Emese, On Tue, Jun 28, 2016 at 01:34:07PM +0200, Emese Revfy wrote: > I would like to introduce the initify gcc plugin. The kernel already has > a mechanism to free up code and data memory that is only used during kernel > or module initialization. > This plugin will teach the compiler to find more such code and data that > can be freed after initialization. It reduces memory usage. > The initify gcc plugin can be useful for embedded systems. > > It is a CII project supported by the Linux Foundation. > > This plugin is the part of grsecurity/PaX. > > The plugin supports all gcc versions from 4.5 to 6.0. > > I made some changes on top of the PaX version (since March 6.). These are > the important ones: > * move all local strings to init.rodata.str and exit.rodata.str >(not just __func__) > * report all initified strings and functions >(GCC_PLUGIN_INITIFY_VERBOSE config option) > * automatically discover init/exit functions and apply the __init or >__exit attributes on them > > You can find more about the changes here: > https://github.com/ephox-gcc-plugins/initify > > This patch set is based on the "Add support for complex gcc plugins that > don't fit in a single file" patch set (git/kees/linux.git#kspp HEAD: > e5d4798b284cd192c8b). > > Some statistics about the plugin: > > On allyes config (amd64, gcc-6): > * 7731 initified strings > * 231 initified functions > > On allmod config (i386, gcc-6): > * 8846 initified strings > * 252 initified functions > > On allyes config (amd64, gcc-6): > > section vanilla vanilla + initifychange > --- > .rodata 39059688 (0x25400e8)38527210 (0x24be0ea)-532478 > .data 45744128 (0x2ba)45404160 (0x2b4d000)-339968 > .init.data 1361144 (0x14c4f8) 1674200 (0x198bd8)+313056 > .text 77615128 (0x4a05018)77576664 (0x49fb9d8) -38464 > .init.text 1108455 (0x10e9e7) 1137618 (0x115bd2) +29163 You should probably provide numbers for .init.rodata.str, .exit.rodata.str and .exit.text as well. 
Otherwise this delta calculation suggests a rather gigantic image size reduction which is probably not the case ;) Also a comparison of the final kernel image size would be nice to see if the string duplication issue mentioned in [1] is actually an issue. [1] http://marc.info/?l=linux-kernel=140364632417795=2 Thanks, Mathias > > > Emese Revfy (2): > Add the initify gcc plugin > Mark functions with the __nocapture attribute > > --- > arch/Kconfig | 23 + > arch/arm/include/asm/string.h| 10 +- > arch/arm64/include/asm/string.h | 23 +- > arch/powerpc/include/asm/string.h| 19 +- > arch/x86/boot/string.h |4 +- > arch/x86/include/asm/string_32.h | 21 +- > arch/x86/include/asm/string_64.h | 18 +- > arch/x86/kernel/hpet.c |2 +- > include/asm-generic/bug.h|6 +- > include/asm-generic/vmlinux.lds.h|2 + > include/linux/compiler-gcc.h | 10 +- > include/linux/compiler.h |4 + > include/linux/fs.h |5 +- > include/linux/printk.h |2 +- > include/linux/string.h | 73 +-- > scripts/Makefile.gcc-plugins |4 + > scripts/gcc-plugins/initify_plugin.c | 1147 > ++ > 17 files changed, 1283 insertions(+), 90 deletions(-)
Re: [PATCH v1 0/2] Introduce the initify gcc plugin
Hi Emese, On Tue, Jun 28, 2016 at 01:34:07PM +0200, Emese Revfy wrote: > I would like to introduce the initify gcc plugin. The kernel already has > a mechanism to free up code and data memory that is only used during kernel > or module initialization. > This plugin will teach the compiler to find more such code and data that > can be freed after initialization. It reduces memory usage. > The initify gcc plugin can be useful for embedded systems. > > It is a CII project supported by the Linux Foundation. > > This plugin is the part of grsecurity/PaX. > > The plugin supports all gcc versions from 4.5 to 6.0. > > I made some changes on top of the PaX version (since March 6.). These are > the important ones: > * move all local strings to init.rodata.str and exit.rodata.str >(not just __func__) > * report all initified strings and functions >(GCC_PLUGIN_INITIFY_VERBOSE config option) > * automatically discover init/exit functions and apply the __init or >__exit attributes on them > > You can find more about the changes here: > https://github.com/ephox-gcc-plugins/initify > > This patch set is based on the "Add support for complex gcc plugins that > don't fit in a single file" patch set (git/kees/linux.git#kspp HEAD: > e5d4798b284cd192c8b). > > Some statistics about the plugin: > > On allyes config (amd64, gcc-6): > * 7731 initified strings > * 231 initified functions > > On allmod config (i386, gcc-6): > * 8846 initified strings > * 252 initified functions > > On allyes config (amd64, gcc-6): > > section vanilla vanilla + initifychange > --- > .rodata 39059688 (0x25400e8)38527210 (0x24be0ea)-532478 > .data 45744128 (0x2ba)45404160 (0x2b4d000)-339968 > .init.data 1361144 (0x14c4f8) 1674200 (0x198bd8)+313056 > .text 77615128 (0x4a05018)77576664 (0x49fb9d8) -38464 > .init.text 1108455 (0x10e9e7) 1137618 (0x115bd2) +29163 You should probably provide numbers for .init.rodata.str, .exit.rodata.str and .exit.text as well. 
Otherwise this delta calculation suggests a rather gigantic image size reduction which is probably not the case ;) Also a comparison of the final kernel image size would be nice to see if the string duplication issue mentioned in [1] is actually an issue. [1] http://marc.info/?l=linux-kernel=140364632417795=2 Thanks, Mathias > > > Emese Revfy (2): > Add the initify gcc plugin > Mark functions with the __nocapture attribute > > --- > arch/Kconfig | 23 + > arch/arm/include/asm/string.h| 10 +- > arch/arm64/include/asm/string.h | 23 +- > arch/powerpc/include/asm/string.h| 19 +- > arch/x86/boot/string.h |4 +- > arch/x86/include/asm/string_32.h | 21 +- > arch/x86/include/asm/string_64.h | 18 +- > arch/x86/kernel/hpet.c |2 +- > include/asm-generic/bug.h|6 +- > include/asm-generic/vmlinux.lds.h|2 + > include/linux/compiler-gcc.h | 10 +- > include/linux/compiler.h |4 + > include/linux/fs.h |5 +- > include/linux/printk.h |2 +- > include/linux/string.h | 73 +-- > scripts/Makefile.gcc-plugins |4 + > scripts/gcc-plugins/initify_plugin.c | 1147 > ++ > 17 files changed, 1283 insertions(+), 90 deletions(-)
[tip:x86/mm] x86/extable: Ensure entries are swapped completely when sorting
Commit-ID: 67d7a982bab6702d84415ea889996fae72a7d3b2 Gitweb: http://git.kernel.org/tip/67d7a982bab6702d84415ea889996fae72a7d3b2 Author: Mathias Krause <mini...@googlemail.com> AuthorDate: Tue, 10 May 2016 23:07:02 +0200 Committer: Ingo Molnar <mi...@kernel.org> CommitDate: Wed, 11 May 2016 11:14:06 +0200 x86/extable: Ensure entries are swapped completely when sorting The x86 exception table sorting was changed in this recent commit: 29934b0fb8ff ("x86/extable: use generic search and sort routines") ... to use the arch independent code in lib/extable.c. However, the patch was mangled somehow on its way into the kernel from the last version posted at: https://lkml.org/lkml/2016/1/27/232 The committed version kind of attempted to incorporate the changes of contemporary commit done in the x86 tree: 548acf19234d ("x86/mm: Expand the exception table logic to allow new handling options") ... as in _completely_ _ignoring_ the x86 specific 'handler' member of struct exception_table_entry. This effectively broke the sorting as entries will only be partly swapped now. Fortunately, the x86 Kconfig selects BUILDTIME_EXTABLE_SORT, so the exception table doesn't need to be sorted at runtime. However, in case that ever changes, we better not break the exception table sorting just because of that. Fix this by providing a swap_ex_entry_fixup() macro that takes care of the 'handler' member. Signed-off-by: Mathias Krause <mini...@googlemail.com> Reviewed-by: Ard Biesheuvel <ard.biesheu...@linaro.org> Cc: Andrew Morton <a...@linux-foundation.org> Cc: Andy Lutomirski <l...@amacapital.net> Cc: Andy Lutomirski <l...@kernel.org> Cc: Borislav Petkov <b...@alien8.de> Cc: Borislav Petkov <b...@suse.de> Cc: Brian Gerst <brge...@gmail.com> Cc: Denys Vlasenko <dvlas...@redhat.com> Cc: H. 
Peter Anvin <h...@zytor.com> Cc: Linus Torvalds <torva...@linux-foundation.org> Cc: Peter Zijlstra <pet...@infradead.org> Cc: Thomas Gleixner <t...@linutronix.de> Cc: Tony Luck <tony.l...@intel.com> Link: http://lkml.kernel.org/r/1462914422-2911-1-git-send-email-mini...@googlemail.com Signed-off-by: Ingo Molnar <mi...@kernel.org> --- arch/x86/include/asm/uaccess.h | 8 1 file changed, 8 insertions(+) diff --git a/arch/x86/include/asm/uaccess.h b/arch/x86/include/asm/uaccess.h index 8b3fb76..86c48f3 100644 --- a/arch/x86/include/asm/uaccess.h +++ b/arch/x86/include/asm/uaccess.h @@ -108,6 +108,14 @@ struct exception_table_entry { #define ARCH_HAS_RELATIVE_EXTABLE +#define swap_ex_entry_fixup(a, b, tmp, delta) \ + do {\ + (a)->fixup = (b)->fixup + (delta); \ + (b)->fixup = (tmp).fixup - (delta); \ + (a)->handler = (b)->handler + (delta); \ + (b)->handler = (tmp).handler - (delta); \ + } while (0) + extern int fixup_exception(struct pt_regs *regs, int trapnr); extern bool ex_has_fault_handler(unsigned long ip); extern int early_fixup_exception(unsigned long *ip);
Re: [PATCH] x86/extable: ensure entries are swapped completely when sorting
On 11 May 2016 at 07:46, Ard Biesheuvel <ard.biesheu...@linaro.org> wrote: > On 10 May 2016 at 23:07, Mathias Krause <mini...@googlemail.com> wrote: >> The x86 exception table sorting was changed in commit 29934b0fb8ff >> ("x86/extable: use generic search and sort routines") to use the arch >> independent code in lib/extable.c. However, the patch was mangled >> somehow on its way into the kernel from the last version posted at [1]. >> The committed version kind of attempted to incorporate the changes of >> commit 548acf19234d ("x86/mm: Expand the exception table logic to allow >> new handling options") as in _completely_ _ignoring_ the x86 specific >> 'handler' member of struct exception_table_entry. This effectively broke >> the sorting as entries will only partly be swapped now. >> > > OK, it appears that Tony's patches grew a 'handler' field fairly late > in their review cycle, and I didn't pick up on that. Apologies. No need to apologize, as your patch *must* have created conflicts for the committer applying it. It's just that those conflicts were resolved wrongly. He/She should have asked you for a new version, but... did not :/ >> Fortunately, the x86 Kconfig selects BUILDTIME_EXTABLE_SORT, so the >> exception table doesn't need to be sorted at runtime. However, in case >> that ever changes, we better not break the exception table sorting just >> because of that. >> > > BUILDTIME_EXTABLE_SORT applies to the core image only, but we still > rely on the sorting routines for modules in that case. So I think this > needs to be fixed right away. Good point! > >> Fix this by providing a swap_ex_entry_fixup() macro that takes care of >> the 'handler' member. >> >> [1] https://lkml.org/lkml/2016/1/27/232 >> >> Signed-off-by: Mathias Krause <mini...@googlemail.com> >> Cc: Andrew Morton <a...@linux-foundation.org> >> Cc: Andy Lutomirski <l...@kernel.org> >> Cc: Ard Biesheuvel <ard.biesheu...@linaro.org> >> Cc: Borislav Petkov <b...@suse.de> >> Cc: H. 
Peter Anvin <h...@linux.intel.com> >> Cc: Ingo Molnar <mi...@kernel.org> >> Cc: Linus Torvalds <torva...@linux-foundation.org> >> Cc: Thomas Gleixner <t...@linutronix.de> >> Cc: Tony Luck <tony.l...@intel.com> > > Fixes: 29934b0fb8f ("x86/extable: use generic search and sort routines") > Reviewed-by: Ard Biesheuvel <ard.biesheu...@linaro.org> > > Thanks, > Ard. > Thanks, Mathias
[PATCH] x86/extable: ensure entries are swapped completely when sorting
The x86 exception table sorting was changed in commit 29934b0fb8ff ("x86/extable: use generic search and sort routines") to use the arch independent code in lib/extable.c. However, the patch was mangled somehow on its way into the kernel from the last version posted at [1]. The committed version kind of attempted to incorporate the changes of commit 548acf19234d ("x86/mm: Expand the exception table logic to allow new handling options") as in _completely_ _ignoring_ the x86 specific 'handler' member of struct exception_table_entry. This effectively broke the sorting as entries will only partly be swapped now. Fortunately, the x86 Kconfig selects BUILDTIME_EXTABLE_SORT, so the exception table doesn't need to be sorted at runtime. However, in case that ever changes, we better not break the exception table sorting just because of that. Fix this by providing a swap_ex_entry_fixup() macro that takes care of the 'handler' member. [1] https://lkml.org/lkml/2016/1/27/232 Signed-off-by: Mathias Krause <mini...@googlemail.com> Cc: Andrew Morton <a...@linux-foundation.org> Cc: Andy Lutomirski <l...@kernel.org> Cc: Ard Biesheuvel <ard.biesheu...@linaro.org> Cc: Borislav Petkov <b...@suse.de> Cc: H. 
Peter Anvin <h...@linux.intel.com> Cc: Ingo Molnar <mi...@kernel.org> Cc: Linus Torvalds <torva...@linux-foundation.org> Cc: Thomas Gleixner <t...@linutronix.de> Cc: Tony Luck <tony.l...@intel.com>
---
 arch/x86/include/asm/uaccess.h | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/arch/x86/include/asm/uaccess.h b/arch/x86/include/asm/uaccess.h
index a969ae607be8..b69b7bffb0e0 100644
--- a/arch/x86/include/asm/uaccess.h
+++ b/arch/x86/include/asm/uaccess.h
@@ -108,6 +108,14 @@ struct exception_table_entry {
 
 #define ARCH_HAS_RELATIVE_EXTABLE
 
+#define swap_ex_entry_fixup(a, b, tmp, delta)			\
+	do {							\
+		(a)->fixup = (b)->fixup + (delta);		\
+		(b)->fixup = (tmp).fixup - (delta);		\
+		(a)->handler = (b)->handler + (delta);		\
+		(b)->handler = (tmp).handler - (delta);		\
+	} while (0)
+
 extern int fixup_exception(struct pt_regs *regs, int trapnr);
 extern bool ex_has_fault_handler(unsigned long ip);
 extern int early_fixup_exception(unsigned long *ip);
-- 
1.7.10.4
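The delta correction in the macro above can be illustrated outside the kernel. The sketch below is a userspace toy (illustrative names, not the actual lib/extable.c sort machinery): it models relative exception-table entries whose members store their targets as offsets relative to the member's own address, and shows why a swap must adjust *every* member by the byte distance between the two entries — including the 'handler' member the mangled commit dropped.

```c
#include <stdint.h>

/* Toy model of x86's relative exception_table_entry: each member holds
 * its target as an offset relative to the member's own address. */
struct entry {
	int insn;
	int fixup;
	int handler;
};

/* Store `target` as a self-relative offset in *field. */
static void set_field(int *field, const void *target)
{
	*field = (int)((intptr_t)target - (intptr_t)field);
}

/* Recover the absolute target a self-relative field points at. */
static uintptr_t resolve(const int *field)
{
	return (uintptr_t)field + (uintptr_t)(intptr_t)*field;
}

/* Swap two entries the way the sort's swap callback must: contents
 * moving down by `delta` bytes need their offsets grown by `delta`,
 * contents moving up need them shrunk -- for every member.  Skipping
 * `handler` leaves it resolving to the wrong address after the sort,
 * which is exactly the breakage being fixed. */
static void swap_entries(struct entry *a, struct entry *b)
{
	int delta = (int)((intptr_t)b - (intptr_t)a);
	struct entry tmp = *a;

	a->insn    = b->insn    + delta;
	b->insn    = tmp.insn   - delta;
	a->fixup   = b->fixup   + delta;
	b->fixup   = tmp.fixup  - delta;
	a->handler = b->handler + delta;
	b->handler = tmp.handler - delta;
}
```

After swap_entries(&tab[0], &tab[1]), every field of tab[0] resolves to the absolute address the corresponding field of tab[1] resolved to before, and vice versa — the invariant the sort relies on.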
Re: [PATCH] proc: prevent accessing /proc/<pid>/environ until it's ready
On 28 April 2016 at 23:30, Andrew Morton <a...@linux-foundation.org> wrote: > On Thu, 28 Apr 2016 21:04:18 +0200 Mathias Krause <mini...@googlemail.com> > wrote: > >> If /proc/<pid>/environ gets read before the envp[] array is fully set >> up in create_{aout,elf,elf_fdpic,flat}_tables(), we might end up trying >> to read more bytes than are actually written, as env_start will already >> be set but env_end will still be zero, making the range calculation >> underflow, allowing to read beyond the end of what has been written. >> >> Fix this as it is done for /proc/<pid>/cmdline by testing env_end for >> zero. It is, apparently, intentionally set last in create_*_tables(). > > Also, if this is indeed our design then > > a) the various create_*_tables() should have comments in there which >explain this subtlety to the reader. Or, better, they use a common >helper function for this readiness-signaling operation because.. > > b) we'll need some barriers there to ensure that the environ_read() >caller sees the create_*_tables() writes in the correct order. I totally agree that this kind of "synchronization" is rather fragile. Adding comments won't help much, I fear. Rather a dedicated flag, signaling "process ready for inspection" may be needed. So far, that's what env_end is (ab-)used for. Regards, Mathias
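For reference, the barrier pairing Andrew asks for would look roughly like this. The sketch below is userspace C11 with made-up names — not the kernel's create_*_tables() code, which would use smp_store_release()/smp_load_acquire() instead: the writer publishes a readiness flag with release semantics only after filling the fields, and the reader checks it with acquire semantics before trusting them.

```c
#include <stdatomic.h>

/* Toy model of the ordering problem: env fields are written first,
 * then a readiness flag is published.  The release store orders all
 * earlier plain writes; the paired acquire load guarantees a reader
 * that observes ready != 0 also observes the completed fields. */
struct toy_mm {
	unsigned long env_start;
	unsigned long env_end;
	atomic_int ready;
};

/* Writer side -- the execve path in the analogy. */
static void toy_publish_env(struct toy_mm *mm, unsigned long start,
			    unsigned long end)
{
	mm->env_start = start;
	mm->env_end = end;
	atomic_store_explicit(&mm->ready, 1, memory_order_release);
}

/* Reader side -- the environ_read() analogy: 0 means "not ready yet". */
static unsigned long toy_env_len(struct toy_mm *mm)
{
	if (!atomic_load_explicit(&mm->ready, memory_order_acquire))
		return 0;
	return mm->env_end - mm->env_start;
}
```

This is the "dedicated flag" variant: readiness is signaled explicitly rather than inferred from env_end's zero/non-zero state.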
Re: [PATCH] proc: prevent accessing /proc/<pid>/environ until it's ready
On 28 April 2016 at 23:26, Andrew Morton <a...@linux-foundation.org> wrote: > On Thu, 28 Apr 2016 21:04:18 +0200 Mathias Krause <mini...@googlemail.com> > wrote: > >> If /proc/<pid>/environ gets read before the envp[] array is fully set >> up in create_{aout,elf,elf_fdpic,flat}_tables(), we might end up trying >> to read more bytes than are actually written, as env_start will already >> be set but env_end will still be zero, making the range calculation >> underflow, allowing to read beyond the end of what has been written. >> >> Fix this as it is done for /proc/<pid>/cmdline by testing env_end for >> zero. It is, apparently, intentionally set last in create_*_tables(). >> >> This bug was found by the PaX size_overflow plugin that detected the >> arithmetic underflow of 'this_len = env_end - (env_start + src)' when >> env_end is still zero. > > So what are the implications of this? From my reading, a craftily > constructed application could occasionally read arbitrarily large > amounts of kernel memory? I don't think access_remote_vm() is capable of that. So, the only consequence is, userland trying to access /proc/<pid>/environ of a not yet fully set up process may get inconsistent data as we're in the middle of copying in the environment variables. Regards, Mathias
Re: [PATCH] proc: prevent accessing /proc/<pid>/environ until it's ready
On 28 April 2016 at 21:20, Mateusz Guzik wrote: > In this case get_cmdline in mm/util.c should also be patched for > completness. It tests for arg_end, but later accesses env_end. But it'll do this only when argv[] was modified from what the kernel initially wrote, which, in turn, either requires the process to have started executing and messing with its own argv[] or another process poking at it via ptrace(). In the former case env_end will be non-zero already and I don't know if the latter case is actually possible, i.e. if one can already attach to a process this early. If one can, then yes, that place needs to be modified, too. Thanks, Mathias
[PATCH] proc: prevent accessing /proc/<pid>/environ until it's ready
If /proc/<pid>/environ gets read before the envp[] array is fully set up in create_{aout,elf,elf_fdpic,flat}_tables(), we might end up trying to read more bytes than are actually written, as env_start will already be set but env_end will still be zero, making the range calculation underflow, allowing to read beyond the end of what has been written.

Fix this as it is done for /proc/<pid>/cmdline by testing env_end for zero. It is, apparently, intentionally set last in create_*_tables().

This bug was found by the PaX size_overflow plugin that detected the arithmetic underflow of 'this_len = env_end - (env_start + src)' when env_end is still zero.

Reported-at: https://forums.grsecurity.net/viewtopic.php?f=3=4363
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=116461
Signed-off-by: Mathias Krause <mini...@googlemail.com>
Cc: Emese Revfy <re.em...@gmail.com>
Cc: Pax Team <pagee...@freemail.hu>
Cc: Al Viro <v...@zeniv.linux.org.uk>
Cc: Mateusz Guzik <mgu...@redhat.com>
Cc: Alexey Dobriyan <adobri...@gmail.com>
Cc: Cyrill Gorcunov <gorcu...@openvz.org>
Cc: Jarod Wilson <ja...@redhat.com>
---
 fs/proc/base.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/fs/proc/base.c b/fs/proc/base.c
index 4f764c2ac1a5..45f2162e55b2 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -955,7 +955,8 @@ static ssize_t environ_read(struct file *file, char __user *buf,
 	struct mm_struct *mm = file->private_data;
 	unsigned long env_start, env_end;
 
-	if (!mm)
+	/* Ensure the process spawned far enough to have an environment. */
+	if (!mm || !mm->env_end)
 		return 0;
 
 	page = (char *)__get_free_page(GFP_TEMPORARY);
-- 
1.7.10.4
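The arithmetic the size_overflow plugin flagged is plain unsigned wraparound and can be reproduced in isolation. The sketch below is an illustrative userspace model, not the kernel code itself; the values are made up.

```c
/* Model of the offending calculation in environ_read(): with env_end
 * still zero but env_start already set, the unsigned subtraction wraps
 * around to a huge value instead of going negative, so a length check
 * against it passes when it should not. */
static unsigned long toy_this_len(unsigned long env_start,
				  unsigned long env_end,
				  unsigned long src)
{
	return env_end - (env_start + src);
}

/* The fix in a nutshell: a zero env_end means create_*_tables() has
 * not finished yet, so the environment must be treated as empty. */
static int toy_environ_ready(unsigned long env_end)
{
	return env_end != 0;
}
```

With env_end == 0 the "remaining length" wraps to nearly the full range of unsigned long, which is exactly the out-of-bounds read window the patch closes.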
Re: [kernel-hardening] [RFC][PATCH 0/3] Sanitization of buddy pages
On 25 January 2016 at 17:55, Laura Abbott wrote: > Hi, > > This is an implementation of page poisoning/sanitization for all arches. It > takes advantage of the existing implementation for > !ARCH_SUPPORTS_DEBUG_PAGEALLOC arches. This is a different approach than what > the Grsecurity patches were taking but should provide equivalent > functionality. > > For those who aren't familiar with this, the goal of sanitization is to reduce > the severity of use after free and uninitialized data bugs. Memory is cleared > on free so any sensitive data is no longer available. Discussion of > sanitization was brough up in a thread about CVEs > (lkml.kernel.org/g/<20160119112812.GA10818@mwanda>) > > I eventually expect Kconfig names will want to be changed and or moved if this > is going to be used for security but that can happen later. > > Credit to Mathias Krause for the version in grsecurity Thanks for the credits but I don't deserve them. I've contributed the slab based sanitization only. The page based one shipped in PaX and grsecurity is from the PaX Team. > > Laura Abbott (3): > mm/debug-pagealloc.c: Split out page poisoning from debug page_alloc > mm/page_poison.c: Enable PAGE_POISONING as a separate option > mm/page_poisoning.c: Allow for zero poisoning > > Documentation/kernel-parameters.txt | 5 ++ > include/linux/mm.h | 13 +++ > include/linux/poison.h | 4 + > mm/Kconfig.debug| 35 +++- > mm/Makefile | 5 +- > mm/debug-pagealloc.c| 127 + > mm/page_alloc.c | 10 ++- > mm/page_poison.c| 158 > > 8 files changed, 228 insertions(+), 129 deletions(-) > create mode 100644 mm/page_poison.c > > -- > 2.5.0 > Regards, Mathias
Re: [PATCH] ACPI / OSL: Add kerneldoc comments to memory mapping functions
On 2 January 2016 at 03:10, Rafael J. Wysocki wrote: > From: Rafael J. Wysocki > > Add kerneldoc comments to acpi_os_map_iomem() and acpi_os_unmap_iomem() > and explain why the latter needs the __ref annotation in one of them > (as suggested by Mathias Krause). > > Signed-off-by: Rafael J. Wysocki > --- > drivers/acpi/osl.c | 27 +++ > 1 file changed, 27 insertions(+) > > Index: linux-pm/drivers/acpi/osl.c > === > --- linux-pm.orig/drivers/acpi/osl.c > +++ linux-pm/drivers/acpi/osl.c > @@ -366,6 +366,19 @@ static void acpi_unmap(acpi_physical_add > iounmap(vaddr); > } > > +/** > + * acpi_os_map_iomem - Get a virtual address for a given physical address > range. > + * @phys: Start of the physical address range to map. > + * @size: Size of the physical address range to map. > + * > + * Look up the given physical address range in the list of existing ACPI > memory > + * mappings. If found, get a reference to it and return a pointer to it (its > + * virtual address). If not found, map it, add it to that list and return a > + * pointer to it. > + * > + * During early init (when acpi_gbl_permanent_mmap has not been set yet) this > + * routine simply calls __acpi_map_table() to get the job done. > + */ > void __iomem *__init_refok > acpi_os_map_iomem(acpi_physical_address phys, acpi_size size) > { > @@ -441,6 +454,20 @@ static void acpi_os_map_cleanup(struct a > } > } > > +/** > + * acpi_os_unmap_iomem - Drop a memory mapping reference. > + * @virt: Start of the address range to drop a reference to. > + * @size: Size of the address range to drop a reference to. > + * > + * Look up the given virtual address range in the list of existing ACPI > memory > + * mappings, drop a reference to it and unmap it if there are no more active > + * references to it. > + * > + * During early init (when acpi_gbl_permanent_mmap has not been set yet) this > + * routine simply calls __acpi_unmap_table() to get the job done. 
> + * Since __acpi_unmap_table() is an __init function, the __ref annotation is needed > + * here. > + */ > void __ref acpi_os_unmap_iomem(void __iomem *virt, acpi_size size) > { > struct acpi_ioremap *map; > Looks much better than my two-liner comment! Acked-by: Mathias Krause -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
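The lookup-then-refcount discipline the kerneldoc above describes can be modeled in a few lines of plain C. Everything below is an illustrative sketch with invented names — it is not the actual drivers/acpi/osl.c code, which tracks its ioremap'ed ranges in a list with proper locking:

```c
#include <stddef.h>

/* Toy stand-in for the ACPI ioremap cache: a fixed table of physical
 * ranges, each with a refcount.  `mapped` plays the role of the
 * ioremap'ed virtual address being valid. */
struct toy_mapping {
	unsigned long phys;
	unsigned long size;
	int refcount;
	int mapped;
};

/* Find an existing mapping covering [phys, phys + size). */
static struct toy_mapping *toy_lookup(struct toy_mapping *tab, size_t n,
				      unsigned long phys, unsigned long size)
{
	for (size_t i = 0; i < n; i++)
		if (tab[i].mapped && tab[i].phys <= phys &&
		    phys + size <= tab[i].phys + tab[i].size)
			return &tab[i];
	return NULL;
}

/* acpi_os_map_iomem()-style get: reuse an existing mapping and take a
 * reference, or create a new one. */
static struct toy_mapping *toy_map_get(struct toy_mapping *tab, size_t n,
				       unsigned long phys, unsigned long size)
{
	struct toy_mapping *m = toy_lookup(tab, n, phys, size);

	if (!m) {
		for (size_t i = 0; i < n && !m; i++)
			if (!tab[i].mapped)
				m = &tab[i];
		if (!m)
			return NULL;	/* table full */
		m->phys = phys;
		m->size = size;
		m->refcount = 0;
		m->mapped = 1;		/* the "ioremap" happens here */
	}
	m->refcount++;
	return m;
}

/* acpi_os_unmap_iomem()-style put: unmap only once the last user left. */
static void toy_map_put(struct toy_mapping *m)
{
	if (--m->refcount == 0)
		m->mapped = 0;		/* the "iounmap" happens here */
}
```

A second get of a range already covered by an existing mapping returns the same entry with a bumped refcount, which is the sharing behavior the kerneldoc documents.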
Re: [kernel-hardening] [RFC][PATCH 6/7] mm: Add Kconfig option for slab sanitization
On 22 December 2015 at 21:01, Christoph Lameter wrote: > On Tue, 22 Dec 2015, Mathias Krause wrote: > >> How many systems, do you think, are running with enabled DEBUG_SLAB / >> SLUB_DEBUG in production? Not so many, I'd guess. And the ones running >> into issues probably just disable DEBUG_SLAB / SLUB_DEBUG. > > All systems run with SLUB_DEBUG in production. SLUB_DEBUG causes the code > for debugging to be compiled in. Then it can be enabled later with a > command line parameter. Indeed, I meant CONFIG_SLUB_DEBUG_ON, i.e. compiled in and enabled SLAB cache debugging including poisoning. Regards, Mathias
Re: [kernel-hardening] [RFC][PATCH 6/7] mm: Add Kconfig option for slab sanitization
On 22 December 2015 at 18:51, Laura Abbott wrote: >> [snip] >> >> Related to this, have you checked that the sanitization doesn't >> interfere with the various slab handling schemes, namely RCU related >> specialties? Not all caches are marked SLAB_DESTROY_BY_RCU, some use >> call_rcu() instead, implicitly relying on the semantics RCU'ed slabs >> permit, namely allowing a "use-after-free" access to be legitimate >> within the RCU grace period. Scrubbing the object during that period >> would break that assumption. > > I haven't looked into that. I was working off the assumption that > if the regular SLAB debug poisoning worked so would the sanitization. > The regular debug poisoning only checks for SLAB_DESTROY_BY_RCU so > how does that work then? Maybe it doesn't? ;) How many systems, do you think, are running with enabled DEBUG_SLAB / SLUB_DEBUG in production? Not so many, I'd guess. And the ones running into issues probably just disable DEBUG_SLAB / SLUB_DEBUG. Btw, SLUB not only looks for SLAB_DESTROY_BY_RCU but also excludes "call_rcu slabs" via other mechanisms. As SLUB is the default SLAB allocator for quite some time now, even with enabled SLUB_DEBUG one wouldn't be able to trigger RCU related sanitization issues. >> Speaking of RCU, do you have a plan to support RCU'ed slabs as well? >> > > My only plan was to get the base support in. I didn't have a plan to > support RCU slabs but that's certainly something to be done in the > future. "Base support", in my opinion, includes covering the buddy allocator as well. Otherwise this feature is incomplete. Regards, Mathias
Re: [kernel-hardening] [RFC][PATCH 6/7] mm: Add Kconfig option for slab sanitization
On 22 December 2015 at 04:40, Laura Abbott wrote: > > The SL[AOU]B allocators all behave differently w.r.t. to what happen > an object is freed. CONFIG_SLAB_SANITIZATION introduces a common > mechanism to control what happens on free. When this option is > enabled, objects may be poisoned according to a combination of > slab_sanitization command line option and whether SLAB_NO_SANITIZE > is set on a cache. > > All credit for the original work should be given to Brad Spengler and > the PaX Team. > > Signed-off-by: Laura Abbott > --- > init/Kconfig | 36 > 1 file changed, 36 insertions(+) > > diff --git a/init/Kconfig b/init/Kconfig > index 235c7a2..37857f3 100644 > --- a/init/Kconfig > +++ b/init/Kconfig > @@ -1755,6 +1755,42 @@ config SLUB_CPU_PARTIAL > which requires the taking of locks that may cause latency spikes. > Typically one would choose no for a realtime system. > > +config SLAB_MEMORY_SANITIZE > + bool "Sanitize all freed memory" > + help > + By saying Y here the kernel will erase slab objects as soon as they > + are freed. This in turn reduces the lifetime of data > + stored in them, making it less likely that sensitive information > such > + as passwords, cryptographic secrets, etc stay in memory for too > long. > + > + This is especially useful for programs whose runtime is short, long > + lived processes and the kernel itself benefit from this as long as > + they ensure timely freeing of memory that may hold sensitive > + information. This part is not true. The code is handling SLAB objects only, so talking about processes in this context is misleading. Freeing memory in userland containing secrets cannot be covered by this feature as is. It needs a counter-part in the userland memory allocator as well as handling page sanitization in the buddy allocator. I guess you've just copy+pasted that Kconfig description from the PaX feature PAX_MEMORY_SANITIZE that also covers the buddy allocator, therefore fits that description while this patch set does not. 
So please adapt the text or implement the fully featured version. > + > + A nice side effect of the sanitization of slab objects is the > + reduction of possible info leaks caused by padding bytes within the > + leaky structures. Use-after-free bugs for structures containing > + pointers can also be detected as dereferencing the sanitized pointer > + will generate an access violation. > + > + The tradeoff is performance impact. The noticible impact can vary > + and you are advised to test this feature on your expected workload > + before deploying it > + > + The slab sanitization feature excludes a few slab caches per default > + for performance reasons. The level of sanitization can be adjusted > + with the sanitize_slab commandline option: > + sanitize_slab=off: No sanitization will occur > + santiize_slab=slow: Sanitization occurs only on the slow path > + for all but the excluded slabs > + (relevant for SLUB allocator only) > + sanitize_slab=partial: Sanitization occurs on all path for all > + but the excluded slabs > + sanitize_slab=full: All slabs are sanitize This should probably be moved to Documentation/kernel-parameters.txt, as can be found in the PaX patch[1]? > + > + If unsure, say Y here. Really? It has an unknown performance impact, depending on the workload, which might make "unsure users" preferably say No, if they don't care about info leaks. Related to this, have you checked that the sanitization doesn't interfere with the various slab handling schemes, namely RCU related specialties? Not all caches are marked SLAB_DESTROY_BY_RCU, some use call_rcu() instead, implicitly relying on the semantics RCU'ed slabs permit, namely allowing a "use-after-free" access to be legitimate within the RCU grace period. Scrubbing the object during that period would break that assumption. Speaking of RCU, do you have a plan to support RCU'ed slabs as well? 
> + > config MMAP_ALLOW_UNINITIALIZED > bool "Allow mmapped anonymous memory to be uninitialized" > depends on EXPERT && !MMU > -- > 2.5.0 > Regards, Mathias [1] https://github.com/minipli/linux-grsec/blob/v4.3.3-pax/Documentation/kernel-parameters.txt#L2689-L2696
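For readers unfamiliar with the feature under review, the core of sanitize-on-free is small. The sketch below is a userspace toy with invented names and a hypothetical poison byte — the real PaX/grsecurity and SLUB implementations differ in many details: the object is scrubbed when it is released, so stale secrets don't survive the free and a use-after-free reads poison instead of live pointers.

```c
#include <string.h>

#define TOY_POISON_FREE 0xfe	/* hypothetical poison byte */

/* Minimal cache descriptor: the per-cache opt-out flag mirrors the
 * SLAB_NO_SANITIZE exclusion discussed above. */
struct toy_cache {
	size_t obj_size;
	int sanitize;
};

/* Free-side hook: scrub the object before it would go back on the
 * freelist.  (A real allocator would link it onto the freelist next;
 * here the poisoned bytes are simply left in place for inspection.) */
static void toy_cache_free(struct toy_cache *c, void *obj)
{
	if (c->sanitize)
		memset(obj, TOY_POISON_FREE, c->obj_size);
}
```

This also makes the RCU concern above concrete: scrubbing at toy_cache_free() time is only safe if no reader may still legitimately dereference the object, which is exactly what an RCU grace period permits.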
Re: [kernel-hardening] [RFC][PATCH 6/7] mm: Add Kconfig option for slab sanitization
On 22 December 2015 at 04:40, Laura Abbott wrote:
>
> The SL[AOU]B allocators all behave differently w.r.t. what happens when
> an object is freed. CONFIG_SLAB_SANITIZATION introduces a common
> mechanism to control what happens on free. When this option is
> enabled, objects may be poisoned according to a combination of the
> slab_sanitization command line option and whether SLAB_NO_SANITIZE
> is set on a cache.
>
> All credit for the original work should be given to Brad Spengler and
> the PaX Team.
>
> Signed-off-by: Laura Abbott
> ---
>  init/Kconfig | 36
>  1 file changed, 36 insertions(+)
>
> diff --git a/init/Kconfig b/init/Kconfig
> index 235c7a2..37857f3 100644
> --- a/init/Kconfig
> +++ b/init/Kconfig
> @@ -1755,6 +1755,42 @@ config SLUB_CPU_PARTIAL
> 	  which requires the taking of locks that may cause latency spikes.
> 	  Typically one would choose no for a realtime system.
>
> +config SLAB_MEMORY_SANITIZE
> +	bool "Sanitize all freed memory"
> +	help
> +	  By saying Y here the kernel will erase slab objects as soon as they
> +	  are freed. This in turn reduces the lifetime of data
> +	  stored in them, making it less likely that sensitive information such
> +	  as passwords, cryptographic secrets, etc stay in memory for too long.
> +
> +	  This is especially useful for programs whose runtime is short, long
> +	  lived processes and the kernel itself benefit from this as long as
> +	  they ensure timely freeing of memory that may hold sensitive
> +	  information.

This part is not true. The code is handling slab objects only, so
talking about processes in this context is misleading. Freeing memory in
userland containing secrets cannot be covered by this feature as is. It
needs a counterpart in the userland memory allocator as well as handling
page sanitization in the buddy allocator.

I guess you've just copy+pasted that Kconfig description from the PaX
feature PAX_MEMORY_SANITIZE, which also covers the buddy allocator and
therefore fits that description, while this patch set does not. So
please adapt the text or implement the fully featured version.

> +
> +	  A nice side effect of the sanitization of slab objects is the
> +	  reduction of possible info leaks caused by padding bytes within the
> +	  leaky structures. Use-after-free bugs for structures containing
> +	  pointers can also be detected as dereferencing the sanitized pointer
> +	  will generate an access violation.
> +
> +	  The tradeoff is performance impact. The noticeable impact can vary
> +	  and you are advised to test this feature on your expected workload
> +	  before deploying it.
> +
> +	  The slab sanitization feature excludes a few slab caches per default
> +	  for performance reasons. The level of sanitization can be adjusted
> +	  with the sanitize_slab command line option:
> +	    sanitize_slab=off:     No sanitization will occur
> +	    sanitize_slab=slow:    Sanitization occurs only on the slow path
> +	                           for all but the excluded slabs
> +	                           (relevant for the SLUB allocator only)
> +	    sanitize_slab=partial: Sanitization occurs on all paths for all
> +	                           but the excluded slabs
> +	    sanitize_slab=full:    All slabs are sanitized

This should probably be moved to Documentation/kernel-parameters.txt, as
can be found in the PaX patch[1]?

> +
> +	  If unsure, say Y here.

Really? It has an unknown performance impact, depending on the workload,
which might make "unsure users" preferably say N if they don't care
about info leaks.

Related to this, have you checked that the sanitization doesn't
interfere with the various slab handling schemes, namely RCU related
specialties? Not all caches are marked SLAB_DESTROY_BY_RCU, some use
call_rcu() instead, implicitly relying on the semantics RCU'ed slabs
permit, namely allowing a "use-after-free" access to be legitimate
within the RCU grace period. Scrubbing the object during that period
would break that assumption.

Speaking of RCU, do you have a plan to support RCU'ed slabs as well?

> +
>  config MMAP_ALLOW_UNINITIALIZED
>  	bool "Allow mmapped anonymous memory to be uninitialized"
>  	depends on EXPERT && !MMU
> --
> 2.5.0
>

Regards,
Mathias

[1] https://github.com/minipli/linux-grsec/blob/v4.3.3-pax/Documentation/kernel-parameters.txt#L2689-L2696
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: [kernel-hardening] [RFC][PATCH 6/7] mm: Add Kconfig option for slab sanitization
On 22 December 2015 at 18:51, Laura Abbott wrote:
>> [snip]
>>
>> Related to this, have you checked that the sanitization doesn't
>> interfere with the various slab handling schemes, namely RCU related
>> specialties? Not all caches are marked SLAB_DESTROY_BY_RCU, some use
>> call_rcu() instead, implicitly relying on the semantics RCU'ed slabs
>> permit, namely allowing a "use-after-free" access to be legitimate
>> within the RCU grace period. Scrubbing the object during that period
>> would break that assumption.
>
> I haven't looked into that. I was working off the assumption that
> if the regular SLAB debug poisoning worked so would the sanitization.
> The regular debug poisoning only checks for SLAB_DESTROY_BY_RCU so
> how does that work then?

Maybe it doesn't? ;) How many systems, do you think, are running with
DEBUG_SLAB / SLUB_DEBUG enabled in production? Not so many, I'd guess.
And the ones running into issues probably just disable DEBUG_SLAB /
SLUB_DEBUG.

Btw, SLUB not only looks for SLAB_DESTROY_BY_RCU but also excludes
"call_rcu slabs" via other mechanisms. As SLUB has been the default slab
allocator for quite some time now, even with SLUB_DEBUG enabled one
wouldn't be able to trigger RCU related sanitization issues.

>> Speaking of RCU, do you have a plan to support RCU'ed slabs as well?
>>
>
> My only plan was to get the base support in. I didn't have a plan to
> support RCU slabs but that's certainly something to be done in the
> future.

"Base support", in my opinion, includes covering the buddy allocator as
well. Otherwise this feature is incomplete.

Regards,
Mathias
Re: [kernel-hardening] [PATCH 0/2] introduce post-init read-only memory
On 29 November 2015 at 16:39, Ingo Molnar wrote:
>
> * PaX Team wrote:
>
>> On 29 Nov 2015 at 9:08, Ingo Molnar wrote:
>>
>> >
>> > * PaX Team wrote:
>> >
>> > > i don't see the compile time vs. runtime detection as 'competing'
>> > > approaches, both have their own role. [...]
>> >
>> > That's true - but only as long as 'this can be solved in tooling!' is
>> > not used as an excuse to oppose the runtime solution and we end up
>> > doing neither.
>>
>> actually, i already voiced my opinion elsewhere in the constify thread
>> on the kernel hardening list that adding/using __read_only is somewhat
>> premature without also adding the compile time verification part (as
>> part of the constify plugin for example). right now its use on the
>> embedded vdso image is simple and easy to verify but once people begin
>> to add it to variables that the compiler knows and cares about (say,
>> the ops structures) then things can become fragile without compile
>> checking. so yes, i'd also advise to get such tooling in *before* more
>> __read_only usage is added.
>
> I think you are mistaken there: if we add the page fault fixup to make
> sure we don't crash if a read-only variable is accessed, then we'll have
> most of the benefits of read-only mappings and no fragility - without
> having to wait for tooling.

I guess the point PaX Team (and me earlier in this thread) wanted to
make is that having misuse detection *only* at run-time will make those
kinds of bugs visible too late -- as late as on the wrong write attempt
actually happening. It would be much better to have the compiler
complain about invalid write statements already during compilation, much
like it does when one wants to assign some value to a const object.

Having the page fault handler being able to recover from __ro_after_init
faults is a nice feature to support users, so they can actually report
such bugs. But it shouldn't be the only way to detect those kinds of
bugs.

In fact, we have tools in our toolchain that try to detect and flag
wrong usage of __init / __exit, so why not cover __ro_after_init as
well? Admittedly, that won't be possible with modpost in its current
state, but would require a compiler extension instead. Its current
non-existence shouldn't be a show-stopper for __ro_after_init but the
very next step to take care of before extending the usage of that
annotation.

Just my 2ct,
Mathias
Re: [kernel-hardening] [PATCH 0/2] introduce post-init read-only memory
On 24 November 2015 at 22:38, Kees Cook wrote:
> Many things are written to only during __init, and never changed
> again. These cannot be made "const" since the compiler will do the wrong
> thing (we do actually need to write to them). Instead, move these items
> into a memory region that will be made read-only during mark_rodata_ro()
> which happens after all kernel __init code has finished.
>
> This introduces __read_only as a way to mark such memory, and uses it on
> the x86 vDSO to kill an extant kernel exploitation method.

...just some random notes on the experience with kernels implementing
such a feature for quite a lot of locations, not just the vDSO.

While having that annotation makes perfect sense, not only from a
security perspective but also from a micro-optimization point of view
(much like the already existing __read_mostly annotation), it has its
drawbacks. Violating the "r/o after init" rule by writing to such
annotated variables from non-init code goes unnoticed as far as it
concerns the toolchain. Neither the compiler nor the linker will flag
that incorrect use. It'll just trap at runtime and that's bad.

I myself had some educating experience seeing my machine triple fault
when resuming from an S3 sleep. The root cause was a variable that was
annotated __read_only but was (unnecessarily) modified during the CPU
bring-up phase. Debugging that kind of problem is sort of a PITA, as you
can imagine.

So, prior to extending the usage of the __read_only annotation, some
toolchain support is needed. Maybe a gcc plugin that'll warn/error on
code that writes to such a variable but is not __init itself. The
initify and checker plugins from the PaX patch might be worth looking at
for that purpose, as they're doing similar things already. Adding such a
check to sparse might be worth it, too. A modpost check probably won't
work as it's unable to tell if it's a legitimate access (r/o) or a
violation (w/ access). So the gcc plugin is the way to go, IMHO.

Regards,
Mathias
[PATCH] printk: prevent userland from spoofing kernel messages
The following statement of ABI/testing/dev-kmsg is not quite right:

  It is not possible to inject messages from userspace with the
  facility number LOG_KERN (0), to make sure that the origin of the
  messages can always be reliably determined.

Userland actually can inject messages with a facility of 0 by abusing
the fact that the facility is stored in a u8 data type. By using a
facility which is a multiple of 256 the assignment of msg->facility in
log_store() implicitly truncates it to 0, i.e. LOG_KERN, allowing users
of /dev/kmsg to spoof kernel messages as shown below:

The following call...

  # printf '<%d>Kernel panic - not syncing: beer empty\n' 0 >/dev/kmsg

...leads to the following log entry (dmesg -x | tail -n 1):

  user  :emerg : [   66.137758] Kernel panic - not syncing: beer empty

However, this call...

  # printf '<%d>Kernel panic - not syncing: beer empty\n' 0x800 >/dev/kmsg

...leads to the slightly different log entry (note the kernel facility):

  kern  :emerg : [   74.177343] Kernel panic - not syncing: beer empty

Fix that by limiting the user provided facility to 8 bit right from the
beginning and catch the truncation early.

Fixes: 7ff9554bb578 ("printk: convert byte-buffer to variable-length...")
Signed-off-by: Mathias Krause
Cc: Greg Kroah-Hartman
Cc: Petr Mladek
Cc: Alex Elder
Cc: Joe Perches
Cc: Kay Sievers
---
Might be worth to apply to stable, too. Don't know. Prior to commit
7ff9554bb578 there was no facility, so user injected messages had an
implicit facility of LOG_KERN. But commit 7ff9554bb578 explicitly
mentions this feature, so, dunno :/

 kernel/printk/printk.c | 13 ++++++++-----
 1 file changed, 8 insertions(+), 5 deletions(-)

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 8f0324ef72ab..9982616ce712 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -269,6 +269,9 @@ static u32 clear_idx;
 #define PREFIX_MAX		32
 #define LOG_LINE_MAX		(1024 - PREFIX_MAX)

+#define LOG_LEVEL(v)		((v) & 0x07)
+#define LOG_FACILITY(v)		((v) >> 3 & 0xff)
+
 /* record buffer */
 #if defined(CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS)
 #define LOG_ALIGN 4
@@ -611,7 +614,6 @@ struct devkmsg_user {
 static ssize_t devkmsg_write(struct kiocb *iocb, struct iov_iter *from)
 {
 	char *buf, *line;
-	int i;
 	int level = default_message_loglevel;
 	int facility = 1;	/* LOG_USER */
 	size_t len = iov_iter_count(from);
@@ -641,12 +643,13 @@ static ssize_t devkmsg_write(struct kiocb *iocb, struct iov_iter *from)
 	line = buf;
 	if (line[0] == '<') {
 		char *endp = NULL;
+		unsigned int u;

-		i = simple_strtoul(line+1, &endp, 10);
+		u = simple_strtoul(line + 1, &endp, 10);
 		if (endp && endp[0] == '>') {
-			level = i & 7;
-			if (i >> 3)
-				facility = i >> 3;
+			level = LOG_LEVEL(u);
+			if (LOG_FACILITY(u) != 0)
+				facility = LOG_FACILITY(u);
 			endp++;
 			len -= endp - line;
 			line = endp;
--
1.7.10.4
Re: [PATCH v2 1/3] unix: fix use-after-free in unix_dgram_poll()
> 		return 0;
>
> out_unlock:
> @@ -1194,6 +1216,8 @@ restart:
>
> 	sock_hold(sk);
> 	unix_peer(newsk)	= sk;
> +	if (sk->sk_type == SOCK_SEQPACKET)
> +		add_wait_queue(&unix_sk(sk)->peer_wait,
> +			       &unix_sk(newsk)->wait);
> 	newsk->sk_state		= TCP_ESTABLISHED;
> 	newsk->sk_type		= sk->sk_type;
> 	init_peercred(newsk);
> @@ -1220,6 +1244,8 @@ restart:
>
> 	smp_mb__after_atomic();	/* sock_hold() does an atomic_inc() */
> 	unix_peer(sk) = newsk;
> +	if (sk->sk_type == SOCK_SEQPACKET)
> +		add_wait_queue(&unix_sk(newsk)->peer_wait,
> +			       &unix_sk(sk)->wait);
>
> 	unix_state_unlock(sk);
>
> @@ -1254,6 +1280,10 @@ static int unix_socketpair(struct socket *socka, struct socket *sockb)
> 	sock_hold(skb);
> 	unix_peer(ska) = skb;
> 	unix_peer(skb) = ska;
> +	if (ska->sk_type != SOCK_STREAM) {
> +		add_wait_queue(&unix_sk(ska)->peer_wait, &unix_sk(skb)->wait);
> +		add_wait_queue(&unix_sk(skb)->peer_wait, &unix_sk(ska)->wait);
> +	}
> 	init_peercred(ska);
> 	init_peercred(skb);
>
> @@ -1565,6 +1595,7 @@ restart:
> 	unix_state_lock(sk);
> 	if (unix_peer(sk) == other) {
> 		unix_peer(sk) = NULL;
> +		remove_wait_queue(&unix_sk(other)->peer_wait, &u->wait);
> 		unix_state_unlock(sk);
>
> 		unix_dgram_disconnected(sk, other);
> @@ -2441,7 +2472,6 @@ static unsigned int unix_dgram_poll(struct file *file, struct socket *sock,
> 	other = unix_peer_get(sk);
> 	if (other) {
> 		if (unix_peer(other) != sk) {
> -			sock_poll_wait(file, &unix_sk(other)->peer_wait, wait);
> 			if (unix_recvq_full(other))
> 				writable = 0;
> 		}
> --
> 1.8.2.rc2
>

My reproducer runs on this patch for more than 3 days now without
triggering anything anymore.

Tested-by: Mathias Krause

Thanks Jason!

Regards,
Mathias
Re: [PATCH v13 13/23] x86/asm/crypto: Create stack frames in aesni-intel_asm.S
On Do, Okt 01, 2015 at 08:29:50 -0500, Josh Poimboeuf wrote:
> On Thu, Oct 01, 2015 at 08:10:26AM +0200, mini...@ld-linux.so wrote:
> > On Tue, Sep 22, 2015 at 10:47:04AM -0500, Josh Poimboeuf wrote:
> > >  ENTRY(aesni_set_key)
> > > +	FRAME_BEGIN
> > >  #ifndef __x86_64__
> > >  	pushl KEYP
> > >  	movl 8(%esp), KEYP		# ctx
> >
> > This will break 32 bit builds using the aesni-intel.ko module. You need
> > to adjust the esp-based offsets for the non-x86_64 case, as FRAME_BEGIN
> > may do another push.
> >
> > How about adding a FRAME_OFFSET() macro to wrap the offsets?:
> >
> > #ifdef CONFIG_FRAME_POINTER
> > # define FRAME_OFFSET(x)	((x) + (BITS_PER_LONG / 8))
> > #else
> > # define FRAME_OFFSET(x)	(x)
> > #endif
> >
> > And using them like this:
> >
> > 	movl FRAME_OFFSET(8)(%esp), KEYP	# ctx
>
> Ah, right. The 32-bit ABI passes arguments on the stack instead of via
> registers.
>
> For now, I'm inclined to just make FRAME_BEGIN and FRAME_END do nothing
> on 32-bit. We're only doing stack validation on x86_64 and I don't know
> if anybody cares about frame pointers on 32-bit at this point.

Well, we had issues in the past, especially in that very module, but
only on 32 bit systems. So it would be nice to get frame pointers right
for 32 bit, too.

Regards,
Mathias
Re: List corruption on epoll_ctl(EPOLL_CTL_DEL) an AF_UNIX socket
On 30 September 2015 at 15:25, Rainer Weikusat wrote:
> Mathias Krause writes:
>> On 30 September 2015 at 12:56, Rainer Weikusat wrote:
>>> In case you want some information on this: This is a kernel warning I
>>> could trigger (more than once) on the single day I could so far spend
>>> looking into this (3.2.54 kernel):
>>>
>>> Sep 15 19:37:19 doppelsaurus kernel: WARNING: at lib/list_debug.c:53 list_del+0x9/0x30()
>>> Sep 15 19:37:19 doppelsaurus kernel: Hardware name: 500-330nam
>>> Sep 15 19:37:19 doppelsaurus kernel: list_del corruption. prev->next should be 88022c38f078, but was dead00100100
>>> [snip]
>>
>> Is that with Jason's patch or a vanilla v3.2.54?
>
> That's a kernel warning which occurred repeatedly (among other "link
> pointer disorganization" warnings) when I tested the "program with
> unknown behaviour" you wrote with the kernel I'm currently supporting a
> while ago (as I already wrote in the original mail).

So I assume Jason's patch is not included in your kernel. Then those
messages are expected; expected even on kernels as old as v2.6.27.

Can you re-try with Jason's patch applied?

Thanks,
Mathias
Re: List corruption on epoll_ctl(EPOLL_CTL_DEL) an AF_UNIX socket
On 30 September 2015 at 12:56, Rainer Weikusat wrote:
> Mathias Krause writes:
>> On 29 September 2015 at 21:09, Jason Baron wrote:
>>> However, if we call connect on socket 's', to connect to a new socket
>>> 'o2', we drop the reference on the original socket 'o'. Thus, we can
>>> now close socket 'o' without unregistering from epoll. Then, when we
>>> either close the ep or unregister 'o', we end up with this list
>>> corruption. Thus, this is not a race per se, but can be triggered
>>> sequentially.
>>
>> Sounds profound, but the reproducer calls connect only once per
>> socket. So there is no "connect to a new socket", no?
>> But w/e, see below.
>
> In case you want some information on this: This is a kernel warning I
> could trigger (more than once) on the single day I could so far spend
> looking into this (3.2.54 kernel):
>
> Sep 15 19:37:19 doppelsaurus kernel: WARNING: at lib/list_debug.c:53 list_del+0x9/0x30()
> Sep 15 19:37:19 doppelsaurus kernel: Hardware name: 500-330nam
> Sep 15 19:37:19 doppelsaurus kernel: list_del corruption. prev->next should be 88022c38f078, but was dead00100100
> [snip]

Is that with Jason's patch or a vanilla v3.2.54?

Regards,
Mathias
Re: List corruption on epoll_ctl(EPOLL_CTL_DEL) an AF_UNIX socket
On 30 September 2015 at 15:25, Rainer Weikusat
<rweiku...@mobileactivedefense.com> wrote:
> Mathias Krause <mini...@googlemail.com> writes:
>> On 30 September 2015 at 12:56, Rainer Weikusat
>> <rweiku...@mobileactivedefense.com> wrote:
>>> In case you want some information on this: This is a kernel warning I
>>> could trigger (more than once) on the single day I could so far spend
>>> looking into this (3.2.54 kernel):
>>>
>>> Sep 15 19:37:19 doppelsaurus kernel: WARNING: at lib/list_debug.c:53
>>> list_del+0x9/0x30()
>>> Sep 15 19:37:19 doppelsaurus kernel: Hardware name: 500-330nam
>>> Sep 15 19:37:19 doppelsaurus kernel: list_del corruption. prev->next
>>> should be 88022c38f078, but was dead00100100
>>> [snip]
>>
>> Is that with Jason's patch or a vanilla v3.2.54?
>
> That's a kernel warning which occurred repeatedly (among other "link
> pointer disorganization" warnings) when I tested the "program with
> unknown behaviour" you wrote with the kernel I'm currently supporting a
> while ago (as I already wrote in the original mail).

So I assume Jason's patch is not included in your kernel. Then those
messages are expected; expected even on kernels as old as v2.6.27.

Can you re-try with Jason's patch applied?

Thanks, Mathias
Re: List corruption on epoll_ctl(EPOLL_CTL_DEL) an AF_UNIX socket
On 29 September 2015 at 21:09, Jason Baron wrote:
> However, if we call connect on socket 's', to connect to a new socket
> 'o2', we drop the reference on the original socket 'o'. Thus, we can now
> close socket 'o' without unregistering from epoll. Then, when we either
> close the ep or unregister 'o', we end up with this list corruption.
> Thus, this is not a race per se, but can be triggered sequentially.

Sounds profound, but the reproducer calls connect only once per
socket. So there is no "connect to a new socket", no?
But w/e, see below.

> Linus explains the general case in the context the signalfd stuff here:
> https://lkml.org/lkml/2013/10/14/634

I also found that posting while looking for similar bug reports. Also
found that one: https://lkml.org/lkml/2014/5/15/532

> So this may be the case that we've been chasing here for a while...

That bug triggers since commit 3c73419c09 "af_unix: fix 'poll for
write'/connected DGRAM sockets". That's v2.6.26-rc7, as noted in the
reproducer.

>
> In any case, we could fix with that same POLLFREE mechanism, the
> simplest would be something like:
>
> diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
> index 03ee4d3..d499f81 100644
> --- a/net/unix/af_unix.c
> +++ b/net/unix/af_unix.c
> @@ -392,6 +392,9 @@ static void unix_sock_destructor(struct sock *sk)
>      pr_debug("UNIX %p is destroyed, %ld are still alive.\n", sk,
>          atomic_long_read(&unix_nr_socks));
>  #endif
> +    /* make sure we remove from epoll */
> +    wake_up_poll(&u->peer_wait, POLLFREE);
> +    synchronize_sched();
>  }
>
>  static void unix_release_sock(struct sock *sk, int embrion)
>
> I'm not suggesting we apply that, but that fixed the supplied test case.
> We could enhance the above, to avoid the free'ing latency there by doing
> the SLAB_DESTROY_BY_RCU for unix sockets. But I'm not convinced
> that this wouldn't be still broken for select()/poll() as well. I think
> we can be in a select() call for socket 's', and if we remove socket
> 'o' from it in the meantime (by doing a connect() on s to somewhere else
> and a close on 'o'), I think we can still crash there. So POLLFREE would
> have to be extended. I tried to hit this with select() but could not,
> but I think if I tried harder I could.
>
> Instead of going further down that route, perhaps something like below
> might be better. The basic idea would be to do away with the 'other'
> poll call in unix_dgram_poll(), and instead revert back to a registering
> on a single wait queue. We add a new wait queue to unix sockets such
> that we can register it with a remote other on connect(). Then we can
> use the wakeup from the remote to wake up the registered unix socket.
> Probably better explained with the patch below. Note I didn't add to
> the remote for SOCK_STREAM, since the poll() routine there doesn't do
> the double wait queue registering:
>
> diff --git a/include/net/af_unix.h b/include/net/af_unix.h
> index 4a167b3..9698aff 100644
> --- a/include/net/af_unix.h
> +++ b/include/net/af_unix.h
> @@ -62,6 +62,7 @@ struct unix_sock {
>  #define UNIX_GC_CANDIDATE   0
>  #define UNIX_GC_MAYBE_CYCLE 1
>      struct socket_wq    peer_wq;
> +    wait_queue_t        wait;
>  };
>  #define unix_sk(__sk) ((struct unix_sock *)__sk)
>
> diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
> index 03ee4d3..9e0692a 100644
> --- a/net/unix/af_unix.c
> +++ b/net/unix/af_unix.c
> @@ -420,6 +420,8 @@ static void unix_release_sock(struct sock *sk, int embrion)
>      skpair = unix_peer(sk);
>
>      if (skpair != NULL) {
> +        if (sk->sk_type != SOCK_STREAM)
> +            remove_wait_queue(&unix_sk(skpair)->peer_wait,
> +                      &u->wait);
>          if (sk->sk_type == SOCK_STREAM || sk->sk_type == SOCK_SEQPACKET) {
>              unix_state_lock(skpair);
>              /* No more writes */
> @@ -636,6 +638,16 @@ static struct proto unix_proto = {
>   */
>  static struct lock_class_key af_unix_sk_receive_queue_lock_key;
>
> +static int peer_wake(wait_queue_t *wait, unsigned mode,
> +             int sync, void *key)
> +{
> +    struct unix_sock *u;
> +
> +    u = container_of(wait, struct unix_sock, wait);
> +    wake_up_interruptible_sync_poll(sk_sleep(&u->sk), key);
> +
> +    return 0;
> +}
> +
>  static struct sock *unix_create1(struct net *net, struct socket *sock, int kern)
>  {
>      struct sock *sk = NULL;
> @@ -664,6 +676,7 @@ static struct sock *unix_create1(struct net *net, struct socket *sock, int kern)
>      INIT_LIST_HEAD(&u->link);
>      mutex_init(&u->readlock); /* single task reading lock */
>      init_waitqueue_head(&u->peer_wait);
> +    init_waitqueue_func_entry(&u->wait, peer_wake);
>      unix_insert_socket(unix_sockets_unbound(sk), sk);
>  out:
>      if (sk == NULL)
> @@ -1030,7 +1043,10 @@ restart:
>       */
>      if (unix_peer(sk)) {
>          struct sock *old_peer = unix_peer(sk);
> +
> +