Re: The killing of ideal_nops[]

2021-03-09 Thread hpa
On March 9, 2021 1:24:44 PM PST, Peter Zijlstra  wrote:
>On Tue, Mar 09, 2021 at 12:05:19PM -0500, Steven Rostedt wrote:
>> On Tue, 9 Mar 2021 17:58:17 +0100
>> Peter Zijlstra  wrote:
>> 
>> > Hi,
>> > 
>> > AFAICT everything made in the past 10 years ends up using p6_nops.
>> > Is it time to kill off ideal_nops[] and simplify life?
>> > 
>> 
>> Well, the one bug that was reported recently was due to a box that uses a
>> different "ideal_nops" than p6_nops. Perhaps we should ask him if there's
>> any noticeable difference between using p6_nops for every function rather
>> than the ideal_nops that was found for that box.
>
>If the machine is more than a decade old, I'm not really caring about
>optimal performance. If it is 32bit, I really couldn't be arsed as long
>as it boots.

p6_nops don't boot on all 32-bit chips.
-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.


Re: [PATCH v1 1/3] x86/cpufeatures: Add low performance CRC32C instruction CPU feature

2021-01-11 Thread hpa
On January 6, 2021 10:37:50 PM PST, Borislav Petkov  wrote:
>On Thu, Jan 07, 2021 at 02:19:06PM +0800, Tony W Wang-oc wrote:
>> SSE4.2 on Zhaoxin CPUs is compatible with Intel. The presence of the
>> CRC32C instruction is enumerated by CPUID.01H:ECX.SSE4_2[bit 20] = 1.
>> Some Zhaoxin CPUs declare support for the SSE4.2 instruction set, but
>> their CRC32C instruction works with low performance.
>> 
>> Add a synthetic CPU flag to indicate that the CRC32C instruction is
>> not working as intended. This low-performance CRC32C instruction flag
>> depends on X86_FEATURE_XMM4_2.
>> 
>> Signed-off-by: Tony W Wang-oc 
>> ---
>>  arch/x86/include/asm/cpufeatures.h | 1 +
>>  arch/x86/kernel/cpu/cpuid-deps.c   | 1 +
>>  2 files changed, 2 insertions(+)
>> 
>> diff --git a/arch/x86/include/asm/cpufeatures.h
>b/arch/x86/include/asm/cpufeatures.h
>> index 84b8878..9e8151b 100644
>> --- a/arch/x86/include/asm/cpufeatures.h
>> +++ b/arch/x86/include/asm/cpufeatures.h
>> @@ -292,6 +292,7 @@
>>  #define X86_FEATURE_FENCE_SWAPGS_KERNEL (11*32+ 5) /* "" LFENCE in kernel entry SWAPGS path */
>>  #define X86_FEATURE_SPLIT_LOCK_DETECT   (11*32+ 6) /* #AC for split lock */
>>  #define X86_FEATURE_PER_THREAD_MBA  (11*32+ 7) /* "" Per-thread Memory Bandwidth Allocation */
>> +#define X86_FEATURE_CRC32C  (11*32+ 8) /* "" Low performance CRC32C instruction */
>
>Didn't hpa say to create a BUG flag for it - X86_BUG...? Low performance
>insn sounds like a bug and not a feature to me.
>
>And call it X86_BUG_CRC32C_SLOW or ..._UNUSABLE to denote what it means.
>
>Thx.

Yes, it should be a BUG due to the inclusion properties of BUG v FEATURE.
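
To illustrate (the flag name, bit number, and model check below are placeholders, not the actual patch): a bug bit is defined once and set from the vendor init path only on affected parts, so nothing else has to know about CPU models:

/* Illustrative sketch only -- flag name and bit number are made up. */
#define X86_BUG_CRC32C_UNUSABLE		X86_BUG(25) /* CRC32C too slow to be useful */

static void init_zhaoxin(struct cpuinfo_x86 *c)
{
	if (cpu_has_slow_crc32c(c))	/* hypothetical model check */
		set_cpu_bug(c, X86_BUG_CRC32C_UNUSABLE);
}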
-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.


Re: [PATCH] arch/x86: Propagate $(CLANG_FLAGS) to $(REALMODE_FLAGS)

2020-12-25 Thread hpa
On December 25, 2020 11:29:30 PM PST, John Millikin  wrote:
>When compiling with Clang, the `$(CLANG_FLAGS)' variable contains
>additional flags needed to cross-compile C and Assembly sources:
>
>* `-no-integrated-as' tells clang to assemble with GNU Assembler
>  instead of its built-in LLVM assembler. This flag is set by default
>  unless `LLVM_IAS=1' is set, because the LLVM assembler can't yet
>  parse certain GNU extensions.
>
>* `--target' sets the target architecture when cross-compiling. This
>  flag must be set for both compilation and assembly (`KBUILD_AFLAGS')
>  to support architecture-specific assembler directives.
>
>Signed-off-by: John Millikin 
>---
> arch/x86/Makefile | 5 +
> 1 file changed, 5 insertions(+)
>
>diff --git a/arch/x86/Makefile b/arch/x86/Makefile
>index 7116da3980be..725c65532482 100644
>--- a/arch/x86/Makefile
>+++ b/arch/x86/Makefile
>@@ -33,6 +33,11 @@ REALMODE_CFLAGS += -ffreestanding
> REALMODE_CFLAGS += -fno-stack-protector
> REALMODE_CFLAGS += $(call __cc-option, $(CC), $(REALMODE_CFLAGS),
>-Wno-address-of-packed-member)
> REALMODE_CFLAGS += $(call __cc-option, $(CC), $(REALMODE_CFLAGS),
>$(cc_stack_align4))
>+
>+ifdef CONFIG_CC_IS_CLANG
>+REALMODE_CFLAGS += $(CLANG_FLAGS)
>+endif
>+
> export REALMODE_CFLAGS
> 
> # BITS is used as extension for files which are available in a 32 bit

Why is CLANG_FLAGS non-null when unused? It would be better to centralize that.
-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.


Re: [PATCH] crypto: x86/crc32c-intel - Don't match some Zhaoxin CPUs

2020-12-21 Thread hpa
On December 21, 2020 7:01:39 PM PST, tonywwang...@zhaoxin.com wrote:
>On December 22, 2020 3:27:33 AM GMT+08:00, h...@zytor.com wrote:
>>On December 20, 2020 6:46:25 PM PST, tonywwang...@zhaoxin.com wrote:
>>>On December 16, 2020 1:56:45 AM GMT+08:00, Eric Biggers
>>> wrote:
On Tue, Dec 15, 2020 at 10:15:29AM +0800, Tony W Wang-oc wrote:
> 
> On 15/12/2020 04:41, Eric Biggers wrote:
> > On Mon, Dec 14, 2020 at 10:28:19AM +0800, Tony W Wang-oc wrote:
> >> On 12/12/2020 01:43, Eric Biggers wrote:
> >>> On Fri, Dec 11, 2020 at 07:29:04PM +0800, Tony W Wang-oc
>wrote:
>  The driver crc32c-intel match CPUs supporting
X86_FEATURE_XMM4_2.
>  On platforms with Zhaoxin CPUs supporting this X86 feature,
>>>When
>  crc32c-intel and crc32c-generic are both registered, system
>>>will
>  use crc32c-intel because its .cra_priority is greater than
>  crc32c-generic. This case expect to use crc32c-generic driver
for
>  some Zhaoxin CPUs to get performance gain, So remove these
Zhaoxin
>  CPUs support from crc32c-intel.
> 
>  Signed-off-by: Tony W Wang-oc 
> >>>
> >>> Does this mean that the performance of the crc32c instruction
>>on
those CPUs is
> >>> actually slower than a regular C implementation?  That's very
weird.
> >>>
> >>
> >> From the lmbench3 Create and Delete file test on those chips, I
think yes.
> >>
> > 
> > Did you try measuring the performance of the hashing itself, and
not some
> > higher-level filesystem operations?
> > 
> 
> Yes. Was testing on these Zhaoxin CPUs, the result is that with
>the
same
> input value the generic C implementation takes fewer time than the
> crc32c instruction implementation.
> 

And that is really "working as intended"?
>>>
>>>These CPU's crc32c instruction is not working as intended.
>>>
>>>  Why do these CPUs even
declare that
they support the crc32c instruction, when it is so slow?

>>>
>>>Support for crc32c and some other instructions is enumerated by
>>>CPUID.01:ECX[SSE4.2] = 1; the other instructions are OK, except for
>>>the crc32c instruction.
>>>
Are there any other instruction sets (AES-NI, PCLMUL, SSE, SSE2,
>AVX,
etc.) that
these CPUs similarly declare support for but they are uselessly
>slow?
>>>
>>>No.
>>>
>>>Sincerely
>>>Tonyw
>>>

- Eric
>>
>>Then the right thing to do is to disable the CPUID bit in the
>>vendor-specific startup code.
>
>That would make these CPUs not support any of the instruction sets
>enumerated by CPUID.01:ECX[SSE4.2].
>Since only the crc32c instruction is slow, the expectation is just that
>the crc32c-intel driver does not match these CPUs.
>
>Sincerely
>Tonyw

Then create a BUG flag for it, or factor out CRC32C into a synthetic flag. We 
*do not* bury this information in drivers; it becomes a recipe for the same 
problems over and over.
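
For illustration, assuming such a bug flag exists (the name below is hypothetical), the driver-side check stays a one-liner and no model knowledge leaks into crypto code -- a sketch, not the actual patch:

static int __init crc32c_intel_mod_init(void)
{
	if (!x86_match_cpu(crc32c_cpu_id))
		return -ENODEV;
	/* hypothetical bug flag, set by the vendor init code */
	if (boot_cpu_has_bug(X86_BUG_CRC32C_UNUSABLE))
		return -ENODEV;
	return crypto_register_shash(&alg);
}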
-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.


Re: [PATCH] crypto: x86/crc32c-intel - Don't match some Zhaoxin CPUs

2020-12-21 Thread hpa
On December 20, 2020 6:46:25 PM PST, tonywwang...@zhaoxin.com wrote:
>On December 16, 2020 1:56:45 AM GMT+08:00, Eric Biggers
> wrote:
>>On Tue, Dec 15, 2020 at 10:15:29AM +0800, Tony W Wang-oc wrote:
>>> 
>>> On 15/12/2020 04:41, Eric Biggers wrote:
>>> > On Mon, Dec 14, 2020 at 10:28:19AM +0800, Tony W Wang-oc wrote:
>>> >> On 12/12/2020 01:43, Eric Biggers wrote:
>>> >>> On Fri, Dec 11, 2020 at 07:29:04PM +0800, Tony W Wang-oc wrote:
>>>  The driver crc32c-intel match CPUs supporting
>>X86_FEATURE_XMM4_2.
>>>  On platforms with Zhaoxin CPUs supporting this X86 feature,
>When
>>>  crc32c-intel and crc32c-generic are both registered, system
>will
>>>  use crc32c-intel because its .cra_priority is greater than
>>>  crc32c-generic. This case expect to use crc32c-generic driver
>>for
>>>  some Zhaoxin CPUs to get performance gain, So remove these
>>Zhaoxin
>>>  CPUs support from crc32c-intel.
>>> 
>>>  Signed-off-by: Tony W Wang-oc 
>>> >>>
>>> >>> Does this mean that the performance of the crc32c instruction on
>>those CPUs is
>>> >>> actually slower than a regular C implementation?  That's very
>>weird.
>>> >>>
>>> >>
>>> >> From the lmbench3 Create and Delete file test on those chips, I
>>think yes.
>>> >>
>>> > 
>>> > Did you try measuring the performance of the hashing itself, and
>>not some
>>> > higher-level filesystem operations?
>>> > 
>>> 
>>> Yes. Was testing on these Zhaoxin CPUs, the result is that with the
>>same
>>> input value the generic C implementation takes fewer time than the
>>> crc32c instruction implementation.
>>> 
>>
>>And that is really "working as intended"?
>
>These CPU's crc32c instruction is not working as intended.
>
>  Why do these CPUs even
>>declare that
>>they support the crc32c instruction, when it is so slow?
>>
>
>Support for crc32c and some other instructions is enumerated by
>CPUID.01:ECX[SSE4.2] = 1; the other instructions are OK, except for
>the crc32c instruction.
>
>>Are there any other instruction sets (AES-NI, PCLMUL, SSE, SSE2, AVX,
>>etc.) that
>>these CPUs similarly declare support for but they are uselessly slow?
>
>No.
>
>Sincerely
>Tonyw
>
>>
>>- Eric

Then the right thing to do is to disable the CPUID bit in the vendor-specific 
startup code.
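
A minimal sketch of that alternative, assuming a hypothetical helper that identifies the affected models (note this hides SSE4.2 entirely, which is the trade-off discussed in this thread):

static void early_init_zhaoxin(struct cpuinfo_x86 *c)
{
	if (zhaoxin_has_slow_crc32c(c))	/* hypothetical model check */
		setup_clear_cpu_cap(X86_FEATURE_XMM4_2);
}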
-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.


Re: [PATCHSET] saner elf compat

2020-12-06 Thread hpa
On December 5, 2020 7:23:05 PM PST, Al Viro  wrote:
>On Thu, Dec 03, 2020 at 11:03:36PM +, Al Viro wrote:
>> > >  The answer (for mainline) is that mips compat does *NOT* want
>> > > COMPAT_BINFMT_ELF.  Not a problem with that series, though, so
>I'd
>> > > retested it (seems to work, both for x86_64 and mips64, execs and
>> > > coredumps for all ABIs alike), with centralization of Kconfig
>logics
>> > > thrown in.
>> > 
>> > Well, the diffstat looks nice:
>> > 
>> > >  26 files changed, 127 insertions(+), 317 deletions(-)
>> > 
>> > and the patches didn't trigger anything for me, but how much did
>this
>> > get tested? Do you actually have both kinds of 32-bit elf mips
>> > binaries around and a machine to test on?
>> 
>> Yes (aptitude install gcc-multilib on debian mips64el/stretch sets
>the toolchain
>> and libraries just fine, and then it's just a matter of -mabi=n32
>passed
>> to gcc).  "Machine" is qemu-system-mips64el -machine malta -m 1024
>-cpu 5KEc
>> and the things appear to work; I hadn't tried that on the actual
>hardware.
>> I do have a Loongson-2 box, but it would take a while to dig it out
>and
>> get it up-to-date.
>> 
>> > Linux-mips was cc'd, but I'm adding Thomas B to the cc here
>explicitly
>> > just so that he has a heads-up on this thing and can go and look at
>> > the mailing list in case it goes to a separate mailbox for him..
>> 
>> I would certainly appreciate review and testing - this branch sat
>> around in the "should post it someday" state since June (it was
>> one of the followups grown from regset work back then), and I'm
>> _not_ going to ask pulling it without an explicit OK from mips
>> folks.
>
>BTW, there's something curious going on in ELF binary recognition for
>x32.  Unlike other 64bit architectures, here we have a 32bit binary
>that successfully passes the native elf_check_arch().  Usually we
>either have different EM_... values for 64bit and 32bit (e.g. for ppc
>and sparc) or we have an explicit check for ->e_ident[EI_CLASS]
>having the right value (ELFCLASS32 or ELFCLASS64 for 32bit and 64bit
>binaries resp.)
>
>For x32 that's not true - we use EM_X86_64 for ->e_machine and that's
>the only thing the native elf_check_arch() is looking at.  IOW,
>it looks like amd64 elf_load_binary() progresses past elf_check_arch()
>for x32 binaries.  And gets to load_elf_phdrs(), which would appear
>to have a check of its own that should reject the sucker:
>/*
> * If the size of this structure has changed, then punt, since
> * we will be doing the wrong thing.
> */
>if (elf_ex->e_phentsize != sizeof(struct elf_phdr))
>goto out;
>After all, ->e_phentsize is going to be 32 (sizeof(struct elf32_phdr)
>rather than expected 56 (sizeof(struct elf64_phdr)) and off we bugger,
>even though it happens at slightly later point than usual.  Except that
>we are looking at struct elf64_hdr ->e_phentsize - in struct elf32_hdr.
>I.e. at offset 54, two bytes past the end of in-file struct elf32_hdr.
>
>Usually we won't find 0x38 0x00 in that location, so everything works,
>but IMO that's too convoluted.
>
>Peter, is there any reason not to check ->ei_ident[EI_CLASS] in
>amd64 elf_check_arch()?  It's a 1-byte load from hot cacheline
>(offset 4 and we'd just read the 4 bytes at offsets 0..3) +
>compare + branch not taken, so performance impact is pretty much
>nil.  I'm not saying it's a security problem or anything of that
>sort, just that it makes the analysis more subtle than it ought
>to be...
>
>Is it about some malformed homegrown 64bit binaries with BS value
>at offset 4?  Confused...

I can't think of any.
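
For reference, the native x86-64 elf_check_arch() only compares e_machine today; the check being proposed amounts to something like this (illustrative sketch, not a merged change):

#define elf_check_arch(x)					\
	((x)->e_machine == EM_X86_64 &&				\
	 (x)->e_ident[EI_CLASS] == ELFCLASS64)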
-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.


Re: [PATCH] x86, build: remove -m16 workaround for unsupported versions of GCC

2020-12-01 Thread hpa
On November 30, 2020 5:13:06 PM PST, Nick Desaulniers  
wrote:
>A revert of the following two commits.
>commit de3accdaec88 ("x86, build: Build 16-bit code with -m16 where
>possible")
>commit a9cfccee6604 ("x86, build: Change code16gcc.h from a C header to
>an assembly header")
>
>Since commit 0bddd227f3dc ("Documentation: update for gcc 4.9
>requirement") the minimum supported version of GCC is gcc-4.9.  It's
>now
>safe to remove this code.
>
>Signed-off-by: Nick Desaulniers 
>---
> arch/x86/Makefile |  9 +
> arch/x86/boot/code16gcc.h | 12 
> 2 files changed, 1 insertion(+), 20 deletions(-)
> delete mode 100644 arch/x86/boot/code16gcc.h
>
>diff --git a/arch/x86/Makefile b/arch/x86/Makefile
>index 1bf21746f4ce..7116da3980be 100644
>--- a/arch/x86/Makefile
>+++ b/arch/x86/Makefile
>@@ -24,14 +24,7 @@ endif
> 
># How to compile the 16-bit code.  Note we always compile for
>-march=i386;
> # that way we can complain to the user if the CPU is insufficient.
>-#
>-# The -m16 option is supported by GCC >= 4.9 and clang >= 3.5. For
>-# older versions of GCC, include an *assembly* header to make sure
>that
>-# gcc doesn't play any games behind our back.
>-CODE16GCC_CFLAGS := -m32 -Wa,$(srctree)/arch/x86/boot/code16gcc.h
>-M16_CFLAGS := $(call cc-option, -m16, $(CODE16GCC_CFLAGS))
>-
>-REALMODE_CFLAGS   := $(M16_CFLAGS) -g -Os -DDISABLE_BRANCH_PROFILING \
>+REALMODE_CFLAGS   := -m16 -g -Os -DDISABLE_BRANCH_PROFILING \
>  -Wall -Wstrict-prototypes -march=i386 -mregparm=3 \
>  -fno-strict-aliasing -fomit-frame-pointer -fno-pic \
>  -mno-mmx -mno-sse
>diff --git a/arch/x86/boot/code16gcc.h b/arch/x86/boot/code16gcc.h
>deleted file mode 100644
>index e19fd7536307..
>--- a/arch/x86/boot/code16gcc.h
>+++ /dev/null
>@@ -1,12 +0,0 @@
>-/* SPDX-License-Identifier: GPL-2.0 */
>-#
>-# code16gcc.h
>-#
>-# This file is added to the assembler via -Wa when compiling 16-bit C
>code.
>-# This is done this way instead via asm() to make sure gcc does not
>reorder
>-# things around us.
>-#
>-# gcc 4.9+ has a real -m16 option so we can drop this hack long term.
>-#
>-
>-  .code16gcc

With enthusiasm:

Acked-by: H. Peter Anvin 
-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.


RE: [PATCH] x86/uaccess: fix code generation in put_user()

2020-10-23 Thread hpa
On October 23, 2020 2:52:16 PM PDT, David Laight  
wrote:
>From: Linus Torvalds
>> Sent: 23 October 2020 22:11
>> 
>> On Fri, Oct 23, 2020 at 2:00 PM  wrote:
>> >
>> > There is no sane reason to mess around with hacks when we are
>> > talking about dx:ax, though.
>> 
>> Sure there is.
>> 
>> "A" doesn't actually mean %edx:%eax like you seem to think.
>> 
>> It actually means %eax OR %edx, and then if given a 64-bit value, it
>> will use the combination (with %edx being the high bits).
>> 
>> So using "A" unconditionally doesn't work - it gives random behavior
>> for 32-bit (or smaller) types.
>> 
>> Or you'd have to cast the value to always be 64-bit, and have the
>> extra code generation.
>> 
>> IOW, an unconditional "A" is wrong.
>> 
>> And the alternative is to just duplicate things, and go back to the
>> explicit size testing, but honestly, I really think that's much worse
>> than relying on a documented feature of "register asm()" that gcc
>> _documents_ is for this kind of inline asm use.
>
>Could do_put_user() do an initial check for 64 bit
>then expand a different #define that contains the actual
>code passing either "a" or "A" for the constriant.
>
>Apart from another level of indirection nothing is duplicated.
>
>   David
>
>-
>Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes,
>MK1 1PT, UK
>Registration No: 1397386 (Wales)

Maybe #define ASM_AX64 or some such.
-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.


Re: [PATCH] x86/uaccess: fix code generation in put_user()

2020-10-23 Thread hpa
On October 23, 2020 2:11:19 PM PDT, Linus Torvalds 
 wrote:
>On Fri, Oct 23, 2020 at 2:00 PM  wrote:
>>
>> There is no sane reason to mess around with hacks when we are talking
>> about dx:ax, though.
>
>Sure there is.
>
>"A" doesn't actually mean %edx:%eax like you seem to think.
>
>It actually means %eax OR %edx, and then if given a 64-bit value, it
>will use the combination (with %edx being the high bits).
>
>So using "A" unconditionally doesn't work - it gives random behavior
>for 32-bit (or smaller) types.
>
>Or you'd have to cast the value to always be 64-bit, and have the
>extra code generation.
>
>IOW, an unconditional "A" is wrong.
>
>And the alternative is to just duplicate things, and go back to the
>explicit size testing, but honestly, I really think that's much worse
>than relying on a documented feature of "register asm()" that gcc
>_documents_ is for this kind of inline asm use.
>
>So the "don't do pointless conditional duplication" is certainly a
>very sane reason for the code.
>
>Linus

Unconditional "A" is definitely wrong, no argument there.
-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.


Re: [PATCH] x86/uaccess: fix code generation in put_user()

2020-10-23 Thread hpa
On October 23, 2020 1:55:22 PM PDT, Linus Torvalds 
 wrote:
>Thanks, applied.
>
>On Fri, Oct 23, 2020 at 1:32 PM Rasmus Villemoes
> wrote:
>>
>> I'm wondering if one would also need to make __ptr_pu and __ret_pu
>> explicitly "%"_ASM_CX".
>
>No, the "c"/"0" thing is much better, and makes it properly atomic wrt
>the actual asm.
>
>As mentioned to Andy, the "register asm()" thing is not uncommon and
>often useful, but when you can specify the register directly in asm,
>that's certainly simpler and more straightforward and preferred.
>
>  Linus

There is no sane reason to mess around with hacks when we are talking about 
dx:ax, though. We have to do pretty ugly hacks when other register pairs are 
involved, but "A" is there for a reason. _ASM_AX64 maybe...
-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.


Re: [PATCH] x86/uaccess: fix code generation in put_user()

2020-10-23 Thread hpa
On October 23, 2020 1:42:39 PM PDT, Andy Lutomirski  wrote:
>On Fri, Oct 23, 2020 at 1:32 PM Rasmus Villemoes
> wrote:
>>
>> Quoting
>> https://gcc.gnu.org/onlinedocs/gcc/Local-Register-Variables.html:
>>
>>   You can define a local register variable and associate it with a
>>   specified register...
>>
>>   The only supported use for this feature is to specify registers for
>>   input and output operands when calling Extended asm (see Extended
>>   Asm). This may be necessary if the constraints for a particular
>>   machine don't provide sufficient control to select the desired
>>   register.
>>
>> On 32-bit x86, this is used to ensure that gcc will put an 8-byte
>> value into the %edx:%eax pair, while all other cases will just use
>the
>> single register %eax (%rax on x86-64). While the _ASM_AX actually
>just
>> expands to "%eax", note this comment next to get_user() which does
>> something very similar:
>>
>>  * The use of _ASM_DX as the register specifier is a bit of a
>>  * simplification, as gcc only cares about it as the starting point
>>  * and not size: for a 64-bit value it will use %ecx:%edx on 32 bits
>>  * (%ecx being the next register in gcc's x86 register sequence), and
>>  * %rdx on 64 bits.
>>
>> However, getting this to work requires that there is no code between
>> the assignment to the local register variable and its use as an input
>> to the asm() which can possibly clobber any of the registers involved
>> - including evaluation of the expressions making up other inputs.
>
>This looks like the patch is an improvement, but this is still IMO
>likely to be very fragile.  Can we just do the size-dependent "a" vs
>"A" selection method instead?  Sure, it's a little more code, but it
>will be Obviously Correct.  As it stands, I can easily see our code
>failing on some gcc or clang version and the compiler authors telling
>us that we're relying on unsupportable behavior.

Yeah, the reason get_user hacks it is because there is no equivalent to "A" for 
other register pairs, but not using it for dx:ax is just silly.
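
For anyone unfamiliar with the constraint, a minimal 32-bit-only sketch of what "A" does: for a 64-bit operand it selects the %edx:%eax pair (the classic rdtsc idiom), while for narrower operands it may pick either register alone, which is why using it unconditionally is wrong:

static inline unsigned long long rdtsc_example(void)
{
	unsigned long long v;

	/* rdtsc returns its result in %edx:%eax; "=A" treats the pair
	 * as one 64-bit value. */
	asm volatile("rdtsc" : "=A" (v));
	return v;
}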
-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.


Re: [PATCH v2] x86/insn, tools/x86: Fix some potential undefined behavior.

2020-10-15 Thread hpa
On October 15, 2020 9:12:16 AM PDT, Ian Rogers  wrote:
>From: Numfor Mbiziwo-Tiapo 
>
>Don't perform unaligned loads in __get_next and __peek_nbyte_next as
>these are forms of undefined behavior.
>
>These problems were identified using the undefined behavior sanitizer
>(ubsan) with the tools version of the code and perf test. Part of this
>patch was previously posted here:
>https://lore.kernel.org/lkml/20190724184512.162887-4-n...@google.com/
>
>v2. removes the validate_next check and merges the 2 changes into one
>as
>requested by Masami Hiramatsu 
>
>Signed-off-by: Ian Rogers 
>Signed-off-by: Numfor Mbiziwo-Tiapo 
>---
> arch/x86/lib/insn.c   | 4 ++--
> tools/arch/x86/lib/insn.c | 4 ++--
> 2 files changed, 4 insertions(+), 4 deletions(-)
>
>diff --git a/arch/x86/lib/insn.c b/arch/x86/lib/insn.c
>index 404279563891..be88ab250146 100644
>--- a/arch/x86/lib/insn.c
>+++ b/arch/x86/lib/insn.c
>@@ -20,10 +20,10 @@
>   ((insn)->next_byte + sizeof(t) + n <= (insn)->end_kaddr)
> 
> #define __get_next(t, insn)   \
>-  ({ t r = *(t*)insn->next_byte; insn->next_byte += sizeof(t); r; })
>+  ({ t r; memcpy(&r, insn->next_byte, sizeof(t)); insn->next_byte += sizeof(t); r; })
> 
> #define __peek_nbyte_next(t, insn, n) \
>-  ({ t r = *(t*)((insn)->next_byte + n); r; })
>+  ({ t r; memcpy(&r, (insn)->next_byte + n, sizeof(t)); r; })
> 
> #define get_next(t, insn) \
>   ({ if (unlikely(!validate_next(t, insn, 0))) goto err_out;
>__get_next(t, insn); })
>diff --git a/tools/arch/x86/lib/insn.c b/tools/arch/x86/lib/insn.c
>index 0151dfc6da61..92358c71a59e 100644
>--- a/tools/arch/x86/lib/insn.c
>+++ b/tools/arch/x86/lib/insn.c
>@@ -20,10 +20,10 @@
>   ((insn)->next_byte + sizeof(t) + n <= (insn)->end_kaddr)
> 
> #define __get_next(t, insn)   \
>-  ({ t r = *(t*)insn->next_byte; insn->next_byte += sizeof(t); r; })
>+  ({ t r; memcpy(&r, insn->next_byte, sizeof(t)); insn->next_byte += sizeof(t); r; })
> 
> #define __peek_nbyte_next(t, insn, n) \
>-  ({ t r = *(t*)((insn)->next_byte + n); r; })
>+  ({ t r; memcpy(&r, (insn)->next_byte + n, sizeof(t)); r; })
> 
> #define get_next(t, insn) \
>   ({ if (unlikely(!validate_next(t, insn, 0))) goto err_out;
>__get_next(t, insn); })

Wait, what?

You are taking about x86-specific code, and on x86 unaligned memory accesses 
are supported, well-defined, and ubiquitous. 

This is B.S. at best, and unless the compiler turns the memcpy() right back 
into what you started with, deleterious for performance.

If you have a *very* good reason for this kind of churn, wrap it in the 
unaligned access macros, but using memcpy() is insane. All you are doing is 
making the code worse.
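
A minimal sketch of the get_unaligned() form suggested above; on x86 it should compile back down to a plain load while keeping the unaligned access explicit:

#include <asm/unaligned.h>

#define __get_next(t, insn)						\
	({ t r = get_unaligned((t *)(insn)->next_byte);		\
	   (insn)->next_byte += sizeof(t); r; })

#define __peek_nbyte_next(t, insn, n)				\
	get_unaligned((t *)((insn)->next_byte + (n)))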

-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.


Re: [PATCH 1/4] x86: add __X32_COND_SYSCALL() macro

2020-09-19 Thread hpa
On September 19, 2020 9:23:22 AM PDT, Andy Lutomirski  wrote:
>On Fri, Sep 18, 2020 at 10:35 PM Christoph Hellwig 
>wrote:
>>
>> On Fri, Sep 18, 2020 at 03:24:36PM +0200, Arnd Bergmann wrote:
>> > sys_move_pages() is an optional syscall, and once we remove
>> > the compat version of it in favor of the native one with an
>> > in_compat_syscall() check, the x32 syscall table refers to
>> > a __x32_sys_move_pages symbol that may not exist when the
>> > syscall is disabled.
>> >
>> > Change the COND_SYSCALL() definition on x86 to also include
>> > the redirection for x32.
>> >
>> > Signed-off-by: Arnd Bergmann 
>>
>> Adding the x86 maintainers and Brian Gerst.  Brian proposed another
>> problem to the mess that most of the compat syscall handlers used by
>> x32 here:
>>
>>https://lkml.org/lkml/2020/6/16/664
>>
>> hpa didn't particularly like it, but with your and my pending series
>> we'll soon use more native than compat syscalls for x32, so something
>> will need to change..
>
>I'm fine with either solution.

My main objection was naming. x64 is a widely used synonym for x86-64, and so 
that is confusing.

-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.


Re: [PATCH] x86/special_insn: reverse __force_order logic

2020-09-02 Thread hpa
On September 1, 2020 9:18:57 AM PDT, Nadav Amit  wrote:
>From: Nadav Amit 
>
>The __force_order logic seems to be inverted. __force_order is
>supposedly used to manipulate the compiler to use its memory
>dependencies analysis to enforce orders between CR writes and reads.
>Therefore, the memory should behave as a "CR": when the CR is read, the
>memory should be "read" by the inline assembly, and __force_order
>should
>be an output. When the CR is written, the memory should be "written".
>
>This change should allow to remove the "volatile" qualifier from CR
>reads at a later patch.
>
>While at it, remove the extra new-line from the inline assembly, as it
>only confuses GCC when it estimates the cost of the inline assembly.
>
>Cc: Thomas Gleixner 
>Cc: Ingo Molnar 
>Cc: Borislav Petkov 
>Cc: x...@kernel.org
>Cc: "H. Peter Anvin" 
>Cc: Peter Zijlstra 
>Cc: Andy Lutomirski 
>Cc: Kees Cook 
>Cc: sta...@vger.kernel.org
>Signed-off-by: Nadav Amit 
>
>---
>
>Unless I misunderstand the logic, __force_order should also be used by
>rdpkru() and wrpkru() which do not have dependency on __force_order. I
>also did not understand why native_write_cr0() has R/W dependency on
>__force_order, and why native_write_cr4() no longer has any dependency
>on __force_order.
>---
> arch/x86/include/asm/special_insns.h | 14 +++---
> 1 file changed, 7 insertions(+), 7 deletions(-)
>
>diff --git a/arch/x86/include/asm/special_insns.h
>b/arch/x86/include/asm/special_insns.h
>index 5999b0b3dd4a..dff5e5b01a3c 100644
>--- a/arch/x86/include/asm/special_insns.h
>+++ b/arch/x86/include/asm/special_insns.h
>@@ -24,32 +24,32 @@ void native_write_cr0(unsigned long val);
> static inline unsigned long native_read_cr0(void)
> {
>   unsigned long val;
>-  asm volatile("mov %%cr0,%0\n\t" : "=r" (val), "=m" (__force_order));
>+  asm volatile("mov %%cr0,%0" : "=r" (val) : "m" (__force_order));
>   return val;
> }
> 
> static __always_inline unsigned long native_read_cr2(void)
> {
>   unsigned long val;
>-  asm volatile("mov %%cr2,%0\n\t" : "=r" (val), "=m" (__force_order));
>+  asm volatile("mov %%cr2,%0" : "=r" (val) : "m" (__force_order));
>   return val;
> }
> 
> static __always_inline void native_write_cr2(unsigned long val)
> {
>-  asm volatile("mov %0,%%cr2": : "r" (val), "m" (__force_order));
>+  asm volatile("mov %1,%%cr2" : "=m" (__force_order) : "r" (val));
> }
> 
> static inline unsigned long __native_read_cr3(void)
> {
>   unsigned long val;
>-  asm volatile("mov %%cr3,%0\n\t" : "=r" (val), "=m" (__force_order));
>+  asm volatile("mov %%cr3,%0" : "=r" (val) : "m" (__force_order));
>   return val;
> }
> 
> static inline void native_write_cr3(unsigned long val)
> {
>-  asm volatile("mov %0,%%cr3": : "r" (val), "m" (__force_order));
>+  asm volatile("mov %1,%%cr3" : "=m" (__force_order) : "r" (val));
> }
> 
> static inline unsigned long native_read_cr4(void)
>@@ -64,10 +64,10 @@ static inline unsigned long native_read_cr4(void)
>   asm volatile("1: mov %%cr4, %0\n"
>"2:\n"
>_ASM_EXTABLE(1b, 2b)
>-   : "=r" (val), "=m" (__force_order) : "0" (0));
>+   : "=r" (val) : "m" (__force_order), "0" (0));
> #else
>   /* CR4 always exists on x86_64. */
>-  asm volatile("mov %%cr4,%0\n\t" : "=r" (val), "=m" (__force_order));
>+  asm volatile("mov %%cr4,%0" : "=r" (val) : "m" (__force_order));
> #endif
>   return val;
> }

This seems reasonable to me, and I fully agree with you that the logic seems to 
be just plain wrong (and unnoticed since all volatile operations are strictly 
ordered anyway), but you better not remove the volatile from cr2 read... unlike 
the other CRs that one is written by hardware and so genuinely is volatile.

However, I do believe "=m" is at least theoretically wrong. You are only 
writing *part* of the total state represented by this token (you can think of 
it as equivalent to a bitop), so it should be "+m".
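
Concretely, the "+m" form being suggested would look something like this (a sketch against the same __force_order dummy):

static inline void native_write_cr3(unsigned long val)
{
	/* read-modify-write of the dummy state, not a pure output */
	asm volatile("mov %1,%%cr3" : "+m" (__force_order) : "r" (val));
}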
-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.


Re: [REGRESSION] x86/cpu fsgsbase breaks TLS in 32 bit rr tracees on a 64 bit system

2020-08-25 Thread hpa
On August 24, 2020 5:30:56 PM PDT, Andy Lutomirski  wrote:
>On Mon, Aug 24, 2020 at 4:52 PM H. Peter Anvin  wrote:
>>
>> On 2020-08-24 14:10, Andy Lutomirski wrote:
>> >
>> > PTRACE_READ_SEGMENT_DESCRIPTOR to read a segment descriptor.
>> >
>> > PTRACE_SET_FS / PTRACE_SET_GS: Sets FS or GS and updates the base
>accordingly.
>> >
>> > PTRACE_READ_SEGMENT_BASE: pass in a segment selector, get a base
>out.
>> > You would use this to populate the base fields.
>> >
>> > or perhaps a ptrace SETREGS variant that tries to preserve the old
>> > base semantics and magically sets the bases to match the selectors
>if
>> > the selectors are nonzero.
>> >
>> > Do any of these choices sound preferable to any of you?
>> >
>>
>> My suggestion would be to export the GDT and LDT as a (readonly or
>mostly
>> readonly) regset(s) rather than adding entirely new operations. We
>could allow
>> the LDT and the per-thread GDT entries to be written, subject to the
>same
>> limitations as the corresponding system calls.
>>
>
>That seems useful, although we'd want to do some extensive
>sanitization of the GDT.  But maybe it's obnoxious to ask Kyle and
>Robert to parse the GDT, LDT, and selector just to emulate the
>demented pre-5.9 ptrace() behavior.
>
>--Andy

We only want to allow the same access that user space gets, that's exactly the 
sanitization we need.
-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.


Re: [PATCH] x86/entry/64: Disallow RDPID in paranoid entry if KVM is enabled

2020-08-21 Thread hpa
On August 21, 2020 2:28:32 AM PDT, Thomas Gleixner  wrote:
>On Fri, Aug 21 2020 at 10:09, Paolo Bonzini wrote:
>> On 21/08/20 09:47, Borislav Petkov wrote:
>>> On Thu, Aug 20, 2020 at 07:50:50PM -0700, Sean Christopherson wrote:
 +   * Disallow RDPID if KVM is enabled as it may consume a guest's
>TSC_AUX
 +   * if an NMI arrives in KVM's run loop.  KVM loads guest's
>TSC_AUX on
 +   * VM-Enter and may not restore the host's value until the CPU
>returns
 +   * to userspace, i.e. KVM depends on the kernel not using
>TSC_AUX.
 */
>>> And frankly, this is really unfair. The kernel should be able to use
>any
>>> MSR. IOW, KVM needs to be fixed here. I'm sure it context-switches
>other
>>> MSRs so one more MSR is not a big deal.
>>
>> The only MSR that KVM needs to context-switch manually are XSS and
>> SPEC_CTRL.  They tend to be the same on host and guest in which case
>> they can be optimized away.
>>
>> All the other MSRs (EFER and PAT are those that come to mind) are
>> handled by the microcode and thus they don't have the slowness of
>> RDMSR/WRMSR
>>
>> One more MSR *is* a big deal: KVM's vmentry+vmexit cost is around
>1000
>> cycles, adding 100 clock cycles for 2 WRMSRs is a 10% increase.
>
>We all know that MSRs are slow, but as a general rule I have to make it
>entirely clear that the kernel has precedence over KVM.
>
>If the kernel wants to use an MSR for it's own purposes then KVM has to
>deal with that and not the other way round. Preventing the kernel from
>using a facility freely is not an option ever.
>
>The insanities of KVM performance optimizations have bitten us more
>than
>once.
>
>For this particular case at hand I don't care much and we should just
>rip the whole RDPID thing out unconditionally. We still have zero
>numbers about the performance difference vs. LSL.
>
>Thanks,
>
>tglx

It is hardly going to be a performance difference for paranoid entry, which is 
hopefully rare enough that it falls into the noise.
-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.


Re: [PATCH v2] x86/cpu: Use SERIALIZE in sync_core() when available

2020-08-04 Thread hpa
On August 4, 2020 10:08:08 PM PDT, Borislav Petkov  wrote:
>On Tue, Aug 04, 2020 at 09:58:25PM -0700, h...@zytor.com wrote:
>> Because why use an alternative to jump over one instruction?
>>
>> I personally would prefer to have the IRET put out of line
>
>Can't yet - SERIALIZE CPUs are a minority at the moment.
>
>> and have the call/jmp replaced by SERIALIZE inline.
>
>Well, we could do:
>
>   alternative_io("... IRET bunch", __ASM_SERIALIZE,
>X86_FEATURE_SERIALIZE, ...);
>
>and avoid all kinds of jumping. Alternatives get padded so there
>would be a couple of NOPs following when SERIALIZE gets patched in
>but it shouldn't be a problem. I guess one needs to look at what gcc
>generates...

I didn't say behind a trap. IRET is a control transfer instruction, and slow, 
so putting it out of line really isn't unreasonable. Can even do a call to a 
common handler.
-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.


Re: [PATCH v2] x86/cpu: Use SERIALIZE in sync_core() when available

2020-08-04 Thread hpa
On August 4, 2020 9:48:40 PM PDT, Borislav Petkov  wrote:
>On Tue, Aug 04, 2020 at 07:10:59PM -0700, Ricardo Neri wrote:
>> The SERIALIZE instruction gives software a way to force the processor
>to
>> complete all modifications to flags, registers and memory from
>previous
>> instructions and drain all buffered writes to memory before the next
>> instruction is fetched and executed. Thus, it serves the purpose of
>> sync_core(). Use it when available.
>> 
>> Commit 7117f16bf460 ("objtool: Fix ORC vs alternatives") enforced
>stack
>> invariance in alternatives. The iret-to-self does not comply with
>such
>> invariance. Thus, it cannot be used inside alternative code. Instead,
>use
>> an alternative that jumps to SERIALIZE when available.
>> 
>> Cc: Andy Lutomirski 
>> Cc: Cathy Zhang 
>> Cc: Dave Hansen 
>> Cc: Fenghua Yu 
>> Cc: "H. Peter Anvin" 
>> Cc: Kyung Min Park 
>> Cc: Peter Zijlstra 
>> Cc: "Ravi V. Shankar" 
>> Cc: Sean Christopherson 
>> Cc: linux-e...@vger.kernel.org
>> Cc: linux-kernel@vger.kernel.org
>> Suggested-by: Andy Lutomirski 
>> Signed-off-by: Ricardo Neri 
>> ---
>> This is a v2 from my initial submission [1]. The first three patches
>of
>> the series have been merged in Linus' tree. Hence, I am submitting
>only
>> this patch for review.
>> 
>> [1]. https://lkml.org/lkml/2020/7/27/8
>> 
>> Changes since v1:
>>  * Support SERIALIZE using alternative runtime patching.
>>(Peter Zijlstra, H. Peter Anvin)
>>  * Added a note to specify which version of binutils supports
>SERIALIZE.
>>(Peter Zijlstra)
>>  * Verified that (::: "memory") is used. (H. Peter Anvin)
>> ---
>>  arch/x86/include/asm/special_insns.h |  2 ++
>>  arch/x86/include/asm/sync_core.h | 10 +-
>>  2 files changed, 11 insertions(+), 1 deletion(-)
>> 
>> diff --git a/arch/x86/include/asm/special_insns.h
>b/arch/x86/include/asm/special_insns.h
>> index 59a3e13204c3..25cd67801dda 100644
>> --- a/arch/x86/include/asm/special_insns.h
>> +++ b/arch/x86/include/asm/special_insns.h
>> @@ -10,6 +10,8 @@
>>  #include 
>>  #include 
>>  
>> +/* Instruction opcode for SERIALIZE; supported in binutils >= 2.35.
>*/
>> +#define __ASM_SERIALIZE ".byte 0xf, 0x1, 0xe8"
>>  /*
>>   * Volatile isn't enough to prevent the compiler from reordering the
>>   * read/write functions for the control registers and messing
>everything up.
>> diff --git a/arch/x86/include/asm/sync_core.h
>b/arch/x86/include/asm/sync_core.h
>> index fdb5b356e59b..201ea3d9a6bd 100644
>> --- a/arch/x86/include/asm/sync_core.h
>> +++ b/arch/x86/include/asm/sync_core.h
>> @@ -5,15 +5,19 @@
>>  #include 
>>  #include 
>>  #include 
>> +#include 
>>  
>>  #ifdef CONFIG_X86_32
>>  static inline void iret_to_self(void)
>>  {
>>  asm volatile (
>> +ALTERNATIVE("", "jmp 2f", X86_FEATURE_SERIALIZE)
>>  "pushfl\n\t"
>>  "pushl %%cs\n\t"
>>  "pushl $1f\n\t"
>>  "iret\n\t"
>> +"2:\n\t"
>> +__ASM_SERIALIZE "\n"
>>  "1:"
>>  : ASM_CALL_CONSTRAINT : : "memory");
>>  }
>> @@ -23,6 +27,7 @@ static inline void iret_to_self(void)
>>  unsigned int tmp;
>>  
>>  asm volatile (
>> +ALTERNATIVE("", "jmp 2f", X86_FEATURE_SERIALIZE)
>
>Why is this and above stuck inside the asm statement?
>
>Why can't you simply do:
>
>   if (static_cpu_has(X86_FEATURE_SERIALIZE)) {
>   asm volatile(__ASM_SERIALIZE ::: "memory");
>   return;
>   }
>
>on function entry instead of making it more unreadable for no
>particular
>reason?

Because why use an alternative to jump over one instruction?

I personally would prefer to have the IRET put out of line and have the 
call/jmp replaced by SERIALIZE inline.
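
Roughly the shape I mean -- a sketch only: the helper name is made up, it would have to be an asm stub that preserves all registers, and the alternatives padding details are glossed over:

extern void iret_to_self_ool(void);	/* hypothetical out-of-line asm stub */

static inline void sync_core(void)
{
	asm volatile(ALTERNATIVE("call iret_to_self_ool",
				 __ASM_SERIALIZE,
				 X86_FEATURE_SERIALIZE)
		     : ASM_CALL_CONSTRAINT : : "memory");
}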

-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.


Re: [PATCH 4/4] x86/cpu: Use SERIALIZE in sync_core() when available

2020-07-27 Thread hpa
On July 27, 2020 1:36:19 AM PDT, pet...@infradead.org wrote:
>On Sun, Jul 26, 2020 at 10:55:15PM -0700, h...@zytor.com wrote:
>> For a really overengineered solution, but which might perform
>> unnecessarily poorly on existing hardware:
>> 
>> asm volatile("1: .byte 0xf, 0x1, 0xe8; 2:"
>> _ASM_EXTABLE(1b,2b));
>
>Ha! cute, you take an #UD ?
>
>We could optimize the #UD exception handler for this I suppose, but
>that
>makes it an even worse hack. The simple alternative() seems like a much
>simpler approach.

If this is in any way performance critical, then no :) Taking the #UD has the 
cute property that we end up IRET on the way back, so we don't even need a 
fix-up path.


-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.


Re: [PATCH 4/4] x86/cpu: Use SERIALIZE in sync_core() when available

2020-07-27 Thread hpa
On July 27, 2020 1:20:03 AM PDT, pet...@infradead.org wrote:
>On Sun, Jul 26, 2020 at 09:31:32PM -0700, Ricardo Neri wrote:
>> +static inline void serialize(void)
>> +{
>> +asm volatile(".byte 0xf, 0x1, 0xe8");
>> +}
>
>Can we pretty please have a comment with the binutils version that has
>the mnomic? Such that when we increase the required binutils version we
>can grep for this.

It also needs : : : "memory" to be a barrier.
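
I.e., something like this (sketch):

static inline void serialize(void)
{
	/* SERIALIZE opcode; the "memory" clobber makes it a compiler barrier */
	asm volatile(".byte 0xf, 0x1, 0xe8" ::: "memory");
}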
-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.


Re: [PATCH 18/23] init: open code setting up stdin/stdout/stderr

2020-07-27 Thread hpa
On July 26, 2020 11:24:25 PM PDT, Christoph Hellwig  wrote:
>On Sun, Jul 26, 2020 at 11:20:41PM -0700, h...@zytor.com wrote:
>> On July 26, 2020 8:05:34 PM PDT, Al Viro 
>wrote:
>> >On Tue, Jul 14, 2020 at 09:04:22PM +0200, Christoph Hellwig wrote:
>> >> Don't rely on the implicit set_fs(KERNEL_DS) for ksys_open to
>work,
>> >but
>> >> instead open a struct file for /dev/console and then install it as
>FD
>> >> 0/1/2 manually.
>> >
>> >I really hate that one.  Every time we exposed the internal details
>to
>> >the fucking early init code, we paid for that afterwards.  And this
>> >goes over the top wrt the level of details being exposed.
>> >
>> >_IF_ you want to keep that thing, move it to fs/file.c, with dire
>> >comment
>> >re that being very special shite for init and likely cause of
>> >subsequent
>> >trouble whenever anything gets changed, a gnat farts somewhere, etc.
>> >
>> >Do not leave that kind of crap sitting around init/*.c; KERNEL_DS
>> >may be a source of occasional PITA, but here you are trading it for
>a
>> >lot
>> >worse one in the future.
>> 
>> Okay... here is a perhaps idiotic idea... even if we don't want to
>run stuff in actual user space, could we map initramfs into user space
>memory before running init (execing init will tear down those mappings
>anyway) so that we don't need KERNEL_DS at least?
>
>Err, why?  The changes have been pretty simple, and I'd rather not come
>up with new crazy ways just to make things complicated.

Why? To avoid this neverending avalanche of special interfaces and layering 
violations. Neatly deals with non-contiguous contents and initramfs in device 
memory, etc. etc. etc.


-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.


Re: [PATCH 18/23] init: open code setting up stdin/stdout/stderr

2020-07-27 Thread hpa
On July 26, 2020 8:05:34 PM PDT, Al Viro  wrote:
>On Tue, Jul 14, 2020 at 09:04:22PM +0200, Christoph Hellwig wrote:
>> Don't rely on the implicit set_fs(KERNEL_DS) for ksys_open to work,
>but
>> instead open a struct file for /dev/console and then install it as FD
>> 0/1/2 manually.
>
>I really hate that one.  Every time we exposed the internal details to
>the fucking early init code, we paid for that afterwards.  And this
>goes over the top wrt the level of details being exposed.
>
>_IF_ you want to keep that thing, move it to fs/file.c, with dire
>comment
>re that being very special shite for init and likely cause of
>subsequent
>trouble whenever anything gets changed, a gnat farts somewhere, etc.
>
>   Do not leave that kind of crap sitting around init/*.c; KERNEL_DS
>may be a source of occasional PITA, but here you are trading it for a
>lot
>worse one in the future.

Okay... here is a perhaps idiotic idea... even if we don't want to run stuff in 
actual user space, could we map initramfs into user space memory before running 
init (execing init will tear down those mappings anyway) so that we don't need 
KERNEL_DS at least?
-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.


Re: [PATCH 4/4] x86/cpu: Use SERIALIZE in sync_core() when available

2020-07-27 Thread hpa
On July 26, 2020 10:55:15 PM PDT, h...@zytor.com wrote:
>On July 26, 2020 9:31:32 PM PDT, Ricardo Neri
> wrote:
>>The SERIALIZE instruction gives software a way to force the processor
>>to
>>complete all modifications to flags, registers and memory from
>previous
>>instructions and drain all buffered writes to memory before the next
>>instruction is fetched and executed. Thus, it serves the purpose of
>>sync_core(). Use it when available.
>>
>>Use boot_cpu_has() and not static_cpu_has(); the most critical paths
>>(returning to user mode and from interrupt and NMI) will not reach
>>sync_core().
>>
>>Cc: Andy Lutomirski 
>>Cc: Cathy Zhang 
>>Cc: Dave Hansen 
>>Cc: Fenghua Yu 
>>Cc: "H. Peter Anvin" 
>>Cc: Kyung Min Park 
>>Cc: Peter Zijlstra 
>>Cc: "Ravi V. Shankar" 
>>Cc: Sean Christopherson 
>>Cc: linux-e...@vger.kernel.org
>>Cc: linux-kernel@vger.kernel.org
>>Reviewed-by: Tony Luck 
>>Suggested-by: Andy Lutomirski 
>>Signed-off-by: Ricardo Neri 
>>---
>>---
>> arch/x86/include/asm/special_insns.h |  5 +
>> arch/x86/include/asm/sync_core.h | 10 +-
>> 2 files changed, 14 insertions(+), 1 deletion(-)
>>
>>diff --git a/arch/x86/include/asm/special_insns.h
>>b/arch/x86/include/asm/special_insns.h
>>index 59a3e13204c3..0a2a60bba282 100644
>>--- a/arch/x86/include/asm/special_insns.h
>>+++ b/arch/x86/include/asm/special_insns.h
>>@@ -234,6 +234,11 @@ static inline void clwb(volatile void *__p)
>> 
>> #define nop() asm volatile ("nop")
>> 
>>+static inline void serialize(void)
>>+{
>>+ asm volatile(".byte 0xf, 0x1, 0xe8");
>>+}
>>+
>> #endif /* __KERNEL__ */
>> 
>> #endif /* _ASM_X86_SPECIAL_INSNS_H */
>>diff --git a/arch/x86/include/asm/sync_core.h
>>b/arch/x86/include/asm/sync_core.h
>>index fdb5b356e59b..bf132c09d61b 100644
>>--- a/arch/x86/include/asm/sync_core.h
>>+++ b/arch/x86/include/asm/sync_core.h
>>@@ -5,6 +5,7 @@
>> #include 
>> #include 
>> #include 
>>+#include 
>> 
>> #ifdef CONFIG_X86_32
>> static inline void iret_to_self(void)
>>@@ -54,7 +55,8 @@ static inline void iret_to_self(void)
>> static inline void sync_core(void)
>> {
>>  /*
>>-  * There are quite a few ways to do this.  IRET-to-self is nice
>>+  * Hardware can do this for us if SERIALIZE is available. Otherwise,
>>+  * there are quite a few ways to do this.  IRET-to-self is nice
>>   * because it works on every CPU, at any CPL (so it's compatible
>>   * with paravirtualization), and it never exits to a hypervisor.
>>   * The only down sides are that it's a bit slow (it seems to be
>>@@ -75,6 +77,12 @@ static inline void sync_core(void)
>>   * Like all of Linux's memory ordering operations, this is a
>>   * compiler barrier as well.
>>   */
>>+
>>+ if (boot_cpu_has(X86_FEATURE_SERIALIZE)) {
>>+ serialize();
>>+ return;
>>+ }
>>+
>>  iret_to_self();
>> }
>> 
>
>Any reason to not make sync_core() an inline with alternatives?
>
>For a really overengineered solution, but which might perform
>unnecessarily poorly on existing hardware:
>
>asm volatile("1: .byte 0xf, 0x1, 0xe8; 2:"
>_ASM_EXTABLE(1b,2b));

(and : : : "memory" of course.)
-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.


Re: [PATCH 4/4] x86/cpu: Use SERIALIZE in sync_core() when available

2020-07-26 Thread hpa
On July 26, 2020 9:31:32 PM PDT, Ricardo Neri 
 wrote:
>The SERIALIZE instruction gives software a way to force the processor
>to
>complete all modifications to flags, registers and memory from previous
>instructions and drain all buffered writes to memory before the next
>instruction is fetched and executed. Thus, it serves the purpose of
>sync_core(). Use it when available.
>
>Use boot_cpu_has() and not static_cpu_has(); the most critical paths
>(returning to user mode and from interrupt and NMI) will not reach
>sync_core().
>
>Cc: Andy Lutomirski 
>Cc: Cathy Zhang 
>Cc: Dave Hansen 
>Cc: Fenghua Yu 
>Cc: "H. Peter Anvin" 
>Cc: Kyung Min Park 
>Cc: Peter Zijlstra 
>Cc: "Ravi V. Shankar" 
>Cc: Sean Christopherson 
>Cc: linux-e...@vger.kernel.org
>Cc: linux-kernel@vger.kernel.org
>Reviewed-by: Tony Luck 
>Suggested-by: Andy Lutomirski 
>Signed-off-by: Ricardo Neri 
>---
>---
> arch/x86/include/asm/special_insns.h |  5 +
> arch/x86/include/asm/sync_core.h | 10 +-
> 2 files changed, 14 insertions(+), 1 deletion(-)
>
>diff --git a/arch/x86/include/asm/special_insns.h
>b/arch/x86/include/asm/special_insns.h
>index 59a3e13204c3..0a2a60bba282 100644
>--- a/arch/x86/include/asm/special_insns.h
>+++ b/arch/x86/include/asm/special_insns.h
>@@ -234,6 +234,11 @@ static inline void clwb(volatile void *__p)
> 
> #define nop() asm volatile ("nop")
> 
>+static inline void serialize(void)
>+{
>+  asm volatile(".byte 0xf, 0x1, 0xe8");
>+}
>+
> #endif /* __KERNEL__ */
> 
> #endif /* _ASM_X86_SPECIAL_INSNS_H */
>diff --git a/arch/x86/include/asm/sync_core.h
>b/arch/x86/include/asm/sync_core.h
>index fdb5b356e59b..bf132c09d61b 100644
>--- a/arch/x86/include/asm/sync_core.h
>+++ b/arch/x86/include/asm/sync_core.h
>@@ -5,6 +5,7 @@
> #include 
> #include 
> #include 
>+#include 
> 
> #ifdef CONFIG_X86_32
> static inline void iret_to_self(void)
>@@ -54,7 +55,8 @@ static inline void iret_to_self(void)
> static inline void sync_core(void)
> {
>   /*
>-   * There are quite a few ways to do this.  IRET-to-self is nice
>+   * Hardware can do this for us if SERIALIZE is available. Otherwise,
>+   * there are quite a few ways to do this.  IRET-to-self is nice
>* because it works on every CPU, at any CPL (so it's compatible
>* with paravirtualization), and it never exits to a hypervisor.
>* The only down sides are that it's a bit slow (it seems to be
>@@ -75,6 +77,12 @@ static inline void sync_core(void)
>* Like all of Linux's memory ordering operations, this is a
>* compiler barrier as well.
>*/
>+
>+  if (boot_cpu_has(X86_FEATURE_SERIALIZE)) {
>+  serialize();
>+  return;
>+  }
>+
>   iret_to_self();
> }
> 

Any reason to not make sync_core() an inline with alternatives?

For a really overengineered solution, but which might perform unnecessarily 
poorly on existing hardware:

asm volatile("1: .byte 0xf, 0x1, 0xe8; 2:"
_ASM_EXTABLE(1b,2b));

-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.


Re: [PATCH] x86/idt: Make sure idt_table takes a whole page

2020-07-18 Thread hpa
On July 18, 2020 12:25:46 PM PDT, Andy Lutomirski  wrote:
>
>> On Jul 18, 2020, at 10:57 AM, h...@zytor.com wrote:
>> 
>> On July 9, 2020 3:33:55 AM PDT, Joerg Roedel 
>wrote:
>>> From: Joerg Roedel 
>>> 
>>> On x86-32 the idt_table with 256 entries needs only 2048 bytes. It
>is
>>> page-aligned, but the end of the .bss..page_aligned section is not
>>> guaranteed to be page-aligned.
>>> 
>>> As a result, symbols from other .bss sections may end up on the same
>>> 4k page as the idt_table, and will accidentially get mapped
>read-only
>>> during boot, causing unexpected page-faults when the kernel writes
>to
>>> them.
>>> 
>>> Avoid this by making the idt_table 4kb in size even on x86-32. On
>>> x86-64 the idt_table is already 4kb large, so nothing changes there.
>>> 
>>> Fixes: 3e77abda65b1c ("x86/idt: Consolidate idt functionality")
>>> Signed-off-by: Joerg Roedel 
>>> ---
>>> arch/x86/kernel/idt.c | 12 ++--
>>> 1 file changed, 10 insertions(+), 2 deletions(-)
>>> 
>>> diff --git a/arch/x86/kernel/idt.c b/arch/x86/kernel/idt.c
>>> index 0db21206f2f3..83e24f837127 100644
>>> --- a/arch/x86/kernel/idt.c
>>> +++ b/arch/x86/kernel/idt.c
>>> @@ -157,8 +157,13 @@ static const __initconst struct idt_data
>>> apic_idts[] = {
>>> #endif
>>> };
>>> 
>>> -/* Must be page-aligned because the real IDT is used in the cpu
>entry
>>> area */
>>> -static gate_desc idt_table[IDT_ENTRIES] __page_aligned_bss;
>>> +/*
>>> + * Must be page-aligned because the real IDT is used in the cpu
>entry
>>> area.
>>> + * Allocate more entries than needed so that idt_table takes a
>whole
>>> page, so it
>>> + * is safe to map the idt_table read-only and into the user-space
>>> page-table.
>>> + */
>>> +#define IDT_ENTRIES_ALLOCATED (PAGE_SIZE / sizeof(gate_desc))
>>> +static gate_desc idt_table[IDT_ENTRIES_ALLOCATED]
>__page_aligned_bss;
>>> 
>>> struct desc_ptr idt_descr __ro_after_init = {
>>>.size= IDT_TABLE_SIZE - 1,
>>> @@ -335,6 +340,9 @@ void __init idt_setup_apic_and_irq_gates(void)
>>>idt_map_in_cea();
>>>load_idt(&idt_descr);
>>> 
>>> +BUILD_BUG_ON(IDT_ENTRIES_ALLOCATED < IDT_ENTRIES);
>>> +BUILD_BUG_ON(sizeof(idt_table) != PAGE_SIZE);
>>> +
>>>/* Make the IDT table read only */
>>>set_memory_ro((unsigned long)&idt_table, 1);
>>> 
>> 
>> NAK... this isn't the right way to fix this and just really kicks the
>can down the road. The reason is that you aren't fixing the module that
>actually has a problem.
>> 
>> The Right Way[TM] is to figure out which module(s) lack the proper
>alignment for this section. A script using objdump -h or readelf -SW
>running over the .o files looking for alignment less than 2**12 should
>spot the modules that are missing the proper .balign directives.
>
>I don’t see the problem. If we are going to treat an object as though
>it’s 4096 bytes, making C think it’s 4096 bytes seems entirely
>reasonable to me.
>
>> -- 
>
>> Sent from my Android device with K-9 Mail. Please excuse my brevity.

It isn't the object, it is its alignment. You can have an object page-aligned 
so it doesn't cross page boundaries.

The bigger issue, though, is that this means there are other object files which 
don't have the correct alignment directives, which means that this error can 
crop up again at any point. The really important thing here is that we fix the 
real problem.
-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.


Re: [PATCH] x86/idt: Make sure idt_table takes a whole page

2020-07-18 Thread hpa
On July 9, 2020 3:33:55 AM PDT, Joerg Roedel  wrote:
>From: Joerg Roedel 
>
>On x86-32 the idt_table with 256 entries needs only 2048 bytes. It is
>page-aligned, but the end of the .bss..page_aligned section is not
>guaranteed to be page-aligned.
>
>As a result, symbols from other .bss sections may end up on the same
>4k page as the idt_table, and will accidentially get mapped read-only
>during boot, causing unexpected page-faults when the kernel writes to
>them.
>
>Avoid this by making the idt_table 4kb in size even on x86-32. On
>x86-64 the idt_table is already 4kb large, so nothing changes there.
>
>Fixes: 3e77abda65b1c ("x86/idt: Consolidate idt functionality")
>Signed-off-by: Joerg Roedel 
>---
> arch/x86/kernel/idt.c | 12 ++--
> 1 file changed, 10 insertions(+), 2 deletions(-)
>
>diff --git a/arch/x86/kernel/idt.c b/arch/x86/kernel/idt.c
>index 0db21206f2f3..83e24f837127 100644
>--- a/arch/x86/kernel/idt.c
>+++ b/arch/x86/kernel/idt.c
>@@ -157,8 +157,13 @@ static const __initconst struct idt_data
>apic_idts[] = {
> #endif
> };
> 
>-/* Must be page-aligned because the real IDT is used in the cpu entry
>area */
>-static gate_desc idt_table[IDT_ENTRIES] __page_aligned_bss;
>+/*
>+ * Must be page-aligned because the real IDT is used in the cpu entry
>area.
>+ * Allocate more entries than needed so that idt_table takes a whole
>page, so it
>+ * is safe to map the idt_table read-only and into the user-space
>page-table.
>+ */
>+#define IDT_ENTRIES_ALLOCATED (PAGE_SIZE / sizeof(gate_desc))
>+static gate_desc idt_table[IDT_ENTRIES_ALLOCATED] __page_aligned_bss;
> 
> struct desc_ptr idt_descr __ro_after_init = {
>   .size   = IDT_TABLE_SIZE - 1,
>@@ -335,6 +340,9 @@ void __init idt_setup_apic_and_irq_gates(void)
>   idt_map_in_cea();
>   load_idt(&idt_descr);
> 
>+  BUILD_BUG_ON(IDT_ENTRIES_ALLOCATED < IDT_ENTRIES);
>+  BUILD_BUG_ON(sizeof(idt_table) != PAGE_SIZE);
>+
>   /* Make the IDT table read only */
>   set_memory_ro((unsigned long)&idt_table, 1);
> 

NAK... this isn't the right way to fix this and just really kicks the can down 
the road. The reason is that you aren't fixing the module that actually has a 
problem.

The Right Way[TM] is to figure out which module(s) lack the proper alignment 
for this section. A script using objdump -h or readelf -SW running over the .o 
files looking for alignment less than 2**12 should spot the modules that are 
missing the proper .balign directives.
-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.


Re: [PATCH v2 3/4] x86: Expose SERIALIZE for supported cpuid

2020-07-14 Thread hpa
On July 14, 2020 5:03:31 PM PDT, "Zhang, Cathy"  wrote:
>On 7/15/2020 7:05 AM, h...@zytor.com wrote:
>> On July 14, 2020 3:42:08 PM PDT, "Zhang, Cathy"
> wrote:
>>> On 7/14/2020 11:00 AM, Sean Christopherson wrote:
 On Tue, Jul 07, 2020 at 10:16:22AM +0800, Cathy Zhang wrote:
> SERIALIZE instruction is supported by intel processors,
> like Sapphire Rapids. Expose it in KVM supported cpuid.
 Providing at least a rough overview of the instruction, e.g. its
>>> enumeration,
 usage, fault rules, controls, etc... would be nice.  In isolation,
>>> the
 changelog isn't remotely helpful in understanding the correctness
>of
>>> the
 patch.
>>> Thanks Sean! Add it in the next version.
> Signed-off-by: Cathy Zhang 
> ---
>arch/x86/kvm/cpuid.c | 3 ++-
>1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
> index 8a294f9..e603aeb 100644
> --- a/arch/x86/kvm/cpuid.c
> +++ b/arch/x86/kvm/cpuid.c
> @@ -341,7 +341,8 @@ void kvm_set_cpu_caps(void)
>   kvm_cpu_cap_mask(CPUID_7_EDX,
>   F(AVX512_4VNNIW) | F(AVX512_4FMAPS) | F(SPEC_CTRL) |
>   F(SPEC_CTRL_SSBD) | F(ARCH_CAPABILITIES) | 
> F(INTEL_STIBP) |
> - F(MD_CLEAR) | F(AVX512_VP2INTERSECT) | F(FSRM)
> + F(MD_CLEAR) | F(AVX512_VP2INTERSECT) | F(FSRM) |
> + F(SERIALIZE)
>   );
>
>   /* TSC_ADJUST and ARCH_CAPABILITIES are emulated in software.
>*/
> -- 
> 1.8.3.1
>
>> At least that one is easy: SERIALIZE is architecturally a NOP, but
>with hard serialization, like CPUID or IRET.
>SERIALIZE does not modify registers, arithmetic flags or memory, which 
>is different with CPUID.

That's what I meant by it being an architectural NOP.
-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.


Re: [PATCH v2 3/4] x86: Expose SERIALIZE for supported cpuid

2020-07-14 Thread hpa
On July 14, 2020 3:42:08 PM PDT, "Zhang, Cathy"  wrote:
>On 7/14/2020 11:00 AM, Sean Christopherson wrote:
>> On Tue, Jul 07, 2020 at 10:16:22AM +0800, Cathy Zhang wrote:
>>> SERIALIZE instruction is supported by intel processors,
>>> like Sapphire Rapids. Expose it in KVM supported cpuid.
>> Providing at least a rough overview of the instruction, e.g. its
>enumeration,
>> usage, fault rules, controls, etc... would be nice.  In isolation,
>the
>> changelog isn't remotely helpful in understanding the correctness of
>the
>> patch.
>Thanks Sean! Add it in the next version.
>>
>>> Signed-off-by: Cathy Zhang 
>>> ---
>>>   arch/x86/kvm/cpuid.c | 3 ++-
>>>   1 file changed, 2 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
>>> index 8a294f9..e603aeb 100644
>>> --- a/arch/x86/kvm/cpuid.c
>>> +++ b/arch/x86/kvm/cpuid.c
>>> @@ -341,7 +341,8 @@ void kvm_set_cpu_caps(void)
>>> kvm_cpu_cap_mask(CPUID_7_EDX,
>>> F(AVX512_4VNNIW) | F(AVX512_4FMAPS) | F(SPEC_CTRL) |
>>> F(SPEC_CTRL_SSBD) | F(ARCH_CAPABILITIES) | F(INTEL_STIBP) |
>>> -   F(MD_CLEAR) | F(AVX512_VP2INTERSECT) | F(FSRM)
>>> +   F(MD_CLEAR) | F(AVX512_VP2INTERSECT) | F(FSRM) |
>>> +   F(SERIALIZE)
>>> );
>>>   
>>> /* TSC_ADJUST and ARCH_CAPABILITIES are emulated in software. */
>>> -- 
>>> 1.8.3.1
>>>

At least that one is easy: SERIALIZE is architecturally a NOP, but with hard 
serialization, like CPUID or IRET.

-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.


Re: [PATCH] decompress_bunzip2: fix sizeof type in start_bunzip

2020-07-13 Thread hpa
On July 13, 2020 12:27:02 PM PDT, Tom Rix  wrote:
>
>On 7/12/20 3:21 PM, h...@zytor.com wrote:
>> On July 12, 2020 8:12:43 AM PDT, Tom Rix  wrote:
>>> On 7/12/20 6:09 AM, H. Peter Anvin wrote:
>>>> On 2020-07-12 05:59, t...@redhat.com wrote:
>>>>> From: Tom Rix 
>>>>>
>>>>> clang static analysis flags this error
>>>>>
>>>>> lib/decompress_bunzip2.c:671:13: warning: Result of 'malloc' is
>>> converted
>>>>>   to a pointer of type 'unsigned int', which is incompatible with
>>> sizeof
>>>>>   operand type 'int' [unix.MallocSizeof]
>>>>> bd->dbuf = large_malloc(bd->dbufSize * sizeof(int));
>>>>>^~~~
>>>>>
>>>>> Reviewing the bunzip_data structure, the element dbuf is type
>>>>>
>>>>>   /* Intermediate buffer and its size (in bytes) */
>>>>>   unsigned int *dbuf, dbufSize;
>>>>>
>>>>> So change the type in sizeof to 'unsigned int'
>>>>>
>>>> You must be kidding.
>>>>
>>>> If you want to change it, change it to sizeof(bd->dbuf) instead,
>but
>>> this flag
>>>> is at least in my opinion a total joke. For sizeof(int) !=
>>> sizeof(unsigned
>>>> int) is beyond bizarre, no matter how stupid the platform.
>>> Using the actual type is more correct that using a type of the same
>>> size.
>>>
>>> trix
>>>
>>>>-hpa
>>>>
>> "More correct?" All it is is more verbose.
>>
>> Using the sizeof of the actual object at least adds some actual
>safety.
>
>Sorry, I am being pedantic, I mean anything that produces the correct
>assembly is correct. But there are different path to being correct. 
>The path I was suggesting to follow the type of the element/final
>pointer when allocating an memory.
>
>large_malloc(bd->dbufSize * sizeof(*bd->dbuf)) would also work
>
>I will respin.
>
>trix

This isn't Linux style, but in the NASM source I have been migrating to macros:

nasm_new(ptr);
nasm_newn(ptr,n);

... using sizeof() in exactly this manner.
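
Roughly, the pattern is (a sketch only; nasm_new()/nasm_newn() are the real
names, but the zero-filling allocator helpers below are placeholders rather
than the actual NASM ones):

	/* Size the allocation from the pointee via sizeof(*p), so the
	 * element type is written exactly once. */
	#define nasm_new(p)     ((p) = xzalloc(sizeof(*(p))))       /* one zeroed element */
	#define nasm_newn(p, n) ((p) = xcalloc((n), sizeof(*(p))))  /* n zeroed elements  */

Applied to the case above that would read nasm_newn(bd->dbuf, bd->dbufSize),
and the sizeof can never drift out of sync with the pointer type.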
-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.


Re: [PATCH] decompress_bunzip2: fix sizeof type in start_bunzip

2020-07-12 Thread hpa
On July 12, 2020 8:12:43 AM PDT, Tom Rix  wrote:
>
>On 7/12/20 6:09 AM, H. Peter Anvin wrote:
>> On 2020-07-12 05:59, t...@redhat.com wrote:
>>> From: Tom Rix 
>>>
>>> clang static analysis flags this error
>>>
>>> lib/decompress_bunzip2.c:671:13: warning: Result of 'malloc' is
>converted
>>>   to a pointer of type 'unsigned int', which is incompatible with
>sizeof
>>>   operand type 'int' [unix.MallocSizeof]
>>> bd->dbuf = large_malloc(bd->dbufSize * sizeof(int));
>>>^~~~
>>>
>>> Reviewing the bunzip_data structure, the element dbuf is type
>>>
>>> /* Intermediate buffer and its size (in bytes) */
>>> unsigned int *dbuf, dbufSize;
>>>
>>> So change the type in sizeof to 'unsigned int'
>>>
>> You must be kidding.
>>
>> If you want to change it, change it to sizeof(bd->dbuf) instead, but
>this flag
>> is at least in my opinion a total joke. For sizeof(int) !=
>sizeof(unsigned
>> int) is beyond bizarre, no matter how stupid the platform.
>
>Using the actual type is more correct that using a type of the same
>size.
>
>trix
>
>>  -hpa
>>

"More correct?" All it is is more verbose.

Using the sizeof of the actual object at least adds some actual safety.
-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.


Re: [PATCH v5] x86/umip: Add emulation/spoofing for SLDT and STR instructions

2020-07-11 Thread hpa
On July 10, 2020 3:45:25 PM PDT, Brendan Shanks  wrote:
>Add emulation/spoofing of SLDT and STR for both 32- and 64-bit
>processes.
>
>Wine users have found a small number of Windows apps using SLDT that
>were crashing when run on UMIP-enabled systems.
>
>Reported-by: Andreas Rammhold 
>Originally-by: Ricardo Neri 
>Signed-off-by: Brendan Shanks 
>---
>
>v5: Capitalize instruction names in comments.
>
> arch/x86/kernel/umip.c | 40 +++-
> 1 file changed, 27 insertions(+), 13 deletions(-)
>
>diff --git a/arch/x86/kernel/umip.c b/arch/x86/kernel/umip.c
>index 8d5cbe1bbb3b..2c304fd0bb1a 100644
>--- a/arch/x86/kernel/umip.c
>+++ b/arch/x86/kernel/umip.c
>@@ -45,11 +45,12 @@
>* value that, lies close to the top of the kernel memory. The limit for
>the GDT
>  * and the IDT are set to zero.
>  *
>- * Given that SLDT and STR are not commonly used in programs that run
>on WineHQ
>- * or DOSEMU2, they are not emulated.
>- *
>- * The instruction smsw is emulated to return the value that the
>register CR0
>+ * The instruction SMSW is emulated to return the value that the
>register CR0
>  * has at boot time as set in the head_32.
>+ * SLDT and STR are emulated to return the values that the kernel
>programmatically
>+ * assigns:
>+ * - SLDT returns (GDT_ENTRY_LDT * 8) if an LDT has been set, 0 if
>not.
>+ * - STR returns (GDT_ENTRY_TSS * 8).
>  *
>  * Emulation is provided for both 32-bit and 64-bit processes.
>  *
>@@ -244,16 +245,34 @@ static int emulate_umip_insn(struct insn *insn,
>int umip_inst,
>   *data_size += UMIP_GDT_IDT_LIMIT_SIZE;
>   memcpy(data, _limit, UMIP_GDT_IDT_LIMIT_SIZE);
> 
>-  } else if (umip_inst == UMIP_INST_SMSW) {
>-  unsigned long dummy_value = CR0_STATE;
>+  } else if (umip_inst == UMIP_INST_SMSW || umip_inst == UMIP_INST_SLDT
>||
>+ umip_inst == UMIP_INST_STR) {
>+  unsigned long dummy_value;
>+
>+  if (umip_inst == UMIP_INST_SMSW) {
>+  dummy_value = CR0_STATE;
>+  } else if (umip_inst == UMIP_INST_STR) {
>+  dummy_value = GDT_ENTRY_TSS * 8;
>+  } else if (umip_inst == UMIP_INST_SLDT) {
>+#ifdef CONFIG_MODIFY_LDT_SYSCALL
>+  down_read(>mm->context.ldt_usr_sem);
>+  if (current->mm->context.ldt)
>+  dummy_value = GDT_ENTRY_LDT * 8;
>+  else
>+  dummy_value = 0;
>+  up_read(>mm->context.ldt_usr_sem);
>+#else
>+  dummy_value = 0;
>+#endif
>+  }
> 
>   /*
>-   * Even though the CR0 register has 4 bytes, the number
>+   * For these 3 instructions, the number
>* of bytes to be copied in the result buffer is determined
>* by whether the operand is a register or a memory location.
>* If operand is a register, return as many bytes as the operand
>* size. If operand is memory, return only the two least
>-   * siginificant bytes of CR0.
>+   * siginificant bytes.
>*/
>   if (X86_MODRM_MOD(insn->modrm.value) == 3)
>   *data_size = insn->opnd_bytes;
>@@ -261,7 +280,6 @@ static int emulate_umip_insn(struct insn *insn, int
>umip_inst,
>   *data_size = 2;
> 
>   memcpy(data, _value, *data_size);
>-  /* STR and SLDT  are not emulated */
>   } else {
>   return -EINVAL;
>   }
>@@ -383,10 +401,6 @@ bool fixup_umip_exception(struct pt_regs *regs)
>   umip_pr_warn(regs, "%s instruction cannot be used by applications.\n",
>   umip_insns[umip_inst]);
> 
>-  /* Do not emulate (spoof) SLDT or STR. */
>-  if (umip_inst == UMIP_INST_STR || umip_inst == UMIP_INST_SLDT)
>-  return false;
>-
>   umip_pr_warn(regs, "For now, expensive software emulation returns the
>result.\n");
> 
>   if (emulate_umip_insn(, umip_inst, dummy_data, _data_size,

Is there any reason for SLDT to not *always* return a fixed value? "An LDT
has been assigned" is formally a kernel-internal property, separate from the
property of whether there are user-space entries in the LDT.

-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.


Re: decruft the early init / initrd / initramfs code v2

2020-07-09 Thread hpa
On July 9, 2020 8:17:57 AM PDT, Christoph Hellwig  wrote:
>Hi all,
>
>this series starts to move the early init code away from requiring
>KERNEL_DS to be implicitly set during early startup.  It does so by
>first removing legacy unused cruft, and the switches away the code
>from struct file based APIs to our more usual in-kernel APIs.
>
>There is no really good tree for this, so if there are no objections
>I'd like to set up a new one for linux-next.
>
>
>Git tree:
>
>git://git.infradead.org/users/hch/misc.git init-user-pointers
>
>Gitweb:
>
>http://git.infradead.org/users/hch/misc.git/shortlog/refs/heads/init-user-pointers
>
>
>Changes since v1:
> - add a patch to deprecated "classic" initrd support
>
>Diffstat:
> b/arch/arm/kernel/atags_parse.c |2 
> b/arch/sh/kernel/setup.c|2 
> b/arch/sparc/kernel/setup_32.c  |2 
> b/arch/sparc/kernel/setup_64.c  |2 
> b/arch/x86/kernel/setup.c   |2 
> b/drivers/md/Makefile   |3 
>b/drivers/md/md-autodetect.c|  239
>++--
> b/drivers/md/md.c   |   34 +
> b/drivers/md/md.h   |   10 +
> b/fs/file.c |7 -
> b/fs/open.c |   18 +--
> b/fs/read_write.c   |2 
> b/fs/readdir.c  |   11 -
> b/include/linux/initrd.h|6 -
> b/include/linux/raid/detect.h   |8 +
> b/include/linux/syscalls.h  |   16 --
> b/init/Makefile |1 
> b/init/do_mounts.c  |   70 +--
> b/init/do_mounts.h  |   21 ---
> b/init/do_mounts_initrd.c   |   13 --
> b/init/do_mounts_rd.c   |  102 +++--
> b/init/initramfs.c  |  103 +
> b/init/main.c   |   16 +-
> include/linux/raid/md_u.h   |   13 --
> 24 files changed, 251 insertions(+), 452 deletions(-)

I guess I could say something here... ;)
-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.


Re: [PATCH 09/16] initrd: remove the BLKFLSBUF call in handle_initrd

2020-07-03 Thread hpa
On July 3, 2020 5:18:48 PM PDT, antlists  wrote:
>On 03/07/2020 04:40, H. Peter Anvin wrote:
>> On 2020-06-15 05:53, Christoph Hellwig wrote:
>>> BLKFLSBUF used to be overloaded for the ramdisk driver to free the
>whole
>>> ramdisk, which was completely different behavior compared to all
>other
>>> drivers.  But this magic overload got removed in commit ff26956875c2
>>> ("brd: remove support for BLKFLSBUF"), so this call is entirely
>>> pointless now.
>>>
>>> Signed-off-by: Christoph Hellwig 
>> 
>> Does *anyone* use initrd as opposed to initramfs anymore? It would
>seem
>> like a good candidate for deprecation/removal.
>> 
>Reading the gentoo mailing list, it seems there's a fair few people who
>
>don't use initramfs. I get the impression they don't use initrd either,
>
>though.
>
>I don't know too much about booting without an initramfs - I switched 
>ages ago - so what is possible and what they're actually doing, I don't
>
>know.
>
>Cheers,
>Wol

Not using any init userspace at all is an entirely different issue.
-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.


Re: [PATCH 1/2] x86/x32: Use __x64 prefix for X32 compat syscalls

2020-06-23 Thread hpa
On June 16, 2020 10:17:29 AM PDT, Brian Gerst  wrote:
>On Tue, Jun 16, 2020 at 12:49 PM Andy Lutomirski 
>wrote:
>>
>> On Tue, Jun 16, 2020 at 7:23 AM Brian Gerst 
>wrote:
>> >
>> > The ABI prefix for syscalls specifies the argument register
>mapping, so
>> > there is no specific reason to continue using the __x32 prefix for
>the
>> > compat syscalls.  This change will allow using native syscalls in
>the X32
>> > specific portion of the syscall table.
>>
>> Okay, I realize that the x86 syscall machinery is held together by
>> duct tape and a lot of luck, but:
>>
>> >
>> > Signed-off-by: Brian Gerst 
>> > ---
>> >  arch/x86/entry/syscall_x32.c   |  8 +++-
>> >  arch/x86/include/asm/syscall_wrapper.h | 10 +-
>> >  2 files changed, 8 insertions(+), 10 deletions(-)
>> >
>> > diff --git a/arch/x86/entry/syscall_x32.c
>b/arch/x86/entry/syscall_x32.c
>> > index 3d8d70d3896c..f993e6254043 100644
>> > --- a/arch/x86/entry/syscall_x32.c
>> > +++ b/arch/x86/entry/syscall_x32.c
>> > @@ -9,15 +9,13 @@
>> >  #include 
>> >
>> >  #define __SYSCALL_64(nr, sym)
>> > +#define __SYSCALL_COMMON(nr, sym) __SYSCALL_X32(nr, sym)
>> >
>> > -#define __SYSCALL_X32(nr, sym) extern long __x32_##sym(const
>struct pt_regs *);
>> > -#define __SYSCALL_COMMON(nr, sym) extern long __x64_##sym(const
>struct pt_regs *);
>> > +#define __SYSCALL_X32(nr, sym) extern long __x64_##sym(const
>struct pt_regs *);
>> >  #include 
>> >  #undef __SYSCALL_X32
>> > -#undef __SYSCALL_COMMON
>> >
>> > -#define __SYSCALL_X32(nr, sym) [nr] = __x32_##sym,
>> > -#define __SYSCALL_COMMON(nr, sym) [nr] = __x64_##sym,
>> > +#define __SYSCALL_X32(nr, sym) [nr] = __x64_##sym,
>> >
>> >  asmlinkage const sys_call_ptr_t
>x32_sys_call_table[__NR_x32_syscall_max+1] = {
>> > /*
>> > diff --git a/arch/x86/include/asm/syscall_wrapper.h
>b/arch/x86/include/asm/syscall_wrapper.h
>> > index a84333adeef2..267fae9904ff 100644
>> > --- a/arch/x86/include/asm/syscall_wrapper.h
>> > +++ b/arch/x86/include/asm/syscall_wrapper.h
>> > @@ -17,7 +17,7 @@ extern long __ia32_sys_ni_syscall(const struct
>pt_regs *regs);
>> >   * __x64_sys_*() - 64-bit native syscall
>> >   * __ia32_sys_*()- 32-bit native syscall or common compat
>syscall
>> >   * __ia32_compat_sys_*() - 32-bit compat syscall
>>
>> On a 64-bit kernel, an "ia32" compat syscall is __ia32_compat_sys_*,
>but...
>>
>> > - * __x32_compat_sys_*()  - 64-bit X32 compat syscall
>> > + * __x64_compat_sys_*()  - 64-bit X32 compat syscall
>>
>> Now an x32 compat syscall is __x64_compat?  This seems nonsensical.
>
>Again, think of it as how the registers are mapped, not which syscall
>table it belongs to.  X32 and X64 are identical in that regard.
>
>> I'm also a bit confused as to how this is even necessary for your
>> other patch.
>
>This came out of discussion on Cristoph's patch to combine compat
>execve*() into the native version:
>https://lore.kernel.org/lkml/20200615141239.ga12...@lst.de/
>
>The bottom line is that marking a syscall as X32-only in the syscall
>table forces an __x32 prefix even if it's not a "compat" syscall.
>This causes a link failure.  This is just another quirk caused by how
>X32 was designed.  The solution is to make the prefix consistent for
>the whole table.  The other alternative is to use __x32 for all the
>common syscalls.
>
>The second patch isn't really necessary, but it makes more sense to
>not have a compat syscall with no corresponding native version.
>
>--
>Brian Gerst

Please don't use "x64" to mean anything other than x86-64, as some, ahem, other 
OSes use those as synonyms.
-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.


Re: [PATCH] initrd: Remove erroneous comment

2020-06-22 Thread hpa
On June 19, 2020 5:03:33 PM PDT, ron minnich  wrote:
>It seems fine to me, but I did not initially object to the use of that
>name anyway. hpa, what do you think?
>
>On Fri, Jun 19, 2020 at 7:31 AM Tom Rini  wrote:
>>
>> Most architectures have been passing the location of an initrd via
>the
>> initrd= option since their inception.  Remove the comment as it's
>both
>> wrong and unrelated to the commit that introduced it.
>>
>> Fixes: 694cfd87b0c8 ("x86/setup: Add an initrdmem= option to specify
>initrd physical address")
>> Cc: Andrew Morton 
>> Cc: Borislav Petkov 
>> Cc: Dominik Brodowski 
>> Cc: H. Peter Anvin (Intel) 
>> Cc: Ronald G. Minnich 
>> Signed-off-by: Tom Rini 
>> ---
>> For a bit more context, I assume there's been some confusion between
>> "initrd" being a keyword in things like extlinux.conf and also that
>for
>> quite a long time now initrd information is passed via device tree
>and
>> not the command line on relevant architectures.  But it's still true
>> that it's been a valid command line option to the kernel since the
>90s.
>> It's just the case that in 2018 the code was consolidated from under
>> arch/ and in to this file.
>> ---
>>  init/do_mounts_initrd.c | 5 -
>>  1 file changed, 5 deletions(-)
>>
>> diff --git a/init/do_mounts_initrd.c b/init/do_mounts_initrd.c
>> index d72beda824aa..53314d7da4be 100644
>> --- a/init/do_mounts_initrd.c
>> +++ b/init/do_mounts_initrd.c
>> @@ -45,11 +45,6 @@ static int __init early_initrdmem(char *p)
>>  }
>>  early_param("initrdmem", early_initrdmem);
>>
>> -/*
>> - * This is here as the initrd keyword has been in use since 11/2018
>> - * on ARM, PowerPC, and MIPS.
>> - * It should not be; it is reserved for bootloaders.
>> - */
>>  static int __init early_initrd(char *p)
>>  {
>> return early_initrdmem(p);
>> --
>> 2.17.1
>>

Well, I observe that it has been documented as reserved for bootloaders since
at least the mid-90s.
-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.


Re: [PATCH RFC] seccomp: Implement syscall isolation based on memory areas

2020-06-01 Thread hpa
On June 1, 2020 6:59:26 AM PDT, Andy Lutomirski  wrote:
>
>
>> On Jun 1, 2020, at 2:23 AM, Billy Laws  wrote:
>> 
>> 
>>> 
>>> On May 30, 2020, at 5:26 PM, Gabriel Krisman Bertazi
> wrote:
>>> 
>>> Andy Lutomirski  writes:
>>> 
>>> On May 29, 2020, at 11:00 PM, Gabriel Krisman Bertazi
> wrote:
>> 
>> Modern Windows applications are executing system call
>instructions
>> directly from the application's code without going through the
>WinAPI.
>> This breaks Wine emulation, because it doesn't have a chance to
>> intercept and emulate these syscalls before they are submitted to
>Linux.
>> 
>> In addition, we cannot simply trap every system call of the
>application
>> to userspace using PTRACE_SYSEMU, because performance would
>suffer,
>> since our main use case is to run Windows games over Linux. 
>Therefore,
>> we need some in-kernel filtering to decide whether the syscall
>was
>> issued by the wine code or by the windows application.
 
 Do you really need in-kernel filtering?  What if you could have
 efficient userspace filtering instead?  That is, set something up
>so
 that all syscalls, except those from a special address, are
>translated
 to CALL thunk where the thunk is configured per task.  Then the
>thunk
 can do whatever emulation is needed.
>>> 
>>> Hi,
>>> 
>>> I suggested something similar to my customer, by using
>>> libsyscall-intercept.  The idea would be overwritting the syscall
>>> instruction with a call to the entry point.  I'm not a specialist on
>the
>>> specifics of Windows games, (cc'ed Paul Gofman, who can provide more
>>> details on that side), but as far as I understand, the reason why
>that
>>> is not feasible is that the anti-cheat protection in games will
>abort
>>> execution if the binary region was modified either on-disk or
>in-memory.
>>> 
>>> Is there some mechanism to do that without modiyfing the
>application?
>> 
>> Hi,
>> 
>> I work on an emulator for the Nintendo Switch that uses a similar
>technique,
>> in our testing it works very well and is much more performant than
>even
>> PTRACE_SYSEMU.
>> 
>> To work around DRM reading the memory contents I think mprotect could
>> be used, after patching the syscall a copy of the original code could
>be
>> kept somewhere in memory and the patched region mapped --X.
>> With this, any time the DRM attempts to read to the patched region
>and
>> perform integrity checks it will cause a segfault and a branch to the
>> signal handler. This handler can then return the contents of the
>original,
>> unpatched region to satisfy them checks.
>> 
>> Are memory contents checked by DRM solutions too often for this to be
>> performant?
>
>A bigger issue is that hardware support for —X is quite spotty. There
>is no x86 CPU that can do it cleanly in a bare metal setup, and client
>CPUs that can do it at all without hypervisor help may be nonexistent.
>I don’t know if the ARM situation is much better.
>
>> --
>> Billy Laws
>>> 
 Getting the details and especially the interaction with any seccomp
 filters that may be installed right could be tricky, but the
>performance
 should be decent, at least on non-PTI systems.
 
 (If we go this route, I suspect that the correct interaction with
 seccomp is that this type of redirection takes precedence over
>seccomp
 and seccomp filters are not invoked for redirected syscalls. After
>all,
 a redirected syscall is, functionally, not a syscall at all.)
 
>>> 
>>> 
>>> --
>>> Gabriel Krisman Bertazi

Running these things in a minimal VM container would allow this kind of
filtering/trapping to be done in the VMM, too. I don't know how many layers
deep you invoke native Linux libraries, and so whether the option would exist
to use out-of-range system call numbers for the Linux system calls?
-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.


Re: Re: [PATCH v12 00/18] Enable FSGSBASE instructions

2020-05-24 Thread hpa
On May 24, 2020 2:19:45 PM PDT, Sasha Levin  wrote:
>On Sun, May 24, 2020 at 12:45:18PM -0700, h...@zytor.com wrote:
>>There are legitimate reasons to write a root-hole module, the main one
>being able to test security features like SMAP. I have requested before
>a TAINT flag specifically for this purpose, because TAINT_CRAP is
>nowhere near explicit enough, and is also used for staging drivers.
>Call it TAINT_TOXIC or TAINT_ROOTHOLE; it should always be accompanied
>with a CRIT level alert.
>
>What I don't like about our current system of TAINT_* flags is that
>while we can improve it as much as we want, no one outside of the
>kernel
>tree seems to be using it. While Thomas may have been commenting on
>Graphene's behaviour, look at any other code that did the same thing:
>
>- Graphene:
>https://github.com/oscarlab/graphene-sgx-driver/blob/master/gsgx.c
>- Occlum:
>https://github.com/occlum/enable_rdfsbase/blob/master/enable_rdfsbase.c
>- SGX-LKL:
>https://github.com/lsds/sgx-lkl/blob/master/tools/kmod-set-fsgsbase/mod_set_cr4_fsgsbase.c
>
>None of which set even the CRAP flag.

That's a separate problem, but I would personally want to have it for my own 
test modules in case one ever escapes.
-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.


Re: Re: [PATCH v12 00/18] Enable FSGSBASE instructions

2020-05-24 Thread hpa
On May 22, 2020 5:45:39 PM PDT, Thomas Gleixner  wrote:
>Don,
>
>Don Porter  writes:
>> On 5/19/20 12:48 PM, Jarkko Sakkinen wrote:
>>> On Tue, May 19, 2020 at 01:03:25AM +0200, Thomas Gleixner wrote:

 That justifies to write books which recommend to load a kernel
>module
 which creates a full unpriviledged root hole. I bet none of these
>papers
 ever mentioned that.
>>
>> I wanted to clarify that we never intended the Graphene kernel module
>
>> you mention for production use, as well as to comment in support of
>this 
>> patch.
>
>let me clarify, that despite your intentions:
>
>   - there is not a single word in any paper, slide deck, documentation
> etc. which mentions that loading this module and enabling FSGSBASE
>  behind the kernels back is a fully unpriviledged root hole.
>
>- the module lacks a big fat warning emitted to dmesg, that this
>  turns the host kernel into a complete security disaster.
>
>- the module fails to set the TAINT_CRAP flag when initialized.
>
>This shows a pretty obvious discrepancy between intention and action.
>
>> Setting the fs register in userspace is an essential feature for
>running 
>> legacy code in SGX.  We have been following LKML discussions on this 
>> instruction for years, and hoping this feature would be supported by 
>> Linux, so that we can retire this module.
>
>The way to get things done in the kernel is to actively work on the
>problem. Hoping that someone else will fix that for you is naive at
>best. Wilful ignorance might be a less polite but nevertheless accurate
>term.
>
>> To our knowledge, every SGX library OS has a similar module, waiting
>> for this or a similar patch to be merged into Linux.  This indicates
>a
>> growing user base that needs this instruction.
>
>I'm failing to understand that a whole industry which is so confident
>about their ultimate solution to the security problem puts possible
>users and customers into the situation to decide between:
>
> 1) Secure host kernel (with known limitations)
>
> 2) SGX enclaves
>
>I would not mind if this would be a choice between fire and frying pan,
>but this is a choice between a well understood reality and a very
>dangerous illusion.
>
>> Nonetheless, Graphene is moving towards adoption in production
>systems, 
>> and we are actively working to make the code base secure and robust. 
>> This issue has been on our to-do list before a production release. 
>It 
>> would certainly make our lives easier to deprecate our module and
>just 
>> use a robust, in-kernel implementation.
>
>Would make your life easier?
>
>Having proper in kernel FSGSBASE support is the only solution to that
>problem and this has been true since the whole SGX frenzy started.
>Intel
>failed to upstream FSGSBASE since 2015 (sic!). See
>
>https://lore.kernel.org/lkml/alpine.deb.2.21.1903261010380.1...@nanos.tec.linutronix.de/
>
>for a detailed time line. And that mail is more than a year old.
>
>Since then there happened even more trainwrecks including the revert of
>already queued patches a few days before the 5.3 merge window opened.
>
>After that we saw yet more broken variants of that patch set including
>the fail to provide information which is required to re-merge that.
>
>Instead of providing that information the next version re-introduced
>the
>wreckage which was carefully sorted out during earlier review cycles up
>to the revert.
>
>So you (and everybody else who has interrest in SGX) just sat there,
>watched and hoped that this will solve itself magically. And with that
>"hope" argument you really want to make me believe that all of this was
>against your intentions?
>
>It's beyond hillarious that the renewed attempt to get FSGSBASE support
>merged does not come from the company which has the main interest to
>get
>this solved, i.e Intel.
>
>Based on your argumentation that all of this is uninteded, I assume
>that
>the pull request on github which removes this security hole from
>graphene:
>
>https://github.com/oscarlab/graphene/pull/1529
>
>is perfectly fine, right?
>
>Quite the contrary, it's completely usesless and at the same time
>perfectly fitting into this picture:
>
>  The changelog is SGX marketing compliant: Zero technical content. Not
>  a single word about the real implications of that blantant violation
>  of any principle of sane (security) engineering.
>
>Not that I'm surprised about this. That change originates from Intel
>and
>the poor sod who had to place the pull request - coincidentally a few
>days after this insanity became public - was not allowed to spell out
>the real reasons why this removal is necessary.
>
>Please read security relevant changelogs in the kernel git tree and
>then
>explain to me the utter void in this one.
>
>Looking at the advertising which all involved parties including the
>Confidential Computing Consortium are conducting, plus the fact that
>Intel has major investments in SGX supporting companies and projects,
>this is one of 

RE: [PATCH] x86: bitops: fix build regression

2020-05-10 Thread hpa
On May 10, 2020 4:59:17 AM PDT, David Laight  wrote:
>From: Peter Anvin
>> Sent: 08 May 2020 18:32
>> On 2020-05-08 10:21, Nick Desaulniers wrote:
>> >>
>> >> One last suggestion.  Add the "b" modifier to the mask operand:
>"orb
>> >> %b1, %0".  That forces the compiler to use the 8-bit register name
>> >> instead of trying to deduce the width from the input.
>> >
>> > Ah right:
>https://gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html#x86Operandmodifiers
>> >
>> > Looks like that works for both compilers.  In that case, we can
>likely
>> > drop the `& 0xff`, too.  Let me play with that, then I'll hopefully
>> > send a v3 today.
>> >
>> 
>> Good idea. I requested a while ago that they document these
>modifiers; they
>> chose not to document them all which in some ways is good; it shows
>what they
>> are willing to commit to indefinitely.
>
>I thought the intention here was to explicitly do a byte access.
>If the constant bit number has had a div/mod by 8 done on it then
>the address can be misaligned - so you mustn't do a non-byte sized
>locked access.
>
>OTOH the original base address must be aligned.
>
>Looking at some instruction timing, BTS/BTR aren't too bad if the
>bit number is a constant. But are 6 or 7 clocks slower if it is in %cl.
>Given these are locked RMW bus cycles they'll always be slow!
>
>How about an asm multi-part alternative that uses a byte offset
>and byte constant if the compiler thinks the mask is constant
>or a 4-byte offset and 32bit mask if it doesn't.
>
>The other alternative is to just use BTS/BTS and (maybe) rely on the
>assembler to add in the word offset to the base address.
>
>   David
>
>-
>Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes,
>MK1 1PT, UK
>Registration No: 1397386 (Wales)

I don't understand what you are getting at here.

The intent is to do a byte access. The "multi-part asm" you are talking about 
is also already there...
-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.


Re: [PATCH] x86: Use RDRAND and RDSEED mnemonics in archrandom.h

2020-05-08 Thread hpa
On May 8, 2020 3:58:17 AM PDT, Uros Bizjak  wrote:
>Current minimum required version of binutils is 2.23,
>which supports RDRAND and RDSEED instruction mnemonics.
>
>Replace the byte-wise specification of RDRAND and
>RDSEED with these proper mnemonics.
>
>Signed-off-by: Uros Bizjak 
>CC: "H. Peter Anvin" 
>CC: Ingo Molnar 
>CC: Thomas Gleixner 
>---
> arch/x86/include/asm/archrandom.h | 26 --
> 1 file changed, 8 insertions(+), 18 deletions(-)
>
>diff --git a/arch/x86/include/asm/archrandom.h
>b/arch/x86/include/asm/archrandom.h
>index 7a4bb1bd4bdb..ebc248e49549 100644
>--- a/arch/x86/include/asm/archrandom.h
>+++ b/arch/x86/include/asm/archrandom.h
>@@ -15,16 +15,6 @@
> 
> #define RDRAND_RETRY_LOOPS10
> 
>-#define RDRAND_INT".byte 0x0f,0xc7,0xf0"
>-#define RDSEED_INT".byte 0x0f,0xc7,0xf8"
>-#ifdef CONFIG_X86_64
>-# define RDRAND_LONG  ".byte 0x48,0x0f,0xc7,0xf0"
>-# define RDSEED_LONG  ".byte 0x48,0x0f,0xc7,0xf8"
>-#else
>-# define RDRAND_LONG  RDRAND_INT
>-# define RDSEED_LONG  RDSEED_INT
>-#endif
>-
> /* Unconditional execution of RDRAND and RDSEED */
> 
> static inline bool __must_check rdrand_long(unsigned long *v)
>@@ -32,9 +22,9 @@ static inline bool __must_check rdrand_long(unsigned
>long *v)
>   bool ok;
>   unsigned int retry = RDRAND_RETRY_LOOPS;
>   do {
>-  asm volatile(RDRAND_LONG
>+  asm volatile("rdrand %[out]"
>CC_SET(c)
>-   : CC_OUT(c) (ok), "=a" (*v));
>+   : CC_OUT(c) (ok), [out] "=r" (*v));
>   if (ok)
>   return true;
>   } while (--retry);
>@@ -46,9 +36,9 @@ static inline bool __must_check rdrand_int(unsigned
>int *v)
>   bool ok;
>   unsigned int retry = RDRAND_RETRY_LOOPS;
>   do {
>-  asm volatile(RDRAND_INT
>+  asm volatile("rdrand %[out]"
>CC_SET(c)
>-   : CC_OUT(c) (ok), "=a" (*v));
>+   : CC_OUT(c) (ok), [out] "=r" (*v));
>   if (ok)
>   return true;
>   } while (--retry);
>@@ -58,18 +48,18 @@ static inline bool __must_check rdrand_int(unsigned
>int *v)
> static inline bool __must_check rdseed_long(unsigned long *v)
> {
>   bool ok;
>-  asm volatile(RDSEED_LONG
>+  asm volatile("rdseed %[out]"
>CC_SET(c)
>-   : CC_OUT(c) (ok), "=a" (*v));
>+   : CC_OUT(c) (ok), [out] "=r" (*v));
>   return ok;
> }
> 
> static inline bool __must_check rdseed_int(unsigned int *v)
> {
>   bool ok;
>-  asm volatile(RDSEED_INT
>+  asm volatile("rdseed %[out]"
>CC_SET(c)
>-   : CC_OUT(c) (ok), "=a" (*v));
>+   : CC_OUT(c) (ok), [out] "=r" (*v));
>   return ok;
> }
> 

Reviewed-by: H. Peter Anvin (Intel) 
-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.


RE: [PATCH] x86: bitops: fix build regression

2020-05-07 Thread hpa
On May 7, 2020 8:09:35 AM PDT, David Laight  wrote:
>From: Brian Gerst
>> Sent: 07 May 2020 14:32
>...
>> I think the bug this worked around was that the compiler didn't
>detect
>> that CONST_MASK(nr) was also constant and doesn't need to be put into
>> a register.  The question is does that bug still exist on compiler
>> versions we care about?
>
>Hmmm...
>That ought to have been fixed instead of worrying about the fact
>that an invalid register was used.
>
>Alternatively is there any reason not to use the bts/btc instructions?
>Yes, I know they'll do wider accesses, but variable bit numbers do.
>It is also possible that the assembler will support constant bit
>numbers >= 32 by adding to the address offset.
>
>   David
>
>-
>Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes,
>MK1 1PT, UK
>Registration No: 1397386 (Wales)

They're slower, and for unaligned locked fields can be severely so.
-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.


Re: [PATCH] x86: bitops: fix build regression

2020-05-07 Thread hpa
On May 7, 2020 6:32:24 AM PDT, Brian Gerst  wrote:
>On Thu, May 7, 2020 at 3:02 AM  wrote:
>>
>> On May 6, 2020 11:18:09 PM PDT, Brian Gerst 
>wrote:
>> >On Tue, May 5, 2020 at 1:47 PM Nick Desaulniers
>> > wrote:
>> >>
>> >> From: Sedat Dilek 
>> >>
>> >> It turns out that if your config tickles __builtin_constant_p via
>> >> differences in choices to inline or not, this now produces invalid
>> >> assembly:
>> >>
>> >> $ cat foo.c
>> >> long a(long b, long c) {
>> >>   asm("orb\t%1, %0" : "+q"(c): "r"(b));
>> >>   return c;
>> >> }
>> >> $ gcc foo.c
>> >> foo.c: Assembler messages:
>> >> foo.c:2: Error: `%rax' not allowed with `orb'
>> >>
>> >> The "q" constraint only has meanting on -m32 otherwise is treated
>as
>> >> "r".
>> >>
>> >> This is easily reproducible via
>> >Clang+CONFIG_STAGING=y+CONFIG_VT6656=m,
>> >> or Clang+allyesconfig.
>> >>
>> >> Keep the masking operation to appease sparse (`make C=1`), add
>back
>> >the
>> >> cast in order to properly select the proper 8b register alias.
>> >>
>> >>  [Nick: reworded]
>> >>
>> >> Cc: sta...@vger.kernel.org
>> >> Cc: Jesse Brandeburg 
>> >> Link: https://github.com/ClangBuiltLinux/linux/issues/961
>> >> Link:
>> >https://lore.kernel.org/lkml/20200504193524.ga221...@google.com/
>> >> Fixes: 1651e700664b4 ("x86: Fix bitops.h warning with a moved
>cast")
>> >> Reported-by: Sedat Dilek 
>> >> Reported-by: kernelci.org bot 
>> >> Suggested-by: Andy Shevchenko 
>> >> Suggested-by: Ilie Halip 
>> >> Tested-by: Sedat Dilek 
>> >> Signed-off-by: Sedat Dilek 
>> >> Signed-off-by: Nick Desaulniers 
>> >> ---
>> >>  arch/x86/include/asm/bitops.h | 4 ++--
>> >>  1 file changed, 2 insertions(+), 2 deletions(-)
>> >>
>> >> diff --git a/arch/x86/include/asm/bitops.h
>> >b/arch/x86/include/asm/bitops.h
>> >> index b392571c1f1d..139122e5b25b 100644
>> >> --- a/arch/x86/include/asm/bitops.h
>> >> +++ b/arch/x86/include/asm/bitops.h
>> >> @@ -54,7 +54,7 @@ arch_set_bit(long nr, volatile unsigned long
>*addr)
>> >> if (__builtin_constant_p(nr)) {
>> >> asm volatile(LOCK_PREFIX "orb %1,%0"
>> >> : CONST_MASK_ADDR(nr, addr)
>> >> -   : "iq" (CONST_MASK(nr) & 0xff)
>> >> +   : "iq" ((u8)(CONST_MASK(nr) & 0xff))
>> >
>> >I think a better fix would be to make CONST_MASK() return a u8 value
>> >rather than have to cast on every use.
>> >
>> >Also I question the need for the "q" constraint.  It was added in
>> >commit 437a0a54 as a workaround for GCC 3.4.4.  Now that the minimum
>> >GCC version is 4.6, is this still necessary?
>> >
>> >--
>> >Brian Gerst
>>
>> Yes, "q" is needed on i386.
>
>I think the bug this worked around was that the compiler didn't detect
>that CONST_MASK(nr) was also constant and doesn't need to be put into
>a register.  The question is does that bug still exist on compiler
>versions we care about?
>
>--
>Brian Gerst

The compiler is free to do that, including for legit reasons (common 
subexpression elimination, especially.) So yes.
-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.


Re: [PATCH] x86: bitops: fix build regression

2020-05-07 Thread hpa
On May 7, 2020 4:34:22 AM PDT, Peter Zijlstra  wrote:
>On Tue, May 05, 2020 at 11:07:24AM -0700, h...@zytor.com wrote:
>> On May 5, 2020 10:44:22 AM PDT, Nick Desaulniers
> wrote:
>
>> >@@ -54,7 +54,7 @@ arch_set_bit(long nr, volatile unsigned long
>*addr)
>> >if (__builtin_constant_p(nr)) {
>> >asm volatile(LOCK_PREFIX "orb %1,%0"
>> >: CONST_MASK_ADDR(nr, addr)
>> >-   : "iq" (CONST_MASK(nr) & 0xff)
>> >+   : "iq" ((u8)(CONST_MASK(nr) & 0xff))
>> >: "memory");
>> >} else {
>> >asm volatile(LOCK_PREFIX __ASM_SIZE(bts) " %1,%0"
>> >@@ -74,7 +74,7 @@ arch_clear_bit(long nr, volatile unsigned long
>*addr)
>> >if (__builtin_constant_p(nr)) {
>> >asm volatile(LOCK_PREFIX "andb %1,%0"
>> >: CONST_MASK_ADDR(nr, addr)
>> >-   : "iq" (CONST_MASK(nr) ^ 0xff));
>> >+   : "iq" ((u8)(CONST_MASK(nr) ^ 0xff)));
>> >} else {
>> >asm volatile(LOCK_PREFIX __ASM_SIZE(btr) " %1,%0"
>> >: : RLONG_ADDR(addr), "Ir" (nr) : "memory");
>> 
>> Drop & 0xff and change ^ 0xff to ~.
>
>But then we're back to sparse being unhappy, no? The thing with ~ is
>that it will set high bits which will be truncated, which makes sparse
>sad.

In that case, sparse is just broken.
-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.


RE: [PATCH] x86: bitops: fix build regression

2020-05-07 Thread hpa
On May 7, 2020 1:35:01 AM PDT, David Laight  wrote:
>From: h...@zytor.com
>> Sent: 07 May 2020 08:59
>> On May 7, 2020 12:44:44 AM PDT, David Laight
> wrote:
>> >From: Brian Gerst
>> >> Sent: 07 May 2020 07:18
>> >...
>> >> > --- a/arch/x86/include/asm/bitops.h
>> >> > +++ b/arch/x86/include/asm/bitops.h
>> >> > @@ -54,7 +54,7 @@ arch_set_bit(long nr, volatile unsigned long
>> >*addr)
>> >> > if (__builtin_constant_p(nr)) {
>> >> > asm volatile(LOCK_PREFIX "orb %1,%0"
>> >> > : CONST_MASK_ADDR(nr, addr)
>> >> > -   : "iq" (CONST_MASK(nr) & 0xff)
>> >> > +   : "iq" ((u8)(CONST_MASK(nr) & 0xff))
>> >>
>> >> I think a better fix would be to make CONST_MASK() return a u8
>value
>> >> rather than have to cast on every use.
>> >
>> >Or assign to a local variable - then it doesn't matter how
>> >the value is actually calculated. So:
>> >u8 mask = CONST_MASK(nr);
>> >asm volatile(LOCK_PREFIX "orb %1,%0"
>> >: CONST_MASK_ADDR(nr, addr)
>> >: "iq" mask
>> >
>> >David
>> >
>> >-
>> >Registered Address Lakeside, Bramley Road, Mount Farm, Milton
>Keynes,
>> >MK1 1PT, UK
>> >Registration No: 1397386 (Wales)
>> 
>> "const u8" please...
>
>Why, just a waste of disk space.
>
>   David
>
>-
>Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes,
>MK1 1PT, UK
>Registration No: 1397386 (Wales)

Descriptive.
-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.


RE: [PATCH] x86: bitops: fix build regression

2020-05-07 Thread hpa
On May 7, 2020 12:44:44 AM PDT, David Laight  wrote:
>From: Brian Gerst
>> Sent: 07 May 2020 07:18
>...
>> > --- a/arch/x86/include/asm/bitops.h
>> > +++ b/arch/x86/include/asm/bitops.h
>> > @@ -54,7 +54,7 @@ arch_set_bit(long nr, volatile unsigned long
>*addr)
>> > if (__builtin_constant_p(nr)) {
>> > asm volatile(LOCK_PREFIX "orb %1,%0"
>> > : CONST_MASK_ADDR(nr, addr)
>> > -   : "iq" (CONST_MASK(nr) & 0xff)
>> > +   : "iq" ((u8)(CONST_MASK(nr) & 0xff))
>> 
>> I think a better fix would be to make CONST_MASK() return a u8 value
>> rather than have to cast on every use.
>
>Or assign to a local variable - then it doesn't matter how
>the value is actually calculated. So:
>   u8 mask = CONST_MASK(nr);
>   asm volatile(LOCK_PREFIX "orb %1,%0"
>   : CONST_MASK_ADDR(nr, addr)
>   : "iq" mask
>
>   David
>
>-
>Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes,
>MK1 1PT, UK
>Registration No: 1397386 (Wales)

"const u8" please...
-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.


Re: [PATCH] x86: bitops: fix build regression

2020-05-07 Thread hpa
On May 6, 2020 11:18:09 PM PDT, Brian Gerst  wrote:
>On Tue, May 5, 2020 at 1:47 PM Nick Desaulniers
> wrote:
>>
>> From: Sedat Dilek 
>>
>> It turns out that if your config tickles __builtin_constant_p via
>> differences in choices to inline or not, this now produces invalid
>> assembly:
>>
>> $ cat foo.c
>> long a(long b, long c) {
>>   asm("orb\t%1, %0" : "+q"(c): "r"(b));
>>   return c;
>> }
>> $ gcc foo.c
>> foo.c: Assembler messages:
>> foo.c:2: Error: `%rax' not allowed with `orb'
>>
>> The "q" constraint only has meanting on -m32 otherwise is treated as
>> "r".
>>
>> This is easily reproducible via
>Clang+CONFIG_STAGING=y+CONFIG_VT6656=m,
>> or Clang+allyesconfig.
>>
>> Keep the masking operation to appease sparse (`make C=1`), add back
>the
>> cast in order to properly select the proper 8b register alias.
>>
>>  [Nick: reworded]
>>
>> Cc: sta...@vger.kernel.org
>> Cc: Jesse Brandeburg 
>> Link: https://github.com/ClangBuiltLinux/linux/issues/961
>> Link:
>https://lore.kernel.org/lkml/20200504193524.ga221...@google.com/
>> Fixes: 1651e700664b4 ("x86: Fix bitops.h warning with a moved cast")
>> Reported-by: Sedat Dilek 
>> Reported-by: kernelci.org bot 
>> Suggested-by: Andy Shevchenko 
>> Suggested-by: Ilie Halip 
>> Tested-by: Sedat Dilek 
>> Signed-off-by: Sedat Dilek 
>> Signed-off-by: Nick Desaulniers 
>> ---
>>  arch/x86/include/asm/bitops.h | 4 ++--
>>  1 file changed, 2 insertions(+), 2 deletions(-)
>>
>> diff --git a/arch/x86/include/asm/bitops.h
>b/arch/x86/include/asm/bitops.h
>> index b392571c1f1d..139122e5b25b 100644
>> --- a/arch/x86/include/asm/bitops.h
>> +++ b/arch/x86/include/asm/bitops.h
>> @@ -54,7 +54,7 @@ arch_set_bit(long nr, volatile unsigned long *addr)
>> if (__builtin_constant_p(nr)) {
>> asm volatile(LOCK_PREFIX "orb %1,%0"
>> : CONST_MASK_ADDR(nr, addr)
>> -   : "iq" (CONST_MASK(nr) & 0xff)
>> +   : "iq" ((u8)(CONST_MASK(nr) & 0xff))
>
>I think a better fix would be to make CONST_MASK() return a u8 value
>rather than have to cast on every use.
>
>Also I question the need for the "q" constraint.  It was added in
>commit 437a0a54 as a workaround for GCC 3.4.4.  Now that the minimum
>GCC version is 4.6, is this still necessary?
>
>--
>Brian Gerst

Yes, "q" is needed on i386.
-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.


Re: [PATCH v2] bpf, i386: remove unneeded conversion to bool

2020-05-06 Thread hpa
On May 6, 2020 7:03:52 AM PDT, Jason Yan  wrote:
>The '==' expression itself is bool, no need to convert it to bool
>again.
>This fixes the following coccicheck warning:
>
>arch/x86/net/bpf_jit_comp32.c:1478:50-55: WARNING: conversion to bool
>not needed here
>arch/x86/net/bpf_jit_comp32.c:1479:50-55: WARNING: conversion to bool
>not needed here
>
>Signed-off-by: Jason Yan 
>---
> v2: change the name 'x32' to 'i386'.
>
> arch/x86/net/bpf_jit_comp32.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
>diff --git a/arch/x86/net/bpf_jit_comp32.c
>b/arch/x86/net/bpf_jit_comp32.c
>index 66cd150b7e54..96fde03aa987 100644
>--- a/arch/x86/net/bpf_jit_comp32.c
>+++ b/arch/x86/net/bpf_jit_comp32.c
>@@ -1475,8 +1475,8 @@ static int do_jit(struct bpf_prog *bpf_prog, int
>*addrs, u8 *image,
>   for (i = 0; i < insn_cnt; i++, insn++) {
>   const s32 imm32 = insn->imm;
>   const bool is64 = BPF_CLASS(insn->code) == BPF_ALU64;
>-  const bool dstk = insn->dst_reg == BPF_REG_AX ? false : true;
>-  const bool sstk = insn->src_reg == BPF_REG_AX ? false : true;
>+  const bool dstk = insn->dst_reg != BPF_REG_AX;
>+  const bool sstk = insn->src_reg != BPF_REG_AX;
>   const u8 code = insn->code;
>   const u8 *dst = bpf2ia32[insn->dst_reg];
>   const u8 *src = bpf2ia32[insn->src_reg];

"foo ? true : false" is also far better written !!foo when it isn't totally 
redundant.
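
A contrived illustration, for the record (both functions return the same
values; the second spells it the idiomatic way):

	#include <stdbool.h>

	static bool has_high_bits_verbose(unsigned int reg)
	{
		return (reg & 0xf0) ? true : false;
	}

	static bool has_high_bits(unsigned int reg)
	{
		return !!(reg & 0xf0);	/* !! normalizes a non-bool value to 0/1 */
	}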
-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.


Re: [PATCH] x86: bitops: fix build regression

2020-05-05 Thread hpa
On May 5, 2020 10:44:22 AM PDT, Nick Desaulniers  
wrote:
>From: Sedat Dilek 
>
>It turns out that if your config tickles __builtin_constant_p via
>differences in choices to inline or not, this now produces invalid
>assembly:
>
>$ cat foo.c
>long a(long b, long c) {
>  asm("orb\t%1, %0" : "+q"(c): "r"(b));
>  return c;
>}
>$ gcc foo.c
>foo.c: Assembler messages:
>foo.c:2: Error: `%rax' not allowed with `orb'
>
>The "q" constraint only has meanting on -m32 otherwise is treated as
>"r".
>
>This is easily reproducible via Clang+CONFIG_STAGING=y+CONFIG_VT6656=m,
>or Clang+allyesconfig.
>
>Keep the masking operation to appease sparse (`make C=1`), add back the
>cast in order to properly select the proper 8b register alias.
>
> [Nick: reworded]
>
>Cc: sta...@vger.kernel.org
>Cc: Jesse Brandeburg 
>Link: https://github.com/ClangBuiltLinux/linux/issues/961
>Link: https://lore.kernel.org/lkml/20200504193524.ga221...@google.com/
>Fixes: 1651e700664b4 ("x86: Fix bitops.h warning with a moved cast")
>Reported-by: Sedat Dilek 
>Reported-by: kernelci.org bot 
>Suggested-by: Andy Shevchenko 
>Suggested-by: Ilie Halip 
>Tested-by: Sedat Dilek 
>Signed-off-by: Sedat Dilek 
>Signed-off-by: Nick Desaulniers 
>---
> arch/x86/include/asm/bitops.h | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
>diff --git a/arch/x86/include/asm/bitops.h
>b/arch/x86/include/asm/bitops.h
>index b392571c1f1d..139122e5b25b 100644
>--- a/arch/x86/include/asm/bitops.h
>+++ b/arch/x86/include/asm/bitops.h
>@@ -54,7 +54,7 @@ arch_set_bit(long nr, volatile unsigned long *addr)
>   if (__builtin_constant_p(nr)) {
>   asm volatile(LOCK_PREFIX "orb %1,%0"
>   : CONST_MASK_ADDR(nr, addr)
>-  : "iq" (CONST_MASK(nr) & 0xff)
>+  : "iq" ((u8)(CONST_MASK(nr) & 0xff))
>   : "memory");
>   } else {
>   asm volatile(LOCK_PREFIX __ASM_SIZE(bts) " %1,%0"
>@@ -74,7 +74,7 @@ arch_clear_bit(long nr, volatile unsigned long *addr)
>   if (__builtin_constant_p(nr)) {
>   asm volatile(LOCK_PREFIX "andb %1,%0"
>   : CONST_MASK_ADDR(nr, addr)
>-  : "iq" (CONST_MASK(nr) ^ 0xff));
>+  : "iq" ((u8)(CONST_MASK(nr) ^ 0xff)));
>   } else {
>   asm volatile(LOCK_PREFIX __ASM_SIZE(btr) " %1,%0"
>   : : RLONG_ADDR(addr), "Ir" (nr) : "memory");

Drop & 0xff and change ^ 0xff to ~.

The redundancy is confusing.
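
Concretely, something like this (a sketch against the hunks quoted above,
reusing CONST_MASK() and CONST_MASK_ADDR() unchanged; the byte-sized operand
of orb/andb already truncates, so the extra masking adds nothing but noise):

	asm volatile(LOCK_PREFIX "orb %1,%0"
		     : CONST_MASK_ADDR(nr, addr)
		     : "iq" ((u8)CONST_MASK(nr))
		     : "memory");

	asm volatile(LOCK_PREFIX "andb %1,%0"
		     : CONST_MASK_ADDR(nr, addr)
		     : "iq" ((u8)~CONST_MASK(nr)));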
-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.


RE: [RFD] x86/split_lock: Request to Intel

2019-10-18 Thread hpa
On October 18, 2019 3:45:14 AM PDT, David Laight  
wrote:
>From: Luck, Tony
>> Sent: 18 October 2019 00:28
>...
>> 2) Fix set_bit() et. al. to not issue atomic operations that cross
>boundaries.
>> 
>> Fenghua had been pursuing option #1 in previous iterations. He found
>a few
>> more places with the help of the "grep" patterns suggested by David
>Laight.
>> So that path is up to ~8 patches now that do one of:
>>  + Change from u32 to u64
>>  + Force alignment with a union with a u64
>>  + Change to non-atomic (places that didn't need atomic)
>> 
>> Downside of strategy #1 is that people will add new misaligned cases
>in the
>> future. So this process has no defined end point.
>> 
>> Strategy #2 begun when I looked at the split-lock issue I saw that
>with a
>> constant bit argument set_bit() just does a "ORB" on the affected
>byte (i.e.
>> no split lock). Similar for clear_bit() and change_bit(). Changing
>code to also
>> do that for the variable bit case is easy.
>> 
>> test_and_clr_bit() needs more care, but luckily, we had Peter Anvin
>nearby
>> to give us a neat solution.
>
>Changing the x86-64 bitops to use 32bit memory cycles is trivial
>(provided you are willing to accept a limit of 2G bits).
>
>OTOH this only works because x86 is LE.
>On any BE systems passing an 'int []' to any of the bit-functions is so
>terribly
>wrong it is unbelievable.
>
>So changing the x86-64 bitops is largely papering over a crack.
>
>In essence any code that casts the argument to any of the bitops
>functions
>is almost certainly badly broken on BE systems.
>
>The x86 cpu features code is always LE.
>It probably ought to have a typedef for a union of long [] and int [].
>
>   David
>
>-
>Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes,
>MK1 1PT, UK
>Registration No: 1397386 (Wales)

One thing I suggested is that we should actually expose the violations at
compile time, either by wrapping them in macros using __alignof__ and/or by
making the kernel compile with -Wcast-align.
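
As a sketch of the compile-time check I have in mind (illustrative only, not
an existing kernel macro):

	/*
	 * Wrap the bitop so callers pass their real pointer type; a bitmap
	 * whose element type is less aligned than unsigned long then fails
	 * the build instead of producing a potentially split-locked access.
	 */
	#define checked_set_bit(nr, addr)					\
	do {									\
		BUILD_BUG_ON(__alignof__(*(addr)) <				\
			     __alignof__(unsigned long));			\
		set_bit((nr), (unsigned long *)(addr));			\
	} while (0)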

On x86 the btsl/btcl/btrl instructions can be used without limiting to 2 Gbit
if the address is computed, the way one does for plain and, or, etc. However,
if the real types of the arguments are exposed then it is possible to do
better.

Finally, as far as bigendian is concerned: the problem Linux has on bigendian 
machines is that it tries to use littleendian bitmaps on bigendian machines: on 
bigendian machines, bit 0 is naturally the MSB. If your reaction is "but that 
is absurd", then you have just grokked why bigendian is fundamentally broken.
-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.


Re: [PATCH v2 1/2] x86/boot/64: Make level2_kernel_pgt pages invalid outside kernel area.

2019-09-23 Thread hpa
On September 23, 2019 11:15:20 AM PDT, Steve Wahl  wrote:
>Our hardware (UV aka Superdome Flex) has address ranges marked
>reserved by the BIOS. Access to these ranges is caught as an error,
>causing the BIOS to halt the system.
>
>Initial page tables mapped a large range of physical addresses that
>were not checked against the list of BIOS reserved addresses, and
>sometimes included reserved addresses in part of the mapped range.
>Including the reserved range in the map allowed processor speculative
>accesses to the reserved range, triggering a BIOS halt.
>
>Used early in booting, the page table level2_kernel_pgt addresses 1
>GiB divided into 2 MiB pages, and it was set up to linearly map a full
>1 GiB of physical addresses that included the physical address range
>of the kernel image, as chosen by KASLR.  But this also included a
>large range of unused addresses on either side of the kernel image.
>And unlike the kernel image's physical address range, this extra
>mapped space was not checked against the BIOS tables of usable RAM
>addresses.  So there were times when the addresses chosen by KASLR
>would result in processor accessible mappings of BIOS reserved
>physical addresses.
>
>The kernel code did not directly access any of this extra mapped
>space, but having it mapped allowed the processor to issue speculative
>accesses into reserved memory, causing system halts.
>
>This was encountered somewhat rarely on a normal system boot, and much
>more often when starting the crash kernel if "crashkernel=512M,high"
>was specified on the command line (this heavily restricts the physical
>address of the crash kernel, in our case usually within 1 GiB of
>reserved space).
>
>The solution is to invalidate the pages of this table outside the
>kernel image's space before the page table is activated.  This patch
>has been validated to fix this problem on our hardware.
>
>Signed-off-by: Steve Wahl 
>Cc: sta...@vger.kernel.org
>---
>Changes since v1:
>  * Added comment.
>  * Reworked changelog text.
>
> arch/x86/kernel/head64.c | 18 --
> 1 file changed, 16 insertions(+), 2 deletions(-)
>
>diff --git a/arch/x86/kernel/head64.c b/arch/x86/kernel/head64.c
>index 29ffa495bd1c..ee9d0e3e0c46 100644
>--- a/arch/x86/kernel/head64.c
>+++ b/arch/x86/kernel/head64.c
>@@ -222,13 +222,27 @@ unsigned long __head __startup_64(unsigned long
>physaddr,
>* we might write invalid pmds, when the kernel is relocated
>* cleanup_highmap() fixes this up along with the mappings
>* beyond _end.
>+   *
>+   * Only the region occupied by the kernel image has so far
>+   * been checked against the table of usable memory regions
>+   * provided by the firmware, so invalidate pages outside that
>+   * region.  A page table entry that maps to a reserved area of
>+   * memory would allow processor speculation into that area,
>+   * and on some hardware (particularly the UV platform) even
>+   * speculative access to some reserved areas is caught as an
>+   * error, causing the BIOS to halt the system.
>*/
> 
>   pmd = fixup_pointer(level2_kernel_pgt, physaddr);
>-  for (i = 0; i < PTRS_PER_PMD; i++) {
>+  for (i = 0; i < pmd_index((unsigned long)_text); i++)
>+  pmd[i] &= ~_PAGE_PRESENT;
>+
>+  for (; i <= pmd_index((unsigned long)_end); i++)
>   if (pmd[i] & _PAGE_PRESENT)
>   pmd[i] += load_delta;
>-  }
>+
>+  for (; i < PTRS_PER_PMD; i++)
>+  pmd[i] &= ~_PAGE_PRESENT;
> 
>   /*
>* Fixup phys_base - remove the memory encryption mask to obtain

What does your MTRR setup look like, and what memory map do you present, in 
exact detail? The BIOS is normally expected to mark the relevant ranges as UC 
in the MTRRs (that is the remaining, legitimate usage of MTRRs.)

I'm somewhat sceptical that chopping off potentially several megabytes is a 
good thing. We also have the memory type interfaces which can be used to map 
these as UC in the page tables.
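
For illustration, the sort of thing I mean by the memory type interfaces (a
sketch only; the address range, error handling, and whether this can be done
early enough in boot are exactly the open questions here):

	static void __init mark_bios_reserved_uc(phys_addr_t start, phys_addr_t end)
	{
		unsigned long vaddr = (unsigned long)__va(start);
		int numpages = (end - start) >> PAGE_SHIFT;

		/* UC mappings are not speculatively accessed, so stray
		 * speculation into the reserved range becomes harmless. */
		if (set_memory_uc(vaddr, numpages))
			pr_warn("failed to map %pa-%pa as UC\n", &start, &end);
	}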
-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.


Re: Tracing text poke / kernel self-modifying code (Was: Re: [RFC v2 0/6] x86: dynamic indirect branch promotion)

2019-09-12 Thread hpa
On September 12, 2019 8:00:39 AM GMT+01:00, Adrian Hunter 
 wrote:
>On 29/08/19 2:46 PM, Peter Zijlstra wrote:
>> On Thu, Aug 29, 2019 at 12:40:56PM +0300, Adrian Hunter wrote:
>>> Can you expand on "and ensure the poke_handler preserves the
>existing
>>> control flow"?  Whatever the INT3-handler does will be traced
>normally so
>>> long as it does not itself execute self-modified code.
>> 
>> My thinking was that the code shouldn't change semantics before
>emitting
>> the RECORD_TEXT_POKE; but you're right in that that doesn't seem to
>> matter much.
>> 
>> Either we run the old code or INT3 does 'something'.  Then we get
>> RECORD_TEXT_POKE and finish the poke.  Which tells that the moment
>INT3
>> stops the new text is in place.
>> 
>> I suppose that works too, and requires less changes.
>
>
>What about this?
>
>
>diff --git a/arch/x86/include/asm/text-patching.h
>b/arch/x86/include/asm/text-patching.h
>index 70c09967a999..00aa9bef2b9d 100644
>--- a/arch/x86/include/asm/text-patching.h
>+++ b/arch/x86/include/asm/text-patching.h
>@@ -30,6 +30,7 @@ struct text_poke_loc {
>   void *addr;
>   size_t len;
>   const char opcode[POKE_MAX_OPCODE_SIZE];
>+  char old_opcode[POKE_MAX_OPCODE_SIZE];
> };
> 
>extern void text_poke_early(void *addr, const void *opcode, size_t
>len);
>diff --git a/arch/x86/kernel/alternative.c
>b/arch/x86/kernel/alternative.c
>index ccd32013c47a..c781bafd 100644
>--- a/arch/x86/kernel/alternative.c
>+++ b/arch/x86/kernel/alternative.c
>@@ -3,6 +3,7 @@
> 
> #include 
> #include 
>+#include 
> #include 
> #include 
> #include 
>@@ -1045,8 +1046,10 @@ void text_poke_bp_batch(struct text_poke_loc
>*tp, unsigned int nr_entries)
>   /*
>* First step: add a int3 trap to the address that will be patched.
>*/
>-  for (i = 0; i < nr_entries; i++)
>+  for (i = 0; i < nr_entries; i++) {
>+  memcpy(tp[i].old_opcode, tp[i].addr, tp[i].len);
>   text_poke(tp[i].addr, , sizeof(int3));
>+  }
> 
>   on_each_cpu(do_sync_core, NULL, 1);
> 
>@@ -1071,6 +1074,11 @@ void text_poke_bp_batch(struct text_poke_loc
>*tp, unsigned int nr_entries)
>   on_each_cpu(do_sync_core, NULL, 1);
>   }
> 
>+  for (i = 0; i < nr_entries; i++) {
>+  perf_event_text_poke((unsigned long)tp[i].addr,
>+   tp[i].old_opcode, tp[i].opcode, tp[i].len);
>+  }
>+
>   /*
>* Third step: replace the first byte (int3) by the first byte of
>* replacing opcode.
>diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
>index 61448c19a132..f4c6095d2110 100644
>--- a/include/linux/perf_event.h
>+++ b/include/linux/perf_event.h
>@@ -1183,6 +1183,8 @@ extern void perf_event_exec(void);
> extern void perf_event_comm(struct task_struct *tsk, bool exec);
> extern void perf_event_namespaces(struct task_struct *tsk);
> extern void perf_event_fork(struct task_struct *tsk);
>+extern void perf_event_text_poke(unsigned long addr, const void
>*old_bytes,
>+   const void *new_bytes, size_t len);
> 
> /* Callchains */
> DECLARE_PER_CPU(struct perf_callchain_entry, perf_callchain_entry);
>@@ -1406,6 +1408,10 @@ static inline void perf_event_exec(void)
>{ }
>static inline void perf_event_comm(struct task_struct *tsk, bool
>exec)  { }
> static inline void perf_event_namespaces(struct task_struct *tsk) { }
> static inline void perf_event_fork(struct task_struct *tsk)   { }
>+static inline void perf_event_text_poke(unsigned long addr,
>+  const void *old_bytes,
>+  const void *new_bytes,
>+  size_t len) { }
> static inline void perf_event_init(void)  { }
>static inline int  perf_swevent_get_recursion_context(void){ return
>-1; }
> static inline void perf_swevent_put_recursion_context(int rctx)   
> { }
>diff --git a/include/uapi/linux/perf_event.h
>b/include/uapi/linux/perf_event.h
>index bb7b271397a6..6396d4c0d2f9 100644
>--- a/include/uapi/linux/perf_event.h
>+++ b/include/uapi/linux/perf_event.h
>@@ -375,7 +375,8 @@ struct perf_event_attr {
>   ksymbol:  1, /* include ksymbol events 
> */
>   bpf_event  :  1, /* include bpf events */
>   aux_output :  1, /* generate AUX records 
> instead of events */
>-  __reserved_1   : 32;
>+  text_poke  :  1, /* include text poke 
>events */
>+  __reserved_1   : 31;
> 
>   union {
>   __u32   wakeup_events;/* wakeup every n events */
>@@ -1000,6 +1001,22 @@ enum perf_event_type {
>*/
>   PERF_RECORD_BPF_EVENT   = 18,
> 
>+  /*
>+   * Records changes 

Re: [PATCH] x86/umip: Add emulation for 64-bit processes

2019-09-10 Thread hpa
On September 10, 2019 7:28:28 AM GMT+01:00, Ingo Molnar  
wrote:
>
>* h...@zytor.com  wrote:
>
>> I would strongly suggest that we change the term "emulation" to 
>> "spoofing" for these instructions. We need to explain that we do
>*not* 
>> execute these instructions the was the CPU would have, and unlike the
>
>> native instructions do not leak kernel information.
>
>Ok, I've edited the patch to add the 'spoofing' wording where 
>appropriate, and I also made minor fixes such as consistently 
>capitalizing instruction names.
>
>Can I also add your Reviewed-by tag?
>
>So the patch should show up in tip:x86/asm today-ish, and barring any 
>complications is v5.4 material.
>
>Thanks,
>
>   Ingo

Yes, please do.

Reviewed-by: H. Peter Anvin (Intel) 
-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.


Re: [PATCH] x86/umip: Add emulation for 64-bit processes

2019-09-09 Thread hpa
On September 6, 2019 12:22:21 AM GMT+01:00, Brendan Shanks 
 wrote:
>Add emulation of the sgdt, sidt, and smsw instructions for 64-bit
>processes.
>
>Wine users have encountered a number of 64-bit Windows games that use
>these instructions (particularly sgdt), and were crashing when run on
>UMIP-enabled systems.
>
>Originally-by: Ricardo Neri 
>Signed-off-by: Brendan Shanks 
>---
> arch/x86/kernel/umip.c | 55 +-
> 1 file changed, 33 insertions(+), 22 deletions(-)
>
>diff --git a/arch/x86/kernel/umip.c b/arch/x86/kernel/umip.c
>index 5b345add550f..1812e95d2f55 100644
>--- a/arch/x86/kernel/umip.c
>+++ b/arch/x86/kernel/umip.c
>@@ -51,9 +51,7 @@
>* The instruction smsw is emulated to return the value that the
>register CR0
>  * has at boot time as set in the head_32.
>  *
>- * Also, emulation is provided only for 32-bit processes; 64-bit
>processes
>- * that attempt to use the instructions that UMIP protects will
>receive the
>- * SIGSEGV signal issued as a consequence of the general protection
>fault.
>+ * Emulation is provided for both 32-bit and 64-bit processes.
>  *
>* Care is taken to appropriately emulate the results when segmentation
>is
>* used. That is, rather than relying on USER_DS and USER_CS, the
>function
>@@ -63,17 +61,18 @@
>  * application uses a local descriptor table.
>  */
> 
>-#define UMIP_DUMMY_GDT_BASE 0xfffe
>-#define UMIP_DUMMY_IDT_BASE 0x
>+#define UMIP_DUMMY_GDT_BASE 0xfffeULL
>+#define UMIP_DUMMY_IDT_BASE 0xULL
> 
> /*
>* The SGDT and SIDT instructions store the contents of the global
>descriptor
>* table and interrupt table registers, respectively. The destination is
>a
>* memory operand of X+2 bytes. X bytes are used to store the base
>address of
>- * the table and 2 bytes are used to store the limit. In 32-bit
>processes, the
>- * only processes for which emulation is provided, X has a value of 4.
>+ * the table and 2 bytes are used to store the limit. In 32-bit
>processes X
>+ * has a value of 4, in 64-bit processes X has a value of 8.
>  */
>-#define UMIP_GDT_IDT_BASE_SIZE 4
>+#define UMIP_GDT_IDT_BASE_SIZE_64BIT 8
>+#define UMIP_GDT_IDT_BASE_SIZE_32BIT 4
> #define UMIP_GDT_IDT_LIMIT_SIZE 2
> 
> #define   UMIP_INST_SGDT  0   /* 0F 01 /0 */
>@@ -189,6 +188,7 @@ static int identify_insn(struct insn *insn)
>  * @umip_inst:A constant indicating the instruction to emulate
>  * @data: Buffer into which the dummy result is stored
>  * @data_size:Size of the emulated result
>+ * @x86_64: true if process is 64-bit, false otherwise
>  *
>* Emulate an instruction protected by UMIP and provide a dummy result.
>The
>* result of the emulation is saved in @data. The size of the results
>depends
>@@ -202,11 +202,8 @@ static int identify_insn(struct insn *insn)
>  * 0 on success, -EINVAL on error while emulating.
>  */
> static int emulate_umip_insn(struct insn *insn, int umip_inst,
>-   unsigned char *data, int *data_size)
>+   unsigned char *data, int *data_size, bool x86_64)
> {
>-  unsigned long dummy_base_addr, dummy_value;
>-  unsigned short dummy_limit = 0;
>-
>   if (!data || !data_size || !insn)
>   return -EINVAL;
>   /*
>@@ -219,6 +216,9 @@ static int emulate_umip_insn(struct insn *insn, int
>umip_inst,
>* is always returned irrespective of the operand size.
>*/
>   if (umip_inst == UMIP_INST_SGDT || umip_inst == UMIP_INST_SIDT) {
>+  u64 dummy_base_addr;
>+  u16 dummy_limit = 0;
>+
>   /* SGDT and SIDT do not use registers operands. */
>   if (X86_MODRM_MOD(insn->modrm.value) == 3)
>   return -EINVAL;
>@@ -228,13 +228,24 @@ static int emulate_umip_insn(struct insn *insn,
>int umip_inst,
>   else
>   dummy_base_addr = UMIP_DUMMY_IDT_BASE;
> 
>-  *data_size = UMIP_GDT_IDT_LIMIT_SIZE + UMIP_GDT_IDT_BASE_SIZE;
>+  /*
>+   * 64-bit processes use the entire dummy base address.
>+   * 32-bit processes use the lower 32 bits of the base address.
>+   * dummy_base_addr is always 64 bits, but we memcpy the correct
>+   * number of bytes from it to the destination.
>+   */
>+  if (x86_64)
>+  *data_size = UMIP_GDT_IDT_BASE_SIZE_64BIT;
>+  else
>+  *data_size = UMIP_GDT_IDT_BASE_SIZE_32BIT;
>+
>+  memcpy(data + 2, _base_addr, *data_size);
> 
>-  memcpy(data + 2, _base_addr, UMIP_GDT_IDT_BASE_SIZE);
>+  *data_size += UMIP_GDT_IDT_LIMIT_SIZE;
>   memcpy(data, _limit, UMIP_GDT_IDT_LIMIT_SIZE);
> 
>   } else if (umip_inst == UMIP_INST_SMSW) {
>-  dummy_value = CR0_STATE;
>+  unsigned long dummy_value = CR0_STATE;
> 
>   /*
>* Even though the CR0 

Re: [PATCH] x86/umip: Add emulation for 64-bit processes

2019-09-09 Thread hpa
On September 8, 2019 8:22:48 AM GMT+01:00, Borislav Petkov  
wrote:
>On Sat, Sep 07, 2019 at 02:26:10PM -0700, Ricardo Neri wrote:
>> > Wine users have encountered a number of 64-bit Windows games that
>use
>> > these instructions (particularly sgdt), and were crashing when run
>on
>> > UMIP-enabled systems.
>> 
>> Emulation support for 64-bit processes was not initially included
>> because no use cases had been identified.
>
>AFAIR, we said at the time that 64-bit doesn't need it because this is
>legacy software only and 64-bit will get fixed properly not to use
>those
>insns. I can probably guess how that went ...

I don't think Windows games was something we considered. However, needing to 
simulate these instructions is not a huge surprise. The important thing is that 
by simulating them, we can plug the leak of some very high value kernel 
information – mainly the GDT, IDT and TSS addresses.
-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.
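
As a rough, hedged illustration of what the spoofing buys (not part of the original thread): a user-space probe along these lines would see a fixed dummy GDT base on a UMIP-enforcing kernel with the emulation above, instead of the real descriptor-table address. The program and its output format are assumptions for illustration only.

/*
 * Hypothetical 64-bit x86 user-space probe.  With UMIP enforced and the
 * spoofing in place, the printed base should be the kernel's dummy value
 * rather than the real GDT address; without UMIP it prints the true base.
 */
#include <stdio.h>
#include <stdint.h>

struct __attribute__((packed)) descriptor_table {
	uint16_t limit;
	uint64_t base;
};

int main(void)
{
	struct descriptor_table gdtr = { 0, 0 };

	__asm__ volatile("sgdt %0" : "=m" (gdtr));
	printf("SGDT reports base 0x%016llx, limit 0x%04x\n",
	       (unsigned long long)gdtr.base, gdtr.limit);
	return 0;
}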


Re: New kernel interface for sys_tz and timewarp?

2019-08-15 Thread hpa
On August 15, 2019 8:05:30 AM PDT, "Theodore Y. Ts'o"  wrote:
>On Thu, Aug 15, 2019 at 03:22:45PM +0200, Arnd Bergmann wrote:
>> If 64-bit Windows relies on a working EFI RTC implementation, we
>could
>> decide to leave the driver enabled on 64-bit and only disable it for
>> 32-bit EFI. That way, future distros would no longer have to worry
>about
>> the localtime hack, at least the ones that have dropped support for
>> 32-bit x86 kernels.
>
>... and who have also dropped support for legacy (non-UEFI) 64-bit
>boot.  Keep in mind that even for distributions which may install with
>UEFI by default, if people have been upgrading from (for example)
>Debian Jessie to Stretch to Buster, they may still be using non-UEFI
>boot.  This might be especially true on small Linode servers (such as,
>for example, the one which is currently running my mail smarthost)
>
>  - Ted

There is also the ACPI TAD device, which can be used to expose this through 
ACPI. We might also be able to request that the ACPI information for the RTC be 
augmented with a CMOS timezone location.
-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.


RE: New kernel interface for sys_tz and timewarp?

2019-08-14 Thread hpa
On August 14, 2019 9:26:36 AM PDT, David Laight  wrote:
>From: Theodore Y. Ts'o
>> Sent: 14 August 2019 01:06
>> On Tue, Aug 13, 2019 at 10:30:34AM -0700, Linus Torvalds wrote:
>> >
>> > I suspect the only actual _valid_ use in the kernel for a time zone
>> > setting is likely for RTC clock setting, but even that isn't really
>> > "global", as much as "per RTC".
>> 
>> As I recall (and I may or may not have been original for the original
>> sys_tz; it was many years ago, and my memories of 1992 are a bit
>> fuzzy) the only reason why we added it was because x86 systems that
>> were dual-booting with Windows had a RTC which ticked localtime, and
>> originally, the system time was fetched from the RTC in early boot,
>> and then when the timezone was set, the time would be warped so it
>> would be correct.
>
>x86 systems are very likely to have the RTC set by the bios config.
>In which case it will almost certainly get set to local time.
>It is certainly the default for windows installs - I don't even know
>if you have any other option.
>
>The 'real fun' (tm) happens when a dual boot system changes from
>winter to summer time.
>ISTR that it is quite easy to get both (or more) OS to change the
>RTC by an hour (I went home an hour early one year).
>Although the x86 RTC chip has a bit defined for 'summertime', nothing
>sets it (at least when I looked).
>
>   David
>
>-
>Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes,
>MK1 1PT, UK
>Registration No: 1397386 (Wales)

I believe Windows 10 changed the default RTC to UTC, although perhaps only if 
running under UEFI.
-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.


Re: [RFC PATCH] compiler_attributes.h: Add 'fallthrough' pseudo keyword for switch/case use

2019-08-01 Thread hpa
On August 1, 2019 5:24:29 AM PDT, Peter Zijlstra  wrote:
>On Wed, Jul 31, 2019 at 11:10:36PM -0700, h...@zytor.com wrote:
>> On July 31, 2019 4:55:47 PM PDT, Miguel Ojeda
> wrote:
>> >On Wed, Jul 31, 2019 at 11:01 PM  wrote:
>> >>
>> >> The standard is moving toward adding this as an attribute with the
>> >[[fallthrough]] syntax; it is in C++17, not sure when it will be in
>C
>> >be if it isn't already.
>> >
>> >Not yet, but it seems to be coming:
>> >
>> >  http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2268.pdf
>> >
>> >However, even if C2x gets it, it will be quite a while until the GCC
>> >minimum version gets bumped up to that, so...
>> >
>> >Cheers,
>> >Miguel
>> 
>> The point was that we should plan ahead in whatever we end up doing.
>
>By reserving 'fallthrough' as a keyword we do exactly that. We can then
>define it to whatever the compiler/tool at hand requires.
>
>Once GCC gains support for that [[attribute]] nonsense, we can detector
>that and use that over the __attribute__(())
>
>[ Also the Cxx attribute syntax is an abomination -- just a lesser one
>than reading actual comments :-) ]

I'm not disagreeing... I think using a macro makes sense.

Not sure if I agree about the syntax... I think it's rather friendly compared 
to gcc's ;)
-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.
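
For context, a minimal sketch of how a call site reads once the pseudo keyword exists, assuming the 'fallthrough' macro from the patch under discussion (the enum and function here are made up):

/* Sketch only; assumes compiler_attributes.h provides 'fallthrough'
 * exactly as proposed (attribute where available, no-op otherwise). */
enum stage { STAGE_PREPARE, STAGE_RUN, STAGE_DONE };

static int steps_for(enum stage s)
{
	int steps = 0;

	switch (s) {
	case STAGE_PREPARE:
		steps++;
		fallthrough;	/* replaces the old "fall through" comment */
	case STAGE_RUN:
		steps++;
		break;
	case STAGE_DONE:
		break;
	}
	return steps;
}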


Re: [RFC PATCH] compiler_attributes.h: Add 'fallthrough' pseudo keyword for switch/case use

2019-08-01 Thread hpa
On July 31, 2019 4:55:47 PM PDT, Miguel Ojeda  
wrote:
>On Wed, Jul 31, 2019 at 11:01 PM  wrote:
>>
>> The standard is moving toward adding this as an attribute with the
>[[fallthrough]] syntax; it is in C++17, not sure when it will be in C
>be if it isn't already.
>
>Not yet, but it seems to be coming:
>
>  http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2268.pdf
>
>However, even if C2x gets it, it will be quite a while until the GCC
>minimum version gets bumped up to that, so...
>
>Cheers,
>Miguel

The point was that we should plan ahead in whatever we end up doing.
-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.


Re: [PATCHv5 25/37] x86/vdso: Switch image on setns()/clone()

2019-08-01 Thread hpa
On July 31, 2019 10:34:26 PM PDT, Andy Lutomirski  wrote:
>On Mon, Jul 29, 2019 at 2:58 PM Dmitry Safonov  wrote:
>>
>> As it has been discussed on timens RFC, adding a new conditional
>branch
>> `if (inside_time_ns)` on VDSO for all processes is undesirable.
>> It will add a penalty for everybody as branch predictor may
>mispredict
>> the jump. Also there are instruction cache lines wasted on cmp/jmp.
>
>
>>
>> +#ifdef CONFIG_TIME_NS
>> +int vdso_join_timens(struct task_struct *task)
>> +{
>> +   struct mm_struct *mm = task->mm;
>> +   struct vm_area_struct *vma;
>> +
>> +   if (down_write_killable(>mmap_sem))
>> +   return -EINTR;
>> +
>> +   for (vma = mm->mmap; vma; vma = vma->vm_next) {
>> +   unsigned long size = vma->vm_end - vma->vm_start;
>> +
>> +   if (vma_is_special_mapping(vma, _mapping) ||
>> +   vma_is_special_mapping(vma, _mapping))
>> +   zap_page_range(vma, vma->vm_start, size);
>> +   }
>
>This is, unfortunately, fundamentally buggy.  If any thread is in the
>vDSO or has the vDSO on the stack (due to a signal, for example), this
>will crash it.  I can think of three solutions:
>
>1. Say that you can't setns() if you have other mms and ignore the
>signal issue.  Anything with green threads will disapprove.  It's also
>rather gross.
>
>2. Make it so that you can flip the static branch safely.  As in my
>other email, you'll need to deal with CoW somehow,
>
>3. Make it so that you can't change timens, or at least that you can't
>turn timens on or off, without execve() or fork().
>
>BTW, that static branch probably needs to be aligned to a cache line
>or something similar to avoid all the nastiness with trying to poke
>text that might be concurrently executing.  This will be a mess.

Since we are talking about different physical addresses I believe we should be 
okay as long as they don't cross page boundaries, and even if they do it can be 
managed with proper page invalidation sequencing – it's not like the problems 
of having to deal with XMC on live pages like in the kernel.

Still, you really need each instruction sequence to be present, with the only 
difference being specific patch sites.

Any fundamental reason this can't be strictly data driven? Seems odd to me if 
it couldn't, but I might be missing something obvious.
-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.
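
A hedged sketch of the data-driven alternative being asked about, with made-up names: the clock code always applies a per-process offset read from the vDSO data page, so the text never needs patching and is identical for every process. A task outside a time namespace simply sees a zero offset.

/* Hypothetical layout; not the timens series itself. */
struct vdso_timens_data {
	__u64 boottime_offset_ns;	/* 0 when not in a time namespace */
};

static __always_inline __u64 adjust_boottime(const struct vdso_timens_data *td,
					     __u64 t_ns)
{
	/* Unconditional add: no branch to predict, no text to patch. */
	return t_ns + td->boottime_offset_ns;
}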


Re: [RFC PATCH] compiler_attributes.h: Add 'fallthrough' pseudo keyword for switch/case use

2019-07-31 Thread hpa
On July 31, 2019 11:48:32 AM PDT, Peter Zijlstra  wrote:
>On Wed, Jul 31, 2019 at 11:24:36AM -0700, h...@zytor.com wrote:
>> >> > +/*
>> >> > + * Add the pseudo keyword 'fallthrough' so case statement
>blocks
>> >> > + * must end with any of these keywords:
>> >> > + *   break;
>> >> > + *   fallthrough;
>> >> > + *   goto ;
>> >> > + *   return [expression];
>> >> > + *
>> >> > + *  gcc:
>>https://gcc.gnu.org/onlinedocs/gcc/Statement-Attributes.html#Statement-Attributes
>> >> > + */
>> >> > +#if __has_attribute(__fallthrough__)
>> >> > +# define fallthrough  
>__attribute__((__fallthrough__))
>> >> > +#else
>> >> > +# define fallthroughdo {} while (0)  /*
>fallthrough */
>> >> > +#endif
>> >> > +
>
>> If the comments are stripped, how would the compiler see them to be
>> able to issue a warning? I would guess that it is retained or
>replaced
>> with some other magic token.
>
>Everything that has the warning (GCC-7+/CLANG-9) has that attribute.

The standard is moving toward adding this as an attribute with the
[[fallthrough]] syntax; it is in C++17, not sure when it will be in C if it
isn't already.
-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.


Re: [RFC PATCH] compiler_attributes.h: Add 'fallthrough' pseudo keyword for switch/case use

2019-07-31 Thread hpa
On July 31, 2019 10:51:37 AM PDT, Joe Perches  wrote:
>On Wed, 2019-07-31 at 19:14 +0200, Pavel Machek wrote:
>> On Tue 2019-07-30 22:35:18, Joe Perches wrote:
>> > Reserve the pseudo keyword 'fallthrough' for the ability to convert
>the
>> > various case block /* fallthrough */ style comments to appear to be
>an
>> > actual reserved word with the same gcc case block missing
>fallthrough
>> > warning capability.
>> 
>> Acked-by: Pavel Machek 
>> 
>> > +/*
>> > + * Add the pseudo keyword 'fallthrough' so case statement blocks
>> > + * must end with any of these keywords:
>> > + *   break;
>> > + *   fallthrough;
>> > + *   goto ;
>> > + *   return [expression];
>> > + *
>> > + *  gcc:
>https://gcc.gnu.org/onlinedocs/gcc/Statement-Attributes.html#Statement-Attributes
>> > + */
>> > +#if __has_attribute(__fallthrough__)
>> > +# define fallthrough   
>__attribute__((__fallthrough__))
>> > +#else
>> > +# define fallthroughdo {} while (0)  /*
>fallthrough */
>> > +#endif
>> > +
>> 
>> Will various checkers (and gcc) recognize, that comment in a macro,
>> and disable the warning accordingly?
>
>Current non-gcc tools:  I doubt it.
>
>And that's unlikely as the comments are supposed to be stripped
>before the macro expansion phase.
>
>gcc 7+, which by now probably most developers actually use, will
>though
>and likely that's sufficient.
>
>> Explanation that the comment is "magic" might not be a bad idea.
>
>The comment was more for the reader.
>
>cheers, Joe

If the comments are stripped, how would the compiler see them to be able to 
issue a warning? I would guess that it is retained or replaced with some other 
magic token.
-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.


Re: [PATCH] x86: drop REG_OUT macro from hweight functions

2019-07-30 Thread hpa
On July 30, 2019 1:08:43 AM PDT, Peter Zijlstra  wrote:
>On Mon, Jul 29, 2019 at 11:44:17PM +0300, Alexey Dobriyan wrote:
>> On Mon, Jul 29, 2019 at 12:04:47PM +0200, Peter Zijlstra wrote:
>> > +#define _ASM_ARG1B__ASM_FORM_RAW(dil)
>> > +#define _ASM_ARG2B__ASM_FORM_RAW(sil)
>> > +#define _ASM_ARG3B__ASM_FORM_RAW(dl)
>> > +#define _ASM_ARG4B__ASM_FORM_RAW(cl)
>> > +#define _ASM_ARG5B__ASM_FORM_RAW(r8b)
>> > +#define _ASM_ARG6B__ASM_FORM_RAW(r9b)
>> 
>> I preprocessed percpu code once to see what precisely it does because
>> it was easier than wading through forest of macroes.
>
>Per cpu is easy, try reading the tracepoint code ;-)

Sometimes it's the .s file one ends up wanting to check out...
-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.


Re: [PATCH 1/1] x86/boot: clear some fields explicitly

2019-07-25 Thread hpa
On July 25, 2019 2:48:30 PM PDT, Thomas Gleixner  wrote:
>On Thu, 25 Jul 2019, John Hubbard wrote:
>> On 7/25/19 12:22 AM, Thomas Gleixner wrote:
>> > It removes the clearing of the range between kbd_status and hdr
>without any
>> > replacement. It neither clears edid_info.
>> 
>> 
>> Yes. Somehow I left that chunk out. Not my finest hour. 
>
>S*** happens
>
>> > +  char *p = (char *) boot_params;
>> > +  int i;
>> > +
>> > +  for (i = 0; i < ARRAY_SIZE(toclear); i++)
>> > +  memset(p + toclear[i].start, 0, toclear[i].len);
>> >}
>> >  }
>> 
>> Looks nice.
>
>I have no idea whether it works and I have no cycles either, so I would
>appreciate it if you could polish it up so we can handle that new
>fangled
>GCC "feature" nicely.
>
>Alternatively file a bug report to the GCC folks :)
>
>But seriously I think it's not completely insane what they are doing
>and
>the table based approach is definitely more readable and maintainable
>than
>the existing stuff.
>
>Thanks,
>
>   tglx

Doing this table based does seem like a good idea.
-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.
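
For reference, a hedged sketch of what the table-based variant could look like in full, derived from the fragment quoted above. The specific field ranges are an assumption based on the discussion; the point is that offsetof() keeps them in sync with struct boot_params.

/* Sketch of sanitize_boot_params() with a table of ranges to clear. */
static void sanitize_boot_params(struct boot_params *boot_params)
{
	static const struct {
		unsigned int start;
		unsigned int len;
	} toclear[] = {
		{ offsetof(struct boot_params, acpi_rsdp_addr),
		  sizeof(boot_params->acpi_rsdp_addr) },
		{ offsetof(struct boot_params, ext_ramdisk_image),
		  offsetof(struct boot_params, efi_info) -
			offsetof(struct boot_params, ext_ramdisk_image) },
		{ offsetof(struct boot_params, kbd_status),
		  offsetof(struct boot_params, hdr) -
			offsetof(struct boot_params, kbd_status) },
	};
	char *p = (char *)boot_params;
	unsigned int i;

	if (!boot_params->sentinel)
		return;

	for (i = 0; i < ARRAY_SIZE(toclear); i++)
		memset(p + toclear[i].start, 0, toclear[i].len);
}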


Re: [PATCH 1/1] x86/boot: clear some fields explicitly

2019-07-24 Thread hpa
On July 24, 2019 4:15:28 PM PDT, john.hubb...@gmail.com wrote:
>From: John Hubbard 
>
>Recent gcc compilers (gcc 9.1) generate warnings about an
>out of bounds memset, if you trying memset across several fields
>of a struct. This generated a couple of warnings on x86_64 builds.
>
>Because struct boot_params is __packed__, normal variable
>variable assignment will work just as well as a memset here.
>Change three u32 fields to be cleared to zero that way, and
>just memset the _pad4 field.
>
>This clears up the build warnings for me.
>
>Signed-off-by: John Hubbard 
>---
> arch/x86/include/asm/bootparam_utils.h | 11 +--
> 1 file changed, 5 insertions(+), 6 deletions(-)
>
>diff --git a/arch/x86/include/asm/bootparam_utils.h
>b/arch/x86/include/asm/bootparam_utils.h
>index 101eb944f13c..4df87d4a043b 100644
>--- a/arch/x86/include/asm/bootparam_utils.h
>+++ b/arch/x86/include/asm/bootparam_utils.h
>@@ -37,12 +37,11 @@ static void sanitize_boot_params(struct boot_params
>*boot_params)
>   if (boot_params->sentinel) {
>   /* fields in boot_params are left uninitialized, clear them */
>   boot_params->acpi_rsdp_addr = 0;
>-  memset(_params->ext_ramdisk_image, 0,
>- (char *)_params->efi_info -
>-  (char *)_params->ext_ramdisk_image);
>-  memset(_params->kbd_status, 0,
>- (char *)_params->hdr -
>- (char *)_params->kbd_status);
>+  boot_params->ext_ramdisk_image = 0;
>+  boot_params->ext_ramdisk_size = 0;
>+  boot_params->ext_cmd_line_ptr = 0;
>+
>+  memset(_params->_pad4, 0, sizeof(boot_params->_pad4));
>   memset(_params->_pad7[0], 0,
>  (char *)_params->edd_mbr_sig_buffer[0] -
>   (char *)_params->_pad7[0]);

The problem with this is that it will break silently when changes are made to 
this structure.

So, that is a NAK from me.
-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.


Re: [PATCH 08/15] x86/alternatives: Teach text_poke_bp() to emulate instructions

2019-06-07 Thread hpa
On June 7, 2019 11:10:19 AM PDT, Andy Lutomirski  wrote:
>
>
>> On Jun 7, 2019, at 10:34 AM, Peter Zijlstra 
>wrote:
>> 
>> On Sat, Jun 08, 2019 at 12:47:08AM +0900, Masami Hiramatsu wrote:
>> 
 This fits almost all text_poke_bp() users, except
 arch_unoptimize_kprobe() which restores random text, and for that
>site
 we have to build an explicit emulate instruction.
>>> 
>>> Hm, actually it doesn't restores randome text, since the first byte
>>> must always be int3. As the function name means, it just unoptimizes
>>> (jump based optprobe -> int3 based kprobe).
>>> Anyway, that is not an issue. With this patch, optprobe must still
>work.
>> 
>> I thought it basically restored 5 bytes of original text (with no
>> guarantee it is a single instruction, or even a complete
>instruction),
>> with the first byte replaced with INT3.
>> 
>
>I am surely missing some kprobe context, but is it really safe to use
>this mechanism to replace more than one instruction?

I don't see how it could be, except *perhaps* inside an NMI handler, because you
could have a preempted or interrupted task whose in-memory IP points into the
middle of the region you are intending to patch.


-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.


Re: [PATCH v3 2/2] initramfs: introduce do_readxattrs()

2019-05-22 Thread hpa
On May 17, 2019 7:16:04 PM PDT, Rob Landley  wrote:
>On 5/17/19 4:41 PM, H. Peter Anvin wrote:
>> On 5/17/19 1:18 PM, h...@zytor.com wrote:
>>>
>>> Ok... I just realized this does not work for a modular initramfs,
>composed at load time from multiple files, which is a very real
>problem. Should be easy enough to deal with: instead of one large file,
>use one companion file per source file, perhaps something like
>filename..xattrs (suggesting double dots to make it less likely to
>conflict with a "real" file.) No leading dot, as it makes it more
>likely that archivers will sort them before the file proper.
>>>
>>> A side benefit is that the format can be simpler as there is no need
>to encode the filename.
>>>
>>> A technically cleaner solution still, but which would need archiver
>modifications, would be to encode the xattrs as an optionally nameless
>file (just an empty string) with a new file mode value, immediately
>following the original file. The advantage there is that the archiver
>itself could support xattrs and other extended metadata (which has been
>requested elsewhere); the disadvantage obviously is that that it
>requires new support in the archiver. However, at least it ought to be
>simpler since it is still a higher protocol level than the cpio archive
>itself.
>>>
>>> There's already one special case in cpio, which is the
>"!!!TRAILER!!!" filename; although I don't think it is part of the
>formal spec, to the extent there is one, I would expect that in
>practice it is always encoded with a mode of 0, which incidentally
>could be used to unbreak the case where such a filename actually
>exists. So one way to support such extended metadata would be to set
>mode to 0 and use the filename to encode the type of metadata. I wonder
>how existing GNU or BSD cpio (the BSD one is better maintained these
>days) would deal with reading such a file; it would at least not be a
>regression if it just read it still, possibly with warnings. It could
>also be possible to use bits 17:16 in the mode, which are traditionally
>always zero (mode_t being 16 bits), but I believe are present in most
>or all of the cpio formats for historical reasons. It might be accepted
>better by existing implementations to use one of these high bits
>combined with S_IFREG, I dont know.
>>
>> 
>> Correction: it's just !!!TRAILER!!!.
>
>We documented it as "TRAILER!!!" without leading !!!, and that its
>purpose is to
>flush hardlinks:
>
>https://www.kernel.org/doc/Documentation/early-userspace/buffer-format.txt
>
>That's what toybox cpio has been producing. Kernel consumes it just
>fine. Just
>checked busybox cpio and that's what they're producing as well...
>
>Rob

Yes, TRAILER!!! is correct. Somehow I managed to get it wrong twice.
-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.


Re: [PATCH v3 2/2] initramfs: introduce do_readxattrs()

2019-05-22 Thread hpa
On May 20, 2019 2:39:46 AM PDT, Roberto Sassu  wrote:
>On 5/18/2019 12:17 AM, Arvind Sankar wrote:
>> On Fri, May 17, 2019 at 02:47:31PM -0700, H. Peter Anvin wrote:
>>> On 5/17/19 2:02 PM, Arvind Sankar wrote:
>>>> On Fri, May 17, 2019 at 01:18:11PM -0700, h...@zytor.com wrote:
>>>>>
>>>>> Ok... I just realized this does not work for a modular initramfs,
>composed at load time from multiple files, which is a very real
>problem. Should be easy enough to deal with: instead of one large file,
>use one companion file per source file, perhaps something like
>filename..xattrs (suggesting double dots to make it less likely to
>conflict with a "real" file.) No leading dot, as it makes it more
>likely that archivers will sort them before the file proper.
>>>> This version of the patch was changed from the previous one exactly
>to deal with this case --
>>>> it allows for the bootloader to load multiple initramfs archives,
>each
>>>> with its own .xattr-list file, and to have that work properly.
>>>> Could you elaborate on the issue that you see?
>>>>
>>>
>>> Well, for one thing, how do you define "cpio archive", each with its
>own
>>> .xattr-list file? Second, that would seem to depend on the ordering,
>no,
>>> in which case you depend critically on .xattr-list file following
>the
>>> files, which most archivers won't do.
>>>
>>> Either way it seems cleaner to have this per file; especially if/as
>it
>>> can be done without actually mucking up the format.
>>>
>>> I need to run, but I'll post a more detailed explanation of what I
>did
>>> in a little bit.
>>>
>>> -hpa
>>>
>> Not sure what you mean by how do I define it? Each cpio archive will
>> contain its own .xattr-list file with signatures for the files within
>> it, that was the idea.
>> 
>> You need to review the code more closely I think -- it does not
>depend
>> on the .xattr-list file following the files to which it applies.
>> 
>> The code first extracts .xattr-list as though it was a regular file.
>If
>> a later dupe shows up (presumably from a second archive, although the
>> patch will actually allow a second one in the same archive), it will
>> then process the existing .xattr-list file and apply the attributes
>> listed within it. It then will proceed to read the second one and
>> overwrite the first one with it (this is the normal behaviour in the
>> kernel cpio parser). At the end once all the archives have been
>> extracted, if there is an .xattr-list file in the rootfs it will be
>> parsed (it would've been the last one encountered, which hasn't been
>> parsed yet, just extracted).
>> 
>> Regarding the idea to use the high 16 bits of the mode field in
>> the header that's another possibility. It would just require
>additional
>> support in the program that actually creates the archive though,
>which
>> the current patch doesn't.
>
>Yes, for adding signatures for a subset of files, no changes to the ram
>disk generator are necessary. Everything is done by a custom module. To
>support a generic use case, it would be necessary to modify the
>generator to execute getfattr and the awk script after files have been
>placed in the temporary directory.
>
>If I understood the new proposal correctly, it would be task for cpio
>to
>read file metadata after the content and create a new record for each
>file with mode 0x18000, type of metadata encoded in the file name and
>metadata as file content. I don't know how easy it would be to modify
>cpio. Probably the amount of changes would be reasonable.
>
>The kernel will behave in a similar way. It will call do_readxattrs()
>in
>do_copy() for each file. Since the only difference between the current
>and the new proposal would be two additional calls to do_readxattrs()
>in
>do_name() and unpack_to_rootfs(), maybe we could support both.
>
>Roberto

The nice thing with explicit metadata is that it doesn't have to contain the 
filename per se, and each file is self-contained. There is a reason why each 
cpio header starts with the magic number: each cpio record is formally 
independent and can be processed in isolation.  The TRAILER!!! thing is a huge 
wart in the format, although in practice TRAILER!!! always has a mode of 0 and 
so can be distinguished from an actual file.

The use of mode 0x18000 for metadata allows for optional backwards 
compatibility for extraction; for encoding this can be handled with very simple 
postprocessing.

So my suggestion would be to have mode 0x18000 indicate extended fi
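
A hedged sketch of what a consumer of such a record-level convention might do. The 0x18000 mode value, like everything below, is only the proposal being floated here, not an accepted format; the helper names are made up.

/*
 * Classify one newc cpio header.  The 110-byte header is "070701"
 * followed by 13 eight-character hex fields; c_mode is the second field.
 */
#include <stdlib.h>
#include <string.h>

#define CPIO_NEWC_MAGIC		"070701"
#define CPIO_METADATA_MODE	0x18000UL	/* proposed, hypothetical */

static unsigned long newc_field(const char *hdr, int idx)
{
	char buf[9];

	memcpy(buf, hdr + 6 + idx * 8, 8);
	buf[8] = '\0';
	return strtoul(buf, NULL, 16);
}

/* Returns 1 for a metadata record, 0 for a normal entry, -1 if not newc. */
static int classify_record(const char *hdr)
{
	if (memcmp(hdr, CPIO_NEWC_MAGIC, 6))
		return -1;
	return (newc_field(hdr, 1) & CPIO_METADATA_MODE) == CPIO_METADATA_MODE;
}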

Re: [PATCH v7 03/12] x86: Add macro to get symbol address for PIE support

2019-05-20 Thread hpa
On May 20, 2019 4:19:28 PM PDT, Thomas Garnier  wrote:
>From: Thomas Garnier 
>
>Add a new _ASM_MOVABS macro to fetch a symbol address. It will be used
>to replace "_ASM_MOV $, %dst" code construct that are not
>compatible with PIE.
>
>Signed-off-by: Thomas Garnier 
>---
> arch/x86/include/asm/asm.h | 1 +
> 1 file changed, 1 insertion(+)
>
>diff --git a/arch/x86/include/asm/asm.h b/arch/x86/include/asm/asm.h
>index 3ff577c0b102..3a686057e882 100644
>--- a/arch/x86/include/asm/asm.h
>+++ b/arch/x86/include/asm/asm.h
>@@ -30,6 +30,7 @@
> #define _ASM_ALIGN__ASM_SEL(.balign 4, .balign 8)
> 
> #define _ASM_MOV  __ASM_SIZE(mov)
>+#define _ASM_MOVABS   __ASM_SEL(movl, movabsq)
> #define _ASM_INC  __ASM_SIZE(inc)
> #define _ASM_DEC  __ASM_SIZE(dec)
> #define _ASM_ADD  __ASM_SIZE(add)

This is just about *always* wrong on x86-64. We should be using leaq
sym(%rip), %reg. If it isn't reachable by leaq, then it is a non-PIE symbol like
percpu. You do have to keep those distinct!
-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.
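
A hedged sketch of the distinction, with a made-up symbol name: ordinary symbols get a RIP-relative lea, while an absolute reference (what the movabs form generates) is exactly the thing PIE code needs to avoid.

/* Illustration only; 64-bit x86 with GNU inline asm assumed. */
char example_sym[64];

static inline unsigned long example_sym_addr_pie(void)
{
	unsigned long addr;

	/* PIE-friendly: no absolute relocation, reachable within +/-2 GiB. */
	asm ("leaq example_sym(%%rip), %0" : "=r" (addr));
	return addr;
}

static inline unsigned long example_sym_addr_abs(void)
{
	unsigned long addr;

	/* The absolute form: fine for non-PIE symbols, wrong for PIE code. */
	asm ("movabsq $example_sym, %0" : "=r" (addr));
	return addr;
}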


Re: [PATCH v3 2/2] initramfs: introduce do_readxattrs()

2019-05-17 Thread hpa
On May 17, 2019 9:55:19 AM PDT, Roberto Sassu  wrote:
>This patch adds support for an alternative method to add xattrs to
>files in
>the rootfs filesystem. Instead of extracting them directly from the ram
>disk image, they are extracted from a regular file called .xattr-list,
>that
>can be added by any ram disk generator available today. The file format
>is:
>
>\0
>\0
>
>.xattr-list can be generated by executing:
>
>$ getfattr --absolute-names -d -h -R -e hex -m - \
>   | xattr.awk -b > ${initdir}/.xattr-list
>
>where the content of the xattr.awk script is:
>
>#! /usr/bin/awk -f
>{
>  if (!length($0)) {
>printf("%.10x%s\0", len, file);
>for (x in xattr) {
>  printf("%.8x%s\0", xattr_len[x], x);
>  for (i = 0; i < length(xattr[x]) / 2; i++) {
>printf("%c", strtonum("0x"substr(xattr[x], i * 2 + 1, 2)));
>  }
>}
>i = 0;
>delete xattr;
>delete xattr_len;
>next;
>  };
>  if (i == 0) {
>file=$3;
>len=length(file) + 8 + 1;
>  }
>  if (i > 0) {
>split($0, a, "=");
>xattr[a[1]]=substr(a[2], 3);
>xattr_len[a[1]]=length(a[1]) + 1 + 8 + length(xattr[a[1]]) / 2;
>len+=xattr_len[a[1]];
>  };
>  i++;
>}
>
>Signed-off-by: Roberto Sassu 
>---
> init/initramfs.c | 99 
> 1 file changed, 99 insertions(+)
>
>diff --git a/init/initramfs.c b/init/initramfs.c
>index 0c6dd1d5d3f6..6ec018c6279a 100644
>--- a/init/initramfs.c
>+++ b/init/initramfs.c
>@@ -13,6 +13,8 @@
> #include 
> #include 
> 
>+#define XATTR_LIST_FILENAME ".xattr-list"
>+
> static ssize_t __init xwrite(int fd, const char *p, size_t count)
> {
>   ssize_t out = 0;
>@@ -382,6 +384,97 @@ static int __init __maybe_unused do_setxattrs(char
>*pathname)
>   return 0;
> }
> 
>+struct path_hdr {
>+  char p_size[10]; /* total size including p_size field */
>+  char p_data[];   /* \0 */
>+};
>+
>+static int __init do_readxattrs(void)
>+{
>+  struct path_hdr hdr;
>+  char *path = NULL;
>+  char str[sizeof(hdr.p_size) + 1];
>+  unsigned long file_entry_size;
>+  size_t size, path_size, total_size;
>+  struct kstat st;
>+  struct file *file;
>+  loff_t pos;
>+  int ret;
>+
>+  ret = vfs_lstat(XATTR_LIST_FILENAME, );
>+  if (ret < 0)
>+  return ret;
>+
>+  total_size = st.size;
>+
>+  file = filp_open(XATTR_LIST_FILENAME, O_RDONLY, 0);
>+  if (IS_ERR(file))
>+  return PTR_ERR(file);
>+
>+  pos = file->f_pos;
>+
>+  while (total_size) {
>+  size = kernel_read(file, (char *), sizeof(hdr), );
>+  if (size != sizeof(hdr)) {
>+  ret = -EIO;
>+  goto out;
>+  }
>+
>+  total_size -= size;
>+
>+  str[sizeof(hdr.p_size)] = 0;
>+  memcpy(str, hdr.p_size, sizeof(hdr.p_size));
>+  ret = kstrtoul(str, 16, _entry_size);
>+  if (ret < 0)
>+  goto out;
>+
>+  file_entry_size -= sizeof(sizeof(hdr.p_size));
>+  if (file_entry_size > total_size) {
>+  ret = -EINVAL;
>+  goto out;
>+  }
>+
>+  path = vmalloc(file_entry_size);
>+  if (!path) {
>+  ret = -ENOMEM;
>+  goto out;
>+  }
>+
>+  size = kernel_read(file, path, file_entry_size, );
>+  if (size != file_entry_size) {
>+  ret = -EIO;
>+  goto out_free;
>+  }
>+
>+  total_size -= size;
>+
>+  path_size = strnlen(path, file_entry_size);
>+  if (path_size == file_entry_size) {
>+  ret = -EINVAL;
>+  goto out_free;
>+  }
>+
>+  xattr_buf = path + path_size + 1;
>+  xattr_len = file_entry_size - path_size - 1;
>+
>+  ret = do_setxattrs(path);
>+  vfree(path);
>+  path = NULL;
>+
>+  if (ret < 0)
>+  break;
>+  }
>+out_free:
>+  vfree(path);
>+out:
>+  fput(file);
>+
>+  if (ret < 0)
>+  error("Unable to parse xattrs");
>+
>+  return ret;
>+}
>+
> static __initdata int wfd;
> 
> static int __init do_name(void)
>@@ -391,6 +484,11 @@ static int __init do_name(void)
>   if (strcmp(collected, "TRAILER!!!") == 0) {
>   free_hash();
>   return 0;
>+  } else if (strcmp(collected, XATTR_LIST_FILENAME) == 0) {
>+  struct kstat st;
>+
>+  if (!vfs_lstat(collected, ))
>+  do_readxattrs();
>   }
>   clean_path(collected, mode);
>   if (S_ISREG(mode)) {
>@@ -562,6 +660,7 @@ static char * __init unpack_to_rootfs(char *buf,
>unsigned long len)
>   buf += my_inptr;
>   len -= my_inptr;
>   }
>+  do_readxattrs();
>   dir_utime();
>  

Re: [PATCH v2 0/3] initramfs: add support for xattrs in the initial ram disk

2019-05-12 Thread hpa
On May 12, 2019 8:31:05 AM PDT, Dominik Brodowski  
wrote:
>On Sun, May 12, 2019 at 03:18:16AM -0700, h...@zytor.com wrote:
>> > Couldn't this parsing of the .xattr-list file and the setting of
>the xattrs
>> > be done equivalently by the initramfs' /init? Why is kernel
>involvement
>> > actually required here?
>> 
>> There are a lot of things that could/should be done that way...
>
>Indeed... so why not try to avoid adding more such "things", and
>keeping
>them in userspace (or in a fork_usermode_blob)?
>
>
>On Sun, May 12, 2019 at 08:52:47AM -0400, Mimi Zohar wrote:
>> It's too late.  The /init itself should be signed and verified.
>
>Could you elaborate a bit more about the threat model, and why
>deferring
>this to the initramfs is too late?
>
>Thanks,
>   Dominik

I tried over 10 years ago to make exactly that happen... it was called the 
klibc project. Linus turned it down because he felt that it didn't provide 
enough immediate benefit to justify the complexity, which of course creates the 
thousand-cuts problem: there will never be *one single* event that *by itself* 
justifies the transition.
-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.


Re: [PATCH v2 0/3] initramfs: add support for xattrs in the initial ram disk

2019-05-12 Thread hpa
On May 12, 2019 5:02:30 PM PDT, Mimi Zohar  wrote:
>On Sun, 2019-05-12 at 17:31 +0200, Dominik Brodowski wrote:
>> On Sun, May 12, 2019 at 08:52:47AM -0400, Mimi Zohar wrote:
>
>
>> > It's too late.  The /init itself should be signed and verified.
>> 
>> Could you elaborate a bit more about the threat model, and why
>deferring
>> this to the initramfs is too late?
>
>The IMA policy defines a number of different methods of identifying
>which files to measure, appraise, audit.[1]  Without xattrs, the
>granularity of the policy rules is severely limited.  Without xattrs,
>a filesystem is either in policy, or not.
>
>With an IMA policy rule requiring rootfs (tmpfs) files to be verified,
>then /init needs to be properly labeled, otherwise /init will fail to
>execute.
>
>Mimi
>
>[1] Documentation/ABI/testing/ima_policy

And the question is what the sense in that is, especially if /init is provided
as part of the kernel itself.
-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.


Re: [PATCH v2 0/3] initramfs: add support for xattrs in the initial ram disk

2019-05-12 Thread hpa
On May 12, 2019 2:17:48 AM PDT, Dominik Brodowski  
wrote:
>On Thu, May 09, 2019 at 01:24:17PM +0200, Roberto Sassu wrote:
>> This proposal consists in marshaling pathnames and xattrs in a file
>called
>> .xattr-list. They are unmarshaled by the CPIO parser after all files
>have
>> been extracted.
>
>Couldn't this parsing of the .xattr-list file and the setting of the
>xattrs
>be done equivalently by the initramfs' /init? Why is kernel involvement
>actually required here?
>
>Thanks,
>   Dominik

There are a lot of things that could/should be done that way...
-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.


Re: [GIT PULL] objtool changes for v5.2: Add build-time uaccess permissions and DF validation

2019-05-06 Thread hpa
On May 6, 2019 12:20:12 AM PDT, Ingo Molnar  wrote:
>Linus,
>
>Please pull the latest core-objtool-for-linus git tree from:
>
>git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git
>core-objtool-for-linus
>
># HEAD: 29da93fea3ea39ab9b12270cc6be1b70ef201c9e mm/uaccess: Use
>'unsigned long' to placate UBSAN warnings on older GCC versions
>
>This is a series from Peter Zijlstra that adds x86 build-time uaccess 
>validation of SMAP to objtool, which will detect and warn about the 
>following uaccess API usage bugs and weirdnesses:
>
>   call to %s() with UACCESS enabled
>   return with UACCESS enabled
>   return with UACCESS disabled from a UACCESS-safe function
>   recursive UACCESS enable
>   redundant UACCESS disable
>   UACCESS-safe disables UACCESS
>
>As it turns out not leaking uaccess permissions outside the intended 
>uaccess functionality is hard when the interfaces are complex and when 
>such bugs are mostly dormant.
>
>As a bonus we now also check the DF flag. We had at least one 
>high-profile bug in that area in the early days of Linux, and the 
>checking is fairly simple. The checks performed and warnings emitted
>are:
>
>   call to %s() with DF set
>   return with DF set
>   return with modified stack frame
>   recursive STD
>   redundant CLD
>
>It's all x86-only for now, but later on this can also be used for PAN
>on 
>ARM and objtool is fairly cross-platform in principle.
>
>While all warnings emitted by this new checking facility that got 
>reported to us were fixed, there might be GCC version dependent
>warnings 
>that were not reported yet - which we'll address, should they trigger.
>
>The warnings are non-fatal build warnings.
>
> Thanks,
>
>   Ingo
>
>-->
>Josh Poimboeuf (1):
>  tracing: Improve "if" macro code generation
>
>Peter Zijlstra (26):
>  sched/x86: Save [ER]FLAGS on context switch
>  x86/ia32: Fix ia32_restore_sigcontext() AC leak
>  i915, uaccess: Fix redundant CLAC
>  x86/uaccess: Move copy_user_handle_tail() into asm
>  x86/uaccess: Fix up the fixup
>  x86/nospec, objtool: Introduce ANNOTATE_IGNORE_ALTERNATIVE
>  x86/uaccess, xen: Suppress SMAP warnings
>  x86/uaccess: Always inline user_access_begin()
>  x86/uaccess, signal: Fix AC=1 bloat
>  x86/uaccess: Introduce user_access_{save,restore}()
>  x86/smap: Ditch __stringify()
>  x86/uaccess, kasan: Fix KASAN vs SMAP
>  x86/uaccess, ubsan: Fix UBSAN vs. SMAP
>  x86/uaccess, ftrace: Fix ftrace_likely_update() vs. SMAP
>  x86/uaccess, kcov: Disable stack protector
>  objtool: Set insn->func for alternatives
>  objtool: Handle function aliases
>  objtool: Rewrite add_ignores()
>  objtool: Add --backtrace support
>  objtool: Rewrite alt->skip_orig
>  objtool: Fix sibling call detection
>  objtool: Add UACCESS validation
>  objtool: Add Direction Flag validation
>  sched/x86_64: Don't save flags on context switch
>x86/uaccess: Dont leak the AC flag into __put_user() argument
>evaluation
>mm/uaccess: Use 'unsigned long' to placate UBSAN warnings on older GCC
>versions
>
>
> arch/x86/entry/entry_32.S  |   2 +
> arch/x86/ia32/ia32_signal.c|  29 ++-
> arch/x86/include/asm/alternative-asm.h |  11 +
> arch/x86/include/asm/alternative.h |  10 +
> arch/x86/include/asm/asm.h |  24 --
> arch/x86/include/asm/nospec-branch.h   |  28 +-
> arch/x86/include/asm/smap.h|  37 ++-
> arch/x86/include/asm/switch_to.h   |   1 +
> arch/x86/include/asm/uaccess.h |  12 +-
> arch/x86/include/asm/uaccess_64.h  |   3 -
> arch/x86/include/asm/xen/hypercall.h   |  24 +-
> arch/x86/kernel/process_32.c   |   7 +
> arch/x86/kernel/process_64.c   |   1 +
> arch/x86/kernel/signal.c   |  29 ++-
> arch/x86/lib/copy_user_64.S|  48 
> arch/x86/lib/memcpy_64.S   |   3 +-
> arch/x86/lib/usercopy_64.c |  20 --
> drivers/gpu/drm/i915/i915_gem_execbuffer.c |   6 +-
> include/linux/compiler.h   |   2 +-
> include/linux/uaccess.h|   2 +
> kernel/Makefile|   1 +
> kernel/trace/trace_branch.c|   4 +
> lib/Makefile   |   1 +
> lib/strncpy_from_user.c|   5 +-
> lib/strnlen_user.c |   4 +-
> lib/ubsan.c|   4 +
> mm/kasan/Makefile  |   3 +
> mm/kasan/common.c  |  10 +
> mm/kasan/report.c  |   3 +-
> scripts/Makefile.build |   3 +
> tools/objtool/arch.h   |   8 +-
> tools/objtool/arch/x86/decode.c|  21 +-
> tools/objtool/builtin-check.c  |   4 +-
> tools/objtool/builtin.h|   2 +-

Re: [PATCH] x86_64: uninline TASK_SIZE

2019-04-21 Thread hpa
On April 21, 2019 11:28:42 AM PDT, Ingo Molnar  wrote:
>
>* Alexey Dobriyan  wrote:
>
>> TASK_SIZE macro is quite deceptive: it looks like a constant but in
>fact
>> compiles to 50+ bytes.
>> 
>> Space savings on x86_64 defconfig:
>> 
>> add/remove: 1/0 grow/shrink: 3/24 up/down: 77/-2247 (-2170)
>> Function old new   delta
>> _task_size -  52 +52
>> mpol_shared_policy_init  344 363 +19
>> shmem_get_unmapped_area   92  97  +5
>> __rseq_handle_notify_resume.cold  34  35  +1
>> copy_from_user_nmi   123 113 -10
>> mmap_address_hint_valid   92  56 -36
>> arch_get_unmapped_area_topdown   471 435 -36
>> tlb_gather_mmu   164 126 -38
>> hugetlb_get_unmapped_area774 736 -38
>> __create_xol_area497 458 -39
>> arch_tlb_gather_mmu  160 120 -40
>> setup_new_exec   380 336 -44
>> __x64_sys_mlockall   378 333 -45
>> __ia32_sys_mlockall  378 333 -45
>> tlb_flush_mmu235 189 -46
>> unmap_page_range20982048 -50
>> copy_mount_options   518 465 -53
>> __get_user_pages17371675 -62
>> get_unmapped_area270 204 -66
>> perf_prepare_sample 11761098 -78
>> perf_callchain_user  549 469 -80
>> mremap_to.isra   545 457 -88
>> arch_tlb_finish_mmu  394 305 -89
>> __do_munmap 1039 927-112
>> elf_map  527 409-118
>> prctl_set_mm15091335-174
>> __rseq_handle_notify_resume 1116 906-210
>> load_elf_binary11761   1-650
>> Total: Before=14121337, After=14119167, chg -0.02%
>> 
>> Signed-off-by: Alexey Dobriyan 
>> ---
>> 
>>  arch/x86/include/asm/processor.h |4 ++--
>>  arch/x86/kernel/Makefile |1 +
>>  arch/x86/kernel/task_size_64.c   |9 +
>>  3 files changed, 12 insertions(+), 2 deletions(-)
>> 
>> --- a/arch/x86/include/asm/processor.h
>> +++ b/arch/x86/include/asm/processor.h
>> @@ -887,8 +887,8 @@ static inline void spin_lock_prefetch(const void
>*x)
>>  
>>  #define TASK_SIZE_LOW   (test_thread_flag(TIF_ADDR32) ? \
>>  IA32_PAGE_OFFSET : DEFAULT_MAP_WINDOW)
>> -#define TASK_SIZE   (test_thread_flag(TIF_ADDR32) ? \
>> -IA32_PAGE_OFFSET : TASK_SIZE_MAX)
>> +unsigned long _task_size(void);
>> +#define TASK_SIZE   _task_size()
>>  #define TASK_SIZE_OF(child) ((test_tsk_thread_flag(child,
>TIF_ADDR32)) ? \
>>  IA32_PAGE_OFFSET : TASK_SIZE_MAX)
>>  
>> --- a/arch/x86/kernel/Makefile
>> +++ b/arch/x86/kernel/Makefile
>> @@ -46,6 +46,7 @@ CFLAGS_irq.o := -I$(src)/../include/asm/trace
>>  
>>  obj-y   := process_$(BITS).o signal.o
>>  obj-$(CONFIG_COMPAT)+= signal_compat.o
>> +obj-$(CONFIG_X86_64)+= task_size_64.o
>>  obj-y   += traps.o idt.o irq.o irq_$(BITS).o 
>> dumpstack_$(BITS).o
>>  obj-y   += time.o ioport.o dumpstack.o nmi.o
>>  obj-$(CONFIG_MODIFY_LDT_SYSCALL)+= ldt.o
>> new file mode 100644
>> --- /dev/null
>> +++ b/arch/x86/kernel/task_size_64.c
>> @@ -0,0 +1,9 @@
>> +#include 
>> +#include 
>> +#include 
>> +
>> +unsigned long _task_size(void)
>> +{
>> +return test_thread_flag(TIF_ADDR32) ? IA32_PAGE_OFFSET :
>TASK_SIZE_MAX;
>> +}
>> +EXPORT_SYMBOL(_task_size);
>
>Good idea - but instead of adding yet another compilation unit, why not
>
>stick _task_size() into arch/x86/kernel/process_64.c, which is the 
>canonical place for process management related arch functions?
>
>Thanks,
>
>   Ingo

Better yet... since TIF_ADDR32 isn't something that changes randomly, perhaps 
this should be a separate variable?
-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.
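
A hedged sketch of that idea, with made-up field names: decide the value once, at the places that flip TIF_ADDR32 (exec / personality changes), and make TASK_SIZE a plain load instead of a flag test.

/* Assumption only; this cached field is not part of the posted patch. */
struct thread_struct_sketch {
	unsigned long task_size;
};

static inline void set_thread_task_size(struct thread_struct_sketch *t,
					 bool addr32)
{
	t->task_size = addr32 ? IA32_PAGE_OFFSET : TASK_SIZE_MAX;
}

/* TASK_SIZE then becomes a single memory load:
 * #define TASK_SIZE (current->thread.task_size)
 */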


Re: Potentially missing "memory" clobbers in bitops.h for x86

2019-03-29 Thread hpa
On March 29, 2019 3:05:54 PM PDT, "Paul E. McKenney"  
wrote:
>On Fri, Mar 29, 2019 at 02:51:26PM -0700, H. Peter Anvin wrote:
>> On 3/29/19 2:09 PM, Paul E. McKenney wrote:
>> >>
>> >> Note: the atomic versions of these functions obviously need to
>have
>> >> "volatile" and the clobber anyway, as they are by definition
>barriers
>> >> and moving memory operations around them would be a very serious
>error.
>> > 
>> > The atomic functions that return void don't need to order anything
>except
>> > the input and output arguments.  The oddness with clear_bit() is
>that the
>> > memory changed isn't necessarily the quantity referenced by the
>argument,
>> > if the number of bits specified is large.
>> > 
>> > So (for example) atomic_inc() does not need a "memory" clobber,
>right?
>> 
>> I don't believe that is true: the code calling it has a reasonable
>> expectation that previous memory operations have finished and later
>> memory operations have not started from the point of view of another
>> processor. You are more of an expert on memory ordering than I am,
>but
>> I'm 89% sure that there is plenty of code in the kernel which makes
>that
>> assumption.
>
>From Documentation/core-api/atomic_ops.rst:
>
>
>   void atomic_add(int i, atomic_t *v);
>   void atomic_sub(int i, atomic_t *v);
>   void atomic_inc(atomic_t *v);
>   void atomic_dec(atomic_t *v);
>
>These four routines add and subtract integral values to/from the given
>atomic_t value.  The first two routines pass explicit integers by
>which to make the adjustment, whereas the latter two use an implicit
>adjustment value of "1".
>
>One very important aspect of these two routines is that they DO NOT
>require any explicit memory barriers.  They need only perform the
>atomic_t counter update in an SMP safe manner.
>
>
>So, no, these functions do not imply any ordering other than to the
>variable modified.  This one predates my joining the Linux kernel
>community.  ;-)  So any cases where someone is relying on atomic_inc()
>to provide ordering are bugs.
>
>Now for value-returning atomics, for example, atomic_inc_return(),
>full ordering is indeed required.
>
>   Thanx, Paul

Ok.
-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.
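
To make the quoted rule concrete, a hedged sketch (not from the thread): publishing data around a void atomic op needs explicit barriers, whereas a value-returning op such as atomic_inc_return() is fully ordered on its own.

#include <linux/atomic.h>

static atomic_t published = ATOMIC_INIT(0);
static int payload;

static void producer(void)
{
	payload = 42;
	smp_mb__before_atomic();	/* order the store above against the inc */
	atomic_inc(&published);		/* no implicit barrier by itself */
}

static int consumer(void)
{
	if (!atomic_read(&published))
		return -1;
	smp_rmb();			/* read payload only after seeing the count */
	return payload;
}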


Re: [RFC][PATCH] tracing/x86: Save CR2 before tracing irqsoff on error_entry

2019-03-21 Thread hpa
On March 21, 2019 11:18:53 AM PDT, Linus Torvalds 
 wrote:
>On Thu, Mar 21, 2019 at 11:05 AM Andy Lutomirski 
>wrote:
>>
>> In the long run, I think the right solution is to rewrite even more
>of
>> this mess in C.  We really ought to be able to put the IRQ flag
>> tracing and the context tracking into C code.
>
>Tangentially about this long run thing - can we start discussing just
>dropping PV mode entirely in the long run? Or even not-so-long run?
>
>What are the pressing advantages of paravirt these days? Because I can
>point to a lot of pressing _disadvantages_, and not all of them are
>about more complex code as shown by this patch...
>
> Linus

God only knows how nice that would be (for things that aren't naturally
driverized). Last I heard, which was quite a while ago, the big problem was
that the Amazon cloud runs Xen in PV mode...
-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.


Re: [PATCH 01/25] x86: Make SMAP 64-bit only

2019-03-21 Thread hpa
On March 21, 2019 10:25:05 AM PDT, Denys Vlasenko  wrote:
>On 3/18/19 7:10 PM, Linus Torvalds wrote:
>> On Mon, Mar 18, 2019 at 10:51 AM Peter Zijlstra
> wrote:
>>>
>>> How about I do a patch that schedules EFLAGS for both 32bit and
>64bit,
>>> mark this for backporting to infinity.
>>>
>>> And then at the end, after the objtool-ac bits land, I do a patch
>>> removing the EFLAGS scheduling for x86_64.
>> 
>> Sounds sane to me.
>> 
>> And we can make it AC-conditional if it's actually shown to be
>visible
>> from a performance standpoint.
>> 
>> But iirc pushf/popf isn't really that expensive - in fact I think
>it's
>> pretty cheap when system flags don't change.
>
>I did not see evidence of this. In my testing,
>POPF is always ~20 cycles, even if popped flags are identical to
>current
>state of flags.

I think you will find that if you change system flags it is much slower.
-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.


Re: [PATCH 01/25] x86: Make SMAP 64-bit only

2019-03-21 Thread hpa
On March 18, 2019 11:10:22 AM PDT, Linus Torvalds 
 wrote:
>On Mon, Mar 18, 2019 at 10:51 AM Peter Zijlstra 
>wrote:
>>
>> How about I do a patch that schedules EFLAGS for both 32bit and
>64bit,
>> mark this for backporting to infinity.
>>
>> And then at the end, after the objtool-ac bits land, I do a patch
>> removing the EFLAGS scheduling for x86_64.
>
>Sounds sane to me.
>
>And we can make it AC-conditional if it's actually shown to be visible
>from a performance standpoint.
>
>But iirc pushf/popf isn't really that expensive - in fact I think it's
>pretty cheap when system flags don't change. Which would be the common
>case unless you'd also make the popf do the irq restore part and
>simply make finish_lock_switch() re-enable irq's by doing an
>irqrestore?
>
>I think popf is like 20 cycles or something (and pushf is just single
>cycles). Probably not worth worrying about in the task switch context.
>
> Linus

Yes, the problem isn't scheduling per se but the risk of hiding problems.
-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.


Re: [PATCH] lib: Add shared copy of __lshrti3 from libgcc

2019-03-18 Thread hpa
On March 18, 2019 4:52:19 PM PDT, Matthias Kaehlcke  wrote:
>On Mon, Mar 18, 2019 at 04:44:03PM -0700, h...@zytor.com wrote:
>> On March 18, 2019 3:16:39 PM PDT, Matthias Kaehlcke
> wrote:
>> >On Mon, Mar 18, 2019 at 02:50:44PM -0700, h...@zytor.com wrote:
>> >> On March 18, 2019 2:31:13 PM PDT, Matthias Kaehlcke
>> > wrote:
>> >> >On Fri, Mar 15, 2019 at 01:54:50PM -0700, Matthias Kaehlcke
>wrote:
>> >> >> The compiler may emit calls to __lshrti3 from the compiler
>runtime
>> >> >> library, which results in undefined references:
>> >> >> 
>> >> >> arch/x86/kvm/x86.o: In function `mul_u64_u64_shr':
>> >> >>   include/linux/math64.h:186: undefined reference to
>`__lshrti3'
>> >> >> 
>> >> >> Add a copy of the __lshrti3 libgcc routine (from gcc v4.9.2).
>> >> >> 
>> >> >> Include the function for x86 builds with clang, which is the
>> >> >> environment where the above error was observed.
>> >> >> 
>> >> >> Signed-off-by: Matthias Kaehlcke 
>> >> >
>> >> >With "Revert "kbuild: use -Oz instead of -Os when using clang"
>> >> >(https://lore.kernel.org/patchwork/patch/1051932/) the above
>> >> >error is fixed, a few comments inline for if the patch is
>> >> >resurrected in the future because __lshrti3 is emitted in a
>> >> >different context.
>> >> >
>> >> >> diff --git a/include/linux/libgcc.h b/include/linux/libgcc.h
>> >> >> index 32e1e0f4b2d0..a71036471838 100644
>> >> >> --- a/include/linux/libgcc.h
>> >> >> +++ b/include/linux/libgcc.h
>> >> >> @@ -22,15 +22,26 @@
>> >> >>  #include 
>> >> >> 
>> >> >>  typedef int word_type __attribute__ ((mode (__word__)));
>> >> >> +typedef int TItype __attribute__ ((mode (TI)));
>> >> >
>> >> >Consider using __int128 instead. Definition and use need a
>> >> >'defined(__SIZEOF_INT128__)' guard  (similar for mode (TI)),
>since
>> >> >these 128 bit types aren't supported on all platforms.
>> >> >
>> >> >>  #ifdef __BIG_ENDIAN
>> >> >>  struct DWstruct {
>> >> >>int high, low;
>> >> >>  };
>> >> >> +
>> >> >> +struct DWstruct128 {
>> >> >> +  long long high, low;
>> >> >> +};
>> >> >
>> >> >This struct isn't needed, struct DWstruct can be used.
>> >> >
>> >> >> diff --git a/lib/lshrti3.c b/lib/lshrti3.c
>> >> >> new file mode 100644
>> >> >> index ..2d2123bb3030
>> >> >> --- /dev/null
>> >> >> +++ b/lib/lshrti3.c
>> >> >> @@ -0,0 +1,31 @@
>> >> >> +// SPDX-License-Identifier: GPL-2.0
>> >> >> +
>> >> >> +#include 
>> >> >> +#include 
>> >> >> +
>> >> >> +long long __lshrti3(long long u, word_type b)
>> >> >
>> >> >use TItype for input/output, which is what gcc does, though the
>> >above
>> >> >matches the interface in the documentation.
>> >> >
>> >> >> +{
>> >> >> +  DWunion128 uu, w;
>> >> >> +  word_type bm;
>> >> >> +
>> >> >> +  if (b == 0)
>> >> >> +  return u;
>> >> >> +
>> >> >> +  uu.ll = u;
>> >> >> +  bm = 64 - b;
>> >> >> +
>> >> >> +  if (bm <= 0) {
>> >> >> +  w.s.high = 0;
>> >> >> +  w.s.low = (unsigned long long) uu.s.high >> -bm;
>> >> >
>> >> >include  and use u64 instead of unsigned long
>long.
>> >> 
>> >> Ok, now I'm really puzzled.
>> >> 
>> >> How could we need a 128-bit shift when the prototype only has 64
>bits
>> >of input?!
>> >
>> >Good question, this is the code from libgcc:
>> >
>> >TItype
>> >__lshrti3 (TItype u, shift_count_type b)
>> >{
>> >  if (b == 0)
>> >return u;
>> >
>> >  const DWunion uu = {.ll = u};
>> >  const shift_count_type bm = (8 * (8)) - b;
>> >  DWunion w;
>> >
>> >  if (bm <= 0)
>> >{
>> >  w.s.high = 0;
>> >  w.s.low = (UDItype) uu.s.high >> -bm;
>> >}
>> >  else
>> >{
>> >  const UDItype carries = (UDItype) uu.s.high << bm;
>> >
>> >  w.s.high = (UDItype) uu.s.high >> b;
>> >  w.s.low = ((UDItype) uu.s.low >> b) | carries;
>> >}
>> >
>> >  return w.ll;
>> >}
>> >
>> >
>> >My compiler knowledge is limited, my guess is that the function is a
>> >generic implementation, and while a long long is 64-bit wide under
>> >Linux it could be 128-bit on other platforms.
>> 
>> Yes, long long is just plain wrong.
>> 
>> How could we end up calling this function on 32 bits?!
>
>We didn't, in this case the function is called in 64-bit code
>(arch/x86/kvm/x86.o: In function `mul_u64_u64_shr'), for the 32-bit
>vDSO it was __lshrdi3.

Again, for 64-bit builds we can simply include libgcc in the link (with --no-whole-archive).
-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.


Re: [PATCH] lib: Add shared copy of __lshrti3 from libgcc

2019-03-18 Thread hpa
On March 18, 2019 4:52:19 PM PDT, Matthias Kaehlcke  wrote:
>On Mon, Mar 18, 2019 at 04:44:03PM -0700, h...@zytor.com wrote:
>> On March 18, 2019 3:16:39 PM PDT, Matthias Kaehlcke
> wrote:
>> >On Mon, Mar 18, 2019 at 02:50:44PM -0700, h...@zytor.com wrote:
>> >> On March 18, 2019 2:31:13 PM PDT, Matthias Kaehlcke
>> > wrote:
>> >> >On Fri, Mar 15, 2019 at 01:54:50PM -0700, Matthias Kaehlcke
>wrote:
>> >> >> The compiler may emit calls to __lshrti3 from the compiler
>runtime
>> >> >> library, which results in undefined references:
>> >> >> 
>> >> >> arch/x86/kvm/x86.o: In function `mul_u64_u64_shr':
>> >> >>   include/linux/math64.h:186: undefined reference to
>`__lshrti3'
>> >> >> 
>> >> >> Add a copy of the __lshrti3 libgcc routine (from gcc v4.9.2).
>> >> >> 
>> >> >> Include the function for x86 builds with clang, which is the
>> >> >> environment where the above error was observed.
>> >> >> 
>> >> >> Signed-off-by: Matthias Kaehlcke 
>> >> >
>> >> >With "Revert "kbuild: use -Oz instead of -Os when using clang"
>> >> >(https://lore.kernel.org/patchwork/patch/1051932/) the above
>> >> >error is fixed, a few comments inline for if the patch is
>> >> >resurrected in the future because __lshrti3 is emitted in a
>> >> >different context.
>> >> >
>> >> >> diff --git a/include/linux/libgcc.h b/include/linux/libgcc.h
>> >> >> index 32e1e0f4b2d0..a71036471838 100644
>> >> >> --- a/include/linux/libgcc.h
>> >> >> +++ b/include/linux/libgcc.h
>> >> >> @@ -22,15 +22,26 @@
>> >> >>  #include 
>> >> >> 
>> >> >>  typedef int word_type __attribute__ ((mode (__word__)));
>> >> >> +typedef int TItype __attribute__ ((mode (TI)));
>> >> >
>> >> >Consider using __int128 instead. Definition and use need a
>> >> >'defined(__SIZEOF_INT128__)' guard  (similar for mode (TI)),
>since
>> >> >these 128 bit types aren't supported on all platforms.
>> >> >
>> >> >>  #ifdef __BIG_ENDIAN
>> >> >>  struct DWstruct {
>> >> >>int high, low;
>> >> >>  };
>> >> >> +
>> >> >> +struct DWstruct128 {
>> >> >> +  long long high, low;
>> >> >> +};
>> >> >
>> >> >This struct isn't needed, struct DWstruct can be used.
>> >> >
>> >> >> diff --git a/lib/lshrti3.c b/lib/lshrti3.c
>> >> >> new file mode 100644
>> >> >> index ..2d2123bb3030
>> >> >> --- /dev/null
>> >> >> +++ b/lib/lshrti3.c
>> >> >> @@ -0,0 +1,31 @@
>> >> >> +// SPDX-License-Identifier: GPL-2.0
>> >> >> +
>> >> >> +#include 
>> >> >> +#include 
>> >> >> +
>> >> >> +long long __lshrti3(long long u, word_type b)
>> >> >
>> >> >use TItype for input/output, which is what gcc does, though the
>> >above
>> >> >matches the interface in the documentation.
>> >> >
>> >> >> +{
>> >> >> +  DWunion128 uu, w;
>> >> >> +  word_type bm;
>> >> >> +
>> >> >> +  if (b == 0)
>> >> >> +  return u;
>> >> >> +
>> >> >> +  uu.ll = u;
>> >> >> +  bm = 64 - b;
>> >> >> +
>> >> >> +  if (bm <= 0) {
>> >> >> +  w.s.high = 0;
>> >> >> +  w.s.low = (unsigned long long) uu.s.high >> -bm;
>> >> >
>> >> >include  and use u64 instead of unsigned long
>long.
>> >> 
>> >> Ok, now I'm really puzzled.
>> >> 
>> >> How could we need a 128-bit shift when the prototype only has 64
>bits
>> >of input?!
>> >
>> >Good question, this is the code from libgcc:
>> >
>> >TItype
>> >__lshrti3 (TItype u, shift_count_type b)
>> >{
>> >  if (b == 0)
>> >return u;
>> >
>> >  const DWunion uu = {.ll = u};
>> >  const shift_count_type bm = (8 * (8)) - b;
>> >  DWunion w;
>> >
>> >  if (bm <= 0)
>> >{
>> >  w.s.high = 0;
>> >  w.s.low = (UDItype) uu.s.high >> -bm;
>> >}
>> >  else
>> >{
>> >  const UDItype carries = (UDItype) uu.s.high << bm;
>> >
>> >  w.s.high = (UDItype) uu.s.high >> b;
>> >  w.s.low = ((UDItype) uu.s.low >> b) | carries;
>> >}
>> >
>> >  return w.ll;
>> >}
>> >
>> >
>> >My compiler knowledge is limited, my guess is that the function is a
>> >generic implementation, and while a long long is 64-bit wide under
>> >Linux it could be 128-bit on other platforms.
>> 
>> Yes, long long is just plain wrong.
>> 
>> How could we end up calling this function on 32 bits?!
>
>We didn't, in this case the function is called in 64-bit code
>(arch/x86/kvm/x86.o: In function `mul_u64_u64_shr'), for the 32-bit
>vDSO it was __lshrdi3.

That makes more sense.
-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.


Re: [PATCH] lib: Add shared copy of __lshrti3 from libgcc

2019-03-18 Thread hpa
On March 18, 2019 3:16:39 PM PDT, Matthias Kaehlcke  wrote:
>On Mon, Mar 18, 2019 at 02:50:44PM -0700, h...@zytor.com wrote:
>> On March 18, 2019 2:31:13 PM PDT, Matthias Kaehlcke
> wrote:
>> >On Fri, Mar 15, 2019 at 01:54:50PM -0700, Matthias Kaehlcke wrote:
>> >> The compiler may emit calls to __lshrti3 from the compiler runtime
>> >> library, which results in undefined references:
>> >> 
>> >> arch/x86/kvm/x86.o: In function `mul_u64_u64_shr':
>> >>   include/linux/math64.h:186: undefined reference to `__lshrti3'
>> >> 
>> >> Add a copy of the __lshrti3 libgcc routine (from gcc v4.9.2).
>> >> 
>> >> Include the function for x86 builds with clang, which is the
>> >> environment where the above error was observed.
>> >> 
>> >> Signed-off-by: Matthias Kaehlcke 
>> >
>> >With "Revert "kbuild: use -Oz instead of -Os when using clang"
>> >(https://lore.kernel.org/patchwork/patch/1051932/) the above
>> >error is fixed, a few comments inline for if the patch is
>> >resurrected in the future because __lshrti3 is emitted in a
>> >different context.
>> >
>> >> diff --git a/include/linux/libgcc.h b/include/linux/libgcc.h
>> >> index 32e1e0f4b2d0..a71036471838 100644
>> >> --- a/include/linux/libgcc.h
>> >> +++ b/include/linux/libgcc.h
>> >> @@ -22,15 +22,26 @@
>> >>  #include 
>> >> 
>> >>  typedef int word_type __attribute__ ((mode (__word__)));
>> >> +typedef int TItype __attribute__ ((mode (TI)));
>> >
>> >Consider using __int128 instead. Definition and use need a
>> >'defined(__SIZEOF_INT128__)' guard  (similar for mode (TI)), since
>> >these 128 bit types aren't supported on all platforms.
>> >
>> >>  #ifdef __BIG_ENDIAN
>> >>  struct DWstruct {
>> >>   int high, low;
>> >>  };
>> >> +
>> >> +struct DWstruct128 {
>> >> + long long high, low;
>> >> +};
>> >
>> >This struct isn't needed, struct DWstruct can be used.
>> >
>> >> diff --git a/lib/lshrti3.c b/lib/lshrti3.c
>> >> new file mode 100644
>> >> index ..2d2123bb3030
>> >> --- /dev/null
>> >> +++ b/lib/lshrti3.c
>> >> @@ -0,0 +1,31 @@
>> >> +// SPDX-License-Identifier: GPL-2.0
>> >> +
>> >> +#include 
>> >> +#include 
>> >> +
>> >> +long long __lshrti3(long long u, word_type b)
>> >
>> >use TItype for input/output, which is what gcc does, though the
>above
>> >matches the interface in the documentation.
>> >
>> >> +{
>> >> + DWunion128 uu, w;
>> >> + word_type bm;
>> >> +
>> >> + if (b == 0)
>> >> + return u;
>> >> +
>> >> + uu.ll = u;
>> >> + bm = 64 - b;
>> >> +
>> >> + if (bm <= 0) {
>> >> + w.s.high = 0;
>> >> + w.s.low = (unsigned long long) uu.s.high >> -bm;
>> >
>> >include  and use u64 instead of unsigned long long.
>> 
>> Ok, now I'm really puzzled.
>> 
>> How could we need a 128-bit shift when the prototype only has 64 bits
>of input?!
>
>Good question, this is the code from libgcc:
>
>TItype
>__lshrti3 (TItype u, shift_count_type b)
>{
>  if (b == 0)
>return u;
>
>  const DWunion uu = {.ll = u};
>  const shift_count_type bm = (8 * (8)) - b;
>  DWunion w;
>
>  if (bm <= 0)
>{
>  w.s.high = 0;
>  w.s.low = (UDItype) uu.s.high >> -bm;
>}
>  else
>{
>  const UDItype carries = (UDItype) uu.s.high << bm;
>
>  w.s.high = (UDItype) uu.s.high >> b;
>  w.s.low = ((UDItype) uu.s.low >> b) | carries;
>}
>
>  return w.ll;
>}
>
>
>My compiler knowledge is limited, my guess is that the function is a
>generic implementation, and while a long long is 64-bit wide under
>Linux it could be 128-bit on other platforms.

Yes, long long is just plain wrong.

How could we end up calling this function on 32 bits?!
-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.
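
For illustration only (this is neither the patch under review nor gcc's
code): combining the review comments, the helper would take a 128-bit
argument instead of long long. A minimal sketch, assuming __int128
support, the kernel's u64 from linux/types.h, and a little-endian (x86)
struct layout; the body splits the value into 64-bit halves so the
compiler does not simply lower it back into a __lshrti3 call:

#include <linux/types.h>

#ifdef __SIZEOF_INT128__
unsigned __int128 __lshrti3(unsigned __int128 u, int b)
{
	union {
		unsigned __int128 ll;
		struct { u64 low, high; } s;	/* little-endian layout */
	} uu, w;

	if (b == 0)
		return u;

	uu.ll = u;

	if (b >= 64) {
		w.s.high = 0;
		w.s.low = uu.s.high >> (b - 64);
	} else {
		w.s.high = uu.s.high >> b;
		w.s.low = (uu.s.low >> b) | (uu.s.high << (64 - b));
	}

	return w.ll;
}
#endif /* __SIZEOF_INT128__ */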


Re: [PATCH] Revert "kbuild: use -Oz instead of -Os when using clang"

2019-03-18 Thread hpa
On March 18, 2019 2:56:05 PM PDT, Matthias Kaehlcke  wrote:
>On Mon, Mar 18, 2019 at 02:47:13PM -0700, 'Nick Desaulniers' via Clang
>Built Linux wrote:
>> On Mon, Mar 18, 2019 at 2:10 PM Matthias Kaehlcke 
>wrote:
>> >
>> > The clang option -Oz enables *aggressive* optimization for size,
>> > which doesn't necessarily result in smaller images, but can have
>> > negative impact on performance. Switch back to the less aggressive
>> > -Os.
>> >
>> > This reverts commit 6748cb3c299de1ffbe56733647b01dbcc398c419.
>> 
>> Does scripts/checkpatch.pl complain about this format (full sha, no
>> title)?
>
>Nope, for reverts checkpatch is happy with it, I suppose because by
>default the subject of the reverted patch is part of the revert's
>subject.
>
>> Not sure that necessitates a v2, but if so would be good to
>> mention that Clang generates excessive calls into libgcc/compiler-rt
>> at -Oz.  Thanks for the patch and the discussion/sanity check.
>
>I can send a v2 if 'needed'.
>
>> Reviewed-by: Nick Desaulniers 
>
>Thanks!
>
>Matthias
>
>> >
>> > Suggested-by: Peter Zijlstra 
>> > Signed-off-by: Matthias Kaehlcke 
>> > ---
>> >  Makefile | 2 +-
>> >  1 file changed, 1 insertion(+), 1 deletion(-)
>> >
>> > diff --git a/Makefile b/Makefile
>> > index 9ef547fc7ffe..191f3ce3cb5e 100644
>> > --- a/Makefile
>> > +++ b/Makefile
>> > @@ -667,7 +667,7 @@ KBUILD_CFLAGS   += $(call
>cc-disable-warning, format-overflow)
>> >  KBUILD_CFLAGS  += $(call cc-disable-warning, int-in-bool-context)
>> >
>> >  ifdef CONFIG_CC_OPTIMIZE_FOR_SIZE
>> > -KBUILD_CFLAGS  += $(call cc-option,-Oz,-Os)
>> > +KBUILD_CFLAGS  += -Os
>> >  else
>> >  KBUILD_CFLAGS   += -O2
>> >  endif
>> 
>> 
>> 

Would be nice, still, to have -Ok.
-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.


Re: [PATCH] lib: Add shared copy of __lshrti3 from libgcc

2019-03-18 Thread hpa
On March 18, 2019 2:31:13 PM PDT, Matthias Kaehlcke  wrote:
>On Fri, Mar 15, 2019 at 01:54:50PM -0700, Matthias Kaehlcke wrote:
>> The compiler may emit calls to __lshrti3 from the compiler runtime
>> library, which results in undefined references:
>> 
>> arch/x86/kvm/x86.o: In function `mul_u64_u64_shr':
>>   include/linux/math64.h:186: undefined reference to `__lshrti3'
>> 
>> Add a copy of the __lshrti3 libgcc routine (from gcc v4.9.2).
>> 
>> Include the function for x86 builds with clang, which is the
>> environment where the above error was observed.
>> 
>> Signed-off-by: Matthias Kaehlcke 
>
>With "Revert "kbuild: use -Oz instead of -Os when using clang"
>(https://lore.kernel.org/patchwork/patch/1051932/) the above
>error is fixed, a few comments inline for if the patch is
>resurrected in the future because __lshrti3 is emitted in a
>different context.
>
>> diff --git a/include/linux/libgcc.h b/include/linux/libgcc.h
>> index 32e1e0f4b2d0..a71036471838 100644
>> --- a/include/linux/libgcc.h
>> +++ b/include/linux/libgcc.h
>> @@ -22,15 +22,26 @@
>>  #include 
>> 
>>  typedef int word_type __attribute__ ((mode (__word__)));
>> +typedef int TItype __attribute__ ((mode (TI)));
>
>Consider using __int128 instead. Definition and use need a
>'defined(__SIZEOF_INT128__)' guard  (similar for mode (TI)), since
>these 128 bit types aren't supported on all platforms.
>
>>  #ifdef __BIG_ENDIAN
>>  struct DWstruct {
>>  int high, low;
>>  };
>> +
>> +struct DWstruct128 {
>> +long long high, low;
>> +};
>
>This struct isn't needed, struct DWstruct can be used.
>
>> diff --git a/lib/lshrti3.c b/lib/lshrti3.c
>> new file mode 100644
>> index ..2d2123bb3030
>> --- /dev/null
>> +++ b/lib/lshrti3.c
>> @@ -0,0 +1,31 @@
>> +// SPDX-License-Identifier: GPL-2.0
>> +
>> +#include 
>> +#include 
>> +
>> +long long __lshrti3(long long u, word_type b)
>
>use TItype for input/output, which is what gcc does, though the above
>matches the interface in the documentation.
>
>> +{
>> +DWunion128 uu, w;
>> +word_type bm;
>> +
>> +if (b == 0)
>> +return u;
>> +
>> +uu.ll = u;
>> +bm = 64 - b;
>> +
>> +if (bm <= 0) {
>> +w.s.high = 0;
>> +w.s.low = (unsigned long long) uu.s.high >> -bm;
>
>include  and use u64 instead of unsigned long long.

Ok, now I'm really puzzled.

How could we need a 128-bit shift when the prototype only has 64 bits of input?!

-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.


Re: [PATCH] lib: Add shared copy of __lshrti3 from libgcc

2019-03-15 Thread hpa
On March 15, 2019 4:47:01 PM PDT, Matthias Kaehlcke  wrote:
>On Fri, Mar 15, 2019 at 03:12:08PM -0700, h...@zytor.com wrote:
>> On March 15, 2019 3:06:37 PM PDT, Nick Desaulniers
> wrote:
>> >On Fri, Mar 15, 2019 at 1:54 PM Matthias Kaehlcke 
>> >wrote:
>> >>
>> >> The compiler may emit calls to __lshrti3 from the compiler runtime
>> >> library, which results in undefined references:
>> >>
>> >> arch/x86/kvm/x86.o: In function `mul_u64_u64_shr':
>> >>   include/linux/math64.h:186: undefined reference to `__lshrti3'
>> >
>> >Looks like Clang will emit this at -Oz (but not -O2):
>> >https://godbolt.org/z/w1_2YC
>> >
>> >>
>> >> Add a copy of the __lshrti3 libgcc routine (from gcc v4.9.2).
>> >
>> >Has it changed since? If so why not a newer version of libgcc_s?
>> >
>> >>
>> >> Include the function for x86 builds with clang, which is the
>> >> environment where the above error was observed.
>> >>
>> >> Signed-off-by: Matthias Kaehlcke 
>> >> ---
>> >>  arch/x86/Kconfig   |  1 +
>> >>  include/linux/libgcc.h | 16 
>> >>  lib/Kconfig|  3 +++
>> >>  lib/Makefile   |  1 +
>> >>  lib/lshrti3.c  | 31 +++
>> >>  5 files changed, 52 insertions(+)
>> >>  create mode 100644 lib/lshrti3.c
>> >>
>> >> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
>> >> index c1f9b3cf437c..a5e0d923845d 100644
>> >> --- a/arch/x86/Kconfig
>> >> +++ b/arch/x86/Kconfig
>> >> @@ -105,6 +105,7 @@ config X86
>> >> select GENERIC_IRQ_PROBE
>> >> select GENERIC_IRQ_RESERVATION_MODE
>> >> select GENERIC_IRQ_SHOW
>> >> +   select GENERIC_LIB_LSHRTI3  if CC_IS_CLANG
>> >> select GENERIC_PENDING_IRQ  if SMP
>> >> select GENERIC_SMP_IDLE_THREAD
>> >> select GENERIC_STRNCPY_FROM_USER
>> >> diff --git a/include/linux/libgcc.h b/include/linux/libgcc.h
>> >> index 32e1e0f4b2d0..a71036471838 100644
>> >> --- a/include/linux/libgcc.h
>> >> +++ b/include/linux/libgcc.h
>> >> @@ -22,15 +22,26 @@
>> >>  #include 
>> >>
>> >>  typedef int word_type __attribute__ ((mode (__word__)));
>> >> +typedef int TItype __attribute__ ((mode (TI)));
>> >
>> >Well that's an interesting new compiler attribute.
>>
>>https://stackoverflow.com/questions/4559025/what-does-gcc-attribute-modexx-actually-do
>> >Please use `__mode(TI)` from include/linux/compiler_attributes.h ex.
>> >typedef int TItype __mode(TI);
>> >
>> >>
>> >>  #ifdef __BIG_ENDIAN
>> >>  struct DWstruct {
>> >> int high, low;
>> >>  };
>> >> +
>> >> +struct DWstruct128 {
>> >> +   long long high, low;
>> >> +};
>> >> +
>> >>  #elif defined(__LITTLE_ENDIAN)
>> >>  struct DWstruct {
>> >> int low, high;
>> >>  };
>> >> +
>> >> +struct DWstruct128 {
>> >> +   long long low, high;
>> >> +};
>> >> +
>> >>  #else
>> >>  #error I feel sick.
>> >>  #endif
>> >> @@ -40,4 +51,9 @@ typedef union {
>> >> long long ll;
>> >>  } DWunion;
>> >>
>> >> +typedef union {
>> >> +   struct DWstruct128 s;
>> >> +   TItype ll;
>> >> +} DWunion128;
>> >> +
>> >>  #endif /* __ASM_LIBGCC_H */
>> >> diff --git a/lib/Kconfig b/lib/Kconfig
>> >> index a9e56539bd11..369e10259ea6 100644
>> >> --- a/lib/Kconfig
>> >> +++ b/lib/Kconfig
>> >> @@ -624,6 +624,9 @@ config GENERIC_LIB_ASHRDI3
>> >>  config GENERIC_LIB_LSHRDI3
>> >> bool
>> >>
>> >> +config GENERIC_LIB_LSHRTI3
>> >> +   bool
>> >> +
>> >>  config GENERIC_LIB_MULDI3
>> >> bool
>> >>
>> >> diff --git a/lib/Makefile b/lib/Makefile
>> >> index 4e066120a0d6..42648411f451 100644
>> >> --- a/lib/Makefile
>> >> +++ b/lib/Makefile
>> >> @@ -277,6 +277,7 @@ obj-$(CONFIG_PARMAN) += parman.o
>> >>  obj-$(CONFIG_GENERIC_LIB_ASHLDI3) += ashldi3.o
>> >>  obj-$(CONFIG_GENERIC_LIB_ASHRDI3) += ashrdi3.o
>> >>  obj-$(CONFIG_GENERIC_LIB_LSHRDI3) += lshrdi3.o
>> >> +obj-$(CONFIG_GENERIC_LIB_LSHRTI3) += lshrti3.o
>> >>  obj-$(CONFIG_GENERIC_LIB_MULDI3) += muldi3.o
>> >>  obj-$(CONFIG_GENERIC_LIB_CMPDI2) += cmpdi2.o
>> >>  obj-$(CONFIG_GENERIC_LIB_UCMPDI2) += ucmpdi2.o
>> >> diff --git a/lib/lshrti3.c b/lib/lshrti3.c
>> >> new file mode 100644
>> >> index ..2d2123bb3030
>> >> --- /dev/null
>> >> +++ b/lib/lshrti3.c
>> >> @@ -0,0 +1,31 @@
>> >> +// SPDX-License-Identifier: GPL-2.0
>> >> +
>> >> +#include 
>> >> +#include 
>> >> +
>> >> +long long __lshrti3(long long u, word_type b)
>> >> +{
>> >> +   DWunion128 uu, w;
>> >> +   word_type bm;
>> >> +
>> >> +   if (b == 0)
>> >> +   return u;
>> >> +
>> >> +   uu.ll = u;
>> >> +   bm = 64 - b;
>> >> +
>> >> +   if (bm <= 0) {
>> >> +   w.s.high = 0;
>> >> +   w.s.low = (unsigned long long) uu.s.high >> -bm;
>> >> +   } else {
>> >> +   const unsigned long long carries =
>> >> +   (unsigned long long) uu.s.high << bm;
>> >> +   w.s.high = (unsigned long long) uu.s.high >> b;
>> >> +   w.s.low = ((unsigned long long) 

Re: [PATCH] lib: Add shared copy of __lshrti3 from libgcc

2019-03-15 Thread hpa
On March 15, 2019 4:34:10 PM PDT, Matthias Kaehlcke  wrote:
>Hi Peter,
>
>On Fri, Mar 15, 2019 at 03:08:57PM -0700, h...@zytor.com wrote:
>> On March 15, 2019 3:06:37 PM PDT, Nick Desaulniers
> wrote:
>> >On Fri, Mar 15, 2019 at 1:54 PM Matthias Kaehlcke 
>> >wrote:
>> >>
>> >> The compiler may emit calls to __lshrti3 from the compiler runtime
>> >> library, which results in undefined references:
>> >>
>> >> arch/x86/kvm/x86.o: In function `mul_u64_u64_shr':
>> >>   include/linux/math64.h:186: undefined reference to `__lshrti3'
>> >
>> >Looks like Clang will emit this at -Oz (but not -O2):
>> >https://godbolt.org/z/w1_2YC
>> >
>> >>
>> >> Add a copy of the __lshrti3 libgcc routine (from gcc v4.9.2).
>> >
>> >Has it changed since? If so why not a newer version of libgcc_s?
>> >
>> >>
>> >> Include the function for x86 builds with clang, which is the
>> >> environment where the above error was observed.
>> >>
>> >> Signed-off-by: Matthias Kaehlcke 
>> >> ---
>> >>  arch/x86/Kconfig   |  1 +
>> >>  include/linux/libgcc.h | 16 
>> >>  lib/Kconfig|  3 +++
>> >>  lib/Makefile   |  1 +
>> >>  lib/lshrti3.c  | 31 +++
>> >>  5 files changed, 52 insertions(+)
>> >>  create mode 100644 lib/lshrti3.c
>> >>
>> >> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
>> >> index c1f9b3cf437c..a5e0d923845d 100644
>> >> --- a/arch/x86/Kconfig
>> >> +++ b/arch/x86/Kconfig
>> >> @@ -105,6 +105,7 @@ config X86
>> >> select GENERIC_IRQ_PROBE
>> >> select GENERIC_IRQ_RESERVATION_MODE
>> >> select GENERIC_IRQ_SHOW
>> >> +   select GENERIC_LIB_LSHRTI3  if CC_IS_CLANG
>> >> select GENERIC_PENDING_IRQ  if SMP
>> >> select GENERIC_SMP_IDLE_THREAD
>> >> select GENERIC_STRNCPY_FROM_USER
>> >> diff --git a/include/linux/libgcc.h b/include/linux/libgcc.h
>> >> index 32e1e0f4b2d0..a71036471838 100644
>> >> --- a/include/linux/libgcc.h
>> >> +++ b/include/linux/libgcc.h
>> >> @@ -22,15 +22,26 @@
>> >>  #include 
>> >>
>> >>  typedef int word_type __attribute__ ((mode (__word__)));
>> >> +typedef int TItype __attribute__ ((mode (TI)));
>> >
>> >Well that's an interesting new compiler attribute.
>>
>>https://stackoverflow.com/questions/4559025/what-does-gcc-attribute-modexx-actually-do
>> >Please use `__mode(TI)` from include/linux/compiler_attributes.h ex.
>> >typedef int TItype __mode(TI);
>> >
>> >>
>> >>  #ifdef __BIG_ENDIAN
>> >>  struct DWstruct {
>> >> int high, low;
>> >>  };
>> >> +
>> >> +struct DWstruct128 {
>> >> +   long long high, low;
>> >> +};
>> >> +
>> >>  #elif defined(__LITTLE_ENDIAN)
>> >>  struct DWstruct {
>> >> int low, high;
>> >>  };
>> >> +
>> >> +struct DWstruct128 {
>> >> +   long long low, high;
>> >> +};
>> >> +
>> >>  #else
>> >>  #error I feel sick.
>> >>  #endif
>> >> @@ -40,4 +51,9 @@ typedef union {
>> >> long long ll;
>> >>  } DWunion;
>> >>
>> >> +typedef union {
>> >> +   struct DWstruct128 s;
>> >> +   TItype ll;
>> >> +} DWunion128;
>> >> +
>> >>  #endif /* __ASM_LIBGCC_H */
>> >> diff --git a/lib/Kconfig b/lib/Kconfig
>> >> index a9e56539bd11..369e10259ea6 100644
>> >> --- a/lib/Kconfig
>> >> +++ b/lib/Kconfig
>> >> @@ -624,6 +624,9 @@ config GENERIC_LIB_ASHRDI3
>> >>  config GENERIC_LIB_LSHRDI3
>> >> bool
>> >>
>> >> +config GENERIC_LIB_LSHRTI3
>> >> +   bool
>> >> +
>> >>  config GENERIC_LIB_MULDI3
>> >> bool
>> >>
>> >> diff --git a/lib/Makefile b/lib/Makefile
>> >> index 4e066120a0d6..42648411f451 100644
>> >> --- a/lib/Makefile
>> >> +++ b/lib/Makefile
>> >> @@ -277,6 +277,7 @@ obj-$(CONFIG_PARMAN) += parman.o
>> >>  obj-$(CONFIG_GENERIC_LIB_ASHLDI3) += ashldi3.o
>> >>  obj-$(CONFIG_GENERIC_LIB_ASHRDI3) += ashrdi3.o
>> >>  obj-$(CONFIG_GENERIC_LIB_LSHRDI3) += lshrdi3.o
>> >> +obj-$(CONFIG_GENERIC_LIB_LSHRTI3) += lshrti3.o
>> >>  obj-$(CONFIG_GENERIC_LIB_MULDI3) += muldi3.o
>> >>  obj-$(CONFIG_GENERIC_LIB_CMPDI2) += cmpdi2.o
>> >>  obj-$(CONFIG_GENERIC_LIB_UCMPDI2) += ucmpdi2.o
>> >> diff --git a/lib/lshrti3.c b/lib/lshrti3.c
>> >> new file mode 100644
>> >> index ..2d2123bb3030
>> >> --- /dev/null
>> >> +++ b/lib/lshrti3.c
>> >> @@ -0,0 +1,31 @@
>> >> +// SPDX-License-Identifier: GPL-2.0
>> >> +
>> >> +#include 
>> >> +#include 
>> >> +
>> >> +long long __lshrti3(long long u, word_type b)
>> >> +{
>> >> +   DWunion128 uu, w;
>> >> +   word_type bm;
>> >> +
>> >> +   if (b == 0)
>> >> +   return u;
>> >> +
>> >> +   uu.ll = u;
>> >> +   bm = 64 - b;
>> >> +
>> >> +   if (bm <= 0) {
>> >> +   w.s.high = 0;
>> >> +   w.s.low = (unsigned long long) uu.s.high >> -bm;
>> >> +   } else {
>> >> +   const unsigned long long carries =
>> >> +   (unsigned long long) uu.s.high << bm;
>> >> +   w.s.high = (unsigned long long) uu.s.high >> b;
>> >> +   w.s.low = ((unsigned 

Re: [PATCH] x86/vdso: include generic __lshrdi3 in 32-bit vDSO

2019-03-15 Thread hpa
On March 15, 2019 3:29:06 PM PDT, Matthias Kaehlcke  wrote:
>Hi Nick,
>
>On Fri, Mar 15, 2019 at 02:31:09PM -0700, 'Nick Desaulniers' via Clang
>Built Linux wrote:
>> On Fri, Mar 15, 2019 at 12:54 PM Matthias Kaehlcke 
>wrote:
>> >
>> > Building the 32-bit vDSO with a recent clang version fails due
>> > to undefined symbols:
>> >
>> > arch/x86/entry/vdso/vdso32.so.dbg: undefined symbols found
>> >
>> > The undefined symbol in this case is __lshrdi3, which is part of
>> > the compiler runtime library, however the vDSO isn't linked against
>> > this library.
>> >
>> > Include the kernel version of __lshrdi3 in the 32-bit vDSO build.
>> 
>> __lshrdi3 is used for "logical shift right double-word by int" (best
>> guess), so anywhere there's a right shift of a u64.  Looks like
>> there's a few of these in arch/x86/entry/vdso/, so it's legal for the
>> compiler to emit this libcall.  Do you know which function
>> specifically in the .so has a relocation referencing __lshrdi3
>> specifically?
>
>It's the right shifts in do_realtime() and do_monotonic().
>
>> Is there a config I can set to reproduce this, in order to help
>> test?
>
>I encountered it with a Chrome OS specific configuration, but a
>defconfig should do. Note that you probably need a development version
>of clang to reproduce this.
>
>> >
>> > Signed-off-by: Matthias Kaehlcke 
>> > ---
>> >  arch/x86/entry/vdso/Makefile | 7 ++-
>> >  lib/lshrdi3.c| 4 +++-
>> >  2 files changed, 9 insertions(+), 2 deletions(-)
>> >
>> > diff --git a/arch/x86/entry/vdso/Makefile
>b/arch/x86/entry/vdso/Makefile
>> > index 5bfe2243a08f..7517cd87e10b 100644
>> > --- a/arch/x86/entry/vdso/Makefile
>> > +++ b/arch/x86/entry/vdso/Makefile
>> > @@ -144,6 +144,7 @@ KBUILD_CFLAGS_32 += $(call cc-option,
>-fno-stack-protector)
>> >  KBUILD_CFLAGS_32 += $(call cc-option, -foptimize-sibling-calls)
>> >  KBUILD_CFLAGS_32 += -fno-omit-frame-pointer
>> >  KBUILD_CFLAGS_32 += -DDISABLE_BRANCH_PROFILING
>> > +KBUILD_CFLAGS_32 += -DBUILD_VDSO
>> >
>> >  ifdef CONFIG_RETPOLINE
>> >  ifneq ($(RETPOLINE_VDSO_CFLAGS),)
>> > @@ -153,12 +154,16 @@ endif
>> >
>> >  $(obj)/vdso32.so.dbg: KBUILD_CFLAGS = $(KBUILD_CFLAGS_32)
>> >
>> > +$(obj)/vdso32/lshrdi3.o: $(srctree)/lib/lshrdi3.c FORCE
>> > +   $(call if_changed_rule,cc_o_c)
>> 
>> + Masahiro to help look at this part (I don't understand this part
>> of kbuild).
>
>I bluntly stole that from arch/x86/purgatory/Makefile , which does
>something similar.
>
>> 
>> > +
>> >  $(obj)/vdso32.so.dbg: FORCE \
>> >   $(obj)/vdso32/vdso32.lds \
>> >   $(obj)/vdso32/vclock_gettime.o \
>> >   $(obj)/vdso32/note.o \
>> >   $(obj)/vdso32/system_call.o \
>> > - $(obj)/vdso32/sigreturn.o
>> > + $(obj)/vdso32/sigreturn.o \
>> > + $(obj)/vdso32/lshrdi3.o
>> > $(call if_changed,vdso)
>> >
>> >  #
>> > diff --git a/lib/lshrdi3.c b/lib/lshrdi3.c
>> > index 99cfa5721f2d..8a4fc6bcf3a4 100644
>> > --- a/lib/lshrdi3.c
>> > +++ b/lib/lshrdi3.c
>> > @@ -16,7 +16,7 @@
>> >   * to the Free Software Foundation, Inc.
>> >   */
>> >
>> > -#include 
>> > +#include 
>> 
>> Is this a simple cleanup, or?
>
>The vDSO build is unhappy when modules.h draws in a whole bunch of
>other kernel headers and export.h is all that's need. It seemed
>reasonable to do the 'cleanup' in this patch since we touch it anyway
>to place EXPORT_SYMBOL within an #ifdef.
>
>> >  #include 
>> >
>> >  long long notrace __lshrdi3(long long u, word_type b)
>> > @@ -42,4 +42,6 @@ long long notrace __lshrdi3(long long u,
>word_type b)
>> >
>> > return w.ll;
>> >  }
>> > +#ifndef BUILD_VDSO
>> >  EXPORT_SYMBOL(__lshrdi3);
>> > +#endif
>> 
>> Compilers (GCC and Clang) will always assume their runtime has these
>> helper functions; whether or not they emit libcalls vs inline
>routines
>> is implementation defined.  So I agree with this patch; I just would
>> like to help confirm/test it.
>
>Thanks for your help!
>
>Matthias

Note: there is also probably no reason to use -Os/-Oz for the vdso.
-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.
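
For context, the libcalls come from ordinary 64-bit right shifts such as
the ones in do_realtime()/do_monotonic(). A minimal sketch (the function
name scale_ns is made up, u64 comes from linux/types.h): on a 32-bit
target there is no native 64-bit shift, so the compiler may either
open-code it with two 32-bit shifts or emit a libcall, and the vDSO is
not linked against libgcc.

#include <linux/types.h>

static inline u64 scale_ns(u64 ns, int shift)
{
	return ns >> shift;	/* on 32-bit this may become a __lshrdi3 call */
}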


Re: [PATCH] lib: Add shared copy of __lshrti3 from libgcc

2019-03-15 Thread hpa
On March 15, 2019 3:06:37 PM PDT, Nick Desaulniers  
wrote:
>On Fri, Mar 15, 2019 at 1:54 PM Matthias Kaehlcke 
>wrote:
>>
>> The compiler may emit calls to __lshrti3 from the compiler runtime
>> library, which results in undefined references:
>>
>> arch/x86/kvm/x86.o: In function `mul_u64_u64_shr':
>>   include/linux/math64.h:186: undefined reference to `__lshrti3'
>
>Looks like Clang will emit this at -Oz (but not -O2):
>https://godbolt.org/z/w1_2YC
>
>>
>> Add a copy of the __lshrti3 libgcc routine (from gcc v4.9.2).
>
>Has it changed since? If so why not a newer version of libgcc_s?
>
>>
>> Include the function for x86 builds with clang, which is the
>> environment where the above error was observed.
>>
>> Signed-off-by: Matthias Kaehlcke 
>> ---
>>  arch/x86/Kconfig   |  1 +
>>  include/linux/libgcc.h | 16 
>>  lib/Kconfig|  3 +++
>>  lib/Makefile   |  1 +
>>  lib/lshrti3.c  | 31 +++
>>  5 files changed, 52 insertions(+)
>>  create mode 100644 lib/lshrti3.c
>>
>> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
>> index c1f9b3cf437c..a5e0d923845d 100644
>> --- a/arch/x86/Kconfig
>> +++ b/arch/x86/Kconfig
>> @@ -105,6 +105,7 @@ config X86
>> select GENERIC_IRQ_PROBE
>> select GENERIC_IRQ_RESERVATION_MODE
>> select GENERIC_IRQ_SHOW
>> +   select GENERIC_LIB_LSHRTI3  if CC_IS_CLANG
>> select GENERIC_PENDING_IRQ  if SMP
>> select GENERIC_SMP_IDLE_THREAD
>> select GENERIC_STRNCPY_FROM_USER
>> diff --git a/include/linux/libgcc.h b/include/linux/libgcc.h
>> index 32e1e0f4b2d0..a71036471838 100644
>> --- a/include/linux/libgcc.h
>> +++ b/include/linux/libgcc.h
>> @@ -22,15 +22,26 @@
>>  #include 
>>
>>  typedef int word_type __attribute__ ((mode (__word__)));
>> +typedef int TItype __attribute__ ((mode (TI)));
>
>Well that's an interesting new compiler attribute.
>https://stackoverflow.com/questions/4559025/what-does-gcc-attribute-modexx-actually-do
>Please use `__mode(TI)` from include/linux/compiler_attributes.h ex.
>typedef int TItype __mode(TI);
>
>>
>>  #ifdef __BIG_ENDIAN
>>  struct DWstruct {
>> int high, low;
>>  };
>> +
>> +struct DWstruct128 {
>> +   long long high, low;
>> +};
>> +
>>  #elif defined(__LITTLE_ENDIAN)
>>  struct DWstruct {
>> int low, high;
>>  };
>> +
>> +struct DWstruct128 {
>> +   long long low, high;
>> +};
>> +
>>  #else
>>  #error I feel sick.
>>  #endif
>> @@ -40,4 +51,9 @@ typedef union {
>> long long ll;
>>  } DWunion;
>>
>> +typedef union {
>> +   struct DWstruct128 s;
>> +   TItype ll;
>> +} DWunion128;
>> +
>>  #endif /* __ASM_LIBGCC_H */
>> diff --git a/lib/Kconfig b/lib/Kconfig
>> index a9e56539bd11..369e10259ea6 100644
>> --- a/lib/Kconfig
>> +++ b/lib/Kconfig
>> @@ -624,6 +624,9 @@ config GENERIC_LIB_ASHRDI3
>>  config GENERIC_LIB_LSHRDI3
>> bool
>>
>> +config GENERIC_LIB_LSHRTI3
>> +   bool
>> +
>>  config GENERIC_LIB_MULDI3
>> bool
>>
>> diff --git a/lib/Makefile b/lib/Makefile
>> index 4e066120a0d6..42648411f451 100644
>> --- a/lib/Makefile
>> +++ b/lib/Makefile
>> @@ -277,6 +277,7 @@ obj-$(CONFIG_PARMAN) += parman.o
>>  obj-$(CONFIG_GENERIC_LIB_ASHLDI3) += ashldi3.o
>>  obj-$(CONFIG_GENERIC_LIB_ASHRDI3) += ashrdi3.o
>>  obj-$(CONFIG_GENERIC_LIB_LSHRDI3) += lshrdi3.o
>> +obj-$(CONFIG_GENERIC_LIB_LSHRTI3) += lshrti3.o
>>  obj-$(CONFIG_GENERIC_LIB_MULDI3) += muldi3.o
>>  obj-$(CONFIG_GENERIC_LIB_CMPDI2) += cmpdi2.o
>>  obj-$(CONFIG_GENERIC_LIB_UCMPDI2) += ucmpdi2.o
>> diff --git a/lib/lshrti3.c b/lib/lshrti3.c
>> new file mode 100644
>> index ..2d2123bb3030
>> --- /dev/null
>> +++ b/lib/lshrti3.c
>> @@ -0,0 +1,31 @@
>> +// SPDX-License-Identifier: GPL-2.0
>> +
>> +#include 
>> +#include 
>> +
>> +long long __lshrti3(long long u, word_type b)
>> +{
>> +   DWunion128 uu, w;
>> +   word_type bm;
>> +
>> +   if (b == 0)
>> +   return u;
>> +
>> +   uu.ll = u;
>> +   bm = 64 - b;
>> +
>> +   if (bm <= 0) {
>> +   w.s.high = 0;
>> +   w.s.low = (unsigned long long) uu.s.high >> -bm;
>> +   } else {
>> +   const unsigned long long carries =
>> +   (unsigned long long) uu.s.high << bm;
>> +   w.s.high = (unsigned long long) uu.s.high >> b;
>> +   w.s.low = ((unsigned long long) uu.s.low >> b) |
>carries;
>> +   }
>> +
>> +   return w.ll;
>> +}
>> +#ifndef BUILD_VDSO
>> +EXPORT_SYMBOL(__lshrti3);
>> +#endif
>
>I don't think you want this.  Maybe that was carried over from
>https://lkml.org/lkml/2019/3/15/669
>by accident?  The above linkage error mentions arch/x86/kvm/x86.o
>which I wouldn't expect to be linked into the VDSO image?

Or just "u64"...
-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.


Re: [PATCH] lib: Add shared copy of __lshrti3 from libgcc

2019-03-15 Thread hpa
On March 15, 2019 3:06:37 PM PDT, Nick Desaulniers  
wrote:
>On Fri, Mar 15, 2019 at 1:54 PM Matthias Kaehlcke 
>wrote:
>>
>> The compiler may emit calls to __lshrti3 from the compiler runtime
>> library, which results in undefined references:
>>
>> arch/x86/kvm/x86.o: In function `mul_u64_u64_shr':
>>   include/linux/math64.h:186: undefined reference to `__lshrti3'
>
>Looks like Clang will emit this at -Oz (but not -O2):
>https://godbolt.org/z/w1_2YC
>
>>
>> Add a copy of the __lshrti3 libgcc routine (from gcc v4.9.2).
>
>Has it changed since? If so why not a newer version of libgcc_s?
>
>>
>> Include the function for x86 builds with clang, which is the
>> environment where the above error was observed.
>>
>> Signed-off-by: Matthias Kaehlcke 
>> ---
>>  arch/x86/Kconfig   |  1 +
>>  include/linux/libgcc.h | 16 
>>  lib/Kconfig|  3 +++
>>  lib/Makefile   |  1 +
>>  lib/lshrti3.c  | 31 +++
>>  5 files changed, 52 insertions(+)
>>  create mode 100644 lib/lshrti3.c
>>
>> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
>> index c1f9b3cf437c..a5e0d923845d 100644
>> --- a/arch/x86/Kconfig
>> +++ b/arch/x86/Kconfig
>> @@ -105,6 +105,7 @@ config X86
>> select GENERIC_IRQ_PROBE
>> select GENERIC_IRQ_RESERVATION_MODE
>> select GENERIC_IRQ_SHOW
>> +   select GENERIC_LIB_LSHRTI3  if CC_IS_CLANG
>> select GENERIC_PENDING_IRQ  if SMP
>> select GENERIC_SMP_IDLE_THREAD
>> select GENERIC_STRNCPY_FROM_USER
>> diff --git a/include/linux/libgcc.h b/include/linux/libgcc.h
>> index 32e1e0f4b2d0..a71036471838 100644
>> --- a/include/linux/libgcc.h
>> +++ b/include/linux/libgcc.h
>> @@ -22,15 +22,26 @@
>>  #include 
>>
>>  typedef int word_type __attribute__ ((mode (__word__)));
>> +typedef int TItype __attribute__ ((mode (TI)));
>
>Well that's an interesting new compiler attribute.
>https://stackoverflow.com/questions/4559025/what-does-gcc-attribute-modexx-actually-do
>Please use `__mode(TI)` from include/linux/compiler_attributes.h ex.
>typedef int TItype __mode(TI);
>
>>
>>  #ifdef __BIG_ENDIAN
>>  struct DWstruct {
>> int high, low;
>>  };
>> +
>> +struct DWstruct128 {
>> +   long long high, low;
>> +};
>> +
>>  #elif defined(__LITTLE_ENDIAN)
>>  struct DWstruct {
>> int low, high;
>>  };
>> +
>> +struct DWstruct128 {
>> +   long long low, high;
>> +};
>> +
>>  #else
>>  #error I feel sick.
>>  #endif
>> @@ -40,4 +51,9 @@ typedef union {
>> long long ll;
>>  } DWunion;
>>
>> +typedef union {
>> +   struct DWstruct128 s;
>> +   TItype ll;
>> +} DWunion128;
>> +
>>  #endif /* __ASM_LIBGCC_H */
>> diff --git a/lib/Kconfig b/lib/Kconfig
>> index a9e56539bd11..369e10259ea6 100644
>> --- a/lib/Kconfig
>> +++ b/lib/Kconfig
>> @@ -624,6 +624,9 @@ config GENERIC_LIB_ASHRDI3
>>  config GENERIC_LIB_LSHRDI3
>> bool
>>
>> +config GENERIC_LIB_LSHRTI3
>> +   bool
>> +
>>  config GENERIC_LIB_MULDI3
>> bool
>>
>> diff --git a/lib/Makefile b/lib/Makefile
>> index 4e066120a0d6..42648411f451 100644
>> --- a/lib/Makefile
>> +++ b/lib/Makefile
>> @@ -277,6 +277,7 @@ obj-$(CONFIG_PARMAN) += parman.o
>>  obj-$(CONFIG_GENERIC_LIB_ASHLDI3) += ashldi3.o
>>  obj-$(CONFIG_GENERIC_LIB_ASHRDI3) += ashrdi3.o
>>  obj-$(CONFIG_GENERIC_LIB_LSHRDI3) += lshrdi3.o
>> +obj-$(CONFIG_GENERIC_LIB_LSHRTI3) += lshrti3.o
>>  obj-$(CONFIG_GENERIC_LIB_MULDI3) += muldi3.o
>>  obj-$(CONFIG_GENERIC_LIB_CMPDI2) += cmpdi2.o
>>  obj-$(CONFIG_GENERIC_LIB_UCMPDI2) += ucmpdi2.o
>> diff --git a/lib/lshrti3.c b/lib/lshrti3.c
>> new file mode 100644
>> index ..2d2123bb3030
>> --- /dev/null
>> +++ b/lib/lshrti3.c
>> @@ -0,0 +1,31 @@
>> +// SPDX-License-Identifier: GPL-2.0
>> +
>> +#include 
>> +#include 
>> +
>> +long long __lshrti3(long long u, word_type b)
>> +{
>> +   DWunion128 uu, w;
>> +   word_type bm;
>> +
>> +   if (b == 0)
>> +   return u;
>> +
>> +   uu.ll = u;
>> +   bm = 64 - b;
>> +
>> +   if (bm <= 0) {
>> +   w.s.high = 0;
>> +   w.s.low = (unsigned long long) uu.s.high >> -bm;
>> +   } else {
>> +   const unsigned long long carries =
>> +   (unsigned long long) uu.s.high << bm;
>> +   w.s.high = (unsigned long long) uu.s.high >> b;
>> +   w.s.low = ((unsigned long long) uu.s.low >> b) |
>carries;
>> +   }
>> +
>> +   return w.ll;
>> +}
>> +#ifndef BUILD_VDSO
>> +EXPORT_SYMBOL(__lshrti3);
>> +#endif
>
>I don't think you want this.  Maybe that was carried over from
>https://lkml.org/lkml/2019/3/15/669
>by accident?  The above linkage error mentions arch/x86/kvm/x86.o
>which I wouldn't expect to be linked into the VDSO image?

If we compile with the default ABI we can simply link with libgcc; with 
-mregparm=3 all versions of gcc are broken.
-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.


Re: [PATCH 5.0 30/46] x86/boot/compressed/64: Do not read legacy ROM on EFI system

2019-03-10 Thread hpa
On March 9, 2019 10:18:40 PM PST, Greg Kroah-Hartman 
 wrote:
>On Sat, Mar 09, 2019 at 10:10:19PM -0800, h...@zytor.com wrote:
>> On March 8, 2019 4:50:03 AM PST, Greg Kroah-Hartman
> wrote:
>> >5.0-stable review patch.  If anyone has any objections, please let
>me
>> >know.
>> >
>> >--
>> >
>> >From: Kirill A. Shutemov 
>> >
>> >commit 6f913de3231e1d70a871135b38219da7810df218 upstream.
>> >
>> >EFI systems do not necessarily provide a legacy ROM. If the ROM is
>> >missing
>> >the memory is not mapped at all.
>> >
>> >Trying to dereference values in the legacy ROM area leads to a crash
>on
>> >Macbook Pro.
>> >
>> >Only look for values in the legacy ROM area for non-EFI system.
>> >
>> >Fixes: 3548e131ec6a ("x86/boot/compressed/64: Find a place for
>32-bit
>> >trampoline")
>> >Reported-by: Pitam Mitra 
>> >Signed-off-by: Kirill A. Shutemov 
>> >Signed-off-by: Thomas Gleixner 
>> >Tested-by: Bockjoo Kim 
>> >Cc: b...@alien8.de
>> >Cc: h...@zytor.com
>> >Cc: sta...@vger.kernel.org
>> >Link:
>>
>>https://lkml.kernel.org/r/20190219075224.35058-1-kirill.shute...@linux.intel.com
>> >Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=202351
>> >Signed-off-by: Greg Kroah-Hartman 
>> >
>> >---
>> > arch/x86/boot/compressed/pgtable_64.c |   19 ---
>> > 1 file changed, 16 insertions(+), 3 deletions(-)
>> >
>> >--- a/arch/x86/boot/compressed/pgtable_64.c
>> >+++ b/arch/x86/boot/compressed/pgtable_64.c
>> >@@ -1,5 +1,7 @@
>> >+#include 
>> > #include 
>> > #include 
>> >+#include 
>> > #include "pgtable.h"
>> > #include "../string.h"
>> > 
>> >@@ -37,9 +39,10 @@ int cmdline_find_option_bool(const char
>> > 
>> > static unsigned long find_trampoline_placement(void)
>> > {
>> >-   unsigned long bios_start, ebda_start;
>> >+   unsigned long bios_start = 0, ebda_start = 0;
>> >unsigned long trampoline_start;
>> >struct boot_e820_entry *entry;
>> >+   char *signature;
>> >int i;
>> > 
>> >/*
>> >@@ -47,8 +50,18 @@ static unsigned long find_trampoline_pla
>> > * This code is based on reserve_bios_regions().
>> > */
>> > 
>> >-   ebda_start = *(unsigned short *)0x40e << 4;
>> >-   bios_start = *(unsigned short *)0x413 << 10;
>> >+   /*
>> >+* EFI systems may not provide legacy ROM. The memory may not be
>> >mapped
>> >+* at all.
>> >+*
>> >+* Only look for values in the legacy ROM for non-EFI system.
>> >+*/
>> >+   signature = (char *)&boot_params->efi_info.efi_loader_signature;
>> >+   if (strncmp(signature, EFI32_LOADER_SIGNATURE, 4) &&
>> >+   strncmp(signature, EFI64_LOADER_SIGNATURE, 4)) {
>> >+   ebda_start = *(unsigned short *)0x40e << 4;
>> >+   bios_start = *(unsigned short *)0x413 << 10;
>> >+   }
>> > 
>> >if (bios_start < BIOS_START_MIN || bios_start > BIOS_START_MAX)
>> >bios_start = BIOS_START_MAX;
>> 
>> Only one objection: the explanation is nonsensical. 
>
>Heh, take it up with the original submitter, I don't change the
>changelog text after it hits Linus's tree :)
>
>greg k-h

:)
-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.


Re: [PATCH 5.0 30/46] x86/boot/compressed/64: Do not read legacy ROM on EFI system

2019-03-09 Thread hpa
On March 8, 2019 4:50:03 AM PST, Greg Kroah-Hartman 
 wrote:
>5.0-stable review patch.  If anyone has any objections, please let me
>know.
>
>--
>
>From: Kirill A. Shutemov 
>
>commit 6f913de3231e1d70a871135b38219da7810df218 upstream.
>
>EFI systems do not necessarily provide a legacy ROM. If the ROM is
>missing
>the memory is not mapped at all.
>
>Trying to dereference values in the legacy ROM area leads to a crash on
>Macbook Pro.
>
>Only look for values in the legacy ROM area for non-EFI system.
>
>Fixes: 3548e131ec6a ("x86/boot/compressed/64: Find a place for 32-bit
>trampoline")
>Reported-by: Pitam Mitra 
>Signed-off-by: Kirill A. Shutemov 
>Signed-off-by: Thomas Gleixner 
>Tested-by: Bockjoo Kim 
>Cc: b...@alien8.de
>Cc: h...@zytor.com
>Cc: sta...@vger.kernel.org
>Link:
>https://lkml.kernel.org/r/20190219075224.35058-1-kirill.shute...@linux.intel.com
>Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=202351
>Signed-off-by: Greg Kroah-Hartman 
>
>---
> arch/x86/boot/compressed/pgtable_64.c |   19 ---
> 1 file changed, 16 insertions(+), 3 deletions(-)
>
>--- a/arch/x86/boot/compressed/pgtable_64.c
>+++ b/arch/x86/boot/compressed/pgtable_64.c
>@@ -1,5 +1,7 @@
>+#include 
> #include 
> #include 
>+#include 
> #include "pgtable.h"
> #include "../string.h"
> 
>@@ -37,9 +39,10 @@ int cmdline_find_option_bool(const char
> 
> static unsigned long find_trampoline_placement(void)
> {
>-  unsigned long bios_start, ebda_start;
>+  unsigned long bios_start = 0, ebda_start = 0;
>   unsigned long trampoline_start;
>   struct boot_e820_entry *entry;
>+  char *signature;
>   int i;
> 
>   /*
>@@ -47,8 +50,18 @@ static unsigned long find_trampoline_pla
>* This code is based on reserve_bios_regions().
>*/
> 
>-  ebda_start = *(unsigned short *)0x40e << 4;
>-  bios_start = *(unsigned short *)0x413 << 10;
>+  /*
>+   * EFI systems may not provide legacy ROM. The memory may not be
>mapped
>+   * at all.
>+   *
>+   * Only look for values in the legacy ROM for non-EFI system.
>+   */
>+  signature = (char *)&boot_params->efi_info.efi_loader_signature;
>+  if (strncmp(signature, EFI32_LOADER_SIGNATURE, 4) &&
>+  strncmp(signature, EFI64_LOADER_SIGNATURE, 4)) {
>+  ebda_start = *(unsigned short *)0x40e << 4;
>+  bios_start = *(unsigned short *)0x413 << 10;
>+  }
> 
>   if (bios_start < BIOS_START_MIN || bios_start > BIOS_START_MAX)
>   bios_start = BIOS_START_MAX;

Only one objection: the explanation is nonsensical. 
-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.


Re: [RFC] Provide in-kernel headers for making it easy to extend the kernel

2019-03-07 Thread hpa
On March 7, 2019 3:12:07 PM PST, Joel Fernandes  wrote:
>Enrico,
>
>On Thu, Mar 07, 2019 at 11:11:22PM +0100, Enrico Weigelt, metux IT
>consult wrote:
>> On 07.03.19 21:55, Greg KH wrote:
>> 
>> > Ick, no, no more squashfs please, let's just kill that mess once
>and for
>> > all :)
>> 
>> okay, then: s/squashfs/whatever_fs_image_or_archive_you_like/;
>> 
>> > Again, putting this in a simple compressed tar image allows anyone
>to do
>> > whatever they need to with this.  If they want a full filesystem,
>> > uncompress it and use it there.  If they just want it in-memory
>where
>> > they can uncompress it and then discard it, that works too.
>> 
>> And let me stress the point: doesn't need any kernel changes at all,
>> when it's just a file in the same place where the .ko's live.
>
>Yes, but you're missing the point that some people would also opt to
>build it
>into the kernel during their development/debugging (Config=y). For such
>folks, they don't want to update the FS with anything during debug runs
>either. Your "whole same place where the .ko lives" doesn't address
>Daniel's
>usecase. You may say "initrd", but this is a much cleaner solution to
>that
>IMO. There is no initrd needed and the path to the header files will be
>at a
>standard location that is already pre-decided by the kernel.
>
>As Greg said, you are welcome to keep it disabled for yourself if you
>don't
>want it. This doesn't affect anyone else who doesn't use it.

You do know that initrd can be built into the kernel, right?
-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.


Re: [PATCH 18/20] objtool: Add UACCESS validation

2019-03-07 Thread hpa
On March 7, 2019 10:48:13 AM PST, Peter Zijlstra  wrote:
>On Thu, Mar 07, 2019 at 09:54:14AM -0800, Linus Torvalds wrote:
>> On Thu, Mar 7, 2019 at 9:41 AM Peter Zijlstra 
>wrote:
>> > >
>> > > What's the call site that made you go "just add __memset() to the
>list"?
>> >
>> > __asan_{,un}poinson_stack_memory()
>> >   kasan_{,un}poison_shadow()
>> > __memset()
>> 
>> Ugh. I think I almost just agree with your decision to just let that
>> memset go unchecked.
>> 
>> I'm not saying it's right, but it doesn't seem to be a fight worth
>fighting.
>
>One think I could do; is add a filter to each function and only allow
>__memset from the kasan code, and not from anywhere else.
>
>Another thing I need to look at is why objtool only found memset_orig
>(from __memset) but not memset_erms, which if I read the code right, is
>a possible alternative there.
>
>> Because AC vs KASAN in general ends up smelling like "not a fight
>> worth fighting" to me. You've done a herculean job, but..
>
>I know,.. I've been so close to doing that so many times, but it
>seems like defeat, esp. since I'm so close now :-)

___memset_kasan()?
-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.


Re: [PATCH 00/20] objtool: UACCESS validation v3

2019-03-07 Thread hpa
On March 7, 2019 9:18:29 AM PST, Josh Poimboeuf  wrote:
>On Thu, Mar 07, 2019 at 09:04:36AM -0800, h...@zytor.com wrote:
>> On March 7, 2019 8:47:05 AM PST, Josh Poimboeuf 
>wrote:
>> >On Thu, Mar 07, 2019 at 02:13:12PM +0100, Peter Zijlstra wrote:
>> >> On Thu, Mar 07, 2019 at 01:55:26PM +0100, Peter Zijlstra wrote:
>> >> > On Thu, Mar 07, 2019 at 01:03:17PM +0100, Peter Zijlstra wrote:
>> >> 
>> >> 
>> >> > > 01be 20d3:  31 c0   xor%eax,%eax
>> >> > > 01c0 20d5:  4c 39 ebcmp%r13,%rbx
>> >> > > 01c3 20d8:  77 08   ja 20e2
>> ><__do_sys_waitid+0x1cd>
>> >> 
>> >> randconfig-build/kernel/exit.o: warning: objtool:  
>> >__do_sys_waitid()+0x1c3: (branch)
>> >> 
>> >> > > 01cd 20e2:83 f0 01xor$0x1,%eax
>> >> > > 01d0 20e5:48 89 c2mov%rax,%rdx
>> >> > > 01d3 20e8:83 e2 01and$0x1,%edx
>> >> > > 01d6 20eb:48 83 c2 02 add$0x2,%rdx
>> >> > > 01da 20ef:48 ff 04 d5 00 00 00incq   0x0(,%rdx,8)
>> >> > > 01e1 20f6:00 
>> >> > > 01de  20f3: R_X86_64_32S  _ftrace_branch+0x148
>> >> > > 01e2 20f7:84 c0   test   %al,%al
>> >> > > 01e4 20f9:75 2d   jne2128
>> ><__do_sys_waitid+0x213>
>> >> > 
>> >> > we do not take this branch and fall-through.
>> >> 
>> >> And that is the error, I think. We should've taken it and went to:
>> >> 
>> >>   return -EFAULT;
>> >> 
>> >> because:
>> >> 
>> >>  +1be xor  %eax,%eax  eax=0
>> >>  +1cd   xor  $0x1,%eaxeax=1
>> >>  +1e2   test %al,%al  1&1==1 -> ZF=0
>> >>  +1e4   jnz
>> >> 
>> >> Is an unconditional code sequence, but there's no way objtool can
>do
>> >> that without becoming a full blown interpreter :/
>> >> 
>> >> > > 0213 2128:  49 c7 c7 f2 ff ff ffmov   
>> >$0xfff2,%r15
>> >> > > e0eb }
>> >> > > 021a 212f:  48 8d 65 d8 lea   
>-0x28(%rbp),%rsp
>> >> > > 021e 2133:  4c 89 f8mov%r15,%rax
>> >> > > 0221 2136:  5b  pop%rbx
>> >> > > 0222 2137:  41 5c   pop%r12
>> >> > > 0224 2139:  41 5d   pop%r13
>> >> > > 0226 213b:  41 5e   pop%r14
>> >> > > 0228 213d:  41 5f   pop%r15
>> >> > > 022a 213f:  5d  pop%rbp
>> >> > > 022b 2140:  c3  retq
>> >
>> >This "fixes" it, and also seems to help -Os make much code:
>> >
>> >diff --git a/include/linux/compiler.h b/include/linux/compiler.h
>> >index 445348facea9..8de63db58fdd 100644
>> >--- a/include/linux/compiler.h
>> >+++ b/include/linux/compiler.h
>> >@@ -67,7 +67,7 @@ void ftrace_likely_update(struct
>ftrace_likely_data
>> >*f, int val,
>> >.line = __LINE__,   \
>> >};  \
>> >__r = !!(cond); \
>> >-   __f.miss_hit[__r]++;
>> >\
>> >+   if (__r) __f.miss_hit[1]++; else __f.miss_hit[0]++; 
>> >\
>> >__r;\
>> >}))
>> > #endif /* CONFIG_PROFILE_ALL_BRANCHES */
>> 
>> if (cond)?  Or is ___r used elsewhere?
>
>__r is also the return value.  And it's needed because cond should
>only be evaluated once.

So put a true; and false; inside the if.
-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.
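
A sketch of that suggestion, using the __r/__f names from the diff above
(illustrative and untested): it stands in for the __r = !!(cond)
assignment and the miss_hit increment inside the
CONFIG_PROFILE_ALL_BRANCHES macro body, so the condition is still
evaluated exactly once, __r still carries the expression's value, and the
constant array indices stay visible to objtool.

	if ((cond)) {					\
		__f.miss_hit[1]++;			\
		__r = true;				\
	} else {					\
		__f.miss_hit[0]++;			\
		__r = false;				\
	}						\
	__r;						\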


Re: [PATCH 00/20] objtool: UACCESS validation v3

2019-03-07 Thread hpa
On March 7, 2019 3:45:11 AM PST, Peter Zijlstra  wrote:
>Teach objtool to validate the UACCESS (SMAP, PAN) rules with are
>currently
>unenforced and (therefore obviously) violated.
>
>UACCESS sections should be small; we want to limit the amount of code
>that can
>touch userspace. Furthermore, UACCESS state isn't scheduled, this means
>that
>anything that directly calls into the scheduler will result in random
>code
>running with UACCESS enabled and possibly getting back into the UACCESS
>region
>with UACCESS disabled and causing faults.
>
>Forbid any CALL/RET while UACCESS is enabled; but provide a few
>exceptions.
>
>This builds x86_64-allmodconfig clean, and I've only got a few
>randconfig
>failures left (GCC-8) that I'm not quite understanding.
>
>---
> arch/x86/ia32/ia32_signal.c|  29 ++-
> arch/x86/include/asm/asm.h |  24 --
> arch/x86/include/asm/nospec-branch.h   |   4 +-
> arch/x86/include/asm/smap.h|  20 ++
> arch/x86/include/asm/uaccess.h |   5 +-
> arch/x86/include/asm/uaccess_64.h  |   3 -
> arch/x86/include/asm/xen/hypercall.h   |  26 +-
> arch/x86/kernel/signal.c   |   2 +-
> arch/x86/lib/copy_user_64.S|  48 
> arch/x86/lib/memcpy_64.S   |   3 +-
> arch/x86/lib/usercopy_64.c |  20 --
> drivers/gpu/drm/i915/i915_gem_execbuffer.c |   6 +-
> include/linux/uaccess.h|   2 +
> kernel/trace/trace_branch.c|   4 +
> lib/Makefile   |   1 +
> lib/ubsan.c|   4 +
> mm/kasan/Makefile  |   3 +
> mm/kasan/common.c  |  10 +
> mm/kasan/report.c  |   3 +-
> scripts/Makefile.build |   3 +
> tools/objtool/Makefile |   2 +-
> tools/objtool/arch.h   |   8 +-
> tools/objtool/arch/x86/decode.c|  26 +-
> tools/objtool/builtin-check.c  |   4 +-
> tools/objtool/builtin.h|   2 +-
>tools/objtool/check.c  | 382
>++---
> tools/objtool/check.h  |   4 +-
> tools/objtool/elf.c|  15 +-
> tools/objtool/elf.h|   3 +-
> tools/objtool/special.c|  10 +-
> tools/objtool/warn.h   |   8 +
> 31 files changed, 511 insertions(+), 173 deletions(-)
>
>
> 

This is phenomenal. Thank you so much for digging into this. I'm hoping this 
will greatly reduce the risk of future leakage.
-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.
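
As a concrete illustration of the rule being enforced (a made-up example,
not code from the series; user_access_begin()/user_access_end() and the
unsafe_*_user() helpers are the existing uaccess primitives): keep the
STAC..CLAC window small and free of function calls, so nothing that might
schedule or trace can run with AC set.

#include <linux/uaccess.h>
#include <linux/errno.h>

static int bump_user_counter(int __user *uptr)
{
	int val;

	if (!user_access_begin(uptr, sizeof(*uptr)))
		return -EFAULT;
	/* only plain loads and stores between STAC and CLAC, no calls */
	unsafe_get_user(val, uptr, efault);
	unsafe_put_user(val + 1, uptr, efault);
	user_access_end();

	return 0;

efault:
	user_access_end();
	return -EFAULT;
}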


Re: [PATCH 00/20] objtool: UACCESS validation v3

2019-03-07 Thread hpa
On March 7, 2019 8:47:05 AM PST, Josh Poimboeuf  wrote:
>On Thu, Mar 07, 2019 at 02:13:12PM +0100, Peter Zijlstra wrote:
>> On Thu, Mar 07, 2019 at 01:55:26PM +0100, Peter Zijlstra wrote:
>> > On Thu, Mar 07, 2019 at 01:03:17PM +0100, Peter Zijlstra wrote:
>> 
>> 
>> > > 01be 20d3:  31 c0   xor%eax,%eax
>> > > 01c0 20d5:  4c 39 ebcmp%r13,%rbx
>> > > 01c3 20d8:  77 08   ja 20e2
><__do_sys_waitid+0x1cd>
>> 
>> randconfig-build/kernel/exit.o: warning: objtool:  
>__do_sys_waitid()+0x1c3: (branch)
>> 
>> > > 01cd 20e2:   83 f0 01xor$0x1,%eax
>> > > 01d0 20e5:   48 89 c2mov%rax,%rdx
>> > > 01d3 20e8:   83 e2 01and$0x1,%edx
>> > > 01d6 20eb:   48 83 c2 02 add$0x2,%rdx
>> > > 01da 20ef:   48 ff 04 d5 00 00 00incq   0x0(,%rdx,8)
>> > > 01e1 20f6:   00 
>> > > 01de 20f3: R_X86_64_32S  _ftrace_branch+0x148
>> > > 01e2 20f7:   84 c0   test   %al,%al
>> > > 01e4 20f9:   75 2d   jne2128
><__do_sys_waitid+0x213>
>> > 
>> > we do not take this branch and fall-through.
>> 
>> And that is the error, I think. We should've taken it and went to:
>> 
>>   return -EFAULT;
>> 
>> because:
>> 
>>  +1bexor  %eax,%eax  eax=0
>>  +1cd   xor  $0x1,%eax   eax=1
>>  +1e2   test %al,%al 1&1==1 -> ZF=0
>>  +1e4   jnz
>> 
>> Is an unconditional code sequence, but there's no way objtool can do
>> that without becoming a full blown interpreter :/
>> 
>> > > 0213 2128:  49 c7 c7 f2 ff ff ffmov   
>$0xfff2,%r15
>> > > e0eb }
>> > > 021a 212f:  48 8d 65 d8 lea-0x28(%rbp),%rsp
>> > > 021e 2133:  4c 89 f8mov%r15,%rax
>> > > 0221 2136:  5b  pop%rbx
>> > > 0222 2137:  41 5c   pop%r12
>> > > 0224 2139:  41 5d   pop%r13
>> > > 0226 213b:  41 5e   pop%r14
>> > > 0228 213d:  41 5f   pop%r15
>> > > 022a 213f:  5d  pop%rbp
>> > > 022b 2140:  c3  retq
>
>This "fixes" it, and also seems to help -Os make much code:
>
>diff --git a/include/linux/compiler.h b/include/linux/compiler.h
>index 445348facea9..8de63db58fdd 100644
>--- a/include/linux/compiler.h
>+++ b/include/linux/compiler.h
>@@ -67,7 +67,7 @@ void ftrace_likely_update(struct ftrace_likely_data
>*f, int val,
>   .line = __LINE__,   \
>   };  \
>   __r = !!(cond); \
>-  __f.miss_hit[__r]++;
>\
>+  if (__r) __f.miss_hit[1]++; else __f.miss_hit[0]++; 
>\
>   __r;\
>   }))
> #endif /* CONFIG_PROFILE_ALL_BRANCHES */

if (cond)?  Or is ___r used elsewhere?
-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.


Re: [PATCH 18/20] objtool: Add UACCESS validation

2019-03-07 Thread hpa
On March 7, 2019 8:33:26 AM PST, Linus Torvalds  
wrote:
>On Thu, Mar 7, 2019 at 3:52 AM Peter Zijlstra 
>wrote:
>>
>> XXX: are we sure we want __memset marked AC-safe?
>
>It's certainly one of the safer functions to call with AC set, but it
>sounds wrong anyway. It's not like it's likely to leak kernel data
>(most memset's are with 0, and even the non-zero ones I can't imagine
>are sensitive - more like poison values etc).
>
>What's the call site that made you go "just add __memset() to the
>list"?
>
>Linus

Agreed... it seems fishy at least.

-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.


Re: [PATCH v2 10/20] x86: avoid W^X being broken during modules loading

2019-03-07 Thread hpa
On March 6, 2019 11:29:47 PM PST, Borislav Petkov  wrote:
>On Mon, Feb 11, 2019 at 08:42:51PM +0100, Borislav Petkov wrote:
>> On Mon, Feb 11, 2019 at 11:27:03AM -0800, Nadav Amit wrote:
>> > Is there any comment over static_cpu_has()? ;-)
>> 
>> Almost:
>> 
>> /*
>>  * Static testing of CPU features.  Used the same as boot_cpu_has().
>>  * These will statically patch the target code for additional
>>  * performance.
>>  */
>> static __always_inline __pure bool _static_cpu_has(u16 bit)
>
>Ok, I guess something like that along with converting the obvious slow
>path callers to boot_cpu_has():
>
>---
>diff --git a/arch/x86/include/asm/cpufeature.h
>b/arch/x86/include/asm/cpufeature.h
>index ce95b8cbd229..e25d11ad7a88 100644
>--- a/arch/x86/include/asm/cpufeature.h
>+++ b/arch/x86/include/asm/cpufeature.h
>@@ -155,9 +155,12 @@ extern void clear_cpu_cap(struct cpuinfo_x86 *c,
>unsigned int bit);
> #else
> 
> /*
>- * Static testing of CPU features.  Used the same as boot_cpu_has().
>- * These will statically patch the target code for additional
>- * performance.
>+ * Static testing of CPU features. Used the same as boot_cpu_has(). It
>+ * statically patches the target code for additional performance. Use
>+ * static_cpu_has() only in fast paths, where every cycle counts.
>Which
>+ * means that the boot_cpu_has() variant is already fast enough for
>the
>+ * majority of cases and you should stick to using it as it is
>generally
>+ * only two instructions: a RIP-relative MOV and a TEST.
>  */
> static __always_inline __pure bool _static_cpu_has(u16 bit)
> {
>diff --git a/arch/x86/include/asm/fpu/internal.h
>b/arch/x86/include/asm/fpu/internal.h
>index fa2c93cb42a2..c525b053b3b3 100644
>--- a/arch/x86/include/asm/fpu/internal.h
>+++ b/arch/x86/include/asm/fpu/internal.h
>@@ -291,7 +291,7 @@ static inline void
>copy_xregs_to_kernel_booting(struct xregs_state *xstate)
> 
>   WARN_ON(system_state != SYSTEM_BOOTING);
> 
>-  if (static_cpu_has(X86_FEATURE_XSAVES))
>+  if (boot_cpu_has(X86_FEATURE_XSAVES))
>   XSTATE_OP(XSAVES, xstate, lmask, hmask, err);
>   else
>   XSTATE_OP(XSAVE, xstate, lmask, hmask, err);
>@@ -313,7 +313,7 @@ static inline void
>copy_kernel_to_xregs_booting(struct xregs_state *xstate)
> 
>   WARN_ON(system_state != SYSTEM_BOOTING);
> 
>-  if (static_cpu_has(X86_FEATURE_XSAVES))
>+  if (boot_cpu_has(X86_FEATURE_XSAVES))
>   XSTATE_OP(XRSTORS, xstate, lmask, hmask, err);
>   else
>   XSTATE_OP(XRSTOR, xstate, lmask, hmask, err);
>@@ -528,8 +528,7 @@ static inline void fpregs_activate(struct fpu *fpu)
>  *  - switch_fpu_finish() restores the new state as
>  *necessary.
>  */
>-static inline void
>-switch_fpu_prepare(struct fpu *old_fpu, int cpu)
>+static inline void switch_fpu_prepare(struct fpu *old_fpu, int cpu)
> {
>   if (static_cpu_has(X86_FEATURE_FPU) && old_fpu->initialized) {
>   if (!copy_fpregs_to_fpstate(old_fpu))
>diff --git a/arch/x86/kernel/apic/apic_numachip.c
>b/arch/x86/kernel/apic/apic_numachip.c
>index 78778b54f904..a5464b8b6c46 100644
>--- a/arch/x86/kernel/apic/apic_numachip.c
>+++ b/arch/x86/kernel/apic/apic_numachip.c
>@@ -175,7 +175,7 @@ static void fixup_cpu_id(struct cpuinfo_x86 *c, int
>node)
>   this_cpu_write(cpu_llc_id, node);
> 
>   /* Account for nodes per socket in multi-core-module processors */
>-  if (static_cpu_has(X86_FEATURE_NODEID_MSR)) {
>+  if (boot_cpu_has(X86_FEATURE_NODEID_MSR)) {
>   rdmsrl(MSR_FAM10H_NODE_ID, val);
>   nodes = ((val >> 3) & 7) + 1;
>   }
>diff --git a/arch/x86/kernel/cpu/aperfmperf.c
>b/arch/x86/kernel/cpu/aperfmperf.c
>index 804c49493938..64d5aec24203 100644
>--- a/arch/x86/kernel/cpu/aperfmperf.c
>+++ b/arch/x86/kernel/cpu/aperfmperf.c
>@@ -83,7 +83,7 @@ unsigned int aperfmperf_get_khz(int cpu)
>   if (!cpu_khz)
>   return 0;
> 
>-  if (!static_cpu_has(X86_FEATURE_APERFMPERF))
>+  if (!boot_cpu_has(X86_FEATURE_APERFMPERF))
>   return 0;
> 
>   aperfmperf_snapshot_cpu(cpu, ktime_get(), true);
>@@ -99,7 +99,7 @@ void arch_freq_prepare_all(void)
>   if (!cpu_khz)
>   return;
> 
>-  if (!static_cpu_has(X86_FEATURE_APERFMPERF))
>+  if (!boot_cpu_has(X86_FEATURE_APERFMPERF))
>   return;
> 
>   for_each_online_cpu(cpu)
>@@ -115,7 +115,7 @@ unsigned int arch_freq_get_on_cpu(int cpu)
>   if (!cpu_khz)
>   return 0;
> 
>-  if (!static_cpu_has(X86_FEATURE_APERFMPERF))
>+  if (!boot_cpu_has(X86_FEATURE_APERFMPERF))
>   return 0;
> 
>   if (aperfmperf_snapshot_cpu(cpu, ktime_get(), true))
>diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
>index cb28e98a0659..95a5faf3a6a0 100644
>--- a/arch/x86/kernel/cpu/common.c
>+++ b/arch/x86/kernel/cpu/common.c
>@@ -1668,7 +1668,7 @@ static void setup_getcpu(int cpu)
>   unsigned long cpudata = vdso_encode_cpunode(cpu,
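
A minimal sketch of the two call styles the reworded comment above distinguishes. The feature flags are real, but the functions and the hot/slow classification are invented for illustration only:

#include <linux/kernel.h>
#include <asm/cpufeature.h>

/* Slow path (init, sysfs, error handling): boot_cpu_has() is an ordinary
 * memory test, roughly a RIP-relative MOV plus a TEST, and creates no
 * alternatives patch site. */
static void example_setup(void)
{
        if (boot_cpu_has(X86_FEATURE_APERFMPERF))
                pr_info("APERF/MPERF available\n");
}

/* Genuinely hot path: static_cpu_has() is patched at boot into a straight
 * branch, so the feature test itself costs essentially nothing per call. */
static bool example_fast_path(void)
{
        return static_cpu_has(X86_FEATURE_FPU);
}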

Re: [PATCH] x86/cpufeature: Remove __pure attribute to _static_cpu_has()

2019-03-07 Thread hpa
On March 7, 2019 7:10:36 AM PST, Borislav Petkov  wrote:
>On Mon, Feb 11, 2019 at 12:32:41PM -0800, Nadav Amit wrote:
>> BTW: the “__pure” attribute is useless when “__always_inline” is
>used.
>> Unless it is intended to be some sort of comment, of course.
>
>---
>From: Borislav Petkov 
>Date: Thu, 7 Mar 2019 15:54:51 +0100
>
>__pure is used to make gcc do Common Subexpression Elimination (CSE)
>and thus save subsequent invocations of a function which does a complex
>computation (without side effects). As a simple example:
>
>  bool a = _static_cpu_has(x);
>  bool b = _static_cpu_has(x);
>
>gets turned into
>
>  bool a = _static_cpu_has(x);
>  bool b = a;
>
>However, gcc doesn't do CSE with asm()s when those get inlined - like it
>is done with _static_cpu_has() - because, for example, the t_yes/t_no
>labels are different for each inlined function body and thus cannot be
>detected as equivalent anymore for the CSE heuristic to hit.
>
>However, this all is beside the point: it is best to avoid having more
>than one call to _static_cpu_has(X) in the same function in the first
>place, because each such call is an alternatives patch site and
>repeating it is simply pointless.
>
>Therefore, drop the __pure attribute as it is not doing anything.
>
>Reported-by: Nadav Amit 
>Signed-off-by: Borislav Petkov 
>Cc: Peter Zijlstra 
>Cc: x...@kernel.org
>---
> arch/x86/include/asm/cpufeature.h | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
>diff --git a/arch/x86/include/asm/cpufeature.h b/arch/x86/include/asm/cpufeature.h
>index e25d11ad7a88..6d6d5cc4302b 100644
>--- a/arch/x86/include/asm/cpufeature.h
>+++ b/arch/x86/include/asm/cpufeature.h
>@@ -162,7 +162,7 @@ extern void clear_cpu_cap(struct cpuinfo_x86 *c, unsigned int bit);
>  * majority of cases and you should stick to using it as it is generally
>  * only two instructions: a RIP-relative MOV and a TEST.
>  */
>-static __always_inline __pure bool _static_cpu_has(u16 bit)
>+static __always_inline bool _static_cpu_has(u16 bit)
> {
>   asm_volatile_goto("1: jmp 6f\n"
>"2:\n"

Uhm... (a) it is correct, even if the compiler doesn't use it now, it allows 
the compiler to CSE it in the future; (b) it is documentation; (c) there is an 
actual bug here: the "volatile" implies a side effect, which in reality is not 
present, inhibiting CSE.

So the correct fix is to remove "volatile", not remove "__pure".
-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.
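
To make the CSE point above concrete, a small sketch in plain C; the names are invented for illustration and this is not the kernel's implementation:

#include <stdbool.h>

extern unsigned long feature_mask;      /* set once at boot, effectively read-only afterwards */

static inline bool __attribute__((pure)) has_feature(unsigned int bit)
{
        return feature_mask & (1UL << bit);
}

bool both_set(unsigned int bit)
{
        bool a = has_feature(bit);
        bool b = has_feature(bit);      /* pure, no side effects: the compiler may reuse 'a' */

        return a && b;
}

If has_feature() carried an asm volatile goto in its body, as _static_cpu_has() does once inlined, the implied side effect would keep the two reads from being merged even with __pure in place, which is the reasoning behind dropping "volatile" rather than "__pure".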


Re: [RFC][PATCH] objtool: STAC/CLAC validation

2019-02-25 Thread hpa
On February 23, 2019 12:39:42 AM PST, Peter Zijlstra  
wrote:
>On Fri, Feb 22, 2019 at 03:39:48PM -0800, h...@zytor.com wrote:
>> Objtool could also detect CLAC-STAC or STAC-CLAC sequences without
>> memory operations and remove them; don't know how often that happens,
>> but I know it *does* happen.
>
>Objtool doesn't know about memops; that'd be a lot of work. Also,
>objtool doesn't actually rewrite the text, at best it could warn about
>such occurences.

It doesn't even have to change the text per se, just nullify the altinstr.
-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.
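
For context, STAC and CLAC are already emitted through the alternatives machinery; a simplified sketch of how arch/x86/include/asm/smap.h wires this up (the function names here are invented, the real helpers are stac()/clac()):

#include <linux/compiler.h>
#include <asm/alternative.h>
#include <asm/cpufeatures.h>

/* The original text is empty; "stac"/"clac" live only in the alternatives
 * replacement and are patched in when the CPU has SMAP. */
static __always_inline void example_stac(void)
{
        alternative("", "stac", X86_FEATURE_SMAP);
}

static __always_inline void example_clac(void)
{
        alternative("", "clac", X86_FEATURE_SMAP);
}

Since the instructions exist only as altinstr entries, a checker that proves a STAC..CLAC region performs no memory access could drop the replacement entries for that site instead of rewriting the instruction stream.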


Re: [RFC][PATCH] objtool: STAC/CLAC validation

2019-02-25 Thread hpa
On February 23, 2019 12:39:42 AM PST, Peter Zijlstra  
wrote:
>On Fri, Feb 22, 2019 at 03:39:48PM -0800, h...@zytor.com wrote:
>> Objtool could also detect CLAC-STAC or STAC-CLAC sequences without
>> memory operations and remove them; don't know how often that happens,
>> but I know it *does* happen.
>
>Objtool doesn't know about memops; that'd be a lot of work. Also,
>objtool doesn't actually rewrite the text, at best it could warn about
>such occurences.

It doesn't have to understand the contents of the memop, but it seems that the 
presence of a modrm with mode ≠ 3 should be plenty. It needs to know that much 
in order to know the length of instructions anyway. For extra credit, ignore 
LEA or hinting instructions.
-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.
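
A sketch of what that ModRM test could look like on top of the kernel's x86 instruction decoder (struct insn and X86_MODRM_MOD() from arch/x86/include/asm/insn.h); this is illustrative only, not objtool code:

#include <linux/types.h>
#include <asm/insn.h>

static bool insn_has_memory_operand(const struct insn *insn)
{
        /* No ModRM byte at all: nothing to check.  (String instructions and
         * moffs-form MOVs reference memory without a ModRM byte and would
         * need separate handling.) */
        if (!insn->modrm.nbytes)
                return false;

        /* ModRM.mod == 3 selects a register operand; 0, 1 and 2 select the
         * memory addressing forms. */
        if (X86_MODRM_MOD(insn->modrm.value) == 3)
                return false;

        /* For extra credit, as suggested above: LEA (opcode 0x8D) and hinting
         * instructions such as PREFETCH take a memory-form ModRM but never
         * access memory, so they could be filtered out here as well. */
        return true;
}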

