[tip: perf/kprobes] locking/atomics: Regenerate the atomics-check SHA1's

2020-11-07 Thread tip-bot2 for Ingo Molnar
The following commit has been merged into the perf/kprobes branch of tip:

Commit-ID: a70a04b3844f59c29573a8581d5c263225060dd6
Gitweb:
https://git.kernel.org/tip/a70a04b3844f59c29573a8581d5c263225060dd6
Author: Ingo Molnar 
AuthorDate: Sat, 07 Nov 2020 12:54:49 +01:00
Committer: Ingo Molnar 
CommitterDate: Sat, 07 Nov 2020 13:20:41 +01:00

locking/atomics: Regenerate the atomics-check SHA1's

The include/asm-generic/atomic-instrumented.h checksum got out
of sync, so regenerate it. (No change to actual code.)

Also make scripts/atomic/gen-atomics.sh executable, to make
it easier to use.

The auto-generated atomic header signatures are now fine:

  thule:~/tip> scripts/atomic/check-atomics.sh
  thule:~/tip>

Signed-off-by: Ingo Molnar 
Cc: linux-kernel@vger.kernel.org
Cc: Peter Zijlstra 
Cc: Linus Torvalds 
Cc: Andrew Morton 
Cc: Thomas Gleixner 
Cc: Paul E. McKenney 
Cc: Will Deacon 
Signed-off-by: Ingo Molnar 
---
 include/asm-generic/atomic-instrumented.h | 2 +-
 scripts/atomic/gen-atomics.sh | 0
 2 files changed, 1 insertion(+), 1 deletion(-)
 mode change 100644 => 100755 scripts/atomic/gen-atomics.sh

diff --git a/include/asm-generic/atomic-instrumented.h 
b/include/asm-generic/atomic-instrumented.h
index 492cc95..888b6cf 100644
--- a/include/asm-generic/atomic-instrumented.h
+++ b/include/asm-generic/atomic-instrumented.h
@@ -1830,4 +1830,4 @@ atomic64_dec_if_positive(atomic64_t *v)
 })
 
 #endif /* _ASM_GENERIC_ATOMIC_INSTRUMENTED_H */
-// 9d5e6a315fb1335d02f0ccd3655a91c3dafcc63e
+// 4bec382e44520f4d8267e42620054db26a659ea3
diff --git a/scripts/atomic/gen-atomics.sh b/scripts/atomic/gen-atomics.sh
old mode 100644
new mode 100755


Re: [GIT PULL] RCU changes for v5.10

2020-10-19 Thread Ingo Molnar


* Linus Torvalds  wrote:

> On Mon, Oct 12, 2020 at 7:14 AM Ingo Molnar  wrote:
> >
> > Please pull the latest core/rcu git tree from:
> >
> >git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git 
> > core-rcu-2020-10-12
> 
> I've pulled everything but that last merge and the PREEMPT_COUNT 
> stuff that came with it.
> 
> When Paul asked whether it was ok for RCU to use preempt_count() and 
> I answered in the affirmative, I didn't mean it in the sense of "RCU 
> wants to force it on everybody else too".
>
> I'm pretty convinced that the proper fix is to simply make sure that 
> rcu_free() and friends aren't run under any raw spinlocks. So even 
> if the cost of preempt-count isn't that noticeable, there just isn't 
> a reason for RCU to say "screw everybody else, I want this" when 
> there are other alternatives.

That's certainly true - thanks for catching this & sorting it out from 
the bigger pull request!

Thanks,

Ingo


Re: [PATCH 1/2] x86/insn: Fix some potential undefined behavior.

2020-10-15 Thread Ingo Molnar


* Ian Rogers  wrote:

> From: Numfor Mbiziwo-Tiapo 
> 
> If insn_init is given a NULL kaddr and 0 buflen then validate_next will
> perform arithmetic on NULL, add a guard to avoid this.
> 
> Don't perform unaligned loads in __get_next and __peek_nbyte_next as
> these are forms of undefined behavior.

So, 'insn' is a kernel structure, usually allocated on the kernel stack. 
How could these fields ever be unaligned?

> 
> These problems were identified using the undefined behavior sanitizer
> (ubsan) with the tools version of the code and perf test. Part of this
> patch was previously posted here:
> https://lore.kernel.org/lkml/20190724184512.162887-4-n...@google.com/
> 
> Signed-off-by: Ian Rogers 
> Signed-off-by: Numfor Mbiziwo-Tiapo 
> ---
>  arch/x86/lib/insn.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/x86/lib/insn.c b/arch/x86/lib/insn.c
> index 404279563891..57236940de46 100644
> --- a/arch/x86/lib/insn.c
> +++ b/arch/x86/lib/insn.c
> @@ -17,13 +17,13 @@
>  
>  /* Verify next sizeof(t) bytes can be on the same instruction */
>  #define validate_next(t, insn, n)\
> - ((insn)->next_byte + sizeof(t) + n <= (insn)->end_kaddr)
> + ((insn)->end_kaddr != 0 && (insn)->next_byte + sizeof(t) + n <= 
> (insn)->end_kaddr)
>  
>  #define __get_next(t, insn)  \
> - ({ t r = *(t*)insn->next_byte; insn->next_byte += sizeof(t); r; })
> + ({ t r; memcpy(&r, insn->next_byte, sizeof(t)); insn->next_byte += 
> sizeof(t); r; })
>  
>  #define __peek_nbyte_next(t, insn, n)\
> - ({ t r = *(t*)((insn)->next_byte + n); r; })
> + ({ t r; memcpy(&r, (insn)->next_byte + n, sizeof(t)); r; })
>  
>  #define get_next(t, insn)\
>   ({ if (unlikely(!validate_next(t, insn, 0))) goto err_out; 
> __get_next(t, insn); })

Is there any code generation side effect of this change to the resulting 
code?

Thanks,

Ingo


Re: [PATCH v1 00/15] Introduce threaded trace streaming for basic perf record operation

2020-10-14 Thread Ingo Molnar


* Alexey Budankov  wrote:

> 
> Patch set provides threaded trace streaming for base perf record
> operation. Provided streaming mode (--threads) mitigates profiling
> data losses and resolves scalability issues of serial and asynchronous
> (--aio) trace streaming modes on multicore server systems. The patch
> set is based on the prototype [1], [2] and the most closely relates
> to mode 3) "mode that creates thread for every monitored memory map".
> 
> The threaded mode executes one-to-one mapping of trace streaming threads
> to mapped data buffers and streaming into per-CPU trace files located
> at data directory. The data buffers and threads are affined to NUMA
> nodes and monitored CPUs according to system topology. --cpu option
> can be used to specify exact CPUs to be monitored.

Yay! This should really be the default trace capture model everywhere 
possible.

Can we do this for perf top too? It's really struggling with lots of cores.

If on a 64-core system I run just a moderately higher frequency 'perf top' 
of 1 kHz:

  perf top -e cycles -F 1000

perf stays stuck forever in 'Collecting samples...', and I also get a lot 
of:

  [548112.871089] Uhhuh. NMI received for unknown reason 31 on CPU 25.
  [548112.871089] Do you have a strange power saving mode enabled?

Thanks,

Ingo


Re: [PATCH v6 02/25] objtool: Add a pass for generating __mcount_loc

2020-10-14 Thread Ingo Molnar


* Sami Tolvanen  wrote:

> From: Peter Zijlstra 
> 
> Add the --mcount option for generating __mcount_loc sections
> needed for dynamic ftrace. Using this pass requires the kernel to
> be compiled with -mfentry and CC_USING_NOP_MCOUNT to be defined
> in Makefile.
> 
> Link: 
> https://lore.kernel.org/lkml/20200625200235.gq4...@hirez.programming.kicks-ass.net/
> Signed-off-by: Peter Zijlstra 
> [Sami: rebased, dropped config changes, fixed to actually use --mcount,
>and wrote a commit message.]
> Signed-off-by: Sami Tolvanen 
> Reviewed-by: Kees Cook 
> ---
>  tools/objtool/builtin-check.c |  3 +-
>  tools/objtool/builtin.h   |  2 +-
>  tools/objtool/check.c | 82 +++
>  tools/objtool/check.h |  1 +
>  tools/objtool/objtool.c   |  1 +
>  tools/objtool/objtool.h   |  1 +
>  6 files changed, 88 insertions(+), 2 deletions(-)
> 
> diff --git a/tools/objtool/builtin-check.c b/tools/objtool/builtin-check.c
> index c6d199bfd0ae..e92e76f69176 100644
> --- a/tools/objtool/builtin-check.c
> +++ b/tools/objtool/builtin-check.c
> @@ -18,7 +18,7 @@
>  #include "builtin.h"
>  #include "objtool.h"
>  
> -bool no_fp, no_unreachable, retpoline, module, backtrace, uaccess, stats, 
> validate_dup, vmlinux;
> +bool no_fp, no_unreachable, retpoline, module, backtrace, uaccess, stats, 
> validate_dup, vmlinux, mcount;
>  
>  static const char * const check_usage[] = {
>   "objtool check [] file.o",
> @@ -35,6 +35,7 @@ const struct option check_options[] = {
>   OPT_BOOLEAN('s', "stats", &stats, "print statistics"),
>   OPT_BOOLEAN('d', "duplicate", &validate_dup, "duplicate validation for 
> vmlinux.o"),
>   OPT_BOOLEAN('l', "vmlinux", &vmlinux, "vmlinux.o validation"),
> + OPT_BOOLEAN('M', "mcount", &mcount, "generate __mcount_loc"),
>   OPT_END(),
>  };

Meh, adding --mcount as an option to 'objtool check' was a valid hack for a 
prototype patchset, but please turn this into a proper subcommand, just 
like 'objtool orc' is.

'objtool check' should ... keep checking. :-)

Thanks,

Ingo


Re: [PATCH v4 5/5] x86: mremap speedup - Enable HAVE_MOVE_PUD

2020-10-14 Thread Ingo Molnar


* Kalesh Singh  wrote:

> HAVE_MOVE_PUD enables remapping pages at the PUD level if both the
> source and destination addresses are PUD-aligned.
> 
> With HAVE_MOVE_PUD enabled it can be inferred that there is approximately
> a 13x improvement in performance on x86. (See data below).
> 
> --- Test Results -
> 
> The following results were obtained using a 5.4 kernel, by remapping
> a PUD-aligned, 1GB sized region to a PUD-aligned destination.
> The results from 10 iterations of the test are given below:
> 
> Total mremap times for 1GB data on x86. All times are in nanoseconds.
> 
> Control    HAVE_MOVE_PUD
> 
> 180394 15089
> 235728 14056
> 238931 25741
> 187330 13838
> 241742 14187
> 177925 14778
> 182758 14728
> 160872 14418
> 205813 15107
> 245722 13998
> 
> 205721.5   15594      <-- Mean time in nanoseconds
> 
> A 1GB mremap completion time drops from ~205 microseconds
> to ~15 microseconds on x86. (~13x speed up).
> 
> Signed-off-by: Kalesh Singh 
> Acked-by: Kirill A. Shutemov 
> Cc: Andrew Morton 
> Cc: Thomas Gleixner 
> Cc: Ingo Molnar 
> Cc: Borislav Petkov 
> Cc: H. Peter Anvin 

Nice!

Assuming it's all correct code:

Acked-by: Ingo Molnar 

Thanks,

Ingo


Re: [PATCH 7/8] x86/cpu/intel: enable X86_FEATURE_NT_GOOD on Intel Broadwellx

2020-10-14 Thread Ingo Molnar


* Ankur Arora  wrote:

> System:   Oracle X6-2
> CPU:  2 nodes * 10 cores/node * 2 threads/core
> Intel Xeon E5-2630 v4 (Broadwellx, 6:79:1)
> Memory:   256 GB evenly split between nodes
> Microcode:0xb2e
> scaling_governor: performance
> L3 size:  25MB
> intel_pstate/no_turbo: 1
> 
> Performance comparison of 'perf bench mem memset -l 1' for x86-64-stosb
> (X86_FEATURE_ERMS) and x86-64-movnt (X86_FEATURE_NT_GOOD):
> 
>  size     x86-64-stosb (5 runs)    x86-64-movnt (5 runs)     speedup
>           BW   (   pstdev)         BW   (   pstdev)
>  ------   ----------------------   ----------------------   --------
> 
>    16MB   17.35 GB/s ( +- 9.27%)   11.83 GB/s ( +- 0.19%)    -31.81%
>   128MB    5.31 GB/s ( +- 0.13%)   11.72 GB/s ( +- 0.44%)   +121.84%
>  1024MB    5.42 GB/s ( +- 0.13%)   11.78 GB/s ( +- 0.03%)   +117.34%
>  4096MB    5.41 GB/s ( +- 0.41%)   11.76 GB/s ( +- 0.07%)   +117.37%

> + if (c->x86 == 6 && c->x86_model == INTEL_FAM6_BROADWELL_X)
> + set_cpu_cap(c, X86_FEATURE_NT_GOOD);

So while I agree with how you've done careful measurements to isolate bad 
microarchitectures where non-temporal stores are slow, I do think this 
approach of opt-in doesn't scale and is hard to maintain.

Instead I'd suggest enabling this by default everywhere, and creating a 
X86_FEATURE_NT_BAD quirk table for the bad microarchitectures.

This means that with new microarchitectures we'd get automatic enablement, 
and hopefully chip testing would identify cases where performance isn't as 
good.

I.e. the 'trust but verify' method.

Thanks,

Ingo


Re: [PATCH 6/8] mm, clear_huge_page: use clear_page_uncached() for gigantic pages

2020-10-14 Thread Ingo Molnar


* Ankur Arora  wrote:

> Uncached writes are suitable for circumstances where the region written to
> is not expected to be read again soon, or the region written to is large
> enough that there's no expectation that we will find the writes in the
> cache.
> 
> Accordingly switch to using clear_page_uncached() for gigantic pages.
> 
> Signed-off-by: Ankur Arora 
> ---
>  mm/memory.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/mm/memory.c b/mm/memory.c
> index eeae590e526a..4d2c58f83ab1 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -5092,7 +5092,7 @@ static void clear_gigantic_page(struct page *page,
>   for (i = 0; i < pages_per_huge_page;
>i++, p = mem_map_next(p, page, i)) {
>   cond_resched();
> - clear_user_highpage(p, addr + i * PAGE_SIZE);
> + clear_user_highpage_uncached(p, addr + i * PAGE_SIZE);
>   }
>  }

So this does the clearing in 4K chunks, and your measurements suggest that 
short memory clearing is not as efficient, right?

I'm wondering whether it would make sense to do 2MB chunked clearing on 
64-bit CPUs, instead of 512x 4k clearing? Both 2MB and GB pages are 
continuous in memory, so accessible to these instructions in a single 
narrow loop.

Thanks,

Ingo


[GIT PULL v2] objtool changes for v5.10

2020-10-13 Thread Ingo Molnar


* Ingo Molnar  wrote:

> > This seems to be missing
> > 
> > https://lore.kernel.org/lkml/patch-1.thread-251403.git-2514037e9477.your-ad-here.call-01602244460-ext-7088@work.hours/
> > 
> > or did that get sent in a previous pull request?
> 
> No, that fix is still missing, thanks for the reminder. I overlooked it 
> thinking that it's a tooling patch - but this needs to be paired with:
> 
>   2486baae2cf6: ("objtool: Allow nested externs to enable BUILD_BUG()")
> 
> I'll send a v2 pull request in an hour or two.

Linus,

Please pull the latest objtool/core git tree from:

   git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git 
objtool-core-2020-10-13

   # HEAD: ab0a40ea88204e1291b56da8128e2845fec8ee88 perf build: Allow nested 
externs to enable BUILD_BUG() usage

objtool changes for v5.10:

 - Most of the changes are cleanups and reorganization to make the objtool code
   more arch-agnostic. This is in preparation for non-x86 support.

Fixes:

 - KASAN fixes.
 - Handle unreachable trap after call to noreturn functions better.
 - Ignore unreachable fake jumps.
 - Misc smaller fixes & cleanups.

 Thanks,

Ingo

-->
Ilie Halip (1):
  objtool: Ignore unreachable trap after call to noreturn functions

Jann Horn (1):
  objtool: Permit __kasan_check_{read,write} under UACCESS

Julien Thierry (16):
  objtool: Move object file loading out of check()
  objtool: Move ORC logic out of check()
  objtool: Skip ORC entry creation for non-text sections
  objtool: Define 'struct orc_entry' only when needed
  objtool: Group headers to check in a single list
  objtool: Make sync-check consider the target architecture
  objtool: Move macros describing structures to arch-dependent code
  objtool: Abstract alternative special case handling
  objtool: Make relocation in alternative handling arch dependent
  objtool: Rename frame.h -> objtool.h
  objtool: Only include valid definitions depending on source file type
  objtool: Make unwind hint definitions available to other architectures
  objtool: Decode unwind hint register depending on architecture
  objtool: Remove useless tests before save_reg()
  objtool: Ignore unreachable fake jumps
  objtool: Handle calling non-function symbols in other sections

Raphael Gault (1):
  objtool: Refactor jump table code to support other architectures

Vasily Gorbik (2):
  objtool: Allow nested externs to enable BUILD_BUG()
  perf build: Allow nested externs to enable BUILD_BUG() usage


 MAINTAINERS   |   1 +
 arch/x86/include/asm/nospec-branch.h  |   2 +-
 arch/x86/include/asm/orc_types.h  |  34 
 arch/x86/include/asm/unwind_hints.h   |  56 ++-
 arch/x86/kernel/kprobes/core.c|   2 +-
 arch/x86/kernel/kprobes/opt.c |   2 +-
 arch/x86/kernel/reboot.c  |   2 +-
 arch/x86/kernel/unwind_orc.c  |  11 +-
 arch/x86/kvm/svm/svm.c|   2 +-
 arch/x86/kvm/vmx/nested.c |   2 +-
 arch/x86/kvm/vmx/vmx.c|   2 +-
 arch/x86/xen/enlighten_pv.c   |   2 +-
 drivers/gpu/drm/vmwgfx/vmwgfx_msg.c   |   3 +-
 include/linux/frame.h |  35 
 include/linux/objtool.h   | 129 +++
 kernel/bpf/core.c |   2 +-
 kernel/kexec_core.c   |   2 +-
 tools/arch/x86/include/asm/orc_types.h|  34 
 tools/include/linux/objtool.h | 129 +++
 tools/objtool/Makefile|   6 +-
 tools/objtool/arch.h  |   4 +
 tools/objtool/arch/x86/Build  |   1 +
 tools/objtool/arch/x86/decode.c   |  37 +
 tools/objtool/arch/x86/include/arch_special.h |  20 +++
 tools/objtool/arch/x86/special.c  | 145 
 tools/objtool/builtin-check.c |  15 +-
 tools/objtool/builtin-orc.c   |  27 ++-
 tools/objtool/check.c | 230 ++
 tools/objtool/check.h |   9 +-
 tools/objtool/objtool.c   |  30 
 tools/objtool/objtool.h   |   6 +-
 tools/objtool/orc_dump.c  |   9 +-
 tools/objtool/orc_gen.c   |   8 +-
 tools/objtool/special.c   |  48 +-
 tools/objtool/special.h   |  10 ++
 tools/objtool/sync-check.sh   |  32 +++-
 tools/objtool/weak.c  |   6 +-
 tools/perf/Makefile.config|   2 +-
 38 files changed, 686 insertions(+), 411 deletions(-)
 delete mode 100644 include/linux/frame.h
 create mode 100644 include/linux/objtool.h
 create mode 100644 tools/include/linux/objtool.h
 create mode 100644 tools/objtool/arch/x86/include/arch_special.h
 create mode 100644 tools/objtool/arch/x86/special.c

Re: [GIT PULL] objtool changes for v5.10

2020-10-13 Thread Ingo Molnar


* Stephen Rothwell  wrote:

> Hi Ingo,
> 
> On Tue, 13 Oct 2020 10:26:25 +0200 Ingo Molnar  wrote:
> >
> > Ilie Halip (1):
> >   objtool: Ignore unreachable trap after call to noreturn functions
> > 
> > Jann Horn (1):
> >   objtool: Permit __kasan_check_{read,write} under UACCESS
> > 
> > Julien Thierry (16):
> >   objtool: Move object file loading out of check()
> >   objtool: Move ORC logic out of check()
> >   objtool: Skip ORC entry creation for non-text sections
> >   objtool: Define 'struct orc_entry' only when needed
> >   objtool: Group headers to check in a single list
> >   objtool: Make sync-check consider the target architecture
> >   objtool: Move macros describing structures to arch-dependent code
> >   objtool: Abstract alternative special case handling
> >   objtool: Make relocation in alternative handling arch dependent
> >   objtool: Rename frame.h -> objtool.h
> >   objtool: Only include valid definitions depending on source file type
> >   objtool: Make unwind hint definitions available to other architectures
> >   objtool: Decode unwind hint register depending on architecture
> >   objtool: Remove useless tests before save_reg()
> >   objtool: Ignore unreachable fake jumps
> >   objtool: Handle calling non-function symbols in other sections
> > 
> > Raphael Gault (1):
> >   objtool: Refactor jump table code to support other architectures
> > 
> > Vasily Gorbik (1):
> >   objtool: Allow nested externs to enable BUILD_BUG()
> 
> This seems to be missing
> 
> https://lore.kernel.org/lkml/patch-1.thread-251403.git-2514037e9477.your-ad-here.call-01602244460-ext-7088@work.hours/
> 
> or did that get sent in a previous pull request?

No, that fix is still missing, thanks for the reminder. I overlooked it 
thinking that it's a tooling patch - but this needs to be paired with:

  2486baae2cf6: ("objtool: Allow nested externs to enable BUILD_BUG()")

I'll send a v2 pull request in an hour or two.

Thanks,

Ingo



[GIT PULL] objtool changes for v5.10

2020-10-13 Thread Ingo Molnar
Linus,

Please pull the latest objtool/core git tree from:

   git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git 
objtool-core-2020-10-13

   # HEAD: 2486baae2cf6df73554144d0a4e40ae8809b54d4 objtool: Allow nested 
externs to enable BUILD_BUG()

objtool changes for v5.10:

 - Most of the changes are cleanups and reorganization to make the objtool code
   more arch-agnostic. This is in preparation for non-x86 support.

Fixes:

 - KASAN fixes.
 - Handle unreachable trap after call to noreturn functions better.
 - Ignore unreachable fake jumps.
 - Misc smaller fixes & cleanups.

 Thanks,

Ingo

-->
Ilie Halip (1):
  objtool: Ignore unreachable trap after call to noreturn functions

Jann Horn (1):
  objtool: Permit __kasan_check_{read,write} under UACCESS

Julien Thierry (16):
  objtool: Move object file loading out of check()
  objtool: Move ORC logic out of check()
  objtool: Skip ORC entry creation for non-text sections
  objtool: Define 'struct orc_entry' only when needed
  objtool: Group headers to check in a single list
  objtool: Make sync-check consider the target architecture
  objtool: Move macros describing structures to arch-dependent code
  objtool: Abstract alternative special case handling
  objtool: Make relocation in alternative handling arch dependent
  objtool: Rename frame.h -> objtool.h
  objtool: Only include valid definitions depending on source file type
  objtool: Make unwind hint definitions available to other architectures
  objtool: Decode unwind hint register depending on architecture
  objtool: Remove useless tests before save_reg()
  objtool: Ignore unreachable fake jumps
  objtool: Handle calling non-function symbols in other sections

Raphael Gault (1):
  objtool: Refactor jump table code to support other architectures

Vasily Gorbik (1):
  objtool: Allow nested externs to enable BUILD_BUG()


 MAINTAINERS   |   1 +
 arch/x86/include/asm/nospec-branch.h  |   2 +-
 arch/x86/include/asm/orc_types.h  |  34 
 arch/x86/include/asm/unwind_hints.h   |  56 ++-
 arch/x86/kernel/kprobes/core.c|   2 +-
 arch/x86/kernel/kprobes/opt.c |   2 +-
 arch/x86/kernel/reboot.c  |   2 +-
 arch/x86/kernel/unwind_orc.c  |  11 +-
 arch/x86/kvm/svm/svm.c|   2 +-
 arch/x86/kvm/vmx/nested.c |   2 +-
 arch/x86/kvm/vmx/vmx.c|   2 +-
 arch/x86/xen/enlighten_pv.c   |   2 +-
 drivers/gpu/drm/vmwgfx/vmwgfx_msg.c   |   3 +-
 include/linux/frame.h |  35 
 include/linux/objtool.h   | 129 +++
 kernel/bpf/core.c |   2 +-
 kernel/kexec_core.c   |   2 +-
 tools/arch/x86/include/asm/orc_types.h|  34 
 tools/include/linux/objtool.h | 129 +++
 tools/objtool/Makefile|   6 +-
 tools/objtool/arch.h  |   4 +
 tools/objtool/arch/x86/Build  |   1 +
 tools/objtool/arch/x86/decode.c   |  37 +
 tools/objtool/arch/x86/include/arch_special.h |  20 +++
 tools/objtool/arch/x86/special.c  | 145 
 tools/objtool/builtin-check.c |  15 +-
 tools/objtool/builtin-orc.c   |  27 ++-
 tools/objtool/check.c | 230 ++
 tools/objtool/check.h |   9 +-
 tools/objtool/objtool.c   |  30 
 tools/objtool/objtool.h   |   6 +-
 tools/objtool/orc_dump.c  |   9 +-
 tools/objtool/orc_gen.c   |   8 +-
 tools/objtool/special.c   |  48 +-
 tools/objtool/special.h   |  10 ++
 tools/objtool/sync-check.sh   |  32 +++-
 tools/objtool/weak.c  |   6 +-
 37 files changed, 685 insertions(+), 410 deletions(-)
 delete mode 100644 include/linux/frame.h
 create mode 100644 include/linux/objtool.h
 create mode 100644 tools/include/linux/objtool.h
 create mode 100644 tools/objtool/arch/x86/include/arch_special.h
 create mode 100644 tools/objtool/arch/x86/special.c


Re: [GIT PULL] RCU changes for v5.10

2020-10-13 Thread Ingo Molnar


* Linus Torvalds  wrote:

> On Mon, Oct 12, 2020 at 7:14 AM Ingo Molnar  wrote:
> >
> > Please pull the latest core/rcu git tree from:
> >
> > RCU changes for v5.10:
> >
> >  - Debugging for smp_call_function()
> >  - RT raw/non-raw lock ordering fixes
> >  - Strict grace periods for KASAN
> >  - New smp_call_function() torture test
> >  - Torture-test updates
> >  - Documentation updates
> >  - Miscellaneous fixes
> 
> I am *very* unhappy with this pull request.
> 
> It doesn't even mention the big removal of CONFIG_PREEMPT, that I felt 
> was still under discussion.

Not mentioning the unconditional PREEMPT_COUNT enabling aspect was 100% my 
fault in summarizing the changes insufficiently, as I (mistakenly) thought 
them to be uncontroversial. My apologies for that!

Here's a second attempt to properly justify these changes:

Regarding the performance aspect of the change, I was relying on these 
performance measurements:

  "Freshly conducted benchmarks did not reveal any measurable impact from 
   enabling preempt count unconditionally. On kernels with 
   CONFIG_PREEMPT_NONE or CONFIG_PREEMPT_VOLUNTARY the preempt count is only 
   incremented and decremented but the result of the decrement is not 
   tested. Contrary to that enabling CONFIG_PREEMPT which tests the result 
   has a small but measurable impact due to the conditional branch/call."

FWIW, to inject some hard numbers into this discussion, here's also the 
code generation impact of an unconditional PREEMPT_COUNT, on x86-defconfig:

  text       data      bss       filename
  19675937   5591036   1433672   vmlinux.ubuntu.vanilla          # 856deb866d16: ("Linux 5.9-rc5")
  19682382   5590964   1425480   vmlinux.ubuntu.PREEMPT_COUNT=y  # 7681205ba49d: ("preempt: Make preempt count unconditional")

So this is a pretty small, +0.03% increase (+6k) in generated code in the 
core kernel, and it doesn't add widespread new control dependencies either.

I also measured the core kernel code generation impact on the kernel config 
from a major Linux distribution that uses PREEMPT_VOLUNTARY=y (Ubuntu):

  kepler:~/tip> grep PREEMPT .config
  # CONFIG_PREEMPT_NONE is not set
  CONFIG_PREEMPT_VOLUNTARY=y
  # CONFIG_PREEMPT is not set
  CONFIG_PREEMPT_COUNT=y
  CONFIG_PREEMPT_NOTIFIERS=y

  text       data       bss       filename
  15754341   13790786   5242880   vmlinux.ubuntu.vanilla          # 856deb866d16: ("Linux 5.9-rc5")
  15754790   13791018   5242880   vmlinux.ubuntu.PREEMPT_COUNT=y  # 7681205ba49d: ("preempt: Make preempt count unconditional")
  15754771   13791018   5242880   vmlinux.ubuntu.full_cleanups    # 849b9c5446cc: ("kvfree_rcu(): Fix ifnullfree.cocci warnings")

In this test the changes result in very little generated code increase in 
the core kernel, just +449 bytes, or +0.003%.

In fact the impact was so low on this config that I initially disbelieved 
it and double-checked the result and re-ran the build with all =m's turned 
into =y's, to get a whole-kernel measurement of the generated code impact:

  text       data       bss        filename
  84594448   61819613   42000384   vmlinux.ubuntu.vanilla          # 856deb866d16: ("Linux 5.9-rc5")
  84594129   61819777   42000384   vmlinux.ubuntu.PREEMPT_COUNT=y  # 7681205ba49d: ("preempt: Make preempt count unconditional")

Note how the full ~84 MB image actually *shrunk*, possibly due to random 
function & section alignment noise.

So to get a truly sensitive measurement of the impact of the PREEMPT_COUNT 
change I built with CONFIG_CC_OPTIMIZE_FOR_SIZE=y, to get tight instruction 
packing and no alignment padding artifacts:

  text       data       bss        filename
  69460329   60932573   40411136   vmlinux.ubuntu.vanilla          # 856deb866d16: ("Linux 5.9-rc5")
  69460739   60936853   40411136   vmlinux.ubuntu.PREEMPT_COUNT=y  # 7681205ba49d: ("preempt: Make preempt count unconditional")

This shows a 410 bytes (+0.0005%) increase.

  ( Side note: it's rather impressive that -Os saves 21% of text size - if 
only GCC wasn't so stupid with the final 2-3% size optimizations... )

So there's even less relative impact on the whole 84 MB kernel image - 
modules don't do much direct preempt_count manipulation.

Just for completeness' sake I re-ran the original defconfig build as well, 
this time with -Os:

  text       data      bss       filename
  16091696   5565988   2928696   vmlinux.defconfig.Os.vanilla          # 856deb866d16: ("Linux 5.9-rc5")
  16095525   5570156   2928696   vmlinux.defconfig.Os.PREEMPT_COUNT=y  # 7681205ba49d: ("preempt: Make preempt count unconditional")

3.8k, or +0.025% - similar to the initial +0.03% result.

So even though I'm normally fiercely anti-bloat, if we combine the 

[GIT PULL] x86/hyperv change for v5.10

2020-10-12 Thread Ingo Molnar
Linus,

Please pull the latest x86/hyperv git tree from:

   git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git 
x86-hyperv-2020-10-12

   # HEAD: dfc53baae3c6a165a35735b789e3e083786271d6 x86/hyperv: Remove aliases 
with X64 in their name

A single commit harmonizing the x86 and ARM64 Hyper-V constants namespace.

 Thanks,

Ingo

-->
Joseph Salisbury (1):
  x86/hyperv: Remove aliases with X64 in their name


 arch/x86/hyperv/hv_init.c  |  8 
 arch/x86/hyperv/hv_spinlock.c  |  2 +-
 arch/x86/include/asm/hyperv-tlfs.h | 33 -
 arch/x86/kernel/cpu/mshyperv.c |  8 
 arch/x86/kvm/hyperv.c  | 20 ++--
 5 files changed, 19 insertions(+), 52 deletions(-)



[GIT PULL] x86/paravirt changes for v5.10

2020-10-12 Thread Ingo Molnar
Linus,

Please pull the latest x86/paravirt git tree from:

   git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git 
x86-paravirt-2020-10-12

   # HEAD: 7c9f80cb76ec9f14c3b25509168b1a2f7942e418 x86/paravirt: Avoid 
needless paravirt step clearing page table entries

Clean up the paravirt code after the removal of 32-bit Xen PV support.

 Thanks,

Ingo

-->
Juergen Gross (6):
  x86/paravirt: Remove 32-bit support from CONFIG_PARAVIRT_XXL
  x86/paravirt: Clean up paravirt macros
  x86/paravirt: Use CONFIG_PARAVIRT_XXL instead of CONFIG_PARAVIRT
  x86/entry/32: Simplify CONFIG_XEN_PV build dependency
  x86/paravirt: Remove set_pte_at() pv-op
  x86/paravirt: Avoid needless paravirt step clearing page table entries


 arch/x86/entry/entry_64.S   |   4 +-
 arch/x86/entry/vdso/vdso32/vclock_gettime.c |   1 +
 arch/x86/include/asm/fixmap.h   |   2 +-
 arch/x86/include/asm/idtentry.h |   4 +-
 arch/x86/include/asm/paravirt.h | 151 
 arch/x86/include/asm/paravirt_types.h   |  23 -
 arch/x86/include/asm/pgtable-3level_types.h |   5 -
 arch/x86/include/asm/pgtable.h  |   7 +-
 arch/x86/include/asm/required-features.h|   2 +-
 arch/x86/include/asm/segment.h  |   4 -
 arch/x86/kernel/cpu/common.c|   8 --
 arch/x86/kernel/kprobes/core.c  |   1 -
 arch/x86/kernel/kprobes/opt.c   |   1 -
 arch/x86/kernel/paravirt.c  |  19 
 arch/x86/kernel/paravirt_patch.c|  17 
 arch/x86/xen/enlighten_pv.c |   6 --
 arch/x86/xen/mmu_pv.c   |   8 --
 include/trace/events/xen.h  |  20 
 18 files changed, 27 insertions(+), 256 deletions(-)


[GIT PULL] x86/build change for v5.10

2020-10-12 Thread Ingo Molnar
Linus,

Please pull the latest x86/build git tree from:

   git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git 
x86-build-2020-10-12

   # HEAD: 642d94cf336fe57675e63a91d11f53d74b9a3f9f x86/build: Declutter the 
build output

Remove a couple of ancient and distracting printouts from the x86 build,
such as the CRC sum or limited size data - most of which can be gained
via tools.

 Thanks,

Ingo

-->
Ingo Molnar (1):
  x86/build: Declutter the build output


 arch/x86/boot/tools/build.c | 4 
 1 file changed, 4 deletions(-)

diff --git a/arch/x86/boot/tools/build.c b/arch/x86/boot/tools/build.c
index c8b8c1a8d1fc..a3725ad46c5a 100644
--- a/arch/x86/boot/tools/build.c
+++ b/arch/x86/boot/tools/build.c
@@ -416,8 +416,6 @@ int main(int argc, char ** argv)
/* Set the default root device */
put_unaligned_le16(DEFAULT_ROOT_DEV, &buf[508]);
 
-   printf("Setup is %d bytes (padded to %d bytes).\n", c, i);
-
/* Open and stat the kernel file */
fd = open(argv[2], O_RDONLY);
if (fd < 0)
@@ -425,7 +423,6 @@ int main(int argc, char ** argv)
if (fstat(fd, &sb))
die("Unable to stat `%s': %m", argv[2]);
sz = sb.st_size;
-   printf("System is %d kB\n", (sz+1023)/1024);
kernel = mmap(NULL, sz, PROT_READ, MAP_SHARED, fd, 0);
if (kernel == MAP_FAILED)
die("Unable to mmap '%s': %m", argv[2]);
@@ -488,7 +485,6 @@ int main(int argc, char ** argv)
}
 
/* Write the CRC */
-   printf("CRC %x\n", crc);
put_unaligned_le32(crc, buf);
if (fwrite(buf, 1, 4, dest) != 4)
die("Writing CRC failed");


[GIT PULL] x86/mm changes for v5.10

2020-10-12 Thread Ingo Molnar
Linus,

Please pull the latest x86/mm git tree from:

   git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git x86-mm-2020-10-12

   # HEAD: 7a27ef5e83089090f3a4073a9157c862ef00acfc x86/mm/64: Update comment 
in preallocate_vmalloc_pages()

Do not sync vmalloc/ioremap mappings on x86-64 kernels.

Hopefully now without the bugs!

 Thanks,

Ingo

-->
Joerg Roedel (2):
  x86/mm/64: Do not sync vmalloc/ioremap mappings
  x86/mm/64: Update comment in preallocate_vmalloc_pages()


 arch/x86/include/asm/pgtable_64_types.h |  2 --
 arch/x86/mm/init_64.c   | 20 ++--
 2 files changed, 10 insertions(+), 12 deletions(-)

diff --git a/arch/x86/include/asm/pgtable_64_types.h 
b/arch/x86/include/asm/pgtable_64_types.h
index 8f63efb2a2cc..52e5f5f2240d 100644
--- a/arch/x86/include/asm/pgtable_64_types.h
+++ b/arch/x86/include/asm/pgtable_64_types.h
@@ -159,6 +159,4 @@ extern unsigned int ptrs_per_p4d;
 
 #define PGD_KERNEL_START   ((PAGE_SIZE / 2) / sizeof(pgd_t))
 
-#define ARCH_PAGE_TABLE_SYNC_MASK  (pgtable_l5_enabled() ? 
PGTBL_PGD_MODIFIED : PGTBL_P4D_MODIFIED)
-
 #endif /* _ASM_X86_PGTABLE_64_DEFS_H */
diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index a4ac13cc3fdc..b5a3fa4033d3 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -217,11 +217,6 @@ static void sync_global_pgds(unsigned long start, unsigned 
long end)
sync_global_pgds_l4(start, end);
 }
 
-void arch_sync_kernel_mappings(unsigned long start, unsigned long end)
-{
-   sync_global_pgds(start, end);
-}
-
 /*
  * NOTE: This function is marked __ref because it calls __init function
  * (alloc_bootmem_pages). It's safe to do it ONLY when after_bootmem == 0.
@@ -1257,14 +1252,19 @@ static void __init preallocate_vmalloc_pages(void)
if (!p4d)
goto failed;
 
-   /*
-* With 5-level paging the P4D level is not folded. So the PGDs
-* are now populated and there is no need to walk down to the
-* PUD level.
-*/
if (pgtable_l5_enabled())
continue;
 
+   /*
+* The goal here is to allocate all possibly required
+* hardware page tables pointed to by the top hardware
+* level.
+*
+* On 4-level systems, the P4D layer is folded away and
+* the above code does no preallocation.  Below, go down
+* to the pud _software_ level to ensure the second
+* hardware level is allocated on 4-level systems too.
+*/
lvl = "pud";
pud = pud_alloc(_mm, p4d, addr);
if (!pud)


[GIT PULL] x86/kaslr changes for v5.10

2020-10-12 Thread Ingo Molnar
Linus,

Please pull the latest x86/kaslr git tree from:

   git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git 
x86-kaslr-2020-10-12

   # HEAD: 76167e5c5457aee8fba3edc5b8554183696fc94d x86/kaslr: Replace strlen() 
with strnlen()

This tree cleans up and simplifies the x86 KASLR code, and
also fixes some corner case bugs.

 Thanks,

Ingo

-->
Arvind Sankar (22):
  x86/kaslr: Make command line handling safer
  x86/kaslr: Remove bogus warning and unnecessary goto
  x86/kaslr: Fix process_efi_entries comment
  x86/kaslr: Initialize mem_limit to the real maximum address
  x86/kaslr: Fix off-by-one error in __process_mem_region()
  x86/kaslr: Drop redundant cur_entry from __process_mem_region()
  x86/kaslr: Eliminate 'start_orig' local variable from 
__process_mem_region()
  x86/kaslr: Drop redundant variable in __process_mem_region()
  x86/kaslr: Drop some redundant checks from __process_mem_region()
  x86/kaslr: Fix off-by-one error in process_gb_huge_pages()
  x86/kaslr: Short-circuit gb_huge_pages on x86-32
  x86/kaslr: Simplify process_gb_huge_pages()
  x86/kaslr: Drop test for command-line parameters before parsing
  x86/kaslr: Make the type of number of slots/slot areas consistent
  x86/kaslr: Drop redundant check in store_slot_info()
  x86/kaslr: Drop unnecessary alignment in find_random_virt_addr()
  x86/kaslr: Small cleanup of find_random_phys_addr()
  x86/kaslr: Make minimum/image_size 'unsigned long'
  x86/kaslr: Replace 'unsigned long long' with 'u64'
  x86/kaslr: Make local variables 64-bit
  x86/kaslr: Add a check that the random address is in range
  x86/kaslr: Replace strlen() with strnlen()


 arch/x86/boot/compressed/kaslr.c | 238 +--
 arch/x86/boot/compressed/misc.h  |   4 +-
 2 files changed, 107 insertions(+), 135 deletions(-)


Re: [PATCH v5 19/21] asm-generic/atomic: Add try_cmpxchg() fallbacks

2020-10-12 Thread Ingo Molnar


* Masami Hiramatsu  wrote:

> From: Peter Zijlstra 
> 
> Only x86 provides try_cmpxchg() outside of the atomic_t interfaces,
> provide generic fallbacks to create this interface from the widely
> available cmpxchg() function.
> 
> Signed-off-by: Peter Zijlstra (Intel) 
> Acked-by: Will Deacon 

Your SOB was missing here too.

Thanks,

Ingo


Re: [PATCH v5 17/21] llist: Add nonatomic __llist_add() and __llist_dell_all()

2020-10-12 Thread Ingo Molnar


* Masami Hiramatsu  wrote:

> From: Peter Zijlstra 
> 
> Signed-off-by: Peter Zijlstra (Intel) 

Because you are forwarding this patch here, I've added your SOB:

  Signed-off-by: Masami Hiramatsu 

(Let me know if that's not OK.)

Thanks,

Ingo


[GIT PULL] perf/kprobes changes for v5.10

2020-10-12 Thread Ingo Molnar
Linus,

Please pull the latest perf/kprobes git tree from:

   git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git 
perf-kprobes-2020-10-12

   # HEAD: bcb53209be5cb32d485507452edda19b78f31d84 kprobes: Fix to check probe 
enabled before disarm_kprobe_ftrace()

This tree prepares to unify the kretprobe trampoline handler and make
kretprobe lockless. (Those patches are still work in progress.)

 Thanks,

Ingo

-->
Masami Hiramatsu (17):
  kprobes: Add generic kretprobe trampoline handler
  x86/kprobes: Use generic kretprobe trampoline handler
  arm: kprobes: Use generic kretprobe trampoline handler
  arm64: kprobes: Use generic kretprobe trampoline handler
  arc: kprobes: Use generic kretprobe trampoline handler
  csky: kprobes: Use generic kretprobe trampoline handler
  ia64: kprobes: Use generic kretprobe trampoline handler
  mips: kprobes: Use generic kretprobe trampoline handler
  parisc: kprobes: Use generic kretprobe trampoline handler
  powerpc: kprobes: Use generic kretprobe trampoline handler
  s390: kprobes: Use generic kretprobe trampoline handler
  sh: kprobes: Use generic kretprobe trampoline handler
  sparc: kprobes: Use generic kretprobe trampoline handler
  kprobes: Remove NMI context check
  kprobes: Free kretprobe_instance with RCU callback
  kprobes: Make local functions static
  kprobes: Fix to check probe enabled before disarm_kprobe_ftrace()


 arch/arc/kernel/kprobes.c  |  54 +--
 arch/arm/probes/kprobes/core.c |  78 +-
 arch/arm64/kernel/probes/kprobes.c |  78 +-
 arch/csky/kernel/probes/kprobes.c  |  77 +
 arch/ia64/kernel/kprobes.c |  77 +
 arch/mips/kernel/kprobes.c |  54 +--
 arch/parisc/kernel/kprobes.c   |  76 ++---
 arch/powerpc/kernel/kprobes.c  |  53 +--
 arch/s390/kernel/kprobes.c |  79 +-
 arch/sh/kernel/kprobes.c   |  58 +---
 arch/sparc/kernel/kprobes.c|  51 +-
 arch/x86/kernel/kprobes/core.c | 108 +-
 include/linux/kprobes.h|  51 --
 kernel/kprobes.c   | 133 +
 14 files changed, 169 insertions(+), 858 deletions(-)


[GIT PULL] performance events updates for v5.10

2020-10-12 Thread Ingo Molnar
Linus,

Please pull the latest perf/core git tree from:

   git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git 
perf-core-2020-10-12

   # HEAD: f91072ed1b7283b13ca57fcfbece5a3b92726143 perf/core: Fix race in the 
perf_mmap_close() function

These are the performance events changes for v5.10:

x86 Intel updates:

 - Add Jasper Lake support

 - Add support for TopDown metrics on Ice Lake

 - Fix Ice Lake & Tiger Lake uncore support, add Snow Ridge support

 - Add a PCI sub driver to support uncore PMUs where the PCI resources
   have been claimed already - extending the range of supported systems.

x86 AMD updates:

 - Restore 'perf stat -a' behaviour to program the uncore PMU
   to count all CPU threads.

 - Fix setting the proper count when sampling Large Increment
   per Cycle events / 'paired' events.

 - Fix IBS Fetch sampling on F17h and some other IBS fine tuning,
   greatly reducing the number of interrupts when large sample
   periods are specified.

 - Extends Family 17h RAPL support to also work on compatible
   F19h machines.

Core code updates:

 - Fix race in perf_mmap_close()

 - Add PERF_EV_CAP_SIBLING, to denote that sibling events should be
   closed if the leader is removed.

 - Smaller fixes and updates.

 Thanks,

Ingo

-->
Alexander Antonov (1):
  perf/x86/intel/uncore: Fix for iio mapping on Skylake Server

Colin Ian King (1):
  x86/events/amd/iommu: Fix sizeof mismatch

Jarkko Sakkinen (1):
  kprobes: Use module_name() macro

Jiri Olsa (1):
  perf/core: Fix race in the perf_mmap_close() function

Kan Liang (28):
  perf/x86: Use event_base_rdpmc for the RDPMC userspace support
  perf/x86/intel: Name the global status bit in NMI handler
  perf/x86/intel: Introduce the fourth fixed counter
  perf/x86/intel: Move BTS index to 47
  perf/x86/intel: Fix the name of perf METRICS
  perf/x86/intel: Use switch in intel_pmu_disable/enable_event
  perf/core: Add a new PERF_EV_CAP_SIBLING event capability
  perf/x86/intel: Generic support for hardware TopDown metrics
  perf/x86: Add a macro for RDPMC offset of fixed counters
  perf/x86/intel: Support TopDown metrics on Ice Lake
  perf/x86/intel: Support per-thread RDPMC TopDown metrics
  perf/x86/intel/ds: Fix x86_pmu_stop warning for large PEBS
  perf/core: Pull pmu::sched_task() into perf_event_context_sched_in()
  perf/core: Pull pmu::sched_task() into perf_event_context_sched_out()
  perf/x86/intel/uncore: Factor out uncore_pci_get_dev_die_info()
  perf/x86/intel/uncore: Factor out uncore_pci_find_dev_pmu()
  perf/x86/intel/uncore: Factor out uncore_pci_pmu_register()
  perf/x86/intel/uncore: Factor out uncore_pci_pmu_unregister()
  perf/x86/intel/uncore: Generic support for the PCI sub driver
  perf/x86/intel/uncore: Support PCIe3 unit on Snow Ridge
  perf/x86/intel/uncore: Split the Ice Lake and Tiger Lake MSR uncore 
support
  perf/x86/intel/uncore: Update Ice Lake uncore units
  perf/x86/intel/uncore: Reduce the number of CBOX counters
  perf/x86/intel: Add Jasper Lake support
  perf/x86/msr: Add Jasper Lake support
  perf/x86/intel/uncore: Fix the scale of the IMC free-running events
  perf/x86/intel: Fix Ice Lake event constraint table
  perf/x86/intel: Check perf metrics feature for each CPU

Kim Phillips (11):
  perf/amd/uncore: Set all slices and threads to restore perf stat -a 
behaviour
  perf/x86/amd: Fix sampling Large Increment per Cycle events
  perf/x86/amd/ibs: Don't include randomized bits in get_ibs_op_count()
  perf/x86/amd/ibs: Fix raw sample data accumulation
  perf/x86/amd/ibs: Support 27-bit extended Op/cycle counter
  perf/x86/rapl: Add AMD Fam19h RAPL support
  arch/x86/amd/ibs: Fix re-arming IBS Fetch
  perf/amd/uncore: Prepare to scale for more attributes that vary per family
  perf/amd/uncore: Allow F17h user threadmask and slicemask specification
  perf/amd/uncore: Allow F19h user coreid, threadmask, and sliceid 
specification
  perf/amd/uncore: Inform the user how many counters each uncore PMU has

Peter Zijlstra (2):
  perf/x86: Fix n_pair for cancelled txn
  perf/x86: Fix n_metric for cancelled txn


 arch/x86/events/amd/ibs.c|  93 ++---
 arch/x86/events/amd/iommu.c  |   2 +-
 arch/x86/events/amd/uncore.c | 186 ++
 arch/x86/events/core.c   |  91 +++--
 arch/x86/events/intel/core.c | 364 +--
 arch/x86/events/intel/ds.c   |  32 +--
 arch/x86/events/intel/uncore.c   | 275 --
 arch/x86/events/intel/uncore.h   |   2 +
 arch/x86/events/intel/uncore_snb.c   |  45 -
 arch/x86/events/intel/uncore_snbep.c |  72 ++-
 arch/x86/events/msr.c|   1 +
 arch/x86/events/perf_event.h |  54 +-
 arch/x86/events/rapl.c   |   1 +
 

[GIT PULL] Static calls for v5.10

2020-10-12 Thread Ingo Molnar
Linus,

Please pull the latest core/static_call git tree from:

   git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git 
core-static_call-2020-10-12

   # HEAD: 69e0ad37c9f32d5aa1beb02aab4ec0cd055be013 static_call: Fix return 
type of static_call_init

This tree introduces static_call(), which is the idea of static_branch()
applied to indirect function calls. Remove a data load (indirection) by
modifying the text.

They give the flexibility of function pointers, but with better
performance. (This is especially important for cases where
retpolines would otherwise be used, as retpolines can be pretty
slow.)

API overview:

  DECLARE_STATIC_CALL(name, func);
  DEFINE_STATIC_CALL(name, func);
  DEFINE_STATIC_CALL_NULL(name, typename);

  static_call(name)(args...);
  static_call_cond(name)(args...);
  static_call_update(name, func);

x86 is supported via text patching, otherwise basic indirect calls are used,
with function pointers.

There's a second variant using inline code patching, inspired by jump-labels,
implemented on x86 as well.

The new APIs are utilized in the x86 perf code, a heavy user of function 
pointers,
where static calls speed up the PMU handler by 4.2% (!).

The generic implementation is not really exercised on other architectures,
outside of the trivial test_static_call_init() self-test.

Signed-off-by: Ingo Molnar 
 Thanks,

Ingo

-->
Josh Poimboeuf (5):
  compiler.h: Make __ADDRESSABLE() symbol truly unique
  static_call: Add basic static call infrastructure
  static_call: Add inline static call infrastructure
  x86/static_call: Add out-of-line static call implementation
  x86/static_call: Add inline static call implementation for x86-64

Nathan Chancellor (1):
  static_call: Fix return type of static_call_init

Peter Zijlstra (12):
  notifier: Fix broken error handling pattern
  module: Fix up module_notifier return values
  module: Properly propagate MODULE_STATE_COMING failure
  jump_label,module: Fix module lifetime for 
__jump_label_mod_text_reserved()
  static_call: Avoid kprobes on inline static_call()s
  static_call: Add simple self-test for static calls
  x86/alternatives: Teach text_poke_bp() to emulate RET
  static_call: Add static_call_cond()
  static_call: Handle tail-calls
  static_call: Add some validation
  static_call: Allow early init
  x86/perf, static_call: Optimize x86_pmu methods

Steven Rostedt (VMware) (2):
  tracepoint: Optimize using static_call()
  tracepoint: Fix out of sync data passing by static caller

pet...@infradead.org (1):
  tracepoint: Fix overly long tracepoint names


 arch/Kconfig|  13 +
 arch/x86/Kconfig|   4 +-
 arch/x86/events/core.c  | 134 ++---
 arch/x86/include/asm/static_call.h  |  40 +++
 arch/x86/include/asm/text-patching.h|  19 ++
 arch/x86/kernel/Makefile|   1 +
 arch/x86/kernel/alternative.c   |   5 +
 arch/x86/kernel/kprobes/opt.c   |   4 +-
 arch/x86/kernel/setup.c |   2 +
 arch/x86/kernel/static_call.c   |  98 +++
 arch/x86/kernel/vmlinux.lds.S   |   1 +
 drivers/oprofile/buffer_sync.c  |   4 +-
 include/asm-generic/vmlinux.lds.h   |  13 +
 include/linux/compiler.h|   2 +-
 include/linux/module.h  |   5 +
 include/linux/notifier.h|  15 +-
 include/linux/static_call.h | 298 
 include/linux/static_call_types.h   |  35 +++
 include/linux/tracepoint-defs.h |   5 +
 include/linux/tracepoint.h  |  86 --
 include/trace/define_trace.h|  14 +-
 kernel/Makefile |   1 +
 kernel/cpu_pm.c |  48 ++--
 kernel/jump_label.c |  10 +-
 kernel/kprobes.c|   2 +
 kernel/module.c |  15 +-
 kernel/notifier.c   | 144 ++
 kernel/power/hibernate.c|  39 ++-
 kernel/power/main.c |   8 +-
 kernel/power/power.h|   3 +-
 kernel/power/suspend.c  |  14 +-
 kernel/power/user.c |  14 +-
 kernel/static_call.c| 482 
 kernel/trace/bpf_trace.c|   8 +-
 kernel/trace/trace.c|   2 +-
 kernel/trace/trace_events.c |   2 +-
 kernel/trace/trace_printk.c |   4 +-
 kernel/tracepoint.c |  39 ++-
 tools/include/linux/static_call_types.h |  35 +++
 tools/objtool/check.c   | 138 +
 tools/objtool/check.h   |   1 +
 tools/objtool/elf.c |   8 +-
 tools/objtool/elf.h |   3 +-
 tools/objtool/objtool.h |   1 +
 tools/objtool/orc_ge

[GIT PULL] core/build changes for v5.10: Add orphan section checking for x86, ARM and ARM64

2020-10-12 Thread Ingo Molnar
Linus,

Please pull the latest core/build git tree from:

   git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git 
core-build-2020-10-12

   # HEAD: 6e0bf0e0e55000742a53c5f3b58f8669e0091a11 x86/boot/compressed: Warn 
on orphan section placement

Orphan link sections were a long-standing source of obscure bugs,
because the heuristics that various linkers & compilers use to handle them
(include these bits into the output image vs discarding them silently)
are both highly idiosyncratic and also version dependent.

Instead of this historically problematic mess, this tree by Kees Cook (et al)
adds build time asserts and build time warnings if there's any orphan section
in the kernel or if a section is not sized as expected.

And because we relied on so many silent assumptions in this area, fix a metric
ton of dependencies and some outright bugs related to this, before we can
finally enable the checks on the x86, ARM and ARM64 platforms.

 Thanks,

Ingo

-->
Ard Biesheuvel (3):
  x86/boot/compressed: Move .got.plt entries out of the .got section
  x86/boot/compressed: Force hidden visibility for all symbol references
  x86/boot/compressed: Get rid of GOT fixup code

Arvind Sankar (4):
  x86/boot: Add .text.* to setup.ld
  x86/boot: Remove run-time relocations from .head.text code
  x86/boot: Remove run-time relocations from head_{32,64}.S
  x86/boot: Check that there are no run-time relocations

Kees Cook (28):
  vmlinux.lds.h: Create COMMON_DISCARDS
  vmlinux.lds.h: Add .gnu.version* to COMMON_DISCARDS
  vmlinux.lds.h: Avoid KASAN and KCSAN's unwanted sections
  vmlinux.lds.h: Split ELF_DETAILS from STABS_DEBUG
  vmlinux.lds.h: Add .symtab, .strtab, and .shstrtab to ELF_DETAILS
  efi/libstub: Disable -mbranch-protection
  arm64/mm: Remove needless section quotes
  arm64/kernel: Remove needless Call Frame Information annotations
  arm64/build: Remove .eh_frame* sections due to unwind tables
  arm64/build: Use common DISCARDS in linker script
  arm64/build: Add missing DWARF sections
  arm64/build: Assert for unwanted sections
  arm/build: Refactor linker script headers
  arm/build: Explicitly keep .ARM.attributes sections
  arm/build: Add missing sections
  arm/build: Assert for unwanted sections
  arm/boot: Handle all sections explicitly
  x86/asm: Avoid generating unused kprobe sections
  x86/build: Enforce an empty .got.plt section
  x86/build: Add asserts for unwanted sections
  x86/boot/compressed: Reorganize zero-size section asserts
  x86/boot/compressed: Remove, discard, or assert for unwanted sections
  x86/boot/compressed: Add missing debugging sections to output
  arm64/build: Warn on orphan section placement
  arm/build: Warn on orphan section placement
  arm/boot: Warn on orphan section placement
  x86/build: Warn on orphan section placement
  x86/boot/compressed: Warn on orphan section placement

Nick Desaulniers (1):
  vmlinux.lds.h: Add PGO and AutoFDO input sections


 arch/alpha/kernel/vmlinux.lds.S|   1 +
 arch/arc/kernel/vmlinux.lds.S  |   1 +
 arch/arm/Makefile  |   4 +
 arch/arm/boot/compressed/Makefile  |   2 +
 arch/arm/boot/compressed/vmlinux.lds.S |  20 +--
 arch/arm/{kernel => include/asm}/vmlinux.lds.h |  30 -
 arch/arm/kernel/vmlinux-xip.lds.S  |   8 +-
 arch/arm/kernel/vmlinux.lds.S  |   8 +-
 arch/arm64/Makefile|   9 +-
 arch/arm64/kernel/smccc-call.S |   2 -
 arch/arm64/kernel/vmlinux.lds.S|  28 -
 arch/arm64/mm/mmu.c|   2 +-
 arch/csky/kernel/vmlinux.lds.S |   1 +
 arch/hexagon/kernel/vmlinux.lds.S  |   1 +
 arch/ia64/kernel/vmlinux.lds.S |   1 +
 arch/mips/kernel/vmlinux.lds.S |   1 +
 arch/nds32/kernel/vmlinux.lds.S|   1 +
 arch/nios2/kernel/vmlinux.lds.S|   1 +
 arch/openrisc/kernel/vmlinux.lds.S |   1 +
 arch/parisc/boot/compressed/vmlinux.lds.S  |   1 +
 arch/parisc/kernel/vmlinux.lds.S   |   1 +
 arch/powerpc/kernel/vmlinux.lds.S  |   2 +-
 arch/riscv/kernel/vmlinux.lds.S|   1 +
 arch/s390/kernel/vmlinux.lds.S |   1 +
 arch/sh/kernel/vmlinux.lds.S   |   1 +
 arch/sparc/kernel/vmlinux.lds.S|   1 +
 arch/um/kernel/dyn.lds.S   |   2 +-
 arch/um/kernel/uml.lds.S   |   2 +-
 arch/x86/Makefile  |   4 +
 arch/x86/boot/compressed/Makefile  |  41 ++
 arch/x86/boot/compressed/head_32.S |  99 +--
 arch/x86/boot/compressed/head_64.S | 165 ++---
 arch/x86/boot/compressed/mkpiggy.c   

[GIT PULL] EFI changes for v5.10

2020-10-12 Thread Ingo Molnar
Linus,

Please pull the latest efi/core git tree from:

   git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git efi-core-2020-10-12

   # HEAD: 4d0a4388ccdd9482fef6b26f879d0f6099143f80 Merge branch 'efi/urgent' 
into efi/core, to pick up fixes

EFI changes for v5.10:

 - Preliminary RISC-V enablement - the bulk of it will arrive via the RISCV 
tree.

 - Relax decompressed image placement rules for 32-bit ARM

 - Add support for passing MOK certificate table contents via a config table
   rather than an EFI variable.

 - Add support for 18-bit DIMM row IDs in the CPER records.

 - Work around broken Dell firmware that passes the entire Boot variable
   contents as the command line

 - Add definition of the EFI_MEMORY_CPU_CRYPTO memory attribute so we can
   identify it in the memory map listings.

 - Don't abort the boot on arm64 if the EFI RNG protocol is available but
   returns with an error

 - Replace slashes with exclamation marks in efivarfs file names

 - Split efi-pstore from the deprecated efivars sysfs code, so we can
   disable the latter on !x86.

 - Misc fixes, cleanups and updates.

 Thanks,

Ingo

-->
Alex Kluver (2):
  edac,ghes,cper: Add Row Extension to Memory Error Record
  cper,edac,efi: Memory Error Record: bank group/address and chip id

Ard Biesheuvel (13):
  efi/libstub: arm32: Base FDT and initrd placement on image address
  efi/libstub: Export efi_low_alloc_above() to other units
  efi/libstub: arm32: Use low allocation for the uncompressed kernel
  efi: Add definition of EFI_MEMORY_CPU_CRYPTO and ability to report it
  efi/arm64: libstub: Deal gracefully with EFI_RNG_PROTOCOL failure
  efi: mokvar-table: fix some issues in new code
  efi: pstore: disentangle from deprecated efivars module
  efi: pstore: move workqueue handling out of efivars
  efi: efivars: un-export efivars_sysfs_init()
  efi: gsmi: fix false dependency on CONFIG_EFI_VARS
  efi: remove some false dependencies on CONFIG_EFI_VARS
  efi: efivars: limit availability to X86 builds
  efi: mokvar: add missing include of asm/early_ioremap.h

Arvind Sankar (2):
  efi/libstub: Add efi_warn and *_once logging helpers
  efi/x86: Add a quirk to support command line arguments on Dell EFI 
firmware

Atish Patra (2):
  include: pe.h: Add RISC-V related PE definition
  efi: Rename arm-init to efi-init common for all arch

Lenny Szubowicz (3):
  efi: Support for MOK variable config table
  integrity: Move import of MokListRT certs to a separate routine
  integrity: Load certs from the EFI MOK config table

Michael Schaller (1):
  efivarfs: Replace invalid slashes with exclamation marks in dentries.

Tian Tao (3):
  efi/printf: remove unneeded semicolon
  efi/libstub: Fix missing-prototypes in string.c
  efi: Delete deprecated parameter comments


 Documentation/arm/uefi.rst  |   2 +-
 arch/arm/include/asm/efi.h  |  23 +-
 arch/arm64/include/asm/efi.h|   5 +-
 arch/x86/kernel/setup.c |   1 +
 arch/x86/platform/efi/efi.c |   3 +
 drivers/edac/ghes_edac.c|  17 +-
 drivers/firmware/efi/Kconfig|  18 +-
 drivers/firmware/efi/Makefile   |   3 +-
 drivers/firmware/efi/cper.c |  18 +-
 drivers/firmware/efi/{arm-init.c => efi-init.c} |   1 +
 drivers/firmware/efi/efi-pstore.c   |  83 +-
 drivers/firmware/efi/efi.c  |  53 ++--
 drivers/firmware/efi/efivars.c  |  45 +--
 drivers/firmware/efi/libstub/arm32-stub.c   | 178 +++-
 drivers/firmware/efi/libstub/arm64-stub.c   |   9 +-
 drivers/firmware/efi/libstub/efi-stub-helper.c  | 101 ++-
 drivers/firmware/efi/libstub/efi-stub.c |  48 +---
 drivers/firmware/efi/libstub/efistub.h  |  61 +++-
 drivers/firmware/efi/libstub/fdt.c  |   4 +-
 drivers/firmware/efi/libstub/file.c |   5 +-
 drivers/firmware/efi/libstub/relocate.c |   4 +-
 drivers/firmware/efi/libstub/string.c   |   1 +
 drivers/firmware/efi/libstub/vsprintf.c |   2 +-
 drivers/firmware/efi/mokvar-table.c | 359 
 drivers/firmware/efi/vars.c |  22 --
 drivers/firmware/google/Kconfig |   2 +-
 drivers/firmware/google/gsmi.c  |   8 +-
 fs/efivarfs/super.c |   3 +
 include/linux/cper.h|  24 +-
 include/linux/efi.h |  46 ++-
 include/linux/pe.h  |   3 +
 security/integrity/platform_certs/load_uefi.c   |  85 --
 32 files changed, 871 insertions(+), 366 deletions(-)
 rename drivers/firmware/efi/{arm-init.c => efi-init.c} (99%)
 create mode 100644 drivers/firmware/efi/mokvar-table.c


[GIT PULL] RCU changes for v5.10

2020-10-12 Thread Ingo Molnar
Linus,

Please pull the latest core/rcu git tree from:

   git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git core-rcu-2020-10-12

   # HEAD: c6de896fa0a4546c799c86513d99bd011b4a6177 Merge branch 'rcu/fix-rt' 
of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu into core/rcu

RCU changes for v5.10:

 - Debugging for smp_call_function()
 - RT raw/non-raw lock ordering fixes
 - Strict grace periods for KASAN
 - New smp_call_function() torture test
 - Torture-test updates
 - Documentation updates
 - Miscellaneous fixes

 Thanks,

Ingo

-->
Alexander A. Klimov (1):
  rcutorture: Replace HTTP links with HTTPS ones

Colin Ian King (1):
  refperf: Avoid null pointer dereference when buf fails to allocate

Joel Fernandes (Google) (6):
  rcu/trace: Print negative GP numbers correctly
  rcu/trace: Use gp_seq_req in acceleration's rcu_grace_period tracepoint
  rcu: Clarify comments about FQS loop reporting quiescent states
  rcu: Make FQS more aggressive in complaining about offline CPUs
  rcutorture: Output number of elapsed grace periods
  rcu/segcblist: Prevent useless GP start if no CBs to accelerate

Madhuparna Bhowmik (2):
  rculist: Introduce list/hlist_for_each_entry_srcu() macros
  kvm: mmu: page_track: Fix RCU list API usage

Neeraj Upadhyay (2):
  rcu/tree: Force quiescent state on callback overload
  rcu/tree: Remove CONFIG_PREMPT_RCU check in force_qs_rnp()

Paul E. McKenney (56):
  lib: Add backtrace_idle parameter to force backtrace of idle CPUs
  rcu: Remove KCSAN stubs
  rcu: Remove KCSAN stubs from update.c
  srcu: Remove KCSAN stubs
  rcu: Initialize at declaration time in rcu_exp_handler()
  nocb: Clarify RCU nocb CPU error message
  nocb: Remove show_rcu_nocb_state() false positive printout
  rcu: Add READ_ONCE() to rcu_do_batch() access to rcu_divisor
  rcu: Add READ_ONCE() to rcu_do_batch() access to rcu_resched_ns
  rcu: Add READ_ONCE() to rcu_do_batch() access to rcu_kick_kthreads
  rcu: Add READ_ONCE() to rcu_do_batch() access to rcu_cpu_stall_ftrace_dump
  rcu: Move rcu_cpu_started per-CPU variable to rcu_data
  rcu/nocb: Add a warning for non-GP kthread running GP code
  rcu: Remove unused __rcu_is_watching() function
  scftorture: Add smp_call_function() torture test
  torture: Declare parse-console.sh independence from rcutorture
  torture: Add scftorture to the rcutorture scripting
  scftorture: Implement weighted primitive selection
  tick-sched: Clarify "NOHZ: local_softirq_pending" warning
  scftorture: Summarize per-thread statistics
  scftorture: Add smp_call_function_single() memory-ordering checks
  scftorture: Add smp_call_function_many() memory-ordering checks
  scftorture: Add smp_call_function() memory-ordering checks
  scftorture: Consolidate scftorture_invoke_one() check and kfree()
  scftorture: Consolidate scftorture_invoke_one() scf_check initialization
  scftorture: Flag errors in torture-compatible manner
  scftorture: Prevent compiler from reducing race probabilities
  scftorture: Check unexpected "switch" statement value
  scftorture: Block scftorture_invoker() kthreads for offline CPUs
  scftorture: Adapt memory-ordering test to UP operation
  scftorture: Add cond_resched() to test loop
  rcuperf: Change rcuperf to rcuscale
  rcu: Add Kconfig option for strict RCU grace periods
  rcu: Reduce leaf fanout for strict RCU grace periods
  rcu: Restrict default jiffies_till_first_fqs for strict RCU GPs
  rcu: Force DEFAULT_RCU_BLIMIT to 1000 for strict RCU GPs
  rcu: Always set .need_qs from __rcu_read_lock() for strict GPs
  rcu: Do full report for .need_qs for strict GPs
  rcu: Attempt QS when CPU discovers GP for strict GPs
  rcu: IPI all CPUs at GP start for strict GPs
  rcu: IPI all CPUs at GP end for strict GPs
  rcu: Provide optional RCU-reader exit delay for strict GPs
  rcu: Execute RCU reader shortly after rcu_core for strict GPs
  rcu: Report QS for outermost PREEMPT=n rcu_read_unlock() for strict GPs
  rcu: Remove unused "cpu" parameter from rcu_report_qs_rdp()
  rcutorture: Remove KCSAN stubs
  torture: Update initrd documentation
  rcutorture: Add CONFIG_PROVE_RCU_LIST to TREE05
  torture: Add kvm.sh --help and update help message
  rcutorture: Properly set rcu_fwds for OOM handling
  rcutorture: Properly synchronize with OOM notifier
  rcutorture: Hoist OOM registry up one level
  rcutorture: Allow pointer leaks to test diagnostic code
  torture: Add gdb support
  smp: Add source and destination CPUs to __call_single_data
  kernel/smp: Provide CSD lock timeout diagnostics

Paul Gortmaker (1):
  torture: document --allcpus argument added to the kvm.sh script

Randy Dunlap (2):
  doc: Drop doubled words from RCU Data-Structures.rst
  

[GIT PULL] locking changes for v5.10

2020-10-12 Thread Ingo Molnar
Linus,

Please pull the latest locking/core git tree from:

   git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git 
locking-core-2020-10-12

   # HEAD: 2116d708b0580c0048fc80b82ec4b53f4ddaa166 Merge branch 'lkmm' of 
git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu into 
locking/core

These are the locking updates for v5.10:

 - Add deadlock detection for recursive read-locks. The rationale is outlined
   in:

 224ec489d3cd: ("lockdep/Documention: Recursive read lock detection 
reasoning")

   The main deadlock pattern we want to detect is:

   TASK A:                 TASK B:

   read_lock(X);
                           write_lock(X);
   read_lock_2(X);

 - Add "latch sequence counters" (seqcount_latch_t):

  A sequence counter variant where the counter even/odd value is used to
  switch between two copies of protected data. This allows the read path,
  typically NMIs, to safely interrupt the write side critical section.

   We utilize this new variant for sched-clock, and to make x86 TSC handling 
safer.

 - Other seqlock cleanups, fixes and enhancements

 - KCSAN updates

 - LKMM updates

 - Misc updates, cleanups and fixes.

Note that there's a pending bugreport against:

   4d004099a668: ("lockdep: Fix lockdep recursion")

this fix triggers a non-fatal RCU warning on some systems - but that looks 
like a real bug which was masked by a lockdep bug, with fixes being worked 
on and tested. The new boot time warning does not appear to be widespread 
(read: we couldn't reproduce it locally).

 Thanks,

Ingo

-->
Ahmed S. Darwish (13):
  time/sched_clock: Use raw_read_seqcount_latch() during suspend
  mm/swap: Do not abuse the seqcount_t latching API
  seqlock: Introduce seqcount_latch_t
  time/sched_clock: Use seqcount_latch_t
  timekeeping: Use seqcount_latch_t
  x86/tsc: Use seqcount_latch_t
  rbtree_latch: Use seqcount_latch_t
  seqlock: seqcount latch APIs: Only allow seqcount_latch_t
  seqlock: seqcount_LOCKNAME_t: Standardize naming convention
  seqlock: Use unique prefix for seqcount_t property accessors
  seqlock: seqcount_t: Implement all read APIs as statement expressions
  seqlock: seqcount_LOCKNAME_t: Introduce PREEMPT_RT support
  seqlock: PREEMPT_RT: Do not starve seqlock_t writers

Alexander A. Klimov (1):
  Replace HTTP links with HTTPS ones: LKMM

Boqun Feng (20):
  locking: More accurate annotations for read_lock()
  lockdep/Documention: Recursive read lock detection reasoning
  lockdep: Demagic the return value of BFS
  lockdep: Make __bfs() visit every dependency until a match
  lockdep: Reduce the size of lock_list::distance
  lockdep: Introduce lock_list::dep
  lockdep: Extend __bfs() to work with multiple types of dependencies
  lockdep: Make __bfs(.match) return bool
  lockdep: Support deadlock detection for recursive read locks in 
check_noncircular()
  lockdep: Adjust check_redundant() for recursive read change
  lockdep: Fix recursive read lock related safe->unsafe detection
  lockdep: Add recursive read locks into dependency graph
  lockdep/selftest: Add a R-L/L-W test case specific to chain cache behavior
  lockdep: Take read/write status in consideration when generate chainkey
  lockdep/selftest: Unleash irq_read_recursion2 and add more
  lockdep/selftest: Add more recursive read related test cases
  Revert "locking/lockdep/selftests: Fix mixed read-write ABBA tests"
  locking/selftest: Add test cases for queued_read_lock()
  lockdep/selftest: Introduce recursion3
  lockdep: Optimize the memory usage of circular queue

Marco Elver (19):
  kcsan: Add support for atomic builtins
  objtool: Add atomic builtin TSAN instrumentation to uaccess whitelist
  kcsan: Add atomic builtin test case
  kcsan: Support compounded read-write instrumentation
  objtool, kcsan: Add __tsan_read_write to uaccess whitelist
  kcsan: Skew delay to be longer for certain access types
  kcsan: Add missing CONFIG_KCSAN_IGNORE_ATOMICS checks
  kcsan: Test support for compound instrumentation
  instrumented.h: Introduce read-write instrumentation hooks
  asm-generic/bitops: Use instrument_read_write() where appropriate
  locking/atomics: Use read-write instrumentation for atomic RMWs
  kcsan: Simplify debugfs counter to name mapping
  kcsan: Simplify constant string handling
  kcsan: Remove debugfs test command
  kcsan: Show message if enabled early
  kcsan: Use pr_fmt for consistency
  kcsan: Optimize debugfs stats counters
  bitops, kcsan: Partially revert instrumentation for non-atomic bitops
  kcsan: Use tracing-safe version of prandom

Marta Rybczynska (1):
  Documentation/locking/locktypes: Fix local_locks documentation

Paul Bolle (1):
  locking/atomics: Check atomic-arch-fallback.h too

Paul E. 

Re: Missing [GIT PULL] request for

2020-10-12 Thread Ingo Molnar


* Sedat Dilek  wrote:

> Hi,
> 
> yesterday, I saw Ingo tagged "locking-urgent-2020-10-11" in tip Git.
> 
> Did you drop it or was this for Linux v5.9 final and the git-pull
> request was simply forgotten?
> 
> Just curious.

So I ran the pull request script to send the tree to Linus, but on final 
review decided not to send it, as there was a pending bugreport against the 
tree, it was very late in the cycle and the commits were pretty fresh. I 
sent two other trees (x86/urgent and perf/urgent).

This is why there's a signed tag for locking/urgent, but no pull request.
:-)

Thanks,

Ingo


[GIT PULL] scheduler changes for v5.10

2020-10-12 Thread Ingo Molnar
Linus,

Please pull the latest sched/core git tree from:

   git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git 
sched-core-2020-10-12

   # HEAD: feff2e65efd8d84cf831668e182b2ce73c604bbb sched/deadline: Unthrottle 
PI boosted threads while enqueuing

Scheduler changes for v5.10:

 - Reorganize & clean up the SD* flags definitions and add a bunch
   of sanity checks. These new checks caught quite a few bugs or at
   least inconsistencies, resulting in another set of patches.

 - Rseq updates, add MEMBARRIER_CMD_PRIVATE_EXPEDITED_RSEQ

 - Add a new tracepoint to improve CPU capacity tracking

 - Improve overloaded SMP system load-balancing behavior

 - Tweak SMT balancing

 - Energy-aware scheduling updates

 - NUMA balancing improvements

 - Deadline scheduler fixes and improvements

 - CPU isolation fixes

 - Misc cleanups, simplifications and smaller optimizations.

 Thanks,

Ingo

-->
Barry Song (1):
  sched/fair: Use dst group while checking imbalance for NUMA balancer

Daniel Bristot de Oliveira (3):
  MAINTAINERS: Add myself as SCHED_DEADLINE reviewer
  sched/rt: Disable RT_RUNTIME_SHARE by default
  sched/deadline: Unthrottle PI boosted threads while enqueuing

Jiang Biao (1):
  sched/fair: Simplify the work when reweighting entity

Josh Don (1):
  sched/fair: Ignore cache hotness for SMT migration

Lucas Stach (1):
  sched/deadline: Fix stale throttling on de-/boosted tasks

Lukasz Luba (1):
  sched/fair: Fix wrong negative conversion in find_energy_efficient_cpu()

Peter Oskolkov (4):
  rseq/membarrier: Add MEMBARRIER_CMD_PRIVATE_EXPEDITED_RSEQ
  rseq/selftests,x86_64: Add rseq_offset_deref_addv()
  rseq/selftests: Test MEMBARRIER_CMD_PRIVATE_EXPEDITED_RSEQ
  sched/fair: Tweak pick_next_entity()

Sebastian Andrzej Siewior (2):
  sched: Bring the PF_IO_WORKER and PF_WQ_WORKER bits closer together
  sched: Cache task_struct::flags in sched_submit_work()

Valentin Schneider (19):
  ARM, sched/topology: Remove SD_SHARE_POWERDOMAIN
  ARM, sched/topology: Revert back to default scheduler topology
  sched/topology: Split out SD_* flags declaration to its own file
  sched/topology: Define and assign sched_domain flag metadata
  sched/topology: Verify SD_* flags setup when sched_debug is on
  sched/debug: Output SD flag names rather than their values
  sched/topology: Introduce SD metaflag for flags needing > 1 groups
  sched/topology: Use prebuilt SD flag degeneration mask
  sched/topology: Remove SD_SERIALIZE degeneration special case
  sched/topology: Propagate SD_ASYM_CPUCAPACITY upwards
  sched/topology: Mark SD_PREFER_SIBLING as SDF_NEEDS_GROUPS
  sched/topology: Mark SD_BALANCE_WAKE as SDF_NEEDS_GROUPS
  sched/topology: Mark SD_SERIALIZE as SDF_NEEDS_GROUPS
  sched/topology: Mark SD_ASYM_PACKING as SDF_NEEDS_GROUPS
  sched/topology: Mark SD_OVERLAP as SDF_NEEDS_GROUPS
  sched/topology: Mark SD_NUMA as SDF_NEEDS_GROUPS
  sched/topology: Move sd_flag_debug out of linux/sched/topology.h
  sched/topology: Move SD_DEGENERATE_GROUPS_MASK out of linux/sched/topology.h
  sched/topology: Move sd_flag_debug out of #ifdef CONFIG_SYSCTL

Vincent Donnefort (1):
  sched/debug: Add new tracepoint to track cpu_capacity

Vincent Guittot (5):
  sched/numa: Use runnable_avg to classify node
  sched/fair: Relax constraint on task's load during load balance
  sched/fair: Reduce minimal imbalance threshold
  sched/fair: Minimize concurrent LBs between domain level
  sched/fair: Reduce busy load balance interval

Xianting Tian (1):
  sched/fair: Remove the force parameter of update_tg_load_avg()

Xunlei Pang (1):
  sched/fair: Fix wrong cpu selecting from isolated domain

YueHaibing (1):
  sched: Remove unused inline function uclamp_bucket_base_value()


 MAINTAINERS|   1 +
 arch/arm/kernel/topology.c |  26 ---
 include/linux/sched.h  |   5 +-
 include/linux/sched/mm.h   |   3 +
 include/linux/sched/sd_flags.h | 156 +
 include/linux/sched/topology.h |  37 ++--
 include/linux/syscalls.h   |   2 +-
 include/trace/events/sched.h   |   4 +
 include/uapi/linux/membarrier.h|  26 +++
 kernel/sched/core.c|  13 +-
 kernel/sched/deadline.c|  34 +++-
 kernel/sched/debug.c   |  56 ++-
 kernel/sched/fair.c| 103 
 kernel/sched/features.h|   2 +-
 kernel/sched/membarrier.c  | 136 +++
 kernel/sched/topology.c|  69 
 tools/testing/selftests/rseq/param_test.c  | 223 -
 tools/testing/selftests/rseq/rseq-x86.h

[GIT PULL] x86 fixes

2020-10-11 Thread Ingo Molnar
Linus,

Please pull the latest x86/urgent git tree from:

   git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git 
x86-urgent-2020-10-11

   # HEAD: 0c7689830e907668288a1a1da84dca66dbdb4728 Documentation/x86: Fix 
incorrect references to zero-page.txt

Two fixes:

 - Fix a (hopefully final) IRQ state tracking bug vs. MCE handling
 - Fix a documentation link

 Thanks,

Ingo

-->
Heinrich Schuchardt (1):
  Documentation/x86: Fix incorrect references to zero-page.txt

Thomas Gleixner (1):
  x86/mce: Use idtentry_nmi_enter/exit()


 Documentation/x86/boot.rst | 6 +++---
 arch/x86/kernel/cpu/mce/core.c | 6 --
 2 files changed, 7 insertions(+), 5 deletions(-)

diff --git a/Documentation/x86/boot.rst b/Documentation/x86/boot.rst
index 7fafc7ac00d7..abb9fc164657 100644
--- a/Documentation/x86/boot.rst
+++ b/Documentation/x86/boot.rst
@@ -1342,8 +1342,8 @@ follow::
 
 In addition to read/modify/write the setup header of the struct
 boot_params as that of 16-bit boot protocol, the boot loader should
-also fill the additional fields of the struct boot_params as that
-described in zero-page.txt.
+also fill the additional fields of the struct boot_params as
+described in chapter :doc:`zero-page`.
 
 After setting up the struct boot_params, the boot loader can load the
 32/64-bit kernel in the same way as that of 16-bit boot protocol.
@@ -1379,7 +1379,7 @@ can be calculated as follows::
 In addition to read/modify/write the setup header of the struct
 boot_params as that of 16-bit boot protocol, the boot loader should
 also fill the additional fields of the struct boot_params as described
-in zero-page.txt.
+in chapter :doc:`zero-page`.
 
 After setting up the struct boot_params, the boot loader can load
 64-bit kernel in the same way as that of 16-bit boot protocol, but
diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
index f43a78bde670..fc4f8c04bdb5 100644
--- a/arch/x86/kernel/cpu/mce/core.c
+++ b/arch/x86/kernel/cpu/mce/core.c
@@ -1904,6 +1904,8 @@ void (*machine_check_vector)(struct pt_regs *) = unexpected_machine_check;
 
 static __always_inline void exc_machine_check_kernel(struct pt_regs *regs)
 {
+   bool irq_state;
+
WARN_ON_ONCE(user_mode(regs));
 
/*
@@ -1914,7 +1916,7 @@ static __always_inline void exc_machine_check_kernel(struct pt_regs *regs)
mce_check_crashing_cpu())
return;
 
-   nmi_enter();
+   irq_state = idtentry_enter_nmi(regs);
/*
 * The call targets are marked noinstr, but objtool can't figure
 * that out because it's an indirect call. Annotate it.
@@ -1925,7 +1927,7 @@ static __always_inline void exc_machine_check_kernel(struct pt_regs *regs)
if (regs->flags & X86_EFLAGS_IF)
trace_hardirqs_on_prepare();
instrumentation_end();
-   nmi_exit();
+   idtentry_exit_nmi(regs, irq_state);
 }
 
 static __always_inline void exc_machine_check_user(struct pt_regs *regs)


[GIT PULL] perf fix

2020-10-11 Thread Ingo Molnar
Linus,

Please pull the latest perf/urgent git tree from:

   git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git 
perf-urgent-2020-10-11

   # HEAD: 6d6b8b9f4fceab7266ca03d194f60ec72bd4b654 perf: Fix 
task_function_call() error handling

Fix an error handling bug that can cause a lockup if a CPU is offline. (doh ...)

 Thanks,

Ingo

-->
Kajol Jain (1):
  perf: Fix task_function_call() error handling


 kernel/events/core.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index 7ed5248f0445..e8bf92202542 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -99,7 +99,7 @@ static void remote_function(void *data)
  * retry due to any failures in smp_call_function_single(), such as if the
  * task_cpu() goes offline concurrently.
  *
- * returns @func return value or -ESRCH when the process isn't running
+ * returns @func return value or -ESRCH or -ENXIO when the process isn't running
  */
 static int
 task_function_call(struct task_struct *p, remote_function_f func, void *info)
@@ -115,7 +115,8 @@ task_function_call(struct task_struct *p, remote_function_f func, void *info)
for (;;) {
ret = smp_call_function_single(task_cpu(p), remote_function,
					       &data, 1);
-   ret = !ret ? data.ret : -EAGAIN;
+   if (!ret)
+   ret = data.ret;
 
if (ret != -EAGAIN)
break;


Re: [GIT PULL tip/core/rcu+preempt] Fix RT raw/non-raw lock ordering

2020-10-09 Thread Ingo Molnar


* Paul E. McKenney  wrote:

> Hello!
> 
> This pull request contains Thomas Gleixner's "Make preempt count
> unconditional" series [1], but with the addition of a kvfree_rcu() bug-fix
> commit making use of this PREEMPT_COUNT addition.  This series reduces
> the size of the kernel by almost 100 lines of code and is intended for
> the upcoming v5.10 merge window.

> Thomas Gleixner (13):
>   lib/debug: Remove pointless ARCH_NO_PREEMPT dependencies
>   preempt: Make preempt count unconditional
>   preempt: Cleanup PREEMPT_COUNT leftovers
>   lockdep: Cleanup PREEMPT_COUNT leftovers
>   mm/pagemap: Cleanup PREEMPT_COUNT leftovers
>   locking/bitspinlock: Cleanup PREEMPT_COUNT leftovers
>   uaccess: Cleanup PREEMPT_COUNT leftovers
>   sched: Cleanup PREEMPT_COUNT leftovers
>   ARM: Cleanup PREEMPT_COUNT leftovers
>   xtensa: Cleanup PREEMPT_COUNT leftovers
>   drm/i915: Cleanup PREEMPT_COUNT leftovers
>   rcutorture: Cleanup PREEMPT_COUNT leftovers
>   preempt: Remove PREEMPT_COUNT from Kconfig
> 
> Uladzislau Rezki (Sony) (1):
>   rcu/tree: Allocate a page when caller is preemptible
> 
> kernel test robot (1):
>   kvfree_rcu(): Fix ifnullfree.cocci warnings

>  21 files changed, 44 insertions(+), 136 deletions(-)

Pulled into tip:core/rcu, thanks a lot guys!

Ingo


Re: [GIT PULL memory-model] LKMM commits for v5.10

2020-10-09 Thread Ingo Molnar


* Paul E. McKenney  wrote:

> Hello, Ingo!
> 
> This pull request contains Linux-Kernel Memory-Model commits for v5.10.
> These have been subjected to LKML review:
> 
>   https://lore.kernel.org/lkml/20200831182012.GA1965@paulmck-ThinkPad-P72
> 
> All of these have also been subjected to the kbuild test robot and
> -next testing.  The following changes since v5.9-rc1 are available in
> the git repository at:
> 
>   git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git lkmm
> 
> for you to fetch changes up to 0ce0c78eff7d22c8a261de6c4305a5abb638c200:
> 
>   tools/memory-model: Expand the cheatsheet.txt notion of relaxed (2020-09-04 
> 11:58:15 -0700)
> 
> 
> Alexander A. Klimov (1):
>   Replace HTTP links with HTTPS ones: LKMM
> 
> Paul E. McKenney (4):
>   tools/memory-model: Update recipes.txt prime_numbers.c path
>   tools/memory-model: Improve litmus-test documentation
>   tools/memory-model: Add a simple entry point document
>   tools/memory-model: Expand the cheatsheet.txt notion of relaxed
> 
>  tools/memory-model/Documentation/cheatsheet.txt   |   33 +-
>  tools/memory-model/Documentation/litmus-tests.txt | 1074 
> +
>  tools/memory-model/Documentation/recipes.txt  |4 +-
>  tools/memory-model/Documentation/references.txt   |2 +-
>  tools/memory-model/Documentation/simple.txt   |  271 ++
>  tools/memory-model/README |  160 +--
>  6 files changed, 1410 insertions(+), 134 deletions(-)
>  create mode 100644 tools/memory-model/Documentation/litmus-tests.txt
>  create mode 100644 tools/memory-model/Documentation/simple.txt

Pulled, thanks a lot Paul!

Ingo


Re: [GIT PULL kcsan] KCSAN commits for v5.10

2020-10-09 Thread Ingo Molnar


* Paul E. McKenney  wrote:

> Hello, Ingo!
> 
> This pull request contains KCSAN updates for v5.10.  These have been
> subjected to LKML review, most recently here:
> 
>   https://lore.kernel.org/lkml/20200831181715.GA1530@paulmck-ThinkPad-P72
> 
> All of these have also been subjected to the kbuild test robot and
> -next testing.  The following changes since v5.9-rc1 are available in
> the git repository at:
> 
>   git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git kcsan
> 
> for you to fetch changes up to cd290ec24633f51029dab0d25505fae7da0e1eda:
> 
>   kcsan: Use tracing-safe version of prandom (2020-08-30 21:50:13 -0700)
> 
> 
> Marco Elver (19):
>   kcsan: Add support for atomic builtins
>   objtool: Add atomic builtin TSAN instrumentation to uaccess whitelist
>   kcsan: Add atomic builtin test case
>   kcsan: Support compounded read-write instrumentation
>   objtool, kcsan: Add __tsan_read_write to uaccess whitelist
>   kcsan: Skew delay to be longer for certain access types
>   kcsan: Add missing CONFIG_KCSAN_IGNORE_ATOMICS checks
>   kcsan: Test support for compound instrumentation
>   instrumented.h: Introduce read-write instrumentation hooks
>   asm-generic/bitops: Use instrument_read_write() where appropriate
>   locking/atomics: Use read-write instrumentation for atomic RMWs
>   kcsan: Simplify debugfs counter to name mapping
>   kcsan: Simplify constant string handling
>   kcsan: Remove debugfs test command
>   kcsan: Show message if enabled early
>   kcsan: Use pr_fmt for consistency
>   kcsan: Optimize debugfs stats counters
>   bitops, kcsan: Partially revert instrumentation for non-atomic bitops
>   kcsan: Use tracing-safe version of prandom
> 
>  include/asm-generic/atomic-instrumented.h  | 330 
> ++---
>  include/asm-generic/bitops/instrumented-atomic.h   |   6 +-
>  include/asm-generic/bitops/instrumented-lock.h |   2 +-
>  .../asm-generic/bitops/instrumented-non-atomic.h   |  30 +-
>  include/linux/instrumented.h   |  30 ++
>  include/linux/kcsan-checks.h   |  45 ++-
>  kernel/kcsan/core.c| 210 +++--
>  kernel/kcsan/debugfs.c | 130 ++--
>  kernel/kcsan/kcsan-test.c  | 128 +++-
>  kernel/kcsan/kcsan.h   |  12 +-
>  kernel/kcsan/report.c  |  10 +-
>  kernel/kcsan/selftest.c|   8 +-
>  lib/Kconfig.kcsan  |   5 +
>  scripts/Makefile.kcsan |   2 +-
>  scripts/atomic/gen-atomic-instrumented.sh  |  21 +-
>  tools/objtool/check.c  |  55 
>  16 files changed, 677 insertions(+), 347 deletions(-)

Pulled into tip:locking/core, thanks a lot Paul!

Ingo


Re: [GIT PULL tip/core/rcu] RCU commits for v5.10

2020-10-09 Thread Ingo Molnar


* Paul E. McKenney  wrote:

> Hello, Ingo!
> 
> This pull request contains the following changes:
> 
> 1. Documentation updates.
> 
>   https://lore.kernel.org/lkml/20200831175419.GA31013@paulmck-ThinkPad-P72
> 
> 2. Miscellaneous fixes.
> 
>   https://lore.kernel.org/lkml/20200831180050.GA32590@paulmck-ThinkPad-P72
> 
> 3. Torture-test updates.
> 
>   https://lore.kernel.org/lkml/20200831180348.GA416@paulmck-ThinkPad-P72
> 
> 4. New smp_call_function() torture test.
> 
>   https://lore.kernel.org/lkml/20200831180731.GA582@paulmck-ThinkPad-P72
> 
> 5. Strict grace periods for KASAN.  The point of this series is to find
>   RCU-usage bugs, so the corresponding new RCU_STRICT_GRACE_PERIOD
>   Kconfig option depends on both DEBUG_KERNEL and RCU_EXPERT, and is
>   further disabled by default.  Finally, the help text includes
>   a goodly list of scary caveats.
> 
>   https://lore.kernel.org/lkml/20200831181101.GA950@paulmck-ThinkPad-P72
> 
> 6. Debugging for smp_call_function().
> 
>   https://lore.kernel.org/lkml/20200831181356.GA1224@paulmck-ThinkPad-P72

>  57 files changed, 1582 insertions(+), 421 deletions(-)

Pulled into tip:core/rcu, thanks a lot Paul!

Ingo


Re: [sched/fair] fcf0553db6: netperf.Throughput_Mbps -30.8% regression

2020-10-05 Thread Ingo Molnar


* Peter Zijlstra  wrote:

> On Sun, Oct 04, 2020 at 05:21:08PM +0100, Mel Gorman wrote:
> > On Sun, Oct 04, 2020 at 09:27:16PM +0800, kernel test robot wrote:
> > > Greeting,
> > > 
> > > FYI, we noticed a -30.8% regression of netperf.Throughput_Mbps due to 
> > > commit:
> > > 
> > > 
> > > commit: fcf0553db6f4c79387864f6e4ab4a891601f395e ("sched/fair: Remove 
> > > meaningless imbalance calculation")
> > > https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
> > > 
> > 
> > This commit was the start of a series that made large changes to load
> > balancing.  The series was not bisect-safe and has since been reconciled
> > with the NUMA balancing. Any workload with a potential load balancing
> > problem has to be checked against the latest kernel to see if the problem
> > persists there. If it does, then tip/sched/core should be checked or
> > 5.10-rc1 when it comes out as tip has a few more LB changes pending.
> 
> What Mel said ;-)

Basically it would be nice to test either the following commit directly 
(which is the latest relevant sched/core commit):

   233e7aca4c8a: ("sched/fair: Use dst group while checking imbalance for NUMA 
balancer")

Or a -next version that includes these commits.

Thanks,

Ingo


Re: [PATCH] kbuild: Run syncconfig with -s

2020-09-14 Thread Ingo Molnar


* Masahiro Yamada  wrote:

> On Thu, Aug 20, 2020 at 3:35 PM Ingo Molnar  wrote:
> >
> > On every kernel build that runs --syncconfig, there's an output of the 
> > following line:
> >
> >   scripts/kconfig/conf  --syncconfig Kconfig
> >
> > This is the only non-platform build message the kbuild system emits that 
> > isn't
> > prefixed by at least a space, or is a build warning.
> >
> > Run it under -s - if there's any problem it will emit messages anyway.
> >
> > With this change the following simple grep filter will show all build 
> > warnings
> > and errors of a kernel build:
> >
> >make | grep -v '^ '
> 
> 
> 
> I do want to see something when syncconfig is invoked.
> 
> I will apply this instead:
> https://patchwork.kernel.org/patch/11727445/

BTW., there's another, rather spurious bug I recently triggered in kbuild.

Occasionally when I Ctrl-C a kernel build on a system with a lot of CPUs, 
the .o.cmd file gets corrupted:

  mm/.pgtable-generic.o.cmd:5: *** unterminated call to function 'wildcard': missing ')'.  Stop.
  make: *** [Makefile:1788: mm] Error 2
  make: *** Waiting for unfinished jobs

The .o.cmd file is half-finished:

$(wildcard include/config/shmem.h) \
$(wildcard include/config/hugetlb/page.h) \
$(wildcard include/config/zone/device.h) \
$(wildcard include/config/dev/pagemap/ops.h) \
$(wildcard include/config/device/private.h) \
$(wildcard include/config/pci/p2pdma.h) \
$(wildcard include/config/sparsemem.h) \
$(wildcard include/config/sparsemem/vmemmap.h) \
$(wildcard include/config/numa/balancing.h) \
$(wildcard i
[premature EOF]

Instead of the regular rules that end in:

$(wildcard include/config/memory/hotplug/sparse.h) \

mm/pgtable-generic.o: $(deps_mm/pgtable-generic.o)

$(deps_mm/pgtable-generic.o):
[regular EOF]

Manually removing the corrupted .o.cmd dot file solves the bug.

There's no reproducer other than Ctrl-C-ing large build jobs a couple of times.
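One hypothetical way to recover from such an interrupted build automatically would be to look for .cmd files that never reached their closing "$(deps_...):" rule and delete them so make regenerates them. The heuristic below is only an illustration of that idea against a throwaway demo tree - it is not something kbuild itself does:

```shell
#!/bin/sh
# Sketch: remove .cmd dependency files that look truncated, i.e. that
# are missing the final "$(deps_...):" rule a complete file ends with.
set -eu

# Build a tiny demo tree with one complete and one truncated .cmd file.
demo=$(mktemp -d)
cat > "$demo/.good.o.cmd" <<'EOF'
foo.o: bar.h
$(deps_foo.o):
EOF
cat > "$demo/.bad.o.cmd" <<'EOF'
$(wildcard include/config/numa/balancing.h) \
$(wildcard i
EOF

removed=0
for f in "$demo"/.*.o.cmd; do
	# Heuristic: a complete kbuild .cmd file contains a "$(deps_" rule.
	if ! grep -q '^\$(deps_' "$f"; then
		rm "$f"
		removed=$((removed + 1))
	fi
done
echo "removed $removed truncated .cmd file(s)"
rm -rf "$demo"
```

Running it on the demo tree removes only the truncated file; pointed at a real object tree (with the demo setup dropped and the glob adjusted), the same check would have caught the corrupted mm/.pgtable-generic.o.cmd above.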

Thanks,

Ingo


Re: [PATCH v2] x86/boot/compressed: Disable relocation relaxation

2020-09-14 Thread Ingo Molnar


* Ard Biesheuvel  wrote:

> On Mon, 14 Sep 2020 at 01:34, Arvind Sankar  wrote:
> >
> > On Tue, Aug 25, 2020 at 10:56:52AM -0400, Arvind Sankar wrote:
> > > On Sat, Aug 15, 2020 at 01:56:49PM -0700, Nick Desaulniers wrote:
> > > > Hi Ingo,
> > > > I saw you picked up Arvind's other series into x86/boot.  Would you
> > > > mind please including this, as well?  Our CI is quite red for x86...
> > > >
> > > > EOM
> > > >
> > >
> > > Hi Ingo, while this patch is unnecessary after the series in
> > > tip/x86/boot, it is still needed for 5.9 and older. Would you be able to
> > > send it in for the next -rc? It shouldn't hurt the tip/x86/boot series,
> > > and we can add a revert on top of that later.
> > >
> > > Thanks.
> >
> > Ping.
> >
> > https://lore.kernel.org/lkml/20200812004308.1448603-1-nived...@alum.mit.edu/
> 
> Acked-by: Ard Biesheuvel 

Thanks guys - queued up in tip:x86/urgent.

Ingo


Re: [GIT PULL] First batch of KVM changes for Linux 5.9

2020-09-09 Thread Ingo Molnar


* Christopherson, Sean J  wrote:

> Ingo Molnar wrote:
> > * Paolo Bonzini  wrote:
> > 
> > > Paolo Bonzini (11):
> > >   Merge branch 'kvm-async-pf-int' into HEAD
> > 
> > kvmtool broke in this merge window, hanging during bootup right after CPU 
> > bringup:
> > 
> >  [1.289404]  #63
> >  [0.012468] kvm-clock: cpu 63, msr 6ff69fc1, secondary cpu clock
> >  [0.012468] [Firmware Bug]: CPU63: APIC id mismatch. Firmware: 3f APIC: 
> > 14
> >  [1.302320] kvm-guest: KVM setup async PF for cpu 63
> >  [1.302320] kvm-guest: stealtime: cpu 63, msr 1379d7600
> > 
> > Eventually trigger an RCU stall warning:
> > 
> >  [   22.302392] rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
> >  [   22.302392] rcu:1-...!: (68 GPs behind) idle=00c/0/0x0 
> > softirq=0/0 fqs=0  (false positive?)
> > 
> > I've bisected this down to the above merge commit. The individual commit:
> > 
> >b1d405751cd5: ("KVM: x86: Switch KVM guest to using interrupts for page 
> > ready APF delivery")
> > 
> > appears to be working fine standalone.
> > 
> > I'm using x86-64 defconfig+kvmconfig on SVM. Can send more info on request.
> > 
> > The kvmtool.git commit I've tested is 90b2d3adadf2.
> 
> Looks a lot like the lack of APIC EOI issue that Vitaly reported[*].
> 
> ---
>  arch/x86/kernel/kvm.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
> index d45f34cbe1ef..9663ba31347c 100644
> --- a/arch/x86/kernel/kvm.c
> +++ b/arch/x86/kernel/kvm.c
> @@ -271,6 +271,8 @@ DEFINE_IDTENTRY_SYSVEC(sysvec_kvm_asyncpf_interrupt)
>   struct pt_regs *old_regs = set_irq_regs(regs);
>   u32 token;
>  
> +     ack_APIC_irq();
> +
>   inc_irq_stat(irq_hv_callback_count);
>  
>   if (__this_cpu_read(apf_reason.enabled)) {
> --
> 
> [*] https://lkml.kernel.org/r/20200908135350.355053-1-vkuzn...@redhat.com

Yep, this does the trick, thanks!

Tested-by: Ingo Molnar 

Ingo


Re: [PATCH v1] x86/defconfigs: Unbreak 32-bit defconfig builds

2020-09-09 Thread Ingo Molnar


* Andy Shevchenko  wrote:

> On Tue, Sep 08, 2020 at 02:13:54PM +0200, Ingo Molnar wrote:
> > 
> > * Andy Shevchenko  wrote:
> > 
> > > After the commit 1d0e12fd3a84 ("x86/defconfigs: Refresh defconfig files")
> > > 32-bit builds using defconfig became broken, because on an x86_64 build
> > > host with no ARCH provided the default behaviour is to assume 64-bit,
> > > independently of the configuration file name. The crucial part is the
> > > CONFIG_64BIT option that used to be explicit. Let's restore the latter
> > > option in order to unbreak 32-bit builds.
> > 
> > So exactly which build method broke due to this? The typical way to do a 
> > defconfig build is:
> > 
> >   make ARCH=i386 defconfig
> > 
> > which still works fine AFAICS.
> 
> uname => x86_64
> make i386_defconfig
> 
> It was very convenient to not supply ARCH when building on a multi-arch host.

Nice, TIL about the extended 'make *config' targets. :-)

Curiously, they aren't even mentioned in the 'configuration targets' 
section of 'make help' and are not discoverable unless you know their 
locations.

Anyway, your fix makes sense now to me too.

Do we need a similar fix for the x86_64 defconfig, when built on 32-bit 
hosts? (Not that anyone does that in practice, but just for completeness.)

Also, it would be nice if there was a way to annotate the defconfig so that 
'make savedefconfig' preserved these ARCH choices - it currently strips out 
all non-enabled options that match their default configuration value.

Thanks,

Ingo


Re: [GIT PULL] First batch of KVM changes for Linux 5.9

2020-09-08 Thread Ingo Molnar


hi,

* Paolo Bonzini  wrote:

> Paolo Bonzini (11):
>   Merge branch 'kvm-async-pf-int' into HEAD

kvmtool broke in this merge window, hanging during bootup right after CPU 
bringup:

 [1.289404]  #63
 [0.012468] kvm-clock: cpu 63, msr 6ff69fc1, secondary cpu clock
 [0.012468] [Firmware Bug]: CPU63: APIC id mismatch. Firmware: 3f APIC: 14
 [1.302320] kvm-guest: KVM setup async PF for cpu 63
 [1.302320] kvm-guest: stealtime: cpu 63, msr 1379d7600

Eventually trigger an RCU stall warning:

 [   22.302392] rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
 [   22.302392] rcu:1-...!: (68 GPs behind) idle=00c/0/0x0 softirq=0/0 
fqs=0  (false positive?)

I've bisected this down to the above merge commit. The individual commit:

   b1d405751cd5: ("KVM: x86: Switch KVM guest to using interrupts for page 
ready APF delivery")

appears to be working fine standalone.

I'm using x86-64 defconfig+kvmconfig on SVM. Can send more info on request.

The kvmtool.git commit I've tested is 90b2d3adadf2.

Thanks,

Ingo


Re: [PATCH v1] x86/defconfigs: Unbreak 32-bit defconfig builds

2020-09-08 Thread Ingo Molnar


* Andy Shevchenko  wrote:

> After the commit 1d0e12fd3a84 ("x86/defconfigs: Refresh defconfig files")
> 32-bit builds using defconfig became broken, because on an x86_64 build
> host with no ARCH provided the default behaviour is to assume 64-bit,
> independently of the configuration file name. The crucial part is the
> CONFIG_64BIT option that used to be explicit. Let's restore the latter
> option in order to unbreak 32-bit builds.

So exactly which build method broke due to this? The typical way to do a 
defconfig build is:

  make ARCH=i386 defconfig

which still works fine AFAICS.

Thanks,

Ingo


[GIT PULL] x86 fixes

2020-09-06 Thread Ingo Molnar
Linus,

Please pull the latest x86/urgent git tree from:

   git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git 
x86-urgent-2020-09-06

   # HEAD: 4facb95b7adaf77e2da73aafb9ba60996fe42a12 x86/entry: Unbreak 32bit 
fast syscall

Misc fixes:

 - Fix more generic entry code ABI fallout
 - Fix debug register handling bugs
 - Fix vmalloc mappings on 32-bit kernels
 - Fix kprobes instrumentation output on 32-bit kernels
 - Fix over-eager WARN_ON_ONCE() on !SMAP hardware
 - Fix NUMA debugging
 - Fix Clang related crash on !RETPOLINE kernels

The most complex fixes are only a few days old and some haven't seen 
-next yet, but I didn't think we should delay them.

  out-of-topic modifications in x86-urgent-2020-09-06:
  --
  include/linux/entry-common.h   # 4facb95b7ada: x86/entry: Unbreak 32bit fast syscall
  kernel/entry/common.c  # 4facb95b7ada: x86/entry: Unbreak 32bit fast syscall

 Thanks,

Ingo

-->
Andy Lutomirski (1):
  x86/debug: Allow a single level of #DB recursion

Arvind Sankar (1):
  x86/cmdline: Disable jump tables for cmdline.c

Huang Ying (1):
  x86, fakenuma: Fix invalid starting node ID

Joerg Roedel (1):
  x86/mm/32: Bring back vmalloc faulting on x86_32

Peter Zijlstra (1):
  x86/entry: Fix AC assertion

Thomas Gleixner (1):
  x86/entry: Unbreak 32bit fast syscall

Vamshi K Sthambamkadi (1):
  tracing/kprobes, x86/ptrace: Fix regs argument order for i386


 arch/x86/entry/common.c | 29 +-
 arch/x86/include/asm/entry-common.h | 12 +-
 arch/x86/include/asm/ptrace.h   |  2 +-
 arch/x86/kernel/traps.c | 65 +++
 arch/x86/lib/Makefile   |  2 +-
 arch/x86/mm/fault.c | 78 +
 arch/x86/mm/numa_emulation.c|  2 +-
 include/linux/entry-common.h| 51 +++-
 kernel/entry/common.c   | 35 ++---
 9 files changed, 213 insertions(+), 63 deletions(-)

diff --git a/arch/x86/entry/common.c b/arch/x86/entry/common.c
index 48512c7944e7..2f84c7ca74ea 100644
--- a/arch/x86/entry/common.c
+++ b/arch/x86/entry/common.c
@@ -60,16 +60,10 @@ __visible noinstr void do_syscall_64(unsigned long nr, struct pt_regs *regs)
 #if defined(CONFIG_X86_32) || defined(CONFIG_IA32_EMULATION)
 static __always_inline unsigned int syscall_32_enter(struct pt_regs *regs)
 {
-   unsigned int nr = (unsigned int)regs->orig_ax;
-
if (IS_ENABLED(CONFIG_IA32_EMULATION))
current_thread_info()->status |= TS_COMPAT;
-   /*
-* Subtlety here: if ptrace pokes something larger than 2^32-1 into
-* orig_ax, the unsigned int return value truncates it.  This may
-* or may not be necessary, but it matches the old asm behavior.
-*/
-   return (unsigned int)syscall_enter_from_user_mode(regs, nr);
+
+   return (unsigned int)regs->orig_ax;
 }
 
 /*
@@ -91,15 +85,29 @@ __visible noinstr void do_int80_syscall_32(struct pt_regs *regs)
 {
unsigned int nr = syscall_32_enter(regs);
 
+   /*
+* Subtlety here: if ptrace pokes something larger than 2^32-1 into
+* orig_ax, the unsigned int return value truncates it.  This may
+* or may not be necessary, but it matches the old asm behavior.
+*/
+   nr = (unsigned int)syscall_enter_from_user_mode(regs, nr);
+
do_syscall_32_irqs_on(regs, nr);
syscall_exit_to_user_mode(regs);
 }
 
 static noinstr bool __do_fast_syscall_32(struct pt_regs *regs)
 {
-   unsigned int nr = syscall_32_enter(regs);
+   unsigned int nr = syscall_32_enter(regs);
int res;
 
+   /*
+* This cannot use syscall_enter_from_user_mode() as it has to
+* fetch EBP before invoking any of the syscall entry work
+* functions.
+*/
+   syscall_enter_from_user_mode_prepare(regs);
+
instrumentation_begin();
/* Fetch EBP from where the vDSO stashed it. */
if (IS_ENABLED(CONFIG_X86_64)) {
@@ -122,6 +130,9 @@ static noinstr bool __do_fast_syscall_32(struct pt_regs *regs)
return false;
}
 
+   /* The case truncates any ptrace induced syscall nr > 2^32 -1 */
+   nr = (unsigned int)syscall_enter_from_user_mode_work(regs, nr);
+
/* Now this is just like a normal syscall. */
do_syscall_32_irqs_on(regs, nr);
syscall_exit_to_user_mode(regs);
diff --git a/arch/x86/include/asm/entry-common.h b/arch/x86/include/asm/entry-common.h
index a8f9315b9eae..6fe54b2813c1 100644
--- a/arch/x86/include/asm/entry-common.h
+++ b/arch/x86/include/asm/entry-common.h
@@ -18,8 +18,16 @@ static __always_inline void arch_check_user_regs(struct pt_regs *regs)
 * state, not the interrupt state as imagined by Xen.
 */
unsigned long flags = native_save_fl();
-   

Re: [PATCH v7 0/5] Warn on orphan section placement

2020-09-06 Thread Ingo Molnar


* Kees Cook  wrote:

> On Fri, Sep 04, 2020 at 07:58:25AM +0200, Ingo Molnar wrote:
> > 
> > * Nick Desaulniers  wrote:
> > 
> > > On Tue, Sep 1, 2020 at 7:53 PM Kees Cook  wrote:
> > > >
> > > > Hi Ingo,
> > > >
> > > > The ever-shortening series. ;) Here is "v7", which is just the remaining
> > > > Makefile changes to enable orphan section warnings, now updated to
> > > > include ld-option calls.
> > > >
> > > > Thanks for getting this all into -tip!
> > > 
> > > For the series,
> > > Reviewed-by: Nick Desaulniers 
> > > 
> > > As the recent ppc vdso boogaloo exposed, what about the vdsos?
> > > * arch/x86/entry/vdso/Makefile
> > > * arch/arm/vdso/Makefile
> > > * arch/arm64/kernel/vdso/Makefile
> > > * arch/arm64/kernel/vdso32/Makefile
> > 
> > Kees, will these patches DTRT for the vDSO builds? I will be unable to test 
> > these patches on that old system until tomorrow at the earliest.
> 
> I would like to see VDSO done next, but it's entirely separate from
> this series. This series only touches the core kernel build (i.e. via the
> interactions with scripts/link-vmlinux.sh) or the boot stubs. So there
> is no impact on VDSO linking.

Great!

I also double checked that things still build fine with ancient LD.

> > I'm keeping these latest changes in WIP.core/build for now.
> 
> They should be safe to land in -next, which is important so we can shake
> out any other sneaky sections that all our existing testing hasn't
> found. :)

OK, cool - I've graduated them over into tip:core/build. :-)

Thanks,

Ingo


Re: [PATCH] MAINTAINERS: Add myself as SCHED_DEADLINE reviewer

2020-09-04 Thread Ingo Molnar


* Daniel Bristot de Oliveira  wrote:

> As discussed with Juri and Peter.
> 
> Signed-off-by: Daniel Bristot de Oliveira 

Welcome Daniel! :-)

I've applied the patch to tip:sched/core.

Thanks,

Ingo


Re: remove the last set_fs() in common code, and remove it for x86 and powerpc v3

2020-09-04 Thread Ingo Molnar


* Christoph Hellwig  wrote:

> Hi all,
> 
> this series removes the last set_fs() used to force a kernel address
> space for the uaccess code in the kernel read/write/splice code, and then
> stops implementing the address space overrides entirely for x86 and
> powerpc.

Cool! For the x86 bits:

  Acked-by: Ingo Molnar 

Thanks,

Ingo


Re: [PATCH v7 0/5] Warn on orphan section placement

2020-09-03 Thread Ingo Molnar


* Nick Desaulniers  wrote:

> On Tue, Sep 1, 2020 at 7:53 PM Kees Cook  wrote:
> >
> > Hi Ingo,
> >
> > The ever-shortening series. ;) Here is "v7", which is just the remaining
> > Makefile changes to enable orphan section warnings, now updated to
> > include ld-option calls.
> >
> > Thanks for getting this all into -tip!
> 
> For the series,
> Reviewed-by: Nick Desaulniers 
> 
> As the recent ppc vdso boogaloo exposed, what about the vdsos?
> * arch/x86/entry/vdso/Makefile
> * arch/arm/vdso/Makefile
> * arch/arm64/kernel/vdso/Makefile
> * arch/arm64/kernel/vdso32/Makefile

Kees, will these patches DTRT for the vDSO builds? I will be unable to test 
these patches on that old system until tomorrow at the earliest.

I'm keeping these latest changes in WIP.core/build for now.

Thanks,

Ingo


Re: Re: [PATCH v2] debugobjects: install cpu hotplug callback

2020-09-03 Thread Ingo Molnar


* Zhang, Qiang  wrote:

> tglx please review.
> 
> Thanks
> Qiang
> 
> From: linux-kernel-ow...@vger.kernel.org  
> on behalf of qiang.zh...@windriver.com 
> Sent: August 27, 2020 13:06
> To: t...@linutronix.de; long...@redhat.com; el...@google.com
> Cc: linux-kernel@vger.kernel.org
> Subject: [PATCH v2] debugobjects: install cpu hotplug callback
> 
> From: Zqiang 
> 
> Due to cpu hotplug, it may never be online after it's offline,
> some objects in percpu pool is never free, in order to avoid
> this happening, install cpu hotplug callback, call this callback
> func to free objects in percpu pool when cpu going offline.

We capitalize 'CPU'. Also, please split this in at least two sentences.

> 
> Signed-off-by: Zqiang 
> ---
>  v1->v2:
>  Modify submission information.
> 
>  include/linux/cpuhotplug.h |  1 +
>  lib/debugobjects.c | 23 +++
>  2 files changed, 24 insertions(+)
> 
> diff --git a/include/linux/cpuhotplug.h b/include/linux/cpuhotplug.h
> index a2710e654b64..2e77db655cfa 100644
> --- a/include/linux/cpuhotplug.h
> +++ b/include/linux/cpuhotplug.h
> @@ -36,6 +36,7 @@ enum cpuhp_state {
> CPUHP_X86_MCE_DEAD,
> CPUHP_VIRT_NET_DEAD,
> CPUHP_SLUB_DEAD,
> +   CPUHP_DEBUG_OBJ_DEAD,
> CPUHP_MM_WRITEBACK_DEAD,
> CPUHP_MM_VMSTAT_DEAD,
> CPUHP_SOFTIRQ_DEAD,
> diff --git a/lib/debugobjects.c b/lib/debugobjects.c
> index fe4557955d97..50e21ed0519e 100644
> --- a/lib/debugobjects.c
> +++ b/lib/debugobjects.c
> @@ -19,6 +19,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
> 
>  #define ODEBUG_HASH_BITS   14
>  #define ODEBUG_HASH_SIZE   (1 << ODEBUG_HASH_BITS)
> @@ -433,6 +434,23 @@ static void free_object(struct debug_obj *obj)
> }
>  }
> 
> +#if defined(CONFIG_HOTPLUG_CPU)
> +static int object_cpu_offline(unsigned int cpu)
> +{
> +   struct debug_percpu_free *percpu_pool;
> +   struct hlist_node *tmp;
> +   struct debug_obj *obj;
> +
> +   percpu_pool = per_cpu_ptr(_obj_pool, cpu);
> +   hlist_for_each_entry_safe(obj, tmp, _pool->free_objs, node) {
> +   hlist_del(>node);
> +   kmem_cache_free(obj_cache, obj);
> +   }
> +
> +   return 0;
> +}
> +#endif

What happens to ->obj_free, if the CPU is brought back online? Won't it be 
out of sync at that point?

> +#if defined(CONFIG_HOTPLUG_CPU)

There's a shorter preprocessor sequence for that pattern.

Thanks,

Ingo


Re: linux-next: build failure after merge of the tip tree

2020-09-02 Thread Ingo Molnar


* Stephen Rothwell  wrote:

> Hi all,
> 
> After merging the tip tree, today's linux-next build (powerpc
> ppc64_defconfig) failed like this:
> 
> 
> Caused by commit
> 
>   f670269a42bf ("x86: Fix early boot crash on gcc-10, next try")
> 
> interacting with commit
> 
>   a9a3ed1eff36 ("x86: Fix early boot crash on gcc-10, third try")
> 
> from Linus' tree (v5.7-rc6) - the automatic merge did not go well.
> 
> I have added this patch for today (it removes the older version).
> 
> From: Stephen Rothwell 
> Date: Thu, 3 Sep 2020 12:31:13 +1000
> Subject: [PATCH] merge fix for compiler.h

I've merged the old commit by mistake - it's removed now.

Thanks,

Ingo


Re: [PATCH v6 00/29] Warn on orphan section placement

2020-09-01 Thread Ingo Molnar


* Ingo Molnar  wrote:

> 
> * Ingo Molnar  wrote:
> 
> > 
> > * Kees Cook  wrote:
> > 
> > > On Fri, Aug 21, 2020 at 12:42:41PM -0700, Kees Cook wrote:
> > > > Hi Ingo,
> > > > 
> > > > Based on my testing, this is ready to go. I've reviewed the feedback on
> > > > v5 and made a few small changes, noted below.
> > > 
> > > If no one objects, I'll pop this into my tree for -next. I'd prefer it
> > > go via -tip though! :)
> > > 
> > > Thanks!
> > 
> > I'll pick it up today, it all looks very good now!
> 
> One thing I found in testing is that it doesn't handle older LD 
> versions well enough:
> 
>   ld: unrecognized option '--orphan-handling=warn'
> 
> Could we just detect the availability of this flag, and emit a warning 
> if it doesn't exist but otherwise not abort the build?
> 
> This is with:
> 
>   GNU ld version 2.25-17.fc23

I've resolved this for now by not applying the 5 patches that add the 
actual orphan section warnings:

  arm64/build: Warn on orphan section placement
  arm/build: Warn on orphan section placement
  arm/boot: Warn on orphan section placement
  x86/build: Warn on orphan section placement
  x86/boot/compressed: Warn on orphan section placement

The new asserts plus the actual fixes/enhancements are enough changes 
to test for now in any case. :-)

Thanks,

Ingo


Re: [PATCH v6 00/29] Warn on orphan section placement

2020-09-01 Thread Ingo Molnar


* Ingo Molnar  wrote:

> 
> * Kees Cook  wrote:
> 
> > On Fri, Aug 21, 2020 at 12:42:41PM -0700, Kees Cook wrote:
> > > Hi Ingo,
> > > 
> > > Based on my testing, this is ready to go. I've reviewed the feedback on
> > > v5 and made a few small changes, noted below.
> > 
> > If no one objects, I'll pop this into my tree for -next. I'd prefer it
> > go via -tip though! :)
> > 
> > Thanks!
> 
> I'll pick it up today, it all looks very good now!

One thing I found in testing is that it doesn't handle older LD 
versions well enough:

  ld: unrecognized option '--orphan-handling=warn'

Could we just detect the availability of this flag, and emit a warning 
if it doesn't exist but otherwise not abort the build?

This is with:

  GNU ld version 2.25-17.fc23

Thanks,

Ingo


Re: [PATCH v6 00/29] Warn on orphan section placement

2020-09-01 Thread Ingo Molnar


* Kees Cook  wrote:

> On Fri, Aug 21, 2020 at 12:42:41PM -0700, Kees Cook wrote:
> > Hi Ingo,
> > 
> > Based on my testing, this is ready to go. I've reviewed the feedback on
> > v5 and made a few small changes, noted below.
> 
> If no one objects, I'll pop this into my tree for -next. I'd prefer it
> go via -tip though! :)
> 
> Thanks!

I'll pick it up today, it all looks very good now!

Thanks,

Ingo


Re: [PATCH] kbuild: Run syncconfig with -s

2020-08-24 Thread Ingo Molnar


* Masahiro Yamada  wrote:

> On Thu, Aug 20, 2020 at 3:35 PM Ingo Molnar  wrote:
> >
> > On every kernel build that runs --syncconfig, there's an output of the 
> > following line:
> >
> >   scripts/kconfig/conf  --syncconfig Kconfig
> >
> > This is the only non-platform build message the kbuild system emits that 
> > isn't
> > prefixed by at least a space, or is a build warning.
> >
> > Run it under -s - if there's any problem it will emit messages anyway.
> >
> > With this change the following simple grep filter will show all build 
> > warnings
> > and errors of a kernel build:
> >
> >make | grep -v '^ '
> 
> 
> 
> I do want to see something when syncconfig is invoked.
> 
> I will apply this instead:
> https://patchwork.kernel.org/patch/11727445/

That's perfect, thank you very much!

Ingo


[PATCH] kbuild: Run syncconfig with -s

2020-08-20 Thread Ingo Molnar
On every kernel build that runs --syncconfig, there's an output of the 
following line:

  scripts/kconfig/conf  --syncconfig Kconfig

This is the only non-platform build message the kbuild system emits that isn't
prefixed by at least a space, or is a build warning.

Run it under -s - if there's any problem it will emit messages anyway.

With this change the following simple grep filter will show all build warnings
and errors of a kernel build:

   make | grep -v '^ '

Signed-off-by: Ingo Molnar 
---
 Makefile | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Makefile b/Makefile
index 9cac6fde3479..2057c92a6205 100644
--- a/Makefile
+++ b/Makefile
@@ -709,7 +709,7 @@ $(KCONFIG_CONFIG):
 # The syncconfig should be executed only once to make all the targets.
 # (Note: use the grouped target '&:' when we bump to GNU Make 4.3)
 %/config/auto.conf %/config/auto.conf.cmd %/generated/autoconf.h: 
$(KCONFIG_CONFIG)
-   $(Q)$(MAKE) -f $(srctree)/Makefile syncconfig
+   $(Q)$(MAKE) -sf $(srctree)/Makefile syncconfig
 else # !may-sync-config
 # External modules and some install targets need include/generated/autoconf.h
 # and include/config/auto.conf but do not care if they are up-to-date.


[tip: x86/build] x86/build: Declutter the build output

2020-08-20 Thread tip-bot2 for Ingo Molnar
The following commit has been merged into the x86/build branch of tip:

Commit-ID: 642d94cf336fe57675e63a91d11f53d74b9a3f9f
Gitweb:
https://git.kernel.org/tip/642d94cf336fe57675e63a91d11f53d74b9a3f9f
Author:Ingo Molnar 
AuthorDate:Thu, 20 Aug 2020 08:17:40 +02:00
Committer: Ingo Molnar 
CommitterDate: Thu, 20 Aug 2020 08:17:40 +02:00

x86/build: Declutter the build output

We have some really ancient debug printouts in the x86 boot image build code:

  Setup is 14108 bytes (padded to 14336 bytes).
  System is 8802 kB
  CRC 27e909d4

None of these ever helped debug any sort of breakage that I know of, and they
clutter the build output.

Remove them - if anyone needs to see the various interim stages of this to
debug an obscure bug, they can add these printfs and more.

We still keep this one:

  Kernel: arch/x86/boot/bzImage is ready  (#19)

As a sentimental leftover, plus the '#19' build count tag is mildly useful.

Signed-off-by: Ingo Molnar 
Cc: linux-kernel@vger.kernel.org
Cc: x...@kernel.org
---
 arch/x86/boot/tools/build.c | 4 
 1 file changed, 4 deletions(-)

diff --git a/arch/x86/boot/tools/build.c b/arch/x86/boot/tools/build.c
index c8b8c1a..a3725ad 100644
--- a/arch/x86/boot/tools/build.c
+++ b/arch/x86/boot/tools/build.c
@@ -416,8 +416,6 @@ int main(int argc, char ** argv)
/* Set the default root device */
put_unaligned_le16(DEFAULT_ROOT_DEV, [508]);
 
-   printf("Setup is %d bytes (padded to %d bytes).\n", c, i);
-
/* Open and stat the kernel file */
fd = open(argv[2], O_RDONLY);
if (fd < 0)
@@ -425,7 +423,6 @@ int main(int argc, char ** argv)
if (fstat(fd, ))
die("Unable to stat `%s': %m", argv[2]);
sz = sb.st_size;
-   printf("System is %d kB\n", (sz+1023)/1024);
kernel = mmap(NULL, sz, PROT_READ, MAP_SHARED, fd, 0);
if (kernel == MAP_FAILED)
die("Unable to mmap '%s': %m", argv[2]);
@@ -488,7 +485,6 @@ int main(int argc, char ** argv)
}
 
/* Write the CRC */
-   printf("CRC %x\n", crc);
put_unaligned_le32(crc, buf);
if (fwrite(buf, 1, 4, dest) != 4)
die("Writing CRC failed");


[tip: x86/cpu] x86/cpu: Fix typos and improve the comments in sync_core()

2020-08-19 Thread tip-bot2 for Ingo Molnar
The following commit has been merged into the x86/cpu branch of tip:

Commit-ID: 40eb0cb4939e462acfedea8c8064571e886b9773
Gitweb:
https://git.kernel.org/tip/40eb0cb4939e462acfedea8c8064571e886b9773
Author:Ingo Molnar 
AuthorDate:Tue, 18 Aug 2020 07:31:30 +02:00
Committer: Ingo Molnar 
CommitterDate: Wed, 19 Aug 2020 09:56:36 +02:00

x86/cpu: Fix typos and improve the comments in sync_core()

- Fix typos.

- Move the compiler barrier comment to the top, because it's valid for the
  whole function, not just the legacy branch.

Signed-off-by: Ingo Molnar 
Link: https://lore.kernel.org/r/20200818053130.ga3161...@gmail.com
Reviewed-by: Ricardo Neri 
---
 arch/x86/include/asm/sync_core.h | 16 
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/arch/x86/include/asm/sync_core.h b/arch/x86/include/asm/sync_core.h
index 4631c0f..0fd4a9d 100644
--- a/arch/x86/include/asm/sync_core.h
+++ b/arch/x86/include/asm/sync_core.h
@@ -47,16 +47,19 @@ static inline void iret_to_self(void)
  *
  *  b) Text was modified on a different CPU, may subsequently be
  * executed on this CPU, and you want to make sure the new version
- * gets executed.  This generally means you're calling this in a IPI.
+ * gets executed.  This generally means you're calling this in an IPI.
  *
  * If you're calling this for a different reason, you're probably doing
  * it wrong.
+ *
+ * Like all of Linux's memory ordering operations, this is a
+ * compiler barrier as well.
  */
 static inline void sync_core(void)
 {
/*
 * The SERIALIZE instruction is the most straightforward way to
-* do this but it not universally available.
+* do this, but it is not universally available.
 */
if (static_cpu_has(X86_FEATURE_SERIALIZE)) {
serialize();
@@ -67,10 +70,10 @@ static inline void sync_core(void)
 * For all other processors, there are quite a few ways to do this.
 * IRET-to-self is nice because it works on every CPU, at any CPL
 * (so it's compatible with paravirtualization), and it never exits
-* to a hypervisor. The only down sides are that it's a bit slow
+* to a hypervisor.  The only downsides are that it's a bit slow
 * (it seems to be a bit more than 2x slower than the fastest
-* options) and that it unmasks NMIs.  The "push %cs" is needed
-* because, in paravirtual environments, __KERNEL_CS may not be a
+* options) and that it unmasks NMIs.  The "push %cs" is needed,
+* because in paravirtual environments __KERNEL_CS may not be a
 * valid CS value when we do IRET directly.
 *
 * In case NMI unmasking or performance ever becomes a problem,
@@ -81,9 +84,6 @@ static inline void sync_core(void)
 * CPUID is the conventional way, but it's nasty: it doesn't
 * exist on some 486-like CPUs, and it usually exits to a
 * hypervisor.
-*
-* Like all of Linux's memory ordering operations, this is a
-* compiler barrier as well.
 */
iret_to_self();
 }


Re: [PATCH] x86/cpu: Fix typos and improve the comments in sync_core()

2020-08-19 Thread Ingo Molnar


* Ricardo Neri  wrote:

> > @@ -47,16 +47,19 @@ static inline void iret_to_self(void)
> >   *
> >   *  b) Text was modified on a different CPU, may subsequently be
> >   * executed on this CPU, and you want to make sure the new version
> > - * gets executed.  This generally means you're calling this in a IPI.
> > + * gets executed.  This generally means you're calling this in an IPI.
> >   *
> >   * If you're calling this for a different reason, you're probably doing
> >   * it wrong.
> > + *
> > + * Like all of Linux's memory ordering operations, this is a
> > + * compiler barrier as well.
> >   */
> >  static inline void sync_core(void)
> >  {
> > /*
> >  * The SERIALIZE instruction is the most straightforward way to
> > -* do this but it not universally available.
> > +* do this, but it is not universally available.
> 
> Indeed, I missed this grammar error.
> 
> >  */
> > if (static_cpu_has(X86_FEATURE_SERIALIZE)) {
> > serialize();
> > @@ -67,10 +70,10 @@ static inline void sync_core(void)
> >  * For all other processors, there are quite a few ways to do this.
> >  * IRET-to-self is nice because it works on every CPU, at any CPL
> >  * (so it's compatible with paravirtualization), and it never exits
> > -* to a hypervisor. The only down sides are that it's a bit slow
> > +* to a hypervisor.  The only downsides are that it's a bit slow

And this one - it's "downsides" not "down sides".

> >  * (it seems to be a bit more than 2x slower than the fastest
> > -* options) and that it unmasks NMIs.  The "push %cs" is needed
> > -* because, in paravirtual environments, __KERNEL_CS may not be a
> > +* options) and that it unmasks NMIs.  The "push %cs" is needed,
> > +* because in paravirtual environments __KERNEL_CS may not be a
> 
> I didn't realize that the double spaces after the period were part of the
> style.

They are not, but *consistent* use of typographic details is part of 
the style, and here we were mixing two styles within the same comment 
block.

> FWIW,
> 
> Reviewed-by: Ricardo Neri 

Thanks,

Ingo


[PATCH] x86/cpu: Fix typos and improve the comments in sync_core()

2020-08-17 Thread Ingo Molnar


* tip-bot2 for Ricardo Neri  wrote:

> --- a/arch/x86/include/asm/sync_core.h
> +++ b/arch/x86/include/asm/sync_core.h
> @@ -5,6 +5,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  #ifdef CONFIG_X86_32
>  static inline void iret_to_self(void)
> @@ -54,14 +55,23 @@ static inline void iret_to_self(void)
>  static inline void sync_core(void)
>  {
>   /*
> +  * The SERIALIZE instruction is the most straightforward way to
> +  * do this but it not universally available.
> +  */
> + if (static_cpu_has(X86_FEATURE_SERIALIZE)) {
> + serialize();
> + return;
> + }
> +
> + /*
> +  * For all other processors, there are quite a few ways to do this.
> +  * IRET-to-self is nice because it works on every CPU, at any CPL
> +  * (so it's compatible with paravirtualization), and it never exits
> +  * to a hypervisor. The only down sides are that it's a bit slow
> +  * (it seems to be a bit more than 2x slower than the fastest
> +  * options) and that it unmasks NMIs.  The "push %cs" is needed
> +  * because, in paravirtual environments, __KERNEL_CS may not be a
> +  * valid CS value when we do IRET directly.

So there's two typos in the new comments, there are at least two 
misapplied commas, it departs from existing style, and there's a typo 
in the existing comments as well.

Also, before this patch the 'compiler barrier' comment was valid for 
the whole function (there was no branching), but after this patch it 
reads as if it was only valid for the legacy IRET-to-self branch.

Which together broke my detector and triggered a bit of compulsive 
bike-shed painting. ;-) See the resulting patch below.

Thanks,

Ingo

>
From: Ingo Molnar 
Date: Tue, 18 Aug 2020 07:24:05 +0200
Subject: [PATCH] x86/cpu: Fix typos and improve the comments in sync_core()

- Fix typos.

- Move the compiler barrier comment to the top, because it's valid for the
  whole function, not just the legacy branch.

Signed-off-by: Ingo Molnar 
---
 arch/x86/include/asm/sync_core.h | 16 
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/arch/x86/include/asm/sync_core.h b/arch/x86/include/asm/sync_core.h
index 4631c0f969d4..0fd4a9dfb29c 100644
--- a/arch/x86/include/asm/sync_core.h
+++ b/arch/x86/include/asm/sync_core.h
@@ -47,16 +47,19 @@ static inline void iret_to_self(void)
  *
  *  b) Text was modified on a different CPU, may subsequently be
  * executed on this CPU, and you want to make sure the new version
- * gets executed.  This generally means you're calling this in a IPI.
+ * gets executed.  This generally means you're calling this in an IPI.
  *
  * If you're calling this for a different reason, you're probably doing
  * it wrong.
+ *
+ * Like all of Linux's memory ordering operations, this is a
+ * compiler barrier as well.
  */
 static inline void sync_core(void)
 {
/*
 * The SERIALIZE instruction is the most straightforward way to
-* do this but it not universally available.
+* do this, but it is not universally available.
 */
if (static_cpu_has(X86_FEATURE_SERIALIZE)) {
serialize();
@@ -67,10 +70,10 @@ static inline void sync_core(void)
 * For all other processors, there are quite a few ways to do this.
 * IRET-to-self is nice because it works on every CPU, at any CPL
 * (so it's compatible with paravirtualization), and it never exits
-* to a hypervisor. The only down sides are that it's a bit slow
+* to a hypervisor.  The only downsides are that it's a bit slow
 * (it seems to be a bit more than 2x slower than the fastest
-* options) and that it unmasks NMIs.  The "push %cs" is needed
-* because, in paravirtual environments, __KERNEL_CS may not be a
+* options) and that it unmasks NMIs.  The "push %cs" is needed,
+* because in paravirtual environments __KERNEL_CS may not be a
 * valid CS value when we do IRET directly.
 *
 * In case NMI unmasking or performance ever becomes a problem,
@@ -81,9 +84,6 @@ static inline void sync_core(void)
 * CPUID is the conventional way, but it's nasty: it doesn't
 * exist on some 486-like CPUs, and it usually exits to a
 * hypervisor.
-*
-* Like all of Linux's memory ordering operations, this is a
-* compiler barrier as well.
 */
iret_to_self();
 }


Re: [PATCH] Makefile: Yes. Finally remove '-Wdeclaration-after-statement'

2020-08-17 Thread Ingo Molnar


* Linus Torvalds  wrote:

> On Mon, Aug 17, 2020 at 3:09 PM Pavel Machek  wrote:
> >
> > Submitter believes "wild variable placement" can help with
> > #ifdefs.. and that may be actually good tradeoff.
> 
> I agree that it can help in some cases.
> 
> But it can also make it really hard to find the variable declarations
> in other cases. I've seen a lot of code that ends up actively
> declaring the variable close to where it's used (because people find
> that to be locally more legible) and then it just means that people
> who arent' familiar with the code have a much harder time finding it.
> 
> I'd instead try to discourage people from using #ifdef's inside code.

I'm a big fan of -Wdeclaration-after-statement and I think C++ style 
mixed variables/statements code has several disadvantages:

- One advantage of -Wdeclaration-after-statement is that it can detect 
  mismerges that can happen with the 'patch' tool when it applies a 
  patch with fuzz.

- Also, enforcing -Wdeclaration-after-statement means we have the nice 
  symmetry that local variable declarations are always at the 
  beginning of curly brace blocks, which includes function 
  definitions. This IMO is a very helpful visual clue that allows the 
  quick reading of kernel code.

- A third advantage is that the grouping of local variables at the 
  beginning of curly brace blocks encourages smaller, better 
  structured functions: a large function would look automatically ugly 
  due to the many local variables crammed at the beginning of it.

So the gentle code structure message is: you can declare new local 
variables in a loop construct or branch, at the cost of losing one 
level of indentation. If it gets too deep, you are encouraged to split 
your logic up better with helper functions. The kind of run-on 
mega-functions that C++ style mixed variables often allow looks 
*automatically* uglier under -Wdeclaration-after-statement and quickly 
breaks simple kernel style rules such as col80 or indentation level 
depth or the too high visual complexity of variable definition lines.

Basically the removal of -Wdeclaration-after-statement removes a 
helpful symmetry & allows the addition of random noise to our code 
base, with very little benefits offered. I'd be sad to see it go.

Thanks,

Ingo


Re: [PATCH v3] kunit: added lockdep support

2020-08-15 Thread Ingo Molnar


* Peter Zijlstra  wrote:

> On Sat, Aug 15, 2020 at 10:30:29AM +0200, Ingo Molnar wrote:
> > 
> > * Uriel Guajardo  wrote:
> > 
> > > From: Uriel Guajardo 
> > > 
> > > KUnit will fail tests upon observing a lockdep failure. Because lockdep
> > > turns itself off after its first failure, only fail the first test and
> > > warn users to not expect any future failures from lockdep.
> > > 
> > > Similar to lib/locking-selftest [1], we check if the status of
> > > debug_locks has changed after the execution of a test case. However, we
> > > do not reset lockdep afterwards.
> > > 
> > > Like the locking selftests, we also fix possible preemption count
> > > corruption from lock bugs.
> > 
> > > --- a/lib/kunit/Makefile
> > > +++ b/lib/kunit/Makefile
> > 
> > > +void kunit_check_lockdep(struct kunit *test, struct kunit_lockdep 
> > > *lockdep) {
> > > + int saved_preempt_count = lockdep->preempt_count;
> > > + bool saved_debug_locks = lockdep->debug_locks;
> > > +
> > > + if (DEBUG_LOCKS_WARN_ON(preempt_count() != saved_preempt_count))
> > > + preempt_count_set(saved_preempt_count);
> > > +
> > > +#ifdef CONFIG_TRACE_IRQFLAGS
> > > + if (softirq_count())
> > > + current->softirqs_enabled = 0;
> > > + else
> > > + current->softirqs_enabled = 1;
> > > +#endif
> > > +
> > > + if (saved_debug_locks && !debug_locks) {
> > > + kunit_set_failure(test);
> > > + kunit_warn(test, "Dynamic analysis tool failure from LOCKDEP.");
> > > + kunit_warn(test, "Further tests will have LOCKDEP disabled.");
> > > + }
> > 
> > 
> > So this basically duplicates what the boot-time locking self-tests do, 
> > in a poor fashion?
> 
> No, it makes sure that any kunit based self-test fails when it messes up
> it's locking.

We have a flag for whether lockdep is running though, so is this 
basically a very complicated way to parse /proc/lockdep_debug? :-)

Thanks,

Ingo


Re: [PATCH 2/2] x86/MCE/AMD Support new memory interleaving schemes during address translation

2020-08-15 Thread Ingo Molnar


* Yazen Ghannam  wrote:

> + /* Read D18F1x208 (System Fabric ID Mask 0). */
> + if (amd_df_indirect_read(nid, 1, 0x208, umc, ))
> + goto out_err;
> +
> + /* Determine if system is a legacy Data Fabric type. */
> + legacy_df = !(tmp & 0xFF);

1)

I see this pattern in a lot of places in the code, first the magic 
constant 0x208 is explained a comment, then it is *repeated* and used 
it in the code...

How about introducing an obviously named enum for it instead, which 
would then be self-documenting, saving the comment and removing magic 
numbers:

if (amd_df_indirect_read(nid, 1, AMD_REG_FAB_ID, umc, _fab_id))
goto out_err;

(The symbolic name should be something better, I just guessed 
something quickly.)

Please clean this up in a separate patch, not part of the already 
large patch that introduces a new feature.

2)

'tmp & 0xFF' is some sort of fabric version ID value, with a value of 
0 denoting legacy (pre-Rome) systems, right?

How about making that explicit:

df_version = reg_fab_id & 0xFF;

I'm pretty sure such a version ID might come in handy later on, should 
there be quirks or new capabilities with the newer systems ...


>   ret_addr -= hi_addr_offset;
> @@ -728,23 +740,31 @@ int umc_normaddr_to_sysaddr(u64 norm_addr, u16 nid, u8 
> umc, u64 *sys_addr)
>   }
>  
>   lgcy_mmio_hole_en = tmp & BIT(1);
> - intlv_num_chan= (tmp >> 4) & 0xF;
> - intlv_addr_sel= (tmp >> 8) & 0x7;
> - dram_base_addr= (tmp & GENMASK_ULL(31, 12)) << 16;
>  
> - /* {0, 1, 2, 3} map to address bits {8, 9, 10, 11} respectively */
> - if (intlv_addr_sel > 3) {
> - pr_err("%s: Invalid interleave address select %d.\n",
> - __func__, intlv_addr_sel);
> - goto out_err;
> + if (legacy_df) {
> + intlv_num_chan= (tmp >> 4) & 0xF;
> + intlv_addr_sel= (tmp >> 8) & 0x7;
> + } else {
> + intlv_num_chan= (tmp >> 2) & 0xF;
> + intlv_num_dies= (tmp >> 6) & 0x3;
> + intlv_num_sockets = (tmp >> 8) & 0x1;
> + intlv_addr_sel= (tmp >> 9) & 0x7;
>   }
>  
> + dram_base_addr= (tmp & GENMASK_ULL(31, 12)) << 16;
> +
>   /* Read D18F0x114 (DramLimitAddress). */
>   if (amd_df_indirect_read(nid, 0, 0x114 + (8 * base), umc, ))
>   goto out_err;
>  
> - intlv_num_sockets = (tmp >> 8) & 0x1;
> - intlv_num_dies= (tmp >> 10) & 0x3;
> + if (legacy_df) {
> + intlv_num_sockets = (tmp >> 8) & 0x1;
> + intlv_num_dies= (tmp >> 10) & 0x3;
> + dst_fabric_id = tmp & 0xFF;
> + } else {
> + dst_fabric_id = tmp & 0x3FF;
> + }
> +
>   dram_limit_addr   = ((tmp & GENMASK_ULL(31, 12)) << 16) | 
> GENMASK_ULL(27, 0);

Could we please structure this code in a bit more readable fashion?

1)

Such as not using the meaningless 'tmp' variable name to first read 
out DramOffset, then DramLimitAddress?

How about naming them a bit more obviously, and retrieving them in a 
single step:

if (amd_df_indirect_read(nid, 0, 0x1B4, umc, _dram_off))
goto out_err;

/* Remove HiAddrOffset from normalized address, if enabled: */
if (reg_dram_off & BIT(0)) {
        u64 hi_addr_offset = (reg_dram_off & GENMASK_ULL(31, 20)) << 8;

if (norm_addr >= hi_addr_offset) {
ret_addr -= hi_addr_offset;
base = 1;
}
}

if (amd_df_indirect_read(nid, 0, 0x114 + (8 * base), umc, 
_dram_lim))
goto out_err;

('reg' stands for register value - but 'val' would work too.)

Side note: why is the above code using BIT() and GENMASK_ULL() when 
all the other and new code is using fixed masks? Use one of these 
versions instead of a weird mix ...

2)

Then all the fabric version dependent logic could be consolidated 
instead of being spread out:

if (df_version) {
intlv_num_chan= (reg_dram_off >>  2) & 0xF;
intlv_num_dies= (reg_dram_off >>  6) & 0x3;
intlv_num_sockets = (reg_dram_off >>  8) & 0x1;
intlv_addr_sel= (reg_dram_off >>  9) & 0x7;

dst_fabric_id = (reg_dram_lim >>  0) & 0x3FF;
} else {
intlv_num_chan= (reg_dram_off >>  4) & 0xF;
intlv_num_dies= (reg_dram_lim >> 10) & 0x3;
intlv_num_sockets = (reg_dram_lim >>  8) & 0x1;
intlv_addr_sel= (reg_dram_off >>  8) & 0x7;

dst_fabric_id = (reg_dram_lim >>  0) & 0xFF;
}

Also note a couple of more formatting & ordering edits I did to the 
code, to improve the structure. My copy & paste job is untested 
though.

3)

Notably, note how the new code on current systems is the first branch 
- that's the most interesting code most 

[GIT PULL] perf fixes

2020-08-15 Thread Ingo Molnar
Linus,

Please pull the latest perf/urgent git tree from:

   git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git 
perf-urgent-2020-08-15

   # HEAD: bcfd218b66790243ef303c1b35ce59f786ded225 perf/x86/rapl: Add support 
for Intel SPR platform

Misc fixes, an expansion of perf syscall access to CAP_PERFMON privileged tools,
plus a RAPL HW-enablement for Intel SPR platforms.

 Thanks,

Ingo

-->
Alexey Budankov (1):
  perf/core: Take over CAP_SYS_PTRACE creds to CAP_PERFMON capability

Bhupesh Sharma (1):
  hw_breakpoint: Remove unused __register_perf_hw_breakpoint() declaration

Masami Hiramatsu (1):
  kprobes: Remove show_registers() function prototype

Zhang Rui (3):
  perf/x86/rapl: Fix missing psys sysfs attributes
  perf/x86/rapl: Support multiple RAPL unit quirks
  perf/x86/rapl: Add support for Intel SPR platform


 arch/x86/events/rapl.c| 46 +--
 include/linux/hw_breakpoint.h |  3 ---
 include/linux/kprobes.h   |  1 -
 kernel/events/core.c  |  4 ++--
 4 files changed, 38 insertions(+), 16 deletions(-)

diff --git a/arch/x86/events/rapl.c b/arch/x86/events/rapl.c
index 68b38820b10e..67b411f7e8c4 100644
--- a/arch/x86/events/rapl.c
+++ b/arch/x86/events/rapl.c
@@ -130,11 +130,17 @@ struct rapl_pmus {
struct rapl_pmu *pmus[];
 };
 
+enum rapl_unit_quirk {
+   RAPL_UNIT_QUIRK_NONE,
+   RAPL_UNIT_QUIRK_INTEL_HSW,
+   RAPL_UNIT_QUIRK_INTEL_SPR,
+};
+
 struct rapl_model {
struct perf_msr *rapl_msrs;
unsigned long   events;
unsigned intmsr_power_unit;
-   boolapply_quirk;
+   enum rapl_unit_quirkunit_quirk;
 };
 
  /* 1/2^hw_unit Joule */
@@ -612,14 +618,28 @@ static int rapl_check_hw_unit(struct rapl_model *rm)
for (i = 0; i < NR_RAPL_DOMAINS; i++)
rapl_hw_unit[i] = (msr_rapl_power_unit_bits >> 8) & 0x1FULL;
 
+   switch (rm->unit_quirk) {
/*
 * DRAM domain on HSW server and KNL has fixed energy unit which can be
 * different than the unit from power unit MSR. See
 * "Intel Xeon Processor E5-1600 and E5-2600 v3 Product Families, V2
 * of 2. Datasheet, September 2014, Reference Number: 330784-001 "
 */
-   if (rm->apply_quirk)
+   case RAPL_UNIT_QUIRK_INTEL_HSW:
+   rapl_hw_unit[PERF_RAPL_RAM] = 16;
+   break;
+   /*
+* SPR shares the same DRAM domain energy unit as HSW, plus it
+* also has a fixed energy unit for Psys domain.
+*/
+   case RAPL_UNIT_QUIRK_INTEL_SPR:
rapl_hw_unit[PERF_RAPL_RAM] = 16;
+   rapl_hw_unit[PERF_RAPL_PSYS] = 0;
+   break;
+   default:
+   break;
+   }
+
 
/*
 * Calculate the timer rate:
@@ -665,7 +685,7 @@ static const struct attribute_group *rapl_attr_update[] = {
&rapl_events_cores_group,
&rapl_events_pkg_group,
&rapl_events_ram_group,
&rapl_events_gpu_group,
-   &rapl_events_gpu_group,
+   &rapl_events_psys_group,
NULL,
 };
 
@@ -698,7 +718,6 @@ static struct rapl_model model_snb = {
.events = BIT(PERF_RAPL_PP0) |
  BIT(PERF_RAPL_PKG) |
  BIT(PERF_RAPL_PP1),
-   .apply_quirk= false,
.msr_power_unit = MSR_RAPL_POWER_UNIT,
.rapl_msrs  = intel_rapl_msrs,
 };
@@ -707,7 +726,6 @@ static struct rapl_model model_snbep = {
.events = BIT(PERF_RAPL_PP0) |
  BIT(PERF_RAPL_PKG) |
  BIT(PERF_RAPL_RAM),
-   .apply_quirk= false,
.msr_power_unit = MSR_RAPL_POWER_UNIT,
.rapl_msrs  = intel_rapl_msrs,
 };
@@ -717,7 +735,6 @@ static struct rapl_model model_hsw = {
  BIT(PERF_RAPL_PKG) |
  BIT(PERF_RAPL_RAM) |
  BIT(PERF_RAPL_PP1),
-   .apply_quirk= false,
.msr_power_unit = MSR_RAPL_POWER_UNIT,
.rapl_msrs  = intel_rapl_msrs,
 };
@@ -726,7 +743,7 @@ static struct rapl_model model_hsx = {
.events = BIT(PERF_RAPL_PP0) |
  BIT(PERF_RAPL_PKG) |
  BIT(PERF_RAPL_RAM),
-   .apply_quirk= true,
+   .unit_quirk = RAPL_UNIT_QUIRK_INTEL_HSW,
.msr_power_unit = MSR_RAPL_POWER_UNIT,
.rapl_msrs  = intel_rapl_msrs,
 };
@@ -734,7 +751,7 @@ static struct rapl_model model_hsx = {
 static struct rapl_model model_knl = {
.events = BIT(PERF_RAPL_PKG) |
  BIT(PERF_RAPL_RAM),
-   .apply_quirk= true,
+   .unit_quirk = RAPL_UNIT_QUIRK_INTEL_HSW,
.msr_power_unit = MSR_RAPL_POWER_UNIT,
.rapl_msrs  = intel_rapl_msrs,
 };
@@ -745,14 +762,22 @@ static struct rapl_model model_skl = {
  BIT(PERF_RAPL_RAM) |
  BIT(PERF_RAPL_PP1) |
   

Re: [PATCH v3] kunit: added lockdep support

2020-08-15 Thread Ingo Molnar


* Uriel Guajardo  wrote:

> From: Uriel Guajardo 
> 
> KUnit will fail tests upon observing a lockdep failure. Because lockdep
> turns itself off after its first failure, only fail the first test and
> warn users to not expect any future failures from lockdep.
> 
> Similar to lib/locking-selftest [1], we check if the status of
> debug_locks has changed after the execution of a test case. However, we
> do not reset lockdep afterwards.
> 
> Like the locking selftests, we also fix possible preemption count
> corruption from lock bugs.

> --- a/lib/kunit/Makefile
> +++ b/lib/kunit/Makefile

> +void kunit_check_lockdep(struct kunit *test, struct kunit_lockdep *lockdep) {
> + int saved_preempt_count = lockdep->preempt_count;
> + bool saved_debug_locks = lockdep->debug_locks;
> +
> + if (DEBUG_LOCKS_WARN_ON(preempt_count() != saved_preempt_count))
> + preempt_count_set(saved_preempt_count);
> +
> +#ifdef CONFIG_TRACE_IRQFLAGS
> + if (softirq_count())
> + current->softirqs_enabled = 0;
> + else
> + current->softirqs_enabled = 1;
> +#endif
> +
> + if (saved_debug_locks && !debug_locks) {
> + kunit_set_failure(test);
> + kunit_warn(test, "Dynamic analysis tool failure from LOCKDEP.");
> + kunit_warn(test, "Further tests will have LOCKDEP disabled.");
> + }


So this basically duplicates what the boot-time locking self-tests do, 
in a poor fashion?

Instead of duplicating unit tests, the right solution would be to 
generalize the locking self-tests and use them both during bootup and 
in kunit.

Thanks,

Ingo


[GIT PULL] locking fixes

2020-08-15 Thread Ingo Molnar
Linus,

Please pull the latest locking/urgent git tree from:

   git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git 
locking-urgent-2020-08-15

   # HEAD: 405fa8ac89e7aaa87282df659e525992f2639e76 futex: Convert to use the 
preferred 'fallthrough' macro

A documentation fix and a 'fallthrough' macro update.

 Thanks,

Ingo

-->
Huang Shijie (1):
  Documentation/locking/locktypes: Fix a typo

Miaohe Lin (1):
  futex: Convert to use the preferred 'fallthrough' macro


 Documentation/locking/locktypes.rst | 2 +-
 kernel/futex.c  | 4 ++--
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/Documentation/locking/locktypes.rst 
b/Documentation/locking/locktypes.rst
index 1b577a8bf982..4cefed8048ca 100644
--- a/Documentation/locking/locktypes.rst
+++ b/Documentation/locking/locktypes.rst
@@ -10,7 +10,7 @@ Introduction
 
 
 The kernel provides a variety of locking primitives which can be divided
-into two categories:
+into three categories:
 
  - Sleeping locks
  - CPU local locks
diff --git a/kernel/futex.c b/kernel/futex.c
index 61e8153e6c76..a5876694a60e 100644
--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -3744,12 +3744,12 @@ long do_futex(u32 __user *uaddr, int op, u32 val, 
ktime_t *timeout,
switch (cmd) {
case FUTEX_WAIT:
val3 = FUTEX_BITSET_MATCH_ANY;
-   /* fall through */
+   fallthrough;
case FUTEX_WAIT_BITSET:
return futex_wait(uaddr, flags, val, timeout, val3);
case FUTEX_WAKE:
val3 = FUTEX_BITSET_MATCH_ANY;
-   /* fall through */
+   fallthrough;
case FUTEX_WAKE_BITSET:
return futex_wake(uaddr, flags, val, val3);
case FUTEX_REQUEUE:


Re: [PATCH] x86/fsgsbase/64: Fix NULL deref in 86_fsgsbase_read_task

2020-08-15 Thread Ingo Molnar


* Eric Dumazet  wrote:

> syzbot found its way in 86_fsgsbase_read_task() [1]
> 
> Fix is to make sure ldt pointer is not NULL.

Thanks for this fix. Linus has picked it up (including the typo in 
the x86_fsgsbase_read_task() function name ;-); it's now upstream 
under:

  8ab49526b53d: ("x86/fsgsbase/64: Fix NULL deref in 86_fsgsbase_read_task")

By the fixes tag it looks like this should probably be backported all 
the way back to ~v4.20 or so?

Thanks,

Ingo


[GIT PULL] x86 fixes

2020-08-15 Thread Ingo Molnar
Linus,

Please pull the latest x86/urgent git tree from:

   git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git 
x86-urgent-2020-08-15

   # HEAD: a6d996cbd38b42341ad3fce74506b9fdc280e395 x86/alternatives: Acquire 
pte lock with interrupts enabled

Misc fixes and small updates all around the place:

 - Fix mitigation state sysfs output
 - Fix an FPU xstate/sxave code assumption bug triggered by Architectural LBR 
support
 - Fix Lightning Mountain SoC TSC frequency enumeration bug
 - Fix kexec debug output
 - Fix kexec memory range assumption bug
 - Fix a boundary condition in the crash kernel code

 - Optimize purgatory.ro generation a bit
 - Enable ACRN guests to use X2APIC mode
 - Reduce a __text_poke() IRQs-off critical section for the benefit of 
PREEMPT_RT

 Thanks,

Ingo

-->
Dilip Kota (1):
  x86/tsr: Fix tsc frequency enumeration bug on Lightning Mountain SoC

Kan Liang (1):
  x86/fpu/xstate: Fix an xstate size check warning with architectural LBRs

Lianbo Jiang (3):
  x86/crash: Correct the address boundary of function parameters
  kexec: Improve & fix crash_exclude_mem_range() to handle overlapping 
ranges
  kexec_file: Correctly output debugging information for the PT_LOAD ELF 
header

Pawan Gupta (1):
  x86/bugs/multihit: Fix mitigation reporting when VMX is not in use

Pingfan Liu (1):
  x86/purgatory: Don't generate debug info for purgatory.ro

Sebastian Andrzej Siewior (1):
  x86/alternatives: Acquire pte lock with interrupts enabled

Shuo Liu (2):
  x86/acrn: Allow ACRN guest to use X2APIC mode
  x86/acrn: Remove redundant chars from ACRN signature


 Documentation/admin-guide/hw-vuln/multihit.rst |  4 +++
 arch/x86/kernel/alternative.c  |  6 ++--
 arch/x86/kernel/cpu/acrn.c | 12 +++-
 arch/x86/kernel/cpu/bugs.c |  8 -
 arch/x86/kernel/crash.c|  2 +-
 arch/x86/kernel/fpu/xstate.c   | 33 -
 arch/x86/kernel/tsc_msr.c  |  9 --
 arch/x86/purgatory/Makefile|  5 +++-
 kernel/kexec_file.c| 41 --
 9 files changed, 88 insertions(+), 32 deletions(-)

diff --git a/Documentation/admin-guide/hw-vuln/multihit.rst 
b/Documentation/admin-guide/hw-vuln/multihit.rst
index ba9988d8bce5..140e4cec38c3 100644
--- a/Documentation/admin-guide/hw-vuln/multihit.rst
+++ b/Documentation/admin-guide/hw-vuln/multihit.rst
@@ -80,6 +80,10 @@ The possible values in this file are:
- The processor is not vulnerable.
  * - KVM: Mitigation: Split huge pages
- Software changes mitigate this issue.
+ * - KVM: Mitigation: VMX unsupported
+   - KVM is not vulnerable because Virtual Machine Extensions (VMX) is not 
supported.
+ * - KVM: Mitigation: VMX disabled
+   - KVM is not vulnerable because Virtual Machine Extensions (VMX) is 
disabled.
  * - KVM: Vulnerable
- The processor is vulnerable, but no mitigation enabled
 
diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c
index c826cddae157..34a1b8562c31 100644
--- a/arch/x86/kernel/alternative.c
+++ b/arch/x86/kernel/alternative.c
@@ -874,8 +874,6 @@ static void *__text_poke(void *addr, const void *opcode, 
size_t len)
 */
BUG_ON(!pages[0] || (cross_page_boundary && !pages[1]));
 
-   local_irq_save(flags);
-
/*
 * Map the page without the global bit, as TLB flushing is done with
 * flush_tlb_mm_range(), which is intended for non-global PTEs.
@@ -892,6 +890,8 @@ static void *__text_poke(void *addr, const void *opcode, 
size_t len)
 */
VM_BUG_ON(!ptep);
 
+   local_irq_save(flags);
+
pte = mk_pte(pages[0], pgprot);
set_pte_at(poking_mm, poking_addr, ptep, pte);
 
@@ -941,8 +941,8 @@ static void *__text_poke(void *addr, const void *opcode, 
size_t len)
 */
BUG_ON(memcmp(addr, opcode, len));
 
-   pte_unmap_unlock(ptep, ptl);
local_irq_restore(flags);
+   pte_unmap_unlock(ptep, ptl);
return addr;
 }
 
diff --git a/arch/x86/kernel/cpu/acrn.c b/arch/x86/kernel/cpu/acrn.c
index 1da9b1c9a2db..0b2c03943ac6 100644
--- a/arch/x86/kernel/cpu/acrn.c
+++ b/arch/x86/kernel/cpu/acrn.c
@@ -11,14 +11,15 @@
 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
 #include 
 
-static uint32_t __init acrn_detect(void)
+static u32 __init acrn_detect(void)
 {
-   return hypervisor_cpuid_base("ACRNACRNACRN\0\0", 0);
+   return hypervisor_cpuid_base("ACRNACRNACRN", 0);
 }
 
 static void __init acrn_init_platform(void)
@@ -29,12 +30,7 @@ static void __init acrn_init_platform(void)
 
 static bool acrn_x2apic_available(void)
 {
-   /*
-* x2apic is not supported for now. Future enablement will have to check
-* X86_FEATURE_X2APIC to determine whether x2apic is supported in the
-* guest.

[GIT PULL] scheduler fixes

2020-08-15 Thread Ingo Molnar
Linus,

Please pull the latest sched/urgent git tree from:

   git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git 
sched-urgent-2020-08-15

   # HEAD: cc172ff301d8079e941a6eb31758951a6d764084 sched/debug: Fix the 
alignment of the show-state debug output

Two fixes: fix a new tracepoint's output value, and fix the formatting 
of show-state syslog printouts.

 Thanks,

Ingo

-->
Libing Zhou (1):
  sched/debug: Fix the alignment of the show-state debug output

Phil Auld (1):
  sched: Fix use of count for nr_running tracepoint


 kernel/sched/core.c  | 15 ---
 kernel/sched/sched.h |  2 +-
 2 files changed, 5 insertions(+), 12 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 4a0e7b449b88..09fd62568ba9 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -6387,10 +6387,10 @@ void sched_show_task(struct task_struct *p)
if (!try_get_task_stack(p))
return;
 
-   printk(KERN_INFO "%-15.15s %c", p->comm, task_state_to_char(p));
+   pr_info("task:%-15.15s state:%c", p->comm, task_state_to_char(p));
 
if (p->state == TASK_RUNNING)
-   printk(KERN_CONT "  running task");
+   pr_cont("  running task");
 #ifdef CONFIG_DEBUG_STACK_USAGE
free = stack_not_used(p);
 #endif
@@ -6399,8 +6399,8 @@ void sched_show_task(struct task_struct *p)
if (pid_alive(p))
ppid = task_pid_nr(rcu_dereference(p->real_parent));
rcu_read_unlock();
-   printk(KERN_CONT "%5lu %5d %6d 0x%08lx\n", free,
-   task_pid_nr(p), ppid,
+   pr_cont(" stack:%5lu pid:%5d ppid:%6d flags:0x%08lx\n",
+   free, task_pid_nr(p), ppid,
(unsigned long)task_thread_info(p)->flags);
 
print_worker_info(KERN_INFO, p);
@@ -6435,13 +6435,6 @@ void show_state_filter(unsigned long state_filter)
 {
struct task_struct *g, *p;
 
-#if BITS_PER_LONG == 32
-   printk(KERN_INFO
-   "  taskPC stack   pid father\n");
-#else
-   printk(KERN_INFO
-   "  taskPC stack   pid father\n");
-#endif
rcu_read_lock();
for_each_process_thread(g, p) {
/*
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 3fd283892761..28709f6b0975 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -1999,7 +1999,7 @@ static inline void sub_nr_running(struct rq *rq, unsigned 
count)
 {
rq->nr_running -= count;
if (trace_sched_update_nr_running_tp_enabled()) {
-   call_trace_sched_update_nr_running(rq, count);
+   call_trace_sched_update_nr_running(rq, -count);
}
 
/* Check if we still need preemption */


Re: linux-next: new build warnings after binutils update

2020-08-15 Thread Ingo Molnar


* Kees Cook  wrote:

> On Fri, Aug 14, 2020 at 12:22:06PM +0200, Ingo Molnar wrote:
> > > [0] 
> > > https://lore.kernel.org/lkml/20200731202738.2577854-6-nived...@alum.mit.edu/
> > 
> > It all looked good to me but was a bit late for v5.9, will pick up 
> > after -rc1.
> 
> Excellent! Thank you. I'll base the orphan series on x86/boot now. Once
> I send a v6 (there are a few more things to tweak), can you carry that
> in -tip as well (it includes arm and arm64 as well, all of which depend
> on several asm-generic patches).

Sure, that looks the most sensible, since there's so much x86 impact. 
Might migrate the commits over into a more generic topic branch - 
started out with x86/boot to get things going.

Thanks,

Ingo


Re: [PATCH 1/2] x86/MCE/AMD, EDAC/mce_amd: Use AMD NodeId for Family17h+ DRAM Decode

2020-08-15 Thread Ingo Molnar


* Yazen Ghannam  wrote:

> From: Yazen Ghannam 
> 
> The edac_mce_amd module calls decode_dram_ecc() on AMD Family17h and
> later systems. This function is used in amd64_edac_mod to do
> system-specific decoding for DRAM ECC errors. The function takes a
> "NodeId" as a parameter.
> 
> In AMD documentation, NodeId is used to identify a physical die in a
> system. This can be used to identify a node in the AMD_NB code and also
> it is used with umc_normaddr_to_sysaddr().
> 
> However, the input used for decode_dram_ecc() is currently the NUMA node
> of a logical CPU. In the default configuration, the NUMA node and
> physical die will be equivalent, so this doesn't have an impact. But the
> NUMA node configuration can be adjusted with optional memory
> interleaving schemes. This will cause the NUMA node enumeration to not
> match the physical die enumeration. The mismatch will cause the address
> translation function to fail or report incorrect results.
> 
> Save the "NodeId" as a percpu value during init in AMD MCE code. Export
> a function to return the value which can be used from modules like
> edac_mce_amd.
> 
> Fixes: fbe63acf62f5 ("EDAC, mce_amd: Use cpu_to_node() to find the node ID")
> Signed-off-by: Yazen Ghannam 
> ---
>  arch/x86/include/asm/mce.h|  2 ++
>  arch/x86/kernel/cpu/mce/amd.c | 11 +++
>  drivers/edac/mce_amd.c|  2 +-
>  3 files changed, 14 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/include/asm/mce.h b/arch/x86/include/asm/mce.h
> index cf503824529c..92527cc9ed06 100644
> --- a/arch/x86/include/asm/mce.h
> +++ b/arch/x86/include/asm/mce.h
> @@ -343,6 +343,8 @@ extern struct smca_bank smca_banks[MAX_NR_BANKS];
>  extern const char *smca_get_long_name(enum smca_bank_types t);
>  extern bool amd_mce_is_memory_error(struct mce *m);
>  
> +extern u8 amd_cpu_to_node(unsigned int cpu);
> +
>  extern int mce_threshold_create_device(unsigned int cpu);
>  extern int mce_threshold_remove_device(unsigned int cpu);
>  
> diff --git a/arch/x86/kernel/cpu/mce/amd.c b/arch/x86/kernel/cpu/mce/amd.c
> index 99be063fcb1b..524edf81e287 100644
> --- a/arch/x86/kernel/cpu/mce/amd.c
> +++ b/arch/x86/kernel/cpu/mce/amd.c
> @@ -202,6 +202,9 @@ static DEFINE_PER_CPU(unsigned int, bank_map);
>  /* Map of banks that have more than MCA_MISC0 available. */
>  static DEFINE_PER_CPU(u32, smca_misc_banks_map);
>  
> +/* CPUID_Fn8000001E_ECX[NodeId] used to identify a physical node/die. */
> +static DEFINE_PER_CPU(u8, node_id);
> +
>  static void amd_threshold_interrupt(void);
>  static void amd_deferred_error_interrupt(void);
>  
> @@ -233,6 +236,12 @@ static void smca_set_misc_banks_map(unsigned int bank, 
> unsigned int cpu)
>  
>  }
>  
> +u8 amd_cpu_to_node(unsigned int cpu)
> +{
> + return per_cpu(node_id, cpu);
> +}
> +EXPORT_SYMBOL_GPL(amd_cpu_to_node);
> +
>  static void smca_configure(unsigned int bank, unsigned int cpu)
>  {
>   unsigned int i, hwid_mcatype;
> @@ -240,6 +249,8 @@ static void smca_configure(unsigned int bank, unsigned 
> int cpu)
>   u32 high, low;
>   u32 smca_config = MSR_AMD64_SMCA_MCx_CONFIG(bank);
>  
> + this_cpu_write(node_id, cpuid_ecx(0x8000001e) & 0xFF);

So we already have this magic number used for a similar purpose, in 
amd_get_topology():

cpuid(0x8000001e, &eax, &ebx, &ecx, &edx);

node_id  = ecx & 0xff;

Firstly, could we please at least give 0x8000001e a proper symbolic 
name, use it in hygon.c too (which AFAIK is derived from AMD anyway), 
and then use it in these new patches?

Secondly, why not stick node_id into struct cpuinfo_x86, where the MCA 
code can then use it without having to introduce a new percpu data 
structure?

There's also the underlying assumption that there's only ever going to 
be 256 nodes, which limitation I'm sure we'll hear about in a couple 
of years as not being quite enough. ;-)

So less hardcoding and more generalizations please.

Thanks,

Ingo


Re: linux-next: new build warnings after binutils update

2020-08-14 Thread Ingo Molnar


* Ard Biesheuvel  wrote:

> (+ Arvind, Kees)
> 
> On Thu, 13 Aug 2020 at 22:58, Stephen Rothwell  wrote:
> >
> > Hi all,
> >
> > After upgading some software, builds of Linus' tree now produce these 
> > warnings:
> >
> > x86_64-linux-gnu-ld: arch/x86/boot/compressed/head_64.o: warning: 
> > relocation in read-only section `.head.text'
> > x86_64-linux-gnu-ld: warning: creating DT_TEXTREL in a PIE
> >
> > I upgraded binutils from 2.34-8 to 2.35-1 (Debian versions).
> >
> > $ x86_64-linux-gnu-gcc --version
> > x86_64-linux-gnu-gcc (Debian 9.3.0-13) 9.3.0
> >
> > Any ideas?
> >
> 
> Arvind and I have some patches on the list that fix various relocation
> issues in the decompressor binary.
> 
> As far as I can tell, Arvind's patch to suppress runtime relocations
> [0] addresses this exact issue.
> 
> Unfortunately, in spite of various pings and attempts to get the x86
> maintainers to notice this series, it has been ignored so far. Perhaps
> this is a good time to merge it for -rc1/2?
> 
> [0] 
> https://lore.kernel.org/lkml/20200731202738.2577854-6-nived...@alum.mit.edu/

It all looked good to me but was a bit late for v5.9, will pick up 
after -rc1.

Thanks,

Ingo


Re: [PATCH v2] seqlock: Fix build errors

2020-08-14 Thread Ingo Molnar


* Peter Zijlstra  wrote:

> > Signed-off-by: Xingxing Su 
> > ---
> >  v2:  update the commit message
> > 
> >  include/linux/seqlock.h | 1 +
> >  1 file changed, 1 insertion(+)
> > 
> > diff --git a/include/linux/seqlock.h b/include/linux/seqlock.h
> > index 54bc204..4763c13 100644
> > --- a/include/linux/seqlock.h
> > +++ b/include/linux/seqlock.h
> > @@ -17,6 +17,7 @@
> >  #include 
> >  #include 
> >  #include 
> > +#include 
> >  #include 
> 
> Wrong place, it's lockdep_assert_preemption_disabled() that requires
> asm/percpu.h, and thus lockdep.h should include linux/smp. before
> asm/percpu.h

It already does that upstream:

 #ifndef __LINUX_LOCKDEP_H
 #define __LINUX_LOCKDEP_H

 #include 
 #include 
 #include 

So it would be interesting to know what kernel version the build error 
occurs on.

Thanks,

Ingo


Re: [GIT PULL] x86/mm changes for v5.9

2020-08-13 Thread Ingo Molnar


* Joerg Roedel  wrote:

> On Thu, Aug 06, 2020 at 11:20:19PM +0200, Ingo Molnar wrote:
> > I've reverted it in x86/urgent as well earlier today, can send you 
> > that tree right now if you prefer that route.
> 
> I sent a fix for preallocate_vmalloc_pages() to correctly pre-allocate
> the vmalloc PGD entries. I verified that it works and that
> swapper_pg_dir contains the correct entries now. This should also fix
> the issue Jason is seeing.

Thanks!

There's one thing left to do. Linus has reverted the patch which 
exposed this bug:

  7b4ea9456dd3: ("Revert "x86/mm/64: Do not sync vmalloc/ioremap mappings"")

and has applied your fix:

  995909a4e22b: ("x86/mm/64: Do not dereference non-present PGD entries")

I think now we can re-apply the original commit:

  8bb9bf242d1f: ("x86/mm/64: Do not sync vmalloc/ioremap mappings")

Mind re-sending it, with an updated changelog that explains why it's 
now truly safe?

Would be tentatively scheduled for v5.10 though, we've had enough 
excitement in this area for v5.9 I think. :-/

> Sorry for screwing this up :-(

No problem, and it was my fault too: I sent 8bb9bf242d1f to Linus too 
quickly, just 7 days after applying it - x86/mm patches usually need a 
few weeks of testing.

Thanks,

Ingo


Re: [PATCH] x86/mm/64: Do not dereference non-present PGD entries

2020-08-13 Thread Ingo Molnar


* Mike Rapoport  wrote:

> On Mon, Aug 10, 2020 at 07:27:33AM -0700, Dave Hansen wrote:
> > ... adding Kirill
> > 
> > On 8/7/20 1:40 AM, Joerg Roedel wrote:
> > > + lvl = "p4d";
> > > + p4d = p4d_alloc(_mm, pgd, addr);
> > > + if (!p4d)
> > > + goto failed;
> > >  
> > > + /*
> > > +  * With 5-level paging the P4D level is not folded. So the PGDs
> > > +  * are now populated and there is no need to walk down to the
> > > +  * PUD level.
> > > +  */
> > >   if (pgtable_l5_enabled())
> > >   continue;
> > 
> > It's early and I'm a coffee or two short of awake, but I had to stare at
> > the comment for a bit to make sense of it.
> > 
> > It feels wrong, I think, because the 5-level code usually ends up doing
> > *more* allocations and in this case, it is _appearing_ to do fewer.
> > Would something like this make sense?
> 
> Unless I miss something, with 5 levels vmalloc mappings are shared at
> p4d level, so allocating a p4d page would be enough. With 4 levels,
> p4d_alloc() is a nop and pud is the first actually populated level below
> pgd.
> 
> > /*
> >  * The goal here is to allocate all possibly required
> >  * hardware page tables pointed to by the top hardware
> >  * level.
> >  *
> >  * On 4-level systems, the p4d layer is folded away and
> >  * the above code does no preallocation.  Below, go down
> >  * to the pud _software_ level to ensure the second
> >  * hardware level is allocated.
> >  */

Would be nice to integrate all these explanations into the comment itself?

Thanks,

Ingo


Re: [PATCH v5 02/17] ARM: Revert back to default scheduler topology.

2020-08-13 Thread Ingo Molnar


* Valentin Schneider  wrote:

> The ARM-specific GMC level is meant to be built using the thread sibling
> mask, but no devicetree in arch/arm/boot/dts uses the 'thread' cpu-map
> binding. With SD_SHARE_POWERDOMAIN gone, this topology level can be
> removed, at which point ARM no longer benefits from having a custom defined
> topology table.
> 
> Delete the GMC topology level by making ARM use the default scheduler
> topology table. This essentially reverts commit
> 
>   fb2aa85564f4 ("sched, ARM: Create a dedicated scheduler topology table")
> 
> Cc: Russell King 
> Suggested-by: Dietmar Eggemann 
> Reviewed-by: Dietmar Eggemann 
> Signed-off-by: Valentin Schneider 

Minor changelog nit, it's helpful to add this final sentence:

No change in functionality is expected.

( If indeed no change in functionality is expected. ;-)

Thanks,

Ingo


Re: [PATCH] sched: print fields name when do sched_show_task

2020-08-13 Thread Ingo Molnar


* Libing Zhou  wrote:

> Current sysrq(t) output task fields name are not aligned with
> actual task fields value, e.g.:
> 
> kernel: sysrq: SysRq : Show State
> kernel:  taskPC stack   pid father
> kernel: systemd S12456 1  0 0x
> kernel: Call Trace:
> kernel: ? __schedule+0x240/0x740
> 
> To make it more readable, print fields name together with task fields
> value in same line, remove separate fields name print.

Makes sense in principle, but could you please quote the new format as 
well in the changelog, not just the old format? Makes it much easier 
to compare.

Thanks,

Ingo


[tip: locking/urgent] x86/headers: Remove APIC headers from

2020-08-06 Thread tip-bot2 for Ingo Molnar
The following commit has been merged into the locking/urgent branch of tip:

Commit-ID: 13c01139b17163c9b2aa543a9c39f8bbc875b625
Gitweb:
https://git.kernel.org/tip/13c01139b17163c9b2aa543a9c39f8bbc875b625
Author:Ingo Molnar 
AuthorDate:Thu, 06 Aug 2020 14:34:32 +02:00
Committer: Ingo Molnar 
CommitterDate: Thu, 06 Aug 2020 16:13:09 +02:00

x86/headers: Remove APIC headers from 

The APIC headers are relatively complex and bring in additional
header dependencies - while smp.h is a relatively simple header
included from high level headers.

Remove the dependency and add in the missing #include's in .c
files where they gained it indirectly before.

Signed-off-by: Ingo Molnar 
---
 arch/x86/include/asm/smp.h  | 10 --
 arch/x86/include/asm/tsc.h  |  1 +
 arch/x86/kernel/apic/apic.c |  1 +
 arch/x86/kernel/apic/bigsmp_32.c|  1 +
 arch/x86/kernel/apic/ipi.c  |  1 +
 arch/x86/kernel/apic/local.h|  1 +
 arch/x86/kernel/apic/probe_32.c |  1 +
 arch/x86/kernel/devicetree.c|  1 +
 arch/x86/kernel/irqinit.c   |  2 ++
 arch/x86/kernel/jailhouse.c |  1 +
 arch/x86/kernel/mpparse.c   |  2 ++
 arch/x86/kernel/setup.c |  1 +
 arch/x86/kernel/topology.c  |  1 +
 arch/x86/xen/apic.c |  1 +
 arch/x86/xen/enlighten_hvm.c|  1 +
 arch/x86/xen/smp_pv.c   |  1 +
 drivers/iommu/intel/irq_remapping.c |  1 +
 17 files changed, 18 insertions(+), 10 deletions(-)

diff --git a/arch/x86/include/asm/smp.h b/arch/x86/include/asm/smp.h
index e15f364..c0538f8 100644
--- a/arch/x86/include/asm/smp.h
+++ b/arch/x86/include/asm/smp.h
@@ -5,16 +5,6 @@
 #include 
 #include 
 
-/*
- * We need the APIC definitions automatically as part of 'smp.h'
- */
-#ifdef CONFIG_X86_LOCAL_APIC
-# include 
-# include 
-# ifdef CONFIG_X86_IO_APIC
-#  include 
-# endif
-#endif
 #include 
 #include 
 
diff --git a/arch/x86/include/asm/tsc.h b/arch/x86/include/asm/tsc.h
index 8a0c25c..db59771 100644
--- a/arch/x86/include/asm/tsc.h
+++ b/arch/x86/include/asm/tsc.h
@@ -6,6 +6,7 @@
 #define _ASM_X86_TSC_H
 
 #include 
+#include 
 
 #define NS_SCALE   10 /* 2^10, carefully chosen */
 #define US_SCALE   32 /* 2^32, arbitralrily chosen */
diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c
index e0e2f02..0c89003 100644
--- a/arch/x86/kernel/apic/apic.c
+++ b/arch/x86/kernel/apic/apic.c
@@ -47,6 +47,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
diff --git a/arch/x86/kernel/apic/bigsmp_32.c b/arch/x86/kernel/apic/bigsmp_32.c
index 38b5b51..98d015a 100644
--- a/arch/x86/kernel/apic/bigsmp_32.c
+++ b/arch/x86/kernel/apic/bigsmp_32.c
@@ -9,6 +9,7 @@
 #include 
 
 #include 
+#include 
 
 #include "local.h"
 
diff --git a/arch/x86/kernel/apic/ipi.c b/arch/x86/kernel/apic/ipi.c
index 6ca0f91..387154e 100644
--- a/arch/x86/kernel/apic/ipi.c
+++ b/arch/x86/kernel/apic/ipi.c
@@ -2,6 +2,7 @@
 
 #include 
 #include 
+#include 
 
 #include "local.h"
 
diff --git a/arch/x86/kernel/apic/local.h b/arch/x86/kernel/apic/local.h
index 04797f0..a997d84 100644
--- a/arch/x86/kernel/apic/local.h
+++ b/arch/x86/kernel/apic/local.h
@@ -10,6 +10,7 @@
 
 #include 
 
+#include 
 #include 
 
 /* APIC flat 64 */
diff --git a/arch/x86/kernel/apic/probe_32.c b/arch/x86/kernel/apic/probe_32.c
index 67b33d6..7bda71d 100644
--- a/arch/x86/kernel/apic/probe_32.c
+++ b/arch/x86/kernel/apic/probe_32.c
@@ -10,6 +10,7 @@
 #include 
 #include 
 
+#include 
 #include 
 #include 
 
diff --git a/arch/x86/kernel/devicetree.c b/arch/x86/kernel/devicetree.c
index 8d85e00..a0e8fc7 100644
--- a/arch/x86/kernel/devicetree.c
+++ b/arch/x86/kernel/devicetree.c
@@ -20,6 +20,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
diff --git a/arch/x86/kernel/irqinit.c b/arch/x86/kernel/irqinit.c
index dd73135..beb1bad 100644
--- a/arch/x86/kernel/irqinit.c
+++ b/arch/x86/kernel/irqinit.c
@@ -22,6 +22,8 @@
 #include 
 #include 
 #include 
+#include 
+#include 
 #include 
 #include 
 #include 
diff --git a/arch/x86/kernel/jailhouse.c b/arch/x86/kernel/jailhouse.c
index 6eb8b50..2caf5b9 100644
--- a/arch/x86/kernel/jailhouse.c
+++ b/arch/x86/kernel/jailhouse.c
@@ -13,6 +13,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
diff --git a/arch/x86/kernel/mpparse.c b/arch/x86/kernel/mpparse.c
index afac7cc..db509e1 100644
--- a/arch/x86/kernel/mpparse.c
+++ b/arch/x86/kernel/mpparse.c
@@ -19,6 +19,8 @@
 #include 
 #include 
 
+#include 
+#include 
 #include 
 #include 
 #include 
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index a3767e7..f767198 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -25,6 +25,7 @@
 #include 
 
 #include 
+#include 
 #include 
 #include 
 #include 
diff --git a/arch/x86/kernel/topology.c b/arch/x86/kernel/topology.c
index b8810eb..0a2ec80 100644
--- a/arch/x

Re: [GIT PULL] x86/mm changes for v5.9

2020-08-06 Thread Ingo Molnar


* Linus Torvalds  wrote:

> On Thu, Aug 6, 2020 at 11:57 AM Joerg Roedel  wrote:
> >
> > On Thu, Aug 06, 2020 at 03:10:34PM +0200, Ingo Molnar wrote:
> > >
> > > * Linus Torvalds  wrote:
> > > > So apparently the "the page-table pages are all pre-allocated now" is
> > > > simply not true. Joerg?
> >
> > It pre-allocates the whole vmalloc/ioremap PUD/P4D pages, but I actually
> > only tested it with 4-level paging, as I don't have access to 5-level
> > paging hardware.
> 
> I don't think Jason has either.
> 
> The
> 
> PGD 0 P4D 0
> 
> line tells us that "pgd_present()" is true, even though PGD is 0
> (otherwise it wouldn't print the P4D part). That means that he doesn't
> have l5 enabled.
> 
> But you may obviously have different settings for CONFIG_X86_5LEVEL,
> and maybe that ends up changing something?
> 
> But since apparently it's not immediately obvious what the problem is,
> I'll revert it for now.

I've reverted it in x86/urgent as well earlier today, can send you 
that tree right now if you prefer that route.

Thanks,

Ingo


Re: improve compat handling for the i386 u64 alignment quirk v2

2020-08-06 Thread Ingo Molnar


* Christoph Hellwig  wrote:

> Hi all,
> 
> the i386 ABI is a little special in that it uses less than natural
> alignment for 64-bit integer types (u64 and s64), and a significant
> amount of our compat handlers deals with just that.  Unfortunately
> there is no good way to check for this specific quirk at runtime,
> similar how in_compat_syscall() checks for a compat syscall.  This
> series adds such a check, and then uses the quota code as an example
> of how this improves the compat handling.  I have a few other places
> in mind where this will also be useful going forward.
> 
> Changes since v1:
>  - use asm-generic/compat.h instead of linux/compat.h for
>compat_u64 and compat_s64
>  - fix a typo
> 
> Diffstat:
>  b/arch/arm64/include/asm/compat.h|2 
>  b/arch/mips/include/asm/compat.h |2 
>  b/arch/parisc/include/asm/compat.h   |2 
>  b/arch/powerpc/include/asm/compat.h  |2 
>  b/arch/s390/include/asm/compat.h |2 
>  b/arch/sparc/include/asm/compat.h|3 
>  b/arch/x86/entry/syscalls/syscall_32.tbl |2 
>  b/arch/x86/include/asm/compat.h  |3 
>  b/fs/quota/Kconfig   |5 -
>  b/fs/quota/Makefile  |1 
>  b/fs/quota/compat.h  |   34 
>  b/fs/quota/quota.c   |   73 +++---
>  b/include/asm-generic/compat.h   |8 ++
>  b/include/linux/compat.h |9 ++
>  b/include/linux/quotaops.h   |3 
>  b/kernel/sys_ni.c|1 
>  fs/quota/compat.c|  120 
> ---
>  17 files changed, 113 insertions(+), 159 deletions(-)

If nobody objects to this being done at runtime, and if it's 100% ABI 
compatible, then the x86 impact looks good to me:

Acked-by: Ingo Molnar 

Thanks,

Ingo


Re: [PATCH v4 03/10] sched/topology: Propagate SD_ASYM_CPUCAPACITY upwards

2020-08-06 Thread Ingo Molnar


* Valentin Schneider  wrote:

> We currently set this flag *only* on domains whose topology level exactly
> match the level where we detect asymmetry (as returned by
> asym_cpu_capacity_level()). This is rather problematic.
> 
> Say there are two clusters in the system, one with a lone big CPU and the
> other with a mix of big and LITTLE CPUs (as is allowed by DynamIQ):
> 
> DIE []
> MC  [ ][ ]
>  0   1   2   3  4
>  L   L   B   B  B
> 
> asym_cpu_capacity_level() will figure out that the MC level is the one
> where all CPUs can see a CPU of max capacity, and we will thus set
> SD_ASYM_CPUCAPACITY at MC level for all CPUs.
> 
> That lone big CPU will degenerate its MC domain, since it would be alone in
> there, and will end up with just a DIE domain. Since the flag was only set
> at MC, this CPU ends up not seeing any SD with the flag set, which is
> broken.
> 
> Rather than clearing dflags at every topology level, clear it before
> entering the topology level loop. This will properly propagate upwards
> flags that are set starting from a certain level.
> 
> Reviewed-by: Quentin Perret 
> Signed-off-by: Valentin Schneider 
> ---
>  kernel/sched/topology.c | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
> 
> diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
> index 865fff3ef20a..42b89668e1e4 100644
> --- a/kernel/sched/topology.c
> +++ b/kernel/sched/topology.c
> @@ -1985,11 +1985,10 @@ build_sched_domains(const struct cpumask *cpu_map, 
> struct sched_domain_attr *att
>   /* Set up domains for CPUs specified by the cpu_map: */
>   for_each_cpu(i, cpu_map) {
>   struct sched_domain_topology_level *tl;
> + int dflags = 0;
>  
>   sd = NULL;
>   for_each_sd_topology(tl) {
> - int dflags = 0;
> -
>   if (tl == tl_asym) {
>   dflags |= SD_ASYM_CPUCAPACITY;
>   has_asym = true;

I'd suggest ordering all patches with potential side effects at the 
end, to make them easier to bisect.

I.e. I'd reorder this series to do:

 - Obviously correct renamings & cleanups

 - Convert the code over to the new instrumented sd-flags method. This 
   will presumably spew a few warnings for problems the new debugging 
   checks catch in existing topologies.

 - Do all the behavioral changes and fixes like this patch, even if we 
   think that they have no serious side effects.

In that sense it might make sense to order the two ARM patches to the 
later stage as well - but I suppose it's OK to do those two first as 
well.

Nice series otherwise, these new checks look really useful and already 
caught bugs.

Thanks,

Ingo


Re: [GIT PULL] x86/mm changes for v5.9

2020-08-06 Thread Ingo Molnar


* Linus Torvalds  wrote:

> On Wed, Aug 5, 2020 at 4:03 AM Jason A. Donenfeld  wrote:
> >
> > The commit 8bb9bf242d1f ("x86/mm/64: Do not sync vmalloc/ioremap
> > mappings") causes the OOPS below, in Linus' tree and in linux-next,
> > unearthed by my CI on .
> > Bisecting reveals 8bb9bf242d1f, and reverting this makes the OOPS go
> > away.
> 
> The oops happens early in the function, and the "Code:" line actually
> gets almost the whole function prologue in it (missing first two bytes
> are probably "push %rbp"):
> 
>0: 41 56push   %r14
>2: 41 55push   %r13
>4: 41 54push   %r12
>6: 55push   %rbp
>7: 48 89 f5  mov%rsi,%rbp
>a: 53push   %rbx
>b: 48 89 fb  mov%rdi,%rbx
>e: 48 83 ec 08  sub$0x8,%rsp
>   12: 48 8b 06  mov(%rsi),%rax
>   15: 4c 8b 67 40  mov0x40(%rdi),%r12
>   19: 49 89 c6  mov%rax,%r14
>   1c: 45 30 f6  xor%r14b,%r14b
>   1f: a8 04test   $0x4,%al
>   21: b8 00 00 00 00mov$0x0,%eax
>   26: 4c 0f 44 f0  cmove  %rax,%r14
>   2a:* 49 8b 46 08  mov0x8(%r14),%rax <-- trapping instruction
> 
> 
> > BUG: unable to handle page fault for address: e8d00608
> > #PF: supervisor read access in kernel mode
> > #PF: error_code(0x) - not-present page
> > PGD 0 P4D 0
> 
> Yeah, missing page table because it wasn't copied.
> 
> Presumably because that kthread is using the active_mm of some random
> user space process that didn't get sync'ed.
> 
> And the sync_global_pgds() may have ended up being sufficient
> synchronization with whoever allocated things, even if it wasn't about
> the TLB contents themselves.
> 
> So apparently the "the page-table pages are all pre-allocated now" is
> simply not true. Joerg?
> 
> Unless somebody can figure this out fairly quickly, I think it should
> just be reverted.

Agreed. Joerg?

Thanks,

Ingo


Re: [x86/copy_mc] a0ac629ebe: fio.read_iops -43.3% regression

2020-08-06 Thread Ingo Molnar


* kernel test robot  wrote:

> Greeting,
> 
> FYI, we noticed a -43.3% regression of fio.read_iops due to commit:
> 
> 
> commit: a0ac629ebe7b3d248cb93807782a00d9142fdb98 ("x86/copy_mc: Introduce 
> copy_mc_generic()")
> url: 
> https://github.com/0day-ci/linux/commits/Dan-Williams/Renovate-memcpy_mcsafe-with-copy_mc_to_-user-kernel/20200802-014046
> 
> 
> in testcase: fio-basic
> on test machine: 96 threads Intel(R) Xeon(R) Gold 6252 CPU @ 2.10GHz with 
> 256G memory
> with following parameters:

So this performance regression, if it isn't a spurious result, looks 
concerning. Is this expected?

Thanks,

Ingo


[tip: x86/urgent] Revert "x86/mm/64: Do not sync vmalloc/ioremap mappings"

2020-08-06 Thread tip-bot2 for Ingo Molnar
The following commit has been merged into the x86/urgent branch of tip:

Commit-ID: f17506e2f14bfa8a6a2de9b8b6a3ccc6b6f7c9b6
Gitweb:
https://git.kernel.org/tip/f17506e2f14bfa8a6a2de9b8b6a3ccc6b6f7c9b6
Author:Ingo Molnar 
AuthorDate:Thu, 06 Aug 2020 15:11:03 +02:00
Committer: Ingo Molnar 
CommitterDate: Thu, 06 Aug 2020 15:11:03 +02:00

Revert "x86/mm/64: Do not sync vmalloc/ioremap mappings"

This reverts commit 8bb9bf242d1fee925636353807c511d54fde8986.

Jason reported that this causes a new oops in process_one_work(),
and bisected it to this commit.

Linus suspects that it was caused by missing pagetable synchronization:

> > BUG: unable to handle page fault for address: e8d00608
> > #PF: supervisor read access in kernel mode
> > #PF: error_code(0x) - not-present page
> > PGD 0 P4D 0
>
> Yeah, missing page table because it wasn't copied.
>
> Presumably because that kthread is using the active_mm of some random
> user space process that didn't get sync'ed.
>
> And the sync_global_pgds() may have ended up being sufficient
> synchronization with whoever allocated things, even if it wasn't about
> the TLB contents themselves.
>
> So apparently the "the page-table pages are all pre-allocated now" is
> simply not true.

Revert the commit for now.

Reported-by: "Jason A. Donenfeld" 
Analyzed-by: Linus Torvalds 
Cc: Joerg Roedel 
Signed-off-by: Ingo Molnar 
---
 arch/x86/include/asm/pgtable_64_types.h | 2 ++
 arch/x86/mm/init_64.c   | 5 +
 2 files changed, 7 insertions(+)

diff --git a/arch/x86/include/asm/pgtable_64_types.h 
b/arch/x86/include/asm/pgtable_64_types.h
index 52e5f5f..8f63efb 100644
--- a/arch/x86/include/asm/pgtable_64_types.h
+++ b/arch/x86/include/asm/pgtable_64_types.h
@@ -159,4 +159,6 @@ extern unsigned int ptrs_per_p4d;
 
 #define PGD_KERNEL_START   ((PAGE_SIZE / 2) / sizeof(pgd_t))
 
+#define ARCH_PAGE_TABLE_SYNC_MASK  (pgtable_l5_enabled() ? 
PGTBL_PGD_MODIFIED : PGTBL_P4D_MODIFIED)
+
 #endif /* _ASM_X86_PGTABLE_64_DEFS_H */
diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index e65b96f..3f4e29a 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -217,6 +217,11 @@ static void sync_global_pgds(unsigned long start, unsigned 
long end)
sync_global_pgds_l4(start, end);
 }
 
+void arch_sync_kernel_mappings(unsigned long start, unsigned long end)
+{
+   sync_global_pgds(start, end);
+}
+
 /*
  * NOTE: This function is marked __ref because it calls __init function
  * (alloc_bootmem_pages). It's safe to do it ONLY when after_bootmem == 0.


Re: [PATCH v4 05/10] sched/topology: Define and assign sched_domain flag metadata

2020-08-06 Thread Ingo Molnar


* Valentin Schneider  wrote:

> +#ifndef SD_FLAG
> +#define SD_FLAG(x, y, z)
> +#endif

AFAICS there's not a single use of sd_flags.h that doesn't come with 
its own SD_FLAG definition, so I suppose this should be:

#ifndef SD_FLAG
# error "Should not happen."
#endif

?

Also, some nits:

> +/*
> + * Expected flag uses
> + *
> + * SHARED_CHILD: These flags are meant to be set from the base domain 
> upwards.
> + * If a domain has this flag set, all of its children should have it set. 
> This
> + * is usually because the flag describes some shared resource (all CPUs in 
> that
> + * domain share the same foobar), or because they are tied to a scheduling
> + * behaviour that we want to disable at some point in the hierarchy for
> + * scalability reasons.

s/foobar/resource

?

> +/*
> + * cross-node balancing
> + *
> + * SHARED_PARENT: Set for all NUMA levels above NODE.
> + */
> +SD_FLAG(SD_NUMA,12, SDF_SHARED_PARENT)

s/cross-node/Cross-node

BTW., is there any particular reason why these need to be defines with 
a manual enumeration of flag values - couldn't we generate 
auto-enumerated C enums instead or so?

> +#ifdef CONFIG_SCHED_DEBUG
> +#define SD_FLAG(_name, idx, mflags) [idx] = {.meta_flags = mflags, .name = 
> #_name},

s/{./{ .
s/e}/e }

Thanks,

Ingo


Re: [PATCH v11 2/5] x86: kdump: move reserve_crashkernel_low() into crash_core.c

2020-08-06 Thread Ingo Molnar


* Chen Zhou  wrote:

> In preparation for supporting reserve_crashkernel_low in arm64 as
> x86_64 does, move reserve_crashkernel_low() into kernel/crash_core.c.
> 
> BTW, moving x86_64 CRASH_ALIGN to 2M was suggested by Dave. Since
> CONFIG_PHYSICAL_ALIGN can be selected from 2M to 16M, move to the same
> value as arm64.
> 
> Signed-off-by: Chen Zhou 
> ---
>  arch/x86/include/asm/kexec.h | 24 ++
>  arch/x86/kernel/setup.c  | 86 +++-
>  include/linux/crash_core.h   |  3 ++
>  include/linux/kexec.h|  2 -
>  kernel/crash_core.c  | 74 +++
>  kernel/kexec_core.c  | 17 ---
>  6 files changed, 107 insertions(+), 99 deletions(-)

Since the changes are centered around arm64, I suppose the arm64 tree 
will carry this patchset?

Assuming that this is a 100% invariant moving of code that doesn't 
regress on x86:

  Acked-by: Ingo Molnar 

Thanks,

Ingo


Re: [PATCH] sched/core: add unlikely in group_has_capacity()

2020-08-06 Thread Ingo Molnar


* Qi Zheng  wrote:

> 1. The group_has_capacity() function is only called in
>group_classify().
> 2. Before calling the group_has_capacity() function,
>group_is_overloaded() will first judge the following
>formula, if it holds, the group_classify() will directly
>return the group_overloaded.
> 
>   (sgs->group_capacity * imbalance_pct) <
> (sgs->group_runnable * 100)
> 
> Therefore, when the group_has_capacity() is called, the
> probability that the above formula holds is very small. Give
> compilers a hint about that.
> 
> Signed-off-by: Qi Zheng 
> ---
>  kernel/sched/fair.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 2ba8f230feb9..9074fd5e23b2 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -8234,8 +8234,8 @@ group_has_capacity(unsigned int imbalance_pct, struct 
> sg_lb_stats *sgs)
>   if (sgs->sum_nr_running < sgs->group_weight)
>   return true;
>  
> - if ((sgs->group_capacity * imbalance_pct) <
> - (sgs->group_runnable * 100))
> + if (unlikely((sgs->group_capacity * imbalance_pct) <
> + (sgs->group_runnable * 100)))
>   return false;

Isn't the probability that this second check will match around 0%?

I.e. wouldn't the right fix be to remove the duplicate check from 
group_has_capacity(), because it's already been checked in 
group_classify()? Maybe while leaving a comment in place?

Thanks,

Ingo


Re: [PATCH v4 00/10] Function Granular KASLR

2020-08-06 Thread Ingo Molnar


* Kristen Carlson Accardi  wrote:

> Function Granular Kernel Address Space Layout Randomization (fgkaslr)
> -
> 
> This patch set is an implementation of finer grained kernel address space
> randomization. It rearranges your kernel code at load time 
> on a per-function level granularity, with only around a second added to
> boot time.

This is a very nice feature IMO, and it should be far more effective 
at randomizing the kernel, due to the sheer number of randomization 
bits that kernel function granular randomization presents.

If this is a good approximation of fg-kaslr randomization depth:

  thule:~/tip> grep ' [tT] ' /proc/kallsyms  | wc -l
  88488

... then that's 80K bits of randomization instead of the mere handful 
of kaslr bits we have today. Very nice!

> In order to hide our new layout, symbols reported through 
> /proc/kallsyms will be displayed in a random order.

Neat. :-)

> Performance Impact
> --

> * Run time
> The performance impact at run-time of function reordering varies by workload.
> Using kcbench, a kernel compilation benchmark, the performance of a kernel
> build with finer grained KASLR was about 1% slower than a kernel with standard
> KASLR. Analysis with perf showed a slightly higher percentage of 
> L1-icache-load-misses. Other workloads were examined as well, with varied
> results. Some workloads performed significantly worse under FGKASLR, while
> others stayed the same or were mysteriously better. In general, it will
> depend on the code flow whether or not finer grained KASLR will impact
> your workload, and how the underlying code was designed. Because the layout
> changes per boot, each time a system is rebooted the performance of a workload
> may change.

I'd guess that the biggest performance impact comes from tearing apart 
'groups' of functions that particular workloads are using.

In that sense it might be worthwhile to add a '__kaslr_group' function 
tag to key functions, which would keep certain performance critical 
functions next to each other.

This shouldn't really be a problem, as even with a generous amount of 
grouping the number of randomization bits is incredibly large.

> Future work could identify hot areas that may not be randomized and either
> leave them in the .text section or group them together into a single section
> that may be randomized. If grouping things together helps, one other thing to
> consider is that if we could identify text blobs that should be grouped 
> together
> to benefit a particular code flow, it could be interesting to explore
> whether this security feature could be also be used as a performance
> feature if you are interested in optimizing your kernel layout for a
> particular workload at boot time. Optimizing function layout for a particular
> workload has been researched and proven effective - for more information
> read the Facebook paper "Optimizing Function Placement for Large-Scale
> Data-Center Applications" (see references section below).

I'm pretty sure the 'grouping' solution would address any real 
slowdowns.

I'd also suggest allowing the passing in of a boot-time pseudo-random 
generator seed number, which would allow the creation of a 
pseudo-randomized but repeatable layout across reboots.

> Image Size
> --
> Adding additional section headers as a result of compiling with
> -ffunction-sections will increase the size of the vmlinux ELF file.
> With a standard distro config, the resulting vmlinux was increased by
> about 3%. The compressed image is also increased due to the header files,
> as well as the extra relocations that must be added. You can expect fgkaslr
> to increase the size of the compressed image by about 15%.

What is the increase of the resulting raw kernel image? Additional 
relocations might increase its size (unless I'm missing something) - 
it would be nice to measure this effect. I'd expect this to be really 
low.

vmlinux or compressed kernel size doesn't really matter on x86-64, 
it's a boot time only expense well within typical system resource 
limits.

> Disabling
> -
> Disabling normal KASLR using the nokaslr command line option also disables
> fgkaslr. It is also possible to disable fgkaslr separately by booting with
> fgkaslr=off on the commandline.

I'd suggest to also add a 'nofgkaslr' boot option if it doesn't yet 
exist, to keep usage symmetric with kaslr.

Likewise, there should probably be a 'kaslr=off' option as well.

The less random our user interfaces are, the better ...

>  arch/x86/boot/compressed/Makefile |   9 +-
>  arch/x86/boot/compressed/fgkaslr.c| 811 ++
>  arch/x86/boot/compressed/kaslr.c  |   4 -
>  arch/x86/boot/compressed/misc.c   | 157 +++-
>  arch/x86/boot/compressed/misc.h   |  30 +
>  arch/x86/boot/compressed/utils.c  |  11 +
>  

Re: [x86/copy_mc] a0ac629ebe: fio.read_iops -43.3% regression

2020-08-06 Thread Ingo Molnar


* Dan Williams  wrote:

> On Thu, Aug 6, 2020 at 6:35 AM Ingo Molnar  wrote:
> >
> >
> > * kernel test robot  wrote:
> >
> > > Greeting,
> > >
> > > FYI, we noticed a -43.3% regression of fio.read_iops due to commit:
> > >
> > >
> > > commit: a0ac629ebe7b3d248cb93807782a00d9142fdb98 ("x86/copy_mc: Introduce 
> > > copy_mc_generic()")
> > > url: 
> > > https://github.com/0day-ci/linux/commits/Dan-Williams/Renovate-memcpy_mcsafe-with-copy_mc_to_-user-kernel/20200802-014046
> > >
> > >
> > > in testcase: fio-basic
> > > on test machine: 96 threads Intel(R) Xeon(R) Gold 6252 CPU @ 2.10GHz with 
> > > 256G memory
> > > with following parameters:
> >
> > So this performance regression, if it isn't a spurious result, looks
> > concerning. Is this expected?
> 
> This is not expected and I think delays these patches until I'm back
> from leave in a few weeks. I know that we might lose some inlining
> effect due to replacing native memcpy, but I did not expect it would
> have an impact like this. In my testing I was seeing a performance
> improvement from replacing the careful / open-coded copy with rep;
> mov;, which increases the surprise of this result.

It would be nice to double check this on the kernel-test-robot side as 
well, to make sure it's not a false positive.

Thanks,

Ingo


Re: [PATCH v4 03/10] sched/topology: Propagate SD_ASYM_CPUCAPACITY upwards

2020-08-06 Thread Ingo Molnar


* Valentin Schneider  wrote:

> This does sound sensible; I can shuffle this around for v5.

Thanks!

> FWIW the reason I had this very patch before the instrumentation is that
> IMO it really wants to be propagated and could thus directly be tagged with
> SDF_SHARED_PARENT when the instrumentation hits. It's a minor thing, but
> having it after the instrumentation means that I'll first have to tag it
> without any hierarchical metaflag, and then tag it with SDF_SHARED_PARENT
> in the propagation fix.
> 
> If that sounds fine by you, I'll do just that.

Sounds good to me!

Ingo


Re: [PATCH 1/3] scripts/sorttable: Change section type of orc_lookup to SHT_PROGBITS

2020-08-06 Thread Ingo Molnar


* changhuaixin  wrote:

> Hi, Ingo
> 
> Another way to set SHT_PROGBITS is to use elf_create_section() to write
> the orc_lookup section header when the orc_unwind_ip and orc_unwind
> tables are written. Is this a better solution?
> 
> diff --git a/tools/objtool/orc_gen.c b/tools/objtool/orc_gen.c
> index 3f98dcfbc177..860d4dcec8e6 100644
> --- a/tools/objtool/orc_gen.c
> +++ b/tools/objtool/orc_gen.c
> @@ -183,6 +183,10 @@ int create_orc_sections(struct objtool_file *file)
> u_sec = elf_create_section(file->elf, ".orc_unwind",
>sizeof(struct orc_entry), idx);
> 
> +   /* make flags of section orc_lookup right */
> +   if (!elf_create_section(file->elf, ".orc_lookup", sizeof(int), 0))
> +   return -1;
> +
> /* populate sections */
> idx = 0;
> for_each_sec(file, sec) {

Looks much nicer IMO.

Mind turning this into a proper patch that does it plus reverts the 
hack?

Thanks,

Ingo


Re: [PATCH v4 07/10] sched/topology: Add more flags to the SD degeneration mask

2020-08-06 Thread Ingo Molnar


* Valentin Schneider  wrote:

> I don't think it is going to change much in practice, but we were missing
> those:
> 
> o SD_BALANCE_WAKE: Used just like the other SD_BALANCE_* flags, so also
>   needs > 1 group.
> o SD_ASYM_PACKING: Hinges on load balancing (periodic / wakeup), thus needs
>   > 1 group to happen
> o SD_OVERLAP: Describes domains with overlapping groups; can't have
>   overlaps with a single group.
> 
> SD_PREFER_SIBLING is as always the odd one out: we currently consider it
> in sd_parent_degenerate() but not in sd_degenerate(). It too hinges on load
> balancing, and thus won't have any effect when set on a domain with a
> single group. Add it to the sd_degenerate() groups mask.

Would be nice to add these one by one, just in case there's a 
performance or boot regression with any of them.

Thanks,

Ingo


Re: [PATCH v4 00/14] liblockdep fixes for 5.9-rc1

2020-08-06 Thread Ingo Molnar


* Sasha Levin  wrote:

> Hi Linus,
> 
> Please consider applying these patches for liblockdep, or alternatively
> pull from:
> 
>   git://git.kernel.org/pub/scm/linux/kernel/git/sashal/linux.git 
> tags/liblockdep-fixes-040820
> 
> The patches fix up compilation and functionality of liblockdep on 5.8,
> they were tested using liblockdep's internal testsuite.
> 
> I was unable to get the x86 folks to pull these fixes for the past few
> months:

So the primary reason I didn't pull is that liblockdep was permanently 
build-broken from February 2019 to around February 2020, despite me 
pinging you multiple times about it.

>  - https://lkml.org/lkml/2020/2/17/1089

This pull request still said that it fixes "most of" liblockdep, not 
"all of", which is the benchmark really after such a long series of 
breakage.

>  - https://lkml.org/lkml/2020/4/18/817

This still said "most of".

>  - https://lkml.org/lkml/2020/6/22/1262

Same 'most of' verbiage.

> Which is why this pull request ends up going straight to you.

So at this point I think we need to ask whether it's worth it: are 
there any actual users of liblockdep, besides the testcases in 
liblockdep itself? I see there's a 'liblockdep-dev' package for 
Debian, but not propagated to Ubuntu or other popular variants AFAICS.

Also, could you please specify whether all bugs are fixed or just 
'most'?

> Sasha Levin (14):
>   tools headers: Add kprobes.h header
>   tools headers: Add rcupdate.h header
>   tools/kernel.h: extend with dummy RCU functions
>   tools bitmap: add bitmap_andnot definition
>   tools/lib/lockdep: add definition required for IRQ flag tracing
>   tools bitmap: add bitmap_clear definition
>   tools/lib/lockdep: Hook up vsprintf, find_bit, hweight libraries
>   tools/lib/lockdep: Enable building with CONFIG_TRACE_IRQFLAGS
>   tools/lib/lockdep: New stacktrace API
>   tools/lib/lockdep: call lockdep_init_task on init
>   tools/lib/lockdep: switch to using lockdep_init_map_waits
>   tools/kernel.h: hide noinstr
>   tools/lib/lockdep: explicitly declare lockdep_init_task()
>   tools/kernel.h: hide task_struct.hardirq_chain_key

Style nits, please use consistent titles for patches:

 - First word should be capitalized consistently, instead of a mishmash
   of lower case mixed with upper case.

 - First word should preferably be a verb, i.e. "Add new stacktrace 
   API stubs", not "New stacktrace API"

Also, please always check linux-next whether there's some new upstream 
changes that liblockdep needs to adapt to. Right now there's a new 
build breakage even with all your fixes applied:

  thule:~/tip/tools/lib/lockdep> make
CC   common.o
  In file included from ../../include/linux/lockdep.h:24,
   from common.c:5:
   ../../include/linux/../../../include/linux/lockdep.h:13:10: fatal error: 
linux/lockdep_types.h: No such file or directory
 13 | #include 
|  ^~~

At which point we need to step back and analyze the development model: 
this comparatively high rate of breakage derives from the unorthodox 
direct coupling of a kernel subsystem to a user-space library.

The solution for that would be to use the method how perf syncs to 
kernel space headers, by maintaining a 100% copy in tools/include/ and 
having automatic mechanism that warns about out of sync headers but 
doesn't break functionality.

See tools/perf/check-headers.sh for details.

I believe this same half-automated sync-on-upstream-changes model 
could be used for liblockdep as well, i.e. let's copy kernel/lockdep.c 
and lockdep*.h over to tools/lib/lockdep/, and reuse the perf header 
syncing method to keep it synchronized from that point on.

That would result in a far more maintainable liblockdep end result 
IMO?
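
Roughly like this, i.e. a minimal stand-alone sketch of the
warn-on-drift idea (paths and file names are illustrative stand-ins,
not the real check-headers.sh):

```shell
#!/bin/sh
# Minimal sketch of a perf-style check-headers.sh: warn (instead of
# breaking the build) when the tools/ copy of a file drifts from the
# kernel original. All paths here are illustrative stand-ins.
set -u

check_header() {
	if ! cmp -s "$1" "$2" 2>/dev/null; then
		echo "Warning: $2 differs from latest version at $1" >&2
	fi
}

# Demo with temporary stand-ins for the kernel and tools copies:
tmp=$(mktemp -d)
echo "struct lock_class {};" > "$tmp/kernel.h"
cp "$tmp/kernel.h" "$tmp/tools.h"

in_sync=$(check_header "$tmp/kernel.h" "$tmp/tools.h" 2>&1)   # silent

echo "/* upstream change */" >> "$tmp/kernel.h"
drifted=$(check_header "$tmp/kernel.h" "$tmp/tools.h" 2>&1)   # warns

echo "in sync: '$in_sync'"
echo "drifted: '$drifted'"
rm -rf "$tmp"
```

The build keeps working on a mismatch; the warning just tells the
maintainer it is time to re-sync the copy.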

Thanks,

Ingo


Re: [GIT PULL] sched/fifo changes for v5.9

2020-08-04 Thread Ingo Molnar


* Ingo Molnar  wrote:

> When merging to the latest upstream tree there's a conflict in 
> drivers/spi/spi.c,
> which can be resolved via:
> 
>   sched_set_fifo(ctlr->kworker_task);

Correction, the suggested resolution would be:

sched_set_fifo(ctlr->kworker->task);

Thanks,

Ingo


[GIT PULL] sched/fifo changes for v5.9

2020-08-04 Thread Ingo Molnar
Linus,

Please pull the latest sched/fifo git tree from:

   git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git 
sched-fifo-2020-08-04

   # HEAD: 4fd5750af02ab7bba7c58a073060cc1da8a69173 sched,tracing: Convert to 
sched_set_fifo()

This tree adds the sched_set_fifo*() encapsulation APIs to remove
static priority level knowledge from non-scheduler code.

The three APIs for non-scheduler code to set SCHED_FIFO are:

 - sched_set_fifo()
 - sched_set_fifo_low()
 - sched_set_normal()

These map to two FIFO priority levels, default (high) and 'low', plus
sched_set_normal() to set the policy back to non-SCHED_FIFO.

Since the changes affect a lot of non-scheduler code, we kept this in a separate
tree.

When merging to the latest upstream tree there's a conflict in 
drivers/spi/spi.c,
which can be resolved via:

sched_set_fifo(ctlr->kworker_task);

Signed-off-by: Ingo Molnar 
 Thanks,

Ingo

-->
Peter Zijlstra (24):
  sched: Provide sched_set_fifo()
  sched,bL_switcher: Convert to sched_set_fifo*()
  sched,crypto: Convert to sched_set_fifo*()
  sched,acpi_pad: Convert to sched_set_fifo*()
  sched,drbd: Convert to sched_set_fifo*()
  sched,psci: Convert to sched_set_fifo*()
  sched,msm: Convert to sched_set_fifo*()
  sched,drm/scheduler: Convert to sched_set_fifo*()
  sched,ivtv: Convert to sched_set_fifo*()
  sched,mmc: Convert to sched_set_fifo*()
  sched,spi: Convert to sched_set_fifo*()
  sched,powercap: Convert to sched_set_fifo*()
  sched,ion: Convert to sched_set_normal()
  sched,powerclamp: Convert to sched_set_fifo()
  sched,serial: Convert to sched_set_fifo()
  sched,watchdog: Convert to sched_set_fifo()
  sched,irq: Convert to sched_set_fifo()
  sched,locktorture: Convert to sched_set_fifo()
  sched,rcuperf: Convert to sched_set_fifo_low()
  sched,rcutorture: Convert to sched_set_fifo_low()
  sched,psi: Convert to sched_set_fifo_low()
  sched: Remove sched_setscheduler*() EXPORTs
  sched: Remove sched_set_*() return value
  sched,tracing: Convert to sched_set_fifo()


 arch/arm/common/bL_switcher.c|  3 +-
 crypto/crypto_engine.c   |  3 +-
 drivers/acpi/acpi_pad.c  |  3 +-
 drivers/block/drbd/drbd_receiver.c   |  5 +---
 drivers/firmware/psci/psci_checker.c | 10 +--
 drivers/gpu/drm/msm/msm_drv.c| 13 +
 drivers/gpu/drm/scheduler/sched_main.c   |  3 +-
 drivers/media/pci/ivtv/ivtv-driver.c |  4 +--
 drivers/mmc/core/sdio_irq.c  |  3 +-
 drivers/platform/chrome/cros_ec_spi.c| 11 ++-
 drivers/powercap/idle_inject.c   |  4 +--
 drivers/spi/spi.c|  4 +--
 drivers/staging/android/ion/ion_heap.c   |  4 +--
 drivers/thermal/intel/intel_powerclamp.c |  5 +---
 drivers/tty/serial/sc16is7xx.c   |  3 +-
 drivers/watchdog/watchdog_dev.c  |  3 +-
 include/linux/sched.h|  3 ++
 kernel/irq/manage.c  |  6 +---
 kernel/locking/locktorture.c | 10 ++-
 kernel/rcu/rcuperf.c |  8 ++---
 kernel/rcu/rcutorture.c  |  7 +
 kernel/sched/core.c  | 50 ++--
 kernel/sched/psi.c   |  5 +---
 kernel/trace/ring_buffer_benchmark.c | 48 +++---
 24 files changed, 98 insertions(+), 120 deletions(-)


[GIT PULL] RAS changes for v5.9

2020-08-03 Thread Ingo Molnar
Linus,

Please pull the latest ras/core git tree from:

   git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git ras-core-2020-08-03

   # HEAD: bb2de0adca217a114ce023489426e24152e4bfcf x86/mce, EDAC/mce_amd: 
Print PPIN in machine check records

Boris is on vacation and he asked us to send you the pending RAS bits:

 - Print the PPIN field on CPUs that fill them out
 - Fix an MCE injection bug
 - Simplify a kzalloc in dev_mcelog_init_device()

 Thanks,

Ingo

-->
Gustavo A. R. Silva (1):
  x86/mce/dev-mcelog: Use struct_size() helper in kzalloc()

Smita Koralahalli (1):
  x86/mce, EDAC/mce_amd: Print PPIN in machine check records

Zhenzhong Duan (1):
  x86/mce/inject: Fix a wrong assignment of i_mce.status


 arch/x86/kernel/cpu/mce/core.c   | 2 ++
 arch/x86/kernel/cpu/mce/dev-mcelog.c | 2 +-
 arch/x86/kernel/cpu/mce/inject.c | 2 +-
 drivers/edac/mce_amd.c   | 3 +++
 4 files changed, 7 insertions(+), 2 deletions(-)


[GIT PULL] x86/timers change for v5.9

2020-08-03 Thread Ingo Molnar
Linus,

Please pull the latest x86/timers git tree from:

   git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git 
x86-timers-2020-08-03

   # HEAD: 898ec52d2ba05915aaedcdb21bff2e944c883cb8 x86/xen/time: Set the 
X86_FEATURE_TSC_KNOWN_FREQ flag in xen_tsc_khz()

A single commit which sets the X86_FEATURE_TSC_KNOWN_FREQ flag for Xen guests,
to avoid recalibration.

 Thanks,

Ingo

-->
Hayato Ohhashi (1):
  x86/xen/time: Set the X86_FEATURE_TSC_KNOWN_FREQ flag in xen_tsc_khz()


 arch/x86/xen/time.c | 1 +
 1 file changed, 1 insertion(+)



[GIT PULL] x86/platform changes for v5.9

2020-08-03 Thread Ingo Molnar
Linus,

Please pull the latest x86/platform git tree from:

   git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git 
x86-platform-2020-08-03

   # HEAD: 3bcf25a40b018e632d70bb866d75746748953fbc x86/efi: Remove unused 
EFI_UV1_MEMMAP code

The biggest change is the removal of SGI UV1 support, which allowed the
removal of the legacy EFI old_mmap code as well.

This removes quite a bunch of old code & quirks.

Signed-off-by: Ingo Molnar 
 Thanks,

Ingo

-->
steve.w...@hpe.com (13):
  x86/platform/uv: Remove support for UV1 platform from uv_time
  x86/platform/uv: Remove support for UV1 platform from uv_tlb
  x86/platform/uv: Remove support for UV1 platform from x2apic_uv_x
  x86/platform/uv: Remove support for UV1 platform from uv_mmrs
  x86/platform/uv: Remove support for UV1 platform from uv_bau
  x86/platform/uv: Remove support for uv1 platform from uv_hub
  x86/platform/uv: Remove support for UV1 platform from uv
  x86/platform/uv: Remove vestigial mention of UV1 platform from bios header
  x86/platform/uv: Remove efi=old_map command line option
  x86/efi: Delete SGI UV1 detection.
  x86/efi: Remove references to no-longer-used efi_have_uv1_memmap()
  x86/platform/uv: Remove uv bios and efi code related to EFI_UV1_MEMMAP
  x86/efi: Remove unused EFI_UV1_MEMMAP code


 arch/x86/include/asm/efi.h |  20 +-
 arch/x86/include/asm/uv/bios.h |   2 +-
 arch/x86/include/asm/uv/uv.h   |   2 +-
 arch/x86/include/asm/uv/uv_bau.h   | 118 +-
 arch/x86/include/asm/uv/uv_hub.h   |  34 +-
 arch/x86/include/asm/uv/uv_mmrs.h  | 712 -
 arch/x86/kernel/apic/x2apic_uv_x.c | 122 ++-
 arch/x86/kernel/kexec-bzimage64.c  |   9 -
 arch/x86/platform/efi/efi.c|  16 +-
 arch/x86/platform/efi/efi_64.c |  38 +-
 arch/x86/platform/efi/quirks.c |  31 --
 arch/x86/platform/uv/bios_uv.c | 173 +
 arch/x86/platform/uv/tlb_uv.c  | 243 ++---
 arch/x86/platform/uv/uv_time.c |  16 +-
 14 files changed, 86 insertions(+), 1450 deletions(-)


[GIT PULL] x86/mm changes for v5.9

2020-08-03 Thread Ingo Molnar
Linus,

Please pull the latest x86/mm git tree from:

   git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git x86-mm-2020-08-03

   # HEAD: 2b32ab031e82a109e2c5b0d30ce563db0fe286b4 x86/mm/64: Make 
sync_global_pgds() static

The biggest change is that the vmalloc and ioremap ranges are no longer
synced on x86-64; instead, the P4D/PUD pages covering the vmalloc area are
pre-allocated at boot.

 Thanks,

Ingo

-->
Joerg Roedel (3):
  x86/mm: Pre-allocate P4D/PUD pages for vmalloc area
  x86/mm/64: Do not sync vmalloc/ioremap mappings
  x86/mm/64: Make sync_global_pgds() static


 arch/x86/include/asm/pgtable_64.h   |  2 --
 arch/x86/include/asm/pgtable_64_types.h |  2 --
 arch/x86/mm/init_64.c   | 59 +
 3 files changed, 53 insertions(+), 10 deletions(-)


[GIT PULL] x86/misc changes for v5.9

2020-08-03 Thread Ingo Molnar
Linus,

Please pull the latest x86/misc git tree from:

   git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git x86-misc-2020-08-03

   # HEAD: a7e1f67ed29f0c339e2aa7483d13b085127566ab x86/msr: Filter MSR writes

Filter MSR writes from user space by default, and print a syslog entry if
they target an MSR outside the allowed set, which currently contains a
single entry: MSR_IA32_ENERGY_PERF_BIAS.

The plan is to eventually disable MSR writes by default (they can still be
enabled via allow_writes=on).
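A sketch of how the knob is expected to be used, assuming the module
parameter is spelled allow_writes as in the commit above (exact paths and
accepted values may differ between kernel versions):

```
# On the kernel command line (msr driver built in):
msr.allow_writes=on

# Or at runtime, through the module parameter in sysfs:
echo on > /sys/module/msr/parameters/allow_writes
```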

Signed-off-by: Ingo Molnar 
 Thanks,

Ingo

-->
Borislav Petkov (1):
  x86/msr: Filter MSR writes


 arch/x86/kernel/msr.c | 69 +++
 1 file changed, 69 insertions(+)



[GIT PULL] x86/microcode change for v5.9

2020-08-03 Thread Ingo Molnar
Linus,

Please pull the latest x86/microcode git tree from:

   git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git 
x86-microcode-2020-08-03

   # HEAD: c8a59a4d8e3c9e609fa915e39c3628c6dd08aeea x86/microcode: Do not 
select FW_LOADER

A single commit that removes the microcode loader's FW_LOADER coupling.

 Thanks,

Ingo

-->
Herbert Xu (1):
  x86/microcode: Do not select FW_LOADER


 arch/x86/Kconfig | 3 ---
 arch/x86/kernel/cpu/microcode/core.c | 2 --
 2 files changed, 5 deletions(-)


[GIT PULL] x86/fpu change for v5.9

2020-08-03 Thread Ingo Molnar
Linus,

Please pull the latest x86/fpu git tree from:

   git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git x86-fpu-2020-08-03

   # HEAD: 4185b3b92792eaec5869266e594338343421ffb0 selftests/fpu: Add an FPU 
selftest

A single commit that adds the /sys/kernel/debug/selftest_helpers/test_fpu FPU 
self-test.

 Thanks,

Ingo

-->
Petteri Aimonen (1):
  selftests/fpu: Add an FPU selftest


 lib/Kconfig.debug   | 11 
 lib/Makefile| 24 
 lib/test_fpu.c  | 89 +
 tools/testing/selftests/Makefile|  1 +
 tools/testing/selftests/fpu/.gitignore  |  2 +
 tools/testing/selftests/fpu/Makefile|  9 +++
 tools/testing/selftests/fpu/run_test_fpu.sh | 46 +++
 tools/testing/selftests/fpu/test_fpu.c  | 61 
 8 files changed, 243 insertions(+)
 create mode 100644 lib/test_fpu.c
 create mode 100644 tools/testing/selftests/fpu/.gitignore
 create mode 100644 tools/testing/selftests/fpu/Makefile
 create mode 100755 tools/testing/selftests/fpu/run_test_fpu.sh
 create mode 100644 tools/testing/selftests/fpu/test_fpu.c


[GIT PULL] x86/cpu changes for v5.9

2020-08-03 Thread Ingo Molnar
Linus,

Please pull the latest x86/cpu git tree from:

   git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git x86-cpu-2020-08-03

   # HEAD: f69ca629d89d65737537e05308ac531f7bb07d5c x86/cpu: Refactor 
sync_core() for readability

Misc changes:

 - Prepare for Intel's new SERIALIZE instruction
 - Enable split-lock debugging on more CPUs
 - Add more Intel CPU models
 - Optimize stack canary initialization a bit
 - Simplify the Spectre logic a bit

  out-of-topic modifications in x86-cpu-2020-08-03:
  ---
  drivers/misc/sgi-gru/grufault.c# 9998a9832c40: x86/cpu: Relocate 
sync_core(
  drivers/misc/sgi-gru/gruhandles.c  # 9998a9832c40: x86/cpu: Relocate 
sync_core(
  drivers/misc/sgi-gru/grukservices.c# 9998a9832c40: x86/cpu: Relocate 
sync_core(

 Thanks,

Ingo

-->
Borislav Petkov (1):
  x86/speculation: Merge one test in spectre_v2_user_select_mitigation()

Brian Gerst (1):
  x86/stackprotector: Pre-initialize canary for secondary CPUs

Fenghua Yu (1):
  x86/split_lock: Enable the split lock feature on Sapphire Rapids and 
Alder Lake CPUs

Ricardo Neri (3):
  x86/cpufeatures: Add enumeration for SERIALIZE instruction
  x86/cpu: Relocate sync_core() to sync_core.h
  x86/cpu: Refactor sync_core() for readability

Tony Luck (1):
  x86/cpu: Add Lakefield, Alder Lake and Rocket Lake models to the to Intel 
CPU family


 arch/x86/include/asm/cpufeatures.h|  1 +
 arch/x86/include/asm/intel-family.h   |  7 
 arch/x86/include/asm/processor.h  | 64 ---
 arch/x86/include/asm/special_insns.h  |  1 -
 arch/x86/include/asm/stackprotector.h | 12 ++
 arch/x86/include/asm/sync_core.h  | 72 +++
 arch/x86/kernel/alternative.c |  1 +
 arch/x86/kernel/cpu/bugs.c| 13 ++-
 arch/x86/kernel/cpu/intel.c   |  2 +
 arch/x86/kernel/cpu/mce/core.c|  1 +
 arch/x86/kernel/smpboot.c | 14 +--
 arch/x86/xen/smp_pv.c |  2 -
 drivers/misc/sgi-gru/grufault.c   |  1 +
 drivers/misc/sgi-gru/gruhandles.c |  1 +
 drivers/misc/sgi-gru/grukservices.c   |  1 +
 15 files changed, 105 insertions(+), 88 deletions(-)

