Re: [PATCH v4 05/15] mm: introduce execmem_alloc() and execmem_free()

2024-04-12 Thread Ingo Molnar


* Mike Rapoport  wrote:

> +/**
> + * enum execmem_type - types of executable memory ranges
> + *
> + * There are several subsystems that allocate executable memory.
> + * Architectures define different restrictions on placement,
> + * permissions, alignment and other parameters for memory that can be used
> + * by these subsystems.
> + * Types in this enum identify subsystems that allocate executable memory
> + * and let architectures define parameters for ranges suitable for
> + * allocations by each subsystem.
> + *
> + * @EXECMEM_DEFAULT: default parameters that would be used for types that
> + * are not explcitly defined.
> + * @EXECMEM_MODULE_TEXT: parameters for module text sections
> + * @EXECMEM_KPROBES: parameters for kprobes
> + * @EXECMEM_FTRACE: parameters for ftrace
> + * @EXECMEM_BPF: parameters for BPF
> + * @EXECMEM_TYPE_MAX:
> + */
> +enum execmem_type {
> + EXECMEM_DEFAULT,
> + EXECMEM_MODULE_TEXT = EXECMEM_DEFAULT,
> + EXECMEM_KPROBES,
> + EXECMEM_FTRACE,
> + EXECMEM_BPF,
> + EXECMEM_TYPE_MAX,
> +};

s/explcitly
 /explicitly
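A minimal userspace sketch of what the enum enables: an architecture fills in parameters for some types, and any type it leaves undefined falls back to EXECMEM_DEFAULT. The struct `execmem_range_sketch`, the table `arch_ranges` and the helper `execmem_range_for()` are hypothetical and are not the API from this series. Only the enumerators come from the patch.

```c
#include <assert.h>

/* Enumerators as in the patch: MODULE_TEXT aliases DEFAULT. */
enum execmem_type {
	EXECMEM_DEFAULT,
	EXECMEM_MODULE_TEXT = EXECMEM_DEFAULT,
	EXECMEM_KPROBES,
	EXECMEM_FTRACE,
	EXECMEM_BPF,
	EXECMEM_TYPE_MAX,
};

/* Hypothetical per-type placement parameters. */
struct execmem_range_sketch {
	unsigned long start;
	unsigned long end;
	unsigned int alignment;
};

/* Hypothetical arch table: only DEFAULT and KPROBES are filled in.
 * Zero-initialized entries mean "not explicitly defined". */
static const struct execmem_range_sketch arch_ranges[EXECMEM_TYPE_MAX] = {
	[EXECMEM_DEFAULT] = { 0x1000000UL, 0x2000000UL, 4096 },
	[EXECMEM_KPROBES] = { 0x3000000UL, 0x3100000UL, 16 },
};

/* Return the range for @type, falling back to EXECMEM_DEFAULT when the
 * architecture left that entry empty. */
static const struct execmem_range_sketch *execmem_range_for(enum execmem_type type)
{
	assert(type < EXECMEM_TYPE_MAX);
	if (arch_ranges[type].end == 0)
		return &arch_ranges[EXECMEM_DEFAULT];
	return &arch_ranges[type];
}
```

Because EXECMEM_MODULE_TEXT has the same value as EXECMEM_DEFAULT, the two share one table slot.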

Thanks,

Ingo


Re: [RFC PATCH 5/7] x86/module: perpare module loading for ROX allocations of text

2024-04-12 Thread Ingo Molnar


* Mike Rapoport  wrote:

>   for (s = start; s < end; s++) {
>   void *addr = (void *)s + *s;
> + void *wr_addr = addr + module_writable_offset(mod, addr);

So instead of repeating this pattern in a dozen places, why not use a 
simpler method:

void *wr_addr = module_writable_address(mod, addr);

or so, since we have to pass 'addr' to the module code anyway.

The text patching code is pretty complex already.
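A minimal sketch of the suggested wrapper. It assumes the writable alias sits at a constant offset from the executable text. `struct module_sketch` and the body of `module_writable_offset()` are hypothetical; only the two helper names come from this thread.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for struct module: text mapped read-only/exec,
 * plus a writable alias at a fixed offset. */
struct module_sketch {
	uintptr_t text_base;
	uintptr_t text_size;
	intptr_t rw_offset;	/* writable alias minus executable address */
};

/* Offset to add to @addr to reach its writable alias (0 outside text). */
static intptr_t module_writable_offset(const struct module_sketch *mod, void *addr)
{
	uintptr_t a = (uintptr_t)addr;

	if (!mod || a < mod->text_base || a >= mod->text_base + mod->text_size)
		return 0;
	return mod->rw_offset;
}

/* The wrapper: callers get the writable address in one call instead of
 * open-coding "addr + module_writable_offset(mod, addr)" everywhere. */
static void *module_writable_address(const struct module_sketch *mod, void *addr)
{
	return (void *)((uintptr_t)addr + (uintptr_t)module_writable_offset(mod, addr));
}
```

A patching loop would then read `void *wr_addr = module_writable_address(mod, addr);`.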

Thanks,

Ingo


Re: [PATCH v11 00/11] Support page table check PowerPC

2024-03-28 Thread Ingo Molnar


* Rohan McLure  wrote:

> Rohan McLure (11):
>   Revert "mm/page_table_check: remove unused parameter in [__]page_table_check_pud_set"
>   Revert "mm/page_table_check: remove unused parameter in [__]page_table_check_pmd_set"
>   Revert "mm/page_table_check: remove unused parameter in [__]page_table_check_pud_clear"
>   Revert "mm/page_table_check: remove unused parameter in [__]page_table_check_pmd_clear"
>   Revert "mm/page_table_check: remove unused parameter in [__]page_table_check_pte_clear"

Just a process request: please give these commits proper titles, they 
are not really 'reverts' in the classical sense, and this title hides 
what is being done in the commit. The typical use of reverts is to 
revert a bad change because it broke something. Here the goal is to 
reintroduce functionality.

So please name these 5 patches accordingly, to shed light on what is 
being reintroduced. You can mention it at the end of the changelog that 
it's a functional revert of commit XYZ, but that's not the primary 
purpose of the commit.

Thanks,

Ingo


Re: [PATCH 3/8] arch/x86: Remove sentinel elem from ctl_table arrays

2023-09-06 Thread Ingo Molnar


* Dave Hansen  wrote:

> On 9/6/23 03:03, Joel Granados via B4 Relay wrote:
> > This commit comes at the tail end of a greater effort to remove the
> > empty elements at the end of the ctl_table arrays (sentinels) which
> > will reduce the overall build time size of the kernel and run time
> > memory bloat by ~64 bytes per sentinel (further information Link :
> > https://lore.kernel.org/all/zo5yx5jfoggi%2f...@bombadil.infradead.org/)
> > 
> > Remove sentinel element from sld_sysctl and itmt_kern_table.
> 
> There's a *LOT* of content to read for a reviewer to figure out what's
> going on here between all the links.  I would have appreciated one more
> sentence here, maybe:
> 
>   This is now safe because the sysctl registration code
>   (register_sysctl()) implicitly uses ARRAY_SIZE() in addition
>   to checking for a sentinel.
> 
> That needs to be more prominent _somewhere_.  Maybe here, or maybe in
> the cover letter, but _somewhere_.
> 
> That said, feel free to add this to the two x86 patches:
> 
> Acked-by: Dave Hansen  # for x86

Absolutely needs to be in the title as well, something like:

   arch/x86: Remove now superfluous sentinel elem from ctl_table arrays

With that propagated into the whole series:

   Reviewed-by: Ingo Molnar 
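A simplified model of why the sentinel becomes superfluous, following Dave's suggested sentence: the walk is bounded by ARRAY_SIZE() and still stops at a sentinel, so both table layouts register the same entries. This is a sketch, not the actual register_sysctl() code. `struct ctl_table_sketch`, `count_entries()` and the "example_knob" entry are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical, heavily simplified stand-in for struct ctl_table. */
struct ctl_table_sketch {
	const char *procname;
	int mode;
};

/* Count usable entries: stop at table_size, or earlier at a sentinel
 * (an entry with a NULL procname), whichever comes first. */
static size_t count_entries(const struct ctl_table_sketch *table, size_t table_size)
{
	size_t n = 0;

	while (n < table_size && table[n].procname)
		n++;
	return n;
}

static const struct ctl_table_sketch with_sentinel[] = {
	{ "example_knob", 0644 },
	{ NULL, 0 },		/* sentinel: 64 bytes per table in the real struct */
};

static const struct ctl_table_sketch without_sentinel[] = {
	{ "example_knob", 0644 },
};
```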

Thanks,

Ingo


Re: linux-next: boot failure after merge of the tip tree

2023-05-26 Thread Ingo Molnar


* Peter Zijlstra  wrote:

> On Wed, May 24, 2023 at 03:44:59PM +1000, Stephen Rothwell wrote:
> > Hi all,
> > 
> > After merging the tip tree, today's linux-next build (powerpc
> > pseries_le_defconfig) failed to boot like this:
> > 
> >  Built 1 zonelists, mobility grouping on.  Total pages: 32736
> >  Policy zone: Normal
> >  mem auto-init: stack:all(zero), heap alloc:off, heap free:off
> >  Memory: 2027392K/2097152K available (17984K kernel code, 3328K rwdata, 14784K rodata, 6080K init, 1671K bss, 69760K reserved, 0K cma-reserved)
> > 
> > *crickets*
> > 
> > Bisected to commit
> > 
> >   f4ab23558310 ("slub: Replace cmpxchg_double()")
> > 
> > I have reverted that commit (and the following one) for today.
> 
> Sorry about that; turns out I forgot to test the case where cmpxchg128
> wasn't available.
> 
> Please see:
> 
>   
> https://lkml.kernel.org/r/20230524093246.gp83...@hirez.programming.kicks-ass.net
> 
> As stated there, I'm going to zap tip/locking/core for a few days and
> let this fixed version run through the robots again before re-instating
> it.

Note to -next, this should now be resolved in the tip:auto-latest branch.

Thanks,

Ingo


Re: [PATCH 08/41] mm: introduce CONFIG_PER_VMA_LOCK

2023-01-11 Thread Ingo Molnar


* Michal Hocko  wrote:

> On Tue 10-01-23 16:44:42, Suren Baghdasaryan wrote:
> > On Tue, Jan 10, 2023 at 4:39 PM Davidlohr Bueso  wrote:
> > >
> > > On Mon, 09 Jan 2023, Suren Baghdasaryan wrote:
> > >
> > > >This configuration variable will be used to build the support for VMA
> > > >locking during page fault handling.
> > > >
> > > >This is enabled by default on supported architectures with SMP and MMU
> > > >set.
> > > >
> > > >The architecture support is needed since the page fault handler is called
> > > >from the architecture's page faulting code which needs modifications to
> > > >handle faults under VMA lock.
> > >
> > > I don't think that per-vma locking should be something that is 
> > > user-configurable.
> > > It should just be depdendant on the arch. So maybe just remove 
> > > CONFIG_PER_VMA_LOCK?
> > 
> > Thanks for the suggestion! I would be happy to make that change if
> > there are no objections. I think the only pushback might have been the
> > vma size increase but with the latest optimization in the last patch
> > maybe that's less of an issue?
> 
> Has vma size ever been a real problem? Sure there might be a lot of those 
> but your patch increases it by rwsem (without the last patch) which is 
> something like 40B on top of 136B vma so we are talking about 400B in 
> total which even with wild mapcount limits shouldn't really be 
> prohibitive. With a default map count limit we are talking about 2M 
> increase at most (per address space).
> 
> Or are you aware of any specific usecases where vma size is a real 
> problem?

40 bytes for the rwsem, plus the patch also adds a 32-bit sequence counter:

  + int vm_lock_seq;
  + struct rw_semaphore lock;

So it's +44 bytes.
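As a rough worked check of the per-address-space cost, here is a small calculation. It assumes the default vm.max_map_count of 65530 and ignores structure padding and slab rounding, both of which can make the real figure larger.

```c
#include <assert.h>

/* Default vm.max_map_count. */
#define DEFAULT_MAX_MAP_COUNT 65530UL

/* Worst-case extra bytes per address space if every VMA grows by
 * @delta bytes and the process uses the maximum number of mappings. */
static unsigned long vma_growth_bytes(unsigned long delta)
{
	return delta * DEFAULT_MAX_MAP_COUNT;
}
```

With +40 bytes this gives 2,621,200 bytes (about 2.5 MiB). With +44 bytes it gives 2,883,320 bytes (about 2.75 MiB).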

Thanks,

Ingo


Re: [PATCH] objtool: continue if find_insn() fails in decode_instructions()

2023-01-09 Thread Ingo Molnar


* Sathvika Vasireddy  wrote:

> Hi Ingo, Happy New Year!

Happy New Year to you too! :-)

> On 07/01/23 15:51, Ingo Molnar wrote:
> > * Sathvika Vasireddy  wrote:
> > 
> > > Currently, decode_instructions() is failing if it is not able to find
> > > instruction, and this is happening since commit dbcdbdfdf137b4
> > > ("objtool: Rework instruction -> symbol mapping") because it is
> > > expecting instruction for STT_NOTYPE symbols.
> > > 
> > > Due to this, the following objtool warnings are seen:
> > >   [1] arch/powerpc/kernel/optprobes_head.o: warning: objtool: optprobe_template_end(): can't find starting instruction
> > >   [2] arch/powerpc/kernel/kvm_emul.o: warning: objtool: kvm_template_end(): can't find starting instruction
> > >   [3] arch/powerpc/kernel/head_64.o: warning: objtool: end_first_256B(): can't find starting instruction
> > > 
> > > The warnings are thrown because find_insn() is failing for symbols that
> > > are at the end of the file, or at the end of the section. Given how
> > > STT_NOTYPE symbols are currently handled in decode_instructions(),
> > > continue if the instruction is not found, instead of throwing warning
> > > and returning.
> > > 
> > > Signed-off-by: Naveen N. Rao 
> > > Signed-off-by: Sathvika Vasireddy 
> > The SOB chain doesn't look valid: is Naveen N. Rao, the first SOB line, the
> > author of the patch? If yes then a matching From: line is needed.
> > 
> > Or if two people developed the patch, then Co-developed-by should be used:
> > 
> >  Co-developed-by: First Co-Author 
> >  Signed-off-by: First Co-Author 
> >  Co-developed-by: Second Co-Author 
> >  Signed-off-by: Second Co-Author 
> > 
> > [ In this SOB sequence "Second Co-Author" is the one who submits the patch. ]
> > 
> > [ Please only use Co-developed-by if actual lines of code were written by
> >   the co-author that created copyrightable material - it's not a courtesy
> >   tag. Reviewed-by/Acked-by/Tested-by can be used to credit non-code
> >   contributions. ]
> Thank you for the clarification, and for bringing these points to my
> attention. I'll keep these things in mind. In this case, since both Naveen
> N. Rao and I developed the patch, the below tags
> are applicable.
> 
>     Co-developed-by: First Co-Author 
>     Signed-off-by: First Co-Author 
>     Co-developed-by: Second Co-Author 
>     Signed-off-by: Second Co-Author 

... while filling in your real names & email addresses I suppose. ;-)

> 
> However, I would be dropping this particular patch, since I think Nick's
> patch [1] is better to fix the objtool issue.
> 
> [1] - 
> https://lore.kernel.org/linuxppc-dev/20221220101323.3119939-1-npig...@gmail.com/

Ok, I'll pick up Nick's fix, with these tags added for the PowerPC 
regression aspect and your review:

  Reported-by: Naveen N. Rao 
  Reported-by: Sathvika Vasireddy 
  Acked-by: Sathvika Vasireddy 

To document & credit the efforts of your patch.

Thanks,

Ingo


Re: [PATCH v3 3/8] sched, smp: Trace IPIs sent via send_call_function_single_ipi()

2023-01-07 Thread Ingo Molnar


* Valentin Schneider  wrote:

> send_call_function_single_ipi() is the thing that sends IPIs at the bottom
> of smp_call_function*() via either generic_exec_single() or
> smp_call_function_many_cond(). Give it an IPI-related tracepoint.
> 
> Note that this ends up tracing any IPI sent via __smp_call_single_queue(),
> which covers __ttwu_queue_wakelist() and irq_work_queue_on() "for free".
> 
> Signed-off-by: Valentin Schneider 
> Reviewed-by: Steven Rostedt (Google) 

Acked-by: Ingo Molnar 

Patch series logistics:

 - No objections from the scheduler side, this feature looks pretty useful.

 - Certain patches are incomplete, others are noted as being merged 
   separately, so I presume you'll send an updated/completed series 
   eventually?

 - We can merge this via the scheduler tree I suspect, as most callbacks 
   affected relate to tip:sched/core and tip:smp/core - but if you have 
   some other preferred tree that's fine too.

Thanks,

Ingo


Re: [PATCH] objtool: continue if find_insn() fails in decode_instructions()

2023-01-07 Thread Ingo Molnar


* Sathvika Vasireddy  wrote:

> Currently, decode_instructions() is failing if it is not able to find
> instruction, and this is happening since commit dbcdbdfdf137b4
> ("objtool: Rework instruction -> symbol mapping") because it is
> expecting instruction for STT_NOTYPE symbols.
> 
> Due to this, the following objtool warnings are seen:
>  [1] arch/powerpc/kernel/optprobes_head.o: warning: objtool: optprobe_template_end(): can't find starting instruction
>  [2] arch/powerpc/kernel/kvm_emul.o: warning: objtool: kvm_template_end(): can't find starting instruction
>  [3] arch/powerpc/kernel/head_64.o: warning: objtool: end_first_256B(): can't find starting instruction
> 
> The warnings are thrown because find_insn() is failing for symbols that
> are at the end of the file, or at the end of the section. Given how
> STT_NOTYPE symbols are currently handled in decode_instructions(),
> continue if the instruction is not found, instead of throwing warning
> and returning.
> 
> Signed-off-by: Naveen N. Rao 
> Signed-off-by: Sathvika Vasireddy 

The SOB chain doesn't look valid: is Naveen N. Rao, the first SOB line, the 
author of the patch? If yes then a matching From: line is needed.

Or if two people developed the patch, then Co-developed-by should be used:

Co-developed-by: First Co-Author 
Signed-off-by: First Co-Author 
Co-developed-by: Second Co-Author 
Signed-off-by: Second Co-Author 

[ In this SOB sequence "Second Co-Author" is the one who submits the patch. ]

[ Please only use Co-developed-by if actual lines of code were written by 
  the co-author that created copyrightable material - it's not a courtesy 
  tag. Reviewed-by/Acked-by/Tested-by can be used to credit non-code 
  contributions. ]

Thanks,

Ingo


Re: [PATCH v3] mm: Avoid unnecessary page fault retires on shared memory types

2022-05-27 Thread Ingo Molnar



* Peter Xu  wrote:

> This patch provides a ~12% perf boost on my aarch64 test VM with a simple
> program sequentially dirtying 400MB shmem file being mmap()ed and these are
> the time it needs:
>
>   Before: 650.980 ms (+-1.94%)
>   After:  569.396 ms (+-1.38%)

Nice!
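A quick worked check of the quoted numbers: the "~12%" corresponds to the reduction in run time, and the same data expressed as a throughput speedup is about 1.14x. The helper names are illustrative only.

```c
#include <assert.h>

/* Fractional reduction in run time: (before - after) / before. */
static double time_reduction(double before_ms, double after_ms)
{
	assert(before_ms > 0.0);
	return (before_ms - after_ms) / before_ms;
}

/* Throughput speedup factor: before / after. */
static double speedup(double before_ms, double after_ms)
{
	assert(after_ms > 0.0);
	return before_ms / after_ms;
}
```

For 650.980 ms before and 569.396 ms after, this gives a reduction of about 0.125 and a speedup of about 1.143.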

>  arch/x86/mm/fault.c   |  4 

Reviewed-by: Ingo Molnar 

Minor comment typo:

> + /*
> +  * We should do the same as VM_FAULT_RETRY, but let's not
> +  * return -EBUSY since that's not reflecting the reality on
> +  * what has happened - we've just fully completed a page
> +  * fault, with the mmap lock released.  Use -EAGAIN to show
> +  * that we want to take the mmap lock _again_.
> +  */

s/reflecting the reality on what has happened
 /reflecting the reality of what has happened

>   ret = handle_mm_fault(vma, address, fault_flags, NULL);
> +
> + if (ret & VM_FAULT_COMPLETED) {
> + /*
> +  * NOTE: it's a pity that we need to retake the lock here
> +  * to pair with the unlock() in the callers. Ideally we
> +  * could tell the callers so they do not need to unlock.
> +  */
> + mmap_read_lock(mm);
> + *unlocked = true;
> + return 0;

Indeed that's a pity - I guess more performance could be gained here, 
especially in highly parallel threaded workloads?
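A toy model of the lock pairing in the quoted hunk. When the handler completes the fault with the lock already dropped, the helper retakes it only so that the caller's unconditional unlock stays balanced. The lock here is a plain reader counter, and every `_sketch` name is hypothetical, not the mm API.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for mm->mmap_lock: just a reader count. */
struct mm_sketch {
	int readers;
};

static void mmap_read_lock_sketch(struct mm_sketch *mm)   { mm->readers++; }
static void mmap_read_unlock_sketch(struct mm_sketch *mm) { assert(mm->readers > 0); mm->readers--; }

/* Hypothetical handler that may finish the fault after dropping the
 * lock itself (the VM_FAULT_COMPLETED case). */
static bool handle_fault_sketch(struct mm_sketch *mm, bool completes_unlocked)
{
	if (completes_unlocked) {
		mmap_read_unlock_sketch(mm);	/* lock released inside */
		return true;
	}
	return false;
}

/* Mirrors the quoted hunk: on completion, retake the lock purely to pair
 * with the caller's unlock, and report that it was dropped meanwhile. */
static int fixup_fault_sketch(struct mm_sketch *mm, bool completes_unlocked, bool *unlocked)
{
	if (handle_fault_sketch(mm, completes_unlocked)) {
		mmap_read_lock_sketch(mm);
		*unlocked = true;
	}
	return 0;
}
```

The lock/unlock pair added on the completion path is pure overhead. The alternative mentioned in the NOTE would let callers check `*unlocked` and skip their unlock instead.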

Thanks,

Ingo


Re: remove the last set_fs() in common code, and remove it for x86 and powerpc v3

2020-09-04 Thread Ingo Molnar


* Christoph Hellwig  wrote:

> Hi all,
> 
> this series removes the last set_fs() used to force a kernel address
> space for the uaccess code in the kernel read/write/splice code, and then
> stops implementing the address space overrides entirely for x86 and
> powerpc.

Cool! For the x86 bits:

  Acked-by: Ingo Molnar 

Thanks,

Ingo


Re: [PATCH v2 17/17] memblock: use separate iterators for memory and reserved regions

2020-08-02 Thread Ingo Molnar


* Mike Rapoport  wrote:

> From: Mike Rapoport 
> 
> for_each_memblock() is used to iterate over memblock.memory in
> a few places that use data from memblock_region rather than the memory
> ranges.
> 
> Introduce separate for_each_mem_region() and for_each_reserved_mem_region()
> to improve encapsulation of memblock internals from its users.
> 
> Signed-off-by: Mike Rapoport 
> ---
>  .clang-format  |  3 ++-
>  arch/arm64/kernel/setup.c  |  2 +-
>  arch/arm64/mm/numa.c   |  2 +-
>  arch/mips/netlogic/xlp/setup.c |  2 +-
>  arch/x86/mm/numa.c |  2 +-
>  include/linux/memblock.h   | 19 ---
>  mm/memblock.c  |  4 ++--
>  mm/page_alloc.c|  8 
>  8 files changed, 28 insertions(+), 14 deletions(-)

The x86 part:

Acked-by: Ingo Molnar 
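A simplified sketch of the encapsulation idea: callers name the region set they want instead of reaching into memblock.memory or memblock.reserved. The real iterators work on the global memblock and take only the region cursor. These hypothetical `_sketch` macros take an explicit pointer so that the example is self-contained.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified copies of the memblock bookkeeping types. */
struct memblock_region_sketch {
	unsigned long base;
	unsigned long size;
};

struct memblock_type_sketch {
	size_t cnt;
	struct memblock_region_sketch *regions;
};

struct memblock_sketch {
	struct memblock_type_sketch memory;
	struct memblock_type_sketch reserved;
};

#define for_each_region_of(type, r) \
	for ((r) = (type)->regions; (r) < (type)->regions + (type)->cnt; (r)++)
#define for_each_mem_region_sketch(mb, r)          for_each_region_of(&(mb)->memory, r)
#define for_each_reserved_mem_region_sketch(mb, r) for_each_region_of(&(mb)->reserved, r)

/* Sum region sizes of either set without touching the internals directly. */
static unsigned long total_size(struct memblock_sketch *mb, int reserved)
{
	struct memblock_region_sketch *r;
	unsigned long sum = 0;

	if (reserved) {
		for_each_reserved_mem_region_sketch(mb, r)
			sum += r->size;
	} else {
		for_each_mem_region_sketch(mb, r)
			sum += r->size;
	}
	return sum;
}
```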

Thanks,

Ingo


Re: [PATCH v2 14/17] x86/setup: simplify reserve_crashkernel()

2020-08-02 Thread Ingo Molnar


* Mike Rapoport  wrote:

> From: Mike Rapoport 
> 
> * Replace magic numbers with defines
> * Replace memblock_find_in_range() + memblock_reserve() with
>   memblock_phys_alloc_range()
> * Stop checking for low memory size in reserve_crashkernel_low(). The
>   allocation from limited range will anyway fail if there is no enough
>   memory, so there is no need for extra traversal of memblock.memory
> 
> Signed-off-by: Mike Rapoport 

Assuming that this got or will get tested with a crash kernel:

Acked-by: Ingo Molnar 
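A toy model of why the separate low-memory size check becomes unnecessary: a combined "find and reserve in range" allocator reports failure itself by returning 0. The allocator below only stands in for memblock_phys_alloc_range(); it is not its implementation. The alignment value and all `_sketch` names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Named constant instead of a magic number (value illustrative only). */
#define CRASH_ALIGN_SKETCH (16ULL << 20)	/* 16 MiB */

/* Hypothetical toy physical memory: one free span [free_start, free_end). */
struct phys_mem_sketch {
	uint64_t free_start;
	uint64_t free_end;
};

/* Find and reserve in one step. Returns 0 when nothing fits, so callers
 * need no prior "is there enough memory?" check. */
static uint64_t phys_alloc_range_sketch(struct phys_mem_sketch *pm, uint64_t size,
					uint64_t align, uint64_t start, uint64_t end)
{
	uint64_t base = pm->free_start > start ? pm->free_start : start;

	assert(align && !(align & (align - 1)));	/* power of two */
	base = (base + align - 1) & ~(align - 1);
	if (base + size > end || base + size > pm->free_end)
		return 0;
	pm->free_start = base + size;	/* "reserve" the range */
	return base;
}
```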

Thanks,

Ingo


Re: [PATCH v2 13/17] x86/setup: simplify initrd relocation and reservation

2020-08-02 Thread Ingo Molnar


* Mike Rapoport  wrote:

> From: Mike Rapoport 
> 
> Currently, initrd image is reserved very early during setup and then it
> might be relocated and re-reserved after the initial physical memory
> mapping is created. The "late" reservation of memblock verifies that mapped
> memory size exceeds the size of initrd, the checks whether the relocation
> required and, if yes, relocates inirtd to a new memory allocated from
> memblock and frees the old location.
> 
> The check for memory size is excessive as memblock allocation will anyway
> fail if there is not enough memory. Besides, there is no point to allocate
> memory from memblock using memblock_find_in_range() + memblock_reserve()
> when there exists memblock_phys_alloc_range() with required functionality.
> 
> Remove the redundant check and simplify memblock allocation.
> 
> Signed-off-by: Mike Rapoport 

Assuming there's no hidden dependency here breaking something:

  Acked-by: Ingo Molnar 
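A simplified sketch of the relocation flow described in the changelog. Relocation is needed only when the image is not fully below the mapped limit, and the allocation itself signals "not enough memory". It assumes a trivial bump allocator in place of memblock. Copying the image and freeing the old range are omitted. All names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical description of where the boot loader placed the initrd. */
struct initrd_sketch {
	uint64_t start;
	uint64_t size;
};

static bool initrd_needs_relocation(const struct initrd_sketch *rd, uint64_t mapped_limit)
{
	return rd->start + rd->size > mapped_limit;
}

/* Hypothetical bump allocator below @limit; returns 0 on failure. */
static uint64_t alloc_below_sketch(uint64_t *next_free, uint64_t size, uint64_t limit)
{
	uint64_t base = *next_free;

	if (base + size > limit)
		return 0;
	*next_free = base + size;
	return base;
}

/* No up-front size check: a failed allocation is the only error path. */
static bool relocate_initrd_sketch(struct initrd_sketch *rd, uint64_t mapped_limit,
				   uint64_t *next_free)
{
	uint64_t new_start;

	if (!initrd_needs_relocation(rd, mapped_limit))
		return true;		/* already usable where it is */
	new_start = alloc_below_sketch(next_free, rd->size, mapped_limit);
	if (!new_start)
		return false;
	rd->start = new_start;		/* copy + free of old range omitted */
	return true;
}
```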

Thanks,

Ingo


Re: [PATCH 14/15] x86/numa: remove redundant iteration over memblock.reserved

2020-07-28 Thread Ingo Molnar


* Mike Rapoport  wrote:

> On Tue, Jul 28, 2020 at 12:44:40PM +0200, Ingo Molnar wrote:
> > 
> > * Mike Rapoport  wrote:
> > 
> > > From: Mike Rapoport 
> > > 
> > > numa_clear_kernel_node_hotplug() function first traverses numa_meminfo
> > > regions to set node ID in memblock.reserved and than traverses
> > > memblock.reserved to update reserved_nodemask to include node IDs that 
> > > were
> > > set in the first loop.
> > > 
> > > Remove redundant traversal over memblock.reserved and update
> > > reserved_nodemask while iterating over numa_meminfo.
> > > 
> > > Signed-off-by: Mike Rapoport 
> > > ---
> > >  arch/x86/mm/numa.c | 26 ++
> > >  1 file changed, 10 insertions(+), 16 deletions(-)
> > 
> > I suspect you'd like to carry this in the -mm tree?
> 
> Yes.
>  
> > Acked-by: Ingo Molnar 
> 
> Thanks!

Assuming it is correct and works. :-)

Thanks,

Ingo


Re: [PATCH 14/15] x86/numa: remove redundant iteration over memblock.reserved

2020-07-28 Thread Ingo Molnar


* Mike Rapoport  wrote:

> From: Mike Rapoport 
> 
> numa_clear_kernel_node_hotplug() function first traverses numa_meminfo
> regions to set node ID in memblock.reserved and than traverses
> memblock.reserved to update reserved_nodemask to include node IDs that were
> set in the first loop.
> 
> Remove redundant traversal over memblock.reserved and update
> reserved_nodemask while iterating over numa_meminfo.
> 
> Signed-off-by: Mike Rapoport 
> ---
>  arch/x86/mm/numa.c | 26 ++
>  1 file changed, 10 insertions(+), 16 deletions(-)

I suspect you'd like to carry this in the -mm tree?

Acked-by: Ingo Molnar 
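A simplified single-pass model of the idea in the changelog: the node mask is updated while iterating over the NUMA blocks, instead of being rebuilt in a second walk over the reserved regions. It assumes a node should be marked only when at least one reserved region falls inside one of its blocks, which keeps the result equal to the two-pass version. All types and names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_NUMNODES_SKETCH 8

/* Hypothetical simplified types. */
struct meminfo_blk_sketch { unsigned long start, end; int nid; };
struct reserved_rgn_sketch { unsigned long base, size; int nid; };

/* While tagging each reserved region with the node that contains it,
 * also set that node's bit in *nodemask. */
static void tag_reserved_and_mask(const struct meminfo_blk_sketch *mi, size_t nr_mi,
				  struct reserved_rgn_sketch *rsv, size_t nr_rsv,
				  unsigned long *nodemask)
{
	for (size_t i = 0; i < nr_mi; i++) {
		assert(mi[i].nid >= 0 && mi[i].nid < MAX_NUMNODES_SKETCH);
		for (size_t j = 0; j < nr_rsv; j++) {
			if (rsv[j].base >= mi[i].start && rsv[j].base < mi[i].end) {
				rsv[j].nid = mi[i].nid;
				*nodemask |= 1UL << mi[i].nid;
			}
		}
	}
}
```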

Thanks,

Ingo


Re: [PATCH] lockdep: Fix TRACE_IRQFLAGS vs NMIs

2020-07-27 Thread Ingo Molnar


* pet...@infradead.org  wrote:

> 
> Prior to commit 859d069ee1dd ("lockdep: Prepare for NMI IRQ state
> tracking") IRQ state tracking was disabled in NMIs due to nmi_enter()
> doing lockdep_off() -- with the obvious requirement that NMI entry
> call nmi_enter() before trace_hardirqs_off().
> 
> [ afaict, PowerPC and SH violate this order on their NMI entry ]
> 
> However, that commit explicitly changed lockdep_hardirqs_*() to ignore
> lockdep_off() and breaks every architecture that has irq-tracing in
> it's NMI entry that hasn't been fixed up (x86 being the only fixed one
> at this point).
> 
> The reason for this change is that by ignoring lockdep_off() we can:
> 
>   - get rid of 'current->lockdep_recursion' in lockdep_assert_irqs*()
> which was going to to give header-recursion issues with the
> seqlock rework.
> 
>   - allow these lockdep_assert_*() macros to function in NMI context.
> 
> Restore the previous state of things and allow an architecture to
> opt-in to the NMI IRQ tracking support, however instead of relying on
> lockdep_off(), rely on in_nmi(), both are part of nmi_enter() and so
> over-all entry ordering doesn't need to change.
> 
> Signed-off-by: Peter Zijlstra (Intel) 
> ---
>  arch/x86/Kconfig.debug   |3 +++
>  kernel/locking/lockdep.c |8 +++-
>  lib/Kconfig.debug|6 ++
>  3 files changed, 16 insertions(+), 1 deletion(-)

Tree management side note: to apply this I've created a new 
tip:locking/nmi branch, which is based off the existing NMI vs. IRQ 
tracing commits included in locking/core:

ed00495333cc: ("locking/lockdep: Fix TRACE_IRQFLAGS vs. NMIs")
ba1f2b2eaa2a: ("x86/entry: Fix NMI vs IRQ state tracking")
859d069ee1dd: ("lockdep: Prepare for NMI IRQ state tracking")
248591f5d257: ("kcsan: Make KCSAN compatible with new IRQ state tracking")
e1bcad609f5a: ("Merge branch 'tip/x86/entry'")
b037b09b9058: ("x86/entry: Rename idtentry_enter/exit_cond_rcu() to idtentry_enter/exit()")
dcb7fd82c75e: ("Linux 5.8-rc4")

This locking/nmi branch can then be merged into irq/entry (there's a 
bunch of conflicts between them), without coupling all of v5.9's 
locking changes to Thomas's generic entry work.
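A toy model of the behaviour described in the quoted changelog. IRQ-state bookkeeping is skipped in NMI context unless the architecture has opted in, and the check is keyed on the in_nmi() state set by nmi_enter() rather than on lockdep_off(). The opt-in macro and every `_sketch` name are hypothetical. The patch itself expresses the opt-in through Kconfig.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical opt-in switch; defaults to "not opted in". */
#ifndef ARCH_OPTS_IN_NMI_IRQ_TRACKING
#define ARCH_OPTS_IN_NMI_IRQ_TRACKING 0
#endif

struct cpu_state_sketch {
	bool in_nmi;		/* set by the nmi_enter() analogue */
	bool hardirqs_enabled;	/* lockdep's view of the IRQ state */
};

/* Model of a lockdep_hardirqs_off()-style hook. */
static void hardirqs_off_sketch(struct cpu_state_sketch *cpu)
{
	if (cpu->in_nmi && !ARCH_OPTS_IN_NMI_IRQ_TRACKING)
		return;		/* NMI entry order not guaranteed: ignore */
	cpu->hardirqs_enabled = false;
}
```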

Thanks,

Ingo


Re: [PATCH V9] mm/debug: Add tests validating architecture page table helpers

2019-11-11 Thread Ingo Molnar


* Anshuman Khandual  wrote:

> +config DEBUG_VM_PGTABLE
> + bool "Debug arch page table for semantics compliance"
> + depends on MMU
> + depends on DEBUG_VM
> + depends on ARCH_HAS_DEBUG_VM_PGTABLE
> + help
> +   This option provides a debug method which can be used to test
> +   architecture page table helper functions on various platforms in
> +   verifying if they comply with expected generic MM semantics. This
> +   will help architecture code in making sure that any changes or
> +   new additions of these helpers still conform to expected
> +   semantics of the generic MM.
> +
> +   If unsure, say N.

Since CONFIG_DEBUG_VM is generally disabled in distro kernel packages, 
why not make this 'default y' to maximize testing coverage? The code size 
increase should be minimal, and DEBUG_VM increases size anyway.

Other than that this looks good to me:

  Reviewed-by: Ingo Molnar 

Thanks,

Ingo


Re: [PATCH V4 2/2] mm/pgtable/debug: Add test validating architecture page table helpers

2019-10-07 Thread Ingo Molnar


* Kirill A. Shutemov  wrote:

> On Mon, Oct 07, 2019 at 03:06:17PM +0200, Ingo Molnar wrote:
> > 
> > * Anshuman Khandual  wrote:
> > 
> > > This adds a test module which will validate architecture page table helpers
> > > and accessors regarding compliance with generic MM semantics expectations.
> > > This will help various architectures in validating changes to the existing
> > > page table helpers or addition of new ones.
> > > 
> > > Test page table and memory pages creating it's entries at various level are
> > > all allocated from system memory with required alignments. If memory pages
> > > with required size and alignment could not be allocated, then all depending
> > > individual tests are skipped.
> > 
> > > diff --git a/arch/x86/include/asm/pgtable_64_types.h b/arch/x86/include/asm/pgtable_64_types.h
> > > index 52e5f5f2240d..b882792a3999 100644
> > > --- a/arch/x86/include/asm/pgtable_64_types.h
> > > +++ b/arch/x86/include/asm/pgtable_64_types.h
> > > @@ -40,6 +40,8 @@ static inline bool pgtable_l5_enabled(void)
> > >  #define pgtable_l5_enabled() 0
> > >  #endif /* CONFIG_X86_5LEVEL */
> > >  
> > > +#define mm_p4d_folded(mm) (!pgtable_l5_enabled())
> > > +
> > >  extern unsigned int pgdir_shift;
> > >  extern unsigned int ptrs_per_p4d;
> > 
> > Any deep reason this has to be a macro instead of proper C?
> 
> It's a way to override the generic mm_p4d_folded(). It can be rewritten
> as inline function + define. Something like:
> 
> #define mm_p4d_folded mm_p4d_folded
> static inline bool mm_p4d_folded(struct mm_struct *mm)
> {
>   return !pgtable_l5_enabled();
> }
> 
> But I don't see much reason to be more verbose here than needed.

C type checking? Documentation? Yeah, I know it's just a one-liner, but 
the principle of death by a thousand cuts applies here.
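A sketch contrasting the two forms. The macro never evaluates or type-checks its argument, while the inline function does. It also shows the self-referential `#define` override idiom from Kirill's example. The `_sketch` names are hypothetical, and the generic fallback is simplified rather than copied from the kernel.

```c
#include <assert.h>
#include <stdbool.h>

struct mm_struct_sketch { int id; };	/* hypothetical stand-in */

static bool l5_enabled = false;		/* stub for pgtable_l5_enabled() */
static bool pgtable_l5_enabled_sketch(void) { return l5_enabled; }

/* Macro form: @mm is discarded unchecked, so MM_P4D_FOLDED_MACRO(42)
 * or MM_P4D_FOLDED_MACRO("oops") would compile just as well. */
#define MM_P4D_FOLDED_MACRO(mm) (!pgtable_l5_enabled_sketch())

/* Arch override: the self-referential #define tells the generic header
 * below that an arch version exists; @mm must be the right type. */
#define mm_p4d_folded_sketch mm_p4d_folded_sketch
static inline bool mm_p4d_folded_sketch(struct mm_struct_sketch *mm)
{
	(void)mm;
	return !pgtable_l5_enabled_sketch();
}

/* Simplified generic fallback, skipped because the arch defined it. */
#ifndef mm_p4d_folded_sketch
static inline bool mm_p4d_folded_sketch(struct mm_struct_sketch *mm)
{
	(void)mm;
	return false;
}
#endif
```

Both forms return the same value, but passing a non-`struct mm_struct_sketch *` to the inline version gets diagnosed by the compiler.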

BTW., any reason this must be in the low level pgtable_64_types.h type 
header, instead of one of the API level header files?

Thanks,

Ingo


Re: [PATCH V4 2/2] mm/pgtable/debug: Add test validating architecture page table helpers

2019-10-07 Thread Ingo Molnar


* Anshuman Khandual  wrote:

> This adds a test module which will validate architecture page table helpers
> and accessors regarding compliance with generic MM semantics expectations.
> This will help various architectures in validating changes to the existing
> page table helpers or addition of new ones.
> 
> Test page table and memory pages creating it's entries at various level are
> all allocated from system memory with required alignments. If memory pages
> with required size and alignment could not be allocated, then all depending
> individual tests are skipped.

> diff --git a/arch/x86/include/asm/pgtable_64_types.h b/arch/x86/include/asm/pgtable_64_types.h
> index 52e5f5f2240d..b882792a3999 100644
> --- a/arch/x86/include/asm/pgtable_64_types.h
> +++ b/arch/x86/include/asm/pgtable_64_types.h
> @@ -40,6 +40,8 @@ static inline bool pgtable_l5_enabled(void)
>  #define pgtable_l5_enabled() 0
>  #endif /* CONFIG_X86_5LEVEL */
>  
> +#define mm_p4d_folded(mm) (!pgtable_l5_enabled())
> +
>  extern unsigned int pgdir_shift;
>  extern unsigned int ptrs_per_p4d;

Any deep reason this has to be a macro instead of proper C?

> diff --git a/mm/Kconfig.debug b/mm/Kconfig.debug
> index 327b3ebf23bf..683131b1ee7d 100644
> --- a/mm/Kconfig.debug
> +++ b/mm/Kconfig.debug
> @@ -117,3 +117,18 @@ config DEBUG_RODATA_TEST
>  depends on STRICT_KERNEL_RWX
>  ---help---
>This option enables a testcase for the setting rodata read-only.
> +
> +config DEBUG_ARCH_PGTABLE_TEST
> + bool "Test arch page table helpers for semantics compliance"
> + depends on MMU
> + depends on DEBUG_KERNEL
> + depends on !(ARM || IA64)

Please add a proper enabling switch for architectures to opt in.

Please also add it to Documentation/features/list-arch.sh so that it's 
listed as a 'TODO' entry on architectures where the tests are not enabled 
yet.

> + help
> +   This options provides a kernel module which can be used to test
> +   architecture page table helper functions on various platform in
> +   verifying if they comply with expected generic MM semantics. This
> +   will help architectures code in making sure that any changes or
> +   new additions of these helpers will still conform to generic MM
> +   expected semantics.

Typos and grammar fixed:

help
  This option provides a kernel module which can be used to test
  architecture page table helper functions on various platforms in
  verifying if they comply with expected generic MM semantics. This
  will help architecture code in making sure that any changes or
  new additions of these helpers still conform to expected 
  semantics of the generic MM.

Also, more fundamentally: isn't a kernel module too late for such a debug 
check, should something break due to a core MM change? Have these debug 
checks caught any bugs or inconsistencies before?

Why not run this as an earlier MM debug check, after enabling paging 
but before executing user-space binaries or relying on complex MM ops 
within the kernel, at a stage when those primitives are all expected to 
work fine?

It seems to me that arch_pgtable_tests_init() won't even context-switch 
normally, right?

Finally, instead of inventing yet another randomly named .config debug 
switch, please fit it into the regular MM debug options which go along 
the CONFIG_DEBUG_VM* naming scheme.

Might even make sense to enable these new debug checks by default if 
CONFIG_DEBUG_VM=y, that way we'll get a *lot* more debug coverage than 
some random module somewhere that few people will know about, let alone 
run.

Thanks,

Ingo


Re: [PATCH v12 11/12] open: openat2(2) syscall

2019-09-10 Thread Ingo Molnar


* Linus Torvalds  wrote:

> On Sat, Sep 7, 2019 at 10:42 AM Andy Lutomirski  wrote:
> >
> > Linus, you rejected resolveat() because you wanted a *nice* API
> 
> No. I rejected resoveat() because it was a completely broken garbage
> API that couldn't do even basic stuff right (like O_CREAT).
> 
> We have a ton of flag space in the new openat2() model, we might as
> well leave the old flags alone that people are (a) used to and (b) we
> have code to support _anyway_.
> 
> Making up a new flag namespace is only going to cause us - and users -
> more work, and more confusion. For no actual advantage. It's not going
> to be "cleaner". It's just going to be worse.

I suspect there is an "add a clean new flags namespace" analogy to the 
classic "add a clean new standard" XKCD:

https://xkcd.com/927/

Thanks,

Ingo


Re: [PATCH v2] powerpc/lockdep: fix a false positive warning

2019-09-08 Thread Ingo Molnar


* Qian Cai  wrote:

> I thought about making it a bool in the first place, but since all 
> other similar helpers (arch_is_kernel_initmem_freed(), 
> arch_is_kernel_text(), arch_is_kernel_data() etc) could be bool too but 
> are not, I kept arch_is_bss_hole() just to be “int” for consistent.
> 
> Although then there is is_kernel_rodata() which is bool. I suppose I’ll 
> change arch_is_bss_hole() to bool, and then could have a follow-up 
> patch to convert all similar helpers to return bool instead.

Sounds good to me.
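A minimal sketch of the helper returning bool, used by a simplified static_obj()-style check that treats the freed hole as non-static. The `_sketch` names and the image bounds are hypothetical. Only the half-open range test mirrors the quoted arch_is_bss_hole().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical hole bounds, set once when the unused tail of a static
 * buffer is handed back to the page allocator. */
static uintptr_t bss_hole_start_sketch, bss_hole_end_sketch;

/* bool rather than int: half-open range [start, end). */
static inline bool arch_is_bss_hole_sketch(uintptr_t addr)
{
	return addr >= bss_hole_start_sketch && addr < bss_hole_end_sketch;
}

/* Simplified static_obj()-style check: inside the static image counts
 * as static, unless the address lies in the freed hole. */
static bool static_obj_sketch(uintptr_t addr, uintptr_t image_start, uintptr_t image_end)
{
	if (arch_is_bss_hole_sketch(addr))
		return false;
	return addr >= image_start && addr < image_end;
}
```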

Thanks,

Ingo


Re: [PATCH v2] powerpc/lockdep: fix a false positive warning

2019-09-07 Thread Ingo Molnar


* Qian Cai  wrote:

> The commit 108c14858b9e ("locking/lockdep: Add support for dynamic
> keys") introduced a boot warning on powerpc below, because since the
> commit 2d4f567103ff ("KVM: PPC: Introduce kvm_tmp framework") adds
> kvm_tmp[] into the .bss section and then free the rest of unused spaces
> back to the page allocator.
> 
> kernel_init
>   kvm_guest_init
> kvm_free_tmp
>   free_reserved_area
> free_unref_page
>   free_unref_page_prepare
> 
> Later, alloc_workqueue() happens to allocate some pages from there and
> trigger the warning at,
> 
> if (WARN_ON_ONCE(static_obj(key)))
> 
> Fix it by adding a generic helper arch_is_bss_hole() to skip those areas
> in static_obj(). Since kvm_free_tmp() is only done early during the
> boot, just go lockless to make the implementation simple for now.
> 
> WARNING: CPU: 0 PID: 13 at kernel/locking/lockdep.c:1120
> Workqueue: events work_for_cpu_fn
> Call Trace:
>   lockdep_register_key+0x68/0x200
>   wq_init_lockdep+0x40/0xc0
>   trunc_msg+0x385f9/0x4c30f (unreliable)
>   wq_init_lockdep+0x40/0xc0
>   alloc_workqueue+0x1e0/0x620
>   scsi_host_alloc+0x3d8/0x490
>   ata_scsi_add_hosts+0xd0/0x220 [libata]
>   ata_host_register+0x178/0x400 [libata]
>   ata_host_activate+0x17c/0x210 [libata]
>   ahci_host_activate+0x84/0x250 [libahci]
>   ahci_init_one+0xc74/0xdc0 [ahci]
>   local_pci_probe+0x78/0x100
>   work_for_cpu_fn+0x40/0x70
>   process_one_work+0x388/0x750
>   process_scheduled_works+0x50/0x90
>   worker_thread+0x3d0/0x570
>   kthread+0x1b8/0x1e0
>   ret_from_kernel_thread+0x5c/0x7c
> 
> Fixes: 108c14858b9e ("locking/lockdep: Add support for dynamic keys")
> Signed-off-by: Qian Cai 
> ---
> 
> v2: No need to actually define arch_is_bss_hole() powerpc64 only.
> 
>  arch/powerpc/include/asm/sections.h | 11 +++
>  arch/powerpc/kernel/kvm.c   |  5 +
>  include/asm-generic/sections.h  |  7 +++
>  kernel/locking/lockdep.c|  3 +++
>  4 files changed, 26 insertions(+)
> 
> diff --git a/arch/powerpc/include/asm/sections.h b/arch/powerpc/include/asm/sections.h
> index 4a1664a8658d..4f5d69c42017 100644
> --- a/arch/powerpc/include/asm/sections.h
> +++ b/arch/powerpc/include/asm/sections.h
> @@ -5,8 +5,19 @@
>  
>  #include 
>  #include 
> +
> +#define arch_is_bss_hole arch_is_bss_hole
> +
>  #include 
>  
> +extern void *bss_hole_start, *bss_hole_end;
> +
> +static inline int arch_is_bss_hole(unsigned long addr)
> +{
> + return addr >= (unsigned long)bss_hole_start &&
> +addr < (unsigned long)bss_hole_end;
> +}
> +
>  extern char __head_end[];
>  
>  #ifdef __powerpc64__
> diff --git a/arch/powerpc/kernel/kvm.c b/arch/powerpc/kernel/kvm.c
> index b7b3a5e4e224..89e0e522e125 100644
> --- a/arch/powerpc/kernel/kvm.c
> +++ b/arch/powerpc/kernel/kvm.c
> @@ -66,6 +66,7 @@
>  static bool kvm_patching_worked = true;
>  char kvm_tmp[1024 * 1024];
>  static int kvm_tmp_index;
> +void *bss_hole_start, *bss_hole_end;
>  
>  static inline void kvm_patch_ins(u32 *inst, u32 new_inst)
>  {
> @@ -707,6 +708,10 @@ static __init void kvm_free_tmp(void)
>*/
>   kmemleak_free_part(&kvm_tmp[kvm_tmp_index],
>  ARRAY_SIZE(kvm_tmp) - kvm_tmp_index);
> +
> + bss_hole_start = &kvm_tmp[kvm_tmp_index];
> + bss_hole_end = &kvm_tmp[ARRAY_SIZE(kvm_tmp)];
> +
>   free_reserved_area(&kvm_tmp[kvm_tmp_index],
>  &kvm_tmp[ARRAY_SIZE(kvm_tmp)], -1, NULL);
>  }
> diff --git a/include/asm-generic/sections.h b/include/asm-generic/sections.h
> index d1779d442aa5..4d8b1f2c5fd9 100644
> --- a/include/asm-generic/sections.h
> +++ b/include/asm-generic/sections.h
> @@ -91,6 +91,13 @@ static inline int arch_is_kernel_initmem_freed(unsigned long addr)
>  }
>  #endif
>  
> +#ifndef arch_is_bss_hole
> +static inline int arch_is_bss_hole(unsigned long addr)
> +{
> + return 0;
> +}
> +#endif
> +
>  /**
>   * memory_contains - checks if an object is contained within a memory region
>   * @begin: virtual address of the beginning of the memory region
> diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
> index 4861cf8e274b..cd75b51f15ce 100644
> --- a/kernel/locking/lockdep.c
> +++ b/kernel/locking/lockdep.c
> @@ -675,6 +675,9 @@ static int static_obj(const void *obj)
>   if (arch_is_kernel_initmem_freed(addr))
>   return 0;
>  
> + if (arch_is_bss_hole(addr))
> + return 0;

arch_is_bss_hole() should use a 'bool' - but other than that, this 
looks good to me, if the PowerPC maintainers agree too.

Thanks,

Ingo


Re: [PATCH v2 2/9] x86: numa: check the node id consistently for x86

2019-09-02 Thread Ingo Molnar


* Peter Zijlstra  wrote:

> On Mon, Sep 02, 2019 at 08:25:24PM +0800, Yunsheng Lin wrote:
> > On 2019/9/2 15:25, Peter Zijlstra wrote:
> > > On Mon, Sep 02, 2019 at 01:46:51PM +0800, Yunsheng Lin wrote:
> > >> On 2019/9/1 0:12, Peter Zijlstra wrote:
> > > 
> > >>> 1) because even if it is not set, the device really does belong to a node.
> > >>> It is impossible a device will have magic uniform access to memory when
> > >>> CPUs cannot.
> > >>
> > >> So it means dev_to_node() will return either NUMA_NO_NODE or a
> > >> valid node id?
> > > 
> > > NUMA_NO_NODE := -1, which is not a valid node number. It is also, like I
> > > said, not a valid device location on a NUMA system.
> > > 
> > > Just because ACPI/BIOS is shit, doesn't mean the device doesn't have a
> > > node association. It just means we don't know and might have to guess.
> > 
> > How do we guess the device's location when ACPI/BIOS does not set it?
> 
> See device_add(), it looks to the device's parent and on NO_NODE, puts
> it there.
> 
> Lacking any hints, just stick it to node0 and print a FW_BUG or
> something.
> 
> > It seems dev_to_node() does not do anything about that and leave the
> > job to the caller or whatever function that get called with its return
> > value, such as cpumask_of_node().
> 
> Well, dev_to_node() doesn't do anything; nor should it. It is the
> callers of set_dev_node() that should be taking care.
> 
> Also note how device_add() sets the device node to the parent device's
> node on NUMA_NO_NODE. Arguably we should change it to complain when it
> finds NUMA_NO_NODE and !parent.
> 
> ---
>  drivers/base/core.c | 12 ++--
>  1 file changed, 10 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/base/core.c b/drivers/base/core.c
> index f0dd8e38fee3..2caf204966a0 100644
> --- a/drivers/base/core.c
> +++ b/drivers/base/core.c
> @@ -2120,8 +2120,16 @@ int device_add(struct device *dev)
>   dev->kobj.parent = kobj;
>  
>   /* use parent numa_node */
> - if (parent && (dev_to_node(dev) == NUMA_NO_NODE))
> - set_dev_node(dev, dev_to_node(parent));
> + if (dev_to_node(dev) == NUMA_NO_NODE) {
> + if (parent)
> + set_dev_node(dev, dev_to_node(parent));
> +#ifdef CONFIG_NUMA
> + else {
> + pr_err("device: '%s': has no assigned NUMA node\n", dev_name(dev));
> + set_dev_node(dev, 0);
> + }
> +#endif

BTW., is firmware required to always provide a NUMA node on NUMA systems?

I.e. do we really want this warning on non-NUMA systems that don't assign 
NUMA nodes?

Also, even on NUMA systems, is firmware required to provide a NUMA node - 
i.e. is it in principle invalid to offer no NUMA binding?

Thanks,

Ingo


Re: [PATCH v2 2/9] x86: numa: check the node id consistently for x86

2019-09-02 Thread Ingo Molnar


* Peter Zijlstra  wrote:

> Also note that the CONFIG_DEBUG_PER_CPU_MAPS version of
> cpumask_of_node() already does this (although it wants the below fix).
> 
> ---
> diff --git a/arch/x86/mm/numa.c b/arch/x86/mm/numa.c
> index e6dad600614c..5f49c10201c7 100644
> --- a/arch/x86/mm/numa.c
> +++ b/arch/x86/mm/numa.c
> @@ -861,7 +861,7 @@ void numa_remove_cpu(int cpu)
>   */
>  const struct cpumask *cpumask_of_node(int node)
>  {
> - if (node >= nr_node_ids) {
> + if ((unsigned)node >= nr_node_ids) {
>   printk(KERN_WARNING
>   "cpumask_of_node(%d): node > nr_node_ids(%u)\n",
>   node, nr_node_ids);

Nitpicking: please also fix the kernel message to say ">=".

With that:

Acked-by: Ingo Molnar 

Note that:

- arch/arm64/mm/numa.c copied the same sign bug (or unrobustness if we 
  don't want to call it a bug).

- Kudos to the mm/memcontrol.c and kernel/bpf/syscall.c teams for not 
  copying that bug. Booo for none of them fixing the buggy pattern 
  elsewhere in the kernel ;-)

Thanks,

Ingo


Re: [PATCH] [v2] x86/mpx: fix recursive munmap() corruption

2019-05-09 Thread Ingo Molnar


* Dave Hansen  wrote:

> Reported-by: Richard Biener 
> Reported-by: H.J. Lu 
> Fixes: dd2283f2605e ("mm: mmap: zap pages with read mmap_sem in munmap")
> Cc: Yang Shi 
> Cc: Michal Hocko 
> Cc: Vlastimil Babka 
> Cc: Andy Lutomirski 
> Cc: x...@kernel.org
> Cc: Andrew Morton 
> Cc: linux-ker...@vger.kernel.org
> Cc: linux...@kvack.org
> Cc: sta...@vger.kernel.org
> Cc: linuxppc-dev@lists.ozlabs.org
> Cc: linux...@lists.infradead.org
> Cc: Benjamin Herrenschmidt 
> Cc: Paul Mackerras 
> Cc: Michael Ellerman 
> Cc: linux-a...@vger.kernel.org
> Cc: Guan Xuetao 
> Cc: Jeff Dike 
> Cc: Richard Weinberger 
> Cc: Anton Ivanov 

I've also added your:

  Signed-off-by: Dave Hansen 

Because I suppose you intended to sign off on it?

Thanks,

Ingo


Re: [PATCH 00/10] implement DYNAMIC_DEBUG_RELATIVE_POINTERS

2019-05-06 Thread Ingo Molnar


* Rasmus Villemoes  wrote:

> I _am_ bending the C rules a bit with the "extern some_var; asm 
> volatile(".section some_section\nsome_var: blabla");". I should 
> probably ask on the gcc list whether this way of defining a local 
> symbol in inline assembly and referring to it from C is supposed to 
> work, or it just happens to work by chance.

Doing that would be rather useful I think.

Thanks,

Ingo


Re: [PATCH 00/10] implement DYNAMIC_DEBUG_RELATIVE_POINTERS

2019-05-06 Thread Ingo Molnar


* Rasmus Villemoes  wrote:

> On 09/04/2019 23.25, Rasmus Villemoes wrote:
> 
> > While refreshing these patches, which were originally just targeted at
> > x86-64, it occurred to me that despite the implementation relying on
> > inline asm, there's nothing x86 specific about it, and indeed it seems
> > to work out-of-the-box for ppc64 and arm64 as well, but those have
> > only been compile-tested.
> 
> So, apart from the Clang build failures for non-x86, I now also got a
> report that gcc 4.8 miscompiles this stuff in some cases [1], even for
> x86 - gcc 4.9 does not seem to have the problem. So, given that the 5.2
> merge window just opened, I suppose this is the point where I should
> pull the plug on this experiment :(
> 
> Rasmus
> 
> [1] Specifically, the problem manifested in net/ipv4/tcp_input.c: Both
> uses of the static inline inet_csk_clear_xmit_timer() pass a
> compile-time constant 'what', so the ifs get folded away and both uses
> are completely inlined. Yet, gcc still decides to emit a copy of the
> final 'else' branch of inet_csk_clear_xmit_timer() as its own
> inet_csk_reset_xmit_timer.part.55 function, which is of course unused.
> And despite the asm() that defines the ddebug descriptor being an "asm
> volatile", gcc thinks it's fine to elide that (the code path is
> unreachable, after all), so the entire asm for that function is
> 
>         .section        .text.unlikely
>         .type   inet_csk_reset_xmit_timer.part.55, @function
> inet_csk_reset_xmit_timer.part.55:
>         movq    $.LC1, %rsi     #,
>         movq    $__UNIQUE_ID_ddebug160, %rdi    #,
>         xorl    %eax, %eax      #
>         jmp     __dynamic_pr_debug      #
>         .size   inet_csk_reset_xmit_timer.part.55, .-inet_csk_reset_xmit_timer.part.55
> 
> which of course fails to link since the symbol __UNIQUE_ID_ddebug160 is
> nowhere defined.

It's sad to see such nice data footprint savings go the way of the dodo 
just because GCC 4.8 is buggy.

The current compatibility cut-off is GCC 4.6:

  GNU C  4.6  gcc --version

Do we know where the GCC bug was fixed, was it in GCC 4.9?

According to the GCC release dates:

  https://www.gnu.org/software/gcc/releases.html

4.8.0 was released in early 2013, while 4.9.0 was released in early 2014. 
So the tooling compatibility window for latest upstream would narrow from 
~6 years to ~5 years.

Thanks,

Ingo


Re: [PATCH 4/5] x86: don't use asm-generic/ptrace.h

2019-05-01 Thread Ingo Molnar


* Christoph Hellwig  wrote:

> Doing the indirection through macros for the regs accessors just
> makes them harder to read, so implement the helpers directly.
> 
> Note that only the helpers actually used are implemented now.
> 
> Signed-off-by: Christoph Hellwig 
> ---
>  arch/x86/include/asm/ptrace.h | 29 -
>  1 file changed, 24 insertions(+), 5 deletions(-)
> 
> diff --git a/arch/x86/include/asm/ptrace.h b/arch/x86/include/asm/ptrace.h
> index 8a7fc0cca2d1..9b81ef539eb3 100644
> --- a/arch/x86/include/asm/ptrace.h
> +++ b/arch/x86/include/asm/ptrace.h
> @@ -98,7 +98,6 @@ struct cpuinfo_x86;
>  struct task_struct;
>  
>  extern unsigned long profile_pc(struct pt_regs *regs);
> -#define profile_pc profile_pc
>  
>  extern unsigned long
>  convert_ip_to_linear(struct task_struct *child, struct pt_regs *regs);
> @@ -175,11 +174,31 @@ static inline unsigned long kernel_stack_pointer(struct pt_regs *regs)
>  }
>  #endif
>  
> -#define GET_IP(regs) ((regs)->ip)
> -#define GET_FP(regs) ((regs)->bp)
> -#define GET_USP(regs) ((regs)->sp)
> +static inline unsigned long instruction_pointer(struct pt_regs *regs)
> +{
> + return regs->ip;
> +}
> +static inline void instruction_pointer_set(struct pt_regs *regs,

Nit: missing newline between inline functions.

> + unsigned long val)
> +{
> + regs->ip = val;
> +}
> +
> +static inline unsigned long frame_pointer(struct pt_regs *regs)
> +{
> + return regs->bp;
> +}
>  
> -#include 
> +static inline unsigned long user_stack_pointer(struct pt_regs *regs)
> +{
> + return regs->sp;
> +}
> +
> +static inline void user_stack_pointer_set(struct pt_regs *regs,
> + unsigned long val)
> +{
> + regs->sp = val;
> +}

Other than that:

Acked-by: Ingo Molnar 

Thanks,

Ingo


Re: [PATCH] compiler: allow all arches to enable CONFIG_OPTIMIZE_INLINING

2019-04-18 Thread Ingo Molnar


* Masahiro Yamada  wrote:

> Commit 60a3cdd06394 ("x86: add optimized inlining") introduced
> CONFIG_OPTIMIZE_INLINING, but it has been available only for x86.
> 
> The idea is obviously arch-agnostic although we need some code fixups.
> This commit moves the config entry from arch/x86/Kconfig.debug to
> lib/Kconfig.debug so that all architectures (except MIPS for now) can
> benefit from it.
> 
> For now, I added "depends on !MIPS" because fixing the 0day bot reports
> for MIPS was too complex for me.
> 
> I tested this patch on my arm/arm64 boards.
> 
> This can make a huge difference in kernel image size especially when
> CONFIG_OPTIMIZE_FOR_SIZE is enabled.
> 
> For example, I got 3.5% smaller arm64 kernel image for v5.1-rc1.
> 
>   dec   file
>   18983424  arch/arm64/boot/Image.before
>   18321920  arch/arm64/boot/Image.after
> 
> This also slightly improves the "Kernel hacking" Kconfig menu.
> Commit e61aca5158a8 ("Merge branch 'kconfig-diet' from Dave Hansen')
> mentioned this config option would be a good fit in the "compiler option"
> menu. I did so.

No objections against moving it from x86 code to generic code.

Thanks,

Ingo


Re: [PATCH v2 2/2] locking/rwsem: Optimize down_read_trylock()

2019-02-13 Thread Ingo Molnar


* Waiman Long  wrote:

> I looked at the assembly code in arch/x86/include/asm/rwsem.h. For both
> trylocks (read & write), the count is read first before attempting to
> lock it. We did the same for all trylock functions in other locks.
> Depending on how the trylock is used and how contended the lock is, it
> may help or hurt performance. Changing down_read_trylock to do an
> unconditional cmpxchg will change the performance profile of existing
> code. So I would prefer keeping the current code.
> 
> I do notice now that the generic down_write_trylock() code is doing an
> unconditional cmpxchg. So I wonder if we should change it to read the
> lock first like other trylocks or just leave it as it is.

No, I think we should instead move the other trylocks to the 
try-for-ownership model as well, like Linus suggested.

That's the general assumption we make in locking primitives, that we 
optimize for the common, expected case - which would be that the trylock 
succeeds, and I don't see why trylock primitives should be different.

In fact I can see more ways for read-for-sharing to perform suboptimally 
on larger systems.

Thanks,

Ingo


Re: [RFC PATCH] x86, numa: always initialize all possible nodes

2019-02-11 Thread Ingo Molnar


* Michal Hocko  wrote:

> On Thu 24-01-19 11:10:50, Dave Hansen wrote:
> > On 1/24/19 6:17 AM, Michal Hocko wrote:
> > > and nr_cpus set to 4. The underlying reason is that the device is bound
> > > to node 2 which doesn't have any memory and init_cpu_to_node only
> > > initializes memory-less nodes for possible cpus which nr_cpus restricts.
> > > This in turn means that proper zonelists are not allocated and the page
> > > allocator blows up.
> > 
> > This looks OK to me.
> > 
> > Could we add a few DEBUG_VM checks that *look* for these invalid
> > zonelists?  Or, would our existing list debugging have caught this?
> 
> Currently we simply blow up because those zonelists are NULL. I do not
> think we have a way to check whether an existing zonelist is actually 
> _correct_ other than check it for NULL. But what would we do in the
> latter case?
> 
> > Basically, is this bug also a sign that we need better debugging around
> > this?
> 
> My earlier patch had a debugging printk to display the zonelists and
> that might be worthwhile I guess. Basically something like this
> 
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 2e097f336126..c30d59f803fb 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -5259,6 +5259,11 @@ static void build_zonelists(pg_data_t *pgdat)
>  
>   build_zonelists_in_node_order(pgdat, node_order, nr_nodes);
>   build_thisnode_zonelists(pgdat);
> +
> + pr_info("node[%d] zonelist: ", pgdat->node_id);
> + for_each_zone_zonelist(zone, z, &pgdat->node_zonelists[ZONELIST_FALLBACK], MAX_NR_ZONES-1)
> + pr_cont("%d:%s ", zone_to_nid(zone), zone->name);
> + pr_cont("\n");
>  }

Looks like this patch fell through the cracks - any update on this?

Thanks,

Ingo


Re: [PATCH] locking/rwsem: Remove arch specific rwsem files

2019-02-11 Thread Ingo Molnar


* Will Deacon  wrote:

> On Mon, Feb 11, 2019 at 11:39:27AM +0100, Ingo Molnar wrote:
> > 
> > * Ingo Molnar  wrote:
> > 
> > > Sounds good to me - I've merged this patch, will push it out after 
> > > testing.
> > 
> > Based on Peter's feedback I'm delaying this - performance testing on at 
> > least one key ll/sc arch would be nice indeed.
> 
> Once Waiman has posted a new version, I can take it for a spin on some
> arm64 boxen if he shares his workload.

Cool, thanks!

Ingo


Re: [PATCH] locking/rwsem: Remove arch specific rwsem files

2019-02-11 Thread Ingo Molnar


* Ingo Molnar  wrote:

> Sounds good to me - I've merged this patch, will push it out after 
> testing.

Based on Peter's feedback I'm delaying this - performance testing on at 
least one key ll/sc arch would be nice indeed.

Thanks,

Ingo


Re: [PATCH-tip 00/22] locking/rwsem: Rework rwsem-xadd & enable new rwsem features

2019-02-10 Thread Ingo Molnar


* Waiman Long  wrote:

> On 02/07/2019 02:51 PM, Davidlohr Bueso wrote:
> > On Thu, 07 Feb 2019, Waiman Long wrote:
> >> 30 files changed, 1197 insertions(+), 1594 deletions(-)
> >
> > Performance numbers on numerous workloads, pretty please.
> >
> > I'll go and throw this at my mmap_sem intensive workloads
> > I've collected.
> >
> > Thanks,
> > Davidlohr
> 
> Thanks for getting some of the performance numbers. This is the initial
> draft after more than a year of hibernation. I will also get other
> performance numbers in subsequent revision of the patch.

If you could sort all the invariant preparatory patches to the head of 
the series I can merge them to reduce overall complexity and simplify 
performance testing and review of the rest.

Thanks,

Ingo


Re: [PATCH] locking/rwsem: Remove arch specific rwsem files

2019-02-10 Thread Ingo Molnar


* Waiman Long  wrote:

> On 02/10/2019 09:00 PM, Waiman Long wrote:
> > As the generic rwsem-xadd code is using the appropriate acquire and
> > release versions of the atomic operations, the arch specific rwsem.h
> > files will not be that much faster than the generic code as long as the
> > atomic functions are properly implemented. So we can remove those arch
> > specific rwsem.h and stop building asm/rwsem.h to reduce maintenance
> > effort.
> >
> > Currently, only x86, alpha and ia64 have implemented architecture
> > specific fast paths. I don't have access to alpha and ia64 systems for
> > testing, but they are legacy systems that are not likely to be updated
> > to the latest kernel anyway.
> >
> > By using a rwsem microbenchmark, the total locking rates on a 4-socket
> > 56-core 112-thread x86-64 system before and after the patch were as
> > follows (mixed means equal # of read and write locks):
> >
> >                    Before Patch              After Patch
> >   # of Threads   wlock   rlock   mixed     wlock   rlock   mixed
> >   ------------   -----   -----   -----     -----   -----   -----
> >         1        27,373  29,409  28,170    28,773  30,164  29,276
> >         2         7,697  14,922   1,703     7,435  15,167   1,729
> >         4         6,987  14,285   1,490     7,181  14,438   1,330
> >         8         6,650  13,652     761     6,918  13,796     718
> >        16         6,434  15,729     713     6,554  16,030     625
> >        32         5,590  15,312     552     6,124  15,344     471
> >        64         5,980  15,478      61     5,668  15,509      58
> >
> > There were some run-to-run variations for the multi-thread tests. For
> > x86-64, using the generic C code fast path seems to be a little bit
> > faster than the assembly version especially for read-lock and when lock
> > contention is low.  Looking at the assembly version of the fast paths,
> > there are assembly to/from C code wrappers that save and restore all
> > the callee-clobbered registers (7 registers on x86-64). The assembly
> > generated from the generic C code doesn't need to do that. That may
> > explain the slight performance gain here.
> >
> > The generic asm rwsem.h can also be merged into kernel/locking/rwsem.h
> > as no other code other than those under kernel/locking needs to access
> > the internal rwsem macros and functions.
> >
> > Signed-off-by: Waiman Long 
> 
> I have decided to break the rwsem patchset that I sent out on last
> Thursday into 3 parts. This patch is part 0 as it touches a number of
> arch specific files and so has the widest distribution. I would like to
> get it merged first. Part 1 will be patches 1-10 (except 4) of my
> original rwsem patchset. This part moves things around, adds more
> debugging capability and lays the groundwork for the next part. Part 2
> will contain the remaining patches which are the real beef of the whole
> patchset.

Sounds good to me - I've merged this patch, will push it out after 
testing.

Thanks,

Ingo


Re: [PATCH 02/17] x86: Add support for ZSTD-compressed kernel

2018-11-11 Thread Ingo Molnar


* Adam Borowski  wrote:

> From: Nick Terrell 
> 
> Integrates the ZSTD decompression code to the x86 pre-boot code.
> 
> Zstandard requires slightly more memory during the kernel decompression
> on x86 (192 KB vs 64 KB), and the memory usage is independent of the
> window size.
> 
> Zstandard requires memory proportional to the window size used during
> compression for decompressing the ramdisk image, since streaming mode is
> used. Newer versions of zstd (1.3.2+) list the window size of a file
> with `zstd -lv '. The absolute maximum amount of memory required
> is just over 8 MB.
> 
> Signed-off-by: Nick Terrell 
> ---
>  Documentation/x86/boot.txt| 6 +++---
>  arch/x86/Kconfig  | 1 +
>  arch/x86/boot/compressed/Makefile | 5 -
>  arch/x86/boot/compressed/misc.c   | 4 
>  arch/x86/boot/header.S| 8 +++-
>  arch/x86/include/asm/boot.h   | 6 --
>  6 files changed, 23 insertions(+), 7 deletions(-)

Acked-by: Ingo Molnar 

> diff --git a/arch/x86/boot/header.S b/arch/x86/boot/header.S
> index 4c881c850125..af2efb256527 100644
> --- a/arch/x86/boot/header.S
> +++ b/arch/x86/boot/header.S
> @@ -526,8 +526,14 @@ pref_address:  .quad LOAD_PHYSICAL_ADDR # preferred load addr
>  # the size-dependent part now grows so fast.
>  #
>  # extra_bytes = (uncompressed_size >> 8) + 65536
> +#
> +# ZSTD compressed data grows by at most 3 bytes per 128K, and only has a 22
> +# byte fixed overhead but has a maximum block size of 128K, so it needs a
> +# larger margin.
> +#
> +# extra_bytes = (uncompressed_size >> 8) + 131072
>  
> -#define ZO_z_extra_bytes ((ZO_z_output_len >> 8) + 65536)
> +#define ZO_z_extra_bytes ((ZO_z_output_len >> 8) + 131072)

This change would also affect other decompressors, not just ZSTD, 
correct?

Might want to split this change out into a separate preparatory patch to 
allow it to be bisected to, or at least mention it in the changelog more 
explicitly?

Thanks,

Ingo


Re: [PATCH 2/2] x86, powerpc: remove -funit-at-a-time compiler option entirely

2018-11-11 Thread Ingo Molnar


* Masahiro Yamada  wrote:

> GCC 4.6 manual says:
> 
> -funit-at-a-time
>   This option is left for compatibility reasons. -funit-at-a-time has
>   no effect, while -fno-unit-at-a-time implies -fno-toplevel-reorder
>   and -fno-section-anchors.
>   Enabled by default.
> 
> Signed-off-by: Masahiro Yamada 
> ---
> 
>  arch/powerpc/Makefile | 4 
>  arch/x86/Makefile | 4 
>  arch/x86/Makefile.um  | 5 -
>  3 files changed, 13 deletions(-)
> 
> diff --git a/arch/x86/Makefile b/arch/x86/Makefile
> index 88398fd..3508049 100644
> --- a/arch/x86/Makefile
> +++ b/arch/x86/Makefile
> @@ -130,10 +130,6 @@ else
>  
>  KBUILD_CFLAGS += -mno-red-zone
>  KBUILD_CFLAGS += -mcmodel=kernel
> -
> -# -funit-at-a-time shrinks the kernel .text considerably
> -# unfortunately it makes reading oopses harder.
> -KBUILD_CFLAGS += $(call cc-option,-funit-at-a-time)
>  endif
>  
>  ifdef CONFIG_X86_X32
> diff --git a/arch/x86/Makefile.um b/arch/x86/Makefile.um
> index 577976b..1db7913 100644
> --- a/arch/x86/Makefile.um
> +++ b/arch/x86/Makefile.um
> @@ -26,9 +26,6 @@ cflags-y += $(call cc-option,-mpreferred-stack-boundary=2)
>  # an unresolved reference.
>  cflags-y += -ffreestanding
>  
> -# gcc 4.3.0 needs -funit-at-a-time for extern inline functions.
> -KBUILD_CFLAGS += $(call cc-option,-funit-at-a-time)
> -
>  KBUILD_CFLAGS += $(cflags-y)
>  
>  else
> @@ -50,6 +47,4 @@ ELF_FORMAT := elf64-x86-64
>  LINK-$(CONFIG_LD_SCRIPT_DYN) += -Wl,-rpath,/lib64
>  LINK-y += -m64
>  
> -# Do unit-at-a-time unconditionally on x86_64, following the host
> -KBUILD_CFLAGS += $(call cc-option,-funit-at-a-time)
>  endif

Acked-by: Ingo Molnar 

Thanks,

Ingo


Re: [PATCH v6 00/11] hugetlb: Factorize hugetlb architecture primitives

2018-08-07 Thread Ingo Molnar
  | 54 ++---
>  arch/sparc/include/asm/hugetlb.h | 40 +++--
>  arch/x86/include/asm/hugetlb.h   | 69 --
>  include/asm-generic/hugetlb.h    | 88 +++-
>  15 files changed, 135 insertions(+), 394 deletions(-)

The x86 bits look good to me (assuming it's all tested on all relevant 
architectures, etc.)

Acked-by: Ingo Molnar 

Thanks,

Ingo


Re: [GIT PULL 00/21] perf/core improvements and fixes

2018-08-02 Thread Ingo Molnar


* Arnaldo Carvalho de Melo  wrote:

> Hi Ingo,
> 
>   Please consider pulling, contains a recently merged
> tip/perf/urgent,
> 
> - Arnaldo
> 
> Test results at the end of this message, as usual.
> 
> The following changes since commit c2586cfbb905939b79b49a9121fb0a59a5668fd6:
> 
>   Merge remote-tracking branch 'tip/perf/urgent' into perf/core (2018-07-31 09:55:45 -0300)
> 
> are available in the Git repository at:
> 
>   git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git tags/perf-core-for-mingo-4.19-20180801
> 
> for you to fetch changes up to b912885ab75c7c8aa841c615108afd755d0b97f8:
> 
>   perf trace: Do not require --no-syscalls to suppress strace like output (2018-08-01 16:20:28 -0300)
> 
> 
> perf/core improvements and fixes:
> 
> perf trace: (Arnaldo Carvalho de Melo)
> 
> - Do not require --no-syscalls to suppress strace like output, i.e.
> 
>  # perf trace -e sched:*switch
> 
>   will show just sched:sched_switch events, not strace-like formatted
>   syscall events, use --syscalls to get the previous behaviour.
> 
>   If instead:
> 
>  # perf trace
> 
>   is used, i.e. no events specified, then --syscalls is implied and
>   system wide strace like formatting will be applied to all syscalls.
> 
>   The behaviour when just a syscall subset is used with '-e' is unchanged:
> 
>  # perf trace -e *sleep,sched:*switch
> 
>   will work as before: just the 'nanosleep' syscall will be strace-like
>   formatted plus the sched:sched_switch tracepoint event, system wide.
> 
> - Allow string table generators to use a default header dir, allowing
>   use of them without parameters to see the table it generates on
>   stdout, e.g.:
> 
> $ tools/perf/trace/beauty/kvm_ioctl.sh
> static const char *kvm_ioctl_cmds[] = {
> [0x00] = "GET_API_VERSION",
> [0x01] = "CREATE_VM",
> [0x02] = "GET_MSR_INDEX_LIST",
> [0x03] = "CHECK_EXTENSION",
> 
> [0xe0] = "CREATE_DEVICE",
> [0xe1] = "SET_DEVICE_ATTR",
> [0xe2] = "GET_DEVICE_ATTR",
> [0xe3] = "HAS_DEVICE_ATTR",
> };
> $
> 
>   See 'ls tools/perf/trace/beauty/*.sh' to see the available string
>   table generators.
> 
> - Add a generator for IPPROTO_ socket's protocol constants.
> 
> perf record: (Kan Liang)
> 
> - Fix error out while applying initial delay and using LBR, due to
>   the use of a PERF_TYPE_SOFTWARE/PERF_COUNT_SW_DUMMY event to track
>   PERF_RECORD_MMAP events while waiting for the initial delay. Such
>   events fail when configured asking PERF_SAMPLE_BRANCH_STACK in
>   perf_event_attr.sample_type.
> 
> perf c2c: (Jiri Olsa)
> 
> - Fix report crash for empty browser, when processing a perf.data file
>   without events of interest, either because not asked for in
>   'perf record' or because the workload didn't trigger such events.
> 
> perf list: (Michael Petlan)
> 
> - Align metric group description format with PMU event description.
> 
> perf tests: (Sandipan Das)
> 
> - Fix indexing when invoking subtests, which caused BPF tests to
>   get results for the next test in the list, with the last one
>   reporting a failure.
> 
> eBPF:
> 
> - Fix installation directory for header files included from eBPF proggies,
>   avoiding clashing with relative paths used to build other software projects
>   such as glibc. (Thomas Richter)
> 
> - Show better message when failing to load an object. (Arnaldo Carvalho de Melo)
> 
> General: (Christophe Leroy)
> 
> - Allow overriding MAX_NR_CPUS at compile time, to make the tooling
>   usable in systems with less memory, in time this has to be changed
>   to properly allocate based on _NPROCESSORS_ONLN.
> 
> Architecture specific:
> 
> - Update arm64's ThunderX2 implementation defined pmu core events (Ganapatrao Kulkarni)
> 
> - Fix complex event name parsing in 'perf test' for PowerPC, where the
>   'umask' event modifier isn't present. (Sandipan Das)
> 
> CoreSight ARM hardware tracing: (Leo Yan)
> 
> - Fix start tracing packet handling.
> 
> - Support dummy address value for CS_ETM_TRACE_ON packet.
> 
> - Generate branch sample when receiving a CS_ETM_TRACE_ON packet.
> 
> - Generate branch sample for CS_ETM_TRACE_ON packet.
> 
> Signed-off-by: Arnaldo Carvalho de Melo 
> 
> 
> Arnaldo Carvalho de Melo (9):
>   perf trace beauty: Default header_dir to cwd to work without parms
>   tools include uapi: Grab a copy of linux/in.h
>   perf beauty: Add a generator for IPPROTO_ socket's protocol constants
>   perf trace beauty: Do not print NULL strarray entries
>   perf trace beauty: Add beautifiers for 'socket''s 'protocol' arg
>   perf trace: Beautify the AF_INET & AF_INET6 'socket' syscall 'protocol' args
>   perf bpf: Show better message when failing to load an object
>   perf bpf: Include uapi/linux/bpf.h from the 'perf trace' script's 

Re: [GIT PULL 0/5] perf/urgent fixes

2018-07-30 Thread Ingo Molnar


* Arnaldo Carvalho de Melo  wrote:

> Hi Ingo,
> 
>   Please consider pulling, just to get the build without warnings
> and finishing successfully in all my test environments,
> 
> - Arnaldo
> 
> Test results at the end of this message, as usual.
> 
> The following changes since commit 7f635ff187ab6be0b350b3ec06791e376af238ab:
> 
>   perf/core: Fix crash when using HW tracing kernel filters (2018-07-25 11:46:22 +0200)
> 
> are available in the Git repository at:
> 
>   git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git tags/perf-urgent-for-mingo-4.18-20180730
> 
> for you to fetch changes up to 44fe619b1418ff4e9d2f9518a940fbe2fb686a08:
> 
>   perf tools: Fix the build on the alpine:edge distro (2018-07-30 13:15:03 -0300)
> 
> 
> perf/urgent fixes: (Arnaldo Carvalho de Melo)
> 
> - Update the tools copy of several files, including perf_event.h,
>   powerpc's asm/unistd.h (new io_pgetevents syscall), bpf.h and
>   x86's memcpy_64.s (used in 'perf bench mem'), silencing the
>   respective warnings during the perf tools build.
> 
> - Fix the build on the alpine:edge distro.
> 
> Signed-off-by: Arnaldo Carvalho de Melo 
> 
> 
> Arnaldo Carvalho de Melo (5):
>   tools headers uapi: Update tools's copy of linux/perf_event.h
>   tools headers powerpc: Update asm/unistd.h copy to pick new
>   tools headers uapi: Refresh linux/bpf.h copy
>   tools arch: Update arch/x86/lib/memcpy_64.S copy used in 'perf bench mem memcpy'
>   perf tools: Fix the build on the alpine:edge distro
> 
>  tools/arch/powerpc/include/uapi/asm/unistd.h |   1 +
>  tools/arch/x86/include/asm/mcsafe_test.h |  13 
>  tools/arch/x86/lib/memcpy_64.S   | 112 +--
>  tools/include/uapi/linux/bpf.h   |  28 +--
>  tools/include/uapi/linux/perf_event.h|   2 +
>  tools/perf/arch/x86/util/pmu.c   |   1 +
>  tools/perf/arch/x86/util/tsc.c   |   1 +
>  tools/perf/bench/Build   |   1 +
>  tools/perf/bench/mem-memcpy-x86-64-asm.S |   1 +
>  tools/perf/bench/mem-memcpy-x86-64-lib.c |  24 ++
>  tools/perf/perf.h|   1 +
>  tools/perf/util/header.h |   1 +
>  tools/perf/util/namespaces.h |   1 +
>  13 files changed, 124 insertions(+), 63 deletions(-)
>  create mode 100644 tools/arch/x86/include/asm/mcsafe_test.h
>  create mode 100644 tools/perf/bench/mem-memcpy-x86-64-lib.c

Pulled, thanks a lot Arnaldo!

Ingo


Re: [PATCH] watchdog/softlockup: Fix SOFTLOCKUP_DETECTOR=n build

2018-07-10 Thread Ingo Molnar


* Peter Zijlstra  wrote:

> On Mon, Jul 09, 2018 at 11:40:14PM +0530, Abdul Haleem wrote:
> 
> > Thanks Peter for the patch, build and boot is fine.
> > 
> > Reported-and-tested-by: Abdul Haleem 
> 
> Excellent, Ingo can you stick this in?

Sure, done!

Thanks,

Ingo


Re: [PATCH v9 0/6] add support for relative references in special sections

2018-07-03 Thread Ingo Molnar


* Ard Biesheuvel  wrote:

> On 27 June 2018 at 17:15, Will Deacon  wrote:
> > Hi Ard,
> >
> > On Tue, Jun 26, 2018 at 08:27:55PM +0200, Ard Biesheuvel wrote:
> >> This adds support for emitting special sections such as initcall arrays,
> >> PCI fixups and tracepoints as relative references rather than absolute
> >> references. This reduces the size by 50% on 64-bit architectures, but
> >> more importantly, it removes the need for carrying relocation metadata
> >> for these sections in relocatable kernels (e.g., for KASLR) that needs
> >> to be fixed up at boot time. On arm64, this reduces the vmlinux footprint
> >> of such a reference by 8x (8 byte absolute reference + 24 byte RELA entry
> >> vs 4 byte relative reference)
> >>
> >> Patch #3 was sent out before as a single patch. This series supersedes
> >> the previous submission. This version makes relative ksymtab entries
> >> dependent on the new Kconfig symbol HAVE_ARCH_PREL32_RELOCATIONS rather
> >> than trying to infer from kbuild test robot replies for which architectures
> >> it should be blacklisted.
> >>
> >> Patch #1 introduces the new Kconfig symbol HAVE_ARCH_PREL32_RELOCATIONS,
> >> and sets it for the main architectures that are expected to benefit the
> >> most from this feature, i.e., 64-bit architectures or ones that use
> >> runtime relocations.
> >>
> >> Patch #2 add support for #define'ing __DISABLE_EXPORTS to get rid of
> >> ksymtab/kcrctab sections in decompressor and EFI stub objects when
> >> rebuilding existing C files to run in a different context.
> >
> > I had a small question on patch 3, but it's really for my understanding.
> > So, for patches 1-3:
> >
> > Reviewed-by: Will Deacon 
> >
> 
> Thanks all.
> 
> Thomas, Ingo,
> 
> Except for the below tweak against patch #3 for powerpc, which may
> apparently get confused by an input section called .discard without
> any suffixes, this series is good to go, but requires your ack to
> proceed, so I would like to ask you to share your comments and/or
> objections. Also, any suggestions or recommendations regarding the
> route these patches should take are highly appreciated.

LGTM:

Acked-by: Ingo Molnar 

Regarding route - I suspect -mm would be good, or any other tree that does a lot
of cross-arch testing?
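For readers new to the idea, a minimal sketch of what a 32-bit relative reference boils down to in plain C. The prel32_t type and the two helpers are hypothetical names for illustration only; the series itself has its own macros and accessors. The sketch assumes the target lies within +/-2 GiB of the field holding the offset, which is what makes 4 bytes sufficient.

```c
#include <stdint.h>

/*
 * Hypothetical 32-bit place-relative reference: the field stores the
 * signed distance from its own address to the target rather than the
 * target's absolute address. The value stays valid when the whole image
 * is moved (e.g. by KASLR), so no boot-time relocation entry is needed.
 */
typedef int32_t prel32_t;

/* Record a reference to @target in @field (target assumed within +/-2 GiB). */
static void prel32_set(prel32_t *field, const void *target)
{
	*field = (prel32_t)((intptr_t)target - (intptr_t)field);
}

/* Resolve the reference: field address plus stored offset. */
static void *prel32_get(const prel32_t *field)
{
	return (void *)((intptr_t)field + *field);
}
```

On a 64-bit relocatable kernel this replaces an 8-byte absolute pointer plus a 24-byte RELA entry with a single 4-byte field, which is where the 8x figure in the cover letter comes from.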

Thanks,

Ingo


Re: [PATCH] Extract initrd free logic from arch-specific code.

2018-04-02 Thread Ingo Molnar

* Shea Levy  wrote:

> > Please also put it into Documentation/features/.
> 
> I switched this patch series (the latest revision v6 was just posted) to
> using weak symbols instead of Kconfig. Does it still warrant documentation?

Probably not.
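For reference, a minimal sketch of the weak-symbol approach, assuming GCC/Clang __attribute__((weak)) semantics. The signature is simplified and the freed_bytes counter is a hypothetical stand-in for actually releasing the initrd pages. Generic code provides the weak default; an architecture with custom requirements supplies an ordinary (strong) definition in its own object file, and the linker picks that one instead.

```c
/* Hypothetical stand-in for the generic page-freeing work. */
static unsigned long freed_bytes;

/*
 * Generic default, marked weak: if an architecture links in its own
 * free_initrd_mem(), that strong definition overrides this one, so no
 * Kconfig symbol is needed to select between them.
 */
__attribute__((weak)) void free_initrd_mem(unsigned long start, unsigned long end)
{
	freed_bytes += end - start;
}
```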

Thanks,

Ingo


Re: [PATCH] Extract initrd free logic from arch-specific code.

2018-03-30 Thread Ingo Molnar

* Shea Levy  wrote:

> Now only those architectures that have custom initrd free requirements
> need to define free_initrd_mem.
> 
> Signed-off-by: Shea Levy 

Please put the Kconfig symbol name this patch introduces into the title, so that
people know what to grep for.

> ---
>  arch/alpha/mm/init.c  |  8 
>  arch/arc/mm/init.c|  7 ---
>  arch/arm/Kconfig  |  1 +
>  arch/arm64/Kconfig|  1 +
>  arch/blackfin/Kconfig |  1 +
>  arch/c6x/mm/init.c|  7 ---
>  arch/cris/Kconfig |  1 +
>  arch/frv/mm/init.c| 11 ---
>  arch/h8300/mm/init.c  |  7 ---
>  arch/hexagon/Kconfig  |  1 +
>  arch/ia64/Kconfig |  1 +
>  arch/m32r/Kconfig |  1 +
>  arch/m32r/mm/init.c   | 11 ---
>  arch/m68k/mm/init.c   |  7 ---
>  arch/metag/Kconfig|  1 +
>  arch/microblaze/mm/init.c |  7 ---
>  arch/mips/Kconfig |  1 +
>  arch/mn10300/Kconfig  |  1 +
>  arch/nios2/mm/init.c  |  7 ---
>  arch/openrisc/mm/init.c   |  7 ---
>  arch/parisc/mm/init.c |  7 ---
>  arch/powerpc/mm/mem.c |  7 ---
>  arch/riscv/mm/init.c  |  6 --
>  arch/s390/Kconfig |  1 +
>  arch/score/Kconfig|  1 +
>  arch/sh/mm/init.c |  7 ---
>  arch/sparc/Kconfig|  1 +
>  arch/tile/Kconfig |  1 +
>  arch/um/kernel/mem.c  |  7 ---
>  arch/unicore32/Kconfig|  1 +
>  arch/x86/Kconfig  |  1 +
>  arch/xtensa/Kconfig   |  1 +
>  init/initramfs.c  |  7 +++
>  usr/Kconfig   |  4 
>  34 files changed, 28 insertions(+), 113 deletions(-)

Please also put it into Documentation/features/.

> diff --git a/usr/Kconfig b/usr/Kconfig
> index 43658b8a975e..7a94f6df39bf 100644
> --- a/usr/Kconfig
> +++ b/usr/Kconfig
> @@ -233,3 +233,7 @@ config INITRAMFS_COMPRESSION
>   default ".lzma" if RD_LZMA
>   default ".bz2"  if RD_BZIP2
>   default ""
> +
> +config HAVE_ARCH_FREE_INITRD_MEM
> + bool
> + default n

Help text would be nice, to tell arch maintainers what the purpose of this switch
is.

Also, a nit, I think this should be named "ARCH_HAS_FREE_INITRD_MEM", which is the
dominant pattern:

triton:~/tip> git grep 'select.*ARCH' arch/x86/Kconfig* | cut -f2 | cut -d_ -f1-2 | sort | uniq -c | sort -n
...
  2 select ARCH_USES
  2 select ARCH_WANTS
  3 select ARCH_MIGHT
  3 select ARCH_WANT
  4 select ARCH_SUPPORTS
  4 select ARCH_USE
 16 select HAVE_ARCH
 23 select ARCH_HAS

It also reads nicely in English:

  "arch has free_initrd_mem()"

While the other makes little sense:

  "have arch free_initrd_mem()"

?

Thanks,

Ingo


Re: [RFC] new SYSCALL_DEFINE/COMPAT_SYSCALL_DEFINE wrappers

2018-03-30 Thread Ingo Molnar

* John Paul Adrian Glaubitz  wrote:

> On 03/27/2018 12:40 PM, Linus Torvalds wrote:
> > On Mon, Mar 26, 2018 at 4:37 PM, John Paul Adrian Glaubitz
> >  wrote:
> >>
> >> What about a tarball with a minimal Debian x32 chroot? Then you can
> >> install interesting packages you would like to test yourself.
> > 
> > That probably works fine.
> 
> I just created a fresh Debian x32 unstable chroot using this command:
> 
> $ debootstrap --no-check-gpg --variant=minbase --arch=x32 unstable debian-x32-unstable http://ftp.ports.debian.org/debian-ports
> 
> It can be downloaded from my Debian webspace along checksum files for
> verification:
> 
> > https://people.debian.org/~glaubitz/chroots/
> 
> Let me know if you run into any issues.

Here's the direct download link:

  $ wget https://people.debian.org/~glaubitz/chroots/debian-x32-unstable.tar.gz

Checksum should be:

  $ sha256sum debian-x32-unstable.tar.gz
  010844bcc76bd1a3b7a20fe47f7067ed8e429a84fa60030a2868626e8fa7ec3b  debian-x32-unstable.tar.gz

Seems to work fine here (on a distro kernel) even if I extract all the files as a
non-root user and do:

  ~/s/debian-x32-unstable> fakechroot /usr/sbin/chroot . /usr/bin/dpkg -l | tail -2

  ERROR: ld.so: object 'libfakechroot.so' from LD_PRELOAD cannot be preloaded (cannot open shared object file): ignored.
  ii  util-linux:x32 2.31.1-0.5   x32  miscellaneous system utilities
  ii  zlib1g:x32 1:1.2.8.dfsg-5   x32  compression library - runtime

So that 'dpkg' instance appears to be running inside the chroot environment and is
listing x32 installed packages.

Although I did get this warning:

  ERROR: ld.so: object 'libfakechroot.so' from LD_PRELOAD cannot be preloaded (cannot open shared object file): ignored.

Even with that warning, is this still a sufficiently complex test of the x32
syscall code paths?

BTW., "fakechroot /usr/sbin/chroot ." crashes instead of giving me a bash shell.

Thanks,

Ingo


Re: [RFC PATCH 4/6] mm: provide generic compat_sys_readahead() implementation

2018-03-20 Thread Ingo Molnar

* Al Viro  wrote:

> > For example this attempt at creating a new system call:
> > 
> >   SYSCALL_DEFINE3(moron, int, fd, loff_t, offset, size_t, count)
> > 
> > ... would translate into something like:
> > 
> > .name = "moron", .pattern = "WWW", .type = "int",    .size = 4,
> > .name = NULL,                       .type = "loff_t", .size = 8,
> > .name = NULL,                       .type = "size_t", .size = 4,
> > .name = NULL,                       .type = NULL,     .size = 0, /* end of parameter list */
> > 
> > i.e. "WDW". The build-time constraint checker could then warn about:
> > 
> >   # error: System call "moron" uses invalid 'WWW' argument mapping for a 'WDW' sequence
> >   #        please avoid long-long arguments or use 'SYSCALL_DEFINE3_WDW()' instead
> 
> ... if you do 32bit build.

Yeah - but the checking tool could do a 32-bit sizing of the types and thus the 
checks would work on all arches and on all bitness settings.

I don't think doing part of this in CPP is a good idea:

 - It won't be able to do the full range of checks

 - Wrappers should IMHO be trivial and open coded as much as possible - not hidden
   inside several layers of macros.

 - There should be a penalty for newly introduced, badly designed system call
   ABIs, while most CPP variants I can think of will just make bad but solvable 
   decisions palatable, AFAICS.

I.e. I think the way out of this would be two steps:

 1) for new system calls: hard-enforce the highest quality at the development
    stage and hard-reject crap. No new 6-parameter system calls or badly ordered
    arguments. The tool would also check new extensions to existing system calls,
    i.e. no more "add a crappy 4th argument to an existing system call that works
    on x86 but hurts MIPS".

 2) for old legacies: cleanly open code all our existing legacies and weird
    wrappers. No new muck will be added to it so the line count does not matter.

... is there anything I'm missing?
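A rough sketch of the 32-bit sizing idea, under the assumption that the checker works from a table of type names and their sizes on a 32-bit ABI. The table and all function names below are hypothetical and deliberately tiny: each argument whose 32-bit size is 8 bytes becomes a 'D', everything else a 'W', so the result is the same no matter which host the tool runs on.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical table: sizes of a few syscall argument types on a 32-bit ABI. */
struct abi32_type {
	const char *name;
	size_t size;
};

static const struct abi32_type abi32_types[] = {
	{ "int", 4 }, { "unsigned int", 4 }, { "long", 4 },
	{ "size_t", 4 }, { "pid_t", 4 }, { "loff_t", 8 },
};

/* Returns the 32-bit size of @type, or 0 if the table does not know it. */
static size_t abi32_size(const char *type)
{
	for (size_t i = 0; i < sizeof(abi32_types) / sizeof(abi32_types[0]); i++)
		if (strcmp(abi32_types[i].name, type) == 0)
			return abi32_types[i].size;
	return 0;
}

/*
 * Builds the W/D pattern for @n argument types into @out, which must hold
 * n + 1 bytes. Returns -1 if any type is unknown, 0 otherwise.
 */
static int abi32_pattern(const char *const types[], size_t n, char *out)
{
	for (size_t i = 0; i < n; i++) {
		size_t sz = abi32_size(types[i]);

		if (sz == 0)
			return -1;
		out[i] = (sz == 8) ? 'D' : 'W';
	}
	out[n] = '\0';
	return 0;
}
```

For the (int, loff_t, size_t) example in the thread this yields "WDW", even when the tool itself runs on a 64-bit host where long and size_t are 8 bytes.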

Thanks,

Ingo


Re: [GIT PULL 00/14] perf/core improvements and fixes

2018-03-19 Thread Ingo Molnar

* Arnaldo Carvalho de Melo  wrote:

> Hi Ingo,
> 
>   Please consider pulling, this has those 31 patches that were
> blocked due to some problems (author not being the first S-o-B, build
> broken on ppc), those issues should all be fixed and then we have 14
> patches more, described in the signed tag.
> 
> Regards,
> 
> - Arnaldo
> 
> Test results at the end of this message, as usual.
> 
> The following changes since commit 10f354a36f9a9aa1b8bffe0abc1cd43822a85bcd:
> 
>   perf test: Fix exit code for record+probe_libc_inet_pton.sh (2018-03-16 13:56:31 -0300)
> 
> are available in the Git repository at:
> 
>   git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git tags/perf-core-for-mingo-4.17-20180319
> 
> for you to fetch changes up to 1cd618838b9703eabe4a75badf433382b12f6bef:
> 
>   perf tests bp_account: Fix build with clang-6 (2018-03-19 13:51:54 -0300)
> 
> 
> perf/core improvements and fixes:
> 
> - Fixes for problems experienced with new gcc 8 warnings, that treated
>   as errors, broke the build, related to snprintf and casting issues.
>   (Arnaldo Carvalho de Melo, Jiri Olsa, Josh Poimboeuf)
> 
> - Fix build of new breakpoint 'perf test' entry with clang < 6, noticed
>   on fedora 25, 26 and 27 (Arnaldo Carvalho de Melo)
> 
> - Workaround problem with symbol resolution in 'perf annotate', using
>   the symbol name already present in the objdump output (Arnaldo Carvalho de Melo)
> 
> - Document 'perf top --ignore-vmlinux' (Arnaldo Carvalho de Melo)
> 
> - Fix out of bounds access on array fd when cnt is 100 in one of the
>   'perf test' entries, detected using 'cpptest' (Colin Ian King)
> 
> - Add support for the forced leader feature, i.e. 'perf report --group'
>   for a group of events not really grouped when scheduled (without using
>   {} to enclose the list of events in the command line) in pipe mode,
>   e.g.:
> 
>   $ perf record -e cycles,instructions -o - kill | perf report --group -i -
> 
> - Use right type to access array elements in 'perf probe' (Masami Hiramatsu)
> 
> - Update POWER9 vendor events (those described in JSON format) (Sukadev Bhattiprolu)
> 
> - Discard head in overwrite_rb_find_range() (Yisheng Xie)
> 
> - Avoid setting 'quiet' to 'true' unnecessarily (Yisheng Xie)
> 
> Signed-off-by: Arnaldo Carvalho de Melo 
> 
> 
> Arnaldo Carvalho de Melo (4):
>   perf annotate: Use asprintf when formatting objdump command line
>   perf top: Document --ignore-vmlinux
>   perf annotate: Use ops->target.name when available for unresolved call targets
>   perf tests bp_account: Fix build with clang-6
> 
> Colin Ian King (1):
>   perf tests: Fix out of bounds access on array fd when cnt is 100
> 
> Jiri Olsa (4):
>   perf record: Synthesize features before events in pipe mode
>   perf report: Support forced leader feature in pipe mode
>   perf tools: Fix snprint warnings for gcc 8
>   perf tools: Fix python extension build for gcc 8
> 
> Josh Poimboeuf (1):
>   objtool, perf: Fix GCC 8 -Wrestrict error
> 
> Masami Hiramatsu (1):
>   perf probe: Use right type to access array elements
> 
> Sukadev Bhattiprolu (1):
>   perf vendor events: Update POWER9 events
> 
> Yisheng Xie (2):
>   perf mmap: Discard head in overwrite_rb_find_range()
>   perf debug: Avoid setting 'quiet' to 'true' unnecessarily
> 
>  tools/lib/str_error_r.c|   2 +-
>  tools/perf/Documentation/perf-top.txt  |   3 +
>  tools/perf/builtin-record.c|  18 +-
>  tools/perf/builtin-report.c|  57 +++--
>  tools/perf/builtin-script.c|  22 +-
>  .../perf/pmu-events/arch/powerpc/power9/cache.json |  25 ---
>  .../pmu-events/arch/powerpc/power9/frontend.json   |  10 -
>  .../pmu-events/arch/powerpc/power9/marked.json |   5 -
>  .../pmu-events/arch/powerpc/power9/memory.json |   5 -
>  .../perf/pmu-events/arch/powerpc/power9/other.json | 241 ++---
>  .../pmu-events/arch/powerpc/power9/pipeline.json   |  50 ++---
>  tools/perf/pmu-events/arch/powerpc/power9/pmc.json |   5 -
>  .../arch/powerpc/power9/translation.json   |  10 +-
>  tools/perf/tests/attr.c|   4 +-
>  tools/perf/tests/bp_account.c  |  10 +-
>  tools/perf/tests/mem.c |   2 +-
>  tools/perf/tests/pmu.c |   2 +-
>  tools/perf/util/annotate.c |  20 +-
>  tools/perf/util/cgroup.c   |   2 +-
>  tools/perf/util/debug.c|   1 -
>  tools/perf/util/header.c   |  11 +-
>  tools/perf/util/mmap.c |  15 +-
>  tools/perf/util/parse-events.c |   4 +-
>  

Re: [RFC PATCH 4/6] mm: provide generic compat_sys_readahead() implementation

2018-03-19 Thread Ingo Molnar

* Al Viro  wrote:

> On Sun, Mar 18, 2018 at 06:18:48PM +, Al Viro wrote:
> 
> > I'd done some digging in that area, will find the notes and post.
> 
> OK, found:

Very nice writeup - IMHO this should go into Documentation/!

> OTOH, consider arm.  There we have
>   * r0, r1, r2, r3, [sp,#8], [sp,#12], [sp,#16]... is the sequence
> of objects used to pass arguments
>   * 32bit and less - pick the next available slot
>   * 64bit - skip a slot if we'd already taken an odd number, then use
> the next two slots for lower and upper 32 bits of the argument.
> 
> So our classes take
> simple n-argument: 0 to 6 slots
> WD                 4 slots
> DWW                4 slots
> WDW                5 slots
> WWDD               6 slots
> WDWW               5 slots
> WWWD               6 slots
> WWDWW              6 slots
> WDDW               7 slots (!)  Also , , !@#!@#!@#!# and other nice
> and well-deserved comments from arch maintainers, some of them even printable:
> /* It would be nice if people remember that not all the world's an i386
>when they introduce new system calls */
> SYSCALL_DEFINE4(sync_file_range2, int, fd, unsigned int, flags,
>  loff_t, offset, loff_t, nbytes)

Such idiosyncratic platform quirks that have an impact on generic code should be
as self-maintaining as possible: i.e. there should be a build time warning even on
x86 if someone introduces a new, suboptimally packed system call.

Otherwise we'll have such incidents again and again as new system calls get added.
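To make the quoted ARM rule concrete, a hedged sketch of a slot counter for a W/D pattern. It assumes the pattern contains only 'W' (32-bit or smaller) and 'D' (64-bit) characters and models nothing beyond the register/stack slot rule described above; arm_slots() is a hypothetical name, not an existing kernel helper.

```c
/*
 * Counts argument slots (r0-r3, then stack words) for a W/D pattern:
 * a 32-bit argument takes the next free slot; a 64-bit argument first
 * skips a slot if an odd number of slots is already in use, then takes
 * two consecutive slots.
 */
static int arm_slots(const char *pattern)
{
	int used = 0;

	for (const char *p = pattern; *p; p++) {
		if (*p == 'D') {
			if (used % 2)
				used++;	/* padding slot to keep 64-bit alignment */
			used += 2;
		} else {
			used++;
		}
	}
	return used;
}
```

For example, the sync_file_range2() layout quoted above (int, unsigned int, loff_t, loff_t, i.e. "WWDD") comes out at 6 slots, while the original sync_file_range() ordering (int, loff_t, loff_t, unsigned int, i.e. "WDDW") needs 7.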

> [snip the preprocessor horrors - the sketches I've got there are downright obscene]

I still think we should consider creating a generic facility and a tool: which
would immediately and automatically add new system calls to *every* architecture -
or which would initially at least check these syscall ABI constraints.

I.e. this would start with a new generic kernel facility that warns about
suboptimal new system call argument layouts on every architecture, not just on the
affected ones.

That's a significant undertaking but should be possible to do.

Once such a facility is in place all the existing old mess is still a PITA, but 
should be manageable eventually - as no new mess is added to it.

IMHO that's the only thing that could break the somewhat deadly current dynamic of
system call mappings mess. Complaining about people not knowing about quirks won't
help.

One way to implement this would be to put the argument chain types (string) and
sizes (int) into a special debug section which isn't included in the final kernel
image but which can be checked at link time.

For example this attempt at creating a new system call:

  SYSCALL_DEFINE3(moron, int, fd, loff_t, offset, size_t, count)

... would translate into something like:

.name = "moron", .pattern = "WWW", .type = "int",    .size = 4,
.name = NULL,                       .type = "loff_t", .size = 8,
.name = NULL,                       .type = "size_t", .size = 4,
.name = NULL,                       .type = NULL,     .size = 0, /* end of parameter list */

i.e. "WDW". The build-time constraint checker could then warn about:

  # error: System call "moron" uses invalid 'WWW' argument mapping for a 'WDW' sequence
  #        please avoid long-long arguments or use 'SYSCALL_DEFINE3_WDW()' instead

Each architecture can provide its own syscall parameter checking logic. Both
'stack boundary' and parameter packing rules would be straightforward to express
if we had such a data structure.
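A sketch of how a checker might walk one such descriptor list, assuming the hypothetical struct layout below (field names follow the example above; the struct and function names are illustrative). It derives the actual pattern from the recorded sizes and reports a mismatch in the format proposed above.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical layout of one entry in the proposed debug section. */
struct syscall_arg_desc {
	const char *name;	/* syscall name, set on the first entry only */
	const char *pattern;	/* declared W/D mapping, first entry only */
	const char *type;	/* C type of the argument; NULL terminates */
	int size;		/* size of the type in bytes */
};

/*
 * Derives the actual W/D pattern from the per-argument sizes and compares
 * it with the declared one. Returns 0 on a match; on a mismatch it prints
 * an error in the style of the proposed checker and returns -1.
 */
static int check_syscall_pattern(const struct syscall_arg_desc *d)
{
	char actual[8];
	int n = 0;

	for (const struct syscall_arg_desc *a = d; a->type && n < 7; a++)
		actual[n++] = (a->size == 8) ? 'D' : 'W';
	actual[n] = '\0';

	if (strcmp(actual, d->pattern) != 0) {
		fprintf(stderr,
			"error: System call \"%s\" uses invalid '%s' argument mapping for a '%s' sequence\n",
			d->name, d->pattern, actual);
		return -1;
	}
	return 0;
}
```

An architecture-specific checker could then apply its own stack-boundary and packing rules to the same derived pattern.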

Also note that this tool could also check for optimum packing, i.e. if the new 
system call is defined as:

  SYSCALL_DEFINE3_WDW(moron, int, fd, loff_t, offset, size_t, count)

... would translate to something like:

.name = "moron", .pattern = "WDW", .type = "int",    .size = 4,
.name = NULL,                       .type = "loff_t", .size = 8,
.name = NULL,                       .type = "size_t", .size = 4,
.name = NULL,                       .type = NULL,     .size = 0, /* end of parameter list */

where the tool would print out this error:

  # error: System call "moron" uses suboptimal 'WDW' argument mapping instead of 'WWD'

There would be a whitelist of existing system calls that already use a suboptimal
argument order - but the warnings/errors would trigger for all new system calls.

But adding non-straight-mapped system calls would be the exception in any case.
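The optimal-packing check could be sketched as a brute-force search over orderings, since a system call has at most six arguments. This assumes "optimal" means fewest argument slots under the 32-bit ARM rule quoted earlier, with ties broken in favour of later 64-bit arguments (which reproduces the 'WWD' suggestion from the example); another architecture would plug in its own cost function. Both function names are hypothetical.

```c
#include <string.h>

/* Slot count under the 32-bit ARM argument-passing rule quoted earlier. */
static int arm_slots(const char *p)
{
	int used = 0;

	for (; *p; p++) {
		if (*p == 'D')
			used += (used % 2) + 2;	/* pad to an even slot, then take two */
		else
			used++;
	}
	return used;
}

/*
 * Finds the ordering of the same W/D multiset as @pattern that needs the
 * fewest slots and stores it in @best (at least strlen(pattern) + 1 bytes).
 * Returns 1 if @pattern is suboptimal, 0 if it is already optimal, and -1
 * if it has more than six arguments.
 */
static int suggest_packing(const char *pattern, char *best)
{
	size_t n = strlen(pattern);
	int nd = 0, min = -1;
	char cand[7];

	if (n > 6)
		return -1;

	for (size_t i = 0; i < n; i++)
		nd += (pattern[i] == 'D');

	/*
	 * Bit i of @mask set means 'D' at position i. Walking from high to
	 * low masks means orderings with later 64-bit arguments win ties.
	 */
	for (int mask = (1 << n) - 1; mask >= 0; mask--) {
		int bits = 0;

		for (size_t i = 0; i < n; i++) {
			cand[i] = ((mask >> i) & 1) ? 'D' : 'W';
			bits += (mask >> i) & 1;
		}
		cand[n] = '\0';
		if (bits != nd)
			continue;

		int slots = arm_slots(cand);
		if (min < 0 || slots < min) {
			min = slots;
			memcpy(best, cand, n + 1);
		}
	}
	return arm_slots(pattern) > min;
}
```

For "WDW" this suggests "WWD" (4 slots instead of 5), and for the "WDDW" layout of the original sync_file_range() it suggests "WWDD", the order sync_file_range2() ended up using.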

Such tooling could also do other things, such as limit the C types used for system
call defines to a well-chosen set of ABI-safe types, such as:

  3  key_t
  3  uint32_t
  4  aio_context_t
  4  mqd_t
  4  timer_t
 10  clockid_t
 10  gid_t
 10  loff_t
 10  long
 10  old_gid_t
 10  old_uid_t
 10  umode_t
 11  uid_t
 31  pid_t
 34  size_t
 69  unsigned int
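A type whitelist check along these lines could be as simple as the sketch below. The whitelist is seeded with exactly the types listed above, which is only an excerpt; a real list would need more (plain int and pointer types, for a start), so treat the table and the function name as illustrative.

```c
#include <stdbool.h>
#include <string.h>

/* Illustrative excerpt of ABI-safe syscall argument types (from the list above). */
static const char *const abi_safe_types[] = {
	"key_t", "uint32_t", "aio_context_t", "mqd_t", "timer_t",
	"clockid_t", "gid_t", "loff_t", "long", "old_gid_t",
	"old_uid_t", "umode_t", "uid_t", "pid_t", "size_t",
	"unsigned int",
};

/* Returns true if @type is on the whitelist. */
static bool is_abi_safe_type(const char *type)
{
	for (size_t i = 0; i < sizeof(abi_safe_types) / sizeof(abi_safe_types[0]); i++)
		if (strcmp(abi_safe_types[i], type) == 0)
			return true;
	return false;
}
```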

Re: [PATCH] x86, powerpc : pkey-mprotect must allow pkey-0

2018-03-09 Thread Ingo Molnar

* Ram Pai <linux...@us.ibm.com> wrote:

> Once an address range is associated with an allocated pkey, it cannot be
> reverted back to key-0. There is no valid reason for the above behavior.  On
> the contrary applications need the ability to do so.
> 
> The patch relaxes the restriction.
> 
> Tested on powerpc and x86_64.
> 
> cc: Dave Hansen <dave.han...@intel.com>
> cc: Michael Ellermen <m...@ellerman.id.au>
> cc: Ingo Molnar <mi...@kernel.org>
> Signed-off-by: Ram Pai <linux...@us.ibm.com>
> ---
>  arch/powerpc/include/asm/pkeys.h | 19 ++-
>  arch/x86/include/asm/pkeys.h |  5 +++--
>  2 files changed, 17 insertions(+), 7 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/pkeys.h b/arch/powerpc/include/asm/pkeys.h
> index 0409c80..3e8abe4 100644
> --- a/arch/powerpc/include/asm/pkeys.h
> +++ b/arch/powerpc/include/asm/pkeys.h
> @@ -101,10 +101,18 @@ static inline u16 pte_to_pkey_bits(u64 pteflags)
>  
>  static inline bool mm_pkey_is_allocated(struct mm_struct *mm, int pkey)
>  {
> - /* A reserved key is never considered as 'explicitly allocated' */
> - return ((pkey < arch_max_pkey()) &&
> - !__mm_pkey_is_reserved(pkey) &&
> - __mm_pkey_is_allocated(mm, pkey));
> + /* pkey 0 is allocated by default. */
> + if (!pkey)
> +return true;
> +
> + if (pkey < 0 || pkey >= arch_max_pkey())
> +return false;
> +
> + /* reserved keys are never allocated. */
> + if (__mm_pkey_is_reserved(pkey))
> +return false;

Please capitalize in comments consistently, i.e.:

/* Reserved keys are never allocated: */

> +
> + return(__mm_pkey_is_allocated(mm, pkey));

'return' is not a function.
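With both review comments applied, the function might read as below. This is a sketch: the mm_struct layout and the three helpers are hypothetical stand-ins (32 keys, key 1 reserved, a per-mm allocation bitmap) so that it compiles outside the kernel; only the structure of mm_pkey_is_allocated() follows the patch.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the powerpc helpers used by the patch. */
struct mm_struct {
	unsigned int pkey_allocation_map;
};

static int arch_max_pkey(void) { return 32; }
static bool __mm_pkey_is_reserved(int pkey) { return pkey == 1; }
static bool __mm_pkey_is_allocated(struct mm_struct *mm, int pkey)
{
	return mm->pkey_allocation_map & (1U << pkey);
}

static inline bool mm_pkey_is_allocated(struct mm_struct *mm, int pkey)
{
	/* pkey 0 is allocated by default: */
	if (!pkey)
		return true;

	if (pkey < 0 || pkey >= arch_max_pkey())
		return false;

	/* Reserved keys are never allocated: */
	if (__mm_pkey_is_reserved(pkey))
		return false;

	return __mm_pkey_is_allocated(mm, pkey);
}
```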

Thanks,

Ingo


Re: [PATCH v12 01/22] selftests/x86: Move protecton key selftest to arch neutral directory

2018-02-21 Thread Ingo Molnar

* Ram Pai <linux...@us.ibm.com> wrote:

> cc: Dave Hansen <dave.han...@intel.com>
> cc: Florian Weimer <fwei...@redhat.com>
> Signed-off-by: Ram Pai <linux...@us.ibm.com>
> ---
>  tools/testing/selftests/vm/Makefile   |1 +
>  tools/testing/selftests/vm/pkey-helpers.h |  223 
>  tools/testing/selftests/vm/protection_keys.c  | 1407 +
>  tools/testing/selftests/x86/Makefile  |2 +-
>  tools/testing/selftests/x86/pkey-helpers.h|  223 
>  tools/testing/selftests/x86/protection_keys.c | 1407 -
>  6 files changed, 1632 insertions(+), 1631 deletions(-)
>  create mode 100644 tools/testing/selftests/vm/pkey-helpers.h
>  create mode 100644 tools/testing/selftests/vm/protection_keys.c
>  delete mode 100644 tools/testing/selftests/x86/pkey-helpers.h
>  delete mode 100644 tools/testing/selftests/x86/protection_keys.c

Acked-by: Ingo Molnar <mi...@kernel.org>

Thanks,

Ingo


Re: [GIT PULL 00/41] perf/core improvements and fixes

2018-02-17 Thread Ingo Molnar

* Arnaldo Carvalho de Melo  wrote:

> Hi Ingo,
> 
>   Please consider pulling, this is on top of tip/perf/urgent.
> 
> - Arnaldo
> 
> Test results at the end of this message, as usual.
> 
> The following changes since commit 297f9233b53a08fd457815e19f1d6f2c3389857b:
> 
>   kprobes: Propagate error from disarm_kprobe_ftrace() (2018-02-16 09:12:58 +0100)
> 
> are available in the Git repository at:
> 
>   git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git tags/perf-core-for-mingo-4.17-20180216
> 
> for you to fetch changes up to 21316ac6803d4a1aadd74b896db8d60a92cd1140:
> 
>   perf tests shell lib: Use a wildcard to remove the vfs_getname probe (2018-02-16 15:31:12 -0300)
> 
> 
> perf/core improvements and fixes:
> 
> - Fix wrong jump arrow in systems with branch records with cycles,
>   i.e. Intel's >= Skylake (Jin Yao)
> 
> - Fix 'perf record --per-thread' problem introduced when
>   implementing 'perf stat --per-thread' (Jin Yao)
> 
> - Use arch__compare_symbol_names() to fix 'perf test vmlinux',
>   that was using strcmp(symbol names) while the dso routines
>   doing symbol lookups used the arch overridable one, making
>   this test fail in architectures that overrode that function
>   with something other than strcmp() (Jiri Olsa)
> 
> - Add 'perf script --show-round-event' to display
>   PERF_RECORD_FINISHED_ROUND entries (Jiri Olsa)
> 
> - Fix dwarf unwind for stripped binaries in 'perf test' (Jiri Olsa)
> 
> - Use ordered_events for 'perf report --tasks', otherwise we may get
>   artifacts when PERF_RECORD_FORK gets processed before PERF_RECORD_COMM
>   (when they got recorded in different CPUs) (Jiri Olsa)
> 
> - Add support to display group output for non group events, i.e.
>   now when one uses 'perf report --group' on a perf.data file
>   recorded without explicitly grouping events with {} (e.g.
>   "perf record -e '{cycles,instructions}'" get the same output
>   that would produce, i.e. see all those non-grouped events in
>   multiple columns, at the same time (Jiri Olsa)
> 
> - Skip non-address kallsyms entries, e.g. '(null)' for !root (Jiri Olsa)
> 
> - Kernel maps fixes wrt perf.data(report) versus live system (top)
>   (Jiri Olsa)
> 
> - Fix memory corruption when using 'perf record -j call -g -a '
>   followed by 'perf report --branch-history' (Jiri Olsa)
> 
> - ARM CoreSight fixes (Mathieu Poirier)
> 
> - Add inject capability for CoreSight Traces (Robert Walker)
> 
> - Update documentation for use of 'perf' + ARM CoreSight (Robert Walker)
> 
> - Man pages fixes (Sangwon Hong, Jaecheol Shin)
> 
> - Fix some 'perf test' cases on s/390 and x86_64 (some backtraces
>   changed with a glibc update) (Thomas Richter)
> 
> - Add detailed CPUID info in the 'perf.data' headers for s/390 to
>   then use it in 'perf annotate' (Thomas Richter)
> 
> - Add '--interval-count N' to 'perf stat', to use with -I, i.e.
>   'perf stat -I 1000 --interval-count 2' will show stats every
>1000ms, two times (yuzhoujian)
> 
> - Add 'perf stat --timeout Nms', that will run for that many
>   milliseconds and then stop, printing the counters (yuzhoujian)
> 
> - Fix description for 'perf report --mem-mode' (Andi Kleen)
> 
> - Use a wildcard to remove the vfs_getname probe in the
>   'perf test' shell based test cases (Arnaldo Carvalho de Melo)
> 
> Signed-off-by: Arnaldo Carvalho de Melo 
> 
> 
> Andi Kleen (1):
>   perf report: Fix description for --mem-mode
> 
> Arnaldo Carvalho de Melo (1):
>   perf tests shell lib: Use a wildcard to remove the vfs_getname probe
> 
> Jaecheol Shin (1):
>   perf annotate: Add missing arguments in Man page
> 
> Jin Yao (2):
>   perf tools: Use target->per_thread and target->system_wide flags
>   perf report: Fix wrong jump arrow
> 
> Jiri Olsa (18):
>   perf record: Put new line after target override warning
>   perf script: Add --show-round-event to display PERF_RECORD_FINISHED_ROUND
>   tools lib api fs: Add filename__read_xll function
>   tools lib api fs: Add sysfs__read_xll function
>   perf tests: Fix dwarf unwind for stripped binaries
>   perf tools: Fix comment for sort__* compare functions
>   perf report: Ask for ordered events for --tasks option
>   perf report: Add support to display group output for non group events
>   tools lib symbol: Skip non-address kallsyms line
>   perf symbols: Check if we read regular file in dso__load()
>   perf machine: Free root_dir in machine__init() error path
>   perf machine: Move kernel mmap name into struct machine
>   perf machine: Generalize machine__set_kernel_mmap()
>   perf machine: Don't search for active kernel start in __machine__create_kernel_maps
>   perf machine: Remove machine__load_kallsyms()
>   perf tools: Do not create kernel maps in 

Re: [PATCH] headers: untangle kmemleak.h from mm.h

2018-02-11 Thread Ingo Molnar

* Randy Dunlap <rdun...@infradead.org> wrote:

> From: Randy Dunlap <rdun...@infradead.org>
> 
> Currently  #includes  for no obvious
> reason. It looks like it's only a convenience, so remove kmemleak.h
> from slab.h and add  to any users of kmemleak_*
> that don't already #include it.
> Also remove  from source files that do not use it.
> 
> This is tested on i386 allmodconfig and x86_64 allmodconfig. It
> would be good to run it through the 0day bot for other $ARCHes.
> I have neither the horsepower nor the storage space for the other
> $ARCHes.
> 
> [slab.h is the second most used header file after module.h; kernel.h
> is right there with slab.h. There could be some minor error in the
> counting due to some #includes having comments after them and I
> didn't combine all of those.]
> 
> This is Lingchi patch #1 (death by a thousand cuts, applied to kernel
> header files).
> 
> Signed-off-by: Randy Dunlap <rdun...@infradead.org>

Nice find:

Reviewed-by: Ingo Molnar <mi...@kernel.org>

I agree that it needs to go through 0-day to find any hidden dependencies we might
have grown due to this.

Thanks,

Ingo


Re: [PATCH for 4.16 v7 02/11] powerpc: membarrier: Skip memory barrier in switch_mm()

2018-02-05 Thread Ingo Molnar

* Mathieu Desnoyers  wrote:

>  
> +config ARCH_HAS_MEMBARRIER_HOOKS
> + bool

Yeah, so I have renamed this to ARCH_HAS_MEMBARRIER_CALLBACKS, and propagated it
through the rest of the patches. "Callback" is the canonical name, and I also
cringe every time I see 'hook'.

Please let me know if there are any big objections against this minor cleanup.

Thanks,

Ingo


Re: [PATCH v11 0/3] mm, x86, powerpc: Enhancements to Memory Protection Keys.

2018-01-30 Thread Ingo Molnar

* Ram Pai <linux...@us.ibm.com> wrote:

> This patch series provides arch-neutral enhancements to
> enable memory-keys on new architectures, and the corresponding
> changes in x86 and powerpc specific code to support that.
> 
> a) Provides ability to support up to 32 keys.  PowerPC
>   can handle 32 keys and hence needs this.
> 
> b) Arch-neutral code; and not the arch-specific code,
>determines the format of the string, that displays the key
>for each vma in smaps.
> 
> PowerPC implementation of memory-keys is now in powerpc/next tree.
> https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git/commit/?h=next=92e3da3cf193fd27996909956c12a23c0333da44

All three patches look sane to me. If you would like to carry these generic bits
in the PowerPC tree as well then:

  Reviewed-by: Ingo Molnar <mi...@kernel.org>

Thanks,

Ingo


Re: [PATCH v6 2/8] module: use relative references for __ksymtab entries

2017-12-28 Thread Ingo Molnar

* Ard Biesheuvel  wrote:

> Annoyingly, we need this because there is a single instance of a
> special section that ends up in the EFI stub code: we build lib/sort.c
> again as a EFI libstub object, and given that sort() is exported, we
> end up with a ksymtab section in the EFI stub. The sort() thing has
> caused issues before [0], so perhaps I should just clone sort.c into
> drivers/firmware/efi/libstub and get rid of that hack.
> 
> [0] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=29f9007b3182ab3f328a31da13e6b1c9072f7a95

If the root problem is early bootstrap code randomly using a generic facility that
isn't __init, then we should definitely improve tooling to at least detect these
problems.

As bootstrap code gets improved (KASLR, more complex decompression, etc. etc.) we
keep using new bits of generic facilities...

So this should definitely not be hidden by open coding that function (which has 
various other disadvantages as well), but should be turned from silent breakage 
either into non-breakage (and do so not only for sort() but for other generic 
functions as well), or should be turned into a build failure.

Thanks,

Ingo


Re: [PATCH 0/4] PCI: Cleanup unused stuff

2017-10-07 Thread Ingo Molnar

* Bjorn Helgaas <helg...@kernel.org> wrote:

> Sorry for the long cc list.  These are pretty trivial; they just remove
> some unnecessary declarations across several arches.
> 
> ---
> 
> Bjorn Helgaas (4):
>   PCI: Remove redundant pcibios_set_master() declarations
>   PCI: Remove redundant pci_dev, pci_bus, resource declarations
>   PCI: Remove unused declarations
>   alpha/PCI: Make pdev_save_srm_config() static
> 
> 
>  arch/alpha/include/asm/pci.h|5 -
>  arch/alpha/kernel/pci.c |   11 ++-
>  arch/alpha/kernel/pci_impl.h|8 
>  arch/cris/include/asm/pci.h |9 -
>  arch/frv/include/asm/pci.h  |4 
>  arch/ia64/include/asm/pci.h |4 
>  arch/mips/include/asm/pci.h |4 
>  arch/mn10300/include/asm/pci.h  |4 
>  arch/mn10300/unit-asb2305/pci-asb2305.h |3 ---
>  arch/parisc/include/asm/pci.h   |8 
>  arch/powerpc/include/asm/pci.h  |2 --
>  arch/sh/include/asm/pci.h   |4 
>  arch/sparc/include/asm/pci_32.h |2 --
>  arch/x86/include/asm/pci.h  |2 --
>  arch/xtensa/include/asm/pci.h   |2 --
>  15 files changed, 10 insertions(+), 62 deletions(-)

Nice cleanups! For the whole series:

  Reviewed-by: Ingo Molnar <mi...@kernel.org>

Thanks,

Ingo



Re: [GIT PULL 00/13] perf/core improvements and fixes

2017-09-04 Thread Ingo Molnar

* Arnaldo Carvalho de Melo  wrote:

> Hi Ingo,
> 
>   Please consider pulling,
> 
> - Arnaldo
> 
> Test results at the end of this message, as usual.
> 
> The following changes since commit 1b2f76d77a277bb70d38ad0991ed7f16bbc115a9:
> 
>   Merge tag 'perf-core-for-mingo-4.14-20170829' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core (2017-08-29 23:13:56 +0200)
> 
> are available in the git repository at:
> 
>   git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git tags/perf-core-for-mingo-4.14-20170901
> 
> for you to fetch changes up to eba9fac017617e685d648339e29a1453a30cb065:
> 
>   perf annotate browser: Help for cycling thru hottest instructions with TAB/shift+TAB (2017-09-01 14:55:40 -0300)
> 
> 
> perf/core improvements and fixes:
> 
> - Support syscall name glob matching in 'perf trace' (Arnaldo Carvalho de Melo)
> 
>   e.g.:
> 
>    # perf trace -e pkey_*
>    32.784 (0.006 ms): pkey/16018 pkey_alloc(init_val: DISABLE_WRITE) = -1 EINVAL Invalid argument
>    32.795 (0.004 ms): pkey/16018 pkey_mprotect(start: 0x7f380d0a6000, len: 4096, prot: READ|WRITE, pkey: -1) = 0
>    32.801 (0.002 ms): pkey/16018 pkey_free(pkey: -1) = -1 EINVAL Invalid argument
>    ^C#
> 
> - Do not auto-merge counts for explicitly specified events in
>   'perf stat' (Arnaldo Carvalho de Melo)
> 
> - Fix syntax in documentation of .perfconfig intel-pt option (Jack Henschel)
> 
> - Calculate the average cycles of iterations for loops detected by the
>   branch history support in 'perf report' (Jin Yao)
> 
> - Support PERF_SAMPLE_PHYS_ADDR as a sort key "phys_daddr" in the 'script',
>   'mem', 'top' and 'report'. Also add a test entry for it in 'perf test' (Kan Liang)
> 
> - Fix 'Object code reading' 'perf test' entry in PowerPC (Ravi Bangoria)
> 
> - Remove some duplicate Power9 vendor events (described in JSON
>   files) (Sukadev Bhattiprolu)
> 
> - Add help entry in the TUI annotate browser about cycling thru hottest
>   instructions with TAB/shift+TAB (Arnaldo Carvalho de Melo)
> 
> Signed-off-by: Arnaldo Carvalho de Melo 
> 
> 
> Arnaldo Carvalho de Melo (4):
>   perf syscalltbl: Support glob matching on syscall names
>   perf trace: Support syscall name globbing
>   perf stat: Only auto-merge events that are PMU aliases
>   perf annotate browser: Help for cycling thru hottest instructions with TAB/shift+TAB
> 
> Jack Henschel (1):
>   perf intel-pt: Fix syntax in documentation of config option
> 
> Jin Yao (1):
>   perf report: Calculate the average cycles of iterations
> 
> Kan Liang (5):
>   perf tools: Support new sample type for physical address
>   perf sort: Add sort option for physical address
>   perf mem: Support physical address
>   perf script: Support physical address
>   perf test: Add test case for PERF_SAMPLE_PHYS_ADDR
> 
> Ravi Bangoria (1):
>   perf test powerpc: Fix 'Object code reading' test
> 
> Sukadev Bhattiprolu (1):
>   perf vendor events powerpc: Remove duplicate events
> 
>  tools/include/uapi/linux/perf_event.h  |   4 +-
>  tools/perf/Documentation/intel-pt.txt  |   2 +-
>  tools/perf/Documentation/perf-mem.txt  |   4 +
>  tools/perf/Documentation/perf-record.txt   |   5 +-
>  tools/perf/Documentation/perf-report.txt   |   1 +
>  tools/perf/Documentation/perf-script.txt   |   2 +-
>  tools/perf/Documentation/perf-trace.txt|   2 +-
>  tools/perf/builtin-mem.c   |  97 -
>  tools/perf/builtin-record.c|   2 +
>  tools/perf/builtin-script.c|  15 ++-
>  tools/perf/builtin-stat.c  |   2 +-
>  tools/perf/builtin-trace.c |  39 ++-
>  tools/perf/perf.h  |   1 +
>  .../pmu-events/arch/powerpc/power9/frontend.json   |   7 +-
>  .../perf/pmu-events/arch/powerpc/power9/other.json | 120 -
>  .../pmu-events/arch/powerpc/power9/pipeline.json   |   7 +-
>  tools/perf/pmu-events/arch/powerpc/power9/pmc.json |   7 +-
>  tools/perf/tests/code-reading.c|   5 +
>  tools/perf/tests/sample-parsing.c  |   6 +-
>  tools/perf/ui/browsers/annotate.c  |   3 +-
>  tools/perf/ui/browsers/hists.c |   8 +-
>  tools/perf/ui/stdio/hist.c |  10 +-
>  tools/perf/util/callchain.c|  49 -
>  tools/perf/util/callchain.h|   9 +-
>  tools/perf/util/event.h|   1 +
>  tools/perf/util/evsel.c|  19 +++-
>  tools/perf/util/evsel.h|   1 +
>  

Re: [GIT PULL 00/19] perf/core improvements and fixes

2017-08-14 Thread Ingo Molnar

* Arnaldo Carvalho de Melo  wrote:

> Hi Ingo,
> 
>   Please consider pulling,
> 
> - Arnaldo
> 
> Test results at the end of this message, as usual.
> 
> 
> The following changes since commit 82119cbe8e1e32cc2a941393e59816e731681310:
> 
>   Merge tag 'perf-core-for-mingo-4.14-20170801' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core (2017-08-10 17:07:02 +0200)
> 
> are available in the git repository at:
> 
>   git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git tags/perf-core-for-mingo-4.14-20170814
> 
> for you to fetch changes up to 8fc375d7d36c72b4c2d55f5c24be022a939295d4:
> 
>   perf test shell: Add uprobes + backtrace ping test (2017-08-11 16:18:49 -0300)
> 
> 
> perf/core improvements and fixes:
> 
> Infrastructure:
> 
> - Do not consider empty files as valid srclines (Milian Wolff)
> 
> - Fix wrong size in perf_record_mmap for last kernel module,
>   which resulted in erroneous symbol resolution in at least s390x
>   (Thomas Richter)
> 
> - Add missing newline to expr parser error messages (Andi Kleen)
> 
> - Fix saved values rbtree lookup in 'perf stat' (Andi Kleen)
> 
> - Add support for shell based tests in 'perf test', add a few that
>   run 'perf probe', 'perf trace', using kprobes, uprobes to check
>   the output of those tools and the effects on the system, checking,
>   for instance, DWARF backtraces from uprobes (Arnaldo Carvalho de Melo)
> 
> Arch specific:
> 
> - Add ppc64le to audit uname list in the python scripting support
>   (Naveen N. Rao)
> 
> - Update POWER9 vendor events tables (Sukadev Bhattiprolu)
> 
> - Fix module symbol adjustment for s390x (Thomas Richter)
> 
> Signed-off-by: Arnaldo Carvalho de Melo 
> 
> 
> Andi Kleen (2):
>   perf stat: Fix saved values rbtree lookup
>   perf tools: Add missing newline to expr parser error messages
> 
> Arnaldo Carvalho de Melo (10):
>   perf test: Make 'list' subcommand match main 'perf test' numbering/matching
>   perf test: Add 'struct test *' to the test functions
>   perf test: Add infrastructure to run shell based tests
>   perf test: Make 'list' use same filtering code as main 'perf test'
>   perf test shell: Add 'probe_vfs_getname' shell test
>   perf test shell: Install shell tests
>   perf test shell: Move vfs_getname probe function to lib
>   perf test shell: Add test using probe:vfs_getname and verifying results
>   perf test shell: Add test using vfs_getname + 'perf trace'
>   perf test shell: Add uprobes + backtrace ping test
> 
> Milian Wolff (2):
>   perf util: Take elf_name as const string in dso__demangle_sym
>   perf srcline: Do not consider empty files as valid srclines
> 
> Naveen N. Rao (1):
>   perf scripting python: Add ppc64le to audit uname list
> 
> Sukadev Bhattiprolu (2):
>   perf vendor events powerpc: remove suffix in mapfile
>   perf vendor events powerpc: Update POWER9 events
> 
> Thomas Richter (2):
>   perf record: Fix wrong size in perf_record_mmap for last kernel module
>   perf report: Fix module symbol adjustment for s390x
> 
>  tools/perf/Makefile.perf   |6 +-
>  tools/perf/arch/s390/util/sym-handling.c   |7 +
>  tools/perf/arch/x86/include/arch-tests.h   |   11 +-
>  tools/perf/arch/x86/tests/insn-x86.c   |2 +-
>  tools/perf/arch/x86/tests/intel-cqm.c  |2 +-
>  tools/perf/arch/x86/tests/perf-time-to-tsc.c   |2 +-
>  tools/perf/arch/x86/tests/rdpmc.c  |2 +-
>  tools/perf/pmu-events/arch/powerpc/mapfile.csv |   20 +-
>  .../perf/pmu-events/arch/powerpc/power9/cache.json |  191 +-
>  .../arch/powerpc/power9/floating-point.json|   42 +-
>  .../pmu-events/arch/powerpc/power9/frontend.json   |  517 ++--
>  .../pmu-events/arch/powerpc/power9/marked.json |  905 +++
>  .../pmu-events/arch/powerpc/power9/memory.json |  178 +-
>  .../perf/pmu-events/arch/powerpc/power9/other.json | 2768
>  .../pmu-events/arch/powerpc/power9/pipeline.json   |  779 +++---
>  tools/perf/pmu-events/arch/powerpc/power9/pmc.json |  167 +-
>  .../arch/powerpc/power9/translation.json   |  314 +--
>  .../python/Perf-Trace-Util/lib/Perf/Trace/Util.py  |1 +
>  tools/perf/tests/attr.c|2 +-
>  tools/perf/tests/backward-ring-buffer.c|2 +-
>  tools/perf/tests/bitmap.c  |2 +-
>  tools/perf/tests/bp_signal.c   |2 +-
>  tools/perf/tests/bp_signal_overflow.c  |2 +-
>  tools/perf/tests/bpf.c |4 +-
>  tools/perf/tests/builtin-test.c|  184 +-
>  tools/perf/tests/clang.c   |4 +-
>  

Re: [GIT PULL 00/25] perf/core improvements and fixes

2017-06-21 Thread Ingo Molnar

* Arnaldo Carvalho de Melo  wrote:

> Hi Ingo,
> 
>   Please consider pulling,
> 
> - Arnaldo
> 
> Test results at the end of this message, as usual.
> 
> The following changes since commit 007b811b4041989ec2dc91b9614aa2c41332723e:
> 
>   Merge tag 'perf-core-for-mingo-4.13-20170719' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core (2017-06-20 10:49:08 +0200)
> 
> are available in the git repository at:
> 
>   git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git tags/perf-core-for-mingo-4.13-20170621
> 
> for you to fetch changes up to 701516ae3dec801084bc913d21e03fce15c61a0b:
> 
>   perf script: Fix message because field list option is -F not -f (2017-06-21 11:35:53 -0300)
> 
> 
> perf/core improvements and fixes:
> 
> New features:
> 
> - Add support to measure SMI cost in 'perf stat' (Kan Liang)
> 
> - Add support for unwinding callchains in powerpc with libdw (Paolo Bonzini)
> 
> Fixes:
> 
> - Fix message: cpu list option is -C not -c (Adrian Hunter)
> 
> - Fix 'perf script' message: field list option is -F not -f (Adrian Hunter)
> 
> - Intel PT fixes: (Adrian Hunter)
> 
>   o Fix missing stack clear
>   o Ensure IP is zero when state is INTEL_PT_STATE_NO_IP
>   o Fix last_ip usage
>   o Ensure never to set 'last_ip' when packet 'count' is zero
>   o Clear FUP flag on error
>   o Fix transactions_sample_type
> 
> Infrastructure:
> 
> - Intel PT cleanups/refactorings (Adrian Hunter)
> 
>   o Use FUP always when scanning for an IP
>   o Add missing __fallthrough
>   o Remove redundant initial_skip checks
>   o Allow decoding with branch tracing disabled
>   o Add default config for pass-through branch enable
>   o Add documentation for new config terms
>   o Add decoder support for ptwrite and power event packets
>   o Add reserved byte to CBR packet payload
>   o Add decoder support for CBR events
> 
> - Move find_process() to the only place that uses it, skimming some
>   more fat from util.[ch] (Arnaldo Carvalho de Melo)
> 
> - Do parameter validation earlier on fetch_kernel_version()
>   (Arnaldo Carvalho de Melo)
> 
> - Remove unused _ALL_SOURCE define (Arnaldo Carvalho de Melo)
> 
> - Add sysfs__write_int function (Kan Liang)
> 
> Signed-off-by: Arnaldo Carvalho de Melo 
> 
> 
> Adrian Hunter (19):
>   perf intel-pt: Move decoder error setting into one condition
>   perf intel-pt: Improve sample timestamp
>   perf intel-pt: Fix missing stack clear
>   perf intel-pt: Ensure IP is zero when state is INTEL_PT_STATE_NO_IP
>   perf intel-pt: Fix last_ip usage
>   perf intel-pt: Ensure never to set 'last_ip' when packet 'count' is zero
>   perf intel-pt: Use FUP always when scanning for an IP
>   perf intel-pt: Clear FUP flag on error
>   perf intel-pt: Add missing __fallthrough
>   perf intel-pt: Allow decoding with branch tracing disabled
>   perf intel-pt: Add default config for pass-through branch enable
>   perf intel-pt: Add documentation for new config terms
>   perf intel-pt: Add decoder support for ptwrite and power event packets
>   perf intel-pt: Add reserved byte to CBR packet payload
>   perf intel-pt: Add decoder support for CBR events
>   perf intel-pt: Remove redundant initial_skip checks
>   perf intel-pt: Fix transactions_sample_type
>   perf tools: Fix message because cpu list option is -C not -c
>   perf script: Fix message because field list option is -F not -f
> 
> Arnaldo Carvalho de Melo (3):
>   perf evsel: Adopt find_process()
>   perf tools: Do parameter validation earlier on fetch_kernel_version()
>   perf tools: Remove unused _ALL_SOURCE define
> 
> Kan Liang (2):
>   tools lib api fs: Add sysfs__write_int function
>   perf stat: Add support to measure SMI cost
> 
> Paolo Bonzini (1):
>   perf unwind: Support for powerpc
> 
>  tools/lib/api/fs/fs.c  |  30 +++
>  tools/lib/api/fs/fs.h  |   4 +
>  tools/perf/Documentation/intel-pt.txt  |  36 +++
>  tools/perf/Documentation/perf-stat.txt |  14 +
>  tools/perf/Makefile.config |   2 +-
>  tools/perf/arch/powerpc/util/Build |   2 +
>  tools/perf/arch/powerpc/util/unwind-libdw.c|  73 ++
>  tools/perf/arch/x86/util/intel-pt.c|   5 +
>  tools/perf/builtin-script.c|   2 +-
>  tools/perf/builtin-stat.c  |  49 
>  tools/perf/util/evsel.c|  39 +++
>  .../perf/util/intel-pt-decoder/intel-pt-decoder.c  | 290 +++--
>  .../perf/util/intel-pt-decoder/intel-pt-decoder.h  |  13 +
>  .../util/intel-pt-decoder/intel-pt-pkt-decoder.c   | 110 +++-
>  

Re: [PATCH 5/6] mm, x86: Add ARCH_HAS_ZONE_DEVICE

2017-05-23 Thread Ingo Molnar

* Oliver O'Halloran <ooh...@gmail.com> wrote:

> Currently ZONE_DEVICE depends on X86_64. This is fine for now, but it
> will get unwieldy as new platforms get ZONE_DEVICE support. Move it
> to an arch-selected Kconfig option to save us some trouble in the
> future.
> 
> Cc: x...@kernel.org
> Signed-off-by: Oliver O'Halloran <ooh...@gmail.com>

Acked-by: Ingo Molnar <mi...@kernel.org>

Thanks,

Ingo


Re: [GIT PULL 0/6] perf/core improvements and fixes

2017-03-16 Thread Ingo Molnar

* Arnaldo Carvalho de Melo  wrote:

> Hi Ingo,
> 
>   Please consider pulling,
> 
> - Arnaldo
> 
> Test results at the end of this message, as usual.
> 
> The following changes since commit ffa86c2f1a8862cf58c873f6f14d4b2c3250fb48:
> 
>   Merge tag 'perf-core-for-mingo-4.12-20170314' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core (2017-03-15 19:27:27 +0100)
> 
> are available in the git repository at:
> 
>   git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git tags/perf-core-for-mingo-4.12-20170316
> 
> for you to fetch changes up to 61f35d750683b21e9e3836e309195c79c1daed74:
> 
>   uprobes: Default UPROBES_EVENTS to Y (2017-03-16 12:42:02 -0300)
> 
> 
> perf/core improvements and fixes:
> 
> New features:
> 
> - Add 'brstackinsn' field in 'perf script' to reuse the x86 instruction
>   decoder used in the Intel PT code to study hot paths to samples (Andi Kleen)
> 
> Kernel:
> 
> - Default UPROBES_EVENTS to Y (Alexei Starovoitov)
> 
> - Fix check for kretprobe offset within function entry (Naveen N. Rao)
> 
> Infrastructure:
> 
> - Introduce util func is_sdt_event() (Ravi Bangoria)
> 
> - Make perf_event__synthesize_mmap_events() scale on older kernels where
>   reading /proc/pid/maps is way slower than reading /proc/pid/task/pid/maps
>   (Stephane Eranian)
> 
> Signed-off-by: Arnaldo Carvalho de Melo 
> 
> 
> Andi Kleen (1):
>   perf script: Add 'brstackinsn' for branch stacks
> 
> Arnaldo Carvalho de Melo (2):
>   tools headers: Sync {tools/,}arch/x86/include/asm/cpufeatures.h
>   uprobes: Default UPROBES_EVENTS to Y
> 
> Naveen N. Rao (1):
>   trace/kprobes: Fix check for kretprobe offset within function entry
> 
> Ravi Bangoria (1):
>   perf probe: Introduce util func is_sdt_event()
> 
> Stephane Eranian (1):
>   perf tools: Make perf_event__synthesize_mmap_events() scale
> 
>  include/linux/kprobes.h|   1 +
>  kernel/kprobes.c   |  40 ++--
>  kernel/trace/Kconfig   |   2 +-
>  kernel/trace/trace_kprobe.c|   2 +-
>  tools/arch/x86/include/asm/cpufeatures.h   |   5 +-
>  tools/perf/Documentation/perf-script.txt   |  13 +-
>  tools/perf/builtin-script.c| 264 -
>  tools/perf/util/Build  |   1 +
>  tools/perf/util/dump-insn.c|  14 ++
>  tools/perf/util/dump-insn.h|  22 ++
>  tools/perf/util/event.c|   4 +-
>  .../util/intel-pt-decoder/intel-pt-insn-decoder.c  |  24 ++
>  tools/perf/util/parse-events.h |  20 ++
>  tools/perf/util/probe-event.c  |   9 +-
>  14 files changed, 381 insertions(+), 40 deletions(-)
>  create mode 100644 tools/perf/util/dump-insn.c
>  create mode 100644 tools/perf/util/dump-insn.h

Pulled, thanks a lot Arnaldo!

Ingo


Re: [GIT PULL 00/19] perf/core improvements and fixes

2017-03-15 Thread Ingo Molnar

* Arnaldo Carvalho de Melo  wrote:

> Hi Ingo,
> 
>   Please consider pulling,
> 
> - Arnaldo
> 
> Test results at the end of this message, as usual.
> 
> The following changes since commit 84e5b549214f2160c12318aac549de85f600c79a:
> 
>   Merge tag 'perf-core-for-mingo-4.11-20170306' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core (2017-03-07 08:14:14 +0100)
> 
> are available in the git repository at:
> 
>   git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git tags/perf-core-for-mingo-4.12-20170314
> 
> for you to fetch changes up to 5f6bee34707973ea7879a7857fd63ddccc92fff3:
> 
>   kprobes: Convert kprobe_exceptions_notify to use NOKPROBE_SYMBOL (2017-03-14 15:17:40 -0300)
> 
> 
> perf/core improvements and fixes:
> 
> New features:
> 
> - Add PERF_RECORD_NAMESPACES so that the kernel can record information
>   required to associate samples to namespaces, helping in container
>   problem characterization.
> 
>   Now 'perf record' has a --namespace option to ask for such info,
>   and when present, it can be used, initially, via a new sort order,
>   'cgroup_id', allowing histogram entry bucketization by a (device, inode)
>   based cgroup identifier (Hari Bathini)
> 
> - Add --next option to 'perf sched timehist', showing what is the next
>   thread to run (Brendan Gregg)
> 
> Fixes:
> 
> - Fix segfault with basic block 'cycles' sort dimension (Changbin Du)
> 
> - Add c2c to command-list.txt, making it appear in the 'perf help'
>   output (Changbin Du)
> 
> - Fix zeroing of 'abs_path' variable in the perf hists browser switch
>   file code (Changbin Du)
> 
> - Hide tips messages when -q/--quiet is given to 'perf report' (Namhyung Kim)
> 
> Infrastructure:
> 
> - Use ref_reloc_sym + offset to setup kretprobes (Naveen Rao)
> 
> - Ignore generated files pmu-events/{jevents,pmu-events.c} for git
>   (Changbin Du)
> 
> Documentation:
> 
> - Document +field style argument support for --field option (Changbin Du)
> 
> - Clarify 'perf c2c --stats' help message (Namhyung Kim)
> 
> Signed-off-by: Arnaldo Carvalho de Melo 
> 
> 
> Brendan Gregg (1):
>   perf sched timehist: Add --next option
> 
> Changbin Du (5):
>   perf tools: Missing c2c command in command-list
>   perf tools: Ignore generated files pmu-events/{jevents,pmu-events.c} for git
>   perf sort: Fix segfault with basic block 'cycles' sort dimension
>   perf report: Document +field style argument support for --field option
>   perf hists browser: Fix typo in function switch_data_file
> 
> Hari Bathini (5):
>   perf: Add PERF_RECORD_NAMESPACES to include namespaces related info
>   perf tools: Add PERF_RECORD_NAMESPACES to include namespaces related info
>   perf record: Synthesize namespace events for current processes
>   perf script: Add script print support for namespace events
>   perf tools: Add 'cgroup_id' sort order keyword
> 
> Namhyung Kim (3):
>   perf report: Hide tip message when -q option is given
>   perf c2c: Clarify help message of --stats option
>   perf c2c: Fix display bug when using pipe
> 
> Naveen N. Rao (5):
>   perf probe: Factor out the ftrace README scanning
>   perf kretprobes: Offset from reloc_sym if kernel supports it
>   perf powerpc: Choose local entry point with kretprobes
>   doc: trace/kprobes: add information about NOKPROBE_SYMBOL
>   kprobes: Convert kprobe_exceptions_notify to use NOKPROBE_SYMBOL
> 
>  Documentation/trace/kprobetrace.txt |   5 +-
>  include/linux/perf_event.h  |   2 +
>  include/uapi/linux/perf_event.h |  32 +-
>  kernel/events/core.c| 139 ++
>  kernel/fork.c   |   2 +
>  kernel/kprobes.c|   5 +-
>  kernel/nsproxy.c|   3 +
>  tools/include/uapi/linux/perf_event.h   |  32 +-
>  tools/perf/.gitignore   |   2 +
>  tools/perf/Documentation/perf-record.txt|   3 +
>  tools/perf/Documentation/perf-report.txt|   7 +-
>  tools/perf/Documentation/perf-sched.txt |   4 +
>  tools/perf/Documentation/perf-script.txt|   3 +
>  tools/perf/arch/powerpc/util/sym-handling.c |  14 ++-
>  tools/perf/builtin-annotate.c   |   1 +
>  tools/perf/builtin-c2c.c|   4 +-
>  tools/perf/builtin-diff.c   |   1 +
>  tools/perf/builtin-inject.c |  13 +++
>  tools/perf/builtin-kmem.c   |   1 +
>  tools/perf/builtin-kvm.c|   2 +
>  tools/perf/builtin-lock.c   |   1 +
>  tools/perf/builtin-mem.c|   1 +
>  tools/perf/builtin-record.c |  35 ++-
>  tools/perf/builtin-report.c   

Re: [PATCH v5 00/15] livepatch: hybrid consistency model

2017-03-07 Thread Ingo Molnar

* Josh Poimboeuf <jpoim...@redhat.com> wrote:

>  arch/Kconfig |   6 +
>  arch/powerpc/include/asm/thread_info.h   |   4 +-
>  arch/powerpc/kernel/signal.c |   4 +
>  arch/s390/include/asm/thread_info.h  |  24 +-
>  arch/s390/kernel/entry.S |  31 +-
>  arch/x86/Kconfig |   1 +
>  arch/x86/entry/common.c  |   9 +-
>  arch/x86/include/asm/thread_info.h   |  13 +-
>  arch/x86/include/asm/unwind.h|   6 +
>  arch/x86/kernel/stacktrace.c |  96 +++-
>  arch/x86/kernel/unwind_frame.c   |   2 +

for the x86 and scheduler changes:

Acked-by: Ingo Molnar <mi...@kernel.org>

Thanks,

Ingo


Re: [GIT PULL 00/35] perf/core improvements and fixes

2017-03-06 Thread Ingo Molnar

* Arnaldo Carvalho de Melo  wrote:

> From: Arnaldo Carvalho de Melo 
> 
> Hi Ingo,
> 
>   Please consider pulling,
> 
> - Arnaldo
> 
> Test results at the end of this message, as usual.
> 
> The following changes since commit 9d020d33fc1b2faa0eb35859df1381ca5dc94ffe:
> 
>   Merge branch 'linus' into perf/urgent, to resolve conflict (2017-03-02 08:05:45 +0100)
> 
> are available in the git repository at:
> 
>   git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git tags/perf-core-for-mingo-4.11-20170306
> 
> for you to fetch changes up to 001916b94a04809a94abb07daba6f9ace01906ba:
> 
>   perf bench numa: Add more comment for -c option (2017-03-06 12:39:30 -0300)
> 
> 
> perf/core improvements and fixes:
> 
> New features:
> 
> - Allow sorting by symbol_size in 'perf report' and 'perf top'
>   (Charles Baylis)
> 
>   E.g.:
> 
>   # perf report -s symbol_size,symbol
> 
>   Samples: 9K of event 'cycles:k', Event count (approx.): 2870461623
>   Overhead  Symbol size  Symbol
> 14.55%  326  [k] flush_tlb_mm_range
>  7.20% 1045  [k] filemap_map_pages
>  5.82%  124  [k] vma_interval_tree_insert
>  5.18% 2430  [k] unmap_page_range
>  2.57%  571  [k] vma_interval_tree_remove
>  1.94%  494  [k] page_add_file_rmap
>  1.82%  740  [k] page_remove_rmap
>  1.66% 1017  [k] release_pages
>  1.57% 1636  [k] update_blocked_averages
>  1.57%   76  [k] unlock_page
> 
> - Add support for -p/--pid, -a/--all-cpus and -C/--cpu in 'perf ftrace'
>   (Namhyung Kim)
> 
> Change in behaviour:
> 
> - Make system wide (-a) the default option if no target was specified and one
>   of the following conditions is met:
> 
>   - No workload specified (current behaviour)
> 
>   - A workload is specified but all requested events are system wide ones,
> like uncore ones. (Jiri Olsa)
> 
> Fixes:
> 
> - Add missing initialization to the instruction decoder used in the
>   intel PT/BTS code, which was causing lots of failures in 'perf test',
>   looking for a value when there was none (Adrian Hunter)
> 
> Infrastructure:
> 
> - Add arch code needed to adopt the kernel's refcount_t to aid in
>   catching bugs when using atomic_t as a reference counter, basically
>   cmpxchg related functions (Arnaldo Carvalho de Melo)
> 
> - Convert the code using atomic_t as reference counts to refcount_t
>   (Elena Reshetova)
> 
> - Add feature test for sched_getcpu() to more easily check for its
>   presence in the many libc implementations and across different
>   versions of such C libraries (Arnaldo Carvalho de Melo)
> 
> - Issue a HW watchdog disable hint in 'perf stat' for when some of the
>   requested events can't get counted because a PMU counter is taken by that
>   watchdog (Borislav Petkov).
> 
> - Add mapping for Intel's KnightsMill PMU events (Karol Wachowski)
> 
> Documentation:
> 
> - Clarify the term 'convergence' in:
> 
>perf bench numa numa-mem -h --show_convergence (Jiri Olsa)
> 
> Kernel code:
> 
> - Ensure probe location is at function entry in kretprobes (Naveen N. Rao)
> 
> - Allow return probes with offsets and absolute addresses (Naveen N. Rao)
> 
> Signed-off-by: Arnaldo Carvalho de Melo 
> 
> 
> Adrian Hunter (1):
>   perf intel-PT/BTS: Add missing initialization
> 
> Arnaldo Carvalho de Melo (12):
>   tools include: Adopt __compiletime_error
>   tools arch x86: Include asm/cmpxchg.h
>   tools arch x86: Introduce atomic_cmpxchg()
>   tools include: Introduce atomic_cmpxchg_{relaxed,release}()
>   tools include: Provide gcc based cmpxchg fallback for !x86
>   tools include: Add UINT_MAX def to kernel.h
>   tools include: Adopt kernel's refcount.h
>   perf evlist: Clarify a bit the use of perf_mmap->refcnt
>   tools build: Add test for sched_getcpu()
>   perf bench futex: Use __maybe_unused
>   perf bench futex: Fix build on musl + clang
>   tools build: Use the same CC for feature detection and actual build
> 
> Borislav Petkov (1):
>   perf stat: Issue a HW watchdog disable hint
> 
> Charles Baylis (1):
>   perf tools: Allow sorting by symbol size
> 
> Elena Reshetova (9):
>   perf cgroup: Convert cgroup_sel.refcnt from atomic_t to refcount_t
>   perf cpumap: Convert cpu_map.refcnt from atomic_t to refcount_t
>   perf comm: Convert comm_str.refcnt from atomic_t to refcount_t
>   perf dso: Convert dso.refcnt from atomic_t to refcount_t
>   perf map: Convert map.refcnt from atomic_t to refcount_t
>   perf map: Convert map_groups.refcnt from atomic_t to refcount_t
>   perf evlist: Convert perf_map.refcnt from atomic_t to refcount_t
>   perf thread: convert thread.refcnt from atomic_t to refcount_t
>   perf thread_map: Convert 

Re: [PATCH 1/3] kprobes: introduce weak variant of kprobe_exceptions_notify

2017-02-09 Thread Ingo Molnar

* Michael Ellerman <m...@ellerman.id.au> wrote:

> "Naveen N. Rao" <naveen.n@linux.vnet.ibm.com> writes:
> 
> > kprobe_exceptions_notify() is not used on some of the architectures such
> > as arm[64] and powerpc anymore. Introduce a weak variant for such
> > architectures.
> 
> I'll merge patch 1 & 3 via the powerpc tree for v4.11.

Acked-by: Ingo Molnar <mi...@kernel.org>

Thanks,

Ingo


Re: [PATCH tip/core/rcu 2/3] srcu: Force full grace-period ordering

2017-01-15 Thread Ingo Molnar

* Paul E. McKenney <paul...@linux.vnet.ibm.com> wrote:

> On Sun, Jan 15, 2017 at 10:40:58AM +0100, Ingo Molnar wrote:
> > 
> > * Paul E. McKenney <paul...@linux.vnet.ibm.com> wrote:
> > 
> > > > [sounds of rummaging around in the Git tree]
> > > > 
> > > > I found this commit of yours from ancient history (more than a year ago!):
> > > > 
> > > >   commit 12d560f4ea87030667438a169912380be00cea4b
> > > >   Author: Paul E. McKenney <paul...@linux.vnet.ibm.com>
> > > >   Date:   Tue Jul 14 18:35:23 2015 -0700
> > > > 
> > > > rcu,locking: Privatize smp_mb__after_unlock_lock()
> > > > 
> > > > RCU is the only thing that uses smp_mb__after_unlock_lock(), and is
> > > > likely the only thing that ever will use it, so this commit makes this
> > > > macro private to RCU.
> > > > 
> > > > Signed-off-by: Paul E. McKenney <paul...@linux.vnet.ibm.com>
> > > > Cc: Will Deacon <will.dea...@arm.com>
> > > > Cc: Peter Zijlstra <pet...@infradead.org>
> > > > Cc: Benjamin Herrenschmidt <b...@kernel.crashing.org>
> > > > Cc: "linux-a...@vger.kernel.org" <linux-a...@vger.kernel.org>
> > > > 
> > > > So I concur and I'm fine with your patch - or with the status quo code 
> > > > as well.
> > > 
> > > I already have the patch queued, so how about I keep it if I get an ack
> > > from the powerpc guys and drop it otherwise?
> > 
> > Yeah, sounds good! Your patch made me look up 'RelAcq' so it has documentation
> > value as well ;-)
> 
> ;-) ;-) ;-)
> 
> Looking forward, my guess would be that if some other code needs
> smp_mb__after_unlock_lock() or if some other architecture needs
> non-smp_mb() special handling, I should consider making it work the
> same as smp_mb__after_atomic() and friends.  Does that seem like a
> reasonable thought?

Yeah, absolutely - it's just that the pattern triggered the 'this looks a bit too
specialized' response in me, but after seeing the details (again ...) I agree that
this time is different!

Thanks,

Ingo


Re: [PATCH tip/core/rcu 2/3] srcu: Force full grace-period ordering

2017-01-15 Thread Ingo Molnar

* Paul E. McKenney  wrote:

> > [sounds of rummaging around in the Git tree]
> > 
> > I found this commit of yours from ancient history (more than a year ago!):
> > 
> >   commit 12d560f4ea87030667438a169912380be00cea4b
> >   Author: Paul E. McKenney 
> >   Date:   Tue Jul 14 18:35:23 2015 -0700
> > 
> > rcu,locking: Privatize smp_mb__after_unlock_lock()
> > 
> > RCU is the only thing that uses smp_mb__after_unlock_lock(), and is
> > likely the only thing that ever will use it, so this commit makes this
> > macro private to RCU.
> > 
> > Signed-off-by: Paul E. McKenney 
> > Cc: Will Deacon 
> > Cc: Peter Zijlstra 
> > Cc: Benjamin Herrenschmidt 
> > Cc: "linux-a...@vger.kernel.org" 
> > 
> > So I concur and I'm fine with your patch - or with the status quo code as well.
> 
> I already have the patch queued, so how about I keep it if I get an ack
> from the powerpc guys and drop it otherwise?

Yeah, sounds good! Your patch made me look up 'RelAcq' so it has documentation 
value as well ;-)

Thanks,

Ingo


Re: [PATCH tip/core/rcu 2/3] srcu: Force full grace-period ordering

2017-01-14 Thread Ingo Molnar

* Paul E. McKenney <paul...@linux.vnet.ibm.com> wrote:

> On Sun, Jan 15, 2017 at 08:11:23AM +0100, Ingo Molnar wrote:
> > 
> > * Paul E. McKenney <paul...@linux.vnet.ibm.com> wrote:
> > 
> > > diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
> > > index 357b32aaea48..5fdfe874229e 100644
> > > --- a/include/linux/rcupdate.h
> > > +++ b/include/linux/rcupdate.h
> > > @@ -1175,11 +1175,11 @@ do { \
> > >   * if the UNLOCK and LOCK are executed by the same CPU or if the
> > >   * UNLOCK and LOCK operate on the same lock variable.
> > >   */
> > > -#ifdef CONFIG_PPC
> > > +#ifdef CONFIG_ARCH_WEAK_RELACQ
> > >  #define smp_mb__after_unlock_lock()  smp_mb()  /* Full ordering for lock. */
> > > -#else /* #ifdef CONFIG_PPC */
> > > +#else /* #ifdef CONFIG_ARCH_WEAK_RELACQ */
> > >  #define smp_mb__after_unlock_lock()  do { } while (0)
> > > -#endif /* #else #ifdef CONFIG_PPC */
> > > +#endif /* #else #ifdef CONFIG_ARCH_WEAK_RELACQ */
> > >  
> > >  
> > 
> > So at the risk of sounding totally pedantic, why not structure it like the 
> > existing smp_mb__before/after*() primitives in barrier.h?
> > 
> > That allows asm-generic/barrier.h to pick up the definition - for example in the
> > case of smp_acquire__after_ctrl_dep() we do:
> > 
> >  #ifndef smp_acquire__after_ctrl_dep
> >  #define smp_acquire__after_ctrl_dep()   smp_rmb()
> >  #endif
> > 
> > Which allows Tile to relax it:
> > 
> >   arch/tile/include/asm/barrier.h:#define smp_acquire__after_ctrl_dep()   barrier()
> > 
> > I.e. I'd move the API definition out of rcupdate.h and into barrier.h - even
> > though tree-RCU is the only user of this barrier type.
> 
> I wouldn't have any problem with that, however, some time back it was
> moved into RCU because (you guessed it!) RCU is the only user.  ;-)

Indeed ...

[sounds of rummaging around in the Git tree]

I found this commit of yours from ancient history (more than a year ago!):

  commit 12d560f4ea87030667438a169912380be00cea4b
  Author: Paul E. McKenney <paul...@linux.vnet.ibm.com>
  Date:   Tue Jul 14 18:35:23 2015 -0700

rcu,locking: Privatize smp_mb__after_unlock_lock()

RCU is the only thing that uses smp_mb__after_unlock_lock(), and is
likely the only thing that ever will use it, so this commit makes this
macro private to RCU.

Signed-off-by: Paul E. McKenney <paul...@linux.vnet.ibm.com>
Cc: Will Deacon <will.dea...@arm.com>
Cc: Peter Zijlstra <pet...@infradead.org>
Cc: Benjamin Herrenschmidt <b...@kernel.crashing.org>
Cc: "linux-a...@vger.kernel.org" <linux-a...@vger.kernel.org>

So I concur and I'm fine with your patch - or with the status quo code as well.

Thanks,

Ingo


Re: [PATCH tip/core/rcu 2/3] srcu: Force full grace-period ordering

2017-01-14 Thread Ingo Molnar

* Paul E. McKenney  wrote:

> diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
> index 357b32aaea48..5fdfe874229e 100644
> --- a/include/linux/rcupdate.h
> +++ b/include/linux/rcupdate.h
> @@ -1175,11 +1175,11 @@ do { \
>   * if the UNLOCK and LOCK are executed by the same CPU or if the
>   * UNLOCK and LOCK operate on the same lock variable.
>   */
> -#ifdef CONFIG_PPC
> +#ifdef CONFIG_ARCH_WEAK_RELACQ
>  #define smp_mb__after_unlock_lock()  smp_mb()  /* Full ordering for lock. */
> -#else /* #ifdef CONFIG_PPC */
> +#else /* #ifdef CONFIG_ARCH_WEAK_RELACQ */
>  #define smp_mb__after_unlock_lock()  do { } while (0)
> -#endif /* #else #ifdef CONFIG_PPC */
> +#endif /* #else #ifdef CONFIG_ARCH_WEAK_RELACQ */
>  
>  

So at the risk of sounding totally pedantic, why not structure it like the 
existing smp_mb__before/after*() primitives in barrier.h?

That allows asm-generic/barrier.h to pick up the definition - for example in the
case of smp_acquire__after_ctrl_dep() we do:

 #ifndef smp_acquire__after_ctrl_dep
 #define smp_acquire__after_ctrl_dep()   smp_rmb()
 #endif

Which allows Tile to relax it:

  arch/tile/include/asm/barrier.h:#define smp_acquire__after_ctrl_dep()   barrier()

I.e. I'd move the API definition out of rcupdate.h and into barrier.h - even 
though tree-RCU is the only user of this barrier type.
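For readers unfamiliar with the convention referred to here, the following is a minimal user-space sketch of the `#ifndef` override pattern used by the smp_mb__before/after*() primitives. The architecture header may define the barrier first, and the generic header supplies a default only if nothing was defined. The compiler-builtin stand-in for smp_mb(), the ARCH_WEAK_RELACQ switch and the after_unlock_lock_is_full() helper are assumptions for illustration, not kernel code.

```c
/* User-space stand-in for the kernel primitive (assumption: GCC/Clang builtin). */
#define smp_mb()   __atomic_thread_fence(__ATOMIC_SEQ_CST)

/* --- "arch/<arch>/include/asm/barrier.h" ---
 * An architecture whose UNLOCK+LOCK pair is not a full barrier
 * (hypothetical ARCH_WEAK_RELACQ switch) supplies the strong version. */
#ifdef ARCH_WEAK_RELACQ
#define smp_mb__after_unlock_lock()  smp_mb()
#endif

/* --- "include/asm-generic/barrier.h" ---
 * Generic fallback: only used when the arch did not define its own. */
#ifndef smp_mb__after_unlock_lock
#define smp_mb__after_unlock_lock()  do { } while (0)
#define AFTER_UNLOCK_LOCK_IS_FULL 0
#else
#define AFTER_UNLOCK_LOCK_IS_FULL 1
#endif

/* Hypothetical helper so the selected variant can be inspected. */
static int after_unlock_lock_is_full(void)
{
	smp_mb__after_unlock_lock();	/* expands to whichever variant was picked */
	return AFTER_UNLOCK_LOCK_IS_FULL;
}
```

Building the same file with -DARCH_WEAK_RELACQ turns the macro into a full fence and makes the helper return 1, which is the per-architecture relaxation/strengthening that the barrier.h layout allows.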

Thanks,

Ingo


Re: [GIT PULL 00/29] perf/core improvements and fixes

2016-12-20 Thread Ingo Molnar

* Arnaldo Carvalho de Melo  wrote:

> Hi Ingo,
> 
> Please consider pulling, I had most of this queued before your first
> pull req to Linus for 4.10, most are fixes, with 'perf sched timehist --idle'
> as a followup new feature to the 'perf sched timehist' command introduced in
> this window.
>   
>   One other thing that delayed this was the samples/bpf/ switch to
> tools/lib/bpf/, which involved fixing up merge clashes with net.git and also
> testing it properly; it took more rounds than anticipated, but all seems ok
> now and it would be good to get these merge issues past us ASAP.
> 
> - Arnaldo
> 
> Test results at the end of this message, as usual.
> 
> The following changes since commit e7aa8c2eb11ba69b1b69099c3c7bd6be3087b0ba:
> 
>   Merge tag 'docs-4.10' of git://git.lwn.net/linux (2016-12-12 21:58:13 -0800)
> 
> are available in the git repository at:
> 
>   git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git tags/perf-core-for-mingo-20161220
> 
> for you to fetch changes up to 9899694a7f67714216665b87318eb367e2c5c901:
> 
>   samples/bpf: Move open_raw_sock to separate header (2016-12-20 12:00:40 -0300)
> 
> 
> perf/core improvements and fixes:
> 
> New features:
> 
> - Introduce 'perf sched timehist --idle', to analyse processes
>   going to/from idle state (Namhyung Kim)
> 
> Fixes:
> 
> - Allow 'perf record -u user' to continue when facing races with threads
>   going away after having scanned them via /proc (Jiri Olsa)
> 
> - Fix 'perf mem' --all-user/--all-kernel options (Jiri Olsa)
> 
> - Support jumps with multiple arguments (Ravi Bangoria)
> 
> - Fix jumps to before the function where they are located (Ravi Bangoria)
> 
> - Fix lock-pi help string (Davidlohr Bueso)
> 
> - Fix build of 'perf trace' in odd systems such as a RHEL PPC one (Jiri Olsa)
> 
> - Do not overwrite valid build id in 'perf diff' (Kan Liang)
> 
> - Don't throw error for zero length symbols, allowing the use of the TUI
>   in PowerPC, where such symbols became more common recently (Ravi Bangoria)
> 
> Infrastructure:
> 
> - Switch of samples/bpf/ to use tools/lib/bpf, removing libbpf
>   duplication (Joe Stringer)
> 
> - Move headers check into bash script (Jiri Olsa)
> 
> Signed-off-by: Arnaldo Carvalho de Melo 
> 
> 
> Arnaldo Carvalho de Melo (3):
>   perf tools: Remove some needless __maybe_unused
>   samples/bpf: Make perf_event_read() static
>   samples/bpf: Be consistent with bpf_load_program bpf_insn parameter
> 
> Davidlohr Bueso (1):
>   perf bench futex: Fix lock-pi help string
> 
> Jiri Olsa (7):
>   perf tools: Move headers check into bash script
>   perf mem: Fix --all-user/--all-kernel options
>   perf evsel: Use variable instead of repeating lengthy FD macro
>   perf thread_map: Add thread_map__remove function
>   perf evsel: Allow to ignore missing pid
>   perf record: Force ignore_missing_thread for uid option
>   perf trace: Check if MAP_32BIT is defined (again)
> 
> Joe Stringer (8):
>   tools lib bpf: Sync {tools,}/include/uapi/linux/bpf.h
>   tools lib bpf: use __u32 from linux/types.h
>   tools lib bpf: Add flags to bpf_create_map()
>   samples/bpf: Make samples more libbpf-centric
>   samples/bpf: Switch over to libbpf
>   tools lib bpf: Add bpf_prog_{attach,detach}
>   samples/bpf: Remove perf_event_open() declaration
>   samples/bpf: Move open_raw_sock to separate header
> 
> Kan Liang (1):
>   perf diff: Do not overwrite valid build id
> 
> Namhyung Kim (6):
>   perf sched timehist: Split is_idle_sample()
>   perf sched timehist: Introduce struct idle_time_data
>   perf sched timehist: Save callchain when entering idle
>   perf sched timehist: Skip non-idle events when necessary
>   perf sched timehist: Add -I/--idle-hist option
>   perf sched timehist: Show callchains for idle stat
> 
> Ravi Bangoria (3):
>   perf annotate: Support jump instruction with target as second operand
>   perf annotate: Fix jump target outside of function address range
>   perf annotate: Don't throw error for zero length symbols
> 
>  samples/bpf/Makefile  |  70 +--
>  samples/bpf/README.rst|   4 +-
>  samples/bpf/bpf_load.c|  21 +-
>  samples/bpf/bpf_load.h|   3 +
>  samples/bpf/fds_example.c |  13 +-
>  samples/bpf/lathist_user.c|   2 +-
>  samples/bpf/libbpf.c  | 176 ---
>  samples/bpf/libbpf.h  |  28 +-
>  samples/bpf/lwt_len_hist_user.c   |   6 +-
>  samples/bpf/offwaketime_user.c|   8 +-
>  samples/bpf/sampleip_user.c   |   7 +-
>  

Re: [GIT PULL 00/20] perf/core improvements and fixes

2016-12-06 Thread Ingo Molnar

* Arnaldo Carvalho de Melo  wrote:

> Hi Ingo,
> 
>   Please consider pulling,
> 
> - Arnaldo
> 
> Test results at the end of this message, as usual.
> 
> The following changes since commit e7af7b15121ca08c31a0ab9df71a41b4c53365b4:
> 
>   Merge tag 'perf-core-for-mingo-20161201' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core (2016-12-02 10:08:03 +0100)
> 
> are available in the git repository at:
> 
>   git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git tags/perf-core-for-mingo-20161205
> 
> for you to fetch changes up to bec60e50af83741cde1786ab475d4bf472aed6f9:
> 
>   perf annotate: Show raw form for jump instruction with indirect target (2016-12-05 17:21:57 -0300)
> 
> 
> perf/core improvements and fixes:
> 
> Fixes:
> 
> - Do not show a bogus target address in 'perf annotate' for targetless powerpc
>   jump instructions such as 'bctr' (Ravi Bangoria)
> 
> - tools/build fixes related to race conditions with the fixdep utility (Jiri Olsa)
> 
> - Fix building objtool with clang (Peter Foley)
> 
> Infrastructure:
> 
> - Support linking perf with clang and LLVM libraries, initially statically, but
>   this limitation will be lifted and shared libraries, when available, will
>   be preferred to the static build, that should, as with other features, be
>   enabled explicitly (Wang Nan)
> 
> Signed-off-by: Arnaldo Carvalho de Melo 
> 
> 
> Jiri Olsa (7):
>   tools build: Make fixdep parsing wait for last target
>   tools build: Make the .cmd file more readable
>   tools build: Move tabs to spaces where suitable
>   perf tools: Move install-gtk target into rules area
>   perf tools: Move python/perf.so target into rules area
>   perf tools: Cleanup build directory before each test
>   perf tools: Add non config targets
> 
> Peter Foley (1):
>   tools build: Fix objtool build with clang
> 
> Ravi Bangoria (1):
>   perf annotate: Show raw form for jump instruction with indirect target
> 
> Wang Nan (11):
>   perf tools: Pass context to perf hook functions
>   perf llvm: Extract helpers in llvm-utils.c
>   tools build: Add feature detection for LLVM
>   tools build: Add feature detection for clang
>   perf build: Add clang and llvm compile and linking support
>   perf clang: Add builtin clang support ant test case
>   perf clang: Use real file system for #include
>   perf clang: Allow passing CFLAGS to builtin clang
>   perf clang: Update test case to use real BPF script
>   perf clang: Support compile IR to BPF object and add testcase
>   perf clang: Compile BPF script using builtin clang support
> 
>  tools/build/Build.include  |  20 ++--
>  tools/build/Makefile.feature   | 138 +-
>  tools/build/feature/Makefile   | 120 +--
>  tools/build/feature/test-clang.cpp |  21 
>  tools/build/feature/test-llvm.cpp  |   8 ++
>  tools/build/fixdep.c   |   5 +-
>  tools/perf/Makefile.config |  62 +---
>  tools/perf/Makefile.perf   |  56 +++
>  tools/perf/tests/Build |   1 +
>  tools/perf/tests/builtin-test.c|   9 ++
>  tools/perf/tests/clang.c   |  46 +
>  tools/perf/tests/llvm.h|   7 ++
>  tools/perf/tests/make  |   4 +-
>  tools/perf/tests/perf-hooks.c  |  14 ++-
>  tools/perf/tests/tests.h   |   3 +
>  tools/perf/util/Build  |   2 +
>  tools/perf/util/annotate.c |   3 +
>  tools/perf/util/bpf-loader.c   |  19 +++-
>  tools/perf/util/c++/Build  |   2 +
>  tools/perf/util/c++/clang-c.h  |  43 
>  tools/perf/util/c++/clang-test.cpp |  62 
>  tools/perf/util/c++/clang.cpp  | 195 +
>  tools/perf/util/c++/clang.h|  26 +
>  tools/perf/util/llvm-utils.c   |  76 +++
>  tools/perf/util/llvm-utils.h   |   6 ++
>  tools/perf/util/perf-hooks.c   |  10 +-
>  tools/perf/util/perf-hooks.h   |   6 +-
>  tools/perf/util/util-cxx.h |  26 +
>  28 files changed, 795 insertions(+), 195 deletions(-)
>  create mode 100644 tools/build/feature/test-clang.cpp
>  create mode 100644 tools/build/feature/test-llvm.cpp
>  create mode 100644 tools/perf/tests/clang.c
>  create mode 100644 tools/perf/util/c++/Build
>  create mode 100644 tools/perf/util/c++/clang-c.h
>  create mode 100644 tools/perf/util/c++/clang-test.cpp
>  create mode 100644 tools/perf/util/c++/clang.cpp
>  create mode 100644 tools/perf/util/c++/clang.h
>  create mode 100644 tools/perf/util/util-cxx.h
> 
>   # uname -a
>   Linux jouet 4.8.8-300.fc25.x86_64 #1 SMP Tue Nov 15 18:10:06 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
>   # perf test
>1: vmlinux symtab 

Re: [GIT PULL 00/22] perf/core improvements and fixes

2016-10-04 Thread Ingo Molnar

* Arnaldo Carvalho de Melo  wrote:

> Hi Ingo,
> 
>   Please consider pulling,
> 
> - Arnaldo
> 
> Build and test stats at the end of the message.
> 
> The following changes since commit 41aad2a6d4fcdda8d73c9739daf7a9f3f49499d6:
> 
>   Merge tag 'perf-core-for-mingo-20160929' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core (2016-09-29 19:09:58 +0200)
> 
> are available in the git repository at:
> 
>   git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git tags/perf-core-for-mingo-20161003
> 
> for you to fetch changes up to b42c7369e3f451e22c2b0be5d193955498d37546:
> 
>   perf pmu-events: Add Skylake frontend MSR support (2016-10-03 21:52:01 -0300)
> 
> 
> perf/core improvements and fixes:
> 
> - Allow vendors to provide JSON files describing PMU events, that then
>   get parsed to generate C tables that are linked against perf, allowing
>   the use of the names in their documentations, such as:
> 
>   # perf list l1d
> 
>   List of pre-defined events (to be used in -e):
> 
>   Cache:
> l1d.replacement
>  [L1D data line replacements]
> l1d_pend_miss.fb_full
>  [Cycles a demand request was blocked due to Fill Buffers inavailability]
> l1d_pend_miss.pending
>  [L1D miss oustandings duration in cycles]
> l1d_pend_miss.pending_cycles
>  [Cycles with L1D load Misses outstanding]
> l1d_pend_miss.pending_cycles_any
>  [Cycles with L1D load Misses outstanding from any thread on physical core]
> l2_trans.l1d_wb
>  [L1D writebacks that access L2 cache]
> 
>   Pipeline:
> cycle_activity.cycles_l1d_miss
>  [Cycles while L1 cache miss demand load is outstanding]
> cycle_activity.cycles_l1d_pending
>  [Cycles while L1 cache miss demand load is outstanding]
> cycle_activity.stalls_l1d_miss
>  [Execution stalls while L1 cache miss demand load is outstanding]
> cycle_activity.stalls_l1d_pending
>  [Execution stalls while L1 cache miss demand load is outstanding]
> 
>   The above example was done on a Broadwell based ThinkPad t450s after
>   downloading and installing such JSON files which will be added to the
>   tools/perf/pmu-events/ directory in a subsequent patchkit.
> 
>   Now one can use those names with -e/--event in all 'perf tools'.
>   (Andi Kleen, Sukadev Bhattiprolu)
> 
> - Add a missing pointer dereference in 'perf probe' (Colin Ian King)
> 
> - Add support for building host programs to be used in generating files
>   to be used in the build process, such as fixdep and jevents, fixing
>   the usage of these features in a cross compilation setup (Jiri Olsa)
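The first changelog item above describes vendor JSON event files being converted into C tables that perf links against, with alias matching made case-insensitive in one of the listed commits. A minimal sketch of what such a table and lookup could look like follows. The struct pmu_event_entry layout, the table contents (names and descriptions copied from the perf list output above) and find_pmu_event() are hypothetical and are not the actual output of jevents.

```c
#include <stddef.h>
#include <strings.h>   /* strcasecmp() */

/* Hypothetical shape of a table generated from a vendor JSON file. */
struct pmu_event_entry {
	const char *name;	/* event alias as it appears in the JSON */
	const char *desc;	/* brief description shown by 'perf list' */
};

static const struct pmu_event_entry pmu_events_table[] = {
	{ "l1d.replacement",              "L1D data line replacements" },
	{ "l1d_pend_miss.pending_cycles", "Cycles with L1D load Misses outstanding" },
	{ "l2_trans.l1d_wb",              "L1D writebacks that access L2 cache" },
	{ NULL, NULL },		/* sentinel terminates the table */
};

/* Case-insensitive alias lookup; returns NULL when the alias is unknown. */
static const struct pmu_event_entry *find_pmu_event(const char *alias)
{
	const struct pmu_event_entry *e;

	for (e = pmu_events_table; e->name; e++)
		if (strcasecmp(e->name, alias) == 0)
			return e;
	return NULL;
}
```

A sentinel-terminated static array keeps the generated code trivial to emit from a script and lets the lookup run without knowing the table size.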
> 
> Signed-off-by: Arnaldo Carvalho de Melo 
> 
> 
> Andi Kleen (12):
>   perf tools: Add jsmn `jasmine' JSON parser
>   perf jevents: Program to convert JSON file
>   perf tools: Support CPU id matching for x86 v2
>   perf jevents: Handle header line in mapfile
>   perf pmu: Support alias descriptions
>   perf tools: Query terminal width and use in perf list
>   perf list: Add a --no-desc flag
>   perf pmu: Add override support for event list CPUID
>   perf list jevents: Add support for event list topics
>   perf tools: Make alias matching case-insensitive
>   perf pmu-events: Fix fixed counters on Intel
>   perf pmu-events: Add Skylake frontend MSR support
> 
> Arnaldo Carvalho de Melo (1):
>   perf tools: Experiment with cppcheck
> 
> Colin Ian King (1):
>   perf probe: Check if *ptr2 is zero and not ptr2
> 
> Jiri Olsa (2):
>   tools build: Add support for host programs format
>   tools build: Make fixdep a hostprog
> 
> Sukadev Bhattiprolu (6):
>   perf pmu: Use pmu_events table to create aliases
>   perf powerpc: Support CPU ID matching for Powerpc
>   perf jevents: Add support for long descriptions
>   perf list: Support long jevents descriptions
>   perf tools: Add README for info on parsing JSON/map files
>   perf tools: Allow period= in perf stat CPU event descriptions.
> 
>  tools/build/Build  |   2 +
>  tools/build/Build.include  |   5 +
>  tools/build/Makefile   |   8 +-
>  tools/build/Makefile.build |  19 +-
>  tools/build/Makefile.include   |   4 -
>  tools/lib/subcmd/pager.c   |  16 +
>  tools/lib/subcmd/pager.h   |   1 +
>  tools/perf/Documentation/perf-list.txt |  12 +-
>  tools/perf/Makefile.perf   |  34 +-
>  tools/perf/arch/powerpc/util/header.c  |  11 +
>  tools/perf/arch/x86/util/header.c  |  24 +-
>  tools/perf/builtin-list.c  |  20 +-
>  tools/perf/pmu-events/Build|  13 +
>  tools/perf/pmu-events/README   | 147 ++
>  tools/perf/pmu-events/jevents.c| 812 
> 

Re: [GIT PULL 00/27] perf/core improvements and fixes

2016-09-29 Thread Ingo Molnar

* Arnaldo Carvalho de Melo  wrote:

> Hi Ingo,
> 
>   Please consider pulling, more to come soon,
> 
> - Arnaldo
> 
> Build and test results at the end of this message.
> 
> The following changes since commit 6b652de2b27c0a4020ce0e8f277e782b6af76096:
> 
>   Merge tag 'perf-core-for-mingo-20160922' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core (2016-09-23 07:21:38 +0200)
> 
> are available in the git repository at:
> 
>   git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git tags/perf-core-for-mingo-20160929
> 
> for you to fetch changes up to d18019a53a07e009899ff6b8dc5ec30f249360d9:
> 
>   perf tests: Add dwarf unwind test for powerpc (2016-09-29 11:18:21 -0300)
> 
> 
> perf/core improvements and fixes:
> 
> User visible:
> -
> 
> New features:
> 
> - Add support for using symbols in address filters with Intel PT and ARM
>   CoreSight (hardware assisted tracing facilities) (Adrian Hunter, Mathieu Poirier)
> 
> Fixes:
> 
> - Fix MMAP event synthesis for pre-existing threads when no hugetlbfs
>   mount is in place (Adrian Hunter)
> 
> - Don't ignore kernel idle symbols in 'perf script' (Adrian Hunter)
> 
> - Assorted Intel PT fixes (Adrian Hunter)
> 
> Improvements:
> 
> - Fix handling of C++ symbols in 'perf probe' (Masami Hiramatsu)
> 
> - Beautify sched_[gs]et_attr return value in 'perf trace' (Arnaldo Carvalho de Melo)
> 
> Infrastructure:
> ---
> 
> New features:
> 
> - Add dwarf unwind 'perf test' for powerpc (Ravi Bangoria)
> 
> Fixes:
> 
> - Fix error paths in 'perf record' (Adrian Hunter)
> 
> Documentation:
> 
> - Update documentation info about quipper, a C++ parser for converting
>   to/from perf.data/chromium profiling format (Simon Que)
> 
> Build Fixes:
> 
> - Fix building in 32 bit platform with libbabeltrace (Wang Nan)
> 
> Signed-off-by: Arnaldo Carvalho de Melo 
> 
> 
> Adrian Hunter (16):
>   perf record: Fix documentation 'event_sources' -> 'event_source'
>   perf tools: Fix MMAP event synthesis broken by MAP_HUGETLB change
>   perf script: Fix vanished idle symbols
>   perf record: Rename label 'out_symbol_exit'
>   perf record: Fix error paths
>   perf symbols: Add dso__last_symbol()
>   perf record: Add support for using symbols in address filters
>   perf probe: Increase debug level of SDT debug messages
>   perf intel-pt: Fix snapshot overlap detection decoder errors
>   perf intel-pt: Add support for recording the max non-turbo ratio
>   perf intel-pt: Fix missing error codes processing auxtrace_info
>   perf intel-pt: Add a helper function for processing AUXTRACE_INFO
>   perf intel-pt: Record address filter in AUXTRACE_INFO event
>   perf intel-pt: Read address filter from AUXTRACE_INFO event
>   perf intel-pt: Enable decoder to handle TIP.PGD with missing IP
>   perf intel-pt: Fix decoding when there are address filters
> 
> Arnaldo Carvalho de Melo (1):
>   perf trace: Beautify sched_[gs]et_attr return value
> 
> Masami Hiramatsu (4):
>   perf probe: Ignore the error of finding inline instance
>   perf probe: Skip if the function address is 0
>   perf probe: Fix to cut off incompatible chars from group name
>   perf probe: Match linkage name with mangled name
> 
> Mathieu Poirier (3):
>   perf tools: Make perf_evsel__append_filter() generic
>   perf evsel: New tracepoint specific function
>   perf evsel: Add support for address filters
> 
> Ravi Bangoria (1):
>   perf tests: Add dwarf unwind test for powerpc
> 
> Simon Que (1):
>   perf tools: Update documentation info about quipper
> 
> Wang Nan (1):
>   perf data: Fix building in 32 bit platform with libbabeltrace
> 
>  tools/perf/Documentation/perf-record.txt   |  61 +-
>  tools/perf/Documentation/perf.data-file-format.txt |   6 +-
>  tools/perf/arch/powerpc/Build  |   1 +
>  tools/perf/arch/powerpc/include/arch-tests.h   |  13 +
>  tools/perf/arch/powerpc/include/perf_regs.h|   2 +
>  tools/perf/arch/powerpc/tests/Build|   4 +
>  tools/perf/arch/powerpc/tests/arch-tests.c |  15 +
>  tools/perf/arch/powerpc/tests/dwarf-unwind.c   |  62 ++
>  tools/perf/arch/powerpc/tests/regs_load.S  |  94 +++
>  tools/perf/arch/x86/util/intel-pt.c|  57 +-
>  tools/perf/builtin-record.c|  32 +-
>  tools/perf/builtin-trace.c |  10 +-
>  tools/perf/tests/Build |   2 +-
>  tools/perf/tests/dwarf-unwind.c|   2 +-
>  tools/perf/util/auxtrace.c | 737 +
>  tools/perf/util/auxtrace.h |  54 ++
>  tools/perf/util/build-id.c |   

Re: [PATCH v20 00/20] perf, tools: Add support for PMU events in JSON format

2016-09-13 Thread Ingo Molnar

* Michael Ellerman <m...@ellerman.id.au> wrote:

> Jiri Olsa <jo...@redhat.com> writes:
> 
> > On Wed, Aug 31, 2016 at 09:15:30AM -0700, Andi Kleen wrote:
> >> > > 
> >> > > > 
> >> > > > I've already made some changes in pmu-events/* to support
> >> > > > this hierarchy to see how bad the change would be.. and
> >> > > > it's not that bad ;-)
> >> > > 
> >> > > Everything has to be automated, please no manual changes.
> >> > 
> >> > sure
> >> > 
> >> > so, if you're ok with the layout, how do you want to proceed further?
> >> 
> >> If the split version is acceptable it's fine for me to merge it.
> >> 
> >> I'll add split-json to my scripting, so the next update would
> >> be split too.
> >
> > ook, I'll wait for patches then
> 
> Who are you waiting for patches from?
> 
> Would be great if this could go in for 4.9 still.

No objections from me - the latest bits were good:

  Acked-by: Ingo Molnar <mi...@kernel.org>

Thanks,

Ingo


Re: [patch V2 30/67] powerpc/numa: Convert to hotplug state machine

2016-07-15 Thread Ingo Molnar

* Anton Blanchard  wrote:

> Hi Anna-Maria,
> 
> > >> Install the callbacks via the state machine and let the core invoke
> > >> the callbacks on the already online CPUs.  
> > >
> > > This is causing an oops on ppc64le QEMU, looks like a NULL
> > > pointer:  
> > 
> > Did you test it against tip WIP.hotplug?
> 
> I noticed tip started failing in my CI environment which tests on QEMU.
> The failure bisected to commit 425209e0abaf2c6e3a90ce4fedb935c10652bf80

That's very useful, thanks Anton!

I have removed this commit from the series for the time being and refactored the
followup commits (there was one trivial conflict). We can re-try this patch when a
fix is found.

Thanks,

Ingo
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH 0/9] mm: Hardened usercopy

2016-07-10 Thread Ingo Molnar

* PaX Team  wrote:

> On 9 Jul 2016 at 14:27, Andy Lutomirski wrote:
> 
> > I like the series, but I have one minor nit to pick.  The effect of this 
> > series is to harden usercopy, but most of the code is really about 
> > infrastructure to validate that a pointed-to object is valid.
> 
> actually USERCOPY has never been about validating pointers. its sole purpose is
> to validate the *size* argument of copy*user calls, a very specific form of
> runtime bounds checking.

What this code has been about originally is largely immaterial, unless you can 
formulate it into a technical argument.

There are a number of cheap tests we can do, and there are a number of ways a
'pointer' can be validated at runtime without any 'size' information:

 - for example, if a pointer points into a red zone straight away, then we know
   it's bogus.

 - or if a kernel pointer points outside the valid kernel virtual memory range,
   we know it's bogus as well.

So while only doing a bounds check might have been the original purpose of the
patch set, Andy's point is that it might make sense to treat this facility as a
more generic 'object validation' code of a (pointer,size) object and not limit it
to 'runtime bounds checking'. That kind of extended purpose behind a facility
should be reflected in the naming.

Confusing names are often the source of misunderstandings and bugs.

The 9-patch series as submitted here is neither just 'bounds checking' nor just
pure 'pointer checking': it's about validating that a (pointer,size) range of
memory passed to a (user) memory copy function is fully within a valid object the
kernel might know about (in a fast-to-check fashion).

This necessarily means:

 - the start of the range points to a valid object to begin with (if known)

 - the range itself does not point beyond the end of the object (if known)

 - even if the kernel does not know anything about the pointed-to object, it can
   do a pointer check (for example, is it pointing inside kernel virtual memory)
   and do a bounds check on the size.

Do you disagree with that?
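The three conditions above can be sketched as a single check on a (pointer, size) range. This is a simplified illustration, not the hardened-usercopy implementation: struct obj_bounds, struct va_range (standing in for the valid kernel virtual memory range) and range_is_valid() are all hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of an object the allocator knows about. */
struct obj_bounds {
	uintptr_t start;
	size_t size;
};

/* Hypothetical bounds standing in for valid kernel virtual memory. */
struct va_range {
	uintptr_t start;
	uintptr_t end;		/* exclusive */
};

/*
 * Return true if [ptr, ptr + n) is acceptable:
 *  - it must not wrap around the address space,
 *  - it must lie inside the valid virtual range,
 *  - and, if the enclosing object is known (obj != NULL),
 *    it must start inside that object and not run past its end.
 */
static bool range_is_valid(uintptr_t ptr, size_t n,
			   const struct va_range *va,
			   const struct obj_bounds *obj)
{
	if (ptr + n < ptr)			/* address wraparound */
		return false;
	if (ptr < va->start || ptr + n > va->end)
		return false;			/* pointer check only */
	if (obj) {
		if (ptr < obj->start || ptr >= obj->start + obj->size)
			return false;		/* start not inside the object */
		if (n > obj->start + obj->size - ptr)
			return false;		/* runs past the object's end */
	}
	return true;
}
```

When the object is unknown (obj == NULL), only the wraparound and virtual-range checks apply, which corresponds to the third case in the list.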

> > Might it make sense to call the infrastructure part something else?
> 
> yes, more bikeshedding will surely help, [...]

Insulting and ridiculing a reviewer who explicitly qualified his comments with
"one minor nit to pick" sure does not help upstream integration either. (Unless
the goal is to prevent upstream integration.)

> [...] like the renaming of .data..read_only to .data..ro_after_init which also
> had nothing to do with init but everything to do with objects being conceptually
> read-only...

.data..ro_after_init objects get written to during bootup, so it's conceptually
quite confusing to name it "read-only" without any clear qualifiers.

That it's named consistently with its role of "read-write before init and read 
only after init" on the other hand is not confusing at all. Not sure what your 
problem is with the new name.
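As a user-space analogy for the "read-write before init, read-only after init" semantics, the sketch below fills a page during an "init" phase and then drops write permission with mprotect(). The kernel does something comparable for the whole .data..ro_after_init section once boot is done. init_then_seal() and the single-page layout are illustrative assumptions, not kernel code.

```c
#define _DEFAULT_SOURCE		/* MAP_ANONYMOUS on glibc */
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Map one anonymous page, write to it during "init", then make it
 * read-only. Returns the page, or NULL on failure. Any later store
 * through the returned pointer would fault (SIGSEGV).
 */
static int *init_then_seal(int value)
{
	long pagesz = sysconf(_SC_PAGESIZE);
	int *page = mmap(NULL, (size_t)pagesz, PROT_READ | PROT_WRITE,
			 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	if (page == MAP_FAILED)
		return NULL;

	page[0] = value;		/* writable before "init" ends */

	if (mprotect(page, (size_t)pagesz, PROT_READ) != 0) {
		munmap(page, (size_t)pagesz);
		return NULL;
	}
	return page;			/* read-only from here on */
}
```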

Names within submitted patches get renamed on a routine basis during review. It's
often only minor improvements in naming (which you can consider bike shedding),
but in this particular case the rename was clearly useful in not just improving
the name but in avoiding an actively confusing name. So I disagree not just with
the hostile tone of your reply but with your underlying technical point as well.

Thanks,

Ingo

Re: [PATCH 0/9] mm: Hardened usercopy

2016-07-09 Thread Ingo Molnar

* Rik van Riel  wrote:

> On Fri, 2016-07-08 at 19:22 -0700, Laura Abbott wrote:
> > 
> > Even with the SLUB fixup I'm still seeing this blow up on my arm64
> > system. This is a Fedora rawhide kernel + the patches
> > 
> > [0.666700] usercopy: kernel memory exposure attempt detected from fc0008b4dd58 () (8 bytes)
> > [0.666720] CPU: 2 PID: 79 Comm: modprobe Tainted: GW   4.7.0-0.rc6.git1.1.hardenedusercopy.fc25.aarch64 #1
> > [0.666733] Hardware name: AppliedMicro Mustang/Mustang, BIOS 1.1.0 Nov 24 2015
> > [0.666744] Call trace:
> > [0.666756] [] dump_backtrace+0x0/0x1e8
> > [0.666765] [] show_stack+0x24/0x30
> > [0.666775] [] dump_stack+0xa4/0xe0
> > [0.666785] [] __check_object_size+0x6c/0x230
> > [0.666795] [] create_elf_tables+0x74/0x420
> > [0.666805] [] load_elf_binary+0x828/0xb70
> > [0.666814] [] search_binary_handler+0xb4/0x240
> > [0.666823] [] do_execveat_common+0x63c/0x950
> > [0.666832] [] do_execve+0x3c/0x50
> > [0.666841] [] call_usermodehelper_exec_async+0xe8/0x148
> > [0.666850] [] ret_from_fork+0x10/0x50
> > 
> > This happens on every call to execve. This seems to be the first
> > copy_to_user in create_elf_tables. I didn't get a chance to debug and
> > I'm going out of town all of next week, so all I have is the report,
> > unfortunately. Config attached.
> 
> That's odd, this should be copying a piece of kernel data (not text)
> to userspace.
> 
> from fs/binfmt_elf.c
> 
>         const char *k_platform = ELF_PLATFORM;
> 
> ...
>                 size_t len = strlen(k_platform) + 1;
> 
>                 u_platform = (elf_addr_t __user *)STACK_ALLOC(p, len);
>                 if (__copy_to_user(u_platform, k_platform, len))
>                         return -EFAULT;
> 
> from arch/arm/include/asm/elf.h:
> 
> #define ELF_PLATFORM_SIZE 8
> #define ELF_PLATFORM (elf_platform)
> 
> extern char elf_platform[];
> 
> from arch/arm/kernel/setup.c:
> 
> char elf_platform[ELF_PLATFORM_SIZE];
> EXPORT_SYMBOL(elf_platform);
> 
> ...
> 
> snprintf(elf_platform, ELF_PLATFORM_SIZE, "%s%c",
>  list->elf_name, ENDIANNESS);
> 
> How does that end up in the .text section of the
> image, instead of in one of the various data sections?
> 
> What kind of linker oddity is going on with ARM?

I think the crash happened on ARM64, not ARM.

Thanks,

Ingo

Re: [PATCH 0/9] mm: Hardened usercopy

2016-07-08 Thread Ingo Molnar

* Linus Torvalds <torva...@linux-foundation.org> wrote:

> On Fri, Jul 8, 2016 at 1:46 AM, Ingo Molnar <mi...@kernel.org> wrote:
> >
> > Could you please try to find some syscall workload that does many small user
> > copies and thus exercises this code path aggressively?
> 
> Any stat()-heavy path will hit cp_new_stat() very heavily. Think the
> usual kind of "traverse the whole tree looking for something". "git
> diff" will do it, just checking that everything is up-to-date.
> 
> That said, other things tend to dominate.

So I think a cached 'find /usr >/dev/null' might be a good one as well:

 triton:~/tip> strace -c find /usr >/dev/null
 % time seconds  usecs/call callserrors syscall
 -- --- --- - - 
  47.090.006518   0254697   newfstatat
  26.200.003627   0254795   getdents
  14.450.002000   0   1147411   fcntl
   7.330.001014   0509811   close
   3.280.000454   0128220 1 openat
   1.520.000210   0128230   fstat
   0.270.16   0 12810   write
   0.000.00   010   read

 triton:~/tip> perf stat --repeat 3 -e cycles:u,cycles:k,cycles find /usr >/dev/null

 Performance counter stats for 'find /usr' (3 runs):

     1,594,437,143      cycles:u                                ( +-  2.76% )
     2,570,544,009      cycles:k                                ( +-  2.50% )
     4,164,981,152      cycles                                  ( +-  2.59% )

       0.929883686 seconds time elapsed                         ( +-  2.57% )

... and it's dominated by kernel overhead, with a fair amount of memcpy overhead
as well:

   1.22%  find [kernel.kallsyms]   [k] copy_user_enhanced_fast_string

But maybe there are simple shell commands that are even more user-memcpy
intensive?

Thanks,

Ingo

Re: [PATCH 0/9] mm: Hardened usercopy

2016-07-08 Thread Ingo Molnar

* Kees Cook  wrote:

> - I couldn't detect a measurable performance change with these features
>   enabled. Kernel build times were unchanged, hackbench was unchanged,
>   etc. I think we could flip this to "on by default" at some point.

Could you please try to find some syscall workload that does many small user
copies and thus exercises this code path aggressively?

If that measurement works out fine then I'd prefer to enable these security
checks by default.

Thanks,

Ingo

Re: [GIT PULL 00/17] perf/core improvements and fixes

2016-05-06 Thread Ingo Molnar

* Arnaldo Carvalho de Melo  wrote:

> Hi Ingo,
> 
>   Please consider pulling,
> 
> - Arnaldo
> 
> 
> The following changes since commit 1b6de5917172967acd8db4d222df4225d23a8a60:
> 
>   perf/x86/intel/pt: Convert ACCESS_ONCE()s (2016-05-05 10:16:29 +0200)
> 
> are available in the git repository at:
> 
>   git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git tags/perf-core-for-mingo-20160505
> 
> for you to fetch changes up to b6b85dad30ad7e7394990e2317a780577974a4e6:
> 
>   perf evlist: Rename variable in perf_mmap__read() (2016-05-05 21:04:04 -0300)
> 
> 
> perf/core improvements and fixes:
> 
> User visible:
> 
> - Order output of 'perf trace --summary' better: the threads now
>   appear in ascending order of number of events and then, for each one,
>   syscalls appear in descending order of time spent in them, so that
>   the last page produced can be the one about the most interesting
>   thread straced, as suggested by Milian Wolff (Arnaldo Carvalho de Melo)
> 
> - Do not show the runtime_ms for a thread when not collecting it, that
>   is done so far only with 'perf trace --sched' (Arnaldo Carvalho de Melo)
> 
> - Fix kallsyms perf test on ppc64le (Naveen N. Rao)
> 
> Infrastructure:
> 
> - Move global variables related to presence of some keys in the sort order to a
>   per hist struct, to allow code like the hists browser to work with multiple
>   hists with different lists of columns (Jiri Olsa)
> 
> - Add support for generating bpf prologue in powerpc (Naveen N. Rao)
> 
> - Fix kprobe and kretprobe handling with kallsyms on ppc64le (Naveen N. Rao)
> 
> - evlist mmap changes, prep work for supporting reading backwards (Wang Nan)
> 
> Signed-off-by: Arnaldo Carvalho de Melo 
> 
> 
> Arnaldo Carvalho de Melo (5):
>   perf machine: Introduce number of threads member
>   perf tools: Add template for generating rbtree resort class
>   perf trace: Sort summary output by number of events
>   perf trace: Sort syscalls stats by msecs in --summary
>   perf trace: Do not show the runtime_ms for a thread when not collecting it
> 
> Jiri Olsa (7):
>   perf hists: Move sort__need_collapse into struct perf_hpp_list
>   perf hists: Move sort__has_parent into struct perf_hpp_list
>   perf hists: Move sort__has_sym into struct perf_hpp_list
>   perf hists: Move sort__has_dso into struct perf_hpp_list
>   perf hists: Move sort__has_socket into struct perf_hpp_list
>   perf hists: Move sort__has_thread into struct perf_hpp_list
>   perf hists: Move sort__has_comm into struct perf_hpp_list
> 
> Naveen N. Rao (3):
>   perf tools powerpc: Add support for generating bpf prologue
>   perf powerpc: Fix kprobe and kretprobe handling with kallsyms on ppc64le
>   perf symbols: Fix kallsyms perf test on ppc64le
> 
> Wang Nan (2):
>   perf evlist: Extract perf_mmap__read()
>   perf evlist: Rename variable in perf_mmap__read()
> 
>  tools/perf/arch/powerpc/Makefile|   1 +
>  tools/perf/arch/powerpc/util/dwarf-regs.c   |  40 +---
>  tools/perf/arch/powerpc/util/sym-handling.c |  43 ++--
>  tools/perf/builtin-diff.c   |   4 +-
>  tools/perf/builtin-report.c |   4 +-
>  tools/perf/builtin-top.c|   8 +-
>  tools/perf/builtin-trace.c  |  87 ++--
>  tools/perf/tests/hists_common.c |   2 +-
>  tools/perf/tests/hists_cumulate.c   |   2 +-
>  tools/perf/tests/hists_link.c   |   4 +-
>  tools/perf/tests/hists_output.c |   2 +-
>  tools/perf/ui/browsers/hists.c  |  32 +++---
>  tools/perf/ui/gtk/hists.c   |   2 +-
>  tools/perf/ui/hist.c|   2 +-
>  tools/perf/util/annotate.c  |   2 +-
>  tools/perf/util/callchain.c |   2 +-
>  tools/perf/util/evlist.c|  56 ++-
>  tools/perf/util/hist.c  |  14 +--
>  tools/perf/util/hist.h  |  10 ++
>  tools/perf/util/machine.c   |   9 +-
>  tools/perf/util/machine.h   |   1 +
>  tools/perf/util/probe-event.c   |   5 +-
>  tools/perf/util/probe-event.h   |   3 +-
>  tools/perf/util/rb_resort.h | 149
>  tools/perf/util/sort.c  |  35 +++
>  tools/perf/util/sort.h  |   7 --
>  tools/perf/util/symbol-elf.c|   7 +-
>  tools/perf/util/symbol.h|   3 +-
>  28 files changed, 382 insertions(+), 154 deletions(-)
>  create mode 100644 tools/perf/util/rb_resort.h
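
The 'perf trace --summary' ordering described in the pull request above can be sketched with two qsort() comparators. Threads sort ascending by event count, so the busiest thread prints last, and syscalls sort descending by time spent. This is an illustration only. The struct layouts and names are hypothetical, and the actual series does the re-sorting with the new rbtree resort template in rb_resort.h rather than with qsort().

```c
#include <stdlib.h>	/* qsort() */

/* Hypothetical, simplified summary records. */
struct thread_summary {
	int tid;
	unsigned long nr_events;
};

struct syscall_stats {
	int id;
	double msecs;		/* total time spent in this syscall */
};

/* Threads: ascending by number of events, so the busiest one ends up last. */
static int cmp_thread_events_asc(const void *a, const void *b)
{
	const struct thread_summary *ta = a, *tb = b;

	return (ta->nr_events > tb->nr_events) - (ta->nr_events < tb->nr_events);
}

/* Syscalls within one thread: descending by time spent. */
static int cmp_syscall_msecs_desc(const void *a, const void *b)
{
	const struct syscall_stats *sa = a, *sb = b;

	return (sa->msecs < sb->msecs) - (sa->msecs > sb->msecs);
}
```

Sort the thread array once with cmp_thread_events_asc, then sort each thread's syscall array with cmp_syscall_msecs_desc before printing.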

Pulled, thanks a lot Arnaldo!

Ingo

Re: [RFC PATCH v2 05/18] sched: add task flag for preempt IRQ tracking

2016-05-02 Thread Ingo Molnar

* Andy Lutomirski  wrote:

> > Another idea to detect missing frames: for each return address on the
> > stack, ensure there's a corresponding "call " instruction immediately
> > preceding the return location, where  matches what's on the stack.
> 
> Hmm, interesting.
> 
> I hope your plans include rewriting the current stack unwinder completely.  The
> thing in print_context_stack is (a) hard-to-understand and hard-to-modify crap
> and (b) is called in a loop from another file using totally ridiculous
> conventions.
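
The return-address check quoted above can be sketched for its simplest case, an x86-64 direct near call. That instruction is opcode 0xE8 followed by a signed 32-bit displacement, and the displacement is relative to the end of the instruction, which is the return address itself. This is an illustration only: call_precedes() is a hypothetical name, and the sketch does not handle indirect calls, other call encodings or other architectures.

```c
#include <stdbool.h>
#include <stdint.h>

/* Length of an x86 "call rel32" instruction: 0xE8 + 4-byte displacement. */
#define CALL_REL32_LEN 5

/*
 * Return true if the CALL_REL32_LEN bytes just before `ret` encode a
 * direct near call whose target is `func`. The displacement is a signed
 * little-endian 32-bit value relative to the end of the call, which is
 * exactly the return address.
 */
static bool call_precedes(const uint8_t *ret, uintptr_t func)
{
	const uint8_t *insn = ret - CALL_REL32_LEN;
	uint32_t raw;
	int32_t rel;

	if (insn[0] != 0xE8)
		return false;

	/* Decode byte by byte so the result does not depend on host byte order. */
	raw = (uint32_t)insn[1] | (uint32_t)insn[2] << 8 |
	      (uint32_t)insn[3] << 16 | (uint32_t)insn[4] << 24;
	rel = (int32_t)raw;

	return (uintptr_t)ret + (uintptr_t)(intptr_t)rel == func;
}
```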

So we had several attempts at making it better, any further improvements 
(including radical rewrites) are more than welcome!

The generalization between the various stack walking methods certainly didn't
make things easier to read - we might want to eliminate that by using better
primitives to iterate over the stack frame.
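
One way to read "better primitives to iterate over the stack frame" is an explicit iterator that owns the bounds and sanity checks, sketched below for a frame-pointer layout. The struct layout, the bounds handling and the frame_iter_*() names are all hypothetical; this is not the kernel's unwinder API.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical frame-pointer layout: each frame begins with the caller's
 * saved frame pointer followed by the return address, as on x86-64 built
 * with frame pointers.
 */
struct frame {
	const struct frame *next;
	unsigned long ret_addr;
};

/* Hypothetical iterator state, confined to one stack region [lo, hi). */
struct frame_iter {
	const struct frame *fp;
	uintptr_t lo, hi;
};

static void frame_iter_start(struct frame_iter *it, const struct frame *fp,
			     const void *lo, const void *hi)
{
	it->fp = fp;
	it->lo = (uintptr_t)lo;
	it->hi = (uintptr_t)hi;
}

/*
 * Store the next return address in *addr and return 1, or return 0 once
 * the walk ends, leaves the stack region or hits a misaligned pointer.
 */
static int frame_iter_next(struct frame_iter *it, unsigned long *addr)
{
	uintptr_t p = (uintptr_t)it->fp;

	if (!p || p < it->lo || p + sizeof(struct frame) > it->hi ||
	    p % sizeof(void *) != 0)
		return 0;

	*addr = it->fp->ret_addr;
	/* Frames must move toward higher addresses, which also stops loops. */
	it->fp = (uintptr_t)it->fp->next > p ? it->fp->next : NULL;
	return 1;
}
```

A caller would then loop with while (frame_iter_next(&it, &addr)) and print each address, so call sites no longer re-implement the checks.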

Thanks,

Ingo

Re: [PATCH] sched/cpuacct: Check for NULL when using task_pt_regs()

2016-04-13 Thread Ingo Molnar

* Michael Ellerman <m...@ellerman.id.au> wrote:

> On Wed, 2016-04-13 at 09:43 +0200, Ingo Molnar wrote:
> > * Srikar Dronamraju <sri...@linux.vnet.ibm.com> wrote:
> > 
> > > * Anton Blanchard <an...@samba.org> [2016-04-06 21:59:50]:
> > > 
> > > > Looks good, and the patch below does fix the oops for me.
> > > > 
> > > > Anton
> > > > --
> > > > 
> > > > task_pt_regs() can return NULL for kernel threads, so add a check.
> > > > This fixes an oops at boot on ppc64.
> > > > 
> > > > Signed-off-by: Anton Blanchard <an...@samba.org>
> > > 
> > > Works for me too.
> > > 
> > > Reported-and-Tested-by: Srikar Dronamraju <sri...@linux.vnet.ibm.com>
> > 
> > Could someone please re-send the fix, because it has not reached me nor 
> > lkml.
> 
> It did hit LKML:
> 
> http://lkml.kernel.org/r/20160406215950.04bc3f0b@kryten
> 
> But that did have some verbiage at the top.
> 
> Anton's also resent it directly To you.

So it was in my Spam folder, due to the following SPF softfail:

  Received-SPF: softfail (google.com: domain of transitioning an...@samba.org
    does not designate 198.145.29.136 as permitted sender) client-ip=198.145.29.136;

I have the patch now.

Thanks,

Ingo

Re: [PATCH] sched/cpuacct: Check for NULL when using task_pt_regs()

2016-04-13 Thread Ingo Molnar

* Srikar Dronamraju  wrote:

> * Anton Blanchard  [2016-04-06 21:59:50]:
> 
> > Looks good, and the patch below does fix the oops for me.
> > 
> > Anton
> > --
> > 
> > task_pt_regs() can return NULL for kernel threads, so add a check.
> > This fixes an oops at boot on ppc64.
> > 
> > Signed-off-by: Anton Blanchard 
> 
> Works for me too.
> 
> Reported-and-Tested-by: Srikar Dronamraju 
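
The check that the patch description asks for can be sketched in userspace. The rule is to call user_mode() on the result of task_pt_regs() only when that result is non-NULL, and otherwise charge the time as system time. struct pt_regs, user_mode() and the stat enum below are simplified stand-ins for the kernel's definitions, and charge_index() is a hypothetical name. This shows the pattern, not the actual patch.

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel types (assumptions, not the real ones). */
struct pt_regs {
	bool user;
};

enum stat_index { STAT_USER, STAT_SYSTEM };

static bool user_mode(const struct pt_regs *regs)
{
	return regs->user;
}

/*
 * Pick the accounting bucket for a task. task_pt_regs() may return NULL
 * for kernel threads, so test the pointer before dereferencing it and
 * fall back to system time.
 */
static enum stat_index charge_index(const struct pt_regs *regs)
{
	if (regs && user_mode(regs))
		return STAT_USER;
	return STAT_SYSTEM;
}
```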

Could someone please re-send the fix, because it has not reached me nor lkml.

Thanks,

Ingo

Re: [GIT PULL 0/2] perf/urgent fixes

2016-03-31 Thread Ingo Molnar

* Arnaldo Carvalho de Melo  wrote:

> Hi Ingo,
> 
>   Please consider pulling,
> 
> - Arnaldo
> 
> The following changes since commit f6343be96ebbae38a07e0878810f5bbc0c38cade:
> 
>   Merge tag 'perf-urgent-for-mingo-20160329' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/urgent (2016-03-30 12:31:03 +0200)
> 
> are available in the git repository at:
> 
>   git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git tags/perf-urgent-for-mingo-20160330
> 
> for you to fetch changes up to 9f56c092b99b40ce3cf4c6d0134ff7e513c9f1a6:
> 
>   perf jit: genelf makes assumptions about endian (2016-03-30 18:12:06 -0300)
> 
> 
> perf/urgent fixes:
> 
> - Fix determination of a callchain node's childlessness in
>   the top/report TUI, which was preventing navigating some
>   callchains, --stdio unaffected (Andres Freund)
> 
> - Fix jitdump's genelf assumption that PowerPC is big endian
>   only (Anton Blanchard)
> 
> Signed-off-by: Arnaldo Carvalho de Melo 
> 
> 
> Andres Freund (1):
>   perf hists: Fix determination of a callchain node's childlessness
> 
> Anton Blanchard (1):
>   perf jit: genelf makes assumptions about endian
> 
>  tools/perf/ui/browsers/hists.c |  2 +-
>  tools/perf/util/genelf.h   | 24 ++--
>  2 files changed, 11 insertions(+), 15 deletions(-)
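
The endianness fix in this pull can be illustrated by deriving the ELF data encoding from the compiler's predefined byte-order macros, rather than assuming one encoding per architecture. The macros used are __BYTE_ORDER__ and __ORDER_BIG_ENDIAN__, a GCC/Clang extension. GEN_ELF_ENDIAN and runtime_elf_endian() are used here for illustration; the exact contents of genelf.h may differ.

```c
#include <elf.h>	/* ELFDATA2LSB, ELFDATA2MSB */
#include <stdint.h>

/*
 * Choose the ELF data encoding from the compiler's byte order instead of
 * hard-coding it per architecture (PowerPC can be either).
 */
#if defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
#define GEN_ELF_ENDIAN ELFDATA2MSB
#else
#define GEN_ELF_ENDIAN ELFDATA2LSB
#endif

/* Runtime cross-check: inspect the first byte of a known 16-bit value. */
static int runtime_elf_endian(void)
{
	const uint16_t probe = 0x0102;
	const unsigned char *p = (const unsigned char *)&probe;

	return p[0] == 0x01 ? ELFDATA2MSB : ELFDATA2LSB;
}
```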

Pulled, thanks a lot Arnaldo!

Ingo

Re: [PATCH v3 0/2] Consolidate redundant register/stack access code

2016-02-17 Thread Ingo Molnar

* David Long <dave.l...@linaro.org> wrote:

> On 02/09/2016 04:45 AM, Ingo Molnar wrote:
> >
> >* Michael Ellerman <m...@ellerman.id.au> wrote:
> >
> >>On Tue, 2016-02-09 at 00:38 -0500, David Long wrote:
> >>
> >>>From: "David A. Long" <dave.l...@linaro.org>
> >>>
> >>>Move duplicate and functionally equivalent code for accessing registers
> >>>and stack (CONFIG_HAVE_REGS_AND_STACK_ACCESS_API) from arch subdirs into
> >>>common kernel files.
> >>>
> >>>I'm sending this out again (with updated distribution list) because v2
> >>>just never got pulled in, even though I don't think there were any
> >>>outstanding issues.
> >>
> >>A big cross arch patch like this would often get taken by Andrew Morton, but
> >>AFAICS you didn't CC him - so I just added him, perhaps he'll pick it up for
> >>us :D
> >
> >The other problem is that the second patch is commingling changes to 6 separate
> >architectures:
> >
> >  16 files changed, 106 insertions(+), 343 deletions(-)
> >
> >that should probably be 6 separate patches. Easier to review, easier to bisect to,
> >easier to revert, etc.
> >
> >Thanks,
> >
> > Ingo
> >
> 
> I see your point but I'm not sure it could have been broken into separate 
> successive patches that would each build for all architectures.

Why? AFAICS all the functionality appears to be conditional on
CONFIG_HAVE_REGS_AND_STACK_ACCESS_API, so it ought to build standalone as well, on
a per-arch basis, as long as the core kernel patch is applied first.
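
To illustrate the build argument, here is a userspace sketch of two helpers of the kind CONFIG_HAVE_REGS_AND_STACK_ACCESS_API covers. They are loosely modeled on x86's regs_within_kernel_stack() and regs_get_kernel_stack_nth(). The simplified struct pt_regs, the THREAD_SIZE value and the #define standing in for Kconfig are assumptions. The thread also does not say that these exact functions are the ones being consolidated.

```c
#include <stdint.h>

/* Pretend the architecture selected the option; real code gets this from Kconfig. */
#define CONFIG_HAVE_REGS_AND_STACK_ACCESS_API 1

#ifdef CONFIG_HAVE_REGS_AND_STACK_ACCESS_API

#define THREAD_SIZE 16384UL	/* assumption: 16 KiB, naturally aligned stacks */

/* Simplified stand-in for struct pt_regs: only the stack pointer. */
struct pt_regs {
	uintptr_t sp;
};

static uintptr_t kernel_stack_pointer(const struct pt_regs *regs)
{
	return regs->sp;
}

/* True if addr lies in the same THREAD_SIZE-aligned stack as regs->sp. */
static int regs_within_kernel_stack(const struct pt_regs *regs, uintptr_t addr)
{
	return (addr & ~(THREAD_SIZE - 1)) ==
	       (kernel_stack_pointer(regs) & ~(THREAD_SIZE - 1));
}

/* Return the n-th word above the stack pointer, or 0 if it is off the stack. */
static unsigned long regs_get_kernel_stack_nth(const struct pt_regs *regs,
					       unsigned int n)
{
	uintptr_t addr = kernel_stack_pointer(regs) + n * sizeof(unsigned long);

	if (regs_within_kernel_stack(regs, addr))
		return *(const unsigned long *)addr;
	return 0;
}

#endif /* CONFIG_HAVE_REGS_AND_STACK_ACCESS_API */
```

Everything sits behind the #ifdef, so a build that does not define the option compiles none of it. That is the property the per-architecture split relies on.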

Thanks,

Ingo

Re: [PATCH v3 0/2] Consolidate redundant register/stack access code

2016-02-09 Thread Ingo Molnar

* Michael Ellerman  wrote:

> On Tue, 2016-02-09 at 00:38 -0500, David Long wrote:
> 
> > From: "David A. Long" 
> >
> > Move duplicate and functionally equivalent code for accessing registers
> > and stack (CONFIG_HAVE_REGS_AND_STACK_ACCESS_API) from arch subdirs into
> > common kernel files.
> >
> > I'm sending this out again (with updated distribution list) because v2
> > just never got pulled in, even though I don't think there were any
> > outstanding issues.
> 
> A big cross arch patch like this would often get taken by Andrew Morton, but
> AFAICS you didn't CC him - so I just added him, perhaps he'll pick it up for
> us :D

The other problem is that the second patch is commingling changes to 6 separate 
architectures:

 16 files changed, 106 insertions(+), 343 deletions(-)

that should probably be 6 separate patches. Easier to review, easier to bisect
to, easier to revert, etc.

Thanks,

Ingo

Re: [GIT PULL 00/16] perf/core improvements and fixes

2016-02-03 Thread Ingo Molnar

* Arnaldo Carvalho de Melo  wrote:

> Hi Ingo,
> 
>   This is on top of the previously submitted perf-core-for-mingo tag,
> please consider applying,
> 
> - Arnaldo
> 
> The following changes since commit 5ac76283b32b116c58e362e99542182ddcfc8262:
> 
>   perf cpumap: Auto initialize cpu__max_{node,cpu} (2016-01-26 16:08:36 -0300)
> 
> are available in the git repository at:
> 
>   git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git tags/perf-core-for-mingo-2
> 
> for you to fetch changes up to 814568db641f6587c1e98a3a85f214cb6a30fe10:
> 
>   perf build: Align the names of the build tests: (2016-01-29 17:51:04 -0300)
> 
> 
> New features:
> 
> - Port 'perf kvm stat' to PowerPC (Hemant Kumar)
> 
> Infrastructure:
> 
> - Use the 'feature-dump' target to do the feature checks just once and then
>   add code to reuse that in the tests/make makefile, speeding up the
>   'make -C tools/perf build-test' target (Wang Nan)
> 
> - Reduce the number of tests the 'build-test' target do to those that don't
>   pollute the source tree (Arnaldo Carvalho de Melo)
> 
> - Improve the output of the build tests a bit by aligning the name of the
>   tests, more can be done to filter out uninteresting info in the output
>   (Arnaldo Carvalho de Melo)
> 
> - Add perf_evlist pointer to *info_priv_size(), more prep work for
>   supporting the coresight architecture (Mathieu Poirier)
> 
> - Improve the 'perf test bp_signal' test (Wang Nan)
> 
> - Check environment before starting the BPF 'perf test', so that we can just
>   'Skip' older kernels instead of 'FAIL'ing them (Wang Nan)
> 
> - Fix cpumode of synthesized buildid event (Wang Nan)
> 
> Signed-off-by: Arnaldo Carvalho de Melo 
> 
> 
> Arnaldo Carvalho de Melo (2):
>   perf tools: Speed up build-tests by reducing the number of builds tested
>   perf build: Align the names of the build tests:
> 
> Hemant Kumar (4):
>   perf kvm/{x86,s390}: Remove dependency on uapi/kvm_perf.h
>   perf kvm/{x86,s390}: Remove const from kvm_events_tp
>   perf kvm/powerpc: Port perf kvm stat to powerpc
>   perf kvm/powerpc: Add support for HCALL reasons
> 
> Jiri Olsa (1):
>   perf build: Fix feature-dump checks, we need to test all features
> 
> Mathieu Poirier (1):
>   perf auxtrace: Add perf_evlist pointer to *info_priv_size()
> 
> Wang Nan (8):
>   tools build: Check basic headers for test-compile feature checker
>   perf build: Remove all condition feature check {C,LD}FLAGS
>   perf build: Use feature dump file for build-test
>   perf buildid: Fix cpumode of buildid event
>   perf test: Check environment before start real BPF test
>   perf test: Improve bp_signal
>   perf tools: Move timestamp creation to util
>   perf record: Use OPT_BOOLEAN_SET for buildid cache related options
> 
>  tools/build/Makefile.feature   |   8 ++
>  tools/build/feature/test-compile.c |   2 +
>  tools/perf/Makefile|  11 +-
>  tools/perf/arch/powerpc/Makefile   |   2 +
>  tools/perf/arch/powerpc/util/Build |   1 +
>  tools/perf/arch/powerpc/util/book3s_hcalls.h   | 123 ++
>  tools/perf/arch/powerpc/util/book3s_hv_exits.h |  33 +
>  tools/perf/arch/powerpc/util/kvm-stat.c| 170 +
>  tools/perf/arch/s390/util/kvm-stat.c   |  10 +-
>  tools/perf/arch/x86/util/intel-bts.c   |   4 +-
>  tools/perf/arch/x86/util/intel-pt.c|   4 +-
>  tools/perf/arch/x86/util/kvm-stat.c|  16 ++-
>  tools/perf/builtin-buildid-cache.c |  14 +-
>  tools/perf/builtin-kvm.c   |  38 --
>  tools/perf/builtin-record.c|  12 +-
>  tools/perf/config/Makefile | 101 +++
>  tools/perf/tests/bp_signal.c   | 140 
>  tools/perf/tests/bpf.c |  37 ++
>  tools/perf/tests/make  |  39 +-
>  tools/perf/util/auxtrace.c |   7 +-
>  tools/perf/util/auxtrace.h |   6 +-
>  tools/perf/util/build-id.c |   6 +-
>  tools/perf/util/kvm-stat.h |   8 +-
>  tools/perf/util/util.c |  17 +++
>  tools/perf/util/util.h |   1 +
>  25 files changed, 688 insertions(+), 122 deletions(-)
>  create mode 100644 tools/perf/arch/powerpc/util/book3s_hcalls.h
>  create mode 100644 tools/perf/arch/powerpc/util/book3s_hv_exits.h
>  create mode 100644 tools/perf/arch/powerpc/util/kvm-stat.c

Pulled, thanks a lot Arnaldo!

Ingo

Re: [PATCH 0/4] support for text-relative kallsyms table

2016-01-20 Thread Ingo Molnar

* Ard Biesheuvel  wrote:

> This implements text-relative kallsyms address tables. This was developed as
> part of my series to implement KASLR/CONFIG_RELOCATABLE for arm64, but I think
> it may be beneficial to other architectures as well, so I am presenting it as
> a separate series.
>
> The idea is that on 64-bit builds, it is rather wasteful to use absolute
> addressing for kernel symbols since they are all within a couple of MBs of
> each other. On top of that, the absolute addressing implies that, when the
> kernel is relocated at runtime, each address in the table needs to be fixed
> up individually.
>
> Since all section-relative addresses are already emitted relative to _text,
> it is quite straight-forward to record only the offset, and add the absolute
> address of _text at runtime when referring to the address table.
>
> The reduction ranges from around 250 KB uncompressed vmlinux size and 10 KB
> compressed size (s390) to 3 MB/500 KB for ppc64 (although, in the latter
> case, the reduction in uncompressed size is primarily __init data)
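
The text-relative scheme described above can be sketched as a table of 32-bit offsets plus one base address. The struct and function names are hypothetical, and the real kallsyms tables are laid out differently. The point is that both address reconstruction and address-to-symbol lookup work on offsets, so relocating the kernel means updating one base value instead of fixing up every entry. Each entry also shrinks from 8 to 4 bytes.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Text-relative symbol table: 32-bit offsets from a base address (standing
 * in for _text) instead of 64-bit absolute addresses.
 */
struct rel_symtab {
	uint64_t base;			/* runtime address of _text */
	const uint32_t *offsets;	/* sorted; symbol address minus base */
	size_t nr;
};

static uint64_t sym_address(const struct rel_symtab *t, size_t i)
{
	return t->base + t->offsets[i];
}

/* Index of the last symbol at or below addr, or -1 if there is none. */
static long sym_lookup(const struct rel_symtab *t, uint64_t addr)
{
	size_t lo = 0, hi = t->nr;

	if (addr < t->base)
		return -1;
	addr -= t->base;
	while (lo < hi) {	/* find the first offset greater than addr */
		size_t mid = lo + (hi - lo) / 2;

		if (t->offsets[mid] <= addr)
			lo = mid + 1;
		else
			hi = mid;
	}
	return (long)lo - 1;
}
```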

So since kallsyms is in unswappable kernel RAM, the uncompressed size reduction
is what we care about mostly. How much bootloader load times are impacted is a
third order concern.

IOW a nice change!

Thanks,

Ingo

Re: [GIT PULL 0/1] perf/urgent fix

2015-10-07 Thread Ingo Molnar

* Arnaldo Carvalho de Melo  wrote:

> From: Arnaldo Carvalho de Melo 
> 
> Hi Ingo,
> 
>   Please consider pulling,
> 
> - Arnaldo
> 
> The following changes since commit 097f70b3c4d84ffccca15195bdfde3a37c0a7c0f:
> 
>   Merge branch 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus (2015-09-27 18:22:34 -0400)
> 
> are available in the git repository at:
> 
>   git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git tags/perf-urgent-for-mingo
> 
> for you to fetch changes up to 9fb4765451f22c5e782c1590747717550bff34b2:
> 
>   perf tools: Fix build break on powerpc due to sample_reg_masks (2015-10-07 10:20:08 -0300)
> 
> 
> perf/urgent fix:
> 
> - Fix build break on (at least) powerpc due to sample_reg_masks, not being
>   available for linking (Sukadev Bhattiprolu)
> 
> Signed-off-by: Arnaldo Carvalho de Melo 
> 
> 
> Sukadev Bhattiprolu (1):
>   perf tools: Fix build break on powerpc due to sample_reg_masks
> 
>  tools/perf/util/Build   | 2 +-
>  tools/perf/util/perf_regs.c | 2 ++
>  tools/perf/util/perf_regs.h | 1 +
>  3 files changed, 4 insertions(+), 1 deletion(-)

Pulled, thanks Arnaldo!

Ingo

Re: [GIT PULL 00/16] perf/core improvements and fixes

2015-10-01 Thread Ingo Molnar

* Arnaldo Carvalho de Melo  wrote:

> Hi Ingo,
> 
>   Please consider pulling,
> 
> - Arnaldo
> 
> The following changes since commit 9c17dbc6eb73bdd8a6aaea1baefd37ff78d86148:
> 
>   Merge tag 'perf-core-for-mingo' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core (2015-09-29 09:43:46 +0200)
> 
> are available in the git repository at:
> 
>   git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git tags/perf-core-for-mingo
> 
> for you to fetch changes up to 7f8d1ade1b19f684ed3a7c4fb1dc5d347127b438:
> 
>   perf tools: By default use the most precise "cycles" hw counter available (2015-09-30 18:34:39 -0300)
> 
> 
> perf/core improvements and fixes:
> 
> User visible:
> 
> - By default use the most precise "cycles" hw counter available, i.e.
>   when the user doesn't specify any event, it will try using cycles:ppp,
>   cycles:pp, etc (Arnaldo Carvalho de Melo)

That looks really useful!
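
The new default can be sketched as a fallback loop over the precise_ip levels of perf_event_open(2). It starts at 3 (cycles:ppp), works down to 0 (plain cycles), and keeps the first level the kernel accepts. This is an illustration, not the perf tool's code, and open_cycles() and pick_precise() are hypothetical names. Where perf_event_open() is unavailable or restricted, for example by perf_event_paranoid, every level simply fails and the result is -1.

```c
#define _GNU_SOURCE
#include <linux/perf_event.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/*
 * Try to open a user-space "cycles" counter on the calling thread at the
 * given precise_ip level (3 = :ppp, 2 = :pp, 1 = :p, 0 = plain).
 * Returns a file descriptor or -1.
 */
static int open_cycles(int precise)
{
	struct perf_event_attr attr;

	memset(&attr, 0, sizeof(attr));
	attr.size = sizeof(attr);
	attr.type = PERF_TYPE_HARDWARE;
	attr.config = PERF_COUNT_HW_CPU_CYCLES;
	attr.precise_ip = precise;
	attr.disabled = 1;
	attr.exclude_kernel = 1;
	return (int)syscall(SYS_perf_event_open, &attr, 0, -1, -1, 0);
}

/*
 * Walk precise_ip from max_precise down to 0 and return the first level
 * that try_open() accepts, or -1 if none does. The fd from the last
 * attempt is returned through *fd.
 */
static int pick_precise(int max_precise, int (*try_open)(int), int *fd)
{
	for (int p = max_precise; p >= 0; p--) {
		*fd = try_open(p);
		if (*fd >= 0)
			return p;
	}
	return -1;
}
```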

> - Remove blank lines, headers when piping output in 'perf list', so that it can
>   be sanely used with 'wc -l', etc (Arnaldo Carvalho de Melo)
> 
> - Amend documentation about max_stack and synthesized callchains (Adrian Hunter)
> 
> - Fix 'perf probe -l' for probes added to kernel module functions (Masami Hiramatsu)
> 
> Build fixes:
> 
> - Fix shadowed declarations that break the build on older distros (Jiri Olsa)
> 
> - Fix build break on powerpc due to sample_reg_masks (Sukadev Bhattiprolu)
> 
> Signed-off-by: Arnaldo Carvalho de Melo 
> 
> 
> Adrian Hunter (1):
>   perf report: Amend documentation about max_stack and synthesized callchains
> 
> Arnaldo Carvalho de Melo (7):
>   perf maps: Introduce maps__find_symbol_by_name()
>   perf machine: Use machine__kernel_map() thoroughly
>   perf machine: Add method for common kernel_map(FUNCTION) operation
>   tools lib symbol: Rename kallsyms2elf_type to kallsyms2elf_binding
>   tools lib symbol: Introduce kallsyms2elf_type
>   perf list: Remove blank lines, headers when piping output
>   perf tools: By default use the most precise "cycles" hw counter available
> 
> Jiri Olsa (2):
>   tools: Fix shadowed declaration in err.h
>   perf tools: Fix shadowed declaration in parse-events.c
> 
> Masami Hiramatsu (5):
>   perf probe: Fix to remove dot suffix from second or latter events
>   perf probe: Begin and end libdwfl report session correctly
>   perf probe: Show correct source lines of probes on kmodules
>   perf probe: Fix a segfault bug in debuginfo_cache
>   perf probe: Improve error message when %return is on inlined function
> 
> Sukadev Bhattiprolu (1):
>   perf tools: Fix build break on powerpc due to sample_reg_masks
> 
>  tools/include/linux/err.h|  4 +-
>  tools/lib/symbol/kallsyms.c  |  6 ++
>  tools/lib/symbol/kallsyms.h  |  4 +-
>  tools/perf/Documentation/perf-report.txt |  2 +
>  tools/perf/builtin-kmem.c|  2 +-
>  tools/perf/builtin-list.c|  2 +-
>  tools/perf/builtin-report.c  |  2 +-
>  tools/perf/tests/code-reading.c  |  2 +-
>  tools/perf/tests/vmlinux-kallsyms.c  |  4 +-
>  tools/perf/util/Build|  2 +-
>  tools/perf/util/event.c  |  7 +--
>  tools/perf/util/evlist.c | 22 +++-
>  tools/perf/util/intel-pt.c   |  2 +-
>  tools/perf/util/machine.c| 26 -
>  tools/perf/util/machine.h|  8 ++-
>  tools/perf/util/map.c| 21 ---
>  tools/perf/util/map.h|  2 +
>  tools/perf/util/parse-events.c   | 53 +-
>  tools/perf/util/perf_regs.c  |  2 +
>  tools/perf/util/perf_regs.h  |  1 +
>  tools/perf/util/pmu.c|  2 +-
>  tools/perf/util/probe-event.c| 96
>  tools/perf/util/probe-finder.c   | 58 +--
>  tools/perf/util/symbol.c |  2 +-
>  24 files changed, 224 insertions(+), 108 deletions(-)

Pulled, thanks a lot Arnaldo!

Ingo

Re: [PATCH v2 0/6] powerpc: use jump label for {cpu,mmu}_has_feature()

2015-08-25 Thread Ingo Molnar

* Kevin Hao <haoke...@gmail.com> wrote:

> Hi,
>
> v2:
> Drop the following two patches as suggested by Ingo and Peter:
> jump_label: no need to acquire the jump_label_mutex in jump_lable_init()
> jump_label: introduce DEFINE_STATIC_KEY_{TRUE,FALSE}_ARRAY macros
>
> v1:
> I have tried to change the {cpu,mmu}_has_feature() to use jump label two years
> ago [1]. But that codes seem a bit ugly. This is a reimplementation by moving
> the jump_label_init() much earlier so the jump label can be used in a very
> earlier stage. Boot test on p4080ds, t2080rdb and powermac (qemu). This patch
> series is against linux-next.
>
> [1] https://lists.ozlabs.org/pipermail/linuxppc-dev/2013-September/111026.html
>
> Kevin Hao (6):
>   jump_label: make it possible for the archs to invoke jump_label_init()
>     much earlier
>   powerpc: invoke jump_label_init() in a much earlier stage
>   powerpc: kill mfvtb()
>   powerpc: move the cpu_has_feature to a separate file
>   powerpc: use the jump label for cpu_has_feature
>   powerpc: use jump label for mmu_has_feature
>
>  arch/powerpc/include/asm/cacheflush.h   |  1 +
>  arch/powerpc/include/asm/cpufeatures.h  | 34 ++
>  arch/powerpc/include/asm/cputable.h | 16 +++---
>  arch/powerpc/include/asm/cputime.h  |  1 +
>  arch/powerpc/include/asm/dbell.h|  1 +
>  arch/powerpc/include/asm/dcr-native.h   |  1 +
>  arch/powerpc/include/asm/mman.h |  1 +
>  arch/powerpc/include/asm/mmu.h  | 29 ++
>  arch/powerpc/include/asm/reg.h  |  9
>  arch/powerpc/include/asm/time.h |  3 ++-
>  arch/powerpc/include/asm/xor.h  |  1 +
>  arch/powerpc/kernel/align.c |  1 +
>  arch/powerpc/kernel/cputable.c  | 37 +
>  arch/powerpc/kernel/irq.c   |  1 +
>  arch/powerpc/kernel/process.c   |  1 +
>  arch/powerpc/kernel/setup-common.c  |  1 +
>  arch/powerpc/kernel/setup_32.c  |  5 +
>  arch/powerpc/kernel/setup_64.c  |  4
>  arch/powerpc/kernel/smp.c   |  1 +
>  arch/powerpc/platforms/cell/pervasive.c |  1 +
>  arch/powerpc/xmon/ppc-dis.c |  1 +
>  kernel/jump_label.c |  3 +++
>  22 files changed, 135 insertions(+), 18 deletions(-)
>  create mode 100644 arch/powerpc/include/asm/cpufeatures.h

Looks good to me!

Thanks,

Ingo

Re: [PATCH 3/8] jump_label: introduce DEFINE_STATIC_KEY_{TRUE,FALSE}_ARRAY macros

2015-08-21 Thread Ingo Molnar

* Kevin Hao <haoke...@gmail.com> wrote:

> On Fri, Aug 21, 2015 at 08:28:26AM +0200, Ingo Molnar wrote:
> >
> > * Kevin Hao <haoke...@gmail.com> wrote:
> >
> > > These are used to define a static_key_{true,false} array.
> > >
> > > Signed-off-by: Kevin Hao <haoke...@gmail.com>
> > > ---
> > >  include/linux/jump_label.h | 6 ++
> > >  1 file changed, 6 insertions(+)
> > >
> > > diff --git a/include/linux/jump_label.h b/include/linux/jump_label.h
> > > index 7f653e8f6690..5c1d6a49dd6b 100644
> > > --- a/include/linux/jump_label.h
> > > +++ b/include/linux/jump_label.h
> > > @@ -267,6 +267,12 @@ struct static_key_false {
> > >  #define DEFINE_STATIC_KEY_FALSE(name)\
> > >   struct static_key_false name = STATIC_KEY_FALSE_INIT
> > >
> > > +#define DEFINE_STATIC_KEY_TRUE_ARRAY(name, n)\
> > > + struct static_key_true name[n] = { [0 ... n - 1] = STATIC_KEY_TRUE_INIT }
> > > +
> > > +#define DEFINE_STATIC_KEY_FALSE_ARRAY(name, n)   \
> > > + struct static_key_false name[n] = { [0 ... n - 1] = STATIC_KEY_FALSE_INIT }
> >
> > I think the define makes the code more obfuscated and less clear, the
> > open-coded initialization is pretty dense and easy to read to begin with.
>
> OK, I will drop this patch and move the initialization of the array to the
> corresponding patch.

Please also Cc: peterz and me to the next submission of the series - static key 
(and jump label) changes go through the locking tree normally, and there's a 
number of changes pending already for v4.3:

20f9ed1568c0 locking/static_keys: Make verify_keys() static
412758cb2670 jump label, locking/static_keys: Update docs
2bf9e0ab08c6 locking/static_keys: Provide a selftest
ed79e946732e s390/uaccess, locking/static_keys: employ static_branch_likely()
3bbfafb77a06 x86, tsc, locking/static_keys: Employ static_branch_likely()
1987c947d905 locking/static_keys: Add selftest
11276d5306b8 locking/static_keys: Add a new static_key interface
706249c222f6 locking/static_keys: Rework update logic
e33886b38cc8 locking/static_keys: Add static_key_{en,dis}able() helpers
7dcfd915bae5 jump_label: Add jump_entry_key() helper
a1efb01feca5 jump_label, locking/static_keys: Rename JUMP_LABEL_TYPE_* and related helpers to the static_key* pattern

Thanks,

Ingo

Re: [PATCH 3/8] jump_label: introduce DEFINE_STATIC_KEY_{TRUE,FALSE}_ARRAY macros

2015-08-21 Thread Ingo Molnar

* Kevin Hao <haoke...@gmail.com> wrote:

> These are used to define a static_key_{true,false} array.
>
> Signed-off-by: Kevin Hao <haoke...@gmail.com>
> ---
>  include/linux/jump_label.h | 6 ++
>  1 file changed, 6 insertions(+)
>
> diff --git a/include/linux/jump_label.h b/include/linux/jump_label.h
> index 7f653e8f6690..5c1d6a49dd6b 100644
> --- a/include/linux/jump_label.h
> +++ b/include/linux/jump_label.h
> @@ -267,6 +267,12 @@ struct static_key_false {
>  #define DEFINE_STATIC_KEY_FALSE(name)\
>   struct static_key_false name = STATIC_KEY_FALSE_INIT
>
> +#define DEFINE_STATIC_KEY_TRUE_ARRAY(name, n)\
> + struct static_key_true name[n] = { [0 ... n - 1] = STATIC_KEY_TRUE_INIT }
> +
> +#define DEFINE_STATIC_KEY_FALSE_ARRAY(name, n)   \
> + struct static_key_false name[n] = { [0 ... n - 1] = STATIC_KEY_FALSE_INIT }

I think the define makes the code more obfuscated and less clear, the open-coded
initialization is pretty dense and easy to read to begin with.
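
For comparison, the open-coded form relies on the GNU C range designator [first ... last] = value, which gives every element in the range the same initializer. The structs and STATIC_KEY_TRUE_INIT below are simplified stand-ins for the jump label definitions, and example_keys and NR_EXAMPLE_KEYS are hypothetical names.

```c
#include <stdbool.h>

/* Simplified stand-ins for the jump label types (not the real layout). */
struct static_key {
	bool enabled;
};

struct static_key_true {
	struct static_key key;
};

#define STATIC_KEY_TRUE_INIT { .key = { .enabled = true } }

#define NR_EXAMPLE_KEYS 64	/* hypothetical array size */

/*
 * Open-coded array definition: the range designator initializes every
 * element from 0 to NR_EXAMPLE_KEYS - 1 with STATIC_KEY_TRUE_INIT, so no
 * DEFINE_STATIC_KEY_TRUE_ARRAY() wrapper is needed.
 */
struct static_key_true example_keys[NR_EXAMPLE_KEYS] = {
	[0 ... NR_EXAMPLE_KEYS - 1] = STATIC_KEY_TRUE_INIT,
};
```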

Thanks,

Ingo

Re: provide more common DMA API functions V2

2015-08-18 Thread Ingo Molnar

* Andrew Morton <a...@linux-foundation.org> wrote:

> On Tue, 18 Aug 2015 07:38:25 +0200 Christoph Hellwig <h...@lst.de> wrote:
>
> > On Mon, Aug 17, 2015 at 02:24:29PM -0700, Andrew Morton wrote:
> > > 110254 bytes saved, shrinking the kernel by a whopping 0.17%.
> > > Thoughts?
> >
> > Sounds fine to me.
>
> OK, I'll clean it up a bit, check that each uninlining actually makes
> sense and then I'll see how it goes.
>
> > >
> > > I'll merge these 5 patches for 4.3.  That means I'll release them into
> > > linux-next after 4.2 is released.
> >
> > So you only add for-4.3 code to -next after 4.2 is odd?  Isn't that the
> > wrong way around?
>
> Linus will be releasing 4.2 in 1-2 weeks and until then, linux-next is
> supposed to contain only 4.2 material.  Once 4.2 is released,
> linux-next is open for 4.3 material.

Isn't that off by one?

I.e. shouldn't this be:

> I'll merge these 5 patches for 4.4.  That means I'll release them into
> linux-next after 4.2 is released.
>
> [...]
>
> Linus will be releasing 4.2 in 1-2 weeks and until then, linux-next is
> supposed to contain only 4.3 material.  Once 4.2 is released and the 4.3 merge
> window opens, linux-next is open for 4.4 material.

?

Thanks,

Ingo

Re: [GIT PULL 00/13] perf/core improvements and fixes

2015-06-25 Thread Ingo Molnar

* Arnaldo Carvalho de Melo <a...@kernel.org> wrote:

> Hi Ingo,
>
>   Please consider pulling,
>
> - Arnaldo
>
> The following changes since commit a9a3cd900fbbcbf837d65653105e7bfc583ced09:
>
>   Merge tag 'perf-core-for-mingo' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core (2015-06-20 01:11:11 +0200)
>
> are available in the git repository at:
>
>   git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git tags/perf-core-for-mingo
>
> for you to fetch changes up to 83b2ea257eb1d43e52f76d756722aeb899a2852c:
>
>   perf tools: Allow auxtrace data alignment (2015-06-23 18:28:37 -0300)
>
>
> perf/core improvements and fixes:
>
> User visible:
>
> - Move toggling event logic from 'perf top' and into hists browser, allowing
>   freeze/unfreeze with event lists with more than one entry (Namhyung Kim)
>
> - Add missing newlines when dumping PERF_RECORD_FINISHED_ROUND and
>   showing the Aggregated stats in 'perf report -D' (Adrian Hunter)
>
> Infrastructure:
>
> - Allow auxtrace data alignment (Adrian Hunter)
>
> - Allow events with dot (Andi Kleen)
>
> - Fix failure to 'perf probe' events on arm (He Kuang)
>
> - Add testing for Makefile.perf (Jiri Olsa)
>
> - Add test for make install with prefix (Jiri Olsa)
>
> - Fix single target build dependency check (Jiri Olsa)
>
> - Access thread_map entries via accessors, prep patch to hold more info per
>   entry, for ongoing 'perf stat --per-thread' work (Jiri Olsa)
>
> - Use __weak definition from compiler.h (Sukadev Bhattiprolu)
>
> - Split perf_pmu__new_alias() (Sukadev Bhattiprolu)
>
> Signed-off-by: Arnaldo Carvalho de Melo <a...@redhat.com>
>
>
> Adrian Hunter (3):
>   perf session: Print a newline when dumping PERF_RECORD_FINISHED_ROUND
>   perf tools: Print a newline before dumping Aggregated stats
>   perf tools: Allow auxtrace data alignment
>
> Andi Kleen (1):
>   perf tools: Allow events with dot
>
> He Kuang (1):
>   perf probe: Fix failure to probe events on arm
>
> Jiri Olsa (5):
>   perf tests: Add testing for Makefile.perf
>   perf tests: Add test for make install with prefix
>   perf build: Fix single target build dependency check
>   perf thread_map: Don't access the array entries directly
>   perf thread_map: Change map entries into a struct
>
> Namhyung Kim (1):
>   perf top: Move toggling event logic into hists browser
>
> Sukadev Bhattiprolu (2):
>   perf pmu: Use __weak definition from linux/compiler.h
>   perf pmu: Split perf_pmu__new_alias()
>
>  tools/perf/Makefile |  4 +--
>  tools/perf/builtin-top.c| 24 ++-
>  tools/perf/builtin-trace.c  |  4 +--
>  tools/perf/tests/make   | 31 ++--
>  tools/perf/tests/openat-syscall-tp-fields.c |  2 +-
>  tools/perf/ui/browsers/hists.c  | 19 ++--
>  tools/perf/util/auxtrace.c  | 11 +--
>  tools/perf/util/auxtrace.h  |  1 +
>  tools/perf/util/event.c |  6 ++--
>  tools/perf/util/evlist.c|  4 +--
>  tools/perf/util/evsel.c |  2 +-
>  tools/perf/util/parse-events.l  |  5 ++--
>  tools/perf/util/pmu.c   | 45 +++--
>  tools/perf/util/probe-event.c   |  6 +++-
>  tools/perf/util/session.c   |  4 ++-
>  tools/perf/util/thread_map.c| 24 ---
>  tools/perf/util/thread_map.h| 16 +-
>  17 files changed, 136 insertions(+), 72 deletions(-)

Pulled, thanks a lot Arnaldo!

Btw., one small thing I noticed about the status line in perf top: if I ever
use 'f' to freeze/unfreeze events, the following message:

  Press 'f' to disable the events or 'h' to see other hotkeys

sticks around forever, even after I look into annotation and exit it, etc.

So I don't mind some default, helpful message there (such as 'Press 'h' to see
hotkeys'), but it appears this particular message is context and usage
sensitive, which wasn't really the goal, right?

Thanks,

Ingo
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH 2/4] perf: jevents: Program to convert JSON file to C style file

2015-05-29 Thread Ingo Molnar

* Andi Kleen a...@linux.intel.com wrote:

  So instead of this flat structure, there should at minimum be broad
  categorization of the various parts of the hardware they relate to: whether
  they relate to the branch predictor, memory caches, TLB caches, memory ops,
  offcore, decoders, execution units, FPU ops, etc., etc. - so that they can
  be queried via 'perf list'.
 
 The categorization is generally on the stem name, which already works fine
 with the existing perf list wildcard support. So for example you only want
 branches.

 perf list br*
 ...
   br_inst_exec.all_branches 
[Speculative and retired branches]
   br_inst_exec.all_conditional  
[Speculative and retired macro-conditional branches]
   br_inst_exec.all_direct_jmp   
[Speculative and retired macro-unconditional branches excluding calls 
 and indirects]
   br_inst_exec.all_direct_near_call 
[Speculative and retired direct near calls]
   br_inst_exec.all_indirect_jump_non_call_ret   
[Speculative and retired indirect branches excluding calls and returns]
   br_inst_exec.all_indirect_near_return 
[Speculative and retired indirect return branches]
 ...
 
 Or mid level cache events:
 
 perf list l2*
 ...
   l2_l1d_wb_rqsts.all   
[Not rejected writebacks from L1D to L2 cache lines in any state]
   l2_l1d_wb_rqsts.hit_e 
[Not rejected writebacks from L1D to L2 cache lines in E state]
   l2_l1d_wb_rqsts.hit_m 
[Not rejected writebacks from L1D to L2 cache lines in M state]
   l2_l1d_wb_rqsts.miss  
[Count the number of modified Lines evicted from L1 and missed L2. 
 (Non-rejected WBs from the DCU.)]
   l2_lines_in.all   
[L2 cache lines filling L2]
 ...
 
 There are some exceptions, but generally it works this way.

You are missing my point in several ways:

1)

Firstly, there are _tons_ of 'exceptions' to the 'stem name' grouping, to the
point where it is unusable for high level grouping of events.

Here's the 'stem name' histogram on the SandyBridge event list:

  $ grep EventName pmu-events/arch/x86/SandyBridge_core.json | cut -d\. -f1 | cut -d\" -f4 | cut -d\_ -f1 | sort | uniq -c | sort -n

  1 AGU
  1 BACLEARS
  1 EPT
  1 HW
  1 ICACHE
  1 INSTS
  1 PAGE
  1 ROB
  1 RS
  1 SQ
  2 ARITH
  2 DSB2MITE
  2 ILD
  2 LOAD
  2 LOCK
  2 LONGEST
  2 MISALIGN
  2 SIMD
  2 TLB
  3 CPL
  3 DSB
  3 INST
  3 INT
  3 LSD
  3 MACHINE
  4 CPU
  4 OTHER
  4 PARTIAL
  5 CYCLE
  5 ITLB
  6 LD
  7 L1D
  8 DTLB
 10 FP
 12 RESOURCE
 21 UOPS
 24 IDQ
 25 MEM
 37 BR
 37 L2
131 OFFCORE

Out of 386 events. This grouping has the following severe problems:

  - that's 41 'stem name' groups, way too many as a first hop high level
    structure. We want the kind of high level categorization I suggested:
    cache, decoding, branches, execution pipeline, memory events, vector unit
    events - which broad categories exist in all CPUs and are
    microarchitecture independent.

  - even these 'stem names' are mostly unstructured and unreadable. The two
    examples you cited are the best cases and are borderline readable, but
    they cover less than 20% of all events.

  - the 'stem name' concept is not even used consistently; the names are
    essentially a random collection of Intel internal acronyms, which
    occasionally match up with high level concepts. These vendor defined
    names have very poor high level structure.

  - the 'stem names' are totally imbalanced: there's one 'super' category
    'stem name', OFFCORE_RESPONSE, with 131 events in it, and then there are
    super small groups in the list above. Not well suited to get a good
    overview of what measurement capabilities the hardware has.

So forget about using 'stem names' as the high level structure. These events
have no high level structure and we should provide that, instead of dumping
380+ events on the unsuspecting user.

2)

Secondly, categorization and a higher level hierarchy should be used to keep
the list manageable. The fact that _you_ can list just a subset if you know
what to search for does not mean anything to the new user trying to discover
events.

A simple 'perf list' should list the high level categories by default, with a 
count displayed that shows how many further events are within that category. 
(compacted tree output would be usable as well.)

 The stem could be put into a separate header, but it would seem redundant to
 me.

Higher level categories simply don't exist in these names in any usable form,
so they have to be created. Just redundantly 

Re: [PATCH 2/4] perf: jevents: Program to convert JSON file to C style file

2015-05-28 Thread Ingo Molnar

* Ingo Molnar mi...@kernel.org wrote:

 
 * Jiri Olsa jo...@redhat.com wrote:
 
  On Wed, May 27, 2015 at 11:59:04PM +0900, Namhyung Kim wrote:
   Hi Andi,
   
   On Wed, May 27, 2015 at 11:40 PM, Andi Kleen a...@linux.intel.com wrote:
So we build tables of all models in the architecture, and choose
matching one when compiling perf, right?  Can't we do that when
building the tables?  IOW, why don't we check the VFM and discard
non-matching tables?  Those non-matching tables are also needed?
   
We build it for all cpus in an architecture, not all architectures.
So e.g. for an x86 binary power is not included, and vice versa.
   
   OK.
   
It always includes all CPUs for a given architecture, so it's possible
to use the perf binary on other systems than just the one it was
built on.
   
   So it selects one at run-time not build-time, good.  But I worry about
   the size of the intel tables.  How large are they?  Maybe we can make
   it dynamic-loadable if needed..
  
  just compiled Sukadev's new version with Andi's events list
  and stripped binary size is:
  
  [jolsa@krava perf]$ ls -l perf
  -rwxrwxr-x 1 jolsa jolsa 2772640 May 28 13:49 perf
  
  
  while perf on Arnaldo's perf/core is:
  
  [jolsa@krava perf]$ ls -l perf
  -rwxrwxr-x 1 jolsa jolsa 2334816 May 28 13:49 perf
  
  seems not that bad
 
 It's not bad at all.
 
 Do you have a Git tree URI where I could take a look at its current state? A 
 tree would be nice that has as many of these patches integrated as possible.

A couple of observations:

1)

The x86 JSON files are unnecessarily large, and for no good reason, for example:

 triton:~/tip/tools/perf/pmu-events/arch/x86 grep -h EdgeDetect * | sort | uniq -c
   5534 EdgeDetect: 0,
 57 EdgeDetect: 1,

It's ridiculous to repeat EdgeDetect: 0 more than 5 thousand times, just so
that in 57 cases we can say '1'. Those lines should be omitted, and the
default value should be 0.

This would reduce the source code line count of the JSON files by 40% already:

 triton:~/tip/tools/perf/pmu-events/arch/x86 grep ': 0,' * | wc -l
 42127
 triton:~/tip/tools/perf/pmu-events/arch/x86 cat * | wc -l
 103702
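The 40% estimate is the ratio of the two line counts above (note that the first grep counts every ': 0,' field, not only EdgeDetect):

```latex
\frac{42127}{103702} \approx 0.406 \;\Rightarrow\; \text{about } 40\%
```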

And no, I don't care if manufacturers release crappy JSON files - they need to
be fixed/stripped before being applied to our source tree.

2)

Also, the JSON files should carry more high level structure than they do
today. Let's take SandyBridge_core.json as an example: it defines 386 events,
but they are all in a 'flat' hierarchy, which is almost impossible for all but
the most expert users to overview.

So instead of this flat structure, there should at minimum be broad
categorization of the various parts of the hardware they relate to: whether
they relate to the branch predictor, memory caches, TLB caches, memory ops,
offcore, decoders, execution units, FPU ops, etc., etc. - so that they can be
queried via 'perf list'.

We don't just want to import the unstructured mess that these event files are;
we want to turn them into real structure. We can still keep the messy vendor
names as well, like IDQ.DSB_CYCLES, but we want to impose structure as well.

3)

There should be good 'perf list' visualization for these events: grouping,
individual names, with a good interface to query details if needed. I.e. it
should be possible to browse and discover events relevant to the CPU the tool
is executing on.

Thanks,

Ingo

Re: [PATCH 2/4] perf: jevents: Program to convert JSON file to C style file

2015-05-28 Thread Ingo Molnar

* Jiri Olsa jo...@redhat.com wrote:

 On Wed, May 27, 2015 at 11:59:04PM +0900, Namhyung Kim wrote:
  Hi Andi,
  
  On Wed, May 27, 2015 at 11:40 PM, Andi Kleen a...@linux.intel.com wrote:
   So we build tables of all models in the architecture, and choose
   matching one when compiling perf, right?  Can't we do that when
   building the tables?  IOW, why don't we check the VFM and discard
   non-matching tables?  Those non-matching tables are also needed?
  
   We build it for all cpus in an architecture, not all architectures.
   So e.g. for an x86 binary power is not included, and vice versa.
  
  OK.
  
   It always includes all CPUs for a given architecture, so it's possible
   to use the perf binary on other systems than just the one it was
   built on.
  
  So it selects one at run-time not build-time, good.  But I worry about
  the size of the intel tables.  How large are they?  Maybe we can make
  it dynamic-loadable if needed..
 
 just compiled Sukadev's new version with Andi's events list
 and stripped binary size is:
 
 [jolsa@krava perf]$ ls -l perf
 -rwxrwxr-x 1 jolsa jolsa 2772640 May 28 13:49 perf
 
 
 while perf on Arnaldo's perf/core is:
 
 [jolsa@krava perf]$ ls -l perf
 -rwxrwxr-x 1 jolsa jolsa 2334816 May 28 13:49 perf
 
 seems not that bad

It's not bad at all.
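For scale, the difference between the two stripped binaries quoted above works out to:

```latex
\Delta = 2772640 - 2334816 = 437824\ \text{bytes} \approx 428\ \text{KiB},
\qquad \frac{437824}{2334816} \approx 18.8\%
```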

Do you have a Git tree URI where I could take a look at its current state? A
tree that has as many of these patches integrated as possible would be nice.

Thanks,

Ingo

Re: [PATCH v3 1/2] perf/kvm: Port perf kvm to powerpc

2015-05-08 Thread Ingo Molnar

* Hemant Kumar hem...@linux.vnet.ibm.com wrote:

 
 On 05/08/2015 09:58 AM, Ingo Molnar wrote:
 * Hemant Kumar hem...@linux.vnet.ibm.com wrote:
 
   # perf kvm stat report -p 60515
 Analyze events for pid(s) 60515, all VCPUs:
 
              VM-EXIT    Samples  Samples%     Time%    Min Time      Max Time    Avg time
 
       H_DATA_STORAGE       5006    35.30%     0.13%      1.94us       49.46us     12.37us ( +-   0.52% )
       HV_DECREMENTER       4457    31.43%     0.02%      0.72us       16.14us      1.91us ( +-   0.96% )
              SYSCALL       2690    18.97%     0.10%      2.84us      528.24us     18.29us ( +-   3.75% )
       RETURN_TO_HOST       1789    12.61%    99.76%      1.58us   672791.91us  27470.23us ( +-   3.00% )
EXTERNAL240 1.69% 0.00%0.69us 10.67us
1.33us ( +-   5.34% )
 Why is the last line misaligned? Copy & paste error or does perf kvm
 produce it in such a way?
 
 It's a copy-paste error. Thanks for pointing this out.
 
 Shall I resend the patches with the correct alignment of the o/p?

I don't think that's necessary, as long as the code is fine.

Thanks,

Ingo
