[PATCH v10 0/2] Add ThunderX2 SoC Performance Monitoring Unit driver
This patchset adds a PMU driver for Cavium's ThunderX2 SoC UNCORE devices.
The SoC has PMU support in the L3 cache controller (L3C) and in the
DDR4 Memory Controller (DMC).

v10:
    Updated Documentation patch with comments [6].

    [6] https://lkml.org/lkml/2018/12/5/649

v9:
    Updated with comments [5].

    [5] https://lkml.org/lkml/2018/11/22/517

v8:
    Updated with comments [4].

    [4] https://lkml.org/lkml/2018/10/25/215

v7:
    Incorporated review comments [3].
    Modified driver as loadable module.
    Updated Documentation with Event description.
    Removed per-channel (no SMC calls) sampling implementation
    (since DMC and L3C channels are interleaved, we have decided to
    sample channel zero and prorate it to account for a device).

    [3] https://patchwork.kernel.org/patch/10479203/

v6:
    Rebased to 4.18-rc1
    Updated with comments from John Garry [3]

    [3] https://lkml.org/lkml/2018/5/17/408

v5:
    Incorporated review comments from Mark Rutland [2]

v4:
    Incorporated review comments from Mark Rutland [1]

    [1] https://www.spinics.net/lists/arm-kernel/msg588563.html
    [2] https://lkml.org/lkml/2018/4/26/376

v3:
    Fixed warning reported by kbuild robot

v2:
    Rebased to 4.12-rc1
    Removed Arch VULCAN dependency.
    Update SMC call parameters as per latest firmware.

v1:
    Initial patch

Ganapatrao Kulkarni (2):
  perf, uncore: Adding documentation for ThunderX2 pmu uncore driver
  ThunderX2, perf : Add Cavium ThunderX2 SoC UNCORE PMU driver

 Documentation/perf/thunderx2-pmu.txt |  93 +++
 drivers/perf/Kconfig                 |   9 +
 drivers/perf/Makefile                |   1 +
 drivers/perf/thunderx2_pmu.c         | 861 +++
 include/linux/cpuhotplug.h           |   1 +
 5 files changed, 965 insertions(+)
 create mode 100644 Documentation/perf/thunderx2-pmu.txt
 create mode 100644 drivers/perf/thunderx2_pmu.c

-- 
2.18.0
[PATCH v9 0/2] Add ThunderX2 SoC Performance Monitoring Unit driver
From: Ganapatrao Kulkarni

This patchset adds a PMU driver for Cavium's ThunderX2 SoC UNCORE devices.
The SoC has PMU support in the L3 cache controller (L3C) and in the
DDR4 Memory Controller (DMC).

v9:
    Updated with comments [5].

    [5] https://lkml.org/lkml/2018/11/22/517

v8:
    Updated with comments [4].

    [4] https://lkml.org/lkml/2018/10/25/215

v7:
    Incorporated review comments [3].
    Modified driver as loadable module.
    Updated Documentation with Event description.
    Removed per-channel (no SMC calls) sampling implementation
    (since DMC and L3C channels are interleaved, we have decided to
    sample channel zero and prorate it to account for a device).

    [3] https://patchwork.kernel.org/patch/10479203/

v6:
    Rebased to 4.18-rc1
    Updated with comments from John Garry [3]

    [3] https://lkml.org/lkml/2018/5/17/408

v5:
    Incorporated review comments from Mark Rutland [2]

v4:
    Incorporated review comments from Mark Rutland [1]

    [1] https://www.spinics.net/lists/arm-kernel/msg588563.html
    [2] https://lkml.org/lkml/2018/4/26/376

v3:
    Fixed warning reported by kbuild robot

v2:
    Rebased to 4.12-rc1
    Removed Arch VULCAN dependency.
    Update SMC call parameters as per latest firmware.

v1:
    Initial patch

Ganapatrao Kulkarni (2):
  perf, uncore: Adding documentation for ThunderX2 pmu uncore driver
  ThunderX2, perf : Add Cavium ThunderX2 SoC UNCORE PMU driver

 Documentation/perf/thunderx2-pmu.txt |  93 +++
 drivers/perf/Kconfig                 |   9 +
 drivers/perf/Makefile                |   1 +
 drivers/perf/thunderx2_pmu.c         | 861 +++
 include/linux/cpuhotplug.h           |   1 +
 5 files changed, 965 insertions(+)
 create mode 100644 Documentation/perf/thunderx2-pmu.txt
 create mode 100644 drivers/perf/thunderx2_pmu.c

-- 
2.18.0
[PATCH v8 1/2] perf, uncore: Adding documentation for ThunderX2 pmu uncore driver
The SoC has PMU support in its L3 cache controller (L3C) and in the
DDR4 Memory Controller (DMC).

Signed-off-by: Ganapatrao Kulkarni
---
 Documentation/perf/thunderx2-pmu.txt | 106 +++
 1 file changed, 106 insertions(+)
 create mode 100644 Documentation/perf/thunderx2-pmu.txt

diff --git a/Documentation/perf/thunderx2-pmu.txt b/Documentation/perf/thunderx2-pmu.txt
new file mode 100644
index ..9f5dd7459e68
--- /dev/null
+++ b/Documentation/perf/thunderx2-pmu.txt
@@ -0,0 +1,106 @@
+
+Cavium ThunderX2 SoC Performance Monitoring Unit (PMU UNCORE)
+==
+
+ThunderX2 SoC PMU consists of independent system wide per Socket PMUs such
+as Level 3 Cache(L3C) and DDR4 Memory Controller(DMC).
+
+DMC has 8 interleave channels and L3C has 16 interleave tiles. Events are
+sampled for default channel(i.e. channel 0) and prorated to total number of
+channels/tiles.
+
+DMC and L3C, Each PMU supports up to 4 counters. Counters are independently
+programmable and can be started and stopped individually. Each counter can
+be set to sample specific perf events. Counters are 32 bit and do not support
+overflow interrupt; they are sampled at every 2 seconds.
+
+PMU UNCORE (perf) driver:
+
+The thunderx2-pmu driver registers several perf PMUs for DMC and L3C devices.
+Each of the PMUs provides description of its available events
+and configuration options in sysfs.
+	see /sys/devices/uncore_
+S is socket id.
+Each PMU can be used to sample up to 4 events simultaneously.
+
+The "format" directory describes format of the config (event ID).
+The "events" directory provides configuration templates for all
+supported event types that can be used with perf tool.
+
+For example, "uncore_dmc_0/cnt_cycles/" is an
+equivalent of "uncore_dmc_0/config=0x1/".
+
+Each perf driver also provides a "cpumask" sysfs attribute, which contains a
+single CPU ID of the processor which is likely to be used to handle all the
+PMU events. It will be the first online CPU from the NUMA node of the PMU device.
+
+Example for perf tool use:
+
+perf stat -a -e uncore_dmc_0/cnt_cycles/ sleep 1
+
+perf stat -a -e \
+uncore_dmc_0/cnt_cycles/,\
+uncore_dmc_0/data_transfers/,\
+uncore_dmc_0/read_txns/,\
+uncore_dmc_0/write_txns/ sleep 1
+
+perf stat -a -e \
+uncore_l3c_0/read_request/,\
+uncore_l3c_0/read_hit/,\
+uncore_l3c_0/inv_request/,\
+uncore_l3c_0/inv_hit/ sleep 1
+
+The driver does not support sampling, therefore "perf record" will
+not work. Per-task (without "-a") perf sessions are not supported.
+
+L3C events:
+
+
+read_request:
+	Number of Read requests received by the L3 Cache.
+	This includes Read as well as Read Exclusives.
+
+read_hit:
+	Number of Read requests received by the L3 cache that were hit
+	in the L3 (Data provided from the L3)
+
+writeback_request:
+	Number of Write Backs received by the L3 Cache. These are basically
+	the L2 Evicts and writes from the PCIe Write Cache.
+
+inv_nwrite_request:
+	This is the Number of Invalidate and Write received by the L3 Cache.
+	Also Writes from IO that did not go through the PCIe Write Cache.
+
+inv_nwrite_hit:
+	This is the Number of Invalidate and Write received by the L3 Cache
+	That were a hit in the L3 Cache.
+
+inv_request:
+	Number of Invalidate request received by the L3 Cache.
+
+inv_hit:
+	Number of Invalidate request received by the L3 Cache that were a
+	hit in L3.
+
+evict_request:
+	Number of Evicts that the L3 generated.
+
+NOTE:
+1. Granularity of all these events counter value is cache line length(64 Bytes).
+2. L3C cache Hit Ratio = (read_hit + inv_nwrite_hit + inv_hit) / (read_request + inv_nwrite_request + inv_request)
+
+DMC events:
+
+cnt_cycles:
+	Count cycles (Clocks at the DMC clock rate)
+
+write_txns:
+	Number of 64 Bytes write transactions received by the DMC(s)
+
+read_txns:
+	Number of 64 Bytes Read transactions received by the DMC(s)
+
+data_transfers:
+	Number of 64 Bytes data transferred to or from DRAM.
-- 
2.18.0
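The documentation text in this patch states that the counters are 32 bits wide, have no overflow interrupt, are polled every 2 seconds, and that counts sampled on channel 0 are prorated to all channels or tiles. A minimal user-space sketch of that arithmetic follows. It assumes prorating is a plain multiplication by the channel count; the names tx2_counter_delta(), tx2_prorate() and tx2_l3c_hit_ratio() are hypothetical and are not taken from thunderx2_pmu.c.

```c
#include <assert.h>
#include <stdint.h>

/* Channel and tile counts as stated in the documentation text. */
#define TX2_DMC_CHANNELS 8u
#define TX2_L3C_TILES    16u

/*
 * Events counted between two polls of a free-running 32-bit counter.
 * Unsigned subtraction is modulo 2^32, so one wraparound between polls
 * is handled; two or more wraps within a single poll interval are lost.
 */
static inline uint32_t tx2_counter_delta(uint32_t prev, uint32_t now)
{
	return (uint32_t)(now - prev);
}

/* Scale a count sampled on one channel to the whole device (assumption). */
static inline uint64_t tx2_prorate(uint64_t sampled, unsigned int nchannels)
{
	assert(nchannels > 0);
	return sampled * nchannels;
}

/* L3C hit ratio as given in NOTE 2; returns 0.0 when no requests were seen. */
static inline double tx2_l3c_hit_ratio(uint64_t read_hit,
				       uint64_t inv_nwrite_hit,
				       uint64_t inv_hit,
				       uint64_t read_request,
				       uint64_t inv_nwrite_request,
				       uint64_t inv_request)
{
	uint64_t hits = read_hit + inv_nwrite_hit + inv_hit;
	uint64_t requests = read_request + inv_nwrite_request + inv_request;

	if (requests == 0)
		return 0.0;
	return (double)hits / (double)requests;
}
```

With a 2 second poll, the delta is only reliable while fewer than 2^32 events occur on the sampled channel per interval, which is the reason a periodic poll is needed when no overflow interrupt exists.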
[PATCH] arm_pmu: Delete incorrect cache event mapping for some armv8_pmuv3 events.
Perf events L1-dcache-load-misses, L1-dcache-store-misses are mapped to
armv8_pmuv3 (both DT and ACPI) event L1D_CACHE_REFILL. This is incorrect,
since L1D_CACHE_REFILL counts both load and store misses.
Similarly the events L1-dcache-loads, L1-dcache-stores, dTLB-load-misses
and dTLB-loads are wrongly mapped.

Hence Deleting all these cache events from armv8_pmuv3 cache mapping.

Signed-off-by: Ganapatrao Kulkarni
---
 arch/arm64/kernel/perf_event.c | 8 
 1 file changed, 8 deletions(-)

diff --git a/arch/arm64/kernel/perf_event.c b/arch/arm64/kernel/perf_event.c
index 33147aacdafd..6a67ad22d1eb 100644
--- a/arch/arm64/kernel/perf_event.c
+++ b/arch/arm64/kernel/perf_event.c
@@ -207,17 +207,9 @@ static const unsigned armv8_pmuv3_perf_cache_map[PERF_COUNT_HW_CACHE_MAX]
 						[PERF_COUNT_HW_CACHE_RESULT_MAX] = {
 	PERF_CACHE_MAP_ALL_UNSUPPORTED,
 
-	[C(L1D)][C(OP_READ)][C(RESULT_ACCESS)]	= ARMV8_PMUV3_PERFCTR_L1D_CACHE,
-	[C(L1D)][C(OP_READ)][C(RESULT_MISS)]	= ARMV8_PMUV3_PERFCTR_L1D_CACHE_REFILL,
-	[C(L1D)][C(OP_WRITE)][C(RESULT_ACCESS)]	= ARMV8_PMUV3_PERFCTR_L1D_CACHE,
-	[C(L1D)][C(OP_WRITE)][C(RESULT_MISS)]	= ARMV8_PMUV3_PERFCTR_L1D_CACHE_REFILL,
-
 	[C(L1I)][C(OP_READ)][C(RESULT_ACCESS)]	= ARMV8_PMUV3_PERFCTR_L1I_CACHE,
 	[C(L1I)][C(OP_READ)][C(RESULT_MISS)]	= ARMV8_PMUV3_PERFCTR_L1I_CACHE_REFILL,
 
-	[C(DTLB)][C(OP_READ)][C(RESULT_MISS)]	= ARMV8_PMUV3_PERFCTR_L1D_TLB_REFILL,
-	[C(DTLB)][C(OP_READ)][C(RESULT_ACCESS)]	= ARMV8_PMUV3_PERFCTR_L1D_TLB,
-
 	[C(ITLB)][C(OP_READ)][C(RESULT_MISS)]	= ARMV8_PMUV3_PERFCTR_L1I_TLB_REFILL,
 	[C(ITLB)][C(OP_READ)][C(RESULT_ACCESS)]	= ARMV8_PMUV3_PERFCTR_L1I_TLB,
-- 
2.18.0
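The reasoning in the commit message can be illustrated with a toy model. This is not kernel code and every name in it is hypothetical: it assumes a single hardware refill counter that is incremented by both load and store misses, and shows that presenting this count as L1-dcache-load-misses overstates the load misses by exactly the number of store misses.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model of an L1 data cache whose PMU cannot tell loads from stores. */
struct toy_l1d {
	uint64_t load_misses;	/* ground truth, not visible to the PMU */
	uint64_t store_misses;	/* ground truth, not visible to the PMU */
};

/* What an L1D_CACHE_REFILL style event reports: load and store misses. */
static inline uint64_t toy_refill_count(const struct toy_l1d *c)
{
	return c->load_misses + c->store_misses;
}

/* Error introduced when the refill count is presented as "load misses". */
static inline uint64_t toy_load_miss_overcount(const struct toy_l1d *c)
{
	uint64_t reported = toy_refill_count(c);

	assert(reported >= c->load_misses);
	return reported - c->load_misses;
}
```

Under this model, a workload with 700 load misses and 300 store misses would be reported as 1000 load misses and also as 1000 store misses, which is why the patch prefers leaving the generic cache events unsupported over reporting a misleading figure.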
RE: [PATCH v4 1/7] arm64: Add ftrace support
From: AKASHI Takahiro
Sent: Tuesday, February 25, 2014 2:53 PM
To: rost...@goodmis.org; fweis...@gmail.com; mi...@redhat.com; catalin.mari...@arm.com; will.dea...@arm.com; tim.b...@sonymobile.com
Cc: Kulkarni, Ganapatrao; dsax...@linaro.org; ar...@arndb.de; linux-arm-ker...@lists.infradead.org; linaro-ker...@lists.linaro.org; linux-kernel@vger.kernel.org; AKASHI Takahiro
Subject: [PATCH v4 1/7] arm64: Add ftrace support

This patch implements arm64 specific part to support function tracers,
such as function (CONFIG_FUNCTION_TRACER), function_graph
(CONFIG_FUNCTION_GRAPH_TRACER) and function profiler
(CONFIG_FUNCTION_PROFILER).

With 'function' tracer, all the functions in the kernel are traced with
timestamps in ${sysfs}/tracing/trace. If function_graph tracer is
specified, call graph is generated.

The kernel must be compiled with -pg option so that _mcount() is inserted
at the beginning of functions. This function is called on every function's
entry as long as tracing is enabled.
In addition, function_graph tracer also needs to be able to probe function's
exit. ftrace_graph_caller() & return_to_handler do this by faking link
register's value to intercept function's return path.

More details on architecture specific requirements are described in
Documentation/trace/ftrace-design.txt.

Signed-off-by: AKASHI Takahiro
---
 arch/arm64/Kconfig               |    2 +
 arch/arm64/include/asm/ftrace.h  |   23 +
 arch/arm64/kernel/Makefile       |    4 +
 arch/arm64/kernel/arm64ksyms.c   |    4 +
 arch/arm64/kernel/entry-ftrace.S |  173 ++
 arch/arm64/kernel/ftrace.c       |   64 ++
 6 files changed, 270 insertions(+)
 create mode 100644 arch/arm64/include/asm/ftrace.h
 create mode 100644 arch/arm64/kernel/entry-ftrace.S
 create mode 100644 arch/arm64/kernel/ftrace.c

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 27bbcfc..5783641 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -33,6 +33,8 @@ config ARM64
 	select HAVE_DMA_ATTRS
 	select HAVE_DMA_CONTIGUOUS
 	select HAVE_EFFICIENT_UNALIGNED_ACCESS
+	select HAVE_FUNCTION_TRACER
+	select HAVE_FUNCTION_GRAPH_TRACER
 	select HAVE_GENERIC_DMA_COHERENT
 	select HAVE_HW_BREAKPOINT if PERF_EVENTS
 	select HAVE_MEMBLOCK
diff --git a/arch/arm64/include/asm/ftrace.h b/arch/arm64/include/asm/ftrace.h
new file mode 100644
index 000..58ea595
--- /dev/null
+++ b/arch/arm64/include/asm/ftrace.h
@@ -0,0 +1,23 @@
+/*
+ * arch/arm64/include/asm/ftrace.h
+ *
+ * Copyright (C) 2013 Linaro Limited
+ * Author: AKASHI Takahiro
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+#ifndef __ASM_FTRACE_H
+#define __ASM_FTRACE_H
+
+#include
+
+#define MCOUNT_ADDR		((unsigned long)_mcount)
+#define MCOUNT_INSN_SIZE	AARCH64_INSN_SIZE
+
+#ifndef __ASSEMBLY__
+extern void _mcount(unsigned long);
+#endif /* __ASSEMBLY__ */
+
+#endif /* __ASM_FTRACE_H */
diff --git a/arch/arm64/kernel/Makefile b/arch/arm64/kernel/Makefile
index 2d4554b..ac67fd0 100644
--- a/arch/arm64/kernel/Makefile
+++ b/arch/arm64/kernel/Makefile
@@ -5,6 +5,9 @@
 CPPFLAGS_vmlinux.lds	:= -DTEXT_OFFSET=$(TEXT_OFFSET)
 AFLAGS_head.o		:= -DTEXT_OFFSET=$(TEXT_OFFSET)
 
+CFLAGS_REMOVE_ftrace.o = -pg
+CFLAGS_REMOVE_insn.o = -pg
+
 # Object file lists.
 arm64-obj-y		:= cputable.o debug-monitors.o entry.o irq.o fpsimd.o	\
 			   entry-fpsimd.o process.o ptrace.o setup.o signal.o	\
@@ -13,6 +16,7 @@ arm64-obj-y		:= cputable.o debug-monitors.o entry.o irq.o fpsimd.o	\
 arm64-obj-$(CONFIG_COMPAT)		+= sys32.o kuser32.o signal32.o	\
 					   sys_compat.o
+arm64-obj-$(CONFIG_FUNCTION_TRACER)	+= ftrace.o entry-ftrace.o
 arm64-obj-$(CONFIG_MODULES)		+= arm64ksyms.o module.o
 arm64-obj-$(CONFIG_SMP)			+= smp.o smp_spin_table.o
 arm64-obj-$(CONFIG_HW_PERF_EVENTS)	+= perf_event.o
diff --git a/arch/arm64/kernel/arm64ksyms.c b/arch/arm64/kernel/arm64ksyms.c
index 338b568..7f0512f 100644
--- a/arch/arm64/kernel/arm64ksyms.c
+++ b/arch/arm64/kernel/arm64ksyms.c
@@ -56,3 +56,7 @@ EXPORT_SYMBOL(clear_bit);
 EXPORT_SYMBOL(test_and_clear_bit);
 EXPORT_SYMBOL(change_bit);
 EXPORT_SYMBOL(test_and_change_bit);
+
+#ifdef CONFIG_FUNCTION_TRACER
+EXPORT_SYMBOL(_mcount);
+#endif
diff --git a/arch/arm64/kernel/entry-ftrace.S b/arch/arm64/kernel/entry-ftrace.S
new file mode 100644
index 000..2e8162e
--- /dev/null
+++ b/arch/arm64/kernel/entry-ftrace.S
@@ -0,0 +1,173 @@
+/*
+ * arch/arm64/kernel/entry-ftrace.S
+ *
+ * Copyright (C) 2013 Linaro Limited
+ * Author: AKASHI Takahiro
+ *
+ * This program is free software; you can redistribute it an
RE: [PATCH v3 0/6] arm64: Add ftrace support
Looks OK to me.

Reviewed-by: Ganapatrao Kulkarni

regards,
Ganapat

From: AKASHI Takahiro
Sent: Friday, February 7, 2014 3:48 PM
To: rost...@goodmis.org; fweis...@gmail.com; mi...@redhat.com; catalin.mari...@arm.com; will.dea...@arm.com; Kulkarni, Ganapatrao; tim.b...@sonymobile.com
Cc: ar...@arndb.de; linux-arm-ker...@lists.infradead.org; linaro-ker...@lists.linaro.org; linux-kernel@vger.kernel.org; patc...@linaro.org; AKASHI Takahiro
Subject: [PATCH v3 0/6] arm64: Add ftrace support

This is my third version of patchset for ftrace support.
There was another implementation from Cavium network, but both of us agreed
to use my patchset as future base. He is supposed to review this code, too.

The only issue that I had some concern on was "fault protection" code in
prepare_ftrace_return(). With discussions with Steven and Tim (as author of
arm ftrace), I removed that code since I'm not quite sure about possibility
of "fault" occurrences in this function.

The code is tested on ARMv8 Fast Model with the following tracers & events:
     function tracer with dynamic ftrace
     function graph tracer with dynamic ftrace
     syscall tracepoint
     irqsoff & preemptirqsoff (which use CALLER_ADDRx)
and also verified with in-kernel tests, FTRACE_SELFTEST, FTRACE_STARTUP_TEST
and EVENT_TRACE_TEST_SYSCALLS.

Please be careful:
* elf.h on cross-build host must have AArch64 definitions, EM_AARCH64 and
  R_AARCH64_ABS64, to compile recordmcount utility. See [4/6].
  [4/6] also gets warnings from checkpatch, but they are based on the
  original's coding style.
* This patch may conflict with my audit patch because both change the same
  location in syscall_trace(). I expect the functions are called in this
  order:
  On entry,
  * tracehook_report_syscall(ENTER)
  * trace_sys_enter()
  * audit_syscall_entry()
  On exit,
  * audit_syscall_exit()
  * trace_sys_exit()
  * tracehook_report_syscall(EXIT)

Changes from v1 to v2:
* split one patch into some pieces for easier review
  (especially function tracer + dynamic ftrace + CALLER_ADDRx)
* put return_address() in a separate file
* renamed __mcount to _mcount (it was my mistake)
* changed stackframe handling to get parent's frame pointer
* removed ARCH_SUPPORTS_FTRACE_OPS
* switched to "hotpatch" interfaces from Huawei
* revised descriptions in comments

Changes from v2 to v3:
* optimized register usages in asm (by not saving x0, x1, and x2)
* removed "fault protection" code in prepare_ftrace_return()
* rewrote ftrace_modify_code() using "hotpatch" interfaces
* revised descriptions in comments

AKASHI Takahiro (6):
  arm64: Add ftrace support
  arm64: ftrace: Add dynamic ftrace support
  arm64: ftrace: Add CALLER_ADDRx macros
  ftrace: Add arm64 support to recordmcount
  arm64: ftrace: Add system call tracepoint
  arm64: Add 'notrace' attribute to unwind_frame() for ftrace

 arch/arm64/Kconfig                 |    6 +
 arch/arm64/include/asm/ftrace.h    |   54 +
 arch/arm64/include/asm/syscall.h   |    1 +
 arch/arm64/include/asm/unistd.h    |    2 +
 arch/arm64/kernel/Makefile         |    9 +-
 arch/arm64/kernel/arm64ksyms.c     |    4 +
 arch/arm64/kernel/entry-ftrace.S   |  215
 arch/arm64/kernel/ftrace.c         |  177 +
 arch/arm64/kernel/ptrace.c         |    5 +
 arch/arm64/kernel/return_address.c |   55 +
 arch/arm64/kernel/stacktrace.c     |    2 +-
 scripts/recordmcount.c             |    4 +
 scripts/recordmcount.pl            |    5 +
 13 files changed, 537 insertions(+), 2 deletions(-)
 create mode 100644 arch/arm64/include/asm/ftrace.h
 create mode 100644 arch/arm64/kernel/entry-ftrace.S
 create mode 100644 arch/arm64/kernel/ftrace.c
 create mode 100644 arch/arm64/kernel/return_address.c

-- 
1.7.9.5
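The hook ordering expected on syscall entry and exit in the cover letter can be modelled with a small user-space sketch. The code below is not the kernel's syscall_trace(): toy_syscall_trace() and toy_record() are invented for illustration, and the recorded strings merely reuse the names of the kernel hooks so the resulting order is easy to read.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Ordered record of which hooks ran, separated by ';'. */
static char toy_log[256];

/* Append one hook name to the log; asserts that the buffer is large enough. */
static void toy_record(const char *name)
{
	size_t used = strlen(toy_log);

	assert(used + strlen(name) + 2 <= sizeof(toy_log));
	snprintf(toy_log + used, sizeof(toy_log) - used, "%s;", name);
}

/* is_exit == 0 models syscall entry, anything else models syscall exit. */
static inline void toy_syscall_trace(int is_exit)
{
	if (!is_exit) {
		toy_record("tracehook_report_syscall(ENTER)");
		toy_record("trace_sys_enter");
		toy_record("audit_syscall_entry");
	} else {
		toy_record("audit_syscall_exit");
		toy_record("trace_sys_exit");
		toy_record("tracehook_report_syscall(EXIT)");
	}
}
```

Calling toy_syscall_trace(0) followed by toy_syscall_trace(1) leaves the six names in toy_log in the listed order. The exit sequence is the mirror image of the entry sequence, so the hook that runs last on entry runs first on exit.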