[PATCH v10 0/2] Add ThunderX2 SoC Performance Monitoring Unit driver

2018-12-05 Thread Kulkarni, Ganapatrao
This patchset adds a PMU driver for Cavium's ThunderX2 SoC UNCORE devices.
The SoC has PMU support in the L3 cache controller (L3C) and in the
DDR4 Memory Controller (DMC).

v10:
Updated Documentation patch with comments [6].

[6] https://lkml.org/lkml/2018/12/5/649

v9:
Updated with comments [5].

[5] https://lkml.org/lkml/2018/11/22/517

v8:
Updated with comments [4].

[4] https://lkml.org/lkml/2018/10/25/215

v7:
Incorporated review comments [3].
Modified driver as loadable module.
Updated Documentation with Event description.
Removed per-channel sampling implementation (no SMC calls).
Since DMC and L3C channels are interleaved, we have decided to
sample channel zero and prorate it to account for a device.

[3] https://patchwork.kernel.org/patch/10479203/

v6:
Rebased to 4.18-rc1
Updated with comments from John Garry[3]

[3] https://lkml.org/lkml/2018/5/17/408

v5:
Incorporated review comments from Mark Rutland[2]
v4:
Incorporated review comments from Mark Rutland[1]

[1] https://www.spinics.net/lists/arm-kernel/msg588563.html
[2] https://lkml.org/lkml/2018/4/26/376

v3:
Fixed warning reported by kbuild robot

v2:
Rebased to 4.12-rc1
Removed Arch VULCAN dependency.
Update SMC call parameters as per latest firmware.

v1:
Initial patch

Ganapatrao Kulkarni (2):
  perf, uncore: Adding documentation for ThunderX2 pmu uncore driver
  ThunderX2, perf : Add Cavium ThunderX2 SoC UNCORE PMU driver

 Documentation/perf/thunderx2-pmu.txt |  93 +++
 drivers/perf/Kconfig                 |   9 +
 drivers/perf/Makefile                |   1 +
 drivers/perf/thunderx2_pmu.c         | 861 +++
 include/linux/cpuhotplug.h           |   1 +
 5 files changed, 965 insertions(+)
 create mode 100644 Documentation/perf/thunderx2-pmu.txt
 create mode 100644 drivers/perf/thunderx2_pmu.c

-- 
2.18.0



[PATCH v9 0/2] Add ThunderX2 SoC Performance Monitoring Unit driver

2018-12-05 Thread Kulkarni, Ganapatrao
From: Ganapatrao Kulkarni 

This patchset adds a PMU driver for Cavium's ThunderX2 SoC UNCORE devices.
The SoC has PMU support in the L3 cache controller (L3C) and in the
DDR4 Memory Controller (DMC).

v9:
Updated with comments [5].

[5] https://lkml.org/lkml/2018/11/22/517

v8:
Updated with comments [4].

[4] https://lkml.org/lkml/2018/10/25/215

v7:
Incorporated review comments [3].
Modified driver as loadable module.
Updated Documentation with Event description.
Removed per-channel sampling implementation (no SMC calls).
Since DMC and L3C channels are interleaved, we have decided to
sample channel zero and prorate it to account for a device.

[3] https://patchwork.kernel.org/patch/10479203/

v6:
Rebased to 4.18-rc1
Updated with comments from John Garry[3]

[3] https://lkml.org/lkml/2018/5/17/408

v5:
Incorporated review comments from Mark Rutland[2]
v4:
Incorporated review comments from Mark Rutland[1]

[1] https://www.spinics.net/lists/arm-kernel/msg588563.html
[2] https://lkml.org/lkml/2018/4/26/376

v3:
Fixed warning reported by kbuild robot

v2:
Rebased to 4.12-rc1
Removed Arch VULCAN dependency.
Update SMC call parameters as per latest firmware.

v1:
Initial patch

Ganapatrao Kulkarni (2):
  perf, uncore: Adding documentation for ThunderX2 pmu uncore driver
  ThunderX2, perf : Add Cavium ThunderX2 SoC UNCORE PMU driver

 Documentation/perf/thunderx2-pmu.txt |  93 +++
 drivers/perf/Kconfig                 |   9 +
 drivers/perf/Makefile                |   1 +
 drivers/perf/thunderx2_pmu.c         | 861 +++
 include/linux/cpuhotplug.h           |   1 +
 5 files changed, 965 insertions(+)
 create mode 100644 Documentation/perf/thunderx2-pmu.txt
 create mode 100644 drivers/perf/thunderx2_pmu.c

-- 
2.18.0



[PATCH v8 1/2] perf, uncore: Adding documentation for ThunderX2 pmu uncore driver

2018-11-21 Thread Kulkarni, Ganapatrao
The SoC has PMU support in its L3 cache controller (L3C) and in the
DDR4 Memory Controller (DMC).

Signed-off-by: Ganapatrao Kulkarni 
---
 Documentation/perf/thunderx2-pmu.txt | 106 +++
 1 file changed, 106 insertions(+)
 create mode 100644 Documentation/perf/thunderx2-pmu.txt

diff --git a/Documentation/perf/thunderx2-pmu.txt b/Documentation/perf/thunderx2-pmu.txt
new file mode 100644
index ..9f5dd7459e68
--- /dev/null
+++ b/Documentation/perf/thunderx2-pmu.txt
@@ -0,0 +1,106 @@
+
+Cavium ThunderX2 SoC Performance Monitoring Unit (PMU UNCORE)
+=============================================================
+
+The ThunderX2 SoC PMU consists of independent, system-wide, per-socket PMUs
+such as the Level 3 Cache (L3C) and the DDR4 Memory Controller (DMC).
+
+The DMC has 8 interleaved channels and the L3C has 16 interleaved tiles. Events
+are sampled on the default channel (i.e. channel 0) and prorated to the total
+number of channels/tiles.
+
+Each PMU (DMC and L3C) supports up to 4 counters. Counters are independently
+programmable and can be started and stopped individually. Each counter can
+be set to sample specific perf events. Counters are 32 bits wide and do not
+support an overflow interrupt; they are sampled every 2 seconds.
+
+PMU UNCORE (perf) driver:
+
+The thunderx2-pmu driver registers several perf PMUs for DMC and L3C devices.
+Each of the PMUs provides a description of its available events
+and configuration options in sysfs.
+   see /sys/devices/uncore_
+
+S is socket id.
+Each PMU can be used to sample up to 4 events simultaneously.
+
+The "format" directory describes the format of the config (event ID).
+The "events" directory provides configuration templates for all
+supported event types that can be used with the perf tool.
+
+For example, "uncore_dmc_0/cnt_cycles/" is an
+equivalent of "uncore_dmc_0/config=0x1/".
+
+Each perf driver also provides a "cpumask" sysfs attribute, which contains a
+single CPU ID of the processor which is likely to be used to handle all the
+PMU events. It will be the first online CPU from the NUMA node of the PMU device.
+
+Example for perf tool use:
+
+perf stat -a -e uncore_dmc_0/cnt_cycles/ sleep 1
+
+perf stat -a -e \
+uncore_dmc_0/cnt_cycles/,\
+uncore_dmc_0/data_transfers/,\
+uncore_dmc_0/read_txns/,\
+uncore_dmc_0/write_txns/ sleep 1
+
+perf stat -a -e \
+uncore_l3c_0/read_request/,\
+uncore_l3c_0/read_hit/,\
+uncore_l3c_0/inv_request/,\
+uncore_l3c_0/inv_hit/ sleep 1
+
+The driver does not support sampling, therefore "perf record" will
+not work. Per-task (without "-a") perf sessions are not supported.
+
+L3C events:
+
+
+read_request:
+   Number of Read requests received by the L3 Cache.
+   This includes Reads as well as Read Exclusives.
+
+read_hit:
+   Number of Read requests received by the L3 Cache that hit
+   in the L3 (data provided from the L3).
+
+writeback_request:
+   Number of Write Backs received by the L3 Cache. These are basically
+   the L2 Evicts and writes from the PCIe Write Cache.
+
+inv_nwrite_request:
+   Number of Invalidate and Write requests received by the L3 Cache,
+   including Writes from IO that did not go through the PCIe Write Cache.
+
+inv_nwrite_hit:
+   Number of Invalidate and Write requests received by the L3 Cache
+   that were a hit in the L3 Cache.
+
+inv_request:
+   Number of Invalidate requests received by the L3 Cache.
+
+inv_hit:
+   Number of Invalidate requests received by the L3 Cache that were a
+   hit in L3.
+
+evict_request:
+   Number of Evicts that the L3 generated.
+
+NOTE:
+1. The granularity of all these event counter values is the cache line length (64 bytes).
+2. L3C cache Hit Ratio = (read_hit + inv_nwrite_hit + inv_hit) / (read_request + inv_nwrite_request + inv_request)
+
+DMC events:
+
+cnt_cycles:
+   Count cycles (clocks at the DMC clock rate).
+
+write_txns:
+   Number of 64-byte write transactions received by the DMC(s).
+
+read_txns:
+   Number of 64-byte read transactions received by the DMC(s).
+
+data_transfers:
+   Number of 64-byte data transfers to or from DRAM.
-- 
2.18.0
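
As a reading aid for the documentation patch above: the counters are 32 bits wide, have no overflow interrupt, and are read periodically, with channel 0 standing in for all channels or tiles. The following is a minimal sketch of that arithmetic, not the driver's code. The function names are hypothetical, and it assumes a counter wraps at most once between two reads.

```c
#include <stdint.h>

/* Channel and tile counts as stated in the documentation text. */
#define DMC_CHANNELS 8u
#define L3C_TILES    16u

/*
 * Delta between two reads of a free-running 32-bit counter.
 * Unsigned subtraction is modulo 2^32, so a single wrap between
 * the two reads needs no special case.
 */
static uint32_t counter_delta(uint32_t prev, uint32_t now)
{
	return now - prev;
}

/* Scale a channel-0 delta to the whole device (hypothetical helper). */
static uint64_t prorate(uint32_t delta, uint32_t channels)
{
	return (uint64_t)delta * channels;
}

/* Hit ratio as given in the NOTE; returns 0.0 when there were no requests. */
static double l3c_hit_ratio(uint64_t read_hit, uint64_t inv_nwrite_hit,
			    uint64_t inv_hit, uint64_t read_request,
			    uint64_t inv_nwrite_request, uint64_t inv_request)
{
	uint64_t hits = read_hit + inv_nwrite_hit + inv_hit;
	uint64_t reqs = read_request + inv_nwrite_request + inv_request;

	return reqs ? (double)hits / (double)reqs : 0.0;
}
```

The widening to 64 bits in prorate() is there because a 32-bit delta multiplied by 16 tiles no longer fits in 32 bits.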



[PATCH] arm_pmu: Delete incorrect cache event mapping for some armv8_pmuv3 events.

2018-10-01 Thread Kulkarni, Ganapatrao
The perf events L1-dcache-load-misses and L1-dcache-store-misses are mapped to
the armv8_pmuv3 (both DT and ACPI) event L1D_CACHE_REFILL. This is incorrect,
since L1D_CACHE_REFILL counts both load and store misses.
Similarly, the events L1-dcache-loads, L1-dcache-stores, dTLB-load-misses
and dTLB-loads are wrongly mapped. Hence, delete all these cache events
from the armv8_pmuv3 cache mapping.

Signed-off-by: Ganapatrao Kulkarni 
---
 arch/arm64/kernel/perf_event.c | 8 
 1 file changed, 8 deletions(-)

diff --git a/arch/arm64/kernel/perf_event.c b/arch/arm64/kernel/perf_event.c
index 33147aacdafd..6a67ad22d1eb 100644
--- a/arch/arm64/kernel/perf_event.c
+++ b/arch/arm64/kernel/perf_event.c
@@ -207,17 +207,9 @@ static const unsigned armv8_pmuv3_perf_cache_map[PERF_COUNT_HW_CACHE_MAX]
    [PERF_COUNT_HW_CACHE_RESULT_MAX] = {
    PERF_CACHE_MAP_ALL_UNSUPPORTED,
 
-   [C(L1D)][C(OP_READ)][C(RESULT_ACCESS)]  = ARMV8_PMUV3_PERFCTR_L1D_CACHE,
-   [C(L1D)][C(OP_READ)][C(RESULT_MISS)]    = ARMV8_PMUV3_PERFCTR_L1D_CACHE_REFILL,
-   [C(L1D)][C(OP_WRITE)][C(RESULT_ACCESS)] = ARMV8_PMUV3_PERFCTR_L1D_CACHE,
-   [C(L1D)][C(OP_WRITE)][C(RESULT_MISS)]   = ARMV8_PMUV3_PERFCTR_L1D_CACHE_REFILL,
-
    [C(L1I)][C(OP_READ)][C(RESULT_ACCESS)]  = ARMV8_PMUV3_PERFCTR_L1I_CACHE,
    [C(L1I)][C(OP_READ)][C(RESULT_MISS)]    = ARMV8_PMUV3_PERFCTR_L1I_CACHE_REFILL,
 
-   [C(DTLB)][C(OP_READ)][C(RESULT_MISS)]   = ARMV8_PMUV3_PERFCTR_L1D_TLB_REFILL,
-   [C(DTLB)][C(OP_READ)][C(RESULT_ACCESS)] = ARMV8_PMUV3_PERFCTR_L1D_TLB,
-
    [C(ITLB)][C(OP_READ)][C(RESULT_MISS)]   = ARMV8_PMUV3_PERFCTR_L1I_TLB_REFILL,
    [C(ITLB)][C(OP_READ)][C(RESULT_ACCESS)] = ARMV8_PMUV3_PERFCTR_L1I_TLB,
 
-- 
2.18.0
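
To illustrate the effect of the deletion: a combination that is absent from the map is treated as unsupported, so a request for L1-dcache-load-misses is rejected instead of silently counting store misses as well. The sketch below is a simplified, standalone model of such a lookup and is not the kernel's table or code. The enum, struct and function names are hypothetical. It contains only the L1I and ITLB mappings visible as context in the hunk above, and the event numbers are, to my understanding, the architectural ARMv8 PMUv3 ones.

```c
#include <errno.h>
#include <stddef.h>

enum cache_type   { L1D, L1I, DTLB, ITLB };
enum cache_op     { OP_READ, OP_WRITE };
enum cache_result { RESULT_ACCESS, RESULT_MISS };

struct cache_map_entry {
	enum cache_type type;
	enum cache_op op;
	enum cache_result result;
	int event;		/* ARMv8 PMUv3 event number */
};

/* L1I and ITLB entries only; the deleted L1D and DTLB entries are absent. */
static const struct cache_map_entry cache_map[] = {
	{ L1I,  OP_READ, RESULT_ACCESS, 0x14 },	/* L1I_CACHE */
	{ L1I,  OP_READ, RESULT_MISS,   0x01 },	/* L1I_CACHE_REFILL */
	{ ITLB, OP_READ, RESULT_MISS,   0x02 },	/* L1I_TLB_REFILL */
	{ ITLB, OP_READ, RESULT_ACCESS, 0x26 },	/* L1I_TLB */
};

/* Return the event number, or -ENOENT when the combination is unmapped. */
static int map_cache_event(enum cache_type type, enum cache_op op,
			   enum cache_result result)
{
	size_t i;

	for (i = 0; i < sizeof(cache_map) / sizeof(cache_map[0]); i++) {
		const struct cache_map_entry *e = &cache_map[i];

		if (e->type == type && e->op == op && e->result == result)
			return e->event;
	}
	return -ENOENT;
}
```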



RE: [PATCH v4 1/7] arm64: Add ftrace support

2014-02-25 Thread Kulkarni, Ganapatrao


From: AKASHI Takahiro 
Sent: Tuesday, February 25, 2014 2:53 PM
To: rost...@goodmis.org; fweis...@gmail.com; mi...@redhat.com; 
catalin.mari...@arm.com; will.dea...@arm.com; tim.b...@sonymobile.com
Cc: Kulkarni, Ganapatrao; dsax...@linaro.org; ar...@arndb.de; 
linux-arm-ker...@lists.infradead.org; linaro-ker...@lists.linaro.org; 
linux-kernel@vger.kernel.org; AKASHI Takahiro
Subject: [PATCH v4 1/7] arm64: Add ftrace support

This patch implements the arm64-specific part to support function tracers,
such as function (CONFIG_FUNCTION_TRACER), function_graph
(CONFIG_FUNCTION_GRAPH_TRACER) and function profiler
(CONFIG_FUNCTION_PROFILER).

With the 'function' tracer, all the functions in the kernel are traced with
timestamps in ${sysfs}/tracing/trace. If the function_graph tracer is
specified, a call graph is generated.

The kernel must be compiled with the -pg option so that a call to _mcount() is
inserted at the beginning of functions. This function is called on every
function's entry as long as tracing is enabled.
In addition, the function_graph tracer also needs to be able to probe a
function's exit. ftrace_graph_caller() & return_to_handler do this by faking
the link register's value to intercept the function's return path.

More details on architecture specific requirements are described in
Documentation/trace/ftrace-design.txt.
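
As a rough illustration of the "faking the link register" idea described above, here is a toy user-space model. It is not the code in entry-ftrace.S or ftrace.c: hook_entry(), hook_exit(), the shadow stack and the RETURN_TO_HANDLER value are all hypothetical stand-ins, and plain integers play the role of addresses.

```c
#include <stddef.h>
#include <stdint.h>

#define SHADOW_DEPTH 16

/* Stand-in for the address of return_to_handler (made-up value). */
#define RETURN_TO_HANDLER ((uintptr_t)0xdeadbeefu)

static uintptr_t shadow_stack[SHADOW_DEPTH];
static size_t shadow_top;

/*
 * On function entry: save the real return address and replace it
 * with the handler's address. Returns 0 on success, or -1 if the
 * shadow stack is full, in which case the return address is left
 * untouched and the function simply is not traced on exit.
 */
static int hook_entry(uintptr_t *link_register)
{
	if (shadow_top == SHADOW_DEPTH)
		return -1;
	shadow_stack[shadow_top++] = *link_register;
	*link_register = RETURN_TO_HANDLER;
	return 0;
}

/*
 * In the handler, on function exit: recover the real return address
 * so execution can continue in the original caller. Must only be
 * called after a matching successful hook_entry().
 */
static uintptr_t hook_exit(void)
{
	return shadow_stack[--shadow_top];
}
```

Nested calls unwind in last-in, first-out order, which is why a stack is enough to pair each exit with its entry.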

Signed-off-by: AKASHI Takahiro 
---
 arch/arm64/Kconfig               |    2 +
 arch/arm64/include/asm/ftrace.h  |   23 +
 arch/arm64/kernel/Makefile       |    4 +
 arch/arm64/kernel/arm64ksyms.c   |    4 +
 arch/arm64/kernel/entry-ftrace.S |  173 ++
 arch/arm64/kernel/ftrace.c       |   64 ++
 6 files changed, 270 insertions(+)
 create mode 100644 arch/arm64/include/asm/ftrace.h
 create mode 100644 arch/arm64/kernel/entry-ftrace.S
 create mode 100644 arch/arm64/kernel/ftrace.c

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 27bbcfc..5783641 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -33,6 +33,8 @@ config ARM64
select HAVE_DMA_ATTRS
select HAVE_DMA_CONTIGUOUS
select HAVE_EFFICIENT_UNALIGNED_ACCESS
+   select HAVE_FUNCTION_TRACER
+   select HAVE_FUNCTION_GRAPH_TRACER
select HAVE_GENERIC_DMA_COHERENT
select HAVE_HW_BREAKPOINT if PERF_EVENTS
select HAVE_MEMBLOCK
diff --git a/arch/arm64/include/asm/ftrace.h b/arch/arm64/include/asm/ftrace.h
new file mode 100644
index 000..58ea595
--- /dev/null
+++ b/arch/arm64/include/asm/ftrace.h
@@ -0,0 +1,23 @@
+/*
+ * arch/arm64/include/asm/ftrace.h
+ *
+ * Copyright (C) 2013 Linaro Limited
+ * Author: AKASHI Takahiro 
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+#ifndef __ASM_FTRACE_H
+#define __ASM_FTRACE_H
+
+#include 
+
+#define MCOUNT_ADDR        ((unsigned long)_mcount)
+#define MCOUNT_INSN_SIZE   AARCH64_INSN_SIZE
+
+#ifndef __ASSEMBLY__
+extern void _mcount(unsigned long);
+#endif /* __ASSEMBLY__ */
+
+#endif /* __ASM_FTRACE_H */
diff --git a/arch/arm64/kernel/Makefile b/arch/arm64/kernel/Makefile
index 2d4554b..ac67fd0 100644
--- a/arch/arm64/kernel/Makefile
+++ b/arch/arm64/kernel/Makefile
@@ -5,6 +5,9 @@
 CPPFLAGS_vmlinux.lds   := -DTEXT_OFFSET=$(TEXT_OFFSET)
 AFLAGS_head.o  := -DTEXT_OFFSET=$(TEXT_OFFSET)

+CFLAGS_REMOVE_ftrace.o = -pg
+CFLAGS_REMOVE_insn.o = -pg
+
 # Object file lists.
 arm64-obj-y   := cputable.o debug-monitors.o entry.o irq.o fpsimd.o   \
   entry-fpsimd.o process.o ptrace.o setup.o signal.o   \
@@ -13,6 +16,7 @@ arm64-obj-y   := cputable.o debug-monitors.o entry.o irq.o fpsimd.o   \

 arm64-obj-$(CONFIG_COMPAT) += sys32.o kuser32.o signal32.o \
   sys_compat.o
+arm64-obj-$(CONFIG_FUNCTION_TRACER)+= ftrace.o entry-ftrace.o
 arm64-obj-$(CONFIG_MODULES)+= arm64ksyms.o module.o
 arm64-obj-$(CONFIG_SMP)+= smp.o smp_spin_table.o
 arm64-obj-$(CONFIG_HW_PERF_EVENTS) += perf_event.o
diff --git a/arch/arm64/kernel/arm64ksyms.c b/arch/arm64/kernel/arm64ksyms.c
index 338b568..7f0512f 100644
--- a/arch/arm64/kernel/arm64ksyms.c
+++ b/arch/arm64/kernel/arm64ksyms.c
@@ -56,3 +56,7 @@ EXPORT_SYMBOL(clear_bit);
 EXPORT_SYMBOL(test_and_clear_bit);
 EXPORT_SYMBOL(change_bit);
 EXPORT_SYMBOL(test_and_change_bit);
+
+#ifdef CONFIG_FUNCTION_TRACER
+EXPORT_SYMBOL(_mcount);
+#endif
diff --git a/arch/arm64/kernel/entry-ftrace.S b/arch/arm64/kernel/entry-ftrace.S
new file mode 100644
index 000..2e8162e
--- /dev/null
+++ b/arch/arm64/kernel/entry-ftrace.S
@@ -0,0 +1,173 @@
+/*
+ * arch/arm64/kernel/entry-ftrace.S
+ *
+ * Copyright (C) 2013 Linaro Limited
+ * Author: AKASHI Takahiro 
+ *
+ * This program is free software; you can redistribute it an

RE: [PATCH v3 0/6] arm64: Add ftrace support

2014-02-10 Thread Kulkarni, Ganapatrao
Looks OK to me.

Reviewed-by: Ganapatrao Kulkarni 

regards,
Ganapat


From: AKASHI Takahiro 
Sent: Friday, February 7, 2014 3:48 PM
To: rost...@goodmis.org; fweis...@gmail.com; mi...@redhat.com; 
catalin.mari...@arm.com; will.dea...@arm.com; Kulkarni, Ganapatrao; 
tim.b...@sonymobile.com
Cc: ar...@arndb.de; linux-arm-ker...@lists.infradead.org; 
linaro-ker...@lists.linaro.org; linux-kernel@vger.kernel.org; 
patc...@linaro.org; AKASHI Takahiro
Subject: [PATCH v3 0/6] arm64: Add ftrace support

This is the third version of my patchset for ftrace support.
There was another implementation from Cavium Networks, but both of us agreed
to use my patchset as the future base. He is supposed to review this code, too.

The only issue that I had some concern about was the "fault protection" code
in prepare_ftrace_return(). After discussions with Steven and Tim (as author
of arm ftrace), I removed that code since I'm not quite sure about the
possibility of "fault" occurrences in this function.

The code is tested on ARMv8 Fast Model with the following tracers & events:
 function tracer with dynamic ftrace
 function graph tracer with dynamic ftrace
 syscall tracepoint
 irqsoff & preemptirqsoff (which use CALLER_ADDRx)
and also verified with in-kernel tests, FTRACE_SELFTEST, FTRACE_STARTUP_TEST
and EVENT_TRACE_TEST_SYSCALLS.

Please be careful:
* elf.h on cross-build host must have AArch64 definitions, EM_AARCH64 and
  R_AARCH64_ABS64, to compile recordmcount utility. See [4/6].
  [4/6] also gets warnings from checkpatch, but they are based on the
  original's coding style.
* This patch may conflict with my audit patch because both change the same
  location in syscall_trace(). I expect the functions to be called in this
  order:
  On entry,
 * tracehook_report_syscall(ENTER)
 * trace_sys_enter()
 * audit_syscall_entry()
  On exit,
     * audit_syscall_exit()
 * trace_sys_exit()
 * tracehook_report_syscall(EXIT)
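
The expected ordering listed above can be sketched as follows. This is only a model with stub functions that record their names. The wrapper names syscall_entry_hooks() and syscall_exit_hooks() are hypothetical, and nothing here is the actual syscall_trace() code.

```c
#include <string.h>

static char call_log[256];

/* Stub body: append the hook's name to the log instead of doing work. */
static void record(const char *name)
{
	strncat(call_log, name, sizeof(call_log) - strlen(call_log) - 1);
	strncat(call_log, ";", sizeof(call_log) - strlen(call_log) - 1);
}

/* Order expected on syscall entry. */
static void syscall_entry_hooks(void)
{
	record("tracehook_report_syscall(ENTER)");
	record("trace_sys_enter");
	record("audit_syscall_entry");
}

/* Order expected on syscall exit: the mirror image of the entry path. */
static void syscall_exit_hooks(void)
{
	record("audit_syscall_exit");
	record("trace_sys_exit");
	record("tracehook_report_syscall(EXIT)");
}
```

The exit path runs the hooks in the reverse of the entry order, so the hook that ran last on entry is the first to see the exit.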

Changes from v1 to v2:
* split one patch into several pieces for easier review
  (especially function tracer + dynamic ftrace + CALLER_ADDRx)
* put return_address() in a separate file
* renamed __mcount to _mcount (it was my mistake)
* changed stackframe handling to get parent's frame pointer
* removed ARCH_SUPPORTS_FTRACE_OPS
* switched to "hotpatch" interfaces from Huawei
* revised descriptions in comments

Changes from v2 to v3:
* optimized register usages in asm (by not saving x0, x1, and x2)
* removed "fault protection" code in prepare_ftrace_return()
* rewrote ftrace_modify_code() using "hotpatch" interfaces
* revised descriptions in comments

AKASHI Takahiro (6):
  arm64: Add ftrace support
  arm64: ftrace: Add dynamic ftrace support
  arm64: ftrace: Add CALLER_ADDRx macros
  ftrace: Add arm64 support to recordmcount
  arm64: ftrace: Add system call tracepoint
  arm64: Add 'notrace' attribute to unwind_frame() for ftrace

 arch/arm64/Kconfig                 |    6 +
 arch/arm64/include/asm/ftrace.h    |   54 +
 arch/arm64/include/asm/syscall.h   |    1 +
 arch/arm64/include/asm/unistd.h    |    2 +
 arch/arm64/kernel/Makefile         |    9 +-
 arch/arm64/kernel/arm64ksyms.c     |    4 +
 arch/arm64/kernel/entry-ftrace.S   |  215
 arch/arm64/kernel/ftrace.c         |  177 +
 arch/arm64/kernel/ptrace.c         |    5 +
 arch/arm64/kernel/return_address.c |   55 +
 arch/arm64/kernel/stacktrace.c     |    2 +-
 scripts/recordmcount.c             |    4 +
 scripts/recordmcount.pl            |    5 +
 13 files changed, 537 insertions(+), 2 deletions(-)
 create mode 100644 arch/arm64/include/asm/ftrace.h
 create mode 100644 arch/arm64/kernel/entry-ftrace.S
 create mode 100644 arch/arm64/kernel/ftrace.c
 create mode 100644 arch/arm64/kernel/return_address.c

--
1.7.9.5
