[PATCH 57/57] perf c2c: Add man page and credits

2016-09-22 Thread Jiri Olsa
Add a man page for the c2c command and credits
to the builtin-c2c.c file.

Link: http://lkml.kernel.org/n/tip-twbp391v8v9f5idp584hl...@git.kernel.org
Signed-off-by: Jiri Olsa 
---
 tools/perf/Documentation/perf-c2c.txt | 276 ++
 tools/perf/builtin-c2c.c  |  11 ++
 2 files changed, 287 insertions(+)
 create mode 100644 tools/perf/Documentation/perf-c2c.txt

diff --git a/tools/perf/Documentation/perf-c2c.txt b/tools/perf/Documentation/perf-c2c.txt
new file mode 100644
index ..ba2f4de399c3
--- /dev/null
+++ b/tools/perf/Documentation/perf-c2c.txt
@@ -0,0 +1,276 @@
+perf-c2c(1)
+===========
+
+NAME
+----
+perf-c2c - Shared Data C2C/HITM Analyzer.
+
+SYNOPSIS
+--------
+[verse]
+'perf c2c record' [<options>] <command>
+'perf c2c record' [<options>] -- [<record command options>] <command>
+'perf c2c report' [<options>]
+
+DESCRIPTION
+-----------
+C2C stands for Cache To Cache.
+
+The perf c2c tool provides a means for Shared Data C2C/HITM analysis. It allows
+you to track down cacheline contention.
+
+The tool is based on x86's load latency and precise store facility events
+provided by Intel CPUs. These events provide:
+  - memory address of the access
+  - type of the access (load and store details)
+  - latency (in cycles) of the load access
+
+The c2c tool provides a means to record this data and to report access details
+for the cachelines with the highest contention, that is, the highest number of
+HITM accesses.
+
+The basic workflow follows the standard record/report pattern: use the record
+command to record event data and the report command to display it.
+
+
+RECORD OPTIONS
+--------------
+-e::
+--event=::
+   Select the PMU event. Use 'perf mem record -e list'
+   to list available events.
+
+-v::
+--verbose::
+   Be more verbose (show counter open errors, etc).
+
+-l::
+--ldlat::
+   Configure mem-loads latency.
+
+-k::
+--all-kernel::
+   Configure all used events to run in kernel space.
+
+-u::
+--all-user::
+   Configure all used events to run in user space.
+
+REPORT OPTIONS
+--------------
+-k::
+--vmlinux=::
+   vmlinux pathname
+
+-v::
+--verbose::
+   Be more verbose (show counter open errors, etc).
+
+-i::
+--input::
+   Specify the input file to process.
+
+-N::
+--node-info::
+   Show extra node info in report (see NODE INFO section)
+
+-c::
+--coalesce::
+   Specify sorting fields for single cacheline display.
+   The following fields are available: tid,pid,iaddr,dso
+   (see COALESCE)
+
+-g::
+--call-graph::
+   Set up callchain parameters.
+   Please refer to perf-report man page for details.
+
+--stdio::
+   Force the stdio output (see STDIO OUTPUT)
+
+--stats::
+   Display only statistic tables and force stdio mode.
+
+--full-symbols::
+   Display full length of symbols.
+
+C2C RECORD
+----------
+The perf c2c record command sets up options related to HITM cacheline analysis
+and calls the standard perf record command.
+
+The following perf record options are configured by default
+(check the perf record man page for details):
+
+  -W,-d,--sample-cpu
+
+Unless specified otherwise with the '-e' option, the following events are
+monitored by default:
+
+  cpu/mem-loads,ldlat=30/P
+  cpu/mem-stores/P
+
+Any 'perf record' option can be passed after the '--' mark, for example to
+enable callchains and system-wide monitoring:
+
+  $ perf c2c record -- -g -a
+
+Please check the RECORD OPTIONS section for c2c-specific record options.
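
The defaults described above can be pictured as a plain 'perf record' command line. The following Python sketch is illustrative only: the helper name `c2c_record_argv`, its parameters, and the ordering of the arguments are assumptions, not how builtin-c2c.c assembles its options.

```python
import shlex

# Options the text says are configured by default.
DEFAULT_RECORD_OPTS = ["-W", "-d", "--sample-cpu"]


def c2c_record_argv(command, extra_record_opts=(), events=None, ldlat=30):
    """Assemble a 'perf record' argv matching the documented defaults.

    command           -- the workload to profile, as a list of strings
    extra_record_opts -- options the user passed after the '--' mark
    events            -- explicit event list, standing in for '-e'
    ldlat             -- mem-loads latency, standing in for '-l'
    """
    if events is None:
        # Default events named in the text.
        events = [f"cpu/mem-loads,ldlat={ldlat}/P", "cpu/mem-stores/P"]
    argv = ["perf", "record", *DEFAULT_RECORD_OPTS]
    for event in events:
        argv += ["-e", event]
    argv += list(extra_record_opts)
    argv += list(command)
    return argv


if __name__ == "__main__":
    # Hypothetical workload './app', with callchains and system-wide mode.
    print(shlex.join(c2c_record_argv(["./app"], extra_record_opts=["-g", "-a"])))
```

Printing the list instead of executing it keeps the sketch usable on machines without perf installed.
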
+
+C2C REPORT
+----------
+The perf c2c report command displays shared data analysis.  It comes in two
+display modes: stdio and tui (default).
+
+The report command workflow is as follows:
+  - sort all the data based on the cacheline address
+  - store access details for each cacheline
+  - sort all cachelines based on user settings
+  - display data
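
The workflow steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the tool's implementation: the 64-byte cacheline size, the `(address, kind)` sample format, the kind labels, and the default sort key are all hypothetical.

```python
from collections import defaultdict

CACHELINE_SIZE = 64  # assumption: 64-byte cachelines


def cacheline_of(addr, size=CACHELINE_SIZE):
    """Return the base address of the cacheline containing addr."""
    return addr & ~(size - 1)


def build_report(samples, sort_key="rmt_hitm"):
    """Group samples by cacheline and order cachelines by one counter.

    samples is an iterable of (address, kind) pairs, where kind is one of
    the hypothetical labels "rmt_hitm", "lcl_hitm", "load" or "store".
    """
    lines = defaultdict(lambda: defaultdict(int))
    for addr, kind in samples:
        stats = lines[cacheline_of(addr)]  # step 1: key by cacheline address
        stats["total"] += 1                # step 2: store access details
        stats[kind] += 1
    # Step 3: highest counter first; ties are broken by address.
    return sorted(lines.items(), key=lambda item: (-item[1][sort_key], item[0]))


if __name__ == "__main__":
    demo = [(0x1008, "rmt_hitm"), (0x1030, "rmt_hitm"),
            (0x2000, "store"), (0x2010, "rmt_hitm")]
    # Step 4: display index, cacheline address, HITM count, total records.
    for index, (line, stats) in enumerate(build_report(demo)):
        print(index, hex(line), stats["rmt_hitm"], stats["total"])
```

Masking off the low address bits is what makes accesses to different offsets land in the same bucket, which is the situation the report is meant to expose.
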
+
+In general, the perf c2c report output consists of two basic views:
+  1) most expensive cachelines list
+  2) offsets details for each cacheline
+
+For each cacheline in the 1) list we display the following data
+(both stdio and TUI modes show the same fields):
+
+  Index
+  - zero based index to identify the cacheline
+
+  Cacheline
+  - cacheline address (hex number)
+
+  Total records
+  - sum of all accesses to the cacheline
+
+  Rmt/Lcl Hitm
+  - cacheline percentage of all Remote/Local HITM accesses
+
+  LLC Load Hitm - Total, Lcl, Rmt
+  - count of Total/Local/Remote load HITMs
+
+  Store Reference - Total, L1Hit, L1Miss
+    Total  - all store accesses
+    L1Hit  - store accesses that hit L1
+    L1Miss - store accesses that missed L1
+
+  Load Dram
+  - count of local and remote DRAM accesses
+
+  LLC Ld Miss
+  - count of all accesses that missed LLC
+
+  Total Loads
+  - sum of all load accesses
+
+  Core Load Hit - FB, L1, L2
+  - count of load hits in FB (Fill Buffer), L1 and L2 cache
+
+  LLC Load Hit - Llc, Rmt
+  - count of LLC and Remote load hits
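
As a worked example of the "Rmt/Lcl Hitm" field, the sketch below reads it as each cacheline's share of all HITM accesses of one type. That reading, the function name, and the input shape (cacheline address mapped to a HITM count) are assumptions made for illustration.

```python
def hitm_percentages(hitm_per_line):
    """Map each cacheline to its percentage of all HITM accesses.

    hitm_per_line maps a cacheline address to its HITM count, for either
    the remote or the local HITM type.
    """
    total = sum(hitm_per_line.values())
    if total == 0:
        # No HITM accesses recorded: every share is zero.
        return {line: 0.0 for line in hitm_per_line}
    return {line: 100.0 * count / total
            for line, count in hitm_per_line.items()}


if __name__ == "__main__":
    shares = hitm_percentages({0x1000: 3, 0x2000: 1})
    for line, pct in shares.items():
        print(hex(line), f"{pct:.2f}%")
```
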
+
+For each offset in the 2) list we display the following data:
+
+  HITM - Rmt, Lcl
+  - % of Remote/Local HITM accesses for given offset within