[GIT PULL 00/30] perf/core improvements and fixes
Hi Ingo,

  Please consider pulling,

Best regards,

- Arnaldo

Test results at the end of this message, as usual.

The following changes since commit b339da480315505aa28a723a983217ebcff95c86:

  Merge tag 'perf-core-for-mingo-5.1-20190307' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/urgent (2019-03-09 17:00:17 +0100)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git tags/perf-core-for-mingo-5.1-20190311

for you to fetch changes up to dfcbc2f2994b8a3af3605a26dc29c07ad7378bf4:

  tools lib bpf: Fix the build by adding a missing stdarg.h include (2019-03-11 17:14:31 -0300)

perf/core improvements and fixes:

kernel:

  Stephane Eranian:

  - Restore mmap record type correctly when handling PERF_RECORD_MMAP2
    events. The same template is used for all the threads interested in
    mmap events: some may want just PERF_RECORD_MMAP, while others may
    want the extra info in MMAP2 records.

perf probe:

  Adrian Hunter:

  - Fix getting the kernel map, because since the changes related to x86
    PTI entry trampolines handling, there is more than one kernel map.

perf script:

  Andi Kleen:

  - Support insn output for normal samples, i.e.:

      perf script -F ip,sym,insn --xed

    Will fetch the sample IP from the thread address space and feed it to
    Intel's XED disassembler, producing lines such as:

      a4068804 native_write_msr wrmsr
      a415b95e __hrtimer_next_event_base movq 0x18(%rax), %rdx

    That matches 'perf annotate's output.

  - Make the --cpu filter apply to PERF_RECORD_COMM/FORK/... events, in
    addition to PERF_RECORD_SAMPLE.

perf report:

  - Add a new --samples option to save a small random number of samples
    per hist entry, using a reservoir technique to select a
    representative number of samples.

    Then allow browsing the samples using 'perf script' as part of the
    hist entry context menu. This automatically adds the right filters,
    so only the thread or CPU of the sample is displayed. Then we use
    less' search functionality to directly jump to the time stamp of the
    selected sample.

    It uses different menus for assembler and source display. Assembler
    needs xed installed and source needs debuginfo.

  - Fix the UI browser scripts pop up menu when there are many scripts
    available.

perf report:

  Andi Kleen:

  - Add 'time' sort option. E.g.:

    % perf report --sort time,overhead,symbol --time-quantum 1ms --stdio
    ...
         0.67%  277061.87300  [.] _dl_start
         0.50%  277061.87300  [.] f1
         0.50%  277061.87300  [.] f2
         0.33%  277061.87300  [.] main
         0.29%  277061.87300  [.] _dl_lookup_symbol_x
         0.29%  277061.87300  [.] dl_main
         0.29%  277061.87300  [.] do_lookup_x
         0.17%  277061.87300  [.] _dl_debug_initialize
         0.17%  277061.87300  [.] _dl_init_paths
         0.08%  277061.87300  [.] check_match
         0.04%  277061.87300  [.] _dl_count_modids
         1.33%  277061.87400  [.] f1
         1.33%  277061.87400  [.] f2
         1.33%  277061.87400  [.] main
         1.17%  277061.87500  [.] main
         1.08%  277061.87500  [.] f1
         1.08%  277061.87500  [.] f2
         1.00%  277061.87600  [.] main
         0.83%  277061.87600  [.] f1
         0.83%  277061.87600  [.] f2
         1.00%  277061.87700  [.] main

tools headers:

  Arnaldo Carvalho de Melo:

  - Update x86's syscall_64.tbl, no change in tools/perf behaviour.

  - Sync copies of asm-generic/unistd.h and linux/in with the kernel sources.

perf data:

  Jiri Olsa:

  - Prep work to support having perf.data stored as a directory, with one
    file per CPU, that ultimately will allow having one ring buffer
    reading thread per CPU.

Vendor events:

  Martin Liška:

  - perf PMU events for AMD Family 17h.

perf script python:

  Tony Jones:

  - Add python3 support for the remaining Intel PT related scripts. With
    these we should have a clean build of perf with python3 while still
    supporting the build with python2.

libbpf:

  Arnaldo Carvalho de Melo:

  - Fix the build on uClibc, adding the missing stdarg.h since we use
    va_list in one typedef.
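
The --samples item above mentions a reservoir technique for keeping a small random set of samples per hist entry. As a rough illustration only, here is classic reservoir sampling ("Algorithm R") in Python. This is not perf's C implementation, and the name `reservoir_sample` is hypothetical:

```python
import random


def reservoir_sample(stream, k, rng=None):
    """Keep a uniform random sample of at most k items from an iterable.

    Illustrative sketch of "Algorithm R"; not the code perf uses.
    """
    rng = rng or random.Random()
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            # The first k items fill the reservoir directly.
            reservoir.append(item)
        else:
            # Item i replaces a random slot with probability k / (i + 1).
            j = rng.randint(0, i)
            if j < k:
                reservoir[j] = item
    return reservoir
```

The point of the technique is that the stream length does not need to be known in advance and memory stays bounded by k, which fits the "small number of samples per hist entry" goal described above.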
Signed-off-by: Arnaldo Carvalho de Melo

Adrian Hunter (1):
      perf probe: Fix getting the kernel map

Andi Kleen (14):
      perf script: Support insn output for normal samples
      perf report: Support output in nanoseconds
      perf time-utils: Add utility function to print time stamps in nanoseconds
      perf report: Parse time quantum
      perf report: Use less for scripts output
      perf script: Filter COMM/FORK/.. events by CPU
      perf report: Support time sort key
      perf report: Support running scripts for current time range
      perf report: Support builtin perf script in scripts menu
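
The 'time' sort key combined with --time-quantum groups samples into fixed-width time buckets. A minimal Python sketch of that idea, assuming timestamps are in nanoseconds and that a bucket is the timestamp rounded down to a multiple of the quantum (an assumption, not taken from the perf sources); all names here are hypothetical:

```python
from collections import Counter

NSEC_PER_MSEC = 1_000_000
NSEC_PER_SEC = 1_000_000_000


def quantize(timestamp_ns, quantum_ns=NSEC_PER_MSEC):
    """Round a timestamp down to the start of its quantum (assumed rule)."""
    return (timestamp_ns // quantum_ns) * quantum_ns


def format_time(timestamp_ns):
    """Render as seconds with five decimals, like '277061.87300'."""
    secs, nsecs = divmod(timestamp_ns, NSEC_PER_SEC)
    return f"{secs}.{nsecs // 10_000:05d}"


def bucket_counts(samples, quantum_ns=NSEC_PER_MSEC):
    """Count (time bucket, symbol) pairs for (timestamp_ns, symbol) samples."""
    counts = Counter()
    for timestamp_ns, symbol in samples:
        counts[(format_time(quantize(timestamp_ns, quantum_ns)), symbol)] += 1
    return counts
```

With a 1ms quantum, two samples of f1 taken 0.8ms apart inside the same millisecond land in one row, which is the grouping visible in the report output quoted earlier in this message.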
[GIT PULL 00/30] perf/core improvements and fixes
Hi Ingo,

  Please consider pulling,

- Arnaldo

Test results at the end of this message, as usual.

The following changes since commit 8e70e8409102a37ab066bd91007b75fd5d113931:

  Merge tag 'perf-core-for-mingo-4.13-20170621' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core (2017-06-21 20:11:53 +0200)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git tags/perf-core-for-mingo-4.13-20170630

for you to fetch changes up to 644e0840ad4615e032d67adec6ee60f821b669fe:

  perf auxtrace: Add CPU filter support (2017-06-30 11:50:55 -0300)

perf/core improvements and fixes:

Intel PT:

- Support the "ptwrite" instruction, a way to stuff 32 or 64 bit values
  into the Intel PT trace (Adrian Hunter)

- Support power events in Intel PT to report changes to C-state (Adrian
  Hunter)

- Synthesize Intel PT events as PERF_RECORD_SAMPLE records with a
  perf_event_attr.type (PERF_TYPE_SYNTH) just after the range used by
  the kernel, i.e. right after what is allocated for PMUs, at
  INT_MAX + 1U. attr.config will have the identification for the
  synthesized event and the PERF_SAMPLE_RAW payload will have its
  fields (Adrian Hunter)

Infrastructure:

- Remove warning() and error(), using instead pr_warning() and
  pr_err(), consolidating error reporting (Arnaldo Carvalho de Melo)

- Add platform dependency to 'perf test 15' (Thomas Richter)

Signed-off-by: Arnaldo Carvalho de Melo

Adrian Hunter (19):
      x86/insn: perf tools: Add new ptwrite instruction
      perf script: Add 'synth' event type for synthesized events
      tools include: Add byte-swapping macros to kernel.h
      perf auxtrace: Add itrace option to output ptwrite events
      perf auxtrace: Add itrace option to output power events
      perf script: Add 'synth' field for synthesized event payloads
      perf script: Add synthesized Intel PT power and ptwrite events
      perf intel-pt: Factor out common code synthesizing event samples
      perf intel-pt: Remove unused instructions_sample_period
      perf intel-pt: Join needlessly wrapped lines
      perf intel-pt: Tidy Intel PT evsel lookup into separate function
      perf intel-pt: Tidy messages into called function intel_pt_synth_event()
      perf intel-pt: Factor out intel_pt_set_event_name()
      perf intel-pt: Move code in intel_pt_synth_events() to simplify attr setting
      perf intel-pt: Synthesize new power and "ptwrite" events
      perf intel-pt: Add example script for power events and PTWRITE
      perf intel-pt: Update documentation to include new ptwrite and power events
      perf intel-pt: Do not use TSC packets for calculating CPU cycles to TSC
      perf auxtrace: Add CPU filter support

Arnaldo Carvalho de Melo (9):
      perf help: Introduce exec_failed() to avoid code duplication
      perf help: Elliminate dup code for reporting
      perf help: Use pr_warning()
      perf config: Use pr_warning()
      perf event-parse: Use pr_warning()
      perf tools: Remove warning()
      perf tools: Replace error() with pr_err()
      perf config: Do not die when parsing u64 or int config values
      perf tools: Kill die()

Colin Ian King (1):
      perf jit: fix typo: "incalid" -> "invalid"

Thomas Richter (1):
      perf tests: Add platform dependency to test 15

 arch/x86/lib/x86-opcode-map.txt                    |   2 +-
 tools/include/linux/kernel.h                       |  35 +-
 tools/objtool/arch/x86/insn/x86-opcode-map.txt     |   2 +-
 tools/perf/Documentation/intel-pt.txt              |  42 +-
 tools/perf/Documentation/itrace.txt                |   8 +-
 tools/perf/Documentation/perf-script.txt           |   6 +-
 tools/perf/arch/x86/tests/insn-x86-dat-32.c        |  12 +
 tools/perf/arch/x86/tests/insn-x86-dat-64.c        |  30 +
 tools/perf/arch/x86/tests/insn-x86-dat-src.c       |  30 +
 tools/perf/builtin-c2c.c                           |   4 +-
 tools/perf/builtin-diff.c                          |   5 +-
 tools/perf/builtin-help.c                          |  48 +-
 tools/perf/builtin-kmem.c                          |   4 +-
 tools/perf/builtin-record.c                        |   4 +-
 tools/perf/builtin-report.c                        |   8 +-
 tools/perf/builtin-sched.c                         |   2 +-
 tools/perf/builtin-script.c                        | 205 ++-
 tools/perf/builtin-stat.c                          |   4 +-
 tools/perf/builtin-top.c                           |   2 +-
 tools/perf/jvmti/jvmti_agent.c                     |   2 +-
 .../perf/scripts/python/bin/intel-pt-events-record |  13 +
 .../perf/scripts/python/bin/intel-pt-events-report |   3 +
 tools/perf/scripts/python/intel-pt-events.py       | 128 +
 tools/perf/tests/attr.c                            |  10 +-
 tools/perf/tests/attr.py                           |  48 ++
 tools
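
The summary above places PERF_TYPE_SYNTH at INT_MAX + 1U, just past the range the kernel allocates for PMU types. A small Python sketch of that arithmetic, assuming a 32-bit C int (the helper name `is_synth_type` is hypothetical and only illustrates the comparison):

```python
INT_MAX = 2**31 - 1  # assuming a 32-bit C int

# INT_MAX + 1U is evaluated as unsigned 32-bit arithmetic in C.
PERF_TYPE_SYNTH = (INT_MAX + 1) & 0xFFFFFFFF


def is_synth_type(attr_type):
    """True if a perf_event_attr.type value is the synthesized-event type."""
    return attr_type == PERF_TYPE_SYNTH
```

Under that assumption the value is 0x80000000, so any type the kernel hands out (at most INT_MAX) can never collide with it.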
[GIT PULL 00/30] perf/core improvements and fixes
From: Arnaldo Carvalho de Melo

Hi Ingo,

  Please consider pulling,

- Arnaldo

The following changes since commit 67d61296ffcc850bffdd4466430cb91e5328f39a:

  Merge tag 'perf-core-for-mingo-20160419' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux (2016-04-23 14:50:39 +0200)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git tags/perf-core-for-mingo-20160427

for you to fetch changes up to 4cb93446c587d56e2a54f4f83113daba2c0b6dee:

  perf tools: Set the maximum allowed stack from /proc/sys/kernel/perf_event_max_stack (2016-04-27 10:29:07 -0300)

perf/core improvements and fixes:

User visible:

- perf trace --pf maj/min/all works with --call-graph: (Arnaldo Carvalho de Melo)

  Tracing write syscalls and major page faults with callchains while
  starting firefox, limiting the stack to 5 frames:

  # perf trace -e write --pf maj --max-stack 5 firefox
     589.549 ( 0.014 ms): firefox/15377 write(fd: 4, buf: 0x7fff80acc898, count: 151) = 151
                                        [0xfaed] (/usr/lib64/libpthread-2.22.so)
                                        fire_glxtest_process+0x5c (/usr/lib64/firefox/libxul.so)
                                        InstallGdkErrorHandler+0x41 (/usr/lib64/firefox/libxul.so)
                                        XREMain::XRE_mainInit+0x12c (/usr/lib64/firefox/libxul.so)
                                        XREMain::XRE_main+0x1e4 (/usr/lib64/firefox/libxul.so)
     760.704 ( 0.000 ms): firefox/15332 majfault [gtk_tree_view_accessible_get_type+0x0] => /usr/lib64/libgtk-3.so.0.1800.9@0xa0850 (x.)
                                        gtk_tree_view_accessible_get_type+0x0 (/usr/lib64/libgtk-3.so.0.1800.9)
                                        gtk_tree_view_class_intern_init+0x1a54 (/usr/lib64/libgtk-3.so.0.1800.9)
                                        g_type_class_ref+0x6dd (/usr/lib64/libgobject-2.0.so.0.4600.2)
                                        [0x115378] (/usr/lib64/libgnutls.so.30.6.3)

  This automagically selects "--call-graph dwarf". Use "--call-graph fp"
  on systems where -fno-omit-frame-pointer was used to build the
  components of interest, to incur less overhead, or tune
  "--call-graph dwarf" appropriately; see 'perf record --help'.

- Allow setting the maximum stack depth via
  /proc/sys/kernel/perf_event_max_stack, which defaults to the old
  hard-coded value of PERF_MAX_STACK_DEPTH (127). This is useful for huge
  callstacks for things like Groovy, Ruby, etc, and also to reduce
  overhead by limiting it to a smaller value. Upcoming work will allow
  this to be done per-event (Arnaldo Carvalho de Melo)

- Make 'perf trace --min-stack' be honoured by --pf and --event (Arnaldo Carvalho de Melo)

- Make 'perf evlist -v' decode perf_event_attr->branch_sample_type (Arnaldo Carvalho de Melo)

  # perf record --call lbr usleep 1
  # perf evlist -v
  cycles:ppp: ... sample_type: IP|TID|TIME|CALLCHAIN|PERIOD|BRANCH_STACK, ...
              branch_sample_type: USER|CALL_STACK|NO_FLAGS|NO_CYCLES
  #

- Clear dummy entry accumulated period, fixing such 'perf top/report' output as: (Kan Liang)

  4769.98%  0.01%  0.00%  0.01%  tchain_edit  [kernel]  [k] update_fast_timekeeper

- System calls with pid_t arguments get them augmented with the COMM event more thoroughly:

  # trace -e perf_event_open perf stat -e cycles -p 15608
     6.876 ( 0.014 ms): perf_event_open(attr_uptr: 0x2ae20d8, pid: 15608 (hexchat), cpu: -1, group_fd: -1, flags: FD_CLOEXEC) = 3
     6.882 ( 0.005 ms): perf_event_open(attr_uptr: 0x2ae20d8, pid: 15639 (gmain), cpu: -1, group_fd: -1, flags: FD_CLOEXEC) = 4
     6.889 ( 0.005 ms): perf_event_open(attr_uptr: 0x2ae20d8, pid: 15640 (gdbus), cpu: -1, group_fd: -1, flags: FD_CLOEXEC) = 5
     ^^
  ^C

- Fix offline module name mismatch issue in 'perf probe' (Ravi Bangoria)

- Fix module probe issue if no dwarf support in (Ravi Bangoria)

Assorted fixes:

- Fix off-by-one in write_buildid() (Andrey Ryabinin)

- Fix segfault when printing callchains in 'perf script' (Chris Phlipot)

- Replace assignment with comparison on assert check in 'perf test' entry (Colin Ian King)

- Fix off-by-one comparison in intel-pt code (Colin Ian King)

- Close target file on error path in 'perf probe' (Masami Hiramatsu)

- Set default kprobe group name if not given in 'perf probe' (Masami Hiramatsu)

- Avoid partial perf_event_header reads (Wang Nan)

Infrastructure:

- Update x86's syscall_64.tbl copy, adding preadv2 & pwritev2 (Arnaldo Carvalho de Melo)

- Make the x86 clean quiet wrt syscall table removal (Jiri Olsa)

Cleanups:

- Simplify wrapper for LOCK_PI in 'perf bench futex' (Davidlohr Bueso)

- Remove duplicate const qualifier (Eric Engestrom)

Signed-off-by: Arnaldo Carvalho de Melo

Andrey Ryabinin (1):
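
The perf_event_max_stack item above describes a sysctl file whose default is the old hard-coded depth of 127. A minimal Python sketch of how a tool might read it, falling back to 127 when the file is absent (for example on kernels without the sysctl). The function name `read_max_stack` is hypothetical, and the fallback behaviour is an assumption for illustration, not a description of what perf itself does:

```python
from pathlib import Path

PERF_MAX_STACK_DEPTH = 127  # old hard-coded default, per the summary above
SYSCTL_PATH = "/proc/sys/kernel/perf_event_max_stack"


def read_max_stack(path=SYSCTL_PATH, default=PERF_MAX_STACK_DEPTH):
    """Return the configured maximum stack depth, or `default` if unreadable."""
    try:
        return int(Path(path).read_text().strip())
    except (OSError, ValueError):
        # File missing, not readable, or not an integer: use the default.
        return default
```

Raising the limit is a matter of writing a larger number to the same file as root; the sketch only covers the read side.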
Re: [GIT PULL 00/30] perf/core improvements and fixes
On Fri, May 15, 2015 at 11:08:04AM +0900, Namhyung Kim wrote:
> Hi Arnaldo,
>
> On Thu, May 14, 2015 at 10:18:27AM -0300, Arnaldo Carvalho de Melo wrote:
> > On Thu, May 14, 2015 at 05:23:30PM +0900, Namhyung Kim wrote:
> > > On Mon, May 11, 2015 at 11:06:26AM -0300, Arnaldo Carvalho de Melo wrote:
> > We need to improve this segfault backtrace, I have to always use
> > addr2line to resolve those missing entries, i.e. if you try:
> >
> > addr2line -fe /path/to/your/perf 0x4dd9c8
> > addr2line -fe /path/to/your/perf 0x4e2580
> >
> > We would have resolved those lines :-/
>
> Right, I'll add it to my TODO list.
>
> Anyway, this is a backtrace using gdb..

Ok, reproduced here:

[acme@ibm-x3650m4-01 linux]$ fg
gdb perf
list
134		if (verbose) {
135			dso_name_l = dso_l->long_name;
136			dso_name_r = dso_r->long_name;
137		} else {
138			dso_name_l = dso_l->short_name;
139			dso_name_r = dso_r->short_name;
140		}
141
142		return strcmp(dso_name_l, dso_name_r);
143	}
(gdb) p dso_l
$2 = (struct dso *) 0x1924ba0
(gdb)
$3 = (struct dso *) 0x1924ba0
(gdb) p dso_r
$4 = (struct dso *) 0x1
(gdb) bt
#0  0x004f557b in _sort__dso_cmp (map_l=0x182ab3120, map_r=0xd5325b0) at util/sort.c:139
#1  0x004f55f1 in sort__dso_cmp (left=0x606c7f0, right=0x7fffb850) at util/sort.c:148
#2  0x004f8470 in __sort__hpp_cmp (fmt=0x1922fb0, a=0x606c7f0, b=0x7fffb850) at util/sort.c:1313
#3  0x004fc3b8 in hist_entry__cmp (left=0x606c7f0, right=0x7fffb850) at util/hist.c:911
#4  0x004fafcc in add_hist_entry (hists=0x1922d80, entry=0x7fffb850, al=0x7fffbbe0, sample_self=false) at util/hist.c:389
#5  0x004fb350 in __hists__add_entry (hists=0x1922d80, al=0x7fffbbe0, sym_parent=0x0, bi=0x0, mi=0x0, period=557536, weight=0, transaction=0, sample_self=false) at util/hist.c:471
#6  0x004fc03c in iter_add_next_cumulative_entry (iter=0x7fffbc10, al=0x7fffbbe0) at util/hist.c:797
#7  0x004fc291 in hist_entry_iter__add (iter=0x7fffbc10, al=0x7fffbbe0, evsel=0x1922c50, sample=0x7fffbdf0, max_stack_depth=127, arg=0x7fffc810) at util/hist.c:882
#8  0x0042f1b7 in process_sample_event (tool=0x7fffc810, event=0x7ffed74b41e0, sample=0x7fffbdf0, evsel=0x1922c50, machine=0x19213d0) at builtin-report.c:171
#9  0x004da272 in perf_evlist__deliver_sample (evlist=0x1922260, tool=0x7fffc810, event=0x7ffed74b41e0, sample=0x7fffbdf0, evsel=0x1922c50, machine=0x19213d0) at util/session.c:1000
#10 0x004da40c in machines__deliver_event (machines=0x19213d0, evlist=0x1922260, event=0x7ffed74b41e0, sample=0x7fffbdf0, tool=0x7fffc810, file_offset=1097646560) at util/session.c:1037
#11 0x004da659 in perf_session__deliver_event (session=0x1921310, event=0x7ffed74b41e0, sample=0x7fffbdf0, tool=0x7fffc810, file_offset=1097646560) at util/session.c:1082
#12 0x004d7d7b in ordered_events__deliver_event (oe=0x1921558, event=0x2050430) at util/session.c:109
#13 0x004dd65b in __ordered_events__flush (oe=0x1921558) at util/ordered-events.c:207
#14 0x004dd92f in ordered_events__flush (oe=0x1921558, how=OE_FLUSH__ROUND) at util/ordered-events.c:271
#15 0x004d94c8 in process_finished_round (tool=0x7fffc810, event=0x7ffed74c6830, oe=0x1921558) at util/session.c:663
#16 0x004da7cd in perf_session__process_user_event (session=0x1921310, event=0x7ffed74c6830, file_offset=1097721904) at util/session.c:1119
#17 0x004daced in perf_session__process_event (session=0x1921310, event=0x7ffed74c6830, file_offset=1097721904) at util/session.c:1232
#18 0x004db811 in __perf_session__process_events (session=0x1921310, data_offset=232, data_size=5774474704, file_size=5774474936) at util/session.c:1533
#19 0x004dba01 in perf_session__process_events (session=0x1921310) at util/session.c:1580
#20 0x0042ff9f in __cmd_report (rep=0x7fffc810) at builtin-report.c:487
#21 0x004315d9 in cmd_report (argc=0, argv=0x7fffddd0, prefix=0x0) at builtin-report.c:878
#22 0x00490fb8 in run_builtin (p=0x886528 , argc=1, argv=0x7fffddd0) at perf.c:370
#23 0x00491217 in handle_internal_command (argc=1, argv=0x7fffddd0) at perf.c:429
#24 0x00491363 in run_argv (argcp=0x7fffdc2c, argv=0x7fffdc20) at perf.c:473
#25 0x004916c4 in main (argc=1, argv=0x7fffddd0) at perf.c:588
(gdb)

Looking at the frame #1 I see:

(gdb) p left->hists
$22 = (struct hists *) 0x1922d80
(gdb) p right->hists
$23 = (struct hists *) 0x1922d80
(gdb)

I.e. both look like fine hist_entry instances, both are on the same
struct hists, but:

(gdb) p right->ms.map->dso
$25 = (struct dso *) 0x1924ba0
(gdb) p right->ms.ma
There is no member named ma.
(gdb) p right->ms.map
$26 =
Re: [GIT PULL 00/30] perf/core improvements and fixes
Hi Arnaldo,

On Thu, May 14, 2015 at 10:18:27AM -0300, Arnaldo Carvalho de Melo wrote:
> On Thu, May 14, 2015 at 05:23:30PM +0900, Namhyung Kim wrote:
> > On Mon, May 11, 2015 at 11:06:26AM -0300, Arnaldo Carvalho de Melo wrote:
> > > On Mon, May 11, 2015 at 02:09:39PM +0900, Namhyung Kim wrote:
> > > > I'm seeing a segfault on 'perf report' with a large data file after
> > > > applying thread refcount change - it happens regardless of the atomic
> > > > operation.
> > >
> > > Any specific 'perf record' command line? Does it take a long time to
> > > reproduce? Any backtraces? I'll try to repro, it's possible that we're
> > > doing one too many thread__put()...
> >
> > It's a kernel build with '-j 20' and recorded data size is ~2.1GB.
> > It takes ~30 sec to reproduce.
> >
> > $ perf report -i threaded/kbuild7.data --header-only
> > #
> > # captured on: Thu Dec 18 12:06:35 2014
> > # hostname : sejong
> > # os release : 3.17.4-1-ARCH
> > # perf version : 3.18.rc3.gcb4774b
> > # arch : x86_64
> > # nrcpus online : 12
> > # nrcpus avail : 12
> > # cpudesc : Intel(R) Core(TM) i7-3930K CPU @ 3.20GHz
> > # cpuid : GenuineIntel,6,45,7
> > # total memory : 24646828 kB
> > # cmdline : /home/namhyung/project/linux/tools/perf/perf record -ag -o /home/namhyung/tmp/perf/threaded/kbuild7.data -- make -j20
> > # event : name = cycles, , size = 104, { sample_period, sample_freq } = 4000, sample_type = IP|TID|TIME|CALLCHAIN|CPU|PERIOD, disabled = 1, inherit
> > # HEADER_CPU_TOPOLOGY info available, use -I to display
> > # HEADER_NUMA_TOPOLOGY info available, use -I to display
> > # pmu mappings: cpu = 4, software = 1, power = 24, uncore_pcu = 13, tracepoint = 2, uncore_imc_0 = 15, uncore_imc_1 = 16, uncore_imc_2 = 17, uncore_
> > #
> > #
> >
> >
> > $ perf data stat -i threaded/kbuild7.data
> >
> >  Total event stats for 'threaded/kbuild7.data' file:
> >
> >               TOTAL events:   25126492
> >                MMAP events:        114
> >                COMM events:     117957
> >                EXIT events:     240544
> >            THROTTLE events:         16
> >          UNTHROTTLE events:         16
> >                FORK events:     120488
> >              SAMPLE events:   23878219
> >               MMAP2 events:     745325
> >      FINISHED_ROUND events:      23813
> >
> >  Sample event stats:
> >
> >     20,579,564,471,104 cycles
> >             23,878,219 samples    # sampling ratio 99.745% (3989/4000)
> >
> >          498.736917889 second time sampled
> >
> >
> > $ perf report -i threaded/kbuild7.data
>
> We need to improve this segfault backtrace, I have to always use
> addr2line to resolve those missing entries, i.e. if you try:
>
> addr2line -fe /path/to/your/perf 0x4dd9c8
> addr2line -fe /path/to/your/perf 0x4e2580
>
> We would have resolved those lines :-/

Right, I'll add it to my TODO list.

Anyway, this is a backtrace using gdb..

Thanks,
Namhyung


Program received signal SIGSEGV, Segmentation fault.
0x75fb229e in __strcmp_sse2_unaligned () from /usr/lib/libc.so.6
(gdb) bt
#0  0x75fb229e in __strcmp_sse2_unaligned () from /usr/lib/libc.so.6
#1  0x004d3948 in _sort__dso_cmp (map_r=, map_l=) at util/sort.c:142
#2  sort__dso_cmp (left=, right=) at util/sort.c:148
#3  0x004d7f08 in hist_entry__cmp (right=0x7fffc530, left=0x323a27f0) at util/hist.c:911
#4  add_hist_entry (sample_self=true, al=0x7fffc710, entry=0x7fffc530, hists=0x18f6690) at util/hist.c:389
#5  __hists__add_entry (hists=0x18f6690, al=0x7fffc710, sym_parent=, bi=bi@entry=0x0, mi=mi@entry=0x0, period=, weight=0, transaction=0, sample_self=true) at util/hist.c:471
#6  0x004d8234 in iter_add_single_normal_entry (iter=0x7fffc740, al=) at util/hist.c:662
#7  0x004d8765 in hist_entry_iter__add (iter=0x7fffc740, al=0x7fffc710, evsel=0x18f6550, sample=, max_stack_depth=, arg=0x7fffd0a0) at util/hist.c:871
#8  0x00436353 in process_sample_event (tool=0x7fffd0a0, event=, sample=0x7fffc870, evsel=0x18f6550, machine=) at builtin-report.c:171
#9  0x004bbe23 in perf_evlist__deliver_sample (machine=0x18f4cc0, evsel=0x18f6550, sample=0x7fffc870, event=0x7fffe0bd3220, tool=0x7fffd0a0, evlist=0x18f5b50) at util/session.c:972
#10 machines__deliver_event (machines=machines@entry=0x18f4cc0, evlist=, event=event@entry=0x7fffe0bd3220, sample=sample@entry=0x7fffc870, tool=tool@entry=0x7fffd0a0, file_offset=file_offset@entry=1821434400) at util/session.c:1009
#11 0x004bc681 in perf_session__deliver_event (file_offset=1821434400, tool=0x7fffd0a0, sample=0x7fffc870, event=0x7fffe0bd3220, session=) at util/session.c:1050
#12 ordered_events__deliver_event (oe=0x18f4e00, event=) at util/session.c:109
#13 0x004bf12b in __ordered_events__flush (oe=0x18f4e00) at util/order
Re: [GIT PULL 00/30] perf/core improvements and fixes
On Thu, May 14, 2015 at 05:23:30PM +0900, Namhyung Kim wrote:
> On Mon, May 11, 2015 at 11:06:26AM -0300, Arnaldo Carvalho de Melo wrote:
> > On Mon, May 11, 2015 at 02:09:39PM +0900, Namhyung Kim wrote:
> > > I'm seeing a segfault on 'perf report' with a large data file after
> > > applying thread refcount change - it happens regardless of the atomic
> > > operation.

> > Any specific 'perf record' command line? Does it take a long time to
> > reproduce? Any backtraces? I'll try to repro, it's possible that we're
> > doing one too many thread__put()...

> It's a kernel build with '-j 20' and recorded data size is ~2.1GB.
> It takes ~30 sec to reproduce.
>
>   $ perf report -i threaded/kbuild7.data --header-only
>   #
>   # captured on: Thu Dec 18 12:06:35 2014
>   # hostname : sejong
>   # os release : 3.17.4-1-ARCH
>   # perf version : 3.18.rc3.gcb4774b
>   # arch : x86_64
>   # nrcpus online : 12
>   # nrcpus avail : 12
>   # cpudesc : Intel(R) Core(TM) i7-3930K CPU @ 3.20GHz
>   # cpuid : GenuineIntel,6,45,7
>   # total memory : 24646828 kB
>   # cmdline : /home/namhyung/project/linux/tools/perf/perf record -ag -o /home/namhyung/tmp/perf/threaded/kbuild7.data -- make -j20
>   # event : name = cycles, , size = 104, { sample_period, sample_freq } = 4000, sample_type = IP|TID|TIME|CALLCHAIN|CPU|PERIOD, disabled = 1, inherit
>   # HEADER_CPU_TOPOLOGY info available, use -I to display
>   # HEADER_NUMA_TOPOLOGY info available, use -I to display
>   # pmu mappings: cpu = 4, software = 1, power = 24, uncore_pcu = 13, tracepoint = 2, uncore_imc_0 = 15, uncore_imc_1 = 16, uncore_imc_2 = 17, uncore_
>   #
>   #
>
>   $ perf data stat -i threaded/kbuild7.data
>
>   Total event stats for 'threaded/kbuild7.data' file:
>
>              TOTAL events:   25126492
>               MMAP events:        114
>               COMM events:     117957
>               EXIT events:     240544
>           THROTTLE events:         16
>         UNTHROTTLE events:         16
>               FORK events:     120488
>             SAMPLE events:   23878219
>              MMAP2 events:     745325
>     FINISHED_ROUND events:      23813
>
>   Sample event stats:
>
>     20,579,564,471,104 cycles
>             23,878,219 samples   # sampling ratio 99.745% (3989/4000)
>
>     498.736917889 second time sampled
>
>   $ perf report -i threaded/kbuild7.data

We need to improve this segfault backtrace; I always have to use addr2line
to resolve those missing entries, i.e. if you try:

  addr2line -fe /path/to/your/perf 0x4dd9c8
  addr2line -fe /path/to/your/perf 0x4e2580

we would have those lines resolved :-/

But I think this is a longstanding bug in handling hist_entries, i.e.
probably we have more than one pointer to a hist_entry and are accessing
it in two places at the same time, with one of them deleting it and
possibly reusing the data.

>   perf: Segmentation fault
>    backtrace
>   perf[0x51c7cb]
>   /usr/lib/libc.so.6(+0x33540)[0x7f37eb37e540]
>   /usr/lib/libc.so.6(+0x9029e)[0x7f37eb3db29e]
>   perf[0x4dd9c8]
>   perf(__hists__add_entry+0x188)[0x4e2258]
>   perf[0x4e2580]
>   perf(hist_entry_iter__add+0x9d)[0x4e2a7d]
>   perf[0x437fda]
>   perf[0x4c4c8e]
>   perf[0x4c5176]
>   perf[0x4c8bab]
>   perf[0x4c53c2]
>   perf[0x4c5f0c]
>   perf(perf_session__process_events+0xb3)[0x4c6b23]
>   perf(cmd_report+0x12a0)[0x439310]
>   perf[0x483ec3]
>   perf(main+0x60a)[0x42979a]
>   /usr/lib/libc.so.6(__libc_start_main+0xf0)[0x7f37eb36b800]
>   perf(_start+0x29)[0x4298b9]
>   [0x0]
>
> It seems like some memory area was corrupted..

Right, looks like use after free, for instance, freeing something still
on a list or rbtree :-/

- Arnaldo
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
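The manual addr2line step suggested above can be scripted. What follows is only a minimal sketch: the helper name `addr2line_commands` and its `name` parameter are made up for illustration, and it assumes the frame format shown in the backtrace (`perf[0x...]` for unresolved frames) and a perf binary whose addresses addr2line can look up directly (i.e. not position independent). It only builds the command lines, using the `-f` and `-e` options from the message; it does not run them.

```python
import re

# An unresolved frame looks like "perf[0x4dd9c8]": a binary name followed
# directly by a bracketed address, with no "(symbol+offset)" in between.
UNRESOLVED = re.compile(r"^(?P<binary>[^\s(\[]+)\[(?P<addr>0x[0-9a-fA-F]+)\]$")


def addr2line_commands(backtrace, binary_path, name="perf"):
    """Return one addr2line argv list per unresolved frame of `name`.

    Hypothetical helper, not part of perf. Leading "> " quote markers
    from email replies are stripped before matching.
    """
    commands = []
    for line in backtrace.splitlines():
        frame = line.strip().lstrip("> ").strip()
        match = UNRESOLVED.match(frame)
        if match and match.group("binary") == name:
            commands.append(
                ["addr2line", "-f", "-e", binary_path, match.group("addr")]
            )
    return commands


if __name__ == "__main__":
    sample = """\
perf[0x4dd9c8]
perf(__hists__add_entry+0x188)[0x4e2258]
perf[0x4e2580]
/usr/lib/libc.so.6(+0x33540)[0x7f37eb37e540]
[0x0]
"""
    for argv in addr2line_commands(sample, "/path/to/your/perf"):
        print(" ".join(argv))
```

Frames that already carry a symbol, libc frames and the trailing `[0x0]` are skipped on purpose, since only the bare `perf[0x...]` entries are missing information.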
Re: [GIT PULL 00/30] perf/core improvements and fixes
On Mon, May 11, 2015 at 11:06:26AM -0300, Arnaldo Carvalho de Melo wrote:
> On Mon, May 11, 2015 at 02:09:39PM +0900, Namhyung Kim wrote:
> > Hi Arnaldo,
> >
> > I'm seeing a segfault on 'perf report' with a large data file after
> > applying thread refcount change - it happens regardless of the atomic
> > operation.
>
> Any specific 'perf record' command line? Does it take a long time to
> reproduce? Any backtraces? I'll try to repro, it's possible that we're
> doing one too many thread__put()...

It's a kernel build with '-j 20' and recorded data size is ~2.1GB.
It takes ~30 sec to reproduce.

  $ perf report -i threaded/kbuild7.data --header-only
  #
  # captured on: Thu Dec 18 12:06:35 2014
  # hostname : sejong
  # os release : 3.17.4-1-ARCH
  # perf version : 3.18.rc3.gcb4774b
  # arch : x86_64
  # nrcpus online : 12
  # nrcpus avail : 12
  # cpudesc : Intel(R) Core(TM) i7-3930K CPU @ 3.20GHz
  # cpuid : GenuineIntel,6,45,7
  # total memory : 24646828 kB
  # cmdline : /home/namhyung/project/linux/tools/perf/perf record -ag -o /home/namhyung/tmp/perf/threaded/kbuild7.data -- make -j20
  # event : name = cycles, , size = 104, { sample_period, sample_freq } = 4000, sample_type = IP|TID|TIME|CALLCHAIN|CPU|PERIOD, disabled = 1, inherit
  # HEADER_CPU_TOPOLOGY info available, use -I to display
  # HEADER_NUMA_TOPOLOGY info available, use -I to display
  # pmu mappings: cpu = 4, software = 1, power = 24, uncore_pcu = 13, tracepoint = 2, uncore_imc_0 = 15, uncore_imc_1 = 16, uncore_imc_2 = 17, uncore_
  #
  #

  $ perf data stat -i threaded/kbuild7.data

  Total event stats for 'threaded/kbuild7.data' file:

             TOTAL events:   25126492
              MMAP events:        114
              COMM events:     117957
              EXIT events:     240544
          THROTTLE events:         16
        UNTHROTTLE events:         16
              FORK events:     120488
            SAMPLE events:   23878219
             MMAP2 events:     745325
    FINISHED_ROUND events:      23813

  Sample event stats:

    20,579,564,471,104 cycles
            23,878,219 samples   # sampling ratio 99.745% (3989/4000)

    498.736917889 second time sampled

  $ perf report -i threaded/kbuild7.data
  perf: Segmentation fault
   backtrace
  perf[0x51c7cb]
  /usr/lib/libc.so.6(+0x33540)[0x7f37eb37e540]
  /usr/lib/libc.so.6(+0x9029e)[0x7f37eb3db29e]
  perf[0x4dd9c8]
  perf(__hists__add_entry+0x188)[0x4e2258]
  perf[0x4e2580]
  perf(hist_entry_iter__add+0x9d)[0x4e2a7d]
  perf[0x437fda]
  perf[0x4c4c8e]
  perf[0x4c5176]
  perf[0x4c8bab]
  perf[0x4c53c2]
  perf[0x4c5f0c]
  perf(perf_session__process_events+0xb3)[0x4c6b23]
  perf(cmd_report+0x12a0)[0x439310]
  perf[0x483ec3]
  perf(main+0x60a)[0x42979a]
  /usr/lib/libc.so.6(__libc_start_main+0xf0)[0x7f37eb36b800]
  perf(_start+0x29)[0x4298b9]
  [0x0]

It seems like some memory area was corrupted..

Thanks,
Namhyung
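As a sanity check on the numbers above, the "sampling ratio 99.745% (3989/4000)" line appears consistent with dividing the sample count by the sampled time and the number of online CPUs, then comparing the result with the requested frequency. This is an inference from the figures in the message, not taken from the tool's source; it also assumes the `4000` in the event description is a sampling frequency rather than a period. The function name `sampling_ratio` is made up for illustration.

```python
def sampling_ratio(samples, seconds, cpus, freq):
    """Return (samples per second per CPU, fraction of requested rate).

    Assumed formula: rate = samples / (seconds * cpus); ratio = rate / freq.
    """
    per_cpu_rate = samples / (seconds * cpus)
    return per_cpu_rate, per_cpu_rate / freq


if __name__ == "__main__":
    # Figures copied from the output above: SAMPLE events, time sampled,
    # "nrcpus online", and the 4000 from the event description.
    rate, ratio = sampling_ratio(23878219, 498.736917889, 12, 4000)
    print(f"{int(rate)}/4000 samples per second per CPU, {ratio * 100:.3f}%")
```

Under these assumptions the computed rate truncates to 3989 and the ratio rounds to 99.745%, matching the reported line.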
Re: [GIT PULL 00/30] perf/core improvements and fixes
On Mon, May 11, 2015 at 02:09:39PM +0900, Namhyung Kim wrote:
> Hi Arnaldo,
>
> I'm seeing a segfault on 'perf report' with a large data file after
> applying thread refcount change - it happens regardless of the atomic
> operation.

Any specific 'perf record' command line? Does it take a long time to
reproduce? Any backtraces? I'll try to repro, it's possible that we're
doing one too many thread__put()...

- Arnaldo
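To make the "one too many thread__put()" hypothesis concrete, here is a toy model of reference counting. It is not perf's code: the class `RefCounted` and the functions `get`, `put` and `simulate` are hypothetical stand-ins, and "freed" is just a flag. It only illustrates how a single unbalanced put can release an object that a container still points to, which is the use-after-free pattern being suspected.

```python
class RefCounted:
    """Toy refcounted object; the creator holds the first reference."""

    def __init__(self):
        self.refcnt = 1
        self.freed = False


def get(obj):
    """Take a reference and return the object."""
    obj.refcnt += 1
    return obj


def put(obj):
    """Drop a reference; the object is 'freed' when the count hits zero."""
    obj.refcnt -= 1
    if obj.refcnt == 0:
        obj.freed = True


def simulate(extra_put):
    """Return True if the container ends up pointing at a freed object."""
    thread = RefCounted()    # creator's reference, refcnt == 1
    tree = [get(thread)]     # container takes its own reference, refcnt == 2
    user = get(thread)       # temporary user, refcnt == 3
    put(user)                # balanced put, refcnt == 2
    if extra_put:
        put(user)            # the bug: unbalanced put, refcnt == 1
    put(thread)              # creator drops its reference
    return tree[0].freed     # container still holds a pointer


if __name__ == "__main__":
    print("balanced:", simulate(False))
    print("one extra put:", simulate(True))
```

With balanced calls the container's reference keeps the object alive; with the extra put the creator's final put drops the count to zero while the container still holds a pointer.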
Re: [GIT PULL 00/30] perf/core improvements and fixes
Hi Arnaldo,

On Fri, May 08, 2015 at 05:56:12PM -0300, Arnaldo Carvalho de Melo wrote:
> Hi Ingo,
>
> 	Please consider pulling,
>
> - Arnaldo
>
> The following changes since commit cb307113746b4d184155d2c412e8069aeaa60d42:
>
>   perf_event: Don't allow vmalloc() backed perf on powerpc (2015-05-08 12:26:01 +0200)
>
> are available in the git repository at:
>
>   git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git tags/perf-core-for-mingo
>
> for you to fetch changes up to 76d408498b08447e0f61dfdd611aeb6e8e61ce80:
>
>   perf build: Disable libdw DWARF unwind when built with NO_DWARF (2015-05-08 16:43:14 -0300)
>
> perf/core improvements and fixes:
>
> User visible:
>
> - 'perf probe' improvements (Masami Hiramatsu)
>
>   - Support glob wildcards for function name
>   - Support $params special probe argument: Collect all function arguments
>   - Make --line checks validate C-style function name.
>   - Add --no-inlines option to avoid searching inline functions
>
> - Introduce new 'perf bench futex' benchmark: 'wake-parallel', to
>   measure parallel waker threads generating contention for kernel
>   locks (hb->lock) (Davidlohr Bueso)
>
> Bug fixes:
>
> - 'perf top' survives much longer on high core count machines, more work
>   needed to refcount more data structures besides 'struct thread' and fix
>   more races (Arnaldo Carvalho de Melo)

I'm seeing a segfault on 'perf report' with a large data file after
applying thread refcount change - it happens regardless of the atomic
operation.

Thanks,
Namhyung
Re: [GIT PULL 00/30] perf/core improvements and fixes
* Arnaldo Carvalho de Melo wrote:

> Hi Ingo,
>
> 	Please consider pulling,
>
> - Arnaldo
>
> The following changes since commit cb307113746b4d184155d2c412e8069aeaa60d42:
>
>   perf_event: Don't allow vmalloc() backed perf on powerpc (2015-05-08 12:26:01 +0200)
>
> are available in the git repository at:
>
>   git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git tags/perf-core-for-mingo
>
> for you to fetch changes up to 76d408498b08447e0f61dfdd611aeb6e8e61ce80:
>
>   perf build: Disable libdw DWARF unwind when built with NO_DWARF (2015-05-08 16:43:14 -0300)
>
> perf/core improvements and fixes:
>
> User visible:
>
> - 'perf probe' improvements (Masami Hiramatsu)
>
>   - Support glob wildcards for function name
>   - Support $params special probe argument: Collect all function arguments
>   - Make --line checks validate C-style function name.
>   - Add --no-inlines option to avoid searching inline functions
>
> - Introduce new 'perf bench futex' benchmark: 'wake-parallel', to
>   measure parallel waker threads generating contention for kernel
>   locks (hb->lock) (Davidlohr Bueso)
>
> Bug fixes:
>
> - 'perf top' survives much longer on high core count machines, more work
>   needed to refcount more data structures besides 'struct thread' and fix
>   more races (Arnaldo Carvalho de Melo)
>
> Infrastructure:
>
> - Move barrier.h mb/rmb/wmb API from tools/perf/ to kernel like tools/arch/
>   hierarchy (Arnaldo Carvalho de Melo)
>
> - Borrow atomic.h from the kernel, initially the x86 implementations
>   with a fallback to gcc intrinsics for the other arches, all the kernel
>   like framework in place for doing arch specific implementations,
>   preferably cloning what is in the kernel to the greater extent
>   possible (Arnaldo Carvalho de Melo)
>
> - Protect the 'struct thread' lifetime with a reference counter,
>   and protect data structures that contain its instances with
>   a mutex (Arnaldo Carvalho de Melo)
>
> - Disable libdw DWARF unwind when built with NO_DWARF (Naveen N. Rao)
>
> Signed-off-by: Arnaldo Carvalho de Melo
>
> Arnaldo Carvalho de Melo (17):
>       perf tools: Move x86 barrier.h stuff to tools/arch/x86/include/asm/barrier.h
>       perf tools: Move powerpc barrier.h stuff to tools/arch/powerpc/include/asm/barrier.h
>       perf tools: Move s390 barrier.h stuff to tools/arch/s390/include/asm/barrier.h
>       perf tools: Move barrier() definition to tools/include/linux/compiler.h
>       tools: Adopt asm-generic/barrier.h
>       perf tools: Move sh barrier.h stuff to tools/arch/sh/include/asm/barrier.h
>       perf tools: Move sparc barrier.h stuff to tools/arch/sparc/include/asm/barrier.h
>       perf tools: Move alpha barrier.h stuff to tools/arch/alpha/include/asm/barrier.h
>       perf tools: Move ia64 barrier.h stuff to tools/arch/ia64/include/asm/barrier.h
>       perf tools: Move arm(64) barrier.h stuff to tools/arch/arm*/include/asm/barrier.h
>       perf tools: Move xtensa barrier.h stuff to tools/arch/xtensa/include/asm/barrier.h
>       perf tools: Move mips barrier.h stuff to tools/arch/mips/include/asm/barrier.h
>       perf tools: Move tile barrier.h stuff to tools/arch/tile/include/asm/barrier.h
>       perf tools: Move generic barriers out of perf-sys.h
>       tools include: Add basic atomic.h implementation from the kernel sources
>       perf tools: Use atomic_t to implement thread__{get,put} refcnt
>       perf machine: Protect the machine->threads with a rwlock
>
> Davidlohr Bueso (2):
>       perf bench futex: Support parallel waker threads
>       perf bench futex: Handle spurious wakeups
>
> Masami Hiramatsu (10):
>       perf probe: Fix to close probe_events file in error
>       perf probe: Fix a typo for the flags of open
>       perf probe: Fix to return 0 when positive value returned
>       perf probe: Make --line checks validate C-style function name
>       perf probe: Skip kernel symbols which is out of .text
>       perf probe: Support $params special probe argument
>       perf probe: Use perf_probe_event.target instead of passing as an argument
>       perf probe: Introduce probe_conf global configs
>       perf probe: Add --no-inlines option to avoid searching inline functions
>       perf probe: Support glob wildcards for function name
>
> Naveen N. Rao (1):
>       perf build: Disable libdw DWARF unwind when built with NO_DWARF
>
>  tools/arch/alpha/include/asm/barrier.h    |  8 +
>  tools/arch/arm/include/asm/barrier.h      | 12 ++
>  tools/arch/arm64/include/asm/barrier.h    | 16 ++
>  tools/arch/ia64/include/asm/barrier.h     | 48 +
>  tools/arch/mips/include/asm/barrier.h     | 20 ++
>  tools/arch/powerpc/include/asm/barrier.h  | 29 +++
>  tools/arch/s390/include/asm/barrier.h     | 30 +++
>  tools/arch/sh/inc
[GIT PULL 00/30] perf/core improvements and fixes
Hi Ingo,

	Please consider pulling,

- Arnaldo

The following changes since commit cb307113746b4d184155d2c412e8069aeaa60d42:

  perf_event: Don't allow vmalloc() backed perf on powerpc (2015-05-08 12:26:01 +0200)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git tags/perf-core-for-mingo

for you to fetch changes up to 76d408498b08447e0f61dfdd611aeb6e8e61ce80:

  perf build: Disable libdw DWARF unwind when built with NO_DWARF (2015-05-08 16:43:14 -0300)

perf/core improvements and fixes:

User visible:

- 'perf probe' improvements (Masami Hiramatsu)

  - Support glob wildcards for function name
  - Support $params special probe argument: Collect all function arguments
  - Make --line checks validate C-style function name.
  - Add --no-inlines option to avoid searching inline functions

- Introduce new 'perf bench futex' benchmark: 'wake-parallel', to
  measure parallel waker threads generating contention for kernel
  locks (hb->lock) (Davidlohr Bueso)

Bug fixes:

- 'perf top' survives much longer on high core count machines, more work
  needed to refcount more data structures besides 'struct thread' and fix
  more races (Arnaldo Carvalho de Melo)

Infrastructure:

- Move barrier.h mb/rmb/wmb API from tools/perf/ to kernel like tools/arch/
  hierarchy (Arnaldo Carvalho de Melo)

- Borrow atomic.h from the kernel, initially the x86 implementations
  with a fallback to gcc intrinsics for the other arches, all the kernel
  like framework in place for doing arch specific implementations,
  preferably cloning what is in the kernel to the greater extent
  possible (Arnaldo Carvalho de Melo)

- Protect the 'struct thread' lifetime with a reference counter,
  and protect data structures that contain its instances with
  a mutex (Arnaldo Carvalho de Melo)

- Disable libdw DWARF unwind when built with NO_DWARF (Naveen N. Rao)

Signed-off-by: Arnaldo Carvalho de Melo

Arnaldo Carvalho de Melo (17):
      perf tools: Move x86 barrier.h stuff to tools/arch/x86/include/asm/barrier.h
      perf tools: Move powerpc barrier.h stuff to tools/arch/powerpc/include/asm/barrier.h
      perf tools: Move s390 barrier.h stuff to tools/arch/s390/include/asm/barrier.h
      perf tools: Move barrier() definition to tools/include/linux/compiler.h
      tools: Adopt asm-generic/barrier.h
      perf tools: Move sh barrier.h stuff to tools/arch/sh/include/asm/barrier.h
      perf tools: Move sparc barrier.h stuff to tools/arch/sparc/include/asm/barrier.h
      perf tools: Move alpha barrier.h stuff to tools/arch/alpha/include/asm/barrier.h
      perf tools: Move ia64 barrier.h stuff to tools/arch/ia64/include/asm/barrier.h
      perf tools: Move arm(64) barrier.h stuff to tools/arch/arm*/include/asm/barrier.h
      perf tools: Move xtensa barrier.h stuff to tools/arch/xtensa/include/asm/barrier.h
      perf tools: Move mips barrier.h stuff to tools/arch/mips/include/asm/barrier.h
      perf tools: Move tile barrier.h stuff to tools/arch/tile/include/asm/barrier.h
      perf tools: Move generic barriers out of perf-sys.h
      tools include: Add basic atomic.h implementation from the kernel sources
      perf tools: Use atomic_t to implement thread__{get,put} refcnt
      perf machine: Protect the machine->threads with a rwlock

Davidlohr Bueso (2):
      perf bench futex: Support parallel waker threads
      perf bench futex: Handle spurious wakeups

Masami Hiramatsu (10):
      perf probe: Fix to close probe_events file in error
      perf probe: Fix a typo for the flags of open
      perf probe: Fix to return 0 when positive value returned
      perf probe: Make --line checks validate C-style function name
      perf probe: Skip kernel symbols which is out of .text
      perf probe: Support $params special probe argument
      perf probe: Use perf_probe_event.target instead of passing as an argument
      perf probe: Introduce probe_conf global configs
      perf probe: Add --no-inlines option to avoid searching inline functions
      perf probe: Support glob wildcards for function name

Naveen N. Rao (1):
      perf build: Disable libdw DWARF unwind when built with NO_DWARF

 tools/arch/alpha/include/asm/barrier.h    |  8 +
 tools/arch/arm/include/asm/barrier.h      | 12 ++
 tools/arch/arm64/include/asm/barrier.h    | 16 ++
 tools/arch/ia64/include/asm/barrier.h     | 48 +
 tools/arch/mips/include/asm/barrier.h     | 20 ++
 tools/arch/powerpc/include/asm/barrier.h  | 29 +++
 tools/arch/s390/include/asm/barrier.h     | 30 +++
 tools/arch/sh/include/asm/barrier.h       | 32
 tools/arch/sparc/include/asm/barrier.h    |  8 +
 tools/arch/sparc/include/asm/barrier_32.h |  6 +
 tools/arch/sparc/include/asm/barrier_64.h | 42 +
 tools/arch/tile/include/asm/barrier.h     | 15 ++
 tools/arch/x86/include/asm/atomic.h
[GIT PULL 00/30] perf/core improvements and fixes
Hi Ingo,

	Please consider pulling,

- Arnaldo

The following changes since commit 1e6dd8adc78d4a153db253d051fd4ef6c49c9019:

  perf: Fix off by one test in perf_reg_value() (2012-09-19 17:08:40 +0200)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux tags/perf-core-for-mingo

for you to fetch changes up to b1ac754b67b5a875d63bee880f60ccb0c6bd8899:

  tools lib traceevent: Handle alloc_arg failure (2012-09-24 12:31:52 -0300)

perf/core improvements and fixes:

. Convert the trace builtins to use the growing evsel/evlist
  tracepoint infrastructure, removing several open coded constructs
  such as switch-like series of strcmp calls to dispatch events, etc.
  Basically what had already been showcased in 'perf sched'.

. Add evsel constructor for tracepoints, that uses libtraceevent
  just to parse the /format events file, use it in a new 'perf test'
  to make sure the libtraceevent format parsing regressions can
  be more readily caught.

. Some strange errors were happening in some builds but not in the
  next, as reported by several people; the problem was that some parser
  related files, generated during the build, didn't have proper make
  deps, fix from Eric Sandeen.

. Fix some compiling errors on 32-bit, from Feng Tang.

. Don't use sscanf extension %as, not available on bionic,
  reimplementation by Irina Tirdea.

. Fix bfd.h/libbfd detection with recent binutils, from Markus Trippelsdorf.

. Introduce struct and cache information about the environment where
  a perf.data file was captured, from Namhyung Kim.

. Fix several error paths in libtraceevent, from Namhyung Kim.

. Print event causing perf_event_open() to fail in 'perf record',
  from Stephane Eranian.

. New 'kvm' analysis tool, from Xiao Guangrong.

Signed-off-by: Arnaldo Carvalho de Melo

Arnaldo Carvalho de Melo (11):
      perf kvm: Use perf_evsel__intval
      perf kmem: Use perf_evsel__intval and perf_session__set_tracepoints_handlers
      perf lock: Use perf_evsel__intval and perf_session__set_tracepoints_handlers
      perf timechart: Use zalloc and fix a couple leaks
      tools lib traceevent: Use asprintf were applicable
      tools lib traceevent: Use calloc were applicable
      tools lib traceevent: Fix afterlife gotos
      tools lib traceevent: Remove some die() calls
      tools lib traceevent: Carve out events format parsing routine
      perf evsel: Provide a new constructor for tracepoints
      perf test: Add test for the sched tracepoint format fields

Eric Sandeen (1):
      perf tools: Fix parallel build

Feng Tang (2):
      perf tools: Fix a compiling error in trace-event-perl.c for 32 bits machine
      perf tools: Fix a compiling error in util/map.c

Irina Tirdea (1):
      perf tools: remove sscanf extension %as

Markus Trippelsdorf (1):
      perf tools: bfd.h/libbfd detection fails with recent binutils

Namhyung Kim (11):
      perf header: Add struct perf_session_env
      perf header: Add ->process callbacks to most of features
      perf header: Use pre-processed session env when printing
      perf header: Remove unused @feat arg from ->process callback
      perf kvm: Use perf_session_env for reading cpuid
      perf header: Remove perf_header__read_feature
      tools lib traceevent: Fix error path on process_array()
      tools lib traceevent: Make sure that arg->op.right is set properly
      tools lib traceevent: Free field if an error occurs on process_fields
      tools lib traceevent: Free field if an error occurs on process_flags/symbols
      tools lib traceevent: Handle alloc_arg failure

Stephane Eranian (1):
      perf record: Print event causing perf_event_open() to fail

Xiao Guangrong (2):
      KVM: x86: Export svm/vmx exit code and vector code to userspace
      perf kvm: Events analysis tool

 arch/x86/include/asm/kvm.h            |  16 +
 arch/x86/include/asm/kvm_host.h       |  16 -
 arch/x86/include/asm/svm.h            | 205 +++--
 arch/x86/include/asm/vmx.h            | 127 ++-
 arch/x86/kvm/trace.h                  |  89 ---
 tools/lib/traceevent/event-parse.c    | 570 +
 tools/lib/traceevent/event-parse.h    |   3 +
 tools/perf/Documentation/perf-kvm.txt |  30 +-
 tools/perf/MANIFEST                   |   3 +
 tools/perf/Makefile                   |   6 +-
 tools/perf/builtin-kmem.c             |  90 +--
 tools/perf/builtin-kvm.c              | 836 +++-
 tools/perf/builtin-lock.c             | 233 ++
 tools/perf/builtin-record.c           |   6 +-
 tools/perf/builtin-test.c             |  86 ++
 tools/perf/builtin-timechart.c        |  40 +-
 tools/perf/uti