systemtap release 4.4
The SystemTap team announces release 4.4.

Enhancements to this release include: significant performance and stability improvements to user-space probing; access to implicit thread-local storage variables on x86_64, ppc and s390; support for processing floating point values; significantly improved concurrency for scripts using global variables, via shortened critical sections; new syntax for defining aliases with both a prologue and an epilogue; a new @probewrite predicate; and syscall arguments that are writable again.

= Where to get it

https://sourceware.org/systemtap/ - our project page
https://sourceware.org/systemtap/ftp/releases/
https://koji.fedoraproject.org/koji/packageinfo?packageID=615
git tag release-4.4 (commit 988f439af39a)

There have been over 135 commits since the last release.
There have been 25+ bugs fixed / features added since the last release.

= SystemTap frontend (stap) changes

- New syntax for defining aliases with both a prologue and an epilogue:
  'probe ALIAS = PROBE { }, { }'

- New @probewrite predicate. @probewrite(var) returns 1 if var has been
  written to in the probe handler body and 0 otherwise. The check can
  only be used with probes that have an epilogue or prologue.

- Implicit thread local storage variables can now be accessed on
  x86_64, ppc, and s390.

= SystemTap backend changes

- Various performance and stability improvements to user-space probing.
  This includes replacing spinlocks with RCU locks in the vma map and
  utrace task hash table lookups, which substantially reduces CPU time
  when there are many target processes and the vma tracker or task
  finder is enabled. The default hash table sizes were also increased
  to reduce hash conflicts. Special thanks to Yichun Zhang and Sultan
  Alsawaf for these contributions.

- The locks required to protect concurrent access to global variables
  have been optimized with a "pushdown" algorithm, so that they span
  the smallest possible critical region. Formerly, and with
  --compatible=4.3, locks always spanned the entire probe handler. Lock
  pushdown means much greater potential concurrency between probes
  running on different CPUs.

- Systemtap now supports kernel-lockdown configurations that disable
  debugfs, by instead using procfs to carry relayfs transport files.

= SystemTap tapset changes

- Systemtap now supports extracting 64-bit floating point values and
  storing them in the long type. Basic floating point arithmetic and
  comparison functions are also provided in a tapset. More automated
  syntax coming soon. e.g.:

  probe process.function("foo") {
    fp = user_long(& $fp_variable)
    println (fp_to_string (fp_add (string_to_fp("3.14"), fp)))
  }

- Syscall arguments are writable again in non-DWARF probes on kernels
  that use syscall wrappers to pass arguments via pt_regs (currently
  x86_64 4.17+ and aarch64 4.19+). For example, the following probe
  adds rwx user permissions to any directory made by the process
  specified by stap -c:

  probe nd_syscall.mkdir {
    if (pid() == target()) mode |= 0700
  }

= SystemTap sample scripts

- All 180+ examples can be found at https://sourceware.org/systemtap/examples/

- New sample scripts:

  floatingpoint.stp  Extract a floating point value from a process and
                     print the results of various simple floating point
                     operations

- The following sample scripts have been enabled to run on the stapbpf
  backend:

  general/sizeof.stp
  memory/overcommit.stp

= Examples of tested kernel versions

2.6.32 (RHEL6 x86_64)
3.10.0 (RHEL7 x86_64)
4.15.0 (Ubuntu 18.04 x86_64)
4.18.0 (RHEL8 x86_64, aarch64, ppc64le, s390x)
5.3.8 (Fedora 30 i686)
5.8.16 (Fedora 32 x86_64)
5.8.18 (Fedora 33 x86_64)
5.9.0-rc7 (Fedora rawhide x86_64)
5.10.0-rc1 (Fedora rawhide x86_64)

= Known issues with this release

- There are known issues on kernel 5.10+ after adapting to set_fs()
  removal, with some memory accesses that previously returned valid
  data instead returning -EFAULT (see PR26811).

- An sdt probe cannot parse a parameter that uses a segment register.
  (PR13429)

- The presence of a line such as
  *CFLAGS += $(call cc-option, -fno-var-tracking-assignments)
  in the linux kernel Makefile unnecessarily reduces debuginfo quality;
  consider removing that line if you build kernels.

= Contributors for this release

Aaron Merey, Alice Zhang, Craig Ringer, Frank Ch. Eigler, Martin Cermak,
Sagar Patel, Sergei Trofimovich*, Serhei Makarov, Stan Cox,
Sultan Alsawaf*, Thorsten Glaser*, William Cohen, Yichun Zhang (agentzh)

Special thanks to new contributors, marked with '*' above.

Special thanks to Aaron Merey for drafting these notes.

= Bugs fixed for this release <https://sourceware.org/PR#>

10013 Support ENABLED sdt probe macro
12663 statement probes on inlined-function-call sites: search .debug_lin
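The 4.4 floating point support carries a 64-bit IEEE 754 double around in an ordinary script long. The Python sketch below models only that bit-level round trip, assuming little-endian IEEE 754 doubles. `double_to_long`, `long_to_double` and `fp_add_bits` are hypothetical helpers for illustration; they are not the tapset's `string_to_fp`, `fp_add` or `fp_to_string`, whose internals are not described in these notes.

```python
import struct


def double_to_long(x: float) -> int:
    """Reinterpret the 64 bits of an IEEE 754 double as a signed 64-bit integer."""
    return struct.unpack("<q", struct.pack("<d", x))[0]


def long_to_double(bits: int) -> float:
    """Reinterpret a signed 64-bit integer as the double with the same bit pattern."""
    return struct.unpack("<d", struct.pack("<q", bits))[0]


def fp_add_bits(a_bits: int, b_bits: int) -> int:
    """Add two doubles that are being carried around as raw 64-bit integers."""
    return double_to_long(long_to_double(a_bits) + long_to_double(b_bits))


if __name__ == "__main__":
    pi_bits = double_to_long(3.14)   # similar in spirit to string_to_fp("3.14")
    var_bits = double_to_long(2.5)   # hypothetical value, as if read with user_long(& $fp_variable)
    total_bits = fp_add_bits(pi_bits, var_bits)
    print(hex(pi_bits), long_to_double(total_bits))
```

Because the conversion is a pure reinterpretation of bits, no precision is lost while the value sits in a long; arithmetic only happens inside the dedicated helper.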
Re: [PATCH 0/2] perf probe: Support debuginfod client
Hi -

> > > I need to support this in pahole...
> >
> > pahole/dwarves use elfutils, so it already has automatic support.
> > https://sourceware.org/elfutils/Debuginfod.html
>
> I'm still not sure that which interface of elfutils I should use
> for this "automatic" debuginfod support. Are there good documentation
> about it?

The libdwfl part of the elfutils API falls back to debuginfod lookups
internally, so e.g. systemtap had to do nothing to benefit.

> Since this series just for the kernel binary, I have to check we
> can do something on user-space binaries.

It should work identically & transparently. If you're using one of a
few key packages of a few mainstream distros, the public debuginfod
server may already have the material available.

- FChE
Re: [PATCH 0/2] perf probe: Support debuginfod client
Hi -

> > Nice, even uses the source code fetching part of the webapi!
>
> So, can I take that as an Acked-by or Reviewed-by?

Sure.

> I need to support this in pahole...

pahole/dwarves use elfutils, so it already has automatic support.
https://sourceware.org/elfutils/Debuginfod.html

- FChE
Re: [PATCH 0/2] perf probe: Support debuginfod client
Hi -

Nice, even uses the source code fetching part of the webapi!

- FChE
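The debuginfod web API is keyed by build-id, with separate endpoints for debuginfo, executables and source files. A minimal Python sketch of composing those request URLs, assuming the `/buildid/BUILDID/debuginfo`, `/buildid/BUILDID/executable` and `/buildid/BUILDID/source/PATH` layout documented for debuginfod and a space-separated `$DEBUGINFOD_URLS`. The helper names are hypothetical, the percent-encoding choice is an assumption, and real clients (libdebuginfod, which libdwfl uses) also handle caching, retries and fallback across servers.

```python
import os
import re
from typing import List, Mapping, Optional
from urllib.parse import quote

_HEX = re.compile(r"[0-9a-f]+")


def debuginfod_servers(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Split a space-separated DEBUGINFOD_URLS value into base URLs without a trailing '/'."""
    env = os.environ if env is None else env
    return [url.rstrip("/") for url in env.get("DEBUGINFOD_URLS", "").split()]


def artifact_url(base: str, buildid: str, kind: str, source_path: str = "") -> str:
    """Build the request URL for one artifact of the given build-id."""
    buildid = buildid.lower()
    if not _HEX.fullmatch(buildid) or len(buildid) % 2:
        raise ValueError(f"not a hex build-id: {buildid!r}")
    if kind in ("debuginfo", "executable"):
        return f"{base}/buildid/{buildid}/{kind}"
    if kind == "source":
        if not source_path.startswith("/"):
            raise ValueError("source path must be absolute")
        # Percent-encode unusual characters but keep the '/' separators (an assumption).
        return f"{base}/buildid/{buildid}/source{quote(source_path, safe='/')}"
    raise ValueError(f"unknown artifact kind: {kind!r}")


if __name__ == "__main__":
    bid = "7ca24d4dc3de9d62d9ad6bb25e5b70a3e57a342f"  # example build-id
    for server in debuginfod_servers() or ["https://debuginfod.elfutils.org"]:
        print(artifact_url(server, bid, "executable"))
        print(artifact_url(server, bid, "source", "/usr/include/stdio.h"))
```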
Re: [PATCH v5 00/21] kprobes: Unify kretprobe trampoline handlers and make kretprobe lockless
Masami Hiramatsu writes:

> Sorry, for noticing this point, I Cc'd to systemtap. Is systemtap taking
> care of spinlock too?

On PREEMPT_RT configurations, systemtap uses the raw_spinlock_t
types/functions, to keep its probe handlers as atomic as we can make them.

- FChE
Re: [PATCH v4 00/10] Function Granular KASLR
Hi -

> > We have relocated based on sections, not some subset of function
> > symbols accessible that way, partly because DWARF line- and DIE- based
> > probes can map to addresses some way away from function symbols, into
> > function interiors, or cloned/moved bits of optimized code. It would
> > take some work to prove that function-symbol based heuristic
> > arithmetic would have just as much reach.
>
> Interesting. Do you have an example handy?

No, I'm afraid I don't have one that I know cannot possibly be
expressed by reference to a function symbol only. I'd look at
systemtap (4.3) probe point lists like:

% stap -vL 'kernel.statement("*@kernel/*verif*.c:*")'
% stap -vL 'module("amdgpu").statement("*@*execution*.c:*")'

which give an impression of computed PC addresses.

> It seems like something like that would reference the enclosing
> section, which means we can't just leave them out of the sysfs
> list... (but if such things never happen in the function-sections,
> then we *can* remove them...)

I'm not sure we can easily prove they can never happen there.

- FChE
Re: [PATCH v4 00/10] Function Granular KASLR
Hi -

On Mon, Aug 03, 2020 at 01:11:27PM -0700, Kees Cook wrote:
> [...]
> > Systemtap needs to know base addresses of loaded text & data sections,
> > in order to perform relocation of probe point PCs and context data
> > addresses. It uses /sys/module/, kind of under protest, because
> > there seems to exist no MODULE_EXPORT'd API to get at that information
> > some other way.
>
> Wouldn't /proc/kallsysms entries cover this? I must be missing
> something...

We have relocated based on sections, not some subset of function
symbols accessible that way, partly because DWARF line- and DIE- based
probes can map to addresses some way away from function symbols, into
function interiors, or cloned/moved bits of optimized code. It would
take some work to prove that function-symbol based heuristic
arithmetic would have just as much reach.

- FChE
Re: [PATCH v4 00/10] Function Granular KASLR
Hi -

> > While this does seem to be the right solution for the extant problem, I
> > do want to take a moment and ask if the function sections need to be
> > exposed at all? What tools use this information, and do they just want
> > to see the bounds of the code region? (i.e. the start/end of all the
> > .text* sections) Perhaps .text.* could be excluded from the sysfs
> > section list?
>
> [[cc += FChE, see [0] for Evgenii's full mail ]]

Thanks!

> It looks like debugging tools like systemtap [1], gdb [2] and its
> add-symbol-file cmd, etc. peek at the /sys/module//section/ info.
> But yeah, it would be preferable if we didn't export a long sysfs
> representation if nobody actually needs it.

Systemtap needs to know base addresses of loaded text & data sections,
in order to perform relocation of probe point PCs and context data
addresses. It uses /sys/module/, kind of under protest, because there
seems to exist no MODULE_EXPORT'd API to get at that information some
other way.

- FChE
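Section-based relocation, as described above, amounts to adding a section-relative offset (for example one derived from debuginfo) to that section's load address in the running kernel. A minimal Python sketch, assuming the one-file-per-section layout of `/sys/module/<module>/sections/` with each file holding a hex address. Reading real addresses normally requires root (unprivileged readers may see zeroes), so the demo builds a fake tree; the module name `demo` and the offset are hypothetical.

```python
import os
import tempfile
from typing import Dict


def read_section_bases(module: str, sysfs_root: str = "/sys/module") -> Dict[str, int]:
    """Map section name -> load address, read from <sysfs_root>/<module>/sections/*."""
    sec_dir = os.path.join(sysfs_root, module, "sections")
    bases: Dict[str, int] = {}
    for name in os.listdir(sec_dir):  # names such as ".text", ".data", ".init.text"
        with open(os.path.join(sec_dir, name)) as f:
            bases[name] = int(f.read().strip(), 16)  # files hold "0x..." strings
    return bases


def relocate(bases: Dict[str, int], section: str, offset: int) -> int:
    """Turn a (section, section-relative offset) pair into a runtime address."""
    return bases[section] + offset


if __name__ == "__main__":
    # Fake sysfs tree for a hypothetical module "demo", so this runs unprivileged.
    root = tempfile.mkdtemp()
    os.makedirs(os.path.join(root, "demo", "sections"))
    with open(os.path.join(root, "demo", "sections", ".text"), "w") as f:
        f.write("0xffffffffc0a01000\n")
    bases = read_section_bases("demo", sysfs_root=root)
    print(hex(relocate(bases, ".text", 0x1A4)))  # prints 0xffffffffc0a011a4
```

Keeping per-section bases, rather than per-function symbols, means any PC inside a section (a function interior, or a cloned fragment of optimized code) relocates with the same single addition.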
systemtap 4.3 release
The SystemTap team announces release 4.3.

Enhancements to this release include: userspace probes may be targeted by buildid as an alternative to a path name; script functions may use probe $context variables; and stapbpf improvements, including try-catch statements and error probes.

= Where to get it

https://sourceware.org/systemtap/ - our project page
https://sourceware.org/systemtap/ftp/releases/
https://koji.fedoraproject.org/koji/packageinfo?packageID=615
git tag release-4.3 (commit c9c23c987d)

There have been over 120.31415 commits since the last release.
There have been 27+ bugs fixed / features added since the last release.

= SystemTap frontend (stap) changes

- The target of process probes may be specified by hexadecimal buildid
  as an alternative to a path name. This makes it possible to probe a
  variety of versions or aliases of a program, even if they are running
  inside containers under a different path name. Works best with a
  debuginfod server that publishes the executables / debuginfo. The
  following probes glibc.so 2.32-2.fc32.x86_64 from Fedora, wherever it
  is running on your machine:

  # export DEBUGINFOD_URLS=https://debuginfod.elfutils.org/
  # stap -e 'probe process("7ca24d4dc3de9d62d9ad6bb25e5b70a3e57a342f").function("*system") { log("hi") }'

- Functions can now be context-sensitive, meaning that they may make
  references to $context variables and similar constructs that could
  formerly appear only inside probe handlers. This is implemented by
  cloning such functions for each probe. Only some probe point types
  (dwarf-based user & kernel) are supported.

  function foo () { println ($$vars) }
  probe kernel.function("do_exit") { foo() }
  probe process("/bin/ls").function("main") { foo() }
  probe process("/lib*/libc.so.6").mark("*") { foo() }

- The process(EXE).begin probe handlers are now always triggered for
  already-running target processes.

= SystemTap backend changes

- Almost all of the kmalloc() allocations exceeding 4KB have been
  replaced by vmalloc(). This helps stap's kernel runtime work properly
  on systems with serious fragmentation in physical memory address
  space.

- More $variable resolution errors may be generated, especially for
  @var("") constructs that target global variables. These are
  duplicate-eliminated by default, but may be seen with verbosity>=2.

- The stapbpf backend now supports try-catch statements, an improved
  error tapset, and error probes.

- The "Build-id mismatch" condition now becomes a warning, so while
  related probes are not inserted, the rest of the script may run.

= SystemTap tapset changes

- Added a new tapset function dump_stack() which prints the current
  kernel backtrace to the kernel trace buffer (as a thin wrapper around
  the kernel C API function dump_stack).

- The proc_mem_rss() tapset function now includes the resident shared
  memory pages as expected. The old behavior can be restored by the
  --compatible=4.2 option on the command line.

- Modules compiled with guru mode for a particular kernel version can
  now only be loaded on kernels with exactly matching version (vermagic
  string) instead of any kernel whose API matches according to the
  modversions mechanism. Use -B CONFIG_MODVERSIONS=y to restore the
  prior behaviour.

= SystemTap sample scripts

- All 180+ examples can be found at https://sourceware.org/systemtap/examples/

- New sample scripts:

  security-band-aids/cve-2018-101.stp
  security-band-aids/cve-2018-6485
      Historical emergency security band-aid scripts for example
      purposes only

= Examples of tested kernel versions

2.6.32 (RHEL6 x86_64)
3.10.0 (RHEL7 x86_64)
4.15.0 (Ubuntu 18.04 x86_64)
4.18.0 (RHEL8 x86_64, aarch64, ppc64le, s390x)
5.3.8 (Fedora 30 i686)
5.3.9 (Fedora 31 x86_64)
5.4.0 (Fedora 32 x86_64)
5.7.0 (Fedora 33 x86_64)

= Known issues with this release

- A change to syscall wrappers has resulted in the loss of the ability
  to modify syscall parameters. (PR26015)

- An sdt probe cannot parse a parameter that uses a segment register.
  (PR13429)

- The presence of a line such as
  *CFLAGS += $(call cc-option, -fno-var-tracking-assignments)
  in the linux kernel Makefile unnecessarily reduces debuginfo quality;
  consider removing that line if you build kernels.

= Contributors for this release

Aaron Merey, Alice Zhang*, Craig Ringer*, Frank Ch. Eigler,
Frank Sorenson*, HATAYAMA Daisuke*, Juri Lelli*, Sagar Patel,
Serhei Makarov, Siddhesh Poyarekar, William Cohen, Yichun Zhang (agentzh)

Special thanks to new contributors, marked with '*' above.

= Bugs fixed for this release <https://sourceware.org/PR#>

6834  stap-client should not use bash network redirections
10280 allow relaxing of `uname -r` matching runtime assertion ro ABI-compatible kernel series
11249 uprobes fails on glibc get-pc-thunk ca
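The 4.3 proc_mem_rss() change is about which page counters get summed. Per proc(5), the `VmRSS` line of `/proc/<pid>/status` (Linux 4.5+) is the sum of `RssAnon`, `RssFile` and `RssShmem`. A Python sketch of that arithmetic, where `include_shmem=False` stands in for the older behaviour that left resident shared memory out; the helper names are hypothetical and this is not the tapset's code.

```python
from typing import Dict


def parse_status(text: str) -> Dict[str, int]:
    """Collect the 'Key:   <n> kB' lines of /proc/<pid>/status as integers (kB)."""
    fields: Dict[str, int] = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and len(parts) == 2 and parts[1] == "kB":
            fields[key] = int(parts[0])
    return fields


def rss_kb(fields: Dict[str, int], include_shmem: bool = True) -> int:
    """Resident set size in kB; include_shmem=False models leaving shared memory out."""
    total = fields["RssAnon"] + fields["RssFile"]
    if include_shmem:
        total += fields["RssShmem"]
    return total


if __name__ == "__main__":
    try:
        with open("/proc/self/status") as f:
            fields = parse_status(f.read())
        print("sum:", rss_kb(fields), "kB, VmRSS:", fields["VmRSS"], "kB")
    except (OSError, KeyError) as err:  # not Linux, or a kernel older than 4.5
        print("cannot read RSS breakdown:", err)
```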
Re: [PATCH 2/3] module: Fix up module_notifier return values.
Hi -

> > While auditing all module notifiers I noticed a whole bunch of fail
> > wrt the return value. Notifiers have a 'special' return semantics.

From peterz's comments and the patches, it's not obvious to me how one
is to choose between 0 (NOTIFY_DONE) and 1 (NOTIFY_OK) in the case of a
routine success.

> [...]
> I have a similar erroneous module notifier return value pattern
> in lttng-modules as well. I'll go fix it right away. CCing
> Frank Eigler from SystemTAP which AFAIK use a copy of
> lttng-tracepoint.c in their project, which should be fixed
> as well. I'm pasting the lttng-modules fix below.

Sure, following suit. Thanks.

- FChE
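On the 0 versus 1 question, the Python model below mirrors the constants and helpers of the kernel's `include/linux/notifier.h` (`NOTIFY_DONE`, `NOTIFY_OK`, `NOTIFY_STOP_MASK`, `NOTIFY_BAD`, `NOTIFY_STOP`, `notifier_from_errno()`, `notifier_to_errno()`) plus the stop-on-mask loop of `notifier_call_chain()`. In this model, a caller that decodes the result with `notifier_to_errno()` sees success for both `NOTIFY_DONE` and `NOTIFY_OK`; only values carrying `NOTIFY_STOP_MASK` end the chain. It is a userspace sketch, not kernel code, so check the header in your own tree.

```python
from typing import Callable, Iterable

# Values mirrored from include/linux/notifier.h.
NOTIFY_DONE = 0x0000                         # don't care
NOTIFY_OK = 0x0001                           # suits me
NOTIFY_STOP_MASK = 0x8000                    # don't call further
NOTIFY_BAD = NOTIFY_STOP_MASK | 0x0002       # bad/veto action
NOTIFY_STOP = NOTIFY_OK | NOTIFY_STOP_MASK   # clean way to stop further calls


def notifier_from_errno(err: int) -> int:
    """Encode a negative errno as a notifier result that also stops the chain."""
    if err:
        return NOTIFY_STOP_MASK | (NOTIFY_OK - err)
    return NOTIFY_OK


def notifier_to_errno(ret: int) -> int:
    """Decode a notifier result back to 0 or a negative errno."""
    ret &= ~NOTIFY_STOP_MASK
    return NOTIFY_OK - ret if ret > NOTIFY_OK else 0


Callback = Callable[[int, object], int]


def call_chain(callbacks: Iterable[Callback], action: int, data: object) -> int:
    """Model of notifier_call_chain(): run callbacks until one sets NOTIFY_STOP_MASK."""
    ret = NOTIFY_DONE
    for cb in callbacks:
        ret = cb(action, data)
        if ret & NOTIFY_STOP_MASK:
            break
    return ret
```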
systemtap 4.0 release
ly of executables run on the system
  cpu_throttle.stp        Monitor Intel processors for throttling due to
                          power or thermal limits
  syscallsbypid.stp       Provide a per-process syscall tally on the system
  syscallerrorsbypid.stp  Provide a per-process syscall error tally
  syscalllatency.stp      Provide a per-process accumulation of syscall latency

- New stap-exporter-scripts/ subdirectory in systemtap.examples.

- Numerous example script improvements and new samples galore:

  gmalloc_watch.stp    Tracing glib2 memory allocations
  ioctl_handler.stp    Monitor which executables use ioctl syscalls and
                       what kernel code is handling the ioctl
  libguestfs_log.stp   Trace libguestfs startup
  measureinterval.stp  Measure intervals between events
  php-trace.stp        Tracing of PHP code execution
  stap_time.stp        Provide elapsed times for passes of SystemTap
                       script compilation
  tcl-funtop.stp       Profile Tcl calls
  tcl-trace.stp        Callgraph tracing of Tcl code
  cve-2018-14634.stp   historical emergency security band-aid, for
                       reference/education only

= Examples of tested kernel versions

2.6.32 (RHEL 6 x86_64, i686)
3.10.0 (RHEL 7 x86_64)
4.15.0 (Ubuntu 18.04 x86_64)
4.16.13 (Fedora 28 x86_64)
4.18.0 (Fedora x86_64)
4.18.12 (Fedora 28 x86_64, arm64, ppc64)
4.19-rc7 (Fedora Rawhide x86_64)

= Known issues with this release

- Some kernel crashes continue to be reported when a script probes
  broad kernel function wildcards. (PR2725)

- An upstream kernel commit #2062afb4f804a put
  "-fno-var-tracking-assignments" into KBUILD_CFLAGS, dramatically
  reducing debuginfo quality, which can cause debuginfo failures. The
  simplest fix is to erase, excise, nay, eradicate this line from the
  top level linux Makefile:

  KBUILD_CFLAGS += $(call cc-option, -fno-var-tracking-assignments)

= Coming soon

- prometheus-exporter is here, more tasty systemtap & http chocolate en route

= Contributors for this release

Aaron Merey, David Smith, Frank Ch. Eigler, Jafeer Uddin, Martin Cermak,
Masanari Iida, *Paulo Andrade, Serhei Makarov, Stan Cox, Victor Kamensky,
William Cohen, Yichun Zhang (agentzh), *Zexuan Luo

Special thanks to new contributors, marked with '*' above.

Special thanks to Serhei Makarov for assembling these notes.

= Bugs fixed for this release <https://sourceware.org/PR#>

14690 the syscall tapsets could be written to prefer the 'syscalls' tracepoints
21888 bpf variants of log()/etc. functions
22310 build parser syntax for all the new staptree types
23160 4.17 breaks syscalls tapset
23284 dmesg should identify the name of the stap script
23356 server.exp test case hangs on rawhide
23359 impose security constraints on @kderef, @kregister
23407 bpf: backend should support strings as first class values
23480 bpfinterp.cxx should respond to ^C
23488 support CONFIG_DEBUG_INFO_REDUCED builds
23510 Tapset function println() not supported in the bpf runtime
23599 Use of usymname() with stap -u leads to kernel module compilation errors
23608 long stapregex overflows arc_priority
23666 Aggregate operations specified in foreach loop is not respected by the translator
23736 rawhide 4.19 kernel panic during tracepoint enumeration
23760 .statement() wildcard probes fail if any cu/srcfile lacks debug_line data
23766 staprun -R (default) fails for modules with short hardcoded -m names
Re: Code of Conduct: Let's revamp it.
Rik van Riel writes:

> [...] The goal of the code of conduct is to make the community
> welcoming, and to help people with being a part of the Linux
> community. [...]

That may well be the goal. But the proper way to evaluate policy is
not the laudability of its goals but its foreseeable and/or actual
effects. Is there any plan to evaluate the CoC empirically somehow to
see if it accomplishes what its proponents hope?

- FChE
systemtap 3.3 release
The SystemTap team announces release 3.3!

eBPF backend extensions, easier access to examples, adapting to
meltdown/spectre complications, real-time / high-cpu-count concurrency
fixes

= Where to get it

https://sourceware.org/systemtap/ - our project page
https://sourceware.org/systemtap/ftp/releases/systemtap-3.3.tar.gz
https://koji.fedoraproject.org/koji/packageinfo?packageID=615
git tag release-3.3 (commit 48867d1cface944)

There have been over 237 commits since the last release.
There have been over 19 bugs fixed / features added since the last release.

= How to build it

See the README and NEWS files at
https://sourceware.org/git/?p=systemtap.git;a=tree

Further information at https://sourceware.org/systemtap/wiki/

= SystemTap frontend (stap) changes

- The "stap --sysroot /PATH" option has received a revamp, so it works
  much better against cross-compiled environments.

- A new "stap --example FOO.stp" mode searches the example scripts
  distributed with systemtap for a file named FOO.stp, so its whole
  path does not need to be typed in.

= SystemTap backend changes

- The eBPF backend now supports uprobes, perf counter, timer, and
  tracepoint probes.

- The eBPF backend has learned to perform loops - at least in the
  userspace "begin/end" probe contexts, so one can iterate across BPF
  arrays for reporting. (The linux kernel eBPF interpreter precludes
  loops and string processing.) It can also handle much larger probe
  handler bodies, with a smarter register spiller/allocator.

- Systemtap's runtime has learned to deal with some of the collateral
  damage from kernel hardening after meltdown/spectre, including more
  pointer hiding and relocation. The kptr_restrict procfs flag is
  forced on if running on a new enough kernel.

- Several low-level locking fixes were made to the parts of the
  runtime that use the uprobes/tracepoint APIs, so that it works more
  reliably on real-time kernels and on high-cpu-count machines.

= SystemTap tapset changes

- Runtime/tapsets were ported to support kernels up to version 4.17.
  (The syscall tapsets are broken on kernel 4.17-rc, and will be fixed
  in an upcoming release; PR23160.)

- Some MIPS support has been added.

= SystemTap sample scripts

All 178 examples can be found at https://sourceware.org/systemtap/examples/

- io_submit.stp has been optimized for larger systems
- new example capture_ssl_master_secrets.stp is just as naughty as it sounds

= Examples of tested kernel versions

2.6.32 (RHEL 6 x86 and x86_64)
3.10.0 (RHEL 7 x86_64)
4.16.5 (Fedora 27 x86_64)
4.18-rc0 (Fedora rawhide x86_64)

= Known issues with this release

- The syscall tapset is broken for kernels >= 4.17. Use the
  kernel.trace("sys_enter") probe until we get this fixed. (PR23160)

- Some post-meltdown/spectre kernel versions have broken uprobes
  (resulting in SIGILL in userspace programs) and kernel tracepoints.
  Kernel fixes are underway. (RHBZ1579521)

- Some kernel crashes continue to be reported when a script probes
  broad kernel function wildcards. (PR2725)

- An upstream kernel commit #2062afb4f804a put
  "-fno-var-tracking-assignments" into KBUILD_CFLAGS, dramatically
  reducing debuginfo quality, which can cause debuginfo failures. The
  simplest fix is to erase, excise, nay, eradicate this line from the
  top level linux Makefile:

  KBUILD_CFLAGS += $(call cc-option, -fno-var-tracking-assignments)

= Coming soon

- http and systemtap coming together, like peanut butter and chocolate

= Contributors for this release

Aaron Merey, *Aryeh Weinreb, *Bernhard Wiedemann, David Smith,
Frank Ch. Eigler, *Gustavo Moreira, *Igor Gnatenko, *Iryna Shcherbina,
*Jafeer Uddin, Jeff Moyer, *Lukas Herbolt, Mark Wielaard, Martin Cermak,
*Petr Viktorin, Serhei Makarov, Stan Cox, Stefan Hajnoczi,
Timo Juhani Lindfors, Victor Kamensky

Special thanks to new contributors, marked with '*' above.

= Bugs fixed for this release <https://sourceware.org/PR#>

21107 a few more access_ok tweaks needed
21890 bpf uprobes support
22004 dyninst does not handle R_*_IRELATIV in .rela.plt
22141 The RPM specfile needs an update handling the bpf bits
22248 failure processing linux-vdso64.so.1
22311 bpf: drop the copy of the bpf map logic & snapshot-based pre-post begin {} synch
22313 bpf: exit-state checking prologue
22314 bpf: add support for uprobes, uretprobe and tracepoint events
22323 bpf: format string tags appearing in output when wildcards are used
22327 the loadavg tapset no longer works on recent kernels
22328 bpf: add timer probes
22462 quoted include path
22536 Add shorthand option --bpf for --runtime=bpf
22551 on rawhide, we're getting a compile error that init_timer() doesn't exist
22695 "
Re: [RFC PATCH tip/master 0/3] kprobes: tracing: kretprobe_instance dynamic allocation
mhiramat wrote:

> Here is a correction of patches to introduce kretprobe_instance
> dynamic allocation for avoiding kretprobe silently miss-hits.
> [...]

Thanks, this looks automatically useful also to systemtap users.

- FChE
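A toy Python model of how a kretprobe can "silently miss" returns: each in-flight probed call needs a free instance from a pool sized by `maxactive`, and when the pool is empty the return is not hooked and only the `nmissed` counter records it. The `dynamic` flag is a hypothetical stand-in for the on-demand allocation proposed in this patch set; the class is illustrative accounting only, not the kernel's data structure.

```python
from typing import List, Tuple


class KretprobeModel:
    """Toy accounting model of a kretprobe's instance pool (not kernel code)."""

    def __init__(self, maxactive: int, dynamic: bool = False) -> None:
        self.free = maxactive    # preallocated instances not currently in use
        self.dynamic = dynamic   # hypothetical: allocate a new instance when the pool is empty
        self.nmissed = 0         # calls whose return could not be hooked
        self.hooked_returns = 0  # returns that reached the return handler

    def enter(self) -> bool:
        """Probed function entry; returns False when the return will be missed."""
        if self.free > 0:
            self.free -= 1
            return True
        if self.dynamic:
            return True
        self.nmissed += 1        # the only trace of the miss is this counter
        return False

    def leave(self, hooked: bool) -> None:
        """Probed function return; only hooked calls reach the handler."""
        if hooked:
            self.hooked_returns += 1
            self.free += 1       # the instance becomes reusable (simplified)


def simulate(maxactive: int, in_flight: int, dynamic: bool) -> Tuple[int, int]:
    """Enter `in_flight` calls before any return (deep recursion or many CPUs)."""
    kp = KretprobeModel(maxactive, dynamic)
    hooked: List[bool] = [kp.enter() for _ in range(in_flight)]
    for h in reversed(hooked):
        kp.leave(h)
    return kp.nmissed, kp.hooked_returns


if __name__ == "__main__":
    print("fixed pool (nmissed, hooked):", simulate(2, 3, dynamic=False))
    print("dynamic    (nmissed, hooked):", simulate(2, 3, dynamic=True))
```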
Re: [RFC][PATCH 00/21] tracing: Inter-event (e.g. latency) support
Hi, Tom -

tom.zanussi wrote:
> [...]
>> Hmm, this looks a bit hard to understand, I guess that onmatch() means
>> "if there is an event which has ts0 variable and the event's key matches
>> this key, take some action".
>
> Yes, that's pretty much it. It's essentially shorthand for this kind of
> common idiom, where timestamp[] is an associative array, which in our
> case is the tracing_map of the histogram:
>
> event sched_wakeup()
> {
>     ts0[wakeup_pid] = now()
> }
> event sched_switch()
> {
>     if (ts0[next_pid])
>         latency = now() - ts0[next_pid] /* next_pid == wakeup_pid */
> }

By the way, here is a working systemtap version of this demo:

# cat foo.stp
global ts0%, latency%
function now() { return gettimeofday_us() }
probe kernel.trace("sched_wakeup") {
  ts0[$p->pid] = now()
}
probe kernel.trace("sched_switch") {
  if (ts0[$next->pid])
    latency[$next->pid,$next->prio] <<< now() - ts0[$next->pid];
}
probe timer.s(5) {
  foreach ([pid+,x] in latency) {
    println("pid:", pid, " prio:", x)
    print(@hist_log(latency[pid,x]))
  }
  delete latency
}
# stap foo.stp
[...]
pid:20183 prio:109
value |-- count
    2 | 0
    4 | 0
    8 |@ 1
   16 | 0
   32 | 0
pid:29095 prio:120
value |-- count
    0 |1
    1 |8
    2 |@@ 76
    4 |@@ 60
    8 |@@ 68
   16 | 16
   32 |0
   64 |0
[...]

> ts0 is basically a per-table-entry variable - there's one for each
> entry in the table, and it can only be accessed by events with
> matching keys. [...] So, that's a long-winded way of saying that the
> name ts0 is global across all tables (histograms) but an instance of
> ts0 is local to each entry in the table that owns the name.

In systemtap, one of the things we take care of is automatic
concurrency control over such shared variables. Even if many CPUs run
these same functions and try to access the same ts0/latency hash
tables at the same time, things will work correctly. I'm curious how
your code deals with this.

- FChE
Re: [RFC][PATCH] x86: Verify access_ok() context
Hi, Thomas -

> Well, if you are not in thread context then the check is pointless:
>     __range_not_ok(addr, size, user_addr_max())
> and:
>     #define user_addr_max() (current->thread.addr_limit.seg)
>
> So what guarantees when you are not in context of current, i.e. in thread
> context, that the addr/size which is checked against the limits of current
> actually belongs to current?

We're probably in task context, in that there is a valid current(),
but running with preemption and/or interrupts and/or pagefaults
disabled at that point, so in_task() objects. Think of it as similar
to a kprobes handler callback, except perhaps with more temporary
preemption blocking.

> I assume this is about systemtap modules. Can you please explain
> what you are trying to achieve? I guess you know that you actually
> access current, but then we need a seperate special function and not
> relaxing of the checks.

This check is used in a part of the runtime that is a userspace
analogue of probe_kernel_address(), where we're given a potential
userspace address. We would like to quickly test whether it is even
plausible as a userspace address before doing a (pagefault-disabled)
trial fetch/store to it.

- FChE
Re: [RFC][PATCH] x86: Verify access_ok() context
Hi, Thomas -

On Thu, Jan 19, 2017 at 07:12:48PM +0100, Thomas Gleixner wrote:
> [...]
> It does matter very much, because the fact that the warning triggers tells
> me that it's placed in code which is NOT executed in task context.
> [...]
> We are not papering over problems.

Understood. We were interpreting the comments around access_ok to mean
that the underlying hazard condition was different (stricter) than
in_task(). If the warning could be made to match that hazard condition
more precisely, then safe but non-in_task() callers could use
access_ok() without triggering it.

- FChE
Re: BPF runtime for systemtap
brendan.d.gregg wrote:
> [...]
> Great! Is there a hello world example in there somewhere? I found this:
> [...]

Yup. Here is a smoke test. (A great many other things are not yet working.)

% sudo ./stap -v --runtime=bpf -e 'global foo
probe kprobe.function("vfs_read"), kprobe.function("do_select") { foo++ }
probe begin { printf("systemtap starting probe\n") }
probe end { printf("systemtap ending probe\n"); printf("foo = %d\n", foo) }'
Pass 1: parsed user script and 35 library scripts using 198460virt/15804res/6416shr/9208data kb, in 0usr/0sys/71real ms.
Pass 2: analyzed script: 4 probes, 0 functions, 0 embeds, 1 global using 198460virt/15804res/6416shr/9208data kb, in 0usr/0sys/0real ms.
Pass 4: compiled BPF into "stap_32349.bo" in 0usr/0sys/0real ms.
Pass 5: starting run.
systemtap starting probe
^Csystemtap ending probe
foo = 108812
Pass 5: run completed in 0usr/10sys/2525real ms.
systemtap 3.0 release
"nfs")}.function("nfs*")! => kernel.function("nfs*")!, module("nfs").function("nfs*")!

- Profiling timers at arbitrary frequencies are now provided, and perf
  probes now support a frequency field as an alternative to sampling
  counts.

    probe timer.profile.freq.hz(N)
    probe perf.type(N).config(M).hz(X)

  The specified frequency is only accurate up to around 100 Hz. You may
  need to provide a higher value to achieve the desired rate.

- Added support for private global variables and private functions.
  The 'private' keyword limits these to the tapset file they are
  defined in.

= SystemTap tapset changes

ansi.stp
    Functions ansi_set_color{2,3} are replaced by overloaded ansi_set_color.
linux/[arm/]aux_syscalls.stp
    Support for arm kernels less than 3.7.
linux/arm/[nd_]syscalls.stp
    Support for [nd_]syscall.execve for arm kernels less than 3.7.
linux/aux_syscalls.stp
    New _stp_mlock2_str function to convert mlock2 syscall flags to a string.
linux/context.stp
    New module_size() function.
linux/conversions.stp
    - New kernel_string_quoted_utf[16|32] functions combine @string_quoted
      and @kernel_string_utf*.
    - kernel_string* functions with alternative error strings are replaced
      by overloaded variants.
linux/nd_syscalls.stp
    Add nd_syscall.mlock2 kprobe-based probe point.
linux/perf.stp
    Update recent uapi/linux/perf_event.h bits.
linux/proc_mem.stp
    proc_mem_*_pid functions are replaced by overloaded proc_mem_*.
linux/syscalls.stp
    Add syscall.mlock2 kernel function probe point.
linux/task.stp
    New task_cwd_path and task_exe_file functions.
linux/task_time.stp
    task_{s,u}time_tid functions are replaced by overloaded task_{s,u}time.
linux/uconversions.stp
    user_string functions with alternative warning strings are replaced by
    overloaded variants.
logging.stp
    New overloaded variant of assert, assert(expression).
print_stats.stpm
    @prints* macros for printing stats.
timers.stp
    Probe point timer.profile.freq for profiling.
try_assign.stpm
    New @try_assign macro.
uconversions.stp
    Add user_string_quoted_utf[16|32] function that quotes a given
    UTF-[16|32] string from a given user address.

- Internal tapset functions and global variables are marked as private,
  where possible.
- Some tapsets have been modified to make use of the new function
  overloading feature. Instead of having new function names with
  suffixes such as "2" or "pid" to indicate extra arguments, the
  functions now appear to have optional arguments.
- New tapset function string_quoted() to quote and \-escape general
  strings. String $context variables that are pretty-printed are now
  processed with such a quotation engine, falling back to a 0x%x (hex
  pointer) on errors.
- Functions get_mmap_args() and get_32mmap_args() have been deprecated.

= SystemTap sample scripts - now at 156 samples!

who_sent_it.stp
    Trace outgoing network packets using the netfilter probes, printing
    the source thread name/id and destination host:port.

- New to the collection is a selection of security band-aids for
  specific CVEs. They are historical emergency security band-aids, and
  are for reference/education only. The scripts can be found under the
  security-band-aids folder in the examples directory.
- A number of samples were tweaked for portability and demonstration
  of newer language/tapset facilities.

= Examples of tested kernel versions

2.6.18 (RHEL 5 x86 and x86_64)
2.6.32 (RHEL 6 x86 and x86_64)
3.10.0 (RHEL 7 x86_64)
4.1.6 (Fedora 22 x86_64)
4.3.4 (Fedora 22 x86_64)
4.6.0-rc0 (Fedora rawhide x86_64)

= Known issues with this release

- Some kernel crashes continue to be reported when a script probes
  broad kernel function wildcards. (PR2725)
- The dyninst backend is still very much a prototype, with a number of
  issues, limitations, and general teething woes. See dyninst/README
  and the systemtap/dyninst Bugzilla component
  (http://tinyurl.com/stapdyn-PR-list) if you want all the gory details
  about the state of the feature.
- An upstream kernel commit #2062afb4f804a put
  "-fno-var-tracking-assignments" into KCFLAGS, reducing debuginfo
  quality, which can cause debuginfo failures. A proposed workaround to
  this issue exists in: https://lkml.org/lkml/2014/11/21/505 . Fedora
  kernels are not affected by this issue.

= Contributors for this release

Abegail Jakop, David S
systemtap 2.9 release
ing is truncated on older kernels, such as 2.6.32 (PR15757)
- The dyninst backend is still very much a prototype, with a number of
  issues, limitations, and general teething woes. For instance:
  + lack of support for multiarch/cross-instrumentation
  + tapset functions are still incomplete relative to what is supported
    when the kernel backend is active
  + exception handling becomes completely broken in programs
    instrumented by the current version of dyninst (PR14702)
  + not all registers are made available on 32-bit x86 (PR15136)
  See dyninst/README and the systemtap/dyninst Bugzilla component
  (http://tinyurl.com/stapdyn-PR-list) if you want all the gory details
  about the state of the feature.

= Contributors for this release

Abegail Jakop, David Smith, Felix Lu, Frank Ch. Eigler, Ivan Diorditsa*,
Jose Castillo*, Josh Stone, Lukas Berk, Mark Wielaard, Martin Cermak,
Mikhail Kulemin*, Nicolas Brito*, Snehal Phule*

Special thanks to new contributors, marked with '*' above.

Special thanks to Felix Lu for compiling these notes.

= Bugs fixed for this release <https://sourceware.org/PR#>

909 perf counter events, perfmon? kernel API
2111 document syscalls tapset
10487 flight recorder control from script
10977 Getting the address size used in a module
11263 exposing foo32 syscalls
12151 support /* stable */ embedded-c pragma
13664 support dwarf types for stap variables
15972 core dump with process probes
16493 Improve bkl.stp to add backtrace
16968 bad formatting in many help pages for probes
17831 kprobes_onthefly.exp fails on powerpc
17893 el6: cannot stat `build/en-US/pdf/*SystemTap_Beginners_Guide*.pdf': No such file or directory
17920 File descriptor to pathname function
17921 kernel backtrace missing /proc/kallsyms symbols
18455 const_folder::visit_binary_expression hurting type inference
18462 macro deprecation
18503 procfs .maxsize() overflow should generate error
18555 ppc64le: can't probe demangled C++ function names
18562 the listing_mode.exp test case has lots of errors on systems without uprobes
18563 on ppc64, the mbrwatch.stp example script fails when tested
18571 Tapset support and test coverage for bpf and seccomp syscalls.
18577 on rhel7, listing_mode_sanity.exp always gets a failure when doing 'stap -l **'
18597 long_arg() doesn't correctly handle negative values in 32-on-64 environment
18598 stap_staticmarkers.stp tapset has no test case
18630 dwarfless parameters from a uprobe need test coverage
18634 on rawhide, using timer probes gets a compilation error
18649 int_arg() misbehaves on x86[_64] for 32-bit uprobe in binary having debuginfo
18650 powerpc variant of longlong_arg() for uprobes swaps the high and low half of its 64bit retval
18651 Possible nd_syscall tapset cleanup based on PR18597 fix
18711 Pass 4 failure on RHEL7 for examples netfilter_summary and netfilter_drop
18751 support a STAP_PRINTF() macro for use in embedded-C functions
18769 [ppc64BE/--dyninst] unknown operator @__compat_task
18827 consistency check for syscall and nd_syscall tapset
18856 nfsd.close probe alias fails on rawhide
18885 Use /* unmodified-fnargs */ in tapsets
18889 lost ability to probe kernel module initializers
18936 script cache will fail if $jiffies is referenced
18942 any script will include all the globals from tapset/argv.stp
18944 the ioblock.stp tapset fails to compile on RHEL7
18971 process_by_pid.exp issues
18999 error("") stall (causing similar assert() stall)
19000 several task tapset functions can cause kernel crash
19021 the tapset function task_dentry_path() should handle more than just files
19043 __bio_ino(), __rqstp_gid() and __rqstp_uid() can crash the kernel
19045 kernel_string_quoted() can crash the kernel
19057 _is_reset() can crash the rhel6 / s390 kernel
19065 task_fd_lookup() can crash the s390x kernel when invoked with an invalid input
19069 task_euid() doesn't compile on aarch64
19070 Call to __ustack_raw(0) causes 'Unknown symbol in module' on rhel6-

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: timing of module MODULE_STATE_COMING notifier
Hi, Rusty -

I wrote:
> [...]
> > Notifiers suck for stuff like this :( Module state has many steps,
> > so my preference has been to open-code explicit hooks. [...]
>
> You mean something like the trace_module_load()? (We will probably
> experiment with hooking into that tracepoint instead of the notifier.)
> [...]

It turns out this works OK, except for EXPORT_TRACEPOINT_SYMBOL_GPL.
Could we get a set of EXPORT_TRACEPOINT_SYMBOL_GPL's for the
trace/events/module.h tracepoints (at least module_load and
module_free)?

- FChE
Re: timing of module MODULE_STATE_COMING notifier
Hi, Rusty -

Thanks for your response!

> [...]
> > That patch also moved the MODULE_STATE_COMING notifier call to
> > complete_formation(), which is relatively early to its former
> > do_init_module() call site. It now precedes the parse_args(),
> > mod_sysfs_setup(), and trace_module_load() steps.
>
> Yes, parse_args() can enter the module, so you really want it before
> then.

Understood. (Perhaps mod_sysfs_setup() could sneak in ahead.)

> > Was the latter part of the change intended & necessary? It is
> > negatively impacting systemtap, which was relying on
> > MODULE_STATE_COMING being called from a fairly complete module
> > state - just before the actual initializer function call.
> Notifiers suck for stuff like this :( Module state has many steps,
> so my preference has been to open-code explicit hooks. [...]

You mean something like the trace_module_load()? (We will probably
experiment with hooking into that tracepoint instead of the notifier.)
A more hard-coded one with an in-kernel callee probably wouldn't help
module-resident clients like us.

- FChE
timing of module MODULE_STATE_COMING notifier
Hi, Rusty -

We just [1] came across your patch [2] from last year (merged into
3.17), wherein the RO/NX mapping settings for module sections were
moved to an earlier point in the module-loading sequence.

That patch also moved the MODULE_STATE_COMING notifier call to
complete_formation(), which is relatively early to its former
do_init_module() call site. It now precedes the parse_args(),
mod_sysfs_setup(), and trace_module_load() steps.

Was the latter part of the change intended & necessary? It is
negatively impacting systemtap, which was relying on
MODULE_STATE_COMING being called from a fairly complete module
state - just before the actual initializer function call.

[1] https://sourceware.org/bugzilla/show_bug.cgi?id=18889
[2] commit 4982223e51e8ea9d09bb33c8323b5ec1877b2b51
    Author: Rusty Russell
    Date:   Wed May 14 10:54:19 2014 +0930

- FChE
Re: [PATCH] x86/debug: Remove perpetually broken, unmaintainable dwarf annotations
Hi -

On Fri, May 29, 2015 at 03:27:16PM -0500, Josh Poimboeuf wrote:
> [...]
> > > Also, with the feature missing completely, maybe someone finds a method to
> > > introduce it in a maintainable fashion, while with the feature included
> > > upstream there's very little pressure to do that. As a bonus we'd also
> > > win a workable dwarf unwinder.
> >
> > Before doing something drastic like this, I think we should get Josh's
> > opinion, since I think he's working on a new (?) unwinder.
>
> I'd definitely like to replace all the asm DWARF CFI annotations with
> something more automated and robust.  So it doesn't really affect me
> whether they're ripped out now or replaced later.
> [...]
> Then again, I'm not sure how useful or reliable the existing annotations
> are anyway, so maybe it doesn't matter much.

In our experience as consumers of this CFI information for years in
systemtap, the annotations have been generally correct and reliable.
Their presence allows reliable, correct, and efficient kernel->userspace
backtracing as used in important systemtap scripts.

If the current complaint is primarily about testability, it would be
easy to add simple stap-based tests to the kernel to exercise the code
and confirm its operation.  Perhaps we could extract a specialized
self-contained test case (containing an unwinder).

I'm not in a position to judge the purported cost savings of removing
this code, but there is definitely a cost in the loss of useful
functionality, esp. with no replacement in sight.

- FChE
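To illustrate what CFI annotations give an unwinder, here is a toy sketch of CFI-driven stack walking that never consults a frame pointer. The single-rule table format (CFA = SP + offset, return address saved at CFA - 8, x86_64-style), the simulated stack, and all addresses are assumptions made for the example; real DWARF CFI encodes per-instruction rules through DW_CFA_* opcodes and also tracks callee-saved registers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One toy CFI rule: for PCs in [lo, hi), CFA = SP + cfa_offset and the
 * return address is saved at CFA - 8 (x86_64-style).  Invented format. */
struct cfi_row {
    uint64_t lo, hi;
    uint64_t cfa_offset;
};

#define STACK_BASE  0x7000u
#define STACK_WORDS 16

/* Simulated stack memory, addressed from STACK_BASE upwards. */
static uint64_t stack_mem[STACK_WORDS];

static uint64_t load64(uint64_t addr)
{
    assert(addr >= STACK_BASE && (addr - STACK_BASE) / 8 < STACK_WORDS);
    return stack_mem[(addr - STACK_BASE) / 8];
}

static const struct cfi_row *find_row(const struct cfi_row *tab, size_t n,
                                      uint64_t pc)
{
    for (size_t i = 0; i < n; i++)
        if (pc >= tab[i].lo && pc < tab[i].hi)
            return &tab[i];
    return NULL;
}

/* Record up to max PCs, starting from (pc, sp), using only the CFI table.
 * Stops when a PC has no CFI row.
 * (Real unwinders look up ra - 1 for caller frames; ignored here.) */
static size_t unwind(const struct cfi_row *tab, size_t n, uint64_t pc,
                     uint64_t sp, uint64_t *out, size_t max)
{
    size_t depth = 0;

    while (depth < max) {
        out[depth++] = pc;
        const struct cfi_row *row = find_row(tab, n, pc);
        if (row == NULL)
            break;
        uint64_t cfa = sp + row->cfa_offset;
        pc = load64(cfa - 8);   /* saved return address */
        sp = cfa;               /* caller's SP just before the call */
    }
    return depth;
}
```

For instance, with rows {0x300, 0x400, 16} and {0x200, 0x300, 32}, return addresses 0x250 and 0x180 stored at 0x7008 and 0x7028, and a start of pc = 0x350, sp = 0x7000, the walk yields 0x350, 0x250, 0x180 and stops because 0x180 has no row. Without the table, nothing on this stack says where each frame begins.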
Re: [PATCH] Kbuild: Add an option to enable GCC VTA
Hi, Josh -

On Fri, Apr 24, 2015 at 08:40:02AM -0400, Josh Boyer wrote:
> [...]
> Frank, did you rebase this against some newer tree or something?

Yes; the lib/Kconfig.debug part didn't apply to current git.

> Curious why you sent it again.

At least as a patch-ping; the poor-debuginfo problems are reported to
affect non-Fedora users too.

> > +ifdef CONFIG_DEBUG_INFO_VTA
> > +KBUILD_CFLAGS += $(call cc-option, -fvar-tracking-assignments)
> > +else
> > +KBUILD_CFLAGS += $(call cc-option, -fno-var-tracking-assignments)
> > +endif
> > +
>
> Is there a reason you moved this hunk under the DWARF4 options instead
> of modifying it in-place like the original patch did?

Yes, this version appears a little safer, in the sense that without
CONFIG_DEBUG_INFO, neither setting of CONFIG_DEBUG_INFO_VTA would
affect the CFLAGS.  (In fact, Jakub advises that the positive polarity
-fvar-tracking-assignments is redundant with -g, and the negative
polarity one only provides codegen-bug-protection in the
CONFIG_DEBUG_INFO case.)

- FChE
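The placement argument can be restated as a small decision function. This is a sketch that mirrors the Makefile hunk quoted above, assuming, as the mail says, that the DEBUG_INFO_VTA block sits inside "ifdef CONFIG_DEBUG_INFO"; the enum and function names are hypothetical, and cc-option's behaviour of silently dropping a flag the compiler rejects is ignored.

```c
#include <assert.h>
#include <stdbool.h>

/* Which var-tracking flag, if any, lands in KBUILD_CFLAGS. */
enum vta_choice { VTA_UNSET, VTA_ON, VTA_OFF };

static enum vta_choice vta_flag(bool debug_info, bool debug_info_vta)
{
    if (!debug_info)
        return VTA_UNSET;            /* neither VTA setting touches CFLAGS */
    return debug_info_vta ? VTA_ON   /* -fvar-tracking-assignments */
                          : VTA_OFF; /* -fno-var-tracking-assignments */
}

static const char *vta_flag_text(enum vta_choice c)
{
    switch (c) {
    case VTA_ON:  return "-fvar-tracking-assignments";
    case VTA_OFF: return "-fno-var-tracking-assignments";
    default:      return "";
    }
}
```

By contrast, the pre-patch Makefile added -fno-var-tracking-assignments unconditionally, outside the CONFIG_DEBUG_INFO block, so it applied to every configuration.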
[PATCH] Kbuild: Add an option to enable GCC VTA
From: Josh Stone <jist...@redhat.com>

Due to isolated gcc codegen issues, gcc -fvar-tracking-assignments was
unconditionally disabled in commit 2062afb4f804 ("Fix gcc-4.9.0
miscompilation of load_balance() in scheduler").  However, this reduces
the debuginfo coverage for variable locations, especially in inline
functions.  VTA is certainly not perfect either in those cases, but it
is much better than without.  With compiler versions that have fixed
the codegen bugs, we would prefer to have the better details for
SystemTap, and surely other debuginfo consumers like perf will benefit
as well.

This patch simply makes CONFIG_DEBUG_INFO_VTA an option.  I considered
Frank and Linus's discussion of a cc-option-like -fcompare-debug test,
but I'm convinced that a narrow test of an arch-specific codegen issue
is not really useful.  GCC has their own regression tests for this, so
I'd suggest GCC_COMPARE_DEBUG=-fvar-tracking-assignments-toggle is more
useful for kernel developers to test confidence.

In fact, I ran into a couple more issues when testing for this patch[1],
although neither of those had any codegen impact.
[1] https://bugzilla.redhat.com/show_bug.cgi?id=1140872

With gcc-4.9.2-1.fc22, I can now build v3.18-rc5 with Fedora's i686 and
x86_64 configs, and this is completely clean with GCC_COMPARE_DEBUG.

Cc: Jakub Jelinek <ja...@redhat.com>
Cc: Josh Boyer <jwbo...@fedoraproject.org>
Cc: Greg Kroah-Hartman <gre...@linuxfoundation.org>
Cc: Linus Torvalds <torva...@linux-foundation.org>
Cc: Andrew Morton <a...@linux-foundation.org>
Cc: Markus Trippelsdorf <mar...@trippelsdorf.de>
Cc: Michel Dänzer <mic...@daenzer.net>
Signed-off-by: Josh Stone <jist...@redhat.com>
Signed-off-by: Frank Ch. Eigler <f...@redhat.com>
---
 Makefile          |  8 ++++++--
 lib/Kconfig.debug | 21 ++++++++++++++++++++-
 2 files changed, 26 insertions(+), 3 deletions(-)

diff --git a/Makefile b/Makefile
index 6cc5b2434224..c8e1fcfdb41a 100644
--- a/Makefile
+++ b/Makefile
@@ -704,8 +704,6 @@ KBUILD_CFLAGS += -fomit-frame-pointer
 endif
 endif
 
-KBUILD_CFLAGS += $(call cc-option, -fno-var-tracking-assignments)
-
 ifdef CONFIG_DEBUG_INFO
 ifdef CONFIG_DEBUG_INFO_SPLIT
 KBUILD_CFLAGS += $(call cc-option, -gsplit-dwarf, -g)
@@ -718,6 +716,12 @@ ifdef CONFIG_DEBUG_INFO_DWARF4
 KBUILD_CFLAGS += $(call cc-option, -gdwarf-4,)
 endif
 
+ifdef CONFIG_DEBUG_INFO_VTA
+KBUILD_CFLAGS += $(call cc-option, -fvar-tracking-assignments)
+else
+KBUILD_CFLAGS += $(call cc-option, -fno-var-tracking-assignments)
+endif
+
 ifdef CONFIG_DEBUG_INFO_REDUCED
 KBUILD_CFLAGS += $(call cc-option, -femit-struct-debug-baseonly) \
                  $(call cc-option,-fno-var-tracking)
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index 17670573dda8..e8d072d2b402 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -165,7 +165,26 @@ config DEBUG_INFO_DWARF4
 	  Generate dwarf4 debug info. This requires recent versions
 	  of gcc and gdb. It makes the debug information larger.
 	  But it significantly improves the success of resolving
-	  variables in gdb on optimized code.
+	  variables in gdb on optimized code. The gcc docs also
+	  recommend enabling -fvar-tracking-assignments for maximum
+	  benefit. (see DEBUG_INFO_VTA)
+
+config DEBUG_INFO_VTA
+	bool "Enable var-tracking-assignments for debuginfo"
+	depends on DEBUG_INFO
+	help
+	  Enable gcc -fvar-tracking-assignments for improved debug
+	  information on variable locations in optimized code. Per
+	  gcc, DEBUG_INFO_DWARF4 is recommended for best use of VTA,
+	  and allows maximal access to local variables in tracers
+	  and debuggers like perf, systemtap, kgdb, and crash.
+
+	  VTA has been implicated in codegen bugs (gcc PR61801,
+	  PR61904, both fixed in 2014-08), so this flag may be used
+	  to exclude this rare class of problem. One can also set
+	  GCC_COMPARE_DEBUG=-fvar-tracking-assignments-toggle in the
+	  environment to automatically compile everything both ways,
+	  generating an error if anything differs.
 
 config GDB_SCRIPTS
 	bool "Provide GDB scripts for kernel debugging"
-- 
2.1.0
Re: [PATCH 3.15 33/37] Fix gcc-4.9.0 miscompilation of load_balance() in scheduler
Hi -

On Tue, Aug 05, 2014 at 03:36:39PM -0700, Linus Torvalds wrote:
> > Actually, "perf probe" does (via HAVE_DWARF_SUPPORT), to place probes
> > and to extract variables at those probes, much as systemtap does.
> > Without var-tracking, probes placed at most interior points of
> > functions will make variables inaccessible.
>
> .. and as mentioned, -O2 already does that for many things, even
> *with* tracking.

The whole point of variable tracking was to make -O2 usable (though
still imperfect) for those who use debuggers and such tools.

> [...] I don't understand how you guys can be so cavalier about a
> compiler bug that has already resulted in actual real problems.

No one is minimizing the problem.  We are looking for a knob for those
who know that their compiler does not have that bug.  (Plus, those who
don't care about debug data could use CONFIG_DEBUG_INFO=n with the bad
compiler.)

> You bring up theoretical cases that nobody has actually reported [...]

I assure you that the years of effort that went into gcc variable
tracking was justified with actual reports.

> Do you compile without -O2 too? Because I *guarantee* you that with
> -O2 (even with tracking), you'll get "local variable 'xyz' optimized
> away" cases.

One gets many fewer than without it, and also fewer false positives
(where the non-var-tracking debuginfo claims a variable may be
available, but points to the wrong place).

> [...] Until you can get the compiler people to have some sane way
> to know the problem is gone, I'm not going to maintain a kernel that
> uses a known-broken compiler feature. It's that simple.

Would you consider a patch that does a gcc COMPARE_DEBUG-based test?

- FChE
Re: [PATCH 3.15 33/37] Fix gcc-4.9.0 miscompilation of load_balance() in scheduler
Hi -

> >>. I don't disagree it should be
> >> disabled by default, but making it unconditional is going to force the
> >> distributions that care about perf, systemtap, and debuggers to
> >> manually revert this.
> >
> > Bah. I bet I use 'perf' more than most, and it doesn't care about
> > debug info.

Actually, "perf probe" does (via HAVE_DWARF_SUPPORT), to place probes
and to extract variables at those probes, much as systemtap does.
Without var-tracking, probes placed at most interior points of
functions will make variables inaccessible.  Do you need a fully
worked-out example to see this?

- FChE
Re: [PATCH RFC v3 net-next 3/3] samples: bpf: eBPF dropmon example in C
Hi, Alexei -

> My understanding of systemtap is that the whole .stp script is converted
> to C, compiled as .ko and loaded, so all map walking and prints are
> happening in the kernel. Similarly for ktap which has special functions
> in kernel to print histograms.

That is correct.

> I thought dtrace printf are also happening from the kernel. What is
> the trick they use to know which pieces of dtrace script should be
> run in user space?

It appears as though the bytecode language running in the kernel sends
some action commands back out to userspace, not just plain data.

> In ebpf examples there are two C files: one for kernel with ebpf isa
> and one for userspace as native. I thought about combining them,
> but couldn't figure out a clean way of doing it.

(#if ?)

> > What kind of locking/serialization is provided by the ebpf runtime
> > over shared variables such as my_map?
>
> it's traditional rcu scheme.

OK, that protects the table structure, but:

> [...] In such case concurrent write access to map value can be done
> with bpf_xadd instruction, though using normal read/write is also
> allowed. In some cases the speed of racy var++ is preferred over
> 'lock xadd'.

... so concurrency control over shared values is left up to the
programmer.

> There are no lock/unlock function helpers available to ebpf
> programs, since program may terminate early with div by zero
> for example, so in-kernel lock helper implementation would
> be complicated and slow. It's possible to do, but for the use
> cases so far there is no need.

OK, I hope that works out.  I've been told that dtrace does something
similar (!) by eschewing protection on global variables such as
strings.  In their case it's less bad than it sounds because they are
used to offloading computation to userspace or storing only
thread-local state, and accept the corollary limitations on control.
(Systemtap does fully & automatically protect shared variables, even in
the face of run-time script errors.)

- FChE
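The "racy var++" versus 'lock xadd' trade-off can be shown deterministically in userspace. The sketch below is not eBPF: it hand-replays one unlucky interleaving of two simulated CPUs that each split value++ into load, add and store, and compares it with C11 atomic_fetch_add(), used here only as a stand-in for bpf_xadd. struct cpu and both function names are invented for the example.

```c
#include <assert.h>
#include <stdatomic.h>

/* Each simulated CPU has a private register. */
struct cpu { long reg; };

/* One bad interleaving of two plain increments, replayed by hand. */
static long racy_two_cpu_increment(long *value)
{
    struct cpu a, b;

    a.reg = *value;      /* CPU A: load                           */
    b.reg = *value;      /* CPU B: load the same, still-old value */
    a.reg += 1;          /* each CPU adds 1 in its own register   */
    b.reg += 1;
    *value = a.reg;      /* CPU A: store                          */
    *value = b.reg;      /* CPU B: store, overwriting A's update  */
    return *value;
}

/* Atomic read-modify-write: there is no gap between load and store
 * in which the other CPU could slip in. */
static long atomic_two_cpu_increment(atomic_long *value)
{
    atomic_fetch_add(value, 1);  /* CPU A */
    atomic_fetch_add(value, 1);  /* CPU B */
    return atomic_load(value);
}
```

Starting from 0, the replayed racy version ends at 1 although two increments ran, while the atomic version ends at 2. Real races depend on timing, so plain increments usually lose fewer updates than this worst case, but nothing guarantees they lose none.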
Re: [PATCH RFC v3 net-next 3/3] samples: bpf: eBPF dropmon example in C
ast wrote earlier:

> [...]
> dtrace/systemtap/ktap approach is to use one script file that should provide
> all desired functionality. That architectural decision overcomplicated their
> implementations.
>
> eBPF follows split model: everything that needs to process millions of events
> per second needs to run in kernel and needs to be short and deterministic,
> all other things like aggregation and nice graphs should run in user space.
> [...]

For the record, this is not entirely accurate as to dtrace.  dtrace
delegates aggregation and most reporting to userspace.  Also, systemtap
is "short and deterministic" even for aggregations & nice graphs, but
since it limits its storage & cpu consumption, its arrays/reports
cannot get super large.

> [...]
> +SEC("events/skb/kfree_skb")
> +int bpf_prog2(struct bpf_context *ctx)
> +{
> +[...]
> +	value = bpf_map_lookup_elem(&my_map, &loc);
> +	if (value)
> +		(*(long *) value) += 1;
> +	else
> +		bpf_map_update_elem(&my_map, &loc, &init_val);
> +	return 0;
> +}

What kind of locking/serialization is provided by the ebpf runtime over
shared variables such as my_map?

- FChE
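The lookup-else-update logic in the quoted bpf_prog2 sample has a second race besides the increment itself: two CPUs can both miss the lookup for a new location and both insert init_val. A hypothetical userspace stand-in for the map makes this concrete; toy_map_lookup() and toy_map_update() are invented here and are not the bpf_map_* helpers, and init_val is assumed to be 1.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_SLOTS 8

struct toy_map {
    const void *key[TOY_SLOTS];
    long value[TOY_SLOTS];
    int used[TOY_SLOTS];
};

static long *toy_map_lookup(struct toy_map *m, const void *key)
{
    for (size_t i = 0; i < TOY_SLOTS; i++)
        if (m->used[i] && m->key[i] == key)
            return &m->value[i];
    return NULL;
}

/* Insert, or overwrite an existing entry; silently drops when full. */
static void toy_map_update(struct toy_map *m, const void *key, long val)
{
    long *v = toy_map_lookup(m, key);

    if (v != NULL) {
        *v = val;
        return;
    }
    for (size_t i = 0; i < TOY_SLOTS; i++) {
        if (!m->used[i]) {
            m->used[i] = 1;
            m->key[i] = key;
            m->value[i] = val;
            return;
        }
    }
}

/* The sample's per-event logic, run by one CPU at a time. */
static void count_event(struct toy_map *m, const void *loc)
{
    long *value = toy_map_lookup(m, loc);

    if (value != NULL)
        *value += 1;
    else
        toy_map_update(m, loc, 1);      /* init_val assumed to be 1 */
}

/* Replay: two CPUs hit a new location at once; both lookups miss
 * before either insert lands. */
static long replay_concurrent_first_event(struct toy_map *m, const void *loc)
{
    long *seen_a = toy_map_lookup(m, loc);   /* CPU A: miss */
    long *seen_b = toy_map_lookup(m, loc);   /* CPU B: miss */

    if (seen_a == NULL)
        toy_map_update(m, loc, 1);           /* CPU A inserts 1 */
    if (seen_b == NULL)
        toy_map_update(m, loc, 1);           /* CPU B overwrites with 1 */
    long *v = toy_map_lookup(m, loc);
    assert(v != NULL);
    return *v;
}
```

Counting two events one after another with count_event() yields 2, while replay_concurrent_first_event() yields 1: the second insert overwrites the first, so one event is lost even though no increment raced.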
Re: Random panic in load_balance() with 3.16-rc
Hi -

On Mon, Jul 28, 2014 at 09:10:04AM -0400, Theodore Ts'o wrote:
> [...]
> I thought Markus told us that -fno-var-tracking-assignments makes
> absolutely no difference for non-debug kernels?

It does affect CONFIG_DEBUG_INFO kernels, and that config option is
set for all Red Hat kernels (-debug or plain).

> [...] Is there some equivalent signalling system that gcc could use [...]

I'm not aware of anything trivial like a gcc --report-fixed-PRs kind of
thing.  But kbuild could conceivably have a run-time test that
test-runs gcc in that compare-debug mode on a suitable test case.  We
use the latter technique in systemtap for auto-configuring to kernel
versions/features; we got the $(CHECK_BUILD) trick from vmware module
makefiles.  It could be recast as a variant of $(cc-option ...).

- FChE
Re: Random panic in load_balance() with 3.16-rc
torvalds wrote:

> [...]
> Actually, I prefer my patch that did it with cc-option checking, and
> does it unconditionally.
>
> Because if we do it even for non-debug builds - where it ostensibly
> shouldn't matter - we then have that GCC_COMPARE_DEBUG thing working
> regardless of configuration.

Please note that the data produced by "-g -fvar-tracking" is consumed
by tools like systemtap, perf, and crash, and makes a significant
difference to the observability of debug AND non-debug kernels.  (The
presence of compiled-in DEBUG_* self-checking code is orthogonal to
kernel observability via debuginfo.)

Please consider disabling var-tracking only optionally/temporarily to
work around this already-fixed compiler bug, rather than losing
high-quality dwarf data permanently.

- FChE
Re: [RFC PATCH v2] Tracepoint: register/unregister struct tracepoint
Hi - On Thu, Mar 13, 2014 at 12:10:48PM -0400, Mathieu Desnoyers wrote: > [...] Moreover, tracers are responsible for unregistering the probe > before the module containing its associated tracepoint is unloaded. Could you spell out please how a tracer is supposed to know early enough that the module is going to be unloaded? - FChE
Re: [for-next][PATCH 08/20] tracing: Warn if a tracepoint is not set via debugfs
Hi, Steven - > > So it is a deferred-activation kind of call, with no way of knowing > > when or if the tracepoints will start coming in. Why is that > > supported at all, considering that clients could monitor modules > > coming & going via the module_notifier chain, and update registration > > at that time? > > That's my argument. Was there an answer? > > >> +entry = get_tracepoint(name); > > >> +/* Make sure the entry was enabled */ > > >> +if (!entry || !entry->enabled) > > >> +ret = -ENODEV; > > > > For what it's worth, I agree with Mathieu that this sort of successful > > failure result code is a problem for tracking what needs cleanup and > > what doesn't. (In systemtap's case, if this change gets merged, we'll > > have to treat -ENODEV as if it were 0.) > > Does systemtap enable tracepoints before they are created? That is, do > you allow enabling of a tracepoint in a module that is not loaded yet? We have no formal opinion on whether or not this makes sense. If the kernel permits it, fine. > If not, than you want this as an error. But it's not exactly an error! It's a success of sorts, and means that later on we have to unregister the callback, just as if it were successful. - FChE
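To make the "success of sorts" point concrete, here is a minimal user-space sketch. It assumes a registration call that records the callback in every case but returns -ENODEV when the tracepoint does not exist yet, as in the quoted `get_tracepoint()` hunk. The caller then has to keep tracking such entries for later unregistration, exactly as if 0 had come back. `fake_probe_register()`, `struct reg_state` and `register_and_track()` are hypothetical stand-ins, not the kernel tracepoint API.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a tracepoint registration call: the callback is
 * recorded either way, but -ENODEV is returned when the tracepoint does not
 * exist yet (e.g. its module is not loaded). */
static int fake_probe_register(const char *name, bool tracepoint_exists)
{
    (void)name;
    return tracepoint_exists ? 0 : -ENODEV;
}

#define MAX_PROBES 16

/* Hypothetical bookkeeping of registrations that need cleanup later. */
struct reg_state {
    const char *names[MAX_PROBES];
    size_t count;
};

/* Register one probe and remember it for later unregistration.
 * -ENODEV is folded into success, because the callback is still
 * registered and must still be unregistered. */
static int register_and_track(struct reg_state *st, const char *name,
                              bool tracepoint_exists)
{
    if (st->count >= MAX_PROBES)
        return -ENOSPC;
    int ret = fake_probe_register(name, tracepoint_exists);
    if (ret == -ENODEV)
        ret = 0;               /* "a success of sorts" */
    if (ret == 0)
        st->names[st->count++] = name;
    return ret;
}
```

If -ENODEV were instead treated as a plain error here, the deferred callback would never be unregistered. That is the cleanup-tracking hazard described in the thread.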
Re: [for-next][PATCH 08/20] tracing: Warn if a tracepoint is not set via debugfs
Hi - >> From: Steven Rostedt >> >> Tracepoints were made to allow enabling a tracepoint in a module before that >> module was loaded. When a tracepoint is enabled and it does not exist, the >> name is stored and will be enabled when the tracepoint is created. >> >> The problem with this approach is that when a tracepoint is enabled when >> it expects to be there, it gives no warning that it does not exist. So it is a deferred-activation kind of call, with no way of knowing when or if the tracepoints will start coming in. Why is that supported at all, considering that clients could monitor modules coming & going via the module_notifier chain, and update registration at that time? >> +entry = get_tracepoint(name); >> +/* Make sure the entry was enabled */ >> +if (!entry || !entry->enabled) >> +ret = -ENODEV; For what it's worth, I agree with Mathieu that this sort of successful failure result code is a problem for tracking what needs cleanup and what doesn't. (In systemtap's case, if this change gets merged, we'll have to treat -ENODEV as if it were 0.) - FChE
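The alternative raised here is for clients to watch modules come and go through the module notifier chain and update their registrations at those moments. That can be modelled in user space. In the kernel this would be a `struct notifier_block` passed to `register_module_notifier()`, receiving `MODULE_STATE_COMING` and `MODULE_STATE_GOING`. In this sketch those are replaced by the hypothetical `on_module_event()` and `enum sim_module_event`, and the real register/unregister calls are reduced to flipping a flag.

```c
#include <stdbool.h>
#include <string.h>

/* Simplified stand-ins for the kernel's MODULE_STATE_COMING/GOING events. */
enum sim_module_event { SIM_MODULE_COMING, SIM_MODULE_GOING };

/* Hypothetical tracer-side record of one wanted tracepoint. */
struct wanted_probe {
    const char *module;      /* module that defines the tracepoint */
    const char *tracepoint;
    bool registered;
};

/* Called for each module load/unload event. Probes are registered once their
 * module has arrived, and unregistered before it goes away. Returns how many
 * probes changed state. */
static int on_module_event(struct wanted_probe *probes, int n,
                           const char *module, enum sim_module_event ev)
{
    int changed = 0;
    for (int i = 0; i < n; i++) {
        if (strcmp(probes[i].module, module) != 0)
            continue;
        if (ev == SIM_MODULE_COMING && !probes[i].registered) {
            probes[i].registered = true;   /* would call the register API */
            changed++;
        } else if (ev == SIM_MODULE_GOING && probes[i].registered) {
            probes[i].registered = false;  /* would call the unregister API */
            changed++;
        }
    }
    return changed;
}
```

With this arrangement a registration call is only ever made for a tracepoint that exists. There is then no deferred-activation state, and no ambiguous -ENODEV, to track.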
Re: [RFC PATCH] Fix: module signature vs tracepoints: add new TAINT_UNSIGNED_MODULE
rostedt wrote: > [...] > Oh! You are saying that if the kernel only *supports* signed modules, > and you load a module that is not signed, it will taint the kernel? Yes: this is the default for several distros. - FChE
Re: [PATCH -tip v6 00/22] kprobes: introduce NOKPROBE_SYMBOL(), cleanup and fixes crash bugs
Hi - > > So the similar thing happens when we enables events as below; > > > > # for i in /sys/kernel/debug/tracing/events/kprobes/* ; do date; echo 1 > $i; done > > Wed Jan 29 10:44:50 UTC 2014 > > ... > > > > I tried it and canceled after 4 min passed. It enabled about 17k > > events and slowed down my system very much(I almost got hang check > > timer). > > Ok, I guess that's the slowdown bug that Frank reported. It could be, but it feels a bit different. In my testing from December, it seemed that the slowdown was associated not with the activated probes *hitting*, but with their merely being activated. It was as though something in the kprobes/ftrace probe-registration code was doing a lot more work than it used to. (One way to test this would be to be more selective about which kprobes are enabled: for example, emplace thousands, but only in loaded modules that are not in use.) I'm sorry I didn't get time to investigate further. - FChE
Re: [PATCH -tip v6 00/22] kprobes: introduce NOKPROBE_SYMBOL(), cleanup and fixes crash bugs
Hi - mingo wrote: > [...] > For example a hash table (hashed by probe address) could be used in > addition to the list, to speed up basic operations. In the past, when this sort of behavior popped up, it was due to machine-wide halt/sync operations being done too eagerly. At one point, the kprobes-unregistration interface grew a mass-unregister API to batch them (and save the machine syncs between operations). Perhaps, with the new checks/logic, a similar batching API is needed on the registration side. I'll try to get more perf data once the VM comes back up; a couple of hours after the test started, it died (for possibly unrelated reasons). [ 133.073670] stap_bc6054113aa63134d411836da0afefc3_123_1261: module verification failed: signature and/or required key missing - tainting kernel [ 404.357210] stap_4c53547addc9f25dd87ac4afa0407ed6_36_1948: systemtap: 2.5/0.157, base: a0201000, memory: 15882data/24text/1ctx/2058net/4625alloc kb, probes: 34692 [ 1655.745075] hrtimer: interrupt took 1225946 ns [ 3969.175039] [sched_delayed] sched: RT throttling activated [10812.665534] kernel tried to execute NX-protected page - exploit attempt?
(uid: 0) [10812.665534] BUG: unable to handle kernel paging request at 88007902f038 [10812.665534] IP: [] 0x88007902f038 [10812.665534] PGD 2d90067 PUD 2d93067 PMD 8000790001e3 [10812.665534] Oops: 0010 [#1] SMP [10812.665534] Modules linked in: stap_4c53547addc9f25dd87ac4afa0407ed6_36_1948(OF) rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd sunrpc fscache ppdev microcode parport_pc parport i2c_piix4 virtio_balloon i2c_core serio_raw virtio_net virtio_pci ata_generic virtio_ring pata_acpi virtio [last unloaded: stap_89fcd2b984e11a30dd08d141e6b47e13_123_1681] [10812.665534] CPU: 1 PID: -30720 Comm: x Tainted: GF O 3.13.0-rc4-01828-g8b349c29efae #1 [10812.665534] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 [10812.665534] BUG: unable to handle kernel paging request at 8f896e09 [10812.665534] IP: [] do_raw_spin_trylock+0x5/0x60 [10812.665534] PGD 0 [10812.665534] Thread overran stack, or stack corrupted [10812.665534] Oops: [#2] SMP [10812.665534] Modules linked in: stap_4c53547addc9f25dd87ac4afa0407ed6_36_1948(OF) rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd sunrpc fscache ppdev microcode parport_pc parport i2c_piix4 virtio_balloon i2c_core serio_raw virtio_net virtio_pci ata_generic virtio_ring pata_acpi virtio [last unloaded: stap_89fcd2b984e11a30dd08d141e6b47e13_123_1681] [10812.665534] CPU: 1 PID: -30720 Comm: x Tainted: GF O 3.13.0-rc4-01828-g8b349c29efae #1 [10812.665534] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 [10812.665534] BUG: unable to handle kernel paging request at 8f896e09 [10812.665534] IP: [] do_raw_spin_trylock+0x5/0x60 [10812.665534] PGD 0 [10812.665534] Thread overran stack, or stack corrupted [10812.665534] Oops: [#3] SMP [10812.665534] Modules linked in: stap_4c53547addc9f25dd87ac4afa0407ed6_36_1948(OF) rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd sunrpc fscache ppdev microcode parport_pc parport i2c_piix4 virtio_balloon i2c_core serio_raw virtio_net virtio_pci ata_generic virtio_ring pata_acpi 
virtio [last unloaded: stap_89fcd2b984e11a30dd08d141e6b47e13_123_1681] [10812.665534] CPU: 1 PID: -30720 Comm: x Tainted: GF O 3.13.0-rc4-01828-g8b349c29efae #1 [10812.665534] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 [10812.665534] BUG: unable to handle kernel paging request at 8f896e09 [10812.665534] IP: [] do_raw_spin_trylock+0x5/0x60 [10812.665534] PGD 0 [10812.665534] Thread overran stack, or stack corrupted [10812.665534] Oops: [#4] SMP [10812.665534] Modules linked in: stap_4c53547addc9f25dd87ac4afa0407ed6_36_1948(OF) rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd sunrpc fscache ppdev microcode parport_pc parport i2c_piix4 virtio_balloon i2c_core serio_raw virtio_net virtio_pci ata_generic virtio_ring pata_acpi virtio [last unloaded: stap_89fcd2b984e11a30dd08d141e6b47e13_123_1681] [10812.665534] CPU: 1 PID: -30720 Comm: x Tainted: GF O 3.13.0-rc4-01828-g8b349c29efae #1 [10812.665534] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 [10812.665534] BUG: unable to handle kernel paging request at 8f896e09 [10812.665534] IP: [] do_raw_spin_trylock+0x5/0x60 [10812.665534] PGD 0 [10812.665534] Thread overran stack, or stack corrupted [10812.665042] [ cut here ] [10812.665042] WARNING: CPU: 0 PID: 1948 at arch/x86/kernel/kprobes/core.c:600 reenter_kprobe+0x3c/0xd0() [10812.665042] Modules linked in: stap_4c53547addc9f25dd87ac4afa0407ed6_36_1948(OF) rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd sunrpc fscache ppdev microcode parport_pc parport i2c_piix4 virtio_balloon i2c_core serio_raw virtio_net virtio_pci ata_generic virtio_ring pata_acpi virtio [last unloaded:
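The batching argument can be illustrated with a toy cost model. Assume every (un)registration must be followed by a machine-wide sync. Then N individual operations cost N syncs, while a batched call that does all the cheap per-probe work first and syncs once at the end costs a single sync. This is only a counting sketch under that assumption, not kprobes code, and all names in it are hypothetical. In the kernel, the batched unregistration entry point is `unregister_kprobes()`, which, to my understanding, performs one synchronization for the whole array.

```c
#include <stddef.h>

/* Toy cost model: counts cheap per-probe operations and expensive
 * machine-wide syncs. */
struct sync_counter {
    unsigned long syncs;
    unsigned long ops;
};

static void machine_sync(struct sync_counter *c) { c->syncs++; }

/* One at a time: each operation is followed by its own sync. */
static void register_each(struct sync_counter *c, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        c->ops++;            /* cheap per-probe work */
        machine_sync(c);     /* expensive, repeated n times */
    }
}

/* Batched: do all per-probe work, then sync once for the whole batch. */
static void register_batch(struct sync_counter *c, size_t n)
{
    for (size_t i = 0; i < n; i++)
        c->ops++;
    if (n > 0)
        machine_sync(c);
}
```

For the 34692 probes reported in the log above, the model gives 34692 syncs one-at-a-time versus 1 batched. The per-probe work is identical in both cases.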
Re: [PATCH -tip v6 00/22] kprobes: introduce NOKPROBE_SYMBOL(), cleanup and fixes crash bugs
Hi, Masami - masami.hiramatsu.pt wrote: > Here is the version 6 of NOKPROBE_SYMBOL series. :) > [...] Some preliminary results from building these on top of tip/master on x86-64. # stap -te 'probe kprobe.function("*") {}' starts up OK, without crashes, which looks like great progress. But a closer look indicates that the insertion of kprobes is taking about three (!!) orders of magnitude longer than before, as judged by the rate of increase of 'wc -l /sys/kernel/debug/kprobes/list'. So, one has to let the thing run for several hours just to get all the kprobes inserted, never mind letting stress-testing begin. For reference, here's the steady-state "perf top" output during all this insertion work: 54.81% [kernel][k] _raw_spin_unlock_irqrestore 38.13% [kernel][k] __slab_alloc 1.11% [kernel][k] kprobe_ftrace_handler 0.88% [kernel][k] _raw_spin_unlock_irq More notes once the machine gets far enough to reach the robustness-testing phase. - FChE
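The insertion-rate measurement mentioned above ('wc -l /sys/kernel/debug/kprobes/list') is easy to automate. The following is a minimal sketch. It assumes debugfs is mounted at /sys/kernel/debug and that the list file holds one line per registered kprobe. `count_lines()` and `insertion_rate()` are illustrative helpers, not part of any existing tool.

```c
#include <stdio.h>

/* Count newline-terminated lines in a file, like `wc -l`.
 * Returns -1 if the file cannot be opened. */
static long count_lines(const char *path)
{
    FILE *f = fopen(path, "r");
    if (!f)
        return -1;
    long lines = 0;
    int ch;
    while ((ch = fgetc(f)) != EOF)
        if (ch == '\n')
            lines++;
    fclose(f);
    return lines;
}

/* Probes inserted per second between two samples taken `seconds` apart. */
static double insertion_rate(long before, long after, double seconds)
{
    return seconds > 0 ? (double)(after - before) / seconds : 0.0;
}
```

On a live system, call `count_lines("/sys/kernel/debug/kprobes/list")`, wait a fixed interval (for example with `sleep(60)`), and call it again. Feeding both counts to `insertion_rate()` gives the rate, and comparing that rate across kernels shows the slowdown factor.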
Re: [RFC PATCH tip 4/5] use BPF in tracing filters
masami.hiramatsu.pt wrote: > [...] > Anyway, as far as I can see, there looks be two different models of > tracing in our mind. > > A) Fixed event based tracing: In this model, there are several fixed > "events" which well defined with fixed arguments. tracer handles these > events and only use limited arguments. It's like a packet stream > processing. ftrace, perf etc. are used this model. > > B) Flexible event-point tracing: In this model, each tracer(or even > trace user) can freely define their own event, there will be some fixed > tracing points defined, but arguments are defined by users. It's like a > debugger's breakpoint debugging. systemtap, ktap etc. are used this model. It may be more useful to think of it as a contrast along the hard-coded versus programmable axis. (perf, systemtap, and ktap can each reach to some extent across your "fixed" vs "flexible" line. Each has some dynamic and some static-tracepoint capability.) > e.g. B model has a good flexibility and A model is easy to use for > beginners. I don't think it's the model that dictates ease-of-use, but the quality of implementation, logistics, documentation, and examples. - FChE
Re: [PATCH -tip v4 0/6] kprobes: introduce NOKPROBE_SYMBOL() and fixes crash bugs
Hi - On Sat, Dec 07, 2013 at 08:19:13AM +0900, Masami Hiramatsu wrote: > [...] > > Would you plan to limit kprobes (or just the perf-probe frontend) to > > only function-entries also? > Exactly, yes :). Currently I have a patch for kprobe-tracer > implementation (not only for perf-probe, but doesn't limit kprobes > itself). Interesting option. It sounds like a restrictive expedient that could result in kprobes never being made sufficiently robust. > > If not, and if intra-function statement-granularity kprobes remain > > allowed within a function-granularity whitelist, then you might > > still have those "quantitative" problems. > Yes, but as far as I've tested, the performance overhead is not > high, especially as far as putting kprobes at the entry of those > functions because of ftrace-based optimization. (Would that also make CONFIG_KPROBE_EVENT require KPROBES_ON_FTRACE?) > > Even worse, kprobes robustness problems can bite even with a small > > whitelist, unless you can test the countless subset selections > > cartesian-product the aggrevating factors (like other tracing > > facilities being in use at the same time, limited memory, high irq > > rates, debugging sessions, architectures, whatever). > > And also, what script will run on each probe, right? :) In the perf-probe world, the closest analogue could be varying the contextual data that's being extracted (stack traces, parameters, ...). > >> [...] For the long term solution, I think we can introduce some > >> kind of performance gatekeeper as systemtap does. Counting the > >> miss-hit rate per second and if it go over a threshold, disable next > >> miss-hit (or most miss-hit) probe (as OOM killer does). > > > > That would make sense, but again it would not help deal with kprobes > > robustness (in the kernel-crashing rather than kernel-slowdown sense). > > Why would you think so? Is there any hidden path for calling kprobes > mechanism?? 
The kernel crash problem just comes from bugs, not the > quantitative issue. I don't think we're disagreeing. A performance-gatekeeper in perf-probe or nearby would be useful (and manage the kprobe-quantity problem). It would not be sufficient to prevent the kernel-crashing bugs. - FChE
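The performance gatekeeper Masami sketches can be modelled as follows: count miss-hits per probe each second, and if the total crosses a threshold, disable the probe with the most miss-hits (an OOM-killer-like choice). The thread does not define exactly what a "miss-hit" is, so this sketch just treats it as a per-probe counter. `struct probe_stats` and `gatekeeper_tick()` are hypothetical. As noted in the reply, such a mechanism only manages overload; it does nothing about kernel-crashing bugs.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-probe statistics for one accounting interval. */
struct probe_stats {
    const char *name;
    unsigned long misshits;   /* miss-hits counted in the current second */
    bool enabled;
};

/* Called once per second. If the total miss-hit count of enabled probes
 * exceeds `threshold`, disable the enabled probe with the most miss-hits
 * and return its index; otherwise return -1. Counters are reset for the
 * next interval either way. */
static int gatekeeper_tick(struct probe_stats *p, size_t n,
                           unsigned long threshold)
{
    unsigned long total = 0;
    int worst = -1;
    for (size_t i = 0; i < n; i++) {
        if (!p[i].enabled)
            continue;
        total += p[i].misshits;
        if (worst < 0 || p[i].misshits > p[worst].misshits)
            worst = (int)i;
    }
    int victim = -1;
    if (total > threshold && worst >= 0) {
        p[worst].enabled = false;
        victim = worst;
    }
    for (size_t i = 0; i < n; i++)
        p[i].misshits = 0;
    return victim;
}
```

Disabling only the worst offender, rather than all probes, keeps the remaining instrumentation running. That mirrors the "disable next miss-hit (or most miss-hit) probe" wording in the quoted proposal.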
Re: [RFC PATCH tip 0/5] tracing filters with BPF
hpa wrote: >> I can see there may be some setups which don't have a compiler >> (e.g. I know some people don't use systemtap because of that) >> But this needs a custom gcc install too as far as I understand. > > Yes... but no compiler and secure boot tend to go together, or at > least will in the future. (Maybe not: we're already experimenting with support for secureboot in systemtap.) - FChE
Re: [PATCH -tip v4 0/6] kprobes: introduce NOKPROBE_SYMBOL() and fixes crash bugs
Hi, Masami - masami.hiramatsu.pt wrote: > [...] > >> [...] Then, I'd like to propose this new whitelist feature in > >> kprobe-tracer (not raw kprobe itself). And a sysctl knob for > >> disabling the whitelist. That knob will be > >> /proc/sys/debug/kprobe-event-whitelist and disabling it will mark > >> kernel tainted so that we can check it from bug reports. > > > > How would one assemble a reliable whitelist, if we haven't fully > > characterized the problems that make the blacklist necessary? > > As I said, we can use function graph tracer's list as the whitelist, > since it doesn't include any functions invoked from ftrace's event > handler. (Note that I don't mention the Systemtap or other user here) > > Whitelist is just for keeping the people away from the quantitative > issue, who just want to trace their subsystems except for ftrace. > [...] Would you plan to limit kprobes (or just the perf-probe frontend) to function entries only, too? If not, and if intra-function statement-granularity kprobes remain allowed within a function-granularity whitelist, then you might still have those "quantitative" problems. Even worse, kprobes robustness problems can bite even with a small whitelist, unless you can test the countless subset selections crossed (as a Cartesian product) with the aggravating factors (like other tracing facilities being in use at the same time, limited memory, high irq rates, debugging sessions, architectures, whatever). > [...] For the long term solution, I think we can introduce some > kind of performance gatekeeper as systemtap does. Counting the > miss-hit rate per second and if it go over a threshold, disable next > miss-hit (or most miss-hit) probe (as OOM killer does). That would make sense, but again it would not help deal with kprobes robustness (in the kernel-crashing rather than kernel-slowdown sense).
- FChE
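The whitelist proposal can be modelled as a small policy object. It has a list of allowed function names (in the proposal, the function-graph tracer's list) and an enforcement switch standing in for the proposed /proc/sys/debug/kprobe-event-whitelist sysctl. Turning that switch off marks the kernel tainted. Everything below (`struct whitelist_policy`, `may_probe()`, `set_enforce()`) is hypothetical. The check is deliberately by function name only, so it also shows the gap FChE points out: a statement-level probe inside a whitelisted function passes unchanged.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of the proposed kprobe-event whitelist. */
struct whitelist_policy {
    const char **allowed;   /* e.g. functions the function-graph tracer allows */
    size_t n_allowed;
    bool enforce;           /* models the proposed sysctl knob */
    bool tainted;           /* set once enforcement has been switched off */
};

/* Models writing the sysctl: disabling the whitelist taints the kernel. */
static void set_enforce(struct whitelist_policy *p, bool on)
{
    if (!on)
        p->tainted = true;
    p->enforce = on;
}

/* Decide whether a kprobe event in function `func` may be created.
 * Only the function name is checked, whatever offset is probed inside it. */
static bool may_probe(const struct whitelist_policy *p, const char *func)
{
    if (!p->enforce)
        return true;
    for (size_t i = 0; i < p->n_allowed; i++)
        if (strcmp(p->allowed[i], func) == 0)
            return true;
    return false;
}
```

A real implementation would want a faster lookup than a linear scan for thousands of symbols. The linear search here only keeps the policy logic visible.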
Re: [PATCH -tip v4 0/6] kprobes: introduce NOKPROBE_SYMBOL() and fixes crash bugs
Hi, Masami - masami.hiramatsu.pt wrote: [...] [...] Then, I'd like to propose this new whitelist feature in kprobe-tracer (not raw kprobe itself). And a sysctl knob for disabling the whitelist. That knob will be /proc/sys/debug/kprobe-event-whitelist and disabling it will mark kernel tainted so that we can check it from bug reports. How would one assemble a reliable whitelist, if we haven't fully characterized the problems that make the blacklist necessary? As I said, we can use function graph tracer's list as the whitelist, since it doesn't include any functions invoked from ftrace's event handler. (Note that I don't mention the Systemtap or other user here) Whitelist is just for keeping the people away from the quantitative issue, who just want to trace their subsystems except for ftrace. [...] Would you plan to limit kprobes (or just the perf-probe frontend) to only function-entries also? If not, and if intra-function statement-granularity kprobes remain allowed within a function-granularity whitelist, then you might still have those quantitative problems. Even worse, kprobes robustness problems can bite even with a small whitelist, unless you can test the countless subset selections cartesian-product the aggrevating factors (like other tracing facilities being in use at the same time, limited memory, high irq rates, debugging sessions, architectures, whatever). [...] For the long term solution, I think we can introduce some kind of performance gatekeeper as systemtap does. Counting the miss-hit rate per second and if it go over a threshold, disable next miss-hit (or most miss-hit) probe (as OOM killer does). That would make sense, but again it would not help deal with kprobes robustness (in the kernel-crashing rather than kernel-slowdown sense). 
- FChE
Re: [RFC PATCH tip 0/5] tracing filters with BPF
hpa wrote: I can see there may be some setups which don't have a compiler (e.g. I know some people don't use systemtap because of that) But this needs a custom gcc install too as far as I understand. Yes... but no compiler and secure boot tend to go together, or at least will in the future. (Maybe not: we're already experimenting with support for secureboot in systemtap.) - FChE
Re: [PATCH -tip v4 0/6] kprobes: introduce NOKPROBE_SYMBOL() and fixes crash bugs
Hi - On Sat, Dec 07, 2013 at 08:19:13AM +0900, Masami Hiramatsu wrote: [...] Would you plan to limit kprobes (or just the perf-probe frontend) to only function-entries also? Exactly, yes :). Currently I have a patch for kprobe-tracer implementation (not only for perf-probe, but doesn't limit kprobes itself). Interesting option. It sounds like a restrictive expedient that could result in kprobes never being made sufficiently robust. If not, and if intra-function statement-granularity kprobes remain allowed within a function-granularity whitelist, then you might still have those quantitative problems. Yes, but as far as I've tested, the performance overhead is not high, especially as far as putting kprobes at the entry of those functions because of ftrace-based optimization. (Would that also make CONFIG_KPROBE_EVENT require KPROBES_ON_FTRACE?) Even worse, kprobes robustness problems can bite even with a small whitelist, unless you can test the countless subset selections cartesian-product the aggravating factors (like other tracing facilities being in use at the same time, limited memory, high irq rates, debugging sessions, architectures, whatever). And also, what script will run on each probe, right? :) In the perf-probe world, the closest analogue could be varying the contextual data that's being extracted (stack traces, parameters, ...). [...] For the long term solution, I think we can introduce some kind of performance gatekeeper as systemtap does. Counting the miss-hit rate per second and if it go over a threshold, disable next miss-hit (or most miss-hit) probe (as OOM killer does). That would make sense, but again it would not help deal with kprobes robustness (in the kernel-crashing rather than kernel-slowdown sense). Why would you think so? Is there any hidden path for calling kprobes mechanism?? The kernel crash problem just comes from bugs, not the quantitative issue. I don't think we're disagreeing.
A performance-gatekeeper in perf-probe or nearby would be useful (and manage the kprobe-quantity problem). It would not be sufficient to prevent the kernel-crashing bugs. - FChE
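Masami's proposed gatekeeper (count miss-hits per second and, past a threshold, disable the probe with the most misses, OOM-killer style) fits in a few lines of C. This is a hypothetical user-space model: struct probe_stat, gatekeeper_tick() and the MISS_THRESHOLD value of 1000 are illustrative assumptions, not an existing kernel, perf or systemtap interface. As the thread points out, a policy like this can only bound slowdowns; it does nothing about kprobes bugs that crash the kernel.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of the proposed gatekeeper: per-probe counts of
 * "missed" (nested, skipped) hits are collected over a one-second window.
 * If the window's total crosses a threshold, the enabled probe with the
 * most misses is disabled, much as the OOM killer picks one victim. */
#define MISS_THRESHOLD 1000UL   /* assumed limit: misses per second */

struct probe_stat {
    const char *name;
    unsigned long misses;       /* misses seen in the current window */
    int enabled;
};

/* Call once per second. Returns the index of the probe that was disabled,
 * or -1 if the window stayed under the threshold. Resets all counters. */
static int gatekeeper_tick(struct probe_stat *p, size_t n)
{
    unsigned long total = 0;
    int victim = -1;

    for (size_t i = 0; i < n; i++) {
        if (!p[i].enabled)
            continue;
        total += p[i].misses;
        if (victim < 0 || p[i].misses > p[victim].misses)
            victim = (int)i;
    }

    if (total <= MISS_THRESHOLD) {
        victim = -1;
    } else {
        assert(victim >= 0);    /* some enabled probe produced the misses */
        p[victim].enabled = 0;  /* disable the worst offender */
    }

    for (size_t i = 0; i < n; i++)
        p[i].misses = 0;        /* start a new window */
    return victim;
}
```

Because each tick disables at most one probe, a burst of nested hits is throttled one offender per window rather than shutting everything down at once.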
Re: [RFC PATCH tip 0/5] tracing filters with BPF
Andi Kleen writes: > [...] While it sounds interesting, I would strongly advise to make > this capability only available to root. Traditionally lots of > complex byte code languages which were designed to be "safe" and > verifiable weren't really. e.g. i managed to crash things with > "safe" systemtap multiple times. [...] Note that systemtap has never been a byte code language, that avenue being considered lkml-futile at the time, but instead pure C. Its safety comes from a mix of compiled-in checks (which you can inspect via "stap -p3") and script-to-C translation checks (which are self-explanatory). Its risks come from bugs in the checks (quite rare), problems in the runtime library (rare), and problems in underlying kernel facilities (rare or frequent - consider kprobes). > So the likelyhood of this having some hole somewhere (either in > the byte code or in some library function) is high. Very true! - FChE
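One of the compiled-in checks mentioned above is a cap on how much work a single probe hit may do; systemtap exposes this as its MAXACTION limit. The sketch below models the idea in plain user-space C, assuming a per-hit statement counter that is checked before each translated statement. struct action_ctx, stmt_tick(), run_unbounded_loop() and the error string are hypothetical; systemtap's actual generated C differs in detail.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-probe-hit budget, loosely modeled on systemtap's
 * MAXACTION limit. Not the translator's real output, just the idea. */
#define ACTION_BUDGET 1000

struct action_ctx {
    long actions_left;      /* statements still allowed in this probe hit */
    const char *last_error; /* set when the budget runs out */
};

/* Called before each translated statement: 0 = continue, -1 = abort. */
static int stmt_tick(struct action_ctx *c)
{
    if (--c->actions_left < 0) {
        c->last_error = "MAXACTION exceeded"; /* illustrative message */
        return -1;
    }
    return 0;
}

/* What a script loop like `while (1) { n++ }` might become: the check
 * on every iteration stops it from spinning forever inside the kernel. */
static long run_unbounded_loop(struct action_ctx *c)
{
    long n = 0;
    for (;;) {
        if (stmt_tick(c) != 0)
            break;          /* abandon the handler instead of hanging */
        n++;
    }
    return n;
}
```

With a budget of 1000 the loop body runs exactly 1000 times before the handler is abandoned with an error recorded in the context.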
Re: [RFC PATCH tip 0/5] tracing filters with BPF
ast wrote: >>[...] > Did simple ktap test with 1M alloc_skb/kfree_skb toy test from earlier email: > trace skb:kfree_skb { > if (arg2 == 0x100) { > printf("%x %x\n", arg1, arg2) > } > } > [...] For reference, you might try putting systemtap into the performance comparison matrix too: # stap -e 'probe kernel.trace("kfree_skb") { if ($location == 0x100 /* || $location == 0x200 etc. */ ) { printf("%x %x\n", $skb, $location) } }' - FChE
Re: [PATCH -tip v4 0/6] kprobes: introduce NOKPROBE_SYMBOL() and fixes crash bugs
Hi, Masami - masami.hiramatsu.pt wrote: > [...] > For the safeness of kprobes, I have an idea; introduce a whitelist > for dynamic events. AFAICS, the biggest unstable issue of kprobes > comes from putting *many* probes on the functions called from tracers. Why do you think so? We have had problems with single kprobes in the "wrong" spot. The main reason I showed spraying them widely is to get wide coverage with minimal information/effort, not to suggest that the number of concurrent probes per se is a problem. (We have had systemtap scripts probing some areas of the kernel with thousands of active kprobes, e.g. for statement-by-statement variable-watching jobs, and these have worked fine.) > It doesn't crash the kernel but slows down so much, because every > probes hit many other nested miss-hit probes. (kprobes does have code to detect & handle reentrancy.) > This gives us a big performance impact. [...] Sure, but I'd expect to see pure slowdowns show their impact with time-related problems like watchdogs firing or timeouts. > [...] Then, I'd like to propose this new whitelist feature in > kprobe-tracer (not raw kprobe itself). And a sysctl knob for > disabling the whitelist. That knob will be > /proc/sys/debug/kprobe-event-whitelist and disabling it will mark > kernel tainted so that we can check it from bug reports. How would one assemble a reliable whitelist, if we haven't fully characterized the problems that make the blacklist necessary? - FChE
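The reentrancy handling referred to above means that a probe hit occurring while another probe's handler is already running on the same CPU is not dispatched; kprobes only counts it as missed (the nmissed field of struct kprobe). That is why many probes on tracer-internal functions tend to cost time rather than recurse. The single-CPU model below is only a sketch: struct probe, probe_hit() and the global current_probe are hypothetical stand-ins for the kernel's per-CPU state, which distinguishes more cases than shown here.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified single-CPU model of kprobe reentrancy handling: while one
 * probe's handler runs, a nested hit on another probe skips its handler
 * and is only counted as missed. Names are hypothetical. */
struct probe {
    const char *name;
    unsigned long hits;     /* handler invocations */
    unsigned long nmissed;  /* nested hits that were not dispatched */
    void (*handler)(struct probe *self);
};

static struct probe *current_probe;   /* per-CPU in a real kernel */

static void probe_hit(struct probe *p)
{
    if (current_probe != NULL) {
        p->nmissed++;       /* nested hit: record a miss, skip handler */
        return;
    }
    current_probe = p;
    p->hits++;
    if (p->handler != NULL)
        p->handler(p);
    current_probe = NULL;
}

/* A probed helper that the outer handler itself calls, e.g. a tracer
 * function: without the guard above this would recurse. */
static struct probe inner = { "inner", 0, 0, NULL };

static void outer_handler(struct probe *self)
{
    (void)self;
    probe_hit(&inner);
}
```

Each miss still costs a trap and some bookkeeping, so a large enough number of them shows up as the slowdown described in the thread, not as a crash.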
Re: [RFC PATCH tip 3/5] Extended BPF (64-bit BPF) design document
Alexei Starovoitov writes: > [...] >> Having EBPF code manipulating pointers - or kernel memory - directly >> seems like a nonstarter. However, per your subsequent paragraph it >> sounds like pointers are a special type at which point it shouldn't >> matter at the EBPF level how many bytes it takes to represent it? > > bpf_check() will track every register through every insn. > If pointer is stored in the register, it will know what type > of pointer it is and will allow '*reg' operation only if pointer is valid. > [...] > BPF program actually can manipulate kernel memory directly > when checker guarantees that it is safe to do so :) It sounds like this sort of static analysis would have difficulty with situations such as: - multiple levels of indirection - conditionals (where it can't trace a unique data/type flow for all pointers) - aliasing (same reason) - the possibility of bad (or userspace?) pointers arriving as parameters from the underlying trace events > For example in tracing filters bpf_context access is restricted to: > static const struct bpf_context_access ctx_access[MAX_CTX_OFF] = { > [offsetof(struct bpf_context, regs.di)] = { > FIELD_SIZEOF(struct bpf_context, regs.di), > BPF_READ > }, Are such constraints to be hard-coded in the kernel? > Over course of development bpf_check() found several compiler bugs. > I also tried all of sorts of ways to break bpf jail from inside of a > bpf program, but so far checker catches everything I was able to throw > at it. (One can be sure that attackers will chew hard on this interface, should it become reasonably accessible to userspace, so good job starting to check carefully!) - FChE
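To make the register-tracking idea concrete, here is a toy, straight-line checker in the spirit of what Alexei describes for bpf_check(): every register carries a type, r1 starts out as a pointer to the context, and a load is accepted only through that pointer at an in-bounds offset. The opcodes, the fixed 8-byte load size and the rule that arithmetic turns a pointer into a plain scalar are assumptions made for this sketch; they are not the real eBPF instruction set or verifier rules. It has no branches at all, so it sidesteps the conditional and aliasing cases raised above, which are exactly where a real checker has to work much harder.

```c
#include <assert.h>
#include <stddef.h>

/* Toy straight-line register-type checker; opcodes are hypothetical. */
enum reg_type { NOT_INIT, SCALAR, PTR_TO_CTX };
enum opcode { OP_MOV_IMM, OP_MOV_REG, OP_ADD_IMM, OP_LOAD, OP_EXIT };

struct insn {
    enum opcode op;
    int dst, src;
    long imm;              /* immediate value, or offset for OP_LOAD */
};

#define NREGS 8
#define LOAD_SIZE 8        /* every load reads 8 bytes in this model */

/* Returns the index of the first rejected instruction, or -1 if OK. */
static int check_prog(const struct insn *prog, size_t len, long ctx_size)
{
    enum reg_type regs[NREGS] = { NOT_INIT };
    regs[1] = PTR_TO_CTX;  /* r1 holds the context pointer on entry */

    for (size_t i = 0; i < len; i++) {
        const struct insn *in = &prog[i];
        if (in->dst < 0 || in->dst >= NREGS || in->src < 0 || in->src >= NREGS)
            return (int)i;
        switch (in->op) {
        case OP_MOV_IMM:
            regs[in->dst] = SCALAR;
            break;
        case OP_MOV_REG:
            if (regs[in->src] == NOT_INIT)
                return (int)i;             /* read of uninitialized reg */
            regs[in->dst] = regs[in->src];
            break;
        case OP_ADD_IMM:
            if (regs[in->dst] == NOT_INIT)
                return (int)i;
            regs[in->dst] = SCALAR;        /* arithmetic drops pointer-ness */
            break;
        case OP_LOAD:
            if (regs[in->src] != PTR_TO_CTX ||
                in->imm < 0 || in->imm + LOAD_SIZE > ctx_size)
                return (int)i;             /* not a known-safe access */
            regs[in->dst] = SCALAR;
            break;
        case OP_EXIT:
            return regs[0] == NOT_INIT ? (int)i : -1;
        default:
            return (int)i;
        }
    }
    return (int)len;       /* fell off the end without an exit */
}
```

The third test case below shows the conservative choice at work: copying the context pointer and adding 8 to it yields a register the checker no longer trusts, even though the resulting address would in fact be in bounds.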
Re: [PATCH -tip v3 00/23] kprobes: introduce NOKPROBE_SYMBOL() and general cleaning of kprobe blacklist
Hi - > > Does this new blacklist cover enough that the kernel now survives a > > broadly wildcarded perf-probe, e.g. over all of its kallsyms? > > That's generally the purpose of the annotations - if it doesn't then > that's a bug. AFAIK, no kernel since kprobes was introduced has ever stood up to that test. perf probe lacks the wildcarding powers of systemtap, so one needs to resort to something like: # cat /proc/kallsyms | grep ' [tT] ' | while read addr type symbol; do perf probe $symbol; done then wait for a few hours for that to finish. Then, or while the loop is still running, run # perf record -e 'probe:*' -aR sleep 1 to take a kernel down. - FChE
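The grep/while pipeline above keeps only kernel text symbols (type "t" or "T" in the second column of /proc/kallsyms) and turns each one into a perf probe invocation. A minimal C sketch of the same filter follows, assuming the usual "ADDRESS TYPE NAME [MODULE]" line layout; kallsyms_text_symbol() and emit_probe_cmds() are hypothetical helper names. It only prints the commands. Running them is the hours-long, kernel-crashing experiment described in the message.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Same filter as `grep ' [tT] '`: a /proc/kallsyms line looks like
 * "ADDRESS TYPE NAME [MODULE]", and code symbols have type 't' or 'T'.
 * Returns 1 and copies NAME into sym if the line is a text symbol. */
static int kallsyms_text_symbol(const char *line, char *sym, size_t symlen)
{
    char addr[32], name[256];
    char type;

    if (sscanf(line, "%31s %c %255s", addr, &type, name) != 3)
        return 0;
    if (type != 't' && type != 'T')
        return 0;
    if (strlen(name) >= symlen)
        return 0;                    /* caller's buffer is too small */
    memcpy(sym, name, strlen(name) + 1);
    return 1;
}

/* Writes one "perf probe SYMBOL" line per text symbol read from in and
 * returns how many were written. It does not execute anything. */
static unsigned long emit_probe_cmds(FILE *in, FILE *out)
{
    char line[512], sym[256];
    unsigned long n = 0;

    while (fgets(line, sizeof line, in) != NULL) {
        if (kallsyms_text_symbol(line, sym, sizeof sym)) {
            fprintf(out, "perf probe %s\n", sym);
            n++;
        }
    }
    return n;
}
```

Module symbols carry a trailing "[module]" field; the "%255s" conversion stops at whitespace, so only the bare symbol name is copied.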
Re: [PATCH -tip v3 00/23] kprobes: introduce NOKPROBE_SYMBOL() and general cleaning of kprobe blacklist
masami.hiramatsu.pt wrote: > [...] This series also includes a change which prohibits probing on > the address in .entry.text because the code is used for very > low-level sensitive interrupt/syscall entries. Probing such code may > cause unexpected result (actually most of that area is already in > the kprobe blacklist). So I've decide to prohibit probing all of > them. [...] Does this new blacklist cover enough that the kernel now survives a broadly wildcarded perf-probe, e.g. over all of its kallsyms? - FChE
Re: [PATCH v4 2/3] Support for perf to probe into SDT markers:
Pekka Enberg writes: > Is there a technical reason why 'perf list' could not show all the > available SDT markers on a system and that the 'mark to event' > mapping cannot happen automatically? [...] A quick experiment with: find `echo $PATH | tr : ' '` -type f -perm -555 | xargs readelf -n 2>/dev/null | grep STAP 2>/dev/null suggests reasonable performance for my F19 workstation (a second or two over ~6000 executables), once all the ELF content is in the block cache. According to a stap eventcount.stp run, that required about 5 syscall.read events. Note that a $PATH search excludes shared libraries, which can also carry <sys/sdt.h> markers. Adding /usr/lib* in more than doubles the work, then there's /usr/libexec etc. - FChE
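The readelf | grep STAP pipeline works because each <sys/sdt.h> marker is recorded as an ELF note, which readelf -n reports as NT_STAPSDT. A tool such as perf list could count them directly. The sketch below walks the raw contents of a note section and counts entries whose owner name is "stapsdt" and whose type is 3, the values <sys/sdt.h> uses for its .note.stapsdt section. Locating that section in a file (via libelf, or by parsing the section headers) is left out, and count_sdt_notes() is a hypothetical helper name.

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>
#include <string.h>

/* Counts SystemTap SDT probe descriptors in a raw ELF note buffer, e.g.
 * the bytes of a .note.stapsdt section. Notes are 4-byte aligned. */
#define SDT_NOTE_NAME "stapsdt"
#define SDT_NOTE_TYPE 3

static size_t align4(size_t n) { return (n + 3) & ~(size_t)3; }

static unsigned count_sdt_notes(const unsigned char *buf, size_t len)
{
    unsigned count = 0;
    size_t off = 0;

    while (off + sizeof(Elf64_Nhdr) <= len) {
        Elf64_Nhdr nh;
        memcpy(&nh, buf + off, sizeof nh);      /* avoid unaligned reads */
        size_t name_off = off + sizeof nh;
        size_t desc_off = name_off + align4(nh.n_namesz);
        size_t next = desc_off + align4(nh.n_descsz);
        if (next > len || next <= off)
            break;                              /* truncated or bogus note */
        if (nh.n_type == SDT_NOTE_TYPE &&
            nh.n_namesz == sizeof SDT_NOTE_NAME &&
            memcmp(buf + name_off, SDT_NOTE_NAME, sizeof SDT_NOTE_NAME) == 0)
            count++;
        off = next;
    }
    return count;
}
```

The owner-name comparison includes the terminating NUL (n_namesz is 8 for "stapsdt"), so a note with a longer name that merely starts with "stapsdt" is not counted.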
Re: [PATCH v2 0/3] Perf support to SDT markers
Hemant Kumar writes: > [...] > A simple example to show this follows. > - Create a file with .d extension and mention the probe names in it with > provider name and marker name. > [...] > - Now create the probes.h and probes.o file : > $ dtrace -C -h -s probes.d -o probes.h > $ dtrace -C -G -s probes.d -o probes.o > [...] It may be worthwhile to document an even-simpler case:
- no .d file
- no invocation of the dtrace python script
- no generated .h or .o file
- in the C file, just add:

    #include <sys/sdt.h>

    int main (void) {
      /* ... */
      STAP_PROBE(provider_name,probe_name);
      /* ... */
      return 0;
    }

- gcc file.c
- stap -l 'process("./a.out").mark("*")' to list

- FChE
Re: [ANNOUNCE] ktap 0.1 released
"zhangwei(Jovi)" writes: > I'm pleased to announce that ktap release v0.1, this is the first official > release of ktap project [...] Congrats. > = what's ktap? > >Because this is the first release, so there wouldn't include too >much features, just contain several basic features about tracing, >[...] Nice progress. Reviewing the safety/security items from https://lkml.org/lkml/2013/1/17/623, I see improvement in most. For example, you seem to be using GFP_ATOMIC for run-time memory allocation, which is safer than before (though it could still exhaust resources). OTOH your code doesn't handle *failure* of such allocation attempts (see call sites to kp_*alloc). There still don't seem to be any safety constraints on the incoming byte code (like jump ranges, or loop counts). It's nice to see some arithmetic OP_* checks, and the user_string function is probably safe enough now. You'll need something analogous for kernel space (and possibly as verification for the various %s kp_printfs). The hash tables might be susceptible to the deliberate hash collision attacks from last year. It's nice to see the *_STACK_SIZE constraints in the bytecode interpreter; is there any C-level recursion remaining to consume excessive kernel stack? Making os.sleep/os.wait (or general kernel functions) callable from the scripts is fraught with danger. You just can't call the underlying functions from random kernel context (imagine the most pessimal possible kprobe or tracepoint, somewhere within an atomic section); you'll get crashes. You should write several time/space/invasivity stress-tests to help see how future progress improves the code's performance/safety on these and other problem areas. > = Planned Changes > >we are planning to enable more kernel ineroperability into ktap [...] As per the above, you'll want to be extremely careful about the idea to export FFI to let the lua scripts call into arbitrary kernel functions. Perhaps wrap it into a 'guru' mode flag?
- FChE
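The missing bytecode constraints mentioned in the review (jump ranges, loop counts) are usually enforced by a load-time pass over the code. The sketch below shows one such pass for a hypothetical Lua-like encoding, not ktap's real instruction format. It rejects any jump whose target lies outside the function, requires the code to end in a return so execution cannot run off the end, and counts backward jumps, since those are the loops that still need a run-time iteration budget.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical bytecode encoding, for illustration only. */
enum bc_op { BC_NOP, BC_JMP, BC_JMPIF, BC_RET };

struct bc_insn {
    enum bc_op op;
    int offset;       /* jump offset, relative to the next instruction */
};

/* Returns -1 if the code is unsafe to load, otherwise the number of
 * backward jumps the interpreter must guard with a loop counter. */
static int validate_jumps(const struct bc_insn *code, size_t n)
{
    int backward = 0;

    if (n == 0 || code[n - 1].op != BC_RET)
        return -1;    /* execution must not run off the end */

    for (size_t pc = 0; pc < n; pc++) {
        if (code[pc].op != BC_JMP && code[pc].op != BC_JMPIF)
            continue;
        long target = (long)pc + 1 + code[pc].offset;
        if (target < 0 || target >= (long)n)
            return -1;          /* jump outside the function */
        if (target <= (long)pc)
            backward++;         /* a loop: needs a run-time budget */
    }
    return backward;
}
```

A load-time pass like this is cheap and catches wild jumps outright; it cannot bound how often a loop runs, which is why the backward-jump count is returned rather than treated as an error.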
systemtap 2.2.1 release
on
 + tapset functions are still incomplete relative to what is supported when the kernel backend is active
 + exception handling becomes completely broken in programs instrumented by the current version of dyninst (PR14702)
 + command line interrupts are slightly mishandled (PR15049)
 + not all registers are made available on 32-bit x86 (PR15136)
See dyninst/README and the systemtap/dyninst Bugzilla component (http://tinyurl.com/stapdyn-PR-list) if you want all the gory details about the state of the feature.

= Contributors for this release

Dave Brolley, David Smith, Frank Ch. Eigler, Josh Stone, Lukas Berk, Mark Wielaard, Masanari Iida*, Negreanu Marius Adrian, Serguei Makarov, Timo Juhani Lindfors, Torsten Polle

Special thanks to Serguei Makarov for drafting these notes.

Special thanks to new contributors, marked with '*' above.

= Bugs fixed for this release <http://sourceware.org/PR#>

11341 update_visitor::require/provide uses hazardous static_casts
12894 Provide a systemd target replacing the current stap-server initscript
14275 Possible hotspot.function(" ") style probes
14297 stap -l and pn() fail to expand complex wildcards
14491 Add a proper stapdyn transport layer
15053 stapdyn needs -G (setting global variables) support
15112 Can't connect to stap-server via IPv6 raw hex addresses
15114 [PATCH] Propagate uid and gid from nfsd module as well
15123 workaround for bad debuginfo for -mfentry
15147 _stp_error() doesn't behave as described
15155 syscall tapset doesn't know sendmmsg
15162 eh_frame table too big, may kernel panic
15168 tolerate ppc deprecated ptrace commands
15170 nfsd.proc4.write probe alias needs updating
15171 inet_get_local_port() tapset function is broken on rawhide kernels
15172 tolerate unavailable System.map, as on ubuntu
15173 'origin' renamed to 'whence'
15177 need to handle new 'whence' values of 'SEEK_DATA' and 'SEEK_HOLE'
15197 syscall.fork/nd_syscall.fork broken on rawhide kernels
15198 syscall.sigaltstack / nd_syscall.sigaltstack broken on rawhide
15211 syscall.exp failures on rawhide
15237 adapt to changes in hlist_* kernel api in 3.9
15279 Stop munging the uprobes IP with kernel 3.9
15290 Update the inode-uretprobes support for aarapov's latest iteration
15306 stapdyn IRPC on terminated process, child SEGV
15315 Implement basic process filtering for inode-uprobes
15363 don't abort for a measly inode-uprobes registration failure
15408 procfs probes broken on rawhide
15422 loc2c with 32-on-64 sometimes creates integer-widening-into-pointer gcc warnings
15445 kernel.data (hwbkpt) probes can cause kernel panic on i686
15446 procfs probes broken on rawhide (kernel 3.10)
15452 segmentation fault in libdw while running debugtyptes.stp on rawhide
15456 syscalls and nd_syscalls tapset compat probe points broken on kernel 3.10
15466 add fallback for timer.profile on kernels without register_timer_hook()
Re: systemtap broken by removal of register_timer_hook
Hi -

> [...] How about creating trace_tick() in
> include/trace/events/timer.h and call it from tick_periodic() and
> tick_sched_handle(). [...]

Like this?


From facee64445c0dcc717e99c474c5c7dcdd31b9a74 Mon Sep 17 00:00:00 2001
From: "Frank Ch. Eigler" <f...@redhat.com>
Date: Wed, 3 Apr 2013 10:35:21 -0400
Subject: [PATCH] profiling: add tick tracepoint

Commit ba6fdda4 removed the timer_hook mechanism for modules to listen
to profiling timer ticks (without having to set up more complicated
perf mechanisms).  To reduce the impact on out-of-tree users such as
systemtap, a TRACE_EVENT-flavoured tracepoint is added in its place,
invoked right beside profile_tick() in kernel/time/tick-*.c.

Tested with perf and systemtap.

Cc: Frederic Weisbecker <fweis...@gmail.com>
Cc: Ingo Molnar <mi...@kernel.org>
Cc: Mel Gorman <mgor...@suse.de>
Signed-off-by: Frank Ch. Eigler <f...@redhat.com>
---
 include/trace/events/timer.h | 20
 kernel/time/tick-common.c    |  2 ++
 kernel/time/tick-sched.c     |  2 ++
 3 files changed, 24 insertions(+)

diff --git a/include/trace/events/timer.h b/include/trace/events/timer.h
index 425bcfe..ec4c2d0 100644
--- a/include/trace/events/timer.h
+++ b/include/trace/events/timer.h
@@ -323,6 +323,26 @@ TRACE_EVENT(itimer_expire,
 		  (int) __entry->pid, (unsigned long long)__entry->now)
 );
 
+
+struct pt_regs;
+
+/**
+ * tick - called when the profiling timer ticks
+ * @regs: pointer to struct pt_regs*
+ */
+TRACE_EVENT(tick,
+	TP_PROTO(struct pt_regs *regs),
+	TP_ARGS(regs),
+	TP_STRUCT__entry(
+		__field( struct pt_regs*, regs)
+	),
+	TP_fast_assign(
+		__entry->regs = regs;
+	),
+	TP_printk("ip=%p", (void *) instruction_pointer(__entry->regs))
+);
+
+
 #endif /*  _TRACE_TIMER_H */
 
 /* This part must be outside protection */
diff --git a/kernel/time/tick-common.c b/kernel/time/tick-common.c
index b1600a6..5f4227f 100644
--- a/kernel/time/tick-common.c
+++ b/kernel/time/tick-common.c
@@ -18,6 +18,7 @@
 #include <linux/percpu.h>
 #include <linux/profile.h>
 #include <linux/sched.h>
+#include <trace/events/timer.h>
 
 #include <asm/irq_regs.h>
 
@@ -74,6 +75,7 @@ static void tick_periodic(int cpu)
 
 	update_process_times(user_mode(get_irq_regs()));
 	profile_tick(CPU_PROFILING);
+	trace_tick(get_irq_regs());
 }
 
 /*
diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index a19a399..447be56 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -21,6 +21,7 @@
 #include <linux/sched.h>
 #include <linux/module.h>
 #include <linux/irq_work.h>
+#include <trace/events/timer.h>
 
 #include <asm/irq_regs.h>
 
@@ -140,6 +141,7 @@ static void tick_sched_handle(struct tick_sched *ts, struct pt_regs *regs)
 #endif
 	update_process_times(user_mode(regs));
 	profile_tick(CPU_PROFILING);
+	trace_tick(get_irq_regs());
 }
 
 /*
-- 
1.8.2

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
Re: systemtap broken by removal of register_timer_hook
Hi, Frederic -

> > How about this?
> >
> > Author: Frank Ch. Eigler <f...@redhat.com>
> > Date:   Wed Apr 3 10:35:21 2013 -0400
> >
> >     profiling: add profile_tick tracepoint
> > [...]
> It would be better not to tie this to CONFIG_PROFILING.
> A tracepoint in update_process_times() instead would be great but it's
> sometimes called several times in a tick from some archs.
> Probably we need something like:
>
> static inline tick_trace(struct pt_regs *regs)
> {
>         trace_timer_tick(regs);
>         profile_tick(CPU_PROFILING);
> }

I looked into this, but found no natural place to define such an
inline function from which to call into a tracepoint, without having
to #include the event/FOO.h file many times.  Nor does it seem
appropriate to do the identical #define CREATE_TRACE_POINTS part
from all the different arch/.../*.c files that may call into that
inline.  If you'd like to stick to this idea, please advise further
where you think the tracepoint definition & declarations should go.

In the alternative, here is v2 of the patch, just changing the
tracepoint-printing argument as suggested by jistone.

- FChE

---
Author: Frank Ch. Eigler <f...@redhat.com>
Date:   Wed Apr 3 10:35:21 2013 -0400

    profiling: add profile_tick tracepoint

    Commit ba6fdda4 removed the timer_hook mechanism for modules to listen
    to profiling timer ticks (without having to set up more complicated
    perf mechanisms).  To reduce the impact on out-of-tree users such as
    systemtap, a TRACE_EVENT-flavoured tracepoint is added in its place.

    Tested with perf and systemtap.

    Cc: Frederic Weisbecker <fweis...@gmail.com>
    Cc: Ingo Molnar <mi...@kernel.org>
    Cc: Mel Gorman <mgor...@suse.de>
    Signed-off-by: Frank Ch. Eigler <f...@redhat.com>

diff --git a/include/trace/events/profile.h b/include/trace/events/profile.h
new file mode 100644
index 000..445aee7
--- /dev/null
+++ b/include/trace/events/profile.h
@@ -0,0 +1,37 @@
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM profile
+
+#if !defined(_TRACE_PROFILE_H) || defined(TRACE_HEADER_MULTI_READ)
+#define _TRACE_PROFILE_H
+
+#include <linux/tracepoint.h>
+
+
+struct pt_regs;
+
+/**
+ * profile_tick - called when the profiling timer ticks
+ * @type: profiling tick type, generally @CPU_PROFILING
+ * @regs: pointer to struct pt_regs*
+ */
+
+TRACE_EVENT(profile_tick,
+	TP_PROTO(int type, struct pt_regs *regs),
+	TP_ARGS(type, regs),
+	TP_STRUCT__entry(
+		__field( int, type)
+		__field( struct pt_regs*, regs)
+	),
+	TP_fast_assign(
+		__entry->type = type;
+		__entry->regs = regs;
+	),
+	TP_printk("type=%d ip=%p", __entry->type,
+		  instruction_pointer(__entry->regs))
+);
+
+
+#endif /* _TRACE_PROFILE_H */
+
+/* This part must be outside protection */
+#include <trace/define_trace.h>
diff --git a/kernel/profile.c b/kernel/profile.c
index dc3384e..d61f921 100644
--- a/kernel/profile.c
+++ b/kernel/profile.c
@@ -29,6 +29,9 @@
 #include <asm/irq_regs.h>
 #include <asm/ptrace.h>
 
+#define CREATE_TRACE_POINTS
+#include <trace/events/profile.h>
+
 struct profile_hit {
 	u32 pc, hits;
 };
@@ -414,6 +417,8 @@ void profile_tick(int type)
 {
 	struct pt_regs *regs = get_irq_regs();
 
+	trace_profile_tick(type, regs);
+
 	if (!user_mode(regs) && prof_cpu_mask != NULL &&
 	    cpumask_test_cpu(smp_processor_id(), prof_cpu_mask))
 		profile_hit(type, (void *)profile_pc(regs));