systemtap release 4.4

2020-11-09 Thread Frank Ch. Eigler
The SystemTap team announces release 4.4

Enhancements to this release include: significant performance and
stability improvements to user-space probing; access to implicit
thread-local storage variables on x86_64, ppc and s390; support for
processing floating point values; significantly improved concurrency
for scripts using global variables, via shortened critical sections;
new syntax for defining aliases with both a prologue and an epilogue;
a new @probewrite predicate; and syscall arguments that are writable
again.

= Where to get it

  https://sourceware.org/systemtap/ - our project page
  https://sourceware.org/systemtap/ftp/releases/
  https://koji.fedoraproject.org/koji/packageinfo?packageID=615
  git tag release-4.4 (commit 988f439af39a)

  There have been over 135 commits since the last release.
  There have been 25+ bugs fixed / features added since the last release.

= SystemTap frontend (stap) changes

- New syntax for defining aliases with both a prologue and an epilogue:
  'probe ALIAS = PROBE {  }, {  }' 

- New @probewrite predicate. @probewrite(var) returns 1 if var has been 
  written to in the probe handler body and 0 otherwise. The check can
  only be used with probes that have an epilogue or prologue.

- Implicit thread local storage variables can now be accessed on
  x86_64, ppc, and s390.
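
The alias syntax and the @probewrite predicate described above can be
combined as in the following minimal sketch.  The alias name "announce"
and the variable "msg" are hypothetical, chosen only for illustration;
the begin probe is used so the script does not depend on any
particular kernel function.

```systemtap
# Hypothetical alias "announce" wrapping the begin probe.
probe announce = begin
  { msg = "" },                          # prologue: runs before the handler body
  { if (@probewrite(msg)) println(msg)   # epilogue: runs after the handler body
    exit() }

probe announce {
  msg = "written by the handler"         # a write in the handler body
}
```

Because the handler body assigns to msg, @probewrite(msg) in the
epilogue should evaluate to 1; with an empty handler body it would be
0 and nothing would be printed.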

= SystemTap backend changes

- Various performance and stability improvements to user-space probing.
  This includes replacing spinlocks with RCU locks in the vma map and
  utrace task hash table lookups, which substantially reduces CPU time
  when there are many target processes and the vma tracker or task
  finder is enabled.  The default hash table sizes were also increased
  to reduce hash conflicts.
  Special thanks to Yichun Zhang and Sultan Alsawaf for these contributions.

- The locks required to protect concurrent access to global variables
  have been optimized with a "pushdown" algorithm, so that they span
  the smallest possible critical region.  Formerly, and with 
  --compatible=4.3, locks always spanned the entire probe handler.
  Lock pushdown means much greater potential concurrency between
  probes running on different CPUs.
 
- Systemtap now supports kernel-lockdown configurations that disable
  debugfs, by instead using procfs to carry relayfs transport files.

= SystemTap tapset changes

- Systemtap now supports extracting 64-bit floating point values,
  which are stored in the long type.  Basic floating point arithmetic
  and comparison functions are also provided in a tapset.  More
  automated syntax coming soon.  e.g.:

  probe process.function("foo") {
    fp = user_long(& $fp_variable)
    println (fp_to_string (fp_add (string_to_fp("3.14"), fp)))
  }

- Make syscall arguments writable again in non-DWARF probes on kernels
  that use syscall wrappers to pass arguments via pt_regs (currently
  x86_64 4.17+ and aarch64 4.19+). For example, the following probe
  adds rwx user permissions to any directory made by the process
  specified by stap -c:
  
  probe nd_syscall.mkdir {
    if (pid() == target())
      mode |= 0700
  }

= SystemTap sample scripts

- All 180+ examples can be found at https://sourceware.org/systemtap/examples/

- New sample scripts:

  floatingpoint.stp  Extract a floating point value from a process and
                     print the results of various simple floating point
                     operations

- The following sample scripts have been enabled to run on the stapbpf backend:

  general/sizeof.stp
  memory/overcommit.stp

= Examples of tested kernel versions 

  2.6.32 (RHEL6 x86_64)
  3.10.0 (RHEL7 x86_64)
  4.15.0 (Ubuntu 18.04 x86_64)
  4.18.0 (RHEL8 x86_64, aarch64, ppc64le, s390x)
  5.3.8  (Fedora 30 i686)
  5.8.16 (Fedora 32 x86_64)
  5.8.18 (Fedora 33 x86_64)
  5.9.0-rc7   (Fedora rawhide x86_64)
  5.10.0-rc1  (Fedora rawhide x86_64)

= Known issues with this release

- There are known issues on kernel 5.10+ after adapting to set_fs()
  removal, with some memory accesses that previously returned valid data
  instead returning -EFAULT (see PR26811).

- An sdt probe cannot parse a parameter that uses a segment register.
  (PR13429)

- The presence of a line such as
  *CFLAGS += $(call cc-option, -fno-var-tracking-assignments)
  in the linux kernel Makefile unnecessarily reduces debuginfo quality;
  consider removing that line if you build kernels.

= Contributors for this release

Aaron Merey, Alice Zhang, Craig Ringer, Frank Ch. Eigler, Martin Cermak,
Sagar Patel, Sergei Trofimovich*, Serhei Makarov, Stan Cox, Sultan Alsawaf*,
Thorsten Glaser*, William Cohen, Yichun Zhang (agentzh)

Special thanks to new contributors, marked with '*' above.
Special thanks to Aaron Merey for drafting these notes.

= Bugs fixed for this release <https://sourceware.org/PR#>

10013 Support ENABLED sdt probe macro
12663 statement probes on inlined-function-call sites: search .debug_lin

Re: [PATCH 0/2] perf probe: Support debuginfod client

2020-09-17 Thread Frank Ch. Eigler
Hi -

> > > I need to support this in pahole...
> > 
> > pahole/dwarves use elfutils, so it already has automatic support.
> > https://sourceware.org/elfutils/Debuginfod.html
> 
> I'm still not sure that which interface of elfutils I should use
> for this "automatic" debuginfod support. Are there good documentation
> about it?

The libdwfl part of the elfutils API falls back to debuginfod lookups
internally, so e.g. systemtap had to do nothing to benefit.


> Since this series just for the kernel binary, I have to check we
> can do something on user-space binaries.

It should work identically & transparently.  If you're using one of
a few key packages of a few mainstream distros, the public debuginfod
server may already have the material available.


- FChE



Re: [PATCH 0/2] perf probe: Support debuginfod client

2020-09-16 Thread Frank Ch. Eigler
Hi -

> > Nice, even uses the source code fetching part of the webapi!
> 
> So, can I take that as an Acked-by or Reviewed-by? 

Sure.

> I need to support this in pahole...

pahole/dwarves use elfutils, so it already has automatic support.

https://sourceware.org/elfutils/Debuginfod.html

- FChE



Re: [PATCH 0/2] perf probe: Support debuginfod client

2020-09-16 Thread Frank Ch. Eigler
Hi -

Nice, even uses the source code fetching part of the webapi!

- FChE



Re: [PATCH v5 00/21] kprobes: Unify kretprobe trampoline handlers and make kretprobe lockless

2020-09-07 Thread Frank Ch. Eigler
Masami Hiramatsu writes:

> Sorry, for noticing this point, I Cc'd to systemtap. Is systemtap taking
> care of spinlock too?

On PREEMPT_RT configurations, systemtap uses the raw_spinlock_t
types/functions, to keep its probe handlers as atomic as we can make them.

- FChE



Re: [PATCH v4 00/10] Function Granular KASLR

2020-08-03 Thread Frank Ch. Eigler
Hi -

> > We have relocated based on sections, not some subset of function
> > symbols accessible that way, partly because DWARF line- and DIE- based
> > probes can map to addresses some way away from function symbols, into
> > function interiors, or cloned/moved bits of optimized code.  It would
> > take some work to prove that function-symbol based heuristic
> > arithmetic would have just as much reach.
> 
> Interesting. Do you have an example handy? 

No, I'm afraid I don't have one that I know cannot possibly be
expressed by reference to a function symbol only.  I'd look at
systemtap (4.3) probe point lists like:

% stap -vL 'kernel.statement("*@kernel/*verif*.c:*")'
% stap -vL 'module("amdgpu").statement("*@*execution*.c:*")'

which give an impression of computed PC addresses.

> It seems like something like that would reference the enclosing
> section, which means we can't just leave them out of the sysfs
> list... (but if such things never happen in the function-sections,
> then we *can* remove them...)

I'm not sure we can easily prove they can never happen there.

- FChE



Re: [PATCH v4 00/10] Function Granular KASLR

2020-08-03 Thread Frank Ch. Eigler
Hi -

On Mon, Aug 03, 2020 at 01:11:27PM -0700, Kees Cook wrote:
> [...]
> > Systemtap needs to know base addresses of loaded text & data sections,
> > in order to perform relocation of probe point PCs and context data
> > addresses.  It uses /sys/module/, kind of under protest, because
> > there seems to exist no MODULE_EXPORT'd API to get at that information
> > some other way.
> 
> Wouldn't /proc/kallsyms entries cover this? I must be missing
> something...

We have relocated based on sections, not some subset of function
symbols accessible that way, partly because DWARF line- and DIE- based
probes can map to addresses some way away from function symbols, into
function interiors, or cloned/moved bits of optimized code.  It would
take some work to prove that function-symbol based heuristic
arithmetic would have just as much reach.

- FChE



Re: [PATCH v4 00/10] Function Granular KASLR

2020-08-03 Thread Frank Ch. Eigler
Hi -

> > While this does seem to be the right solution for the extant problem, I
> > do want to take a moment and ask if the function sections need to be
> > exposed at all? What tools use this information, and do they just want
> > to see the bounds of the code region? (i.e. the start/end of all the
> > .text* sections) Perhaps .text.* could be excluded from the sysfs
> > section list?

> [[cc += FChE, see [0] for Evgenii's full mail ]]

Thanks!

> It looks like debugging tools like systemtap [1], gdb [2] and its
> add-symbol-file cmd, etc. peek at the /sys/module//section/ info.
> But yeah, it would be preferable if we didn't export a long sysfs
> representation if nobody actually needs it.

Systemtap needs to know base addresses of loaded text & data sections,
in order to perform relocation of probe point PCs and context data
addresses.  It uses /sys/module/, kind of under protest, because
there seems to exist no MODULE_EXPORT'd API to get at that information
some other way.

- FChE



systemtap 4.3 release

2020-06-11 Thread Frank Ch. Eigler
The SystemTap team announces release 4.3

Enhancements to this release include: Userspace probes may be targeted
by buildid as an alternate to a path name, script functions may use
probe $context variables, stapbpf improvements including try-catch
statements, and error probes.

= Where to get it

  https://sourceware.org/systemtap/ - our project page
  https://sourceware.org/systemtap/ftp/releases/
  https://koji.fedoraproject.org/koji/packageinfo?packageID=615
  git tag release-4.3 (commit c9c23c987d)

  There have been over 120 commits since the last release.
  There have been 27+ bugs fixed / features added since the last release.

= SystemTap frontend (stap) changes

- The target of process probes may be specified by hexadecimal buildid
  as an alternative to a path name.  This makes it possible to probe a
  variety of versions or aliases of a program, even if they are
  running inside containers under a different path name.  Works best
  with a debuginfod server that publishes the executables / debuginfo.
  The following example probes glibc.so 2.32-2.fc32.x86_64 from Fedora,
  wherever it is running on your machine:
  # export DEBUGINFOD_URLS=https://debuginfod.elfutils.org/
  # stap -e 'probe process("7ca24d4dc3de9d62d9ad6bb25e5b70a3e57a342f")
   .function("*system") { log("hi") }'

- Functions can now be context-sensitive, meaning that they may make
  references to $context variables and similar constructs that could
  formerly appear only inside probe handlers.  This is implemented by
  cloning such functions for each probe.  
  Only some probe point (dwarf-based user & kernel) types are supported.
  function foo () { println ($$vars) }
  probe kernel.function("do_exit") { foo() }
  probe process("/bin/ls").function("main") { foo() }
  probe process("/lib*/libc.so.6").mark("*") { foo() }

- The process(EXE).begin probe handlers are now always triggered for
  already-running target processes.

= SystemTap backend changes

- Almost all of the kmalloc() allocations exceeding 4KB have been
  replaced by vmalloc(). This helps stap's kernel runtime work
  properly on systems with serious fragmentation in physical memory
  address space.

- More $variable resolution errors may be generated, especially for
  @var("") constructs that target global variables.  These are
  duplicate-eliminated by default, but may be seen with verbosity>=2.

- The stapbpf backend now supports try-catch statements, an improved
  error tapset, and error probes.

- The "Build-id mismatch" condition now becomes a warning, so while
  related probes are not inserted, the rest of the script may run.
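
As a rough illustration of the try-catch support mentioned above, the
following minimal sketch raises an error and handles it within the
same probe.  It assumes the script is run with the stapbpf backend
selected (for example via the --bpf shorthand); the message strings
are arbitrary.

```systemtap
# Minimal sketch: catch a deliberately raised error.
probe begin {
  try {
    error("simulated failure")      # raise an error inside the try block
  } catch {
    println("caught an error")      # handled here instead of ending the script
  }
  exit()
}
```

The same script is also valid for the default kernel-module backend,
which makes it convenient for comparing the two runtimes.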

= SystemTap tapset changes

- Added a new tapset function dump_stack() which prints the current
  kernel backtrace to the kernel trace buffer (as a thin wrapper
  around the kernel C API function dump_stack).

- The proc_mem_rss() tapset function now includes the resident shared
  memory pages as expected. The old behavior can be restored by the
  --compatible=4.2 option on the command line.

- Modules compiled with guru mode for a particular kernel version can
  now only be loaded on kernels with exactly matching version
  (vermagic string) instead of any kernel whose API matches according
  to the modversions mechanism. Use -B CONFIG_MODVERSIONS=y to restore
  the prior behaviour.

= SystemTap sample scripts

- All 180+ examples can be found at https://sourceware.org/systemtap/examples/

- New sample scripts:
  security-band-aids/cve-2018-101.stp
  security-band-aids/cve-2018-6485
  Historical emergency security band-aid scripts for example purposes only

= Examples of tested kernel versions 

  2.6.32 (RHEL6 x86_64)
  3.10.0 (RHEL7 x86_64)
  4.15.0 (Ubuntu 18.04 x86_64)
  4.18.0 (RHEL8 x86_64, aarch64, ppc64le, s390x)
  5.3.8  (Fedora 30 i686)
  5.3.9  (Fedora 31 x86_64)
  5.4.0  (Fedora 32 x86_64)
  5.7.0  (Fedora 33 x86_64)

= Known issues with this release

- A change to syscall wrappers has resulted in the loss of the ability
  to modify syscall parameters.  (PR26015)

- An sdt probe cannot parse a parameter that uses a segment register.
  (PR13429)

- The presence of a line such as
  *CFLAGS += $(call cc-option, -fno-var-tracking-assignments)
  in the linux kernel Makefile unnecessarily reduces debuginfo quality;
  consider removing that line if you build kernels.

= Contributors for this release

Aaron Merey, Alice Zhang*, Craig Ringer*, Frank Ch. Eigler, Frank
Sorenson*, HATAYAMA Daisuke*, Juri Lelli*, Sagar Patel, Serhei Makarov,
Siddhesh Poyarekar, William Cohen, Yichun Zhang (agentzh)

Special thanks to new contributors, marked with '*' above.

= Bugs fixed for this release <https://sourceware.org/PR#>

6834 stap-client should not use bash network redirections
10280 allow relaxing of `uname -r` matching runtime assertion to ABI-compatible kernel series
11249 uprobes fails on glibc get-pc-thunk ca

Re: [PATCH 2/3] module: Fix up module_notifier return values.

2019-06-24 Thread Frank Ch. Eigler
Hi -

> > While auditing all module notifiers I noticed a whole bunch of fail
> > wrt the return value. Notifiers have a 'special' return semantics.

From peterz's comments and the patches, it's not obvious to me how one
is to choose between 0 (NOTIFY_DONE) and 1 (NOTIFY_OK) in the case of a
routine success.

> [...]
> I have a similar erroneous module notifier return value pattern
> in lttng-modules as well. I'll go fix it right away. CCing
> Frank Eigler from SystemTAP which AFAIK use a copy of
> lttng-tracepoint.c in their project, which should be fixed
> as well. I'm pasting the lttng-modules fix below.

Sure, following suit.  Thanks.

- FChE


systemtap 4.0 release

2018-10-13 Thread Frank Ch. Eigler
ly of executables run on the system

cpu_throttle.stp   Monitor Intel processors for throttling
   due to power or thermal limits

syscallsbypid.stp  Provide a per-process syscall tally on the system

syscallerrorsbypid.stp Provide a per-process syscall error tally

syscalllatency.stp Provide a per-process accumulation of syscall latency

- New stap-exporter-scripts/ subdirectory in systemtap.examples.

- Numerous example script improvements and new samples galore:

gmalloc_watch.stp   Tracing glib2 memory allocations

ioctl_handler.stp   Monitor which executables use ioctl syscalls
                    and what kernel code is handling the ioctl

libguestfs_log.stp  Trace libguestfs startup

measureinterval.stp Measure intervals between events

php-trace.stp       Tracing of PHP code execution

stap_time.stp       Provide elapsed times for passes
                    of SystemTap script compilation

tcl-funtop.stp      Profile Tcl calls

tcl-trace.stp       Callgraph tracing of Tcl code

cve-2018-14634.stp  historical emergency security band-aid,
                    for reference/education only


= Examples of tested kernel versions

  2.6.32 (RHEL 6 x86_64, i686)
  3.10.0 (RHEL 7 x86_64)
  4.15.0 (Ubuntu 18.04 x86_64)
  4.16.13 (Fedora 28 x86_64)
  4.18.0 (Fedora x86_64)
  4.18.12 (Fedora 28 x86_64, arm64, ppc64)
  4.19-rc7 (Fedora Rawhide x86_64)


= Known issues with this release

- Some kernel crashes continue to be reported when a script probes
  broad kernel function wildcards.  (PR2725)

- An upstream kernel commit #2062afb4f804a put "-fno-var-tracking-assignments"
  into KCFLAGS, dramatically reducing debuginfo quality, which can cause
  debuginfo failures. The simplest fix is to erase, excise, nay, eradicate
  this line from the top level linux Makefile:

  KBUILD_CFLAGS   += $(call cc-option, -fno-var-tracking-assignments)


= Coming soon

- prometheus-exporter is here, more tasty systemtap & http chocolate en route


= Contributors for this release

Aaron Merey, David Smith, Frank Ch. Eigler, Jafeer Uddin, Martin Cermak,
Masanari Iida, *Paulo Andrade, Serhei Makarov, Stan Cox, Victor Kamensky,
William Cohen, Yichun Zhang (agentzh), *Zexuan Luo

Special thanks to new contributors, marked with '*' above.
Special thanks to Serhei Makarov for assembling these notes.


= Bugs fixed for this release <https://sourceware.org/PR#>

14690 the syscall tapsets could be written to prefer the 'syscalls' tracepoints
21888 bpf variants of log()/etc. functions
22310 build parser syntax for all the new staptree types
23160 4.17 breaks syscalls tapset
23284 dmesg should identify the name of the stap script
23356 server.exp test case hangs on rawhide
23359 impose security constraints on @kderef, @kregister
23407 bpf: backend should support strings as first class values
23480 bpfinterp.cxx should respond to ^C
23488 support CONFIG_DEBUG_INFO_REDUCED builds
23510 Tapset function println() not supported in the bpf runtime
23599 Use of usymname() with stap -u leads to kernel module compilation errors
23608 long stapregex overflows arc_priority
23666 Aggregate operations specified in foreach loop is not respected by the translator
23736 rawhide 4.19 kernel panic during tracepoint enumeration
23760 .statement() wildcard probes fail if any cu/srcfile lacks debug_line data
23766 staprun -R (default) fails for modules with short hardcoded -m names




Re: Code of Conduct: Let's revamp it.

2018-09-21 Thread Frank Ch. Eigler
Rik van Riel writes:

> [...]  The goal of the code of conduct is to make the community
> welcoming, and to help people with being a part of the Linux
> community.  [...]

That may well be the goal.  But the proper way to evaluate policy is not
the laudability of its goals but its foreseeable and/or actual effects.
Is there any plan to evaluate the CoC empirically somehow to see if it
accomplishes what its proponents hope?

- FChE


systemtap 3.3 release

2018-06-08 Thread Frank Ch. Eigler
The SystemTap team announces release 3.3!

  eBPF backend extensions, easier access to examples, adapting to
  meltdown/spectre complications, real-time / high-cpu-count
  concurrency fixes


= Where to get it

  https://sourceware.org/systemtap/ - our project page
  https://sourceware.org/systemtap/ftp/releases/systemtap-3.3.tar.gz
  https://koji.fedoraproject.org/koji/packageinfo?packageID=615
  git tag release-3.3 (commit 48867d1cface944)

  There have been over 237 commits since the last release.
  There have been over 19 bugs fixed / features added since the last release.


= How to build it

  See the README and NEWS files at
  https://sourceware.org/git/?p=systemtap.git;a=tree

  Further information at https://sourceware.org/systemtap/wiki/


= SystemTap frontend (stap) changes

- The "stap --sysroot /PATH" option has received a revamp, so it
  works much better against cross-compiled environments.

- A new "stap --example FOO.stp" mode searches the example scripts
  distributed with systemtap for a file named FOO.stp, so its whole
  path does not need to be typed in.


= SystemTap backend changes

- The eBPF backend now supports uprobes, perf counter, timer, and
  tracepoint probes.

- The eBPF backend has learned to perform loops - at least in the
  userspace "begin/end" probe contexts, so one can iterate across BPF
  arrays for reporting.  (The linux kernel eBPF interpreter precludes
  loops and string processing.)  It can also handle much larger probe
  handler bodies, with a smarter register spiller/allocator.

- Systemtap's runtime has learned to deal with some of the collateral
  damage from kernel hardening after meltdown/spectre, including more
  pointer hiding and relocation.  The kptr_restrict procfs flag is
  forced on if running on a new enough kernel.

- Several low level locking-related fixes were added to the runtime
  that used uprobes/tracepoint apis, in order to work more reliably on
  real-time kernels and on high-cpu-count machines.


= SystemTap tapset changes

- Runtime/tapsets were ported to include up to kernel version 4.17.
  (The syscall tapsets are broken on kernel 4.17-rc, and will be fixed
  in an upcoming release; PR23160.)

- Some MIPS support has been added.


= SystemTap sample scripts

- All 178 examples can be found at https://sourceware.org/systemtap/examples/

- io_submit.stp has been optimized for larger systems

- new example capture_ssl_master_secrets.stp is just as naughty as it sounds


= Examples of tested kernel versions

  2.6.32 (RHEL 6 x86 and x86_64)
  3.10.0 (RHEL 7 x86_64)
  4.16.5 (Fedora 27 x86_64)
  4.18-rc0   (Fedora rawhide x86_64)


= Known issues with this release

- The syscall tapset is broken for kernels >= 4.17.  Use the
  kernel.trace("sys_enter") probe until we get this fixed. (PR23160)

- Some post-meltdown/spectre kernel versions have broken uprobes
  (resulting in SIGILL in userspace programs) and kernel tracepoints.
  Kernel fixes are underway. (RHBZ1579521)

- Some kernel crashes continue to be reported when a script probes
  broad kernel function wildcards.  (PR2725)

- An upstream kernel commit #2062afb4f804a put "-fno-var-tracking-assignments"
  into KCFLAGS, dramatically reducing debuginfo quality, which can cause
  debuginfo failures. The simplest fix is to erase, excise, nay, eradicate
  this line from the top level linux Makefile:
  
  KBUILD_CFLAGS   += $(call cc-option, -fno-var-tracking-assignments)
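
For the syscall tapset breakage noted above (PR23160), the suggested
tracepoint workaround might look like the following minimal sketch.
It assumes the sys_enter tracepoint exposes the syscall number as $id
and that a target process is given with stap -c or -x; the output
format is arbitrary.

```systemtap
# Print one line per system call entry made by the target process.
probe kernel.trace("sys_enter") {
  if (pid() == target())
    printf("%s(%d) entered syscall number %d\n", execname(), pid(), $id)
}
```

Unlike the syscall tapset probes, this reports only the raw syscall
number, so mapping it to a name or decoding arguments is left to the
script.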


= Coming soon

- http and systemtap coming together, like peanut butter and chocolate


= Contributors for this release

  Aaron Merey, *Aryeh Weinreb, *Bernhard Wiedemann, David Smith, Frank
  Ch. Eigler, *Gustavo Moreira, *Igor Gnatenko, *Iryna Shcherbina, *Jafeer
  Uddin, Jeff Moyer, *Lukas Herbolt, Mark Wielaard, Martin Cermak, *Petr
  Viktorin, Serhei Makarov, Stan Cox, Stefan Hajnoczi, Timo Juhani
  Lindfors, Victor Kamensky

  Special thanks to new contributors, marked with '*' above.


= Bugs fixed for this release <https://sourceware.org/PR#>

  21107   a few more access_ok tweaks needed  
  21890   bpf uprobes support   
  22004   dyninst does not handle R_*_IRELATIV in .rela.plt   
  22141   The RPM specfile needs an update handling the bpf bits  
  22248   failure processing linux-vdso64.so.1
  22311   bpf: drop the copy of the bpf map logic & snapshot-based pre-post begin {} synch
  22313   bpf: exit-state checking prologue
  22314   bpf: add support for uprobes, uretprobe and tracepoint events
  22323   bpf: format string tags appearing in output when wildcards are used
  22327   the loadavg tapset no longer works on recent kernels
  22328   bpf: add timer probes
  22462   quoted include path
  22536   Add shorthand option --bpf for --runtime=bpf
  22551   on rawhide, we're getting a compile error that init_timer() doesn't exist
  22695   "

systemtap 3.3 release

2018-06-08 Thread Frank Ch. Eigler
The SystemTap team announces release 3.3!

  eBPF backend extensions, easier access to examples, adapting to
  meltdown/spectre complications, real-time / high-cpu-count
  concurrency fixes


= Where to get it

  https://sourceware.org/systemtap/ - our project page
  https://sourceware.org/systemtap/ftp/releases/systemtap-3.3.tar.gz
  https://koji.fedoraproject.org/koji/packageinfo?packageID=615
  git tag release-3.3 (commit 48867d1cface944)

  There have been over 237 commits since the last release.
  There have been over 19 bugs fixed / features added since the last release.


= How to build it

  See the README and NEWS files at
  https://sourceware.org/git/?p=systemtap.git;a=tree

  Further information at https://sourceware.org/systemtap/wiki/


= SystemTap frontend (stap) changes

- The "stap --sysroot /PATH" option has received a revamp, so it
  works much better against cross-compiled environments.

- A new "stap --example FOO.stp" mode searches the example scripts
  distributed with systemtap for a file named FOO.stp, so its whole
  path does not need to be typed in.


= SystemTap backend changes

- The eBPF backend now supports uprobes, perf counter, timer, and
  tracepoint probes.

- The eBPF backend has learned to perform loops - at least in the
  userspace "begin/end" probe contexts, so one can iterate across BPF
  arrays for reporting.  (The linux kernel eBPF interpreter precludes
  loops and string processing.)  It can also handle much larger probe
  handler bodies, with a smarter register spiller/allocator.

- Systemtap's runtime has learned to deal with some of the collateral
  damage from kernel hardening after meltdown/spectre, including more
  pointer hiding and relocation.  The kptr_restrict procfs flag is
  forced on if running on a new enough kernel.

- Several low level locking-related fixes were added to the runtime
  that used uprobes/tracepoint apis, in order to work more reliably on
  real-time kernels and on high-cpu-count machines.


= SystemTap tapset changes

- Runtime/tapsets were ported to include up to kernel version 4.17.
  (The syscall tapsets are broken on kernel 4.17-rc, and will be fixed
  in a next release coming soon; PR23160.)

- Some MIPS support has been added.


= SystemTap sample scripts

 All 178 examples can be found at https://sourceware.org/systemtap/examples/

- io_submit.stp has been optimized for larger systems

- new example capture_ssl_master_secrets.stp is just as naughty as it sounds


= Examples of tested kernel versions

  2.6.32 (RHEL 6 x86 and x86_64)
  3.10.0 (RHEL 7 x86_64)
  4.16.5 (Fedora 27 x86_64)
  4.18-rc0   (Fedora rawhide x86_64)


= Known issues with this release

- The syscall tapset is broken for kernels >= 4.17.  Use the
  kernel.trace("sys_enter") probe until we get this fixed. (PR23160)

- Some post-meltdown/spectre kernel versions have broken uprobes
  (resulting in SIGILL in userspace programs) and kernel tracepoints.
  Kernel fixes are underway. (RHBZ1579521)

- Some kernel crashes continue to be reported when a script probes
  broad kernel function wildcards.  (PR2725)

- An upstream kernel commit #2062afb4f804a put "-fno-var-tracking-assignments"
  into KCFLAGS, dramatically reducing debuginfo quality, which can cause
  debuginfo failures. The simplest fix is to erase, excise, nay, eradicate
  this line from the top level linux Makefile:
  
  KBUILD_CFLAGS   += $(call cc-option, -fno-var-tracking-assignments)


= Coming soon

- http and systemtap coming together, like peanut butter and chocolate


= Contributors for this release

  Aaron Merey, *Aryeh Weinreb, *Bernhard Wiedemann, David Smith, Frank
  Ch. Eigler, *Gustavo Moreira, *Igor Gnatenko, *Iryna Shcherbina, *Jafeer
  Uddin, Jeff Moyer, *Lukas Herbolt, Mark Wielaard, Martin Cermak, *Petr
  Viktorin, Serhei Makarov, Stan Cox, Stefan Hajnoczi, Timo Juhani
  Lindfors, Victor Kamensky

  Special thanks to new contributors, marked with '*' above.


= Bugs fixed for this release <https://sourceware.org/PR#>

  21107   a few more access_ok tweaks needed  
  21890   bpf uprobes support   
  22004   dyninst does not handle R_*_IRELATIV in .rela.plt   
  22141   The RPM specfile needs an update handling the bpf bits  
  22248   failure processing linux-vdso64.so.1
  22311   bpf: drop the copy of the bpf map logic & snapshot-based pre-post
          begin {} synch
  22313   bpf: exit-state checking prologue
  22314   bpf: add support for uprobes, uretprobe and tracepoint events
  22323   bpf: format string tags appearing in output when wildcards are used
  22327   the loadavg tapset no longer works on recent kernels
  22328   bpf: add timer probes
  22462   quoted include path
  22536   Add shorthand option --bpf for --runtime=bpf
  22551   on rawhide, we're getting a compile error that init_timer() doesn't
          exist
  22695   "

Re: [RFC PATCH tip/master 0/3] kprobes: tracing: kretprobe_instance dynamic allocation

2017-03-29 Thread Frank Ch. Eigler

mhiramat wrote:

> Here is a correction of patches to introduce kretprobe_instance
> dynamic allocation for avoiding kretprobe silently miss-hits.
> [...]

Thanks, this looks like it would automatically be useful to systemtap
users as well.

- FChE


Re: [RFC][PATCH 00/21] tracing: Inter-event (e.g. latency) support

2017-02-09 Thread Frank Ch. Eigler

Hi, Tom -


tom.zanussi wrote:

> [...]
>> Hmm, this looks a bit hard to understand, I guess that onmatch() means
>> "if there is an event which has ts0 variable and the event's key matches
>> this key, take some action".
>
> Yes, that's pretty much it. It's essentially shorthand for this kind of
> common idiom, where timestamp[] is an associative array, which in our
> case is the tracing_map of the histogram: 
>
> event sched_wakeup()
> {
>   ts0[wakeup_pid] = now()
> }
> event sched_switch()
> {
>   if (ts0[next_pid])
>   latency = now() - ts0[next_pid] /* next_pid == wakeup_pid */
> }

By the way, here is a working systemtap version of this demo:

# cat foo.stp
global ts0%, latency%
function now() { return gettimeofday_us() }

probe kernel.trace("sched_wakeup") { ts0[$p->pid] = now() }

probe kernel.trace("sched_switch") {
   if (ts0[$next->pid])
      latency[$next->pid,$next->prio] <<< now() - ts0[$next->pid];
}

probe timer.s(5) {
   foreach ([pid+,x] in latency) {
      println("pid:", pid, " prio:", x)
      print(@hist_log(latency[pid,x]))
   }
   delete latency
}


# stap foo.stp
[...]
pid:20183 prio:109
value |-- count
    2 |   0
    4 |   0
    8 |@  1
   16 |   0
   32 |   0

pid:29095 prio:120
value |-- count
    0 |   1
    1 |   8
    2 |@@ 76
    4 |@@ 60
    8 |@@ 68
   16 |   16
   32 |   0
   64 |   0
[...]




> ts0 is basically a per-table-entry variable - there's one for each
> entry in the table, and it can only be accessed by events with
> matching keys.  [...]  So, that's a long-winded way of saying that the
> name ts0 is global across all tables (histograms) but an instance of
> ts0 is local to each entry in the table that owns the name.

In systemtap, one of the things we take care of is automatic concurrency
control over such shared variables.  Even if many CPUs run these same
functions and try to access the same ts0/latency hash tables at the same
time, things will work correctly.  I'm curious how your code deals with
this.


- FChE


Re: [RFC][PATCH] x86: Verify access_ok() context

2017-01-19 Thread Frank Ch. Eigler
Hi, Thomas -

> Well, if you are not in thread context then the check is pointless:
>   __range_not_ok(addr, size, user_addr_max())
> and:
> #define user_addr_max() (current->thread.addr_limit.seg)
> 
> So what guarantees when you are not in context of current, i.e. in thread
> context, that the addr/size which is checked against the limits of current
> actually belongs to current?

We're probably in task context in that there is a valid current(), but
running with preemption and/or interrupts and/or pagefaults disabled
at that point, so in_task() objects.  Think of it as being like a kprobes
handler callback, except perhaps with more temporary preemption blocking.


> I assume this is about systemtap modules. Can you please explain
> what you are trying to achieve? I guess you know that you actually
> access current, but then we need a seperate special function and not
> relaxing of the checks.

This check is used in a part of the runtime that is a userspace
analogue of probe_kernel_address(), where we're given a potential
userspace address.  We would like to quickly test whether it's even
plausible as a userspace address, before doing a (pagefault-disabled)
trial fetch/store to it.


- FChE


Re: [RFC][PATCH] x86: Verify access_ok() context

2017-01-19 Thread Frank Ch. Eigler
Hi, Thomas -

On Thu, Jan 19, 2017 at 07:12:48PM +0100, Thomas Gleixner wrote:
> [...]
> It does matter very much, because the fact that the warning triggers tells
> me that it's placed in code which is NOT executed in task context.
> [...]
> We are not papering over problems.

Understood.  We were interpreting the comments around access_ok to
mean that the underlying hazard condition was different (stricter)
than in_task().  If the warning could be made to match that hazard
condition more precisely, then safe but non-in_task() callers can use
access_ok() without the warning.

- FChE


Re: BPF runtime for systemtap

2016-06-14 Thread Frank Ch. Eigler

brendan.d.gregg wrote:

> [...]
> Great! Is there a hello world example in there somewhere? I found this:
> [...]

Yup.  Here is a smoke test.  (A great many other things are not yet
working.)

% sudo ./stap  -v  --runtime=bpf -e 'global foo
probe kprobe.function("vfs_read"), kprobe.function("do_select") { foo++ } 
probe begin { printf("systemtap starting probe\n") }
probe end { printf("systemtap ending probe\n"); printf("foo = %d\n", foo) }'

Pass 1: parsed user script and 35 library scripts using 198460virt/15804res/6416shr/9208data kb, in 0usr/0sys/71real ms.
Pass 2: analyzed script: 4 probes, 0 functions, 0 embeds, 1 global using 198460virt/15804res/6416shr/9208data kb, in 0usr/0sys/0real ms.
Pass 4: compiled BPF into "stap_32349.bo" in 0usr/0sys/0real ms.
Pass 5: starting run.
systemtap starting probe
^Csystemtap ending probe
foo = 108812
Pass 5: run completed in 0usr/10sys/2525real ms.


systemtap 3.0 release

2016-03-27 Thread Frank Ch. Eigler
[...]"nfs")}.function("nfs*")!
  => kernel.function("nfs*")!, module("nfs").function("nfs*")!

- Profiling timers at arbitrary frequencies are now provided and perf probes
  now support a frequency field as an alternative to sampling counts.

  probe timer.profile.freq.hz(N)
  probe perf.type(N).config(M).hz(X)

  The specified frequency is only accurate up to around 100hz. You may
  need to provide a higher value to achieve the desired rate.

- Added support for private global variables and private functions. The
  'private' keyword limits these to the tapset file they are defined in.
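
  As a rough sketch of the frequency-based profiling timer described above,
  the script below samples at a requested 50 Hz and reports which process
  names were seen most often.  The array name "hits", the 50 Hz rate and the
  10-second run time are illustrative assumptions only.

```systemtap
global hits

# Request roughly 50 samples per second.  Per the note above, the actual
# rate is only approximate, so treat the number as a target.
probe timer.profile.freq.hz(50) {
  hits[execname()] <<< 1
}

# After 10 seconds, print the five most frequently sampled names and stop.
probe timer.s(10) {
  foreach (name in hits- limit 5)
    printf("%-16s %d samples\n", name, @count(hits[name]))
  exit()
}
```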


= SystemTap tapset changes

  ansi.stp                      Functions ansi_set_color{2,3} are replaced by
                                overloaded ansi_set_color
  linux/[arm/]aux_syscalls.stp  Support for arm kernels less than 3.7.
  linux/arm/[nd_]syscalls.stp   Support for [nd_]syscall.execve for arm
                                kernels less than 3.7.
  linux/aux_syscalls.stp        New _stp_mlock2_str function to convert mlock2
                                syscall flags to a string.
  linux/context.stp             New module_size() function.
  linux/conversions.stp         - New kernel_string_quoted_utf[16|32] functions
                                  combines @string_quoted and @kernel_string_utf*
                                - kernel_string* functions with alternative error
                                  strings are replaced by overloaded variants
  linux/nd_syscalls.stp         Add nd_syscall.mlock2 kprobe based probe point
  linux/perf.stp                Update recent uapi/linux/perf_event.h bits
  linux/proc_mem.stp            proc_mem_*_pid functions are replaced by
                                overloaded proc_mem_*
  linux/syscalls.stp            Add syscall.mlock2 kernel function probe point
  linux/task.stp                New task_cwd_path and task_exe_file functions
  linux/task_time.stp           task_{s,u}time_tid functions are replaced by
                                overloaded task_{s,u}time
  linux/uconversions.stp        user_string functions with alternative warning
                                strings are replaced by overloaded variants
  logging.stp                   New overloaded variant of assert,
                                assert(expression)
  print_stats.stpm              @prints* macros for printing stats
  timers.stp                    Probe point timer.profile.freq for profiling
  try_assign.stpm               New @try_assign macro
  uconversions.stp              Add user_string_quoted_utf[16|32] function that
                                quotes a given UTF-[16|32] string from a given
                                user address

- Internal tapset functions and global variables are marked as private,
  where possible.

- Some tapsets have been modified to make use of the new function overloading
  feature. Instead of having new function names with suffixes such as "2" or
  "pid" to indicate extra arguments, the functions now seem to have optional
  arguments.

- New tapset function string_quoted() to quote and \-escape general strings.
  String $context variables that are pretty-printed are now processed with
  such a quotation engine, falling back to a 0x%x (hex pointer) on errors.

- Functions get_mmap_args() and get_32mmap_args() got deprecated.
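
  To illustrate the function overloading mentioned above, here is a minimal
  sketch with two definitions that share a name and differ only in parameter
  count.  The function "greet" is hypothetical and is not part of any tapset.

```systemtap
# Hypothetical functions: same name, different number of parameters.
function greet() { println("hello") }
function greet(name:string) { printf("hello, %s\n", name) }

probe begin {
  greet()            # selects the zero-argument definition
  greet(execname())  # selects the one-argument definition
  exit()
}
```

  To a caller this looks like one function with an optional argument, which
  is how the reworked tapset functions are meant to be used.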


= SystemTap sample scripts - now at 156 samples!

  who_sent_it.stp   Trace outgoing network packets using the netfilter probes,
                    printing the source thread name/id and destination host:port

- New to the collection is a selection of security band-aids for specific CVEs.
  They are historical emergency fixes, kept for reference/education only. The
  scripts can be found under the security-band-aids folder in the examples
  directory.

- A number of samples were tweaked for portability and demonstration of newer
  language/tapset facilities.


= Examples of tested kernel versions

  2.6.18 (RHEL 5 x86 and x86_64)
  2.6.32 (RHEL 6 x86 and x86_64)
  3.10.0 (RHEL 7 x86_64)
  4.1.6  (Fedora 22 x86_64)
  4.3.4  (Fedora 22 x86_64)
  4.6.0-rc0  (Fedora rawhide x86_64)


= Known issues with this release

- Some kernel crashes continue to be reported when a script probes
  broad kernel function wildcards.  (PR2725)

- The dyninst backend is still very much a prototype, with a number of
  issues, limitations, and general teething woes.  See dyninst/README
  and the systemtap/dyninst Bugzilla component
  (http://tinyurl.com/stapdyn-PR-list) if you want all the gory
  details about the state of the feature.

- An upstream kernel commit #2062afb4f804a put "-fno-var-tracking-assignments"
  into KCFLAGS, reducing debuginfo quality which can cause debuginfo failures.
  A proposed workaround to this issue exists in:
  https://lkml.org/lkml/2014/11/21/505 . Fedora kernels are not affected by
  this issue.


= Contributors for this release

  Abegail Jakop, David S

systemtap 2.9 release

2015-10-08 Thread Frank Ch. Eigler
ing is truncated on older kernels, such
  as 2.6.32 (PR15757)

- The dyninst backend is still very much a prototype, with a number
  of issues, limitations, and general teething woes. For instance:
  + lack of support for multiarch/cross-instrumentation
  + tapset functions are still incomplete relative to what is supported
    when the kernel backend is active
  + exception handling becomes completely broken in programs
    instrumented by the current version of dyninst (PR14702)
  + not all registers are made available on 32-bit x86 (PR15136)

  See dyninst/README and the systemtap/dyninst Bugzilla component
  (http://tinyurl.com/stapdyn-PR-list) if you want all the gory 
  details about the state of the feature.


= Contributors for this release

  Abegail Jakop, David Smith, Felix Lu, Frank Ch. Eigler,
  Ivan Diorditsa*, Jose Castillo*, Josh Stone, Lukas Berk,
  Mark Wielaard, Martin Cermak, Mikhail Kulemin*, Nicolas Brito*,
  Snehal Phule*

  Special thanks to new contributors, marked with '*' above.
  Special thanks to Felix Lu for compiling these notes.


= Bugs fixed for this release <https://sourceware.org/PR#>

  909   perf counter events, perfmon? kernel API
  2111  document syscalls tapset
  10487 flight recorder control from script
  10977 Getting the address size used in a module
  11263 exposing foo32 syscalls
  12151 support /* stable */ embedded-c pragma
  13664 support dwarf types for stap variables
  15972 core dump with process probes
  16493 Improve bkl.stp to add backtrace
  16968 bad formatting in many help pages for probes
  17831 kprobes_onthefly.exp fails on powerpc
  17893 el6: cannot stat `build/en-US/pdf/*SystemTap_Beginners_Guide*.pdf': No
        such file or directory
  17920 File descriptor to pathname function
  17921 kernel backtrace missing /proc/kallsyms symbols
  18455 const_folder::visit_binary_expression hurting type inference
  18462 macro deprecation
  18503 procfs .maxsize() overflow should generate error
  18555 ppc64le: can't probe demangled C++ function names
  18562 the listing_mode.exp test case has lots of errors on systems without
        uprobes
  18563 on ppc64, the mbrwatch.stp example script fails when tested
  18571 Tapset support and test coverage for bpf and seccomp syscalls.
  18577 on rhel7, listing_mode_sanity.exp always gets a failure when doing
        'stap -l **'
  18597 long_arg() doesn't correctly handle negative values in 32-on-64
        environment
  18598 stap_staticmarkers.stp tapset has no test case
  18630 dwarfless parameters from a uprobe need test coverage
  18634 on rawhide, using timer probes gets a compilation error
  18649 int_arg() misbehaves on x86[_64] for 32-bit uprobe in binary having
        debuginfo
  18650 powerpc variant of longlong_arg() for uprobes swaps the high and low
        half of its 64bit retval
  18651 Possible nd_syscall tapset cleanup based on PR18597 fix
  18711 Pass 4 failure on RHEL7 for examples netfilter_summary and
        netfilter_drop
  18751 support a STAP_PRINTF() macro for use in embedded-C functions
  18769 [ppc64BE/--dyninst] unknown operator @__compat_task
  18827 consistency check for syscall and nd_syscall tapset
  18856 nfsd.close probe alias fails on rawhide
  18885 Use /* unmodified-fnargs */ in tapsets
  18889 lost ability to probe kernel module initializers
  18936 script cache will fail if $jiffies is referenced
  18942 any script will include all the globals from tapset/argv.stp
  18944 the ioblock.stp tapset fails to compile on RHEL7
  18971 process_by_pid.exp issues
  18999 error("") stall (causing similar assert() stall)
  19000 several task tapset functions can cause kernel crash
  19021 the tapset function task_dentry_path() should handle more than just
        files
  19043 __bio_ino(), __rqstp_gid() and __rqstp_uid() can crash the kernel
  19045 kernel_string_quoted() can crash the kernel
  19057 _is_reset() can crash the rhel6 / s390 kernel
  19065 task_fd_lookup() can crash the s390x kernel when invoked with an
        invalid input
  19069 task_euid() doesn't compile on aarch64
  19070 Call to __ustack_raw(0) causes 'Unknown symbol in module' on rhel6-
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: timing of module MODULE_STATE_COMING notifier

2015-08-31 Thread Frank Ch. Eigler
Hi, Rusty -


I wrote:

> [...]
> > Notifiers suck for stuff like this :( Module state has many steps,
> > so my preference has been to open-code explicit hooks.  [...]
> 
> You mean something like the trace_module_load()?  (We will probably
> experiment with hooking into that tracepoint instead of the notifier.)
> [...]

It turns out this works OK, except for EXPORT_TRACEPOINT_SYMBOL_GPL.
Could we get a set of EXPORT_TRACEPOINT_SYMBOL_GPL's for the
trace/events/module.h tracepoints (at least module_load and
module_free)?


- FChE


Re: timing of module MODULE_STATE_COMING notifier

2015-08-31 Thread Frank Ch. Eigler
Hi, Rusty -

Thanks for your response!

> [...]
> > That patch also moved the MODULE_STATE_COMING notifier call to
> > complete_formation(), which is relatively early to its former
> > do_init_module() call site.  It now precedes the parse_args(),
> > mod_sysfs_setup(), and trace_module_load() steps.
> 
> Yes, parse_args() can enter the module, so you really want it before
> then.

Understood.  (Perhaps mod_sysfs_setup() could sneak in ahead.)


> > Was the latter part of the change intended & necessary?  It is
> > negatively impacting systemtap, which was relying on
> > MODULE_STATE_COMING being called from a fairly complete module
> > state - just before the actual initializer function call.

> Notifiers suck for stuff like this :( Module state has many steps,
> so my preference has been to open-code explicit hooks.  [...]

You mean something like the trace_module_load()?  (We will probably
experiment with hooking into that tracepoint instead of the notifier.)
A more hard-coded one with an in-kernel callee probably wouldn't help
module-resident clients like us.


- FChE


timing of module MODULE_STATE_COMING notifier

2015-08-30 Thread Frank Ch. Eigler
Hi, Rusty -

We just [1] came across your patch [2] from last year (merged into
3.17), wherein the RO/NX mapping settings for module sections were
moved to an earlier point in the module-loading sequence.

That patch also moved the MODULE_STATE_COMING notifier call to
complete_formation(), which is relatively early to its former
do_init_module() call site.  It now precedes the parse_args(),
mod_sysfs_setup(), and trace_module_load() steps.

Was the latter part of the change intended & necessary?  It is
negatively impacting systemtap, which was relying on
MODULE_STATE_COMING being called from a fairly complete module state -
just before the actual initializer function call.

[1] https://sourceware.org/bugzilla/show_bug.cgi?id=18889

[2] commit 4982223e51e8ea9d09bb33c8323b5ec1877b2b51
Author: Rusty Russell <ru...@rustcorp.com.au>
Date:   Wed May 14 10:54:19 2014 +0930


- FChE


Re: [PATCH] x86/debug: Remove perpetually broken, unmaintainable dwarf annotations

2015-05-29 Thread Frank Ch. Eigler
Hi -

On Fri, May 29, 2015 at 03:27:16PM -0500, Josh Poimboeuf wrote:
> [...]
> > > Also, with the feature missing completely, maybe someone finds a method to
> > > introduce it in a maintainable fashion, while with the feature included 
> > > upstream
> > > there's very little pressure to do that. As a bonus we'd also win a 
> > > workable dwarf
> > > unwinder.
> > 
> > Before doing something drastic like this, I think we should get Josh's
> > opinion, since I think he's working on a new (?) unwinder.
> 
> I'd definitely like to replace all the asm DWARF CFI annotations with
> something more automated and robust.  So it doesn't really affect me
> whether they're ripped out now or replaced later.  
> [...]
> Then again, I'm not sure how useful or reliable the existing annotations
> are anyway, so maybe it doesn't matter much.

In our experience as consumers of this CFI information for years in
systemtap, the annotations have been generally correct and reliable.
Their presence allows reliable, correct, and efficient
kernel->userspace backtracing as used in important systemtap scripts.

If the current complaint is primarily about testability, it would be
easy to add simple stap-based tests to the kernel to exercise the code
and confirm its operation.  Perhaps we could extract a specialized
self-contained test case (containing an unwinder).

I'm not in a position to judge the purported cost savings of removing
this code, but there is definitely a negative benefit as a loss of
useful functionality, esp. with no replacement in sight.


- FChE


Re: [PATCH] Kbuild: Add an option to enable GCC VTA

2015-04-24 Thread Frank Ch. Eigler
Hi, Josh -

On Fri, Apr 24, 2015 at 08:40:02AM -0400, Josh Boyer wrote:
> [...]
> Frank, did you rebase this against some newer tree or something?

Yes; the lib/Kconfig.debug part didn't apply to current git.

> Curious why you sent it again.

At least as a patch-ping; the poor-debuginfo problems are reported to
affect non-fedora users too.


> > +ifdef CONFIG_DEBUG_INFO_VTA
> > +KBUILD_CFLAGS   += $(call cc-option, -fvar-tracking-assignments)
> > +else
> > +KBUILD_CFLAGS   += $(call cc-option, -fno-var-tracking-assignments)
> > +endif
> > +
> 
> Is there a reason you moved this hunk under the DWARF4 options instead
> of modifying it in-place like the original patch did?

Yes, this version appears a little safer, in the sense that without
CONFIG_DEBUG_INFO, neither setting of CONFIG_DEBUG_INFO_VTA would
affect the CFLAGS.  (In fact, Jakub advises the positive polarity
-fvar-tracking-assignments is redundant with -g, and the negative
polarity one only provides codegen-bug-protection in the
CONFIG_DEBUG_INFO case.)


- FChE


[PATCH] Kbuild: Add an option to enable GCC VTA

2015-04-23 Thread Frank Ch. Eigler
From: Josh Stone <jist...@redhat.com>

Due to isolated gcc codegen issues, gcc -fvar-tracking-assignments
was unconditionally disabled in commit 2062afb4f804 ("Fix gcc-4.9.0
miscompilation of load_balance() in scheduler").

However, this reduces the debuginfo coverage for variable locations,
especially in inline functions.  VTA is certainly not perfect either
in those cases, but it is much better than without.  With compiler
versions that have fixed the codegen bugs, we would prefer to have the
better details for SystemTap, and surely other debuginfo consumers
like perf will benefit as well.

This patch simply makes CONFIG_DEBUG_INFO_VTA an option.  I considered
Frank and Linus's discussion of a cc-option-like -fcompare-debug test,
but I'm convinced that a narrow test of an arch-specific codegen issue
is not really useful.  GCC has their own regression tests for this, so
I'd suggest GCC_COMPARE_DEBUG=-fvar-tracking-assignments-toggle is more
useful for kernel developers to test confidence.

In fact, I ran into a couple more issues when testing for this patch[1],
although neither of those had any codegen impact.
 [1] https://bugzilla.redhat.com/show_bug.cgi?id=1140872

With gcc-4.9.2-1.fc22, I can now build v3.18-rc5 with Fedora's i686 and
x86_64 configs, and this is completely clean with GCC_COMPARE_DEBUG.

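Cc: Jakub Jelinek <ja...@redhat.com>
Cc: Josh Boyer <jwbo...@fedoraproject.org>
Cc: Greg Kroah-Hartman <gre...@linuxfoundation.org>
Cc: Linus Torvalds <torva...@linux-foundation.org>
Cc: Andrew Morton <a...@linux-foundation.org>
Cc: Markus Trippelsdorf <mar...@trippelsdorf.de>
Cc: Michel Dänzer <mic...@daenzer.net>
Signed-off-by: Josh Stone <jist...@redhat.com>
Signed-off-by: Frank Ch. Eigler <f...@redhat.com>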
---
 Makefile          |  8 ++++++--
 lib/Kconfig.debug | 21 ++++++++++++++++++++-
 2 files changed, 26 insertions(+), 3 deletions(-)

diff --git a/Makefile b/Makefile
index 6cc5b2434224..c8e1fcfdb41a 100644
--- a/Makefile
+++ b/Makefile
@@ -704,8 +704,6 @@ KBUILD_CFLAGS   += -fomit-frame-pointer
 endif
 endif
 
-KBUILD_CFLAGS   += $(call cc-option, -fno-var-tracking-assignments)
-
 ifdef CONFIG_DEBUG_INFO
 ifdef CONFIG_DEBUG_INFO_SPLIT
 KBUILD_CFLAGS   += $(call cc-option, -gsplit-dwarf, -g)
@@ -718,6 +716,12 @@ ifdef CONFIG_DEBUG_INFO_DWARF4
 KBUILD_CFLAGS  += $(call cc-option, -gdwarf-4,)
 endif
 
+ifdef CONFIG_DEBUG_INFO_VTA
+KBUILD_CFLAGS   += $(call cc-option, -fvar-tracking-assignments)
+else
+KBUILD_CFLAGS   += $(call cc-option, -fno-var-tracking-assignments)
+endif
+
 ifdef CONFIG_DEBUG_INFO_REDUCED
 KBUILD_CFLAGS  += $(call cc-option, -femit-struct-debug-baseonly) \
   $(call cc-option,-fno-var-tracking)
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index 17670573dda8..e8d072d2b402 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -165,7 +165,26 @@ config DEBUG_INFO_DWARF4
  Generate dwarf4 debug info. This requires recent versions
  of gcc and gdb. It makes the debug information larger.
  But it significantly improves the success of resolving
- variables in gdb on optimized code.
+ variables in gdb on optimized code.  The gcc docs also
+ recommend enabling -fvar-tracking-assignments for maximum
+ benefit. (see DEBUG_INFO_VTA)
+
+config DEBUG_INFO_VTA
+   bool "Enable var-tracking-assignments for debuginfo"
+   depends on DEBUG_INFO
+   help
+ Enable gcc -fvar-tracking-assignments for improved debug
+ information on variable locations in optimized code.  Per
+ gcc, DEBUG_INFO_DWARF4 is recommended for best use of VTA,
+ and allows maximal access to local variables in tracers
+ and debuggers like perf, systemtap, kgdb, and crash.
+
+ VTA has been implicated in codegen bugs (gcc PR61801,
+ PR61904, both fixed in 2014-08), so this flag may be used
+ to exclude this rare class of problem.  One can also set
+ GCC_COMPARE_DEBUG=-fvar-tracking-assignments-toggle in the
+ environment to automatically compile everything both ways,
+ generating an error if anything differs.
 
 config GDB_SCRIPTS
bool "Provide GDB scripts for kernel debugging"
-- 
2.1.0



Re: [PATCH 3.15 33/37] Fix gcc-4.9.0 miscompilation of load_balance() in scheduler

2014-08-05 Thread Frank Ch. Eigler
Hi -

On Tue, Aug 05, 2014 at 03:36:39PM -0700, Linus Torvalds wrote:
> > Actually, "perf probe" does (via HAVE_DWARF_SUPPORT), to place probes
> > and to extract variables at those probes, much as systemtap does.
> > Without var-tracking, probes placed at most interior points of
> > functions will make variables inaccessible.
> 
> .. and as mentioned, -O2 already does that for many things, even
> *with* tracking.

The whole point of variable tracking was to make -O2 usable (though
still imperfect) for those who use debuggers and such tools.


> [...]  I don't understand how you guys can be so cavalier about a
> compiler bug that has already resulted in actual real problems.

No one is minimizing the problem.  We are looking for a knob for those
who know that their compiler does not have that bug.  (Plus, those who
don't care about debug data could use CONFIG_DEBUG_INFO=n with the bad
compiler.)


> You bring up theoretical cases that nobody has actually reported
> [...]

I assure you that the years of effort that went into gcc variable
tracking was justified with actual reports.


> Do you compile without -O2 too? Because I *guarantee* you that with
> -O2 (even with tracking), you'll get "local variable 'xyz' optimized
> away" cases.

One gets many fewer than without it, and also fewer false positives
(where the non-var-tracking debuginfo claims a variable may be
available, but points to the wrong place).


> [...]  Until you can get the compiler people to have some sane way
> to know the problem is gone, I'm not going to maintain a kernel that
> uses a known-broken compiler feature. It's that simple.

Would you consider a patch that does a gcc COMPARE_DEBUG-based test?


- FChE


Re: [PATCH 3.15 33/37] Fix gcc-4.9.0 miscompilation of load_balance() in scheduler

2014-08-05 Thread Frank Ch. Eigler
Hi -

> >>.  I don't disagree it should be
> >> disabled by default, but making it unconditional is going to force the
> >> distributions that care about perf, systemtap, and debuggers to
> >> manually revert this.
> >
> > Bah. I bet I use 'perf' more than most, and it doesn't care about
> > debug info. 

Actually, "perf probe" does (via HAVE_DWARF_SUPPORT), to place probes
and to extract variables at those probes, much as systemtap does.
Without var-tracking, probes placed at most interior points of
functions will make variables inaccessible.

Do you need a fully worked out example to see this?
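
A minimal sketch of the kind of function in question follows.  It is
not taken from the thread: the function name weighted_sum() and its
locals are hypothetical, chosen only for illustration.  At -O2 a
compiler will typically keep "scaled" and "sum" in registers or fold
them into other expressions, so a probe placed on the marked interior
line depends on the debuginfo location lists to find them.  Building
it once with -O2 -g -fvar-tracking-assignments and once with
-O2 -g -fno-var-tracking-assignments, then comparing the variable
location information with a DWARF dump tool, should show whether the
locals remain describable; the exact outcome depends on the compiler
version.

```c
#include <stddef.h>

/*
 * Hypothetical illustration only.  'scaled' lives for a single loop
 * iteration and 'sum' is updated in place, so an optimizing compiler
 * is free to keep both purely in registers.
 */
long weighted_sum(const int *vals, size_t n, int weight)
{
    long sum = 0;

    for (size_t i = 0; i < n; i++) {
        long scaled = (long)vals[i] * weight;  /* interior probe point */
        sum += scaled;
    }
    return sum;
}
```

A probe at function entry only needs the arguments, which the calling
convention pins down; it is the interior line that exercises the
variable-tracking data.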

- FChE


Re: [PATCH RFC v3 net-next 3/3] samples: bpf: eBPF dropmon example in C

2014-07-30 Thread Frank Ch. Eigler
Hi, Alexei -

> My understanding of systemtap is that the whole .stp script is converted
> to C, compiled as .ko and loaded, so all map walking and prints are
> happening in the kernel. Similarly for ktap which has special functions
> in kernel to print histograms.

That is correct.

> I thought dtrace printf are also happening from the kernel. What is
> the trick they use to know which pieces of dtrace script should be
> run in user space?

It appears as though the bytecode language running in the kernel sends
some action commands back out to userspace, not just plain data.


> In ebpf examples there are two C files: one for kernel with ebpf isa
> and one for userspace as native. I thought about combining them,
> but couldn't figure out a clean way of doing it.

(#if ?)


> > What kind of locking/serialization is provided by the ebpf runtime
> > over shared variables such as my_map?
> 
> it's traditional rcu scheme.

OK, that protects the table structure, but:

> [...] In such case concurrent write access to map value can be done
> with bpf_xadd instruction, though using normal read/write is also
> allowed. In some cases the speed of racy var++ is preferred over
> 'lock xadd'.

... so concurrency control over shared values is left up to the
programmer.

> There are no lock/unlock function helpers available to ebpf
> programs, since program may terminate early with div by zero
> for example, so in-kernel lock helper implementation would
> be complicated and slow. It's possible to do, but for the use
> cases so far there is no need.

OK, I hope that works out.  I've been told that dtrace does something
similar (!) by eschewing protection on global variables such as
strings.  In their case it's less bad than it sounds because they are
used to offloading computation to userspace or to store only
thread-local state, and accept the corollary limitations on control.

(Systemtap does fully & automatically protect shared variables, even
in the face of run-time script errors.)
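
The trade-off between a racy var++ and a 'lock xadd' style update can
be sketched in userspace C11.  This is not eBPF code, and the names
(struct counters, bump, run_counters, MAX_THREADS) are hypothetical.
The "racy" counter is modelled as a separate relaxed load and store,
so that the lost updates are well-defined behaviour rather than a C
data race; the "exact" counter uses a single atomic read-modify-write.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

#define MAX_THREADS 16

struct counters {
    atomic_long racy;   /* updated with separate load + store */
    atomic_long exact;  /* updated with one atomic fetch-add */
    long iters;         /* written before the threads start, then read-only */
};

static void *bump(void *arg)
{
    struct counters *c = arg;

    for (long i = 0; i < c->iters; i++) {
        /* Another thread may update 'racy' between these two steps,
         * in which case one of the increments is lost. */
        long seen = atomic_load_explicit(&c->racy, memory_order_relaxed);
        atomic_store_explicit(&c->racy, seen + 1, memory_order_relaxed);

        /* A single atomic read-modify-write loses no increments. */
        atomic_fetch_add_explicit(&c->exact, 1, memory_order_relaxed);
    }
    return NULL;
}

/* Run nthreads workers; return the exact count, store the racy one. */
long run_counters(int nthreads, long iters, long *racy_out)
{
    pthread_t tid[MAX_THREADS];
    struct counters c;
    int started = 0;

    assert(nthreads > 0 && nthreads <= MAX_THREADS);
    atomic_init(&c.racy, 0);
    atomic_init(&c.exact, 0);
    c.iters = iters;

    for (int i = 0; i < nthreads; i++) {
        if (pthread_create(&tid[i], NULL, bump, &c) != 0)
            break;
        started++;
    }
    for (int i = 0; i < started; i++)
        pthread_join(tid[i], NULL);

    *racy_out = atomic_load(&c.racy);
    return atomic_load(&c.exact);
}
```

With several threads the exact counter always equals threads times
iterations, while the racy counter can fall short by however many
updates were lost; the size of that shortfall varies from run to run.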


- FChE


Re: [PATCH RFC v3 net-next 3/3] samples: bpf: eBPF dropmon example in C

2014-07-30 Thread Frank Ch. Eigler

ast wrote earlier:

> [...]
> dtrace/systemtap/ktap approach is to use one script file that should provide
> all desired functionality. That architectural decision overcomplicated their
> implementations.
>
> eBPF follows split model: everything that needs to process millions of events
> per second needs to run in kernel and needs to be short and deterministic,
> all other things like aggregation and nice graphs should run in user space.
> [...]

For the record, this is not entirely accurate as to dtrace.  dtrace
delegates aggregation and most reporting to userspace.  Also,
systemtap is "short and deterministic" even for aggregations & nice
graphs, but since it limits its storage & cpu consumption, its
arrays/reports cannot get super large.


> [...]
> +SEC("events/skb/kfree_skb")
> +int bpf_prog2(struct bpf_context *ctx)
> +{
> +[...]
> + value = bpf_map_lookup_elem(&my_map, &loc);
> + if (value)
> + (*(long *) value) += 1;
> + else
> + bpf_map_update_elem(&my_map, &loc, &init_val);
> + return 0;
> +}

What kind of locking/serialization is provided by the ebpf runtime
over shared variables such as my_map?


- FChE


Re: [PATCH RFC v3 net-next 3/3] samples: bpf: eBPF dropmon example in C

2014-07-30 Thread Frank Ch. Eigler

ast wrote earlier:

 [...]
 dtrace/systemtap/ktap approach is to use one script file that should provide
 all desired functionality. That architectural decision overcomplicated their
 implementations.

 eBPF follows split model: everything that needs to process millions of events
 per second needs to run in kernel and needs to be short and deterministic,
 all other things like aggregation and nice graphs should run in user space.
 [...]

For the record, this is not entirely accurate as to dtrace.  dtrace
delegates aggregation and most reporting to userspace.  Also,
systemtap is short and deterministic even for aggregations  nice
graphs, but since it limits its storage  cpu consumption, its
arrays/reports cannot get super large.


 [...]
 +SEC(events/skb/kfree_skb)
 +int bpf_prog2(struct bpf_context *ctx)
 +{
 +[...]
 + value = bpf_map_lookup_elem(my_map, loc);
 + if (value)
 + (*(long *) value) += 1;
 + else
 + bpf_map_update_elem(my_map, loc, init_val);
 + return 0;
 +}

What kind of locking/serialization is provided by the ebpf runtime
over shared variables such as my_map?


- FChE
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH RFC v3 net-next 3/3] samples: bpf: eBPF dropmon example in C

2014-07-30 Thread Frank Ch. Eigler
Hi, Alexei -

 My understanding of systemtap is that the whole .stp script is converted
 to C, compiled as .ko and loaded, so all map walking and prints are
 happening in the kernel. Similarly for ktap which has special functions
 in kernel to print histograms.

That is correct.

 I thought dtrace printf are also happening from the kernel. What is
 the trick they use to know which pieces of dtrace script should be
 run in user space?

It appears as though the bytecode language running in the kernel sends
some action commands back out to userspace, not just plain data.


 In ebpf examples there are two C files: one for kernel with ebpf isa
 and one for userspace as native. I thought about combining them,
 but couldn't figure out a clean way of doing it.

(#if ?)


> > What kind of locking/serialization is provided by the ebpf runtime
> > over shared variables such as my_map?
>
> it's traditional rcu scheme.

OK, that protects the table structure, but:

> [...] In such case concurrent write access to map value can be done
> with bpf_xadd instruction, though using normal read/write is also
> allowed. In some cases the speed of racy var++ is preferred over
> 'lock xadd'.

... so concurrency control over shared values is left up to the
programmer.

> There are no lock/unlock function helpers available to ebpf
> programs, since program may terminate early with div by zero
> for example, so in-kernel lock helper implementation would
> be complicated and slow. It's possible to do, but for the use
> cases so far there is no need.

OK, I hope that works out.  I've been told that dtrace does something
similar (!) by eschewing protection on global variables such as
strings.  In their case it's less bad than it sounds because they are
used to offloading computation to userspace or to store only
thread-local state, and accept the corollary limitations on control.

(Systemtap does fully & automatically protect shared variables, even
in the face of run-time script errors.)


- FChE


Re: Random panic in load_balance() with 3.16-rc

2014-07-28 Thread Frank Ch. Eigler
Hi -

On Mon, Jul 28, 2014 at 09:10:04AM -0400, Theodore Ts'o wrote:
> [...]
> I thought Markus told us that -fno-var-tracking-assignments makes
> absolutely no difference for non-debug kernels?

It does affect CONFIG_DEBUG_INFO kernels, and that config option is
set for all Red Hat kernels (-debug or plain).


> [...]  Is there some equivalent signalling system that gcc could use
> [...]

I'm not aware of anything trivial like a gcc --report-fixed-PRs kind
of thing.  But, kbuild could conceivably have a run-time test
involving test-running gcc in that compare-debug mode with a
suitable test case.  We use the latter technique in systemtap for
auto-configuring to kernel versions/features; we got the $(CHECK_BUILD)
trick from vmware module makefiles.  It could be recast as a variant
of $(cc-option ...).


- FChE


Re: Random panic in load_balance() with 3.16-rc

2014-07-28 Thread Frank Ch. Eigler

torvalds wrote:

> [...]
> Actually, I prefer my patch that did it with cc-option checking, and
> does it unconditionally.
>
> Because if we do it even for non-debug builds - where it ostensibly
> shouldn't matter - we then have that GCC_COMPARE_DEBUG thing working
> regardless of configuration.

Please note that the data produced by "-g -fvar-tracking" is consumed
by tools like systemtap, perf, crash, and makes a significant
difference to the observability of debug AND non-debug kernels.  (The
presence of compiled-in DEBUG_* self-checking code is orthogonal to
kernel observability via debuginfo.)  Please consider only disabling
var-tracking optionally/temporarily to work around this already-fixed
compiler bug, but not losing high-quality dwarf data permanently.

- FChE


Re: [RFC PATCH v2] Tracepoint: register/unregister struct tracepoint

2014-03-13 Thread Frank Ch. Eigler
Hi -

On Thu, Mar 13, 2014 at 12:10:48PM -0400, Mathieu Desnoyers wrote:

> [...]  Moreover, tracers are responsible for unregistering the probe
> before the module containing its associated tracepoint is unloaded.

Could you spell out please how a tracer is supposed to know early
enough that the module is going to be unloaded?

- FChE


Re: [for-next][PATCH 08/20] tracing: Warn if a tracepoint is not set via debugfs

2014-03-11 Thread Frank Ch. Eigler
Hi, Steven -

> > So it is a deferred-activation kind of call, with no way of knowing
> > when or if the tracepoints will start coming in.  Why is that
> > supported at all, considering that clients could monitor modules
> > coming & going via the module_notifier chain, and update registration
> > at that time?
> 
> That's my argument.

Was there an answer?


> > >> +entry = get_tracepoint(name);
> > >> +/* Make sure the entry was enabled */
> > >> +if (!entry || !entry->enabled)
> > >> +ret = -ENODEV;
> > 
> > For what it's worth, I agree with Mathieu that this sort of successful
> > failure result code is a problem for tracking what needs cleanup and
> > what doesn't.  (In systemtap's case, if this change gets merged, we'll
> > have to treat -ENODEV as if it were 0.)
> 
> Does systemtap enable tracepoints before they are created? That is, do
> you allow enabling of a tracepoint in a module that is not loaded yet?

We have no formal opinion on whether or not this makes sense.  If the
kernel permits it, fine.

> If not, then you want this as an error.

But it's not exactly an error!  It's a success of sorts, and means
that later on we have to unregister the callback, just as if it were
successful.


- FChE


Re: [for-next][PATCH 08/20] tracing: Warn if a tracepoint is not set via debugfs

2014-03-10 Thread Frank Ch. Eigler

Hi -


>> From: Steven Rostedt 
>> 
>> Tracepoints were made to allow enabling a tracepoint in a module before that
>> module was loaded. When a tracepoint is enabled and it does not exist, the
>> name is stored and will be enabled when the tracepoint is created.
>> 
>> The problem with this approach is that when a tracepoint is enabled when
>> it expects to be there, it gives no warning that it does not exist.

So it is a deferred-activation kind of call, with no way of knowing
when or if the tracepoints will start coming in.  Why is that
supported at all, considering that clients could monitor modules
coming & going via the module_notifier chain, and update registration
at that time?


>> +entry = get_tracepoint(name);
>> +/* Make sure the entry was enabled */
>> +if (!entry || !entry->enabled)
>> +ret = -ENODEV;

For what it's worth, I agree with Mathieu that this sort of successful
failure result code is a problem for tracking what needs cleanup and
what doesn't.  (In systemtap's case, if this change gets merged, we'll
have to treat -ENODEV as if it were 0.)


- FChE


Re: [RFC PATCH] Fix: module signature vs tracepoints: add new TAINT_UNSIGNED_MODULE

2014-02-13 Thread Frank Ch. Eigler

rostedt wrote:

> [...]
> Oh! You are saying that if the kernel only *supports* signed modules,
> and you load a module that is not signed, it will taint the kernel?

Yes: this is the default for several distros.

- FChE


Re: [PATCH -tip v6 00/22] kprobes: introduce NOKPROBE_SYMBOL(), cleanup and fixes crash bugs

2014-02-09 Thread Frank Ch. Eigler
Hi -

> > So the similar thing happens when we enables events as below;
> > 
> >   # for i in /sys/kernel/debug/tracing/events/kprobes/* ; do date; echo 1 > $i; done
> >   Wed Jan 29 10:44:50 UTC 2014
> >   ...
> >
> > I tried it and canceled after 4 min passed. It enabled about 17k 
> > events and slowed down my system very much(I almost got hang check 
> > timer).
> 
> Ok, I guess that's the slowdown bug that Frank reported.

It could be, but it feels a bit different.  In my testing from
December, it's as though it wasn't the activated probes *hitting* that
were associated with the slowdown, but them merely being activated.
It was as though something with the kprobes/ftrace probe-registration
code performed a lot more work than it did longer ago.  (One way to
test this could be to be more careful in the selection of kprobes
being enabled.  For example, emplace thousands, but only in unused loaded
modules.)  I'm sorry I didn't get time to investigate further.

- FChE


Re: [PATCH -tip v6 00/22] kprobes: introduce NOKPROBE_SYMBOL(), cleanup and fixes crash bugs

2013-12-20 Thread Frank Ch. Eigler
Hi -

mingo wrote:
> [...]
> For example a hash table (hashed by probe address) could be used in 
> addition to the list, to speed up basic operations.

In the past, when this sort of behavior popped up, it was due to
machine-wide halt/sync operations being done too eagerly.  At one
point, the kprobes-unregistration interface grew a mass-unregister API
to batch them (and save the machine sync's between operations).  Maybe
with the new checks/logic, a similar batching API may be needed for
the registration side.

I'll try to get more perf data once the VM comes back up; after a
couple of hours of the test getting started, it died (for possibly
unrelated reasons).


[  133.073670] stap_bc6054113aa63134d411836da0afefc3_123_1261: module 
verification failed: signature and/or  required key missing - tainting kernel
[  404.357210] stap_4c53547addc9f25dd87ac4afa0407ed6_36_1948: systemtap: 
2.5/0.157, base: a0201000, memory: 
15882data/24text/1ctx/2058net/4625alloc kb, probes: 34692
[ 1655.745075] hrtimer: interrupt took 1225946 ns
[ 3969.175039] [sched_delayed] sched: RT throttling activated
[10812.665534] kernel tried to execute NX-protected page - exploit attempt? 
(uid: 0)
[10812.665534] BUG: unable to handle kernel paging request at 88007902f038
[10812.665534] IP: [88007902f038] 0x88007902f038
[10812.665534] PGD 2d90067 PUD 2d93067 PMD 8000790001e3 
[10812.665534] Oops: 0010 [#1] SMP 
[10812.665534] Modules linked in: 
stap_4c53547addc9f25dd87ac4afa0407ed6_36_1948(OF) rpcsec_gss_krb5 auth_rpcgss 
nfsv4 dns_resolver nfs lockd sunrpc fscache ppdev microcode parport_pc parport 
i2c_piix4 virtio_balloon i2c_core serio_raw virtio_net virtio_pci ata_generic 
virtio_ring pata_acpi virtio [last unloaded: 
stap_89fcd2b984e11a30dd08d141e6b47e13_123_1681]
[10812.665534] CPU: 1 PID: -30720 Comm: x Tainted: GF  O 
3.13.0-rc4-01828-g8b349c29efae #1
[10812.665534] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[10812.665534] BUG: unable to handle kernel paging request at 8f896e09
[10812.665534] IP: [810d9f55] do_raw_spin_trylock+0x5/0x60
[10812.665534] PGD 0 
[10812.665534] Thread overran stack, or stack corrupted
[10812.665534] Oops:  [#2] SMP 
[10812.665534] Modules linked in: 
stap_4c53547addc9f25dd87ac4afa0407ed6_36_1948(OF) rpcsec_gss_krb5 auth_rpcgss 
nfsv4 dns_resolver nfs lockd sunrpc fscache ppdev microcode parport_pc parport 
i2c_piix4 virtio_balloon i2c_core serio_raw virtio_net virtio_pci ata_generic 
virtio_ring pata_acpi virtio [last unloaded: 
stap_89fcd2b984e11a30dd08d141e6b47e13_123_1681]
[10812.665534] CPU: 1 PID: -30720 Comm: x Tainted: GF  O 
3.13.0-rc4-01828-g8b349c29efae #1
[10812.665534] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[10812.665534] BUG: unable to handle kernel paging request at 8f896e09
[10812.665534] IP: [810d9f55] do_raw_spin_trylock+0x5/0x60
[10812.665534] PGD 0 
[10812.665534] Thread overran stack, or stack corrupted
[10812.665534] Oops:  [#3] SMP 
[10812.665534] Modules linked in: 
stap_4c53547addc9f25dd87ac4afa0407ed6_36_1948(OF) rpcsec_gss_krb5 auth_rpcgss 
nfsv4 dns_resolver nfs lockd sunrpc fscache ppdev microcode parport_pc parport 
i2c_piix4 virtio_balloon i2c_core serio_raw virtio_net virtio_pci ata_generic 
virtio_ring pata_acpi virtio [last unloaded: 
stap_89fcd2b984e11a30dd08d141e6b47e13_123_1681]
[10812.665534] CPU: 1 PID: -30720 Comm: x Tainted: GF  O 
3.13.0-rc4-01828-g8b349c29efae #1
[10812.665534] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[10812.665534] BUG: unable to handle kernel paging request at 8f896e09
[10812.665534] IP: [810d9f55] do_raw_spin_trylock+0x5/0x60
[10812.665534] PGD 0 
[10812.665534] Thread overran stack, or stack corrupted
[10812.665534] Oops:  [#4] SMP 
[10812.665534] Modules linked in: 
stap_4c53547addc9f25dd87ac4afa0407ed6_36_1948(OF) rpcsec_gss_krb5 auth_rpcgss 
nfsv4 dns_resolver nfs lockd sunrpc fscache ppdev microcode parport_pc parport 
i2c_piix4 virtio_balloon i2c_core serio_raw virtio_net virtio_pci ata_generic 
virtio_ring pata_acpi virtio [last unloaded: 
stap_89fcd2b984e11a30dd08d141e6b47e13_123_1681]
[10812.665534] CPU: 1 PID: -30720 Comm: x Tainted: GF  O 
3.13.0-rc4-01828-g8b349c29efae #1
[10812.665534] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[10812.665534] BUG: unable to handle kernel paging request at 8f896e09
[10812.665534] IP: [810d9f55] do_raw_spin_trylock+0x5/0x60
[10812.665534] PGD 0 
[10812.665534] Thread overran stack, or stack corrupted
[10812.665042] ------------[ cut here ]------------
[10812.665042] WARNING: CPU: 0 PID: 1948 at arch/x86/kernel/kprobes/core.c:600 
reenter_kprobe+0x3c/0xd0()
[10812.665042] Modules linked in: 
stap_4c53547addc9f25dd87ac4afa0407ed6_36_1948(OF) rpcsec_gss_krb5 auth_rpcgss 
nfsv4 dns_resolver nfs lockd sunrpc fscache ppdev microcode parport_pc parport 
i2c_piix4 virtio_balloon i2c_core serio_raw virtio_net virtio_pci ata_generic 
virtio_ring pata_acpi virtio [last unloaded: 

Re: [PATCH -tip v6 00/22] kprobes: introduce NOKPROBE_SYMBOL(), cleanup and fixes crash bugs

2013-12-19 Thread Frank Ch. Eigler

Hi, Masami -


masami.hiramatsu.pt wrote:

> Here is the version 6 of NOKPROBE_SYMBOL series. :)
> [...]

Some preliminary results from building these on top of tip/master on
x86-64.  

# stap -te 'probe kprobe.function("*") {}'

starts up OK, without crashes, which looks like great progress.  But a
closer look indicates that the insertion of kprobes is taking about
three (!!) orders of magnitude longer than before, as judged by the
rate of increase of 'wc -l /sys/kernel/debug/kprobes/list'.  So, one
has to let the thing run for several hours just to get all the kprobes
inserted, never mind letting stress-testing begin.

For reference, here's the steady-state "perf top" output during all this
insertion work:

 54.81%  [kernel][k] _raw_spin_unlock_irqrestore
 38.13%  [kernel][k] __slab_alloc
  1.11%  [kernel][k] kprobe_ftrace_handler
  0.88%  [kernel][k] _raw_spin_unlock_irq

More notes once the machine gets far enough to get to the robustness
testing phase.

- FChE


Re: [RFC PATCH tip 4/5] use BPF in tracing filters

2013-12-08 Thread Frank Ch. Eigler

masami.hiramatsu.pt wrote:

> [...]
> Anyway, as far as I can see, there looks be two different models of
> tracing in our mind.
>
> A) Fixed event based tracing: In this model, there are several fixed
> "events" which well defined with fixed arguments. tracer handles these
> events and only use limited arguments. It's like a packet stream
> processing. ftrace, perf etc. are used this model.
>
> B) Flexible event-point tracing: In this model, each tracer(or even
> trace user) can freely define their own event, there will be some fixed
> tracing points defined, but arguments are defined by users. It's like a
> debugger's breakpoint debugging. systemtap, ktap etc. are used this model.

It may be more useful to think of it as a contrast along the
hard-coded versus programmable axis.  (perf, systemtap, and ktap can
each reach to some extent across your "fixed" vs "flexible" line.
Each has some dynamic and some static-tracepoint capability.)


> e.g. B model has a good flexibility and A model is easy to use for
> beginners.

I don't think it's the model that dictates ease-of-use, but the
quality of implementation, logistics, documentation, and examples.


- FChE


Re: [PATCH -tip v4 0/6] kprobes: introduce NOKPROBE_SYMBOL() and fixes crash bugs

2013-12-06 Thread Frank Ch. Eigler
Hi -

On Sat, Dec 07, 2013 at 08:19:13AM +0900, Masami Hiramatsu wrote:

> [...]
> > Would you plan to limit kprobes (or just the perf-probe frontend) to
> > only function-entries also?

> Exactly, yes :). Currently I have a patch for kprobe-tracer
> implementation (not only for perf-probe, but doesn't limit kprobes
> itself).

Interesting option.  It sounds like a restrictive expedient that could
result in kprobes never being made sufficiently robust.


> > If not, and if intra-function statement-granularity kprobes remain
> > allowed within a function-granularity whitelist, then you might
> > still have those "quantitative" problems.

> Yes, but as far as I've tested, the performance overhead is not
> high, especially as far as putting kprobes at the entry of those
> functions because of ftrace-based optimization.

(Would that also make CONFIG_KPROBE_EVENT require KPROBES_ON_FTRACE?)


> > Even worse, kprobes robustness problems can bite even with a small
> > whitelist, unless you can test the cartesian product of the
> > countless subset selections and the aggravating factors (like other
> > tracing facilities being in use at the same time, limited memory,
> > high irq rates, debugging sessions, architectures, whatever).
> 
> And also, what script will run on each probe, right? :)

In the perf-probe world, the closest analogue could be varying the
contextual data that's being extracted (stack traces, parameters, ...).


> >> [...]  For the long term solution, I think we can introduce some
> >> kind of performance gatekeeper as systemtap does. Counting the
> >> miss-hit rate per second and if it go over a threshold, disable next
> >> miss-hit (or most miss-hit) probe (as OOM killer does).
> > 
> > That would make sense, but again it would not help deal with kprobes
> > robustness (in the kernel-crashing rather than kernel-slowdown sense).
> 
> Why would you think so? Is there any hidden path for calling kprobes
> mechanism?? The kernel crash problem just comes from bugs, not the
> quantitative issue.

I don't think we're disagreeing.  A performance-gatekeeper in
perf-probe or nearby would be useful (and manage the kprobe-quantity
problem).  It would not be sufficient to prevent the kernel-crashing
bugs.
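
As a rough illustration of the gatekeeper idea quoted above (not how
systemtap or perf-probe implement it), a user-space C sketch could count
missed hits per probe over a one-second window and disable the worst
offender once the total crosses a threshold.  All names here
(struct probe_stat, gatekeeper_tick, MISS_THRESHOLD) are made up for the
example:

```c
#include <assert.h>
#include <stddef.h>

#define MISS_THRESHOLD 1000  /* hypothetical limit: missed hits per window */

/* Hypothetical per-probe bookkeeping; not a kernel structure. */
struct probe_stat {
    unsigned long misses;   /* missed hits in the current window */
    int enabled;            /* 1 while the probe is armed */
};

/*
 * Called once per window (e.g. once per second).  If the total number of
 * missed hits across enabled probes exceeds MISS_THRESHOLD, disable the
 * enabled probe with the most misses.  All counters are then reset for
 * the next window.  Returns the index of the probe that was disabled,
 * or -1 if none was.
 */
int gatekeeper_tick(struct probe_stat *probes, size_t n)
{
    unsigned long total = 0;
    int worst = -1;
    size_t i;

    assert(probes != NULL || n == 0);

    for (i = 0; i < n; i++) {
        if (!probes[i].enabled)
            continue;
        total += probes[i].misses;
        if (worst < 0 || probes[i].misses > probes[(size_t)worst].misses)
            worst = (int)i;
    }

    if (total <= MISS_THRESHOLD)
        worst = -1;                     /* under budget: leave all armed */
    else
        probes[worst].enabled = 0;      /* over budget: drop the worst one */

    for (i = 0; i < n; i++)
        probes[i].misses = 0;           /* start a fresh window */

    return worst;
}
```

Note that a mechanism like this only bounds the slowdown.  A single
probe in the wrong spot never trips the counter before it crashes.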


- FChE


Re: [RFC PATCH tip 0/5] tracing filters with BPF

2013-12-06 Thread Frank Ch. Eigler

hpa wrote:

>> I can see there may be some setups which don't have a compiler
>> (e.g. I know some people don't use systemtap because of that)
>> But this needs a custom gcc install too as far as I understand.
>
> Yes... but no compiler and secure boot tend to go together, or at
> least will in the future.

(Maybe not: we're already experimenting with support for secureboot in
systemtap.)

- FChE


Re: [PATCH -tip v4 0/6] kprobes: introduce NOKPROBE_SYMBOL() and fixes crash bugs

2013-12-06 Thread Frank Ch. Eigler
Hi, Masami -

masami.hiramatsu.pt wrote:

> [...]
> >> [...]  Then, I'd like to propose this new whitelist feature in
> >> kprobe-tracer (not raw kprobe itself). And a sysctl knob for
> >> disabling the whitelist.  That knob will be
> >> /proc/sys/debug/kprobe-event-whitelist and disabling it will mark
> >> kernel tainted so that we can check it from bug reports.
> > 
> > How would one assemble a reliable whitelist, if we haven't fully
> > characterized the problems that make the blacklist necessary?
> 
> As I said, we can use function graph tracer's list as the whitelist,
> since it doesn't include any functions invoked from ftrace's event
> handler. (Note that I don't mention the Systemtap or other user here)
>
> Whitelist is just for keeping the people away from the quantitative
> issue, who just want to trace their subsystems except for ftrace.
> [...]

Would you plan to limit kprobes (or just the perf-probe frontend) to
only function-entries also?  If not, and if intra-function
statement-granularity kprobes remain allowed within a
function-granularity whitelist, then you might still have those
"quantitative" problems.

Even worse, kprobes robustness problems can bite even with a small
whitelist, unless you can test the cartesian product of the countless
subset selections and the aggravating factors (like other tracing
facilities being in use at the same time, limited memory, high irq
rates, debugging sessions, architectures, whatever).


> [...]  For the long term solution, I think we can introduce some
> kind of performance gatekeeper as systemtap does. Counting the
> miss-hit rate per second and if it go over a threshold, disable next
> miss-hit (or most miss-hit) probe (as OOM killer does).

That would make sense, but again it would not help deal with kprobes
robustness (in the kernel-crashing rather than kernel-slowdown sense).


- FChE


Re: [RFC PATCH tip 0/5] tracing filters with BPF

2013-12-05 Thread Frank Ch. Eigler
Andi Kleen writes:

> [...]  While it sounds interesting, I would strongly advise to make
> this capability only available to root. Traditionally lots of
> complex byte code languages which were designed to be "safe" and
> verifiable weren't really. e.g. i managed to crash things with
> "safe" systemtap multiple times. [...]

Note that systemtap has never been a byte code language, that avenue
being considered lkml-futile at the time, but instead pure C.  Its
safety comes from a mix of compiled-in checks (which you can inspect
via "stap -p3") and script-to-C translation checks (which are
self-explanatory).  Its risks come from bugs in the checks (quite
rare), problems in the runtime library (rare), and problems in
underlying kernel facilities (rare or frequent - consider kprobes).


> So the likelyhood of this having some hole somewhere (either in
> the byte code or in some library function) is high.

Very true!


- FChE


Re: [RFC PATCH tip 0/5] tracing filters with BPF

2013-12-05 Thread Frank Ch. Eigler

ast wrote:

>>[...]
> Did simple ktap test with 1M alloc_skb/kfree_skb toy test from earlier email:
> trace skb:kfree_skb {
> if (arg2 == 0x100) {
> printf("%x %x\n", arg1, arg2)
> }
> }
> [...]

For reference, you might try putting systemtap into the performance
comparison matrix too:

# stap -e 'probe kernel.trace("kfree_skb") {
    if ($location == 0x100 /* || $location == 0x200 etc. */ ) {
      printf("%x %x\n", $skb, $location)
    }
  }'


- FChE


Re: [PATCH -tip v4 0/6] kprobes: introduce NOKPROBE_SYMBOL() and fixes crash bugs

2013-12-05 Thread Frank Ch. Eigler

Hi, Masami -

masami.hiramatsu.pt wrote:

> [...]
> For the safeness of kprobes, I have an idea; introduce a whitelist
> for dynamic events. AFAICS, the biggest unstable issue of kprobes
> comes from putting *many* probes on the functions called from tracers.

Why do you think so?  We have had problems with single kprobes in the
"wrong" spot.  The main reason I showed spraying them widely is to get
wide coverage with minimal information/effort, not to suggest that the
number of concurrent probes per se is a problem.  (We have had
systemtap scripts probing some areas of the kernel with thousands of
active kprobes, e.g. for statement-by-statement variable-watching
jobs, and these have worked fine.)


> It doesn't crash the kernel but slows down so much, because every
> probes hit many other nested miss-hit probes. 

(kprobes does have code to detect & handle reentrancy.)

> This gives us a big performance impact. [...]

Sure, but I'd expect to see pure slowdowns show their impact with
time-related problems like watchdogs firing or timeouts.


> [...]  Then, I'd like to propose this new whitelist feature in
> kprobe-tracer (not raw kprobe itself). And a sysctl knob for
> disabling the whitelist.  That knob will be
> /proc/sys/debug/kprobe-event-whitelist and disabling it will mark
> kernel tainted so that we can check it from bug reports.

How would one assemble a reliable whitelist, if we haven't fully
characterized the problems that make the blacklist necessary?


- FChE


Re: [RFC PATCH tip 3/5] Extended BPF (64-bit BPF) design document

2013-12-03 Thread Frank Ch. Eigler

Alexei Starovoitov writes:

> [...]
>> Having EBPF code manipulating pointers - or kernel memory - directly
>> seems like a nonstarter.  However, per your subsequent paragraph it
>> sounds like pointers are a special type at which point it shouldn't
>> matter at the EBPF level how many bytes it takes to represent it?
>
> bpf_check() will track every register through every insn.
> If pointer is stored in the register, it will know what type
> of pointer it is and will allow '*reg' operation only if pointer is valid.
> [...]
> BPF program actually can manipulate kernel memory directly
> when checker guarantees that it is safe to do so :)

It sounds like this sort of static analysis would have difficulty with
situations such as:

- multiple levels of indirection

- conditionals (where it can't trace a unique data/type flow for all pointers)

- aliasing (same reason)

- the possibility of bad (or userspace?) pointers arriving as
  parameters from the underlying trace events
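
To make the conditionals point concrete, here is a toy C sketch (an
illustration under my own assumptions, not the actual bpf_check() logic)
of a checker that tracks one abstract type per register.  Where two
branches of a conditional leave different types in the same register,
the merged state is unknown, and a dereference has to be rejected.  The
enum and function names are invented for the example:

```c
#include <assert.h>

/* Hypothetical abstract types a checker might assign to a register. */
enum reg_type {
    REG_UNINIT,     /* never written */
    REG_SCALAR,     /* plain integer, not dereferenceable */
    REG_PTR_CTX,    /* pointer to the (readable) context */
    REG_PTR_STACK,  /* pointer into the program's stack */
    REG_UNKNOWN     /* branches disagreed; nothing can be assumed */
};

/* Merge the states reaching a join point from two branches. */
enum reg_type join_types(enum reg_type a, enum reg_type b)
{
    return (a == b) ? a : REG_UNKNOWN;
}

/* A '*reg' load is allowed only if the register holds a known pointer. */
int deref_allowed(enum reg_type t)
{
    assert(t <= REG_UNKNOWN);
    return t == REG_PTR_CTX || t == REG_PTR_STACK;
}
```

A checker this conservative would reject some programs that are in fact
safe, while one that tracks more detail per path gets correspondingly
more complex.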
  

> For example in tracing filters bpf_context access is restricted to:
> static const struct bpf_context_access ctx_access[MAX_CTX_OFF] = {
> [offsetof(struct bpf_context, regs.di)] = {
> FIELD_SIZEOF(struct bpf_context, regs.di),
> BPF_READ
> },

Are such constraints to be hard-coded in the kernel?


> Over course of development bpf_check() found several compiler bugs.
> I also tried all of sorts of ways to break bpf jail from inside of a
> bpf program, but so far checker catches everything I was able to throw
> at it.

(One can be sure that attackers will chew hard on this interface,
should it become reasonably accessible to userspace, so good job
starting to check carefully!)


- FChE


Re: [PATCH -tip v3 00/23] kprobes: introduce NOKPROBE_SYMBOL() and general cleaning of kprobe blacklist

2013-11-20 Thread Frank Ch. Eigler
Hi -

> > Does this new blacklist cover enough that the kernel now survives a
> > broadly wildcarded perf-probe, e.g. over all of its kallsyms?
> 
> That's generally the purpose of the annotations - if it doesn't then 
> that's a bug.

AFAIK, no kernel since kprobes was introduced has ever stood up to
that test.  perf probe lacks the wildcarding powers of systemtap, so
one needs to resort to something like:

# cat /proc/kallsyms | grep ' [tT] ' | while read addr type symbol; do
   perf probe $symbol
done

then wait for a few hours for that to finish. Then, or while the loop
is still running, run

# perf record -e 'probe:*' -aR sleep 1

to take a kernel down.


- FChE


Re: [PATCH -tip v3 00/23] kprobes: introduce NOKPROBE_SYMBOL() and general cleaning of kprobe blacklist

2013-11-20 Thread Frank Ch. Eigler

masami.hiramatsu.pt wrote:

> [...]  This series also includes a change which prohibits probing on
> the address in .entry.text because the code is used for very
> low-level sensitive interrupt/syscall entries. Probing such code may
> cause unexpected result (actually most of that area is already in
> the kprobe blacklist).  So I've decide to prohibit probing all of
> them. [...]

Does this new blacklist cover enough that the kernel now survives a
broadly wildcarded perf-probe, e.g. over all of its kallsyms?


- FChE


Re: [PATCH v4 2/3] Support for perf to probe into SDT markers:

2013-10-26 Thread Frank Ch. Eigler
Pekka Enberg writes:

> Is there a technical reason why 'perf list' could not show all the
> available SDT markers on a system and that the 'mark to event'
> mapping cannot happen automatically? [...]

A quick experiment with:

  find `echo $PATH | tr : ' '` -type f -perm -555 | 
   xargs readelf -n 2>/dev/null | 
   grep STAP 2>/dev/null 

suggests reasonable performance for my F19 workstation (a second or
two over ~6000 executables), once all the ELF content is in the block
cache.  According to a stap eventcount.stp run, that required about
5 syscall.read events.

Note that a $PATH search excludes shared libraries, which can also
carry <sys/sdt.h> markers.  Adding /usr/lib* in more than doubles the
work, then there's /usr/libexec etc.

- FChE


Re: [PATCH v2 0/3] Perf support to SDT markers

2013-10-07 Thread Frank Ch. Eigler
Hemant Kumar writes:

> [...]
> A simple example to show this follows.
> - Create a file with .d extension and mention the probe names in it with
> provider name and marker name.
> [...]
> - Now create the probes.h and probes.o file :
> $ dtrace -C -h -s probes.d -o probes.h
> $ dtrace -C -G -s probes.d -o probes.o
> [...]

It may be worthwhile to document an even-simpler case:

- no .d file
- no invocation of the dtrace python script
- no generated .h or .o file
- in the C file, just add:

  #include <sys/sdt.h>

  int main () {
   /* ... */
   STAP_PROBE(provider_name,probe_name);
   /* ... */
   return 0;
  }

- gcc file.c
- stap -l 'process("./a.out").mark("*")' to list


- FChE


Re: [ANNOUNCE] ktap 0.1 released

2013-05-21 Thread Frank Ch. Eigler
"zhangwei(Jovi)"  writes:

> I'm pleased to announce that ktap release v0.1, this is the first official
> release of ktap project [...]

Congrats.


> = what's ktap?
>
>Because this is the first release, so there wouldn't include too
>much features, just contain several basic features about tracing,
>[...]

Nice progress.  Reviewing the safety/security items from
https://lkml.org/lkml/2013/1/17/623, I see improvement in most.

For example, you seem to be using GFP_ATOMIC for run-time memory
allocation, which is safer than before (though it could still exhaust
resources).  OTOH your code doesn't handle *failure* of such
allocation attempts (see call sites to kp_*alloc).
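
To illustrate the failure-handling point in user-space terms, here is a
sketch; it is not ktap code.  The names dup_string, alloc_fn and
failing_alloc are hypothetical.  In the kernel the allocator would be
something like kmalloc(..., GFP_ATOMIC), which can return NULL just as
malloc can.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical allocator type, so a failing one can be injected. */
typedef void *(*alloc_fn)(size_t);

/* Copy a string using the given allocator.  Returns NULL when the
   allocation fails instead of writing through a NULL pointer. */
char *dup_string(const char *s, alloc_fn alloc)
{
    size_t n = strlen(s) + 1;
    char *copy = alloc(n);

    if (copy == NULL)
        return NULL;            /* propagate the failure to the caller */
    memcpy(copy, s, n);
    return copy;
}

/* Allocator that always fails, to exercise the error path. */
void *failing_alloc(size_t n)
{
    (void) n;
    return NULL;
}
```

The injectable allocator is only there to make the error path easy to
test; the point is that every call site checks the result.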

There still don't seem to be any safety constraints on the incoming
byte code (like jump ranges, or loop counts).
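
As a rough sketch of what such a load-time check could look like: the
instruction format, opcode names and verify_program() below are all
hypothetical and have nothing to do with ktap's real bytecode.  The
check rejects unknown opcodes and out-of-range jump targets.  It also
rejects backward jumps outright, which is a crude way to bound loops.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical three-opcode instruction set. */
enum op { OP_NOP = 0, OP_JMP = 1, OP_HALT = 2 };

struct insn {
    uint8_t op;
    int32_t arg;    /* OP_JMP: offset relative to the next instruction */
};

/* Returns 0 if the program passes the checks, -1 if it is rejected. */
int verify_program(const struct insn *code, size_t n)
{
    size_t i;

    /* Execution must not be able to run off the end of the program. */
    if (n == 0 || code[n - 1].op != OP_HALT)
        return -1;

    for (i = 0; i < n; i++) {
        switch (code[i].op) {
        case OP_NOP:
        case OP_HALT:
            break;
        case OP_JMP: {
            /* Compute the target in 64 bits so it cannot overflow. */
            int64_t target = (int64_t) i + 1 + code[i].arg;

            if (code[i].arg < 0)
                return -1;      /* backward jump: possible loop */
            if (target >= (int64_t) n)
                return -1;      /* jump outside the program */
            break;
        }
        default:
            return -1;          /* unknown opcode */
        }
    }
    return 0;
}
```

With only forward jumps and a mandatory final OP_HALT, any accepted
program finishes within n steps.  A real interpreter that wants loops
would need a run-time step counter instead.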

It's nice to see some arithmetic OP_* checks, and the user_string
function is probably safe enough now.  You'll need something analogous
for kernel space (and possibly as verification for the various %s
kp_printfs).  The hash tables might be susceptible to the deliberate
hash collision attacks from last year.

It's nice to see the *_STACK_SIZE constraints in the bytecode
interpreter; is there any C-level recursion remaining to consume
excessive kernel stack?
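
For what it's worth, one common way to keep C-level recursion from
eating the stack is an explicit depth argument with a hard limit.  A
sketch under assumed names (struct node, sum_tree, MAX_DEPTH are all
hypothetical, not taken from ktap):

```c
#include <stddef.h>

#define MAX_DEPTH 4     /* hypothetical limit, chosen for the example */

struct node {
    long value;
    const struct node *left;
    const struct node *right;
};

/* Add every value in the tree to *total.  Returns 0 on success, or -1
   as soon as a node deeper than MAX_DEPTH levels is reached, so stack
   use is bounded no matter what input is supplied. */
int sum_tree(const struct node *n, unsigned depth, long *total)
{
    if (n == NULL)
        return 0;
    if (depth >= MAX_DEPTH)
        return -1;
    *total += n->value;
    if (sum_tree(n->left, depth + 1, total) != 0)
        return -1;
    return sum_tree(n->right, depth + 1, total);
}
```

The caller starts with depth 0 and treats -1 as "input too deeply
nested", rather than trusting the input to be shallow.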

Exposing os.sleep/os.wait (or general kernel functions) to become
callable from the scripts is fraught with danger.  You just can't call
the underlying functions from random kernel context (imagine from the
most pessimal possible kprobe or tracepoint, somewhere within an
atomic section), or you'll get crashes.

You should write several time/space/invasivity stress-tests to help
see how future progress improves the code's performance/safety on
these and other problem areas.


> = Planned Changes
>
>we are planning to enable more kernel ineroperability into ktap [...]

As per the above, you'll want to be extremely careful about the idea
to export FFI to let the lua scripts call into arbitrary kernel
functions.  Perhaps wrap it into a 'guru' mode flag?


- FChE


systemtap 2.2.1 release

2013-05-16 Thread Frank Ch. Eigler
on
  + tapset functions are still incomplete relative to what is supported
when the kernel backend is active
  + exception handling becomes completely broken in programs
instrumented by the current version of dyninst (PR14702)
  + command line interrupts are slightly mishandled (PR15049)
  + not all registers are made available on 32-bit x86 (PR15136)

  See dyninst/README and the systemtap/dyninst Bugzilla component
  (http://tinyurl.com/stapdyn-PR-list) if you want all the gory
  details about the state of the feature.


= Contributors for this release

Dave Brolley, David Smith, Frank Ch. Eigler, Josh Stone, Lukas Berk,
Mark Wielaard, Masanari Iida*, Negreanu Marius Adrian, Serguei Makarov,
Timo Juhani Lindfors, Torsten Polle

Special thanks to Serguei Makarov for drafting these notes.
Special thanks to new contributors, marked with '*' above.


= Bugs fixed for this release <http://sourceware.org/PR#>

11341 update_visitor::require/provide uses hazardous static_casts
12894 Provide a systemd target replacing the current stap-server initscript
14275 Possible hotspot.function(" ") style probes
14297 stap -l and pn() fail to expand complex wildcards
14491 Add a proper stapdyn transport layer
15053 stapdyn needs -G (setting global variables) support
15112 Can't connect to stap-server via IPv6 raw hex addresses
15114 [PATCH] Propagate uid and gid from nfsd module as well
15123 workaround for bad debuginfo for -mfentry
15147 _stp_error() doesn't behave as described
15155 syscall tapset doesn't know sendmmsg
15162 eh_frame table too big, may kernel panic
15168 tolerate ppc deprecated ptrace commands
15170 nfsd.proc4.write probe alias needs updating
15171 inet_get_local_port() tapset function is broken on rawhide kernels
15172 tolerate unavailable System.map, as on ubuntu
15173 'origin' renamed to 'whence'
15177 need to handle new 'whence' values of 'SEEK_DATA' and 'SEEK_HOLE'
15197 syscall.fork/nd_syscall.fork broken on rawhide kernels
15198 syscall.sigaltstack / nd_syscall.sigaltstack broken on rawhide
15211 syscall.exp failures on rawhide
15237 adapt to changes in hlist_* kernel api in 3.9
15279 Stop munging the uprobes IP with kernel 3.9
15290 Update the inode-uretprobes support for aarapov's latest iteration
15306 stapdyn IRPC on terminated process, child SEGV
15315 Implement basic process filtering for inode-uprobes
15363 don't abort for a measly inode-uprobes registration failure
15408 procfs probes broken on rawhide
15422 loc2c with 32-on-64 sometimes creates integer-widening-into-pointer gcc warnings
15445 kernel.data (hwbkpt) probes can cause kernel panic on i686
15446 procfs probes broken on rawhide (kernel 3.10)
15452 segmentation fault in libdw while running debugtyptes.stp on rawhide
15456 syscalls and nd_syscalls tapset compat probe points broken on kernel 3.10
15466 add fallback for timer.profile on kernels without register_timer_hook()






Re: systemtap broken by removal of register_timer_hook

2013-04-30 Thread Frank Ch. Eigler
Hi -

> [...]  How about creating trace_tick() in
> include/trace/events/timer.h and call it from tick_periodic() and
> tick_sched_handle(). [...]

Like this?


From facee64445c0dcc717e99c474c5c7dcdd31b9a74 Mon Sep 17 00:00:00 2001
From: Frank Ch. Eigler <f...@redhat.com>
Date: Wed, 3 Apr 2013 10:35:21 -0400
Subject: [PATCH] profiling: add tick tracepoint

Commit ba6fdda4 removed the timer_hook mechanism for modules to listen
to profiling timer ticks (without having to set up more complicated
perf mechanisms).  To reduce the impact on out-of-tree users such as
systemtap, a TRACE_EVENT-flavoured tracepoint is added in its place,
invoked right beside profile_tick() in kernel/time/tick-*.c.
Tested with perf and systemtap.

Cc: Frederic Weisbecker <fweis...@gmail.com>
Cc: Ingo Molnar <mi...@kernel.org>
Cc: Mel Gorman <mgor...@suse.de>
Signed-off-by: Frank Ch. Eigler <f...@redhat.com>
---
 include/trace/events/timer.h | 20 ++++++++++++++++++++
 kernel/time/tick-common.c    |  2 ++
 kernel/time/tick-sched.c     |  2 ++
 3 files changed, 24 insertions(+)

diff --git a/include/trace/events/timer.h b/include/trace/events/timer.h
index 425bcfe..ec4c2d0 100644
--- a/include/trace/events/timer.h
+++ b/include/trace/events/timer.h
@@ -323,6 +323,26 @@ TRACE_EVENT(itimer_expire,
  (int) __entry->pid, (unsigned long long)__entry->now)
 );
 
+
+struct pt_regs;
+
+/**
+ * tick - called when the profiling timer ticks
+ * @regs:  pointer to struct pt_regs*
+ */
+TRACE_EVENT(tick,
+   TP_PROTO(struct pt_regs *regs),
+   TP_ARGS(regs),
+   TP_STRUCT__entry(
+   __field( struct pt_regs*,   regs)
+   ),
+   TP_fast_assign(
+   __entry->regs   = regs;
+   ),
+   TP_printk("ip=%p", (void *) instruction_pointer(__entry->regs))
+);
+
+
 #endif /*  _TRACE_TIMER_H */
 
 /* This part must be outside protection */
diff --git a/kernel/time/tick-common.c b/kernel/time/tick-common.c
index b1600a6..5f4227f 100644
--- a/kernel/time/tick-common.c
+++ b/kernel/time/tick-common.c
@@ -18,6 +18,7 @@
 #include <linux/percpu.h>
 #include <linux/profile.h>
 #include <linux/sched.h>
+#include <trace/events/timer.h>
 
 #include <asm/irq_regs.h>
 
@@ -74,6 +75,7 @@ static void tick_periodic(int cpu)
 
update_process_times(user_mode(get_irq_regs()));
profile_tick(CPU_PROFILING);
+   trace_tick(get_irq_regs());
 }
 
 /*
diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index a19a399..447be56 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -21,6 +21,7 @@
 #include <linux/sched.h>
 #include <linux/module.h>
 #include <linux/irq_work.h>
+#include <trace/events/timer.h>
 
 #include <asm/irq_regs.h>
 
@@ -140,6 +141,7 @@ static void tick_sched_handle(struct tick_sched *ts, struct pt_regs *regs)
 #endif
update_process_times(user_mode(regs));
profile_tick(CPU_PROFILING);
+   trace_tick(get_irq_regs());
 }
 
 /*
-- 
1.8.2



Re: systemtap broken by removal of register_timer_hook

2013-04-19 Thread Frank Ch. Eigler
Hi, Frederic -


> > How about this?
> >
> > Author: Frank Ch. Eigler <f...@redhat.com>
> > Date:   Wed Apr 3 10:35:21 2013 -0400
> >
> > profiling: add profile_tick tracepoint
> > [...]

> It would be better not to tie this to CONFIG_PROFILING.
> A tracepoint in update_process_times() instead would be great but it's
> sometimes called several times in a tick from some archs.
> Probably we need something like:
> 
> static inline tick_trace(struct pt_regs *regs)
> {
> trace_timer_tick(regs);
> profile_tick(CPU_PROFILING);
> }

I looked into this, but found no natural place to define such an
inline function from which to call into a tracepoint, without having
to #include the <event/FOO.h> file many times.  Nor does it seem
appropriate to do the identical #define CREATE_TRACE_POINTS part from
all the different arch/.../*.c files that may call into that inline.
If you'd like to stick to this idea, please advise further where you
think the tracepoint definition & declarations should go.

In the alternative, here is v2 of the patch, just changing the
tracepoint-printing argument as suggested by jistone.

- FChE

---

Author: Frank Ch. Eigler <f...@redhat.com>
Date:   Wed Apr 3 10:35:21 2013 -0400

profiling: add profile_tick tracepoint

Commit ba6fdda4 removed the timer_hook mechanism for modules to listen
to profiling timer ticks (without having to set up more complicated
perf mechanisms).  To reduce the impact on out-of-tree users such as
systemtap, a TRACE_EVENT-flavoured tracepoint is added in its place.
Tested with perf and systemtap.

Cc: Frederic Weisbecker <fweis...@gmail.com>
Cc: Ingo Molnar <mi...@kernel.org>
Cc: Mel Gorman <mgor...@suse.de>
Signed-off-by: Frank Ch. Eigler <f...@redhat.com>

diff --git a/include/trace/events/profile.h b/include/trace/events/profile.h
new file mode 100644
index 000..445aee7
--- /dev/null
+++ b/include/trace/events/profile.h
@@ -0,0 +1,37 @@
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM profile
+
+#if !defined(_TRACE_PROFILE_H) || defined(TRACE_HEADER_MULTI_READ)
+#define _TRACE_PROFILE_H
+
+#include <linux/tracepoint.h>
+
+
+struct pt_regs;
+
+/**
+ * profile_tick - called when the profiling timer ticks
+ * @type:  profiling tick type, generally @CPU_PROFILING
+ * @regs:  pointer to struct pt_regs*
+ */
+
+TRACE_EVENT(profile_tick,
+   TP_PROTO(int type, struct pt_regs *regs),
+   TP_ARGS(type, regs),
+   TP_STRUCT__entry(
+   __field( int,   type)
+   __field( struct pt_regs*,   regs)
+   ),
+   TP_fast_assign(
+   __entry->type   = type;
+   __entry->regs   = regs;
+   ),
+   TP_printk("type=%d ip=%p", __entry->type,
+ instruction_pointer(__entry->regs))
+);
+
+
+#endif /*  _TRACE_PROFILE_H */
+
+/* This part must be outside protection */
+#include <trace/define_trace.h>
diff --git a/kernel/profile.c b/kernel/profile.c
index dc3384e..d61f921 100644
--- a/kernel/profile.c
+++ b/kernel/profile.c
@@ -29,6 +29,9 @@
 #include <asm/irq_regs.h>
 #include <asm/ptrace.h>
 
+#define CREATE_TRACE_POINTS
+#include <trace/events/profile.h>
+
 struct profile_hit {
u32 pc, hits;
 };
@@ -414,6 +417,8 @@ void profile_tick(int type)
 {
struct pt_regs *regs = get_irq_regs();
 
+   trace_profile_tick(type, regs);
+
if (!user_mode(regs) && prof_cpu_mask != NULL &&
cpumask_test_cpu(smp_processor_id(), prof_cpu_mask))
profile_hit(type, (void *)profile_pc(regs));

