A JVMTI agent, perf-map-agent, provides a map file for symbol
translation at /tmp/perf-PID.map. Linux perf already looks for such
a file when doing symbol translation.
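For readers unfamiliar with the map file: each line holds a hex start address, a hex size, and a symbol name, separated by whitespace. Here is a minimal sketch of how a tool could resolve a sampled address against it. The sample addresses and symbol names are made up for illustration, and the function names parse_perf_map and resolve are hypothetical, not part of perf or perf-map-agent.

```python
import bisect

def parse_perf_map(text):
    """Parse perf map text into a sorted list of (start, size, name)."""
    entries = []
    for line in text.splitlines():
        # Split at most twice: the symbol name itself may contain spaces.
        parts = line.split(None, 2)
        if len(parts) != 3:
            continue
        try:
            start = int(parts[0], 16)
            size = int(parts[1], 16)
        except ValueError:
            continue  # skip malformed lines
        entries.append((start, size, parts[2]))
    entries.sort()
    return entries

def resolve(entries, addr):
    """Return the symbol whose [start, start + size) range covers addr."""
    starts = [e[0] for e in entries]
    i = bisect.bisect_right(starts, addr) - 1
    if i < 0:
        return None
    start, size, name = entries[i]
    return name if addr < start + size else None

# Made-up sample content in the perf map layout.
SAMPLE = """\
7f3a1c000000 40 Interpreter
7f3a1c000100 1a0 Ljava/lang/String;::hashCode
"""

entries = parse_perf_map(SAMPLE)
print(resolve(entries, 0x7f3a1c000120))
```

An address that falls in a gap between entries resolves to None, which is how an unsymbolized JIT frame would show up.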
As I wrote before, this is pretty hard to get right for a JVM, but
there are good approximations. Have you looked at the 'jstack' tool
which is part of the JDK? If you run it on a Java process, it will
give you exact stack traces with full inlining information. However,
this only works at safepoints, so it is probably not suitable for
profiling with performance counters.
Right, jstack works, and I get full correct stacks. I do really want
to take stacks at any moment: not just CPU samples, but when tracing
kernel TCP events, or PMC cache miss profiling, etc. perf can already
do many advanced tracing and profiling activities. I just needed the
Java stacks for context.
But you can also use 'jstack -F -m', which gives you a 'best effort'
mixed Java/C++ stack trace (most of the time even with inlined Java
frames). This is probably the best
you can get when interrupting a running JVM at an arbitrary point in
time. As you mentioned in one of your blogs, the VM can be in the
C library or even in the kernel at that time, neither of which
preserves the frame pointer. So it will already be hard to even walk
up to the first Java frame.
Well, the JVMs I'm looking at are already built with
-fno-omit-frame-pointer (which is good). I edited hotspot to preserve
it as well.
Here's the profile before I changed hotspot:
http://www.brendangregg.com/FlameGraphs/cpu-mixedmode-nofp.svg
Yes, most stacks are clearly broken.
After changing hotspot:
http://www.brendangregg.com/FlameGraphs/cpu-mixedmode-vertx.svg
It's looking pretty good. If you look carefully on the far left and
right, 0.8% of the stacks are in read() and write() directly from java,
which may well be broken (unless a java thread is calling these
directly; there could also be some gcc inlining going on). Even if
they are broken, I can see 98% of my profile. Plus, I'd be interested
to know what exactly is reusing the frame pointer, so we could fix
that too.
The Java stacks themselves are also about a third as deep as they
should be, due to inlining.
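To illustrate why a reused frame pointer breaks these stacks, here is a toy sketch of frame-pointer walking over a simulated stack. It assumes the usual x86-64 layout (saved caller frame pointer at [rbp], return address at [rbp+8]). It is not perf's actual code; the memory dict, the addresses, and the function name are all hypothetical.

```python
def walk_frame_pointers(memory, rip, rbp, max_depth=64):
    """Follow the saved-rbp chain and collect return addresses.

    memory maps an address to the 8-byte word stored there.
    """
    frames = [rip]
    for _ in range(max_depth):
        saved_rbp = memory.get(rbp)      # caller's frame pointer
        ret_addr = memory.get(rbp + 8)   # return address into the caller
        if saved_rbp is None or ret_addr is None:
            break  # rbp does not point at a readable frame
        frames.append(ret_addr)
        if saved_rbp <= rbp:
            break  # stack grows down, so callers must sit at higher addresses
        rbp = saved_rbp
    return frames

# Three nested frames; the outermost one has a saved rbp of 0.
memory = {
    0x7000: 0x7020, 0x7008: 0x401200,
    0x7020: 0x7040, 0x7028: 0x401100,
    0x7040: 0x0,    0x7048: 0x401000,
}

good = walk_frame_pointers(memory, rip=0x401300, rbp=0x7000)
# Same sample, but rbp was reused as a general-purpose register.
broken = walk_frame_pointers(memory, rip=0x401300, rbp=0x2a)
print([hex(a) for a in good], [hex(a) for a in broken])
```

With a valid rbp the walker recovers the whole chain. When the register holds an arbitrary value, the walk stops at the sampled instruction, which is the kind of truncated stack seen in the first flame graph.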
But nevertheless, if the output of 'jstack -F -m' is "good enough" for
your purpose, you can implement something similar in 'perf' or a
helper library of 'perf' and be happy (I don't actually know how perf
takes stack traces but I suppose there may be some kind of callback
mechanism for walking unknown frames). This is actually not so hard.
I've recently implemented a "print_native_stack()" function within
hotspot itself (you can call it for example from gdb during debugging
- see http://hg.openjdk.java.net/jdk9/dev/hotspot/rev/86183a940db4).
Maybe you could call this function directly from 'perf' if perf
attaches with ptrace to the process (I assume it does or how else
could it walk the stack)?
An OS-cooperative stack walker would be great, and I think the hotspot
team is already doing this for Oracle Solaris. Thanks for the code
too, this is pretty interesting.
jstack -F -m eats 0.5s of CPU for me, so it would need work to make
this into a 99 Hertz-capable profiler. Plus I'd like to pick arbitrary
kernel functions or tracepoints and get Java context from them, too.
E.g., TCP functions, memory allocation, disk I/O, etc.
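To put numbers on that, here is a back-of-the-envelope sketch using the 0.5s per jstack call and the 99 Hertz rate mentioned above. The 1% overhead target is an assumption chosen only for illustration.

```python
def cpu_seconds_per_second(sample_hz, cost_per_sample_s):
    """CPU time spent on sampling per second of wall-clock time."""
    return sample_hz * cost_per_sample_s

# 99 samples per second at 0.5 s of CPU each.
overhead = cpu_seconds_per_second(99, 0.5)

# Per-sample budget if sampling may use 1% of one CPU (assumed target).
budget_s = 0.01 / 99

print(f"overhead: {overhead} CPU-seconds per second")
print(f"budget per sample: {budget_s * 1e6:.0f} microseconds")
```

At 0.5s per sample, a 99 Hertz profiler would need about 49.5 CPUs just to take the samples, so each stack walk has to get several thousand times cheaper.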
These were just some random thoughts with the hope that they may be helpful.
Yes, thanks!
Brendan