G'Day Volker,

On Fri, Dec 5, 2014 at 11:22 AM, Volker Simonis <volker.simo...@gmail.com> wrote:
> Hi Brendan,
>
> I'm still not understanding who is taking the actual stack traces (let
> alone the symbols) in your examples. Is this done by 'perf' itself
> based only on the frame pointer?

perf is walking the frame pointers. A JVMTI agent, perf-map-agent, provides a map file for symbol translation at /tmp/perf-PID.map. Linux perf already looks for such a file when doing symbol translation.

>
> As I wrote before, this is pretty hard to get right for a JVM, but
> there are good approximations. Have you looked at the 'jstack' tool
> which is part of the JDK? If you run it on a Java process, it will
> give you exact stack traces with full inlining information. However
> this only works at safepoints so it is probably not suitable for
> profiling with performance counters.

Right, jstack works, and I get full, correct stacks. I do really want to take stacks at any moment: not just CPU samples, but when tracing kernel TCP events, or PMC cache miss profiling, etc. perf can already do many advanced tracing and profiling activities. I just needed the Java stacks for context.

> But you can also use 'jstack -F
> -m' which gives you a 'best effort' mixed Java/C++ stack trace (most
> of the time even with inlined Java frames). This is probably the best
> you can get when interrupting a running JVM at an arbitrary point in
> time. As you mentioned in one of your blogs, the VM can be in the
> C-Library or even in the kernel at that time which don't preserve the
> frame pointer either. So it will be already hard to even walk up to
> the first Java frame.

Well, the JVMs I'm looking at are already built with -fno-omit-frame-pointer (which is good). I edited hotspot to preserve it as well. Here's before I changed hotspot:

http://www.brendangregg.com/FlameGraphs/cpu-mixedmode-nofp.svg

Yes, most stacks are clearly broken. After changing hotspot:

http://www.brendangregg.com/FlameGraphs/cpu-mixedmode-vertx.svg

It's looking pretty good. If you look carefully on the far left and right, 0.8% of stacks are in read() and write() called directly from java, which may well be broken (unless a java thread is calling these directly; there could also be some gcc inlining going on).
Even if they are broken, I can see 98% of my profile. Plus, I'd be interested to know what exactly is reusing the frame pointer, so we could fix that too. The Java stacks themselves are also about a third as deep as they should be, due to inlining.

>
> But nevertheless, if the output of 'jstack -F -m' is "good enough" for
> your purpose, you can implement something similar in 'perf' or a
> helper library of 'perf' and be happy (I don't actually know how perf
> takes stack traces but I suppose there may be some kind of callback
> mechanism for walking unknown frames). This is actually not so hard.
> I've recently implemented a "print_native_stack()" function within
> hotspot itself (you can call it for example from gdb during debugging
> - see http://hg.openjdk.java.net/jdk9/dev/hotspot/rev/86183a940db4).
> Maybe you could call this function directly from 'perf' if perf
> attaches with ptrace to the process (I assume it does or how else
> could it walk the stack)?

An OS-cooperative stack walker would be great, and I think the hotspot team is already doing this for Oracle Solaris. Thanks for the code too, this is pretty interesting.

jstack -F -m eats 0.5s of CPU for me, so it would need work to make this into a 99 Hertz-capable profiler. Plus I'd like to pick arbitrary kernel functions or tracepoints and get Java context from them, too. E.g., TCP functions, memory allocation, disk I/O, etc.

>
> These were just some random thoughts with the hope that they may be helpful.

Yes, thanks!

Brendan
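
For anyone following along who hasn't seen a /tmp/perf-PID.map file: it is plain text, one JIT-compiled symbol per line, in the form "START SIZE symbolname" with START and SIZE in hex. Below is a rough Python sketch of the address-to-symbol lookup such a file enables. It is only an illustration, not perf's actual code; the helper names (parse_perf_map, resolve) and the sample addresses and symbols are made up for the example.

```python
import bisect

def parse_perf_map(text):
    """Parse perf map text into a sorted list of (start, size, symbol)."""
    entries = []
    for line in text.splitlines():
        # Symbol names may contain spaces, so split at most twice.
        parts = line.split(None, 2)
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        start, size, symbol = parts
        entries.append((int(start, 16), int(size, 16), symbol.strip()))
    entries.sort()
    return entries

def resolve(entries, addr):
    """Return the symbol whose [start, start + size) range covers addr, else None."""
    starts = [e[0] for e in entries]
    i = bisect.bisect_right(starts, addr) - 1
    if i < 0:
        return None
    start, size, symbol = entries[i]
    return symbol if addr < start + size else None

# Hypothetical map contents: addresses and symbol names are invented.
SAMPLE_MAP = """\
7f2a10000000 40 Lcom/example/Server;::handle
7f2a10000040 80 Lcom/example/Parser;::parse
"""

entries = parse_perf_map(SAMPLE_MAP)
print(resolve(entries, 0x7f2a10000050))
```

An address that falls outside every listed range resolves to None here, which corresponds to a frame that would show up without a symbol name in a profile.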