Just to note that the implementation of “jstack -F” is not at all suitable for profiling, since it has a very high overhead (it attaches a debugger to the process).
/Staffan

> On 5 dec 2014, at 20:22, Volker Simonis <volker.simo...@gmail.com> wrote:
>
> Hi Brendan,
>
> I'm still not understanding who is taking the actual stack traces (let
> alone the symbols) in your examples. Is this done by 'perf' itself
> based only on the frame pointer?
>
> As I wrote before, this is pretty hard to get right for a JVM, but
> there are good approximations. Have you looked at the 'jstack' tool
> which is part of the JDK? If you run it on a Java process, it will
> give you exact stack traces with full inlining information. However
> this only works at safepoints so it is probably not suitable for
> profiling with performance counters. But you can also use 'jstack -F
> -m' which gives you a 'best effort' mixed Java/C++ stack trace (most
> of the time even with inlined Java frames). This is probably the best
> you can get when interrupting a running JVM at an arbitrary point in
> time. As you mentioned in one of your blogs, the VM can be in the
> C-Library or even in the kernel at that time which don't preserve the
> frame pointer either. So it will be already hard to even walk up to
> the first Java frame.
>
> But nevertheless, if the output of 'jstack -F -m' is "good enough" for
> your purpose, you can implement something similar in 'perf' or a
> helper library of 'perf' and be happy (I don't actually know how perf
> takes stack traces but I suppose there may be some kind of callback
> mechanism for walking unknown frames). This is actually not so hard.
> I've recently implemented a "print_native_stack()" function within
> hotspot itself (you can call it for example from gdb during debugging
> - see http://hg.openjdk.java.net/jdk9/dev/hotspot/rev/86183a940db4).
> Maybe you could call this function directly from 'perf' if perf
> attaches with ptrace to the process (I assume it does or how else
> could it walk the stack)?
>
> These were just some random thoughts with the hope that they may be helpful.
>
> Regards,
> Volker
>
> PS: by the way - the flame graphs look really impressive and it would
> be really nice to have something like this for Java.
>
>
> On Thu, Dec 4, 2014 at 11:55 PM, Brendan Gregg
> <brendan.d.gr...@gmail.com> wrote:
>> G'Day,
>>
>> I've hacked hotspot to return the frame pointer, in part to see what this
>> involves, and also to have a working prototype for analysis. Along with an
>> agent to resolve symbols, this has allowed full stack profiling using Linux
>> perf_events. The following flame graphs show the resulting profiles.
>>
>> A mixed mode CPU flame graph of a vert.x benchmark (click to zoom):
>>
>> http://www.brendangregg.com/FlameGraphs/cpu-mixedmode-vertx.svg
>>
>> Same thing, but this time disabling inlining, to show more frames:
>>
>> http://www.brendangregg.com/FlameGraphs/cpu-mixedmode-flamegraph.svg
>>
>> As expected, performance is worse without inlining. You can compare the
>> flame graphs side by side to see why. Less time spent doing work / I/O!
>>
>> https://github.com/brendangregg/Misc/blob/master/java/openjdk8_b132-fp.diff
>> is my patch, and currently only works for x86-64. It removes RBP from the
>> register pools, and inserts "mov(rbp, rsp)" into two function prologues. It
>> is also unsupported: use at your own risk. I'm not a veteran hotspot
>> engineer, so chances I messed something up are high.
>>
>> I'd love to be able to enable frame pointers in Oracle JDK, e.g., with an
>> -XX:+NoOmitFramePointer option. It could be put under
>> -XX:+UnlockDiagnosticVMOptions or -XX:+UnlockExperimentalVMOptions. So long
>> as we had some way to turn it on. If someone wants to include (improve,
>> rewrite) my patch, please do.
>>
>> I don't have much perf data yet, but on the vert.x microbenchmark it looked
>> like returning the frame pointer cost 2.6% performance. I hope that's
>> somewhat worst-case for production workloads.
>> (I was also able to recover the 2.6% by fine tuning other options, so
>> were this a production change, I'd be hoping not to regress performance
>> at all.)
>>
>> We've discussed this before
>> (http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2014-October/thread.html#15939).
>> The Solaris-assisted approach that Serguei Spitsyn described (JDK-6617153)
>> should work very well. The JVM can run as-is, full stacks can be generated
>> on-demand, and symbols should always be correct.
>>
>> The frame pointer approach costs a little performance, and only shows
>> partial stacks after inlining (unless you disable inlining, but that can
>> cost >40% performance). There is the other issue Volker Simonis mentioned as
>> well, where some stacks may not be profiled correctly. And, if you are
>> unlucky, symbols can move during the profile, so any static perf-map-agent
>> map will translate some incorrectly. (I've considered developing a way to
>> detect this, and highlight such frames as dubious.)
>>
>> At Netflix we are mostly Java on Linux. Switching to Oracle Solaris for this
>> feature is going to be a tough sell, especially when the value of full stack
>> profiling isn't widely understood. I personally think it might be a bit
>> easier if a -XX:+NoOmitFramePointer option existed, so Linux users can try
>> the feature, then consider the better Solaris version after gaining solid
>> experience on why it is so important.
>>
>> We recently blogged about the value of stack profiling and flame graphs,
>> http://techblog.netflix.com/2014/11/nodejs-in-flames.html, although this was
>> for Node.js, which already has frame pointer support.
>>
>> If anyone wants to try generating these mixed mode CPU flame graphs
>> themselves (in a test environment!), the first step is to compile OpenJDK 8
>> b132 with the previous patch, and get that running. Also install the
>> packages for the "perf" command.
>> The remaining steps would be something like:
>>
>> # git clone --depth=1 https://github.com/brendangregg/FlameGraph
>> # git clone --depth=1 https://github.com/jrudolph/perf-map-agent
>> # cd perf-map-agent
>> # export JAVA_HOME=/...
>> # cmake .
>> # make
>> # perf record -F 99 -p `pgrep -n java` -g -- sleep 30
>> # java -cp attach-main.jar:$JAVA_HOME/lib/tools.jar net.virtualvoid.perf.AttachOnce `pgrep -n java`
>> # perf script > ../FlameGraph/out.stacks
>> # cd ../FlameGraph
>> # ./stackcollapse-perf.pl < out.stacks | ./flamegraph.pl --color=java > out.svg
>>
>> Finally, if you are new to CPU flame graphs, see
>> http://www.brendangregg.com/FlameGraphs/cpuflamegraphs.html .
>>
>> Brendan
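
For readers wondering how the agent-produced symbol map is used: perf reads a text file of JIT symbols (one "START SIZE name" entry per line, with START and SIZE in hex) and looks up each sampled address in it. The Python sketch below illustrates that lookup only; it is not perf's or perf-map-agent's actual code, and the addresses, symbol names, and the helper names parse_perf_map and resolve are made up for illustration. It also shows why a static map can mistranslate: an address is only resolved against whatever ranges were recorded when the map was written.

```python
import bisect

def parse_perf_map(text):
    """Parse 'START SIZE name' lines (hex START and SIZE) into sorted tuples."""
    entries = []
    for line in text.splitlines():
        parts = line.split(None, 2)  # the symbol name may itself contain spaces
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        entries.append((int(parts[0], 16), int(parts[1], 16), parts[2]))
    entries.sort()
    return entries

def resolve(entries, addr):
    """Return the symbol whose [start, start + size) range covers addr, else None."""
    starts = [start for start, _, _ in entries]
    i = bisect.bisect_right(starts, addr) - 1  # last entry starting at or before addr
    if i < 0:
        return None
    start, size, name = entries[i]
    return name if addr < start + size else None

# Hypothetical map contents; real maps are written by the agent at attach time.
SAMPLE_MAP = """\
7f0000001000 40 java.lang.String::hashCode
7f0000001040 80 Interpreter
"""

entries = parse_perf_map(SAMPLE_MAP)
for addr in (0x7f0000001010, 0x7f0000001040, 0x7f00000010c0):
    print(hex(addr), resolve(entries, addr))
```

If the JVM recompiles or relocates a method after the map is written, the same address may belong to a different method, and a lookup like this has no way to notice.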