G'Day,

I've hacked hotspot to return the frame pointer, in part to see what this involves, and also to have a working prototype for analysis. Along with an agent to resolve symbols, this has allowed full stack profiling using Linux perf_events. The following flame graphs show the resulting profiles.

A mixed mode CPU flame graph of a vert.x benchmark (click to zoom):

http://www.brendangregg.com/FlameGraphs/cpu-mixedmode-vertx.svg

Same thing, but this time disabling inlining, to show more frames:

http://www.brendangregg.com/FlameGraphs/cpu-mixedmode-flamegraph.svg

As expected, performance is worse without inlining. You can compare the flame graphs side by side to see why: less time is spent doing work / I/O!

https://github.com/brendangregg/Misc/blob/master/java/openjdk8_b132-fp.diff is my patch, and currently only works for x86-64. It removes RBP from the register pools, and inserts "mov(rbp, rsp)" into two function prologues. It is also unsupported: use at your own risk. I'm not a veteran hotspot engineer, so the chances I messed something up are high.

I'd love to be able to enable frame pointers in Oracle JDK, e.g., with an -XX:+NoOmitFramePointer option. It could be put under -XX:+UnlockDiagnosticVMOptions or -XX:+UnlockExperimentalVMOptions, so long as we had some way to turn it on. If someone wants to include (improve, rewrite) my patch, please do.

I don't have much perf data yet, but on the vert.x microbenchmark it looked like returning the frame pointer cost 2.6% performance. I hope that's somewhat worst-case for production workloads. (I was also able to recover the 2.6% by fine tuning other options, so were this a production change, I'd be hoping not to regress performance at all.)

We've discussed this before (http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2014-October/thread.html#15939). The Solaris-assisted approach that Serguei Spitsyn described (JDK-6617153) should work very well. The JVM can run as-is, full stacks can be generated on-demand, and symbols should always be correct.

The frame pointer approach costs a little performance, and only shows partial stacks after inlining (unless you disable inlining, but that can cost >40% performance).
There is the other issue Volker Simonis mentioned as well, where some stacks may not be profiled correctly. And, if you are unlucky, symbols can move during the profile, so any static perf-map-agent map will translate some incorrectly (I've considered developing a way to detect this, and highlight such frames as dubious).

At Netflix we are mostly Java on Linux. Switching to Oracle Solaris for this feature is going to be a tough sell, especially when the value of full stack profiling isn't widely understood. I personally think it might be a bit easier if a -XX:+NoOmitFramePointer option existed, so Linux users can try the feature, then consider the better Solaris version after gaining solid experience on why it is so important.

We recently blogged about the value of stack profiling and flame graphs, http://techblog.netflix.com/2014/11/nodejs-in-flames.html, although this was for Node.js, which already has frame pointer support.

If anyone wants to try generating these mixed mode CPU flame graphs themselves (in a test environment!), the first step is to compile OpenJDK 8 b132 with the previous patch, and get that running. Also install the packages for the "perf" command. The remaining steps would be something like:

# git clone --depth=1 https://github.com/brendangregg/FlameGraph
# git clone --depth=1 https://github.com/jrudolph/perf-map-agent
# cd perf-map-agent
# export JAVA_HOME=/...
# cmake .
# make
# perf record -F 99 -p `pgrep -n java` -g -- sleep 30
# java -cp attach-main.jar:$JAVA_HOME/lib/tools.jar net.virtualvoid.perf.AttachOnce `pgrep -n java`
# perf script > ../FlameGraph/out.stacks
# cd ../FlameGraph
# ./stackcollapse-perf.pl < out.stacks | ./flamegraph.pl --color=java > out.svg

Finally, if you are new to CPU flame graphs, see http://www.brendangregg.com/FlameGraphs/cpuflamegraphs.html .

Brendan
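
P.S. To make the symbol caveat above more concrete, here is a rough sketch of the kind of lookup a perf map implies. It assumes the usual perf map text layout (one "START SIZE SYMBOL" entry per line, START and SIZE in hex, typically in /tmp/perf-PID.map). The function name resolve_addr is made up for illustration, and this is not how perf itself is implemented. Since the map is a snapshot taken at one moment, an address sampled before the JVM recompiled or moved a method will simply match whatever the snapshot says lives there.

```shell
#!/usr/bin/env bash
# Illustrative only: resolve a hex address against a perf-style map file.
# Assumed line format: START SIZE SYMBOL (START and SIZE in hex, no 0x prefix).
# resolve_addr is a hypothetical helper, not part of perf or perf-map-agent.
resolve_addr() {
    local map_file=$1
    local addr=$((16#${2#0x}))      # accept the address with or without 0x
    local start size sym s n
    while read -r start size sym; do
        [[ -z $start || -z $size ]] && continue   # skip blank or short lines
        s=$((16#$start))
        n=$((16#$size))
        # A symbol covers the half-open range [start, start + size).
        if (( addr >= s && addr < s + n )); then
            printf '%s\n' "$sym"
            return 0
        fi
    done < "$map_file"
    printf '[unknown]\n'
    return 1
}
```

Usage would be something like "resolve_addr /tmp/perf-1234.map 7f0000001010", where both the PID and the address are made-up values.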