> On 8 dec 2014, at 16:05, Maynard Johnson <mayna...@us.ibm.com> wrote:
>
> On 12/05/2014 05:09 PM, Brendan Gregg wrote:
>> G'Day Volker,
>>
>> On Fri, Dec 5, 2014 at 11:22 AM, Volker Simonis
>> <volker.simo...@gmail.com> wrote:
>>> Hi Brendan,
>>>
>>> I'm still not understanding who is taking the actual stack traces (let
>>> alone the symbols) in your examples. Is this done by 'perf' itself
>>> based only on the frame pointer?
>>
>> perf is walking the frame pointers.
> Volker, to be specific, the perf profiling tool has a user space part and a
> kernel space part. The collection of stack traces is done by the kernel.
> When a user-specified event (or series of events) occurs, the process
> being profiled is interrupted and the sampled information (which can
> optionally include a full stack trace) is made available to the user space
> perf tool to be saved to a file for future post-profiling processing.
>
> During the profiling phase, the perf tool collects information about the
> profiled process's memory mappings, which allows for this address-to-symbol
> resolution. It's in the post-profiling phase where the sampled instruction,
> along with its associated stack trace, is resolved to the appropriate symbol
> (i.e., function/method) in a specific binary file (e.g., library, executable).
>
> And if the VM creates a /tmp/perf-<PID>.map file to save information about
> JITed methods, perf's post-profiling tool will find it and use it to
> correlate sampled addresses it collected from the VM's executable anonymous
> memory mappings to the method names.
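
To make the map file concrete: as far as I know, each line of
/tmp/perf-<PID>.map holds a start address and a size (both in hex) followed
by a symbol name, with no timestamp. Below is a minimal Python sketch of an
address lookup over such a file. The sample entries, the method names, and
the last-entry-wins rule for a reused address are all assumptions made for
illustration; this is not how perf itself is implemented.

```python
import bisect


def parse_perf_map(text):
    """Parse lines of the form 'START SIZE symbol' (START and SIZE in hex)."""
    by_start = {}
    for line in text.splitlines():
        parts = line.split(None, 2)
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        start, size, name = int(parts[0], 16), int(parts[1], 16), parts[2]
        # Assumption of this sketch: a later entry for the same start
        # address replaces the earlier one. perf may behave differently.
        by_start[start] = (size, name)
    # Return entries sorted by start address so we can binary-search them.
    return sorted((start, size, name) for start, (size, name) in by_start.items())


def resolve(entries, addr):
    """Return the symbol whose [start, start + size) range contains addr."""
    starts = [entry[0] for entry in entries]
    i = bisect.bisect_right(starts, addr) - 1
    if i >= 0:
        start, size, name = entries[i]
        if start <= addr < start + size:
            return name
    return None


# Hypothetical map contents: the second line reuses the address of the first,
# as would happen after the first method is unloaded.
SAMPLE = """\
7f0000001000 40 Old.run
7f0000001000 60 New.handle
7f0000002000 20 Other.call
"""

entries = parse_perf_map(SAMPLE)
print(resolve(entries, 0x7F0000001010))
```

With the sample above, an address inside the reused range always resolves to
New.handle, no matter when the sample was taken. The entry for Old.run is
simply lost, because nothing in a map line says when it was valid.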

Is there a way in this .map file to express that different JITed methods
are located at the same address at different times? This typically happens
a lot when classes and their JITed methods are being unloaded from the VM.
That space will be reused by a different method. I’m guessing this would
confuse perf.

/Staffan

>
> -Maynard
>>
>> A JVMTI agent, perf-map-agent, is providing a map file for symbol
>> translation under /tmp/perf-PID.map. Linux perf already hunts for such
>> a file when doing symbol translation.
>>
>>>
>>> As I wrote before, this is pretty hard to get right for a JVM, but
>>> there are good approximations. Have you looked at the 'jstack' tool
>>> which is part of the JDK? If you run it on a Java process, it will
>>> give you exact stack traces with full inlining information. However
>>> this only works at safepoints so it is probably not suitable for
>>> profiling with performance counters.
>>
>> Right, jstack works, and I get full correct stacks. I do really want
>> to take stacks at any moment: not just CPU samples, but when tracing
>> kernel TCP events, or PMC cache miss profiling, etc. perf can already
>> do many advanced tracing and profiling activities. I just needed the
>> Java stacks for context.
>>
>>> But you can also use 'jstack -F
>>> -m' which gives you a 'best effort' mixed Java/C++ stacktrace (most
>>> of the time even with inlined Java frames). This is probably the best
>>> you can get when interrupting a running JVM at an arbitrary point in
>>> time. As you mentioned in one of your blogs, the VM can be in the
>>> C-Library or even in the kernel at that time which don't preserve the
>>> frame pointer either. So it will be already hard to even walk up to
>>> the first Java frame.
>>
>> Well, the JVMs I'm looking at are already built with
>> -fno-omit-frame-pointer (which is good). I edited hotspot to preserve
>> it as well.
>>
>> Here's before I changed hotspot:
>>
>> http://www.brendangregg.com/FlameGraphs/cpu-mixedmode-nofp.svg
>>
>> Yes, most stacks are clearly broken.
>>
>> After changing hotspot:
>>
>> http://www.brendangregg.com/FlameGraphs/cpu-mixedmode-vertx.svg
>>
>> It's looking pretty good. If you look carefully on the far left and
>> right, there are 0.8% stacks in read() and write() directly from java,
>> which may well be broken (unless a java thread is calling these
>> directly; there could also be some gcc inlining going on). Even if
>> they are broken, I can see 98% of my profile. Plus, I'd be interested
>> to know what exactly is reusing the frame pointer, so we could fix
>> that too.
>>
>> The Java stacks themselves are also about a third as deep as they
>> should be, due to inlining.
>>
>>>
>>> But nevertheless, if the output of 'jstack -F -m' is "good enough" for
>>> your purpose, you can implement something similar in 'perf' or a
>>> helper library of 'perf' and be happy (I don't actually know how perf
>>> takes stack traces but I suppose there may be some kind of callback
>>> mechanism for walking unknown frames). This is actually not so hard.
>>> I've recently implemented a "print_native_stack()" function within
>>> hotspot itself (you can call it for example from gdb during debugging
>>> - see http://hg.openjdk.java.net/jdk9/dev/hotspot/rev/86183a940db4).
>>> Maybe you could call this function directly from 'perf' if perf
>>> attaches with ptrace to the process (I assume it does or how else
>>> could it walk the stack)?
>>
>> An OS-cooperative stack walker would be great, and I think the hotspot
>> team is already doing this for Oracle Solaris. Thanks for the code
>> too, this is pretty interesting.
>>
>> jstack -F -m eats 0.5s of CPU for me, so it would need work to make
>> this into a 99 Hertz-capable profiler. Plus I'd like to pick arbitrary
>> kernel functions or tracepoints and get Java context from them, too.
>> Eg, TCP functions, memory allocation, disk I/O, etc.
>>
>>>
>>> These were just some random thoughts with the hope that they may be helpful.
>>
>> Yes, thanks!
>>
>> Brendan
>>
>