Hi everybody,

I really don't want to stand in the way of the "Good Enough" solution, and as far as I understand, this solution doesn't require any code changes to HotSpot, right? It will just add an additional Python artifact to the OpenJDK delivery which will be used by gdb.

But in general I have to agree with Erik and Staffan. Getting a mixed stack trace of a running Java process or a core file is notoriously hard. The best we have today is the native, built-in stack walking code in HotSpot which is used for hs_err files and which can also be called from within gdb (see print_native_stack() in src/share/vm/utilities/debug.cpp). If you look at that code (and at the functions it calls, like frame.sender(), os::get_sender_for_C_frame(&fr), frame.is_java_frame(), frame.is_native_frame(), frame.is_runtime_frame(), etc.) you will see that there are a lot of special cases to handle. And even that code is not perfect. I can easily show you examples where it doesn't work (mostly at the beginning of methods/stubs, when the new frame is being set up but is not yet complete).

All this complicated (and platform dependent) code is replicated in Java in the SA. You can easily verify that it isn't 100% perfect by running "jstack -m -F <java_pid>" against a running Java VM. Besides the problem of frames being in an inconsistent state, another big problem is that we can only reliably unwind inlined Java frames from a native frame at safepoints. But that's of course not guaranteed if we use "jstack -F" or if we ask for a stack trace in gdb at an arbitrary PC.

Now if we replicate this SA code one more time in a Python library for gdb, you'll probably agree that it can't work more reliably than the original SA code. This may be good enough for some use cases, but it won't be perfect.

I'm not a gdb/DWARF expert, but I think what we really need is to generate debug information for all the generated code. For every single PC of generated code we need to know the corresponding frame information and how to get to the previous frame. I know it's possible, and I know that gdb has callbacks to consume this kind of debug information generated at runtime (see [1]), although I've never programmed against it myself.
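
Going only by the documentation in [1], the registration side of that interface would look roughly like the sketch below. The type and symbol names (jit_code_entry, jit_descriptor, __jit_debug_descriptor, __jit_debug_register_code) are the ones the gdb manual prescribes; the two helper functions jit_register_symfile() and jit_unregister_symfile() are names I made up for illustration, and nothing like them exists in HotSpot. The hard part, producing an in-memory object file with debug information for the generated code, is not shown at all.

```c
#include <stdint.h>
#include <stdlib.h>

/* Declarations as described in the gdb manual, "JIT Compilation Interface" [1]. */
typedef enum {
  JIT_NOACTION = 0,
  JIT_REGISTER_FN,
  JIT_UNREGISTER_FN
} jit_actions_t;

struct jit_code_entry {
  struct jit_code_entry *next_entry;
  struct jit_code_entry *prev_entry;
  const char *symfile_addr;   /* in-memory object file carrying the debug info */
  uint64_t symfile_size;
};

struct jit_descriptor {
  uint32_t version;
  uint32_t action_flag;       /* one of jit_actions_t */
  struct jit_code_entry *relevant_entry;
  struct jit_code_entry *first_entry;
};

/* gdb puts a breakpoint in this function, so it must not be inlined or
   optimized away (GCC/Clang syntax assumed here). */
void __attribute__((noinline)) __jit_debug_register_code(void) {
  __asm__ volatile("" ::: "memory");
}

/* The version is set statically because the debugger may read it early. */
struct jit_descriptor __jit_debug_descriptor = { 1, 0, 0, 0 };

/* Hypothetical helper: link a new entry at the head of the list and notify gdb. */
struct jit_code_entry *jit_register_symfile(const char *addr, uint64_t size) {
  struct jit_code_entry *e = malloc(sizeof *e);
  if (e == NULL)
    return NULL;
  e->symfile_addr = addr;
  e->symfile_size = size;
  e->prev_entry = NULL;
  e->next_entry = __jit_debug_descriptor.first_entry;
  if (e->next_entry != NULL)
    e->next_entry->prev_entry = e;
  __jit_debug_descriptor.first_entry = e;
  __jit_debug_descriptor.relevant_entry = e;
  __jit_debug_descriptor.action_flag = JIT_REGISTER_FN;
  __jit_debug_register_code();
  return e;
}

/* Hypothetical helper: unlink an entry, notify gdb, then release it. */
void jit_unregister_symfile(struct jit_code_entry *e) {
  if (e->prev_entry != NULL)
    e->prev_entry->next_entry = e->next_entry;
  else
    __jit_debug_descriptor.first_entry = e->next_entry;
  if (e->next_entry != NULL)
    e->next_entry->prev_entry = e->prev_entry;
  __jit_debug_descriptor.relevant_entry = e;
  __jit_debug_descriptor.action_flag = JIT_UNREGISTER_FN;
  __jit_debug_register_code();
  free(e);
}
```

Note that this only helps a live debugging session in which gdb can stop at the breakpoint; whether the same data would be usable from a core file is a separate question.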

LLVM seems to use this technique and has some documentation available ([2,3]). I suppose this is the direction Erik wanted to go, and I think that would be the right way.

Regards,
Volker

[1] https://sourceware.org/gdb/onlinedocs/gdb/JIT-Interface.html
[2] http://llvm.org/docs/DebuggingJITedCode.html
[3] http://llvm.org/releases/2.9/docs/DebuggingJITedCode.html

On Mon, Feb 16, 2015 at 10:40 AM, Andrew Haley <a...@redhat.com> wrote:
> On 15/02/15 19:55, Staffan Larsen wrote:
>
>> I think what Erik suggested was if there was some way the JVM could
>> expose data in a format that is easy to interpret by other tools
>> (such as the python gdb plugin, but also plugins for other
>> debuggers, or SA). Of course this would have to be data, not code,
>> so that it would be available in core files as well. I haven’t seen
>> the python module you have written so I don’t know how complex
>> it is, but we should think of ways to make such code even simpler if
>> possible. If we had data exposed in an easy-to-read format it would
>> perhaps make maintenance of these tools simpler. We have a problem
>> with SA today that it is way too dependent on the code in the JVM - a
>> small change in data structures in the JVM will break SA, something
>> we are looking for solutions to.
>
> I'm sure that's true, but let's not allow the Best to be the enemy of
> the Good Enough; this is a contribution that we can use today.
>
> Andrew.