Hi,

I would like to propose to

1. Replace the duplicated stack walking code with a unified API
2. Create a new version of AsyncGetCallTrace, tentatively called
"AsyncGetCallTrace2", that returns more information on more frames using the
unified API

A demo (as well as this text) is available at 
https://github.com/parttimenerd/asgct2-demo
if you want to see a prototype of this proposal in action.

Unify Stack Walking
================

There are currently multiple implementations of stack walking in JFR and for
AsyncGetCallTrace.
Each implements its own extension of vframeStream, with comparable features,
and each checks for problematic frames on its own.

My proposal is, therefore, to replace the stack walking code with a unified API
that includes all error checking and vframeStream extensions in a single place.
The proposed new class is called StackWalker and could be part of
`jfr/recorder/stacktrace` [1].
This class also supports getting information on C frames, so it can potentially
be used for walking stacks in VMError (used to create hs_err files), further
reducing the number of separate stack walking implementations.

AsyncGetCallTrace2
================

The AsyncGetCallTrace call has seen increasing use in recent years
in profilers like async-profiler.
But it is not really an API (not exported in any header) and
the information on frames it returns is pretty limited 
(only the method and bci for Java frames) which makes implementing
profilers and other tooling harder. Tools like async-profiler
have to resort to complicated code to partially obtain the information
that the JVM already has.
Information that is currently hidden and impossible to obtain through
AsyncGetCallTrace includes

- whether a compiled frame is inlined (currently only obtainable for the
  topmost compiled frames)
  - although this can be obtained using JFR
- C frames that are not at the top of the stack
- compilation level (C1 or C2 compiled)

This information is helpful when profiling and tuning the VM for
a given application and also for profiling code that uses
JNI heavily.

Using the proposed StackWalker class, implementing a new API 
that returns more information on frames is possible 
as a thin wrapper over the StackWalker API [2]. 
This also improves the maintainability as the code used
in this API is used in multiple places and is therefore
also better tested than the previous implementation, see 
[1] for the implementation.

The following describes the proposed API:

```cpp
void AsyncGetCallTrace2(asgct2::CallTrace *trace, jint depth, void* ucontext);
```

The structure of `CallTrace` is the same as the original
`ASGCT_CallTrace` with the same error codes encoded in <= 0
values of `num_frames`.

```cpp
typedef struct {
  JNIEnv *env_id;                   // Env where trace was recorded
  jint num_frames;                  // number of frames in this trace
  CallFrame *frames;                // frames
  void* frame_info;                 // more information on frames
} CallTrace;
```

The only differences are that the `frames` array also contains
information on C frames and that there is a new field, `frame_info`.
`frame_info` is currently null and can later be used
for extended information on each frame, being an array with
one element for each frame. The type of the
elements in this array is implementation specific.
This is akin to the `compile_info` field in JVMTI's CompiledMethodLoad
[3] and is meant for extending the information returned by the
API later.
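
As a rough sketch of how a caller might treat the result (not taken from the
prototype), the following checks `num_frames` before touching `frames`. The
namespace, the stand-in types replacing `jni.h`, and the helper names
`trace_failed` and `usable_frame_count` are assumptions made for illustration.

```cpp
#include <cstdint>

namespace asgct2 {

// Stand-ins so the sketch compiles without jni.h (assumption).
using jint = std::int32_t;
struct JNIEnv;     // opaque here
struct CallFrame;  // the frame layout is a separate topic

// Mirrors the CallTrace struct from this proposal.
struct CallTrace {
  JNIEnv *env_id;     // Env where trace was recorded
  jint num_frames;    // number of frames, or an error code if <= 0
  CallFrame *frames;  // frames
  void *frame_info;   // currently null
};

// Hypothetical helper: error codes are encoded in values <= 0.
inline bool trace_failed(const CallTrace &trace) {
  return trace.num_frames <= 0;
}

// Hypothetical helper: number of entries in `frames` that are safe to read.
inline jint usable_frame_count(const CallTrace &trace) {
  return trace.num_frames > 0 ? trace.num_frames : 0;
}

}  // namespace asgct2
```

A profiler would loop over `frames` only up to `usable_frame_count(trace)` and
record the raw `num_frames` value as the failure reason otherwise.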

Prototype
------------

Currently `CallFrame` is implemented in the prototype [4] as

```cpp
typedef struct {
  void *machine_pc;           // program counter, for C and native frames
                              // (frames of native methods)
  uint8_t type;               // frame type (single byte)
  uint8_t comp_level;         // highest compilation level of a method related
                              // to a Java frame
  // information from original CallFrame
  jint bci;                   // bci for Java frames
  jmethodID method_id;        // method ID for Java frames
} CallFrame;
```

The `type` field holds a `FrameTypeId`, which is based on the frame type in
JFRStackFrame:

```cpp
enum FrameTypeId {
  FRAME_INTERPRETED = 0, 
  FRAME_JIT         = 1, // JIT compiled
  FRAME_INLINE      = 2, // inlined JITed methods
  FRAME_NATIVE      = 3, // native wrapper to call C methods from Java
  FRAME_CPP         = 4  // C/C++/... frames; stub frames have CompLevel_all
};
```
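
To illustrate how a consumer could interpret the `type` byte of a prototype
frame, here is a minimal sketch. The helpers `frame_type_name` and
`has_machine_pc` are hypothetical and not part of the prototype; the second one
only restates the comment on `machine_pc` (set for C and native frames).

```cpp
#include <cstdint>
#include <string>

namespace prototype {

// Copy of the prototype's frame type enum.
enum FrameTypeId {
  FRAME_INTERPRETED = 0,
  FRAME_JIT         = 1,
  FRAME_INLINE      = 2,
  FRAME_NATIVE      = 3,
  FRAME_CPP         = 4
};

// Hypothetical helper: readable label for the `type` byte of a frame.
inline std::string frame_type_name(std::uint8_t type) {
  switch (type) {
    case FRAME_INTERPRETED: return "interpreted";
    case FRAME_JIT:         return "jit";
    case FRAME_INLINE:      return "inlined";
    case FRAME_NATIVE:      return "native";
    case FRAME_CPP:         return "cpp";
    default:                return "unknown";
  }
}

// Hypothetical helper: machine_pc is documented for C and native frames.
inline bool has_machine_pc(std::uint8_t type) {
  return type == FRAME_NATIVE || type == FRAME_CPP;
}

}  // namespace prototype
```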

The `comp_level` states the compilation level of the method related to the frame
with higher numbers representing "more" compilation. `0` is defined as
interpreted. It is modeled after the `CompLevel` enum in 
`compiler/compilerDefinitions`:

```cpp
// Enumeration to distinguish tiers of compilation
enum CompLevel {
  // ...
  CompLevel_none              = 0,         // Interpreter
  CompLevel_simple            = 1,         // C1
  CompLevel_limited_profile   = 2,         // C1, invocation & backedge counters
  CompLevel_full_profile      = 3,         // C1, invocation & backedge
                                           // counters + mdo
  CompLevel_full_optimization = 4          // C2 or JVMCI
};
```
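
A profiler that only wants to know which compiler produced a frame could
collapse the levels as sketched below. The `Compiler` enum and the
`compiler_for_level` helper are hypothetical; the mapping itself follows the
comments in the `CompLevel` enum above, and any other value is reported as
unknown.

```cpp
#include <cstdint>

namespace prototype {

// Hypothetical coarse classification of a frame's comp_level.
enum class Compiler { Interpreter, C1, C2OrJvmci, Unknown };

// Maps the comp_level byte to the compiler named in the CompLevel comments.
inline Compiler compiler_for_level(std::uint8_t comp_level) {
  switch (comp_level) {
    case 0:
      return Compiler::Interpreter;  // CompLevel_none
    case 1:                          // CompLevel_simple
    case 2:                          // CompLevel_limited_profile
    case 3:                          // CompLevel_full_profile
      return Compiler::C1;
    case 4:
      return Compiler::C2OrJvmci;    // CompLevel_full_optimization
    default:
      return Compiler::Unknown;      // anything outside 0..4
  }
}

}  // namespace prototype
```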

The traces produced by this prototype are fairly large
(each frame requires 24 instead of 16 bytes on 64-bit systems) and some data
is duplicated.
The reason for this is that it simplified the extension of async-profiler
for the prototype, as it only extends the data structures of
the original AsyncGetCallTrace API without changing the original fields.
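
The 24 versus 16 bytes can be checked with a small sketch. It assumes a typical
64-bit ABI (8-byte pointers, 4-byte `jint`, natural alignment) and uses
stand-ins for the `jni.h` types; `OriginalCallFrame` mirrors the frame struct
of the original AsyncGetCallTrace, and `frame_buffer_bytes` is a hypothetical
helper.

```cpp
#include <cstddef>
#include <cstdint>

namespace prototype {

// Stand-ins so the sketch compiles without jni.h (assumption).
using jint = std::int32_t;
struct _jmethodID;
using jmethodID = _jmethodID *;

// Frame layout of the original AsyncGetCallTrace: 4 + 4 (padding) + 8 bytes.
struct OriginalCallFrame {
  jint lineno;  // holds the bci for Java frames
  jmethodID method_id;
};

// Prototype layout: 8 + 1 + 1 + 2 (padding) + 4 + 8 bytes.
struct CallFrame {
  void *machine_pc;
  std::uint8_t type;
  std::uint8_t comp_level;
  jint bci;
  jmethodID method_id;
};

// Only meaningful on 64-bit targets; skipped elsewhere.
static_assert(sizeof(void *) != 8 || sizeof(OriginalCallFrame) == 16,
              "original frame should take 16 bytes");
static_assert(sizeof(void *) != 8 || sizeof(CallFrame) == 24,
              "prototype frame should take 24 bytes");

// Hypothetical helper: bytes needed for a frame buffer of the given depth.
template <typename Frame>
constexpr std::size_t frame_buffer_bytes(std::size_t depth) {
  return depth * sizeof(Frame);
}

}  // namespace prototype
```

Under these assumptions, a buffer for 1024 frames grows from 16 KiB to 24 KiB
per trace.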

Proposal
------------

But packing the information and reducing duplication is of course possible
if we step away from the former constraint:

```cpp
enum FrameTypeId {
  FRAME_JAVA         = 1, // JIT compiled and interpreted
  FRAME_JAVA_INLINED = 2, // inlined JIT compiled
  FRAME_NATIVE       = 3, // native wrapper to call C methods from Java
  FRAME_STUB         = 4, // VM generated stubs
  FRAME_CPP          = 5  // C/C++/... frames
};

typedef struct {
  uint8_t type;            // frame type
  uint8_t comp_level;
  uint16_t bci;            // 0 <= bci < 65536
  jmethodID method_id;
} JavaFrame;               // used for FRAME_JAVA and FRAME_JAVA_INLINED

typedef struct {
  uint8_t type;         // frame type, a single byte as in JavaFrame
  void *machine_pc;
} NonJavaFrame;         // used for FRAME_NATIVE, FRAME_STUB and FRAME_CPP

typedef union {
  uint8_t type;         // to distinguish between JavaFrame and NonJavaFrame
  JavaFrame java_frame;
  NonJavaFrame non_java_frame;
} CallFrame;
```

This uses the same amount of space per frame (16 bytes) as the original but 
encodes far more information.
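
How a consumer might build and inspect such a tagged union is sketched below.
This is an illustration under assumptions, not the proposed header: it stores
the tag as a `uint8_t` in both structs so that it really occupies one byte,
reads the tag through `java_frame.type` (both structs start with the same
member), and the helpers `make_java_frame`, `make_non_java_frame` and `is_java`
are hypothetical. The size check assumes a 64-bit ABI.

```cpp
#include <cstdint>

namespace proposal {

// Stand-in so the sketch compiles without jni.h (assumption).
struct _jmethodID;
using jmethodID = _jmethodID *;

enum FrameTypeId : std::uint8_t {
  FRAME_JAVA = 1, FRAME_JAVA_INLINED = 2, FRAME_NATIVE = 3,
  FRAME_STUB = 4, FRAME_CPP = 5
};

struct JavaFrame {
  std::uint8_t type;
  std::uint8_t comp_level;
  std::uint16_t bci;
  jmethodID method_id;
};

struct NonJavaFrame {
  std::uint8_t type;
  void *machine_pc;
};

union CallFrame {
  JavaFrame java_frame;
  NonJavaFrame non_java_frame;
};

static_assert(sizeof(void *) != 8 || sizeof(CallFrame) == 16,
              "packed frame should take 16 bytes on 64-bit systems");

inline CallFrame make_java_frame(FrameTypeId type, std::uint8_t comp_level,
                                 std::uint16_t bci, jmethodID method_id) {
  CallFrame frame;
  frame.java_frame =
      JavaFrame{static_cast<std::uint8_t>(type), comp_level, bci, method_id};
  return frame;
}

inline CallFrame make_non_java_frame(FrameTypeId type, void *machine_pc) {
  CallFrame frame;
  frame.non_java_frame =
      NonJavaFrame{static_cast<std::uint8_t>(type), machine_pc};
  return frame;
}

// The tag is the first member of both structs, so it can be read through
// java_frame regardless of which variant was stored.
inline bool is_java(const CallFrame &frame) {
  return frame.java_frame.type == FRAME_JAVA ||
         frame.java_frame.type == FRAME_JAVA_INLINED;
}

}  // namespace proposal
```

A consumer would call `is_java(frame)` first and then read either
`frame.java_frame` or `frame.non_java_frame`, never both.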

Best regards
Johannes

[1] 
https://github.com/parttimenerd/jdk/blob/parttimenerd_asgct2/src/hotspot/share/jfr/recorder/stacktrace/stackWalker.hpp

[2] 
https://github.com/parttimenerd/jdk/blob/parttimenerd_asgct2/src/hotspot/share/prims/asgct2.cpp

[3] 
https://docs.oracle.com/javase/8/docs/platform/jvmti/jvmti.html#CompiledMethodLoad

[4] 
https://github.com/parttimenerd/jdk/blob/parttimenerd_asgct2/src/hotspot/share/prims/asgct2.hpp
