> When the signal sent for AsyncGetCallTrace or JFR would land on a runtime > stub (like arraycopy), a vtable stub, or the prolog of a compiled method, it > wouldn't be able to detect the sender (caller) frame for multiple reasons. > This patch fixes these cases through adding CodeBlob-specific frame parser > which are in the best position to know how a frame is setup. > > The following examples have been profiled with honest-profiler which uses > `AsyncGetCallTrace`. > > # `Prof1` > > public class Prof1 { > > public static void main(String[] args) { > StringBuilder sb = new StringBuilder(); > for (int i = 0; i < 1000000; i++) { > sb.append("ab"); > sb.delete(0, 1); > } > System.out.println(sb.length()); > } > } > > > - Baseline: > > Flat Profile (by method): > (t 99.4,s 99.4) AGCT::Unknown Java[ERR=-5] > (t 0.5,s 0.2) Prof1::main > (t 0.2,s 0.2) java.lang.AbstractStringBuilder::append > (t 0.1,s 0.1) AGCT::Unknown not Java[ERR=-3] > (t 0.0,s 0.0) > java.lang.AbstractStringBuilder::ensureCapacityInternal > (t 0.0,s 0.0) java.lang.AbstractStringBuilder::shift > (t 0.0,s 0.0) java.lang.String::getBytes > (t 0.0,s 0.0) java.lang.AbstractStringBuilder::putStringAt > (t 0.0,s 0.0) java.lang.StringBuilder::delete > (t 0.2,s 0.0) java.lang.StringBuilder::append > (t 0.0,s 0.0) java.lang.AbstractStringBuilder::delete > (t 0.0,s 0.0) java.lang.AbstractStringBuilder::putStringAt > > - With `StubRoutinesBlob::FrameParser`: > > Flat Profile (by method): > (t 98.7,s 98.7) > java.lang.AbstractStringBuilder::ensureCapacityInternal > (t 0.9,s 0.9) java.lang.AbstractStringBuilder::delete > (t 99.8,s 0.2) Prof1::main > (t 0.1,s 0.1) AGCT::Unknown not Java[ERR=-3] > (t 0.0,s 0.0) AGCT::Unknown Java[ERR=-5] > (t 98.8,s 0.0) java.lang.AbstractStringBuilder::append > (t 98.8,s 0.0) java.lang.StringBuilder::append > (t 0.9,s 0.0) java.lang.StringBuilder::delete > > > # `Prof2` > > import java.util.function.Supplier; > > public class Prof2 { > > public static void main(String[] args) { > var rand = new java.util.Random(0); > Supplier[] suppliers = { > () -> 0, > () -> 1, > () -> 2, > () -> 3, > }; > > long sum = 0; > for (int i = 0; i >= 0; i++) { > sum += (int)suppliers[i % suppliers.length].get(); > } > } > } > > > - Baseline: > > Flat Profile (by method): > (t 60.7,s 60.7) AGCT::Unknown Java[ERR=-5] > (t 39.2,s 35.2) Prof2::main > (t 1.4,s 1.4) Prof2::lambda$main$3 > (t 1.0,s 1.0) Prof2::lambda$main$2 > (t 0.9,s 0.9) Prof2::lambda$main$1 > (t 0.7,s 0.7) Prof2::lambda$main$0 > (t 0.1,s 0.1) AGCT::Unknown not Java[ERR=-3] > (t 0.0,s 0.0) java.lang.Thread::exit > (t 0.9,s 0.0) Prof2$$Lambda$2.0x0000000800c00c28::get > (t 1.0,s 0.0) Prof2$$Lambda$3.0x0000000800c01000::get > (t 1.4,s 0.0) Prof2$$Lambda$4.0x0000000800c01220::get > (t 0.7,s 0.0) Prof2$$Lambda$1.0x0000000800c00a08::get > > > - With `VtableBlob::FrameParser` and `nmethod::FrameParser`: > > Flat Profile (by method): > (t 74.1,s 70.3) Prof2::main > (t 6.5,s 5.5) Prof2$$Lambda$29.0x0000000800081220::get > (t 6.6,s 5.4) Prof2$$Lambda$28.0x0000000800081000::get > (t 5.7,s 5.0) Prof2$$Lambda$26.0x0000000800080a08::get > (t 5.9,s 5.0) Prof2$$Lambda$27.0x0000000800080c28::get > (t 4.9,s 4.9) AGCT::Unknown Java[ERR=-5] > (t 1.2,s 1.2) Prof2::lambda$main$2 > (t 0.9,s 0.9) Prof2::lambda$main$3 > (t 0.9,s 0.9) Prof2::lambda$main$1 > (t 0.7,s 0.7) Prof2::lambda$main$0 > (t 0.1,s 0.1) AGCT::Unknown not Java[ERR=-3]
Ludovic Henry has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. The pull request contains five new commits since the last revision: - Disable checks in FrameParser when known to be safe - Allow AsyncGetCallTrace and JFR to unwind stack from vtable stub The program is the following: ``` import java.util.function.Supplier; public class Prof2 { public static void main(String[] args) { var rand = new java.util.Random(0); Supplier[] suppliers = { () -> 0, () -> 1, () -> 2, () -> 3, }; long sum = 0; for (int i = 0; i >= 0; i++) { sum += (int)suppliers[i % suppliers.length].get(); } } } ``` The results are as follows: - Baseline (from previous commit): Flat Profile (by method): (t 39.3,s 39.3) AGCT::Unknown Java[ERR=-5] (t 40.3,s 36.1) Prof2::main (t 6.4,s 5.3) Prof2$$Lambda$28.0x0000000800081000::get (t 6.1,s 5.1) Prof2$$Lambda$29.0x0000000800081220::get (t 6.0,s 5.0) Prof2$$Lambda$27.0x0000000800080c28::get (t 6.1,s 5.0) Prof2$$Lambda$26.0x0000000800080a08::get (t 1.1,s 1.1) Prof2::lambda$main$2 (t 1.1,s 1.1) Prof2::lambda$main$0 (t 1.0,s 1.0) Prof2::lambda$main$1 (t 0.9,s 0.9) Prof2::lambda$main$3 (t 0.1,s 0.1) AGCT::Unknown not Java[ERR=-3] - With unwind from vtable stub Flat Profile (by method): (t 74.1,s 70.3) Prof2::main (t 6.5,s 5.5) Prof2$$Lambda$29.0x0000000800081220::get (t 6.6,s 5.4) Prof2$$Lambda$28.0x0000000800081000::get (t 5.7,s 5.0) Prof2$$Lambda$26.0x0000000800080a08::get (t 5.9,s 5.0) Prof2$$Lambda$27.0x0000000800080c28::get (t 4.9,s 4.9) AGCT::Unknown Java[ERR=-5] (t 1.2,s 1.2) Prof2::lambda$main$2 (t 0.9,s 0.9) Prof2::lambda$main$3 (t 0.9,s 0.9) Prof2::lambda$main$1 (t 0.7,s 0.7) Prof2::lambda$main$0 (t 0.1,s 0.1) AGCT::Unknown not Java[ERR=-3] We attribute the vtable stub to the caller and not the callee, which is already an improvement from the existing case. - Allow AsyncGetCallTrace and JFR to unwind stack from nmethod's prolog When sampling hits the prolog of a method, Hotspot assumes it's unable to parse the frame. This change allows to parse such frame on x86 by specializing which instruction it's hitting in the prolog. The results are as follows: - Baseline: Flat Profile (by method): (t 60.7,s 60.7) AGCT::Unknown Java[ERR=-5] (t 39.2,s 35.2) Prof2::main (t 1.4,s 1.4) Prof2::lambda$main$3 (t 1.0,s 1.0) Prof2::lambda$main$2 (t 0.9,s 0.9) Prof2::lambda$main$1 (t 0.7,s 0.7) Prof2::lambda$main$0 (t 0.1,s 0.1) AGCT::Unknown not Java[ERR=-3] (t 0.0,s 0.0) java.lang.Thread::exit (t 0.9,s 0.0) Prof2$$Lambda$2.0x0000000800c00c28::get (t 1.0,s 0.0) Prof2$$Lambda$3.0x0000000800c01000::get (t 1.4,s 0.0) Prof2$$Lambda$4.0x0000000800c01220::get (t 0.7,s 0.0) Prof2$$Lambda$1.0x0000000800c00a08::get - With incomplete frame parsing: Flat Profile (by method): (t 39.3,s 39.3) AGCT::Unknown Java[ERR=-5] (t 40.3,s 36.1) Prof2::main (t 6.4,s 5.3) Prof2$$Lambda$28.0x0000000800081000::get (t 6.1,s 5.1) Prof2$$Lambda$29.0x0000000800081220::get (t 6.0,s 5.0) Prof2$$Lambda$27.0x0000000800080c28::get (t 6.1,s 5.0) Prof2$$Lambda$26.0x0000000800080a08::get (t 1.1,s 1.1) Prof2::lambda$main$2 (t 1.1,s 1.1) Prof2::lambda$main$0 (t 1.0,s 1.0) Prof2::lambda$main$1 (t 0.9,s 0.9) Prof2::lambda$main$3 (t 0.1,s 0.1) AGCT::Unknown not Java[ERR=-3] (t 0.0,s 0.0) java.util.Locale::getInstance (t 0.0,s 0.0) AGCT::Not walkable Java[ERR=-6] (t 0.0,s 0.0) jdk.internal.loader.BuiltinClassLoader::loadClassOrNull (t 0.0,s 0.0) java.lang.ClassLoader::loadClass (t 0.0,s 0.0) sun.net.util.URLUtil::urlNoFragString (t 0.0,s 0.0) java.lang.Class::forName0 (t 0.0,s 0.0) java.util.Locale::initDefault (t 0.0,s 0.0) jdk.internal.loader.BuiltinClassLoader::loadClass (t 0.0,s 0.0) jdk.internal.loader.URLClassPath::getLoader (t 0.0,s 0.0) jdk.internal.loader.URLClassPath::getResource (t 0.0,s 0.0) java.lang.String::toLowerCase (t 0.0,s 0.0) sun.launcher.LauncherHelper::loadMainClass (t 0.0,s 0.0) sun.launcher.LauncherHelper::checkAndLoadMain (t 0.0,s 0.0) java.util.Locale::<clinit> (t 0.0,s 0.0) jdk.internal.loader.BuiltinClassLoader::findClassOnClassPathOrNull (t 0.0,s 0.0) jdk.internal.loader.ClassLoaders$AppClassLoader::loadClass (t 0.0,s 0.0) java.lang.Class::forName The program is as follows: ``` import java.util.function.Supplier; public class Prof2 { public static void main(String[] args) { var rand = new java.util.Random(0); Supplier[] suppliers = { () -> 0, () -> 1, () -> 2, () -> 3, }; long sum = 0; for (int i = 0; i >= 0; i++) { sum += (int)suppliers[i % suppliers.length].get(); } } } ``` We see that the results are particularely useful in this case as the methods are very short (it only returns an integer), and the probability of hitting the prolog is then very high. - Allow AsyncGetCallTrace and JFR to walk a stub frame When the signal sent for AsyncGetCallTrace or JFR would land on a stub (like arraycopy), it wouldn't be able to detect the sender (caller) frame because `_cb->frame_size() == 0`. Because we fully control how the prolog and epilog of stub code is generated, we know there are two cases: 1. A stack frame is allocated via macroAssembler->enter(), and consists in `push rbp; mov rsp, rbp;`. 2. No stack frames are allocated and rbp is left unchanged and rsp is decremented with the `call` instruction that push the return `pc` on the stack. For case 1., we can easily know the sender frame by simply looking at rbp, especially since we know that all stubs preserve the frame pointer (on x86 at least). For case 2., we end up returning the sender's sender, but that already gives us more information than what we have today. The results are as follows: - Baseline: Flat Profile (by method): (t 99.4,s 99.4) AGCT::Unknown Java[ERR=-5] (t 0.5,s 0.2) Prof1::main (t 0.2,s 0.2) java.lang.AbstractStringBuilder::append (t 0.1,s 0.1) AGCT::Unknown not Java[ERR=-3] (t 0.0,s 0.0) java.lang.AbstractStringBuilder::ensureCapacityInternal (t 0.0,s 0.0) java.lang.AbstractStringBuilder::shift (t 0.0,s 0.0) java.lang.String::getBytes (t 0.0,s 0.0) java.lang.AbstractStringBuilder::putStringAt (t 0.0,s 0.0) java.lang.StringBuilder::delete (t 0.2,s 0.0) java.lang.StringBuilder::append (t 0.0,s 0.0) java.lang.AbstractStringBuilder::delete (t 0.0,s 0.0) java.lang.AbstractStringBuilder::putStringAt - With StubRoutinesBlob::FrameParser Flat Profile (by method): (t 98.7,s 98.7) java.lang.AbstractStringBuilder::ensureCapacityInternal (t 0.9,s 0.9) java.lang.AbstractStringBuilder::delete (t 99.8,s 0.2) Prof1::main (t 0.1,s 0.1) AGCT::Unknown not Java[ERR=-3] (t 0.0,s 0.0) AGCT::Unknown Java[ERR=-5] (t 98.8,s 0.0) java.lang.AbstractStringBuilder::append (t 98.8,s 0.0) java.lang.StringBuilder::append (t 0.9,s 0.0) java.lang.StringBuilder::delete The program is as follows: ``` public class Prof1 { public static void main(String[] args) { StringBuilder sb = new StringBuilder(); for (int i = 0; i < 1000000; i++) { sb.append("ab"); sb.delete(0, 1); } System.out.println(sb.length()); } } ``` We now account for the arraycopy stub which is called by AbstractStringBuilder::ensureCapacityInternal. It was previously ignored because it would not know how to parse the frame for the arraycopy stub and would fall in the AGCT::Unknown Java[ERR=-5] section. However, it still isn't perfect since it doesn't point to the arraycopy stub directly. - Extract sender frame parsing to CodeBlock::FrameParser Whether and how a frame is setup is controlled by the code generator for the specific CodeBlock. The CodeBlock is then in the best place to know how to parse the sender's frame from the current frame in the given CodeBlock. This refactoring proposes to extract this parsing out of `frame` and into a `CodeBlock::FrameParser`. This FrameParser is then specialized in the relevant inherited children of CodeBlock. This change is to largely facilitate adding new supported cases for JDK-8252417 like runtime stubs. ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/4436/files - new: https://git.openjdk.java.net/jdk/pull/4436/files/137a0c48..85f218c8 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=4436&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=4436&range=00-01 Stats: 6 lines in 2 files changed: 6 ins; 0 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/4436.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/4436/head:pull/4436 PR: https://git.openjdk.java.net/jdk/pull/4436