Hello, As it's been quite a while already, I wanted to nudge this RFR forward. Who would be best qualified to review this PR?
Thank you! Ludovic On Wed, Aug 18, 2021 at 9:21 PM Marcus Hirt <h...@openjdk.java.net> wrote: > On Mon, 19 Jul 2021 09:25:59 GMT, Ludovic Henry <luhe...@openjdk.org> > wrote: > > >> When the signal sent for AsyncGetCallTrace or JFR would land on a > runtime stub (like arraycopy), a vtable stub, or the prolog of a compiled > method, it wouldn't be able to detect the sender (caller) frame for > multiple reasons. This patch fixes these cases through adding > CodeBlob-specific frame parser which are in the best position to know how a > frame is setup. > >> > >> The following examples have been profiled with honest-profiler which > uses `AsyncGetCallTrace`. > >> > >> # `Prof1` > >> > >> public class Prof1 { > >> > >> public static void main(String[] args) { > >> StringBuilder sb = new StringBuilder(); > >> for (int i = 0; i < 1000000; i++) { > >> sb.append("ab"); > >> sb.delete(0, 1); > >> } > >> System.out.println(sb.length()); > >> } > >> } > >> > >> > >> - Baseline: > >> > >> Flat Profile (by method): > >> (t 99.4,s 99.4) AGCT::Unknown Java[ERR=-5] > >> (t 0.5,s 0.2) Prof1::main > >> (t 0.2,s 0.2) java.lang.AbstractStringBuilder::append > >> (t 0.1,s 0.1) AGCT::Unknown not Java[ERR=-3] > >> (t 0.0,s 0.0) > java.lang.AbstractStringBuilder::ensureCapacityInternal > >> (t 0.0,s 0.0) java.lang.AbstractStringBuilder::shift > >> (t 0.0,s 0.0) java.lang.String::getBytes > >> (t 0.0,s 0.0) java.lang.AbstractStringBuilder::putStringAt > >> (t 0.0,s 0.0) java.lang.StringBuilder::delete > >> (t 0.2,s 0.0) java.lang.StringBuilder::append > >> (t 0.0,s 0.0) java.lang.AbstractStringBuilder::delete > >> (t 0.0,s 0.0) java.lang.AbstractStringBuilder::putStringAt > >> > >> - With `StubRoutinesBlob::FrameParser`: > >> > >> Flat Profile (by method): > >> (t 98.7,s 98.7) > java.lang.AbstractStringBuilder::ensureCapacityInternal > >> (t 0.9,s 0.9) java.lang.AbstractStringBuilder::delete > >> (t 99.8,s 0.2) Prof1::main > >> (t 0.1,s 0.1) AGCT::Unknown not Java[ERR=-3] > >> (t 0.0,s 0.0) AGCT::Unknown Java[ERR=-5] > >> (t 98.8,s 0.0) java.lang.AbstractStringBuilder::append > >> (t 98.8,s 0.0) java.lang.StringBuilder::append > >> (t 0.9,s 0.0) java.lang.StringBuilder::delete > >> > >> > >> # `Prof2` > >> > >> import java.util.function.Supplier; > >> > >> public class Prof2 { > >> > >> public static void main(String[] args) { > >> var rand = new java.util.Random(0); > >> Supplier[] suppliers = { > >> () -> 0, > >> () -> 1, > >> () -> 2, > >> () -> 3, > >> }; > >> > >> long sum = 0; > >> for (int i = 0; i >= 0; i++) { > >> sum += (int)suppliers[i % suppliers.length].get(); > >> } > >> } > >> } > >> > >> > >> - Baseline: > >> > >> Flat Profile (by method): > >> (t 60.7,s 60.7) AGCT::Unknown Java[ERR=-5] > >> (t 39.2,s 35.2) Prof2::main > >> (t 1.4,s 1.4) Prof2::lambda$main$3 > >> (t 1.0,s 1.0) Prof2::lambda$main$2 > >> (t 0.9,s 0.9) Prof2::lambda$main$1 > >> (t 0.7,s 0.7) Prof2::lambda$main$0 > >> (t 0.1,s 0.1) AGCT::Unknown not Java[ERR=-3] > >> (t 0.0,s 0.0) java.lang.Thread::exit > >> (t 0.9,s 0.0) Prof2$$Lambda$2.0x0000000800c00c28::get > >> (t 1.0,s 0.0) Prof2$$Lambda$3.0x0000000800c01000::get > >> (t 1.4,s 0.0) Prof2$$Lambda$4.0x0000000800c01220::get > >> (t 0.7,s 0.0) Prof2$$Lambda$1.0x0000000800c00a08::get > >> > >> > >> - With `VtableBlob::FrameParser` and `nmethod::FrameParser`: > >> > >> Flat Profile (by method): > >> (t 74.1,s 70.3) Prof2::main > >> (t 6.5,s 5.5) Prof2$$Lambda$29.0x0000000800081220::get > >> (t 6.6,s 5.4) Prof2$$Lambda$28.0x0000000800081000::get > >> (t 5.7,s 5.0) Prof2$$Lambda$26.0x0000000800080a08::get > >> (t 5.9,s 5.0) Prof2$$Lambda$27.0x0000000800080c28::get > >> (t 4.9,s 4.9) AGCT::Unknown Java[ERR=-5] > >> (t 1.2,s 1.2) Prof2::lambda$main$2 > >> (t 0.9,s 0.9) Prof2::lambda$main$3 > >> (t 0.9,s 0.9) Prof2::lambda$main$1 > >> (t 0.7,s 0.7) Prof2::lambda$main$0 > >> (t 0.1,s 0.1) AGCT::Unknown not Java[ERR=-3] > > > > Ludovic Henry has updated the pull request with a new target base due to > a merge or a rebase. The incremental webrev excludes the unrelated changes > brought in by the merge/rebase. The pull request contains eight additional > commits since the last revision: > > > > - Remove FrameParser and the required allocation > > > > The need for the allocation would be it non async-safe. However, > > AsyncGetCallTrace is async-safe and thus can't allow for allocations. > > - Merge branch 'master' of https://github.com/openjdk/jdk into > fix-8178287 > > - Fix comments > > - Disable checks in FrameParser when known to be safe > > - Allow AsyncGetCallTrace and JFR to unwind stack from vtable stub > > > > The program is the following: > > > > ``` > > import java.util.function.Supplier; > > > > public class Prof2 { > > > > public static void main(String[] args) { > > var rand = new java.util.Random(0); > > Supplier[] suppliers = { > > () -> 0, > > () -> 1, > > () -> 2, > > () -> 3, > > }; > > > > long sum = 0; > > for (int i = 0; i >= 0; i++) { > > sum += (int)suppliers[i % suppliers.length].get(); > > } > > } > > } > > ``` > > > > The results are as follows: > > > > - Baseline (from previous commit): > > > > Flat Profile (by method): > > (t 39.3,s 39.3) AGCT::Unknown Java[ERR=-5] > > (t 40.3,s 36.1) Prof2::main > > (t 6.4,s 5.3) Prof2$$Lambda$28.0x0000000800081000::get > > (t 6.1,s 5.1) Prof2$$Lambda$29.0x0000000800081220::get > > (t 6.0,s 5.0) Prof2$$Lambda$27.0x0000000800080c28::get > > (t 6.1,s 5.0) Prof2$$Lambda$26.0x0000000800080a08::get > > (t 1.1,s 1.1) Prof2::lambda$main$2 > > (t 1.1,s 1.1) Prof2::lambda$main$0 > > (t 1.0,s 1.0) Prof2::lambda$main$1 > > (t 0.9,s 0.9) Prof2::lambda$main$3 > > (t 0.1,s 0.1) AGCT::Unknown not Java[ERR=-3] > > > > - With unwind from vtable stub > > > > Flat Profile (by method): > > (t 74.1,s 70.3) Prof2::main > > (t 6.5,s 5.5) Prof2$$Lambda$29.0x0000000800081220::get > > (t 6.6,s 5.4) Prof2$$Lambda$28.0x0000000800081000::get > > (t 5.7,s 5.0) Prof2$$Lambda$26.0x0000000800080a08::get > > (t 5.9,s 5.0) Prof2$$Lambda$27.0x0000000800080c28::get > > (t 4.9,s 4.9) AGCT::Unknown Java[ERR=-5] > > (t 1.2,s 1.2) Prof2::lambda$main$2 > > (t 0.9,s 0.9) Prof2::lambda$main$3 > > (t 0.9,s 0.9) Prof2::lambda$main$1 > > (t 0.7,s 0.7) Prof2::lambda$main$0 > > (t 0.1,s 0.1) AGCT::Unknown not Java[ERR=-3] > > > > We attribute the vtable stub to the caller and not the callee, which > is > > already an improvement from the existing case. > > - Allow AsyncGetCallTrace and JFR to unwind stack from nmethod's prolog > > > > When sampling hits the prolog of a method, Hotspot assumes it's unable > > to parse the frame. This change allows to parse such frame on x86 by > > specializing which instruction it's hitting in the prolog. > > > > The results are as follows: > > > > - Baseline: > > > > Flat Profile (by method): > > (t 60.7,s 60.7) AGCT::Unknown Java[ERR=-5] > > (t 39.2,s 35.2) Prof2::main > > (t 1.4,s 1.4) Prof2::lambda$main$3 > > (t 1.0,s 1.0) Prof2::lambda$main$2 > > (t 0.9,s 0.9) Prof2::lambda$main$1 > > (t 0.7,s 0.7) Prof2::lambda$main$0 > > (t 0.1,s 0.1) AGCT::Unknown not Java[ERR=-3] > > (t 0.0,s 0.0) java.lang.Thread::exit > > (t 0.9,s 0.0) Prof2$$Lambda$2.0x0000000800c00c28::get > > (t 1.0,s 0.0) Prof2$$Lambda$3.0x0000000800c01000::get > > (t 1.4,s 0.0) Prof2$$Lambda$4.0x0000000800c01220::get > > (t 0.7,s 0.0) Prof2$$Lambda$1.0x0000000800c00a08::get > > > > - With incomplete frame parsing: > > > > Flat Profile (by method): > > (t 39.3,s 39.3) AGCT::Unknown Java[ERR=-5] > > (t 40.3,s 36.1) Prof2::main > > (t 6.4,s 5.3) Prof2$$Lambda$28.0x0000000800081000::get > > (t 6.1,s 5.1) Prof2$$Lambda$29.0x0000000800081220::get > > (t 6.0,s 5.0) Prof2$$Lambda$27.0x0000000800080c28::get > > (t 6.1,s 5.0) Prof2$$Lambda$26.0x0000000800080a08::get > > (t 1.1,s 1.1) Prof2::lambda$main$2 > > (t 1.1,s 1.1) Prof2::lambda$main$0 > > (t 1.0,s 1.0) Prof2::lambda$main$1 > > (t 0.9,s 0.9) Prof2::lambda$main$3 > > (t 0.1,s 0.1) AGCT::Unknown not Java[ERR=-3] > > (t 0.0,s 0.0) java.util.Locale::getInstance > > (t 0.0,s 0.0) AGCT::Not walkable Java[ERR=-6] > > (t 0.0,s 0.0) > jdk.internal.loader.BuiltinClassLoader::loadClassOrNull > > (t 0.0,s 0.0) java.lang.ClassLoader::loadClass > > (t 0.0,s 0.0) sun.net.util.URLUtil::urlNoFragString > > (t 0.0,s 0.0) java.lang.Class::forName0 > > (t 0.0,s 0.0) java.util.Locale::initDefault > > (t 0.0,s 0.0) > jdk.internal.loader.BuiltinClassLoader::loadClass > > (t 0.0,s 0.0) jdk.internal.loader.URLClassPath::getLoader > > (t 0.0,s 0.0) jdk.internal.loader.URLClassPath::getResource > > (t 0.0,s 0.0) java.lang.String::toLowerCase > > (t 0.0,s 0.0) sun.launcher.LauncherHelper::loadMainClass > > (t 0.0,s 0.0) sun.launcher.LauncherHelper::checkAndLoadMain > > (t 0.0,s 0.0) java.util.Locale::<clinit> > > (t 0.0,s 0.0) > jdk.internal.loader.BuiltinClassLoader::findClassOnClassPathOrNull > > (t 0.0,s 0.0) > jdk.internal.loader.ClassLoaders$AppClassLoader::loadClass > > (t 0.0,s 0.0) java.lang.Class::forName > > > > The program is as follows: > > > > ``` > > import java.util.function.Supplier; > > > > public class Prof2 { > > > > public static void main(String[] args) { > > var rand = new java.util.Random(0); > > Supplier[] suppliers = { > > () -> 0, > > () -> 1, > > () -> 2, > > () -> 3, > > }; > > > > long sum = 0; > > for (int i = 0; i >= 0; i++) { > > sum += (int)suppliers[i % suppliers.length].get(); > > } > > } > > } > > ``` > > > > We see that the results are particularely useful in this case as the > > methods are very short (it only returns an integer), and the > probability > > of hitting the prolog is then very high. > > - Allow AsyncGetCallTrace and JFR to walk a stub frame > > > > When the signal sent for AsyncGetCallTrace or JFR would land on a stub > > (like arraycopy), it wouldn't be able to detect the sender (caller) > > frame because `_cb->frame_size() == 0`. > > > > Because we fully control how the prolog and epilog of stub code is > > generated, we know there are two cases: > > 1. A stack frame is allocated via macroAssembler->enter(), and > consists > > in `push rbp; mov rsp, rbp;`. > > 2. No stack frames are allocated and rbp is left unchanged and rsp is > > decremented with the `call` instruction that push the return `pc` on > the > > stack. > > > > For case 1., we can easily know the sender frame by simply looking at > > rbp, especially since we know that all stubs preserve the frame > pointer > > (on x86 at least). > > > > For case 2., we end up returning the sender's sender, but that already > > gives us more information than what we have today. > > > > The results are as follows: > > > > - Baseline: > > > > Flat Profile (by method): > > (t 99.4,s 99.4) AGCT::Unknown Java[ERR=-5] > > (t 0.5,s 0.2) Prof1::main > > (t 0.2,s 0.2) java.lang.AbstractStringBuilder::append > > (t 0.1,s 0.1) AGCT::Unknown not Java[ERR=-3] > > (t 0.0,s 0.0) > java.lang.AbstractStringBuilder::ensureCapacityInternal > > (t 0.0,s 0.0) java.lang.AbstractStringBuilder::shift > > (t 0.0,s 0.0) java.lang.String::getBytes > > (t 0.0,s 0.0) java.lang.AbstractStringBuilder::putStringAt > > (t 0.0,s 0.0) java.lang.StringBuilder::delete > > (t 0.2,s 0.0) java.lang.StringBuilder::append > > (t 0.0,s 0.0) java.lang.AbstractStringBuilder::delete > > (t 0.0,s 0.0) java.lang.AbstractStringBuilder::putStringAt > > > > - With StubRoutinesBlob::FrameParser > > > > Flat Profile (by method): > > (t 98.7,s 98.7) > java.lang.AbstractStringBuilder::ensureCapacityInternal > > (t 0.9,s 0.9) java.lang.AbstractStringBuilder::delete > > (t 99.8,s 0.2) Prof1::main > > (t 0.1,s 0.1) AGCT::Unknown not Java[ERR=-3] > > (t 0.0,s 0.0) AGCT::Unknown Java[ERR=-5] > > (t 98.8,s 0.0) java.lang.AbstractStringBuilder::append > > (t 98.8,s 0.0) java.lang.StringBuilder::append > > (t 0.9,s 0.0) java.lang.StringBuilder::delete > > > > The program is as follows: > > > > ``` > > public class Prof1 { > > > > public static void main(String[] args) { > > StringBuilder sb = new StringBuilder(); > > for (int i = 0; i < 1000000; i++) { > > sb.append("ab"); > > sb.delete(0, 1); > > } > > System.out.println(sb.length()); > > } > > } > > ``` > > > > We now account for the arraycopy stub which is called by > > AbstractStringBuilder::ensureCapacityInternal. It was previously > ignored > > because it would not know how to parse the frame for the arraycopy > stub > > and would fall in the AGCT::Unknown Java[ERR=-5] section. > > > > However, it still isn't perfect since it doesn't point to the > arraycopy stub > > directly. > > - Extract sender frame parsing to CodeBlock::FrameParser > > > > Whether and how a frame is setup is controlled by the code generator > > for the specific CodeBlock. The CodeBlock is then in the best place > to know how > > to parse the sender's frame from the current frame in the given > CodeBlock. > > > > This refactoring proposes to extract this parsing out of `frame` and > into a > > `CodeBlock::FrameParser`. This FrameParser is then specialized in the > relevant > > inherited children of CodeBlock. > > > > This change is to largely facilitate adding new supported cases for > JDK-8252417 > > like runtime stubs. > > Ok. Do you know who should be the primary reviewer? Rickard Bäckman? Nils > Eliasson? > > ------------- > > PR: https://git.openjdk.java.net/jdk/pull/4436 >