[Sending again now with the email address properly registered to the mailing lists]
Hello, As it's been quite a while already, I wanted to nudge this RFR forward. Who would be best qualified to review this PR? Thank you! Ludovic On Fri, Aug 20, 2021 at 8:49 AM Ludovic Henry <[email protected]> wrote: > Hello, > > As it's been quite a while already, I wanted to nudge this RFR forward. > Who would be best qualified to review this PR? > > Thank you! > Ludovic > > On Wed, Aug 18, 2021 at 9:21 PM Marcus Hirt <[email protected]> wrote: > >> On Mon, 19 Jul 2021 09:25:59 GMT, Ludovic Henry <[email protected]> >> wrote: >> >> >> When the signal sent for AsyncGetCallTrace or JFR would land on a >> runtime stub (like arraycopy), a vtable stub, or the prolog of a compiled >> method, it wouldn't be able to detect the sender (caller) frame for >> multiple reasons. This patch fixes these cases through adding >> CodeBlob-specific frame parser which are in the best position to know how a >> frame is setup. >> >> >> >> The following examples have been profiled with honest-profiler which >> uses `AsyncGetCallTrace`. >> >> >> >> # `Prof1` >> >> >> >> public class Prof1 { >> >> >> >> public static void main(String[] args) { >> >> StringBuilder sb = new StringBuilder(); >> >> for (int i = 0; i < 1000000; i++) { >> >> sb.append("ab"); >> >> sb.delete(0, 1); >> >> } >> >> System.out.println(sb.length()); >> >> } >> >> } >> >> >> >> >> >> - Baseline: >> >> >> >> Flat Profile (by method): >> >> (t 99.4,s 99.4) AGCT::Unknown Java[ERR=-5] >> >> (t 0.5,s 0.2) Prof1::main >> >> (t 0.2,s 0.2) java.lang.AbstractStringBuilder::append >> >> (t 0.1,s 0.1) AGCT::Unknown not Java[ERR=-3] >> >> (t 0.0,s 0.0) >> java.lang.AbstractStringBuilder::ensureCapacityInternal >> >> (t 0.0,s 0.0) java.lang.AbstractStringBuilder::shift >> >> (t 0.0,s 0.0) java.lang.String::getBytes >> >> (t 0.0,s 0.0) java.lang.AbstractStringBuilder::putStringAt >> >> (t 0.0,s 0.0) java.lang.StringBuilder::delete >> >> (t 0.2,s 0.0) java.lang.StringBuilder::append >> >> (t 0.0,s 0.0) java.lang.AbstractStringBuilder::delete >> >> (t 0.0,s 0.0) java.lang.AbstractStringBuilder::putStringAt >> >> >> >> - With `StubRoutinesBlob::FrameParser`: >> >> >> >> Flat Profile (by method): >> >> (t 98.7,s 98.7) >> java.lang.AbstractStringBuilder::ensureCapacityInternal >> >> (t 0.9,s 0.9) java.lang.AbstractStringBuilder::delete >> >> (t 99.8,s 0.2) Prof1::main >> >> (t 0.1,s 0.1) AGCT::Unknown not Java[ERR=-3] >> >> (t 0.0,s 0.0) AGCT::Unknown Java[ERR=-5] >> >> (t 98.8,s 0.0) java.lang.AbstractStringBuilder::append >> >> (t 98.8,s 0.0) java.lang.StringBuilder::append >> >> (t 0.9,s 0.0) java.lang.StringBuilder::delete >> >> >> >> >> >> # `Prof2` >> >> >> >> import java.util.function.Supplier; >> >> >> >> public class Prof2 { >> >> >> >> public static void main(String[] args) { >> >> var rand = new java.util.Random(0); >> >> Supplier[] suppliers = { >> >> () -> 0, >> >> () -> 1, >> >> () -> 2, >> >> () -> 3, >> >> }; >> >> >> >> long sum = 0; >> >> for (int i = 0; i >= 0; i++) { >> >> sum += (int)suppliers[i % suppliers.length].get(); >> >> } >> >> } >> >> } >> >> >> >> >> >> - Baseline: >> >> >> >> Flat Profile (by method): >> >> (t 60.7,s 60.7) AGCT::Unknown Java[ERR=-5] >> >> (t 39.2,s 35.2) Prof2::main >> >> (t 1.4,s 1.4) Prof2::lambda$main$3 >> >> (t 1.0,s 1.0) Prof2::lambda$main$2 >> >> (t 0.9,s 0.9) Prof2::lambda$main$1 >> >> (t 0.7,s 0.7) Prof2::lambda$main$0 >> >> (t 0.1,s 0.1) AGCT::Unknown not Java[ERR=-3] >> >> (t 0.0,s 0.0) java.lang.Thread::exit >> >> (t 0.9,s 0.0) Prof2$$Lambda$2.0x0000000800c00c28::get >> >> (t 1.0,s 0.0) Prof2$$Lambda$3.0x0000000800c01000::get >> >> (t 1.4,s 0.0) Prof2$$Lambda$4.0x0000000800c01220::get >> >> (t 0.7,s 0.0) Prof2$$Lambda$1.0x0000000800c00a08::get >> >> >> >> >> >> - With `VtableBlob::FrameParser` and `nmethod::FrameParser`: >> >> >> >> Flat Profile (by method): >> >> (t 74.1,s 70.3) Prof2::main >> >> (t 6.5,s 5.5) Prof2$$Lambda$29.0x0000000800081220::get >> >> (t 6.6,s 5.4) Prof2$$Lambda$28.0x0000000800081000::get >> >> (t 5.7,s 5.0) Prof2$$Lambda$26.0x0000000800080a08::get >> >> (t 5.9,s 5.0) Prof2$$Lambda$27.0x0000000800080c28::get >> >> (t 4.9,s 4.9) AGCT::Unknown Java[ERR=-5] >> >> (t 1.2,s 1.2) Prof2::lambda$main$2 >> >> (t 0.9,s 0.9) Prof2::lambda$main$3 >> >> (t 0.9,s 0.9) Prof2::lambda$main$1 >> >> (t 0.7,s 0.7) Prof2::lambda$main$0 >> >> (t 0.1,s 0.1) AGCT::Unknown not Java[ERR=-3] >> > >> > Ludovic Henry has updated the pull request with a new target base due >> to a merge or a rebase. The incremental webrev excludes the unrelated >> changes brought in by the merge/rebase. The pull request contains eight >> additional commits since the last revision: >> > >> > - Remove FrameParser and the required allocation >> > >> > The need for the allocation would be it non async-safe. However, >> > AsyncGetCallTrace is async-safe and thus can't allow for allocations. >> > - Merge branch 'master' of https://github.com/openjdk/jdk into >> fix-8178287 >> > - Fix comments >> > - Disable checks in FrameParser when known to be safe >> > - Allow AsyncGetCallTrace and JFR to unwind stack from vtable stub >> > >> > The program is the following: >> > >> > ``` >> > import java.util.function.Supplier; >> > >> > public class Prof2 { >> > >> > public static void main(String[] args) { >> > var rand = new java.util.Random(0); >> > Supplier[] suppliers = { >> > () -> 0, >> > () -> 1, >> > () -> 2, >> > () -> 3, >> > }; >> > >> > long sum = 0; >> > for (int i = 0; i >= 0; i++) { >> > sum += (int)suppliers[i % suppliers.length].get(); >> > } >> > } >> > } >> > ``` >> > >> > The results are as follows: >> > >> > - Baseline (from previous commit): >> > >> > Flat Profile (by method): >> > (t 39.3,s 39.3) AGCT::Unknown Java[ERR=-5] >> > (t 40.3,s 36.1) Prof2::main >> > (t 6.4,s 5.3) Prof2$$Lambda$28.0x0000000800081000::get >> > (t 6.1,s 5.1) Prof2$$Lambda$29.0x0000000800081220::get >> > (t 6.0,s 5.0) Prof2$$Lambda$27.0x0000000800080c28::get >> > (t 6.1,s 5.0) Prof2$$Lambda$26.0x0000000800080a08::get >> > (t 1.1,s 1.1) Prof2::lambda$main$2 >> > (t 1.1,s 1.1) Prof2::lambda$main$0 >> > (t 1.0,s 1.0) Prof2::lambda$main$1 >> > (t 0.9,s 0.9) Prof2::lambda$main$3 >> > (t 0.1,s 0.1) AGCT::Unknown not Java[ERR=-3] >> > >> > - With unwind from vtable stub >> > >> > Flat Profile (by method): >> > (t 74.1,s 70.3) Prof2::main >> > (t 6.5,s 5.5) Prof2$$Lambda$29.0x0000000800081220::get >> > (t 6.6,s 5.4) Prof2$$Lambda$28.0x0000000800081000::get >> > (t 5.7,s 5.0) Prof2$$Lambda$26.0x0000000800080a08::get >> > (t 5.9,s 5.0) Prof2$$Lambda$27.0x0000000800080c28::get >> > (t 4.9,s 4.9) AGCT::Unknown Java[ERR=-5] >> > (t 1.2,s 1.2) Prof2::lambda$main$2 >> > (t 0.9,s 0.9) Prof2::lambda$main$3 >> > (t 0.9,s 0.9) Prof2::lambda$main$1 >> > (t 0.7,s 0.7) Prof2::lambda$main$0 >> > (t 0.1,s 0.1) AGCT::Unknown not Java[ERR=-3] >> > >> > We attribute the vtable stub to the caller and not the callee, which >> is >> > already an improvement from the existing case. >> > - Allow AsyncGetCallTrace and JFR to unwind stack from nmethod's prolog >> > >> > When sampling hits the prolog of a method, Hotspot assumes it's >> unable >> > to parse the frame. This change allows to parse such frame on x86 by >> > specializing which instruction it's hitting in the prolog. >> > >> > The results are as follows: >> > >> > - Baseline: >> > >> > Flat Profile (by method): >> > (t 60.7,s 60.7) AGCT::Unknown Java[ERR=-5] >> > (t 39.2,s 35.2) Prof2::main >> > (t 1.4,s 1.4) Prof2::lambda$main$3 >> > (t 1.0,s 1.0) Prof2::lambda$main$2 >> > (t 0.9,s 0.9) Prof2::lambda$main$1 >> > (t 0.7,s 0.7) Prof2::lambda$main$0 >> > (t 0.1,s 0.1) AGCT::Unknown not Java[ERR=-3] >> > (t 0.0,s 0.0) java.lang.Thread::exit >> > (t 0.9,s 0.0) Prof2$$Lambda$2.0x0000000800c00c28::get >> > (t 1.0,s 0.0) Prof2$$Lambda$3.0x0000000800c01000::get >> > (t 1.4,s 0.0) Prof2$$Lambda$4.0x0000000800c01220::get >> > (t 0.7,s 0.0) Prof2$$Lambda$1.0x0000000800c00a08::get >> > >> > - With incomplete frame parsing: >> > >> > Flat Profile (by method): >> > (t 39.3,s 39.3) AGCT::Unknown Java[ERR=-5] >> > (t 40.3,s 36.1) Prof2::main >> > (t 6.4,s 5.3) Prof2$$Lambda$28.0x0000000800081000::get >> > (t 6.1,s 5.1) Prof2$$Lambda$29.0x0000000800081220::get >> > (t 6.0,s 5.0) Prof2$$Lambda$27.0x0000000800080c28::get >> > (t 6.1,s 5.0) Prof2$$Lambda$26.0x0000000800080a08::get >> > (t 1.1,s 1.1) Prof2::lambda$main$2 >> > (t 1.1,s 1.1) Prof2::lambda$main$0 >> > (t 1.0,s 1.0) Prof2::lambda$main$1 >> > (t 0.9,s 0.9) Prof2::lambda$main$3 >> > (t 0.1,s 0.1) AGCT::Unknown not Java[ERR=-3] >> > (t 0.0,s 0.0) java.util.Locale::getInstance >> > (t 0.0,s 0.0) AGCT::Not walkable Java[ERR=-6] >> > (t 0.0,s 0.0) >> jdk.internal.loader.BuiltinClassLoader::loadClassOrNull >> > (t 0.0,s 0.0) java.lang.ClassLoader::loadClass >> > (t 0.0,s 0.0) sun.net.util.URLUtil::urlNoFragString >> > (t 0.0,s 0.0) java.lang.Class::forName0 >> > (t 0.0,s 0.0) java.util.Locale::initDefault >> > (t 0.0,s 0.0) >> jdk.internal.loader.BuiltinClassLoader::loadClass >> > (t 0.0,s 0.0) jdk.internal.loader.URLClassPath::getLoader >> > (t 0.0,s 0.0) jdk.internal.loader.URLClassPath::getResource >> > (t 0.0,s 0.0) java.lang.String::toLowerCase >> > (t 0.0,s 0.0) sun.launcher.LauncherHelper::loadMainClass >> > (t 0.0,s 0.0) sun.launcher.LauncherHelper::checkAndLoadMain >> > (t 0.0,s 0.0) java.util.Locale::<clinit> >> > (t 0.0,s 0.0) >> jdk.internal.loader.BuiltinClassLoader::findClassOnClassPathOrNull >> > (t 0.0,s 0.0) >> jdk.internal.loader.ClassLoaders$AppClassLoader::loadClass >> > (t 0.0,s 0.0) java.lang.Class::forName >> > >> > The program is as follows: >> > >> > ``` >> > import java.util.function.Supplier; >> > >> > public class Prof2 { >> > >> > public static void main(String[] args) { >> > var rand = new java.util.Random(0); >> > Supplier[] suppliers = { >> > () -> 0, >> > () -> 1, >> > () -> 2, >> > () -> 3, >> > }; >> > >> > long sum = 0; >> > for (int i = 0; i >= 0; i++) { >> > sum += (int)suppliers[i % suppliers.length].get(); >> > } >> > } >> > } >> > ``` >> > >> > We see that the results are particularely useful in this case as the >> > methods are very short (it only returns an integer), and the >> probability >> > of hitting the prolog is then very high. >> > - Allow AsyncGetCallTrace and JFR to walk a stub frame >> > >> > When the signal sent for AsyncGetCallTrace or JFR would land on a >> stub >> > (like arraycopy), it wouldn't be able to detect the sender (caller) >> > frame because `_cb->frame_size() == 0`. >> > >> > Because we fully control how the prolog and epilog of stub code is >> > generated, we know there are two cases: >> > 1. A stack frame is allocated via macroAssembler->enter(), and >> consists >> > in `push rbp; mov rsp, rbp;`. >> > 2. No stack frames are allocated and rbp is left unchanged and rsp is >> > decremented with the `call` instruction that push the return `pc` on >> the >> > stack. >> > >> > For case 1., we can easily know the sender frame by simply looking at >> > rbp, especially since we know that all stubs preserve the frame >> pointer >> > (on x86 at least). >> > >> > For case 2., we end up returning the sender's sender, but that >> already >> > gives us more information than what we have today. >> > >> > The results are as follows: >> > >> > - Baseline: >> > >> > Flat Profile (by method): >> > (t 99.4,s 99.4) AGCT::Unknown Java[ERR=-5] >> > (t 0.5,s 0.2) Prof1::main >> > (t 0.2,s 0.2) java.lang.AbstractStringBuilder::append >> > (t 0.1,s 0.1) AGCT::Unknown not Java[ERR=-3] >> > (t 0.0,s 0.0) >> java.lang.AbstractStringBuilder::ensureCapacityInternal >> > (t 0.0,s 0.0) java.lang.AbstractStringBuilder::shift >> > (t 0.0,s 0.0) java.lang.String::getBytes >> > (t 0.0,s 0.0) java.lang.AbstractStringBuilder::putStringAt >> > (t 0.0,s 0.0) java.lang.StringBuilder::delete >> > (t 0.2,s 0.0) java.lang.StringBuilder::append >> > (t 0.0,s 0.0) java.lang.AbstractStringBuilder::delete >> > (t 0.0,s 0.0) java.lang.AbstractStringBuilder::putStringAt >> > >> > - With StubRoutinesBlob::FrameParser >> > >> > Flat Profile (by method): >> > (t 98.7,s 98.7) >> java.lang.AbstractStringBuilder::ensureCapacityInternal >> > (t 0.9,s 0.9) java.lang.AbstractStringBuilder::delete >> > (t 99.8,s 0.2) Prof1::main >> > (t 0.1,s 0.1) AGCT::Unknown not Java[ERR=-3] >> > (t 0.0,s 0.0) AGCT::Unknown Java[ERR=-5] >> > (t 98.8,s 0.0) java.lang.AbstractStringBuilder::append >> > (t 98.8,s 0.0) java.lang.StringBuilder::append >> > (t 0.9,s 0.0) java.lang.StringBuilder::delete >> > >> > The program is as follows: >> > >> > ``` >> > public class Prof1 { >> > >> > public static void main(String[] args) { >> > StringBuilder sb = new StringBuilder(); >> > for (int i = 0; i < 1000000; i++) { >> > sb.append("ab"); >> > sb.delete(0, 1); >> > } >> > System.out.println(sb.length()); >> > } >> > } >> > ``` >> > >> > We now account for the arraycopy stub which is called by >> > AbstractStringBuilder::ensureCapacityInternal. It was previously >> ignored >> > because it would not know how to parse the frame for the arraycopy >> stub >> > and would fall in the AGCT::Unknown Java[ERR=-5] section. >> > >> > However, it still isn't perfect since it doesn't point to the >> arraycopy stub >> > directly. >> > - Extract sender frame parsing to CodeBlock::FrameParser >> > >> > Whether and how a frame is setup is controlled by the code generator >> > for the specific CodeBlock. The CodeBlock is then in the best place >> to know how >> > to parse the sender's frame from the current frame in the given >> CodeBlock. >> > >> > This refactoring proposes to extract this parsing out of `frame` and >> into a >> > `CodeBlock::FrameParser`. This FrameParser is then specialized in >> the relevant >> > inherited children of CodeBlock. >> > >> > This change is to largely facilitate adding new supported cases for >> JDK-8252417 >> > like runtime stubs. >> >> Ok. Do you know who should be the primary reviewer? Rickard Bäckman? Nils >> Eliasson? >> >> ------------- >> >> PR: https://git.openjdk.java.net/jdk/pull/4436 >> >
