Hi Yasumasa,
I looked at you changes.
It is hard to understand if this fully solves the issue.
http://cr.openjdk.java.net/~ysuenaga/JDK-8240956/webrev.03/src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/debugger/linux/amd64/LinuxAMD64CFrame.java.frames.html
This fix looks like a hack.
Should we just propagate the Debugging exception instead of trying to maintain unsupportedDwarf flag?
Also, I don't like that DWARF-specific abbreviations (like CIE, IDE, LSDA, etc.) are used without any comments explaining them.
The code has to be generally readable without looking into the DWARF spec each time.
I'm submitting mach5 jobs to make sure the issue has been resolved with your fix.
Thanks,
Serguei
On 3/20/20 17:55, Yasumasa Suenaga wrote:
I looked at you changes.
It is hard to understand if this fully solves the issue.
http://cr.openjdk.java.net/~ysuenaga/JDK-8240956/webrev.03/src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/debugger/linux/amd64/LinuxAMD64CFrame.java.frames.html
@@ -34,10 +34,11 @@ public static LinuxAMD64CFrame getTopFrame(LinuxDebugger dbg, Address rip, ThreadContext context) { Address libptr = dbg.findLibPtrByAddress(rip); Address cfa = context.getRegisterAsAddress(AMD64ThreadContext.RBP); DwarfParser dwarf = null; + boolean unsupportedDwarf = false; if (libptr != null) { // Native frame try { dwarf = new DwarfParser(libptr); dwarf.processDwarf(rip);
@@ -45,24 +46,33 @@ !dwarf.isBPOffsetAvailable()) ? context.getRegisterAsAddress(AMD64ThreadContext.RBP) : context.getRegisterAsAddress(dwarf.getCFARegister()) .addOffsetTo(dwarf.getCFAOffset()); } catch (DebuggerException e) { - // Bail out to Java frame case + if (dwarf != null) { + // DWARF processing should succeed when the frame is native + // but it might fail if CIE has language personality routine + // and/or LSDA. + dwarf = null; + unsupportedDwarf = true; + } else { + throw e; + } } } return (cfa == null) ? null - : new LinuxAMD64CFrame(dbg, cfa, rip, dwarf); + : new LinuxAMD64CFrame(dbg, cfa, rip, dwarf, !unsupportedDwarf); }
@@ -121,13 +131,25 @@ } return isValidFrame(nextCFA, context) ? nextCFA : null; } - private DwarfParser getNextDwarf(Address nextPC) { - DwarfParser nextDwarf = null; + @Override + public CFrame sender(ThreadProxy thread) { + if (!possibleNext) { + return null; + } + + ThreadContext context = thread.getContext(); + + Address nextPC = getNextPC(dwarf != null); + if (nextPC == null) { + return null; + } + DwarfParser nextDwarf = null; + boolean unsupportedDwarf = false; if ((dwarf != null) && dwarf.isIn(nextPC)) { nextDwarf = dwarf; } else { Address libptr = dbg.findLibPtrByAddress(nextPC); if (libptr != null) {
@@ -138,33 +160,29 @@ } } } if (nextDwarf != null) { + try { nextDwarf.processDwarf(nextPC); + } catch (DebuggerException e) { + // DWARF processing should succeed when the frame is native + // but it might fail if CIE has language personality routine + // and/or LSDA. + nextDwarf = null; + unsupportedDwarf = true; }
This fix looks like a hack.
Should we just propagate the Debugging exception instead of trying to maintain unsupportedDwarf flag?
Also, I don't like that DWARF-specific abbreviations (like CIE, IDE, LSDA, etc.) are used without any comments explaining them.
The code has to be generally readable without looking into the DWARF spec each time.
I'm submitting mach5 jobs to make sure the issue has been resolved with your fix.
Thanks,
Serguei
On 3/20/20 17:55, Yasumasa Suenaga wrote:
Thanks Chris!
I'm waiting for reviewers for this change.
Yasumasa
On 2020/03/21 4:23, Chris Plummer wrote:
Hi Yasumasa,
The failure is due to JDK-8231634, so not something you need to worry about.
thanks,
Chris
On 3/20/20 2:45 AM, Yasumasa Suenaga wrote:
Hi Chris,
I uploaded new webrev which includes reverting change for ProblemList:
http://cr.openjdk.java.net/~ysuenaga/JDK-8240956/webrev.03/
I tested it on submit repo (mach5-one-ysuenaga-JDK-8240956-2-20200320-0814-9586301),
but it has failed in ClhsdbJstackXcompStress.java.
However I think it is not caused by this change because ClhsdbJstackXcompStress.java tests `jhsdb jstack`, not mixed mode, it would not parse DWARF.
Thanks,
Yasumasa
On 2020/03/20 13:55, Chris Plummer wrote:
Hi Yasumasa,
The test has been problem listed so please add undoing this to your webrev. Here's the diff that problem listed it:
diff --git a/test/hotspot/jtreg/ProblemList.txt b/test/hotspot/jtreg/ProblemList.txt
--- a/test/hotspot/jtreg/ProblemList.txt
+++ b/test/hotspot/jtreg/ProblemList.txt
@@ -115,7 +115,7 @@
serviceability/sa/ClhsdbPrintAll.java 8193639 solaris-all
serviceability/sa/ClhsdbPrintAs.java 8193639 solaris-all
serviceability/sa/ClhsdbPrintStatics.java 8193639 solaris-all
-serviceability/sa/ClhsdbPstack.java 8193639 solaris-all
+serviceability/sa/ClhsdbPstack.java 8193639,8240956 solaris-all,linux-all
serviceability/sa/ClhsdbRegionDetailsScanOopsForG1.java 8193639 solaris-all
serviceability/sa/ClhsdbScanOops.java 8193639,8235220,8230731 solaris-all,linux-x64,macosx-x64,windows-x64
serviceability/sa/ClhsdbSource.java 8193639 solaris-all
thanks,
Chris
On 3/16/20 5:07 AM, Yasumasa Suenaga wrote:
Hi all,
This webrev has passed submit repo (mach5-one-ysuenaga-JDK-8240956-20200316-0924-9487169) and additional tests.
So please review it:
JBS: https://bugs.openjdk.java.net/browse/JDK-8240956
webrev: http://cr.openjdk.java.net/~ysuenaga/JDK-8240956/webrev.02/
Thanks,
Yasumasa
On 2020/03/16 21:03, Yasumasa Suenaga wrote:
Thank you so much, David!
Yasumasa
On 2020/03/16 21:01, David Holmes wrote:
On 16/03/2020 9:46 pm, David Holmes wrote:
On 16/03/2020 7:20 pm, Yasumasa Suenaga wrote:
Hi David,
I missed loop condition, so I fixed it and pushed to submit repo.
Could you try again?
http://hg.openjdk.java.net/jdk/submit/rev/9c148df17f23
webrev is here:
http://cr.openjdk.java.net/~ysuenaga/JDK-8240956/webrev.02/
Test job resubmitted. Will advise results if it completes before I go to bed :)
Seems to have passed okay.
David
David
Thanks a lot!
Yasumasa
On 2020/03/16 16:17, David Holmes wrote:
Sorry it is still crashing.
#
# A fatal error has been detected by the Java Runtime Environment:
#
# SIGSEGV (0xb) at pc=0x00007f98ec01e94e, pid=13702, tid=13704
#
# JRE version: Java(TM) SE Runtime Environment (15.0) (fastdebug build 15-internal+0-2020-03-16-0640217.suenaga.source)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (fastdebug 15-internal+0-2020-03-16-0640217.suenaga.source, mixed mode, sharing, tiered, compressed oops, g1 gc, linux-amd64)
# Problematic frame:
# C [libsaproc.so+0x494e] DwarfParser::process_dwarf(unsigned long)+0x4e
#
Same as before.
David
-----
On 16/03/2020 4:57 pm, David Holmes wrote:
On 16/03/2020 4:51 pm, Yasumasa Suenaga wrote:
On 2020/03/16 15:43, Chris Plummer wrote:
BTW, if you submit it to the submit repo, we can then go and run additional internal tests (and even more builds) using that job.
Thanks for that tip Chris!
I've pushed the change to submit repo, but I've not yet received the result.
I will share you when I get job ID.
We can see the id. Just need to wait for the builds to complete before submitting the additional tests.
David
Yasumasa
Chris
On 3/15/20 11:36 PM, Yasumasa Suenaga wrote:
Hi David,
Thank you for testing it.
I updated webrev to avoid bailout to Java frame when DWARF has language personality routine or LSDA.
Could you try it?
http://cr.openjdk.java.net/~ysuenaga/JDK-8240956/webrev.01/
It works well on my Fedora 31 and Oracle Linux 7.7 .
I've pushed it to submit repo.
Diff from webrev.00 is here:
http://hg.openjdk.java.net/jdk/submit/rev/6f11cd275652
Thanks,
Yasumasa
On 2020/03/16 13:12, David Holmes wrote:
Correction ...
On 16/03/2020 12:53 pm, David Holmes wrote:
On 16/03/2020 12:17 pm, David Holmes wrote:
Hi Yasumasa,
I can't review this as I know nothing about the code, but I'm putting the patch through our internal testing.
Sorry but the crashes still exist:
#
# A fatal error has been detected by the Java Runtime Environment:
#
# SIGSEGV (0xb) at pc=0x00007fcdd403894e, pid=16948, tid=16949
#
# JRE version: Java(TM) SE Runtime Environment (15.0) (fastdebug build 15-internal+0-2020-03-16-0214474.david.holmes.jdk-dev)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (fastdebug 15-internal+0-2020-03-16-0214474.david.holmes.jdk-dev, mixed mode, sharing, tiered, compressed oops, g1 gc, linux-amd64)
# Problematic frame:
# C [libsaproc.so+0x494e] DwarfParser::process_dwarf(unsigned long)+0x4e
in fact they seem worse as the test seems to always crash now.
Not worse - sorry. I see 6 failures out of 119 runs of the test in linux-x64. I don't see a pattern as to where it fails versus passes.
It doesn't fail for me locally.
David
David
David
On 14/03/2020 11:35 am, Yasumasa Suenaga wrote:
Hi all,
Please review this change:
JBS: https://bugs.openjdk.java.net/browse/JDK-8240956
webrev: http://cr.openjdk.java.net/~ysuenaga/JDK-8240956/webrev.00/
JDK-8234624 introduced DWARF parser in SA for unwinding native frames in jstack mixed mode.
However some error has seen intermittently after that.
I investigated the cause of this, I found two concerns:
A: lack of buffer (.eh_frame section data) range check
B: Language personality routine and Language Specific Data Area (LSDA) are not considered
I addd range check for .eh_frame processing, and ignore personality routine and LSDA in this webrev.
Also I added bailout code if DWARF processing is failed due to these concerns.
This change has passed all tests on submit repo (mach5-one-ysuenaga-JDK-8240956-20200313-1518-9434671),
also I tested it on my Fedora 31 box and Oracle Linux 7.7 container.
Thanks,
Yasumasa