Hi Yasumasa,

Oops, didn't catch this - I also had done some manual testing and in mach5 but clearly not enough.

Generally I think this looks good.

"lastFrame" can mean last as in final, or last as in previous. "last" is one of those annoying English words.  Here it means final, if we get an Exception during processDwarf, use this to flag that we should return null from sender().  "finalFrame" would be clearer to me, anything else probably gets more verbose than you wanted.

Yes I like having the limit on the while loop in process_dwarf(), always worried how sane the information is that we are parsing through.

Thanks!
Kevin


On 24/03/2020 23:47, Yasumasa Suenaga wrote:
Thanks Serguei!

I will push it when I get second reviewer.


Yasumasa


On 2020/03/25 1:39, serguei.spit...@oracle.com wrote:
Hi Yasumasa,

I'm okay with this update.
My mach5 test run for this patch is passed.

Thanks,
Serguei


On 3/23/20 17:08, Yasumasa Suenaga wrote:
Hi Serguei,

Thanks for your comment!
I uploaded new webrev:

  http://cr.openjdk.java.net/~ysuenaga/JDK-8240956/webrev.04/

Also I pushed it to submit repo:

  http://hg.openjdk.java.net/jdk/submit/rev/fade6a949bd1

On 2020/03/24 7:39, serguei.spit...@oracle.com wrote:
Hi Yasumasa,

The mach5 tier5 testing looks good.
The serviceability/sa/ClhsdbPstack.java is failed without fix and is not failed with it.

Thanks,
Serguei


On 3/23/20 10:18, serguei.spit...@oracle.com wrote:
Hi Yasumasa,

I looked at you changes.
It is hard to understand if this fully solves the issue.

http://cr.openjdk.java.net/~ysuenaga/JDK-8240956/webrev.03/src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/debugger/linux/amd64/LinuxAMD64CFrame.java.frames.html

@@ -34,10 +34,11 @@
       public static LinuxAMD64CFrame getTopFrame(LinuxDebugger dbg, Address rip, ThreadContext context) {
        Address libptr = dbg.findLibPtrByAddress(rip);
        Address cfa = context.getRegisterAsAddress(AMD64ThreadContext.RBP);
        DwarfParser dwarf = null;
+ boolean unsupportedDwarf = false;
          if (libptr != null) { // Native frame
          try {
            dwarf = new DwarfParser(libptr);
            dwarf.processDwarf(rip);
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
@@ -45,24 +46,33 @@
                   !dwarf.isBPOffsetAvailable())
                      ? context.getRegisterAsAddress(AMD64ThreadContext.RBP)                       : context.getRegisterAsAddress(dwarf.getCFARegister())
.addOffsetTo(dwarf.getCFAOffset());
          } catch (DebuggerException e) {
- // Bail out to Java frame case
+ if (dwarf != null) {
+ // DWARF processing should succeed when the frame is native
+ // but it might fail if CIE has language personality routine
+ // and/or LSDA.
+ dwarf = null;
+ unsupportedDwarf = true;
+ } else {
+ throw e;
+ }
          }
        }
          return (cfa == null) ? null
- : new LinuxAMD64CFrame(dbg, cfa, rip, dwarf);
+ : new LinuxAMD64CFrame(dbg, cfa, rip, dwarf, !unsupportedDwarf);
     }

@@ -121,13 +131,25 @@
       }
         return isValidFrame(nextCFA, context) ? nextCFA : null;
     }
  - private DwarfParser getNextDwarf(Address nextPC) {
- DwarfParser nextDwarf = null;
+ @Override
+ public CFrame sender(ThreadProxy thread) {
+ if (!possibleNext) {
+ return null;
+ }
+
+ ThreadContext context = thread.getContext();
+
+ Address nextPC = getNextPC(dwarf != null);
+ if (nextPC == null) {
+ return null;
+ }
  + DwarfParser nextDwarf = null;
+ boolean unsupportedDwarf = false;
       if ((dwarf != null) && dwarf.isIn(nextPC)) {
         nextDwarf = dwarf;
       } else {
         Address libptr = dbg.findLibPtrByAddress(nextPC);
         if (libptr != null) {
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
@@ -138,33 +160,29 @@
           }
         }
       }
         if (nextDwarf != null) {
+ try {
         nextDwarf.processDwarf(nextPC);
+ } catch (DebuggerException e) {
+ // DWARF processing should succeed when the frame is native
+ // but it might fail if CIE has language personality routine
+ // and/or LSDA.
+ nextDwarf = null;
+ unsupportedDwarf = true;
       }

This fix looks like a hack.
Should we just propagate the Debugging exception instead of trying to maintain unsupportedDwarf flag?

DwarfParser::processDwarf would throw DebuggerException if it cannot find DWARF which relates to PC. PC at this point is for next frame. So current frame (`this` object) is valid, and it should be processed.


Also, I don't like that DWARF-specific abbreviations (like CIE, IDE,LSDA, etc.) are used without any comments explaining them. The code has to be generally readable without looking into the DWARF spec each time.

I added comments for them in this webrev.


Thanks,

Yasumasa


I'm submitting mach5 jobs to make sure the issue has been resolved with your fix.

Thanks,
Serguei


On 3/20/20 17:55, Yasumasa Suenaga wrote:
Thanks Chris!
I'm waiting for reviewers for this change.


Yasumasa


On 2020/03/21 4:23, Chris Plummer wrote:
Hi Yasumasa,

The failure is due to JDK-8231634, so not something you need to worry about.

thanks,

Chris

On 3/20/20 2:45 AM, Yasumasa Suenaga wrote:
Hi Chris,

I uploaded new webrev which includes reverting change for ProblemList:

http://cr.openjdk.java.net/~ysuenaga/JDK-8240956/webrev.03/

I tested it on submit repo (mach5-one-ysuenaga-JDK-8240956-2-20200320-0814-9586301),
but it has failed in ClhsdbJstackXcompStress.java.
However I think it is not caused by this change because ClhsdbJstackXcompStress.java tests `jhsdb jstack`, not mixed mode, it would not parse DWARF.


Thanks,

Yasumasa


On 2020/03/20 13:55, Chris Plummer wrote:
Hi Yasumasa,

The test has been problem listed so please add undoing this to your webrev. Here's the diff that problem listed it:

diff --git a/test/hotspot/jtreg/ProblemList.txt b/test/hotspot/jtreg/ProblemList.txt
--- a/test/hotspot/jtreg/ProblemList.txt
+++ b/test/hotspot/jtreg/ProblemList.txt
@@ -115,7 +115,7 @@
  serviceability/sa/ClhsdbPrintAll.java 8193639 solaris-all
  serviceability/sa/ClhsdbPrintAs.java 8193639 solaris-all
  serviceability/sa/ClhsdbPrintStatics.java 8193639 solaris-all
-serviceability/sa/ClhsdbPstack.java 8193639 solaris-all
+serviceability/sa/ClhsdbPstack.java 8193639,8240956 solaris-all,linux-all   serviceability/sa/ClhsdbRegionDetailsScanOopsForG1.java 8193639 solaris-all   serviceability/sa/ClhsdbScanOops.java 8193639,8235220,8230731 solaris-all,linux-x64,macosx-x64,windows-x64
  serviceability/sa/ClhsdbSource.java 8193639 solaris-all

thanks,

Chris

On 3/16/20 5:07 AM, Yasumasa Suenaga wrote:
Hi all,

This webrev has passed submit repo (mach5-one-ysuenaga-JDK-8240956-20200316-0924-9487169) and additional tests.
So please review it:

  JBS: https://bugs.openjdk.java.net/browse/JDK-8240956
  webrev: http://cr.openjdk.java.net/~ysuenaga/JDK-8240956/webrev.02/


Thanks,

Yasumasa


On 2020/03/16 21:03, Yasumasa Suenaga wrote:
Thank you so much, David!

Yasumasa


On 2020/03/16 21:01, David Holmes wrote:
On 16/03/2020 9:46 pm, David Holmes wrote:
On 16/03/2020 7:20 pm, Yasumasa Suenaga wrote:
Hi David,

I missed loop condition, so I fixed it and pushed to submit repo.
Could you try again?

http://hg.openjdk.java.net/jdk/submit/rev/9c148df17f23

webrev is here:

http://cr.openjdk.java.net/~ysuenaga/JDK-8240956/webrev.02/

Test job resubmitted. Will advise results if it completes before I go to bed :)

Seems to have passed okay.

David

David


Thanks a lot!

Yasumasa


On 2020/03/16 16:17, David Holmes wrote:
Sorry it is still crashing.

#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x00007f98ec01e94e, pid=13702, tid=13704
#
# JRE version: Java(TM) SE Runtime Environment (15.0) (fastdebug build 15-internal+0-2020-03-16-0640217.suenaga.source) # Java VM: Java HotSpot(TM) 64-Bit Server VM (fastdebug 15-internal+0-2020-03-16-0640217.suenaga.source, mixed mode, sharing, tiered, compressed oops, g1 gc, linux-amd64)
# Problematic frame:
# C  [libsaproc.so+0x494e] DwarfParser::process_dwarf(unsigned long)+0x4e
#

Same as before.

David
-----

On 16/03/2020 4:57 pm, David Holmes wrote:
On 16/03/2020 4:51 pm, Yasumasa Suenaga wrote:
On 2020/03/16 15:43, Chris Plummer wrote:
BTW, if you submit it to the submit repo, we can then go and run additional internal tests (and even more builds) using that job.

Thanks for that tip Chris!

I've pushed the change to submit repo, but I've not yet received the result.
I will share you when I get job ID.

We can see the id. Just need to wait for the builds to complete before submitting the additional tests.

David

Yasumasa

Chris

On 3/15/20 11:36 PM, Yasumasa Suenaga wrote:
Hi David,

Thank you for testing it.

I updated webrev to avoid bailout to Java frame when DWARF has language personality routine or LSDA.
Could you try it?

http://cr.openjdk.java.net/~ysuenaga/JDK-8240956/webrev.01/

It works well on my Fedora 31 and Oracle Linux 7.7 .
I've pushed it to submit repo.

Diff from webrev.00 is here:
http://hg.openjdk.java.net/jdk/submit/rev/6f11cd275652


Thanks,

Yasumasa


On 2020/03/16 13:12, David Holmes wrote:
Correction ...

On 16/03/2020 12:53 pm, David Holmes wrote:
On 16/03/2020 12:17 pm, David Holmes wrote:
Hi Yasumasa,

I can't review this as I know nothing about the code, but I'm putting the patch through our internal testing.

Sorry but the crashes still exist:

#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x00007fcdd403894e, pid=16948, tid=16949
#
# JRE version: Java(TM) SE Runtime Environment (15.0) (fastdebug build 15-internal+0-2020-03-16-0214474.david.holmes.jdk-dev) # Java VM: Java HotSpot(TM) 64-Bit Server VM (fastdebug 15-internal+0-2020-03-16-0214474.david.holmes.jdk-dev, mixed mode, sharing, tiered, compressed oops, g1 gc, linux-amd64)
# Problematic frame:
# C  [libsaproc.so+0x494e] DwarfParser::process_dwarf(unsigned long)+0x4e

in fact they seem worse as the test seems to always crash now.

Not worse - sorry. I see 6 failures out of 119 runs of the test in linux-x64. I don't see a pattern as to where it fails versus passes.

It doesn't fail for me locally.

David

David

David

On 14/03/2020 11:35 am, Yasumasa Suenaga wrote:
Hi all,

Please review this change:

   JBS: https://bugs.openjdk.java.net/browse/JDK-8240956    webrev: http://cr.openjdk.java.net/~ysuenaga/JDK-8240956/webrev.00/

JDK-8234624 introduced DWARF parser in SA for unwinding native frames in jstack mixed mode. However some error has seen intermittently after that.

I investigated the cause of this, I found two concerns:

   A: lack of buffer (.eh_frame section data) range check    B: Language personality routine and Language Specific Data Area (LSDA) are not considered

I addd range check for .eh_frame processing, and ignore personality routine and LSDA in this webrev. Also I added bailout code if DWARF processing is failed due to these concerns.

This change has passed all tests on submit repo (mach5-one-ysuenaga-JDK-8240956-20200313-1518-9434671), also I tested it on my Fedora 31 box and Oracle Linux 7.7 container.


Thanks,

Yasumasa











Reply via email to