On Wed, 17 Feb 2021 06:46:27 GMT, Yasumasa Suenaga wrote:

>> If you run ClhsdbPmap.java, you can see pmap output for both core and live 
>> processes. The sizes of the maps are very large for both of them, and 
>> actually a bit bigger with the live process. Here's the output for a live 
>> process:
>> 
>> 0x000014755360c000   4048K   /usr/lib64/libnss_sss.so.2
>> 0x0000147553815000   4012K   /usr/lib64/libnss_files-2.17.so
>> 0x0000147560208000   4064K   /usr/lib64/libm-2.17.so
>> 0x000014756050a000   3032K   /usr/lib64/librt-2.17.so
>> 0x0000147560712000   32892K  
>> /scratch/cplummer/ws/jdk/jdk.clean/build/linux-x64-debug/images/jdk/lib/server/libjvm.so
>> 0x0000147562731000   4924K   /usr/lib64/libc-2.17.so
>> 0x0000147562aff000   3076K   /usr/lib64/libdl-2.17.so
>> 0x0000147562d03000   3060K   /usr/lib64/libpthread-2.17.so
>> 0x0000147562f1f000   2948K   /usr/lib64/libz.so.1.2.7
>> 0x0000147563135000   2860K   /usr/lib64/ld-2.17.so
>> 0x0000147563164000   92K     
>> /scratch/cplummer/ws/jdk/jdk.clean/build/linux-x64-debug/images/jdk/lib/libnet.so
>> 0x000014756317b000   80K     
>> /scratch/cplummer/ws/jdk/jdk.clean/build/linux-x64-debug/images/jdk/lib/libnio.so
>> 0x00001475631e0000   156K    
>> /scratch/cplummer/ws/jdk/jdk.clean/build/linux-x64-debug/images/jdk/lib/libjava.so
>> 0x0000147563207000   128K    
>> /scratch/cplummer/ws/jdk/jdk.clean/build/linux-x64-debug/images/jdk/lib/libjimage.so
>> 0x000014756332c000   68K     
>> /scratch/cplummer/ws/jdk/jdk.clean/build/linux-x64-debug/images/jdk/lib/libjli.so
>> 0x0000563c950bf000   16K     
>> /scratch/cplummer/ws/jdk/jdk.clean/build/linux-x64-debug/images/jdk/bin/java
>> 
>> `/usr/lib64/libnss_files-2.17.so` is the one that turned up in the test 
>> failure. It's only a 68k file but has a 4012k map. It's second in the list. 
>> I'm not sure if this is the order we would always see on linux systems. My 
>> assumption was it was the library at the highest address that was causing 
>> the problem, and that the interpreter was located right after it, but that 
>> might not be the case.
>> 
>> The address in the interpreter that we are doing findpc on turned up at 
>> `libnss_files.so.2 + 0x21b116`, or at an offset of 2200k. I added a "pmap" 
>> command to ClhsdbFindPC, and from my test runs the interpreter seemed to 
>> always be just before the first library. However, maybe on some systems it is 
>> intermixed with the libraries.
>
> I pushed a new change to use `ELF_PHDR.p_filesz` instead of `p_memsz`. It 
> almost works fine, but it is not a perfect solution.
> For example, consider libnss_sss (as provided by Fedora 33): 
> `/proc/<PID>/maps` shows libnss_sss as follows. There are 5 segments.
> 
> 7f0ba6ec5000-7f0ba6ec7000 r--p 00000000 08:03 340133                     
> /usr/lib64/libnss_sss.so.2
> 7f0ba6ec7000-7f0ba6ece000 r-xp 00002000 08:03 340133                     
> /usr/lib64/libnss_sss.so.2
> 7f0ba6ece000-7f0ba6ed0000 r--p 00009000 08:03 340133                     
> /usr/lib64/libnss_sss.so.2
> 7f0ba6ed0000-7f0ba6ed1000 r--p 0000a000 08:03 340133                     
> /usr/lib64/libnss_sss.so.2
> 7f0ba6ed1000-7f0ba6ed2000 rw-p 0000b000 08:03 340133                     
> /usr/lib64/libnss_sss.so.2
> 
> However I could see only 4 segments in libnss_sss.so when I ran `readelf -l 
> /usr/lib64/libnss_sss.so.2`:
> 
> Program Headers:
>   Type           Offset             VirtAddr           PhysAddr
>                  FileSiz            MemSiz              Flags  Align
>   LOAD           0x0000000000000000 0x0000000000000000 0x0000000000000000
>                  0x0000000000001468 0x0000000000001468  R      0x1000
>   LOAD           0x0000000000002000 0x0000000000002000 0x0000000000002000
>                  0x0000000000006931 0x0000000000006931  R E    0x1000
>   LOAD           0x0000000000009000 0x0000000000009000 0x0000000000009000
>                  0x0000000000001110 0x0000000000001110  R      0x1000
>   LOAD           0x000000000000ac78 0x000000000000bc78 0x000000000000bc78
>                  0x000000000000044c 0x0000000000000658  RW     0x1000
> 
> The Linux kernel seems to split the final segment (0xbc78) into RO and RW 
> segments when it loads the shared library (but I'm not sure).
> 
> I think we need to refactor handling shared libraries in other ways.
> 
> For a live process, we can use `/proc/<PID>/maps`.
> For a coredump, we can use `NT_FILE` in the note section of the core file. It 
> has valid segments, as shown below.
> 
> $ readelf -n core
>   :
>     0x00007f0ba6ec5000  0x00007f0ba6ec7000  0x0000000000000000
> 
>     0x00007f0ba6ec7000  0x00007f0ba6ece000  0x0000000000000002
> 
>     0x00007f0ba6ece000  0x00007f0ba6ed0000  0x0000000000000009
> 
>     0x00007f0ba6ed0000  0x00007f0ba6ed1000  0x000000000000000a
> 
>     0x00007f0ba6ed1000  0x00007f0ba6ed2000  0x000000000000000b
> 
> 
> But that would be a big change to SA.
> As an option, we can integrate this change first, and then refactor 
> afterwards.
> What do you think?
> (I want to resolve this problem with a smaller fix if I can, of course, so 
> other solutions are welcome)
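
For reference, the `NT_FILE` idea quoted above can be sketched in a few lines of Python. This is only an illustration, not SA code: `parse_nt_file` is a hypothetical helper, the descriptor is a synthetic one built from the five libnss_sss entries in the quote, a 64-bit little-endian core is assumed, and the 0x1000 page size is an assumption because the `readelf -n` output above elides that part of the note. The layout (count, page size, then start/end/page-offset triples, then NUL-terminated file names) follows the format comment in the Linux kernel's `fs/binfmt_elf.c`.

```python
import struct

def parse_nt_file(desc: bytes):
    """Decode an NT_FILE note descriptor (64-bit little-endian assumed).

    Returns a list of (start, end, file_offset_in_bytes, path) tuples.
    """
    count, page_size = struct.unpack_from("<QQ", desc, 0)
    pos = 16
    ranges = []
    for _ in range(count):
        start, end, page_ofs = struct.unpack_from("<QQQ", desc, pos)
        # The note stores the file offset in units of page_size.
        ranges.append((start, end, page_ofs * page_size))
        pos += 24
    # The file names follow as `count` NUL-terminated strings.
    names = desc[pos:].split(b"\0")[:count]
    return [(s, e, o, n.decode()) for (s, e, o), n in zip(ranges, names)]

# Synthetic descriptor built from the libnss_sss entries quoted above.
entries = [
    (0x7f0ba6ec5000, 0x7f0ba6ec7000, 0x0),
    (0x7f0ba6ec7000, 0x7f0ba6ece000, 0x2),
    (0x7f0ba6ece000, 0x7f0ba6ed0000, 0x9),
    (0x7f0ba6ed0000, 0x7f0ba6ed1000, 0xa),
    (0x7f0ba6ed1000, 0x7f0ba6ed2000, 0xb),
]
PAGE_SIZE = 0x1000  # assumption: not shown in the quoted readelf output
desc = struct.pack("<QQ", len(entries), PAGE_SIZE)
desc += b"".join(struct.pack("<QQQ", *e) for e in entries)
desc += b"/usr/lib64/libnss_sss.so.2\0" * len(entries)

segments = parse_nt_file(desc)
for start, end, ofs, path in segments:
    print(f"{start:#x}-{end:#x} {ofs:#010x} {path}")

# Total mapped size of the library across all of its segments.
total = sum(end - start for start, end, _, _ in segments)
print(f"total: {total // 1024}K")
```

Under the 0x1000 page-size assumption, the decoded byte offsets line up with the offset column of the `/proc/<PID>/maps` listing in the quote, which suggests that the live-process and core-file sources would describe the same five segments.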

@YaSuenag I asked Dan to run a modified `ClhsdbFindPC` that also issues a 
`clhsdb pmap` command so we can see what the maps look like, and compare them 
to the address being looked up. This is before your latest fix, so the sizes 
are still too big, but that's ok for this analysis. First, this is the 
`findpc` command that was supposed to show the address in the interpreter:

hsdb> + findpc 0x00002ab36ca942b6
Address 0x00002ab36ca942b6: /lib/x86_64-linux-gnu/libnss_files.so.2 + 0x21b2b6

And here's the pmap output. I had to manually sort by address, and I also 
added the location of the interpreter address being looked up.

0x00005652c8fd0000      16K     <jdkdir>/jdk/bin/java
0x00002ab3692ae000      3400K   /lib64/ld-linux-x86-64.so.2
0x00002ab3692e0000      12K     
<jdkdir>/test/hotspot/jtreg/native/libLingeredApp.so
0x00002ab3692ed000      84K     <jdkdir>/jdk/bin/../lib/libjli.so
0x00002ab369406000      144K    <jdkdir>/jdk/lib/libjimage.so
0x00002ab36942a000      200K    <jdkdir>/jdk/lib/libjava.so
0x00002ab3694bc000      88K     <jdkdir>/jdk/lib/libnio.so
0x00002ab3694d6000      3240K   /lib/x86_64-linux-gnu/libz.so.1
0x00002ab3696f0000      3136K   /lib/x86_64-linux-gnu/libpthread.so.0
0x00002ab36990d000      3020K   /lib/x86_64-linux-gnu/libdl.so.2
0x00002ab369b11000      5052K   /lib/x86_64-linux-gnu/libc.so.6
0x00002ab369edb000      31100K  <jdkdir>/jdk/lib/server/libjvm.so
0x00002ab36bd3a000      2840K   /lib/x86_64-linux-gnu/librt.so.1
0x00002ab36bf42000      4856K   /lib/x86_64-linux-gnu/libm.so.6
0x00002ab36c24b000      3796K   /lib/x86_64-linux-gnu/libnss_compat.so.2
0x00002ab36c454000      3760K   /lib/x86_64-linux-gnu/libnsl.so.1
0x00002ab36c66d000      3660K   /lib/x86_64-linux-gnu/libnss_nis.so.2
0x00002ab36c879000      3612K   /lib/x86_64-linux-gnu/libnss_files.so.2
0x00002ab36ca942b6: /lib/x86_64-linux-gnu/libnss_files.so.2 + 0x21b2b6
0x00002ab38fc08000      112K    <jdkdir>/jdk/lib/libnet.so
0x00002ab38fc55000      3756K   /usr/lib/x86_64-linux-gnu/libstdc++.so.6
0x00002ab3bc000000      4096K   /lib/x86_64-linux-gnu/libgcc_s.so.1

There appears to be a very large gap between `libnss_files.so.2` and 
`libnet.so` (about 590 MB), so I assume a lot of hotspot memory allocations 
are located in this area, including the interpreter.
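
As a sanity check on the numbers above, here is a small Python sketch of the range lookup. It is not how SA implements `findpc`; `find_lib` is a made-up helper, and the two map entries are simply copied from the pmap output (with the sizes that are known to be too big).

```python
# (base address, reported size in KiB, name), copied from the pmap output.
maps = [
    (0x00002ab36c879000, 3612, "libnss_files.so.2"),
    (0x00002ab38fc08000, 112, "libnet.so"),
]
pc = 0x00002ab36ca942b6  # interpreter address passed to findpc

def find_lib(addr, maps):
    """Return (name, offset) for the first map whose range contains addr."""
    for base, size_kib, name in maps:
        if base <= addr < base + size_kib * 1024:
            return name, addr - base
    return None

hit = find_lib(pc, maps)
print(hit[0], hex(hit[1]))

# Gap between the end of the (oversized) libnss_files map and libnet.so.
nss_base, nss_kib, _ = maps[0]
gap = maps[1][0] - (nss_base + nss_kib * 1024)
print(f"gap: {gap / 2**20:.0f} MiB")
```

The lookup reproduces the `+ 0x21b2b6` offset from the `findpc` output: the address is about 2.1 MiB past the library base, which is inside the 3612K map only because that size is inflated. The gap should come out near 560 MiB measured from the end of the oversized map.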

-------------

PR: https://git.openjdk.java.net/jdk/pull/2563
