On Wed, 17 Feb 2021 06:46:27 GMT, Yasumasa Suenaga <[email protected]> wrote:

>> If you run ClhsdbPmap.java, you can see pmap output for both core and live 
>> processes. The sizes of the maps are very large for both of them, and 
>> actually a bit bigger with the live process. Here's the output for a live 
>> process:
>> 
>> 0x000014755360c000   4048K   /usr/lib64/libnss_sss.so.2
>> 0x0000147553815000   4012K   /usr/lib64/libnss_files-2.17.so
>> 0x0000147560208000   4064K   /usr/lib64/libm-2.17.so
>> 0x000014756050a000   3032K   /usr/lib64/librt-2.17.so
>> 0x0000147560712000   32892K  /scratch/cplummer/ws/jdk/jdk.clean/build/linux-x64-debug/images/jdk/lib/server/libjvm.so
>> 0x0000147562731000   4924K   /usr/lib64/libc-2.17.so
>> 0x0000147562aff000   3076K   /usr/lib64/libdl-2.17.so
>> 0x0000147562d03000   3060K   /usr/lib64/libpthread-2.17.so
>> 0x0000147562f1f000   2948K   /usr/lib64/libz.so.1.2.7
>> 0x0000147563135000   2860K   /usr/lib64/ld-2.17.so
>> 0x0000147563164000   92K     /scratch/cplummer/ws/jdk/jdk.clean/build/linux-x64-debug/images/jdk/lib/libnet.so
>> 0x000014756317b000   80K     /scratch/cplummer/ws/jdk/jdk.clean/build/linux-x64-debug/images/jdk/lib/libnio.so
>> 0x00001475631e0000   156K    /scratch/cplummer/ws/jdk/jdk.clean/build/linux-x64-debug/images/jdk/lib/libjava.so
>> 0x0000147563207000   128K    /scratch/cplummer/ws/jdk/jdk.clean/build/linux-x64-debug/images/jdk/lib/libjimage.so
>> 0x000014756332c000   68K     /scratch/cplummer/ws/jdk/jdk.clean/build/linux-x64-debug/images/jdk/lib/libjli.so
>> 0x0000563c950bf000   16K     /scratch/cplummer/ws/jdk/jdk.clean/build/linux-x64-debug/images/jdk/bin/java
>> 
>> `/usr/lib64/libnss_files-2.17.so` is the one that turned up in the test 
>> failure. It's only a 68k file but has a 4064k map. It's second in the list. 
>> I'm not sure if this is the order we would always see on linux systems. My 
>> assumption was it was the library at the highest address that was causing 
>> the problem, and that the interpreter was located right after it, but that 
>> might not be the case.
>> 
>> The address in the interpreter that we are doing findpc on turned up at 
>> `libnss_files.so.2 + 0x21b116`, or at an offset of 2200k. I added a "pmap" 
>> command to ClhsdbFindPC, and from my test runs the interpreter seemed to 
>> always be just before the first library. However, maybe on some systems it is 
>> intermixed with the libraries.
>
> I pushed a new change that uses `ELF_PHDR.p_filesz` instead of `p_memsz`. It 
> almost works, but it is not a perfect solution.
> For example, consider libnss_sss (provided by Fedora 33). 
> `/proc/<PID>/maps` shows it as follows. There are 5 segments.
> 
> 7f0ba6ec5000-7f0ba6ec7000 r--p 00000000 08:03 340133                     /usr/lib64/libnss_sss.so.2
> 7f0ba6ec7000-7f0ba6ece000 r-xp 00002000 08:03 340133                     /usr/lib64/libnss_sss.so.2
> 7f0ba6ece000-7f0ba6ed0000 r--p 00009000 08:03 340133                     /usr/lib64/libnss_sss.so.2
> 7f0ba6ed0000-7f0ba6ed1000 r--p 0000a000 08:03 340133                     /usr/lib64/libnss_sss.so.2
> 7f0ba6ed1000-7f0ba6ed2000 rw-p 0000b000 08:03 340133                     /usr/lib64/libnss_sss.so.2
> 
> However, I could see only 4 LOAD segments in libnss_sss.so when I ran 
> `readelf -l /usr/lib64/libnss_sss.so.2`:
> 
> Program Headers:
>   Type           Offset             VirtAddr           PhysAddr
>                  FileSiz            MemSiz              Flags  Align
>   LOAD           0x0000000000000000 0x0000000000000000 0x0000000000000000
>                  0x0000000000001468 0x0000000000001468  R      0x1000
>   LOAD           0x0000000000002000 0x0000000000002000 0x0000000000002000
>                  0x0000000000006931 0x0000000000006931  R E    0x1000
>   LOAD           0x0000000000009000 0x0000000000009000 0x0000000000009000
>                  0x0000000000001110 0x0000000000001110  R      0x1000
>   LOAD           0x000000000000ac78 0x000000000000bc78 0x000000000000bc78
>                  0x000000000000044c 0x0000000000000658  RW     0x1000
> 
> The Linux kernel seems to split the final segment (0xbc78) into RO and RW 
> mappings when it loads the shared library (but I'm not sure).
> 
> I think we need to refactor how SA handles shared libraries.
> 
> For a live process, we can use `/proc/<PID>/maps`.
> For a coredump, we can use `NT_FILE` in the note section of the core file. 
> It has the valid segments, as below.
> 
> $ readelf -n core
>   :
>     0x00007f0ba6ec5000  0x00007f0ba6ec7000  0x0000000000000000
> 
>     0x00007f0ba6ec7000  0x00007f0ba6ece000  0x0000000000000002
> 
>     0x00007f0ba6ece000  0x00007f0ba6ed0000  0x0000000000000009
> 
>     0x00007f0ba6ed0000  0x00007f0ba6ed1000  0x000000000000000a
> 
>     0x00007f0ba6ed1000  0x00007f0ba6ed2000  0x000000000000000b
> 
> 
> But that would be a big change to SA.
> As an option, we can integrate this change first and refactor afterwards.
> What do you think?
> (Of course, I would prefer to resolve this problem with a smaller fix if I 
> can, so other solutions are welcome.)

@YaSuenag https://bugs.openjdk.java.net/browse/JDK-8250826 is the bug I was 
thinking of that sounds like the RO/RW issue you were talking about.
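
For reference, a small Python sketch of how the four LOAD headers quoted above line up with the five `/proc/<PID>/maps` entries. It assumes a 4 KiB page size (matching the `0x1000` alignment in the readelf output) and takes the load base from the first maps line. The name `page_ranges` is made up for illustration. Rounding each `[p_vaddr, p_vaddr + p_memsz)` range out to page boundaries gives four ranges, and the last one (`RW`) spans exactly the two final maps entries (`r--p` and `rw-p`). The sketch does not say what caused that split. RELRO protection applied by the dynamic linker is one common cause, but the readelf excerpt above only shows LOAD entries, so that is a guess.

```python
PAGE = 0x1000            # assumed page size (matches Align 0x1000 above)
BASE = 0x7f0ba6ec5000    # start of the first libnss_sss.so.2 mapping

# (p_vaddr, p_memsz, flags) copied from the readelf -l output quoted above.
LOADS = [
    (0x0000, 0x1468, "R"),
    (0x2000, 0x6931, "R E"),
    (0x9000, 0x1110, "R"),
    (0xbc78, 0x0658, "RW"),
]


def page_ranges(base, loads, page=PAGE):
    """Round each LOAD segment out to page boundaries and rebase it."""
    mask = ~(page - 1)
    ranges = []
    for vaddr, memsz, flags in loads:
        start = base + (vaddr & mask)                      # round down
        end = base + ((vaddr + memsz + page - 1) & mask)   # round up
        ranges.append((start, end, flags))
    return ranges


if __name__ == "__main__":
    for start, end, flags in page_ranges(BASE, LOADS):
        print(f"{start:012x}-{end:012x} {flags}")
```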

-------------

PR: https://git.openjdk.java.net/jdk/pull/2563
