On Wed, 17 Feb 2021 00:13:02 GMT, Chris Plummer <[email protected]> wrote:

>> I should add that the failures Dan is seeing are with #id3, which is no 
>> -Xcomp, but with a core file instead of a live process. With -Xcomp this 
>> part of the test is not run, so possibly this is just an issue with the dso 
>> size calculation for core files, and works correctly with a live process.
>
> If you run ClhsdbPmap.java, you can see pmap output for both core and live 
> processes. The sizes of the maps are very large for both of them, and 
> actually a bit bigger with the live process. Here's the output for a live 
> process:
> 
> 0x000014755360c000    4048K   /usr/lib64/libnss_sss.so.2
> 0x0000147553815000    4012K   /usr/lib64/libnss_files-2.17.so
> 0x0000147560208000    4064K   /usr/lib64/libm-2.17.so
> 0x000014756050a000    3032K   /usr/lib64/librt-2.17.so
> 0x0000147560712000    32892K  
> /scratch/cplummer/ws/jdk/jdk.clean/build/linux-x64-debug/images/jdk/lib/server/libjvm.so
> 0x0000147562731000    4924K   /usr/lib64/libc-2.17.so
> 0x0000147562aff000    3076K   /usr/lib64/libdl-2.17.so
> 0x0000147562d03000    3060K   /usr/lib64/libpthread-2.17.so
> 0x0000147562f1f000    2948K   /usr/lib64/libz.so.1.2.7
> 0x0000147563135000    2860K   /usr/lib64/ld-2.17.so
> 0x0000147563164000    92K     
> /scratch/cplummer/ws/jdk/jdk.clean/build/linux-x64-debug/images/jdk/lib/libnet.so
> 0x000014756317b000    80K     
> /scratch/cplummer/ws/jdk/jdk.clean/build/linux-x64-debug/images/jdk/lib/libnio.so
> 0x00001475631e0000    156K    
> /scratch/cplummer/ws/jdk/jdk.clean/build/linux-x64-debug/images/jdk/lib/libjava.so
> 0x0000147563207000    128K    
> /scratch/cplummer/ws/jdk/jdk.clean/build/linux-x64-debug/images/jdk/lib/libjimage.so
> 0x000014756332c000    68K     
> /scratch/cplummer/ws/jdk/jdk.clean/build/linux-x64-debug/images/jdk/lib/libjli.so
> 0x0000563c950bf000    16K     
> /scratch/cplummer/ws/jdk/jdk.clean/build/linux-x64-debug/images/jdk/bin/java
> `/usr/lib64/libnss_files-2.17.so` is the one that turned up in the test 
> failure. It's only a 68k file but has a 4012k map. It's second in the list. 
> I'm not sure if this is the order we would always see on linux systems. My 
> assumption was that the library at the highest address was causing the 
> problem, and that the interpreter was located right after it, but that might 
> not be the case.
> 
> The address in the interpreter that we are doing findpc on turned up at 
> `libnss_files.so.2 + 0x21b116`, or at an offset of 2200k. I added a "pmap" 
> command to ClhsdbFindPC, and from my test runs the interpreter seemed to 
> always be just before the first library. However, maybe on some systems it is 
> intermixed with the libraries.

I pushed a new change to use `ELF_PHDR.p_filesz` instead of `p_memsz`. It mostly 
works, but it is not a perfect solution.
For example, consider libnss_sss (as provided by Fedora 33). 
`/proc/<PID>/maps` shows the library as follows; there are 5 mappings:

7f0ba6ec5000-7f0ba6ec7000 r--p 00000000 08:03 340133                     
/usr/lib64/libnss_sss.so.2
7f0ba6ec7000-7f0ba6ece000 r-xp 00002000 08:03 340133                     
/usr/lib64/libnss_sss.so.2
7f0ba6ece000-7f0ba6ed0000 r--p 00009000 08:03 340133                     
/usr/lib64/libnss_sss.so.2
7f0ba6ed0000-7f0ba6ed1000 r--p 0000a000 08:03 340133                     
/usr/lib64/libnss_sss.so.2
7f0ba6ed1000-7f0ba6ed2000 rw-p 0000b000 08:03 340133                     
/usr/lib64/libnss_sss.so.2
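To illustrate, the extent of a library in `/proc/<PID>/maps` can be computed by grouping the mappings by path. This is just a hypothetical Python sketch over the text quoted above, not the actual SA code:

```python
# Sketch: compute each library's mapped extent from /proc/<PID>/maps text.
# The sample below is the libnss_sss output quoted above.
MAPS = """\
7f0ba6ec5000-7f0ba6ec7000 r--p 00000000 08:03 340133 /usr/lib64/libnss_sss.so.2
7f0ba6ec7000-7f0ba6ece000 r-xp 00002000 08:03 340133 /usr/lib64/libnss_sss.so.2
7f0ba6ece000-7f0ba6ed0000 r--p 00009000 08:03 340133 /usr/lib64/libnss_sss.so.2
7f0ba6ed0000-7f0ba6ed1000 r--p 0000a000 08:03 340133 /usr/lib64/libnss_sss.so.2
7f0ba6ed1000-7f0ba6ed2000 rw-p 0000b000 08:03 340133 /usr/lib64/libnss_sss.so.2
"""

def library_extents(maps_text):
    """Return {path: (lowest start, highest end)} for file-backed mappings."""
    extents = {}
    for line in maps_text.splitlines():
        fields = line.split()
        if len(fields) < 6:          # anonymous mapping, no path
            continue
        start, end = (int(x, 16) for x in fields[0].split('-'))
        path = fields[5]
        lo, hi = extents.get(path, (start, end))
        extents[path] = (min(lo, start), max(hi, end))
    return extents

lo, hi = library_extents(MAPS)['/usr/lib64/libnss_sss.so.2']
print(hex(lo), hex(hi), (hi - lo) // 1024, "K")  # 5 mappings, 52K total
```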

However, I see only 4 `LOAD` segments when I run `readelf -l 
/usr/lib64/libnss_sss.so.2`:

Program Headers:
  Type           Offset             VirtAddr           PhysAddr
                 FileSiz            MemSiz              Flags  Align
  LOAD           0x0000000000000000 0x0000000000000000 0x0000000000000000
                 0x0000000000001468 0x0000000000001468  R      0x1000
  LOAD           0x0000000000002000 0x0000000000002000 0x0000000000002000
                 0x0000000000006931 0x0000000000006931  R E    0x1000
  LOAD           0x0000000000009000 0x0000000000009000 0x0000000000009000
                 0x0000000000001110 0x0000000000001110  R      0x1000
  LOAD           0x000000000000ac78 0x000000000000bc78 0x000000000000bc78
                 0x000000000000044c 0x0000000000000658  RW     0x1000
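The in-memory extent implied by these `LOAD` headers can be computed from either `p_memsz` or `p_filesz`. A hypothetical sketch with the values from the readelf output above (not the actual libsaproc code) shows where the two differ:

```python
# LOAD program headers from the readelf output above:
# (p_offset, p_vaddr, p_filesz, p_memsz)
LOADS = [
    (0x0000, 0x0000, 0x1468, 0x1468),
    (0x2000, 0x2000, 0x6931, 0x6931),
    (0x9000, 0x9000, 0x1110, 0x1110),
    (0xac78, 0xbc78, 0x044c, 0x0658),
]

PAGE = 0x1000

def extent(loads, use_filesz):
    """Highest end address across LOAD segments, relative to the load base."""
    end = 0
    for _off, vaddr, filesz, memsz in loads:
        size = filesz if use_filesz else memsz
        end = max(end, vaddr + size)
    return end

mem_end  = extent(LOADS, use_filesz=False)   # 0xc2d0 (p_memsz also covers .bss)
file_end = extent(LOADS, use_filesz=True)    # 0xc0c4
page_end = (mem_end + PAGE - 1) & ~(PAGE - 1)
print(hex(mem_end), hex(file_end), hex(page_end))
```

For this particular library both values round up to the same page-aligned end, `0xd000`, which matches the 52K extent seen in `/proc/<PID>/maps`, so here the choice of `p_filesz` vs `p_memsz` makes no difference.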

The Linux kernel or the dynamic linker seems to split the final segment (at 
0xbc78) into RO and RW mappings when the shared library is loaded, most likely 
as part of `PT_GNU_RELRO` handling (but I'm not sure).

I think we need to refactor the shared library handling in a different way.

For a live process, we can use `/proc/<PID>/maps`.
For a core dump, we can use the `NT_FILE` note in the core file's note section. 
It has the valid segments, as below.

$ readelf -n core
  :
    0x00007f0ba6ec5000  0x00007f0ba6ec7000  0x0000000000000000
    0x00007f0ba6ec7000  0x00007f0ba6ece000  0x0000000000000002
    0x00007f0ba6ece000  0x00007f0ba6ed0000  0x0000000000000009
    0x00007f0ba6ed0000  0x00007f0ba6ed1000  0x000000000000000a
    0x00007f0ba6ed1000  0x00007f0ba6ed2000  0x000000000000000b
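For reference, the `NT_FILE` descriptor stores a header (mapping count and page size), then one (start, end, file offset in pages) triple per mapping, followed by the NUL-terminated path names. A hypothetical sketch of decoding the 64-bit layout, exercised with the libnss_sss mappings quoted above:

```python
import struct

def parse_nt_file(desc):
    """Decode a 64-bit NT_FILE note descriptor into
    (path, start, end, page_offset) tuples."""
    count, page_size = struct.unpack_from('<2Q', desc, 0)
    triples = [struct.unpack_from('<3Q', desc, 16 + i * 24) for i in range(count)]
    names = desc[16 + count * 24:].split(b'\0')[:count]
    return [(n.decode(), s, e, off) for n, (s, e, off) in zip(names, triples)]

# Build a tiny descriptor with the libnss_sss mappings quoted above.
maps = [
    (0x7f0ba6ec5000, 0x7f0ba6ec7000, 0x0),
    (0x7f0ba6ec7000, 0x7f0ba6ece000, 0x2),
    (0x7f0ba6ece000, 0x7f0ba6ed0000, 0x9),
    (0x7f0ba6ed0000, 0x7f0ba6ed1000, 0xa),
    (0x7f0ba6ed1000, 0x7f0ba6ed2000, 0xb),
]
desc = struct.pack('<2Q', len(maps), 0x1000)
desc += b''.join(struct.pack('<3Q', *m) for m in maps)
desc += b'/usr/lib64/libnss_sss.so.2\0' * len(maps)

for path, start, end, off in parse_nt_file(desc):
    print(hex(start), hex(end), hex(off), path)
```

Note that the offsets readelf prints (0x0, 0x2, 0x9, ...) are in pages, which is why they match the file offsets in `/proc/<PID>/maps` divided by 0x1000.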


However, that would be a big change to SA.
As an option, we could integrate this change first, and then do the refactoring 
later.
What do you think?
(Of course I'd like to resolve this problem with a smaller fix if possible, so 
other solutions are welcome.)

-------------

PR: https://git.openjdk.java.net/jdk/pull/2563
