On Wed, 17 Feb 2021 06:46:27 GMT, Yasumasa Suenaga <[email protected]> wrote:
>> If you run ClhsdbPmap.java, you can see pmap output for both core and live
>> processes. The sizes of the maps are very large for both of them, and
>> actually a bit bigger with the live process. Here's the output for a live
>> process:
>>
>> 0x000014755360c000  4048K   /usr/lib64/libnss_sss.so.2
>> 0x0000147553815000  4012K   /usr/lib64/libnss_files-2.17.so
>> 0x0000147560208000  4064K   /usr/lib64/libm-2.17.so
>> 0x000014756050a000  3032K   /usr/lib64/librt-2.17.so
>> 0x0000147560712000  32892K  /scratch/cplummer/ws/jdk/jdk.clean/build/linux-x64-debug/images/jdk/lib/server/libjvm.so
>> 0x0000147562731000  4924K   /usr/lib64/libc-2.17.so
>> 0x0000147562aff000  3076K   /usr/lib64/libdl-2.17.so
>> 0x0000147562d03000  3060K   /usr/lib64/libpthread-2.17.so
>> 0x0000147562f1f000  2948K   /usr/lib64/libz.so.1.2.7
>> 0x0000147563135000  2860K   /usr/lib64/ld-2.17.so
>> 0x0000147563164000  92K     /scratch/cplummer/ws/jdk/jdk.clean/build/linux-x64-debug/images/jdk/lib/libnet.so
>> 0x000014756317b000  80K     /scratch/cplummer/ws/jdk/jdk.clean/build/linux-x64-debug/images/jdk/lib/libnio.so
>> 0x00001475631e0000  156K    /scratch/cplummer/ws/jdk/jdk.clean/build/linux-x64-debug/images/jdk/lib/libjava.so
>> 0x0000147563207000  128K    /scratch/cplummer/ws/jdk/jdk.clean/build/linux-x64-debug/images/jdk/lib/libjimage.so
>> 0x000014756332c000  68K     /scratch/cplummer/ws/jdk/jdk.clean/build/linux-x64-debug/images/jdk/lib/libjli.so
>> 0x0000563c950bf000  16K     /scratch/cplummer/ws/jdk/jdk.clean/build/linux-x64-debug/images/jdk/bin/java
>>
>> `/usr/lib64/libnss_files-2.17.so` is the one that turned up in the test
>> failure. It's only a 68k file but has a 4064k map. It's second in the list.
>> I'm not sure if this is the order we would always see on linux systems. My
>> assumption was it was the library at the highest address that was causing
>> the problem, and that the interpreter was located right after it, but that
>> might not be the case.
>>
>> The address in the interpreter that we are doing findpc on turned up at
>> `libnss_files.so.2 + 0x21b116`, or at an offset of 2200k. I added a "pmap"
>> command to ClhsdbFindPC, and from my test runs the interpreter seemed to
>> always be just before the first library. However, maybe on some systems it is
>> intermixed with the libraries.
>
> I pushed a new change to use `ELF_PHDR.p_filesz` instead of `p_memsz`. It
> almost works fine, but it is not a perfect solution.
>
> For example, let's consider libnss_sss (provided by Fedora 33) -
> `/proc/<PID>/maps` shows libnss as follows. There are 5 segments.
>
> 7f0ba6ec5000-7f0ba6ec7000 r--p 00000000 08:03 340133 /usr/lib64/libnss_sss.so.2
> 7f0ba6ec7000-7f0ba6ece000 r-xp 00002000 08:03 340133 /usr/lib64/libnss_sss.so.2
> 7f0ba6ece000-7f0ba6ed0000 r--p 00009000 08:03 340133 /usr/lib64/libnss_sss.so.2
> 7f0ba6ed0000-7f0ba6ed1000 r--p 0000a000 08:03 340133 /usr/lib64/libnss_sss.so.2
> 7f0ba6ed1000-7f0ba6ed2000 rw-p 0000b000 08:03 340133 /usr/lib64/libnss_sss.so.2
>
> However, I could see only 4 segments in libnss_sss.so when I ran `readelf -l
> /usr/lib64/libnss_sss.so.2`:
>
> Program Headers:
>   Type           Offset             VirtAddr           PhysAddr
>                  FileSiz            MemSiz              Flags  Align
>   LOAD           0x0000000000000000 0x0000000000000000 0x0000000000000000
>                  0x0000000000001468 0x0000000000001468  R      0x1000
>   LOAD           0x0000000000002000 0x0000000000002000 0x0000000000002000
>                  0x0000000000006931 0x0000000000006931  R E    0x1000
>   LOAD           0x0000000000009000 0x0000000000009000 0x0000000000009000
>                  0x0000000000001110 0x0000000000001110  R      0x1000
>   LOAD           0x000000000000ac78 0x000000000000bc78 0x000000000000bc78
>                  0x000000000000044c 0x0000000000000658  RW     0x1000
>
> The Linux kernel seems to separate the final segment (0xbc78) into RO and RW
> segments when it attempts to load the shared library. (but I'm not sure)
>
> I think we need to refactor the handling of shared libraries in other ways.
>
> For a live process, we can use `/proc/<PID>/maps`.
> For a coredump, we can use `NT_FILE` in the note section of the corefile. It
> has valid segments as below.
>
> $ readelf -n core
>     :
>     0x00007f0ba6ec5000  0x00007f0ba6ec7000  0x0000000000000000
>     0x00007f0ba6ec7000  0x00007f0ba6ece000  0x0000000000000002
>     0x00007f0ba6ece000  0x00007f0ba6ed0000  0x0000000000000009
>     0x00007f0ba6ed0000  0x00007f0ba6ed1000  0x000000000000000a
>     0x00007f0ba6ed1000  0x00007f0ba6ed2000  0x000000000000000b
>
> But they make a big change to SA.
> As an option, we can integrate this change first, and then refactor them.
> What do you think?
> (I want to resolve this problem with a smaller fix if I can, of course, so
> other solutions are welcome)

@YaSuenag https://bugs.openjdk.java.net/browse/JDK-8250826 is the bug I was thinking of that sounds like the RO/RW issue you were talking about.

-------------

PR: https://git.openjdk.java.net/jdk/pull/2563
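
For reference, here is a rough sketch of what the two sources proposed in the quoted message (`/proc/<PID>/maps` for a live process, `NT_FILE` for a core) would yield once parsed. It is not SA code and is written in Python only for brevity. The names `Mapping`, `parse_maps`, `parse_nt_file` and `library_span` are made up for illustration, and the `NT_FILE` parser assumes a 64-bit little-endian core (count, page size, then start/end/page-offset triples, then NUL-terminated file names).

```python
import struct
from typing import List, NamedTuple, Tuple


class Mapping(NamedTuple):
    start: int        # first address of the mapping
    end: int          # one past the last address
    file_offset: int  # byte offset into the backing file
    path: str


def parse_maps(text: str) -> List[Mapping]:
    """Parse /proc/<PID>/maps text, keeping only file-backed entries."""
    result = []
    for line in text.splitlines():
        fields = line.split(None, 5)
        if len(fields) < 6 or not fields[5].startswith("/"):
            continue  # anonymous mapping, [heap], [stack], ...
        start, end = (int(x, 16) for x in fields[0].split("-"))
        result.append(Mapping(start, end, int(fields[2], 16), fields[5].rstrip()))
    return result


def parse_nt_file(desc: bytes) -> List[Mapping]:
    """Parse an NT_FILE note descriptor (assumed 64-bit little-endian)."""
    count, page_size = struct.unpack_from("<QQ", desc, 0)
    triples = [struct.unpack_from("<QQQ", desc, 16 + 24 * i) for i in range(count)]
    names = desc[16 + 24 * count:].split(b"\0")
    # The third field of each triple is a file offset in units of page_size.
    return [
        Mapping(start, end, pgoff * page_size, names[i].decode("utf-8", "replace"))
        for i, (start, end, pgoff) in enumerate(triples)
    ]


def library_span(mappings: List[Mapping], path: str) -> Tuple[int, int]:
    """Lowest start and highest end over all mappings of one file."""
    own = [m for m in mappings if m.path == path]
    return min(m.start for m in own), max(m.end for m in own)
```

Fed the five libnss_sss lines quoted above, `library_span` returns `(0x7f0ba6ec5000, 0x7f0ba6ed2000)`, a span of 0xd000 bytes, and both parsers produce the same five entries, so the RO/RW split would no longer matter to the lookup.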
