On Thu, 16 Oct 2025 13:32:28 GMT, Yasumasa Suenaga <[email protected]> wrote:

> `jhsdb jstack --mixed` with a coredump cannot resolve a function symbol that 
> has the `.cold` suffix.
> 
> 
> ----------------- 120485 -----------------
> "Thread-0" #24 prio=5 tid=0x00007f50dc1aa7c0 nid=120485 waiting on condition 
> [0x00007f50c0d1a000]
>    java.lang.Thread.State: TIMED_WAITING (sleeping)
>    JavaThread state: _thread_blocked
> 0x00007f50e4710735 __GI_abort + 0x8b
> 0x00007f50e1e01f33 ????????
> 
> 
> 0x7f50e1e01f33 is `os::abort(bool, void const*, void const*) [clone .cold]`, 
> and GDB resolves it. The `.cold` suffix means the code has been split out of 
> the function body as a ["cold" 
> function](https://gcc.gnu.org/onlinedocs/gcc/Common-Function-Attributes.html#index-cold-function-attribute).
> In GDB, this code appears in a different area from the function body:
> 
> 
> (gdb) disas 0x7f50e1e01f2e, 0x7f50e1e01f34
> Dump of assembler code from 0x7f50e1e01f2e to 0x7f50e1e01f34:
>    0x00007f50e1e01f2e <_ZN2os5abortEbPKvS1_.cold+0>: call 0x7f50e1e01010 <abort@plt>
> => 0x00007f50e1e01f33: nop
> End of assembler dump.
> 
> 
> To resolve a symbol, libsaproc.so checks whether the address lies between 
> `start` and `start + size - 1`. As the assembler dump shows, the code in the 
> `.cold` section is a single `call` instruction, so the IP points to the 
> following `nop`. We should therefore accept addresses between `start` and 
> `start + size`.
> 
> With this PR, the correct symbol is shown:
> 
> 
> ----------------- 120485 -----------------
> "Thread-0" #24 prio=5 tid=0x00007f50dc1aa7c0 nid=120485 waiting on condition 
> [0x00007f50c0d1a000]
>    java.lang.Thread.State: TIMED_WAITING (sleeping)
>    JavaThread state: _thread_blocked
> 0x00007f50e4710735      __GI_abort + 0x8b
> 0x00007f50e1e01f33      os::abort(bool, void const*, void const*) [clone .cold] + 0x5

I looked into the specs and implementations of GDB, binutils, and GCC, but 
unfortunately I could not find how this case is handled. It may simply be a 
convention that debuggers such as GDB know how to handle.

GDB shows the following frame information. `_ZN2os5abortEbPKvS1_.cold` 
(demangled: `os::abort(bool, void const*, void const*) [clone .cold]`) is 
reached by a jump from `_ZN2os5abortEbPKvS1_` (`os::abort(bool, void const*, 
void const*)`), so both belong to the same frame.


(gdb) disas
Dump of assembler code for function _ZN2os5abortEbPKvS1_:
Address range 0x7f3e1ef0f850 to 0x7f3e1ef0f888:
   0x00007f3e1ef0f850 <+0>:     push   %rbp
   0x00007f3e1ef0f851 <+1>:     mov    %rsp,%rbp
   0x00007f3e1ef0f854 <+4>:     push   %rbx
   0x00007f3e1ef0f855 <+5>:     mov    %edi,%ebx
   0x00007f3e1ef0f857 <+7>:     sub    $0x8,%rsp
   0x00007f3e1ef0f85b <+11>:    call   0x7f3e1ef0f820 <_ZN2os8shutdownEv>
   0x00007f3e1ef0f860 <+16>:    test   %bl,%bl
   0x00007f3e1ef0f862 <+18>:    jne    0x7f3e1ef0f86e <_ZN2os5abortEbPKvS1_+30>
   0x00007f3e1ef0f864 <+20>:    mov    $0x1,%edi
   0x00007f3e1ef0f869 <+25>:    call   0x7f3e1da01250 <_exit@plt>
   0x00007f3e1ef0f86e <+30>:    lea    0x1167254(%rip),%rax        # 0x7f3e20076ac9 <DumpPrivateMappingsInCore>
   0x00007f3e1ef0f875 <+37>:    cmpb   $0x0,(%rax)
   0x00007f3e1ef0f878 <+40>:    je     0x7f3e1da01f2e <_ZN2os5abortEbPKvS1_.cold>
   0x00007f3e1ef0f87e <+46>:    call   0x7f3e1e1d2010 <_ZN11ClassLoader15close_jrt_imageEv>
   0x00007f3e1ef0f883 <+51>:    jmp    0x7f3e1da01f2e <_ZN2os5abortEbPKvS1_.cold>
Address range 0x7f3e1da01f2e to 0x7f3e1da01f33:
   0x00007f3e1da01f2e <-22075682>:      call   0x7f3e1da01010 <abort@plt>
End of assembler dump.


The key point is that `RIP` does not point into the jumped-to code in `.cold`. 
It points to the instruction after the `call`, and that address is covered by 
neither the symtab nor DWARF.


(gdb) disas 0x7f3e1da01f2e,0x7f3e1da01f34
Dump of assembler code from 0x7f3e1da01f2e to 0x7f3e1da01f34:
   0x00007f3e1da01f2e <_ZN2os5abortEbPKvS1_.cold+0>:    call   0x7f3e1da01010 <abort@plt>
=> 0x00007f3e1da01f33:  nop
End of assembler dump.
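
To make the arithmetic concrete, here is a minimal C sketch using the addresses 
from the dump above. The `.cold` fragment spans 0x7f3e1da01f2e to 
0x7f3e1da01f33, i.e. 5 bytes holding a single `call`, so the return address 
equals `start + size`, one byte past the symbol. The struct and function names 
are hypothetical and are not the ones used in libsaproc.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical symbol record; not the actual libsaproc structure. */
struct sym_range {
  uint64_t start; /* address of the first byte of the symbol */
  uint64_t size;  /* size of the symbol in bytes */
};

/* Strict check: addr must lie in [start, start + size - 1]. */
static bool contains_strict(const struct sym_range *s, uint64_t addr) {
  return addr >= s->start && addr - s->start < s->size;
}

/* Relaxed check: also accepts addr == start + size. */
static bool contains_inclusive_end(const struct sym_range *s, uint64_t addr) {
  return addr >= s->start && addr - s->start <= s->size;
}
```

With `start = 0x7f3e1da01f2e` and `size = 5`, the strict check rejects 
`RIP = 0x7f3e1da01f33` while the relaxed check accepts it.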


My goal is for `jhsdb jstack --mixed` to unwind all call frames without any 
unknown symbols. To achieve that, I think it is better to look up the address 
`RIP - 1` as a fallback when SA cannot resolve the symbol or find the DWARF 
CFA. I think that is more reasonable than changing symtab.c as this PR does. 
@kevinjwalls What do you think?
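
The fallback could look roughly like the following C sketch. This is not the 
libsaproc code: `struct sym`, `lookup_exact`, and `lookup_with_fallback` are 
hypothetical names, and the symbol table is a plain array scanned linearly for 
brevity.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical symbol table entry (illustration only). */
struct sym {
  const char *name;
  uint64_t start;
  uint64_t size;
};

/* Strict lookup: match only if addr is in [start, start + size - 1]. */
static const struct sym *lookup_exact(const struct sym *tab, size_t n,
                                      uint64_t addr) {
  for (size_t i = 0; i < n; i++) {
    if (addr >= tab[i].start && addr - tab[i].start < tab[i].size) {
      return &tab[i];
    }
  }
  return NULL;
}

/* Try pc first; only if that fails, retry with pc - 1.
 * The offset is still reported relative to the original pc. */
static const struct sym *lookup_with_fallback(const struct sym *tab, size_t n,
                                              uint64_t pc, uint64_t *offset) {
  const struct sym *s = lookup_exact(tab, n, pc);
  if (s == NULL && pc > 0) {
    s = lookup_exact(tab, n, pc - 1);
  }
  if (s != NULL && offset != NULL) {
    *offset = pc - s->start;
  }
  return s;
}
```

Because the retry happens only after the strict lookup fails, an address that 
is the first byte of an adjacent symbol still resolves to that symbol. For the 
`.cold` fragment above, `pc = 0x7f3e1da01f33` resolves through the retry with 
offset 0x5.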

-------------

PR Comment: https://git.openjdk.org/jdk/pull/27846#issuecomment-3418401185
