Re: RFR(M): 8247272: SA ELF file support has never worked for 64-bit causing address to symbol name mapping to fail

Chris Plummer Tue, 14 Jul 2020 17:33:49 -0700

Hi Serguei,

These are the C data structures in question:

typedef struct
{
Elf32_Word    st_name;        /* Symbol name (string tbl index) */
Elf32_Addr    st_value;        /* Symbol value */
Elf32_Word    st_size;        /* Symbol size */
unsigned char    st_info;        /* Symbol type and binding */
unsigned char    st_other;        /* Symbol visibility */
Elf32_Section    st_shndx;        /* Section index */
} Elf32_Sym;

typedef struct
{
Elf64_Word    st_name;        /* Symbol name (string tbl index) */
unsigned char    st_info;        /* Symbol type and binding */
unsigned char st_other;        /* Symbol visibility */
Elf64_Section    st_shndx;        /* Section index */
Elf64_Addr    st_value;        /* Symbol value */
Elf64_Xword    st_size;        /* Symbol size */
} Elf64_Sym;

I can't re-order the initialization since it is reading in the values from the file input stream. So they have to be read in the order they appear in the structs.

"int" is the right type for "name_ndx":

typedef uint32_t Elf32_Word;
typedef uint32_t Elf64_Word;

I think the only size that differs between 32 and 64-bit is the address and file offset types:

/* Type of addresses. */
typedef uint32_t Elf32_Addr;
typedef uint64_t Elf64_Addr;

/* Type of file offsets. */
typedef uint32_t Elf32_Off;
typedef uint64_t Elf64_Off;

So these are the types that need to be longs in java. I see now I changed a few java int fields to long, whose ELf types are Elf32_Word/Elf64_Word (a 32bit unsigned int), but not all of them. I'm not sure now why I did these. It was about a month ago when I made these changes. However, since the types are unsigned 32-bit, probably it is best to store them in a long anyway, even when reading 32-bit elf files. However, I doubt this works with the underlying readInt() API that is being used. I think we really need a readUInt() API that reads a 32-bit unsigned int and returns long instead of an int. But that's really outside the scope of this CR since it fixes something that is broken just as much with 32-bit as with 64-bit.

thanks,

Chris

On 7/14/20 5:01 PM, serguei.spit...@oracle.com wrote:

Hi Chris,

I only have one question so far.

http://cr.openjdk.java.net/~cjplummer/8247272/webrev.01/src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/debugger/posix/elf/ELFFileParser.java.frames.html
 627             private int name_ndx;                       // Elf32_Word or Elf64_Word
 ...
 651                 switch (getObjectSize()) {
 652                     case CLASS_32: {
 653                         name_ndx = readInt();
 654                         value = readInt();
 655                         size = readInt();
 656                         info = readByte();
 657                         other = readByte();
 658                         section_header_ndx = readShort();
 659                         break;
 660                     }
 661                     case CLASS_64: {
 662                         name_ndx = readInt();
 663                         info = readByte();
 664                         other = readByte();
 665                         section_header_ndx = readShort();
 666                         value = readWord();
 667                         size = readWord();
 668                         break;
 669                     }
Should the local name_ndx also have type long?
Could you, please, order lines the same way for CLASS_32 and CLASS_64?

Thanks,
Serguei

On 7/14/20 12:01, Chris Plummer wrote:
Hello,

Can I get a second reviewer please. As noted below, it's easier to look at webrev.00 first to just see the coding changes. webrev.01 just adds some updated comments.

thanks,

Chris

On 7/8/20 2:07 PM, Kevin Walls wrote:

Thanks Chris, it's a bit of clutter, but truthful clutter. 8-)

On 08/07/2020 20:26, Chris Plummer wrote:

Webrev has been updated with the suggested comment changes. Note to new reviewers, look in webrev.00 first since it doesn't have the clutter of the comment changes, making it easier to see which lines actually have code changes.

http://cr.openjdk.java.net/~cjplummer/8247272/webrev.01/index.html

thanks,

Chris

On 7/8/20 11:04 AM, Chris Plummer wrote:

Hi Kevin,

Thanks for the review. I'll add the additional Elf64_Addr and Elf64_Off comments. Probably the others should be updated too. Although they are the same size, they do have different names. For example:

/* Type for a 16-bit quantity. */
typedef uint16_t Elf32_Half;
typedef uint16_t Elf64_Half;

thanks,

Chris

On 7/8/20 3:47 AM, Kevin Walls wrote:

Hi Chris --

This is a great story/history lesson.

You could if you like, edit those comments in ElfFileParser.java so "Elf32_Addr" as they will contain either "Elf64_Addr or Elf32_Addr", similarly Elf64_Off. The other Elf64 fields are the same as the 32 bit ones.

Yes, the symbol fields are ordered differently.

So all looks good to me!

Thanks
Kevin

On 08/07/2020 07:20, Chris Plummer wrote:

Hello,

Please help review the following:

http://cr.openjdk.java.net/~cjplummer/8247272/webrev.00/index.html
https://bugs.openjdk.java.net/browse/JDK-8247272

The short story is that SA address to native symbol name mapping/lookup has never worked on 64-bit, and this is due to the java level ELF file support only supporting 32-bit. This CR fixes that, and I believe also maintains 32-bit compatibility, although I have no way of testing that.

There is more to the story however on how we got here. Before going into the gory detail below, I just want to point out that currently nothing is using this support, and therefore it is technically not fixing anything, although I did verify that the fixes work (see details below). Also, I intend to remove all the java level ELF file support as part of JDK-8247516 [1]. The only reason I want to push these changes first is because I already did the work to get it working with 64-bit, and would like to get it archived before removing it in case for some reason it is revived in the future.

Now for the ugly details on how we got here (and you really don't need to read this unless you have any concerns with what I stated above). It starts with the clhsdb "whatis" command, which was the only (indirect) user of this java level ELF file support. It's implementation is in _javascript_, so we have not had access to it ever since JDK9 module support broke the SA _javascript_ support (and _javascript_ support is now removed). I started the process of converting "whatis" to java. It is basically the same as the clhsdb "findpc" command, except it also checks for native symbols, which it does with the following code:

var dso = loadObjectContainingPC(addr);
var sym = dso.closestSymbolToPC(addr);
return sym.name + '+' + sym.offset;

Converting this to java was trivial. I just stuck support for it in the PointerFinder class, which is what findpc relies on. However, it always failed to successfully lookup a symbol. I found that DSO.closestSymbolToPC() called into the java level ELF support, and that was failing badly. After some debugging I noticed that the values read in for various ELF headers were mostly garbage. It then occurred to me that it was reading in 32-bit values that probably needed to be 64-bit. Sure enough, this code was never converted to 64-bit support. I then went and tried "whatis" on JDK8, the last version where it was available, and it failed there also with 64-bit binaries. So this is why I initially fixed it to work with 64-bit, and also how I tested it (using the modified findpc on a native symbol). But the story continues...

DSO.java, and as a consequence the java ELF file support, is used by all our posix ports to do address to symbol lookups. So I figured that after fixing the java level ELF file support for 64-bit, my improved findpc would start working on OSX also. No such luck, and for obvious reasons. OSX uses mach-o files. This ELF code should never have been used for it, and of course has never worked.

So I was left trying to figure out how to do OSX address to native symbol lookups. I then recalled that there was a CFrame.closestSymbolToPC() API that did address to native symbol lookups for native stack traces, and wondered how it was ever working (even on linux with the broken ELF 64-bit support). It turns out this takes a very different path to do the lookups, ending up in native code in libsaproc, where we also have ELF file support. I then converted DSO.closestSymbolToPC(addr) to use this libsaproc code instead, and it worked fine. So now there was no need for the java level ELF file support since its only user was DSO.closestSymbolToPC(addr). I should also add that this is the approach that has always been used on windows, with both CFrame.closestSymbolToPC() and DSO.closestSymbolToPC(addr) using the same libsaproc support.

There is still a bit more to the story. After diverting DSO.closestSymbolToPC(addr) to the libsaproc lookup code, it still didn't work for OSX. I thought it would just work since the native BsdDebuggerLocal.lookupByName0() is implemented, and it seems to trickle down to the proper lower level APIs to find the symbol, but there were two issues. The first is that for processes there is no support for looking up all the libraries and populating the list of ps_prochandle structures that are used to do the symbol lookups. This was just never implemented (also is why PMap does not work for OSX processes). For core files the ps_prochandle structs are there, but the lookup code was badly broken. That has now been fixed by JDK-8247515 [2], currently out for review. So the end result is we'll have address to native symbol lookup for everything but OSX processes.

If your still here, thanks for listening!

Chris

[1] https://bugs.openjdk.java.net/browse/JDK-8247516
[2] https://bugs.openjdk.java.net/browse/JDK-8247515

Re: RFR(M): 8247272: SA ELF file support has never worked for 64-bit causing address to symbol name mapping to fail

Reply via email to