Thanks Chris - all sounds good to me.  Thanks for all the MachO insights...


On 23/07/2020 22:27, Chris Plummer wrote:
Just a minor update of some new findings (no new code change). The DB hash table being used by default will overwrite an existing entry, not duplicate it, and this is indeed what was happening. The second entry added was the one with a 0 offset. When I enable the R_NOOVERWRITE flag, it stops the overwrite and that also fixes the problem, but that's only because the entry with offset 0 comes last. The fix I've done is better since it avoids the offet 0 entry altogether, so even if it came first it would not get used.

thanks,

Chris

On 7/22/20 10:25 PM, Chris Plummer wrote:
Hi Kevin,

Thanks for the review. Unfortunately there was yet another bug which I have now addressed. Although testing with mach5 went fine, when I tried with a local build I had issues. SA couldn't even get past an early part of the core file handling which involves doing some adjustments related to CDS. It needs to look up a couple of hotspot symbols by name to do this, and get their values (such as _SharedBaseAddress). Although the symbol -> address lookup seemed to work, the values retrieved from the address were garbage. After some debugging I noticed the 3 symbols being looked up all had the same address. Then I noticed this address was at offset 0 of the libjvm segment. After a lot more debugging I found the problem. These symbols were actually in the symbol table twice, once with a proper offset and once with an offset of 0. I'm not sure why the ones with an offset of 0 were there (other than they originated in the mach-o symbol table).

The reason this didn't always happen is because SA takes all the symbols it finds and adds them to a hash table for fast symbol -> address lookup. If a symbol is in there twice, it's a tossup which you'll get. It could change from build to build in fact. The trigger for my local build was probably how I ran configure, which likely is not the same as mach5, although I'm  unsure if this just gave me the unlucky hashing, or if in fact it resulted in the entries with offset 0. In any case, the fix is to ignore entries with offset 0. Here's the updated webrev:

http://cr.openjdk.java.net/~cjplummer/8247515/webrev.03/index.html

All the changes since webrev.02 are in build_symtab() in symtab.c. Besides ignoring entries with offset 0 to fix the bug, I also did some cleanup. There used to be two loops to iterate over the symbols. There wasn't really a good reason for this, so now there is just one. Also, it no longer adds entries with a file offset 0, an offset into the string section of 0, or an empty string. By doing this the size of the libjvm symbol table went from about 240k entries to 90k. Since it was originally allocated at it's full potential size, it's now reallocate to the smaller size after symbol table processing is over.

thanks,

Chris

On 7/22/20 2:45 AM, Kevin Walls wrote:

Thanks Chris, yes looks good, I like that we check the library bounds before calling nearest_symbol.

--
Kevin


On 21/07/2020 21:05, Chris Plummer wrote:
Hi Serguei and Kevin,

The webrev has been updated:

http://cr.openjdk.java.net/~cjplummer/8247515/webrev.02/index.html
https://bugs.openjdk.java.net/browse/JDK-8247515

Two issues were addressed:

(1) Code in symbol_for_pc() assumed the caller had first checked to make sure that the symbol is in a library, where-as some callers assume NULL will be returned if the symbol is not in a library. This is the case for pstack for example, which first blindly does a pc to symbol lookup, and only if that returns null does it then check if the pc is in the codecache or interpreter. The logic in symbol_for_pc() assumed that if the pc was greater than the start address of the last library in the list, then it must be in that library. So in stack traces the frames for compiled or interpreted pc's showed up as the last symbol in the last library, plus some very large offset. The fix is to now track the size of libraries so we can do a proper range check.

(2) There are issues with finding system libraries. See [1] JDK-8249779. Because of this I disabled support for trying to locate them. This was done in ps_core.c, and there are "JDK-8249779" comment references in the code where I did this. The end result of this is that PMap for core files won't show system libraries, and PStack for core files won't show symbols for addresses in system libraries. Note that currently support for PMap and PStack is disabled for OSX, but I will shortly send out a review to enable them for OSX core files as part of the fix for [2] JDK-8248882.

[1] https://bugs.openjdk.java.net/browse/JDK-8249779
[2] https://bugs.openjdk.java.net/browse/JDK-8248882

thanks,

Chris

On 7/14/20 5:46 PM, serguei.spit...@oracle.com wrote:
Okay, I'll wait for new webrev version to review.

Thanks,
Serguei


On 7/14/20 17:40, Chris Plummer wrote:
Hi Serguie,

Thanks for reviewing. This webrev is in limbo right now as I discovered some issues that Kevin and I have been discussing off line. One is that the code assumes the caller has first checked to make sure that the symbol is in a library, where-as the actual callers assume NULL will be returned if the symbol is not in a library. The end result is that we end up returning a symbol, even for address in the code cache or interpreter. So in stack traces these frame show up as the last symbol in the last library, plus some very large offset. I have a fix for that were now I track the size of each library. But there is another issue with the code that tries to discover all libraries in the core file. It misses a very large number of system libraries. We understand why, but are not sure of the solution. I might just change to code to only worry about user libraries (JDK libs and other JNI libs).

Some comments below also.

On 7/14/20 4:37 PM, serguei.spit...@oracle.com wrote:
Hi Chris,

I like the suggestion from Kevin below.
I'm not sure which suggestion since I updated the webrev based on his initial suggestion.

I have a couple of minor comments so far.

http://cr.openjdk.java.net/~cjplummer/8247515/webrev.01/src/jdk.hotspot.agent/macosx/native/libsaproc/libproc_impl.c.frames.html
313 if (!lib->next || lib->next->base >= addr) {
I wonder if the check above has to be:
313 if (!lib->next || lib->next->base > addr) {
Yes, > would be better, although this goes away with my fix that tracks the size of each library.


http://cr.openjdk.java.net/~cjplummer/8247515/webrev.01/src/jdk.hotspot.agent/macosx/native/libsaproc/symtab.c.frames.html
417 if (offset_from_sym >= 0) { // ignore symbols that comes after "offset"
Replace: comes => come
Ok.

thanks,

Chris

Thanks,
Serguei


On 7/8/20 03:23, Kevin Walls wrote:


Sure -- I was thinking lowest_offset_from_sym initialising starting at a high positive integer (that would now be PTRDIFF_MAX I think) to save a comparison with e.g. -1, you can just check if the new offset is less than lowest_offset_from_sym

With the ptrdiff_t change you made, this all looks good to me however you decide. 8-)



On 07/07/2020 21:17, Chris Plummer wrote:
Hi Kevin,

Thanks for the review. Yes, that lack of initialization of lowest_offset_from_sym is a bug. I'm real surprised the compiler didn't catch it as it will be uninitialized garbage the first time it is referenced. Fortunately usually the eventual offset is very small if not 0, so probably this never prevented a proper match. I think there's also another bug:

 415       uintptr_t offset_from_sym = offset - sym->offset;

"offset" is the passed in offset, essentially the address of the symbol we are interested in, but given as an offset from the start of the DSO. "sym->offset" is also an offset from the start of the DSO. It could be located before or after "offset". This means the math could result in a negative number, which when converted to unsigned would be a very large positive number. This happens whenever you check a symbol that is actually located after the address you are looking up. The end result is harmless, because it just means there's no way we will match that symbol, which is what you want, but it would be good to clean this up.

I think what is best is to use ptrdiff_t and initialize lowest_offset_from_sym to -1. I've updated the webrev:

http://cr.openjdk.java.net/~cjplummer/8247515/webrev.01/index.html

thanks,

Chris

On 7/7/20 4:09 AM, Kevin Walls wrote:
Hi Chris,

Yes I think this looks good.

Question: In nearest_symbol, do we need to initialize lowest_offset_from_sym to something impossibly high, as if it defaults to zero we never find a better/nearer result?

Thanks
Kevin


On 07/07/2020 06:10, Chris Plummer wrote:
Hello,

Please help review the following:

http://cr.openjdk.java.net/~cjplummer/8247515/webrev.00/index.html
https://bugs.openjdk.java.net/browse/JDK-8247515

The CR contains a description of the issues being addressed. There is also no test for this symbol lookup support yet. It will be there after I push JDK-8247516 and JDK-8247514, which are both blocked by the CR.

[1] https://bugs.openjdk.java.net/browse/JDK-8247516
[2] https://bugs.openjdk.java.net/browse/JDK-8247514

thanks,

Chris









Reply via email to