Hi Serguei and Kevin,
The webrev has been updated:
http://cr.openjdk.java.net/~cjplummer/8247515/webrev.02/index.html
https://bugs.openjdk.java.net/browse/JDK-8247515
Two issues were addressed:
(1) Code in symbol_for_pc() assumed the caller had first checked to make sure that the symbol is in a library, where-as some callers assume NULL will be returned if the symbol is not in a library. This is the case for pstack for example, which first blindly does a pc to symbol lookup, and only if that returns null does it then check if the pc is in the codecache or interpreter. The logic in symbol_for_pc() assumed that if the pc was greater than the start address of the last library in the list, then it must be in that library. So in stack traces the frames for compiled or interpreted pc's showed up as the last symbol in the last library, plus some very large offset. The fix is to now track the size of libraries so we can do a proper range check.
(2) There are issues with finding system libraries. See [1] JDK-8249779. Because of this I disabled support for trying to locate them. This was done in ps_core.c, and there are "JDK-8249779" comment references in the code where I did this. The end result of this is that PMap for core files won't show system libraries, and PStack for core files won't show symbols for addresses in system libraries. Note that currently support for PMap and PStack is disabled for OSX, but I will shortly send out a review to enable them for OSX core files as part of the fix for [2] JDK-8248882.
[1] https://bugs.openjdk.java.net/browse/JDK-8249779
[2] https://bugs.openjdk.java.net/browse/JDK-8248882
thanks,
Chris
On 7/14/20 5:46 PM, serguei.spit...@oracle.com wrote:
The webrev has been updated:
http://cr.openjdk.java.net/~cjplummer/8247515/webrev.02/index.html
https://bugs.openjdk.java.net/browse/JDK-8247515
Two issues were addressed:
(1) Code in symbol_for_pc() assumed the caller had first checked to make sure that the symbol is in a library, where-as some callers assume NULL will be returned if the symbol is not in a library. This is the case for pstack for example, which first blindly does a pc to symbol lookup, and only if that returns null does it then check if the pc is in the codecache or interpreter. The logic in symbol_for_pc() assumed that if the pc was greater than the start address of the last library in the list, then it must be in that library. So in stack traces the frames for compiled or interpreted pc's showed up as the last symbol in the last library, plus some very large offset. The fix is to now track the size of libraries so we can do a proper range check.
(2) There are issues with finding system libraries. See [1] JDK-8249779. Because of this I disabled support for trying to locate them. This was done in ps_core.c, and there are "JDK-8249779" comment references in the code where I did this. The end result of this is that PMap for core files won't show system libraries, and PStack for core files won't show symbols for addresses in system libraries. Note that currently support for PMap and PStack is disabled for OSX, but I will shortly send out a review to enable them for OSX core files as part of the fix for [2] JDK-8248882.
[1] https://bugs.openjdk.java.net/browse/JDK-8249779
[2] https://bugs.openjdk.java.net/browse/JDK-8248882
thanks,
Chris
On 7/14/20 5:46 PM, serguei.spit...@oracle.com wrote:
Okay, I'll wait for new webrev version to review.
Thanks,
Serguei
On 7/14/20 17:40, Chris Plummer wrote:
Hi Serguie,
Thanks for reviewing. This webrev is in limbo right now as I discovered some issues that Kevin and I have been discussing off line. One is that the code assumes the caller has first checked to make sure that the symbol is in a library, where-as the actual callers assume NULL will be returned if the symbol is not in a library. The end result is that we end up returning a symbol, even for address in the code cache or interpreter. So in stack traces these frame show up as the last symbol in the last library, plus some very large offset. I have a fix for that were now I track the size of each library. But there is another issue with the code that tries to discover all libraries in the core file. It misses a very large number of system libraries. We understand why, but are not sure of the solution. I might just change to code to only worry about user libraries (JDK libs and other JNI libs).
Some comments below also.
On 7/14/20 4:37 PM, serguei.spit...@oracle.com wrote:
I'm not sure which suggestion since I updated the webrev based on his initial suggestion.Hi Chris,
I like the suggestion from Kevin below.
Yes, > would be better, although this goes away with my fix that tracks the size of each library.
I have a couple of minor comments so far.
http://cr.openjdk.java.net/~cjplummer/8247515/webrev.01/src/jdk.hotspot.agent/macosx/native/libsaproc/libproc_impl.c.frames.html
313 if (!lib->next || lib->next->base >= addr) {
I wonder if the check above has to be:
313 if (!lib->next || lib->next->base > addr) {
Ok.
http://cr.openjdk.java.net/~cjplummer/8247515/webrev.01/src/jdk.hotspot.agent/macosx/native/libsaproc/symtab.c.frames.html
417 if (offset_from_sym >= 0) { // ignore symbols that comes after "offset"
Replace: comes => come
thanks,
Chris
Thanks,
Serguei
On 7/8/20 03:23, Kevin Walls wrote:
Sure -- I was thinking lowest_offset_from_sym initialising starting at a high positive integer (that would now be PTRDIFF_MAX I think) to save a comparison with e.g. -1, you can just check if the new offset is less than lowest_offset_from_sym
With the ptrdiff_t change you made, this all looks good to me however you decide. 8-)
On 07/07/2020 21:17, Chris Plummer wrote:
Hi Kevin,
Thanks for the review. Yes, that lack of initialization of lowest_offset_from_sym is a bug. I'm real surprised the compiler didn't catch it as it will be uninitialized garbage the first time it is referenced. Fortunately usually the eventual offset is very small if not 0, so probably this never prevented a proper match. I think there's also another bug:
415 uintptr_t offset_from_sym = offset - sym->offset;
"offset" is the passed in offset, essentially the address of the symbol we are interested in, but given as an offset from the start of the DSO. "sym->offset" is also an offset from the start of the DSO. It could be located before or after "offset". This means the math could result in a negative number, which when converted to unsigned would be a very large positive number. This happens whenever you check a symbol that is actually located after the address you are looking up. The end result is harmless, because it just means there's no way we will match that symbol, which is what you want, but it would be good to clean this up.
I think what is best is to use ptrdiff_t and initialize lowest_offset_from_sym to -1. I've updated the webrev:
http://cr.openjdk.java.net/~cjplummer/8247515/webrev.01/index.html
thanks,
Chris
On 7/7/20 4:09 AM, Kevin Walls wrote:
Hi Chris,
Yes I think this looks good.
Question: In nearest_symbol, do we need to initialize lowest_offset_from_sym to something impossibly high, as if it defaults to zero we never find a better/nearer result?
Thanks
Kevin
On 07/07/2020 06:10, Chris Plummer wrote:
Hello,
Please help review the following:
http://cr.openjdk.java.net/~cjplummer/8247515/webrev.00/index.html
https://bugs.openjdk.java.net/browse/JDK-8247515
The CR contains a description of the issues being addressed. There is also no test for this symbol lookup support yet. It will be there after I push JDK-8247516 and JDK-8247514, which are both blocked by the CR.
[1] https://bugs.openjdk.java.net/browse/JDK-8247516
[2] https://bugs.openjdk.java.net/browse/JDK-8247514
thanks,
Chris