Thanks Chris - all sounds good to me. Thanks for all the MachO insights...
On 23/07/2020 22:27, Chris Plummer wrote:
Just a minor update of some new findings (no new code change). The DB
hash table being used by default will overwrite an existing entry, not
duplicate it, and this is indeed what was happening. The second entry
added was the one with a 0 offset. When I enable the R_NOOVERWRITE
flag, it stops the overwrite and that also fixes the problem, but
that's only because the entry with offset 0 comes last. The fix I've
done is better since it avoids the offet 0 entry altogether, so even
if it came first it would not get used.
thanks,
Chris
On 7/22/20 10:25 PM, Chris Plummer wrote:
Hi Kevin,
Thanks for the review. Unfortunately there was yet another bug which
I have now addressed. Although testing with mach5 went fine, when I
tried with a local build I had issues. SA couldn't even get past an
early part of the core file handling which involves doing some
adjustments related to CDS. It needs to look up a couple of hotspot
symbols by name to do this, and get their values (such as
_SharedBaseAddress). Although the symbol -> address lookup seemed to
work, the values retrieved from the address were garbage. After some
debugging I noticed the 3 symbols being looked up all had the same
address. Then I noticed this address was at offset 0 of the libjvm
segment. After a lot more debugging I found the problem. These
symbols were actually in the symbol table twice, once with a proper
offset and once with an offset of 0. I'm not sure why the ones with
an offset of 0 were there (other than they originated in the mach-o
symbol table).
The reason this didn't always happen is because SA takes all the
symbols it finds and adds them to a hash table for fast symbol ->
address lookup. If a symbol is in there twice, it's a tossup which
you'll get. It could change from build to build in fact. The trigger
for my local build was probably how I ran configure, which likely is
not the same as mach5, although I'm unsure if this just gave me the
unlucky hashing, or if in fact it resulted in the entries with offset
0. In any case, the fix is to ignore entries with offset 0. Here's
the updated webrev:
http://cr.openjdk.java.net/~cjplummer/8247515/webrev.03/index.html
All the changes since webrev.02 are in build_symtab() in symtab.c.
Besides ignoring entries with offset 0 to fix the bug, I also did
some cleanup. There used to be two loops to iterate over the symbols.
There wasn't really a good reason for this, so now there is just one.
Also, it no longer adds entries with a file offset 0, an offset into
the string section of 0, or an empty string. By doing this the size
of the libjvm symbol table went from about 240k entries to 90k. Since
it was originally allocated at it's full potential size, it's now
reallocate to the smaller size after symbol table processing is over.
thanks,
Chris
On 7/22/20 2:45 AM, Kevin Walls wrote:
Thanks Chris, yes looks good, I like that we check the library
bounds before calling nearest_symbol.
--
Kevin
On 21/07/2020 21:05, Chris Plummer wrote:
Hi Serguei and Kevin,
The webrev has been updated:
http://cr.openjdk.java.net/~cjplummer/8247515/webrev.02/index.html
https://bugs.openjdk.java.net/browse/JDK-8247515
Two issues were addressed:
(1) Code in symbol_for_pc() assumed the caller had first checked to
make sure that the symbol is in a library, where-as some callers
assume NULL will be returned if the symbol is not in a library.
This is the case for pstack for example, which first blindly does a
pc to symbol lookup, and only if that returns null does it then
check if the pc is in the codecache or interpreter. The logic in
symbol_for_pc() assumed that if the pc was greater than the start
address of the last library in the list, then it must be in that
library. So in stack traces the frames for compiled or interpreted
pc's showed up as the last symbol in the last library, plus some
very large offset. The fix is to now track the size of libraries so
we can do a proper range check.
(2) There are issues with finding system libraries. See [1]
JDK-8249779. Because of this I disabled support for trying to
locate them. This was done in ps_core.c, and there are
"JDK-8249779" comment references in the code where I did this. The
end result of this is that PMap for core files won't show system
libraries, and PStack for core files won't show symbols for
addresses in system libraries. Note that currently support for PMap
and PStack is disabled for OSX, but I will shortly send out a
review to enable them for OSX core files as part of the fix for [2]
JDK-8248882.
[1] https://bugs.openjdk.java.net/browse/JDK-8249779
[2] https://bugs.openjdk.java.net/browse/JDK-8248882
thanks,
Chris
On 7/14/20 5:46 PM, serguei.spit...@oracle.com wrote:
Okay, I'll wait for new webrev version to review.
Thanks,
Serguei
On 7/14/20 17:40, Chris Plummer wrote:
Hi Serguie,
Thanks for reviewing. This webrev is in limbo right now as I
discovered some issues that Kevin and I have been discussing off
line. One is that the code assumes the caller has first checked
to make sure that the symbol is in a library, where-as the actual
callers assume NULL will be returned if the symbol is not in a
library. The end result is that we end up returning a symbol,
even for address in the code cache or interpreter. So in stack
traces these frame show up as the last symbol in the last
library, plus some very large offset. I have a fix for that were
now I track the size of each library. But there is another issue
with the code that tries to discover all libraries in the core
file. It misses a very large number of system libraries. We
understand why, but are not sure of the solution. I might just
change to code to only worry about user libraries (JDK libs and
other JNI libs).
Some comments below also.
On 7/14/20 4:37 PM, serguei.spit...@oracle.com wrote:
Hi Chris,
I like the suggestion from Kevin below.
I'm not sure which suggestion since I updated the webrev based on
his initial suggestion.
I have a couple of minor comments so far.
http://cr.openjdk.java.net/~cjplummer/8247515/webrev.01/src/jdk.hotspot.agent/macosx/native/libsaproc/libproc_impl.c.frames.html
313 if (!lib->next || lib->next->base >= addr) {
I wonder if the check above has to be:
313 if (!lib->next || lib->next->base > addr) {
Yes, > would be better, although this goes away with my fix that
tracks the size of each library.
http://cr.openjdk.java.net/~cjplummer/8247515/webrev.01/src/jdk.hotspot.agent/macosx/native/libsaproc/symtab.c.frames.html
417 if (offset_from_sym >= 0) { // ignore symbols that comes
after "offset"
Replace: comes => come
Ok.
thanks,
Chris
Thanks,
Serguei
On 7/8/20 03:23, Kevin Walls wrote:
Sure -- I was thinking lowest_offset_from_sym initialising
starting at a high positive integer (that would now be
PTRDIFF_MAX I think) to save a comparison with e.g. -1, you can
just check if the new offset is less than lowest_offset_from_sym
With the ptrdiff_t change you made, this all looks good to me
however you decide. 8-)
On 07/07/2020 21:17, Chris Plummer wrote:
Hi Kevin,
Thanks for the review. Yes, that lack of initialization of
lowest_offset_from_sym is a bug. I'm real surprised the
compiler didn't catch it as it will be uninitialized garbage
the first time it is referenced. Fortunately usually the
eventual offset is very small if not 0, so probably this never
prevented a proper match. I think there's also another bug:
415 uintptr_t offset_from_sym = offset - sym->offset;
"offset" is the passed in offset, essentially the address of
the symbol we are interested in, but given as an offset from
the start of the DSO. "sym->offset" is also an offset from the
start of the DSO. It could be located before or after
"offset". This means the math could result in a negative
number, which when converted to unsigned would be a very large
positive number. This happens whenever you check a symbol that
is actually located after the address you are looking up. The
end result is harmless, because it just means there's no way
we will match that symbol, which is what you want, but it
would be good to clean this up.
I think what is best is to use ptrdiff_t and initialize
lowest_offset_from_sym to -1. I've updated the webrev:
http://cr.openjdk.java.net/~cjplummer/8247515/webrev.01/index.html
thanks,
Chris
On 7/7/20 4:09 AM, Kevin Walls wrote:
Hi Chris,
Yes I think this looks good.
Question: In nearest_symbol, do we need to initialize
lowest_offset_from_sym to something impossibly high, as if it
defaults to zero we never find a better/nearer result?
Thanks
Kevin
On 07/07/2020 06:10, Chris Plummer wrote:
Hello,
Please help review the following:
http://cr.openjdk.java.net/~cjplummer/8247515/webrev.00/index.html
https://bugs.openjdk.java.net/browse/JDK-8247515
The CR contains a description of the issues being addressed.
There is also no test for this symbol lookup support yet. It
will be there after I push JDK-8247516 and JDK-8247514,
which are both blocked by the CR.
[1] https://bugs.openjdk.java.net/browse/JDK-8247516
[2] https://bugs.openjdk.java.net/browse/JDK-8247514
thanks,
Chris