On Tue, Jun 20, 2023 at 4:38 AM David Holmes <[email protected]> wrote:
> On 19/06/2023 11:46 pm, Jaroslav Bachorík wrote: > > Hi, > > > > On Thu, Jun 1, 2023 at 5:49 PM <[email protected] > > <mailto:[email protected]>> wrote: > > > > Hmmm... seems like concurrent class unloading has revealed a > situation > > in some > > code where class refs should have been held, but were not held > because > > it's very > > difficult to do, and the code (mostly) got away with it for a long > > time... > > > > David's quote from the JNI spec makes things very, very clear: > > > > > "A field or method ID does not prevent the VM from unloading the > > class from > > > which the ID has been derived. After the class is unloaded, the > > method or > > > field ID becomes invalid and may not be passed to any function > > taking > > such > > > an ID. The native code, therefore, must make sure to: > > > - keep a live reference to the underlying class, or > > > - recompute the method or field ID > > > > > > if it intends to use a method or field ID for an extended period > of > > time." > > > > and the JVM/TI spec does depend on the JNI spec for certain pieces > > like the > > semantics of jmethodID. > > > > I agree with David that fixing this is going to be difficult and a > major > > effort which is why we've all only chipped at this boulder before... > > > > > > I apologize for being a stickler, but it seems that the concurrent class > > unloading and the ephemeral nature of jmethodIDs could cause problems > > even for a much less exotic usecase of the JVMTI GetStackTrace > > ( > https://docs.oracle.com/en/java/javase/17/docs/specs/jvmti.html#GetStackTrace > < > https://docs.oracle.com/en/java/javase/17/docs/specs/jvmti.html#GetStackTrace > >). > > IIUC, the stacktrace consists of stack frames and each stack frame > > contains a jmethodID identifying the associated Java method. Further on, > > obtaining the stacktrace will not create references to corresponding > > classes so the jmethodIDs can go 'bad' at any given moment, without an > > official way to make sure such a 'bad' jmethodID would not crash JVM. > > Yep all correct. The jvmtiFrameInfo contains the jmethodID as applicable > when the stacktrace was taken. If the current scope is not ensuring > liveness then those method ids could be invalidated at any time. > > I agree this is a hole in the JVM TI specification, but it is not a hole > I know how to patch. Additional API are needed here along with changes > to the validity requirements of jMethodIds. > Thanks! Yes, this makes sense. I just wanted to make sure my perception was correct. Cheers, -JB- > > Cheers, > David > ----- > > > Cheers, > > > > -JB- > > > > > > Dan > > > > > > On 6/1/23 4:28 AM, David Holmes wrote: > > > On 1/06/2023 5:16 pm, Jaroslav Bachorík wrote: > > >> Hi David, > > >> > > >> On Thu, Jun 1, 2023 at 3:56 AM David Holmes > > <[email protected] <mailto:[email protected]> > > >> <mailto:[email protected] > > <mailto:[email protected]>>> wrote: > > >> > > >> Hi Jaroslav, > > >> > > >> On 31/05/2023 9:12 pm, Jaroslav Bachorík wrote: > > >> > Dear Team, > > >> > > > >> > I've been investigating the unusual JVM crashes occurring > in > > >> JVMTI calls > > >> > on a J9 JVM. During my investigation, I scrutinized the > > >> `jmethodID` > > >> > definition closely, available here: [jmethodID > > >> > > > >> > > definition]( > https://docs.oracle.com/en/java/javase/17/docs/specs/jvmti.html#jmethodID > <https://docs.oracle.com/en/java/javase/17/docs/specs/jvmti.html#jmethodID > > > > >> > > < > https://docs.oracle.com/en/java/javase/17/docs/specs/jvmti.html#jmethodID > <https://docs.oracle.com/en/java/javase/17/docs/specs/jvmti.html#jmethodID > >> > > >> > > < > https://docs.oracle.com/en/java/javase/17/docs/specs/jvmti.html#jmethodID > <https://docs.oracle.com/en/java/javase/17/docs/specs/jvmti.html#jmethodID > > > > >> > > < > https://docs.oracle.com/en/java/javase/17/docs/specs/jvmti.html#jmethodID > <https://docs.oracle.com/en/java/javase/17/docs/specs/jvmti.html#jmethodID > >>>). > > >> > > >> > > > >> > To paraphrase, the definition suggests that `jmethodID` > > >> identifies a > > >> > Java method, initializer, or constructor. These > > identifiers, once > > >> > returned by JVM TI functions and events, can be safely > > stored. > > >> However, > > >> > when the class is unloaded, they become invalid, > > rendering them > > >> > unsuitable for use. > > >> > > > >> > My interpretation is that the JVMTI user should verify the > > >> validity of a > > >> > `jmethodID` value before using it to prevent potential > > crashes. > > >> Would > > >> > you agree with this interpretation? > > >> > > >> Not quite - as you note you can't verify the jmethodID > > validity. > > >> What > > >> the user needs to do, in line with what Dan was saying, is > > ensure > > >> that > > >> they keep track of the classes to which the methods belong > > and keep > > >> them > > >> alive if necessary. Now that may be easier said than done, > but > > >> that is > > >> the gist of it. This comes from the JNI spec: > > >> > > >> "A field or method ID does not prevent the VM from unloading > > the > > >> class > > >> from which the ID has been derived. After the class is > > unloaded, the > > >> method or field ID becomes invalid and may not be passed to > any > > >> function > > >> taking such an ID. The native code, therefore, must make > > sure to: > > >> > > >> keep a live reference to the underlying class, or > > >> recompute the method or field ID > > >> > > >> if it intends to use a method or field ID for an extended > > period of > > >> time." > > >> > > >> > This sounds like a sensible requirement, but its practical > > >> application > > >> > remains unclear. As far as I know, methods can be unloaded > > >> concurrently > > >> > to the native code executing JVMTI functions. This > > introduces a > > >> > potential race condition where the JVM unloads the > > methods during > > >> the > > >> > check->use flow, making it only a partial solution. To > > complicate > > >> > matters further, no method exists to confirm whether a > > >> `jmethodID` is valid. > > >> > > > >> > Theoretically, we could monitor the `CompiledMethodUnload` > > >> event to > > >> > track the validity state, creating a constantly expanding > > set of > > >> > unloaded `jmethodID` values or a bloom filter, if one > > does not > > >> care > > >> > about few potential false positives. This strategy, > however, > > >> doesn't > > >> > address the potential race condition, and it could even > > >> exacerbate it > > >> > due to possible event delays. This delay might mistakenly > > >> validate a > > >> > `jmethodID` value that has already been unloaded, but for > > >> which the > > >> > event hasn't been delivered yet. > > >> > > > >> > Honestly, I don't see a way to use `jmethodID` safely > > unless the > > >> code > > >> > using it suspends the entire JVM and doesn't resume until > > it's > > >> finished > > >> > with that `jmethodID`. Any other approach might lead to > JVM > > >> crashes, as > > >> > we've observed with J9. > > >> > > > >> > Lastly, it's noteworthy that Hotspot takes meticulous > > measures to > > >> ensure > > >> > that using jmethodIDs for unloaded methods doesn't crash > the > > >> JVM and > > >> > even provides useful information. This observation has > > led me to > > >> > question whether the documentation aligns with the Hotspot > > >> > implementation, especially given that following closely > the > > >> > documentation appears to increase the risk associated > > with the > > >> use of > > >> > `jmethodID` values. > > >> > > >> There have been folk who wanted to make this area more > > user-friendly > > >> but > > >> that shouldn't be mistaken for moving towards a world where > > >> jmethodIDs > > >> are always safe to use. > > >> > > >> > > >> Yes, I see your point. Unfortunately, this confirms my worries > that > > >> using AsyncGetCallTrace (ASGCT) on a system strictly adhering to > > the > > >> JVMTI spec of jmethoID is not really possible without risking > > random > > >> and quite frequent crashes on systems with concurrent class > > unloading > > >> enabled. > > >> FTR, ASGCT will record the stack trace as a list of frames, each > > one > > >> containing the corresponding jmethodID value. Considering that > the > > >> most common usage of ASGCT is in a signal handler it makes it > > >> impossible to use JVMTI calls to resolve the holder class and > > create > > >> a strong reference to prevent it from being unloaded. > > >> And even if this would be possible we would need to figure out > when > > >> to release the class reference when it is no more needed - and > > it is > > >> not really clear how we could do that reliably, leaving us with > the > > >> option of holding the class references indefinitely or risking > > >> crashing JVM. > > > > > > I would have to agree you cannot use jmethodID's for that > > purpose, not > > > without also recording (and keeping-alive) the associated > > classes. As > > > to when you would clear such references I can't say as I don't > know > > > how the ASGCT stack record would be used. > > > > > >> I want to emphasize that not being able to resolve additional > > details > > >> for a jmethodID pointing to a method of an unloaded class is not > an > > >> issue, as long as the JVMTI call does not crash. I think that > > >> https://bugs.openjdk.org/browse/JDK-8268364 > > <https://bugs.openjdk.org/browse/JDK-8268364> > > >> <https://bugs.openjdk.org/browse/JDK-8268364 > > <https://bugs.openjdk.org/browse/JDK-8268364>> did address exactly > the > > >> problem of concurrent class unloading causing races in the code > > that > > >> is checking for validity of jmethodID and then using it. > > > > > > Yes but internal to the JVMTI implementation. The basic null check > > > that was used was found to be insufficient when used with ZGC and > so > > > we would crash more often than was considered reasonable. There > is a > > > quoted comment in the code in > > > https://bugs.openjdk.org/browse/JDK-8268088 > > <https://bugs.openjdk.org/browse/JDK-8268088>: > > > > > > // These aren't deallocated and are going to look like a leak, > but > > > that's > > > // needed because we can't really get rid of jmethodIDs because > we > > > don't > > > // know when native code is going to stop using them. The spec > > says > > > that > > > // they're "invalid" but existing programs likely rely on their > > being > > > // NULL after class unloading. > > > > > > that kind of sums up the position of trying to accommodate "bad > > code" > > > in a reasonable way. > > > > > >> Can this be summarize in a way that the user is not guaranteed > > to get > > >> any additional information for an invalid jmethodID but it would > be > > >> really nice for JVM not to crash when jmethodID becomes invalid > as > > >> there is no way for the user to check for its validity in an > atomic > > >> manner > > >> - and yes, even calling GetMethodDeclaringClass in order to > obtain > > >> the class one could create a strong reference is a subject to > racy > > >> behaviour so it really can not be used as a workaround. > > > > > > The only real solution IMO would be to encode jmethodIDs in a way > > that > > > validity can be tracked and then have all JVMTI methods be able to > > > check that validity and guarantee atomicity of the method such > that > > > the id can't become invalid whilst in use. Whether that is at all > > > feasible/practical I do not know - but it sounds like a major > effort > > > to me. > > > > > > Cheers, > > > David > > > > > >> Cheers, > > >> > > >> -JB- > > >> > > >> > > >> Cheers, > > >> David > > >> > > >> > I welcome your thoughts and perspectives on this matter. > > >> > > > >> > Best regards, > > >> > > > >> > Jaroslav > > >> > > >
