Re: RFR: 8247729: GetObjectMonitorUsage() might return inconsistent information

David Holmes Thu, 18 Jun 2020 07:06:08 -0700

On 18/06/2020 11:55 pm, Daniel D. Daugherty wrote:

On 6/18/20 9:18 AM, David Holmes wrote:

On 18/06/2020 7:07 pm, Yasumasa Suenaga wrote:

On 2020/06/18 17:36, David Holmes wrote:
On 18/06/2020 3:47 pm, Yasumasa Suenaga wrote:
Hi David,
Both ThreadsListHandle and ResourceMarks would use `Thread::current()` for their resource. It is set as the default parameter in the c'tor.
Do you mean we should pass it explicitly in the c'tor?
Yes, pass current_thread so we don't do the additional unnecessary calls to Thread::current().
Ok, I've fixed them. Could you review again?

   http://cr.openjdk.java.net/~ysuenaga/JDK-8247729/webrev.02/


Updates look good. One nit I missed before:

src/hotspot/share/prims/jvmtiEnv.cpp

// It need to perform at safepoint for gathering stable data

please change to:

// This need to be performed at a safepoint to gather stable data

Just a comment on this comment... I still haven't gotten to the webrev yet...


Perhaps:

     // This needs to be performed at a safepoint to gather stable data.


There is a second line that continues the sentence

// because monitor owner / waiters might not be suspended.

David
-----

Dan
Thanks,
David
Thanks,

Yasumasa
David
Thanks,

Yasumasa


On 2020/06/18 13:58, David Holmes wrote:
Hi Yasumasa,

On 18/06/2020 12:59 pm, Yasumasa Suenaga wrote:
Hi Serguei,

Thanks for your comment!
I uploaded new webrev:

http://cr.openjdk.java.net/~ysuenaga/JDK-8247729/webrev.01/

I'm not sure the following change is correct.
Can we assume owning_thread is not NULL at safepoint?
We can if "owner != NULL". So that change seems fine to me.
But given this is now only executed at a safepoint there are additional simplifications that can be made:
- current thread determination can be simplified:

945   Thread* current_thread = Thread::current();

becomes:

    Thread* current_thread = VMThread::vm_thread();
    assert(current_thread == Thread::current(), "must be");

- these comments can be removed

  994       // Use current thread since function can be called from a
  995       // JavaThread or the VMThread.
1053       // Use current thread since function can be called from a
1054       // JavaThread or the VMThread.
- these TLH constructions should be passing current_thread (existing bug)

  996       ThreadsListHandle tlh;
1055       ThreadsListHandle tlh;

- All ResourceMarks should be passing current_thread (existing bug)
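
The point about passing current_thread explicitly can be illustrated with a minimal sketch. It assumes a helper whose constructor defaults its argument to a current-thread lookup; `FakeThread` and `ScopedMark` are hypothetical stand-ins, not the real HotSpot `Thread`, `ThreadsListHandle` or `ResourceMark` classes.

```cpp
#include <cassert>

// Hypothetical stand-ins; none of these are the real HotSpot classes.
struct FakeThread {
    static int lookups;                 // how many times current() was called
    static FakeThread* current() {      // models a thread-local lookup
        static FakeThread the_thread;
        ++lookups;
        return &the_thread;
    }
};
int FakeThread::lookups = 0;

// A scoped helper whose constructor defaults to looking up the current thread.
struct ScopedMark {
    FakeThread* thread;
    explicit ScopedMark(FakeThread* t = FakeThread::current()) : thread(t) {}
};

// Relies on the default argument: one lookup per helper constructed.
int lookups_with_defaults() {
    FakeThread::lookups = 0;
    ScopedMark a;
    ScopedMark b;
    (void)a;
    (void)b;
    return FakeThread::lookups;
}

// Looks the thread up once and passes it to every helper.
int lookups_with_explicit_thread() {
    FakeThread::lookups = 0;
    FakeThread* current_thread = FakeThread::current();
    ScopedMark a(current_thread);
    ScopedMark b(current_thread);
    (void)a;
    (void)b;
    return FakeThread::lookups;
}
```

In this model the defaulted constructors perform one lookup each, while the explicit version performs a single lookup no matter how many helpers are created.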
Aside: there is a major inconsistency between the spec and implementation for this method. I've traced the history to see how this came about from JVMDI (ref JDK-4546581) but it never resulted in the JVM TI specification clearly stating what the waiters/waiter_count means. I will file a bug to have the spec clarified to match the implementation (even though I think the implementation is what is wrong). :(
Thanks,
David
-----
All tests on submit repo and serviceability/jvmti and vmTestbase/nsk/jvmti have passed with this change.
```
       // This monitor is owned so we have to find the owning JavaThread.
       owning_thread = Threads::owning_thread_from_monitor_owner(tlh.list(), owner);
-      // Cannot assume (owning_thread != NULL) here because this function
-      // may not have been called at a safepoint and the owning_thread
-      // might not be suspended.
-      if (owning_thread != NULL) {
-        // The monitor's owner either has to be the current thread, at safepoint
-        // or it has to be suspended. Any of these conditions will prevent both
-        // contending and waiting threads from modifying the state of
-        // the monitor.
-        if (!at_safepoint && !owning_thread->is_thread_fully_suspended(true, &debug_bits)) {
-          // Don't worry! This return of JVMTI_ERROR_THREAD_NOT_SUSPENDED
-          // will not make it back to the JVM/TI agent. The error code will
-          // get intercepted in JvmtiEnv::GetObjectMonitorUsage() which
-          // will retry the call via a VM_GetObjectMonitorUsage VM op.
-          return JVMTI_ERROR_THREAD_NOT_SUSPENDED;
-        }
-        HandleMark hm;
+      assert(owning_thread != NULL, "owning JavaThread must not be NULL");
          Handle     th(current_thread, owning_thread->threadObj());
          ret.owner = (jthread)jni_reference(calling_thread, th);

```


Thanks,

Yasumasa


On 2020/06/18 0:42, serguei.spit...@oracle.com wrote:
Hi Yasumasa,

This fix is not enough.
The function JvmtiEnvBase::get_object_monitor_usage works in two modes: in VMop and non-VMop.
The non-VMop mode has to be removed.

Thanks,
Serguei


On 6/17/20 02:18, Yasumasa Suenaga wrote:
(Change subject for RFR)

Hi,

I filed it in JBS and uploaded a webrev for it.
Could you review it?

     JBS: https://bugs.openjdk.java.net/browse/JDK-8247729
  webrev: http://cr.openjdk.java.net/~ysuenaga/JDK-8247729/webrev.00/
This change has passed tests on submit repo.
Also I tested it with serviceability/jvmti and vmTestbase/nsk/jvmti on Linux x64.
Thanks,

Yasumasa


On 2020/06/17 14:37, serguei.spit...@oracle.com wrote:
Yes. It seems we have a consensus.
Thank you for taking care of it.

Thanks,
Serguei


On 6/16/20 18:34, David Holmes wrote:
Ok, may I file it to JBS and fix it?
Go for it! :)

Cheers,
David

On 17/06/2020 10:23 am, Yasumasa Suenaga wrote:
On 2020/06/17 8:47, serguei.spit...@oracle.com wrote:
Hi Dan, David and Yasumasa,


On 6/16/20 07:39, Daniel D. Daugherty wrote:
On 6/15/20 9:28 PM, David Holmes wrote:
On 16/06/2020 10:57 am, Daniel D. Daugherty wrote:
On 6/15/20 7:19 PM, David Holmes wrote:
On 16/06/2020 8:40 am, Daniel D. Daugherty wrote:
On 6/15/20 6:14 PM, David Holmes wrote:
Hi Dan,

On 15/06/2020 11:38 pm, Daniel D. Daugherty wrote:
On 6/15/20 3:26 AM, David Holmes wrote:
On 15/06/2020 4:02 pm, Yasumasa Suenaga wrote:
Hi David,

On 2020/06/15 14:15, David Holmes wrote:
Hi Yasumasa,

On 15/06/2020 2:49 pm, Yasumasa Suenaga wrote:
Hi all,
I wonder why JvmtiEnvBase::get_object_monitor_usage() (the implementation of GetObjectMonitorUsage()) is not performed at a safepoint.
GetObjectMonitorUsage will use a safepoint if the target is not suspended:
jvmtiError
JvmtiEnv::GetObjectMonitorUsage(jobject object, jvmtiMonitorUsage* info_ptr) {
   JavaThread* calling_thread = JavaThread::current();
   jvmtiError err = get_object_monitor_usage(calling_thread, object, info_ptr);
   if (err == JVMTI_ERROR_THREAD_NOT_SUSPENDED) {
     // Some of the critical threads were not suspended. go to a safepoint and try again
     VM_GetObjectMonitorUsage op(this, calling_thread, object, info_ptr);
     VMThread::execute(&op);
     err = op.result();
   }
   return err;
} /* end GetObject */
I saw this code, so I guess there are some cases when JVMTI_ERROR_THREAD_NOT_SUSPENDED is not returned from get_object_monitor_usage().
The monitor owner is first read from the monitor object [1], but that read runs concurrently with other threads. If the owner thread is not suspended, the owner might change to another thread in the subsequent code.
For example, the owner might release the monitor before [2].
The expectation is that when we find an owner thread it is either suspended or not. If it is suspended then it cannot release the monitor. If it is not suspended we detect that and redo the whole query at a safepoint.
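
As a rough illustration of the two-mode flow described above, here is a minimal single-threaded model. All names are hypothetical and none of them are HotSpot APIs; the "safepoint" is only modelled as a flag recording which path produced the answer.

```cpp
#include <cassert>

// Hypothetical model of the two-mode query discussed in this thread.
enum class Err { None, ThreadNotSuspended };

struct Monitor {
    int  owner_id;         // id of the owning thread
    bool owner_suspended;  // whether that thread is currently suspended
};

struct Result {
    Err  err;
    int  owner_id;
    bool used_safepoint;
};

// Fast path: only trusts the data if the owner is suspended.
Result query_without_safepoint(const Monitor& m) {
    if (!m.owner_suspended) {
        return Result{Err::ThreadNotSuspended, -1, false};
    }
    return Result{Err::None, m.owner_id, false};
}

// Slow path: models a VM operation during which no other thread runs.
Result query_at_safepoint(const Monitor& m) {
    return Result{Err::None, m.owner_id, true};
}

// Two-mode shape: try the fast path, redo the whole query on the error code.
Result get_usage_two_modes(const Monitor& m) {
    Result r = query_without_safepoint(m);
    if (r.err == Err::ThreadNotSuspended) {
        r = query_at_safepoint(m);
    }
    return r;
}

// Single-mode shape: always gather at a safepoint.
Result get_usage_safepoint_only(const Monitor& m) {
    return query_at_safepoint(m);
}
```

The model deliberately leaves out the race itself: in the two-mode shape, nothing stops the owner from being resumed between the suspension check and the data gathering, which is the gap the rest of this thread is about.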
I think the owner thread might unfortunately be resumed after the suspension check.
Yes you are right. I was thinking resuming also required a safepoint but it only requires the Threads_lock. So yes the code is wrong.
Which code is wrong?
Yes, a rogue resume can happen when the GetObjectMonitorUsage() caller
has started the process of gathering the information while not at a
safepoint. Thus the information returned by GetObjectMonitorUsage()
might be stale, but that's a bug in the agent code.
The code tries to make sure that it either collects data about a monitor owned by a thread that is suspended, or else it collects that data at a safepoint. But the owning thread can be resumed just after the code determined it was suspended. The monitor can then be released and the information gathered is not only stale but potentially completely wrong as it could now be owned by a different thread and will report that thread's entry count.
If the agent is not using SuspendThread(), then as soon as
GetObjectMonitorUsage() returns to the caller the information
can be stale. In fact as soon as the implementation returns
from the safepoint that gathered the info, the target thread
could have moved on.
That isn't the issue. That the info is stale is fine. But the expectation is that the information was actually an accurate snapshot of the state of the monitor at some point in time. The current code does not ensure that.
Please explain. I clearly don't understand why you think the info
returned isn't "an accurate snapshot of the state of the monitor
at some point in time".
Because it may not be a "snapshot" at all. There is no atomicity**. The reported owner thread may not own it any longer when the entry count is read, so straight away you may have the wrong entry count information. The set of threads trying to acquire the monitor, or wait on the monitor can change in unexpected ways. It would be possible for instance to report the same thread as being the owner, being blocked trying to enter the monitor, and being in the wait-set of the monitor - apparently all at the same time!
** even if the owner is suspended we don't have complete atomicity because threads can join the set of threads trying to enter the monitor (unless they are all suspended).
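
The lack of atomicity can be sketched with a deterministic single-threaded model. This is only an assumption-laden illustration: `MonitorState`, `racy_query` and the `between_reads` callback are hypothetical, and the callback stands in for whatever other threads might do between two field reads.

```cpp
#include <cassert>
#include <functional>

// Hypothetical model of a monitor's state; not the HotSpot ObjectMonitor.
struct MonitorState {
    int owner_id;     // id of the owning thread
    int entry_count;  // recursion count of that owner
};

struct Usage {
    int owner_id;
    int entry_count;
};

// Reads the fields one at a time. `between_reads` models what other
// threads may do in the gap when the query is not at a safepoint.
Usage racy_query(MonitorState& m,
                 const std::function<void(MonitorState&)>& between_reads) {
    Usage u{};
    u.owner_id = m.owner_id;
    between_reads(m);
    u.entry_count = m.entry_count;
    return u;
}

// Models gathering at a safepoint: nothing else runs while we copy.
Usage stable_query(const MonitorState& m) {
    return Usage{m.owner_id, m.entry_count};
}

// Thread 1 owns the monitor with 3 entries; between the two reads it
// releases the monitor and thread 2 acquires it with 1 entry.
Usage demo_torn_read() {
    MonitorState m{1, 3};
    return racy_query(m, [](MonitorState& s) {
        s.owner_id = 2;
        s.entry_count = 1;
    });
}
```

In this model `demo_torn_read()` reports owner 1 together with entry count 1, a combination that never existed at any point in time, whereas `stable_query` can only return states that did exist.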
Consider the case when the monitor's owner is _not_ suspended:
  - GetObjectMonitorUsage() uses a safepoint to gather the info
    about the object's monitor. Since we're at a safepoint, the
    info that we are gathering cannot change until we return from
    the safepoint.
    It is a snapshot and a valid one at that.

Consider the case when the monitor's owner is suspended:
  - GetObjectMonitorUsage() will gather info about the object's
    monitor while _not_ at a safepoint. Assuming that no other
    thread is suspended, then entry_count can change because
    another thread can block on entry while we are gathering
    info. waiter_count and waiters can change if a thread was
    in a timed wait that has timed out and now that thread is
    blocked on re-entry. I don't think that notify_waiter_count
    and notify_waiters can change.

    So in this case, the owner info and notify info is stable,
    but the entry_count and waiter info is not stable.

Consider the case when the monitor is not owned:
  - GetObjectMonitorUsage() will start to gather info about the
    object's monitor while _not_ at a safepoint. If it finds a
    thread on the entry queue that is not suspended, then it will
    bail out and redo the info gather at a safepoint. I just
    noticed that it doesn't check for suspension for the threads
    on the waiters list so a timed Object.wait() call can cause
    some confusion here.

    So in this case, the owner info is not stable if a thread
    comes out of a timed wait and reenters the monitor. This
    case is no different than if a "barger" thread comes in
    after the NULL owner field is observed and enters the
    monitor. We'll return that there is no owner, a list of
    suspended pending entry threads and a list of waiting
    threads. The reality is that the object's monitor is
    owned by the "barger" that completely bypassed the entry
    queue by virtue of seeing the NULL owner field at exactly
    the right time.

So the owner field is only stable when we have an owner. If
that owner is not suspended, then the other fields are also
stable because we gathered the info at a safepoint. If the
owner is suspended, then the owner and notify info is stable,
but the entry_count and waiter info is not stable.

If we have a NULL owner field, then the info is only stable
if you have a non-suspended thread on the entry list. Ouch!
That's deterministic, but not without some work.


Okay so only when we gather the info at a safepoint is all
of it a valid and stable snapshot. Unfortunately, we only
do that at a safepoint when the owner thread is not suspended
or if owner == NULL and one of the entry threads is not
suspended. If either of those conditions is not true, then
the different pieces of info are unstable to varying degrees.

As for this claim:
It would be possible for instance to report the same thread
as being the owner, being blocked trying to enter the monitor,
and being in the wait-set of the monitor - apparently all at
the same time!
I can't figure out a way to make that scenario work. If the
thread is seen as the owner and is not suspended, then we
gather info at a safepoint. If it is suspended, then it can't
then be seen as on the entry queue or on the wait queue since
it is suspended. If it is seen on the entry queue and is not
suspended, then we gather info at a safepoint. If it is
suspended on the entry queue, then it can't be seen on the
wait queue.

So the info instability of this API is bad, but it's not
quite that bad. :-) (That is a small mercy.)


Handshaking is not going to make this situation any better
for GetObjectMonitorUsage(). If the monitor is owned and we
handshake with the owner, the stability or instability of
the other fields remains the same as when SuspendThread is
used. Handshaking with all threads won't make the data as
stable as when at a safepoint because individual threads
can resume execution after doing their handshake so there
will still be field instability.
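
A small model may help show why visiting threads one at a time is weaker than stopping them all. This is a sketch under simplifying assumptions: each "thread" is reduced to a flag saying whether it owns the monitor, and the `after_visit` callback is a hypothetical stand-in for what a thread may do once its own handshake has completed. It is not how HotSpot handshakes are implemented.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical model: threads[i] is true if thread i owns the monitor.
using ThreadFlags = std::vector<bool>;

// Visits threads one at a time. After each visit the visited thread is
// released, and `after_visit` models what can happen before the next visit.
std::vector<bool> handshake_all(
        ThreadFlags& threads,
        const std::function<void(ThreadFlags&, std::size_t)>& after_visit) {
    std::vector<bool> seen;
    for (std::size_t i = 0; i < threads.size(); ++i) {
        seen.push_back(threads[i]);
        after_visit(threads, i);
    }
    return seen;
}

// Models a safepoint: every thread is held until the whole copy is done.
std::vector<bool> safepoint_all(const ThreadFlags& threads) {
    return threads;
}

int count_owners(const std::vector<bool>& flags) {
    int n = 0;
    for (bool f : flags) {
        if (f) ++n;
    }
    return n;
}

// Thread 1 owns the monitor. Once thread 0 has been visited, thread 1
// releases the monitor and thread 0 acquires it.
int owners_seen_by_handshakes() {
    ThreadFlags threads{false, true};
    std::vector<bool> seen = handshake_all(
        threads, [](ThreadFlags& t, std::size_t visited) {
            if (visited == 0) {
                t[1] = false;
                t[0] = true;
            }
        });
    return count_owners(seen);
}

int owners_seen_at_safepoint() {
    ThreadFlags threads{false, true};
    return count_owners(safepoint_all(threads));
}
```

In this model the per-thread walk reports no owner at all even though the monitor was owned the whole time, while the safepoint copy reports exactly one owner.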


Short version: GetObjectMonitorUsage() should only gather
data at a safepoint. Yes, I've changed my mind.
I agree with this.
The advantages are:
  - the result is stable
  - the implementation can be simplified

Performance impact is not very clear but should not be that
big as suspending all the threads has some overhead too.
I'm not sure if using handshakes can make performance better.
Ok, may I file it to JBS and fix it?

Yasumasa
Thanks,
Serguei
Dan
David
-----
The only way to make sure you don't have stale information is
to use SuspendThread(), but it's not required. Perhaps the doc
should be more clear about the possibility of returning stale
info. That's a question for Robert F.
GetObjectMonitorUsage says nothing about threads being suspended so I can't see how this could be construed as an agent bug.
In your scenario above, you mention that the target thread was
suspended, GetObjectMonitorUsage() was called while the target
was suspended, and then the target thread was resumed after
GetObjectMonitorUsage() checked for suspension, but before
GetObjectMonitorUsage() was able to gather the info.
All three of those calls: SuspendThread(), GetObjectMonitorUsage()
and ResumeThread() are made by the agent and the agent should not
resume the target thread while also calling GetObjectMonitorUsage().
The calls were allowed to be made out of order so agent bug.
Perhaps. I was thinking more generally about an independent resume, but you're right that doesn't really make a lot of sense. But when the spec says nothing about suspension ...
And it is intentional that suspension is not required. JVM/DI and JVM/PI
used to require suspension for these kinds of get-the-info APIs. JVM/TI
intentionally was designed to not require suspension.
As I've said before, we could add a note about the data being potentially
stale unless SuspendThread is used. I think of it like stat(2). You can
fetch the file's info, but there's no guarantee that the info is current
by the time you process what you got back. Is it too much motherhood to
state that the data might be stale? I could go either way...
Using a handshake on the owner thread will allow this to be fixed in the future without forcing/using any safepoints.
I have to think about that which is why I'm avoiding talking about
handshakes in this thread.
Effectively the handshake can "suspend" the thread whilst the monitor is queried. In effect the operation would create a per-thread safepoint.
I "know" that, but I still need time to think about it and probably
see the code to see if there are holes...
Semantically it is no different to the code actually suspending the owner thread, but it can't actually do that because suspends/resume don't nest.
Yeah... we used to have a suspend count back when we tracked internal and
external suspends separately. That was a nightmare...

Dan
Cheers,
David
Dan
Cheers,
David
Dan
JavaThread::is_ext_suspend_completed() is used to check the thread state; it returns `true` when the thread is sleeping [3], or when it is running in native code [4].
Sure but if the thread is actually suspended it can't continue execution in the VM or in Java code.
This appears to be an optimisation for the assumed common case where threads are first suspended and then the monitors are queried.
I agree with this, but I could not find it in the JVMTI spec - it just says "Get information about the object's monitor."
Yes it was just an implementation optimisation, nothing to do with the spec.
GetObjectMonitorUsage() might return incorrect information in some cases.
It starts with finding the owner thread, but the owner might be just about to wake up. So I think it is safer if GetObjectMonitorUsage() is performed at a safepoint in any case.
Except we're moving away from safepoints to using Handshakes, so this particular operation will require that the apparent owner is Handshake-safe (by entering a handshake with it) before querying the monitor. This would still be preferable I think to always using a safepoint for the entire operation.
Cheers,
David
-----
Thanks,

Yasumasa
[3] http://hg.openjdk.java.net/jdk/jdk/file/76a17c8143d8/src/hotspot/share/runtime/thread.cpp#l671
[4] http://hg.openjdk.java.net/jdk/jdk/file/76a17c8143d8/src/hotspot/share/runtime/thread.cpp#l684
However there is still a potential bug as the thread reported as the owner may not be suspended at the time we first see it, and may release the monitor, but then it may get suspended before we call:

  owning_thread = Threads::owning_thread_from_monitor_owner(tlh.list(), owner);

and so we think it is still the monitor owner and proceed to query the monitor information in a racy way. This can't happen when suspension itself requires a safepoint as the current thread won't go to that safepoint during this code. However, if suspension is implemented via a direct handshake with the target thread then we have a problem.
David
-----
Thanks,

Yasumasa
[1] http://hg.openjdk.java.net/jdk/jdk/file/76a17c8143d8/src/hotspot/share/prims/jvmtiEnvBase.cpp#l973
[2] http://hg.openjdk.java.net/jdk/jdk/file/76a17c8143d8/src/hotspot/share/prims/jvmtiEnvBase.cpp#l996
