On 6/15/20 9:28 PM, David Holmes wrote:
On 16/06/2020 10:57 am, Daniel D. Daugherty wrote:
On 6/15/20 7:19 PM, David Holmes wrote:
On 16/06/2020 8:40 am, Daniel D. Daugherty wrote:
On 6/15/20 6:14 PM, David Holmes wrote:
Hi Dan,
On 15/06/2020 11:38 pm, Daniel D. Daugherty wrote:
On 6/15/20 3:26 AM, David Holmes wrote:
On 15/06/2020 4:02 pm, Yasumasa Suenaga wrote:
Hi David,
On 2020/06/15 14:15, David Holmes wrote:
Hi Yasumasa,
On 15/06/2020 2:49 pm, Yasumasa Suenaga wrote:
Hi all,
I wonder why JvmtiEnvBase::get_object_monitor_usage()
(implementation of GetObjectMonitorUsage()) does not
perform at safepoint.
GetObjectMonitorUsage will use a safepoint if the target
is not suspended:
jvmtiError
JvmtiEnv::GetObjectMonitorUsage(jobject object,
jvmtiMonitorUsage* info_ptr) {
JavaThread* calling_thread = JavaThread::current();
jvmtiError err =
get_object_monitor_usage(calling_thread, object, info_ptr);
if (err == JVMTI_ERROR_THREAD_NOT_SUSPENDED) {
// Some of the critical threads were not suspended.
go to a safepoint and try again
VM_GetObjectMonitorUsage op(this, calling_thread,
object, info_ptr);
VMThread::execute(&op);
err = op.result();
}
return err;
} /* end GetObject */
I saw this code, so I guess there are some cases when
JVMTI_ERROR_THREAD_NOT_SUSPENDED is not returned from
get_object_monitor_usage().
Monitor owner would be acquired from monitor object at
first [1], but it would perform concurrently.
If owner thread is not suspended, the owner might be
changed to others in subsequent code.
For example, the owner might release the monitor before
[2].
The expectation is that when we find an owner thread it
is either suspended or not. If it is suspended then it
cannot release the monitor. If it is not suspended we
detect that and redo the whole query at a safepoint.
I think the owner thread might resume unfortunately after
suspending check.
Yes you are right. I was thinking resuming also required a
safepoint but it only requires the Threads_lock. So yes
the code is wrong.
Which code is wrong?
Yes, a rogue resume can happen when the
GetObjectMonitorUsage() caller
has started the process of gathering the information while
not at a
safepoint. Thus the information returned by
GetObjectMonitorUsage()
might be stale, but that's a bug in the agent code.
The code tries to make sure that it either collects data
about a monitor owned by a thread that is suspended, or else
it collects that data at a safepoint. But the owning thread
can be resumed just after the code determined it was
suspended. The monitor can then be released and the
information gathered not only stale but potentially
completely wrong as it could now be owned by a different
thread and will report that thread's entry count.
If the agent is not using SuspendThread(), then as soon as
GetObjectMonitorUsage() returns to the caller the information
can be stale. In fact as soon as the implementation returns
from the safepoint that gathered the info, the target thread
could have moved on.
That isn't the issue. That the info is stale is fine. But the
expectation is that the information was actually an accurate
snapshot of the state of the monitor at some point in time.
The current code does not ensure that.
Please explain. I clearly don't understand why you think the info
returned isn't "an accurate snapshot of the state of the monitor
at some point in time".
Because it may not be a "snapshot" at all. There is no
atomicity**. The reported owner thread may not own it any longer
when the entry count is read, so straight away you may have the
wrong entry count information. The set of threads trying to
acquire the monitor, or wait on the monitor can change in
unexpected ways. It would be possible for instance to report the
same thread as being the owner, being blocked trying to enter
the monitor, and being in the wait-set of the monitor -
apparently all at the same time!
** even if the owner is suspended we don't have complete
atomicity because threads can join the set of threads trying to
enter the monitor (unless they are all suspended).
Consider the case when the monitor's owner is _not_ suspended:
- GetObjectMonitorUsage() uses a safepoint to gather the info
about
the object's monitor. Since we're at a safepoint, the info that
we are gathering cannot change until we return from the
safepoint.
It is a snapshot and a valid one at that.
Consider the case when the monitor's owner is suspended:
- GetObjectMonitorUsage() will gather info about the object's
monitor while _not_ at a safepoint. Assuming that no other
thread is suspended, then entry_count can change because
another thread can block on entry while we are gathering
info. waiter_count and waiters can change if a thread was
in a timed wait that has timed out and now that thread is
blocked on re-entry. I don't think that notify_waiter_count
and notify_waiters can change.
So in this case, the owner info and notify info is stable,
but the entry_count and waiter info is not stable.
Consider the case when the monitor is not owned:
- GetObjectMonitorUsage() will start to gather info about the
object's monitor while _not_ at a safepoint. If it finds a
thread on the entry queue that is not suspended, then it will
bail out and redo the info gather at a safepoint. I just
noticed that it doesn't check for suspension for the threads
on the waiters list so a timed Object.wait() call can cause
some confusion here.
So in this case, the owner info is not stable if a thread
comes out of a timed wait and reenters the monitor. This
case is no different than if a "barger" thread comes in
after the NULL owner field is observed and enters the
monitor. We'll return that there is no owner, a list of
suspended pending entry thread and a list of waiting
threads. The reality is that the object's monitor is
owned by the "barger" that completely bypassed the entry
queue by virtue of seeing the NULL owner field at exactly
the right time.
So the owner field is only stable when we have an owner. If
that owner is not suspended, then the other fields are also
stable because we gathered the info at a safepoint. If the
owner is suspended, then the owner and notify info is stable,
but the entry_count and waiter info is not stable.
If we have a NULL owner field, then the info is only stable
if you have a non-suspended thread on the entry list. Ouch!
That's deterministic, but not without some work.
Okay so only when we gather the info at a safepoint is all
of it a valid and stable snapshot. Unfortunately, we only
do that at a safepoint when the owner thread is not suspended
or if owner == NULL and one of the entry threads is not
suspended. If either of those conditions is not true, then
the different pieces of info is unstable to varying degrees.
As for this claim:
It would be possible for instance to report the same thread
as being the owner, being blocked trying to enter the monitor,
and being in the wait-set of the monitor - apparently all at
the same time!
I can't figure out a way to make that scenario work. If the
thread is seen as the owner and is not suspended, then we
gather info at a safepoint. If it is suspended, then it can't
then be seen as on the entry queue or on the wait queue since
it is suspended. If it is seen on the entry queue and is not
suspended, then we gather info at a safepoint. If it is
suspended on the entry queue, then it can't be seen on the
wait queue.
So the info instability of this API is bad, but it's not
quite that bad. :-) (That is a small mercy.)
Handshaking is not going to make this situation any better
for GetObjectMonitorUsage(). If the monitor is owned and we
handshake with the owner, the stability or instability of
the other fields remains the same as when SuspendThread is
used. Handshaking with all threads won't make the data as
stable as when at a safepoint because individual threads
can resume execution after doing their handshake so there
will still be field instability.
Short version: GetObjectMonitorUsage() should only gather
data at a safepoint. Yes, I've changed my mind.