Clarification ...
On 17/12/2019 12:40 pm, coleen.phillim...@oracle.com wrote:
Short answer below.
On 12/16/19 5:51 PM, David Holmes wrote:
Hi Coleen,
A quick initial response ...
On 16/12/2019 11:26 pm, coleen.phillim...@oracle.com wrote:
On 12/16/19 8:04 AM, David Holmes wrote:
Hi Coleen,
On 16/12/2019 9:41 pm, coleen.phillim...@oracle.com wrote:
Summary: Start ServiceThread before compiler threads, and run
nmethod barriers for zgc before adding to the service thread queue,
or posting the events on the java thread queue.
I can't comment on most of this but the earlier starting of the
service thread has some concerns:
- there is a lot of JDK level initialization which now will not have
happened before the service thread is started and it is far from
obvious that all possible initialization dependencies will be satisfied
I agree that the order of initialization is very sensitive. From the
actions that the service thread does, the one that I found was a
problem was that events were posted before the LIVE phase (see
comment in has_events()), which could have happened with the existing
code, but the window for the race is a lot smaller. The other
actions can be run if there's a GC before initialization but would be
a bug in the initialization code, and I didn't find these bugs in all
my testing. There are some ordering dependencies that do have odd
side effects (between the compiler thread startup and initialization
jsr292 classes) which have comments. This patch doesn't touch those.
- current starting of the service thread in Management::initialize
is guarded by "#if INCLUDE_MANAGEMENT", but now you are starting the
service thread unconditionally for all builds. Hmm just saw your
latest comment to the bug report - so the service thread is now (for
quite some time?) being used for other than management tasks and so
should always be present even if INCLUDE_MANAGEMENT is not enabled.
Is that sufficient or are there likely to be other changes needed to
actually ensure that all works correctly e.g. any code the service
thread executes that is only defined for INCLUDE_MANAGEMENT will
need to be compiled out explicitly.
I asked Jie offline to check the minimal build. I don't think there
are other INCLUDE_MANAGEMENT actions in the service thread and I'm
not sure why it was initialized there in the first place. The
minimal vm would have been broken ie. hashtables would not have been
cleaned up, etc, but I'm not sure how well that is tested or if one
would notice.
- the service thread and the notification thread are (were?) closely
related but now started at completely different times
The notification thread is limited to "services" so it makes sense
where it is. The ServiceThread does lots of other things. Maybe it
needs renaming in 15.
The bug report states the problem as:
"The graal crash is because compiled_method_load events are added to
the ServiceThread's deferred event queue before the ServiceThread is
created so are not walked to keep them from being zombied."
so why isn't the solution to ensure the deferred event queue is
walked? I'm not clear how starting the service thread relates to
walking the queue.
The service thread is responsible for walking the deferred event
queue. See ServiceThread::oops_do/nmethods_do. The design could
be changed to have some global walk somewhere of this queue, but
essentially this queue is processed by the service thread.
Sorry I don't follow. I thought "oops_do" and friends are for the GC
threads and/or VMThread to call to process oops when GC updates them.
The oops_do and nmethods_do() can be called by a thread walk in
handshakes (by the sweeper thread) and by parallel GC thread walks.
There isn't a single entry to do the thread-specific closures that we
need to do for these deferred event queues. I tried a version that
walked the queues with a static call but missed some places where it
would be needed to make this call (didn't work). Keeping this
associated with the ServiceThread simplifies a lot.
Just to clarify that further, the thread walk requires the thread
appears in ALL_JAVA_THREADS but that only happens after the
ServiceThread has been started. So in essence we don't really need the
ServiceThread to have commenced execution earlier, but we need it to
have been created. Those two steps are combined in practice.
Cheers,
David
thanks,
Coleen
David
-----
I had an additional change to make the queue non-static but want to
limit the change at this point.
Thanks,
Coleen
Thanks,
David
See bug for description of the problems found with the new
Zombie.java test.
open webrev at
http://cr.openjdk.java.net/~coleenp/2019/8235829.01/webrev
bug link https://bugs.openjdk.java.net/browse/JDK-8235829
Ran tier1 all platforms, and tier2-8 testing, as well as rerunning
original test failure from bug
https://bugs.openjdk.java.net/browse/JDK-8173361.
Thanks,
Coleen