Thanks Serguei!
Coleen
On 12/18/19 1:33 PM, serguei.spit...@oracle.com wrote:
Hi Coleen,
Just wanted to confirm the webrev V2 version looks okay to me.
Sorry for replying on the wrong mailing thread.
Thanks,
Serguei
On 12/18/19 08:42, coleen.phillim...@oracle.com wrote:
On 12/18/19 8:45 AM, David Holmes wrote:
Thanks for the additional info Coleen!
Just to add a bit more to the initialization history. The
ServiceThread is a generalization of the LowMemoryDetectorThread
that was part of the management API, and so it was initialized in
Management::initialize. When it turned into the ServiceThread - to
process JVMTI deferred events in addition to the low-memory-detector
events - the initialization placement remained the same. Then later
the INCLUDE_MANAGEMENT guards were added (JDK-7189254, October
2012). Later still we started adding other items of work for the
ServiceThread. The earliest was the AllocationContextService
notification in September 2014 but as that no longer exists I can't
tell if that was the first non-management related use. Then the
StringTable use was added 18 months ago - which definitely was
outside the realm of the management API. So that is when the
MinimalVM was first "broken". So it is good that is fixed.
With regard to the placement in the initialization order, my
remaining concern was with JVMTI event processing that might happen
via events generated very early in the init sequence. But you have
now modified things so that we will only process events in the LIVE
phase, which only activates after all the class library
initialization is complete.
So overall I'm no longer significantly concerned about the change to
the initialization order as I think you have it all covered. Thanks
for bearing with me and all the off-list discussion.
Thank you for having this discussion with me and provoking me to
recheck the ServiceThread. I think we can do further work to
future-proof initialization order but the design needs to be improved.
Thanks for reviewing,
Coleen
Cheers,
David
-----
On 18/12/2019 1:27 am, coleen.phillim...@oracle.com wrote:
On 12/16/19 11:04 PM, David Holmes wrote:
Clarification ...
On 17/12/2019 12:40 pm, coleen.phillim...@oracle.com wrote:
Short answer below.
On 12/16/19 5:51 PM, David Holmes wrote:
Hi Coleen,
A quick initial response ...
On 16/12/2019 11:26 pm, coleen.phillim...@oracle.com wrote:
On 12/16/19 8:04 AM, David Holmes wrote:
Hi Coleen,
On 16/12/2019 9:41 pm, coleen.phillim...@oracle.com wrote:
Summary: Start ServiceThread before compiler threads, and run
nmethod barriers for zgc before adding to the service thread
queue, or posting the events on the java thread queue.
I can't comment on most of this but the earlier starting of
the service thread has some concerns:
- there is a lot of JDK level initialization which now will
not have happened before the service thread is started and it
is far from obvious that all possible initialization
dependencies will be satisfied
I agree that the order of initialization is very sensitive.
From the actions that the service thread does, the one that I
found was a problem was that events were posted before the LIVE
phase (see comment in has_events()), which could have happened
with the existing code, but the window for the race is a lot
smaller. The other actions can be run if there's a GC before
initialization but would be a bug in the initialization code,
and I didn't find these bugs in all my testing. There are some
ordering dependencies that do have odd side effects (between
the compiler thread startup and initialization jsr292 classes)
which have comments. This patch doesn't touch those.
- current starting of the service thread in
Management::initialize is guarded by "#if INCLUDE_MANAGEMENT",
but now you are starting the service thread unconditionally
for all builds. Hmm just saw your latest comment to the bug
report - so the service thread is now (for quite some time?)
being used for other than management tasks and so should
always be present even if INCLUDE_MANAGEMENT is not enabled.
Is that sufficient or are there likely to be other changes
needed to actually ensure that all works correctly e.g. any
code the service thread executes that is only defined for
INCLUDE_MANAGEMENT will need to be compiled out explicitly.
I asked Jie offline to check the minimal build. I don't think
there are other INCLUDE_MANAGEMENT actions in the service
thread and I'm not sure why it was initialized there in the
first place. The minimal vm would have been broken ie.
hashtables would not have been cleaned up, etc, but I'm not
sure how well that is tested or if one would notice.
- the service thread and the notification thread are (were?)
closely related but now started at completely different times
The notification thread is limited to "services" so it makes
sense where it is. The ServiceThread does lots of other
things. Maybe it needs renaming in 15.
The bug report states the problem as:
"The graal crash is because compiled_method_load events are
added to the ServiceThread's deferred event queue before the
ServiceThread is created so are not walked to keep them from
being zombied."
so why isn't the solution to ensure the deferred event queue
is walked? I'm not clear how starting the service thread
relates to walking the queue.
The service thread is responsible for walking the deferred
event queue. See ServiceThread::oops_do/nmethods_do. The
design could be changed to have some global walk somewhere of
this queue, but essentially this queue is processed by the
service thread.
Sorry I don't follow. I thought "oops_do" and friends are for
the GC threads and/or VMThread to call to process oops when GC
updates them.
The oops_do and nmethods_do() can be called by a thread walk in
handshakes (by the sweeper thread) and by parallel GC thread
walks. There isn't a single entry to do the thread-specific
closures that we need to do for these deferred event queues. I
tried a version that walked the queues with a static call but
missed some places where it would be needed to make this call
(didn't work). Keeping this associated with the ServiceThread
simplifies a lot.
Just to clarify that further, the thread walk requires the thread
appears in ALL_JAVA_THREADS but that only happens after the
ServiceThread has been started. So in essence we don't really need
the ServiceThread to have commenced execution earlier, but we need
it to have been created. Those two steps are combined in practice.
Yes. Then the ServiceThread waits on the Service_lock until
notified by these events:
while (((sensors_changed = (!UseNotificationThread &&
LowMemoryDetector::has_pending_requests())) |
(has_jvmti_events =
_jvmti_service_queue.has_events()) |
(has_gc_notification_event = (!UseNotificationThread
&& GCNotifier::has_event())) |
(has_dcmd_notification_event =
(!UseNotificationThread &&
DCmdFactory::has_pending_jmx_notification())) |
(stringtable_work = StringTable::has_work()) |
(symboltable_work = SymbolTable::has_work()) |
(resolved_method_table_work =
ResolvedMethodTable::has_work()) |
(thread_id_table_work = ThreadIdTable::has_work()) |
(protection_domain_table_work =
SystemDictionary::pd_cache_table()->has_work()) |
(oopstorage_work =
OopStorage::has_cleanup_work_and_reset())
) == 0) {
The first, third and fourth events are from management.cpp events
that were initialized after the ServiceThread was started.
The second event I have changed, to wait until LIVE phase to return
true.
The stringtable, symboltable, resolved_method_table, thread_id and
pd table have static _has_work variables initialized to false.
The oopstorage_work has similar, but may update a time-based
counter a bit earlier with the service thread starting earlier. I
think this is harmless.
It is possible that after the service thread starts and before the
compiler thread starts, there could be a GC that notifies the
stringtable to clean up. This seems like a good thing that the GC
would clean up these tables with this order. I ran the tier4 graal
tests and there were no failures.
Thanks,
Coleen
Cheers,
David
thanks,
Coleen
David
-----
I had an additional change to make the queue non-static but
want to limit the change at this point.
Thanks,
Coleen
Thanks,
David
See bug for description of the problems found with the new
Zombie.java test.
open webrev at
http://cr.openjdk.java.net/~coleenp/2019/8235829.01/webrev
bug link https://bugs.openjdk.java.net/browse/JDK-8235829
Ran tier1 all platforms, and tier2-8 testing, as well as
rerunning original test failure from bug
https://bugs.openjdk.java.net/browse/JDK-8173361.
Thanks,
Coleen