Re: [14] RFR 8235829: graal crashes with Zombie.java test

David Holmes Wed, 18 Dec 2019 05:48:54 -0800

Thanks for the additional info Coleen!

Just to add a bit more to the initialization history. The ServiceThread is a generalization of the LowMemoryDetectorThread that was part of the management API, and so it was initialized in Management::initialize. When it turned into the ServiceThread - to process JVMTI deferred events in addition to the low-memory-detector events - the initialization placement remained the same. Then later the INCLUDE_MANAGEMENT guards were added (JDK-7189254, October 2012). Later still we started adding other items of work for the ServiceThread. The earliest was the AllocationContextService notification in September 2014 but as that no longer exists I can't tell if that was the first non-management related use. Then the StringTable use was added 18 months ago - which definitely was outside the realm of the management API. So that is when the MinimalVM was first "broken". So it is good that is fixed.

With regard to the placement in the initialization order, my remaining concern was with JVMTI event processing that might happen via events generated very early in the init sequence. But you have now modified things so that we will only process events in the LIVE phase, which only activates after all the class library initialization is complete.
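
The phase gating described above can be sketched as follows. This is only an illustration, not the HotSpot code: the Phase enum mirrors the JVMTI phase names, while SketchEventQueue and its members are hypothetical names invented for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <string>
#include <utility>

// Hypothetical mirror of the JVMTI phases; not the HotSpot type.
enum class Phase { Onload, Primordial, Start, Live, Dead };

// Hypothetical deferred event queue. Events may be enqueued at any time,
// but the queue only reports pending work once the LIVE phase is reached.
class SketchEventQueue {
 public:
  void enqueue(std::string event) {
    assert(!event.empty());
    events_.push_back(std::move(event));
  }

  // Returns true only when events are queued AND the phase is Live.
  bool has_events(Phase current) const {
    return current == Phase::Live && !events_.empty();
  }

  std::size_t size() const { return events_.size(); }

 private:
  std::deque<std::string> events_;
};
```

With this shape, an event queued during early startup stays in the queue but does not wake the consumer until the phase becomes Live.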

So overall I'm no longer significantly concerned about the change to the initialization order as I think you have it all covered. Thanks for bearing with me and all the off-list discussion.


Cheers,
David
-----

On 18/12/2019 1:27 am, coleen.phillim...@oracle.com wrote:

On 12/16/19 11:04 PM, David Holmes wrote:
Clarification ...

On 17/12/2019 12:40 pm, coleen.phillim...@oracle.com wrote:
Short answer below.

On 12/16/19 5:51 PM, David Holmes wrote:
Hi Coleen,

A quick initial response ...

On 16/12/2019 11:26 pm, coleen.phillim...@oracle.com wrote:
On 12/16/19 8:04 AM, David Holmes wrote:
Hi Coleen,

On 16/12/2019 9:41 pm, coleen.phillim...@oracle.com wrote:
Summary: Start ServiceThread before compiler threads, and run nmethod barriers for zgc before adding to the service thread queue, or posting the events on the java thread queue.
I can't comment on most of this but the earlier starting of the service thread has some concerns:
- there is a lot of JDK level initialization which now will not have happened before the service thread is started and it is far from obvious that all possible initialization dependencies will be satisfied
I agree that the order of initialization is very sensitive. From the actions that the service thread does, the one that I found was a problem was that events were posted before the LIVE phase (see comment in has_events()), which could have happened with the existing code, but the window for the race is a lot smaller. The other actions can be run if there's a GC before initialization but would be a bug in the initialization code, and I didn't find these bugs in all my testing. There are some ordering dependencies that do have odd side effects (between the compiler thread startup and initialization jsr292 classes) which have comments. This patch doesn't touch those.
- current starting of the service thread in Management::initialize is guarded by "#if INCLUDE_MANAGEMENT", but now you are starting the service thread unconditionally for all builds. Hmm just saw your latest comment to the bug report - so the service thread is now (for quite some time?) being used for other than management tasks and so should always be present even if INCLUDE_MANAGEMENT is not enabled. Is that sufficient or are there likely to be other changes needed to actually ensure that all works correctly e.g. any code the service thread executes that is only defined for INCLUDE_MANAGEMENT will need to be compiled out explicitly.
I asked Jie offline to check the minimal build. I don't think there are other INCLUDE_MANAGEMENT actions in the service thread and I'm not sure why it was initialized there in the first place. The minimal vm would have been broken ie. hashtables would not have been cleaned up, etc, but I'm not sure how well that is tested or if one would notice.
- the service thread and the notification thread are (were?) closely related but now started at completely different times
The notification thread is limited to "services" so it makes sense where it is. The ServiceThread does lots of other things. Maybe it needs renaming in 15.
The bug report states the problem as:
"The graal crash is because compiled_method_load events are added to the ServiceThread's deferred event queue before the ServiceThread is created so are not walked to keep them from being zombied."
so why isn't the solution to ensure the deferred event queue is walked? I'm not clear how starting the service thread relates to walking the queue.
The service thread is responsible for walking the deferred event queue. See ServiceThread::oops_do/nmethods_do. The design could be changed to have some global walk somewhere of this queue, but essentially this queue is processed by the service thread.
Sorry I don't follow. I thought "oops_do" and friends are for the GC threads and/or VMThread to call to process oops when GC updates them.
The oops_do and nmethods_do() can be called by a thread walk in handshakes (by the sweeper thread) and by parallel GC thread walks. There isn't a single entry to do the thread-specific closures that we need to do for these deferred event queues. I tried a version that walked the queues with a static call but missed some places where it would be needed to make this call (didn't work). Keeping this associated with the ServiceThread simplifies a lot.
Just to clarify that further, the thread walk requires the thread appears in ALL_JAVA_THREADS but that only happens after the ServiceThread has been started. So in essence we don't really need the ServiceThread to have commenced execution earlier, but we need it to have been created. Those two steps are combined in practice.
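
The dependency described above (a queue owned by a thread is only visited once that thread is registered for the walk) can be sketched as below. This is a simplified model, not the HotSpot implementation: SketchThread, SketchThreadList, events_do and walk are hypothetical names used only for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a thread that owns a deferred event queue.
struct SketchThread {
  std::vector<int> deferred_events;  // placeholder event ids

  // Visit every queued event, as a per-thread closure walk would.
  template <typename Visitor>
  void events_do(Visitor visit) const {
    for (int e : deferred_events) visit(e);
  }
};

// Hypothetical registry standing in for the list of known threads.
struct SketchThreadList {
  std::vector<const SketchThread*> threads;

  void add(const SketchThread* t) {
    assert(t != nullptr);
    threads.push_back(t);
  }

  // Walk all registered threads and count the events that were visited.
  std::size_t walk() const {
    std::size_t visited = 0;
    for (const SketchThread* t : threads) {
      t->events_do([&visited](int) { ++visited; });
    }
    return visited;
  }
};
```

In this model, events queued on a thread that has not yet been added to the list are invisible to walk(), which is the situation the bug report describes for events queued before the ServiceThread exists.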
Yes. Then the ServiceThread waits on the Service_lock until notified by these events:
    while (((sensors_changed = (!UseNotificationThread && LowMemoryDetector::has_pending_requests())) |
            (has_jvmti_events = _jvmti_service_queue.has_events()) |
            (has_gc_notification_event = (!UseNotificationThread && GCNotifier::has_event())) |
            (has_dcmd_notification_event = (!UseNotificationThread && DCmdFactory::has_pending_jmx_notification())) |
            (stringtable_work = StringTable::has_work()) |
            (symboltable_work = SymbolTable::has_work()) |
            (resolved_method_table_work = ResolvedMethodTable::has_work()) |
            (thread_id_table_work = ThreadIdTable::has_work()) |
            (protection_domain_table_work = SystemDictionary::pd_cache_table()->has_work()) |
            (oopstorage_work = OopStorage::has_cleanup_work_and_reset())
           ) == 0) {
The first, third and fourth events are from management.cpp events that were initialized after the ServiceThread was started.
The second event I have changed, to wait until LIVE phase to return true.
The stringtable, symboltable, resolved_method_table, thread_id and pd table have static _has_work variables initialized to false. The oopstorage_work has similar, but may update a time-based counter a bit earlier with the service thread starting earlier. I think this is harmless.
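
As a side note on the wait condition quoted above: it combines the checks with `|` rather than `||`, so every check is evaluated and every flag is assigned even when an earlier one is already true. A minimal sketch of that pattern follows, using hypothetical names (WorkFlags, any_work) and plain bool inputs in place of the real has_work() calls.

```cpp
#include <cassert>

// Hypothetical record of which kinds of work are pending.
struct WorkFlags {
  bool jvmti = false;
  bool stringtable = false;
  bool symboltable = false;
};

// Records every pending flag and reports whether any work exists.
// Bitwise '|' does not short-circuit, so all three assignments always run.
// The casts to int only keep the operands non-boolean for the bitwise OR.
inline bool any_work(WorkFlags& flags, bool jvmti_pending,
                     bool stringtable_pending, bool symboltable_pending) {
  return (static_cast<int>(flags.jvmti = jvmti_pending) |
          static_cast<int>(flags.stringtable = stringtable_pending) |
          static_cast<int>(flags.symboltable = symboltable_pending)) != 0;
}
```

Had `||` been used instead, a true first operand would skip the remaining assignments and leave the later flags stale.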
It is possible that after the service thread starts and before the compiler thread starts, there could be a GC that notifies the stringtable to clean up. This seems like a good thing that the GC would clean up these tables with this order. I ran the tier4 graal tests and there were no failures.
Thanks,
Coleen
Cheers,
David
thanks,
Coleen
David
-----
I had an additional change to make the queue non-static but want to limit the change at this point.
Thanks,
Coleen
Thanks,
David
See bug for description of the problems found with the new Zombie.java test.
open webrev at http://cr.openjdk.java.net/~coleenp/2019/8235829.01/webrev
bug link https://bugs.openjdk.java.net/browse/JDK-8235829
Ran tier1 all platforms, and tier2-8 testing, as well as rerunning original test failure from bug https://bugs.openjdk.java.net/browse/JDK-8173361.
Thanks,
Coleen
