Re: [14] RFR 8235829: graal crashes with Zombie.java test

coleen . phillimore Wed, 18 Dec 2019 14:07:22 -0800

Thanks Serguei!
Coleen

On 12/18/19 1:33 PM, [email protected] wrote:

Hi Coleen,
Just wanted to confirm the webrev V2 version looks okay to me.
Sorry for replying on the wrong mailing thread.

Thanks,
Serguei


On 12/18/19 08:42, [email protected] wrote:
On 12/18/19 8:45 AM, David Holmes wrote:
Thanks for the additional info Coleen!
Just to add a bit more to the initialization history. The ServiceThread is a generalization of the LowMemoryDetectorThread that was part of the management API, and so it was initialized in Management::initialize. When it turned into the ServiceThread - to process JVMTI deferred events in addition to the low-memory-detector events - the initialization placement remained the same. Then later the INCLUDE_MANAGEMENT guards were added (JDK-7189254, October 2012). Later still we started adding other items of work for the ServiceThread. The earliest was the AllocationContextService notification in September 2014, but as that no longer exists I can't tell if that was the first non-management related use. Then the StringTable use was added 18 months ago - which definitely was outside the realm of the management API. So that is when the Minimal VM was first "broken". So it is good that this is fixed.
With regard to the placement in the initialization order, my remaining concern was with JVMTI event processing that might happen via events generated very early in the init sequence. But you have now modified things so that we will only process events in the LIVE phase, which only activates after all the class library initialization is complete.
So overall I'm no longer significantly concerned about the change to the initialization order as I think you have it all covered. Thanks for bearing with me and all the off-list discussion.
Thank you for having this discussion with me and provoking me to recheck the ServiceThread. I think we can do further work to future-proof the initialization order, but the design needs to be improved.
Thanks for reviewing,
Coleen
Cheers,
David
-----

On 18/12/2019 1:27 am, [email protected] wrote:
On 12/16/19 11:04 PM, David Holmes wrote:
Clarification ...

On 17/12/2019 12:40 pm, [email protected] wrote:
Short answer below.

On 12/16/19 5:51 PM, David Holmes wrote:
Hi Coleen,

A quick initial response ...

On 16/12/2019 11:26 pm, [email protected] wrote:
On 12/16/19 8:04 AM, David Holmes wrote:
Hi Coleen,

On 16/12/2019 9:41 pm, [email protected] wrote:
Summary: Start ServiceThread before compiler threads, and run nmethod barriers for ZGC before adding to the service thread queue, or posting the events on the java thread queue.
I can't comment on most of this but the earlier starting of the service thread has some concerns:
- there is a lot of JDK level initialization which now will not have happened before the service thread is started and it is far from obvious that all possible initialization dependencies will be satisfied
I agree that the order of initialization is very sensitive. Of the actions that the service thread does, the one that I found was a problem was that events were posted before the LIVE phase (see comment in has_events()), which could have happened with the existing code, but the window for the race is a lot smaller. The other actions can be run if there's a GC before initialization, but that would be a bug in the initialization code, and I didn't find any such bugs in all my testing. There are some ordering dependencies that do have odd side effects (between the compiler thread startup and initialization of jsr292 classes) which have comments. This patch doesn't touch those.
- current starting of the service thread in Management::initialize is guarded by "#if INCLUDE_MANAGEMENT", but now you are starting the service thread unconditionally for all builds. Hmm, just saw your latest comment to the bug report - so the service thread is now (for quite some time?) being used for other than management tasks and so should always be present even if INCLUDE_MANAGEMENT is not enabled. Is that sufficient, or are there likely to be other changes needed to actually ensure that all works correctly, e.g. any code the service thread executes that is only defined for INCLUDE_MANAGEMENT will need to be compiled out explicitly.
I asked Jie offline to check the minimal build. I don't think there are other INCLUDE_MANAGEMENT actions in the service thread and I'm not sure why it was initialized there in the first place. The minimal VM would have been broken, i.e. hashtables would not have been cleaned up, etc., but I'm not sure how well that is tested or if one would notice.
- the service thread and the notification thread are (were?) closely related but now started at completely different times
The notification thread is limited to "services" so it makes sense where it is. The ServiceThread does lots of other things. Maybe it needs renaming in 15.
The bug report states the problem as:
"The graal crash is because compiled_method_load events are added to the ServiceThread's deferred event queue before the ServiceThread is created so are not walked to keep them from being zombied."
so why isn't the solution to ensure the deferred event queue is walked? I'm not clear how starting the service thread relates to walking the queue.
The service thread is responsible for walking the deferred event queue. See ServiceThread::oops_do/nmethods_do. The design could be changed to have some global walk somewhere of this queue, but essentially this queue is processed by the service thread.
Sorry I don't follow. I thought "oops_do" and friends are for the GC threads and/or VMThread to call to process oops when GC updates them.
The oops_do and nmethods_do() can be called by a thread walk in handshakes (by the sweeper thread) and by parallel GC thread walks. There isn't a single entry point to do the thread-specific closures that we need to do for these deferred event queues. I tried a version that walked the queues with a static call but missed some places where it would be needed to make this call (didn't work). Keeping this associated with the ServiceThread simplifies a lot.
Just to clarify that further, the thread walk requires the thread to appear in ALL_JAVA_THREADS, but that only happens after the ServiceThread has been started. So in essence we don't really need the ServiceThread to have commenced execution earlier, but we need it to have been created. Those two steps are combined in practice.
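To make that dependency concrete, here is a minimal C++ sketch, not HotSpot code: NMethod, DeferredEventQueue, JavaThreadSketch and ThreadListSketch are hypothetical stand-ins. It assumes, as described above, that the deferred event queue is only reachable through the service thread's nmethods_do(), and that a GC or sweeper walk only visits threads already on the thread list. An nmethod enqueued before the service thread is registered is therefore not visited, which is the window in which it could be zombied.

```cpp
#include <functional>
#include <vector>

// Hypothetical stand-in for a compiled method that must stay alive
// while a deferred JVMTI event still refers to it.
struct NMethod {
  bool visited = false;  // set when a GC/sweeper walk reaches it
};

// Hypothetical deferred event queue; only reachable via the service thread.
struct DeferredEventQueue {
  std::vector<NMethod*> pending;
  void enqueue(NMethod* nm) { pending.push_back(nm); }
  void nmethods_do(const std::function<void(NMethod*)>& f) const {
    for (NMethod* nm : pending) f(nm);
  }
};

// Hypothetical thread object: only the service thread owns the queue.
struct JavaThreadSketch {
  const DeferredEventQueue* queue = nullptr;  // non-null only for the service thread
  void nmethods_do(const std::function<void(NMethod*)>& f) const {
    if (queue != nullptr) queue->nmethods_do(f);
  }
};

// Hypothetical ALL_JAVA_THREADS: a thread is visible only after add().
struct ThreadListSketch {
  std::vector<const JavaThreadSketch*> threads;
  void add(const JavaThreadSketch* t) { threads.push_back(t); }
  // A GC/sweeper-style walk marks every nmethod reachable from a registered thread.
  void mark_reachable_nmethods() const {
    for (const JavaThreadSketch* t : threads) {
      t->nmethods_do([](NMethod* nm) { nm->visited = true; });
    }
  }
};
```

In this model it is registering the thread (add()), not running its loop, that makes the queue reachable, which matches the point that creation is what matters.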
Yes. Then the ServiceThread waits on the Service_lock until notified by these events:
      while (((sensors_changed = (!UseNotificationThread && LowMemoryDetector::has_pending_requests())) |
              (has_jvmti_events = _jvmti_service_queue.has_events()) |
              (has_gc_notification_event = (!UseNotificationThread && GCNotifier::has_event())) |
              (has_dcmd_notification_event = (!UseNotificationThread && DCmdFactory::has_pending_jmx_notification())) |
              (stringtable_work = StringTable::has_work()) |
              (symboltable_work = SymbolTable::has_work()) |
              (resolved_method_table_work = ResolvedMethodTable::has_work()) |
              (thread_id_table_work = ThreadIdTable::has_work()) |
              (protection_domain_table_work = SystemDictionary::pd_cache_table()->has_work()) |
              (oopstorage_work = OopStorage::has_cleanup_work_and_reset())
             ) == 0) {
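The loop condition above combines its terms with bitwise | rather than ||. A small C++ sketch with hypothetical names (WorkFlags, poll_all_bitwise, poll_all_logical) shows the difference: | does not short-circuit, so every has_work()-style call runs and every flag is assigned on each check. That this is the reason for | in the loop is presumably the intent, not something stated in this thread.

```cpp
// Hypothetical flags standing in for stringtable_work, symboltable_work, etc.
struct WorkFlags {
  bool stringtable = false;
  bool symboltable = false;
};

// Same shape as the loop condition: '|' evaluates both operands,
// so both flags are refreshed on every call.
bool poll_all_bitwise(bool st_has_work, bool sym_has_work, WorkFlags& out) {
  return ((out.stringtable = st_has_work) |
          (out.symboltable = sym_has_work)) != 0;
}

// With '||', once the first operand is true the second assignment is
// skipped and out.symboltable keeps its previous value.
bool poll_all_logical(bool st_has_work, bool sym_has_work, WorkFlags& out) {
  return (out.stringtable = st_has_work) || (out.symboltable = sym_has_work);
}
```

With both sources reporting work, poll_all_bitwise sets both flags, while poll_all_logical leaves symboltable false; a side-effecting check such as a "has work and reset" call would likewise be skipped under ||.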
The first, third and fourth events are from management.cpp events that were initialized after the ServiceThread was started. I have changed the second event to wait until the LIVE phase before returning true. The stringtable, symboltable, resolved_method_table, thread_id and pd tables have static _has_work variables initialized to false. The oopstorage_work check is similar, but may update a time-based counter a bit earlier with the service thread starting earlier. I think this is harmless.
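The LIVE-phase gating on has_events() can be sketched as follows, assuming a simplified model: the Phase enum and the JvmtiDeferredQueueSketch class are hypothetical and only stand in for the VM's JVMTI phase tracking and _jvmti_service_queue. Events that arrive early stay queued; they are just not reported until the phase becomes LIVE.

```cpp
#include <deque>

// Hypothetical subset of JVMTI phases (the real ones come from jvmti.h).
enum class Phase { Onload, Primordial, Start, Live, Dead };

// Hypothetical deferred queue: has_events() reports nothing before LIVE,
// so the service thread will not post events that were queued early.
class JvmtiDeferredQueueSketch {
 public:
  void set_phase(Phase p) { phase_ = p; }
  void enqueue(int event_id) { events_.push_back(event_id); }
  bool has_events() const {
    return phase_ == Phase::Live && !events_.empty();
  }
 private:
  Phase phase_ = Phase::Primordial;
  std::deque<int> events_;  // event ids kept until they can be posted
};
```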
It is possible that after the service thread starts and before the compiler thread starts, there could be a GC that notifies the stringtable to clean up. This seems like a good thing: with this order, these tables would be cleaned up after such a GC. I ran the tier4 graal tests and there were no failures.
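Why a GC-triggered cleanup request made before the service thread begins waiting is not lost can be sketched with a flag and a condition variable. ServiceLockSketch is hypothetical; it assumes, like the static _has_work variables mentioned above, that the flag starts false, is set under the lock before notifying, and that the waiter checks the flag under the lock before blocking.

```cpp
#include <condition_variable>
#include <mutex>

// Hypothetical stand-in for Service_lock plus one table's _has_work flag.
class ServiceLockSketch {
 public:
  // GC side: record that cleanup is needed, then wake any waiter.
  void trigger_cleanup() {
    std::lock_guard<std::mutex> guard(mu_);
    has_work_ = true;
    cv_.notify_all();
  }
  // Service-thread side: the predicate is checked before blocking, so a
  // request made earlier is seen immediately. Returns cleanups done (1).
  int wait_and_process() {
    std::unique_lock<std::mutex> lock(mu_);
    cv_.wait(lock, [this] { return has_work_; });
    has_work_ = false;  // consume the request
    return 1;
  }
 private:
  std::mutex mu_;
  std::condition_variable cv_;
  bool has_work_ = false;  // starts false, like the static _has_work
};
```

Calling trigger_cleanup() first and wait_and_process() afterwards returns without blocking, which is the situation of a GC that runs between service thread startup and the service thread's first wait.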
Thanks,
Coleen
Cheers,
David
thanks,
Coleen
David
-----
I had an additional change to make the queue non-static but want to limit the change at this point.
Thanks,
Coleen
Thanks,
David
See bug for description of the problems found with the new Zombie.java test.
open webrev at http://cr.openjdk.java.net/~coleenp/2019/8235829.01/webrev
bug link https://bugs.openjdk.java.net/browse/JDK-8235829
Ran tier1 all platforms, and tier2-8 testing, as well as rerunning the original test failure from bug https://bugs.openjdk.java.net/browse/JDK-8173361.
Thanks,
Coleen
