Hi Robbin,

On 29/10/2018 6:08 AM, Robbin Ehn wrote:
Hi Dan,

Thanks for looking at this, here is the update:
Inc: http://cr.openjdk.java.net/~rehn/8212933/v2/inc/webrev/
Full: http://cr.openjdk.java.net/~rehn/8212933/v2/webrev/

I can't say I really understand the change in protocol here and why all the cancel operations are no longer needed. I see the handshake VM operations reusing the initial "threads list" but I'm unclear how they might be affected if additional threads are added to the system before the Threads_lock is acquired?

A couple of specific comments:

src/hotspot/share/runtime/handshake.hpp

cancel_inner() is dead now.

---

src/hotspot/share/runtime/handshake.cpp

This was an odd looking for loop before your change and now looks even more strange:

 for ( ; JavaThread *thr = jtiwh.next(); ) {

can it not simply be a more normal looking:

 for (JavaThread *thr = jtiwh.next(); thr != NULL; thr = jtiwh.next()) {

?

---

Thanks,
David

/Robbin

On 26/10/2018 17:38, Daniel D. Daugherty wrote:
On 10/26/18 10:33 AM, Robbin Ehn wrote:
Hi, please review.

When the VM thread executes a handshake it uses different ThreadsLists during the execution. A JavaThread that is armed for the handshake when it is already in the exit path in VM will cancel the handshake. Even if the VM thread cannot see this thread after the initial ThreadsList which where used for arming, the
handshake can progress when the exiting thread cancels the handshake.

But if a third thread takes a ThreadsList where the exiting JavaThread is present and tries to execute a VM operation, hence waiting on VM thread to finish the handshake, the JavaThread in the exit path can never reach the handshake cancellation point. VM thread cannot finishes the handshake and the third thread is stuck waiting on the VM thread.

To allow holding a ThreadsList when executing a VM operation we instead let the VM thread use the same ThreadsList over the entire handshake making all armed threads visible to the VM thread at all time. And if VM thread spots a terminated thread it will count that thread is already done by only clearing
it's operation.

Passes local stress testing, t1-5 and the deadlock is no longer reproduce-able.
Added a jtreg handshake + thread suspend test as a reproducer.

Issue: https://bugs.openjdk.java.net/browse/JDK-8212933
Code: http://cr.openjdk.java.net/~rehn/8212933/v1/webrev/

src/hotspot/share/runtime/handshake.hpp
     No comments.

src/hotspot/share/runtime/handshake.cpp
     L358: void HandshakeState::process_by_vmthread(JavaThread* target) {
     L359:   assert(Thread::current()->is_VM_thread(), "should call from vm thread");
         Both calls to handshake_process_by_vmthread() which calls this
         function are made with the Threads_lock held:

         MutexLockerEx ml(Threads_lock, Mutex::_no_safepoint_check_flag);

         Looks like the lock is grabbed because of
         possibly_vmthread_can_process_handshake() which asserts:

         L351:   // An externally suspended thread cannot be resumed while the
         L352:   // Threads_lock is held so it is safe.
         L353:   // Note that this method is allowed to produce false positives.          L354:   assert(Threads_lock->owned_by_self(), "Not holding Threads_lock.");
         L355:   if (target->is_ext_suspended()) {
         L356:     return true;
         L357:   }

         Also looks like vmthread_can_process_handshake() needs the
         Threads_lock for the same externally suspended thread check.

         So I was going to ask that you add:

         assert(Threads_lock->owned_by_self(), "Not holding Threads_lock.");

         after L359, but how about a comment instead:

         // Threads_lock must be held here, but that is assert()ed in
         // possibly_vmthread_can_process_handshake().


src/hotspot/share/runtime/thread.hpp
     No comments.

src/hotspot/share/runtime/thread.cpp
     No comments.

src/hotspot/share/runtime/threadSMR.cpp
     No comments.

test/hotspot/jtreg/runtime/handshake/HandshakeWalkSuspendExitTest.java
     Very nice test! It specifically exercises ThreadLocalHandshakes
     with JavaThread suspend/resume. runtime/Thread/SuspendAtExit.java
     only ran into this bug by accident (JDK-8212152) so I like the
     targeted test.

     L49:         while(!exit_now) {
         nit - please add a space before '('

     L51:             for (int i = 0; i < _threads.length; i+=2) {
     L58:             for (int i = 0; i < _threads.length; i+=2) {
         nit - please added spaces around '+='

         So why every other thread? A comment would be good...

     L52:                 wb.handshakeWalkStack(null, true);
         I'm guessing the 'null' parameter means current thread, but
         that's a guess on my part. A comment would be good.

     L82:         for (int i = 0; i < _threads.length; i++) {
     L83:             _threads[i].join();
     L84:         }
         Thanks for cleaning up the test_threads. That will make
         the JTREG thread sweeper happy. However, you don't save
         the test_exit_thread references and you don't clean those
         up either. Yes, I realize that they are supposed to exit,
         but if something hangs up on exit, I'd rather have a join()
         hang failure in this test's code than have the JTREG thread
         sweeper catch it.

Dan


Thanks, Robbin

Reply via email to