Re: RFR: 8130728: Disable WorkAroundNPTLTimedWaitHang by default

Daniel D. Daugherty Fri, 10 Jul 2015 11:03:44 -0700

> http://cr.openjdk.java.net/~dholmes/8130728/webrev/


src/share/vm/runtime/globals.hpp
    No comments.

Thumbs up.

Time to see what shakes out with this option turned off.

The BSD/MacOS X blame belongs to me. Our thinking at the time was
to keep the "bsd" files matching the "linux" version unless we
had a specific reason for changing the code. Of course, we thought
we would come back later and clean things up... which just didn't
happen... I don't know the AIX history.

Dan


On 7/9/15 3:27 AM, David Holmes wrote:

https://bugs.openjdk.java.net/browse/JDK-8130728
The workaround triggered by WorkAroundNPTLTimedWaitHang (anexperimental flag set to 1 by default) was to alleviate a hang thatcould occur on a pthread_cond_timedwait on Linux, if the timeout valuewas a time in the past - see JDK-6314875. This glibc bug was fixed in2005 in glibc 2.3.5-3
https://lists.debian.org/debian-glibc/2005/08/msg00201.html
but the workaround persists and was (unfortunately) copied into theBSD and AIX ports.
It is time to remove that workaround but before we can do that we needto be sure that we are not in fact hitting the workaround code. Tothat end I propose to use this bug to switch the flag's value to 0 toverify correct operation.
If that goes well then we can remove the code either later in JDK 9 orearly in JDK 10.
To be clear the flag impacts both the wait and the signal part of thecondvar usage (park and unpark). On the wait side we have:
    status = pthread_cond_timedwait(_cond, _mutex, &abst);
    if (status != 0 && WorkAroundNPTLTimedWaitHang) {
      pthread_cond_destroy(_cond);
      pthread_cond_init(_cond, os::Linux::condAttr());
    }

so we will now skip this potential recovery code.
On the signal side we signal while holding the mutex if the workaroundis enabled, and we signal after dropping the mutex otherwise. So thiscode will now be using a new path. Here is the code inPlatformEvent::unpark
  assert(AnyWaiters == 0 || AnyWaiters == 1, "invariant");
  if (AnyWaiters != 0 && WorkAroundNPTLTimedWaitHang) {
    AnyWaiters = 0;
    pthread_cond_signal(_cond);
  }
  status = pthread_mutex_unlock(_mutex);
  assert_status(status == 0, status, "mutex_unlock");
  if (AnyWaiters != 0) {
// Note that we signal() *after* dropping the lock for "immortal"Events.
    // This is safe and avoids a common class of  futile wakeups. In rare
    // circumstances this can cause a thread to return prematurely from
// cond_{timed}wait() but the spurious wakeup is benign and thevictim
    // will simply re-test the condition and re-park itself.
// This provides particular benefit if the underlying platformdoes not
    // provide wait morphing.
    status = pthread_cond_signal(_cond);
    assert_status(status == 0, status, "cond_signal");
  }

and similarly in Parker::unpark

     if (WorkAroundNPTLTimedWaitHang) {
        status = pthread_cond_signal(&_cond[_cur_index]);
        assert(status == 0, "invariant");
        status = pthread_mutex_unlock(_mutex);
        assert(status == 0, "invariant");
      } else {
        status = pthread_mutex_unlock(_mutex);
        assert(status == 0, "invariant");
        status = pthread_cond_signal(&_cond[_cur_index]);
        assert(status == 0, "invariant");
      }
This may cause performance perturbations that will need to beinvestigated - but in theory, as per the comment above, signallingoutside the lock should be beneficial in the linux case because thereis no wait-morphing (where a signal simply takes a waiting thread andplaces it into the mutex queue thereby avoiding the need to awaken thethread so it can enqueue itself).
The change itself is of course trivial:

http://cr.openjdk.java.net/~dholmes/8130728/webrev/
I'd appreciate any comments/concerns particularly from the BSD and AIXfolk who acquired this unnecessary workaround.
If deemed necessary we could add a flag that controls whether thesignal happens with or without the lock held.
Thanks,
David

Re: RFR: 8130728: Disable WorkAroundNPTLTimedWaitHang by default

Reply via email to