> http://cr.openjdk.java.net/~dholmes/8130728/webrev/

src/share/vm/runtime/globals.hpp
    No comments.

Thumbs up.

Time to see what shakes out with this option turned off.

The BSD/MacOS X blame belongs to me. Our thinking at the time was
to keep the "bsd" files matching the "linux" version unless we
had a specific reason for changing the code. Of course, we thought
we would come back later and clean things up... which just didn't
happen... I don't know the AIX history.

Dan


On 7/9/15 3:27 AM, David Holmes wrote:
https://bugs.openjdk.java.net/browse/JDK-8130728

The workaround triggered by WorkAroundNPTLTimedWaitHang (an experimental flag set to 1 by default) was to alleviate a hang that could occur on a pthread_cond_timedwait on Linux, if the timeout value was a time in the past - see JDK-6314875. This glibc bug was fixed in 2005 in glibc 2.3.5-3

https://lists.debian.org/debian-glibc/2005/08/msg00201.html

but the workaround persists and was (unfortunately) copied into the BSD and AIX ports.

It is time to remove that workaround but before we can do that we need to be sure that we are not in fact hitting the workaround code. To that end I propose to use this bug to switch the flag's value to 0 to verify correct operation.

If that goes well then we can remove the code either later in JDK 9 or early in JDK 10.

To be clear the flag impacts both the wait and the signal part of the condvar usage (park and unpark). On the wait side we have:

    status = pthread_cond_timedwait(_cond, _mutex, &abst);
    if (status != 0 && WorkAroundNPTLTimedWaitHang) {
      pthread_cond_destroy(_cond);
      pthread_cond_init(_cond, os::Linux::condAttr());
    }

so we will now skip this potential recovery code.

On the signal side we signal while holding the mutex if the workaround is enabled, and we signal after dropping the mutex otherwise. So this code will now be using a new path. Here is the code in PlatformEvent::unpark

  assert(AnyWaiters == 0 || AnyWaiters == 1, "invariant");
  if (AnyWaiters != 0 && WorkAroundNPTLTimedWaitHang) {
    AnyWaiters = 0;
    pthread_cond_signal(_cond);
  }
  status = pthread_mutex_unlock(_mutex);
  assert_status(status == 0, status, "mutex_unlock");
  if (AnyWaiters != 0) {
// Note that we signal() *after* dropping the lock for "immortal" Events.
    // This is safe and avoids a common class of  futile wakeups. In rare
    // circumstances this can cause a thread to return prematurely from
// cond_{timed}wait() but the spurious wakeup is benign and the victim
    // will simply re-test the condition and re-park itself.
// This provides particular benefit if the underlying platform does not
    // provide wait morphing.
    status = pthread_cond_signal(_cond);
    assert_status(status == 0, status, "cond_signal");
  }

and similarly in Parker::unpark

     if (WorkAroundNPTLTimedWaitHang) {
        status = pthread_cond_signal(&_cond[_cur_index]);
        assert(status == 0, "invariant");
        status = pthread_mutex_unlock(_mutex);
        assert(status == 0, "invariant");
      } else {
        status = pthread_mutex_unlock(_mutex);
        assert(status == 0, "invariant");
        status = pthread_cond_signal(&_cond[_cur_index]);
        assert(status == 0, "invariant");
      }

This may cause performance perturbations that will need to be investigated - but in theory, as per the comment above, signalling outside the lock should be beneficial in the linux case because there is no wait-morphing (where a signal simply takes a waiting thread and places it into the mutex queue thereby avoiding the need to awaken the thread so it can enqueue itself).

The change itself is of course trivial:

http://cr.openjdk.java.net/~dholmes/8130728/webrev/

I'd appreciate any comments/concerns particularly from the BSD and AIX folk who acquired this unnecessary workaround.

If deemed necessary we could add a flag that controls whether the signal happens with or without the lock held.

Thanks,
David

Reply via email to