On 2/18/15 7:02 PM, David Holmes wrote:
Hi Dan,

On 19/02/2015 2:28 AM, Daniel D. Daugherty wrote:
David,

Thanks for the fast review! Replies embedded below...


On 2/17/15 5:53 PM, David Holmes wrote:
Hi Dan,

A few comments, mainly on the comments :)

Apropos since the changeset is mostly comments...



On 18/02/2015 8:42 AM, Daniel D. Daugherty wrote:
On 2/17/15 3:22 PM, Carsten Varming wrote:
Dear Daniel,

Looks good to me.

Thanks for the fast review.

The line: "OrderAccess::fence();  // ensure WatcherThread sees update
in main loop" seems unnecessary as the lock acts as a memory barrier.

Yes, I keep looking at that line from the original work on
JDK-7127792 and wonder why it's there... I'll chase that down
with the original folks...

Only needed when using lock-free access, so can be removed now.

Oops - my bad!

Yes, that will be the likely outcome.

However, the original code added by JDK-7127792 wasn't lock-free so
I still want to chase down the original thinking.

The lock-free code was here:

void WatcherThread::run() {
  assert(this == watcher_thread(), "just checking");

  this->record_stack_base_and_size();
  this->initialize_thread_local_storage();
  this->set_active_handles(JNIHandleBlock::allocate_block());
  while(!_should_terminate) {

and of course you still have a lock-free access in the new code:

1300     if (_should_terminate) {
1301       // check for termination before posting the next tick
1302       break;
1303     }

so the fence() could stay, but its omission would be harmless I think because not seeing the update immediately is no different to having the update happen just after the check - it's racy code. The only other option would be to always access _should_terminate with the PeriodicTask_lock held - but that is subverting the intent of the lock somewhat as _should_terminate is part of the WatcherThread state. Actually the more I look at this the more I think the lock has already been subverted to assist with the WT termination protocol.

Just to make sure we're on the same page I think Carsten was
talking about this block (from 8072439):

1334   {
1335     // Follow normal safepoint aware lock enter protocol since the
1336     // WatcherThread is stopped by another JavaThread.
1337     MutexLocker ml(PeriodicTask_lock);
1338     _should_terminate = true;
1339 OrderAccess::fence(); // ensure WatcherThread sees update in main loop
1340
1341     WatcherThread* watcher = watcher_thread();
1342     if (watcher != NULL) {
1343 // unpark the WatcherThread so it can see that it should terminate
1344       watcher->unpark();
1345     }
1346   }

where the fence() on line 1339 is not needed because when we drop
the PeriodicTask_lock on line 1346 we get a fence() equivalent.

This is the block before my fix for JDK-8047720:

1360   {
1361 MutexLockerEx ml(PeriodicTask_lock, Mutex::_no_safepoint_check_flag);
1362     _should_terminate = true;
1363 OrderAccess::fence(); // ensure WatcherThread sees update in main loop
1364
1365     WatcherThread* watcher = watcher_thread();
1366     if (watcher != NULL)
1367       watcher->unpark();
1368   }

It has the same fence() call and the same lock exit (on different
line numbers) so I'm wondering about the original thinking here.
I still haven't heard back from Rickard...




java.cpp:

Typo: likelyhood -> likelihood

Fixed.


---

task.cpp

 81 int PeriodicTask::time_to_wait() {
 82   assert(Thread::current()->is_Watcher_thread(), "must be
WatcherThread");

This is currently true but not sure it has to be true. If we are
assuming/constraining a subset of the task API to only be callable by
the WatcherThread then perhaps we should document that in task.hpp ?
And as the WatcherThread is a friend, none of those methods need to be
public - ie execute_if_pending and time_to_wait (realtime_tick is
already private).

I was too focused on adding asserts that enforced how it works today
and not on how it might be used down the road. There's no reason to
restrict the caller of time_to_wait() to the WatcherThread. I've
deleted the assert on line 82 and I added a comment to task.hpp:

   // Requires the PeriodicTask_lock.
   static int time_to_wait();

By leaving time_to_wait() public, that allows code other than the
WatcherThread to use it perhaps for debugging...

Okay. I still think the API is somewhat confused about its public interface - execute_if_pending should be protected for use by WT only (and subclasses implement it of course).

Yes, but I think I'll leave that for another bug (probably to
be fixed by someone else).




112 void PeriodicTask::enroll() {
 113   // Follow normal safepoint aware lock enter protocol if the
caller does
 114   // not already own the PeriodicTask_lock. Otherwise, we don't
try to
 115   // enter it again to avoid deadlocking with a safepoint that
started
 116   // after the initial enter and before this enter.
 117   //
 118   MutexLockerEx ml(PeriodicTask_lock->owned_by_self() ? NULL
 119                                                       :
PeriodicTask_lock);

The deadlock is not with a safepoint interaction but the normal
self-deadlock that would occur trying to lock() a Monitor/Mutex that
you already own. (Monitors/Mutexes do not support recursive locking -
in contrast to ObjectMonitors.) Ditto for disenroll.

I have been working on Java Monitors for waaaayyyy too long!

:)

In the header comment for src/share/vm/runtime/mutex.cpp:

//   Native Monitors do *not* support nesting or recursion but ...


The original comment is the correct one with respect to acquiring with
a safepoint check if not already locked:

/* XXX could be called from a JavaThread, so we have to check for
 * safepoint when taking the lock to avoid deadlocking */

I'm still not fond of that comment. Perhaps:

// Follow normal safepoint aware lock enter protocol if the caller does
   // not already own the PeriodicTask_lock. Otherwise, we don't try to
// enter it again because VM internal Mutexes do not support recursion.

I'm trying to make it clear that we're following safepoint aware protocol
even though we're using MutexLockerEx. I also wanted a clear statement
about why we skipped the enter if we already owned the lock. I don't
think that we could be called from a JavaThread to be the important part.

Being a JavaThread is the only reason the safepoint protocol has to be considered - for non-JavaThreads we won't look at the safepoint state.

The enroll() and disenroll() methods are called from the JavaThreads and they can not be considered "leaf" methods, so they must obey the safepoint protocol wrt the lock.

Anyway your comment is fine.

Thanks.




---

thread.cpp:

1198 int WatcherThread::sleep() const {
1199   // The WatcherThread does not honor the safepoint protocol for
1200   // PeriodicTask_lock in order to provide more consistent task
1201   // execution timing.

Not sure that is really the correct characterisation. The WT doesn't
need to honor the safepoint protocol because it isn't a JavaThread and
so won't block even if a safepoint is in progress. So checking for a
safepoint is simply a waste of time.

Much better way of putting it. Thanks! How about:

   // The WatcherThread is not a JavaThread so we do not honor the
   // safepoint protocol for the PeriodicTask_lock.

Ok.

Thanks.





3565     MutexLocker ml(PeriodicTask_lock);

Why did you allow for a safepoint check here? It would only be
necessary if a JavaThread might concurrently acquire the
PeriodicTask_lock and a safepoint is initiated - during VM
initialization. Not completely impossible I guess but have never heard
of this occurring, so would require some other startup change to
trigger it.

There's no good reason to use MutexLockerEx here and I didn't
want to write a comment explaining why we were using one.
Earlier code in Threads::create_vm() uses MutexLocker instead
of MutexLockerEx.

Okay. Change is harmless.

Cool. I think we're on the same page.

Thanks again for the reviews.

Dan

Cheers,
David


Thanks again for the very thorough review!

Dan



Thanks,
David
-----

Dan



Carsten

On Tue, Feb 17, 2015 at 4:44 PM, Daniel D. Daugherty
<daniel.daughe...@oracle.com <mailto:daniel.daughe...@oracle.com>>
wrote:

    Greetings,

    My fix for the following bug:

        JDK-8047720 Xprof hangs on Solaris

    that was pushed to JDK9 last June needs to be cleaned up.

    Thanks to Alex Garthwaite (agarthwa...@twitter.com
    <mailto:agarthwa...@twitter.com>) and Carsten
    Varming (varm...@gmail.com <mailto:varm...@gmail.com>) for
    reporting the mess that I made
    in WatcherThread::stop() and for suggesting fixes.

This code review is for a general cleanup pass on PeriodicTask_lock and some of the surrounding code. This is a targeted review in that
    I would like to hear from three groups of people:

    1) The author and reviewers for:

JDK-7127792 Add the ability to change an existing PeriodicTask's
                   execution interval

       Rickard, David H, and Markus G.

    2) The reviewers for:

       JDK-8047720 Xprof hangs on Solaris

       Markus G and Coleen

    3) Alex and Carsten


    Here's the webrev URL:

http://cr.openjdk.java.net/~dcubed/8072439-webrev/0-for_jdk9_hs_rt/
<http://cr.openjdk.java.net/%7Edcubed/8072439-webrev/0-for_jdk9_hs_rt/>

    I've attached the original RFR for JDK-8047720 that explains
    the original deadlock that was being fixed. Similar testing
    will be done with this fix.

    Dan





Reply via email to