Hi,
One more pair of eyes on this.  8^)

On 8/27/2015 8:16 PM, Kim Barrett wrote:
On Aug 27, 2015, at 5:42 PM, Daniel D. Daugherty <daniel.daughe...@oracle.com> wrote:
Sorry for starting another e-mail thread fork in an already complicated
review...
OK, that was fascinating.  No, really, I mean it.

It made me realize that we've been arguing and talking past each other
in part because we're really dealing with two distinct though closely
related bugs here.

I've been primarily thinking about the case where we're calling
vm_abort / os::abort, where we presently delete the PerfData
memory even though there can be arbitrary other threads running.  This
was the case in JDK-8129978, which is how I got involved here in the
first place.  In that bug we were in vm_exit_during_initialization and
had called perfMemory_exit when some thread attempted to inflate a
monitor (which is not one of the conflicting cases discussed by Dan).

The problem Dan has been looking at, JDK-8049304, is about a "normal"
VM shutdown.  In this case, the problem is that we believe it is safe
to delete the PerfData, because we've safepointed, and yet some thread
unexpectedly runs and attempts to touch the deleted data anyway.

I think Dan's proposed fix (mostly) avoids the specific instance of
JDK-8129978, but doesn't solve the more general problem of abnormal
exit deleting the PerfData while some running thread is touching some
non-monitor-related part of that data.  My proposal of leaving memory
cleanup on process exit to the OS would handle this case.

I think Dan's proposed fix (mostly) avoids problems like JDK-8049304.
And the approach I've been talking about doesn't help at all for this
case.  But I wonder if Dan's proposed fix can be improved.  A "futile
wakeup" case doesn't seem to me like one which requires super-high
performance.  Would it be ok, in the two problematic cases that Dan
identified, to use some kind of atomic / locking protocol with the
cleanup?  Or is the comment for the counter increment in EnterI (and
only there) correct that it's important to avoid a lock or atomics
here (and presumably in ReenterI too)?
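
For what it's worth, here is the shape of atomic protocol I read that
as suggesting, as a standalone sketch (std::atomic stands in for
HotSpot's OrderAccess/Atomic, every name below is invented, and note
that this narrows the race rather than closing it):

    #include <atomic>

    static std::atomic<long>* g_futile_wakeups = new std::atomic<long>(0);
    static std::atomic<bool>  g_perfdata_alive{true};

    // What EnterI/ReenterI would do on a futile wakeup: check an "alive"
    // flag with acquire semantics before touching the counter.
    void record_futile_wakeup() {
      if (g_perfdata_alive.load(std::memory_order_acquire)) {
        // The check alone does not close the race: cleanup can clear the
        // flag and free the counter between the load and the increment.
        // Closing it fully needs a lock or a refcount (or simply leaving
        // the memory for the OS to reclaim at process exit).
        g_futile_wakeups->fetch_add(1, std::memory_order_relaxed);
      }
    }

    // What PerfDataManager::destroy() would do: unpublish, pause, free.
    void destroy_perfdata() {
      g_perfdata_alive.store(false, std::memory_order_release);
      // ... brief nap here so in-flight increments can drain ...
      delete g_futile_wakeups;
      g_futile_wakeups = nullptr;
    }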


I notice that EnterI/ReenterI both end with OrderAccess::fence(). Can the potential update of _sync_FutileWakeups be delayed until that point, to take advantage of the fence and make the sync hole even smaller? You've got a release() (and a short nap!) with the store in PerfDataManager::destroy() to try to close the window somewhat.
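
Roughly what I have in mind, reusing the invented names from the
sketch above (not actual ObjectMonitor code; the contended loop is
elided):

    #include <atomic>

    extern std::atomic<long>* g_futile_wakeups;   // see sketch above
    extern std::atomic<bool>  g_perfdata_alive;

    void enter_slow_path() {    // stands in for ObjectMonitor::EnterI
      int futile = 0;
      for (;;) {
        // ... park; wake; fail to take the lock ...
        ++futile;               // purely local, no PerfData touched yet
        break;                  // contended loop elided
      }
      // The fence that EnterI already ends with:
      std::atomic_thread_fence(std::memory_order_seq_cst);
      // One guarded publish right after it: a single small window per
      // slow-path entry instead of one per futile wakeup.
      if (futile > 0 && g_perfdata_alive.load(std::memory_order_acquire)) {
        g_futile_wakeups->fetch_add(futile, std::memory_order_relaxed);
      }
    }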

But I think rather than the release_store() you used, you want a store, followed by a release(). release_store() puts a fence before the store to ensure earlier updates are seen before the current one, no?
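
Spelled out as fences (std::atomic again standing in for OrderAccess),
the two shapes I mean are:

    #include <atomic>

    extern std::atomic<int> g_flag;

    void release_store_style() {
      // OrderAccess::release_store(): the barrier sits *before* the
      // store, so earlier updates are visible no later than the store.
      std::atomic_thread_fence(std::memory_order_release);
      g_flag.store(1, std::memory_order_relaxed);
    }

    void store_then_release_style() {
      // store, then release(): the barrier sits *after* the store, so
      // the store cannot sink below the writes that follow it (here,
      // the eventual delete of the PerfData memory).
      g_flag.store(1, std::memory_order_relaxed);
      std::atomic_thread_fence(std::memory_order_release);
    }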

Also, I think the comment above that release_store() could be clarified. It is fine as-is if you're familiar with this bug report and discussion, but I think it should explicitly say that there is still a very small window in which the lack of true synchronization can cause a failure. And perhaps that the release_store() (or store/release()) is not the release half of an acquire/release pair, since the readers do no matching acquire.
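
Strawman wording, in case it helps:

    // Note: this store is not the release half of an acquire/release
    // pair; the reading threads perform no matching acquire.  Together
    // with the nap below it only shrinks the window in which a thread
    // can still touch freed PerfData.  A very small window for failure
    // remains.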

Tom
