Jim Rees wrote:
>   Yeah, the pstack output I have shows the CheckHost thread being idle at 
>   the time, so it might not be that.
> 
> It's idle, but it's holding a lock on the host it's trying to talk to, and
> has dropped the host hash lock (H_LOCK).  This is backwards.  The
> speculation is that this is causing a deadlock.

Dropping the H_LOCK before dropping h_Lock_r is not a problem.   What
would be a problem is attempting to obtain H_LOCK while already holding
h_Lock_r.

If you find yourself in this situation:

(1) the code must drop h_Lock_r, obtain H_LOCK, and then re-obtain
h_Lock_r

(2) dropping h_Lock_r opens a window for a race, so once the locks have
been re-obtained you must validate that the state of the world is what
you think it is supposed to be and handle any changes that may have
occurred in the meantime

(3) an alternative solution should be examined to see whether the code
can be implemented in a manner that doesn't require the additional
locks.  In most cases H_LOCK is being held to prevent the host object
from being deleted while it is in use.  This usage should be rewritten
as: obtain H_LOCK, increment the reference count on the host object,
release H_LOCK, do the work that must be done, obtain H_LOCK, decrement
the reference count, and release H_LOCK.  Done in this manner, there is
no need to hold both locks at the same time.
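
To make the sequence in (3) concrete, here is a rough sketch using plain
pthreads.  It is not the fileserver code: host_table_lock stands in for
H_LOCK, the per-host mutex stands in for h_Lock_r, and every struct,
field, and function name below is hypothetical.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical stand-in for H_LOCK (the host hash table lock). */
static pthread_mutex_t host_table_lock = PTHREAD_MUTEX_INITIALIZER;

/* Hypothetical host object; not the real fileserver structure. */
struct host {
    pthread_mutex_t lock;  /* stand-in for h_Lock_r */
    int refcount;          /* protected by host_table_lock */
    int calls_made;        /* protected by lock */
};

static void host_init(struct host *h)
{
    pthread_mutex_init(&h->lock, NULL);
    h->refcount = 0;
    h->calls_made = 0;
}

/* Pin the host: take the table lock only long enough to bump the count. */
static void host_hold(struct host *h)
{
    pthread_mutex_lock(&host_table_lock);
    h->refcount++;
    pthread_mutex_unlock(&host_table_lock);
}

/* Unpin the host, again holding the table lock only briefly. */
static void host_release(struct host *h)
{
    pthread_mutex_lock(&host_table_lock);
    assert(h->refcount > 0);
    h->refcount--;
    pthread_mutex_unlock(&host_table_lock);
}

/* A deleter would check this first; returns 1 if nobody holds a reference. */
static int host_can_delete(struct host *h)
{
    int ok;
    pthread_mutex_lock(&host_table_lock);
    ok = (h->refcount == 0);
    pthread_mutex_unlock(&host_table_lock);
    return ok;
}

/* The work itself runs under the per-host lock only. */
static void check_host(struct host *h)
{
    host_hold(h);                    /* table lock taken and dropped */
    pthread_mutex_lock(&h->lock);
    h->calls_made++;                 /* stand-in for the slow work, e.g. an RPC */
    pthread_mutex_unlock(&h->lock);
    host_release(h);                 /* table lock taken and dropped */
}
```

At no point in check_host() are both locks held together, so the
question of which order to take them in never comes up on this path.
The reference count, not the table lock, is what keeps the host from
being deleted while the work is in progress.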

Jeffrey Altman
