On Fri, 25 Sep 2009 11:44:29 +0100
Alan Maguire <Alan.Maguire at Sun.COM> wrote:

> Michael Hunter wrote:
> > On Thu, 24 Sep 2009 08:24:03 -0700
> > Michael Hunter <Michael.Hunter at Sun.COM> wrote:
> >
> > [...]
> >   
> >> I think the real way to do this is to reference count anything that can
> >> be held by multiple threads.  That'll also fan out through the code but
> >> there are fewer error paths and we can prove it works fairly easily.
> >>     
> >
> > New rough webrev at /net/coupe.eng/builds/mph/nwam1_work/webrev as we
> > talked about this morning.
> >
> >   
> Wow, that was quick! So under this reference
> counting scheme, on creation we set the refcount
> to 2, since the list has a reference to the object along
> with the reference derived from creating it in a locked
> state. When we are finished with objects, we drop the
> reference count by 1 since we have one less reference,
> but when we destroy an object we drop the reference
> count by 2 since we're not only intending to let our
> reference to it go but also the reference derived from
> the object list. If the count is 0 at this point, we call
> nwamd_object_fini() which will actually remove the
> object.

yes

> 
> I think I see a potential deadlock during object
> destruction - if thread A is trying to destroy the
> object, it holds the object lock. Meanwhile, thread
> B is spinning on the object lock waiting
> to increase the refcnt in nwamd_object_find().
> It holds the object list lock, but not the object
> lock. Meanwhile, thread A holds the object
> lock but is attempting to acquire the object
> list lock to walk the list and remove the object
> (via nwamd_object_fini()).

*blech* yes

> 
> If this is an issue, I _think_ we could get around
> it by manipulating the refcount before attempting
> to acquire the lock in nwamd_object_find() using
> atomic_add_XXX() (and using the atomic operation
> for other increases/decreases). This would mean
> that we're registered an interest in acquiring the object
> so the refcount will not be 0 and we won't attempt
> to fini() the object, just release it, and thread B
> in the above example would then be responsible
> for fini()'ing it. As I say, I may be missing something
> here - let me know what you think. Thanks!

See my comment in nwamd_object_decref() about maintaining the locking
hierarchy between acquiring the list and locking the object.  That is
the thing to review in the new webrev.

/net/coupe.eng/builds/mph/nwam1_work/webrev

                mph

> 
> Alan

Reply via email to