On Mon, 28 Sep 2009 15:38:38 +0100
Alan Maguire <Alan.Maguire at Sun.COM> wrote:

[...]
> > See my comment in nwamd_object_decref() about maintaining the locking
> > hierarchy between acquiring the list and locking the object.  That is
> > the thing to review in the new webrev.
> >
> > /net/coupe.eng/builds/mph/nwam1_work/webrev
> >
> >   
> A few questions
> 
> - Just so I follow: in nwamd_object_find() we spin
> until we acquire the object lock and then increase the
> refcount, all while holding the object list lock.
> Another thread doing a release_and_destroy()
> on the object will - if the refcount has fallen to
> 0 - first bump the refcount to 1, unlock the object,
> then lock the object list and the object. While holding
> the object list and object locks, we decrease the
> refcount by 1 and, if it reaches 0, call fini() on the
> object. So because the reference
> count is > 0 for the destroying thread, we are
> guaranteed that another thread cannot destroy the
> object in the interim, and because destruction is
> predicated on holding the object list lock, a thread that
> was waiting in nwamd_object_find() on the object
> lock will proceed first.

yes
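
For anyone following along, the protocol looks roughly like this.  This
is a sketch only; the type, field, and function names below are
assumptions for illustration based on this thread, not the actual
nwamd source:

#include <pthread.h>
#include <string.h>

typedef struct nwamd_object {
	pthread_mutex_t		obj_lock;
	int			obj_refcount;
	const char		*obj_name;
	struct nwamd_object	*obj_next;
	/* ... object state ... */
} nwamd_object_t;

static pthread_rwlock_t object_list_lock = PTHREAD_RWLOCK_INITIALIZER;
static nwamd_object_t *object_list;

/* Look up an object; return it locked, with a reference held. */
nwamd_object_t *
object_find(const char *name)
{
	nwamd_object_t *obj;

	(void) pthread_rwlock_rdlock(&object_list_lock);
	for (obj = object_list; obj != NULL; obj = obj->obj_next) {
		if (strcmp(obj->obj_name, name) == 0)
			break;
	}
	if (obj != NULL) {
		/* Block on the object lock while holding the list lock. */
		(void) pthread_mutex_lock(&obj->obj_lock);
		obj->obj_refcount++;
	}
	(void) pthread_rwlock_unlock(&object_list_lock);
	return (obj);
}

/* Drop a reference; destroy the object if it was the last one. */
void
object_release_and_destroy(nwamd_object_t *obj)
{
	if (--obj->obj_refcount > 0) {
		(void) pthread_mutex_unlock(&obj->obj_lock);
		return;
	}
	/*
	 * Refcount hit zero.  Take a temporary reference so no other
	 * thread can destroy the object, then drop the object lock and
	 * reacquire in hierarchy order: list lock first, then object.
	 */
	obj->obj_refcount = 1;
	(void) pthread_mutex_unlock(&obj->obj_lock);

	(void) pthread_rwlock_wrlock(&object_list_lock);
	(void) pthread_mutex_lock(&obj->obj_lock);
	if (--obj->obj_refcount == 0) {
		/* ... unlink obj from object_list, fini() and free it ... */
	} else {
		/* Someone re-found the object in the window; keep it. */
		(void) pthread_mutex_unlock(&obj->obj_lock);
	}
	(void) pthread_rwlock_unlock(&object_list_lock);
}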

> 
> - do we need to ensure that multiple release_and_destroy()s
> don't happen (leading to a refcount of < 0)? I can't
> see any potential for this occurring in the daemon, but maybe
> we could state this explicitly as an assumption?

That's a logic error, no different from a multiple free.  I'll add
asserts to catch the refcount going negative.  We can catch it, but I
think trying to fix it up, much like silently ignoring a multiple
free, would be a mistake.
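
I.e., in the paths that drop a reference, something like this (object
lock held; field name as assumed in the sketch above):

	assert(obj->obj_refcount > 0);
	obj->obj_refcount--;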

> 
> - not something we're attempting to address here I know,
> but do you have any thoughts on how we can increase
> concurrency in this object model? One problem we have
> at present is that if thread A holds the object lock for
> a given object, and thread B wishes to acquire the same
> object, it will hold the object list lock while it waits on the
> object lock. As a consequence, no other threads can access
> the object list (and hence any other objects) while this
> is occurring.

The list lock is a reader/writer lock.  nwamd_object_find() only grabs
the reader lock.  We only grab the list writer lock when we want to
add or remove an element from the list.  The points at which we add or
destroy objects should be points where a little contention isn't a big
deal.
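
So concurrent lookups don't exclude each other; only list changes do.
Roughly (same assumed names as the sketch above):

	/* Lookup path: shared access; many finders can run in parallel. */
	(void) pthread_rwlock_rdlock(&object_list_lock);
	/* ... walk the list, lock and reference the object ... */
	(void) pthread_rwlock_unlock(&object_list_lock);

	/* Add/remove path: exclusive access; blocks lookups briefly. */
	(void) pthread_rwlock_wrlock(&object_list_lock);
	/* ... link or unlink an object ... */
	(void) pthread_rwlock_unlock(&object_list_lock);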

> You can see this in practice with the doors
> thread when nwamd is busy and we attempt an
> "nwamadm list".

If nwamd is just busy doing its normal thing and objects are not being
added or destroyed, this shouldn't happen.  We should use dtrace to see
where the contention is.
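
For example, plockstat(1M) (which drives the DTrace plockstat
provider) can summarize lock contention in the running daemon; the
interval here is just illustrative:

	# plockstat -A -e 10 -p `pgrep -x nwamd`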

> I wonder if we can use the reference
> counting approach to alleviate this? What if we took
> an additional step and got rid of the object mutex
> altogether? Locking an object simply reduces to
> locking the object list, searching it, and when we
> find the object we just bump its reference count
> (using atomic_add_int()). Destroying it simply reduces
> to locking the object list, and if the object reference
> count is 0 after decreasing the refcount, we remove
> the object. What do you think?

I can't see how we can maintain object integrity this way.  Right now
we grab the object mutex and hold it whenever we are manipulating the
object.  If we just used reference counts, then data (e.g. state) could
change in the middle of an algorithm.  That won't work.  We could
change the object mutex to be a reader/writer lock and then only grab
the writer lock if we intended to change the object.  I suspect that
won't give us as much gain as you might expect, as we often enter a
function intending to examine an object and then, based on that
information, change it.  Such a path would either have to start out
holding the writer lock or reevaluate the object after acquiring it.
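
Concretely, the read-then-upgrade path would look something like this
(sketch only; obj_rwlock and the two helpers are assumed names, and
the recheck is required because the lock is dropped between the read
and write acquisitions):

	(void) pthread_rwlock_rdlock(&obj->obj_rwlock);
	change_needed = object_needs_change(obj);	/* assumed helper */
	(void) pthread_rwlock_unlock(&obj->obj_rwlock);

	if (change_needed) {
		(void) pthread_rwlock_wrlock(&obj->obj_rwlock);
		/* State may have changed while unlocked: reevaluate. */
		if (object_needs_change(obj))
			object_change(obj);		/* assumed helper */
		(void) pthread_rwlock_unlock(&obj->obj_rwlock);
	}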

We should really gather data about what is causing the contention
before deciding to change any of this.  My experience is that this
kind of thing often boils down to something other than what was
originally expected.

                mph

> 
> Thanks!
> 
> Alan
