Re: [pkg-discuss] [REVIEW] Image Locking (bug 1668)

johansen Wed, 13 Jan 2010 18:49:11 -0800

On Wed, Jan 13, 2010 at 08:30:59PM -0600, Shawn Walker wrote:
> >unnecessarily.  Unless I've misunderstood, can't we have the NRLock do
> >double-duty as the enforcer of recursive image-locking, and as the
> >object that serializes concurrent access to the image?
> 
> There's two separate locks here.  There's an API lock that the API
> uses for its own objects, and then there's a image lock that's
> needed for changing the image.
> 
> Acquiring an activity lock does not imply the image is locked, and
> vice-versa.


It would be helpful then, if we could document exactly what the locking
protocol is for the API and the Image.  I had thought that the activity
lock was to protect the image against concurrent API accesses.  If it's
not, then, er, what is it used for?  Programmers need to be able to look
at this code and figure out what locks to take and when.  I think I have
a pretty decent understanding of this code, and even I'm confused about
when each of these should be taken, apparently.

> >>As Danek pointed out though, a caller may not want to wait forever.
> >>We don't know how clients will use the API, and so we can't guess
> >>what their use case is.
> >
> >It sounded like Danek was talking about the case where an unprivileged
> >process blocks out a privileged one that's trying to modify the image.
> 
> I look at it as the same reason why transport has a tuneable timeout
> mechanism.  If you don't have one, you're completely at the whims of
> the underlying OS' default timeout mechanism.  I, personally, would
> not want to call a function that blocks unless I had some reasonable
> guarantee that it wouldn't wait longer than a certain amount of
> time.

What do you do when you write multi-threaded C programs, then?  The
timeout for networking is necessary because the endpoint may have
dropped off the network.  We have no idea if it has vanished or not.  In
the case of using an intraprocess mutex or a file lock, process death
isn't a concern.  In the former your address space goes away, in the
latter the OS releases the locks for you.  

> I have run into bugs in the API in the past where it wasn't
> releasing a lock and then would wait forever (I gave up after a
> minute or so) while attempting to acquire a lock.
> 
> My fear is that blocking on the lock will lead to unfortunate issues
> where the client just hangs waiting because of some locking bug.

That's a reasonable concern, but instead of working around the problem
with a timeout, we should track down the bug that's causing the
deadlock.

> >>My use case was that I didn't want pkg(1) to immediately exit
> >>needlessly with a lock error, so the timeout was my attempt to
> >>mitigate that.  I just didn't want to shove the responsibility of
> >>timeout code onto callers and felt it was nice to place it into the
> >>api instead.
> >>
> >>It was my earnest attempt to slightly reduce potential user
> >>frustration of "it said it was locked by pid XXX, but I don't see
> >>one!".  In other words, trying to avoid the "execute pkg in a loop
> >>until the lock is available."
> >
> >IMO, you can still do this.  In the CLI it should be perfectly okay to
> >block.  You can perform a trylock first, and if that fails, print a
> >message saying, "Pid XXX holds this lock.  We're waiting for it.  Press
> >CTRL-C or kill the offending process if you don't feel like waiting."
> 
> I don't think that behaviour lends itself well to automated
> scripting or the test suite.  If we discover that we need this
> enhancement, we can do so later.
> 
> I'm just not comfortable implementing that kind of behaviour right now.

You could pass a --no-wait option to image modifying commands that are
run from the test suite if you're worried about them blocking during an
automated test run.

> >>I still don't agree that a timeout isn't potentially useful, but
> >>since there appear to be no firm, existing use cases for it I won't
> >>keep the logic.  I'll post a new webrev shortly that always attempts
> >>a non-blocking lock and fails immediately if it can't acquire one.
> >
> >I don't think you want a non-blocking only approach.  Let the caller
> >decide whether to block.  IMO, the GUI should use this in non-blocking
> >mode unless they're prepared to wait.  The CLI can block and let the
> >user intervene if they're tired of waiting.
> 
> For now, I don't have sufficient motivation to provide control over
> blocking behaviour.  If we find that clients need it, we can add it
> later.

If you want to prevent clients from having to re-run the pkg client in a
loop, it actually makes sense to implement blocking.  Clients that know
they want to wait can, and those that would rather return an error
quickly can take the non-blocking route.

-j
_______________________________________________
pkg-discuss mailing list
[email protected]
http://mail.opensolaris.org/mailman/listinfo/pkg-discuss

Re: [pkg-discuss] [REVIEW] Image Locking (bug 1668)

Reply via email to