On Fri, Jan 15, 2010 at 12:46:44AM -0600, Shawn Walker wrote:
> On 01/14/10 05:37 PM, Shawn Walker wrote:
> >On 01/13/10 08:40 PM, Shawn Walker wrote:
> >>On 01/13/10 06:42 PM, Shawn Walker wrote:
> >>>On 01/13/10 02:07 PM, Shawn Walker wrote:
> >>>>Greetings,
> >>>>
> >>>>The following webrev contains fixes for the following issue:
> >>>>
> >>>>1668 image locking needed to prevent race conditions
> >>>
> >>>1st webrev:
> >>>http://cr.opensolaris.org/~swalker/pkg-lock/
> >>>
> >>>new webrev (diff against 1st):
> >>>http://cr.opensolaris.org/~swalker/pkg-lock-2/
> >>
> >>new webrev (diff against 2nd):
> >>http://cr.opensolaris.org/~swalker/pkg-lock-3/
> >
> >new webrev (last i hope?) (diff against 3rd):
> >http://cr.opensolaris.org/~swalker/pkg-lock-4/
>
> new webrev (diff against 4th):
> http://cr.opensolaris.org/~swalker/pkg-lock-5/
>
> full webrev (complete set of changes):
> http://cr.opensolaris.org/~swalker/pkg-lock-5-full/
>
> Change Summary:
> * changed unlock to truncate lock file instead of removing since the
> lock file itself is only used so that it has something to use
> lockf() on and to store information about the caller who acquired
> the lock
>
> * added iso-8601 utc timestamp to lock file to aid debugging in
> trying to determine how long a lock has been held
>
> * updated unit tests to account for truncation/size checks
>
src/modules/client/api_errors.py
- i'm not a python expert, so perhaps this is ok, but all the __init__
parameters to ImageLockedError default to None, and when returning
an error string you only check if pid and pid_name are None.
shouldn't you also check if hostname is None?
src/modules/client/image.py
- you're privous comments regarding Image.__lock indicate that it is a
debug mechanism used to prevent recursive lock acquisition within a
process. if my understanding is correct, the following code is
incorrect:
# First, attempt to obtain a thread lock.
if not self.__lock.acquire(blocking=blocking):
raise api_errors.ImageLockedError()
with this code you will:
- fail if a thread tries to lock an image twice. good.
- if another thread has already locked the image and
blocking is true, you will hang. bad.
- if another thread has already locked the image and
blocking is false, you will return ImageLockedError.
also bad.
perhaps the following would be better:
# sanity check, detect multiple image lock attempts
if not self.__lock.acquire(blocking=False):
raise AssertionError "oops..."
- could you file a bug on the lack of read operation locking?
i think that the concerns that danek raised could be addressed by
having a revokable read lock.
for example, a client wanting to aquire a read operation could do:
- a non blocking open of the fifo: /var/pkg/lock_read
- aquire a read lock on the fifo
- create a thread that does a read from the fifo and blocks
- if this blocked thread ever wakes up, it should kill the
current process with an error indicating that the lock was
lost.
if someone wants to aquire a write lock, they aquire the write lock as
you've written it today. then they do:
- create a new read lock fifo /var/pkg/lock_read.$$
- open it and get a write lock on it.
- open the old read lock fifo.
- move the new read lock fifo over the old one.
- write a byte of data to the old fifo fd and close it.
this is a seat of the pants design idea, it may have problems.
but you get the general idea.
_______________________________________________
pkg-discuss mailing list
[email protected]
http://mail.opensolaris.org/mailman/listinfo/pkg-discuss