Re: [Qemu-devel] [PATCH v15 20/21] file-posix: Add image locking to perm operations

2017-04-28 Thread Kevin Wolf
Am 28.04.2017 um 17:30 hat Fam Zheng geschrieben:
> On Fri, 04/28 15:45, Kevin Wolf wrote:
> > Am 26.04.2017 um 05:34 hat Fam Zheng geschrieben:
> > > This extends the permission bits of op blocker API to external using
> > > Linux OFD locks.
> > > 
> > > Each permission in @perm and @shared_perm is represented by a locked
> > > byte in the image file.  Requesting a permission in @perm is translated
> > > to a shared lock of the corresponding byte; rejecting to share the same
> > > permission is translated to a shared lock of a separate byte. With that,
> > > we use 2x number of bytes of distinct permission types.
> > > 
> > > virtlockd in libvirt locks the first byte, so we do locking from a
> > > higher offset.
> > > 
> > > Suggested-by: Kevin Wolf 
> > > Signed-off-by: Fam Zheng 
> > 
> > >  BlockDriver bdrv_file = {
> > >  .format_name = "file",
> > >  .protocol_name = "file",
> > > @@ -1977,7 +2234,11 @@ BlockDriver bdrv_file = {
> > >  .bdrv_get_info = raw_get_info,
> > >  .bdrv_get_allocated_file_size
> > >  = raw_get_allocated_file_size,
> > > -
> > > +.bdrv_inactivate = raw_inactivate,
> > > +.bdrv_invalidate_cache = raw_invalidate_cache,
> > > +.bdrv_check_perm = raw_check_perm,
> > > +.bdrv_set_perm   = raw_set_perm,
> > > +.bdrv_abort_perm_update = raw_abort_perm_update,
> > >  .create_opts = _create_opts,
> > >  };
> > 
> > By the way, is it intentional that we apply locking only to bdrv_file,
> > but not to bdrv_host_device or bdrv_host_cdrom?
> 
> Good question.
> 
> Though CDROM is not very interesting, I am not sure about bdrv_host_device:
> 
> 1) On the one hand, a host device can contain a qcow2 image so maybe locking 
> is
> useful.  Another reason to lock is that they share the same QAPI option,
> BlockdevOptionsFile. That last reason is, it should be very easy to add it.
> 
> 2) On the other hand, unlike files, host devices get pretty high chances in
> being accesses by multiple readers/writers, outside of QEMU, such as partition
> detection, mount, fsck, etc.
> 
> What is your opinion?

I would add it, if nothing else to be consistent with regular files. You
don't even need qcow2 on a block device for it to be useful, even for
raw images the guest may not be able to tolerate a second writer (in
this case, share-rw of the qdev device will directly control the locking
mode).

As for 2), I don't think these other users are a problem because we're
only taking shared locks. We'll prevent other users from taking an
exclusive lock, but that's exactly right because it's true that we're
using the image.

Kevin



Re: [Qemu-devel] [PATCH v15 20/21] file-posix: Add image locking to perm operations

2017-04-28 Thread Fam Zheng
On Fri, 04/28 15:45, Kevin Wolf wrote:
> Am 26.04.2017 um 05:34 hat Fam Zheng geschrieben:
> > This extends the permission bits of op blocker API to external using
> > Linux OFD locks.
> > 
> > Each permission in @perm and @shared_perm is represented by a locked
> > byte in the image file.  Requesting a permission in @perm is translated
> > to a shared lock of the corresponding byte; rejecting to share the same
> > permission is translated to a shared lock of a separate byte. With that,
> > we use 2x number of bytes of distinct permission types.
> > 
> > virtlockd in libvirt locks the first byte, so we do locking from a
> > higher offset.
> > 
> > Suggested-by: Kevin Wolf 
> > Signed-off-by: Fam Zheng 
> 
> >  BlockDriver bdrv_file = {
> >  .format_name = "file",
> >  .protocol_name = "file",
> > @@ -1977,7 +2234,11 @@ BlockDriver bdrv_file = {
> >  .bdrv_get_info = raw_get_info,
> >  .bdrv_get_allocated_file_size
> >  = raw_get_allocated_file_size,
> > -
> > +.bdrv_inactivate = raw_inactivate,
> > +.bdrv_invalidate_cache = raw_invalidate_cache,
> > +.bdrv_check_perm = raw_check_perm,
> > +.bdrv_set_perm   = raw_set_perm,
> > +.bdrv_abort_perm_update = raw_abort_perm_update,
> >  .create_opts = _create_opts,
> >  };
> 
> By the way, is it intentional that we apply locking only to bdrv_file,
> but not to bdrv_host_device or bdrv_host_cdrom?

Good question.

Though CDROM is not very interesting, I am not sure about bdrv_host_device:

1) On the one hand, a host device can contain a qcow2 image so maybe locking is
useful.  Another reason to lock is that they share the same QAPI option,
BlockdevOptionsFile. That last reason is, it should be very easy to add it.

2) On the other hand, unlike files, host devices get pretty high chances in
being accesses by multiple readers/writers, outside of QEMU, such as partition
detection, mount, fsck, etc.

What is your opinion?

Fam



Re: [Qemu-devel] [PATCH v15 20/21] file-posix: Add image locking to perm operations

2017-04-28 Thread Kevin Wolf
Am 26.04.2017 um 05:34 hat Fam Zheng geschrieben:
> This extends the permission bits of op blocker API to external using
> Linux OFD locks.
> 
> Each permission in @perm and @shared_perm is represented by a locked
> byte in the image file.  Requesting a permission in @perm is translated
> to a shared lock of the corresponding byte; rejecting to share the same
> permission is translated to a shared lock of a separate byte. With that,
> we use 2x number of bytes of distinct permission types.
> 
> virtlockd in libvirt locks the first byte, so we do locking from a
> higher offset.
> 
> Suggested-by: Kevin Wolf 
> Signed-off-by: Fam Zheng 

>  BlockDriver bdrv_file = {
>  .format_name = "file",
>  .protocol_name = "file",
> @@ -1977,7 +2234,11 @@ BlockDriver bdrv_file = {
>  .bdrv_get_info = raw_get_info,
>  .bdrv_get_allocated_file_size
>  = raw_get_allocated_file_size,
> -
> +.bdrv_inactivate = raw_inactivate,
> +.bdrv_invalidate_cache = raw_invalidate_cache,
> +.bdrv_check_perm = raw_check_perm,
> +.bdrv_set_perm   = raw_set_perm,
> +.bdrv_abort_perm_update = raw_abort_perm_update,
>  .create_opts = _create_opts,
>  };

By the way, is it intentional that we apply locking only to bdrv_file,
but not to bdrv_host_device or bdrv_host_cdrom?

Kevin



Re: [Qemu-devel] [PATCH v15 20/21] file-posix: Add image locking to perm operations

2017-04-27 Thread Fam Zheng
On Wed, 04/26 16:22, Kevin Wolf wrote:
> > @@ -455,6 +473,21 @@ static int raw_open_common(BlockDriverState *bs, QDict 
> > *options,
> >  }
> >  s->fd = fd;
> >  
> > +s->lock_fd = -1;
> > +fd = qemu_open(filename, O_RDONLY);
> 
> Note that with /dev/fdset there can be cases where we can open a file
> O_RDWR, but not O_RDONLY. Should we better just use the same flags as
> for the s->fd?

Makes sense.

> 
> > +if (fd < 0) {
> > +if (RAW_LOCK_SUPPORTED) {
> > +ret = -errno;
> > +error_setg_errno(errp, errno, "Could not open '%s' for 
> > locking",
> > + filename);
> > +qemu_close(s->fd);
> > +goto fail;
> > +}
> > +}
> 
> You still open the fd and possibly error out even with s->use_lock ==
> false. Should the code starting from qemu_open() to here be conditional
> on s->use_lock?

Yes.

> 
> > +s->lock_fd = fd;
> > +s->perm = 0;
> > +s->shared_perm = BLK_PERM_ALL;
> > +
> >  #ifdef CONFIG_LINUX_AIO
> >   /* Currently Linux does AIO only for files opened with O_DIRECT */
> >  if (s->use_linux_aio && !(s->open_flags & O_DIRECT)) {
> > @@ -542,6 +575,156 @@ static int raw_open(BlockDriverState *bs, QDict 
> > *options, int flags,
> >  return raw_open_common(bs, options, flags, 0, errp);
> >  }
> >  
> > +typedef enum {
> > +RAW_PL_PREPARE,
> > +RAW_PL_COMMIT,
> > +RAW_PL_ABORT,
> > +} RawPermLockOp;
> > +
> > +/* Lock wanted bytes by @perm and ~@shared_perm in the file; if @unlock ==
> 
> This function doesn't take @perm or @shared_perm. This comment is
> especially confusing because shared_perm_lock_bits == ~shared_perm. We
> should be explicit here that shared_perm_lock_bits is the mask of all
> permissions that _cannot_ be shared.

Will update the comment.

> 
> > + * true, also unlock the unneeded bytes. */
> > +static int raw_apply_lock_bytes(BDRVRawState *s,
> > +uint64_t perm_lock_bits,
> > +uint64_t shared_perm_lock_bits,
> > +bool unlock, Error **errp)
> > +{
> > +int ret;
> > +int i;
> > +
> > +for (i = 0; i < BLK_PERM_MAX; ++i) {
> > +int off = RAW_LOCK_PERM_BASE + i;
> > +if (perm_lock_bits & (1ULL << i)) {
> > +ret = qemu_lock_fd(s->lock_fd, off, 1, false);
> > +if (ret) {
> > +error_setg(errp, "Failed to lock byte %d", off);
> 
> Should bdrv_perm_names() be used in this function, too? Maybe not that
> important because we never expect this to fail (we don't do any
> exclusive locks).

Ineed, so going a bit of low level is probably better when it does fail. :)

> 
> > +return ret;
> > +}
> > +} else if (unlock) {
> > +ret = qemu_unlock_fd(s->lock_fd, off, 1);
> > +if (ret) {
> > +error_setg(errp, "Failed to unlock byte %d", off);
> > +return ret;
> > +}
> > +}
> > +}
> > +for (i = 0; i < BLK_PERM_MAX; ++i) {
> > +int off = RAW_LOCK_SHARED_BASE + i;
> > +if (shared_perm_lock_bits & (1ULL << i)) {
> > +ret = qemu_lock_fd(s->lock_fd, off, 1, false);
> > +if (ret) {
> > +error_setg(errp, "Failed to lock byte %d", off);
> > +return ret;
> > +}
> > +} else if (unlock) {
> > +ret = qemu_unlock_fd(s->lock_fd, off, 1);
> > +if (ret) {
> > +error_setg(errp, "Failed to unlock byte %d", off);
> > +return ret;
> > +}
> > +}
> > +}
> > +return 0;
> > +}
> 
> Note: If this function returns an error, we may have acquired some of
> the locks. The caller probably wants to call it again to reset the
> permissions to the old mask.

I think callers do.

> 
> > +/* Check "unshared" bytes implied by @perm and ~@shared_perm in the file. 
> > */
> > +static int raw_check_lock_bytes(BDRVRawState *s,
> > +uint64_t perm, uint64_t shared_perm,
> > +Error **errp)
> > +{
> > +int ret;
> > +int i;
> > +
> > +for (i = 0; i < BLK_PERM_MAX; ++i) {
> > +int off = RAW_LOCK_SHARED_BASE + i;
> > +uint64_t p = 1ULL << i;
> > +if (perm & p) {
> > +ret = qemu_lock_fd_test(s->lock_fd, off, 1, true);
> > +if (ret) {
> > +error_setg(errp,
> > +   "Failed to check byte %d for \"%s\" permission",
> > +   off, bdrv_perm_names(p));
> 
> bdrv_perm_names() returns a malloced string, which is leaked here.

Neglected that, will fix.

> 
> > +error_append_hint(errp,
> > +  "Is another process using the image?\n");
> 
> We probably need to tweak the error messages a bit. When I saw this one
> this morning, I felt that 

Re: [Qemu-devel] [PATCH v15 20/21] file-posix: Add image locking to perm operations

2017-04-26 Thread Kevin Wolf
Am 26.04.2017 um 05:34 hat Fam Zheng geschrieben:
> This extends the permission bits of op blocker API to external using
> Linux OFD locks.
> 
> Each permission in @perm and @shared_perm is represented by a locked
> byte in the image file.  Requesting a permission in @perm is translated
> to a shared lock of the corresponding byte; rejecting to share the same
> permission is translated to a shared lock of a separate byte. With that,
> we use 2x number of bytes of distinct permission types.
> 
> virtlockd in libvirt locks the first byte, so we do locking from a
> higher offset.
> 
> Suggested-by: Kevin Wolf 
> Signed-off-by: Fam Zheng 
> ---
>  block/file-posix.c | 267 
> -
>  1 file changed, 264 insertions(+), 3 deletions(-)
> 
> diff --git a/block/file-posix.c b/block/file-posix.c
> index 2114720..b92fdc3 100644
> --- a/block/file-posix.c
> +++ b/block/file-posix.c
> @@ -129,12 +129,28 @@ do { \
>  
>  #define MAX_BLOCKSIZE4096
>  
> +/* Posix file locking bytes. Libvirt takes byte 0, we start from higher 
> bytes,
> + * leaving a few more bytes for its future use. */
> +#define RAW_LOCK_PERM_BASE 100
> +#define RAW_LOCK_SHARED_BASE   200
> +#ifdef F_OFD_SETLK
> +#define RAW_LOCK_SUPPORTED 1
> +#else
> +#define RAW_LOCK_SUPPORTED 0
> +#endif
> +
>  typedef struct BDRVRawState {
>  int fd;
> +int lock_fd;
> +bool use_lock;
>  int type;
>  int open_flags;
>  size_t buf_align;
>  
> +/* The current permissions. */
> +uint64_t perm;
> +uint64_t shared_perm;
> +
>  #ifdef CONFIG_XFS
>  bool is_xfs:1;
>  #endif
> @@ -440,6 +456,8 @@ static int raw_open_common(BlockDriverState *bs, QDict 
> *options,
>  }
>  s->use_linux_aio = (aio == BLOCKDEV_AIO_OPTIONS_NATIVE);
>  
> +s->use_lock = qemu_opt_get_bool(opts, "locking", true);
> +
>  s->open_flags = open_flags;
>  raw_parse_flags(bdrv_flags, >open_flags);
>  
> @@ -455,6 +473,21 @@ static int raw_open_common(BlockDriverState *bs, QDict 
> *options,
>  }
>  s->fd = fd;
>  
> +s->lock_fd = -1;
> +fd = qemu_open(filename, O_RDONLY);

Note that with /dev/fdset there can be cases where we can open a file
O_RDWR, but not O_RDONLY. Should we better just use the same flags as
for the s->fd?

> +if (fd < 0) {
> +if (RAW_LOCK_SUPPORTED) {
> +ret = -errno;
> +error_setg_errno(errp, errno, "Could not open '%s' for locking",
> + filename);
> +qemu_close(s->fd);
> +goto fail;
> +}
> +}

You still open the fd and possibly error out even with s->use_lock ==
false. Should the code starting from qemu_open() to here be conditional
on s->use_lock?

> +s->lock_fd = fd;
> +s->perm = 0;
> +s->shared_perm = BLK_PERM_ALL;
> +
>  #ifdef CONFIG_LINUX_AIO
>   /* Currently Linux does AIO only for files opened with O_DIRECT */
>  if (s->use_linux_aio && !(s->open_flags & O_DIRECT)) {
> @@ -542,6 +575,156 @@ static int raw_open(BlockDriverState *bs, QDict 
> *options, int flags,
>  return raw_open_common(bs, options, flags, 0, errp);
>  }
>  
> +typedef enum {
> +RAW_PL_PREPARE,
> +RAW_PL_COMMIT,
> +RAW_PL_ABORT,
> +} RawPermLockOp;
> +
> +/* Lock wanted bytes by @perm and ~@shared_perm in the file; if @unlock ==

This function doesn't take @perm or @shared_perm. This comment is
especially confusing because shared_perm_lock_bits == ~shared_perm. We
should be explicit here that shared_perm_lock_bits is the mask of all
permissions that _cannot_ be shared.

> + * true, also unlock the unneeded bytes. */
> +static int raw_apply_lock_bytes(BDRVRawState *s,
> +uint64_t perm_lock_bits,
> +uint64_t shared_perm_lock_bits,
> +bool unlock, Error **errp)
> +{
> +int ret;
> +int i;
> +
> +for (i = 0; i < BLK_PERM_MAX; ++i) {
> +int off = RAW_LOCK_PERM_BASE + i;
> +if (perm_lock_bits & (1ULL << i)) {
> +ret = qemu_lock_fd(s->lock_fd, off, 1, false);
> +if (ret) {
> +error_setg(errp, "Failed to lock byte %d", off);

Should bdrv_perm_names() be used in this function, too? Maybe not that
important because we never expect this to fail (we don't do any
exclusive locks).

> +return ret;
> +}
> +} else if (unlock) {
> +ret = qemu_unlock_fd(s->lock_fd, off, 1);
> +if (ret) {
> +error_setg(errp, "Failed to unlock byte %d", off);
> +return ret;
> +}
> +}
> +}
> +for (i = 0; i < BLK_PERM_MAX; ++i) {
> +int off = RAW_LOCK_SHARED_BASE + i;
> +if (shared_perm_lock_bits & (1ULL << i)) {
> +ret = qemu_lock_fd(s->lock_fd, off, 1, false);
> +if (ret) {
> +

[Qemu-devel] [PATCH v15 20/21] file-posix: Add image locking to perm operations

2017-04-25 Thread Fam Zheng
This extends the permission bits of op blocker API to external using
Linux OFD locks.

Each permission in @perm and @shared_perm is represented by a locked
byte in the image file.  Requesting a permission in @perm is translated
to a shared lock of the corresponding byte; rejecting to share the same
permission is translated to a shared lock of a separate byte. With that,
we use 2x number of bytes of distinct permission types.

virtlockd in libvirt locks the first byte, so we do locking from a
higher offset.

Suggested-by: Kevin Wolf 
Signed-off-by: Fam Zheng 
---
 block/file-posix.c | 267 -
 1 file changed, 264 insertions(+), 3 deletions(-)

diff --git a/block/file-posix.c b/block/file-posix.c
index 2114720..b92fdc3 100644
--- a/block/file-posix.c
+++ b/block/file-posix.c
@@ -129,12 +129,28 @@ do { \
 
 #define MAX_BLOCKSIZE  4096
 
+/* Posix file locking bytes. Libvirt takes byte 0, we start from higher bytes,
+ * leaving a few more bytes for its future use. */
+#define RAW_LOCK_PERM_BASE 100
+#define RAW_LOCK_SHARED_BASE   200
+#ifdef F_OFD_SETLK
+#define RAW_LOCK_SUPPORTED 1
+#else
+#define RAW_LOCK_SUPPORTED 0
+#endif
+
 typedef struct BDRVRawState {
 int fd;
+int lock_fd;
+bool use_lock;
 int type;
 int open_flags;
 size_t buf_align;
 
+/* The current permissions. */
+uint64_t perm;
+uint64_t shared_perm;
+
 #ifdef CONFIG_XFS
 bool is_xfs:1;
 #endif
@@ -440,6 +456,8 @@ static int raw_open_common(BlockDriverState *bs, QDict 
*options,
 }
 s->use_linux_aio = (aio == BLOCKDEV_AIO_OPTIONS_NATIVE);
 
+s->use_lock = qemu_opt_get_bool(opts, "locking", true);
+
 s->open_flags = open_flags;
 raw_parse_flags(bdrv_flags, >open_flags);
 
@@ -455,6 +473,21 @@ static int raw_open_common(BlockDriverState *bs, QDict 
*options,
 }
 s->fd = fd;
 
+s->lock_fd = -1;
+fd = qemu_open(filename, O_RDONLY);
+if (fd < 0) {
+if (RAW_LOCK_SUPPORTED) {
+ret = -errno;
+error_setg_errno(errp, errno, "Could not open '%s' for locking",
+ filename);
+qemu_close(s->fd);
+goto fail;
+}
+}
+s->lock_fd = fd;
+s->perm = 0;
+s->shared_perm = BLK_PERM_ALL;
+
 #ifdef CONFIG_LINUX_AIO
  /* Currently Linux does AIO only for files opened with O_DIRECT */
 if (s->use_linux_aio && !(s->open_flags & O_DIRECT)) {
@@ -542,6 +575,156 @@ static int raw_open(BlockDriverState *bs, QDict *options, 
int flags,
 return raw_open_common(bs, options, flags, 0, errp);
 }
 
+typedef enum {
+RAW_PL_PREPARE,
+RAW_PL_COMMIT,
+RAW_PL_ABORT,
+} RawPermLockOp;
+
+/* Lock wanted bytes by @perm and ~@shared_perm in the file; if @unlock ==
+ * true, also unlock the unneeded bytes. */
+static int raw_apply_lock_bytes(BDRVRawState *s,
+uint64_t perm_lock_bits,
+uint64_t shared_perm_lock_bits,
+bool unlock, Error **errp)
+{
+int ret;
+int i;
+
+for (i = 0; i < BLK_PERM_MAX; ++i) {
+int off = RAW_LOCK_PERM_BASE + i;
+if (perm_lock_bits & (1ULL << i)) {
+ret = qemu_lock_fd(s->lock_fd, off, 1, false);
+if (ret) {
+error_setg(errp, "Failed to lock byte %d", off);
+return ret;
+}
+} else if (unlock) {
+ret = qemu_unlock_fd(s->lock_fd, off, 1);
+if (ret) {
+error_setg(errp, "Failed to unlock byte %d", off);
+return ret;
+}
+}
+}
+for (i = 0; i < BLK_PERM_MAX; ++i) {
+int off = RAW_LOCK_SHARED_BASE + i;
+if (shared_perm_lock_bits & (1ULL << i)) {
+ret = qemu_lock_fd(s->lock_fd, off, 1, false);
+if (ret) {
+error_setg(errp, "Failed to lock byte %d", off);
+return ret;
+}
+} else if (unlock) {
+ret = qemu_unlock_fd(s->lock_fd, off, 1);
+if (ret) {
+error_setg(errp, "Failed to unlock byte %d", off);
+return ret;
+}
+}
+}
+return 0;
+}
+
+/* Check "unshared" bytes implied by @perm and ~@shared_perm in the file. */
+static int raw_check_lock_bytes(BDRVRawState *s,
+uint64_t perm, uint64_t shared_perm,
+Error **errp)
+{
+int ret;
+int i;
+
+for (i = 0; i < BLK_PERM_MAX; ++i) {
+int off = RAW_LOCK_SHARED_BASE + i;
+uint64_t p = 1ULL << i;
+if (perm & p) {
+ret = qemu_lock_fd_test(s->lock_fd, off, 1, true);
+if (ret) {
+error_setg(errp,
+   "Failed to check byte %d for \"%s\" permission",
+   off, bdrv_perm_names(p));
+