Re: [Qemu-block] [Qemu-devel] [PATCH 03/30] hw/block/nvme: include the "qemu/cutils.h" in the source file

2018-02-14 Thread Thomas Huth
On 15.02.2018 05:28, Philippe Mathieu-Daudé wrote:
> where it is used.
> 
> Signed-off-by: Philippe Mathieu-Daudé 
> ---
>  hw/block/nvme.h | 1 -
>  hw/block/nvme.c | 1 +
>  2 files changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/hw/block/nvme.h b/hw/block/nvme.h
> index 8f3981121d..cabcf20c32 100644
> --- a/hw/block/nvme.h
> +++ b/hw/block/nvme.h
> @@ -1,6 +1,5 @@
>  #ifndef HW_NVME_H
>  #define HW_NVME_H
> -#include "qemu/cutils.h"
>  #include "block/nvme.h"
>  
>  typedef struct NvmeAsyncEvent {
> diff --git a/hw/block/nvme.c b/hw/block/nvme.c
> index 85d2406400..811084b6a7 100644
> --- a/hw/block/nvme.c
> +++ b/hw/block/nvme.c
> @@ -35,6 +35,7 @@
>  #include "sysemu/block-backend.h"
>  
>  #include "qemu/log.h"
> +#include "qemu/cutils.h"
>  #include "trace.h"
>  #include "nvme.h"

Reviewed-by: Thomas Huth 



[Qemu-block] [PATCH 30/30] xen: use the BYTE-based definitions

2018-02-14 Thread Philippe Mathieu-Daudé
It ease code review, unit is explicit.

Signed-off-by: Philippe Mathieu-Daudé 
---
 hw/block/xen_disk.c|  4 ++--
 hw/xenpv/xen_domainbuild.c | 10 +-
 2 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/hw/block/xen_disk.c b/hw/block/xen_disk.c
index f74fcd42d1..557005b5e5 100644
--- a/hw/block/xen_disk.c
+++ b/hw/block/xen_disk.c
@@ -1153,9 +1153,9 @@ static int blk_connect(struct XenDevice *xendev)
 }
 
 xen_pv_printf(xendev, 1, "type \"%s\", fileproto \"%s\", filename \"%s\","
-  " size %" PRId64 " (%" PRId64 " MB)\n",
+  " size %" PRId64 " (%llu MB)\n",
   blkdev->type, blkdev->fileproto, blkdev->filename,
-  blkdev->file_size, blkdev->file_size >> 20);
+  blkdev->file_size, blkdev->file_size / M_BYTE);
 
 /* Fill in number of sector size and number of sectors */
 xenstore_write_be_int(>xendev, "sector-size", blkdev->file_blk);
diff --git a/hw/xenpv/xen_domainbuild.c b/hw/xenpv/xen_domainbuild.c
index 027f76fad1..083fb80ee5 100644
--- a/hw/xenpv/xen_domainbuild.c
+++ b/hw/xenpv/xen_domainbuild.c
@@ -75,9 +75,9 @@ int xenstore_domain_init1(const char *kernel, const char 
*ramdisk,
 xenstore_write_str(dom, "vm", vm);
 
 /* memory */
-xenstore_write_int(dom, "memory/target", ram_size >> 10);  // kB
-xenstore_write_int(vm, "memory", ram_size >> 20);  // MB
-xenstore_write_int(vm, "maxmem", ram_size >> 20);  // MB
+xenstore_write_int(dom, "memory/target", ram_size * K_BYTE);
+xenstore_write_int(vm, "memory", ram_size * M_BYTE);
+xenstore_write_int(vm, "maxmem", ram_size * M_BYTE);
 
 /* cpus */
 for (i = 0; i < smp_cpus; i++) {
@@ -260,7 +260,7 @@ int xen_domain_build_pv(const char *kernel, const char 
*ramdisk,
 }
 #endif
 
-rc = xc_domain_setmaxmem(xen_xc, xen_domid, ram_size >> 10);
+rc = xc_domain_setmaxmem(xen_xc, xen_domid, ram_size / K_BYTE);
 if (rc < 0) {
 fprintf(stderr, "xen: xc_domain_setmaxmem() failed\n");
 goto err;
@@ -269,7 +269,7 @@ int xen_domain_build_pv(const char *kernel, const char 
*ramdisk,
 xenstore_port = xc_evtchn_alloc_unbound(xen_xc, xen_domid, 0);
 console_port = xc_evtchn_alloc_unbound(xen_xc, xen_domid, 0);
 
-rc = xc_linux_build(xen_xc, xen_domid, ram_size >> 20,
+rc = xc_linux_build(xen_xc, xen_domid, ram_size / M_BYTE,
 kernel, ramdisk, cmdline,
 0, flags,
 xenstore_port, _mfn,
-- 
2.16.1




[Qemu-block] [PATCH 16/30] hw/sh4: use the BYTE-based definitions

2018-02-14 Thread Philippe Mathieu-Daudé
It ease code review, unit is explicit.

Signed-off-by: Philippe Mathieu-Daudé 
---
 hw/block/tc58128.c | 2 +-
 hw/sh4/r2d.c   | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/hw/block/tc58128.c b/hw/block/tc58128.c
index 1d9f7ee000..3e658d509f 100644
--- a/hw/block/tc58128.c
+++ b/hw/block/tc58128.c
@@ -26,7 +26,7 @@ typedef struct {
 
 static tc58128_dev tc58128_devs[2];
 
-#define FLASH_SIZE (16*1024*1024)
+#define FLASH_SIZE (16 * M_BYTE)
 
 static void init_dev(tc58128_dev * dev, const char *filename)
 {
diff --git a/hw/sh4/r2d.c b/hw/sh4/r2d.c
index 458ed83297..720bd6ad04 100644
--- a/hw/sh4/r2d.c
+++ b/hw/sh4/r2d.c
@@ -292,7 +292,7 @@ static void r2d_init(MachineState *machine)
 dinfo = drive_get(IF_PFLASH, 0, 0);
 pflash_cfi02_register(0x0, NULL, "r2d.flash", FLASH_SIZE,
   dinfo ? blk_by_legacy_dinfo(dinfo) : NULL,
-  (16 * 1024), FLASH_SIZE >> 16,
+  16 * K_BYTE, FLASH_SIZE >> 16,
   1, 4, 0x, 0x, 0x, 0x,
   0x555, 0x2aa, 0);
 
-- 
2.16.1




[Qemu-block] [PATCH 03/30] hw/block/nvme: include the "qemu/cutils.h" in the source file

2018-02-14 Thread Philippe Mathieu-Daudé
where it is used.

Signed-off-by: Philippe Mathieu-Daudé 
---
 hw/block/nvme.h | 1 -
 hw/block/nvme.c | 1 +
 2 files changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/block/nvme.h b/hw/block/nvme.h
index 8f3981121d..cabcf20c32 100644
--- a/hw/block/nvme.h
+++ b/hw/block/nvme.h
@@ -1,6 +1,5 @@
 #ifndef HW_NVME_H
 #define HW_NVME_H
-#include "qemu/cutils.h"
 #include "block/nvme.h"
 
 typedef struct NvmeAsyncEvent {
diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index 85d2406400..811084b6a7 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -35,6 +35,7 @@
 #include "sysemu/block-backend.h"
 
 #include "qemu/log.h"
+#include "qemu/cutils.h"
 #include "trace.h"
 #include "nvme.h"
 
-- 
2.16.1




[Qemu-block] [PATCH] nbd: Honor server's advertised minimum block size

2018-02-14 Thread Eric Blake
Commit 79ba8c98 (v2.7) changed the setting of request_alignment
to occur only during bdrv_refresh_limits(), rather than at at
bdrv_open() time; but at the time, NBD was unaffected, because
it still used sector-based callbacks, so the block layer
defaulted NBD to use 512 request_alignment.

Later, commit 70c4fb26 (also v2.7) changed NBD to use byte-based
callbacks, without setting request_alignment.  This resulted in
NBD using request_alignment of 1, which works great when the
server supports it (as is the case for qemu-nbd), but falls apart
miserably if the server requires alignment (but only if qemu
actually sends a sub-sector request; qemu-io can do it, but
most qemu operations still perform on sectors or larger).

Even later, the NBD protocol was updated to document that clients
should learn the server's minimum alignment during NBD_OPT_GO;
and recommended that clients should assume a minimum size of 512
unless the server understands NBD_OPT_GO and replied with a smaller
size.  Commit 081dd1fe (v2.10) attempted to do that, by assigning
request_alignment to whatever was learned from the server; but
it has two flaws: the assignment is done during bdrv_open() so
it gets unconditionally wiped out back to 1 during any later
bdrv_refresh_limits(); and the code is not using a default of 512
when the server did not report a minimum size.

Fix these issues by moving the assignment to request_alignment
to the right function, and by using a sane default when the
server does not advertise a minimum size.

CC: qemu-sta...@nongnu.org
Signed-off-by: Eric Blake 
---
 block/nbd-client.c | 3 ---
 block/nbd.c| 2 ++
 2 files changed, 2 insertions(+), 3 deletions(-)

diff --git a/block/nbd-client.c b/block/nbd-client.c
index 9206652e45c..7b68499b76a 100644
--- a/block/nbd-client.c
+++ b/block/nbd-client.c
@@ -846,9 +846,6 @@ int nbd_client_init(BlockDriverState *bs,
 if (client->info.flags & NBD_FLAG_SEND_WRITE_ZEROES) {
 bs->supported_zero_flags |= BDRV_REQ_MAY_UNMAP;
 }
-if (client->info.min_block > bs->bl.request_alignment) {
-bs->bl.request_alignment = client->info.min_block;
-}

 qemu_co_mutex_init(>send_mutex);
 qemu_co_queue_init(>free_sema);
diff --git a/block/nbd.c b/block/nbd.c
index ef81a9f53ba..69b5fd5e8fa 100644
--- a/block/nbd.c
+++ b/block/nbd.c
@@ -474,8 +474,10 @@ static int nbd_co_flush(BlockDriverState *bs)
 static void nbd_refresh_limits(BlockDriverState *bs, Error **errp)
 {
 NBDClientSession *s = nbd_get_client_session(bs);
+uint32_t min = s->info.min_block;
 uint32_t max = MIN_NON_ZERO(NBD_MAX_BUFFER_SIZE, s->info.max_block);

+bs->bl.request_alignment = min ? min : BDRV_SECTOR_SIZE;
 bs->bl.max_pdiscard = max;
 bs->bl.max_pwrite_zeroes = max;
 bs->bl.max_transfer = max;
-- 
2.14.3




Re: [Qemu-block] [Qemu-devel] [PATCH 2/3] block/ssh: Make ssh_grow_file() blocking

2018-02-14 Thread Max Reitz
On 2018-02-14 22:11, Eric Blake wrote:
> On 02/14/2018 02:49 PM, Max Reitz wrote:
>> At runtime (that is, during a future ssh_truncate()), the SSH session is
>> non-blocking.  However, ssh_truncate() (or rather, bdrv_truncate() in
>> general) is not a coroutine, so this resize operation needs to block.
>>
>> For ssh_create(), that is fine, too; the session is never set to
>> non-blocking anyway.
>>
>> Signed-off-by: Max Reitz 
>> ---
>>   block/ssh.c | 7 +++
>>   1 file changed, 7 insertions(+)
>>
>> diff --git a/block/ssh.c b/block/ssh.c
>> index 964e55f7fe..ff8576f21e 100644
>> --- a/block/ssh.c
>> +++ b/block/ssh.c
>> @@ -803,17 +803,24 @@ static int ssh_file_open(BlockDriverState *bs,
>> QDict *options, int bdrv_flags,
>>   return ret;
>>   }
>>   +/* Note: This is a blocking operation */
>>   static int ssh_grow_file(BDRVSSHState *s, int64_t offset, Error **errp)
>>   {
>>   ssize_t ret;
>>   char c[1] = { '\0' };
>> +    int was_blocking = libssh2_session_get_blocking(s->session);
>>     /* offset must be strictly greater than the current size so we do
>>    * not overwrite anything */
>>   assert(offset > 0 && offset > s->attrs.filesize);
>>   +    libssh2_session_set_blocking(s->session, 1);
>> +
>>   libssh2_sftp_seek64(s->sftp_handle, offset - 1);
>>   ret = libssh2_sftp_write(s->sftp_handle, c, 1);
>> +
>> +    libssh2_session_set_blocking(s->session, was_blocking);
> 
> Is it worth skipping the two libssh2_session_set_blocking() calls if
> was_blocking is 1?  But that's a micro-optimization that probably won't
> be noticeable, so I'm also fine with unconditional.

I was hoping libssh2 is clever enough for that itself. :-)

> Reviewed-by: Eric Blake 

Thanks!

Max



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-block] [Qemu-devel] [PATCH 2/3] block/ssh: Make ssh_grow_file() blocking

2018-02-14 Thread Eric Blake

On 02/14/2018 02:49 PM, Max Reitz wrote:

At runtime (that is, during a future ssh_truncate()), the SSH session is
non-blocking.  However, ssh_truncate() (or rather, bdrv_truncate() in
general) is not a coroutine, so this resize operation needs to block.

For ssh_create(), that is fine, too; the session is never set to
non-blocking anyway.

Signed-off-by: Max Reitz 
---
  block/ssh.c | 7 +++
  1 file changed, 7 insertions(+)

diff --git a/block/ssh.c b/block/ssh.c
index 964e55f7fe..ff8576f21e 100644
--- a/block/ssh.c
+++ b/block/ssh.c
@@ -803,17 +803,24 @@ static int ssh_file_open(BlockDriverState *bs, QDict 
*options, int bdrv_flags,
  return ret;
  }
  
+/* Note: This is a blocking operation */

  static int ssh_grow_file(BDRVSSHState *s, int64_t offset, Error **errp)
  {
  ssize_t ret;
  char c[1] = { '\0' };
+int was_blocking = libssh2_session_get_blocking(s->session);
  
  /* offset must be strictly greater than the current size so we do

   * not overwrite anything */
  assert(offset > 0 && offset > s->attrs.filesize);
  
+libssh2_session_set_blocking(s->session, 1);

+
  libssh2_sftp_seek64(s->sftp_handle, offset - 1);
  ret = libssh2_sftp_write(s->sftp_handle, c, 1);
+
+libssh2_session_set_blocking(s->session, was_blocking);


Is it worth skipping the two libssh2_session_set_blocking() calls if 
was_blocking is 1?  But that's a micro-optimization that probably won't 
be noticeable, so I'm also fine with unconditional.


Reviewed-by: Eric Blake 

--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org



Re: [Qemu-block] [Qemu-devel] [PATCH 3/3] block/ssh: Add basic .bdrv_truncate()

2018-02-14 Thread Eric Blake

On 02/14/2018 02:49 PM, Max Reitz wrote:

libssh2 does not seem to offer real truncation support, so we can only
grow files -- but that is better than nothing.

Signed-off-by: Max Reitz 
---
  block/ssh.c | 24 
  1 file changed, 24 insertions(+)



Reviewed-by: Eric Blake 

--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org



Re: [Qemu-block] [Qemu-devel] [PATCH 1/3] block/ssh: Pull ssh_grow_file() from ssh_create()

2018-02-14 Thread Eric Blake

On 02/14/2018 02:49 PM, Max Reitz wrote:

If we ever want to offer even rudimentary truncation functionality for
ssh, we should put the respective code into a reusable function.

Signed-off-by: Max Reitz 
---
  block/ssh.c | 30 ++
  1 file changed, 22 insertions(+), 8 deletions(-)




+++ b/block/ssh.c
@@ -803,6 +803,26 @@ static int ssh_file_open(BlockDriverState *bs, QDict 
*options, int bdrv_flags,
  return ret;
  }
  
+static int ssh_grow_file(BDRVSSHState *s, int64_t offset, Error **errp)

+{
+ssize_t ret;
+char c[1] = { '\0' };


Could spell this 'char c[1] = "";', but you just did code motion.

Reviewed-by: Eric Blake 

--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org



Re: [Qemu-block] [RFC PATCH 0/2] s/size/entries/ when dealing with non-byte units

2018-02-14 Thread Max Reitz
On 2018-02-14 00:33, Eric Blake wrote:
> I mentioned this while reviewing Berto's series on L2 slice handling;
> this is a first cut at patches that I think are worth doing throughout
> the qcow2 code base if we like the idea.

I like the idea. :-)

The patches look good to me.

Max

> Eric Blake (2):
>   qcow2: Prefer 'entries' over 'size' for non-byte values in spec
>   qcow2: Prefer 'entries' over 'size' during cache creation
> 
>  docs/interop/qcow2.txt |  4 ++--
>  block/qcow2.h  |  4 ++--
>  block/qcow2.c  | 21 +++--
>  3 files changed, 15 insertions(+), 14 deletions(-)
> 




signature.asc
Description: OpenPGP digital signature


[Qemu-block] [PATCH 3/3] block/ssh: Add basic .bdrv_truncate()

2018-02-14 Thread Max Reitz
libssh2 does not seem to offer real truncation support, so we can only
grow files -- but that is better than nothing.

Signed-off-by: Max Reitz 
---
 block/ssh.c | 24 
 1 file changed, 24 insertions(+)

diff --git a/block/ssh.c b/block/ssh.c
index ff8576f21e..c235eec255 100644
--- a/block/ssh.c
+++ b/block/ssh.c
@@ -1219,6 +1219,29 @@ static int64_t ssh_getlength(BlockDriverState *bs)
 return length;
 }
 
+static int ssh_truncate(BlockDriverState *bs, int64_t offset,
+PreallocMode prealloc, Error **errp)
+{
+BDRVSSHState *s = bs->opaque;
+
+if (prealloc != PREALLOC_MODE_OFF) {
+error_setg(errp, "Unsupported preallocation mode '%s'",
+   PreallocMode_str(prealloc));
+return -ENOTSUP;
+}
+
+if (offset < s->attrs.filesize) {
+error_setg(errp, "ssh driver does not support shrinking files");
+return -ENOTSUP;
+}
+
+if (offset == s->attrs.filesize) {
+return 0;
+}
+
+return ssh_grow_file(s, offset, errp);
+}
+
 static BlockDriver bdrv_ssh = {
 .format_name  = "ssh",
 .protocol_name= "ssh",
@@ -1231,6 +1254,7 @@ static BlockDriver bdrv_ssh = {
 .bdrv_co_readv= ssh_co_readv,
 .bdrv_co_writev   = ssh_co_writev,
 .bdrv_getlength   = ssh_getlength,
+.bdrv_truncate= ssh_truncate,
 .bdrv_co_flush_to_disk= ssh_co_flush,
 .create_opts  = _create_opts,
 };
-- 
2.14.3




[Qemu-block] [PATCH 2/3] block/ssh: Make ssh_grow_file() blocking

2018-02-14 Thread Max Reitz
At runtime (that is, during a future ssh_truncate()), the SSH session is
non-blocking.  However, ssh_truncate() (or rather, bdrv_truncate() in
general) is not a coroutine, so this resize operation needs to block.

For ssh_create(), that is fine, too; the session is never set to
non-blocking anyway.

Signed-off-by: Max Reitz 
---
 block/ssh.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/block/ssh.c b/block/ssh.c
index 964e55f7fe..ff8576f21e 100644
--- a/block/ssh.c
+++ b/block/ssh.c
@@ -803,17 +803,24 @@ static int ssh_file_open(BlockDriverState *bs, QDict 
*options, int bdrv_flags,
 return ret;
 }
 
+/* Note: This is a blocking operation */
 static int ssh_grow_file(BDRVSSHState *s, int64_t offset, Error **errp)
 {
 ssize_t ret;
 char c[1] = { '\0' };
+int was_blocking = libssh2_session_get_blocking(s->session);
 
 /* offset must be strictly greater than the current size so we do
  * not overwrite anything */
 assert(offset > 0 && offset > s->attrs.filesize);
 
+libssh2_session_set_blocking(s->session, 1);
+
 libssh2_sftp_seek64(s->sftp_handle, offset - 1);
 ret = libssh2_sftp_write(s->sftp_handle, c, 1);
+
+libssh2_session_set_blocking(s->session, was_blocking);
+
 if (ret < 0) {
 sftp_error_setg(errp, s, "Failed to grow file");
 return -EIO;
-- 
2.14.3




[Qemu-block] [PATCH 0/3] block/ssh: Add basic .bdrv_truncate()

2018-02-14 Thread Max Reitz
For (x-)blockdev-create, all protocol drivers that support image
creation also need to offer a .bdrv_truncate() implementation that
matches in features.  A previous series of mine brought gluster's and
sheepdog's implementation up to par regarding preallocated truncation;
but I forgot about drivers that have a .bdrv_create() but no
.bdrv_truncate() at all.

There is only one of these, and that is ssh.  Since libssh2 does not
seem to know any way of truncating files, we can only support growing
files -- but that is what we need for (x-)blockdev-create.

Note that there are also drivers which do not support growing files,
namely iscsi and file-posix for host devices (maybe more?  I hope not).
But these also pretty much ignore the specified size on .bdrv_create()
and just use the size of the existing device.  They just check that the
specified size does not exceed the actual size, so that pretty much
matches what their .bdrv_truncate() supports, and we should be fine
there.


Max Reitz (3):
  block/ssh: Pull ssh_grow_file() from ssh_create()
  block/ssh: Make ssh_grow_file() blocking
  block/ssh: Add basic .bdrv_truncate()

 block/ssh.c | 61 +
 1 file changed, 53 insertions(+), 8 deletions(-)

-- 
2.14.3




[Qemu-block] [PATCH 1/3] block/ssh: Pull ssh_grow_file() from ssh_create()

2018-02-14 Thread Max Reitz
If we ever want to offer even rudimentary truncation functionality for
ssh, we should put the respective code into a reusable function.

Signed-off-by: Max Reitz 
---
 block/ssh.c | 30 ++
 1 file changed, 22 insertions(+), 8 deletions(-)

diff --git a/block/ssh.c b/block/ssh.c
index b63addcf94..964e55f7fe 100644
--- a/block/ssh.c
+++ b/block/ssh.c
@@ -803,6 +803,26 @@ static int ssh_file_open(BlockDriverState *bs, QDict 
*options, int bdrv_flags,
 return ret;
 }
 
+static int ssh_grow_file(BDRVSSHState *s, int64_t offset, Error **errp)
+{
+ssize_t ret;
+char c[1] = { '\0' };
+
+/* offset must be strictly greater than the current size so we do
+ * not overwrite anything */
+assert(offset > 0 && offset > s->attrs.filesize);
+
+libssh2_sftp_seek64(s->sftp_handle, offset - 1);
+ret = libssh2_sftp_write(s->sftp_handle, c, 1);
+if (ret < 0) {
+sftp_error_setg(errp, s, "Failed to grow file");
+return -EIO;
+}
+
+s->attrs.filesize = offset;
+return 0;
+}
+
 static QemuOptsList ssh_create_opts = {
 .name = "ssh-create-opts",
 .head = QTAILQ_HEAD_INITIALIZER(ssh_create_opts.head),
@@ -822,8 +842,6 @@ static int ssh_create(const char *filename, QemuOpts *opts, 
Error **errp)
 int64_t total_size = 0;
 QDict *uri_options = NULL;
 BDRVSSHState s;
-ssize_t r2;
-char c[1] = { '\0' };
 
 ssh_state_init();
 
@@ -849,14 +867,10 @@ static int ssh_create(const char *filename, QemuOpts 
*opts, Error **errp)
 }
 
 if (total_size > 0) {
-libssh2_sftp_seek64(s.sftp_handle, total_size-1);
-r2 = libssh2_sftp_write(s.sftp_handle, c, 1);
-if (r2 < 0) {
-sftp_error_setg(errp, , "truncate failed");
-ret = -EINVAL;
+ret = ssh_grow_file(, total_size, errp);
+if (ret < 0) {
 goto out;
 }
-s.attrs.filesize = total_size;
 }
 
 ret = 0;
-- 
2.14.3




Re: [Qemu-block] [PATCH v3 1/2] iotest 033: add misaligned write-zeroes test via truncate

2018-02-14 Thread Eric Blake

On 02/14/2018 10:09 AM, Anton Nefedov wrote:

This new test case only makes sense for qcow2 while iotest 033 is generic;
however it matches the test purpose perfectly and also 033 contains those
do_test() tricks to pass the alignment, which won't look nice being
duplicated in other tests or moved to the common code.

Signed-off-by: Anton Nefedov 
---
  tests/qemu-iotests/033 | 29 +
  tests/qemu-iotests/033.out | 13 +
  2 files changed, 42 insertions(+)




+# only interested in qcow2 here; also other formats might respond with
+#  "not supported" error message
+if [ $IMGFMT = "qcow2" ]; then
+do_test 512 "truncate $L2_COVERAGE" "$TEST_IMG" | _filter_qemu_io
+fi


But without an else branch that echoes the same text as the if branch 
generates, your .out file is now broken for other image formats.  Or 
does 'qemu-io -c truncate' not produce output?


/me goes and tests...

Okay, looks like truncate is silent; and that the truncation (or 
skipping of the truncation) doesn't affect things.


Tested-by: Eric Blake 

--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org



Re: [Qemu-block] [Qemu-devel] [PATCH v8 02/21] nvme: Drop pointless .bdrv_co_get_block_status()

2018-02-14 Thread Philippe Mathieu-Daudé
On 02/13/2018 05:26 PM, Eric Blake wrote:
> Commit bdd6a90 has a bug: drivers should never directly set
> BDRV_BLOCK_ALLOCATED, but only io.c should do that (as needed).

Doesn't "pointless" in subject hide this is a bugfix?

> Instead, drivers should report BDRV_BLOCK_DATA if it knows that
> data comes from this BDS.
> 
> But let's look at the bigger picture: semantically, the nvme
> driver is similar to the nbd, null, and raw drivers (no backing
> file, all data comes from this BDS).  But while two of those
> other drivers have to supply the callback (null because it can
> special-case BDRV_BLOCK_ZERO, raw because it can special-case
> a different offset), in this case the block layer defaults are
> good enough without the callback at all (similar to nbd).
> 
> So, fix the bug by deletion ;)
> 
> Signed-off-by: Eric Blake 
> 
> ---
> v8: new patch
> ---
>  block/nvme.c | 14 --
>  1 file changed, 14 deletions(-)
> 
> diff --git a/block/nvme.c b/block/nvme.c
> index 10bffbbf2f4..4e561b08df3 100644
> --- a/block/nvme.c
> +++ b/block/nvme.c
> @@ -1068,18 +1068,6 @@ static int nvme_reopen_prepare(BDRVReopenState 
> *reopen_state,
>  return 0;
>  }
> 
> -static int64_t coroutine_fn nvme_co_get_block_status(BlockDriverState *bs,
> - int64_t sector_num,
> - int nb_sectors, int 
> *pnum,
> - BlockDriverState **file)
> -{
> -*pnum = nb_sectors;
> -*file = bs;
> -
> -return BDRV_BLOCK_ALLOCATED | BDRV_BLOCK_OFFSET_VALID |
> -   (sector_num << BDRV_SECTOR_BITS);
> -}
> -
>  static void nvme_refresh_filename(BlockDriverState *bs, QDict *opts)
>  {
>  QINCREF(opts);
> @@ -1179,8 +1167,6 @@ static BlockDriver bdrv_nvme = {
>  .bdrv_co_flush_to_disk= nvme_co_flush,
>  .bdrv_reopen_prepare  = nvme_reopen_prepare,
> 
> -.bdrv_co_get_block_status = nvme_co_get_block_status,
> -
>  .bdrv_refresh_filename= nvme_refresh_filename,
>  .bdrv_refresh_limits  = nvme_refresh_limits,
> 



Re: [Qemu-block] [PATCH v3 0/2] block: fix write with zero flag set and iovector provided

2018-02-14 Thread Kevin Wolf
Am 14.02.2018 um 17:09 hat Anton Nefedov geschrieben:
> v3: patch 1: image cluster size reduced to get away with a smaller test image
> (the cluster size can be as small as 512 bytes for qcow2,
>  but the test runs for all generic formats and minimum for qed is 4k)

Thanks, applied to the block branch (in reverse order, tests should
always pass).

Kevin



Re: [Qemu-block] [PATCH v8 00/21] add byte-based block_status driver callbacks

2018-02-14 Thread Kevin Wolf
Am 13.02.2018 um 21:26 hat Eric Blake geschrieben:
> There are patches floating around to add NBD_CMD_BLOCK_STATUS,
> but NBD wants to report status on byte granularity (even if the
> reporting will probably be naturally aligned to sectors or even
> much higher levels).  I've therefore started the task of
> converting our block status code to report at a byte granularity
> rather than sectors.
> 
> These patches have been around for a while, but it's time to
> finish it now that 2.12 is open for patches.

Thanks, touched up patch 8 and applied to the block branch.

Kevin



[Qemu-block] [PATCH v3 0/2] block: fix write with zero flag set and iovector provided

2018-02-14 Thread Anton Nefedov
v3: patch 1: image cluster size reduced to get away with a smaller test image
(the cluster size can be as small as 512 bytes for qcow2,
 but the test runs for all generic formats and minimum for qed is 4k)

v2: http://lists.nongnu.org/archive/html/qemu-devel/2018-02/msg03016.html

Anton Nefedov (2):
  iotest 033: add misaligned write-zeroes test via truncate
  block: fix write with zero flag set and iovector provided

 block/io.c |  2 +-
 tests/qemu-iotests/033 | 29 +
 tests/qemu-iotests/033.out | 13 +
 3 files changed, 43 insertions(+), 1 deletion(-)

-- 
2.7.4




[Qemu-block] [PATCH v3 2/2] block: fix write with zero flag set and iovector provided

2018-02-14 Thread Anton Nefedov
The normal bdrv_co_pwritev() use is either
  - BDRV_REQ_ZERO_WRITE clear and iovector provided
  - BDRV_REQ_ZERO_WRITE set and iovector == NULL

while
  - the flag clear and iovector == NULL is an assertion failure
in bdrv_co_do_zero_pwritev()
  - the flag set and iovector provided is in fact allowed
(the flag prevails and zeroes are written)

However the alignment logic does not support the latter case so the padding
areas get overwritten with zeroes.

Currently, general functions like bdrv_rw_co() do provide iovector
regardless of flags. So, keep it supported and use bdrv_co_do_zero_pwritev()
alignment for it which also makes the code a bit more obvious anyway.

Signed-off-by: Anton Nefedov 
Reviewed-by: Eric Blake 
Reviewed-by: Alberto Garcia 
---
 block/io.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/block/io.c b/block/io.c
index 89d0745..40df3be 100644
--- a/block/io.c
+++ b/block/io.c
@@ -1701,7 +1701,7 @@ int coroutine_fn bdrv_co_pwritev(BdrvChild *child,
  */
 tracked_request_begin(, bs, offset, bytes, BDRV_TRACKED_WRITE);
 
-if (!qiov) {
+if (flags & BDRV_REQ_ZERO_WRITE) {
 ret = bdrv_co_do_zero_pwritev(child, offset, bytes, flags, );
 goto out;
 }
-- 
2.7.4




Re: [Qemu-block] [PATCH v8 20/21] vvfat: Switch to .bdrv_co_block_status()

2018-02-14 Thread Kevin Wolf
Am 14.02.2018 um 15:50 hat Eric Blake geschrieben:
> On 02/14/2018 07:12 AM, Kevin Wolf wrote:
> > Am 13.02.2018 um 21:27 hat Eric Blake geschrieben:
> > > We are gradually moving away from sector-based interfaces, towards
> > > byte-based.  Update the vvfat driver accordingly.  Note that we
> > > can rely on the block driver having already clamped limits to our
> > > block size, and simplify accordingly.
> > > 
> > > Signed-off-by: Eric Blake 
> > > Reviewed-by: Vladimir Sementsov-Ogievskiy 
> > > Reviewed-by: Fam Zheng 
> 
> > >   {
> > > -*n = bs->total_sectors - sector_num;
> > > -if (*n > nb_sectors) {
> > > -*n = nb_sectors;
> > > -} else if (*n < 0) {
> > > -return 0;
> > > -}
> > > +*n = bytes;
> > >   return BDRV_BLOCK_DATA;
> > >   }
> > 
> > Preexisting, but this is inconsistent with other protocol drivers as far
> > as OFFSET_VALID is concerned (as I hinted in another mail, I like it
> > better this way, but it's still inconsistent).
> > 
> > Do we actually need any callback here or could the solution be to simply
> > remove it like with nvme?
> 
> Does that mean io.c's defaults for protocol drivers is wrong?  It defaults
> to setting OFFSET_VALID and *map on all protocol drivers without a callback
> (at least nvme, nbd); I didn't delete this callback because I noticed the
> difference in return value, and couldn't justify whether it was intentional.
> Also, vvfat is weird - it is somewhat of a format driver, rather than just a
> protocol; even though it sets .protocol_name.  It may be possible for vvfat
> to actually set OFFSET_VALID to particular offsets within the host file that
> correspond to what the guest would read, where it is not a simple 1:1
> mapping, precisely because it is imposing format on the host file.  However,
> vvfat is one of those things that I try to avoid as much as possible,
> because it is just so weird.

As I said in my reply to the null block driver, OFFSET_VALID doesn't
really make sense for protocol drivers anyway. Making use of it with
vvfat isn't any more practical than directly accessing the undefined
data of the null driver.

I think the unwritten rule at the moment is that protocols should always
set OFFSET_VALID and *file = bs (even though it doesn't make sense). So
with the current interface, I'd consider deleting the callback a vvfat
fix.

I also think that we should possibly look into changing the interface so
that protocols don't set OFFSET_VALID and *file, but then the default
handling would change too, and deleting the callback in vvfat would
still be right.

As this is preexisting, I'm okay with just merging the series as it is,
and then we can handle this while dealing with the more fundamental
question what protocol drivers should return in general.

Kevin



Re: [Qemu-block] [PATCH v8 09/21] null: Switch to .bdrv_co_block_status()

2018-02-14 Thread Kevin Wolf
Am 14.02.2018 um 15:44 hat Eric Blake geschrieben:
> On 02/14/2018 06:05 AM, Kevin Wolf wrote:
> > Am 13.02.2018 um 21:26 hat Eric Blake geschrieben:
> > > We are gradually moving away from sector-based interfaces, towards
> > > byte-based.  Update the null driver accordingly.
> > > 
> > > Signed-off-by: Eric Blake 
> > > Reviewed-by: Vladimir Sementsov-Ogievskiy 
> > > Reviewed-by: Fam Zheng 
> > > 
> 
> > >   if (s->read_zeroes) {
> > > -return BDRV_BLOCK_OFFSET_VALID | start | BDRV_BLOCK_ZERO;
> > > -} else {
> > > -return BDRV_BLOCK_OFFSET_VALID | start;
> > > +ret |= BDRV_BLOCK_ZERO;
> > >   }
> > > +return ret;
> > >   }
> > 
> > Preexisting, but I think this return value is wrong. OFFSET_VALID
> > without DATA is to documented to have the following semantics:
> > 
> >   * DATA ZERO OFFSET_VALID
> >   *  ftt   sectors preallocated, read as zero, returned 
> > file not
> >   *necessarily zero at offset
> >   *  fft   sectors preallocated but read from backing_hd,
> >   *returned file contains garbage at offset
> > 
> > I'm not sure what OFFSET_VALID is even supposed to mean for null.
> 
> Yeah, and I was even thinking about that a bit yesterday when figuring out
> what to do with nvme.  It does highlight the fact that you get garbage when
> reading from the null driver (unless the zero option was enabled, then ZERO
> is set and you know you read zeros instead) - but there no pointer that is
> preallocated (whether it contains garbage or otherwise) that you can
> actually dereference to read what the guest would see.
> 
> > 
> > Or in fact, what it is supposed to mean for any protocol driver, because
> > normally it just means I can use this offset for accessing bs->file. But
> > protocol drivers don't have a bs->file, so it's interesting to see that
> > they still all set this flag.
> > 
> > OFFSET_VALID | DATA might be excusable because I can see that it's
> > convenient that a protocol driver refers to itself as *file instead of
> > returning NULL there and then the offset is valid (though it would be
> > pointless to actually follow the file pointer), but OFFSET_VALID without
> > DATA probably isn't.
> 
> Hmm, you're probably right.  Maybe that means I should tweak the
> documentation to be more explicit: for a format driver, OFFSET_VALID can
> always be used (and *file will be set to the underlying protocol driver);
> but for a protocol driver, OFFSET_VALID only makes sense if *file is the BDS
> itself and there is an actual buffer to read (that is, the protocol driver
> must also be returning DATA and/or ZERO).  Or maybe we can indeed state that
> protocol drivers always set *file to NULL (there is no further backing file
> to reference), and thus never need to return OFFSET_VALID (but I'm not sure
> whether that will accidentally propagate back up the call stack and
> negatively affect status queries of format drivers).
> 
> Since it is pre-existing, should I respin to address the issue in a separate
> patch, or should that be a followup after this series?

It's a more fundamental question that shouldn't hold up this series. I
just wanted to raise it while I was looking at it. So yes, a followup is
fine.

Kevin



Re: [Qemu-block] [PATCH v8 19/21] vpc: Switch to .bdrv_co_block_status()

2018-02-14 Thread Eric Blake

On 02/14/2018 07:08 AM, Kevin Wolf wrote:

Am 13.02.2018 um 21:26 hat Eric Blake geschrieben:

We are gradually moving away from sector-based interfaces, towards
byte-based.  Update the vpc driver accordingly.

Signed-off-by: Eric Blake 
Reviewed-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Fam Zheng 

---



+allocated = (image_offset != -1);
  *pnum = 0;
  ret = 0;

  do {
  /* All sectors in a block are contiguous (without using the bitmap) */
-n = ROUND_UP(sector_num + 1, s->block_size / BDRV_SECTOR_SIZE)
-  - sector_num;
-n = MIN(n, nb_sectors);
+n = ROUND_UP(offset + 1, s->block_size) - offset;
+n = MIN(n, bytes);

  *pnum += n;
-sector_num += n;
-nb_sectors -= n;
+offset += n;
+bytes -= n;
  /* *pnum can't be greater than one block for allocated
   * sectors since there is always a bitmap in between. */
  if (allocated) {
  *file = bs->file->bs;
-ret = BDRV_BLOCK_DATA | BDRV_BLOCK_OFFSET_VALID | start;
+*map = image_offset;


This does work, but only because the loop isn't actually looping for
allocated == true. In the old code, it was obvious that start was
assigned only once and never changed during the loop, but image_offset
changes in each loop iteration.

It would probably be cleaner and more obviously correct to move the
assignment of *map to before the loop.


Yes, that would be a bit nicer.



Kevin



--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org



Re: [Qemu-block] qcow2 images corruption

2018-02-14 Thread Nicolas Ecarnot



https://framadrop.org/r/Lvvr392QZo#/wOeYUUlHQAtkUw1E+x2YdqTqq21Pbic6OPBIH0TjZE=

Le 14/02/2018 à 00:01, John Snow a écrit :



On 02/13/2018 04:41 AM, Kevin Wolf wrote:

Am 07.02.2018 um 18:06 hat Nicolas Ecarnot geschrieben:

TL; DR : qcow2 images keep getting corrupted. Any workaround?


Not without knowing the cause.

The first thing to make sure is that the image isn't touched by a second
process while QEMU is running a VM. The classic one is using 'qemu-img
snapshot' on the image of a running VM, which is instant corruption (and
newer QEMU versions have locking in place to prevent this), but we have
seen more absurd cases of things outside QEMU tampering with the image
when we were investigating previous corruption reports.

This covers the majority of all reports, we haven't had a real
corruption caused by a QEMU bug in ages.


After having found (https://access.redhat.com/solutions/1173623) the right
logical volume hosting the qcow2 image, I can run qemu-img check on it.
- On 80% of my VMs, I find no errors.
- On 15% of them, I find Leaked cluster errors that I can correct using
"qemu-img check -r all"
- On 5% of them, I find Leaked clusters errors and further fatal errors,
which can not be corrected with qemu-img.
In rare cases, qemu-img can correct them, but destroys large parts of the
image (becomes unusable), and on other cases it can not correct them at all.


It would be good if you could make the 'qemu-img check' output available
somewhere.

It would be even better if we could have a look at the respective image.
I seem to remember that John (CCed) had a few scripts to analyse
corrupted qcow2 images, maybe we would be able to see something there.



Hi! I did write a pretty simplistic tool for trying to tell the shape of
a corruption at a glance. It seems to work pretty similarly to the other
tool you already found, but it won't hurt anything to run it:

https://github.com/jnsnow/qcheck

(Actually, that other tool looks like it has an awful lot of options.
I'll have to check it out.)

It can print a really upsetting amount of data (especially for very
corrupt images), but in the default case, the simple setting should do
the trick just fine.

You could always put the output from this tool in a pastebin too; it
might help me visualize the problem a bit more -- I find seeing the
exact offsets and locations of where all the various tables and things
to be pretty helpful.

You can also always use the "deluge" option and compress it if you want,
just don't let it print to your terminal:

jsnow@probe (dev) ~/s/qcheck> ./qcheck -xd
/home/bos/jsnow/src/qemu/bin/git/install_test_f26.qcow2 > deluge.log;
and ls -sh deluge.log
4.3M deluge.log

but it compresses down very well:

jsnow@probe (dev) ~/s/qcheck> 7z a -t7z -m0=ppmd deluge.ppmd.7z deluge.log
jsnow@probe (dev) ~/s/qcheck> ls -s deluge.ppmd.7z
316 deluge.ppmd.7z

So I suppose if you want to send along:
(1) The basic output without any flags, in a pastebin
(2) The zipped deluge output, just in case

and I will try my hand at guessing what went wrong.


(Also, maybe my tool will totally choke for your image, who knows. It
hasn't received an overwhelming amount of testing apart from when I go
to use it personally and inevitably wind up displeased with how it
handles certain situations, so ...)


What I read similar to my case is :
- usage of qcow2
- heavy disk I/O
- using the virtio-blk driver

In the proxmox thread, they tend to say that using virtio-scsi is the
solution. Having asked this question to oVirt experts
(https://lists.ovirt.org/pipermail/users/2018-February/086753.html) but it's
not clear the driver is to blame.


This seems very unlikely. The corruption you're seeing is in the qcow2
metadata, not only in the guest data. If anything, virtio-scsi exercises
more qcow2 code paths than virtio-blk, so any potential bug that affects
virtio-blk should also affect virtio-scsi, but not the other way around.


I agree with the answer Yaniv Kaul gave to me, saying I have to properly
report the issue, so I'm longing to know which peculiar information I can
give you now.


To be honest, debugging corruption after the fact is pretty hard. We'd
need the 'qemu-img check' output and ideally the image to do anything,
but I can't promise that anything would come out of this.

Best would be a reproducer, or at least some operation that you can link
to the appearance of the corruption. Then we could take a more targeted
look at the respective code.


As you can imagine, all this setup is in production, and for most of the
VMs, I can not "play" with them. Moreover, we launched a campaign of nightly
stopping every VM, qemu-img check them one by one, then boot.
So it might take some time before I find another corrupted image.
(which I'll preciously store for debug)

Other informations : We very rarely do snapshots, but I'm close to imagine
that automated migrations of VMs could trigger similar behaviors on qcow2
images.


To my knowledge, oVirt only uses 

Re: [Qemu-block] [PATCH v8 09/21] null: Switch to .bdrv_co_block_status()

2018-02-14 Thread Eric Blake

On 02/14/2018 06:05 AM, Kevin Wolf wrote:

Am 13.02.2018 um 21:26 hat Eric Blake geschrieben:

We are gradually moving away from sector-based interfaces, towards
byte-based.  Update the null driver accordingly.

Signed-off-by: Eric Blake 
Reviewed-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Fam Zheng 




  if (s->read_zeroes) {
-return BDRV_BLOCK_OFFSET_VALID | start | BDRV_BLOCK_ZERO;
-} else {
-return BDRV_BLOCK_OFFSET_VALID | start;
+ret |= BDRV_BLOCK_ZERO;
  }
+return ret;
  }


Preexisting, but I think this return value is wrong. OFFSET_VALID
without DATA is to documented to have the following semantics:

  * DATA ZERO OFFSET_VALID
  *  ftt   sectors preallocated, read as zero, returned file not
  *necessarily zero at offset
  *  fft   sectors preallocated but read from backing_hd,
  *returned file contains garbage at offset

I'm not sure what OFFSET_VALID is even supposed to mean for null.


Yeah, and I was even thinking about that a bit yesterday when figuring 
out what to do with nvme.  It does highlight the fact that you get 
garbage when reading from the null driver (unless the zero option was 
enabled, then ZERO is set and you know you read zeros instead) - but 
there no pointer that is preallocated (whether it contains garbage or 
otherwise) that you can actually dereference to read what the guest 
would see.




Or in fact, what it is supposed to mean for any protocol driver, because
normally it just means I can use this offset for accessing bs->file. But
protocol drivers don't have a bs->file, so it's interesting to see that
they still all set this flag.

OFFSET_VALID | DATA might be excusable because I can see that it's
convenient that a protocol driver refers to itself as *file instead of
returning NULL there and then the offset is valid (though it would be
pointless to actually follow the file pointer), but OFFSET_VALID without
DATA probably isn't.


Hmm, you're probably right.  Maybe that means I should tweak the 
documentation to be more explicit: for a format driver, OFFSET_VALID can 
always be used (and *file will be set to the underlying protocol 
driver); but for a protocol driver, OFFSET_VALID only makes sense if 
*file is the BDS itself and there is an actual buffer to read (that is, 
the protocol driver must also be returning DATA and/or ZERO).  Or maybe 
we can indeed state that protocol drivers always set *file to NULL 
(there is no further backing file to reference), and thus never need to 
return OFFSET_VALID (but I'm not sure whether that will accidentally 
propagate back up the call stack and negatively affect status queries of 
format drivers).


Since it is pre-existing, should I respin to address the issue in a 
separate patch, or should that be a followup after this series?


--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org



[Qemu-block] BLOCK_STATUS extension

2018-02-14 Thread Vladimir Sementsov-Ogievskiy

Hi all.

Just note: looks like we allow zero-sized metadata context name. Is it ok?

 *

   |NBD_REP_META_CONTEXT| (4)

   A description of a metadata context. Data:

 o 32 bits, NBD metadata context ID.
 o String, name of the metadata context. This is not required to be
   a human-readable string, but it MUST be valid UTF-8 data.



--
Best regards,
Vladimir



Re: [Qemu-block] [PATCH v8 08/21] iscsi: Switch to .bdrv_co_block_status()

2018-02-14 Thread Eric Blake

On 02/14/2018 05:53 AM, Kevin Wolf wrote:

Am 13.02.2018 um 21:26 hat Eric Blake geschrieben:

We are gradually moving away from sector-based interfaces, towards
byte-based.  Update the iscsi driver accordingly.  In this case,
it is handy to teach iscsi_co_block_status() to handle a NULL map
and file parameter, even though the block layer passes non-NULL
values, because we also call the function directly.  For now, there
are no optimizations done based on the want_zero flag.


[1]



We can also make the simplification of asserting that the block
layer passed in aligned values.

Signed-off-by: Eric Blake 
Reviewed-by: Fam Zheng 




  /* default to all sectors allocated */
-ret = BDRV_BLOCK_DATA;
-ret |= (sector_num << BDRV_SECTOR_BITS) | BDRV_BLOCK_OFFSET_VALID;
-*pnum = nb_sectors;
+ret = BDRV_BLOCK_DATA | BDRV_BLOCK_OFFSET_VALID;
+if (map) {
+*map = offset;
+}


Can map ever be NULL? You didn't have that check in other drivers.


The documentation in block_int.h states that io.c never passes NULL for 
map.  However, see my commit message [1], and the code below [2], for 
why THIS driver has to check for NULL.




@@ -760,7 +758,7 @@ out:
  if (iTask.task != NULL) {
  scsi_free_scsi_task(iTask.task);
  }
-if (ret > 0 && ret & BDRV_BLOCK_OFFSET_VALID) {
+if (ret > 0 && ret & BDRV_BLOCK_OFFSET_VALID && file) {
  *file = bs;
  }


Can file ever be NULL?


Ditto.




  return ret;
@@ -800,25 +798,24 @@ static int coroutine_fn iscsi_co_readv(BlockDriverState 
*bs,
   nb_sectors * BDRV_SECTOR_SIZE) &&


No iscsi_co_preadv() yet... :-(


Yeah, but that's for another day.




  !iscsi_allocmap_is_allocated(iscsilun, sector_num * BDRV_SECTOR_SIZE,
   nb_sectors * BDRV_SECTOR_SIZE)) {
-int pnum;
-BlockDriverState *file;
+int64_t pnum;
  /* check the block status from the beginning of the cluster
   * containing the start sector */
-int cluster_sectors = iscsilun->cluster_size >> BDRV_SECTOR_BITS;
-int head;
-int64_t ret;
+int64_t head;
+int ret;

-assert(cluster_sectors);
-head = sector_num % cluster_sectors;
-ret = iscsi_co_get_block_status(bs, sector_num - head,
-BDRV_REQUEST_MAX_SECTORS, ,
-);
+assert(iscsilun->cluster_size);
+head = (sector_num * BDRV_SECTOR_SIZE) % iscsilun->cluster_size;
+ret = iscsi_co_block_status(bs, false,
+sector_num * BDRV_SECTOR_SIZE - head,
+BDRV_REQUEST_MAX_BYTES, , NULL, NULL);




[2] This is the reason that THIS driver has to check for NULL, even 
though the block layer never passes NULL.



It doesn't make a difference with your current implementation because it
ignores want_zero, but consistent with your approach that
want_zero=false returns just that everyhting is allocated for drivers
without support for backing files, I think you want want_zero=true here.


Makes sense.  If that's the only tweak, can you make it while taking the 
series, or will I need to respin?


--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org



Re: [Qemu-block] [PATCH v8 20/21] vvfat: Switch to .bdrv_co_block_status()

2018-02-14 Thread Kevin Wolf
Am 13.02.2018 um 21:27 hat Eric Blake geschrieben:
> We are gradually moving away from sector-based interfaces, towards
> byte-based.  Update the vvfat driver accordingly.  Note that we
> can rely on the block driver having already clamped limits to our
> block size, and simplify accordingly.
> 
> Signed-off-by: Eric Blake 
> Reviewed-by: Vladimir Sementsov-Ogievskiy 
> Reviewed-by: Fam Zheng 
> 
> ---
> v5-v7: no change
> v4: rebase to interface tweak
> v3: no change
> v2: rebase to earlier changes, simplify
> ---
>  block/vvfat.c | 16 +++-
>  1 file changed, 7 insertions(+), 9 deletions(-)
> 
> diff --git a/block/vvfat.c b/block/vvfat.c
> index 7e06ebacf61..4a17a49e128 100644
> --- a/block/vvfat.c
> +++ b/block/vvfat.c
> @@ -3088,15 +3088,13 @@ vvfat_co_pwritev(BlockDriverState *bs, uint64_t 
> offset, uint64_t bytes,
>  return ret;
>  }
> 
> -static int64_t coroutine_fn vvfat_co_get_block_status(BlockDriverState *bs,
> -int64_t sector_num, int nb_sectors, int *n, BlockDriverState **file)
> +static int coroutine_fn vvfat_co_block_status(BlockDriverState *bs,
> +  bool want_zero, int64_t offset,
> +  int64_t bytes, int64_t *n,
> +  int64_t *map,
> +  BlockDriverState **file)
>  {
> -*n = bs->total_sectors - sector_num;
> -if (*n > nb_sectors) {
> -*n = nb_sectors;
> -} else if (*n < 0) {
> -return 0;
> -}
> +*n = bytes;
>  return BDRV_BLOCK_DATA;
>  }

Preexisting, but this is inconsistent with other protocol drivers as far
as OFFSET_VALID is concerned (as I hinted in another mail, I like it
better this way, but it's still inconsistent).

Do we actually need any callback here or could the solution be to simply
remove it like with nvme?

Kevin



Re: [Qemu-block] [PATCH v8 19/21] vpc: Switch to .bdrv_co_block_status()

2018-02-14 Thread Kevin Wolf
Am 13.02.2018 um 21:26 hat Eric Blake geschrieben:
> We are gradually moving away from sector-based interfaces, towards
> byte-based.  Update the vpc driver accordingly.
> 
> Signed-off-by: Eric Blake 
> Reviewed-by: Vladimir Sementsov-Ogievskiy 
> Reviewed-by: Fam Zheng 
> 
> ---
> v7: tweak commit message and type of 'n' [Fam]
> v6: no change
> v5: fix incorrect rounding in 'map' and bad loop condition [Vladimir]
> v4: rebase to interface tweak
> v3: rebase to master
> v2: drop get_sector_offset() [Kevin], rebase to mapping flag
> ---
>  block/vpc.c | 45 +++--
>  1 file changed, 23 insertions(+), 22 deletions(-)
> 
> diff --git a/block/vpc.c b/block/vpc.c
> index cfa5144e867..fba4492fd7b 100644
> --- a/block/vpc.c
> +++ b/block/vpc.c
> @@ -706,53 +706,54 @@ fail:
>  return ret;
>  }
> 
> -static int64_t coroutine_fn vpc_co_get_block_status(BlockDriverState *bs,
> -int64_t sector_num, int nb_sectors, int *pnum, BlockDriverState 
> **file)
> +static int coroutine_fn vpc_co_block_status(BlockDriverState *bs,
> +bool want_zero,
> +int64_t offset, int64_t bytes,
> +int64_t *pnum, int64_t *map,
> +BlockDriverState **file)
>  {
>  BDRVVPCState *s = bs->opaque;
>  VHDFooter *footer = (VHDFooter*) s->footer_buf;
> -int64_t start, offset;
> +int64_t image_offset;
>  bool allocated;
> -int64_t ret;
> -int n;
> +int ret;
> +int64_t n;
> 
>  if (be32_to_cpu(footer->type) == VHD_FIXED) {
> -*pnum = nb_sectors;
> +*pnum = bytes;
> +*map = offset;
>  *file = bs->file->bs;
> -return BDRV_BLOCK_RAW | BDRV_BLOCK_OFFSET_VALID |
> -   (sector_num << BDRV_SECTOR_BITS);
> +return BDRV_BLOCK_RAW | BDRV_BLOCK_OFFSET_VALID;
>  }
> 
>  qemu_co_mutex_lock(>lock);
> 
> -offset = get_image_offset(bs, sector_num << BDRV_SECTOR_BITS, false, 
> NULL);
> -start = offset;
> -allocated = (offset != -1);
> +image_offset = get_image_offset(bs, offset, false, NULL);
> +allocated = (image_offset != -1);
>  *pnum = 0;
>  ret = 0;
> 
>  do {
>  /* All sectors in a block are contiguous (without using the bitmap) 
> */
> -n = ROUND_UP(sector_num + 1, s->block_size / BDRV_SECTOR_SIZE)
> -  - sector_num;
> -n = MIN(n, nb_sectors);
> +n = ROUND_UP(offset + 1, s->block_size) - offset;
> +n = MIN(n, bytes);
> 
>  *pnum += n;
> -sector_num += n;
> -nb_sectors -= n;
> +offset += n;
> +bytes -= n;
>  /* *pnum can't be greater than one block for allocated
>   * sectors since there is always a bitmap in between. */
>  if (allocated) {
>  *file = bs->file->bs;
> -ret = BDRV_BLOCK_DATA | BDRV_BLOCK_OFFSET_VALID | start;
> +*map = image_offset;

This does work, but only because the loop isn't actually looping for
allocated == true. In the old code, it was obvious that start was
assigned only once and never changed during the loop, but image_offset
changes in each loop iteration.

It would probably be cleaner and more obviously correct to move the
assignment of *map to before the loop.

Kevin



Re: [Qemu-block] [PATCH v8 08/21] iscsi: Switch to .bdrv_co_block_status()

2018-02-14 Thread Kevin Wolf
Am 13.02.2018 um 21:26 hat Eric Blake geschrieben:
> We are gradually moving away from sector-based interfaces, towards
> byte-based.  Update the iscsi driver accordingly.  In this case,
> it is handy to teach iscsi_co_block_status() to handle a NULL map
> and file parameter, even though the block layer passes non-NULL
> values, because we also call the function directly.  For now, there
> are no optimizations done based on the want_zero flag.
> 
> We can also make the simplification of asserting that the block
> layer passed in aligned values.
> 
> Signed-off-by: Eric Blake 
> Reviewed-by: Fam Zheng 
> 
> ---
> v8: rebase to master
> v7: rebase to master
> v6: no change
> v5: assert rather than check for alignment
> v4: rebase to interface tweaks
> v3: no change
> v2: rebase to mapping parameter
> ---
>  block/iscsi.c | 67 
> ---
>  1 file changed, 32 insertions(+), 35 deletions(-)
> 
> diff --git a/block/iscsi.c b/block/iscsi.c
> index d2b0466775c..4842519fdad 100644
> --- a/block/iscsi.c
> +++ b/block/iscsi.c
> @@ -653,36 +653,36 @@ out_unlock:
> 
> 
> 
> -static int64_t coroutine_fn iscsi_co_get_block_status(BlockDriverState *bs,
> -  int64_t sector_num,
> -  int nb_sectors, int *pnum,
> -  BlockDriverState **file)
> +static int coroutine_fn iscsi_co_block_status(BlockDriverState *bs,
> +  bool want_zero, int64_t offset,
> +  int64_t bytes, int64_t *pnum,
> +  int64_t *map,
> +  BlockDriverState **file)
>  {
>  IscsiLun *iscsilun = bs->opaque;
>  struct scsi_get_lba_status *lbas = NULL;
>  struct scsi_lba_status_descriptor *lbasd = NULL;
>  struct IscsiTask iTask;
>  uint64_t lba;
> -int64_t ret;
> +int ret;
> 
>  iscsi_co_init_iscsitask(iscsilun, );
> 
> -if (!is_sector_request_lun_aligned(sector_num, nb_sectors, iscsilun)) {
> -ret = -EINVAL;
> -goto out;
> -}
> +assert(QEMU_IS_ALIGNED(offset | bytes, iscsilun->block_size));
> 
>  /* default to all sectors allocated */
> -ret = BDRV_BLOCK_DATA;
> -ret |= (sector_num << BDRV_SECTOR_BITS) | BDRV_BLOCK_OFFSET_VALID;
> -*pnum = nb_sectors;
> +ret = BDRV_BLOCK_DATA | BDRV_BLOCK_OFFSET_VALID;
> +if (map) {
> +*map = offset;
> +}

Can map ever be NULL? You didn't have that check in other drivers.

> +*pnum = bytes;
> 
>  /* LUN does not support logical block provisioning */
>  if (!iscsilun->lbpme) {
>  goto out;
>  }
> 
> -lba = sector_qemu2lun(sector_num, iscsilun);
> +lba = offset / iscsilun->block_size;
> 
>  qemu_mutex_lock(>mutex);
>  retry:
> @@ -727,12 +727,12 @@ retry:
> 
>  lbasd = >descriptors[0];
> 
> -if (sector_qemu2lun(sector_num, iscsilun) != lbasd->lba) {
> +if (lba != lbasd->lba) {
>  ret = -EIO;
>  goto out_unlock;
>  }
> 
> -*pnum = sector_lun2qemu(lbasd->num_blocks, iscsilun);
> +*pnum = lbasd->num_blocks * iscsilun->block_size;
> 
>  if (lbasd->provisioning == SCSI_PROVISIONING_TYPE_DEALLOCATED ||
>  lbasd->provisioning == SCSI_PROVISIONING_TYPE_ANCHORED) {
> @@ -743,15 +743,13 @@ retry:
>  }
> 
>  if (ret & BDRV_BLOCK_ZERO) {
> -iscsi_allocmap_set_unallocated(iscsilun, sector_num * 
> BDRV_SECTOR_SIZE,
> -   *pnum * BDRV_SECTOR_SIZE);
> +iscsi_allocmap_set_unallocated(iscsilun, offset, *pnum);
>  } else {
> -iscsi_allocmap_set_allocated(iscsilun, sector_num * BDRV_SECTOR_SIZE,
> - *pnum * BDRV_SECTOR_SIZE);
> +iscsi_allocmap_set_allocated(iscsilun, offset, *pnum);
>  }
> 
> -if (*pnum > nb_sectors) {
> -*pnum = nb_sectors;
> +if (*pnum > bytes) {
> +*pnum = bytes;
>  }
>  out_unlock:
>  qemu_mutex_unlock(>mutex);
> @@ -760,7 +758,7 @@ out:
>  if (iTask.task != NULL) {
>  scsi_free_scsi_task(iTask.task);
>  }
> -if (ret > 0 && ret & BDRV_BLOCK_OFFSET_VALID) {
> +if (ret > 0 && ret & BDRV_BLOCK_OFFSET_VALID && file) {
>  *file = bs;
>  }

Can file ever be NULL?

>  return ret;
> @@ -800,25 +798,24 @@ static int coroutine_fn iscsi_co_readv(BlockDriverState 
> *bs,
>   nb_sectors * BDRV_SECTOR_SIZE) &&

No iscsi_co_preadv() yet... :-(

>  !iscsi_allocmap_is_allocated(iscsilun, sector_num * BDRV_SECTOR_SIZE,
>   nb_sectors * BDRV_SECTOR_SIZE)) {
> -int pnum;
> -BlockDriverState *file;
> +int64_t pnum;
>  /* check the block status from the beginning of the 

Re: [Qemu-block] block_status automatically added flags

2018-02-14 Thread Vladimir Sementsov-Ogievskiy

13.02.2018 21:48, Eric Blake wrote:

On 02/13/2018 11:36 AM, Vladimir Sementsov-Ogievskiy wrote:

Hi Eric!

I'm now testing my nbd block status realization (block_status part, 
not about dirty bitmaps), and faced into the following effect.


I created empty qcow2 image and wrote to the first sector, so

qemu-io -c map x

reports:

64 KiB (0x1) bytes allocated at offset 0 bytes (0x0)
9.938 MiB (0x9f) bytes not allocated at offset 64 KiB (0x1)

But I can't get same results, when connecting to nbd server, 
exporting the same qcow2 file, I get


10 MiB (0xa0) bytes allocated at offset 0 bytes (0x0)


Is this with or without your NBD_CMD_BLOCK_STATUS patches applied?  
And are you exposing the data over NBD as raw ('qemu-nbd -f 
qcow2'/'qemu-io -f raw') or as qcow2 ('qemu-nbd -f raw'/'qemu-io -f 
qcow2')?


with patches, as raw (server reads it as qcow2, so export is raw)




/me does a quick reproduction

Yes, I definitely see that behavior without any NBD_CMD_BLOCK_STATUS 
patches and when the image is exposed over NBD as raw, but not when 
exposed as qcow2, when testing the 2.11 release:


$ qemu-img create -f qcow2 file3 10M
Formatting 'file3', fmt=qcow2 size=10485760 cluster_size=65536 
lazy_refcounts=off refcount_bits=16

$ qemu-io -c 'w 0 64k' -c map -f qcow2 file3
wrote 65536/65536 bytes at offset 0
64 KiB, 1 ops; 0.0579 sec (1.079 MiB/sec and 17.2601 ops/sec)
64 KiB (0x1) bytes allocated at offset 0 bytes (0x0)
9.938 MiB (0x9f) bytes not allocated at offset 64 KiB (0x1)
$ qemu-nbd -f qcow2 -x foo file3
$ qemu-io -f raw -c map nbd://localhost:10809/foo
10 MiB (0xa0) bytes allocated at offset 0 bytes (0x0)
$ qemu-nbd -f raw -x foo file3
$ qemu-io -f qcow2 -c map nbd://localhost:10809/foo
64 KiB (0x1) bytes allocated at offset 0 bytes (0x0)
9.938 MiB (0x9f) bytes not allocated at offset 64 KiB (0x1)

Right now, without NBD block status, the NBD driver reports the entire 
file as allocated, as it can't do any better (NBD has no backing file, 
and all data .  Presumably, once NBD_CMD_BLOCK_STATUS is implemented, 
we can then use that for more accurate information.


yes, I've done this, with my patches nbd client block driver 
get_block_status uses NBD_CMD_BLOCK_STATUS.







Finally, I understand the reason:

for local file, qemu-io calls bdrv_is_allocated, which calls 
bdrv_common_block_status_above with want_zero=false. So, it doesn't 
set BDRV_BLOCK_ZERO, and doesn't set BDRV_BLOCK_ALLOCATED.


'qemu-io map' is a bit unusual; it is the only UI that easily exposes 
bdrv_is_allocated() to the outside world ('qemu-img map' does not). 
(The fact that both operations are named 'map' but do something 
different is annoying; for back-compat reasons, we can't change 
qemu-img, and I don't know if changing qemu-io is worth it.)




And, even if we change want_zero to true,


Well, you'd do that by invoking bdrv_block_status() (via 'qemu-img 
map', for example).



Aha, thank you. Actually, qemu-img map --output=json works for me and 
shows block_status through nbd connection.






here, it will set BDRV_BLOCK_ZERO, but will not set 
BDRV_BLOCK_ALLOCATED, which contradicts with it's definition:


  BDRV_BLOCK_ALLOCATED: the content of the block is determined by this
    layer (short for DATA || ZERO), set by block 
layer


This text is wrong; it gets fixed in my still-pending concluding 
series for byte-based block status:


https://lists.gnu.org/archive/html/qemu-devel/2018-01/msg00955.html


And even with my r-b. More than month ago, I've forgotten. What is the 
reason for such a long delay, don't you know?




Conceptually, BDRV_BLOCK_ALLOCATED means "is THIS layer of the backing 
chain responsible for the contents at this guest offset"; and there 
are cases where we know that we read zeroes but where the current 
layer is not responsible for the contents (such as a qcow2 that has a 
backing file with shorter length, where we return BDRV_BLOCK_ZERO but 
not BDRV_BLOCK_ALLOCATED).  But since NBD has no backing chain, the 
entire image is considered allocated. Meanwhile, asking whether 
something is allocated ('qemu-io -c map') is not the usual question 
you want to ask when determining what portions of a file are zero.





for nbd, we go through the similar way on server (but with want_zero 
= true), and we finally have BDRV_BLOCK_ZERO without 
BDRV_BLOCK_ALLOCATED, which maps to NBD_STATE_HOLE+NBD_STATE_ZERO. 
But then, in the client we have BDRV_BLOCK_ZERO not automatically 
added by block layer but directly from nbd driver, therefor 
BDRV_BLOCK_ALLOCATED is set and I get different result.


Drivers should never set BDRV_BLOCK_ALLOCATED; only the code in io.c 
should set it; and output based on BDRV_BLOCK_ALLOCATED is only useful 
in backing chain scenarios (which NBD does not have).





this all looks weird for me.

BDRV_BLOCK_ALLOCATED definition should be fixed, to show that this 
flag show only reported by driver