Re: [Qemu-block] [Qemu-devel] [PATCH 03/30] hw/block/nvme: include the "qemu/cutils.h" in the source file
On 15.02.2018 05:28, Philippe Mathieu-Daudé wrote: > where it is used. > > Signed-off-by: Philippe Mathieu-Daudé> --- > hw/block/nvme.h | 1 - > hw/block/nvme.c | 1 + > 2 files changed, 1 insertion(+), 1 deletion(-) > > diff --git a/hw/block/nvme.h b/hw/block/nvme.h > index 8f3981121d..cabcf20c32 100644 > --- a/hw/block/nvme.h > +++ b/hw/block/nvme.h > @@ -1,6 +1,5 @@ > #ifndef HW_NVME_H > #define HW_NVME_H > -#include "qemu/cutils.h" > #include "block/nvme.h" > > typedef struct NvmeAsyncEvent { > diff --git a/hw/block/nvme.c b/hw/block/nvme.c > index 85d2406400..811084b6a7 100644 > --- a/hw/block/nvme.c > +++ b/hw/block/nvme.c > @@ -35,6 +35,7 @@ > #include "sysemu/block-backend.h" > > #include "qemu/log.h" > +#include "qemu/cutils.h" > #include "trace.h" > #include "nvme.h" Reviewed-by: Thomas Huth
[Qemu-block] [PATCH 30/30] xen: use the BYTE-based definitions
It ease code review, unit is explicit. Signed-off-by: Philippe Mathieu-Daudé--- hw/block/xen_disk.c| 4 ++-- hw/xenpv/xen_domainbuild.c | 10 +- 2 files changed, 7 insertions(+), 7 deletions(-) diff --git a/hw/block/xen_disk.c b/hw/block/xen_disk.c index f74fcd42d1..557005b5e5 100644 --- a/hw/block/xen_disk.c +++ b/hw/block/xen_disk.c @@ -1153,9 +1153,9 @@ static int blk_connect(struct XenDevice *xendev) } xen_pv_printf(xendev, 1, "type \"%s\", fileproto \"%s\", filename \"%s\"," - " size %" PRId64 " (%" PRId64 " MB)\n", + " size %" PRId64 " (%llu MB)\n", blkdev->type, blkdev->fileproto, blkdev->filename, - blkdev->file_size, blkdev->file_size >> 20); + blkdev->file_size, blkdev->file_size / M_BYTE); /* Fill in number of sector size and number of sectors */ xenstore_write_be_int(>xendev, "sector-size", blkdev->file_blk); diff --git a/hw/xenpv/xen_domainbuild.c b/hw/xenpv/xen_domainbuild.c index 027f76fad1..083fb80ee5 100644 --- a/hw/xenpv/xen_domainbuild.c +++ b/hw/xenpv/xen_domainbuild.c @@ -75,9 +75,9 @@ int xenstore_domain_init1(const char *kernel, const char *ramdisk, xenstore_write_str(dom, "vm", vm); /* memory */ -xenstore_write_int(dom, "memory/target", ram_size >> 10); // kB -xenstore_write_int(vm, "memory", ram_size >> 20); // MB -xenstore_write_int(vm, "maxmem", ram_size >> 20); // MB +xenstore_write_int(dom, "memory/target", ram_size * K_BYTE); +xenstore_write_int(vm, "memory", ram_size * M_BYTE); +xenstore_write_int(vm, "maxmem", ram_size * M_BYTE); /* cpus */ for (i = 0; i < smp_cpus; i++) { @@ -260,7 +260,7 @@ int xen_domain_build_pv(const char *kernel, const char *ramdisk, } #endif -rc = xc_domain_setmaxmem(xen_xc, xen_domid, ram_size >> 10); +rc = xc_domain_setmaxmem(xen_xc, xen_domid, ram_size / K_BYTE); if (rc < 0) { fprintf(stderr, "xen: xc_domain_setmaxmem() failed\n"); goto err; @@ -269,7 +269,7 @@ int xen_domain_build_pv(const char *kernel, const char *ramdisk, xenstore_port = xc_evtchn_alloc_unbound(xen_xc, xen_domid, 0); console_port = xc_evtchn_alloc_unbound(xen_xc, xen_domid, 0); -rc = xc_linux_build(xen_xc, xen_domid, ram_size >> 20, +rc = xc_linux_build(xen_xc, xen_domid, ram_size / M_BYTE, kernel, ramdisk, cmdline, 0, flags, xenstore_port, _mfn, -- 2.16.1
[Qemu-block] [PATCH 16/30] hw/sh4: use the BYTE-based definitions
It ease code review, unit is explicit. Signed-off-by: Philippe Mathieu-Daudé--- hw/block/tc58128.c | 2 +- hw/sh4/r2d.c | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/hw/block/tc58128.c b/hw/block/tc58128.c index 1d9f7ee000..3e658d509f 100644 --- a/hw/block/tc58128.c +++ b/hw/block/tc58128.c @@ -26,7 +26,7 @@ typedef struct { static tc58128_dev tc58128_devs[2]; -#define FLASH_SIZE (16*1024*1024) +#define FLASH_SIZE (16 * M_BYTE) static void init_dev(tc58128_dev * dev, const char *filename) { diff --git a/hw/sh4/r2d.c b/hw/sh4/r2d.c index 458ed83297..720bd6ad04 100644 --- a/hw/sh4/r2d.c +++ b/hw/sh4/r2d.c @@ -292,7 +292,7 @@ static void r2d_init(MachineState *machine) dinfo = drive_get(IF_PFLASH, 0, 0); pflash_cfi02_register(0x0, NULL, "r2d.flash", FLASH_SIZE, dinfo ? blk_by_legacy_dinfo(dinfo) : NULL, - (16 * 1024), FLASH_SIZE >> 16, + 16 * K_BYTE, FLASH_SIZE >> 16, 1, 4, 0x, 0x, 0x, 0x, 0x555, 0x2aa, 0); -- 2.16.1
[Qemu-block] [PATCH 03/30] hw/block/nvme: include the "qemu/cutils.h" in the source file
where it is used. Signed-off-by: Philippe Mathieu-Daudé--- hw/block/nvme.h | 1 - hw/block/nvme.c | 1 + 2 files changed, 1 insertion(+), 1 deletion(-) diff --git a/hw/block/nvme.h b/hw/block/nvme.h index 8f3981121d..cabcf20c32 100644 --- a/hw/block/nvme.h +++ b/hw/block/nvme.h @@ -1,6 +1,5 @@ #ifndef HW_NVME_H #define HW_NVME_H -#include "qemu/cutils.h" #include "block/nvme.h" typedef struct NvmeAsyncEvent { diff --git a/hw/block/nvme.c b/hw/block/nvme.c index 85d2406400..811084b6a7 100644 --- a/hw/block/nvme.c +++ b/hw/block/nvme.c @@ -35,6 +35,7 @@ #include "sysemu/block-backend.h" #include "qemu/log.h" +#include "qemu/cutils.h" #include "trace.h" #include "nvme.h" -- 2.16.1
[Qemu-block] [PATCH] nbd: Honor server's advertised minimum block size
Commit 79ba8c98 (v2.7) changed the setting of request_alignment to occur only during bdrv_refresh_limits(), rather than at at bdrv_open() time; but at the time, NBD was unaffected, because it still used sector-based callbacks, so the block layer defaulted NBD to use 512 request_alignment. Later, commit 70c4fb26 (also v2.7) changed NBD to use byte-based callbacks, without setting request_alignment. This resulted in NBD using request_alignment of 1, which works great when the server supports it (as is the case for qemu-nbd), but falls apart miserably if the server requires alignment (but only if qemu actually sends a sub-sector request; qemu-io can do it, but most qemu operations still perform on sectors or larger). Even later, the NBD protocol was updated to document that clients should learn the server's minimum alignment during NBD_OPT_GO; and recommended that clients should assume a minimum size of 512 unless the server understands NBD_OPT_GO and replied with a smaller size. Commit 081dd1fe (v2.10) attempted to do that, by assigning request_alignment to whatever was learned from the server; but it has two flaws: the assignment is done during bdrv_open() so it gets unconditionally wiped out back to 1 during any later bdrv_refresh_limits(); and the code is not using a default of 512 when the server did not report a minimum size. Fix these issues by moving the assignment to request_alignment to the right function, and by using a sane default when the server does not advertise a minimum size. CC: qemu-sta...@nongnu.org Signed-off-by: Eric Blake--- block/nbd-client.c | 3 --- block/nbd.c| 2 ++ 2 files changed, 2 insertions(+), 3 deletions(-) diff --git a/block/nbd-client.c b/block/nbd-client.c index 9206652e45c..7b68499b76a 100644 --- a/block/nbd-client.c +++ b/block/nbd-client.c @@ -846,9 +846,6 @@ int nbd_client_init(BlockDriverState *bs, if (client->info.flags & NBD_FLAG_SEND_WRITE_ZEROES) { bs->supported_zero_flags |= BDRV_REQ_MAY_UNMAP; } -if (client->info.min_block > bs->bl.request_alignment) { -bs->bl.request_alignment = client->info.min_block; -} qemu_co_mutex_init(>send_mutex); qemu_co_queue_init(>free_sema); diff --git a/block/nbd.c b/block/nbd.c index ef81a9f53ba..69b5fd5e8fa 100644 --- a/block/nbd.c +++ b/block/nbd.c @@ -474,8 +474,10 @@ static int nbd_co_flush(BlockDriverState *bs) static void nbd_refresh_limits(BlockDriverState *bs, Error **errp) { NBDClientSession *s = nbd_get_client_session(bs); +uint32_t min = s->info.min_block; uint32_t max = MIN_NON_ZERO(NBD_MAX_BUFFER_SIZE, s->info.max_block); +bs->bl.request_alignment = min ? min : BDRV_SECTOR_SIZE; bs->bl.max_pdiscard = max; bs->bl.max_pwrite_zeroes = max; bs->bl.max_transfer = max; -- 2.14.3
Re: [Qemu-block] [Qemu-devel] [PATCH 2/3] block/ssh: Make ssh_grow_file() blocking
On 2018-02-14 22:11, Eric Blake wrote: > On 02/14/2018 02:49 PM, Max Reitz wrote: >> At runtime (that is, during a future ssh_truncate()), the SSH session is >> non-blocking. However, ssh_truncate() (or rather, bdrv_truncate() in >> general) is not a coroutine, so this resize operation needs to block. >> >> For ssh_create(), that is fine, too; the session is never set to >> non-blocking anyway. >> >> Signed-off-by: Max Reitz>> --- >> block/ssh.c | 7 +++ >> 1 file changed, 7 insertions(+) >> >> diff --git a/block/ssh.c b/block/ssh.c >> index 964e55f7fe..ff8576f21e 100644 >> --- a/block/ssh.c >> +++ b/block/ssh.c >> @@ -803,17 +803,24 @@ static int ssh_file_open(BlockDriverState *bs, >> QDict *options, int bdrv_flags, >> return ret; >> } >> +/* Note: This is a blocking operation */ >> static int ssh_grow_file(BDRVSSHState *s, int64_t offset, Error **errp) >> { >> ssize_t ret; >> char c[1] = { '\0' }; >> + int was_blocking = libssh2_session_get_blocking(s->session); >> /* offset must be strictly greater than the current size so we do >> * not overwrite anything */ >> assert(offset > 0 && offset > s->attrs.filesize); >> + libssh2_session_set_blocking(s->session, 1); >> + >> libssh2_sftp_seek64(s->sftp_handle, offset - 1); >> ret = libssh2_sftp_write(s->sftp_handle, c, 1); >> + >> + libssh2_session_set_blocking(s->session, was_blocking); > > Is it worth skipping the two libssh2_session_set_blocking() calls if > was_blocking is 1? But that's a micro-optimization that probably won't > be noticeable, so I'm also fine with unconditional. I was hoping libssh2 is clever enough for that itself. :-) > Reviewed-by: Eric Blake Thanks! Max signature.asc Description: OpenPGP digital signature
Re: [Qemu-block] [Qemu-devel] [PATCH 2/3] block/ssh: Make ssh_grow_file() blocking
On 02/14/2018 02:49 PM, Max Reitz wrote: At runtime (that is, during a future ssh_truncate()), the SSH session is non-blocking. However, ssh_truncate() (or rather, bdrv_truncate() in general) is not a coroutine, so this resize operation needs to block. For ssh_create(), that is fine, too; the session is never set to non-blocking anyway. Signed-off-by: Max Reitz--- block/ssh.c | 7 +++ 1 file changed, 7 insertions(+) diff --git a/block/ssh.c b/block/ssh.c index 964e55f7fe..ff8576f21e 100644 --- a/block/ssh.c +++ b/block/ssh.c @@ -803,17 +803,24 @@ static int ssh_file_open(BlockDriverState *bs, QDict *options, int bdrv_flags, return ret; } +/* Note: This is a blocking operation */ static int ssh_grow_file(BDRVSSHState *s, int64_t offset, Error **errp) { ssize_t ret; char c[1] = { '\0' }; +int was_blocking = libssh2_session_get_blocking(s->session); /* offset must be strictly greater than the current size so we do * not overwrite anything */ assert(offset > 0 && offset > s->attrs.filesize); +libssh2_session_set_blocking(s->session, 1); + libssh2_sftp_seek64(s->sftp_handle, offset - 1); ret = libssh2_sftp_write(s->sftp_handle, c, 1); + +libssh2_session_set_blocking(s->session, was_blocking); Is it worth skipping the two libssh2_session_set_blocking() calls if was_blocking is 1? But that's a micro-optimization that probably won't be noticeable, so I'm also fine with unconditional. Reviewed-by: Eric Blake -- Eric Blake, Principal Software Engineer Red Hat, Inc. +1-919-301-3266 Virtualization: qemu.org | libvirt.org
Re: [Qemu-block] [Qemu-devel] [PATCH 3/3] block/ssh: Add basic .bdrv_truncate()
On 02/14/2018 02:49 PM, Max Reitz wrote: libssh2 does not seem to offer real truncation support, so we can only grow files -- but that is better than nothing. Signed-off-by: Max Reitz--- block/ssh.c | 24 1 file changed, 24 insertions(+) Reviewed-by: Eric Blake -- Eric Blake, Principal Software Engineer Red Hat, Inc. +1-919-301-3266 Virtualization: qemu.org | libvirt.org
Re: [Qemu-block] [Qemu-devel] [PATCH 1/3] block/ssh: Pull ssh_grow_file() from ssh_create()
On 02/14/2018 02:49 PM, Max Reitz wrote: If we ever want to offer even rudimentary truncation functionality for ssh, we should put the respective code into a reusable function. Signed-off-by: Max Reitz--- block/ssh.c | 30 ++ 1 file changed, 22 insertions(+), 8 deletions(-) +++ b/block/ssh.c @@ -803,6 +803,26 @@ static int ssh_file_open(BlockDriverState *bs, QDict *options, int bdrv_flags, return ret; } +static int ssh_grow_file(BDRVSSHState *s, int64_t offset, Error **errp) +{ +ssize_t ret; +char c[1] = { '\0' }; Could spell this 'char c[1] = "";', but you just did code motion. Reviewed-by: Eric Blake -- Eric Blake, Principal Software Engineer Red Hat, Inc. +1-919-301-3266 Virtualization: qemu.org | libvirt.org
Re: [Qemu-block] [RFC PATCH 0/2] s/size/entries/ when dealing with non-byte units
On 2018-02-14 00:33, Eric Blake wrote: > I mentioned this while reviewing Berto's series on L2 slice handling; > this is a first cut at patches that I think are worth doing throughout > the qcow2 code base if we like the idea. I like the idea. :-) The patches look good to me. Max > Eric Blake (2): > qcow2: Prefer 'entries' over 'size' for non-byte values in spec > qcow2: Prefer 'entries' over 'size' during cache creation > > docs/interop/qcow2.txt | 4 ++-- > block/qcow2.h | 4 ++-- > block/qcow2.c | 21 +++-- > 3 files changed, 15 insertions(+), 14 deletions(-) > signature.asc Description: OpenPGP digital signature
[Qemu-block] [PATCH 3/3] block/ssh: Add basic .bdrv_truncate()
libssh2 does not seem to offer real truncation support, so we can only grow files -- but that is better than nothing. Signed-off-by: Max Reitz--- block/ssh.c | 24 1 file changed, 24 insertions(+) diff --git a/block/ssh.c b/block/ssh.c index ff8576f21e..c235eec255 100644 --- a/block/ssh.c +++ b/block/ssh.c @@ -1219,6 +1219,29 @@ static int64_t ssh_getlength(BlockDriverState *bs) return length; } +static int ssh_truncate(BlockDriverState *bs, int64_t offset, +PreallocMode prealloc, Error **errp) +{ +BDRVSSHState *s = bs->opaque; + +if (prealloc != PREALLOC_MODE_OFF) { +error_setg(errp, "Unsupported preallocation mode '%s'", + PreallocMode_str(prealloc)); +return -ENOTSUP; +} + +if (offset < s->attrs.filesize) { +error_setg(errp, "ssh driver does not support shrinking files"); +return -ENOTSUP; +} + +if (offset == s->attrs.filesize) { +return 0; +} + +return ssh_grow_file(s, offset, errp); +} + static BlockDriver bdrv_ssh = { .format_name = "ssh", .protocol_name= "ssh", @@ -1231,6 +1254,7 @@ static BlockDriver bdrv_ssh = { .bdrv_co_readv= ssh_co_readv, .bdrv_co_writev = ssh_co_writev, .bdrv_getlength = ssh_getlength, +.bdrv_truncate= ssh_truncate, .bdrv_co_flush_to_disk= ssh_co_flush, .create_opts = _create_opts, }; -- 2.14.3
[Qemu-block] [PATCH 2/3] block/ssh: Make ssh_grow_file() blocking
At runtime (that is, during a future ssh_truncate()), the SSH session is non-blocking. However, ssh_truncate() (or rather, bdrv_truncate() in general) is not a coroutine, so this resize operation needs to block. For ssh_create(), that is fine, too; the session is never set to non-blocking anyway. Signed-off-by: Max Reitz--- block/ssh.c | 7 +++ 1 file changed, 7 insertions(+) diff --git a/block/ssh.c b/block/ssh.c index 964e55f7fe..ff8576f21e 100644 --- a/block/ssh.c +++ b/block/ssh.c @@ -803,17 +803,24 @@ static int ssh_file_open(BlockDriverState *bs, QDict *options, int bdrv_flags, return ret; } +/* Note: This is a blocking operation */ static int ssh_grow_file(BDRVSSHState *s, int64_t offset, Error **errp) { ssize_t ret; char c[1] = { '\0' }; +int was_blocking = libssh2_session_get_blocking(s->session); /* offset must be strictly greater than the current size so we do * not overwrite anything */ assert(offset > 0 && offset > s->attrs.filesize); +libssh2_session_set_blocking(s->session, 1); + libssh2_sftp_seek64(s->sftp_handle, offset - 1); ret = libssh2_sftp_write(s->sftp_handle, c, 1); + +libssh2_session_set_blocking(s->session, was_blocking); + if (ret < 0) { sftp_error_setg(errp, s, "Failed to grow file"); return -EIO; -- 2.14.3
[Qemu-block] [PATCH 0/3] block/ssh: Add basic .bdrv_truncate()
For (x-)blockdev-create, all protocol drivers that support image creation also need to offer a .bdrv_truncate() implementation that matches in features. A previous series of mine brought gluster's and sheepdog's implementation up to par regarding preallocated truncation; but I forgot about drivers that have a .bdrv_create() but no .bdrv_truncate() at all. There is only one of these, and that is ssh. Since libssh2 does not seem to know any way of truncating files, we can only support growing files -- but that is what we need for (x-)blockdev-create. Note that there are also drivers which do not support growing files, namely iscsi and file-posix for host devices (maybe more? I hope not). But these also pretty much ignore the specified size on .bdrv_create() and just use the size of the existing device. They just check that the specified size does not exceed the actual size, so that pretty much matches what their .bdrv_truncate() supports, and we should be fine there. Max Reitz (3): block/ssh: Pull ssh_grow_file() from ssh_create() block/ssh: Make ssh_grow_file() blocking block/ssh: Add basic .bdrv_truncate() block/ssh.c | 61 + 1 file changed, 53 insertions(+), 8 deletions(-) -- 2.14.3
[Qemu-block] [PATCH 1/3] block/ssh: Pull ssh_grow_file() from ssh_create()
If we ever want to offer even rudimentary truncation functionality for ssh, we should put the respective code into a reusable function. Signed-off-by: Max Reitz--- block/ssh.c | 30 ++ 1 file changed, 22 insertions(+), 8 deletions(-) diff --git a/block/ssh.c b/block/ssh.c index b63addcf94..964e55f7fe 100644 --- a/block/ssh.c +++ b/block/ssh.c @@ -803,6 +803,26 @@ static int ssh_file_open(BlockDriverState *bs, QDict *options, int bdrv_flags, return ret; } +static int ssh_grow_file(BDRVSSHState *s, int64_t offset, Error **errp) +{ +ssize_t ret; +char c[1] = { '\0' }; + +/* offset must be strictly greater than the current size so we do + * not overwrite anything */ +assert(offset > 0 && offset > s->attrs.filesize); + +libssh2_sftp_seek64(s->sftp_handle, offset - 1); +ret = libssh2_sftp_write(s->sftp_handle, c, 1); +if (ret < 0) { +sftp_error_setg(errp, s, "Failed to grow file"); +return -EIO; +} + +s->attrs.filesize = offset; +return 0; +} + static QemuOptsList ssh_create_opts = { .name = "ssh-create-opts", .head = QTAILQ_HEAD_INITIALIZER(ssh_create_opts.head), @@ -822,8 +842,6 @@ static int ssh_create(const char *filename, QemuOpts *opts, Error **errp) int64_t total_size = 0; QDict *uri_options = NULL; BDRVSSHState s; -ssize_t r2; -char c[1] = { '\0' }; ssh_state_init(); @@ -849,14 +867,10 @@ static int ssh_create(const char *filename, QemuOpts *opts, Error **errp) } if (total_size > 0) { -libssh2_sftp_seek64(s.sftp_handle, total_size-1); -r2 = libssh2_sftp_write(s.sftp_handle, c, 1); -if (r2 < 0) { -sftp_error_setg(errp, , "truncate failed"); -ret = -EINVAL; +ret = ssh_grow_file(, total_size, errp); +if (ret < 0) { goto out; } -s.attrs.filesize = total_size; } ret = 0; -- 2.14.3
Re: [Qemu-block] [PATCH v3 1/2] iotest 033: add misaligned write-zeroes test via truncate
On 02/14/2018 10:09 AM, Anton Nefedov wrote: This new test case only makes sense for qcow2 while iotest 033 is generic; however it matches the test purpose perfectly and also 033 contains those do_test() tricks to pass the alignment, which won't look nice being duplicated in other tests or moved to the common code. Signed-off-by: Anton Nefedov--- tests/qemu-iotests/033 | 29 + tests/qemu-iotests/033.out | 13 + 2 files changed, 42 insertions(+) +# only interested in qcow2 here; also other formats might respond with +# "not supported" error message +if [ $IMGFMT = "qcow2" ]; then +do_test 512 "truncate $L2_COVERAGE" "$TEST_IMG" | _filter_qemu_io +fi But without an else branch that echoes the same text as the if branch generates, your .out file is now broken for other image formats. Or does 'qemu-io -c truncate' not produce output? /me goes and tests... Okay, looks like truncate is silent; and that the truncation (or skipping of the truncation) doesn't affect things. Tested-by: Eric Blake -- Eric Blake, Principal Software Engineer Red Hat, Inc. +1-919-301-3266 Virtualization: qemu.org | libvirt.org
Re: [Qemu-block] [Qemu-devel] [PATCH v8 02/21] nvme: Drop pointless .bdrv_co_get_block_status()
On 02/13/2018 05:26 PM, Eric Blake wrote: > Commit bdd6a90 has a bug: drivers should never directly set > BDRV_BLOCK_ALLOCATED, but only io.c should do that (as needed). Doesn't "pointless" in subject hide this is a bugfix? > Instead, drivers should report BDRV_BLOCK_DATA if it knows that > data comes from this BDS. > > But let's look at the bigger picture: semantically, the nvme > driver is similar to the nbd, null, and raw drivers (no backing > file, all data comes from this BDS). But while two of those > other drivers have to supply the callback (null because it can > special-case BDRV_BLOCK_ZERO, raw because it can special-case > a different offset), in this case the block layer defaults are > good enough without the callback at all (similar to nbd). > > So, fix the bug by deletion ;) > > Signed-off-by: Eric Blake> > --- > v8: new patch > --- > block/nvme.c | 14 -- > 1 file changed, 14 deletions(-) > > diff --git a/block/nvme.c b/block/nvme.c > index 10bffbbf2f4..4e561b08df3 100644 > --- a/block/nvme.c > +++ b/block/nvme.c > @@ -1068,18 +1068,6 @@ static int nvme_reopen_prepare(BDRVReopenState > *reopen_state, > return 0; > } > > -static int64_t coroutine_fn nvme_co_get_block_status(BlockDriverState *bs, > - int64_t sector_num, > - int nb_sectors, int > *pnum, > - BlockDriverState **file) > -{ > -*pnum = nb_sectors; > -*file = bs; > - > -return BDRV_BLOCK_ALLOCATED | BDRV_BLOCK_OFFSET_VALID | > - (sector_num << BDRV_SECTOR_BITS); > -} > - > static void nvme_refresh_filename(BlockDriverState *bs, QDict *opts) > { > QINCREF(opts); > @@ -1179,8 +1167,6 @@ static BlockDriver bdrv_nvme = { > .bdrv_co_flush_to_disk= nvme_co_flush, > .bdrv_reopen_prepare = nvme_reopen_prepare, > > -.bdrv_co_get_block_status = nvme_co_get_block_status, > - > .bdrv_refresh_filename= nvme_refresh_filename, > .bdrv_refresh_limits = nvme_refresh_limits, >
Re: [Qemu-block] [PATCH v3 0/2] block: fix write with zero flag set and iovector provided
Am 14.02.2018 um 17:09 hat Anton Nefedov geschrieben: > v3: patch 1: image cluster size reduced to get away with a smaller test image > (the cluster size can be as small as 512 bytes for qcow2, > but the test runs for all generic formats and minimum for qed is 4k) Thanks, applied to the block branch (in reverse order, tests should always pass). Kevin
Re: [Qemu-block] [PATCH v8 00/21] add byte-based block_status driver callbacks
Am 13.02.2018 um 21:26 hat Eric Blake geschrieben: > There are patches floating around to add NBD_CMD_BLOCK_STATUS, > but NBD wants to report status on byte granularity (even if the > reporting will probably be naturally aligned to sectors or even > much higher levels). I've therefore started the task of > converting our block status code to report at a byte granularity > rather than sectors. > > These patches have been around for a while, but it's time to > finish it now that 2.12 is open for patches. Thanks, touched up patch 8 and applied to the block branch. Kevin
[Qemu-block] [PATCH v3 0/2] block: fix write with zero flag set and iovector provided
v3: patch 1: image cluster size reduced to get away with a smaller test image (the cluster size can be as small as 512 bytes for qcow2, but the test runs for all generic formats and minimum for qed is 4k) v2: http://lists.nongnu.org/archive/html/qemu-devel/2018-02/msg03016.html Anton Nefedov (2): iotest 033: add misaligned write-zeroes test via truncate block: fix write with zero flag set and iovector provided block/io.c | 2 +- tests/qemu-iotests/033 | 29 + tests/qemu-iotests/033.out | 13 + 3 files changed, 43 insertions(+), 1 deletion(-) -- 2.7.4
[Qemu-block] [PATCH v3 2/2] block: fix write with zero flag set and iovector provided
The normal bdrv_co_pwritev() use is either - BDRV_REQ_ZERO_WRITE clear and iovector provided - BDRV_REQ_ZERO_WRITE set and iovector == NULL while - the flag clear and iovector == NULL is an assertion failure in bdrv_co_do_zero_pwritev() - the flag set and iovector provided is in fact allowed (the flag prevails and zeroes are written) However the alignment logic does not support the latter case so the padding areas get overwritten with zeroes. Currently, general functions like bdrv_rw_co() do provide iovector regardless of flags. So, keep it supported and use bdrv_co_do_zero_pwritev() alignment for it which also makes the code a bit more obvious anyway. Signed-off-by: Anton NefedovReviewed-by: Eric Blake Reviewed-by: Alberto Garcia --- block/io.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/block/io.c b/block/io.c index 89d0745..40df3be 100644 --- a/block/io.c +++ b/block/io.c @@ -1701,7 +1701,7 @@ int coroutine_fn bdrv_co_pwritev(BdrvChild *child, */ tracked_request_begin(, bs, offset, bytes, BDRV_TRACKED_WRITE); -if (!qiov) { +if (flags & BDRV_REQ_ZERO_WRITE) { ret = bdrv_co_do_zero_pwritev(child, offset, bytes, flags, ); goto out; } -- 2.7.4
Re: [Qemu-block] [PATCH v8 20/21] vvfat: Switch to .bdrv_co_block_status()
Am 14.02.2018 um 15:50 hat Eric Blake geschrieben: > On 02/14/2018 07:12 AM, Kevin Wolf wrote: > > Am 13.02.2018 um 21:27 hat Eric Blake geschrieben: > > > We are gradually moving away from sector-based interfaces, towards > > > byte-based. Update the vvfat driver accordingly. Note that we > > > can rely on the block driver having already clamped limits to our > > > block size, and simplify accordingly. > > > > > > Signed-off-by: Eric Blake> > > Reviewed-by: Vladimir Sementsov-Ogievskiy > > > Reviewed-by: Fam Zheng > > > > { > > > -*n = bs->total_sectors - sector_num; > > > -if (*n > nb_sectors) { > > > -*n = nb_sectors; > > > -} else if (*n < 0) { > > > -return 0; > > > -} > > > +*n = bytes; > > > return BDRV_BLOCK_DATA; > > > } > > > > Preexisting, but this is inconsistent with other protocol drivers as far > > as OFFSET_VALID is concerned (as I hinted in another mail, I like it > > better this way, but it's still inconsistent). > > > > Do we actually need any callback here or could the solution be to simply > > remove it like with nvme? > > Does that mean io.c's defaults for protocol drivers is wrong? It defaults > to setting OFFSET_VALID and *map on all protocol drivers without a callback > (at least nvme, nbd); I didn't delete this callback because I noticed the > difference in return value, and couldn't justify whether it was intentional. > Also, vvfat is weird - it is somewhat of a format driver, rather than just a > protocol; even though it sets .protocol_name. It may be possible for vvfat > to actually set OFFSET_VALID to particular offsets within the host file that > correspond to what the guest would read, where it is not a simple 1:1 > mapping, precisely because it is imposing format on the host file. However, > vvfat is one of those things that I try to avoid as much as possible, > because it is just so weird. As I said in my reply to the null block driver, OFFSET_VALID doesn't really make sense for protocol drivers anyway. Making use of it with vvfat isn't any more practical than directly accessing the undefined data of the null driver. I think the unwritten rule at the moment is that protocols should always set OFFSET_VALID and *file = bs (even though it doesn't make sense). So with the current interface, I'd consider deleting the callback a vvfat fix. I also think that we should possibly look into changing the interface so that protocols don't set OFFSET_VALID and *file, but then the default handling would change too, and deleting the callback in vvfat would still be right. As this is preexisting, I'm okay with just merging the series as it is, and then we can handle this while dealing with the more fundamental question what protocol drivers should return in general. Kevin
Re: [Qemu-block] [PATCH v8 09/21] null: Switch to .bdrv_co_block_status()
Am 14.02.2018 um 15:44 hat Eric Blake geschrieben: > On 02/14/2018 06:05 AM, Kevin Wolf wrote: > > Am 13.02.2018 um 21:26 hat Eric Blake geschrieben: > > > We are gradually moving away from sector-based interfaces, towards > > > byte-based. Update the null driver accordingly. > > > > > > Signed-off-by: Eric Blake> > > Reviewed-by: Vladimir Sementsov-Ogievskiy > > > Reviewed-by: Fam Zheng > > > > > > > if (s->read_zeroes) { > > > -return BDRV_BLOCK_OFFSET_VALID | start | BDRV_BLOCK_ZERO; > > > -} else { > > > -return BDRV_BLOCK_OFFSET_VALID | start; > > > +ret |= BDRV_BLOCK_ZERO; > > > } > > > +return ret; > > > } > > > > Preexisting, but I think this return value is wrong. OFFSET_VALID > > without DATA is to documented to have the following semantics: > > > > * DATA ZERO OFFSET_VALID > > * ftt sectors preallocated, read as zero, returned > > file not > > *necessarily zero at offset > > * fft sectors preallocated but read from backing_hd, > > *returned file contains garbage at offset > > > > I'm not sure what OFFSET_VALID is even supposed to mean for null. > > Yeah, and I was even thinking about that a bit yesterday when figuring out > what to do with nvme. It does highlight the fact that you get garbage when > reading from the null driver (unless the zero option was enabled, then ZERO > is set and you know you read zeros instead) - but there no pointer that is > preallocated (whether it contains garbage or otherwise) that you can > actually dereference to read what the guest would see. > > > > > Or in fact, what it is supposed to mean for any protocol driver, because > > normally it just means I can use this offset for accessing bs->file. But > > protocol drivers don't have a bs->file, so it's interesting to see that > > they still all set this flag. > > > > OFFSET_VALID | DATA might be excusable because I can see that it's > > convenient that a protocol driver refers to itself as *file instead of > > returning NULL there and then the offset is valid (though it would be > > pointless to actually follow the file pointer), but OFFSET_VALID without > > DATA probably isn't. > > Hmm, you're probably right. Maybe that means I should tweak the > documentation to be more explicit: for a format driver, OFFSET_VALID can > always be used (and *file will be set to the underlying protocol driver); > but for a protocol driver, OFFSET_VALID only makes sense if *file is the BDS > itself and there is an actual buffer to read (that is, the protocol driver > must also be returning DATA and/or ZERO). Or maybe we can indeed state that > protocol drivers always set *file to NULL (there is no further backing file > to reference), and thus never need to return OFFSET_VALID (but I'm not sure > whether that will accidentally propagate back up the call stack and > negatively affect status queries of format drivers). > > Since it is pre-existing, should I respin to address the issue in a separate > patch, or should that be a followup after this series? It's a more fundamental question that shouldn't hold up this series. I just wanted to raise it while I was looking at it. So yes, a followup is fine. Kevin
Re: [Qemu-block] [PATCH v8 19/21] vpc: Switch to .bdrv_co_block_status()
On 02/14/2018 07:08 AM, Kevin Wolf wrote: Am 13.02.2018 um 21:26 hat Eric Blake geschrieben: We are gradually moving away from sector-based interfaces, towards byte-based. Update the vpc driver accordingly. Signed-off-by: Eric BlakeReviewed-by: Vladimir Sementsov-Ogievskiy Reviewed-by: Fam Zheng --- +allocated = (image_offset != -1); *pnum = 0; ret = 0; do { /* All sectors in a block are contiguous (without using the bitmap) */ -n = ROUND_UP(sector_num + 1, s->block_size / BDRV_SECTOR_SIZE) - - sector_num; -n = MIN(n, nb_sectors); +n = ROUND_UP(offset + 1, s->block_size) - offset; +n = MIN(n, bytes); *pnum += n; -sector_num += n; -nb_sectors -= n; +offset += n; +bytes -= n; /* *pnum can't be greater than one block for allocated * sectors since there is always a bitmap in between. */ if (allocated) { *file = bs->file->bs; -ret = BDRV_BLOCK_DATA | BDRV_BLOCK_OFFSET_VALID | start; +*map = image_offset; This does work, but only because the loop isn't actually looping for allocated == true. In the old code, it was obvious that start was assigned only once and never changed during the loop, but image_offset changes in each loop iteration. It would probably be cleaner and more obviously correct to move the assignment of *map to before the loop. Yes, that would be a bit nicer. Kevin -- Eric Blake, Principal Software Engineer Red Hat, Inc. +1-919-301-3266 Virtualization: qemu.org | libvirt.org
Re: [Qemu-block] qcow2 images corruption
https://framadrop.org/r/Lvvr392QZo#/wOeYUUlHQAtkUw1E+x2YdqTqq21Pbic6OPBIH0TjZE= Le 14/02/2018 à 00:01, John Snow a écrit : On 02/13/2018 04:41 AM, Kevin Wolf wrote: Am 07.02.2018 um 18:06 hat Nicolas Ecarnot geschrieben: TL; DR : qcow2 images keep getting corrupted. Any workaround? Not without knowing the cause. The first thing to make sure is that the image isn't touched by a second process while QEMU is running a VM. The classic one is using 'qemu-img snapshot' on the image of a running VM, which is instant corruption (and newer QEMU versions have locking in place to prevent this), but we have seen more absurd cases of things outside QEMU tampering with the image when we were investigating previous corruption reports. This covers the majority of all reports, we haven't had a real corruption caused by a QEMU bug in ages. After having found (https://access.redhat.com/solutions/1173623) the right logical volume hosting the qcow2 image, I can run qemu-img check on it. - On 80% of my VMs, I find no errors. - On 15% of them, I find Leaked cluster errors that I can correct using "qemu-img check -r all" - On 5% of them, I find Leaked clusters errors and further fatal errors, which can not be corrected with qemu-img. In rare cases, qemu-img can correct them, but destroys large parts of the image (becomes unusable), and on other cases it can not correct them at all. It would be good if you could make the 'qemu-img check' output available somewhere. It would be even better if we could have a look at the respective image. I seem to remember that John (CCed) had a few scripts to analyse corrupted qcow2 images, maybe we would be able to see something there. Hi! I did write a pretty simplistic tool for trying to tell the shape of a corruption at a glance. It seems to work pretty similarly to the other tool you already found, but it won't hurt anything to run it: https://github.com/jnsnow/qcheck (Actually, that other tool looks like it has an awful lot of options. I'll have to check it out.) It can print a really upsetting amount of data (especially for very corrupt images), but in the default case, the simple setting should do the trick just fine. You could always put the output from this tool in a pastebin too; it might help me visualize the problem a bit more -- I find seeing the exact offsets and locations of where all the various tables and things to be pretty helpful. You can also always use the "deluge" option and compress it if you want, just don't let it print to your terminal: jsnow@probe (dev) ~/s/qcheck> ./qcheck -xd /home/bos/jsnow/src/qemu/bin/git/install_test_f26.qcow2 > deluge.log; and ls -sh deluge.log 4.3M deluge.log but it compresses down very well: jsnow@probe (dev) ~/s/qcheck> 7z a -t7z -m0=ppmd deluge.ppmd.7z deluge.log jsnow@probe (dev) ~/s/qcheck> ls -s deluge.ppmd.7z 316 deluge.ppmd.7z So I suppose if you want to send along: (1) The basic output without any flags, in a pastebin (2) The zipped deluge output, just in case and I will try my hand at guessing what went wrong. (Also, maybe my tool will totally choke for your image, who knows. It hasn't received an overwhelming amount of testing apart from when I go to use it personally and inevitably wind up displeased with how it handles certain situations, so ...) What I read similar to my case is : - usage of qcow2 - heavy disk I/O - using the virtio-blk driver In the proxmox thread, they tend to say that using virtio-scsi is the solution. Having asked this question to oVirt experts (https://lists.ovirt.org/pipermail/users/2018-February/086753.html) but it's not clear the driver is to blame. This seems very unlikely. The corruption you're seeing is in the qcow2 metadata, not only in the guest data. If anything, virtio-scsi exercises more qcow2 code paths than virtio-blk, so any potential bug that affects virtio-blk should also affect virtio-scsi, but not the other way around. I agree with the answer Yaniv Kaul gave to me, saying I have to properly report the issue, so I'm longing to know which peculiar information I can give you now. To be honest, debugging corruption after the fact is pretty hard. We'd need the 'qemu-img check' output and ideally the image to do anything, but I can't promise that anything would come out of this. Best would be a reproducer, or at least some operation that you can link to the appearance of the corruption. Then we could take a more targeted look at the respective code. As you can imagine, all this setup is in production, and for most of the VMs, I can not "play" with them. Moreover, we launched a campaign of nightly stopping every VM, qemu-img check them one by one, then boot. So it might take some time before I find another corrupted image. (which I'll preciously store for debug) Other informations : We very rarely do snapshots, but I'm close to imagine that automated migrations of VMs could trigger similar behaviors on qcow2 images. To my knowledge, oVirt only uses
Re: [Qemu-block] [PATCH v8 09/21] null: Switch to .bdrv_co_block_status()
On 02/14/2018 06:05 AM, Kevin Wolf wrote: Am 13.02.2018 um 21:26 hat Eric Blake geschrieben: We are gradually moving away from sector-based interfaces, towards byte-based. Update the null driver accordingly. Signed-off-by: Eric BlakeReviewed-by: Vladimir Sementsov-Ogievskiy Reviewed-by: Fam Zheng if (s->read_zeroes) { -return BDRV_BLOCK_OFFSET_VALID | start | BDRV_BLOCK_ZERO; -} else { -return BDRV_BLOCK_OFFSET_VALID | start; +ret |= BDRV_BLOCK_ZERO; } +return ret; } Preexisting, but I think this return value is wrong. OFFSET_VALID without DATA is to documented to have the following semantics: * DATA ZERO OFFSET_VALID * ftt sectors preallocated, read as zero, returned file not *necessarily zero at offset * fft sectors preallocated but read from backing_hd, *returned file contains garbage at offset I'm not sure what OFFSET_VALID is even supposed to mean for null. Yeah, and I was even thinking about that a bit yesterday when figuring out what to do with nvme. It does highlight the fact that you get garbage when reading from the null driver (unless the zero option was enabled, then ZERO is set and you know you read zeros instead) - but there no pointer that is preallocated (whether it contains garbage or otherwise) that you can actually dereference to read what the guest would see. Or in fact, what it is supposed to mean for any protocol driver, because normally it just means I can use this offset for accessing bs->file. But protocol drivers don't have a bs->file, so it's interesting to see that they still all set this flag. OFFSET_VALID | DATA might be excusable because I can see that it's convenient that a protocol driver refers to itself as *file instead of returning NULL there and then the offset is valid (though it would be pointless to actually follow the file pointer), but OFFSET_VALID without DATA probably isn't. Hmm, you're probably right. Maybe that means I should tweak the documentation to be more explicit: for a format driver, OFFSET_VALID can always be used (and *file will be set to the underlying protocol driver); but for a protocol driver, OFFSET_VALID only makes sense if *file is the BDS itself and there is an actual buffer to read (that is, the protocol driver must also be returning DATA and/or ZERO). Or maybe we can indeed state that protocol drivers always set *file to NULL (there is no further backing file to reference), and thus never need to return OFFSET_VALID (but I'm not sure whether that will accidentally propagate back up the call stack and negatively affect status queries of format drivers). Since it is pre-existing, should I respin to address the issue in a separate patch, or should that be a followup after this series? -- Eric Blake, Principal Software Engineer Red Hat, Inc. +1-919-301-3266 Virtualization: qemu.org | libvirt.org
[Qemu-block] BLOCK_STATUS extension
Hi all. Just note: looks like we allow zero-sized metadata context name. Is it ok? * |NBD_REP_META_CONTEXT| (4) A description of a metadata context. Data: o 32 bits, NBD metadata context ID. o String, name of the metadata context. This is not required to be a human-readable string, but it MUST be valid UTF-8 data. -- Best regards, Vladimir
Re: [Qemu-block] [PATCH v8 08/21] iscsi: Switch to .bdrv_co_block_status()
On 02/14/2018 05:53 AM, Kevin Wolf wrote: Am 13.02.2018 um 21:26 hat Eric Blake geschrieben: We are gradually moving away from sector-based interfaces, towards byte-based. Update the iscsi driver accordingly. In this case, it is handy to teach iscsi_co_block_status() to handle a NULL map and file parameter, even though the block layer passes non-NULL values, because we also call the function directly. For now, there are no optimizations done based on the want_zero flag. [1] We can also make the simplification of asserting that the block layer passed in aligned values. Signed-off-by: Eric BlakeReviewed-by: Fam Zheng /* default to all sectors allocated */ -ret = BDRV_BLOCK_DATA; -ret |= (sector_num << BDRV_SECTOR_BITS) | BDRV_BLOCK_OFFSET_VALID; -*pnum = nb_sectors; +ret = BDRV_BLOCK_DATA | BDRV_BLOCK_OFFSET_VALID; +if (map) { +*map = offset; +} Can map ever be NULL? You didn't have that check in other drivers. The documentation in block_int.h states that io.c never passes NULL for map. However, see my commit message [1], and the code below [2], for why THIS driver has to check for NULL. @@ -760,7 +758,7 @@ out: if (iTask.task != NULL) { scsi_free_scsi_task(iTask.task); } -if (ret > 0 && ret & BDRV_BLOCK_OFFSET_VALID) { +if (ret > 0 && ret & BDRV_BLOCK_OFFSET_VALID && file) { *file = bs; } Can file ever be NULL? Ditto. return ret; @@ -800,25 +798,24 @@ static int coroutine_fn iscsi_co_readv(BlockDriverState *bs, nb_sectors * BDRV_SECTOR_SIZE) && No iscsi_co_preadv() yet... :-( Yeah, but that's for another day. !iscsi_allocmap_is_allocated(iscsilun, sector_num * BDRV_SECTOR_SIZE, nb_sectors * BDRV_SECTOR_SIZE)) { -int pnum; -BlockDriverState *file; +int64_t pnum; /* check the block status from the beginning of the cluster * containing the start sector */ -int cluster_sectors = iscsilun->cluster_size >> BDRV_SECTOR_BITS; -int head; -int64_t ret; +int64_t head; +int ret; -assert(cluster_sectors); -head = sector_num % cluster_sectors; -ret = iscsi_co_get_block_status(bs, sector_num - head, -BDRV_REQUEST_MAX_SECTORS, , -); +assert(iscsilun->cluster_size); +head = (sector_num * BDRV_SECTOR_SIZE) % iscsilun->cluster_size; +ret = iscsi_co_block_status(bs, false, +sector_num * BDRV_SECTOR_SIZE - head, +BDRV_REQUEST_MAX_BYTES, , NULL, NULL); [2] This is the reason that THIS driver has to check for NULL, even though the block layer never passes NULL. It doesn't make a difference with your current implementation because it ignores want_zero, but consistent with your approach that want_zero=false returns just that everyhting is allocated for drivers without support for backing files, I think you want want_zero=true here. Makes sense. If that's the only tweak, can you make it while taking the series, or will I need to respin? -- Eric Blake, Principal Software Engineer Red Hat, Inc. +1-919-301-3266 Virtualization: qemu.org | libvirt.org
Re: [Qemu-block] [PATCH v8 20/21] vvfat: Switch to .bdrv_co_block_status()
Am 13.02.2018 um 21:27 hat Eric Blake geschrieben: > We are gradually moving away from sector-based interfaces, towards > byte-based. Update the vvfat driver accordingly. Note that we > can rely on the block driver having already clamped limits to our > block size, and simplify accordingly. > > Signed-off-by: Eric Blake> Reviewed-by: Vladimir Sementsov-Ogievskiy > Reviewed-by: Fam Zheng > > --- > v5-v7: no change > v4: rebase to interface tweak > v3: no change > v2: rebase to earlier changes, simplify > --- > block/vvfat.c | 16 +++- > 1 file changed, 7 insertions(+), 9 deletions(-) > > diff --git a/block/vvfat.c b/block/vvfat.c > index 7e06ebacf61..4a17a49e128 100644 > --- a/block/vvfat.c > +++ b/block/vvfat.c > @@ -3088,15 +3088,13 @@ vvfat_co_pwritev(BlockDriverState *bs, uint64_t > offset, uint64_t bytes, > return ret; > } > > -static int64_t coroutine_fn vvfat_co_get_block_status(BlockDriverState *bs, > -int64_t sector_num, int nb_sectors, int *n, BlockDriverState **file) > +static int coroutine_fn vvfat_co_block_status(BlockDriverState *bs, > + bool want_zero, int64_t offset, > + int64_t bytes, int64_t *n, > + int64_t *map, > + BlockDriverState **file) > { > -*n = bs->total_sectors - sector_num; > -if (*n > nb_sectors) { > -*n = nb_sectors; > -} else if (*n < 0) { > -return 0; > -} > +*n = bytes; > return BDRV_BLOCK_DATA; > } Preexisting, but this is inconsistent with other protocol drivers as far as OFFSET_VALID is concerned (as I hinted in another mail, I like it better this way, but it's still inconsistent). Do we actually need any callback here or could the solution be to simply remove it like with nvme? Kevin
Re: [Qemu-block] [PATCH v8 19/21] vpc: Switch to .bdrv_co_block_status()
Am 13.02.2018 um 21:26 hat Eric Blake geschrieben: > We are gradually moving away from sector-based interfaces, towards > byte-based. Update the vpc driver accordingly. > > Signed-off-by: Eric Blake> Reviewed-by: Vladimir Sementsov-Ogievskiy > Reviewed-by: Fam Zheng > > --- > v7: tweak commit message and type of 'n' [Fam] > v6: no change > v5: fix incorrect rounding in 'map' and bad loop condition [Vladimir] > v4: rebase to interface tweak > v3: rebase to master > v2: drop get_sector_offset() [Kevin], rebase to mapping flag > --- > block/vpc.c | 45 +++-- > 1 file changed, 23 insertions(+), 22 deletions(-) > > diff --git a/block/vpc.c b/block/vpc.c > index cfa5144e867..fba4492fd7b 100644 > --- a/block/vpc.c > +++ b/block/vpc.c > @@ -706,53 +706,54 @@ fail: > return ret; > } > > -static int64_t coroutine_fn vpc_co_get_block_status(BlockDriverState *bs, > -int64_t sector_num, int nb_sectors, int *pnum, BlockDriverState > **file) > +static int coroutine_fn vpc_co_block_status(BlockDriverState *bs, > +bool want_zero, > +int64_t offset, int64_t bytes, > +int64_t *pnum, int64_t *map, > +BlockDriverState **file) > { > BDRVVPCState *s = bs->opaque; > VHDFooter *footer = (VHDFooter*) s->footer_buf; > -int64_t start, offset; > +int64_t image_offset; > bool allocated; > -int64_t ret; > -int n; > +int ret; > +int64_t n; > > if (be32_to_cpu(footer->type) == VHD_FIXED) { > -*pnum = nb_sectors; > +*pnum = bytes; > +*map = offset; > *file = bs->file->bs; > -return BDRV_BLOCK_RAW | BDRV_BLOCK_OFFSET_VALID | > - (sector_num << BDRV_SECTOR_BITS); > +return BDRV_BLOCK_RAW | BDRV_BLOCK_OFFSET_VALID; > } > > qemu_co_mutex_lock(>lock); > > -offset = get_image_offset(bs, sector_num << BDRV_SECTOR_BITS, false, > NULL); > -start = offset; > -allocated = (offset != -1); > +image_offset = get_image_offset(bs, offset, false, NULL); > +allocated = (image_offset != -1); > *pnum = 0; > ret = 0; > > do { > /* All sectors in a block are contiguous (without using the bitmap) > */ > -n = ROUND_UP(sector_num + 1, s->block_size / BDRV_SECTOR_SIZE) > - - sector_num; > -n = MIN(n, nb_sectors); > +n = ROUND_UP(offset + 1, s->block_size) - offset; > +n = MIN(n, bytes); > > *pnum += n; > -sector_num += n; > -nb_sectors -= n; > +offset += n; > +bytes -= n; > /* *pnum can't be greater than one block for allocated > * sectors since there is always a bitmap in between. */ > if (allocated) { > *file = bs->file->bs; > -ret = BDRV_BLOCK_DATA | BDRV_BLOCK_OFFSET_VALID | start; > +*map = image_offset; This does work, but only because the loop isn't actually looping for allocated == true. In the old code, it was obvious that start was assigned only once and never changed during the loop, but image_offset changes in each loop iteration. It would probably be cleaner and more obviously correct to move the assignment of *map to before the loop. Kevin
Re: [Qemu-block] [PATCH v8 08/21] iscsi: Switch to .bdrv_co_block_status()
Am 13.02.2018 um 21:26 hat Eric Blake geschrieben: > We are gradually moving away from sector-based interfaces, towards > byte-based. Update the iscsi driver accordingly. In this case, > it is handy to teach iscsi_co_block_status() to handle a NULL map > and file parameter, even though the block layer passes non-NULL > values, because we also call the function directly. For now, there > are no optimizations done based on the want_zero flag. > > We can also make the simplification of asserting that the block > layer passed in aligned values. > > Signed-off-by: Eric Blake> Reviewed-by: Fam Zheng > > --- > v8: rebase to master > v7: rebase to master > v6: no change > v5: assert rather than check for alignment > v4: rebase to interface tweaks > v3: no change > v2: rebase to mapping parameter > --- > block/iscsi.c | 67 > --- > 1 file changed, 32 insertions(+), 35 deletions(-) > > diff --git a/block/iscsi.c b/block/iscsi.c > index d2b0466775c..4842519fdad 100644 > --- a/block/iscsi.c > +++ b/block/iscsi.c > @@ -653,36 +653,36 @@ out_unlock: > > > > -static int64_t coroutine_fn iscsi_co_get_block_status(BlockDriverState *bs, > - int64_t sector_num, > - int nb_sectors, int *pnum, > - BlockDriverState **file) > +static int coroutine_fn iscsi_co_block_status(BlockDriverState *bs, > + bool want_zero, int64_t offset, > + int64_t bytes, int64_t *pnum, > + int64_t *map, > + BlockDriverState **file) > { > IscsiLun *iscsilun = bs->opaque; > struct scsi_get_lba_status *lbas = NULL; > struct scsi_lba_status_descriptor *lbasd = NULL; > struct IscsiTask iTask; > uint64_t lba; > -int64_t ret; > +int ret; > > iscsi_co_init_iscsitask(iscsilun, ); > > -if (!is_sector_request_lun_aligned(sector_num, nb_sectors, iscsilun)) { > -ret = -EINVAL; > -goto out; > -} > +assert(QEMU_IS_ALIGNED(offset | bytes, iscsilun->block_size)); > > /* default to all sectors allocated */ > -ret = BDRV_BLOCK_DATA; > -ret |= (sector_num << BDRV_SECTOR_BITS) | BDRV_BLOCK_OFFSET_VALID; > -*pnum = nb_sectors; > +ret = BDRV_BLOCK_DATA | BDRV_BLOCK_OFFSET_VALID; > +if (map) { > +*map = offset; > +} Can map ever be NULL? You didn't have that check in other drivers. > +*pnum = bytes; > > /* LUN does not support logical block provisioning */ > if (!iscsilun->lbpme) { > goto out; > } > > -lba = sector_qemu2lun(sector_num, iscsilun); > +lba = offset / iscsilun->block_size; > > qemu_mutex_lock(>mutex); > retry: > @@ -727,12 +727,12 @@ retry: > > lbasd = >descriptors[0]; > > -if (sector_qemu2lun(sector_num, iscsilun) != lbasd->lba) { > +if (lba != lbasd->lba) { > ret = -EIO; > goto out_unlock; > } > > -*pnum = sector_lun2qemu(lbasd->num_blocks, iscsilun); > +*pnum = lbasd->num_blocks * iscsilun->block_size; > > if (lbasd->provisioning == SCSI_PROVISIONING_TYPE_DEALLOCATED || > lbasd->provisioning == SCSI_PROVISIONING_TYPE_ANCHORED) { > @@ -743,15 +743,13 @@ retry: > } > > if (ret & BDRV_BLOCK_ZERO) { > -iscsi_allocmap_set_unallocated(iscsilun, sector_num * > BDRV_SECTOR_SIZE, > - *pnum * BDRV_SECTOR_SIZE); > +iscsi_allocmap_set_unallocated(iscsilun, offset, *pnum); > } else { > -iscsi_allocmap_set_allocated(iscsilun, sector_num * BDRV_SECTOR_SIZE, > - *pnum * BDRV_SECTOR_SIZE); > +iscsi_allocmap_set_allocated(iscsilun, offset, *pnum); > } > > -if (*pnum > nb_sectors) { > -*pnum = nb_sectors; > +if (*pnum > bytes) { > +*pnum = bytes; > } > out_unlock: > qemu_mutex_unlock(>mutex); > @@ -760,7 +758,7 @@ out: > if (iTask.task != NULL) { > scsi_free_scsi_task(iTask.task); > } > -if (ret > 0 && ret & BDRV_BLOCK_OFFSET_VALID) { > +if (ret > 0 && ret & BDRV_BLOCK_OFFSET_VALID && file) { > *file = bs; > } Can file ever be NULL? > return ret; > @@ -800,25 +798,24 @@ static int coroutine_fn iscsi_co_readv(BlockDriverState > *bs, > nb_sectors * BDRV_SECTOR_SIZE) && No iscsi_co_preadv() yet... :-( > !iscsi_allocmap_is_allocated(iscsilun, sector_num * BDRV_SECTOR_SIZE, > nb_sectors * BDRV_SECTOR_SIZE)) { > -int pnum; > -BlockDriverState *file; > +int64_t pnum; > /* check the block status from the beginning of the
Re: [Qemu-block] block_status automatically added flags
13.02.2018 21:48, Eric Blake wrote: On 02/13/2018 11:36 AM, Vladimir Sementsov-Ogievskiy wrote: Hi Eric! I'm now testing my nbd block status realization (block_status part, not about dirty bitmaps), and faced into the following effect. I created empty qcow2 image and wrote to the first sector, so qemu-io -c map x reports: 64 KiB (0x1) bytes allocated at offset 0 bytes (0x0) 9.938 MiB (0x9f) bytes not allocated at offset 64 KiB (0x1) But I can't get same results, when connecting to nbd server, exporting the same qcow2 file, I get 10 MiB (0xa0) bytes allocated at offset 0 bytes (0x0) Is this with or without your NBD_CMD_BLOCK_STATUS patches applied? And are you exposing the data over NBD as raw ('qemu-nbd -f qcow2'/'qemu-io -f raw') or as qcow2 ('qemu-nbd -f raw'/'qemu-io -f qcow2')? with patches, as raw (server reads it as qcow2, so export is raw) /me does a quick reproduction Yes, I definitely see that behavior without any NBD_CMD_BLOCK_STATUS patches and when the image is exposed over NBD as raw, but not when exposed as qcow2, when testing the 2.11 release: $ qemu-img create -f qcow2 file3 10M Formatting 'file3', fmt=qcow2 size=10485760 cluster_size=65536 lazy_refcounts=off refcount_bits=16 $ qemu-io -c 'w 0 64k' -c map -f qcow2 file3 wrote 65536/65536 bytes at offset 0 64 KiB, 1 ops; 0.0579 sec (1.079 MiB/sec and 17.2601 ops/sec) 64 KiB (0x1) bytes allocated at offset 0 bytes (0x0) 9.938 MiB (0x9f) bytes not allocated at offset 64 KiB (0x1) $ qemu-nbd -f qcow2 -x foo file3 $ qemu-io -f raw -c map nbd://localhost:10809/foo 10 MiB (0xa0) bytes allocated at offset 0 bytes (0x0) $ qemu-nbd -f raw -x foo file3 $ qemu-io -f qcow2 -c map nbd://localhost:10809/foo 64 KiB (0x1) bytes allocated at offset 0 bytes (0x0) 9.938 MiB (0x9f) bytes not allocated at offset 64 KiB (0x1) Right now, without NBD block status, the NBD driver reports the entire file as allocated, as it can't do any better (NBD has no backing file, and all data . Presumably, once NBD_CMD_BLOCK_STATUS is implemented, we can then use that for more accurate information. yes, I've done this, with my patches nbd client block driver get_block_status uses NBD_CMD_BLOCK_STATUS. Finally, I understand the reason: for local file, qemu-io calls bdrv_is_allocated, which calls bdrv_common_block_status_above with want_zero=false. So, it doesn't set BDRV_BLOCK_ZERO, and doesn't set BDRV_BLOCK_ALLOCATED. 'qemu-io map' is a bit unusual; it is the only UI that easily exposes bdrv_is_allocated() to the outside world ('qemu-img map' does not). (The fact that both operations are named 'map' but do something different is annoying; for back-compat reasons, we can't change qemu-img, and I don't know if changing qemu-io is worth it.) And, even if we change want_zero to true, Well, you'd do that by invoking bdrv_block_status() (via 'qemu-img map', for example). Aha, thank you. Actually, qemu-img map --output=json works for me and shows block_status through nbd connection. here, it will set BDRV_BLOCK_ZERO, but will not set BDRV_BLOCK_ALLOCATED, which contradicts with it's definition: BDRV_BLOCK_ALLOCATED: the content of the block is determined by this layer (short for DATA || ZERO), set by block layer This text is wrong; it gets fixed in my still-pending concluding series for byte-based block status: https://lists.gnu.org/archive/html/qemu-devel/2018-01/msg00955.html And even with my r-b. More than month ago, I've forgotten. What is the reason for such a long delay, don't you know? Conceptually, BDRV_BLOCK_ALLOCATED means "is THIS layer of the backing chain responsible for the contents at this guest offset"; and there are cases where we know that we read zeroes but where the current layer is not responsible for the contents (such as a qcow2 that has a backing file with shorter length, where we return BDRV_BLOCK_ZERO but not BDRV_BLOCK_ALLOCATED). But since NBD has no backing chain, the entire image is considered allocated. Meanwhile, asking whether something is allocated ('qemu-io -c map') is not the usual question you want to ask when determining what portions of a file are zero. for nbd, we go through the similar way on server (but with want_zero = true), and we finally have BDRV_BLOCK_ZERO without BDRV_BLOCK_ALLOCATED, which maps to NBD_STATE_HOLE+NBD_STATE_ZERO. But then, in the client we have BDRV_BLOCK_ZERO not automatically added by block layer but directly from nbd driver, therefor BDRV_BLOCK_ALLOCATED is set and I get different result. Drivers should never set BDRV_BLOCK_ALLOCATED; only the code in io.c should set it; and output based on BDRV_BLOCK_ALLOCATED is only useful in backing chain scenarios (which NBD does not have). this all looks weird for me. BDRV_BLOCK_ALLOCATED definition should be fixed, to show that this flag show only reported by driver