date:20151001

[Qemu-block] [PATCH v2] migration: disallow migrate_add_blocker during migration

2015-10-01 Thread John Snow

If a migration is already in progress and somebody attempts
to add a migration blocker, this should rightly fail.

Add an errp parameter and a retcode return value to migrate_add_blocker.

This is part one of two for a solution to prohibit e.g. block jobs
from running concurrently with migration.

Signed-off-by: John Snow 
---
 block/qcow.c  |  6 +-
 block/vdi.c   |  6 +-
 block/vhdx.c  |  6 +-
 block/vmdk.c  |  7 ++-
 block/vpc.c   | 10 +++---
 block/vvfat.c | 20 
 hw/9pfs/virtio-9p.c   | 16 
 hw/misc/ivshmem.c |  5 -
 hw/scsi/vhost-scsi.c  | 25 +++--
 hw/virtio/vhost.c | 35 +++
 include/migration/migration.h |  6 +-
 migration/migration.c | 32 
 stubs/migr-blocker.c  |  3 ++-
 target-i386/kvm.c | 16 +---
 14 files changed, 146 insertions(+), 47 deletions(-)

diff --git a/block/qcow.c b/block/qcow.c
index 6e35db1..fbb498f 100644
--- a/block/qcow.c
+++ b/block/qcow.c
@@ -236,7 +236,11 @@ static int qcow_open(BlockDriverState *bs, QDict *options, 
int flags,
 error_setg(>migration_blocker, "The qcow format used by node '%s' "
"does not support live migration",
bdrv_get_device_or_node_name(bs));
-migrate_add_blocker(s->migration_blocker);
+ret = migrate_add_blocker(s->migration_blocker, errp);
+if (ret < 0) {
+error_free(s->migration_blocker);
+goto fail;
+}
 
 qemu_co_mutex_init(>lock);
 return 0;
diff --git a/block/vdi.c b/block/vdi.c
index 062a654..1635ab1 100644
--- a/block/vdi.c
+++ b/block/vdi.c
@@ -505,7 +505,11 @@ static int vdi_open(BlockDriverState *bs, QDict *options, 
int flags,
 error_setg(>migration_blocker, "The vdi format used by node '%s' "
"does not support live migration",
bdrv_get_device_or_node_name(bs));
-migrate_add_blocker(s->migration_blocker);
+ret = migrate_add_blocker(s->migration_blocker, errp);
+if (ret < 0) {
+error_free(s->migration_blocker);
+goto fail_free_bmap;
+}
 
 qemu_co_mutex_init(>write_lock);
 
diff --git a/block/vhdx.c b/block/vhdx.c
index d3bb1bd..097c707 100644
--- a/block/vhdx.c
+++ b/block/vhdx.c
@@ -1005,7 +1005,11 @@ static int vhdx_open(BlockDriverState *bs, QDict 
*options, int flags,
 error_setg(>migration_blocker, "The vhdx format used by node '%s' "
"does not support live migration",
bdrv_get_device_or_node_name(bs));
-migrate_add_blocker(s->migration_blocker);
+ret = migrate_add_blocker(s->migration_blocker, errp);
+if (ret < 0) {
+error_free(s->migration_blocker);
+goto fail;
+}
 
 return 0;
 fail:
diff --git a/block/vmdk.c b/block/vmdk.c
index be0d640..36ff6f4 100644
--- a/block/vmdk.c
+++ b/block/vmdk.c
@@ -951,7 +951,12 @@ static int vmdk_open(BlockDriverState *bs, QDict *options, 
int flags,
 error_setg(>migration_blocker, "The vmdk format used by node '%s' "
"does not support live migration",
bdrv_get_device_or_node_name(bs));
-migrate_add_blocker(s->migration_blocker);
+ret = migrate_add_blocker(s->migration_blocker, errp);
+if (ret < 0) {
+error_free(s->migration_blocker);
+goto fail;
+}
+
 g_free(buf);
 return 0;
 
diff --git a/block/vpc.c b/block/vpc.c
index 2b3b518..f3dd677 100644
--- a/block/vpc.c
+++ b/block/vpc.c
@@ -325,13 +325,17 @@ static int vpc_open(BlockDriverState *bs, QDict *options, 
int flags,
 #endif
 }
 
-qemu_co_mutex_init(>lock);
-
 /* Disable migration when VHD images are used */
 error_setg(>migration_blocker, "The vpc format used by node '%s' "
"does not support live migration",
bdrv_get_device_or_node_name(bs));
-migrate_add_blocker(s->migration_blocker);
+ret = migrate_add_blocker(s->migration_blocker, errp);
+if (ret < 0) {
+error_free(s->migration_blocker);
+goto fail;
+}
+
+qemu_co_mutex_init(>lock);
 
 return 0;
 
diff --git a/block/vvfat.c b/block/vvfat.c
index 7ddc962..bdc1297 100644
--- a/block/vvfat.c
+++ b/block/vvfat.c
@@ -1191,22 +1191,26 @@ static int vvfat_open(BlockDriverState *bs, QDict 
*options, int flags,
 
 s->sector_count = s->faked_sectors + 
s->sectors_per_cluster*s->cluster_count;
 
-if (s->first_sectors_number == 0x40) {
-init_mbr(s, cyls, heads, secs);
-}
-
-//assert(is_consistent(s));
-qemu_co_mutex_init(>lock);
-
 /* Disable migration when vvfat is used rw */
 if (s->qcow) {
 error_setg(>migration_blocker,
"The vvfat (rw) format used by node '%s' "
"does not support live migration",

Re: [Qemu-block] [Qemu-devel] [PATCH 3/3] block: mirror - zero unallocated target sectors when zero init not present

2015-10-01 Thread Paolo Bonzini

On 30/09/2015 18:02, Jeff Cody wrote:
> As a general rule for blockjobs, I disagree.
> 
> Right away, there is a key difference: we don't know that the image is
> (or should be) empty.

Not necessarily empty.  sync='top' && mode='existing' &&
!target->backing_file, for example, makes sense if the target is a copy
of source->bs.

In fact, commit of the active layer is almost exactly a mode='existing'
drive-mirror operation.

But if you use mode == 'existing', and don't provide an image that
follows the rules, it's garbage-in garbage-out.  The sequence of
operation makes sense, but the resulting image does not.

Paolo

> With mode != "existing", we know the image
> should be empty, since we just created it (although for a host device,
> it may have extraneous data in it).  So I think it is not so much what
> we can assume about an existing image, as it is what we cannot assume.
> And that could potentially influence some block jobs.

Re: [Qemu-block] [PATCH] migration: disallow migrate_add_blocker during migration

2015-10-01 Thread John Snow



On 09/30/2015 06:08 AM, Kevin Wolf wrote:
> Am 29.09.2015 um 22:20 hat John Snow geschrieben:
>> If a migration is already in progress and somebody attempts
>> to add a migration blocker, this should rightly fail.
>>
>> Add an errp parameter and a retcode return value to migrate_add_blocker.
>>
>> This is part one of two for a solution to prohibit e.g. block jobs
>> from running concurrently with migration.
>>
>> Signed-off-by: John Snow 
> 
> Through which tree should this be merged?
> 

I think the more sensitive changes actually wind up in the block layer,
so perhaps it's best to do it through yours with ACK from the
migrational powers that be.

>>  block/qcow.c  |  5 -
>>  block/vdi.c   |  5 -
>>  block/vhdx.c  |  5 -
>>  block/vmdk.c  | 13 +
>>  block/vpc.c   |  9 ++---
>>  block/vvfat.c | 19 +++
>>  hw/9pfs/virtio-9p.c   | 15 +++
>>  hw/misc/ivshmem.c |  5 -
>>  hw/scsi/vhost-scsi.c  | 11 +++
>>  hw/virtio/vhost.c | 31 +++
>>  include/migration/migration.h |  4 +++-
>>  migration/migration.c | 32 
>>  stubs/migr-blocker.c  |  3 ++-
>>  target-i386/kvm.c |  6 +-
>>  14 files changed, 117 insertions(+), 46 deletions(-)
>>
>> diff --git a/block/qcow.c b/block/qcow.c
>> index 6e35db1..1b82dec 100644
>> --- a/block/qcow.c
>> +++ b/block/qcow.c
>> @@ -236,7 +236,10 @@ static int qcow_open(BlockDriverState *bs, QDict 
>> *options, int flags,
>>  error_setg(>migration_blocker, "The qcow format used by node '%s' "
>> "does not support live migration",
>> bdrv_get_device_or_node_name(bs));
>> -migrate_add_blocker(s->migration_blocker);
>> +ret = migrate_add_blocker(s->migration_blocker, errp);
>> +if (ret < 0) {
>> +goto fail;
> 
> This error path leaks s->migration_blocker.
> 
>> +}
>>  
>>  qemu_co_mutex_init(>lock);
>>  return 0;
>> diff --git a/block/vdi.c b/block/vdi.c
>> index 062a654..95b2690 100644
>> --- a/block/vdi.c
>> +++ b/block/vdi.c
>> @@ -505,7 +505,10 @@ static int vdi_open(BlockDriverState *bs, QDict 
>> *options, int flags,
>>  error_setg(>migration_blocker, "The vdi format used by node '%s' "
>> "does not support live migration",
>> bdrv_get_device_or_node_name(bs));
>> -migrate_add_blocker(s->migration_blocker);
>> +ret = migrate_add_blocker(s->migration_blocker, errp);
>> +if (ret < 0) {
>> +goto fail_free_bmap;
> 
> Same.
> 
>> +}
>>  
>>  qemu_co_mutex_init(>write_lock);
>>  
>> diff --git a/block/vhdx.c b/block/vhdx.c
>> index d3bb1bd..5bebe34 100644
>> --- a/block/vhdx.c
>> +++ b/block/vhdx.c
>> @@ -1005,7 +1005,10 @@ static int vhdx_open(BlockDriverState *bs, QDict 
>> *options, int flags,
>>  error_setg(>migration_blocker, "The vhdx format used by node '%s' "
>> "does not support live migration",
>> bdrv_get_device_or_node_name(bs));
>> -migrate_add_blocker(s->migration_blocker);
>> +ret = migrate_add_blocker(s->migration_blocker, errp);
>> +if (ret < 0) {
>> +goto fail;
>> +}
>>  
>>  return 0;
>>  fail:
> 
> This one happens to be okay because VHDX uses the close function in the
> failure path (and at last up to now that function even seems to cope
> with half-initialised images - it just feels a bit brittle).
> 

I'll patch it anyway. No need to cause work for people who audit this in
the future.

>> diff --git a/block/vmdk.c b/block/vmdk.c
>> index be0d640..09dcf6b 100644
>> --- a/block/vmdk.c
>> +++ b/block/vmdk.c
>> @@ -943,15 +943,20 @@ static int vmdk_open(BlockDriverState *bs, QDict 
>> *options, int flags,
>>  if (ret) {
>>  goto fail;
>>  }
>> -s->cid = vmdk_read_cid(bs, 0);
>> -s->parent_cid = vmdk_read_cid(bs, 1);
> 
> Why do you move this code? It doesn't seem to do anything that you would
> need to undo on failure.
> 

It was just the pattern of how I was applying the change. ("Try to
position the error check directly after the previous error pathway,
leaving any items that are incapable of failing to the rear portion of
the function.") I didn't trace vmdk_read_cid here, I only noted it
couldn't fail, so I shifted it down.

If in this particular case it's benign, I'll shift it back for the
smaller diff.

>> -qemu_co_mutex_init(>lock);
>>
>>  /* Disable migration when VMDK images are used */
>>  error_setg(>migration_blocker, "The vmdk format used by node '%s' "
>> "does not support live migration",
>> bdrv_get_device_or_node_name(bs));
>> -migrate_add_blocker(s->migration_blocker);
>> +ret = migrate_add_blocker(s->migration_blocker, errp);
>> +if (ret < 0) {
>> +goto fail;
>> +

Re: [Qemu-block] [PATCH] block/raw-posix: Open file descriptor O_RDWR to work around glibc posix_fallocate emulation issue.

2015-10-01 Thread Jeff Cody

On Tue, Sep 29, 2015 at 04:54:10PM +0100, Richard W.M. Jones wrote:
>   https://bugzilla.redhat.com/show_bug.cgi?id=1265196
> 
> The following command fails on an NFS mountpoint:
> 
>   $ qemu-img create -f qcow2 -o preallocation=falloc disk.img 262144
>   Formatting 'disk.img', fmt=qcow2 size=262144 encryption=off 
> cluster_size=65536 preallocation='falloc' lazy_refcounts=off
>   qemu-img: disk.img: Could not preallocate data for the new file: Bad file 
> descriptor
> 
> The reason turns out to be because NFS doesn't support the
> posix_fallocate call.  glibc emulates it instead.  However glibc's
> emulation involves using the pread(2) syscall.  The pread syscall
> fails with EBADF if the file descriptor is opened without the read
> open-flag (ie. open (..., O_WRONLY)).
> 
> I contacted glibc upstream about this, and their response is here:
> 
>   https://bugzilla.redhat.com/show_bug.cgi?id=1265196#c9
> 
> There are two possible fixes: Use Linux fallocate directly, or (this
> fix) work around the problem in qemu by opening the file with O_RDWR
> instead of O_WRONLY.
> 
> Signed-off-by: Richard W.M. Jones 
> BZ: https://bugzilla.redhat.com/show_bug.cgi?id=1265196
> ---
>  block/raw-posix.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/block/raw-posix.c b/block/raw-posix.c
> index 30df8ad..86f8562 100644
> --- a/block/raw-posix.c
> +++ b/block/raw-posix.c
> @@ -1648,7 +1648,7 @@ static int raw_create(const char *filename, QemuOpts 
> *opts, Error **errp)
>  goto out;
>  }
>  
> -fd = qemu_open(filename, O_WRONLY | O_CREAT | O_TRUNC | O_BINARY,
> +fd = qemu_open(filename, O_RDWR | O_CREAT | O_TRUNC | O_BINARY,
> 0644);
>  if (fd < 0) {
>  result = -errno;
> -- 
> 2.5.0
> 
>

Reviewed-by: Jeff Cody

Re: [Qemu-block] [Qemu-devel] [PATCH] block/raw-posix: Open file descriptor O_RDWR to work around glibc posix_fallocate emulation issue.

2015-10-01 Thread Eric Blake

On 09/29/2015 09:54 AM, Richard W.M. Jones wrote:
>   https://bugzilla.redhat.com/show_bug.cgi?id=1265196
> 
> The following command fails on an NFS mountpoint:
> 
>   $ qemu-img create -f qcow2 -o preallocation=falloc disk.img 262144
>   Formatting 'disk.img', fmt=qcow2 size=262144 encryption=off 
> cluster_size=65536 preallocation='falloc' lazy_refcounts=off
>   qemu-img: disk.img: Could not preallocate data for the new file: Bad file 
> descriptor
> 
> The reason turns out to be because NFS doesn't support the
> posix_fallocate call.  glibc emulates it instead.  However glibc's
> emulation involves using the pread(2) syscall.  The pread syscall
> fails with EBADF if the file descriptor is opened without the read
> open-flag (ie. open (..., O_WRONLY)).
> 

Failing with EBADF feels fishy (POSIX requests EINVAL in that case) -
but that's an upstream glibc discussion that has forked from this thread
and doesn't hold up the qemu fix.

> I contacted glibc upstream about this, and their response is here:
> 
>   https://bugzilla.redhat.com/show_bug.cgi?id=1265196#c9
> 
> There are two possible fixes: Use Linux fallocate directly, or (this
> fix) work around the problem in qemu by opening the file with O_RDWR
> instead of O_WRONLY.
> 
> Signed-off-by: Richard W.M. Jones 
> BZ: https://bugzilla.redhat.com/show_bug.cgi?id=1265196
> ---
>  block/raw-posix.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)

Reviewed-by: Eric Blake 

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature

[Qemu-block] [PATCH] gluster: allocate GlusterAIOCBs on the stack

2015-10-01 Thread Paolo Bonzini

This is simpler now that the driver has been converted to coroutines.

Signed-off-by: Paolo Bonzini 
---
 block/gluster.c | 86 ++---
 1 file changed, 33 insertions(+), 53 deletions(-)

diff --git a/block/gluster.c b/block/gluster.c
index 1eb3a8c..0857c14 100644
--- a/block/gluster.c
+++ b/block/gluster.c
@@ -429,28 +429,23 @@ static coroutine_fn int 
qemu_gluster_co_write_zeroes(BlockDriverState *bs,
 int64_t sector_num, int nb_sectors, BdrvRequestFlags flags)
 {
 int ret;
-GlusterAIOCB *acb = g_slice_new(GlusterAIOCB);
+GlusterAIOCB acb;
 BDRVGlusterState *s = bs->opaque;
 off_t size = nb_sectors * BDRV_SECTOR_SIZE;
 off_t offset = sector_num * BDRV_SECTOR_SIZE;
 
-acb->size = size;
-acb->ret = 0;
-acb->coroutine = qemu_coroutine_self();
-acb->aio_context = bdrv_get_aio_context(bs);
+acb.size = size;
+acb.ret = 0;
+acb.coroutine = qemu_coroutine_self();
+acb.aio_context = bdrv_get_aio_context(bs);
 
-ret = glfs_zerofill_async(s->fd, offset, size, _finish_aiocb, acb);
+ret = glfs_zerofill_async(s->fd, offset, size, gluster_finish_aiocb, );
 if (ret < 0) {
-ret = -errno;
-goto out;
+return -errno;
 }
 
 qemu_coroutine_yield();
-ret = acb->ret;
-
-out:
-g_slice_free(GlusterAIOCB, acb);
-return ret;
+return acb.ret;
 }
 
 static inline bool gluster_supports_zerofill(void)
@@ -541,35 +536,30 @@ static coroutine_fn int 
qemu_gluster_co_rw(BlockDriverState *bs,
 int64_t sector_num, int nb_sectors, QEMUIOVector *qiov, int write)
 {
 int ret;
-GlusterAIOCB *acb = g_slice_new(GlusterAIOCB);
+GlusterAIOCB acb;
 BDRVGlusterState *s = bs->opaque;
 size_t size = nb_sectors * BDRV_SECTOR_SIZE;
 off_t offset = sector_num * BDRV_SECTOR_SIZE;
 
-acb->size = size;
-acb->ret = 0;
-acb->coroutine = qemu_coroutine_self();
-acb->aio_context = bdrv_get_aio_context(bs);
+acb.size = size;
+acb.ret = 0;
+acb.coroutine = qemu_coroutine_self();
+acb.aio_context = bdrv_get_aio_context(bs);
 
 if (write) {
 ret = glfs_pwritev_async(s->fd, qiov->iov, qiov->niov, offset, 0,
-_finish_aiocb, acb);
+gluster_finish_aiocb, );
 } else {
 ret = glfs_preadv_async(s->fd, qiov->iov, qiov->niov, offset, 0,
-_finish_aiocb, acb);
+gluster_finish_aiocb, );
 }
 
 if (ret < 0) {
-ret = -errno;
-goto out;
+return -errno;
 }
 
 qemu_coroutine_yield();
-ret = acb->ret;
-
-out:
-g_slice_free(GlusterAIOCB, acb);
-return ret;
+return acb.ret;
 }
 
 static int qemu_gluster_truncate(BlockDriverState *bs, int64_t offset)
@@ -600,26 +590,21 @@ static coroutine_fn int 
qemu_gluster_co_writev(BlockDriverState *bs,
 static coroutine_fn int qemu_gluster_co_flush_to_disk(BlockDriverState *bs)
 {
 int ret;
-GlusterAIOCB *acb = g_slice_new(GlusterAIOCB);
+GlusterAIOCB acb;
 BDRVGlusterState *s = bs->opaque;
 
-acb->size = 0;
-acb->ret = 0;
-acb->coroutine = qemu_coroutine_self();
-acb->aio_context = bdrv_get_aio_context(bs);
+acb.size = 0;
+acb.ret = 0;
+acb.coroutine = qemu_coroutine_self();
+acb.aio_context = bdrv_get_aio_context(bs);
 
-ret = glfs_fsync_async(s->fd, _finish_aiocb, acb);
+ret = glfs_fsync_async(s->fd, gluster_finish_aiocb, );
 if (ret < 0) {
-ret = -errno;
-goto out;
+return -errno;
 }
 
 qemu_coroutine_yield();
-ret = acb->ret;
-
-out:
-g_slice_free(GlusterAIOCB, acb);
-return ret;
+return acb.ret;
 }
 
 #ifdef CONFIG_GLUSTERFS_DISCARD
@@ -627,28 +612,23 @@ static coroutine_fn int 
qemu_gluster_co_discard(BlockDriverState *bs,
 int64_t sector_num, int nb_sectors)
 {
 int ret;
-GlusterAIOCB *acb = g_slice_new(GlusterAIOCB);
+GlusterAIOCB acb;
 BDRVGlusterState *s = bs->opaque;
 size_t size = nb_sectors * BDRV_SECTOR_SIZE;
 off_t offset = sector_num * BDRV_SECTOR_SIZE;
 
-acb->size = 0;
-acb->ret = 0;
-acb->coroutine = qemu_coroutine_self();
-acb->aio_context = bdrv_get_aio_context(bs);
+acb.size = 0;
+acb.ret = 0;
+acb.coroutine = qemu_coroutine_self();
+acb.aio_context = bdrv_get_aio_context(bs);
 
-ret = glfs_discard_async(s->fd, offset, size, _finish_aiocb, acb);
+ret = glfs_discard_async(s->fd, offset, size, gluster_finish_aiocb, );
 if (ret < 0) {
-ret = -errno;
-goto out;
+return -errno;
 }
 
 qemu_coroutine_yield();
-ret = acb->ret;
-
-out:
-g_slice_free(GlusterAIOCB, acb);
-return ret;
+return acb.ret;
 }
 #endif
 
-- 
2.5.0

[Qemu-block] [PATCH] scsi: switch from g_slice allocator to malloc

2015-10-01 Thread Paolo Bonzini

Simplify memory allocation by sticking with a single API.  GSlice
is not that fast anyway (tcmalloc/jemalloc are better).

Signed-off-by: Paolo Bonzini 
---
 hw/scsi/scsi-bus.c  |  4 ++--
 hw/scsi/virtio-scsi-dataplane.c | 10 +-
 hw/scsi/virtio-scsi.c   | 12 +---
 3 files changed, 12 insertions(+), 14 deletions(-)

diff --git a/hw/scsi/scsi-bus.c b/hw/scsi/scsi-bus.c
index ffac8f4..d373c1b 100644
--- a/hw/scsi/scsi-bus.c
+++ b/hw/scsi/scsi-bus.c
@@ -558,7 +558,7 @@ SCSIRequest *scsi_req_alloc(const SCSIReqOps *reqops, 
SCSIDevice *d,
 const int memset_off = offsetof(SCSIRequest, sense)
+ sizeof(req->sense);
 
-req = g_slice_alloc(reqops->size);
+req = g_malloc(reqops->size);
 memset((uint8_t *)req + memset_off, 0, reqops->size - memset_off);
 req->refcount = 1;
 req->bus = bus;
@@ -1622,7 +1622,7 @@ void scsi_req_unref(SCSIRequest *req)
 }
 object_unref(OBJECT(req->dev));
 object_unref(OBJECT(qbus->parent));
-g_slice_free1(req->ops->size, req);
+g_free(req);
 }
 }
 
diff --git a/hw/scsi/virtio-scsi-dataplane.c b/hw/scsi/virtio-scsi-dataplane.c
index 5575648..1248fd9 100644
--- a/hw/scsi/virtio-scsi-dataplane.c
+++ b/hw/scsi/virtio-scsi-dataplane.c
@@ -57,7 +57,7 @@ static VirtIOSCSIVring *virtio_scsi_vring_init(VirtIOSCSI *s,
 return NULL;
 }
 
-r = g_slice_new(VirtIOSCSIVring);
+r = g_new(VirtIOSCSIVring, 1);
 r->host_notifier = *virtio_queue_get_host_notifier(vq);
 r->guest_notifier = *virtio_queue_get_guest_notifier(vq);
 aio_set_event_notifier(s->ctx, >host_notifier, handler);
@@ -73,7 +73,7 @@ static VirtIOSCSIVring *virtio_scsi_vring_init(VirtIOSCSI *s,
 fail_vring:
 aio_set_event_notifier(s->ctx, >host_notifier, NULL);
 k->set_host_notifier(qbus->parent, n, false);
-g_slice_free(VirtIOSCSIVring, r);
+g_free(r);
 return NULL;
 }
 
@@ -182,18 +182,18 @@ static void virtio_scsi_vring_teardown(VirtIOSCSI *s)
 
 if (s->ctrl_vring) {
 vring_teardown(>ctrl_vring->vring, vdev, 0);
-g_slice_free(VirtIOSCSIVring, s->ctrl_vring);
+g_free(s->ctrl_vring);
 s->ctrl_vring = NULL;
 }
 if (s->event_vring) {
 vring_teardown(>event_vring->vring, vdev, 1);
-g_slice_free(VirtIOSCSIVring, s->event_vring);
+g_free(s->event_vring);
 s->event_vring = NULL;
 }
 if (s->cmd_vrings) {
 for (i = 0; i < vs->conf.num_queues && s->cmd_vrings[i]; i++) {
 vring_teardown(>cmd_vrings[i]->vring, vdev, 2 + i);
-g_slice_free(VirtIOSCSIVring, s->cmd_vrings[i]);
+g_free(s->cmd_vrings[i]);
 s->cmd_vrings[i] = NULL;
 }
 free(s->cmd_vrings);
diff --git a/hw/scsi/virtio-scsi.c b/hw/scsi/virtio-scsi.c
index 1c33f14..20885fb 100644
--- a/hw/scsi/virtio-scsi.c
+++ b/hw/scsi/virtio-scsi.c
@@ -47,7 +47,7 @@ VirtIOSCSIReq *virtio_scsi_init_req(VirtIOSCSI *s, VirtQueue 
*vq)
 const size_t zero_skip = offsetof(VirtIOSCSIReq, elem)
  + sizeof(VirtQueueElement);
 
-req = g_slice_alloc(sizeof(*req) + vs->cdb_size);
+req = g_malloc(sizeof(*req) + vs->cdb_size);
 req->vq = vq;
 req->dev = s;
 qemu_sglist_init(>qsgl, DEVICE(s), 8, _space_memory);
@@ -58,11 +58,9 @@ VirtIOSCSIReq *virtio_scsi_init_req(VirtIOSCSI *s, VirtQueue 
*vq)
 
 void virtio_scsi_free_req(VirtIOSCSIReq *req)
 {
-VirtIOSCSICommon *vs = (VirtIOSCSICommon *)req->dev;
-
 qemu_iovec_destroy(>resp_iov);
 qemu_sglist_destroy(>qsgl);
-g_slice_free1(sizeof(*req) + vs->cdb_size, req);
+g_free(req);
 }
 
 static void virtio_scsi_complete_req(VirtIOSCSIReq *req)
@@ -250,7 +248,7 @@ static void virtio_scsi_cancel_notify(Notifier *notifier, 
void *data)
 if (--n->tmf_req->remaining == 0) {
 virtio_scsi_complete_req(n->tmf_req);
 }
-g_slice_free(VirtIOSCSICancelNotifier, n);
+g_free(n);
 }
 
 /* Return 0 if the request is ready to be completed and return to guest;
@@ -301,7 +299,7 @@ static int virtio_scsi_do_tmf(VirtIOSCSI *s, VirtIOSCSIReq 
*req)
 VirtIOSCSICancelNotifier *notifier;
 
 req->remaining = 1;
-notifier = g_slice_new(VirtIOSCSICancelNotifier);
+notifier = g_new(VirtIOSCSICancelNotifier, 1);
 notifier->tmf_req = req;
 notifier->notifier.notify = virtio_scsi_cancel_notify;
 scsi_req_cancel_async(r, >notifier);
@@ -350,7 +348,7 @@ static int virtio_scsi_do_tmf(VirtIOSCSI *s, VirtIOSCSIReq 
*req)
 VirtIOSCSICancelNotifier *notifier;
 
 req->remaining++;
-notifier = g_slice_new(VirtIOSCSICancelNotifier);
+notifier = g_new(VirtIOSCSICancelNotifier, 1);
 notifier->notifier.notify = virtio_scsi_cancel_notify;

Re: [Qemu-block] [Qemu-devel] [PATCH] migration: disallow migrate_add_blocker during migration

2015-10-01 Thread Markus Armbruster

Kevin Wolf  writes:

> Am 29.09.2015 um 22:20 hat John Snow geschrieben:
>> If a migration is already in progress and somebody attempts
>> to add a migration blocker, this should rightly fail.
>> 
>> Add an errp parameter and a retcode return value to migrate_add_blocker.
>> 
>> This is part one of two for a solution to prohibit e.g. block jobs
>> from running concurrently with migration.
>> 
>> Signed-off-by: John Snow 
[...]
>> diff --git a/hw/misc/ivshmem.c b/hw/misc/ivshmem.c
>> index cc76989..859e844 100644
>> --- a/hw/misc/ivshmem.c
>> +++ b/hw/misc/ivshmem.c
>> @@ -740,7 +740,10 @@ static int pci_ivshmem_init(PCIDevice *dev)
>>  if (s->role_val == IVSHMEM_PEER) {
>>  error_setg(>migration_blocker,
>> "Migration is disabled when using feature 'peer mode' in 
>> device 'ivshmem'");
>> -migrate_add_blocker(s->migration_blocker);
>> +if (migrate_add_blocker(s->migration_blocker, NULL) < 0) {
>> +error_report("Unable to prohibit migration during ivshmem 
>> init");
>> +exit(1);
>
> Seriously? :-/
>
> I see that the whole function does the same and has an int return value
> only so that it can always return 0. So what you're doing is probably
> right for this patch, but generally, the error handling in this init
> function is horrible.

Horrible indeed.  Marc-André's series to clean it up clocks in at 47
patches right now.

Message-Id: <1443094669-4144-1-git-send-email-marcandre.lur...@redhat.com>
http://lists.gnu.org/archive/html/qemu-devel/2015-09/msg06302.html

> The good thing is that you can't leak anything this way...
>
>> +}
>>  }
>>  
>>  pci_conf = dev->config;
[...]

Re: [Qemu-block] [PATCH] iotests: Fix test 128 for password-less sudo

2015-10-01 Thread Kevin Wolf

Am 25.09.2015 um 19:19 hat Max Reitz geschrieben:
> As of 934659c460d46c948cf348822fda1d38556ed9a4, $QEMU_IO is generally no
> longer a program name, and therefore "sudo -n $QEMU_IO" will no longer
> work.
> 
> Fix this by copying the qemu-io invocation function from common.config,
> making it use $sudo for invoking $QEMU_IO_PROG, and then use that
> function instead of $QEMU_IO.
> 
> Reported-by: Fam Zheng 
> Signed-off-by: Max Reitz 

I just noticed that this is in my branch, but I don't seem to have sent
an email yet. So thanks, applied to the block branch.

Kevin

Re: [Qemu-block] [Qemu-devel] [PATCH 3/3] block: mirror - zero unallocated target sectors when zero init not present

2015-10-01 Thread Kevin Wolf

Am 30.09.2015 um 18:06 hat Paolo Bonzini geschrieben:
> 
> 
> On 30/09/2015 18:02, Jeff Cody wrote:
> > As a general rule for blockjobs, I disagree.
> > 
> > Right away, there is a key difference: we don't know that the image is
> > (or should be) empty.
> 
> Not necessarily empty.  sync='top' && mode='existing' &&
> !target->backing_file, for example, makes sense if the target is a copy
> of source->bs.
> 
> In fact, commit of the active layer is almost exactly a mode='existing'
> drive-mirror operation.
> 
> But if you use mode == 'existing', and don't provide an image that
> follows the rules, it's garbage-in garbage-out.  The sequence of
> operation makes sense, but the resulting image does not.

Yes, that's the point I was trying to make. The behaviour of our block
jobs doesn't depend on the contents of the target image, and I don't see
a future block job where this would change.

So if the user thinks it makes sense to start with a non-empty image and
keep some non-zero data in places that the block job didn't touch,
that's their choice (and as you mentioned, there are examples where it
does indeed make sense).

The block job code need not care about that, it just does its job
without ever looking at the contents of the target image.

Kevin

[Qemu-block] [PATCH] block: switch from g_slice allocator to malloc

2015-10-01 Thread Paolo Bonzini

Simplify memory allocation by sticking with a single API.  GSlice
is not that fast anyway (tcmalloc/jemalloc are better).

Signed-off-by: Paolo Bonzini 
---
 block/io.c| 4 ++--
 block/mirror.c| 4 ++--
 block/raw-posix.c | 8 
 block/raw-win32.c | 4 ++--
 hw/block/virtio-blk.c | 4 ++--
 5 files changed, 12 insertions(+), 12 deletions(-)

diff --git a/block/io.c b/block/io.c
index 94e18e6..17293c3 100644
--- a/block/io.c
+++ b/block/io.c
@@ -2218,7 +2218,7 @@ void *qemu_aio_get(const AIOCBInfo *aiocb_info, 
BlockDriverState *bs,
 {
 BlockAIOCB *acb;
 
-acb = g_slice_alloc(aiocb_info->aiocb_size);
+acb = g_malloc(aiocb_info->aiocb_size);
 acb->aiocb_info = aiocb_info;
 acb->bs = bs;
 acb->cb = cb;
@@ -2238,7 +2238,7 @@ void qemu_aio_unref(void *p)
 BlockAIOCB *acb = p;
 assert(acb->refcnt > 0);
 if (--acb->refcnt == 0) {
-g_slice_free1(acb->aiocb_info->aiocb_size, acb);
+g_free(acb);
 }
 }
 
diff --git a/block/mirror.c b/block/mirror.c
index a258926..c4cf851 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -113,7 +113,7 @@ static void mirror_iteration_done(MirrorOp *op, int ret)
 }
 
 qemu_iovec_destroy(>qiov);
-g_slice_free(MirrorOp, op);
+g_free(op);
 
 if (s->waiting_for_io) {
 qemu_coroutine_enter(s->common.co, NULL);
@@ -264,7 +264,7 @@ static uint64_t coroutine_fn 
mirror_iteration(MirrorBlockJob *s)
 } while (delay_ns == 0 && next_sector < end);
 
 /* Allocate a MirrorOp that is used as an AIO callback.  */
-op = g_slice_new(MirrorOp);
+op = g_new(MirrorOp, 1);
 op->s = s;
 op->sector_num = sector_num;
 op->nb_sectors = nb_sectors;
diff --git a/block/raw-posix.c b/block/raw-posix.c
index 30df8ad..a5f9707 100644
--- a/block/raw-posix.c
+++ b/block/raw-posix.c
@@ -1259,7 +1259,7 @@ static int aio_worker(void *arg)
 break;
 }
 
-g_slice_free(RawPosixAIOData, aiocb);
+g_free(aiocb);
 return ret;
 }
 
@@ -1267,7 +1267,7 @@ static int paio_submit_co(BlockDriverState *bs, int fd,
 int64_t sector_num, QEMUIOVector *qiov, int nb_sectors,
 int type)
 {
-RawPosixAIOData *acb = g_slice_new(RawPosixAIOData);
+RawPosixAIOData *acb = g_new(RawPosixAIOData, 1);
 ThreadPool *pool;
 
 acb->bs = bs;
@@ -1292,7 +1292,7 @@ static BlockAIOCB *paio_submit(BlockDriverState *bs, int 
fd,
 int64_t sector_num, QEMUIOVector *qiov, int nb_sectors,
 BlockCompletionFunc *cb, void *opaque, int type)
 {
-RawPosixAIOData *acb = g_slice_new(RawPosixAIOData);
+RawPosixAIOData *acb = g_new(RawPosixAIOData, 1);
 ThreadPool *pool;
 
 acb->bs = bs;
@@ -2237,7 +2237,7 @@ static BlockAIOCB *hdev_aio_ioctl(BlockDriverState *bs,
 if (fd_open(bs) < 0)
 return NULL;
 
-acb = g_slice_new(RawPosixAIOData);
+acb = g_new(RawPosixAIOData, 1);
 acb->bs = bs;
 acb->aio_type = QEMU_AIO_IOCTL;
 acb->aio_fildes = s->fd;
diff --git a/block/raw-win32.c b/block/raw-win32.c
index 68f2338..20a5332 100644
--- a/block/raw-win32.c
+++ b/block/raw-win32.c
@@ -135,7 +135,7 @@ static int aio_worker(void *arg)
 break;
 }
 
-g_slice_free(RawWin32AIOData, aiocb);
+g_free(aiocb);
 return ret;
 }
 
@@ -143,7 +143,7 @@ static BlockAIOCB *paio_submit(BlockDriverState *bs, HANDLE 
hfile,
 int64_t sector_num, QEMUIOVector *qiov, int nb_sectors,
 BlockCompletionFunc *cb, void *opaque, int type)
 {
-RawWin32AIOData *acb = g_slice_new(RawWin32AIOData);
+RawWin32AIOData *acb = g_new(RawWin32AIOData, 1);
 ThreadPool *pool;
 
 acb->bs = bs;
diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
index f9301ae..c6ad2c9 100644
--- a/hw/block/virtio-blk.c
+++ b/hw/block/virtio-blk.c
@@ -30,7 +30,7 @@
 
 VirtIOBlockReq *virtio_blk_alloc_request(VirtIOBlock *s)
 {
-VirtIOBlockReq *req = g_slice_new(VirtIOBlockReq);
+VirtIOBlockReq *req = g_new(VirtIOBlockReq, 1);
 req->dev = s;
 req->qiov.size = 0;
 req->in_len = 0;
@@ -42,7 +42,7 @@ VirtIOBlockReq *virtio_blk_alloc_request(VirtIOBlock *s)
 void virtio_blk_free_request(VirtIOBlockReq *req)
 {
 if (req) {
-g_slice_free(VirtIOBlockReq, req);
+g_free(req);
 }
 }
 
-- 
2.5.0

[Qemu-block] [PATCH] nbd: switch from g_slice allocator to malloc

2015-10-01 Thread Paolo Bonzini

Simplify memory allocation by sticking with a single API.  GSlice
is not that fast anyway (tcmalloc/jemalloc are better).

Signed-off-by: Paolo Bonzini 
---
 nbd.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/nbd.c b/nbd.c
index 07240bd..74859cb 100644
--- a/nbd.c
+++ b/nbd.c
@@ -1005,7 +1005,7 @@ static NBDRequest *nbd_request_get(NBDClient *client)
 client->nb_requests++;
 nbd_update_can_read(client);
 
-req = g_slice_new0(NBDRequest);
+req = g_new0(NBDRequest, 1);
 nbd_client_get(client);
 req->client = client;
 return req;
@@ -1018,7 +1018,7 @@ static void nbd_request_put(NBDRequest *req)
 if (req->data) {
 qemu_vfree(req->data);
 }
-g_slice_free(NBDRequest, req);
+g_free(req);
 
 client->nb_requests--;
 nbd_update_can_read(client);
-- 
2.5.0

[Qemu-block] [PATCH v2 15/16] block: Add and use bdrv_replace_in_backing_chain()

2015-10-01 Thread Kevin Wolf

This cleans up the mess we left behind in the mirror code after the
previous patch. Instead of using bdrv_swap(), just change pointers.

The interface change of the mirror job that callers must consider is
that after job completion, their local BDS pointers still point to the
same node now. qemu-img must change its code accordingly (which makes it
easier to understand); the other callers stays unchanged because after
completion they don't do anything with the BDS, but just with the job,
and the job is still owned by the source BDS.

Signed-off-by: Kevin Wolf 
Reviewed-by: Max Reitz 
---
 block.c   | 32 +++-
 block/mirror.c| 23 +++
 include/block/block.h |  4 +++-
 qemu-img.c| 16 
 4 files changed, 49 insertions(+), 26 deletions(-)

diff --git a/block.c b/block.c
index 54ef5de..5e0ad24 100644
--- a/block.c
+++ b/block.c
@@ -1095,7 +1095,7 @@ static BdrvChild *bdrv_attach_child(BlockDriverState 
*parent_bs,
 return child;
 }
 
-void bdrv_detach_child(BdrvChild *child)
+static void bdrv_detach_child(BdrvChild *child)
 {
 QLIST_REMOVE(child, next);
 QLIST_REMOVE(child, next_parent);
@@ -2232,6 +2232,36 @@ void bdrv_append(BlockDriverState *bs_new, 
BlockDriverState *bs_top)
 bdrv_unref(bs_new);
 }
 
+void bdrv_replace_in_backing_chain(BlockDriverState *old, BlockDriverState 
*new)
+{
+assert(!bdrv_requests_pending(old));
+assert(!bdrv_requests_pending(new));
+
+bdrv_ref(old);
+
+if (old->blk) {
+/* As long as these fields aren't in BlockBackend, but in the top-level
+ * BlockDriverState, it's not possible for a BDS to have two BBs.
+ *
+ * We really want to copy the fields from old to new, but we go for a
+ * swap instead so that pointers aren't duplicated and cause trouble.
+ * (Also, bdrv_swap() used to do the same.) */
+assert(!new->blk);
+swap_feature_fields(old, new);
+}
+change_parent_backing_link(old, new);
+
+/* Change backing files if a previously independent node is added to the
+ * chain. For active commit, we replace top by its own (indirect) backing
+ * file and don't do anything here so we don't build a loop. */
+if (new->backing == NULL && !bdrv_chain_contains(backing_bs(old), new)) {
+bdrv_set_backing_hd(new, backing_bs(old));
+bdrv_set_backing_hd(old, NULL);
+}
+
+bdrv_unref(old);
+}
+
 static void bdrv_delete(BlockDriverState *bs)
 {
 assert(!bs->job);
diff --git a/block/mirror.c b/block/mirror.c
index 0597ad7..78f4c3f 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -353,6 +353,11 @@ static void mirror_exit(BlockJob *job, void *opaque)
 MirrorBlockJob *s = container_of(job, MirrorBlockJob, common);
 MirrorExitData *data = opaque;
 AioContext *replace_aio_context = NULL;
+BlockDriverState *src = s->common.bs;
+
+/* Make sure that the source BDS doesn't go away before we called
+ * block_job_completed(). */
+bdrv_ref(src);
 
 if (s->to_replace) {
 replace_aio_context = bdrv_get_aio_context(s->to_replace);
@@ -367,22 +372,7 @@ static void mirror_exit(BlockJob *job, void *opaque)
 if (bdrv_get_flags(s->target) != bdrv_get_flags(to_replace)) {
 bdrv_reopen(s->target, bdrv_get_flags(to_replace), NULL);
 }
-bdrv_swap(s->target, to_replace);
-if (s->common.driver->job_type == BLOCK_JOB_TYPE_COMMIT) {
-/* drop the bs loop chain formed by the swap: break the loop then
- * trigger the unref */
-/* FIXME This duplicates bdrv_set_backing_hd(), except for the
- * actual detach/unref so that the loop can be broken. When
- * bdrv_swap() gets replaced, this will become sane again. */
-BlockDriverState *backing = s->base->backing->bs;
-assert(s->base->backing_blocker);
-bdrv_op_unblock_all(backing, s->base->backing_blocker);
-error_free(s->base->backing_blocker);
-s->base->backing_blocker = NULL;
-bdrv_detach_child(s->base->backing);
-s->base->backing = NULL;
-bdrv_unref(backing);
-}
+bdrv_replace_in_backing_chain(to_replace, s->target);
 }
 if (s->to_replace) {
 bdrv_op_unblock_all(s->to_replace, s->replace_blocker);
@@ -396,6 +386,7 @@ static void mirror_exit(BlockJob *job, void *opaque)
 bdrv_unref(s->target);
 block_job_completed(>common, data->ret);
 g_free(data);
+bdrv_unref(src);
 }
 
 static void coroutine_fn mirror_run(void *opaque)
diff --git a/include/block/block.h b/include/block/block.h
index 5d9092c..1520dee 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -203,6 +203,9 @@ BlockDriverState *bdrv_new(void);
 void bdrv_make_anon(BlockDriverState *bs);
 void bdrv_swap(BlockDriverState *bs_new, BlockDriverState *bs_old);

[Qemu-block] [PATCH v2 14/16] blockjob: Store device name at job creation

2015-10-01 Thread Kevin Wolf

Some block jobs change the block device graph on completion. This means
that the device that owns the job and originally was addressed with its
device name may no longer be what the corresponding BlockBackend points
to.

Previously, the effects of bdrv_swap() ensured that the job was (at
least partially) transferred to the target image. Events that contain
the device name could still use bdrv_get_device_name(job->bs) and get
the same result.

After removing bdrv_swap(), this won't work any more. Instead, save the
device name at job creation and use that copy for QMP events and
anything else identifying the job.

Signed-off-by: Kevin Wolf 
---
 block/mirror.c   |  3 +--
 blockjob.c   | 15 ---
 include/block/blockjob.h |  8 
 3 files changed, 17 insertions(+), 9 deletions(-)

diff --git a/block/mirror.c b/block/mirror.c
index 911c432..0597ad7 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -643,8 +643,7 @@ static void mirror_complete(BlockJob *job, Error **errp)
 return;
 }
 if (!s->synced) {
-error_setg(errp, QERR_BLOCK_JOB_NOT_READY,
-   bdrv_get_device_name(job->bs));
+error_setg(errp, QERR_BLOCK_JOB_NOT_READY, job->id);
 return;
 }
 
diff --git a/blockjob.c b/blockjob.c
index 62bb906..d87869c 100644
--- a/blockjob.c
+++ b/blockjob.c
@@ -54,6 +54,7 @@ void *block_job_create(const BlockJobDriver *driver, 
BlockDriverState *bs,
 bdrv_op_unblock(bs, BLOCK_OP_TYPE_DATAPLANE, job->blocker);
 
 job->driver= driver;
+job->id= g_strdup(bdrv_get_device_name(bs));
 job->bs= bs;
 job->cb= cb;
 job->opaque= opaque;
@@ -81,6 +82,7 @@ void block_job_release(BlockDriverState *bs)
 bs->job = NULL;
 bdrv_op_unblock_all(bs, job->blocker);
 error_free(job->blocker);
+g_free(job->id);
 g_free(job);
 }
 
@@ -113,8 +115,7 @@ void block_job_set_speed(BlockJob *job, int64_t speed, 
Error **errp)
 void block_job_complete(BlockJob *job, Error **errp)
 {
 if (job->pause_count || job->cancelled || !job->driver->complete) {
-error_setg(errp, QERR_BLOCK_JOB_NOT_READY,
-   bdrv_get_device_name(job->bs));
+error_setg(errp, QERR_BLOCK_JOB_NOT_READY, job->id);
 return;
 }
 
@@ -269,7 +270,7 @@ BlockJobInfo *block_job_query(BlockJob *job)
 {
 BlockJobInfo *info = g_new0(BlockJobInfo, 1);
 info->type  = g_strdup(BlockJobType_lookup[job->driver->job_type]);
-info->device= g_strdup(bdrv_get_device_name(job->bs));
+info->device= g_strdup(job->id);
 info->len   = job->len;
 info->busy  = job->busy;
 info->paused= job->pause_count > 0;
@@ -291,7 +292,7 @@ static void block_job_iostatus_set_err(BlockJob *job, int 
error)
 void block_job_event_cancelled(BlockJob *job)
 {
 qapi_event_send_block_job_cancelled(job->driver->job_type,
-bdrv_get_device_name(job->bs),
+job->id,
 job->len,
 job->offset,
 job->speed,
@@ -301,7 +302,7 @@ void block_job_event_cancelled(BlockJob *job)
 void block_job_event_completed(BlockJob *job, const char *msg)
 {
 qapi_event_send_block_job_completed(job->driver->job_type,
-bdrv_get_device_name(job->bs),
+job->id,
 job->len,
 job->offset,
 job->speed,
@@ -315,7 +316,7 @@ void block_job_event_ready(BlockJob *job)
 job->ready = true;
 
 qapi_event_send_block_job_ready(job->driver->job_type,
-bdrv_get_device_name(job->bs),
+job->id,
 job->len,
 job->offset,
 job->speed, _abort);
@@ -344,7 +345,7 @@ BlockErrorAction block_job_error_action(BlockJob *job, 
BlockDriverState *bs,
 default:
 abort();
 }
-qapi_event_send_block_job_error(bdrv_get_device_name(job->bs),
+qapi_event_send_block_job_error(job->id,
 is_read ? IO_OPERATION_TYPE_READ :
 IO_OPERATION_TYPE_WRITE,
 action, _abort);
diff --git a/include/block/blockjob.h b/include/block/blockjob.h
index dd9d5e6..289b13f 100644
--- a/include/block/blockjob.h
+++ b/include/block/blockjob.h
@@ -65,6 +65,14 @@ struct BlockJob {
 BlockDriverState *bs;
 
 /**
+ * The ID of the block job. Currently the BlockBackend name of the BDS
+ * owning the job at the time when the job is started.
+ *
+ * TODO Decouple block job IDs from

[Qemu-block] [PATCH v2 11/16] block-backend: Add blk_set_bs()

2015-10-01 Thread Kevin Wolf

It allows changing the BlockDriverState that a BlockBackend points to.

Signed-off-by: Kevin Wolf 
---
 block/block-backend.c | 17 +
 include/block/block_int.h |  2 ++
 2 files changed, 19 insertions(+)

diff --git a/block/block-backend.c b/block/block-backend.c
index c2e8732..2256551 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -239,6 +239,23 @@ BlockDriverState *blk_bs(BlockBackend *blk)
 }
 
 /*
+ * Changes the BlockDriverState attached to @blk
+ */
+void blk_set_bs(BlockBackend *blk, BlockDriverState *bs)
+{
+bdrv_ref(bs);
+
+if (blk->bs) {
+blk->bs->blk = NULL;
+bdrv_unref(blk->bs);
+}
+assert(bs->blk == NULL);
+
+blk->bs = bs;
+bs->blk = blk;
+}
+
+/*
  * Return @blk's DriveInfo if any, else null.
  */
 DriveInfo *blk_legacy_dinfo(BlockBackend *blk)
diff --git a/include/block/block_int.h b/include/block/block_int.h
index 4598101..cfcae52 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -659,6 +659,8 @@ void backup_start(BlockDriverState *bs, BlockDriverState 
*target,
   BlockCompletionFunc *cb, void *opaque,
   Error **errp);
 
+void blk_set_bs(BlockBackend *blk, BlockDriverState *bs);
+
 void blk_dev_change_media_cb(BlockBackend *blk, bool load);
 bool blk_dev_has_removable_media(BlockBackend *blk);
 void blk_dev_eject_request(BlockBackend *blk, bool force);
-- 
1.8.3.1

[Qemu-block] [PATCH v2 12/16] block: Introduce parents list

2015-10-01 Thread Kevin Wolf

Signed-off-by: Kevin Wolf 
Reviewed-by: Max Reitz 
---
 block.c   | 3 +++
 include/block/block_int.h | 2 ++
 2 files changed, 5 insertions(+)

diff --git a/block.c b/block.c
index f8e4631..9ff2b64 100644
--- a/block.c
+++ b/block.c
@@ -1090,6 +1090,7 @@ static BdrvChild *bdrv_attach_child(BlockDriverState 
*parent_bs,
 };
 
 QLIST_INSERT_HEAD(_bs->children, child, next);
+QLIST_INSERT_HEAD(_bs->parents, child, next_parent);
 
 return child;
 }
@@ -1097,6 +1098,7 @@ static BdrvChild *bdrv_attach_child(BlockDriverState 
*parent_bs,
 void bdrv_detach_child(BdrvChild *child)
 {
 QLIST_REMOVE(child, next);
+QLIST_REMOVE(child, next_parent);
 g_free(child);
 }
 
@@ -2038,6 +2040,7 @@ static void bdrv_move_reference_fields(BlockDriverState 
*bs_dest,
 /* keep the same entry in bdrv_states */
 bs_dest->device_list = bs_src->device_list;
 bs_dest->blk = bs_src->blk;
+bs_dest->parents = bs_src->parents;
 
 memcpy(bs_dest->op_blockers, bs_src->op_blockers,
sizeof(bs_dest->op_blockers));
diff --git a/include/block/block_int.h b/include/block/block_int.h
index cfcae52..52ea7c0 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -339,6 +339,7 @@ struct BdrvChild {
 BlockDriverState *bs;
 const BdrvChildRole *role;
 QLIST_ENTRY(BdrvChild) next;
+QLIST_ENTRY(BdrvChild) next_parent;
 };
 
 /*
@@ -445,6 +446,7 @@ struct BlockDriverState {
  * parent node of this node. */
 BlockDriverState *inherits_from;
 QLIST_HEAD(, BdrvChild) children;
+QLIST_HEAD(, BdrvChild) parents;
 
 QDict *options;
 BlockdevDetectZeroesOptions detect_zeroes;
-- 
1.8.3.1

[Qemu-block] [PATCH v2 09/16] block: Split bdrv_move_feature_fields()

2015-10-01 Thread Kevin Wolf

After bdrv_swap(), some fields must be moved back to their original BDS
to compensate for the effects that a swap of the contents of the objects
has while keeping the old addresses. Other fields must be moved back
because they should logically be moved and must stay on top

When replacing bdrv_swap() with operations changing the pointers in the
parents, we only need the latter and must avoid swapping the former.
Split the function accordingly.

Signed-off-by: Kevin Wolf 
Reviewed-by: Max Reitz 
---
 block.c | 12 
 1 file changed, 12 insertions(+)

diff --git a/block.c b/block.c
index c4dcf81..f8e4631 100644
--- a/block.c
+++ b/block.c
@@ -1985,6 +1985,8 @@ static void bdrv_rebind(BlockDriverState *bs)
 }
 }
 
+/* Fields that need to stay with the top-level BDS, no matter whether the
+ * address of the top-level BDS stays the same or not. */
 static void bdrv_move_feature_fields(BlockDriverState *bs_dest,
  BlockDriverState *bs_src)
 {
@@ -2020,7 +2022,13 @@ static void bdrv_move_feature_fields(BlockDriverState 
*bs_dest,
 
 /* dirty bitmap */
 bs_dest->dirty_bitmaps  = bs_src->dirty_bitmaps;
+}
 
+/* Fields that only need to be swapped if the contents of BDSes is swapped
+ * rather than pointers being changed in the parents. */
+static void bdrv_move_reference_fields(BlockDriverState *bs_dest,
+   BlockDriverState *bs_src)
+{
 /* reference count */
 bs_dest->refcnt = bs_src->refcnt;
 
@@ -2091,6 +2099,10 @@ void bdrv_swap(BlockDriverState *bs_new, 
BlockDriverState *bs_old)
 bdrv_move_feature_fields(bs_old, bs_new);
 bdrv_move_feature_fields(bs_new, );
 
+bdrv_move_reference_fields(, bs_old);
+bdrv_move_reference_fields(bs_old, bs_new);
+bdrv_move_reference_fields(bs_new, );
+
 /* bs_new must remain unattached */
 assert(!bs_new->blk);
 
-- 
1.8.3.1

[Qemu-block] [PATCH v2 10/16] block/io: Make bdrv_requests_pending() public

2015-10-01 Thread Kevin Wolf

Signed-off-by: Kevin Wolf 
Reviewed-by: Max Reitz 
Reviewed-by: Alberto Garcia 
---
 block/io.c| 2 +-
 include/block/block_int.h | 1 +
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/block/io.c b/block/io.c
index bf907cf..bf026a4 100644
--- a/block/io.c
+++ b/block/io.c
@@ -213,7 +213,7 @@ void bdrv_disable_copy_on_read(BlockDriverState *bs)
 }
 
 /* Check if any requests are in-flight (including throttled requests) */
-static bool bdrv_requests_pending(BlockDriverState *bs)
+bool bdrv_requests_pending(BlockDriverState *bs)
 {
 if (!QLIST_EMPTY(>tracked_requests)) {
 return true;
diff --git a/include/block/block_int.h b/include/block/block_int.h
index 90971c0..4598101 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -667,5 +667,6 @@ bool blk_dev_is_medium_locked(BlockBackend *blk);
 void blk_dev_resize_cb(BlockBackend *blk);
 
 void bdrv_set_dirty(BlockDriverState *bs, int64_t cur_sector, int nr_sectors);
+bool bdrv_requests_pending(BlockDriverState *bs);
 
 #endif /* BLOCK_INT_H */
-- 
1.8.3.1

[Qemu-block] [PATCH v2 06/16] block: Remove bdrv_open_image()

2015-10-01 Thread Kevin Wolf

It is unused now.

Signed-off-by: Kevin Wolf 
Reviewed-by: Max Reitz 
Reviewed-by: Alberto Garcia 
---
 block.c   | 34 --
 include/block/block.h |  4 
 2 files changed, 38 deletions(-)

diff --git a/block.c b/block.c
index 8fd345b..33ecd93 100644
--- a/block.c
+++ b/block.c
@@ -1279,40 +1279,6 @@ done:
 return c;
 }
 
-/*
- * This is a version of bdrv_open_child() that returns 0/-EINVAL instead of
- * a BdrvChild object.
- *
- * If allow_none is true, no image will be opened if filename is false and no
- * BlockdevRef is given. *pbs will remain unchanged and 0 will be returned.
- *
- * To conform with the behavior of bdrv_open(), *pbs has to be NULL.
- */
-int bdrv_open_image(BlockDriverState **pbs, const char *filename,
-QDict *options, const char *bdref_key,
-BlockDriverState* parent, const BdrvChildRole *child_role,
-bool allow_none, Error **errp)
-{
-Error *local_err = NULL;
-BdrvChild *c;
-
-assert(pbs);
-assert(*pbs == NULL);
-
-c = bdrv_open_child(filename, options, bdref_key, parent, child_role,
-allow_none, _err);
-if (local_err) {
-error_propagate(errp, local_err);
-return -EINVAL;
-}
-
-if (c != NULL) {
-*pbs = c->bs;
-}
-
-return 0;
-}
-
 int bdrv_append_temp_snapshot(BlockDriverState *bs, int flags, Error **errp)
 {
 /* TODO: extra byte is a hack to ensure MAX_PATH space on Windows. */
diff --git a/include/block/block.h b/include/block/block.h
index 7ebb35d..c6854a6 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -205,10 +205,6 @@ void bdrv_swap(BlockDriverState *bs_new, BlockDriverState 
*bs_old);
 void bdrv_append(BlockDriverState *bs_new, BlockDriverState *bs_top);
 int bdrv_parse_cache_flags(const char *mode, int *flags);
 int bdrv_parse_discard_flags(const char *mode, int *flags);
-int bdrv_open_image(BlockDriverState **pbs, const char *filename,
-QDict *options, const char *bdref_key,
-BlockDriverState* parent, const BdrvChildRole *child_role,
-bool allow_none, Error **errp);
 BdrvChild *bdrv_open_child(const char *filename,
QDict *options, const char *bdref_key,
BlockDriverState* parent,
-- 
1.8.3.1

[Qemu-block] [PATCH v2 13/16] block: Implement bdrv_append() without bdrv_swap()

2015-10-01 Thread Kevin Wolf

Remember all parent nodes and just change the pointers there instead of
swapping the contents of the BlockDriverState.

Handling of snapshot=on must be moved further down in bdrv_open()
because *pbs (which is the bs pointer in the BlockBackend) must already
be set before bdrv_append() is called. Otherwise bdrv_append() changes
the BB's pointer to the temporary snapshot, but bdrv_open() overwrites
it with the read-only original image.

We also need to be careful to update callers as the interface changes
(becomes less insane): Previously, the meaning of the two parameters was
inverted when bdrv_append() returns. Now any BDS pointers keep pointing
to the same node.

Signed-off-by: Kevin Wolf 
---
 block.c| 112 +
 blockdev.c |   2 +-
 2 files changed, 85 insertions(+), 29 deletions(-)

diff --git a/block.c b/block.c
index 9ff2b64..54ef5de 100644
--- a/block.c
+++ b/block.c
@@ -1516,15 +1516,6 @@ static int bdrv_open_inherit(BlockDriverState **pbs, 
const char *filename,
 
 bdrv_refresh_filename(bs);
 
-/* For snapshot=on, create a temporary qcow2 overlay. bs points to the
- * temporary snapshot afterwards. */
-if (snapshot_flags) {
-ret = bdrv_append_temp_snapshot(bs, snapshot_flags, _err);
-if (local_err) {
-goto close_and_fail;
-}
-}
-
 /* Check if any unknown options were used */
 if (options && (qdict_size(options) != 0)) {
 const QDictEntry *entry = qdict_first(options);
@@ -1556,6 +1547,16 @@ static int bdrv_open_inherit(BlockDriverState **pbs, 
const char *filename,
 
 QDECREF(options);
 *pbs = bs;
+
+/* For snapshot=on, create a temporary qcow2 overlay. bs points to the
+ * temporary snapshot afterwards. */
+if (snapshot_flags) {
+ret = bdrv_append_temp_snapshot(bs, snapshot_flags, _err);
+if (local_err) {
+goto close_and_fail;
+}
+}
+
 return 0;
 
 fail:
@@ -2000,20 +2001,6 @@ static void bdrv_move_feature_fields(BlockDriverState 
*bs_dest,
 
 bs_dest->enable_write_cache = bs_src->enable_write_cache;
 
-/* i/o throttled req */
-bs_dest->throttle_state = bs_src->throttle_state,
-bs_dest->io_limits_enabled  = bs_src->io_limits_enabled;
-bs_dest->pending_reqs[0]= bs_src->pending_reqs[0];
-bs_dest->pending_reqs[1]= bs_src->pending_reqs[1];
-bs_dest->throttled_reqs[0]  = bs_src->throttled_reqs[0];
-bs_dest->throttled_reqs[1]  = bs_src->throttled_reqs[1];
-memcpy(_dest->round_robin,
-   _src->round_robin,
-   sizeof(bs_dest->round_robin));
-memcpy(_dest->throttle_timers,
-   _src->throttle_timers,
-   sizeof(ThrottleTimers));
-
 /* r/w error */
 bs_dest->on_read_error  = bs_src->on_read_error;
 bs_dest->on_write_error = bs_src->on_write_error;
@@ -2027,10 +2014,25 @@ static void bdrv_move_feature_fields(BlockDriverState 
*bs_dest,
 }
 
 /* Fields that only need to be swapped if the contents of BDSes is swapped
- * rather than pointers being changed in the parents. */
+ * rather than pointers being changed in the parents, and throttling fields
+ * because only bdrv_swap() messes with internals of throttling. */
 static void bdrv_move_reference_fields(BlockDriverState *bs_dest,
BlockDriverState *bs_src)
 {
+/* i/o throttled req */
+bs_dest->throttle_state = bs_src->throttle_state,
+bs_dest->io_limits_enabled  = bs_src->io_limits_enabled;
+bs_dest->pending_reqs[0]= bs_src->pending_reqs[0];
+bs_dest->pending_reqs[1]= bs_src->pending_reqs[1];
+bs_dest->throttled_reqs[0]  = bs_src->throttled_reqs[0];
+bs_dest->throttled_reqs[1]  = bs_src->throttled_reqs[1];
+memcpy(_dest->round_robin,
+   _src->round_robin,
+   sizeof(bs_dest->round_robin));
+memcpy(_dest->throttle_timers,
+   _src->throttle_timers,
+   sizeof(ThrottleTimers));
+
 /* reference count */
 bs_dest->refcnt = bs_src->refcnt;
 
@@ -2156,6 +2158,45 @@ void bdrv_swap(BlockDriverState *bs_new, 
BlockDriverState *bs_old)
 bdrv_rebind(bs_old);
 }
 
+static void change_parent_backing_link(BlockDriverState *from,
+   BlockDriverState *to)
+{
+BdrvChild *c, *next;
+
+QLIST_FOREACH_SAFE(c, >parents, next_parent, next) {
+assert(c->role != _backing);
+c->bs = to;
+QLIST_REMOVE(c, next_parent);
+QLIST_INSERT_HEAD(>parents, c, next_parent);
+bdrv_ref(to);
+bdrv_unref(from);
+}
+if (from->blk) {
+blk_set_bs(from->blk, to);
+if (!to->device_list.tqe_prev) {
+QTAILQ_INSERT_BEFORE(from, to, device_list);
+}
+QTAILQ_REMOVE(_states, from, device_list);
+}
+}
+
+static void swap_feature_fields(BlockDriverState *bs_top,
+

[Qemu-block] [PATCH v2 05/16] block: Convert bs->file to BdrvChild

2015-10-01 Thread Kevin Wolf

This patch removes the temporary duplication between bs->file and
bs->file_child by converting everything to BdrvChild.

Signed-off-by: Kevin Wolf 
---
 block.c   | 63 ++-
 block/blkdebug.c  | 32 +---
 block/blkverify.c | 33 ++---
 block/bochs.c |  8 +++---
 block/cloop.c | 10 
 block/dmg.c   | 20 +++
 block/io.c| 50 ++---
 block/parallels.c | 38 ++--
 block/qapi.c  |  2 +-
 block/qcow.c  | 42 ---
 block/qcow2-cache.c   | 11 +
 block/qcow2-cluster.c | 38 
 block/qcow2-refcount.c| 45 +
 block/qcow2-snapshot.c| 30 +++---
 block/qcow2.c | 62 --
 block/qed-table.c |  4 +--
 block/qed.c   | 32 
 block/raw_bsd.c   | 40 +++---
 block/snapshot.c  | 12 -
 block/vdi.c   | 17 +++--
 block/vhdx-log.c  | 25 ++-
 block/vhdx.c  | 36 ++-
 block/vmdk.c  | 27 ++--
 block/vpc.c   | 34 +
 include/block/block.h |  8 +-
 include/block/block_int.h |  3 +--
 26 files changed, 378 insertions(+), 344 deletions(-)

diff --git a/block.c b/block.c
index 75afed1..8fd345b 100644
--- a/block.c
+++ b/block.c
@@ -809,7 +809,7 @@ static QemuOptsList bdrv_runtime_opts = {
  *
  * Removes all processed options from *options.
  */
-static int bdrv_open_common(BlockDriverState *bs, BlockDriverState *file,
+static int bdrv_open_common(BlockDriverState *bs, BdrvChild *file,
 QDict *options, int flags, BlockDriver *drv, Error **errp)
 {
 int ret, open_flags;
@@ -823,7 +823,7 @@ static int bdrv_open_common(BlockDriverState *bs, 
BlockDriverState *file,
 assert(options != NULL && bs->options != options);
 
 if (file != NULL) {
-filename = file->filename;
+filename = file->bs->filename;
 } else {
 filename = qdict_get_try_str(options, "filename");
 }
@@ -1401,7 +1401,8 @@ static int bdrv_open_inherit(BlockDriverState **pbs, 
const char *filename,
  const BdrvChildRole *child_role, Error **errp)
 {
 int ret;
-BlockDriverState *file = NULL, *bs;
+BdrvChild *file = NULL;
+BlockDriverState *bs;
 BlockDriver *drv = NULL;
 const char *drvname;
 Error *local_err = NULL;
@@ -1485,25 +1486,20 @@ static int bdrv_open_inherit(BlockDriverState **pbs, 
const char *filename,
 flags = bdrv_backing_flags(flags);
 }
 
-assert(file == NULL);
 bs->open_flags = flags;
 
-bs->file_child = bdrv_open_child(filename, options, "file", bs,
- _file, true, _err);
+file = bdrv_open_child(filename, options, "file", bs,
+   _file, true, _err);
 if (local_err) {
 ret = -EINVAL;
 goto fail;
 }
-
-if (bs->file_child) {
-file = bs->file_child->bs;
-}
 }
 
 /* Image format probing */
 bs->probed = !drv;
 if (!drv && file) {
-ret = find_image_format(file, filename, , _err);
+ret = find_image_format(file->bs, filename, , _err);
 if (ret < 0) {
 goto fail;
 }
@@ -1526,7 +1522,7 @@ static int bdrv_open_inherit(BlockDriverState **pbs, 
const char *filename,
 }
 
 if (file && (bs->file != file)) {
-bdrv_unref(file);
+bdrv_unref_child(bs, file);
 file = NULL;
 }
 
@@ -1587,7 +1583,7 @@ static int bdrv_open_inherit(BlockDriverState **pbs, 
const char *filename,
 
 fail:
 if (file != NULL) {
-bdrv_unref(file);
+bdrv_unref_child(bs, file);
 }
 QDECREF(bs->options);
 QDECREF(options);
@@ -1928,6 +1924,7 @@ void bdrv_close(BlockDriverState *bs)
 BdrvChild *child, *next;
 
 bs->drv->bdrv_close(bs);
+bs->drv = NULL;
 
 if (bs->backing_hd) {
 BlockDriverState *backing_hd = bs->backing_hd;
@@ -1935,6 +1932,11 @@ void bdrv_close(BlockDriverState *bs)
 bdrv_unref(backing_hd);
 }
 
+if (bs->file != NULL) {
+bdrv_unref_child(bs, bs->file);
+bs->file = NULL;
+}
+
 QLIST_FOREACH_SAFE(child, >children, next, next) {
 /* TODO Remove bdrv_unref() from drivers' close function and use
  * bdrv_unref_child() here */
@@ -1946,7 +1948,6 @@ void bdrv_close(BlockDriverState *bs)
 
 g_free(bs->opaque);
 bs->opaque = NULL;
-bs->drv = NULL;

[Qemu-block] [PATCH v2 00/16] block: Get rid of bdrv_swap()

2015-10-01 Thread Kevin Wolf

bdrv_swap() has always been an ugly hack that we would rather have
avoided.  When it was introduced, we simply didn't have the
infrastructure to update pointers instead of transplanting the contents
of BDS object, so we grudgingly added bdrv_swap() as a quick solution.
Meanwhile, most of the infrastructure exists and this series implements
the final step necessary to implement the required functionality in a
less adventurous way.

v2:
- Patch 4 ('quorum: Convert to BdrvChild')
  Use bdrv_unref_child() instead of bdrv_unref() [Berto]

- Patch 5 ('block: Convert bs->file to BdrvChild')
  bdrv_close: Set bs->drv = NULL immediately after .bdrv_close [Berto]
  bdrv_close: bdrv_unref_child() instead of bdrv_unref() for bs->file [Max]

- Patch 7 ('block: Convert bs->backing_hd to BdrvChild')
  Use backing_bs() more consistently instead of open-coding and introducing
  additional bugs while doing so [Max]

- Patch 8 ('block: Manage backing file references in bdrv_set_backing_hd()')
  mirror: Use backing instead of s->base->backing->bs [Max]
  stream: More simplication, remove close_unused_images() altogether [Berto]

- Patch 11 ('block-backend: Add blk_set_bs()')
  assert(bs->blk == NULL) [Max]

- Patch 13 ('block: Implement bdrv_append() without bdrv_swap()')
  bdrv_ref/unref in change_parent_backing_link [Max]
  Improved throttle state handling in swap_feature_fields() [Berto]
  Fixed external_snapshot_commit() [Berto]

- Patch 14 ('blockjob: Store device name at job creation')
  Use new job->id more consistently [Max]

Kevin Wolf (16):
  block: Introduce BDS.file_child
  vmdk: Use BdrvChild instead of BDS for references to extents
  blkverify: Convert s->test_file to BdrvChild
  quorum: Convert to BdrvChild
  block: Convert bs->file to BdrvChild
  block: Remove bdrv_open_image()
  block: Convert bs->backing_hd to BdrvChild
  block: Manage backing file references in bdrv_set_backing_hd()
  block: Split bdrv_move_feature_fields()
  block/io: Make bdrv_requests_pending() public
  block-backend: Add blk_set_bs()
  block: Introduce parents list
  block: Implement bdrv_append() without bdrv_swap()
  blockjob: Store device name at job creation
  block: Add and use bdrv_replace_in_backing_chain()
  block: Remove bdrv_swap()

 block.c   | 470 +++---
 block/blkdebug.c  |  32 ++--
 block/blkverify.c |  68 ---
 block/block-backend.c |  17 ++
 block/bochs.c |   8 +-
 block/cloop.c |  10 +-
 block/dmg.c   |  20 +-
 block/io.c|  76 
 block/mirror.c|  22 +--
 block/parallels.c |  38 ++--
 block/qapi.c  |  10 +-
 block/qcow.c  |  46 ++---
 block/qcow2-cache.c   |  11 +-
 block/qcow2-cluster.c |  42 +++--
 block/qcow2-refcount.c|  45 ++---
 block/qcow2-snapshot.c|  30 +--
 block/qcow2.c |  68 +++
 block/qed-table.c |   4 +-
 block/qed.c   |  51 +++--
 block/quorum.c|  63 ---
 block/raw_bsd.c   |  40 ++--
 block/snapshot.c  |  12 +-
 block/stream.c|  34 +---
 block/vdi.c   |  17 +-
 block/vhdx-log.c  |  25 +--
 block/vhdx.c  |  36 ++--
 block/vmdk.c  | 133 ++---
 block/vpc.c   |  34 ++--
 block/vvfat.c |  19 +-
 blockdev.c|   6 +-
 blockjob.c|  15 +-
 include/block/block.h |  15 +-
 include/block/block_int.h |  20 +-
 include/block/blockjob.h  |   8 +
 include/qemu/queue.h  |   6 -
 qemu-img.c|  20 +-
 36 files changed, 762 insertions(+), 809 deletions(-)

-- 
1.8.3.1

[Qemu-block] [PATCH v2 03/16] blkverify: Convert s->test_file to BdrvChild

2015-10-01 Thread Kevin Wolf

Signed-off-by: Kevin Wolf 
Reviewed-by: Max Reitz 
---
 block/blkverify.c | 41 +
 1 file changed, 21 insertions(+), 20 deletions(-)

diff --git a/block/blkverify.c b/block/blkverify.c
index d277e63..6b71622 100644
--- a/block/blkverify.c
+++ b/block/blkverify.c
@@ -14,7 +14,7 @@
 #include "qapi/qmp/qstring.h"
 
 typedef struct {
-BlockDriverState *test_file;
+BdrvChild *test_file;
 } BDRVBlkverifyState;
 
 typedef struct BlkverifyAIOCB BlkverifyAIOCB;
@@ -132,12 +132,12 @@ static int blkverify_open(BlockDriverState *bs, QDict 
*options, int flags,
 }
 
 /* Open the test file */
-assert(s->test_file == NULL);
-ret = bdrv_open_image(>test_file, qemu_opt_get(opts, "x-image"), 
options,
-  "test", bs, _format, false, _err);
-if (ret < 0) {
+s->test_file = bdrv_open_child(qemu_opt_get(opts, "x-image"), options,
+   "test", bs, _format, false,
+   _err);
+if (local_err) {
+ret = -EINVAL;
 error_propagate(errp, local_err);
-s->test_file = NULL;
 goto fail;
 }
 
@@ -151,7 +151,7 @@ static void blkverify_close(BlockDriverState *bs)
 {
 BDRVBlkverifyState *s = bs->opaque;
 
-bdrv_unref(s->test_file);
+bdrv_unref_child(bs, s->test_file);
 s->test_file = NULL;
 }
 
@@ -159,7 +159,7 @@ static int64_t blkverify_getlength(BlockDriverState *bs)
 {
 BDRVBlkverifyState *s = bs->opaque;
 
-return bdrv_getlength(s->test_file);
+return bdrv_getlength(s->test_file->bs);
 }
 
 static BlkverifyAIOCB *blkverify_aio_get(BlockDriverState *bs, bool is_write,
@@ -242,7 +242,7 @@ static BlockAIOCB *blkverify_aio_readv(BlockDriverState *bs,
 qemu_iovec_init(>raw_qiov, acb->qiov->niov);
 qemu_iovec_clone(>raw_qiov, qiov, acb->buf);
 
-bdrv_aio_readv(s->test_file, sector_num, qiov, nb_sectors,
+bdrv_aio_readv(s->test_file->bs, sector_num, qiov, nb_sectors,
blkverify_aio_cb, acb);
 bdrv_aio_readv(bs->file, sector_num, >raw_qiov, nb_sectors,
blkverify_aio_cb, acb);
@@ -257,7 +257,7 @@ static BlockAIOCB *blkverify_aio_writev(BlockDriverState 
*bs,
 BlkverifyAIOCB *acb = blkverify_aio_get(bs, true, sector_num, qiov,
 nb_sectors, cb, opaque);
 
-bdrv_aio_writev(s->test_file, sector_num, qiov, nb_sectors,
+bdrv_aio_writev(s->test_file->bs, sector_num, qiov, nb_sectors,
 blkverify_aio_cb, acb);
 bdrv_aio_writev(bs->file, sector_num, qiov, nb_sectors,
 blkverify_aio_cb, acb);
@@ -271,7 +271,7 @@ static BlockAIOCB *blkverify_aio_flush(BlockDriverState *bs,
 BDRVBlkverifyState *s = bs->opaque;
 
 /* Only flush test file, the raw file is not important */
-return bdrv_aio_flush(s->test_file, cb, opaque);
+return bdrv_aio_flush(s->test_file->bs, cb, opaque);
 }
 
 static bool blkverify_recurse_is_first_non_filter(BlockDriverState *bs,
@@ -285,7 +285,7 @@ static bool 
blkverify_recurse_is_first_non_filter(BlockDriverState *bs,
 return true;
 }
 
-return bdrv_recurse_is_first_non_filter(s->test_file, candidate);
+return bdrv_recurse_is_first_non_filter(s->test_file->bs, candidate);
 }
 
 /* Propagate AioContext changes to ->test_file */
@@ -293,7 +293,7 @@ static void blkverify_detach_aio_context(BlockDriverState 
*bs)
 {
 BDRVBlkverifyState *s = bs->opaque;
 
-bdrv_detach_aio_context(s->test_file);
+bdrv_detach_aio_context(s->test_file->bs);
 }
 
 static void blkverify_attach_aio_context(BlockDriverState *bs,
@@ -301,7 +301,7 @@ static void blkverify_attach_aio_context(BlockDriverState 
*bs,
 {
 BDRVBlkverifyState *s = bs->opaque;
 
-bdrv_attach_aio_context(s->test_file, new_context);
+bdrv_attach_aio_context(s->test_file->bs, new_context);
 }
 
 static void blkverify_refresh_filename(BlockDriverState *bs)
@@ -309,24 +309,25 @@ static void blkverify_refresh_filename(BlockDriverState 
*bs)
 BDRVBlkverifyState *s = bs->opaque;
 
 /* bs->file has already been refreshed */
-bdrv_refresh_filename(s->test_file);
+bdrv_refresh_filename(s->test_file->bs);
 
-if (bs->file->full_open_options && s->test_file->full_open_options) {
+if (bs->file->full_open_options && s->test_file->bs->full_open_options) {
 QDict *opts = qdict_new();
 qdict_put_obj(opts, "driver", QOBJECT(qstring_from_str("blkverify")));
 
 QINCREF(bs->file->full_open_options);
 qdict_put_obj(opts, "raw", QOBJECT(bs->file->full_open_options));
-QINCREF(s->test_file->full_open_options);
-qdict_put_obj(opts, "test", QOBJECT(s->test_file->full_open_options));
+QINCREF(s->test_file->bs->full_open_options);
+qdict_put_obj(opts, "test",
+  QOBJECT(s->test_file->bs->full_open_options));
 
 bs->full_open_options =

Re: [Qemu-block] [PATCH v3] block: mirror - fix full sync mode when target does not support zero init

2015-10-01 Thread Jeff Cody

On Thu, Oct 01, 2015 at 03:35:15PM +0200, Paolo Bonzini wrote:
> 
> 
> On 01/10/2015 15:14, Jeff Cody wrote:
> > During mirror, if the target device does not support zero init, a
> > mirror may result in a corrupted image for sync="full" mode.
> > 
> > This is due to how the initial dirty bitmap is set up prior to copying
> > data - we did not mark sectors as dirty that are unallocated.  This
> > means those unallocated sectors are skipped over on the target, and for
> > a device without zero init, invalid data may reside in those holes.
> > 
> > If both of the following conditions are true, then we will explicitly
> > mark all sectors as dirty:
> > 
> > 1.) sync = "full"
> > 2.) bdrv_has_zero_init(target) == false
> > 
> > If the target does support zero init, but a target image is passed in
> > with data already present (i.e. an "existing" image), it is assumed the
> > data present in the existing image is valid data for those sectors.
> > 
> > Signed-off-by: Jeff Cody 
> > ---
> >  block/mirror.c | 8 +---
> >  1 file changed, 5 insertions(+), 3 deletions(-)
> > 
> > diff --git a/block/mirror.c b/block/mirror.c
> > index a258926..ce367e0 100644
> > --- a/block/mirror.c
> > +++ b/block/mirror.c
> > @@ -455,6 +455,8 @@ static void coroutine_fn mirror_run(void *opaque)
> >  if (!s->is_none_mode) {
> >  /* First part, loop on the sectors and initialize the dirty 
> > bitmap.  */
> >  BlockDriverState *base = s->base;
> > +bool mark_all_dirty = s->base == NULL && 
> > !bdrv_has_zero_init(s->target);
> > +
> >  for (sector_num = 0; sector_num < end; ) {
> >  /* Just to make sure we are not exceeding int limit. */
> >  int nb_sectors = MIN(INT_MAX >> BDRV_SECTOR_BITS,
> > @@ -477,7 +479,7 @@ static void coroutine_fn mirror_run(void *opaque)
> >  }
> >  
> >  assert(n > 0);
> > -if (ret == 1) {
> > +if (ret == 1 || mark_all_dirty) {
> >  bdrv_set_dirty_bitmap(s->dirty_bitmap, sector_num, n);
> >  }
> >  sector_num += n;
> > @@ -767,8 +769,8 @@ void mirror_start(BlockDriverState *bs, 
> > BlockDriverState *target,
> >  base = mode == MIRROR_SYNC_MODE_TOP ? bs->backing_hd : NULL;
> >  mirror_start_job(bs, target, replaces,
> >   speed, granularity, buf_size,
> > - on_source_error, on_target_error, unmap, cb, opaque, 
> > errp,
> > - _job_driver, is_none_mode, base);
> > + on_source_error, on_target_error, unmap,
> > + cb, opaque, errp, _job_driver, is_none_mode, 
> > base);
> >  }
> >  
> >  void commit_active_start(BlockDriverState *bs, BlockDriverState *base,
> > 
> 
> Unnecessary last hunk, but the patch is good anyway.
> 
> Reviewed-by: Paolo Bonzini 

Thanks,

I'll go ahead and drop the last hunk when applying.

Jeff

[Qemu-block] [PATCH v2 02/16] vmdk: Use BdrvChild instead of BDS for references to extents

2015-10-01 Thread Kevin Wolf

Signed-off-by: Kevin Wolf 
Reviewed-by: Max Reitz 
Reviewed-by: Alberto Garcia 
---
 block/vmdk.c | 99 +++-
 1 file changed, 51 insertions(+), 48 deletions(-)

diff --git a/block/vmdk.c b/block/vmdk.c
index be0d640..9702132 100644
--- a/block/vmdk.c
+++ b/block/vmdk.c
@@ -87,7 +87,7 @@ typedef struct {
 #define L2_CACHE_SIZE 16
 
 typedef struct VmdkExtent {
-BlockDriverState *file;
+BdrvChild *file;
 bool flat;
 bool compressed;
 bool has_marker;
@@ -221,8 +221,8 @@ static void vmdk_free_extents(BlockDriverState *bs)
 g_free(e->l2_cache);
 g_free(e->l1_backup_table);
 g_free(e->type);
-if (e->file != bs->file) {
-bdrv_unref(e->file);
+if (e->file != bs->file_child) {
+bdrv_unref_child(bs, e->file);
 }
 }
 g_free(s->extents);
@@ -367,7 +367,7 @@ static int vmdk_parent_open(BlockDriverState *bs)
 /* Create and append extent to the extent array. Return the added VmdkExtent
  * address. return NULL if allocation failed. */
 static int vmdk_add_extent(BlockDriverState *bs,
-   BlockDriverState *file, bool flat, int64_t sectors,
+   BdrvChild *file, bool flat, int64_t sectors,
int64_t l1_offset, int64_t l1_backup_offset,
uint32_t l1_size,
int l2_size, uint64_t cluster_sectors,
@@ -392,7 +392,7 @@ static int vmdk_add_extent(BlockDriverState *bs,
 return -EFBIG;
 }
 
-nb_sectors = bdrv_nb_sectors(file);
+nb_sectors = bdrv_nb_sectors(file->bs);
 if (nb_sectors < 0) {
 return nb_sectors;
 }
@@ -439,14 +439,14 @@ static int vmdk_init_tables(BlockDriverState *bs, 
VmdkExtent *extent,
 return -ENOMEM;
 }
 
-ret = bdrv_pread(extent->file,
+ret = bdrv_pread(extent->file->bs,
  extent->l1_table_offset,
  extent->l1_table,
  l1_size);
 if (ret < 0) {
 error_setg_errno(errp, -ret,
  "Could not read l1 table from extent '%s'",
- extent->file->filename);
+ extent->file->bs->filename);
 goto fail_l1;
 }
 for (i = 0; i < extent->l1_size; i++) {
@@ -459,14 +459,14 @@ static int vmdk_init_tables(BlockDriverState *bs, 
VmdkExtent *extent,
 ret = -ENOMEM;
 goto fail_l1;
 }
-ret = bdrv_pread(extent->file,
+ret = bdrv_pread(extent->file->bs,
  extent->l1_backup_table_offset,
  extent->l1_backup_table,
  l1_size);
 if (ret < 0) {
 error_setg_errno(errp, -ret,
  "Could not read l1 backup table from extent '%s'",
- extent->file->filename);
+ extent->file->bs->filename);
 goto fail_l1b;
 }
 for (i = 0; i < extent->l1_size; i++) {
@@ -485,7 +485,7 @@ static int vmdk_init_tables(BlockDriverState *bs, 
VmdkExtent *extent,
 }
 
 static int vmdk_open_vmfs_sparse(BlockDriverState *bs,
- BlockDriverState *file,
+ BdrvChild *file,
  int flags, Error **errp)
 {
 int ret;
@@ -493,11 +493,11 @@ static int vmdk_open_vmfs_sparse(BlockDriverState *bs,
 VMDK3Header header;
 VmdkExtent *extent;
 
-ret = bdrv_pread(file, sizeof(magic), , sizeof(header));
+ret = bdrv_pread(file->bs, sizeof(magic), , sizeof(header));
 if (ret < 0) {
 error_setg_errno(errp, -ret,
  "Could not read header from file '%s'",
- file->filename);
+ file->bs->filename);
 return ret;
 }
 ret = vmdk_add_extent(bs, file, false,
@@ -559,7 +559,7 @@ static char *vmdk_read_desc(BlockDriverState *file, 
uint64_t desc_offset,
 }
 
 static int vmdk_open_vmdk4(BlockDriverState *bs,
-   BlockDriverState *file,
+   BdrvChild *file,
int flags, QDict *options, Error **errp)
 {
 int ret;
@@ -570,17 +570,17 @@ static int vmdk_open_vmdk4(BlockDriverState *bs,
 BDRVVmdkState *s = bs->opaque;
 int64_t l1_backup_offset = 0;
 
-ret = bdrv_pread(file, sizeof(magic), , sizeof(header));
+ret = bdrv_pread(file->bs, sizeof(magic), , sizeof(header));
 if (ret < 0) {
 error_setg_errno(errp, -ret,
  "Could not read header from file '%s'",
- file->filename);
+ file->bs->filename);
 return -EINVAL;
 }
 if (header.capacity == 0) {
 uint64_t desc_offset = le64_to_cpu(header.desc_offset);
 if

[Qemu-block] [PATCH v2 07/16] block: Convert bs->backing_hd to BdrvChild

2015-10-01 Thread Kevin Wolf

This is the final step in converting all of the BlockDriverState
pointers that block drivers use to BdrvChild.

After this patch, bs->children contains the full list of child nodes
that are referenced by a given BDS, and these children are only
referenced through BdrvChild, so that updating the pointer in there is
enough for changing edges in the graph.

Signed-off-by: Kevin Wolf 
---
 block.c   | 116 +-
 block/io.c|  24 +-
 block/mirror.c|   6 +--
 block/qapi.c  |   8 ++--
 block/qcow.c  |   4 +-
 block/qcow2-cluster.c |   4 +-
 block/qcow2.c |   6 +--
 block/qed.c   |  12 ++---
 block/stream.c|   8 ++--
 block/vmdk.c  |  21 +
 block/vvfat.c |   6 +--
 blockdev.c|   4 +-
 include/block/block_int.h |  12 +++--
 qemu-img.c|   4 +-
 14 files changed, 125 insertions(+), 110 deletions(-)

diff --git a/block.c b/block.c
index 33ecd93..20a3c5c 100644
--- a/block.c
+++ b/block.c
@@ -721,7 +721,7 @@ const BdrvChildRole child_format = {
 };
 
 /*
- * Returns the flags that bs->backing_hd should get, based on the given flags
+ * Returns the flags that bs->backing should get, based on the given flags
  * for the parent BDS
  */
 static int bdrv_backing_flags(int flags)
@@ -1115,32 +1115,31 @@ void bdrv_unref_child(BlockDriverState *parent, 
BdrvChild *child)
 void bdrv_set_backing_hd(BlockDriverState *bs, BlockDriverState *backing_hd)
 {
 
-if (bs->backing_hd) {
+if (bs->backing) {
 assert(bs->backing_blocker);
-bdrv_op_unblock_all(bs->backing_hd, bs->backing_blocker);
-bdrv_detach_child(bs->backing_child);
+bdrv_op_unblock_all(bs->backing->bs, bs->backing_blocker);
+bdrv_detach_child(bs->backing);
 } else if (backing_hd) {
 error_setg(>backing_blocker,
"node is used as backing hd of '%s'",
bdrv_get_device_or_node_name(bs));
 }
 
-bs->backing_hd = backing_hd;
 if (!backing_hd) {
 error_free(bs->backing_blocker);
 bs->backing_blocker = NULL;
-bs->backing_child = NULL;
+bs->backing = NULL;
 goto out;
 }
-bs->backing_child = bdrv_attach_child(bs, backing_hd, _backing);
+bs->backing = bdrv_attach_child(bs, backing_hd, _backing);
 bs->open_flags &= ~BDRV_O_NO_BACKING;
 pstrcpy(bs->backing_file, sizeof(bs->backing_file), backing_hd->filename);
 pstrcpy(bs->backing_format, sizeof(bs->backing_format),
 backing_hd->drv ? backing_hd->drv->format_name : "");
 
-bdrv_op_block_all(bs->backing_hd, bs->backing_blocker);
+bdrv_op_block_all(backing_hd, bs->backing_blocker);
 /* Otherwise we won't be able to commit due to check in bdrv_commit */
-bdrv_op_unblock(bs->backing_hd, BLOCK_OP_TYPE_COMMIT_TARGET,
+bdrv_op_unblock(backing_hd, BLOCK_OP_TYPE_COMMIT_TARGET,
 bs->backing_blocker);
 out:
 bdrv_refresh_limits(bs, NULL);
@@ -1161,7 +1160,7 @@ int bdrv_open_backing_file(BlockDriverState *bs, QDict 
*options, Error **errp)
 BlockDriverState *backing_hd;
 Error *local_err = NULL;
 
-if (bs->backing_hd != NULL) {
+if (bs->backing != NULL) {
 QDECREF(options);
 goto free_exit;
 }
@@ -1201,7 +1200,7 @@ int bdrv_open_backing_file(BlockDriverState *bs, QDict 
*options, Error **errp)
 qdict_put(options, "driver", qstring_from_str(bs->backing_format));
 }
 
-assert(bs->backing_hd == NULL);
+assert(bs->backing == NULL);
 ret = bdrv_open_inherit(_hd,
 *backing_filename ? backing_filename : NULL,
 NULL, options, 0, bs, _backing, _err);
@@ -1892,8 +1891,8 @@ void bdrv_close(BlockDriverState *bs)
 bs->drv->bdrv_close(bs);
 bs->drv = NULL;
 
-if (bs->backing_hd) {
-BlockDriverState *backing_hd = bs->backing_hd;
+if (bs->backing) {
+BlockDriverState *backing_hd = bs->backing->bs;
 bdrv_set_backing_hd(bs, NULL);
 bdrv_unref(backing_hd);
 }
@@ -2205,20 +2204,20 @@ int bdrv_commit(BlockDriverState *bs)
 if (!drv)
 return -ENOMEDIUM;
 
-if (!bs->backing_hd) {
+if (!bs->backing) {
 return -ENOTSUP;
 }
 
 if (bdrv_op_is_blocked(bs, BLOCK_OP_TYPE_COMMIT_SOURCE, NULL) ||
-bdrv_op_is_blocked(bs->backing_hd, BLOCK_OP_TYPE_COMMIT_TARGET, NULL)) 
{
+bdrv_op_is_blocked(bs->backing->bs, BLOCK_OP_TYPE_COMMIT_TARGET, 
NULL)) {
 return -EBUSY;
 }
 
-ro = bs->backing_hd->read_only;
-open_flags =  bs->backing_hd->open_flags;
+ro = bs->backing->bs->read_only;
+open_flags =  bs->backing->bs->open_flags;
 
 if (ro) {
-if (bdrv_reopen(bs->backing_hd, open_flags | BDRV_O_RDWR, NULL)) {
+if

[Qemu-block] [PATCH v3] block: mirror - fix full sync mode when target does not support zero init

2015-10-01 Thread Jeff Cody

During mirror, if the target device does not support zero init, a
mirror may result in a corrupted image for sync="full" mode.

This is due to how the initial dirty bitmap is set up prior to copying
data - we did not mark sectors as dirty that are unallocated.  This
means those unallocated sectors are skipped over on the target, and for
a device without zero init, invalid data may reside in those holes.

If both of the following conditions are true, then we will explicitly
mark all sectors as dirty:

1.) sync = "full"
2.) bdrv_has_zero_init(target) == false

If the target does support zero init, but a target image is passed in
with data already present (i.e. an "existing" image), it is assumed the
data present in the existing image is valid data for those sectors.

Signed-off-by: Jeff Cody 
---
 block/mirror.c | 8 +---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/block/mirror.c b/block/mirror.c
index a258926..ce367e0 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -455,6 +455,8 @@ static void coroutine_fn mirror_run(void *opaque)
 if (!s->is_none_mode) {
 /* First part, loop on the sectors and initialize the dirty bitmap.  */
 BlockDriverState *base = s->base;
+bool mark_all_dirty = s->base == NULL && 
!bdrv_has_zero_init(s->target);
+
 for (sector_num = 0; sector_num < end; ) {
 /* Just to make sure we are not exceeding int limit. */
 int nb_sectors = MIN(INT_MAX >> BDRV_SECTOR_BITS,
@@ -477,7 +479,7 @@ static void coroutine_fn mirror_run(void *opaque)
 }
 
 assert(n > 0);
-if (ret == 1) {
+if (ret == 1 || mark_all_dirty) {
 bdrv_set_dirty_bitmap(s->dirty_bitmap, sector_num, n);
 }
 sector_num += n;
@@ -767,8 +769,8 @@ void mirror_start(BlockDriverState *bs, BlockDriverState 
*target,
 base = mode == MIRROR_SYNC_MODE_TOP ? bs->backing_hd : NULL;
 mirror_start_job(bs, target, replaces,
  speed, granularity, buf_size,
- on_source_error, on_target_error, unmap, cb, opaque, errp,
- _job_driver, is_none_mode, base);
+ on_source_error, on_target_error, unmap,
+ cb, opaque, errp, _job_driver, is_none_mode, base);
 }
 
 void commit_active_start(BlockDriverState *bs, BlockDriverState *base,
-- 
1.9.3

Re: [Qemu-block] [PATCH v3] block: mirror - fix full sync mode when target does not support zero init

2015-10-01 Thread Paolo Bonzini



On 01/10/2015 15:14, Jeff Cody wrote:
> During mirror, if the target device does not support zero init, a
> mirror may result in a corrupted image for sync="full" mode.
> 
> This is due to how the initial dirty bitmap is set up prior to copying
> data - we did not mark sectors as dirty that are unallocated.  This
> means those unallocated sectors are skipped over on the target, and for
> a device without zero init, invalid data may reside in those holes.
> 
> If both of the following conditions are true, then we will explicitly
> mark all sectors as dirty:
> 
> 1.) sync = "full"
> 2.) bdrv_has_zero_init(target) == false
> 
> If the target does support zero init, but a target image is passed in
> with data already present (i.e. an "existing" image), it is assumed the
> data present in the existing image is valid data for those sectors.
> 
> Signed-off-by: Jeff Cody 
> ---
>  block/mirror.c | 8 +---
>  1 file changed, 5 insertions(+), 3 deletions(-)
> 
> diff --git a/block/mirror.c b/block/mirror.c
> index a258926..ce367e0 100644
> --- a/block/mirror.c
> +++ b/block/mirror.c
> @@ -455,6 +455,8 @@ static void coroutine_fn mirror_run(void *opaque)
>  if (!s->is_none_mode) {
>  /* First part, loop on the sectors and initialize the dirty bitmap.  
> */
>  BlockDriverState *base = s->base;
> +bool mark_all_dirty = s->base == NULL && 
> !bdrv_has_zero_init(s->target);
> +
>  for (sector_num = 0; sector_num < end; ) {
>  /* Just to make sure we are not exceeding int limit. */
>  int nb_sectors = MIN(INT_MAX >> BDRV_SECTOR_BITS,
> @@ -477,7 +479,7 @@ static void coroutine_fn mirror_run(void *opaque)
>  }
>  
>  assert(n > 0);
> -if (ret == 1) {
> +if (ret == 1 || mark_all_dirty) {
>  bdrv_set_dirty_bitmap(s->dirty_bitmap, sector_num, n);
>  }
>  sector_num += n;
> @@ -767,8 +769,8 @@ void mirror_start(BlockDriverState *bs, BlockDriverState 
> *target,
>  base = mode == MIRROR_SYNC_MODE_TOP ? bs->backing_hd : NULL;
>  mirror_start_job(bs, target, replaces,
>   speed, granularity, buf_size,
> - on_source_error, on_target_error, unmap, cb, opaque, 
> errp,
> - _job_driver, is_none_mode, base);
> + on_source_error, on_target_error, unmap,
> + cb, opaque, errp, _job_driver, is_none_mode, 
> base);
>  }
>  
>  void commit_active_start(BlockDriverState *bs, BlockDriverState *base,
> 

Unnecessary last hunk, but the patch is good anyway.

Reviewed-by: Paolo Bonzini

[Qemu-block] [PATCH v2 04/16] quorum: Convert to BdrvChild

2015-10-01 Thread Kevin Wolf

Signed-off-by: Kevin Wolf 
Reviewed-by: Max Reitz 
Reviewed-by: Alberto Garcia 
---
 block/quorum.c | 63 ++
 1 file changed, 33 insertions(+), 30 deletions(-)

diff --git a/block/quorum.c b/block/quorum.c
index 8fe53b4..c4cda32 100644
--- a/block/quorum.c
+++ b/block/quorum.c
@@ -64,7 +64,7 @@ typedef struct QuorumVotes {
 
 /* the following structure holds the state of one quorum instance */
 typedef struct BDRVQuorumState {
-BlockDriverState **bs; /* children BlockDriverStates */
+BdrvChild **children;  /* children BlockDriverStates */
 int num_children;  /* children count */
 int threshold; /* if less than threshold children reads gave the
 * same result a quorum error occurs.
@@ -336,7 +336,7 @@ static void quorum_report_bad_versions(BDRVQuorumState *s,
 continue;
 }
 QLIST_FOREACH(item, >items, next) {
-quorum_report_bad(acb, s->bs[item->index]->node_name, 0);
+quorum_report_bad(acb, s->children[item->index]->bs->node_name, 0);
 }
 }
 }
@@ -369,8 +369,9 @@ static bool quorum_rewrite_bad_versions(BDRVQuorumState *s, 
QuorumAIOCB *acb,
 continue;
 }
 QLIST_FOREACH(item, >items, next) {
-bdrv_aio_writev(s->bs[item->index], acb->sector_num, acb->qiov,
-acb->nb_sectors, quorum_rewrite_aio_cb, acb);
+bdrv_aio_writev(s->children[item->index]->bs, acb->sector_num,
+acb->qiov, acb->nb_sectors, quorum_rewrite_aio_cb,
+acb);
 }
 }
 
@@ -639,13 +640,13 @@ static BlockAIOCB *read_quorum_children(QuorumAIOCB *acb)
 int i;
 
 for (i = 0; i < s->num_children; i++) {
-acb->qcrs[i].buf = qemu_blockalign(s->bs[i], acb->qiov->size);
+acb->qcrs[i].buf = qemu_blockalign(s->children[i]->bs, 
acb->qiov->size);
 qemu_iovec_init(>qcrs[i].qiov, acb->qiov->niov);
 qemu_iovec_clone(>qcrs[i].qiov, acb->qiov, acb->qcrs[i].buf);
 }
 
 for (i = 0; i < s->num_children; i++) {
-bdrv_aio_readv(s->bs[i], acb->sector_num, >qcrs[i].qiov,
+bdrv_aio_readv(s->children[i]->bs, acb->sector_num, >qcrs[i].qiov,
acb->nb_sectors, quorum_aio_cb, >qcrs[i]);
 }
 
@@ -656,12 +657,12 @@ static BlockAIOCB *read_fifo_child(QuorumAIOCB *acb)
 {
 BDRVQuorumState *s = acb->common.bs->opaque;
 
-acb->qcrs[acb->child_iter].buf = qemu_blockalign(s->bs[acb->child_iter],
- acb->qiov->size);
+acb->qcrs[acb->child_iter].buf =
+qemu_blockalign(s->children[acb->child_iter]->bs, acb->qiov->size);
 qemu_iovec_init(>qcrs[acb->child_iter].qiov, acb->qiov->niov);
 qemu_iovec_clone(>qcrs[acb->child_iter].qiov, acb->qiov,
  acb->qcrs[acb->child_iter].buf);
-bdrv_aio_readv(s->bs[acb->child_iter], acb->sector_num,
+bdrv_aio_readv(s->children[acb->child_iter]->bs, acb->sector_num,
>qcrs[acb->child_iter].qiov, acb->nb_sectors,
quorum_aio_cb, >qcrs[acb->child_iter]);
 
@@ -702,8 +703,8 @@ static BlockAIOCB *quorum_aio_writev(BlockDriverState *bs,
 int i;
 
 for (i = 0; i < s->num_children; i++) {
-acb->qcrs[i].aiocb = bdrv_aio_writev(s->bs[i], sector_num, qiov,
- nb_sectors, _aio_cb,
+acb->qcrs[i].aiocb = bdrv_aio_writev(s->children[i]->bs, sector_num,
+ qiov, nb_sectors, _aio_cb,
  >qcrs[i]);
 }
 
@@ -717,12 +718,12 @@ static int64_t quorum_getlength(BlockDriverState *bs)
 int i;
 
 /* check that all file have the same length */
-result = bdrv_getlength(s->bs[0]);
+result = bdrv_getlength(s->children[0]->bs);
 if (result < 0) {
 return result;
 }
 for (i = 1; i < s->num_children; i++) {
-int64_t value = bdrv_getlength(s->bs[i]);
+int64_t value = bdrv_getlength(s->children[i]->bs);
 if (value < 0) {
 return value;
 }
@@ -741,7 +742,7 @@ static void quorum_invalidate_cache(BlockDriverState *bs, 
Error **errp)
 int i;
 
 for (i = 0; i < s->num_children; i++) {
-bdrv_invalidate_cache(s->bs[i], _err);
+bdrv_invalidate_cache(s->children[i]->bs, _err);
 if (local_err) {
 error_propagate(errp, local_err);
 return;
@@ -762,7 +763,7 @@ static coroutine_fn int quorum_co_flush(BlockDriverState 
*bs)
 error_votes.compare = quorum_64bits_compare;
 
 for (i = 0; i < s->num_children; i++) {
-result = bdrv_co_flush(s->bs[i]);
+result = bdrv_co_flush(s->children[i]->bs);
 result_value.l = result;
 quorum_count_vote(_votes, _value, i);
 }
@@ -782,7 +783,7

[Qemu-block] [PATCH v2 01/16] block: Introduce BDS.file_child

2015-10-01 Thread Kevin Wolf

Store the BdrvChild for bs->file. At this point, bs->file_child->bs just
duplicates the bs->file pointer. Later, it will completely replace it.

Signed-off-by: Kevin Wolf 
Reviewed-by: Max Reitz 
Reviewed-by: Alberto Garcia 
---
 block.c   | 12 +---
 include/block/block_int.h |  1 +
 2 files changed, 10 insertions(+), 3 deletions(-)

diff --git a/block.c b/block.c
index 1f90b47..75afed1 100644
--- a/block.c
+++ b/block.c
@@ -1487,11 +1487,17 @@ static int bdrv_open_inherit(BlockDriverState **pbs, 
const char *filename,
 
 assert(file == NULL);
 bs->open_flags = flags;
-ret = bdrv_open_image(, filename, options, "file",
-  bs, _file, true, _err);
-if (ret < 0) {
+
+bs->file_child = bdrv_open_child(filename, options, "file", bs,
+ _file, true, _err);
+if (local_err) {
+ret = -EINVAL;
 goto fail;
 }
+
+if (bs->file_child) {
+file = bs->file_child->bs;
+}
 }
 
 /* Image format probing */
diff --git a/include/block/block_int.h b/include/block/block_int.h
index 14ad4c3..d0dd93e 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -381,6 +381,7 @@ struct BlockDriverState {
 BlockDriverState *backing_hd;
 BdrvChild *backing_child;
 BlockDriverState *file;
+BdrvChild *file_child;
 
 NotifierList close_notifiers;
 
-- 
1.8.3.1

[Qemu-block] [PATCH 3/3] block: prohibit migration during transactions

2015-10-01 Thread John Snow

Similarly to BlockJobs, prohibit migration at least
during the synchronous phase of a transaction.

In particular, this guards internal and external
snapshots, which are implemented via transaction actions
through blockdev_do_action.

Signed-off-by: John Snow 
---
 blockdev.c | 12 
 1 file changed, 12 insertions(+)

diff --git a/blockdev.c b/blockdev.c
index 32b04b4..231dcf6 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -49,6 +49,7 @@
 #include "qmp-commands.h"
 #include "trace.h"
 #include "sysemu/arch_init.h"
+#include "migration/migration.h"
 
 static const char *const if_name[IF_COUNT] = {
 [IF_NONE] = "none",
@@ -1759,6 +1760,14 @@ void qmp_transaction(TransactionActionList *dev_list, 
Error **errp)
 TransactionActionList *dev_entry = dev_list;
 BlkTransactionState *state, *next;
 Error *local_err = NULL;
+Error *blocker = NULL;
+int ret;
+
+error_setg(, "Block device(s) are in use by a Block Transaction");
+ret = migrate_add_blocker(blocker, errp);
+if (ret < 0) {
+goto cleanup_mig;
+}
 
 QSIMPLEQ_HEAD(snap_bdrv_states, BlkTransactionState) snap_bdrv_states;
 QSIMPLEQ_INIT(_bdrv_states);
@@ -1814,6 +1823,9 @@ exit:
 }
 g_free(state);
 }
+ cleanup_mig:
+migrate_del_blocker(blocker);
+error_free(blocker);
 }
 
 
-- 
2.4.3

[Qemu-block] [PATCH 0/3] block: prohibit migrations during tasks

2015-10-01 Thread John Snow

requires:
[PATCH v2] migration: disallow migrate_add_blocker during migration

We don't want to allow migrations during sensitive
operations such as snapshots or mirroring. In conjunction
with the previous patch, we will also prohibit the
user from starting any block jobs while migrations
are active.

Questions:
 - Are there other actions that need to be guarded?
 - Are there actions here that are guarded, but
   should not be?
 - Is this worth doing at all? libvirt saves us
   in most cases.
 - What other cases besides a fully synchronized
   mirror might be valid in a migration workflow?

Known open issues:
 - Does not guard against incoming migrations,
   only outgoing ones. Is this a problem? Are
   there valid use cases for running jobs on
   a machine before or during an incoming migration?

John Snow (3):
  block: prohibit migration during BlockJobs
  block/mirror: allow migration after sync
  block: prohibit migration during transactions

 block/mirror.c   |  2 ++
 blockdev.c   | 12 
 blockjob.c   | 16 
 include/block/blockjob.h |  8 
 4 files changed, 38 insertions(+)

-- 
2.4.3

Re: [Qemu-block] [Qemu-devel] [PATCH] qtest/ide-test: ppc64be correction for ATAPI tests

2015-10-01 Thread John Snow



On 09/28/2015 01:38 PM, John Snow wrote:
> the 16bit ide data register is LE by definition.
> 
> Signed-off-by: John Snow 
> ---
>  tests/ide-test.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/tests/ide-test.c b/tests/ide-test.c
> index 5594738..b6e9e1a 100644
> --- a/tests/ide-test.c
> +++ b/tests/ide-test.c
> @@ -633,7 +633,7 @@ static void send_scsi_cdb_read10(uint64_t lba, int 
> nblocks)
>  
>  /* Send Packet */
>  for (i = 0; i < sizeof(Read10CDB)/2; i++) {
> -outw(IDE_BASE + reg_data, ((uint16_t *))[i]);
> +outw(IDE_BASE + reg_data, cpu_to_le16(((uint16_t *))[i]));
>  }
>  }
>  
> @@ -733,7 +733,7 @@ static void cdrom_pio_impl(int nblocks)
>  size_t offset = i * (limit / 2);
>  size_t rem = (rxsize / 2) - offset;
>  for (j = 0; j < MIN((limit / 2), rem); j++) {
> -rx[offset + j] = inw(IDE_BASE + reg_data);
> +rx[offset + j] = le16_to_cpu(inw(IDE_BASE + reg_data));
>  }
>  ide_wait_intr(IDE_PRIMARY_IRQ);
>  }
> 

Tentatively staged. Barring objections, I will send a pull request for
this tomorrow, thanks.

https://github.com/jnsnow/qemu/commits/ide
https://github.com/jnsnow/qemu.git

--js

[Qemu-block] [PATCH 2/3] block/mirror: allow migration after sync

2015-10-01 Thread John Snow

After the mirroring blockjob reaches synchronization,
allow migration to resume.

Signed-off-by: John Snow 
---
 block/mirror.c   | 2 ++
 blockjob.c   | 5 +
 include/block/blockjob.h | 8 
 3 files changed, 15 insertions(+)

diff --git a/block/mirror.c b/block/mirror.c
index a258926..5eb469a 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -415,6 +415,7 @@ static void coroutine_fn mirror_run(void *opaque)
 /* Report BLOCK_JOB_READY and wait for complete. */
 block_job_event_ready(>common);
 s->synced = true;
+block_job_allow_migration(>common);
 while (!block_job_is_cancelled(>common) && !s->should_complete) {
 block_job_yield(>common);
 }
@@ -540,6 +541,7 @@ static void coroutine_fn mirror_run(void *opaque)
 if (!s->synced) {
 block_job_event_ready(>common);
 s->synced = true;
+block_job_allow_migration(>common);
 }
 
 should_complete = s->should_complete ||
diff --git a/blockjob.c b/blockjob.c
index b5849b3..bcb4fca 100644
--- a/blockjob.c
+++ b/blockjob.c
@@ -95,6 +95,11 @@ void block_job_release(BlockDriverState *bs)
 g_free(job);
 }
 
+void block_job_allow_migration(BlockJob *job)
+{
+migrate_del_blocker(job->blocker);
+}
+
 void block_job_completed(BlockJob *job, int ret)
 {
 BlockDriverState *bs = job->bs;
diff --git a/include/block/blockjob.h b/include/block/blockjob.h
index dd9d5e6..2278484 100644
--- a/include/block/blockjob.h
+++ b/include/block/blockjob.h
@@ -147,6 +147,14 @@ void *block_job_create(const BlockJobDriver *driver, 
BlockDriverState *bs,
void *opaque, Error **errp);
 
 /**
+ * block_job_allow_migration:
+ * @job: The job to modify to lift its restriction on migrations.
+ *
+ * Lift this job's restriction on prohibiting outgoing migrations.
+ */
+void block_job_allow_migration(BlockJob *job);
+
+/**
  * block_job_sleep_ns:
  * @job: The job that calls the function.
  * @clock: The clock to sleep on.
-- 
2.4.3

[Qemu-block] [PATCH 1/3] block: prohibit migration during BlockJobs

2015-10-01 Thread John Snow

Unless we can prove this to be safe for specific cases,
the default should be to prohibit migration during BlockJobs.

In conjunction with
"migration: disallow_migrate_add_blocker during migration",
this should be sufficient to disallow the blockjob from starting
in the event of an in-progress migration.

Signed-off-by: John Snow 
---
 blockjob.c | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/blockjob.c b/blockjob.c
index 62bb906..b5849b3 100644
--- a/blockjob.c
+++ b/blockjob.c
@@ -35,12 +35,14 @@
 #include "qmp-commands.h"
 #include "qemu/timer.h"
 #include "qapi-event.h"
+#include "migration/migration.h"
 
 void *block_job_create(const BlockJobDriver *driver, BlockDriverState *bs,
int64_t speed, BlockCompletionFunc *cb,
void *opaque, Error **errp)
 {
 BlockJob *job;
+int ret;
 
 if (bs->job) {
 error_setg(errp, QERR_DEVICE_IN_USE, bdrv_get_device_name(bs));
@@ -71,6 +73,14 @@ void *block_job_create(const BlockJobDriver *driver, 
BlockDriverState *bs,
 return NULL;
 }
 }
+
+/* Re-use the op-blocker as a migration blocker */
+ret = migrate_add_blocker(job->blocker, errp);
+if (ret < 0) {
+block_job_release(bs);
+return NULL;
+}
+
 return job;
 }
 
@@ -80,6 +90,7 @@ void block_job_release(BlockDriverState *bs)
 
 bs->job = NULL;
 bdrv_op_unblock_all(bs, job->blocker);
+migrate_del_blocker(job->blocker);
 error_free(job->blocker);
 g_free(job);
 }
-- 
2.4.3

Re: [Qemu-block] [PATCH 3/3] block: prohibit migration during transactions

2015-10-01 Thread Paolo Bonzini



On 01/10/2015 18:34, John Snow wrote:
> +
> +error_setg(, "Block device(s) are in use by a Block 
> Transaction");

s/Block Transaction/transaction command/

But how can migration start during a transaction?

> +ret = migrate_add_blocker(blocker, errp);
> +if (ret < 0) {
> +goto cleanup_mig;
> +}
>  
>  QSIMPLEQ_HEAD(snap_bdrv_states, BlkTransactionState) snap_bdrv_states;
>  QSIMPLEQ_INIT(_bdrv_states);
> @@ -1814,6 +1823,9 @@ exit:
>  }
>  g_free(state);
>  }
> + cleanup_mig:
> +migrate_del_blocker(blocker);
> +error_free(blocker);

[Qemu-block] [PULL 0/1] Block job patches

2015-10-01 Thread Jeff Cody

The following changes since commit fa500928ad9da6dd570918e3dfca13c029af07a8:

  Merge remote-tracking branch 'remotes/juanquintela/tags/migration/20150930' 
into staging (2015-10-01 10:49:38 +0100)

are available in the git repository at:


  g...@github.com:codyprime/qemu-kvm-jtc.git tags/block-pull-request

for you to fetch changes up to 5279efebcf8f8fbf2ed2feed63cdb9d375c7cd07:

  block: mirror - fix full sync mode when target does not support zero init 
(2015-10-01 15:02:21 -0400)


Block job patches


Jeff Cody (1):
  block: mirror - fix full sync mode when target does not support zero
init

 block/mirror.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

-- 
1.9.3

[Qemu-block] [PULL 1/1] block: mirror - fix full sync mode when target does not support zero init

2015-10-01 Thread Jeff Cody

During mirror, if the target device does not support zero init, a
mirror may result in a corrupted image for sync="full" mode.

This is due to how the initial dirty bitmap is set up prior to copying
data - we did not mark sectors as dirty that are unallocated.  This
means those unallocated sectors are skipped over on the target, and for
a device without zero init, invalid data may reside in those holes.

If both of the following conditions are true, then we will explicitly
mark all sectors as dirty:

1.) sync = "full"
2.) bdrv_has_zero_init(target) == false

If the target does support zero init, but a target image is passed in
with data already present (i.e. an "existing" image), it is assumed the
data present in the existing image is valid data for those sectors.

Reviewed-by: Paolo Bonzini 
Message-id: 
91ed4bc5bda7e2b09eb508b07c83f4071fe0b3c9.1443705220.git.jc...@redhat.com
Signed-off-by: Jeff Cody 
---
 block/mirror.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/block/mirror.c b/block/mirror.c
index a258926..87928ab 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -455,6 +455,8 @@ static void coroutine_fn mirror_run(void *opaque)
 if (!s->is_none_mode) {
 /* First part, loop on the sectors and initialize the dirty bitmap.  */
 BlockDriverState *base = s->base;
+bool mark_all_dirty = s->base == NULL && 
!bdrv_has_zero_init(s->target);
+
 for (sector_num = 0; sector_num < end; ) {
 /* Just to make sure we are not exceeding int limit. */
 int nb_sectors = MIN(INT_MAX >> BDRV_SECTOR_BITS,
@@ -477,7 +479,7 @@ static void coroutine_fn mirror_run(void *opaque)
 }
 
 assert(n > 0);
-if (ret == 1) {
+if (ret == 1 || mark_all_dirty) {
 bdrv_set_dirty_bitmap(s->dirty_bitmap, sector_num, n);
 }
 sector_num += n;
-- 
1.9.3

Re: [Qemu-block] [PATCH v3] block: mirror - fix full sync mode when target does not support zero init

2015-10-01 Thread Jeff Cody

On Thu, Oct 01, 2015 at 09:14:51AM -0400, Jeff Cody wrote:
> During mirror, if the target device does not support zero init, a
> mirror may result in a corrupted image for sync="full" mode.
> 
> This is due to how the initial dirty bitmap is set up prior to copying
> data - we did not mark sectors as dirty that are unallocated.  This
> means those unallocated sectors are skipped over on the target, and for
> a device without zero init, invalid data may reside in those holes.
> 
> If both of the following conditions are true, then we will explicitly
> mark all sectors as dirty:
> 
> 1.) sync = "full"
> 2.) bdrv_has_zero_init(target) == false
> 
> If the target does support zero init, but a target image is passed in
> with data already present (i.e. an "existing" image), it is assumed the
> data present in the existing image is valid data for those sectors.
> 
> Signed-off-by: Jeff Cody 
> ---
>  block/mirror.c | 8 +---
>  1 file changed, 5 insertions(+), 3 deletions(-)
> 
> diff --git a/block/mirror.c b/block/mirror.c
> index a258926..ce367e0 100644
> --- a/block/mirror.c
> +++ b/block/mirror.c
> @@ -455,6 +455,8 @@ static void coroutine_fn mirror_run(void *opaque)
>  if (!s->is_none_mode) {
>  /* First part, loop on the sectors and initialize the dirty bitmap.  
> */
>  BlockDriverState *base = s->base;
> +bool mark_all_dirty = s->base == NULL && 
> !bdrv_has_zero_init(s->target);
> +
>  for (sector_num = 0; sector_num < end; ) {
>  /* Just to make sure we are not exceeding int limit. */
>  int nb_sectors = MIN(INT_MAX >> BDRV_SECTOR_BITS,
> @@ -477,7 +479,7 @@ static void coroutine_fn mirror_run(void *opaque)
>  }
>  
>  assert(n > 0);
> -if (ret == 1) {
> +if (ret == 1 || mark_all_dirty) {
>  bdrv_set_dirty_bitmap(s->dirty_bitmap, sector_num, n);
>  }
>  sector_num += n;
> @@ -767,8 +769,8 @@ void mirror_start(BlockDriverState *bs, BlockDriverState 
> *target,
>  base = mode == MIRROR_SYNC_MODE_TOP ? bs->backing_hd : NULL;
>  mirror_start_job(bs, target, replaces,
>   speed, granularity, buf_size,
> - on_source_error, on_target_error, unmap, cb, opaque, 
> errp,
> - _job_driver, is_none_mode, base);
> + on_source_error, on_target_error, unmap,
> + cb, opaque, errp, _job_driver, is_none_mode, 
> base);
>  }
>  
>  void commit_active_start(BlockDriverState *bs, BlockDriverState *base,
> -- 
> 1.9.3
>

Applied to my block branch (with the last hunk removed):

git git://github.com/codyprime/qemu-kvm-jtc.git block

Thanks,
Jeff

Re: [Qemu-block] [Qemu-devel] [PATCH 3/3] block: mirror - zero unallocated target sectors when zero init not present

2015-10-01 Thread Kevin Wolf

Am 30.09.2015 um 16:43 hat Jeff Cody geschrieben:
> On Tue, Sep 29, 2015 at 12:52:33PM +0200, Paolo Bonzini wrote:
> > 
> > 
> > On 29/09/2015 11:35, Kevin Wolf wrote:
> > > The caller could be copying the backing file in the background and it
> > > may not yet be finished.
> > 
> > Yes, and this is permitted (the destination file of mirroring is opened
> > with BDRV_O_NO_BACKING).
> > 
> > Some more assumptions arise when block-job-complete is invoked, because
> > at this point the content must not change under the guest's feet.
> > Because block-job-complete does bdrv_open_backing_file on the
> > destination, for sync!='full' it means that either 1) the image has no
> > backing file, but it starts with the content of the backing file or 2)
> > the image's backing file is complete at the time block-job-complete is
> > invoked.
> > 
> > For mode!='existing' it is always case (2), and the backing file is
> > complete all the time; for mode=='existing' the backing file could be
> > copied in the background, and case (1) could happen as well.  An example
> > of case (1) is replacing sync=='full' with a "fast copy" of the backing
> > file (e.g. via btrfs's COW copies) and sync=='top'.  This should be valid.
> 
> One issue is that QEMU will do mode!='existing' && sync!='full' for
> drivers that do not support backing files (raw host devices, for
> instance).  We could refuse to start a mirror in the case of:
> 
> mode != 'existing' && sync != 'full' && !target->drv->supports_backing
> 
> Alternatively, we could do the two-pass zero approach in this patch,
> except under the following conditions:
> 
> sync == 'full' || (mode != 'existing' && !target->drv->supports_backing)
> 
> (In the sync == 'full' case, we could also just queue all sectors, as
> Kevin suggested)

I don't think that mode == 'existing' should play any role in the
behaviour of any block job. There's no reason why doing an external
'qemu-img create' should make it do anything different compared to
images created using the monitor.

Kevin

Re: [Qemu-block] [PATCH 09/16] block: Split bdrv_move_feature_fields()

2015-10-01 Thread Max Reitz

On 17.09.2015 15:48, Kevin Wolf wrote:
> After bdrv_swap(), some fields must be moved back to their original BDS
> to compensate for the effects that a swap of the contents of the objects
> has while keeping the old addresses. Other fields must be moved back
> because they should logically be moved and must stay on top
> 
> When replacing bdrv_swap() with operations changing the pointers in the
> parents, we only need the latter and must avoid swapping the former.
> Split the function accordingly.
> 
> Signed-off-by: Kevin Wolf 
> ---
>  block.c | 12 
>  1 file changed, 12 insertions(+)

Reviewed-by: Max Reitz 

(and an ACK to "Op blockers are a mess.")



signature.asc
Description: OpenPGP digital signature

Re: [Qemu-block] [PATCH 13/16] block: Implement bdrv_append() without bdrv_swap()

2015-10-01 Thread Kevin Wolf

Am 30.09.2015 um 16:45 hat Max Reitz geschrieben:
> On 29.09.2015 15:51, Kevin Wolf wrote:
> > Am 23.09.2015 um 18:36 hat Max Reitz geschrieben:
> >> On 17.09.2015 15:48, Kevin Wolf wrote:
> >>>  void bdrv_append(BlockDriverState *bs_new, BlockDriverState *bs_top)
> >>>  {
> >>> -bdrv_swap(bs_new, bs_top);
> >>> +assert(!bdrv_requests_pending(bs_top));
> >>> +assert(!bdrv_requests_pending(bs_new));
> >>> +
> >>> +bdrv_ref(bs_top);
> >>> +change_parent_backing_link(bs_top, bs_new);
> >>> +
> >>> +/* Some fields always stay on top of the backing file chain */
> >>> +swap_feature_fields(bs_top, bs_new);
> >>> +
> >>> +bdrv_set_backing_hd(bs_new, bs_top);
> >>> +bdrv_unref(bs_top);
> >>>  
> >>> -/* The contents of 'tmp' will become bs_top, as we are
> >>> - * swapping bs_new and bs_top contents. */
> >>> -bdrv_set_backing_hd(bs_top, bs_new);
> >>> +/* bs_new is now referenced by its new parents, we don't need the
> >>> + * additional reference any more. */
> >>> +bdrv_unref(bs_new);
> >>>  }
> >>
> >> Before, all pointers to @bs_new were moved to @bs_top. Now, they stay at
> >> @bs_new. I suppose we are assuming there are no pointers to @bs_new,
> >> should we assert that, and/or point it out in the documentation?
> > 
> > How would you assert something like this?
> 
> Of course, I have no idea.
> 
> >   Also, I think it's currently
> > true, but there's no reason why it should stay so. The important part is
> > just that it's true while applying the patch because the semantics
> > changes. Once it's applied, we have sane behaviour and can make use of
> > it.
> 
> The thing is that this is exactly the reason for the bug Berto found.
> external_snapshot_commit() keeps a reference to state->new_bs (@bs_new)
> and uses it afterwards for bdrv_reopen(), whereas it should be using
> state->old_bs (@bs_top) after this series.

Yes, the interface changes. That means that you need to review all
callers. There aren't that many, so it's doable.

Kevin


pgpcRDwR6Ej22.pgp
Description: PGP signature

Re: [Qemu-block] [Qemu-devel] [PATCH 3/3] block: mirror - zero unallocated target sectors when zero init not present

2015-10-01 Thread Jeff Cody

On Wed, Sep 30, 2015 at 05:26:28PM +0200, Kevin Wolf wrote:
> Am 30.09.2015 um 16:43 hat Jeff Cody geschrieben:
> > On Tue, Sep 29, 2015 at 12:52:33PM +0200, Paolo Bonzini wrote:
> > > 
> > > 
> > > On 29/09/2015 11:35, Kevin Wolf wrote:
> > > > The caller could be copying the backing file in the background and it
> > > > may not yet be finished.
> > > 
> > > Yes, and this is permitted (the destination file of mirroring is opened
> > > with BDRV_O_NO_BACKING).
> > > 
> > > Some more assumptions arise when block-job-complete is invoked, because
> > > at this point the content must not change under the guest's feet.
> > > Because block-job-complete does bdrv_open_backing_file on the
> > > destination, for sync!='full' it means that either 1) the image has no
> > > backing file, but it starts with the content of the backing file or 2)
> > > the image's backing file is complete at the time block-job-complete is
> > > invoked.
> > > 
> > > For mode!='existing' it is always case (2), and the backing file is
> > > complete all the time; for mode=='existing' the backing file could be
> > > copied in the background, and case (1) could happen as well.  An example
> > > of case (1) is replacing sync=='full' with a "fast copy" of the backing
> > > file (e.g. via btrfs's COW copies) and sync=='top'.  This should be valid.
> > 
> > One issue is that QEMU will do mode!='existing' && sync!='full' for
> > drivers that do not support backing files (raw host devices, for
> > instance).  We could refuse to start a mirror in the case of:
> > 
> > mode != 'existing' && sync != 'full' && !target->drv->supports_backing
> > 
> > Alternatively, we could do the two-pass zero approach in this patch,
> > except under the following conditions:
> > 
> > sync == 'full' || (mode != 'existing' && !target->drv->supports_backing)
> > 
> > (In the sync == 'full' case, we could also just queue all sectors, as
> > Kevin suggested)
> 
> I don't think that mode == 'existing' should play any role in the
> behaviour of any block job. There's no reason why doing an external
> 'qemu-img create' should make it do anything different compared to
> images created using the monitor.
>

As a general rule for blockjobs, I disagree.

Right away, there is a key difference: we don't know that the image is
(or should be) empty.  With mode != "existing", we know the image
should be empty, since we just created it (although for a host device,
it may have extraneous data in it).  So I think it is not so much what
we can assume about an existing image, as it is what we cannot assume.
And that could potentially influence some block jobs.

That said, I think I agree with simplifying the mirror case, as you
suggested earlier.  Namely, just adding all sectors into the dirty
bitmap when sync=='full'. That approach obviates the need for patches
1 & 2, and makes the single resulting patch pretty small.

The case of: 
 sync!='full' && 
 mode!='existing' && 
 !target->drv->supports_backing &&
 !bdrv_has_zero_init(target)

will result in an image with possible extraneous data, but I think I
agree with Paolo that it is either A) unimportant for the use case, or
B) user error.

Re: [Qemu-block] [PATCH 3/3] block: mirror - zero unallocated target sectors when zero init not present

2015-10-01 Thread Kevin Wolf

Am 30.09.2015 um 17:11 hat Jeff Cody geschrieben:
> On Mon, Sep 28, 2015 at 04:23:16PM +0100, Stefan Hajnoczi wrote:
> > On Sun, Sep 27, 2015 at 11:29:18PM -0400, Jeff Cody wrote:
> > > +if (s->zero_cycle) {
> > > +ret = bdrv_get_block_status(s->target, sector_num, nb_sectors, 
> > > );
> > > +if (!(ret & BDRV_BLOCK_ZERO)) {
> > > +bdrv_aio_write_zeroes(s->target, sector_num, op->nb_sectors,
> > > +  s->unmap ? BDRV_REQ_MAY_UNMAP : 0,
> > > +  mirror_write_complete, op);
> > 
> > mirror_write_complete will advance s->common.offset.  Won't the progress
> > be incorrect if we do that for both zeroing and regular mirroring?
> 
> Good point.  However, Is it really wrong to count it in the progress,
> if we do the zero mirror pass?  I

It's wrong as long as you increment the progress (offset), but don't
consider it in the expected value for completion (length).

Kevin

Re: [Qemu-block] [PATCH] migration: disallow migrate_add_blocker during migration

2015-10-01 Thread Kevin Wolf

Am 29.09.2015 um 22:20 hat John Snow geschrieben:
> If a migration is already in progress and somebody attempts
> to add a migration blocker, this should rightly fail.
> 
> Add an errp parameter and a retcode return value to migrate_add_blocker.
> 
> This is part one of two for a solution to prohibit e.g. block jobs
> from running concurrently with migration.
> 
> Signed-off-by: John Snow 

Through which tree should this be merged?

>  block/qcow.c  |  5 -
>  block/vdi.c   |  5 -
>  block/vhdx.c  |  5 -
>  block/vmdk.c  | 13 +
>  block/vpc.c   |  9 ++---
>  block/vvfat.c | 19 +++
>  hw/9pfs/virtio-9p.c   | 15 +++
>  hw/misc/ivshmem.c |  5 -
>  hw/scsi/vhost-scsi.c  | 11 +++
>  hw/virtio/vhost.c | 31 +++
>  include/migration/migration.h |  4 +++-
>  migration/migration.c | 32 
>  stubs/migr-blocker.c  |  3 ++-
>  target-i386/kvm.c |  6 +-
>  14 files changed, 117 insertions(+), 46 deletions(-)
> 
> diff --git a/block/qcow.c b/block/qcow.c
> index 6e35db1..1b82dec 100644
> --- a/block/qcow.c
> +++ b/block/qcow.c
> @@ -236,7 +236,10 @@ static int qcow_open(BlockDriverState *bs, QDict 
> *options, int flags,
>  error_setg(>migration_blocker, "The qcow format used by node '%s' "
> "does not support live migration",
> bdrv_get_device_or_node_name(bs));
> -migrate_add_blocker(s->migration_blocker);
> +ret = migrate_add_blocker(s->migration_blocker, errp);
> +if (ret < 0) {
> +goto fail;

This error path leaks s->migration_blocker.

> +}
>  
>  qemu_co_mutex_init(>lock);
>  return 0;
> diff --git a/block/vdi.c b/block/vdi.c
> index 062a654..95b2690 100644
> --- a/block/vdi.c
> +++ b/block/vdi.c
> @@ -505,7 +505,10 @@ static int vdi_open(BlockDriverState *bs, QDict 
> *options, int flags,
>  error_setg(>migration_blocker, "The vdi format used by node '%s' "
> "does not support live migration",
> bdrv_get_device_or_node_name(bs));
> -migrate_add_blocker(s->migration_blocker);
> +ret = migrate_add_blocker(s->migration_blocker, errp);
> +if (ret < 0) {
> +goto fail_free_bmap;

Same.

> +}
>  
>  qemu_co_mutex_init(>write_lock);
>  
> diff --git a/block/vhdx.c b/block/vhdx.c
> index d3bb1bd..5bebe34 100644
> --- a/block/vhdx.c
> +++ b/block/vhdx.c
> @@ -1005,7 +1005,10 @@ static int vhdx_open(BlockDriverState *bs, QDict 
> *options, int flags,
>  error_setg(>migration_blocker, "The vhdx format used by node '%s' "
> "does not support live migration",
> bdrv_get_device_or_node_name(bs));
> -migrate_add_blocker(s->migration_blocker);
> +ret = migrate_add_blocker(s->migration_blocker, errp);
> +if (ret < 0) {
> +goto fail;
> +}
>  
>  return 0;
>  fail:

This one happens to be okay because VHDX uses the close function in the
failure path (and at last up to now that function even seems to cope
with half-initialised images - it just feels a bit brittle).

> diff --git a/block/vmdk.c b/block/vmdk.c
> index be0d640..09dcf6b 100644
> --- a/block/vmdk.c
> +++ b/block/vmdk.c
> @@ -943,15 +943,20 @@ static int vmdk_open(BlockDriverState *bs, QDict 
> *options, int flags,
>  if (ret) {
>  goto fail;
>  }
> -s->cid = vmdk_read_cid(bs, 0);
> -s->parent_cid = vmdk_read_cid(bs, 1);

Why do you move this code? It doesn't seem to do anything that you would
need to undo on failure.

> -qemu_co_mutex_init(>lock);
>
>  /* Disable migration when VMDK images are used */
>  error_setg(>migration_blocker, "The vmdk format used by node '%s' "
> "does not support live migration",
> bdrv_get_device_or_node_name(bs));
> -migrate_add_blocker(s->migration_blocker);
> +ret = migrate_add_blocker(s->migration_blocker, errp);
> +if (ret < 0) {
> +goto fail;
> +}

But the usual leak is still there. :-)

> +
> +s->cid = vmdk_read_cid(bs, 0);
> +s->parent_cid = vmdk_read_cid(bs, 1);
> +qemu_co_mutex_init(>lock);
> +
>  g_free(buf);
>  return 0;
>  
> diff --git a/block/vpc.c b/block/vpc.c
> index 2b3b518..4c60942 100644
> --- a/block/vpc.c
> +++ b/block/vpc.c
> @@ -325,13 +325,16 @@ static int vpc_open(BlockDriverState *bs, QDict 
> *options, int flags,
>  #endif
>  }
>  
> -qemu_co_mutex_init(>lock);
> -
>  /* Disable migration when VHD images are used */
>  error_setg(>migration_blocker, "The vpc format used by node '%s' "
> "does not support live migration",
> bdrv_get_device_or_node_name(bs));
> -migrate_add_blocker(s->migration_blocker);
> +ret =

Re: [Qemu-block] [PATCH] block/raw-posix: Open file descriptor O_RDWR to work around glibc posix_fallocate emulation issue.

2015-10-01 Thread Kevin Wolf

Am 29.09.2015 um 17:54 hat Richard W.M. Jones geschrieben:
>   https://bugzilla.redhat.com/show_bug.cgi?id=1265196
> 
> The following command fails on an NFS mountpoint:
> 
>   $ qemu-img create -f qcow2 -o preallocation=falloc disk.img 262144
>   Formatting 'disk.img', fmt=qcow2 size=262144 encryption=off 
> cluster_size=65536 preallocation='falloc' lazy_refcounts=off
>   qemu-img: disk.img: Could not preallocate data for the new file: Bad file 
> descriptor
> 
> The reason turns out to be because NFS doesn't support the
> posix_fallocate call.  glibc emulates it instead.  However glibc's
> emulation involves using the pread(2) syscall.  The pread syscall
> fails with EBADF if the file descriptor is opened without the read
> open-flag (ie. open (..., O_WRONLY)).
> 
> I contacted glibc upstream about this, and their response is here:
> 
>   https://bugzilla.redhat.com/show_bug.cgi?id=1265196#c9
> 
> There are two possible fixes: Use Linux fallocate directly, or (this
> fix) work around the problem in qemu by opening the file with O_RDWR
> instead of O_WRONLY.
> 
> Signed-off-by: Richard W.M. Jones 
> BZ: https://bugzilla.redhat.com/show_bug.cgi?id=1265196

Thanks, applied to the block branch.

Kevin

Re: [Qemu-block] [PATCH 3/3] block: mirror - zero unallocated target sectors when zero init not present

2015-10-01 Thread Jeff Cody

On Mon, Sep 28, 2015 at 04:23:16PM +0100, Stefan Hajnoczi wrote:
> On Sun, Sep 27, 2015 at 11:29:18PM -0400, Jeff Cody wrote:
> > +if (s->zero_cycle) {
> > +ret = bdrv_get_block_status(s->target, sector_num, nb_sectors, 
> > );
> > +if (!(ret & BDRV_BLOCK_ZERO)) {
> > +bdrv_aio_write_zeroes(s->target, sector_num, op->nb_sectors,
> > +  s->unmap ? BDRV_REQ_MAY_UNMAP : 0,
> > +  mirror_write_complete, op);
> 
> mirror_write_complete will advance s->common.offset.  Won't the progress
> be incorrect if we do that for both zeroing and regular mirroring?

Good point.  However, Is it really wrong to count it in the progress,
if we do the zero mirror pass?  I

Re: [Qemu-block] [PATCH 15/16] block: Add and use bdrv_replace_in_backing_chain()

2015-10-01 Thread Max Reitz

On 17.09.2015 15:48, Kevin Wolf wrote:
> This cleans up the mess we left behind in the mirror code after the
> previous patch. Instead of using bdrv_swap(), just change pointers.
> 
> The interface change of the mirror job that callers must consider is
> that after job completion, their local BDS pointers still point to the
> same node now. qemu-img must change its code accordingly (which makes it
> easier to understand); the other callers stays unchanged because after
> completion they don't do anything with the BDS, but just with the job,
> and the job is still owned by the source BDS.
> 
> Signed-off-by: Kevin Wolf 
> ---
>  block.c   | 32 +++-
>  block/mirror.c| 23 +++
>  include/block/block.h |  4 +++-
>  qemu-img.c| 16 
>  4 files changed, 49 insertions(+), 26 deletions(-)

Reviewed-by: Max Reitz 



signature.asc
Description: OpenPGP digital signature

Re: [Qemu-block] [Qemu-devel] [PATCH 3/3] block: mirror - zero unallocated target sectors when zero init not present

2015-10-01 Thread Jeff Cody

On Tue, Sep 29, 2015 at 12:52:33PM +0200, Paolo Bonzini wrote:
> 
> 
> On 29/09/2015 11:35, Kevin Wolf wrote:
> > The caller could be copying the backing file in the background and it
> > may not yet be finished.
> 
> Yes, and this is permitted (the destination file of mirroring is opened
> with BDRV_O_NO_BACKING).
> 
> Some more assumptions arise when block-job-complete is invoked, because
> at this point the content must not change under the guest's feet.
> Because block-job-complete does bdrv_open_backing_file on the
> destination, for sync!='full' it means that either 1) the image has no
> backing file, but it starts with the content of the backing file or 2)
> the image's backing file is complete at the time block-job-complete is
> invoked.
> 
> For mode!='existing' it is always case (2), and the backing file is
> complete all the time; for mode=='existing' the backing file could be
> copied in the background, and case (1) could happen as well.  An example
> of case (1) is replacing sync=='full' with a "fast copy" of the backing
> file (e.g. via btrfs's COW copies) and sync=='top'.  This should be valid.

One issue is that QEMU will do mode!='existing' && sync!='full' for
drivers that do not support backing files (raw host devices, for
instance).  We could refuse to start a mirror in the case of:

mode != 'existing' && sync != 'full' && !target->drv->supports_backing

Alternatively, we could do the two-pass zero approach in this patch,
except under the following conditions:

sync == 'full' || (mode != 'existing' && !target->drv->supports_backing)

(In the sync == 'full' case, we could also just queue all sectors, as
Kevin suggested)

> 
> Of course, if block-job-complete is never called, all bets are off.
> 
> > We don't do this now, but assuming
> > the promise means that we could e.g. read the backing file in order to
> > optimise sparseness in the target (if it happens to have the same data
> > as its backing file) - and I don't think this would be valid with our
> > currently documented API.
> 
> Accessing the backing file of the target is never valid indeed.
> 
> > Anyway, the conclusion that we shouldn't zero unrelated sectors is still
> > right. But it's because we document which sectors we copy, not because
> > we can make assumptions about the user.
> 
> Right.
> 
> Paolo

Re: [Qemu-block] [PATCH v5 24/38] blockdev: Pull out blockdev option extraction

2015-10-01 Thread Alberto Garcia

On Fri 18 Sep 2015 05:22:59 PM CEST, Max Reitz  wrote:
> Extract some of the blockdev option extraction code from blockdev_init()
> into its own function. This simplifies blockdev_init() and will allow
> reusing the code in a different function added in a follow-up patch.
>
> Signed-off-by: Max Reitz 

Reviewed-by: Alberto Garcia 

Berto

Re: [Qemu-block] [Qemu-devel] [PATCH 3/3] block: mirror - zero unallocated target sectors when zero init not present

2015-10-01 Thread Paolo Bonzini

On 30/09/2015 16:43, Jeff Cody wrote:
> One issue is that QEMU will do mode!='existing' && sync!='full' for
> drivers that do not support backing files (raw host devices, for
> instance).

Yup, this can be used to get a mirror of future operations (the idea was
to support things such as antiviruses, where the antivirus connects to
QEMU via NBD).

I think it's okay to ignore this case, since the resulting image is
bogus anyway.  What is interesting is only the stream of writes which
you can get through NBD.

Paolo

Re: [Qemu-block] [PATCH 13/16] block: Implement bdrv_append() without bdrv_swap()

2015-10-01 Thread Max Reitz

On 29.09.2015 15:51, Kevin Wolf wrote:
> Am 23.09.2015 um 18:36 hat Max Reitz geschrieben:
>> On 17.09.2015 15:48, Kevin Wolf wrote:
>>> Remember all parent nodes and just change the pointers there instead of
>>> swapping the contents of the BlockDriverState.
>>>
>>> Handling of snapshot=on must be moved further down in bdrv_open()
>>> because *pbs (which is the bs pointer in the BlockBackend) must already
>>> be set before bdrv_append() is called. Otherwise bdrv_append() changes
>>> the BB's pointer to the temporary snapshot, but bdrv_open() overwrites
>>> it with the read-only original image.
>>>
>>> Signed-off-by: Kevin Wolf 
> 
>>> @@ -2155,6 +2157,42 @@ void bdrv_swap(BlockDriverState *bs_new, 
>>> BlockDriverState *bs_old)
>>>  bdrv_rebind(bs_old);
>>>  }
>>>  
>>> +static void change_parent_backing_link(BlockDriverState *from,
>>> +   BlockDriverState *to)
>>> +{
>>> +BdrvChild *c, *next;
>>> +
>>> +QLIST_FOREACH_SAFE(c, >parents, next_parent, next) {
>>> +assert(c->role != _backing);
>>> +c->bs = to;
>>> +QLIST_REMOVE(c, next_parent);
>>> +QLIST_INSERT_HEAD(>parents, c, next_parent);
>>
>> This drops a reference from the parent BDS to @from, and adds a new one
>> from the parent BDS to @to. However, this is not reflected here.
> 
> You mean bdrv_ref(to); bdrv_unref(from); ?

Yes.

>>> +}
>>> +if (from->blk) {
>>> +blk_set_bs(from->blk, to);
>>> +if (!to->device_list.tqe_prev) {
>>> +QTAILQ_INSERT_BEFORE(from, to, device_list);
>>> +}
>>> +QTAILQ_REMOVE(_states, from, device_list);
>>> +}
>>> +}
>>> +
>>> +static void swap_feature_fields(BlockDriverState *bs_top,
>>> +BlockDriverState *bs_new)
>>> +{
>>> +BlockDriverState tmp;
>>> +
>>> +bdrv_move_feature_fields(, bs_top);
>>> +bdrv_move_feature_fields(bs_top, bs_new);
>>> +bdrv_move_feature_fields(bs_new, );
>>> +
>>> +assert(!bs_new->io_limits_enabled);
>>> +if (bs_top->io_limits_enabled) {
>>> +bdrv_io_limits_enable(bs_new, throttle_group_get_name(bs_top));
>>> +bdrv_io_limits_disable(bs_top);
>>> +}
>>> +}
>>> +
>>>  /*
>>>   * Add new bs contents at the top of an image chain while the chain is
>>>   * live, while keeping required fields on the top layer.
>>> @@ -2165,14 +2203,29 @@ void bdrv_swap(BlockDriverState *bs_new, 
>>> BlockDriverState *bs_old)
>>>   * bs_new must not be attached to a BlockBackend.
>>>   *
>>>   * This function does not create any image files.
>>> + *
>>> + * bdrv_append() takes ownership of a bs_new reference and unrefs it 
>>> because
>>> + * that's what the callers commonly need. bs_new will be referenced by the 
>>> old
>>> + * parents of bs_top after bdrv_append() returns. If the caller needs to 
>>> keep a
>>> + * reference of its own, it must call bdrv_ref().
>>>   */
>>>  void bdrv_append(BlockDriverState *bs_new, BlockDriverState *bs_top)
>>>  {
>>> -bdrv_swap(bs_new, bs_top);
>>> +assert(!bdrv_requests_pending(bs_top));
>>> +assert(!bdrv_requests_pending(bs_new));
>>> +
>>> +bdrv_ref(bs_top);
>>> +change_parent_backing_link(bs_top, bs_new);
>>> +
>>> +/* Some fields always stay on top of the backing file chain */
>>> +swap_feature_fields(bs_top, bs_new);
>>> +
>>> +bdrv_set_backing_hd(bs_new, bs_top);
>>> +bdrv_unref(bs_top);
>>>  
>>> -/* The contents of 'tmp' will become bs_top, as we are
>>> - * swapping bs_new and bs_top contents. */
>>> -bdrv_set_backing_hd(bs_top, bs_new);
>>> +/* bs_new is now referenced by its new parents, we don't need the
>>> + * additional reference any more. */
>>> +bdrv_unref(bs_new);
>>>  }
>>
>> Before, all pointers to @bs_new were moved to @bs_top. Now, they stay at
>> @bs_new. I suppose we are assuming there are no pointers to @bs_new,
>> should we assert that, and/or point it out in the documentation?
> 
> How would you assert something like this?

Of course, I have no idea.

>   Also, I think it's currently
> true, but there's no reason why it should stay so. The important part is
> just that it's true while applying the patch because the semantics
> changes. Once it's applied, we have sane behaviour and can make use of
> it.

The thing is that this is exactly the reason for the bug Berto found.
external_snapshot_commit() keeps a reference to state->new_bs (@bs_new)
and uses it afterwards for bdrv_reopen(), whereas it should be using
state->old_bs (@bs_top) after this series.

Max



signature.asc
Description: OpenPGP digital signature

Re: [Qemu-block] [PATCH v5 07/38] block: Make bdrv_is_inserted() recursive

2015-10-01 Thread Max Reitz

On 29.09.2015 11:15, Alberto Garcia wrote:
> On Fri 18 Sep 2015 05:22:42 PM CEST, Max Reitz wrote:
>> If bdrv_is_inserted() is called on the top level BDS, it should make
>> sure all nodes in the BDS tree are actually inserted.
>>
>> Signed-off-by: Max Reitz 
>> ---
>>  block.c | 12 +---
>>  1 file changed, 9 insertions(+), 3 deletions(-)
>>
>> diff --git a/block.c b/block.c
>> index 4a089e6..c4fa299 100644
>> --- a/block.c
>> +++ b/block.c
>> @@ -3247,14 +3247,20 @@ void bdrv_invalidate_cache_all(Error **errp)
>>  bool bdrv_is_inserted(BlockDriverState *bs)
>>  {
>>  BlockDriver *drv = bs->drv;
>> +BdrvChild *child;
>>  
>>  if (!drv) {
>>  return false;
>>  }
>> -if (!drv->bdrv_is_inserted) {
>> -return true;
>> +if (drv->bdrv_is_inserted) {
>> +return drv->bdrv_is_inserted(bs);
>>  }
> 
> If there's drv->bdrv_is_inserted then this code stops here and does not
> iterate the children, is this correct?

Yes, that is how it's supposed to be.

Max

>> -return drv->bdrv_is_inserted(bs);
>> +QLIST_FOREACH(child, >children, next) {
>> +if (!bdrv_is_inserted(child->bs)) {
>> +return false;
>> +}
>> +}
>> +return true;
>>  }
> 
> Berto
> 




signature.asc
Description: OpenPGP digital signature

Re: [Qemu-block] [PATCH 15/16] block: Add and use bdrv_replace_in_backing_chain()

2015-10-01 Thread Max Reitz

On 29.09.2015 17:22, Kevin Wolf wrote:
> Am 23.09.2015 um 19:08 hat Max Reitz geschrieben:
>> On 17.09.2015 15:48, Kevin Wolf wrote:
>>> This cleans up the mess we left behind in the mirror code after the
>>> previous patch. Instead of using bdrv_swap(), just change pointers.
>>>
>>> The interface change of the mirror job that callers must consider is
>>> that after job completion, their local BDS pointers still point to the
>>> same node now. qemu-img must change its code accordingly (which makes it
>>> easier to understand); the other callers stays unchanged because after
>>> completion they don't do anything with the BDS, but just with the job,
>>> and the job is still owned by the source BDS.
>>>
>>> Signed-off-by: Kevin Wolf 
>>> ---
>>>  block.c   | 32 +++-
>>>  block/mirror.c| 23 +++
>>>  include/block/block.h |  4 +++-
>>>  qemu-img.c| 16 
>>>  4 files changed, 49 insertions(+), 26 deletions(-)
>>>
>>> diff --git a/block.c b/block.c
>>> index 98fc17c..7c21659 100644
>>> --- a/block.c
>>> +++ b/block.c
>>> @@ -1095,7 +1095,7 @@ static BdrvChild *bdrv_attach_child(BlockDriverState 
>>> *parent_bs,
>>>  return child;
>>>  }
>>>  
>>> -void bdrv_detach_child(BdrvChild *child)
>>> +static void bdrv_detach_child(BdrvChild *child)
>>>  {
>>>  QLIST_REMOVE(child, next);
>>>  QLIST_REMOVE(child, next_parent);
>>> @@ -2228,6 +2228,36 @@ void bdrv_append(BlockDriverState *bs_new, 
>>> BlockDriverState *bs_top)
>>>  bdrv_unref(bs_new);
>>>  }
>>>  
>>> +void bdrv_replace_in_backing_chain(BlockDriverState *old, BlockDriverState 
>>> *new)
>>> +{
>>> +assert(!bdrv_requests_pending(old));
>>> +assert(!bdrv_requests_pending(new));
>>> +
>>> +bdrv_ref(old);
>>> +
>>> +if (old->blk) {
>>> +/* As long as these fields aren't in BlockBackend, but in the 
>>> top-level
>>> + * BlockDriverState, it's not possible for a BDS to have two BBs.
>>> + *
>>> + * We really want to copy the fields from old to new, but we go 
>>> for a
>>> + * swap instead so that pointers aren't duplicated and cause 
>>> trouble.
>>> + * (Also, bdrv_swap() used to do the same.) */
>>> +assert(!new->blk);
>>> +swap_feature_fields(old, new);
>>> +}
>>> +change_parent_backing_link(old, new);
>>> +
>>> +/* Change backing files if a previously independent node is added to 
>>> the
>>> + * chain. For active commit, we replace top by its own (indirect) 
>>> backing
>>> + * file and don't do anything here so we don't build a loop. */
>>> +if (new->backing == NULL && !bdrv_chain_contains(backing_bs(old), 
>>> new)) {
>>> +bdrv_set_backing_hd(new, backing_bs(old));
>>> +bdrv_set_backing_hd(old, NULL);
>>
>> Wouldn't we want @old to keep its backing file?
> 
> Would we? The operation I had in mind was: Given a backing file chain,
> one node in that chain and an independent node, replace the node in the
> chain. That is:
> 
> A <- B <- C <- D
> 
>  X
> 
> becomes
> 
> A <- X <- C <- D
> 
>  B
> 
> Of course, you could define a different operation, but this seemed to be
> the obvious one that the mirror completion needs.

Oops, apparently I missed the backing_bs(old) in
bdrv_set_backing_hd(new, ...).

>> Then bdrv_append() would basically be a special case of this function,
>> with it additionally decrementing the refcount of @bs_new.
> 
> Hm, less duplication sounds nice, but as long as the current way is
> technically correct, can we leave this for a cleanup on top?

And with the backing_bs() taken into account, it is not so duplicated
after all.

Max

> Kevin
> 
>> Max
>>
>>> +}
>>> +
>>> +bdrv_unref(old);
>>> +}
>>> +
>>>  static void bdrv_delete(BlockDriverState *bs)
>>>  {
>>>  assert(!bs->job);
>>
> 
> 




signature.asc
Description: OpenPGP digital signature

54 matches

Mail list logo