Re: [Qemu-devel] migration issue with qemu 2.10-rc2: QEMU command 'nbd-server-add': Block node is read-only

2017-08-14 Thread Fam Zheng
On Fri, 08/11 18:48, Kevin Wolf wrote:
> The patch below implements what I was thinking of, and it seems to fix
> this problem. However, you'll immediately run into the next one, which
> is a message like 'Conflicts with use by ide0-hd0 as 'root', which does
> not allow 'write' on #block172'.
> 
> The reason for this is that since commit 4417ab7adf1,
> bdrv_invalidate_cache() not only activates the image, but also is the
> signal for guest device BlockBackends that migration has completed and
> they should now request exclusive access to the image.
> 
> As the NBD server shows, this assumption is wrong. Images can be
> activated even before migration completes. Maybe this really needs to
> be done in a VM state change handler.
> 
> We can't simply revert commit 4417ab7adf1 because for image file
> locking, it is important that we drop locks for inactive images, so
> BdrvChildRole.activate/inactivate will probably stay. Maybe a single
> bool blk->disable_perm isn't enough; we might need a separate one for
> devices until migration has completed, or at least a counter.

I'll make your diff a formal patch and add a VM state change handler for 2.10.

I'm not very confident with a counter because it's not obvious to me that
inactivate/activate pairs are always balanced.
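
For the record, here is the rough shape I have in mind for the state change
handler (just a sketch, not the actual patch); blk_resume_perm() and blk->vmsh
are made-up names standing in for whatever helper/field ends up re-applying
the deferred permissions:

/* Sketch: move the "migration has completed, claim exclusive access" step
 * from bdrv_invalidate_cache() into a VM state change handler.  The handler
 * signature is the one expected by qemu_add_vm_change_state_handler(). */
static void blk_vm_state_changed(void *opaque, int running, RunState state)
{
    BlockBackend *blk = opaque;

    if (!running) {
        return;
    }

    /* The VM is about to run, i.e. migration handover is really done:
     * only now re-request the guest device's permissions. */
    blk_resume_perm(blk);   /* hypothetical helper clearing blk->disable_perm */
}

/* registered once per BlockBackend, e.g. at creation time: */
blk->vmsh = qemu_add_vm_change_state_handler(blk_vm_state_changed, blk);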

> 
> I'll be on vacation starting tomorrow, so someone else needs to tackle
> this. Fam, I think you are relatively familiar with the op blockers?
> 
> Kevin
> 
> 
> diff --git a/nbd/server.c b/nbd/server.c
> index 82a78bf439..b68b6fb535 100644
> --- a/nbd/server.c
> +++ b/nbd/server.c
> @@ -1045,11 +1045,22 @@ NBDExport *nbd_export_new(BlockDriverState *bs, off_t dev_offset, off_t size,
>                            bool writethrough, BlockBackend *on_eject_blk,
>                            Error **errp)
>  {
> +    AioContext *ctx;
>      BlockBackend *blk;
>      NBDExport *exp = g_malloc0(sizeof(NBDExport));
>      uint64_t perm;
>      int ret;
>  
> +    /*
> +     * NBD exports are used for non-shared storage migration.  Make sure
> +     * that BDRV_O_INACTIVE is cleared and the image is ready for write
> +     * access since the export could be available before migration handover.
> +     */
> +    ctx = bdrv_get_aio_context(bs);
> +    aio_context_acquire(ctx);
> +    bdrv_invalidate_cache(bs, NULL);
> +    aio_context_release(ctx);
> +
>      /* Don't allow resize while the NBD server is running, otherwise we don't
>       * care what happens with the node. */
>      perm = BLK_PERM_CONSISTENT_READ;
> @@ -1087,15 +1098,6 @@ NBDExport *nbd_export_new(BlockDriverState *bs, off_t dev_offset, off_t size,
>          exp->eject_notifier.notify = nbd_eject_notifier;
>          blk_add_remove_bs_notifier(on_eject_blk, &exp->eject_notifier);
>      }
> -
> -    /*
> -     * NBD exports are used for non-shared storage migration.  Make sure
> -     * that BDRV_O_INACTIVE is cleared and the image is ready for write
> -     * access since the export could be available before migration handover.
> -     */
> -    aio_context_acquire(exp->ctx);
> -    blk_invalidate_cache(blk, NULL);
> -    aio_context_release(exp->ctx);
>      return exp;
>  
>  fail:

Fam



Re: [Qemu-devel] migration issue with qemu 2.10-rc2: QEMU command 'nbd-server-add': Block node is read-only

2017-08-11 Thread Kevin Wolf
On 11.08.2017 at 17:34, Christian Ehrhardt wrote:
> On Fri, Aug 11, 2017 at 2:37 PM, Kevin Wolf  wrote:
> 
> > On 11.08.2017 at 14:04, Fam Zheng wrote:
> > > On Fri, 08/11 13:07, Christian Ehrhardt wrote:
> > > > Simplifying that to a smaller test:
> > > >
> >
> [...]
> 
> > > > Block node is read-only
> >
> [...]
> 
> > >
> > > This is actually caused by
> > >
> > > commit 9c5e6594f15b7364624a3ad40306c396c93a2145
> > > Author: Kevin Wolf 
> > > Date:   Thu May 4 18:52:40 2017 +0200
> > >
> > > block: Fix write/resize permissions for inactive images
> > >
> >
> 
> Yeah, in the meantime an automated bisect run has finished and agrees.
> Thanks for pointing out the change that triggers this so early!
> 
> Thanks Kevin for all the suggestions already; I took a quick look at the code
> but I'm lost there without investing much more time.
> Is anybody experienced in that area looking into fixing this?
> Because IMHO this is kind of a blocker bug for 2.10, since it breaks formerly
> working behavior (even though, as Kevin identified, it seems to have been
> done wrong all along).

The patch below implements what I was thinking of, and it seems to fix
this problem. However, you'll immediately run into the next one, which
is a message like 'Conflicts with use by ide0-hd0 as 'root', which does
not allow 'write' on #block172'.

The reason for this is that since commit 4417ab7adf1,
bdrv_invalidate_cache() not only activates the image, but also is the
signal for guest device BlockBackends that migration has completed and
they should now request exclusive access to the image.

As the NBD server shows, this assumption is wrong. Images can be
activated even before migration completes. Maybe this really needs to
be done in a VM state change handler.

We can't simply revert commit 4417ab7adf1 because for image file
locking, it is important that we drop locks for inactive images, so
BdrvChildRole.activate/inactivate will probably stay. Maybe a single
bool blk->disable_perm isn't enough; we might need a separate one for
devices until migration has completed, or at least a counter.
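
To illustrate the counter idea (purely a sketch with made-up names, not the
real block-backend.c fields):

#include <assert.h>
#include <stdbool.h>

/* A counter instead of a single bool blk->disable_perm. */
typedef struct {
    unsigned perm_disabled;   /* 0 = permissions in effect, >0 = deferred */
} PermGate;

/* Called e.g. when the image is inactivated or when starting with -incoming. */
static void perm_gate_disable(PermGate *g)
{
    g->perm_disabled++;
}

/* Returns true only when the last disable has been undone; the caller would
 * then re-apply the BlockBackend's permissions (e.g. via blk_set_perm()). */
static bool perm_gate_enable(PermGate *g)
{
    assert(g->perm_disabled > 0);
    return --g->perm_disabled == 0;
}

Of course that only helps if disable/enable calls really come in balanced
pairs, which would need auditing.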

I'll be on vacation starting tomorrow, so someone else needs to tackle
this. Fam, I think you are relatively familiar with the op blockers?

Kevin


diff --git a/nbd/server.c b/nbd/server.c
index 82a78bf439..b68b6fb535 100644
--- a/nbd/server.c
+++ b/nbd/server.c
@@ -1045,11 +1045,22 @@ NBDExport *nbd_export_new(BlockDriverState *bs, off_t dev_offset, off_t size,
                           bool writethrough, BlockBackend *on_eject_blk,
                           Error **errp)
 {
+    AioContext *ctx;
     BlockBackend *blk;
     NBDExport *exp = g_malloc0(sizeof(NBDExport));
     uint64_t perm;
     int ret;
 
+    /*
+     * NBD exports are used for non-shared storage migration.  Make sure
+     * that BDRV_O_INACTIVE is cleared and the image is ready for write
+     * access since the export could be available before migration handover.
+     */
+    ctx = bdrv_get_aio_context(bs);
+    aio_context_acquire(ctx);
+    bdrv_invalidate_cache(bs, NULL);
+    aio_context_release(ctx);
+
     /* Don't allow resize while the NBD server is running, otherwise we don't
      * care what happens with the node. */
     perm = BLK_PERM_CONSISTENT_READ;
@@ -1087,15 +1098,6 @@ NBDExport *nbd_export_new(BlockDriverState *bs, off_t dev_offset, off_t size,
         exp->eject_notifier.notify = nbd_eject_notifier;
         blk_add_remove_bs_notifier(on_eject_blk, &exp->eject_notifier);
     }
-
-    /*
-     * NBD exports are used for non-shared storage migration.  Make sure
-     * that BDRV_O_INACTIVE is cleared and the image is ready for write
-     * access since the export could be available before migration handover.
-     */
-    aio_context_acquire(exp->ctx);
-    blk_invalidate_cache(blk, NULL);
-    aio_context_release(exp->ctx);
     return exp;
 
 fail:



Re: [Qemu-devel] migration issue with qemu 2.10-rc2: QEMU command 'nbd-server-add': Block node is read-only

2017-08-11 Thread Christian Ehrhardt
On Fri, Aug 11, 2017 at 2:37 PM, Kevin Wolf  wrote:

> On 11.08.2017 at 14:04, Fam Zheng wrote:
> > On Fri, 08/11 13:07, Christian Ehrhardt wrote:
> > > Simplifying that to a smaller test:
> > >
>
[...]

> > > Block node is read-only
>
[...]

> >
> > This is actually caused by
> >
> > commit 9c5e6594f15b7364624a3ad40306c396c93a2145
> > Author: Kevin Wolf 
> > Date:   Thu May 4 18:52:40 2017 +0200
> >
> > block: Fix write/resize permissions for inactive images
> >
>

Yeah, in the meantime an automated bisect run has finished and agrees.
Thanks for pointing out the change that triggers this so early!

Thanks Kevin for all the suggestions already; I took a quick look at the code
but I'm lost there without investing much more time.
Is anybody experienced in that area looking into fixing this?
Because IMHO this is kind of a blocker bug for 2.10, since it breaks formerly
working behavior (even though, as Kevin identified, it seems to have been
done wrong all along).


Re: [Qemu-devel] migration issue with qemu 2.10-rc2: QEMU command 'nbd-server-add': Block node is read-only

2017-08-11 Thread Kevin Wolf
On 11.08.2017 at 14:04, Fam Zheng wrote:
> On Fri, 08/11 13:07, Christian Ehrhardt wrote:
> > Simplifying that to a smaller test:
> > 
> > $ qemu-img create -f qcow2 /tmp/test.qcow2 100M
> > $ qemu-system-x86_64 -S -m 512 -smp 1 -nodefaults --nographic -monitor
> > stdio -drive
> > file=/tmp/test.qcow2,format=qcow2,if=none,id=drive-virtio-disk0 -incoming
> > defer
> > QEMU 2.9.92 monitor - type 'help' for more information
> > (qemu) warning: TCG doesn't support requested feature: CPUID.01H:ECX.vmx
> > [bit 5]
> > nbd_server_start 0.0.0.0:49153
> > (qemu) nbd_server_add -w drive-virtio-disk0
> > Block node is read-only
> > (qemu) quit
> > 
> > It might be reasonable to keep the -device section to specify how it is
> > included.
> 
> This is actually caused by
> 
> commit 9c5e6594f15b7364624a3ad40306c396c93a2145
> Author: Kevin Wolf 
> Date:   Thu May 4 18:52:40 2017 +0200
> 
> block: Fix write/resize permissions for inactive images
> 
> which forbids "nbd_server_add -w" in the "inactive" state set by -incoming.
> 
> But I'm not sure what is a proper fix. Maybe revert the bdrv_is_writable() 
> part
> of the commit? Kevin?

No, the reason this fails is simply that nbd_export_new() does things
in the wrong order. It always had to activate the image (otherwise the
first write would run into an assertion failure), but it only does so
after requesting the permissions.

blk_insert_bs() in line 1061 requests the permissions,
blk_invalidate_cache() in line 1097 activates the image.

We could either use bdrv_invalidate_cache() instead and call that before
blk_insert_bs(), or start with perm=0, shared=BLK_PERM_ALL and then use
blk_set_perm() only after activating the image. The third option would
be to set blk->disable_perm = true, which would ensure that the
permissions are only effective after the image gets activated; but this
field isn't accessible outside block-backend.c at the moment.
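
Roughly, the second option would look like this in nbd_export_new() (an
untested sketch, error handling abbreviated):

    /* Request no permissions up front, activate the image, then tighten
     * the permissions with blk_set_perm(). */
    blk = blk_new(0, BLK_PERM_ALL);
    ret = blk_insert_bs(blk, bs, errp);
    if (ret < 0) {
        goto fail;
    }

    blk_invalidate_cache(blk, NULL);    /* activate: clears BDRV_O_INACTIVE */

    perm = BLK_PERM_CONSISTENT_READ;    /* plus BLK_PERM_WRITE for writable
                                         * exports, as the code already does */
    ret = blk_set_perm(blk, perm, BLK_PERM_ALL, errp);
    if (ret < 0) {
        goto fail;
    }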

Kevin



Re: [Qemu-devel] migration issue with qemu 2.10-rc2: QEMU command 'nbd-server-add': Block node is read-only

2017-08-11 Thread Fam Zheng
On Fri, 08/11 13:07, Christian Ehrhardt wrote:
> Hi,
> testing on 2.10-rc2 I ran into an issue around:
>   unable to execute QEMU command 'nbd-server-add': Block node is read-only
> 
> ### TL;DR ###
> - triggered by libvirt-driven live migration with --copy-storage-all
> - boils down to nbd_server_add failing
> - can be reproduced on a single system without libvirt
> - last known working (so far) was qemu 2.8
> 
> Simple repro:
> $ qemu-img create -f qcow2 /tmp/test.qcow2 100M
> $ qemu-system-x86_64 -S -m 512 -smp 1 -nodefaults --nographic -monitor
> stdio -drive
> file=/tmp/test.qcow2,format=qcow2,if=none,id=drive-virtio-disk0 -incoming
> defer
> QEMU 2.9.92 monitor - type 'help' for more information
> (qemu) warning: TCG doesn't support requested feature: CPUID.01H:ECX.vmx
> [bit 5]
> nbd_server_start 0.0.0.0:49153
> (qemu) nbd_server_add -w drive-virtio-disk0
> Block node is read-only
> 
> ### Details ###
> 
> Trigger:
> virsh migrate --live --copy-storage-all kvmguest-artful-normal qemu+ssh://
> 10.22.69.61/system
> 
> Setup:
> - Two systems without shared storage
> - An equal image is synced before the test, so it would only migrate
> minimal remaining changes
> => Only the --copy-storage-all case triggers this, other migrations work as
> far as I tested.
> => No related apparmor denials
> 
> The last combination I knew to be successful was libvirt 3.5 and qemu 2.8.
> So I downgraded libvirt (but kept qemu) and retested.
> => Still an issue
> => So it is a qemu 2.10 issue and not libvirt 3.6
> Continuing with libvirt 3.6 to have latest updates.
> 
> On the migration target I see the following in the log:
> libvirtd[11829]: 2017-08-11 08:51:49.283+: 11842: warning :
> qemuDomainObjTaint:4545 : Domain id=2 name='kvmguest-artful-normal'
> uuid=b6f4cdab-77b0-43b1-933d-9683567f3a89 is tainted: high-privileges
> libvirtd[11829]: 2017-08-11 08:51:49.386+: 11842: error :
> qemuMonitorJSONCheckError:389 : internal error: unable to execute QEMU
> command 'nbd-server-add': Block node is read-only
> 
> I checked the images on the source (active since the guest is running) and
> the target (inactive and out-of-sync copies).
> source $ virsh domblklist kvmguest-artful-normal
> Target     Source
> ------------------------------------------------
> vda        /var/lib/uvtool/libvirt/images/kvmguest-artful-normal.qcow
> vdb        /var/lib/uvtool/libvirt/images/kvmguest-artful-normal-ds.qcow
> 
> But when checking details on the source I didn't get any, being blocked by
> the image lock:
> qemu-img info --backing-chain
> /var/lib/uvtool/libvirt/images/kvmguest-artful-normal.qcow
> qemu-img: Could not open
> '/var/lib/uvtool/libvirt/images/kvmguest-artful-normal.qcow': Failed to get
> shared "write" lock
> Is another process using the image?
> 
> On the target these are inactive, so the inquiry works (content is the same
> anyway).
> All files are there and the backing chain looks correct.
> 
> I'm not sure if being unable to run qemu-img while the guest is running is
> considered an issue of its own, but this could be related to:
> (qemu) commit 244a5668106297378391b768e7288eb157616f64
> Author: Fam Zheng 
> file-posix: Add image locking to perm operations
> 
> 
> The add should come from libvirt:
>  - the call chain here should be
>    qemuMigrationPrepareAny -> qemuMigrationStartNBDServer ->
>    qemuMonitorNBDServerAdd -> qemuMonitorJSONNBDServerAdd
>  - there is a debug statement in qemuMonitorNBDServerAdd:
>    VIR_DEBUG("deviceID=%s", deviceID);
>    This shows it is the NBD server for the first disk
>  - it seems libvirt is adding an NBD server for the copy-storage-all case
>  - that QMP command fails in qemu
>  - afterwards everything else in the logs is libvirt cleaning up and killing
>    the qemu that was prepped (with -S) on the target
> 
> With virt debug and gdb I caught the device for the QMP command and stopped
> it while doing so.
> debug : qemuMonitorNBDServerStart:3993 : host=:: port=49153
> debug : qemuMonitorNBDServerStart:3995 : mon:0x7f6d4c016500
> vm:0x7f6d4c011580 json:1 fd:27
> debug : qemuMonitorNBDServerAdd:4006 : deviceID=drive-virtio-disk0
> debug : qemuMonitorNBDServerAdd:4008 : mon:0x7f6d4c016500 vm:0x7f6d4c011580
> json:1 fd:27
> [...]
> debug : qemuMonitorJSONCheckError:378 : unable to execute QEMU command
> {"execute":"nbd-server-add","arguments":{"device":"drive-virtio-disk0","writable":true},"id":"libvirt-24"}:
> {"id":"libvirt-24","error":{"class":"GenericError","desc":"Block node is
> read-only"}}
> 
> 
> Reducing the qemu call libvirt makes to some basics that don't need libvirt,
> I tried to reproduce.
> At the -S starting point, which the migration uses as well, the devices
> are already there; going on in the monitor from there:
> (qemu) info block
> drive-virtio-disk0 (#block164):
> /var/lib/uvtool/libvirt/images/kvmguest-artful-normal.qcow (qcow2)
> Attached to:  /machine/peripheral/virtio-disk0/virtio-backend
> Cache mode:   writeback
> Backing file:
> 

[Qemu-devel] migration issue with qemu 2.10-rc2: QEMU command 'nbd-server-add': Block node is read-only

2017-08-11 Thread Christian Ehrhardt
Hi,
testing on 2.10-rc2 I ran into an issue around:
  unable to execute QEMU command 'nbd-server-add': Block node is read-only

### TL;DR ###
- triggered by libvirt-driven live migration with --copy-storage-all
- boils down to nbd_server_add failing
- can be reproduced on a single system without libvirt
- last known working (so far) was qemu 2.8

Simple repro:
$ qemu-img create -f qcow2 /tmp/test.qcow2 100M
$ qemu-system-x86_64 -S -m 512 -smp 1 -nodefaults --nographic -monitor
stdio -drive
file=/tmp/test.qcow2,format=qcow2,if=none,id=drive-virtio-disk0 -incoming
defer
QEMU 2.9.92 monitor - type 'help' for more information
(qemu) warning: TCG doesn't support requested feature: CPUID.01H:ECX.vmx
[bit 5]
nbd_server_start 0.0.0.0:49153
(qemu) nbd_server_add -w drive-virtio-disk0
Block node is read-only

### Details ###

Trigger:
virsh migrate --live --copy-storage-all kvmguest-artful-normal qemu+ssh://
10.22.69.61/system

Setup:
- Two systems without shared storage
- An equal image is synced before the test, so it would only migrate
minimal remaining changes
=> Only the --copy-storage-all case triggers this, other migrations work as
far as I tested.
=> No related apparmor denials

The last combination I knew to be successful was libvirt 3.5 and qemu 2.8.
So I downgraded libvirt (but kept qemu) and retested.
=> Still an issue
=> So it is a qemu 2.10 issue and not libvirt 3.6
Continuing with libvirt 3.6 to have latest updates.

On the migration target I see the following in the log:
libvirtd[11829]: 2017-08-11 08:51:49.283+: 11842: warning :
qemuDomainObjTaint:4545 : Domain id=2 name='kvmguest-artful-normal'
uuid=b6f4cdab-77b0-43b1-933d-9683567f3a89 is tainted: high-privileges
libvirtd[11829]: 2017-08-11 08:51:49.386+: 11842: error :
qemuMonitorJSONCheckError:389 : internal error: unable to execute QEMU
command 'nbd-server-add': Block node is read-only

I checked the images on the source (active since the guest is running) and
the target (inactive and out-of-sync copies).
source $ virsh domblklist kvmguest-artful-normal
Target     Source
------------------------------------------------
vda        /var/lib/uvtool/libvirt/images/kvmguest-artful-normal.qcow
vdb        /var/lib/uvtool/libvirt/images/kvmguest-artful-normal-ds.qcow

But when checking details on the source I didn't get any, being blocked by
the image lock:
qemu-img info --backing-chain
/var/lib/uvtool/libvirt/images/kvmguest-artful-normal.qcow
qemu-img: Could not open
'/var/lib/uvtool/libvirt/images/kvmguest-artful-normal.qcow': Failed to get
shared "write" lock
Is another process using the image?

On the target these are inactive, so the inquiry works (content is the same
anyway).
All files are there and the backing chain looks correct.

I'm not sure if being unable to run qemu-img while the guest is running is
considered an issue of its own, but this could be related to:
(qemu) commit 244a5668106297378391b768e7288eb157616f64
Author: Fam Zheng 
file-posix: Add image locking to perm operations


The add should come from libvirt:
 - the call chain here should be
   qemuMigrationPrepareAny -> qemuMigrationStartNBDServer ->
   qemuMonitorNBDServerAdd -> qemuMonitorJSONNBDServerAdd
 - there is a debug statement in qemuMonitorNBDServerAdd:
   VIR_DEBUG("deviceID=%s", deviceID);
   This shows it is the NBD server for the first disk
 - it seems libvirt is adding an NBD server for the copy-storage-all case
 - that QMP command fails in qemu
 - afterwards everything else in the logs is libvirt cleaning up and killing
   the qemu that was prepped (with -S) on the target

With virt debug and gdb I caught the device for the QMP command and stopped
it while doing so.
debug : qemuMonitorNBDServerStart:3993 : host=:: port=49153
debug : qemuMonitorNBDServerStart:3995 : mon:0x7f6d4c016500
vm:0x7f6d4c011580 json:1 fd:27
debug : qemuMonitorNBDServerAdd:4006 : deviceID=drive-virtio-disk0
debug : qemuMonitorNBDServerAdd:4008 : mon:0x7f6d4c016500 vm:0x7f6d4c011580
json:1 fd:27
[...]
debug : qemuMonitorJSONCheckError:378 : unable to execute QEMU command
{"execute":"nbd-server-add","arguments":{"device":"drive-virtio-disk0","writable":true},"id":"libvirt-24"}:
{"id":"libvirt-24","error":{"class":"GenericError","desc":"Block node is
read-only"}}


Reducing the qemu call libvirt makes to some basics that don't need libvirt,
I tried to reproduce.
At the -S starting point, which the migration uses as well, the devices
are already there; going on in the monitor from there:
(qemu) info block
drive-virtio-disk0 (#block164):
/var/lib/uvtool/libvirt/images/kvmguest-artful-normal.qcow (qcow2)
Attached to:  /machine/peripheral/virtio-disk0/virtio-backend
Cache mode:   writeback
Backing file:
/var/lib/uvtool/libvirt/images/x-uvt-b64-Y29tLnVidW50dS5jbG91ZC5kYWlseTpzZXJ2ZXI6MTcuMTA6YW1kNjQgMjAxNzA4MTA=
(chain depth: 1)

drive-virtio-disk1 (#block507):
/var/lib/uvtool/libvirt/images/kvmguest-artful-normal-ds.qcow (qcow2)
Attached to:  /machine/peripheral/virtio-disk1/virtio-backend
Cache mode:   writeback

# starting