TL;DR
-----

Debug analysis of the log below, plus discussion with Eric Blake of
upstream QEMU / libvirt, resulted in the following bug report:

    https://bugzilla.redhat.com/show_bug.cgi?id=1382165 --
    virDomainGetBlockJobInfo: Adjust job reporting based on QEMU stats
    & the "ready" field of `query-block-jobs`

Until the above is fixed, I think the _swap_volume() method in Nova
could either:

  (a) wait until it sees the BLOCK_JOB_READY event, which QEMU emits
      once a 'drive-mirror' job has fully synchronized the destination
      and is ready to be completed (a rough sketch follows below); or

  (b) keep polling until we see the "ready": true field in the
      response of `query-block-jobs` (invoked via libvirt's
      blockJobInfo()).
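To make option (a) concrete, here is a minimal, untested sketch that
talks to libvirt-python directly (i.e. not Nova code); the domain
name, the disk target ('vdb'), and the new volume path are all
placeholder assumptions:

    import threading

    import libvirt

    # libvirt only delivers domain events if an event loop
    # implementation is registered and running; use the default
    # implementation in a background thread.
    libvirt.virEventRegisterDefaultImpl()

    def _run_event_loop():
        while True:
            libvirt.virEventRunDefaultImpl()

    threading.Thread(target=_run_event_loop, daemon=True).start()

    conn = libvirt.open('qemu:///system')
    dom = conn.lookupByName('instance-00000008')  # placeholder domain

    ready = threading.Event()

    def block_job_cb(conn, dom, disk, job_type, status, opaque):
        # VIR_DOMAIN_BLOCK_JOB_READY is libvirt's rendering of QEMU's
        # BLOCK_JOB_READY event: the mirror destination is in sync
        # and it is now safe to pivot to it.
        if status == libvirt.VIR_DOMAIN_BLOCK_JOB_READY:
            opaque.set()

    conn.domainEventRegisterAny(
        dom, libvirt.VIR_DOMAIN_EVENT_ID_BLOCK_JOB, block_job_cb,
        ready)

    # Equivalent of Nova's dev.rebase(new_path, copy=True,
    # reuse_ext=True) shown in the "Details" section below.
    dom.blockRebase('vdb', '/path/to/new/volume', 0,
                    libvirt.VIR_DOMAIN_BLOCK_REBASE_COPY |
                    libvirt.VIR_DOMAIN_BLOCK_REBASE_REUSE_EXT)

    # Block until the READY event arrives, instead of trusting
    # len == offset; production code would want a timeout here.
    ready.wait()
    dom.blockJobAbort('vdb', libvirt.VIR_DOMAIN_BLOCK_JOB_ABORT_PIVOT)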
Details
-------

The code in Nova that is being executed is this part of _swap_volume()
in libvirt/driver.py:

    [...]
    # Start copy with VIR_DOMAIN_REBASE_REUSE_EXT flag to
    # allow writing to existing external volume file
    dev.rebase(new_path, copy=True, reuse_ext=True)

    while not dev.is_job_complete():
        time.sleep(0.5)

    dev.abort_job(pivot=True)
    [...]
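At the libvirt API level, the is_job_complete() poll above boils down
to roughly the following (a sketch under assumptions, not Nova's
actual code; the domain name is again a placeholder). blockJobInfo()
returns a dict whose 'cur' and 'end' fields are derived from the
"offset" and "len" values visible in the QMP replies quoted in the log
analysis below:

    import time

    import libvirt

    conn = libvirt.open('qemu:///system')
    dom = conn.lookupByName('instance-00000008')  # placeholder domain

    while True:
        info = dom.blockJobInfo('vdb', 0)
        if not info:
            break  # empty dict: no block job on this disk any more
        # This is the problematic completion check: with
        # 'drive-mirror', cur == end can be observed *before* QEMU
        # marks the job as "ready", so pivoting at this point can be
        # premature -- which is exactly what the log below shows.
        if info['cur'] == info['end']:
            break
        time.sleep(0.5)

    dom.blockJobAbort('vdb', libvirt.VIR_DOMAIN_BLOCK_JOB_ABORT_PIVOT)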
(Optional Reading) libvirt / QEMU traffic log analysis
------------------------------------------------------

Libvirt debug log (NB: 61MB):

    http://logs.openstack.org/73/374373/2/check/gate-tempest-dsvm-full-ubuntu-xenial/149fe3e/logs/libvirt/libvirtd.txt.gz

Failed test:

    tempest.api.compute.admin.test_volume_swap.TestVolumeSwap.test_volume_swap

(I line-wrapped the logs manually.)

(1) We see that _swap_volume() invokes libvirt's blockRebase(), which
calls 'drive-mirror' (ID: libvirt-25):
-----
2016-10-04 15:54:45.308+0000: 30554: debug : qemuMonitorJSONCommandWithFd:291 : Send command '{"execute":"drive-mirror","arguments":{"device":"drive-virtio-disk1","target":"/dev/disk/by-path /ip-10.205.221.168:3260-iscsi-iqn.2010-10.org.openstack:volume-7b86913d-78e4-4b68-8df2-63f1a4c6656a-lun-1","sync":"full","mode":"existing","format":"raw"},"id":"libvirt-25"}' for write with FD -1
-----

(2.a) Then libvirt queries QEMU for the job status (ID: libvirt-26)
via the `query-block-jobs` command:
-----
2016-10-04 15:54:45.332+0000: 30550: info : qemuMonitorIOWrite:528 : QEMU_MONITOR_IO_WRITE: mon=0x7fa43000ee60 buf={"execute":"query-block-jobs","id":"libvirt-26"}
-----

(2.b) Response for libvirt-26. (Here we see "len" == 1073741824 and
"offset" == 0: the copy job has just begun.)
-----
2016-10-04 15:54:45.332+0000: 30550: info : qemuMonitorJSONIOProcessLine:206 : QEMU_MONITOR_RECV_REPLY: mon=0x7fa43000ee60 reply={"return": [{"io-status": "ok", "device": "drive-virtio-disk1", "busy": true, "len": 1073741824, "offset": 0, "paused": false, "speed": 0, "ready": false, "type": "mirror"}], "id": "libvirt-26"}
-----

[Libvirt keeps querying for job status via `query-block-jobs`; snip
three polling requests, IDs 27, 28, 29.]

(3.a) Next query (ID: libvirt-30):
-----
2016-10-04 15:54:50.060+0000: 30550: info : qemuMonitorIOWrite:528 : QEMU_MONITOR_IO_WRITE: mon=0x7fa43000ee60 buf={"execute":"query-block-jobs","id":"libvirt-30"}
-----

(3.b) Response for libvirt-30. (Here we see "len" == 1073741824 and
"offset" == 734003200, in bytes: the copy operation is still in
progress.)
-----
2016-10-04 15:54:50.061+0000: 30550: info : qemuMonitorJSONIOProcessLine:206 : QEMU_MONITOR_RECV_REPLY: mon=0x7fa43000ee60 reply={"return": [{"io-status": "ok", "device": "drive-virtio-disk1", "busy": true, "len": 1073741824, "offset": 734003200, "paused": false, "speed": 0, "ready": false, "type": "mirror"}], "id": "libvirt-30"}
-----

[Snip four more job status request / response round-trips (IDs 31, 32,
33, 34).]

(4) In the response to the next query (ID: libvirt-35), we finally
see len (1073741824) == offset (1073741824):
-----
2016-10-04 15:54:52.076+0000: 30550: info : qemuMonitorJSONIOProcessLine:206 : QEMU_MONITOR_RECV_REPLY: mon=0x7fa43000ee60 reply={"return": [{"io-status": "ok", "device": "drive-virtio-disk1", "busy": true, "len": 1073741824, "offset": 1073741824, "paused": false, "speed": 0, "ready": false, "type": "mirror"}], "id": "libvirt-35"}
-----

Note that even here the "ready" flag is still "false"; it is
"ready": true (or the BLOCK_JOB_READY event) that signals the job has
_really_ reached a completable state. Drilling down further...

(5) The abort + pivot fails (as Nova calls "dev.abort_job(pivot=True)"
as soon as len == offset):
-------
2016-10-04 15:54:52.076+0000: 30555: debug : qemuDomainObjExitMonitorInternal:1923 : Exited monitor (mon=0x7fa43000ee60 vm=0x7fa414002a70 name=instance-00000008)
2016-10-04 15:54:52.076+0000: 30555: error : qemuDomainBlockPivot:16137 : block copy still active: disk 'vdb' not ready for pivot yet
2016-10-04 15:54:52.076+0000: 30555: debug : qemuBlockJobSyncEnd:242 : disk=vdb
2016-10-04 15:54:52.076+0000: 30555: debug : qemuDomainObjEndJob:1831 : Stopping job: modify (async=none vm=0x7fa414002a70 name=instance-00000008)
-------

(6) HOWEVER, only *here* do we see the BLOCK_JOB_READY event,
indicating the job is now ready for pivot. Unfortunately, this turns
out to be too late -- libvirt was already asked to pivot, too early,
in step (5) above:
-------
2016-10-04 15:54:52.333+0000: 30550: info : qemuMonitorIOProcess:423 : QEMU_MONITOR_IO_PROCESS: mon=0x7fa43000ee60 buf={"timestamp": {"seconds": 1475596492, "microseconds": 333414}, "event": "BLOCK_JOB_READY", "data": {"device": "drive-virtio-disk1", "len": 1073741824, "offset": 1073741824, "speed": 0, "type": "mirror"}}
-------

--
/kashyap
