Re: [openstack-dev] [nova] [libvirt] Debugging blockRebase() - "active block copy not ready for pivot"
On Mon, Oct 10, 2016 at 5:54 PM, Matt Riedemann wrote: > On 10/10/2016 6:26 AM, Kashyap Chamarthy wrote: > >> >> Also, maybe you have noticed, given Eric's reply on this thread (and the >> upstream libvir-list), it is agreed that virDomainGetBlockJobInfo() >> should be fixed "to quit reporting cur==end when ready:false". This >> allows the existing Nova code work correctly without any changes. >> >> Discusion on Eric's reply in this thread: >> >> http://lists.openstack.org/pipermail/openstack-dev/2016-Octo >> ber/105194.html >> >> And upstream libvir-list >> >> https://www.redhat.com/archives/libvir-list/2016-October/ >> msg00347.html >> >> > Yeah, that's goodness long-term, but OpenStack CI won't benefit from that > for probably at least a couple of years. The hypervisor is a (the?) critical component of any cloud deployment. Objectively, it's bizarre that we expect people to deploy our brand new code to work round things that were fixed in the hypervisor 2 years ago. Given that's where we are, though, I agree. Matt -- Matthew Booth Red Hat Engineering, Virtualisation Team Phone: +442070094448 (UK) __ OpenStack Development Mailing List (not for usage questions) Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [nova] [libvirt] Debugging blockRebase() - "active block copy not ready for pivot"
On 10/10/2016 6:26 AM, Kashyap Chamarthy wrote: Also, maybe you have noticed, given Eric's reply on this thread (and the upstream libvir-list), it is agreed that virDomainGetBlockJobInfo() should be fixed "to quit reporting cur==end when ready:false". This allows the existing Nova code work correctly without any changes. Discusion on Eric's reply in this thread: http://lists.openstack.org/pipermail/openstack-dev/2016-October/105194.html And upstream libvir-list https://www.redhat.com/archives/libvir-list/2016-October/msg00347.html Yeah, that's goodness long-term, but OpenStack CI won't benefit from that for probably at least a couple of years. -- Thanks, Matt Riedemann __ OpenStack Development Mailing List (not for usage questions) Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [nova] [libvirt] Debugging blockRebase() - "active block copy not ready for pivot"
On Fri, Oct 07, 2016 at 03:32:14PM -0500, Matt Riedemann wrote: > On 10/6/2016 7:58 AM, Kashyap Chamarthy wrote: > > - > > We expose the state of the copy job in the XML and forward the READY > > event from qemu to the users. > > > > A running copy job exposes itself in the xml as: > > > > > > > > > > > > > > > > > > > > [...] > > > > > > While the ready copy job is exposed as: > > > > > > > > > > > >> ready='yes'> > > > > > > > > [...] > > [...] > Thanks for the great libvirtd log analysis, that's really helpful see what's > going on and where we fail. Took some time to write that, glad to know that at least one person has read it :-) > I've replied in Matthew's patch, which I think we can get in now regardless > and backport. Yes, noticed it just now. > As for the fix, it sounds like mdbooth is going to work on the event > listener code, which I'm hesitant to backport, but honestly this is such a > latent broken flow that I don't think we really need to backport any fixes, > at least for the event listener work to fix this long-term. The swap-volume > test is disabled by default in Tempest and we enable it in devstack, so we > can control which CI environments it runs in. Agreed. Also, maybe you have noticed, given Eric's reply on this thread (and the upstream libvir-list), it is agreed that virDomainGetBlockJobInfo() should be fixed "to quit reporting cur==end when ready:false". This allows the existing Nova code work correctly without any changes. Discusion on Eric's reply in this thread: http://lists.openstack.org/pipermail/openstack-dev/2016-October/105194.html And upstream libvir-list https://www.redhat.com/archives/libvir-list/2016-October/msg00347.html -- /kashyap __ OpenStack Development Mailing List (not for usage questions) Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [nova] [libvirt] Debugging blockRebase() - "active block copy not ready for pivot"
On 10/6/2016 7:58 AM, Kashyap Chamarthy wrote: On Thu, Oct 06, 2016 at 01:32:39AM +0200, Kashyap Chamarthy wrote: TL;DR - From the debug analysis of the log below, and discussion with Eric Blake of upstream QEMU / libvirt resulted in the below bug report: https://bugzilla.redhat.com/show_bug.cgi?id=1382165 -- virDomainGetBlockJobInfo: Adjust job reporting based on QEMU stats & the "ready" field of `query-block-jobs` When I raised this on libvirt mailing list[0][1], one of the upstream libvirt devs expressed an NACK in adjusting / "deliberately reporting false data in block info structure". Similar concern was also shared by Matt Booth on #openstack-nova IRC. Next, turns out the READY event is already exposed via the guest XML[1]: - We expose the state of the copy job in the XML and forward the READY event from qemu to the users. A running copy job exposes itself in the xml as: [...] While the ready copy job is exposed as: [...] Additionally we have anyncrhronous events that are emitted once qemu notifies us that the block job has reached sync state or finished. Libvirt uses the event to switch to the ready state. The documentation suggests that block jobs should listen to the events and act accordingly only after receiving the event. - So, Nova's is_job_complete() method & friends need to be reworked to listen on the events for job readiness. [0] https://www.redhat.com/archives/libvir-list/2016-October/msg00217.html [1] https://www.redhat.com/archives/libvir-list/2016-October/msg00229.html [2] https://bugzilla.redhat.com/show_bug.cgi?id=1382165#c3 Details --- The code in Nova that's being executed is this part in _swap_volume() from libvirt/driver.py. [...] # Start copy with VIR_DOMAIN_REBASE_REUSE_EXT flag to # allow writing to existing external volume file dev.rebase(new_path, copy=True, reuse_ext=True) while not dev.is_job_complete(): time.sleep(0.5) dev.abort_job(pivot=True) [...] [...] Thanks for the great libvirtd log analysis, that's really helpful see what's going on and where we fail. I've replied in Matthew's patch, which I think we can get in now regardless and backport. As for the fix, it sounds like mdbooth is going to work on the event listener code, which I'm hesitant to backport, but honestly this is such a latent broken flow that I don't think we really need to backport any fixes, at least for the event listener work to fix this long-term. The swap-volume test is disabled by default in Tempest and we enable it in devstack, so we can control which CI environments it runs in. -- Thanks, Matt Riedemann __ OpenStack Development Mailing List (not for usage questions) Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [nova] [libvirt] Debugging blockRebase() - "active block copy not ready for pivot"
On Thu, Oct 06, 2016 at 09:29:41AM -0500, Eric Blake wrote: > On 10/06/2016 07:58 AM, Kashyap Chamarthy wrote: > > On Thu, Oct 06, 2016 at 01:32:39AM +0200, Kashyap Chamarthy wrote: > >> TL;DR > >> - > >> > >> From the debug analysis of the log below, and discussion with Eric Blake > >> of upstream QEMU / libvirt resulted in the below bug report: > >> > >> https://bugzilla.redhat.com/show_bug.cgi?id=1382165 -- > >> virDomainGetBlockJobInfo: Adjust job reporting based on QEMU stats & the > >> "ready" field of `query-block-jobs` > > > > When I raised this on libvirt mailing list[0][1], one of the upstream > > libvirt devs expressed an NACK in adjusting / "deliberately reporting > > false data in block info structure". Similar concern was also shared by > > Matt Booth on #openstack-nova IRC. > > I disagree with that sentiment. Interesting, I just noticed your argument here: https://www.redhat.com/archives/libvir-list/2016-October/msg00249.html > I think it is libvirt's responsibility to live up to libvirt's promise > of virDomainGetBlockJobInfo() (namely, LIBVIRT documents that cur==end > means the job is stable; and if qemu reports cur==end when the job is > not stable, then it is libvirt that is lying to the upper user if it > does NOT munge qemu's results to be accurate). > > As it is, we already patched libvirt to munge qemu's 0/0 into 0/1 when > ready:false, so munging 123/123 into 122/123 when ready:false would just > be another case of libvirt working around an infelicity of qemu. There > is NO INHERENT MEANING to the magnitude of cur and end, nor any > requirement that end stays the same value across multiple calls to > virDomainGetBlockJobInfo() - they are ONLY useful for a relative > comparison of how much work remains to be done. Munging the results IS > appropriate. Understood, this clarifies it, albeit a little messy. It indeed seems inconsistent to allow it in one case (like the one you allude to above, fixed in 988218ca[1]) to adjust (& _not_ falsify, as you accurately point out) libvirt reporting, but not the other case (cur==end, "ready": false case when cur != 0). So I re-opened the upstream libvirt bug[2] in light of your new comments. [1] http://libvirt.org/git/?p=libvirt.git;a=commitdiff;h=988218ca [2] https://bugzilla.redhat.com/show_bug.cgi?id=1382165 -- virDomainGetBlockJobInfo: Adjust job reporting based on QEMU stats & the "ready" field of `query-block-jobs` > That said, if you are going to work with existing libvirt that does > not munge values, then yes, you either have to implement event > handling or parse XML for the ready status, as existing libvirt's > virDomainGetBlockJobInfo() is insufficient for the task. Yep, noted. Thanks for the great details, as usual. -- /kashyap __ OpenStack Development Mailing List (not for usage questions) Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [nova] [libvirt] Debugging blockRebase() - "active block copy not ready for pivot"
On 10/06/2016 07:58 AM, Kashyap Chamarthy wrote: > On Thu, Oct 06, 2016 at 01:32:39AM +0200, Kashyap Chamarthy wrote: >> TL;DR >> - >> >> From the debug analysis of the log below, and discussion with Eric Blake >> of upstream QEMU / libvirt resulted in the below bug report: >> >> https://bugzilla.redhat.com/show_bug.cgi?id=1382165 -- >> virDomainGetBlockJobInfo: Adjust job reporting based on QEMU stats & the >> "ready" field of `query-block-jobs` > > When I raised this on libvirt mailing list[0][1], one of the upstream > libvirt devs expressed an NACK in adjusting / "deliberately reporting > false data in block info structure". Similar concern was also shared by > Matt Booth on #openstack-nova IRC. I disagree with that sentiment. I think it is libvirt's responsibility to live up to libvirt's promise of virDomainGetBlockJobInfo() (namely, LIBVIRT documents that cur==end means the job is stable; and if qemu reports cur==end when the job is not stable, then it is libvirt that is lying to the upper user if it does NOT munge qemu's results to be accurate). As it is, we already patched libvirt to munge qemu's 0/0 into 0/1 when ready:false, so munging 123/123 into 122/123 when ready:false would just be another case of libvirt working around an infelicity of qemu. There is NO INHERENT MEANING to the magnitude of cur and end, nor any requirement that end stays the same value across multiple calls to virDomainGetBlockJobInfo() - they are ONLY useful for a relative comparison of how much work remains to be done. Munging the results IS appropriate. That said, if you are going to work with existing libvirt that does not munge values, then yes, you either have to implement event handling or parse XML for the ready status, as existing libvirt's virDomainGetBlockJobInfo() is insufficient for the task. -- Eric Blake eblake redhat com+1-919-301-3266 Libvirt virtualization library http://libvirt.org signature.asc Description: OpenPGP digital signature __ OpenStack Development Mailing List (not for usage questions) Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [nova] [libvirt] Debugging blockRebase() - "active block copy not ready for pivot"
On Thu, Oct 06, 2016 at 01:32:39AM +0200, Kashyap Chamarthy wrote: > TL;DR > - > > From the debug analysis of the log below, and discussion with Eric Blake > of upstream QEMU / libvirt resulted in the below bug report: > > https://bugzilla.redhat.com/show_bug.cgi?id=1382165 -- > virDomainGetBlockJobInfo: Adjust job reporting based on QEMU stats & the > "ready" field of `query-block-jobs` When I raised this on libvirt mailing list[0][1], one of the upstream libvirt devs expressed an NACK in adjusting / "deliberately reporting false data in block info structure". Similar concern was also shared by Matt Booth on #openstack-nova IRC. Next, turns out the READY event is already exposed via the guest XML[1]: - We expose the state of the copy job in the XML and forward the READY event from qemu to the users. A running copy job exposes itself in the xml as: [...] While the ready copy job is exposed as: [...] Additionally we have anyncrhronous events that are emitted once qemu notifies us that the block job has reached sync state or finished. Libvirt uses the event to switch to the ready state. The documentation suggests that block jobs should listen to the events and act accordingly only after receiving the event. - So, Nova's is_job_complete() method & friends need to be reworked to listen on the events for job readiness. [0] https://www.redhat.com/archives/libvir-list/2016-October/msg00217.html [1] https://www.redhat.com/archives/libvir-list/2016-October/msg00229.html [2] https://bugzilla.redhat.com/show_bug.cgi?id=1382165#c3 > > Details > --- > > The code in Nova that's being executed is this part in _swap_volume() > from libvirt/driver.py. > > [...] > # Start copy with VIR_DOMAIN_REBASE_REUSE_EXT flag to > # allow writing to existing external volume file > dev.rebase(new_path, copy=True, reuse_ext=True) > > while not dev.is_job_complete(): > time.sleep(0.5) > > > dev.abort_job(pivot=True) > [...] > [...] -- /kashyap __ OpenStack Development Mailing List (not for usage questions) Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
[openstack-dev] [nova] [libvirt] Debugging blockRebase() - "active block copy not ready for pivot"
TL;DR - >From the debug analysis of the log below, and discussion with Eric Blake of upstream QEMU / libvirt resulted in the below bug report: https://bugzilla.redhat.com/show_bug.cgi?id=1382165 -- virDomainGetBlockJobInfo: Adjust job reporting based on QEMU stats & the "ready" field of `query-block-jobs` Until the above is fixed, I think _swap_volume() method in Nova could either: (a) just try to either wait until it sees the BLOCK_JOB_READY event, which is emitted when a 'drive-mirror' job. (b) Or keep polling, until we see "ready": true field (in the response of `query-block-jobs` inviked via libvirt blockJobInfo() Details --- The code in Nova that's being executed is this part in _swap_volume() from libvirt/driver.py. [...] # Start copy with VIR_DOMAIN_REBASE_REUSE_EXT flag to # allow writing to existing external volume file dev.rebase(new_path, copy=True, reuse_ext=True) while not dev.is_job_complete(): time.sleep(0.5) dev.abort_job(pivot=True) [...] (Optional Reading) libvirt / QEMU traffic log analysis -- Libvirt debug log (NB: 61MB) http://logs.openstack.org/73/374373/2/check/gate-tempest-dsvm-full-ubuntu-xenial/149fe3e/logs/libvirt/libvirtd.txt.gz Failed test: tempest.api.compute.admin.test_volume_swap.TestVolumeSwap.test_volume_swap (I line-wrapped the logs manually.) (1) We see _swap_volume() invokes the Libvirt blockRebase(), which calls 'drive-mirror' (ID: libvirt-25) - 2016-10-04 15:54:45.308+: 30554: debug : qemuMonitorJSONCommandWithFd:291 : Send command '{"execute":"drive-mirror","arguments":{"device":"drive-virtio-disk1","target":"/dev/disk/by-path /ip-10.205.221.168:3260-iscsi-iqn.2010-10.org.openstack:volume-7b86913d-78e4-4b68-8df2-63f1a4c6656a-lun-1","sync":"full","mode":"existing","format":"raw"},"id":"libvirt-25"}' for write with FD -1 - (2.a) Then, libvirt queries QEMU for job status (ID: 'libvirt-26') via `query-block-jobs` command: - 2016-10-04 15:54:45.332+: 30550: info : qemuMonitorIOWrite:528 : QEMU_MONITOR_IO_WRITE: mon=0x7fa43000ee60 buf={"execute":"query-block-jobs","id":"libvirt-26"} - (2.b) Response for libvirt-26. (Here we see: "len" == 1073741824, "offset" == 0. The copy job has just begun). - 2016-10-04 15:54:45.332+: 30550: info : qemuMonitorJSONIOProcessLine:206 : QEMU_MONITOR_RECV_REPLY: mon=0x7fa43000ee60 reply={"return": [{"io-status": "ok", "device": "drive-virtio-disk1", "busy": true, "len": 1073741824, "offset": 0, "paused": false, "speed": 0, "ready": false, "type": "mirror"}], "id": "libvirt-26"} - [Libvirt keeps on querying for job status, via `query-block-jobs`] [Snip three polling requests 27, 28, 29] (3.a) Next query (ID: libvirt-30): - 2016-10-04 15:54:50.060+: 30550: info : qemuMonitorIOWrite:528 : QEMU_MONITOR_IO_WRITE: mon=0x7fa43000ee60 buf={"execute":"query-block-jobs","id":"libvirt-30"} - (3.b) Response for libvirt-30. (Here we see: "len" == 1073741824, "offset" == 734003200 in bytes. Copy operation is still in progress.) - 2016-10-04 15:54:50.061+: 30550: info : qemuMonitorJSONIOProcessLine:206 : QEMU_MONITOR_RECV_REPLY: mon=0x7fa43000ee60 reply={"return": [{"io-status": "ok", "device": "drive-virtio-disk1", "busy": true, "len": 1073741824, "offset": 734003200, "paused": false, "speed": 0, "ready": false, "type": "mirror"}], "id": "libvirt-30"} - [Snip four (IDs, 31, 32, 33, 34) more job status request / response round-trips.] (4.a) Query for job status again (ID: libvirt-35): - 2016-10-04 15:54:50.060+: 30550: info : qemuMonitorIOWrite:528 : QEMU_MONITOR_IO_WRITE: mon=0x7fa43000ee60 buf={"execute":"query-block-jobs","id":"libvirt-30"} - (4.b) In response, finally, we see len (1073741824) == offset (1073741824): - 2016-10-04 15:54:52.076+: 30550: info : qemuMonitorJSONIOProcessLine:206 : QEMU_MONITOR_RECV_REPLY: mon=0x7fa43000ee60 reply={"return": [{"io-status": "ok", "device": "drive-virtio-disk1", "busy": true, "len": 1073741824, "offset": 1073741824, "paused": false, "speed": 0, "ready": false, "type": "mirror"}], "id": "libvirt-35"} - Above, the "ready" flag does not yet indicate 'true', which signals the job has _really_ ended. Drilling down further (5) The abort + pivot failed (as Nova calls "dev.abort_job(pivot=True)"): --- 2016-10-04 15:54:52.076+: 30555: debug : qemuDomainObjExitMonitorInternal:1923 : Exited monitor (mon=0x7fa43000ee60 vm=0x7fa414002a70 name=instance-0008) 2016-10-04 15:54:52.076+: 30555: error : qemuDomainBlockPivot:16137 : block copy still active: disk 'vdb' not ready for pivot yet 2016-10-04 15:54:52.076+: 30555: debug : qemuBlockJobSyncEnd:242 : disk=vdb 2016-10-04 15:54:52.076+: 30555: debug : qemuDomainObjEndJob:1831 : Stopping job: modify (async=none vm=0x7fa414002a70 name=instance-0008) --- (6) HOWEVER, *here* we see