Re: [openstack-dev] [nova] [libvirt] Debugging blockRebase() - "active block copy not ready for pivot"

2016-10-11 Thread Matthew Booth
On Mon, Oct 10, 2016 at 5:54 PM, Matt Riedemann 
wrote:

> On 10/10/2016 6:26 AM, Kashyap Chamarthy wrote:
>
>>
>> Also, maybe you have noticed, given Eric's reply on this thread (and the
>> upstream libvir-list), it is agreed that virDomainGetBlockJobInfo()
>> should be fixed "to quit reporting cur==end when ready:false".  This
>> allows the existing Nova code work correctly without any changes.
>>
>> Discusion on Eric's reply in this thread:
>>
>> http://lists.openstack.org/pipermail/openstack-dev/2016-Octo
>> ber/105194.html
>>
>> And upstream libvir-list
>>
>> https://www.redhat.com/archives/libvir-list/2016-October/
>> msg00347.html
>>
>>
> Yeah, that's goodness long-term, but OpenStack CI won't benefit from that
> for probably at least a couple of years.


The hypervisor is a (the?) critical component of any cloud deployment.
Objectively, it's bizarre that we expect people to deploy our brand new
code to work round things that were fixed in the hypervisor 2 years ago.

Given that's where we are, though, I agree.

Matt
-- 
Matthew Booth
Red Hat Engineering, Virtualisation Team

Phone: +442070094448 (UK)
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] [libvirt] Debugging blockRebase() - "active block copy not ready for pivot"

2016-10-10 Thread Matt Riedemann

On 10/10/2016 6:26 AM, Kashyap Chamarthy wrote:


Also, maybe you have noticed, given Eric's reply on this thread (and the
upstream libvir-list), it is agreed that virDomainGetBlockJobInfo()
should be fixed "to quit reporting cur==end when ready:false".  This
allows the existing Nova code work correctly without any changes.

Discusion on Eric's reply in this thread:

http://lists.openstack.org/pipermail/openstack-dev/2016-October/105194.html

And upstream libvir-list

https://www.redhat.com/archives/libvir-list/2016-October/msg00347.html



Yeah, that's goodness long-term, but OpenStack CI won't benefit from 
that for probably at least a couple of years.


--

Thanks,

Matt Riedemann


__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] [libvirt] Debugging blockRebase() - "active block copy not ready for pivot"

2016-10-10 Thread Kashyap Chamarthy
On Fri, Oct 07, 2016 at 03:32:14PM -0500, Matt Riedemann wrote:
> On 10/6/2016 7:58 AM, Kashyap Chamarthy wrote:
> > -
> > We expose the state of the copy job in the XML and forward the READY
> > event from qemu to the users.
> > 
> > A running copy job exposes itself in the xml as:
> > 
> > 
> >   
> >   
> >   
> >   
> > 
> > 
> >   
> >   [...]
> > 
> > 
> > While the ready copy job is exposed as:
> > 
> > 
> >   
> >   
> >   
> >> ready='yes'>
> > 
> > 
> >   
> >   [...]
> > 

[...]

> Thanks for the great libvirtd log analysis, that's really helpful see what's
> going on and where we fail.

Took some time to write that, glad to know that at least one person has
read it :-)

> I've replied in Matthew's patch, which I think we can get in now regardless
> and backport.

Yes, noticed it just now.

> As for the fix, it sounds like mdbooth is going to work on the event
> listener code, which I'm hesitant to backport, but honestly this is such a
> latent broken flow that I don't think we really need to backport any fixes,
> at least for the event listener work to fix this long-term. The swap-volume
> test is disabled by default in Tempest and we enable it in devstack, so we
> can control which CI environments it runs in.
 
Agreed.


Also, maybe you have noticed, given Eric's reply on this thread (and the
upstream libvir-list), it is agreed that virDomainGetBlockJobInfo()
should be fixed "to quit reporting cur==end when ready:false".  This
allows the existing Nova code work correctly without any changes.

Discusion on Eric's reply in this thread:

http://lists.openstack.org/pipermail/openstack-dev/2016-October/105194.html

And upstream libvir-list

https://www.redhat.com/archives/libvir-list/2016-October/msg00347.html

-- 
/kashyap

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] [libvirt] Debugging blockRebase() - "active block copy not ready for pivot"

2016-10-07 Thread Matt Riedemann

On 10/6/2016 7:58 AM, Kashyap Chamarthy wrote:

On Thu, Oct 06, 2016 at 01:32:39AM +0200, Kashyap Chamarthy wrote:

TL;DR
-

From the debug analysis of the log below, and discussion with Eric Blake
of upstream QEMU / libvirt resulted in the below bug report:

  https://bugzilla.redhat.com/show_bug.cgi?id=1382165 --
  virDomainGetBlockJobInfo: Adjust job reporting based on QEMU stats & the
  "ready" field of `query-block-jobs`


When I raised this on libvirt mailing list[0][1], one of the upstream
libvirt devs expressed an NACK in adjusting / "deliberately reporting
false data in block info structure".  Similar concern was also shared by
Matt Booth on #openstack-nova IRC.

Next, turns out the READY event is already exposed via the guest XML[1]:

-
We expose the state of the copy job in the XML and forward the READY
event from qemu to the users.

A running copy job exposes itself in the xml as:


  
  
  
  


  
  [...]


While the ready copy job is exposed as:


  
  
  
  


  
  [...]



Additionally we have anyncrhronous events that are emitted once qemu
notifies us that the block job has reached sync state or finished.
Libvirt uses the event to switch to the ready state.

The documentation suggests that block jobs should listen to the events
and act accordingly only after receiving the event.
-

So, Nova's is_job_complete() method & friends need to be reworked to
listen on the events for job readiness.

[0]
https://www.redhat.com/archives/libvir-list/2016-October/msg00217.html
[1] https://www.redhat.com/archives/libvir-list/2016-October/msg00229.html
[2] https://bugzilla.redhat.com/show_bug.cgi?id=1382165#c3



Details
---

The code in Nova that's being executed is this part in _swap_volume()
from libvirt/driver.py.

[...]
# Start copy with VIR_DOMAIN_REBASE_REUSE_EXT flag to
# allow writing to existing external volume file
dev.rebase(new_path, copy=True, reuse_ext=True)

while not dev.is_job_complete():
time.sleep(0.5)


dev.abort_job(pivot=True)
[...]



[...]



Thanks for the great libvirtd log analysis, that's really helpful see 
what's going on and where we fail.


I've replied in Matthew's patch, which I think we can get in now 
regardless and backport.


As for the fix, it sounds like mdbooth is going to work on the event 
listener code, which I'm hesitant to backport, but honestly this is such 
a latent broken flow that I don't think we really need to backport any 
fixes, at least for the event listener work to fix this long-term. The 
swap-volume test is disabled by default in Tempest and we enable it in 
devstack, so we can control which CI environments it runs in.


--

Thanks,

Matt Riedemann


__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] [libvirt] Debugging blockRebase() - "active block copy not ready for pivot"

2016-10-06 Thread Kashyap Chamarthy
On Thu, Oct 06, 2016 at 09:29:41AM -0500, Eric Blake wrote:
> On 10/06/2016 07:58 AM, Kashyap Chamarthy wrote:
> > On Thu, Oct 06, 2016 at 01:32:39AM +0200, Kashyap Chamarthy wrote:
> >> TL;DR
> >> -
> >>
> >> From the debug analysis of the log below, and discussion with Eric Blake
> >> of upstream QEMU / libvirt resulted in the below bug report:
> >>
> >>   https://bugzilla.redhat.com/show_bug.cgi?id=1382165 --
> >>   virDomainGetBlockJobInfo: Adjust job reporting based on QEMU stats & the
> >>   "ready" field of `query-block-jobs`
> > 
> > When I raised this on libvirt mailing list[0][1], one of the upstream
> > libvirt devs expressed an NACK in adjusting / "deliberately reporting
> > false data in block info structure".  Similar concern was also shared by
> > Matt Booth on #openstack-nova IRC.
> 
> I disagree with that sentiment. 

Interesting, I just noticed your argument here:

https://www.redhat.com/archives/libvir-list/2016-October/msg00249.html

> I think it is libvirt's responsibility to live up to libvirt's promise
> of virDomainGetBlockJobInfo() (namely, LIBVIRT documents that cur==end
> means the job is stable; and if qemu reports cur==end when the job is
> not stable, then it is libvirt that is lying to the upper user if it
> does NOT munge qemu's results to be accurate).
>
> As it is, we already patched libvirt to munge qemu's 0/0 into 0/1 when
> ready:false, so munging 123/123 into 122/123 when ready:false would just
> be another case of libvirt working around an infelicity of qemu.  There
> is NO INHERENT MEANING to the magnitude of cur and end, nor any
> requirement that end stays the same value across multiple calls to
> virDomainGetBlockJobInfo() - they are ONLY useful for a relative
> comparison of how much work remains to be done.  Munging the results IS
> appropriate.

Understood, this clarifies it, albeit a little messy. 

It indeed seems inconsistent to allow it in one case (like the one you
allude to above, fixed in 988218ca[1]) to adjust (& _not_ falsify, as
you accurately point out) libvirt reporting, but not the other case
(cur==end, "ready": false case when cur != 0).

So I re-opened the upstream libvirt bug[2] in light of your new
comments.
 

[1] http://libvirt.org/git/?p=libvirt.git;a=commitdiff;h=988218ca
[2] https://bugzilla.redhat.com/show_bug.cgi?id=1382165 --
virDomainGetBlockJobInfo: Adjust job reporting based on QEMU stats &
the "ready" field of `query-block-jobs`

> That said, if you are going to work with existing libvirt that does
> not munge values, then yes, you either have to implement event
> handling or parse XML for the ready status, as existing libvirt's
> virDomainGetBlockJobInfo() is insufficient for the task.

Yep, noted.

Thanks for the great details, as usual.


-- 
/kashyap

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] [libvirt] Debugging blockRebase() - "active block copy not ready for pivot"

2016-10-06 Thread Eric Blake
On 10/06/2016 07:58 AM, Kashyap Chamarthy wrote:
> On Thu, Oct 06, 2016 at 01:32:39AM +0200, Kashyap Chamarthy wrote:
>> TL;DR
>> -
>>
>> From the debug analysis of the log below, and discussion with Eric Blake
>> of upstream QEMU / libvirt resulted in the below bug report:
>>
>>   https://bugzilla.redhat.com/show_bug.cgi?id=1382165 --
>>   virDomainGetBlockJobInfo: Adjust job reporting based on QEMU stats & the
>>   "ready" field of `query-block-jobs`
> 
> When I raised this on libvirt mailing list[0][1], one of the upstream
> libvirt devs expressed an NACK in adjusting / "deliberately reporting
> false data in block info structure".  Similar concern was also shared by
> Matt Booth on #openstack-nova IRC.

I disagree with that sentiment.  I think it is libvirt's responsibility
to live up to libvirt's promise of virDomainGetBlockJobInfo() (namely,
LIBVIRT documents that cur==end means the job is stable; and if qemu
reports cur==end when the job is not stable, then it is libvirt that is
lying to the upper user if it does NOT munge qemu's results to be accurate).

As it is, we already patched libvirt to munge qemu's 0/0 into 0/1 when
ready:false, so munging 123/123 into 122/123 when ready:false would just
be another case of libvirt working around an infelicity of qemu.  There
is NO INHERENT MEANING to the magnitude of cur and end, nor any
requirement that end stays the same value across multiple calls to
virDomainGetBlockJobInfo() - they are ONLY useful for a relative
comparison of how much work remains to be done.  Munging the results IS
appropriate.

That said, if you are going to work with existing libvirt that does not
munge values, then yes, you either have to implement event handling or
parse XML for the ready status, as existing libvirt's
virDomainGetBlockJobInfo() is insufficient for the task.

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] [libvirt] Debugging blockRebase() - "active block copy not ready for pivot"

2016-10-06 Thread Kashyap Chamarthy
On Thu, Oct 06, 2016 at 01:32:39AM +0200, Kashyap Chamarthy wrote:
> TL;DR
> -
> 
> From the debug analysis of the log below, and discussion with Eric Blake
> of upstream QEMU / libvirt resulted in the below bug report:
> 
>   https://bugzilla.redhat.com/show_bug.cgi?id=1382165 --
>   virDomainGetBlockJobInfo: Adjust job reporting based on QEMU stats & the
>   "ready" field of `query-block-jobs`

When I raised this on libvirt mailing list[0][1], one of the upstream
libvirt devs expressed an NACK in adjusting / "deliberately reporting
false data in block info structure".  Similar concern was also shared by
Matt Booth on #openstack-nova IRC.

Next, turns out the READY event is already exposed via the guest XML[1]:

-
We expose the state of the copy job in the XML and forward the READY
event from qemu to the users.

A running copy job exposes itself in the xml as:


  
  
  
  


  
  [...]


While the ready copy job is exposed as:


  
  
  
  


  
  [...]



Additionally we have anyncrhronous events that are emitted once qemu
notifies us that the block job has reached sync state or finished.
Libvirt uses the event to switch to the ready state.

The documentation suggests that block jobs should listen to the events
and act accordingly only after receiving the event.
-

So, Nova's is_job_complete() method & friends need to be reworked to
listen on the events for job readiness.

[0]
https://www.redhat.com/archives/libvir-list/2016-October/msg00217.html
[1] https://www.redhat.com/archives/libvir-list/2016-October/msg00229.html
[2] https://bugzilla.redhat.com/show_bug.cgi?id=1382165#c3
 
> 
> Details
> ---
> 
> The code in Nova that's being executed is this part in _swap_volume()
> from libvirt/driver.py.
> 
> [...]
> # Start copy with VIR_DOMAIN_REBASE_REUSE_EXT flag to
> # allow writing to existing external volume file
> dev.rebase(new_path, copy=True, reuse_ext=True)
> 
> while not dev.is_job_complete():
> time.sleep(0.5)
>
> 
> dev.abort_job(pivot=True)
> [...]
> 

[...]

-- 
/kashyap

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


[openstack-dev] [nova] [libvirt] Debugging blockRebase() - "active block copy not ready for pivot"

2016-10-05 Thread Kashyap Chamarthy
TL;DR
-

>From the debug analysis of the log below, and discussion with Eric Blake
of upstream QEMU / libvirt resulted in the below bug report:

  https://bugzilla.redhat.com/show_bug.cgi?id=1382165 --
  virDomainGetBlockJobInfo: Adjust job reporting based on QEMU stats & the
  "ready" field of `query-block-jobs`

Until the above is fixed, I think _swap_volume() method in Nova could
either:

  (a) just try to either wait until it sees the BLOCK_JOB_READY event,
  which is emitted when a 'drive-mirror' job.  

  (b) Or keep polling, until we see "ready": true field (in the response
  of `query-block-jobs` inviked via libvirt blockJobInfo() 


Details
---

The code in Nova that's being executed is this part in _swap_volume()
from libvirt/driver.py.

[...]
# Start copy with VIR_DOMAIN_REBASE_REUSE_EXT flag to
# allow writing to existing external volume file
dev.rebase(new_path, copy=True, reuse_ext=True)

while not dev.is_job_complete():
time.sleep(0.5)

dev.abort_job(pivot=True)
[...]


(Optional Reading) libvirt / QEMU traffic log analysis 
--

Libvirt debug log (NB: 61MB)
http://logs.openstack.org/73/374373/2/check/gate-tempest-dsvm-full-ubuntu-xenial/149fe3e/logs/libvirt/libvirtd.txt.gz
Failed test: 
tempest.api.compute.admin.test_volume_swap.TestVolumeSwap.test_volume_swap

(I line-wrapped the logs manually.)

(1) We see _swap_volume() invokes the Libvirt blockRebase(), which calls
'drive-mirror' (ID: libvirt-25)

-
2016-10-04 15:54:45.308+: 30554: debug : qemuMonitorJSONCommandWithFd:291 :
Send command
'{"execute":"drive-mirror","arguments":{"device":"drive-virtio-disk1","target":"/dev/disk/by-path
/ip-10.205.221.168:3260-iscsi-iqn.2010-10.org.openstack:volume-7b86913d-78e4-4b68-8df2-63f1a4c6656a-lun-1","sync":"full","mode":"existing","format":"raw"},"id":"libvirt-25"}'
for write with FD -1
-


(2.a) Then, libvirt queries QEMU for job status (ID: 'libvirt-26') via
`query-block-jobs` command:

-
2016-10-04 15:54:45.332+: 30550: info : qemuMonitorIOWrite:528 :
QEMU_MONITOR_IO_WRITE: mon=0x7fa43000ee60
buf={"execute":"query-block-jobs","id":"libvirt-26"}
-

(2.b) Response for libvirt-26.  (Here we see: "len" == 1073741824,
"offset" == 0.  The copy job has just begun).

-
2016-10-04 15:54:45.332+: 30550: info :
qemuMonitorJSONIOProcessLine:206 : QEMU_MONITOR_RECV_REPLY:
mon=0x7fa43000ee60 reply={"return": [{"io-status": "ok", "device":
"drive-virtio-disk1", "busy": true, "len": 1073741824, "offset": 0,
"paused": false, "speed": 0, "ready": false, "type": "mirror"}], "id":
"libvirt-26"}
-

[Libvirt keeps on querying for job status, via `query-block-jobs`]
[Snip three polling requests 27, 28, 29]


(3.a) Next query (ID: libvirt-30):

-
2016-10-04 15:54:50.060+: 30550: info : qemuMonitorIOWrite:528 :
QEMU_MONITOR_IO_WRITE: mon=0x7fa43000ee60
buf={"execute":"query-block-jobs","id":"libvirt-30"}
-

(3.b) Response for libvirt-30.  (Here we see: "len" == 1073741824,
"offset" == 734003200 in bytes.  Copy operation is still in progress.)

-
2016-10-04 15:54:50.061+: 30550: info :
qemuMonitorJSONIOProcessLine:206 : QEMU_MONITOR_RECV_REPLY:
mon=0x7fa43000ee60 reply={"return": [{"io-status": "ok", "device":
"drive-virtio-disk1", "busy": true, "len": 1073741824, "offset":
734003200, "paused": false, "speed": 0, "ready": false, "type":
"mirror"}], "id": "libvirt-30"}
-

[Snip four (IDs, 31, 32, 33, 34) more job status request / response
round-trips.]


(4.a) Query for job status again (ID: libvirt-35):

-
2016-10-04 15:54:50.060+: 30550: info : qemuMonitorIOWrite:528 :
QEMU_MONITOR_IO_WRITE: mon=0x7fa43000ee60
buf={"execute":"query-block-jobs","id":"libvirt-30"}
-

(4.b) In response, finally, we see len (1073741824) == offset (1073741824):

-
2016-10-04 15:54:52.076+: 30550: info :
qemuMonitorJSONIOProcessLine:206 : QEMU_MONITOR_RECV_REPLY:
mon=0x7fa43000ee60 reply={"return": [{"io-status": "ok", "device":
"drive-virtio-disk1", "busy": true, "len": 1073741824, "offset":
1073741824, "paused": false, "speed": 0, "ready": false, "type":
"mirror"}], "id": "libvirt-35"}
-

Above, the "ready" flag does not yet indicate 'true', which signals the
job has _really_ ended.


Drilling down further


(5) The abort + pivot failed (as Nova calls
"dev.abort_job(pivot=True)"):

---
2016-10-04 15:54:52.076+: 30555: debug :
qemuDomainObjExitMonitorInternal:1923 : Exited monitor (mon=0x7fa43000ee60
vm=0x7fa414002a70 name=instance-0008)

2016-10-04 15:54:52.076+: 30555: error : qemuDomainBlockPivot:16137 : block
copy still active: disk 'vdb' not ready for pivot yet

2016-10-04 15:54:52.076+: 30555: debug : qemuBlockJobSyncEnd:242 : disk=vdb

2016-10-04 15:54:52.076+: 30555: debug : qemuDomainObjEndJob:1831 :
Stopping job: modify (async=none vm=0x7fa414002a70 name=instance-0008)
---

(6) HOWEVER, *here* we see