On 09/20/2016 11:20 AM, Daniel P. Berrange wrote:
> On Tue, Sep 20, 2016 at 11:01:23AM -0400, Sean Dague wrote:
>> On 09/20/2016 10:38 AM, Daniel P. Berrange wrote:
>>> On Tue, Sep 20, 2016 at 09:20:15AM -0400, Sean Dague wrote:
>>>> This is a bit delayed due to the release rush, finally getting back to
>>>> writing up my experiences at the Ops Meetup.
>>>> Nova Feedback Session
>>>> =====================
>>>> We had a double session for Feedback for Nova from Operators, raw
>>>> etherpad here - https://etherpad.openstack.org/p/NYC-ops-Nova.
>>>> The median release in the room was Kilo. Some were upgrading to
>>>> Liberty, and many had clouds older than Kilo. Keep in mind that these
>>>> are the larger ops environments, engaged enough with the community to
>>>> send people to the Ops Meetup.
>>>> Performance Bottlenecks
>>>> -----------------------
>>>> * scheduling issues with Ironic (this is a bug we worked through
>>>>   during the week after the session)
>>>> * live snapshots actually end up being a performance issue for people
>>>> The workarounds config group was not well known, and everyone in the
>>>> room wished we advertised it a bit more. The solution for snapshot
>>>> performance is in there (example below).
>>>> There were also general questions about the scale at which cells
>>>> should be considered.
>>>> ACTION: we should make sure workarounds are advertised better
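
For anyone looking for it, the option in question lives in the
workarounds group of nova.conf. A minimal fragment, using the option
name quoted later in this thread (False enables live snapshots):

    [workarounds]
    disable_libvirt_livesnapshot = False
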
>>> Workarounds ought to be something that admins rarely, if ever,
>>> have to deal with.
>>> If the lack of live snapshots is such a major performance problem
>>> for ops, this tends to suggest that our default behaviour is wrong,
>>> rather than that we need to better publicise the workaround
>>> operators should set.
>>> e.g., instead of optimizing by default for the case of broken live
>>> snapshot support, we should optimize by default for the case of
>>> working live snapshot support. The broken live snapshot bug was so
>>> rare that no one has ever reproduced it outside of the gate,
>>> AFAIK.
>>> IOW, rather than hardcoding disable_libvirt_livesnapshot=True in
>>> Nova, we should just set it in the gate CI configs (sketch below)
>>> and leave it set to False in Nova, so operators get good
>>> performance out of the box.
>>> Also, it has been a while since we added the workaround, and IIRC
>>> we've got newer Ubuntu available on at least some of the gate
>>> hosts now, so we can test whether the bug still hits newer Ubuntu.
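
A sketch of what "set it in the gate CI configs" could look like for a
devstack-based gate job, assuming the standard local.conf post-config
mechanism (the exact job plumbing is elided):

    [[post-config|$NOVA_CONF]]
    [workarounds]
    disable_libvirt_livesnapshot = True
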
>> Here is my reconstruction of the snapshot issue from what I can remember
>> of the conversation.
>> Nova defaults to live snapshots. This uses the libvirt facility that
>> dumps both memory and disk, and then we throw away the memory. For
>> large-memory guests (especially volume-backed ones that might have a
>> fast path for the disk) this leads to a lot of overhead for no gain.
>> The workaround got them past it.
> I think you've got it backwards there.
> Nova defaults to *not* using live snapshots:
>     cfg.BoolOpt(
>         'disable_libvirt_livesnapshot',
>         default=True,
>         help="""
> Disable live snapshots when using the libvirt driver.
> ...""")
> When live snapshot is disabled like this, the snapshot code is unable
> to guarantee a consistent disk state. So the libvirt nova driver will
> stop the guest by doing a managed save (this saves all memory to
> disk), then do the disk snapshot, then restore the managed save
> (which loads all memory from disk).
> This is terrible for multiple reasons:
>   1. the guest workload stops running while the snapshot is taken
>   2. we churn disk I/O saving & loading VM memory
>   3. you can't do it at all if host PCI devices are attached to
>      the VM
> Enabling live snapshots by default fixes all these problems, at the
> risk of hitting the live snapshot bug we saw in the gate CI but never
> anywhere else.
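
To make the cost of the non-live path concrete, here is a minimal
libvirt-python sketch of the sequence described above (the domain name
is hypothetical, and the actual disk snapshot step is elided):

    import libvirt

    conn = libvirt.open('qemu:///system')
    dom = conn.lookupByName('instance-00000001')  # hypothetical guest

    dom.managedSave(0)  # stops the guest, writing all of its memory to disk
    # ... take the disk snapshot here, while the guest is down ...
    dom.create()        # restarts the guest, loading all memory back from disk
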

Ah, right. I'll propose inverting the default and we'll see if we can
get past the testing in the gate - https://review.openstack.org/#/c/373430/
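
Presumably the flip is just a change to the default on the option
definition quoted above, something like:

    cfg.BoolOpt(
        'disable_libvirt_livesnapshot',
        default=False,
        help="""
Disable live snapshots when using the libvirt driver.
...""")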


Sean Dague
