Hi Paul,

Comments inline:

2015-11-23 16:36 GMT+08:00 Paul Carlton <[email protected]>:
> John
>
> At the live migration sub team meeting I undertook to look at the issue
> of progress reporting.
>
> The use cases I'm envisaging are...
>
> As a user I want to know how much longer my instance will be migrating
> for.
>
> As an operator I want to identify any migrations that are making slow
> progress so I can expedite their progress or abort them.
>
> The current implementation reports on the instance's migration with
> respect to memory transfer, using the total memory and memory remaining
> fields from libvirt to report the percentage of memory still to be
> transferred. Due to the instance writing to pages already transferred
> this percentage can go up as well as down. Daniel has done a good job
> of generating regular log records to report progress and highlight lack
> of progress but from the API all a user/operator can see is the current
> percentage complete. By observing this periodically they can identify
> instance migrations that are struggling to migrate memory pages fast
> enough to keep pace with the instance's memory updates.
>
> The problem is that at present we have only one field, the instance
> progress, to record progress. With a live migration there are measures

[Shaohe]: The OpenStack API reference
http://developer.openstack.org/api-ref-compute-v2.1.html#listDetailServers
describes the instance progress as "A percentage value of the build
progress." With the libvirt driver, however, this field actually holds the
migration progress; with the other drivers it is the build progress.
There is also a spec that proposes some changes here:
https://review.openstack.org/#/c/249086/

> of progress, how much of the ephemeral disks (not needed for shared
> disk setups) have been copied and how much of the memory has been
> copied. Both can go up and down as the instance writes to pages already
> copied causing those pages to need to be copied again.
> As Daniel says
> in his comments in the code, the disk size could dwarf the memory so
> reporting both in a single percentage number is problematic.
>
> We could add an additional progress item to the instance object, i.e.
> disk progress and memory progress but that seems odd to have an
> additional progress field only for this operation so this is probably
> a non starter!
>
> For operations staff with access to log files we could report disk
> progress as well as memory in the log file, however that does not
> address the needs of users and whilst log files are the right place for
> support staff to look when investigating issues operational tooling
> is much better served by notification messages.
>
> Thus I'd recommend generating periodic notifications during a migration
> to report both memory and disk progress would be useful? Cloud
> operators are likely to manage their instance migration activity using
> some orchestration tooling which could consume these notifications and
> deduce what challenges the instance migration is encountering and thus
> determine how to address any issues.
>
> The use cases are only partially addressed by the current
> implementation, they can repeatedly get the server details and look at
> the progress percentage to see how quickly (or even if) it is
> increasing and determine how long the instance is likely to be
> migrating for. However for an instance that has a large disk and/or
> is doing a high rate of disk i/o they may see the percentage complete
> (i.e. memory) repeatedly showing 90%+ but the instance migration does
> not complete.
>
> The nova spec https://review.openstack.org/#/c/248472/ suggests making
> detailed information available via the os-migrations object. This is
> not a bad idea but I have some issues with the implementation that I
> will share on that spec.
>

[Shaohe]: Daniel has given some comments on this spec, and we have updated
it. Maybe we can work together on it to improve it further.
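
To make the two measures of progress concrete, here is a rough sketch in
Python. It is not Nova code: ProgressSample, percent_done and is_stalled
are hypothetical names, and I am only assuming that the byte counters come
from whatever the driver reports (for libvirt, the memory total and memory
remaining fields you mention). It keeps memory and disk separate, and it
flags a migration whose remaining data is not shrinking, which is the
"90%+ but never completes" case.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ProgressSample:
    """One observation of a live migration (hypothetical structure)."""
    timestamp: float       # seconds since some reference point
    memory_total: int      # bytes
    memory_remaining: int  # bytes
    disk_total: int        # bytes, 0 for shared disk setups
    disk_remaining: int    # bytes


def percent_done(total: int, remaining: int) -> Optional[float]:
    """Return the percentage already copied, or None if nothing is to be copied."""
    if total <= 0:
        return None
    return 100.0 * (total - remaining) / total


def is_stalled(samples: List[ProgressSample], window: int = 3) -> bool:
    """Return True if memory + disk remaining did not shrink over the last `window` samples."""
    if len(samples) < window:
        return False
    recent = samples[-window:]
    first = recent[0].memory_remaining + recent[0].disk_remaining
    last = recent[-1].memory_remaining + recent[-1].disk_remaining
    return last >= first
```

A periodic notification could carry both percentages side by side, and
the orchestration tooling could apply a check like is_stalled to decide
whether to expedite or abort the migration. The window of 3 samples is an
arbitrary choice for illustration.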
I have worked on multi-thread compression for live migration in libvirt,
and have looked into some live migration performance optimizations. That
led to the following ideas:

1. Let nova expose more live migration details, such as the RAM
   statistics, the xbzrle-cache status and, in future, the multi-thread
   compression information, and so on.
2. Nova can enable auto-converge and tune the xbzrle-cache and
   multi-thread compression dynamically.
3. Other projects can then build a good strategy for tuning the live
   migration based on the migration details.

For example: the cache size is key to xbzrle performance. Ideally the
cache is the same size as the guest's total RAM, but that much memory may
not always be available on the host. A higher multi-thread compression
level is better, but it consumes more CPU. Auto-converge slows down the
guest's CPUs. So things are not always as good as I had expected.

We also submitted a talk about this idea to the summit, but it was not
accepted.
Topic: "Towards Robust Live Migration in Dynamic Environments"
Link:
https://www.openstack.org/summit/tokyo-2015/vote-for-speakers/presentation/4971

We also looked into other hypervisors, and they do not expose as many
details. Daniel is right: we should not expose such low-level,
QEMU-specific implementation details.

>
> --
> Paul Carlton
> Software Engineer
> Cloud Services
> Hewlett Packard Enterprise
>
> __________________________________________________________________________
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: [email protected]?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
