Reviewed:  https://review.opendev.org/c/openstack/nova/+/828387
Committed: https://opendev.org/openstack/nova/commit/de110b042d8e340d19a52b9fb7ef6f4c52bc0762
Submitter: "Zuul (22348)"
Branch:    master

commit de110b042d8e340d19a52b9fb7ef6f4c52bc0762
Author: Pedro Almeida <[email protected]>
Date:   Tue Feb 8 14:51:46 2022 -0300

    Update live_migration_downtime definition

    Before, the definition of live_migration_downtime didn't explain
    if any exception/timeout occurs if the migration exceeds the value.
    This is just used as a reference for nova and if any problem happens
    when the VM gets paused, there will be no abort or force-complete.

    Closes-Bug: #1960345
    Signed-off-by: Pedro Almeida <[email protected]>
    Change-Id: I336481d1801a367b5628fedcd2aa5f5cf763355a

** Changed in: nova
       Status: In Progress => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1960345

Title:
  Nova documentation isn't clear enough about live_migration_downtime
  behavior

Status in OpenStack Compute (nova):
  Fix Released

Bug description:
  https://docs.openstack.org/nova/xena/admin/configuring-migrations.html
  says that "live_migration_downtime sets the maximum permitted downtime
  for a live migration, in milliseconds. The default is 500." but it's
  not clear enough about what happens (or *if* something happens) if
  that "maximum permitted downtime" gets exceeded.

  It seems there's no timeout action regarding the downtime and IMO it's
  misleading the user to think so.

  Downtime increased to max:

  nova-compute-controller-0-937646f6-9q4n9 nova-compute 2022-02-02 16:59:44.477 1552666 INFO nova.virt.libvirt.migration [-] [instance: 5d91f6cc-dcc4-4f1f-8285-0b682284ac35] Increasing downtime to 100 ms after 72 sec elapsed time

  Downtime being exceeded:

  (controller-0) 2022-02-02 17:00:06.503+0000: 3737473: debug : qemuProcessHandleStop:674 : Transitioned guest instance-0000000b to paused state, reason migration
  (controller-1) 2022-02-02 17:00:06.613+0000: 4075521: debug : qemuProcessHandleResume:726 : Transitioned guest instance-0000000b out of paused into resumed state

  Also on libvirt logs:

  2022-02-02 17:00:06.579+0000: 3737473: info : qemuMonitorJSONIOProcessLine:217 : QEMU_MONITOR_RECV_REPLY: mon=0x7f92e00f4ae0 reply={"return": {"status": "completed", "setup-time": 190, "downtime": 179, "total-time": 91054, "ram": {"total": 8594989056, "postcopy-requests": 0, "dirty-sync-count": 128, "multifd-bytes": 0, "page-size": 4096, "remaining": 0, "mbps": 7196.567648, "transferred": 81738286801, "duplicate": 870668, "dirty-pages-rate": 0, "skipped": 0, "normal-bytes": 81571127296, "normal": 19914826}}, "id": "libvirt-201"}

  <downtime>75</downtime>
  <downtime>109</downtime>
  <downtime>109</downtime>
  <downtime>109</downtime>

  And there was no timeout exception.
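
The comparison this report makes by eye can be sketched in a few lines of Python. This is only an illustration, not nova or libvirt code: the helper name `downtime_exceeded` is hypothetical, the reply payload is the one from the libvirt log in this report with the "ram" statistics left out for brevity, and the 100 ms threshold is the value from the nova-compute log line ("Increasing downtime to 100 ms").

```python
import json

# Reply payload taken from the libvirt log in this report; the "ram"
# statistics are omitted for brevity.
QEMU_REPLY = (
    '{"return": {"status": "completed", "setup-time": 190, '
    '"downtime": 179, "total-time": 91054}, "id": "libvirt-201"}'
)


def downtime_exceeded(reply_json: str, allowed_ms: int) -> bool:
    """Return True if the downtime reported by QEMU is above allowed_ms.

    Hypothetical helper for illustration only; nothing in nova performs
    this check, which is the point of the bug report.
    """
    stats = json.loads(reply_json)["return"]
    return stats["downtime"] > allowed_ms


# 100 ms is the downtime value nova logged before the guest was paused.
print(downtime_exceeded(QEMU_REPLY, 100))
```

With the values from this report the reported downtime (179 ms) is above the 100 ms that nova had set, yet the migration still completed. That matches the commit message: the value is a reference only, and exceeding it triggers no abort or force-complete.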

