Reviewed:  https://review.openstack.org/485752
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=14c38ac0f253036da79f9d07aedf7dfd5778fde8
Submitter: Jenkins
Branch:    master
commit 14c38ac0f253036da79f9d07aedf7dfd5778fde8
Author: Kashyap Chamarthy <kcham...@redhat.com>
Date:   Thu Jul 20 19:01:23 2017 +0200

    libvirt: Post-migration, set cache value for Cinder volume(s)

    This was noticed in a downstream bug: when a Nova instance with a
    Cinder volume (in this case, both the Nova instance storage _and_
    the Cinder volume are located on Ceph) is migrated to a target
    Compute node, the disk cache value for the Cinder volume gets
    changed.  I.e. the QEMU command-line for the Cinder volume stored
    on Ceph changes as follows.

    Pre-migration, QEMU command-line for the Nova instance:

        [...] -drive file=rbd:volumes/volume-[...],cache=writeback

    Post-migration, QEMU command-line for the Nova instance:

        [...] -drive file=rbd:volumes/volume-[...],cache=none

    Furthermore, Jason Dillaman from Ceph confirms RBD cache being
    enabled pre-migration:

        $ ceph --admin-daemon /var/run/qemu/ceph-client.openstack.[...] \
            config get rbd_cache
        {
            "rbd_cache": "true"
        }

    And disabled, post-migration:

        $ ceph --admin-daemon /var/run/qemu/ceph-client.openstack.[...] \
            config get rbd_cache
        {
            "rbd_cache": "false"
        }

    This change in cache value post-migration causes I/O latency on the
    Cinder volume.

    From a chat with Daniel Berrangé on IRC: prior to live migration,
    Nova rewrites all the <disk> elements and passes the updated guest
    XML across to the target libvirt, but it never calls
    _set_cache_mode() when doing this.  So `nova.conf`'s `writeback`
    setting is lost, leaving the default `cache=none` setting.  This
    mistake (falling back to the default cache value of 'none') is of
    course corrected once you reboot the guest on the target later.

    So:

    - Call _set_cache_mode() in the _get_volume_config() method --
      because it is the callback function used by _update_volume_xml()
      in nova/virt/libvirt/migration.py.

    - Remove the now-duplicate calls to _set_cache_mode() in
      _get_guest_storage_config() and attach_volume().

    - Fix broken unit tests; adjust test_get_volume_config() to reflect
      the disk cache mode.

    Thanks: Jason Dillaman of Ceph for observing the change in cache
    modes in a downstream bug analysis, Daniel Berrangé for help in
    analysis from a Nova libvirt driver POV, and Stefan Hajnoczi from
    QEMU for help on I/O latency instrumentation with `perf`.

    Closes-bug: 1706083

    Change-Id: I4184382b49dd2193d6a21bfe02ea973d02d8b09f
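For readers not familiar with the Nova libvirt driver, the shape of the fix
can be illustrated with a minimal, self-contained sketch of the pattern the
commit describes: apply the configured cache mode inside the volume-config
callback that the migration code uses to rewrite the guest XML.  The names
below (DISK_CACHEMODES, DiskConfig, get_volume_config) are illustrative
stand-ins for the [libvirt]/disk_cachemodes option, LibvirtConfigGuestDisk
and the driver's _get_volume_config() callback; this is a sketch of the
idea, not the actual Nova change.

# Sketch only -- NOT the actual Nova code.

# Operator configuration, e.g. nova.conf: disk_cachemodes = "network=writeback"
DISK_CACHEMODES = {"network": "writeback"}


class DiskConfig(object):
    """Stand-in for the libvirt <disk> element Nova builds for a volume."""

    def __init__(self, source_type, source_path):
        self.source_type = source_type   # "network" for an rbd-backed volume
        self.source_path = source_path
        self.driver_cache = "none"       # default when no cache mode is set


def set_cache_mode(conf):
    """Apply the operator-configured cache mode for this disk type."""
    conf.driver_cache = DISK_CACHEMODES.get(conf.source_type,
                                            conf.driver_cache)


def get_volume_config(connection_info):
    """Build the disk config for a volume.

    Because the live-migration code rebuilds the guest XML through this
    callback, applying the cache mode here (instead of only in the
    boot/attach paths) is what keeps cache=writeback across a migration.
    """
    conf = DiskConfig("network", connection_info["data"]["name"])
    set_cache_mode(conf)
    return conf


if __name__ == "__main__":
    conf = get_volume_config({"data": {"name": "volumes/volume-0000"}})
    # With the cache mode applied in the callback, the rebuilt disk keeps
    # cache=writeback instead of silently falling back to cache=none.
    print("-drive file=rbd:%s,cache=%s" % (conf.source_path, conf.driver_cache))

Because boot, volume attach and the migration-time XML rewrite all rebuild
the disk definition through the same callback, the cache mode can no longer
silently diverge between those paths -- which is also why the duplicate
_set_cache_mode() calls elsewhere could be dropped.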
** Changed in: nova
       Status: In Progress => Fix Released

--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1706083

Title: Post-migration, Cinder volumes lose disk cache value, resulting in
I/O latency

Status in OpenStack Compute (nova):
  Fix Released
Status in OpenStack Compute (nova) newton series:
  Confirmed
Status in OpenStack Compute (nova) ocata series:
  Confirmed

Bug description:

  Description
  ===========

  [This was initially reported by a Red Hat OSP customer.]

  The I/O latency of a Cinder volume increases significantly after live
  migration of the instance it is attached to, and stays increased until
  the VM is stopped and started again.  (The VM is booted from the Cinder
  volume.  This is not the case when using a disk from the Nova storage
  backend without a Cinder volume -- or at least the difference is not
  nearly as high after a live migration.)  The storage backend is Ceph
  2.0.

  How reproducible: Consistently

  Steps to Reproduce
  ==================

  (0) Both the Nova instances and Cinder volumes are located on Ceph
  (1) Create a Nova instance with a Cinder volume attached to it
  (2) Live migrate it to a target Compute node
  (3) Run `ioping` (`ioping -c 10 .`) on the Cinder volume.  Alternatively,
      run other I/O benchmarks, e.g. `fio` with 'direct=1' (which uses
      non-buffered I/O), as a sanity check to get a second opinion on the
      latency.  (A sketch for reading the effective disk cache mode from
      the host follows the expected result at the end of this report.)

  Actual result
  =============

  Before live migration, `ioping` output on the Cinder volume attached to
  the Nova instance:

    [guest]$ sudo ioping -c 10 .
    4 KiB <<< . (xfs /dev/sda1): request=1 time=98.0 us (warmup)
    4 KiB <<< . (xfs /dev/sda1): request=2 time=135.6 us
    4 KiB <<< . (xfs /dev/sda1): request=3 time=155.5 us
    4 KiB <<< . (xfs /dev/sda1): request=4 time=161.7 us
    4 KiB <<< . (xfs /dev/sda1): request=5 time=148.4 us
    4 KiB <<< . (xfs /dev/sda1): request=6 time=354.3 us
    4 KiB <<< . (xfs /dev/sda1): request=7 time=138.0 us (fast)
    4 KiB <<< . (xfs /dev/sda1): request=8 time=150.7 us
    4 KiB <<< . (xfs /dev/sda1): request=9 time=149.6 us
    4 KiB <<< . (xfs /dev/sda1): request=10 time=138.6 us (fast)

    --- . (xfs /dev/sda1) ioping statistics ---
    9 requests completed in 1.53 ms, 36 KiB read, 5.87 k iops, 22.9 MiB/s
    generated 10 requests in 9.00 s, 40 KiB, 1 iops, 4.44 KiB/s
    min/avg/max/mdev = 135.6 us / 170.3 us / 354.3 us / 65.6 us

  After live migration, `ioping` output on the Cinder volume attached to
  the Nova instance:

    [guest]$ sudo ioping -c 10 .
    4 KiB <<< . (xfs /dev/sda1): request=1 time=1.03 ms (warmup)
    4 KiB <<< . (xfs /dev/sda1): request=2 time=948.6 us
    4 KiB <<< . (xfs /dev/sda1): request=3 time=955.7 us
    4 KiB <<< . (xfs /dev/sda1): request=4 time=920.5 us
    4 KiB <<< . (xfs /dev/sda1): request=5 time=1.03 ms
    4 KiB <<< . (xfs /dev/sda1): request=6 time=838.2 us
    4 KiB <<< . (xfs /dev/sda1): request=7 time=1.13 ms (slow)
    4 KiB <<< . (xfs /dev/sda1): request=8 time=868.6 us
    4 KiB <<< . (xfs /dev/sda1): request=9 time=985.2 us
    4 KiB <<< . (xfs /dev/sda1): request=10 time=936.6 us

    --- . (xfs /dev/sda1) ioping statistics ---
    9 requests completed in 8.61 ms, 36 KiB read, 1.04 k iops, 4.08 MiB/s
    generated 10 requests in 9.00 s, 40 KiB, 1 iops, 4.44 KiB/s
    min/avg/max/mdev = 838.2 us / 956.9 us / 1.13 ms / 81.0 us

  The latency goes back to an average of ~200 us after shutting down and
  starting up the instance.

  Expected result
  ===============

  No increase in I/O latency on Cinder volumes after a live migration.
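As a supplementary check (not part of the original report), the effective
cache mode can be read directly from the running domain XML on the source
and destination Compute hosts, which confirms that the latency increase
coincides with the disk driver falling back to cache=none.  A minimal
sketch using the libvirt Python bindings; 'instance-00000001' is a
placeholder domain name:

import xml.etree.ElementTree as ET

import libvirt

# Connect to the local hypervisor on the Compute host and look up the
# guest; 'instance-00000001' is a placeholder for the real domain name.
conn = libvirt.open("qemu:///system")
dom = conn.lookupByName("instance-00000001")

# Print the <driver cache=...> value for every disk, so the value can be
# compared before and after the live migration.
root = ET.fromstring(dom.XMLDesc(0))
for disk in root.findall("./devices/disk"):
    target = disk.find("target")
    driver = disk.find("driver")
    dev = target.get("dev") if target is not None else "?"
    cache = driver.get("cache") if driver is not None else "(hypervisor default)"
    print("%s: cache=%s" % (dev, cache))

conn.close()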