Re: [openstack-dev] [nova][glance] Deprecation of nova.image.download.modules extension point
+openstack-operators to see if others have the same use case

On 5/31/2018 5:14 PM, Moore, Curt wrote:
> We recently upgraded from Liberty to Pike and, looking ahead to the code in Queens, noticed the image download deprecation notice with instructions to post here if this interface was in use. As such, I'd like to explain our use case and see if there is a better way of accomplishing our goal, or lobby for the "un-deprecation" of this extension point.

Thanks for speaking up - this is much easier *before* code is removed.

> As with many installations, we are using Ceph for both our Glance image store and VM instance disks. In a normal workflow when both Glance and libvirt are configured to use Ceph, libvirt reacts to the direct_url field on the Glance image and performs an in-place clone of the RAW disk image from the images pool into the vms pool, all within Ceph. The snapshot creation process is very fast and thinly provisioned, as it's a COW snapshot.
>
> This underlying workflow itself works great; the issue is the performance of the VM's disk within Ceph, especially as the number of nodes within the cluster grows. We have found, especially with Windows VMs (largely as a result of I/O for the Windows pagefile), that the performance of the Ceph cluster as a whole takes a very large hit in keeping up with all of this I/O thrashing, especially when Windows is booting. This is not the case with Linux VMs, as they do not use swap as frequently as Windows nodes do their pagefiles. Windows can be run without a pagefile, but that leads to other oddities within Windows. I should also mention that in our case the nodes themselves are ephemeral and we do not care about live migration, etc.; we just want raw performance.
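For context, the clone path described above starts from the `direct_url` the Glance rbd store exposes, which has the shape `rbd://<fsid>/<pool>/<image>/<snap>`. A minimal sketch of how a consumer can split that URL before cloning (the actual clone call is shown only as a comment, since it needs a reachable Ceph cluster; the UUIDs below are placeholders):

```python
# Sketch: split a Glance rbd direct_url (rbd://<fsid>/<pool>/<image>/<snap>)
# into the pieces needed for an in-place COW clone into the vms pool.
from urllib.parse import urlparse


def parse_rbd_direct_url(direct_url):
    """Return the cluster fsid, pool, image and snapshot from an rbd URL."""
    parsed = urlparse(direct_url)
    if parsed.scheme != 'rbd':
        raise ValueError('not an rbd-backed image: %s' % direct_url)
    # netloc carries the cluster fsid; the path carries pool/image/snapshot
    pieces = parsed.path.lstrip('/').split('/')
    if len(pieces) != 3:
        raise ValueError('unexpected rbd direct_url layout: %s' % direct_url)
    pool, image, snapshot = pieces
    return {'fsid': parsed.netloc, 'pool': pool,
            'image': image, 'snapshot': snapshot}


# With those parts, the clone is a single COW operation inside Ceph
# (illustrative only; requires the python rados/rbd bindings and a cluster):
#
#   import rados, rbd
#   with rados.Rados(conffile='/etc/ceph/ceph.conf') as cluster:
#       src = cluster.open_ioctx(loc['pool'])
#       dst = cluster.open_ioctx('vms')
#       rbd.RBD().clone(src, loc['image'], loc['snapshot'], dst, 'disk')

loc = parse_rbd_direct_url(
    'rbd://d1bd2f82-1b7c-4b1e-9e2a-3f2d8f0a6a1e/images/'
    'a1b2c3d4-e5f6-7a8b-9c0d-ef1234567890/snap')
print(loc['pool'], loc['snapshot'])  # → images snap
```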
> As an aside on our Ceph setup, without getting into too many details: we have very fast SSD-based Ceph nodes for this pool (separate crush root, SSDs for both OSDs and journals, 2 replicas), interconnected on the same switch backplane, each with bonded 10GB uplinks to the switch. Our Nova nodes are within the same datacenter (also with bonded 10GB uplinks to their switches) but are distributed across different switches. We could move the Nova nodes to the same switch as the Ceph nodes, but that is a larger logistical challenge, as it would mean rearranging many servers to make space.
>
> Back to our use case: in order to isolate this heavy I/O, a subset of our compute nodes have a local SSD and are set to use qcow2 images instead of rbd, so that libvirt will pull the image down from Glance into the node's local image cache and run the VM from the local SSD. This allows Windows VMs to boot and perform their initial cloudbase-init setup/reboot within ~20 sec vs. 4-5 min, regardless of overall Ceph cluster load. Additionally, this prevents us from "wasting" IOPS and instead keeps them local to the Nova node, reclaiming the network bandwidth and Ceph IOPS for use by Cinder volumes. This is essentially the use case outlined in the "Do designate some non-Ceph compute hosts with low-latency local storage" section here: https://ceph.com/planet/the-dos-and-donts-for-ceph-for-openstack/
>
> The challenge is that transferring the Glance image is _glacially slow_ when using the Glance HTTP API (~30 min for a 50GB Windows image (it's Windows, it's huge with all of the necessary tools installed)). If libvirt can instead perform an RBD export on the image using the image download functionality, it is able to download the same image in ~30 sec. We have code that performs the direct download from Glance over RBD, and it works great in our use case; it is very similar to the code in this older patch: https://review.openstack.org/#/c/44321/

It looks like at the time this had general approval (i.e. it wasn't considered crazy) but was blocked simply due to the Havana feature freeze. That's good to know.

> We could look at attaching an additional ephemeral disk to the instance and have cloudbase-init use it as the pagefile, but it appears that if libvirt is using rbd for its images_type, _all_ disks must then come from Ceph; there is no way at present to allow the VM image to run from Ceph and have an ephemeral disk mapped in from node-local storage. Even still, this would have the effect of "wasting" Ceph IOPS for the VM disk itself, which could be better used for other purposes.

When you mentioned the swap above I was thinking of something similar, attaching a swap device, but as you've pointed out, all disks local to the compute host are going to use the same image type backend, so you can't have the root disk and swap/ephemeral disks using different image backends.

> Based on what I have explained about our use case, is there a better/different way to accomplish the same goal without using the deprecated image download functionality? If not, can we work to "un-deprecate" the download extension point? Should I work to get the code for this RBD download into the upstream repository?
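For reference, the split fleet described in this thread, with one pool of hosts cloning from Ceph and one pool running from local SSD, comes down to the libvirt `images_type` setting in nova.conf on each group of hosts (stock option names from the nova libvirt driver; the host-aggregate and flavor wiring needed to steer workloads between the pools is omitted):

```ini
# nova.conf on the Ceph-backed compute hosts
[libvirt]
images_type = rbd
images_rbd_pool = vms
images_rbd_ceph_conf = /etc/ceph/ceph.conf

# nova.conf on the local-SSD compute hosts (node-local image cache)
[libvirt]
images_type = qcow2
```

As both Curt and Matt note above, `images_type` is per-host, so all ephemeral-style disks on a given host share one backend.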
Re: [openstack-dev] [nova][glance] Deprecation of nova.image.download.modules extension point
On 6/1/2018 12:44 AM, Chris Friesen wrote:
> On 05/31/2018 04:14 PM, Curt Moore wrote:
>> The challenge is that transferring the Glance image is _glacially slow_ when using the Glance HTTP API (~30 min for a 50GB Windows image (it's Windows, it's huge with all of the necessary tools installed)). If libvirt can instead perform an RBD export on the image using the image download functionality, it is able to download the same image in ~30 sec.
>
> This seems oddly slow. I just downloaded a 1.6 GB image from glance in slightly under 10 seconds. That would map to about 5 minutes for a 50GB image.

Agreed. There's nothing really special about the Glance API setup; we have multiple load-balanced instances behind HAProxy. However, in our use case we are very sensitive to node spin-up time, so anything we can do to reduce this time is desired. If a VM lands on a compute node where the image isn't yet locally cached, paying an additional 5 min penalty is undesired.

>> We could look at attaching an additional ephemeral disk to the instance and have cloudbase-init use it as the pagefile, but it appears that if libvirt is using rbd for its images_type, _all_ disks must then come from Ceph; there is no way at present to allow the VM image to run from Ceph and have an ephemeral disk mapped in from node-local storage. Even still, this would have the effect of "wasting" Ceph IOPS for the VM disk itself, which could be better used for other purposes. Based on what I have explained about our use case, is there a better/different way to accomplish the same goal without using the deprecated image download functionality? If not, can we work to "un-deprecate" the download extension point? Should I work to get the code for this RBD download into the upstream repository?
>
> Have you considered using compute nodes configured for local storage but then use boot-from-volume with cinder and glance both using ceph? I *think* there's an optimization there such that the volume creation is fast. Assuming the volume creation is indeed fast, in this scenario you could then have a local ephemeral/swap disk for your pagefile. You'd still have your VM root disks on ceph though.

Understood. Booting directly from a Cinder volume would work, but as you mention, we'd still have the VM root disks in Ceph, using the expensive Ceph SSD IOPS for no good reason. I'm trying to get the best of both worlds by keeping the Glance images in Ceph and also keeping all VM I/O local to the compute node.

-Curt

CONFIDENTIALITY NOTICE: This email and any attachments are for the sole use of the intended recipient(s) and contain information that may be Garmin confidential and/or Garmin legally privileged. If you have received this email in error, please notify the sender by reply email and delete the message. Any disclosure, copying, distribution or use of this communication (including attachments) by someone other than the intended recipient is prohibited. Thank you.

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
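The boot-from-volume arrangement discussed in the exchange above can be expressed as a Nova block-device-mapping-v2 payload: a root disk created as a Cinder volume from the Glance image (a fast COW clone when Cinder and Glance share the same Ceph cluster), plus a blank node-local disk for the pagefile. Field names follow Nova's BDM v2 API; the UUID and sizes are placeholders:

```python
# Sketch of a block_device_mapping_v2 payload for the boot-from-volume idea:
# root disk on a Ceph-backed Cinder volume, pagefile disk on node-local
# storage (which cloudbase-init would partition/format in-guest).

def build_bdm(image_uuid, root_gb, pagefile_gb):
    return [
        {   # Root disk: volume created from the Glance image, lives in Ceph.
            'boot_index': 0,
            'uuid': image_uuid,
            'source_type': 'image',
            'destination_type': 'volume',
            'volume_size': root_gb,
            'delete_on_termination': True,
        },
        {   # Pagefile disk: blank ephemeral storage local to the compute host.
            'boot_index': -1,
            'source_type': 'blank',
            'destination_type': 'local',
            'volume_size': pagefile_gb,
            'delete_on_termination': True,
        },
    ]

bdm = build_bdm('a1b2c3d4-e5f6-7a8b-9c0d-ef1234567890', 50, 16)
print(len(bdm), bdm[0]['destination_type'], bdm[1]['destination_type'])
```

Note this only sidesteps the `images_type` constraint because the root disk is a volume rather than an image-backed disk; as Curt says, the root I/O still lands on Ceph.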
Re: [openstack-dev] [nova][glance] Deprecation of nova.image.download.modules extension point
On 05/31/2018 04:14 PM, Moore, Curt wrote:
> The challenge is that transferring the Glance image is _glacially slow_ when using the Glance HTTP API (~30 min for a 50GB Windows image (it's Windows, it's huge with all of the necessary tools installed)). If libvirt can instead perform an RBD export on the image using the image download functionality, it is able to download the same image in ~30 sec.

This seems oddly slow. I just downloaded a 1.6 GB image from glance in slightly under 10 seconds. That would map to about 5 minutes for a 50GB image.

> We could look at attaching an additional ephemeral disk to the instance and have cloudbase-init use it as the pagefile, but it appears that if libvirt is using rbd for its images_type, _all_ disks must then come from Ceph; there is no way at present to allow the VM image to run from Ceph and have an ephemeral disk mapped in from node-local storage. Even still, this would have the effect of "wasting" Ceph IOPS for the VM disk itself, which could be better used for other purposes. Based on what I have explained about our use case, is there a better/different way to accomplish the same goal without using the deprecated image download functionality? If not, can we work to "un-deprecate" the download extension point? Should I work to get the code for this RBD download into the upstream repository?

Have you considered using compute nodes configured for local storage but then use boot-from-volume with cinder and glance both using ceph? I *think* there's an optimization there such that the volume creation is fast. Assuming the volume creation is indeed fast, in this scenario you could then have a local ephemeral/swap disk for your pagefile. You'd still have your VM root disks on ceph though.

Chris
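Chris's extrapolation is straightforward to verify by scaling his 1.6 GB / ~10 s observation linearly to a 50 GB image:

```python
# Sanity-check of the extrapolation above: 1.6 GB in ~10 s, scaled to 50 GB.
observed_gb, observed_s = 1.6, 10.0
image_gb = 50.0

throughput_gbps = observed_gb * 8 / observed_s       # ~1.28 Gbit/s observed
projected_s = image_gb / (observed_gb / observed_s)  # seconds for 50 GB

print(round(throughput_gbps, 2))   # → 1.28
print(round(projected_s / 60, 1))  # → 5.2 (minutes, vs the ~30 min reported)
```

So the reported ~30 min is roughly 6x slower than Chris's measured HTTP path would predict, which is why he calls it oddly slow.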
[openstack-dev] [nova][glance] Deprecation of nova.image.download.modules extension point
Hello.

We recently upgraded from Liberty to Pike and, looking ahead to the code in Queens, noticed the image download deprecation notice with instructions to post here if this interface was in use. As such, I'd like to explain our use case and see if there is a better way of accomplishing our goal, or lobby for the "un-deprecation" of this extension point.

As with many installations, we are using Ceph for both our Glance image store and VM instance disks. In a normal workflow when both Glance and libvirt are configured to use Ceph, libvirt reacts to the direct_url field on the Glance image and performs an in-place clone of the RAW disk image from the images pool into the vms pool, all within Ceph. The snapshot creation process is very fast and thinly provisioned, as it's a COW snapshot.

This underlying workflow itself works great; the issue is the performance of the VM's disk within Ceph, especially as the number of nodes within the cluster grows. We have found, especially with Windows VMs (largely as a result of I/O for the Windows pagefile), that the performance of the Ceph cluster as a whole takes a very large hit in keeping up with all of this I/O thrashing, especially when Windows is booting. This is not the case with Linux VMs, as they do not use swap as frequently as Windows nodes do their pagefiles. Windows can be run without a pagefile, but that leads to other oddities within Windows. I should also mention that in our case the nodes themselves are ephemeral and we do not care about live migration, etc.; we just want raw performance.

As an aside on our Ceph setup, without getting into too many details: we have very fast SSD-based Ceph nodes for this pool (separate crush root, SSDs for both OSDs and journals, 2 replicas), interconnected on the same switch backplane, each with bonded 10GB uplinks to the switch. Our Nova nodes are within the same datacenter (also with bonded 10GB uplinks to their switches) but are distributed across different switches. We could move the Nova nodes to the same switch as the Ceph nodes, but that is a larger logistical challenge, as it would mean rearranging many servers to make space.

Back to our use case: in order to isolate this heavy I/O, a subset of our compute nodes have a local SSD and are set to use qcow2 images instead of rbd, so that libvirt will pull the image down from Glance into the node's local image cache and run the VM from the local SSD. This allows Windows VMs to boot and perform their initial cloudbase-init setup/reboot within ~20 sec vs. 4-5 min, regardless of overall Ceph cluster load. Additionally, this prevents us from "wasting" IOPS and instead keeps them local to the Nova node, reclaiming the network bandwidth and Ceph IOPS for use by Cinder volumes. This is essentially the use case outlined in the "Do designate some non-Ceph compute hosts with low-latency local storage" section here: https://ceph.com/planet/the-dos-and-donts-for-ceph-for-openstack/

The challenge is that transferring the Glance image is _glacially slow_ when using the Glance HTTP API (~30 min for a 50GB Windows image (it's Windows, it's huge with all of the necessary tools installed)). If libvirt can instead perform an RBD export on the image using the image download functionality, it is able to download the same image in ~30 sec. We have code that performs the direct download from Glance over RBD, and it works great in our use case; it is very similar to the code in this older patch: https://review.openstack.org/#/c/44321/

We could look at attaching an additional ephemeral disk to the instance and have cloudbase-init use it as the pagefile, but it appears that if libvirt is using rbd for its images_type, _all_ disks must then come from Ceph; there is no way at present to allow the VM image to run from Ceph and have an ephemeral disk mapped in from node-local storage. Even still, this would have the effect of "wasting" Ceph IOPS for the VM disk itself, which could be better used for other purposes.
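The direct RBD download code referred to above is not reproduced in this thread, but at its core such a handler is a chunked copy out of an RBD image to a local file. A minimal, hypothetical sketch follows; the `read_at` callable stands in for `rbd.Image.read(offset, length)`, and both the helper's name and the commented cluster wiring are illustrative rather than the actual Nova plug-in interface or the linked patch:

```python
# Sketch: chunked copy shaped like an RBD export, generic over any
# read(offset, length) source so it can run without a Ceph cluster.
import io

CHUNK = 8 * 1024 * 1024  # 8 MiB per read; matches rbd.Image.read's shape


def rbd_style_copy(read_at, size, dst, chunk=CHUNK):
    """Copy `size` bytes from a read(offset, length) callable into `dst`."""
    offset = 0
    while offset < size:
        data = read_at(offset, min(chunk, size - offset))
        dst.write(data)
        offset += len(data)
    return offset


# In a real handler, read_at would be rbd.Image.read on the Glance image
# (illustrative only; requires the python rados/rbd bindings):
#
#   cluster = rados.Rados(conffile=ceph_conf)
#   cluster.connect()
#   ioctx = cluster.open_ioctx('images')
#   with rbd.Image(ioctx, image_name, read_only=True) as img, \
#        open(local_path, 'wb') as out:
#       rbd_style_copy(img.read, img.size(), out)

# Stand-in demonstration with an in-memory source:
src = b'x' * (20 * 1024 * 1024)
dst = io.BytesIO()
copied = rbd_style_copy(lambda off, ln: src[off:off + ln], len(src), dst)
print(copied == len(src), dst.getvalue() == src)  # → True True
```

The speed difference reported in this thread comes from the transport, not the copy loop: reads go straight from the OSDs over the cluster network instead of being funneled through the Glance API service.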
Based on what I have explained about our use case, is there a better/different way to accomplish the same goal without using the deprecated image download functionality? If not, can we work to "un-deprecate" the download extension point? Should I work to get the code for this RBD download into the upstream repository?

Thanks,
-Curt