Re: [openstack-dev] [nova][glance] Deprecation of nova.image.download.modules extension point

2018-06-04 Thread Matt Riedemann

+openstack-operators to see if others have the same use case

On 5/31/2018 5:14 PM, Moore, Curt wrote:
We recently upgraded from Liberty to Pike and, looking ahead to the code in 
Queens, noticed the image download deprecation notice with instructions to 
post here if this interface was in use.  As such, I'd like to explain our use 
case and see if there is a better way of accomplishing our goal, or else 
lobby for the "un-deprecation" of this extension point.


Thanks for speaking up - this is much easier *before* code is removed.



As with many installations, we are using Ceph for both our Glance image 
store and our VM instance disks.  In the normal workflow, when both Glance 
and libvirt are configured to use Ceph, libvirt reacts to the direct_url 
field on the Glance image and performs an in-place clone of the RAW disk 
image from the images pool into the vms pool, all within Ceph.  The snapshot 
creation process is very fast and thinly provisioned, as it's a COW snapshot.
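
For reference, a minimal sketch of the configuration that enables this 
COW-clone path, assuming Glance's RBD store and Nova's libvirt RBD image 
backend point at the same cluster (pool and user names here are illustrative, 
and the image must be RAW for the clone to happen):

    # glance-api.conf -- expose direct_url so Nova can find the RBD location
    [DEFAULT]
    show_image_direct_url = True

    [glance_store]
    stores = rbd
    default_store = rbd
    rbd_store_pool = images
    rbd_store_user = glance

    # nova.conf on the compute nodes -- libvirt clones into the vms pool
    [libvirt]
    images_type = rbd
    images_rbd_pool = vms
    rbd_user = cinder
    rbd_secret_uuid = <libvirt secret uuid>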


This underlying workflow itself works great; the issue is the performance of 
the VM's disk within Ceph, especially as the number of nodes within the 
cluster grows.  We have found, especially with Windows VMs (largely as a 
result of I/O for the Windows pagefile), that the performance of the Ceph 
cluster as a whole takes a very large hit keeping up with all of this I/O 
thrashing, especially while Windows is booting.  This is not the case with 
Linux VMs, as they do not use swap as heavily as Windows nodes use their 
pagefiles.  Windows can be run without a pagefile, but that leads to other 
oddities within Windows.


I should also mention that in our case the nodes themselves are ephemeral 
and we do not care about live migration, etc.; we just want raw performance.


As an aside on our Ceph setup, without getting into too many details: we 
have very fast SSD-based Ceph nodes for this pool (separate CRUSH root, SSDs 
for both OSDs and journals, 2 replicas), interconnected on the same switch 
backplane, each with bonded 10 Gb uplinks to the switch.  Our Nova nodes are 
within the same datacenter (and also have bonded 10 Gb uplinks to their 
switches) but are distributed across different switches.  We could move the 
Nova nodes onto the same switch as the Ceph nodes, but rearranging that many 
servers to make space is a larger logistical challenge.


Back to our use case: in order to isolate this heavy I/O, a subset of our 
compute nodes have a local SSD and are set to use qcow2 images instead of 
rbd, so that libvirt will pull the image down from Glance into the node's 
local image cache and run the VM from the local SSD.  This allows Windows 
VMs to boot and perform their initial cloudbase-init setup/reboot within 
~20 sec vs 4-5 min, regardless of overall Ceph cluster load.  Additionally, 
this prevents us from "wasting" Ceph IOPS and instead keeps them local to 
the Nova node, reclaiming the network bandwidth and Ceph IOPS for use by 
Cinder volumes.  This is essentially the use case outlined in the "Do 
designate some non-Ceph compute hosts with low-latency local storage" 
section here:


https://ceph.com/planet/the-dos-and-donts-for-ceph-for-openstack/
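
On the local-SSD nodes the only real difference is the image backend; 
roughly (illustrative values, assuming the instance path sits on the local 
SSD):

    # nova.conf on the local-SSD compute nodes
    [DEFAULT]
    # instance disks and the downloaded image cache live under this path
    instances_path = /var/lib/nova/instances

    [libvirt]
    # local qcow2 files instead of rbd-backed disks
    images_type = qcow2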

The challenge is that the Glance image transfer is _glacially slow_ when 
using the Glance HTTP API (~30 min for a 50 GB Windows image; it's Windows, 
so it's huge with all of the necessary tools installed).  If libvirt can 
instead perform an RBD export on the image using the image download 
functionality, it is able to download the same image in ~30 sec.  We have 
code that performs the direct download from Glance over RBD, and it works 
great in our use case; it is very similar to the code in this older patch:


https://review.openstack.org/#/c/44321/


It looks like at the time this had general approval (i.e. it wasn't 
considered crazy) but was blocked simply due to the Havana feature 
freeze. That's good to know.
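
For the curious, the direct path boils down to parsing the rbd:// direct_url 
(rbd://<fsid>/<pool>/<image>/<snapshot>) and reading the snapshot straight 
out of the images pool with the python-rbd bindings.  A rough sketch, with 
error handling and the actual plugin wiring omitted (function and parameter 
names are illustrative, not the code from the patch above):

    import rados
    import rbd

    def fetch_rbd_image(direct_url, dst_path,
                        conffile='/etc/ceph/ceph.conf', rados_id='glance'):
        """Copy a Glance image out of Ceph given its rbd:// direct_url."""
        _fsid, pool, image_id, snap = direct_url[len('rbd://'):].split('/')

        cluster = rados.Rados(conffile=conffile, rados_id=rados_id)
        cluster.connect()
        try:
            ioctx = cluster.open_ioctx(pool)
            try:
                image = rbd.Image(ioctx, image_id, snapshot=snap,
                                  read_only=True)
                try:
                    size = image.size()
                    chunk = 8 * 1024 * 1024  # read in 8 MiB chunks
                    with open(dst_path, 'wb') as out:
                        offset = 0
                        while offset < size:
                            data = image.read(offset,
                                              min(chunk, size - offset))
                            out.write(data)
                            offset += len(data)
                finally:
                    image.close()
            finally:
                ioctx.close()
        finally:
            cluster.shutdown()

The CLI equivalent is simply "rbd export" against the image snapshot; either 
way the bytes move over the Ceph data path instead of through glance-api.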




We could look at attaching an additional ephemeral disk to the instance and 
having cloudbase-init use it as the pagefile, but it appears that if libvirt 
is using rbd for its images_type, _all_ disks must then come from Ceph; there 
is no way at present to allow the VM image to run from Ceph and have an 
ephemeral disk mapped in from node-local storage.  Even so, this would have 
the effect of "wasting" Ceph IOPS for the VM disk itself, which could be 
better used for other purposes.


When you mentioned swap above, I was thinking of something similar: 
attaching a swap device.  But as you've pointed out, all disks local to the 
compute host are going to use the same image backend, so you can't have the 
root disk and the swap/ephemeral disks using different image backends.




Based on what I have explained about our use case, is there a 
better/different way to accomplish the same goal without using the 
deprecated image download functionality?  If not, can we work to 
"un-deprecate" the download extension point?  Should I work to get the code 
for this RBD download into the upstream repository?

Re: [openstack-dev] [nova][glance] Deprecation of nova.image.download.modules extension point

2018-06-01 Thread Moore, Curt
On 6/1/2018 12:44 AM, Chris Friesen wrote:
> On 05/31/2018 04:14 PM, Curt Moore wrote:
>> The challenge is that the Glance image transfer is _glacially slow_ when
>> using the Glance HTTP API (~30 min for a 50 GB Windows image; it's
>> Windows, so it's huge with all of the necessary tools installed). If
>> libvirt can instead perform an RBD export on the image using the image
>> download functionality, it is able to download the same image in ~30 sec.
> This seems oddly slow. I just downloaded a 1.6 GB image from glance in
> slightly under 10 seconds. That would map to about 5 minutes for a
> 50GB image.
Agreed.  There's nothing really special about the Glance API setup; we
have multiple load-balanced instances behind HAProxy.  However, in our
use case we are very sensitive to node spin-up time, so anything we can
do to reduce this time is desired.  If a VM lands on a compute node
where the image isn't yet locally cached, paying an additional 5 min
penalty is undesirable.
>> We could look at attaching an additional ephemeral disk to the
>> instance and having cloudbase-init use it as the pagefile, but it
>> appears that if libvirt is using rbd for its images_type, _all_ disks
>> must then come from Ceph; there is no way at present to allow the VM
>> image to run from Ceph and have an ephemeral disk mapped in from
>> node-local storage. Even so, this would have the effect of
>> "wasting" Ceph IOPS for the VM disk itself, which could be better used
>> for other purposes. Based on what I have explained about our use
>> case, is there a better/different way to accomplish the same goal
>> without using the deprecated image download functionality? If not,
>> can we work to "un-deprecate" the download extension point? Should I
>> work to get the code for this RBD download into the upstream repository?
> Have you considered using compute nodes configured for local storage,
> but then using boot-from-volume with cinder and glance both using ceph?
> I *think* there's an optimization there such that the volume creation
> is fast. Assuming the volume creation is indeed fast, in this scenario
> you could then have a local ephemeral/swap disk for your pagefile.
> You'd still have your VM root disks on ceph though.
Understood. Booting directly from a Cinder volume would work, but as you
mention, we'd still have the VM root disks in Ceph, using the expensive
Ceph SSD IOPS for no good reason.  I'm trying to get the best of both
worlds by keeping the Glance images in Ceph and also keeping all VM I/O
local to the compute node.

-Curt






Re: [openstack-dev] [nova][glance] Deprecation of nova.image.download.modules extension point

2018-05-31 Thread Chris Friesen

On 05/31/2018 04:14 PM, Moore, Curt wrote:


The challenge is that the Glance image transfer is _glacially slow_ when 
using the Glance HTTP API (~30 min for a 50 GB Windows image; it's Windows, 
so it's huge with all of the necessary tools installed).  If libvirt can 
instead perform an RBD export on the image using the image download 
functionality, it is able to download the same image in ~30 sec.


This seems oddly slow.  I just downloaded a 1.6 GB image from glance in slightly 
under 10 seconds.  That would map to about 5 minutes for a 50GB image.
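
(Back-of-the-envelope: 1.6 GB in ~10 s is roughly 160 MB/s, and 50 GB at 
160 MB/s is about 320 s, i.e. a bit over five minutes.)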




We could look at attaching an additional ephemeral disk to the instance and 
having cloudbase-init use it as the pagefile, but it appears that if libvirt 
is using rbd for its images_type, _all_ disks must then come from Ceph; there 
is no way at present to allow the VM image to run from Ceph and have an 
ephemeral disk mapped in from node-local storage.  Even so, this would have 
the effect of "wasting" Ceph IOPS for the VM disk itself, which could be 
better used for other purposes.

Based on what I have explained about our use case, is there a 
better/different way to accomplish the same goal without using the 
deprecated image download functionality?  If not, can we work to 
"un-deprecate" the download extension point?  Should I work to get the code 
for this RBD download into the upstream repository?


Have you considered using compute nodes configured for local storage, but 
then using boot-from-volume with cinder and glance both using ceph?  I 
*think* there's an optimization there such that the volume creation is fast.


Assuming the volume creation is indeed fast, in this scenario you could then 
have a local ephemeral/swap disk for your pagefile.  You'd still have your VM 
root disks on ceph though.
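
The optimization referred to here is presumably the RBD clone path in 
Cinder's RBD driver: when the image is RAW and Glance exposes its rbd:// 
location, creating a volume from the image is a COW clone inside Ceph rather 
than a full copy.  A rough sketch of the relevant cinder.conf bits 
(illustrative values):

    # cinder.conf
    [DEFAULT]
    enabled_backends = ceph

    [ceph]
    volume_driver = cinder.volume.drivers.rbd.RBDDriver
    volume_backend_name = ceph
    rbd_pool = volumes
    rbd_user = cinder
    rbd_ceph_conf = /etc/ceph/ceph.conf
    rbd_secret_uuid = <libvirt secret uuid>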


Chris



[openstack-dev] [nova][glance] Deprecation of nova.image.download.modules extension point

2018-05-31 Thread Moore, Curt
Hello.



We recently upgraded from Liberty to Pike and, looking ahead to the code in 
Queens, noticed the image download deprecation notice with instructions to 
post here if this interface was in use.  As such, I'd like to explain our use 
case and see if there is a better way of accomplishing our goal, or else 
lobby for the "un-deprecation" of this extension point.



As with many installations, we are using Ceph for both our Glance image 
store and our VM instance disks.  In the normal workflow, when both Glance 
and libvirt are configured to use Ceph, libvirt reacts to the direct_url 
field on the Glance image and performs an in-place clone of the RAW disk 
image from the images pool into the vms pool, all within Ceph.  The snapshot 
creation process is very fast and thinly provisioned, as it's a COW snapshot.



This underlying workflow itself works great; the issue is the performance of 
the VM's disk within Ceph, especially as the number of nodes within the 
cluster grows.  We have found, especially with Windows VMs (largely as a 
result of I/O for the Windows pagefile), that the performance of the Ceph 
cluster as a whole takes a very large hit keeping up with all of this I/O 
thrashing, especially while Windows is booting.  This is not the case with 
Linux VMs, as they do not use swap as heavily as Windows nodes use their 
pagefiles.  Windows can be run without a pagefile, but that leads to other 
oddities within Windows.



I should also mention that in our case the nodes themselves are ephemeral 
and we do not care about live migration, etc.; we just want raw performance.



As an aside on our Ceph setup, without getting into too many details: we 
have very fast SSD-based Ceph nodes for this pool (separate CRUSH root, SSDs 
for both OSDs and journals, 2 replicas), interconnected on the same switch 
backplane, each with bonded 10 Gb uplinks to the switch.  Our Nova nodes are 
within the same datacenter (and also have bonded 10 Gb uplinks to their 
switches) but are distributed across different switches.  We could move the 
Nova nodes onto the same switch as the Ceph nodes, but rearranging that many 
servers to make space is a larger logistical challenge.



Back to our use case: in order to isolate this heavy I/O, a subset of our 
compute nodes have a local SSD and are set to use qcow2 images instead of 
rbd, so that libvirt will pull the image down from Glance into the node's 
local image cache and run the VM from the local SSD.  This allows Windows 
VMs to boot and perform their initial cloudbase-init setup/reboot within 
~20 sec vs 4-5 min, regardless of overall Ceph cluster load.  Additionally, 
this prevents us from "wasting" Ceph IOPS and instead keeps them local to 
the Nova node, reclaiming the network bandwidth and Ceph IOPS for use by 
Cinder volumes.  This is essentially the use case outlined in the "Do 
designate some non-Ceph compute hosts with low-latency local storage" 
section here:



https://ceph.com/planet/the-dos-and-donts-for-ceph-for-openstack/



The challenge is that the Glance image transfer is _glacially slow_ when 
using the Glance HTTP API (~30 min for a 50 GB Windows image; it's Windows, 
so it's huge with all of the necessary tools installed).  If libvirt can 
instead perform an RBD export on the image using the image download 
functionality, it is able to download the same image in ~30 sec.  We have 
code that performs the direct download from Glance over RBD, and it works 
great in our use case; it is very similar to the code in this older patch:



https://review.openstack.org/#/c/44321/



We could look at attaching an additional ephemeral disk to the instance and 
having cloudbase-init use it as the pagefile, but it appears that if libvirt 
is using rbd for its images_type, _all_ disks must then come from Ceph; there 
is no way at present to allow the VM image to run from Ceph and have an 
ephemeral disk mapped in from node-local storage.  Even so, this would have 
the effect of "wasting" Ceph IOPS for the VM disk itself, which could be 
better used for other purposes.



Based on what I have explained about our use case, is there a better/different 
way to accomplish the same goal without using the deprecated image download 
functionality?  If not, can we work to "un-deprecate" the download extension 
point?  Should I work to get the code for this RBD download into the upstream 
repository?



Thanks,

-Curt


