Re: [Openstack-operators] RAID / stripe block storage volumes

2016-03-07 Thread Joe Topjian
On Mon, Mar 7, 2016 at 12:33 AM, Tim Bell <tim.b...@cern.ch> wrote:

> From: joe <j...@topjian.net>
> Date: Monday 7 March 2016 at 07:53
> To: openstack-operators <openstack-operators@lists.openstack.org>
> Subject: Re: [Openstack-operators] RAID / stripe block storage volumes
>
> We ($work) have been researching this topic for the past few weeks and I
> wanted to give an update on what we've found.
>
> First, we've found that both Rackspace and Azure advocate the use of
> RAID'ing block storage volumes from within an instance for both performance
> and resilience [1][2][3]. I only mention this to add to the earlier Amazon
> AWS information and not to imply that more people should share this view.
>
> Second, we discovered virtio-scsi [4]. By adding the following properties
> to an image, the disks will now appear as SCSI disks, including the more
> common /dev/sdx naming:
>
> hw_disk_bus_model=virtio-scsi
> hw_scsi_model=virtio-scsi
> hw_disk_bus=scsi
>
> What's notable is that, in our testing, ZFS pools and Gluster replicas are
> more likely to see the volume disconnect/fail with virtio-scsi. mdadm has
> always been fairly dependable, so there hasn't been a change there. We're
> still testing, but virtio-scsi looks promising.
>
>
> We found significantly slower performance (~20%) from virtio-scsi on
> bonnie++. I had been thinking it would be better.
>
> What were your performance experiences?
>
> Tim
>

That's one area we're still testing. We're seeing a 15% increase in reads
for 4k-1m blocks, but anywhere from a 3-20% decrease in all types of write
activity. Something seems off... or at least there should be a reason for it.
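
For context, fio runs along these lines are one way to make that kind of
comparison (the device name and parameters below are purely illustrative, not
our exact invocation):

  # random read at 4k, then random write at 1m, against an attached volume
  fio --name=read4k --filename=/dev/vdb --rw=randread --bs=4k --direct=1 \
      --ioengine=libaio --iodepth=16 --runtime=60 --time_based
  fio --name=write1m --filename=/dev/vdb --rw=randwrite --bs=1m --direct=1 \
      --ioengine=libaio --iodepth=16 --runtime=60 --time_based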


>
> 1:
> https://support.rackspace.com/how-to/configuring-a-software-raid-on-a-linux-general-purpose-cloud-server/
> 2: https://support.rackspace.com/how-to/cloud-block-storage-faq/
> 3:
> https://azure.microsoft.com/en-us/documentation/articles/virtual-machines-linux-configure-raid/
> 4: https://wiki.openstack.org/wiki/LibvirtVirtioScsi
>
> On Mon, Feb 8, 2016 at 7:18 PM, Joe Topjian <j...@topjian.net> wrote:
>
>> Yep. Don't get me wrong -- I agree 100% with everything you've said
>> throughout this thread. Applications that have native replication are
>> awesome. Swift is crazy awesome. :)
>>
>> I understand that some may see the use of mdadm, Cinder-assisted
> replication, etc. as supporting "pet" environments, and I agree to some
>> extent. But I do think there are applicable use-cases where those services
>> could be very helpful.
>>
>> As one example, I know of large cloud-based environments which handle
>> very large data sets and are entirely stood up through configuration
>> management systems. However, due to the sheer size of data being handled,
>> rebuilding or resyncing a portion of the environment could take hours.
>> Failing over to a replicated volume is instant. In addition, being able to
>> both stripe and replicate goes a very long way in making the most out of
>> commodity block storage environments (for example, avoiding packing
>> problems and such).
>>
>> Should these types of applications be reading / writing directly to
>> Swift, HDFS, or handling replication themselves? Sure, in a perfect world.
>> Does Gluster fill all gaps I've mentioned? Kind of.
>>
>> I guess I'm just trying to survey the options available for applications
>> and environments that would otherwise be very flexible and resilient if it
>> wasn't for their awkward use of storage. :)
>>
>> On Mon, Feb 8, 2016 at 6:18 PM, Robert Starmer <rob...@kumul.us> wrote:
>>
>>> Besides, wouldn't it be better to actually do application-layer
>>> backup/restore, or application-level distribution for replication? That
>>> architecture at least lets the application determine and deal with corrupt
>>> data transmission rather than the DRBD-like model where if you corrupt one
>>> data-set, you corrupt them all...
>>>
>>> Hence my comment about having some form of object storage (SWIFT is
>>> perhaps even a good example of this architecture: the proxy replicates,
>>> checks MD5, etc. to verify good data, rather than just replicating blocks
>>> of data).
>>>
>>>
>>>
>>> On Mon, Feb 8, 2016 at 7:15 PM, Robert Starmer <rob...@kumul.us> wrote:
>>>
>>>> I have not run into anyone replicating volumes or creating redundancy
>>>> at the VM level (beyond, as you point out, HDFS, etc.).
>>>>
>>>> R
>>>>
>>>> On Mon, Feb 8, 2016 at 6:54 PM, Joe Topjian <j...@topjian.net> wrote

Re: [Openstack-operators] RAID / stripe block storage volumes

2016-03-07 Thread Ned Rhudy (BLOOMBERG/ 731 LEX)
Hey Saverio,

We currently implement it by setting images_type=lvm under [libvirt] in 
nova.conf on hypervisors that have the LVM+RAID0 and then providing different 
flavors (e1.* versus the default m1.* flavors) that launch instances on a host 
aggregate for the LVM-hosting hypervisors. I suspect this system is similar to 
what you use.
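
For reference, the hypervisor side boils down to something like this in
nova.conf (the volume group name is illustrative):

  [libvirt]
  images_type = lvm
  images_volume_group = nova_vg

The flavors then just target the host aggregate for those hypervisors through
a flavor extra spec, much the same way Saverio describes below.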

The advantage is that it was very simple to implement and that it guarantees
the volume will be on the same hypervisor as the instance. The disadvantages
are probably things you've also experienced:

- no quota management because Nova considers it local storage (Warren Wang and 
I had complained about this in separate postings to this ML)
- can't create additional volumes on the LVM after instance launch because 
they're not managed by Cinder

Our users like it because they've figured out these LVM volumes are exempt from 
quota management, and because it's fast; our most active hypervisors on any 
given cluster are invariably the LVM ones. So far users have also gotten
lucky, with not a single RAID 0 having failed in the 6 months since we began
deploying this solution, so there's probably a bit of a perception gap between
the reliability users have seen so far and what they should actually expect.

I have begun thinking about ways of improving this system so as to bring these 
volumes under the control of Cinder, but have not come up with anything that I 
think would actually work. We discarded implementing iSCSI because of 
administrative overhead (who really wants to manage iSCSI?) and because it 
would negate the automatic forced locality; the whole point of the design was 
to provide maximum possible block storage speed, and if we have iSCSI traffic 
going over the storage network and competing with Ceph traffic, you get latency 
from the network, Ceph performance is degraded, and nobody's happy. I could 
possibly add cinder-volume to all the LVM hypervisors and register each one as 
a Cinder AZ, but I'm not sure if Nova would create the volume in the right AZ 
when scheduling an instance, and it would also break the fourth wall by letting
users know which hypervisor is hosting their instance.

From: ziopr...@gmail.com 
Subject: Re: [Openstack-operators] RAID / stripe block storage volumes

> In our environments, we offer two types of storage. Tenants can either use
> Ceph/RBD and trade speed/latency for reliability and protection against
> physical disk failures, or they can launch instances that are realized as
> LVs on an LVM VG that we create on top of a RAID 0 spanning all but the OS
> disk on the hypervisor. This lets the users elect to go all-in on speed and
[..CUT..]

Hello Ned,

how do you implement this? What is the user experience like with
two types of storage?

We generally have Ceph/RBD as the storage backend; however, we have a use
case where we need LVM because latency is important.

To cope with our use case we have different flavors, where setting a
flavor-key on a specific flavor forces the VM to be scheduled
to a specific host-aggregate. Then we have a host-aggregate for
hypervisors supporting the LVM storage and another host-aggregate for
hypervisors running the default Ceph/RBD backend.

However, let's say the user just creates a Cinder volume in Horizon.
In this case the volume is created on Ceph/RBD. Is there a solution to
support multiple storage backends at the same time and let the user
decide in Horizon which one to use?

Thanks.

Saverio


___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


Re: [Openstack-operators] RAID / stripe block storage volumes

2016-03-06 Thread Tim Bell
From: joe <j...@topjian.net>
Date: Monday 7 March 2016 at 07:53
To: openstack-operators <openstack-operators@lists.openstack.org>
Subject: Re: [Openstack-operators] RAID / stripe block storage volumes

We ($work) have been researching this topic for the past few weeks and I wanted 
to give an update on what we've found.

First, we've found that both Rackspace and Azure advocate the use of RAID'ing 
block storage volumes from within an instance for both performance and 
resilience [1][2][3]. I only mention this to add to the earlier Amazon AWS 
information and not to imply that more people should share this view.

Second, we discovered virtio-scsi [4]. By adding the following properties to an 
image, the disks will now appear as SCSI disks, including the more common 
/dev/sdx naming:

hw_disk_bus_model=virtio-scsi
hw_scsi_model=virtio-scsi
hw_disk_bus=scsi

What's notable is that, in our testing, ZFS pools and Gluster replicas are more 
likely to see the volume disconnect/fail with virtio-scsi. mdadm has always 
been fairly dependable, so there hasn't been a change there. We're still 
testing, but virtio-scsi looks promising.
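
For reference, those properties can be applied to an existing image with the
OpenStack client, e.g. (the image name below is just an example):

  openstack image set \
    --property hw_disk_bus_model=virtio-scsi \
    --property hw_scsi_model=virtio-scsi \
    --property hw_disk_bus=scsi \
    ubuntu-14.04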

We found significantly slower performance (~20%) from virtio-scsi on bonnie++.
I had been thinking it would be better.
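
A bonnie++ run of roughly this shape is what that comparison involves (the
mount point and size are illustrative; -s should be well above the instance's
RAM to defeat caching):

  bonnie++ -d /mnt/testvol -s 16g -n 0 -u root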

What were your performance experiences?

Tim

1: 
https://support.rackspace.com/how-to/configuring-a-software-raid-on-a-linux-general-purpose-cloud-server/
2: https://support.rackspace.com/how-to/cloud-block-storage-faq/
3: 
https://azure.microsoft.com/en-us/documentation/articles/virtual-machines-linux-configure-raid/
4: https://wiki.openstack.org/wiki/LibvirtVirtioScsi

On Mon, Feb 8, 2016 at 7:18 PM, Joe Topjian <j...@topjian.net> wrote:
Yep. Don't get me wrong -- I agree 100% with everything you've said throughout 
this thread. Applications that have native replication are awesome. Swift is 
crazy awesome. :)

I understand that some may see the use of mdadm, Cinder-assisted replication, 
etc. as supporting "pet" environments, and I agree to some extent. But I do
think there are applicable use-cases where those services could be very helpful.

As one example, I know of large cloud-based environments which handle very 
large data sets and are entirely stood up through configuration management 
systems. However, due to the sheer size of data being handled, rebuilding or 
resyncing a portion of the environment could take hours. Failing over to a 
replicated volume is instant. In addition, being able to both stripe and
replicate goes a very long way in making the most out of commodity block 
storage environments (for example, avoiding packing problems and such).

Should these types of applications be reading / writing directly to Swift, 
HDFS, or handling replication themselves? Sure, in a perfect world. Does 
Gluster fill all gaps I've mentioned? Kind of.

I guess I'm just trying to survey the options available for applications and 
environments that would otherwise be very flexible and resilient if it wasn't 
for their awkward use of storage. :)

On Mon, Feb 8, 2016 at 6:18 PM, Robert Starmer <rob...@kumul.us> wrote:
Besides, wouldn't it be better to actually do application-layer backup/restore,
or application-level distribution for replication? That architecture at least
lets the application determine and deal with corrupt data transmission rather
than the DRBD-like model where if you corrupt one data-set, you corrupt them all...

Hence my comment about having some form of object storage (SWIFT is perhaps 
even a good example of this architecture: the proxy replicates, checks MD5,
etc. to verify good data, rather than just replicating blocks of data).



On Mon, Feb 8, 2016 at 7:15 PM, Robert Starmer <rob...@kumul.us> wrote:
I have not run into anyone replicating volumes or creating redundancy at the VM 
level (beyond, as you point out, HDFS, etc.).

R

On Mon, Feb 8, 2016 at 6:54 PM, Joe Topjian <j...@topjian.net> wrote:
This is a great conversation and I really appreciate everyone's input. Though, 
I agree, we wandered off the original question and that's my fault for 
mentioning various storage backends.

For the sake of conversation, let's just say the user has no knowledge of the 
underlying storage technology. They're presented with a Block Storage service 
and the rest is up to them. What known, working options does the user have to 
build their own block storage resilience? (Ignoring "obvious" solutions where 
the application has native replication, such as Galera, elasticsearch, etc)

I have seen references to Cinder supporting replication, but I'm not able to 
find a lot of information about it. The support matrix[1] lists very few 
drivers that actually implement replication -- is this true

Re: [Openstack-operators] RAID / stripe block storage volumes

2016-03-06 Thread Joe Topjian
f replication docs that I just haven't been able to find?
>>>>
>>>> Amazon AWS publishes instructions on how to use mdadm with EBS[2]. One
>>>> might interpret that to mean mdadm is a supported solution within EC2 based
>>>> instances.
>>>>
>>>> There are also references to DRBD and EC2, though I could not find
>>>> anything as "official" as mdadm and EC2.
>>>>
>>>> Does anyone have experience (or know users) doing either? (specifically
>>>> with libvirt/KVM, but I'd be curious to know in general)
>>>>
>>>> Or is it more advisable to create multiple instances where data is
>>>> replicated instance-to-instance rather than a single instance with multiple
>>>> volumes and have data replicated volume-to-volume (by way of a single
>>>> instance)? And if so, why? Is a lack of stable volume-to-volume replication
>>>> a limitation of certain hypervisors?
>>>>
>>>> Or has this area just not been explored in depth within OpenStack
>>>> environments yet?
>>>>
>>>> 1: https://wiki.openstack.org/wiki/CinderSupportMatrix
>>>> 2: http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/raid-config.html
>>>>
>>>>
>>>> On Mon, Feb 8, 2016 at 4:10 PM, Robert Starmer <rob...@kumul.us> wrote:
>>>>
>>>>> I'm not against Ceph, but even 2 machines (and really 2 machines with
>>>>> enough storage to be meaningful, e.g. not the all blade environments I've
>>>>> built some o7k  systems on) may not be available for storage, so there are
>>>>> cases where that's not necessarily the solution. I built resiliency in one
>>>>> environment with a 2 node controller/Glance/db system with Gluster, which
>>>>> enabled enough middleware resiliency to meet the customer's recovery
>>>>> expectations. Regardless, even with a cattle application model, the
>>>>> infrastructure middleware still needs to be able to provide some level of
>>>>> resiliency.
>>>>>
>>>>> But we've kind-of wandered off of the original question. I think that
>>>>> to bring this back on topic, I think users can build resilience in their
>>>>> own storage construction, but I still think there are use cases where the
>>>>> middleware either needs to use its own resiliency layer, and/or may end 
>>>>> up
>>>>> providing it for the end user.
>>>>>
>>>>> R
>>>>>
>>>>> On Mon, Feb 8, 2016 at 3:51 PM, Fox, Kevin M <kevin@pnnl.gov>
>>>>> wrote:
>>>>>
>>>>>> We've used ceph to address the storage requirement in small clouds
>>>>>> pretty well. it works pretty well with only two storage nodes with
>>>>>> replication set to 2, and because of the radosgw, you can share your 
>>>>>> small
>>>>>> amount of storage between the object store and the block store avoiding 
>>>>>> the
>>>>>> need to overprovision swift-only or cinder-only to handle usage unknowns.
>>>>>> It's just one pool of storage.
>>>>>>
>>>>>> You're right, using lvm is like telling your users, don't do pets, but
>>>>>> then having pets at the heart of your system. When you lose one, you
>>>>>> lose
>>>>>> a lot. With a small ceph, you can take out one of the nodes, burn it to 
>>>>>> the
>>>>>> ground and put it back, and it just works. No pets.
>>>>>>
>>>>>> Do consider ceph for the small use case.
>>>>>>
>>>>>> Thanks,
>>>>>> Kevin
>>>>>>
>>>>>> --
>>>>>> *From:* Robert Starmer [rob...@kumul.us]
>>>>>> *Sent:* Monday, February 08, 2016 1:30 PM
>>>>>> *To:* Ned Rhudy
>>>>>> *Cc:* OpenStack Operators
>>>>>>
>>>>>> *Subject:* Re: [Openstack-operators] RAID / stripe block storage
>>>>>> volumes
>>>>>>
>>>>>> Ned's model is the model I meant by "multiple underlying storage
>>>>>> services".  Most of the systems I've built are LV/LVM only,  a few added
>>>>>> Ceph as an alternative/live-migration option, and one where we used 
>>>&g

Re: [Openstack-operators] RAID / stripe block storage volumes

2016-03-06 Thread Saverio Proto
> In our environments, we offer two types of storage. Tenants can either use
> Ceph/RBD and trade speed/latency for reliability and protection against
> physical disk failures, or they can launch instances that are realized as
> LVs on an LVM VG that we create on top of a RAID 0 spanning all but the OS
> disk on the hypervisor. This lets the users elect to go all-in on speed and
[..CUT..]

Hello Ned,

how do you implement this? What is the user experience like with
two types of storage?

We generally have Ceph/RBD as the storage backend; however, we have a use
case where we need LVM because latency is important.

To cope with our use case we have different flavors, where setting a
flavor-key on a specific flavor forces the VM to be scheduled
to a specific host-aggregate. Then we have a host-aggregate for
hypervisors supporting the LVM storage and another host-aggregate for
hypervisors running the default Ceph/RBD backend.
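
Concretely the wiring is along these lines (aggregate and flavor names here are
only examples, and it assumes the AggregateInstanceExtraSpecsFilter is enabled
in the nova scheduler):

  nova aggregate-create lvm-hosts
  nova aggregate-add-host lvm-hosts compute-lvm-01
  nova aggregate-set-metadata lvm-hosts storage=lvm
  nova flavor-key lvm.4cpu set aggregate_instance_extra_specs:storage=lvm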

However, let's say the user just creates a Cinder volume in Horizon.
In this case the volume is created on Ceph/RBD. Is there a solution to
support multiple storage backends at the same time and let the user
decide in Horizon which one to use?

Thanks.

Saverio

___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


Re: [Openstack-operators] RAID / stripe block storage volumes

2016-02-08 Thread Joe Topjian
t some o7k  systems on) may not be available for storage, so there are
>>>> cases where that's not necessarily the solution. I built resiliency in one
>>>> environment with a 2 node controller/Glance/db system with Gluster, which
>>>> enabled enough middleware resiliency to meet the customer's recovery
>>>> expectations. Regardless, even with a cattle application model, the
>>>> infrastructure middleware still needs to be able to provide some level of
>>>> resiliency.
>>>>
>>>> But we've kind-of wandered off of the original question. I think that
>>>> to bring this back on topic, I think users can build resilience in their
>>>> own storage construction, but I still think there are use cases where the
>>>> middleware either needs to use its own resiliency layer, and/or may end up
>>>> providing it for the end user.
>>>>
>>>> R
>>>>
>>>> On Mon, Feb 8, 2016 at 3:51 PM, Fox, Kevin M <kevin@pnnl.gov>
>>>> wrote:
>>>>
>>>>> We've used ceph to address the storage requirement in small clouds
>>>>> pretty well. it works pretty well with only two storage nodes with
>>>>> replication set to 2, and because of the radosgw, you can share your small
>>>>> amount of storage between the object store and the block store avoiding 
>>>>> the
>>>>> need to overprovision swift-only or cinder-only to handle usage unknowns.
>>>>> It's just one pool of storage.
>>>>>
>>>>> You're right, using lvm is like telling your users, don't do pets, but
>>>>> then having pets at the heart of your system. When you lose one, you
>>>>> lose
>>>>> a lot. With a small ceph, you can take out one of the nodes, burn it to 
>>>>> the
>>>>> ground and put it back, and it just works. No pets.
>>>>>
>>>>> Do consider ceph for the small use case.
>>>>>
>>>>> Thanks,
>>>>> Kevin
>>>>>
>>>>> --
>>>>> *From:* Robert Starmer [rob...@kumul.us]
>>>>> *Sent:* Monday, February 08, 2016 1:30 PM
>>>>> *To:* Ned Rhudy
>>>>> *Cc:* OpenStack Operators
>>>>>
>>>>> *Subject:* Re: [Openstack-operators] RAID / stripe block storage
>>>>> volumes
>>>>>
>>>>> Ned's model is the model I meant by "multiple underlying storage
>>>>> services".  Most of the systems I've built are LV/LVM only,  a few added
>>>>> Ceph as an alternative/live-migration option, and one where we used 
>>>>> Gluster
>>>>> due to size.  Note that the environments I have worked with in general are
>>>>> small (~20 compute), so huge Ceph environments aren't common.  I am also
>>>>> working on a project where the storage backend is entirely NFS...
>>>>>
>>>>> And I think users are more and more educated to assume that there is
>>>>> nothing guaranteed.  There is the realization, at least for a good set of
>>>>> the customers I've worked with (and I try to educate the non-believers),
>>>>> that the way you get best effect from a system like OpenStack is to
>>>>> consider everything disposable. The one gap I've seen is that there are
>>>>> plenty of folks who don't deploy SWIFT, and without some form of object
>>>>> store, there's still the question of where you place your datasets so that
>>>>> they can be quickly recovered (and how do you keep them up to date if you
>>>>> do have one).  With VMs, there's the concept that you can recover quickly
>>>>> because the "dataset" e.g. your OS, is already there for you, and in 
>>>>> plenty
>>>>> of small environments, that's only as true as the glance repository (guess
>>>>> what's usually backing that when there's no SWIFT around...).
>>>>>
>>>>> So I see the issue as a holistic one. How do you show operators/users
>>>>> that they should consider everything disposable if we only look at the
>>>>> current running instance as the "thing"? Somewhere you still likely need
>>>>> some form of distributed resilience (and yes, I can see using the
>>>>> distributed Canonical, Centos, RedHat, Fedora, Debian, etc. mirrors as 
>>>>> your
>>>>>

Re: [Openstack-operators] RAID / stripe block storage volumes

2016-02-08 Thread Ned Rhudy (BLOOMBERG/ 731 LEX)
In our environments, we offer two types of storage. Tenants can either use 
Ceph/RBD and trade speed/latency for reliability and protection against 
physical disk failures, or they can launch instances that are realized as LVs 
on an LVM VG that we create on top of a RAID 0 spanning all but the OS disk on 
the hypervisor. This lets the users elect to go all-in on speed and sacrifice 
reliability for applications where replication/HA is handled at the app level, 
if the data on the instance is sourced from elsewhere, or if they just don't 
care much about the data.
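
Roughly, the underlying layout on such a hypervisor is built like this (the
device names and VG name are just examples):

  # RAID 0 across the non-OS disks, then an LVM VG for nova to carve LVs from
  mdadm --create /dev/md0 --level=0 --raid-devices=3 /dev/sdb /dev/sdc /dev/sdd
  pvcreate /dev/md0
  vgcreate nova_vg /dev/md0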

There are some further changes to our approach that we would like to make down 
the road, but in general our users seem to like the current system and being 
able to forgo reliability or speed as their circumstances demand.

From: j...@topjian.net 
Subject: Re: [Openstack-operators] RAID / stripe block storage volumes

Hi Robert,

Can you elaborate on "multiple underlying storage services"?

The reason I asked the initial question is because historically we've made our 
block storage service resilient to failure. Historically we also made our 
compute environment resilient to failure, too, but over time, we've seen users 
become more educated to cope with compute failure. As a result, we've been able 
to become more lenient with regard to building resilient compute environments.

We've been discussing how possible it would be to translate that same idea to 
block storage. Rather than have a large HA storage cluster (whether Ceph, 
Gluster, NetApp, etc), is it possible to offer simple single LVM volume servers 
and push the failure handling on to the user? 

Of course, this doesn't work for all types of use cases and environments. We 
still have projects which require the cloud to own more of the responsibility
for failure than the users do.

But for environments where we offer general purpose / best effort compute and 
storage, what methods are available to help the user be resilient to block 
storage failures?

Joe

On Mon, Feb 8, 2016 at 12:09 PM, Robert Starmer <rob...@kumul.us> wrote:

I've always recommended providing multiple underlying storage services to 
provide this rather than adding the overhead to the VM.  So, not in any of my 
systems or any I've worked with.

R


On Fri, Feb 5, 2016 at 5:56 PM, Joe Topjian <j...@topjian.net> wrote:

Hello,

Does anyone have users RAID'ing or striping multiple block storage volumes from 
within an instance?

If so, what was the experience? Good, bad, possible but with caveats?

Thanks,
Joe 
___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


 ___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators
  

___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


Re: [Openstack-operators] RAID / stripe block storage volumes

2016-02-08 Thread Robert Starmer
Ned's model is the model I meant by "multiple underlying storage
services".  Most of the systems I've built are LV/LVM only,  a few added
Ceph as an alternative/live-migration option, and one where we used Gluster
due to size.  Note that the environments I have worked with in general are
small (~20 compute), so huge Ceph environments aren't common.  I am also
working on a project where the storage backend is entirely NFS...

And I think users are more and more educated to assume that there is
nothing guaranteed.  There is the realization, at least for a good set of
the customers I've worked with (and I try to educate the non-believers),
that the way you get best effect from a system like OpenStack is to
consider everything disposable. The one gap I've seen is that there are
plenty of folks who don't deploy SWIFT, and without some form of object
store, there's still the question of where you place your datasets so that
they can be quickly recovered (and how do you keep them up to date if you
do have one).  With VMs, there's the concept that you can recover quickly
because the "dataset" e.g. your OS, is already there for you, and in plenty
of small environments, that's only as true as the glance repository (guess
what's usually backing that when there's no SWIFT around...).

So I see the issue as a holistic one. How do you show operators/users that
they should consider everything disposable if we only look at the current
running instance as the "thing"? Somewhere you still likely need some form
of distributed resilience (and yes, I can see using the distributed
Canonical, Centos, RedHat, Fedora, Debian, etc. mirrors as your distributed
Image backup but what about the database content, etc.).

Robert

On Mon, Feb 8, 2016 at 1:44 PM, Ned Rhudy (BLOOMBERG/ 731 LEX) <
erh...@bloomberg.net> wrote:

> In our environments, we offer two types of storage. Tenants can either use
> Ceph/RBD and trade speed/latency for reliability and protection against
> physical disk failures, or they can launch instances that are realized as
> LVs on an LVM VG that we create on top of a RAID 0 spanning all but the OS
> disk on the hypervisor. This lets the users elect to go all-in on speed and
> sacrifice reliability for applications where replication/HA is handled at
> the app level, if the data on the instance is sourced from elsewhere, or if
> they just don't care much about the data.
>
> There are some further changes to our approach that we would like to make
> down the road, but in general our users seem to like the current system and
> being able to forgo reliability or speed as their circumstances demand.
>
> From: j...@topjian.net
> Subject: Re: [Openstack-operators] RAID / stripe block storage volumes
>
> Hi Robert,
>
> Can you elaborate on "multiple underlying storage services"?
>
> The reason I asked the initial question is because historically we've made
> our block storage service resilient to failure. Historically we also made
> our compute environment resilient to failure, too, but over time, we've
> seen users become more educated to cope with compute failure. As a result,
> we've been able to become more lenient with regard to building resilient
> compute environments.
>
> We've been discussing how possible it would be to translate that same idea
> to block storage. Rather than have a large HA storage cluster (whether
> Ceph, Gluster, NetApp, etc), is it possible to offer simple single LVM
> volume servers and push the failure handling on to the user?
>
> Of course, this doesn't work for all types of use cases and environments.
> We still have projects which require the cloud to own most responsibility
> for failure than the users.
>
> But for environments were we offer general purpose / best effort compute
> and storage, what methods are available to help the user be resilient to
> block storage failures?
>
> Joe
>
> On Mon, Feb 8, 2016 at 12:09 PM, Robert Starmer <rob...@kumul.us> wrote:
>
>> I've always recommended providing multiple underlying storage services to
>> provide this rather than adding the overhead to the VM.  So, not in any of
>> my systems or any I've worked with.
>>
>> R
>>
>>
>>
>> On Fri, Feb 5, 2016 at 5:56 PM, Joe Topjian <j...@topjian.net> wrote:
>>
>>> Hello,
>>>
>>> Does anyone have users RAID'ing or striping multiple block storage
>>> volumes from within an instance?
>>>
>>> If so, what was the experience? Good, bad, possible but with caveats?
>>>
>>> Thanks,
>>> Joe
>>>
>>> ___
>>> OpenStack-operators mailing list
>>> OpenStack-operators@lists.openstack.org
>>> http://

Re: [Openstack-operators] RAID / stripe block storage volumes

2016-02-08 Thread Joe Topjian
Hi Robert,

Can you elaborate on "multiple underlying storage services"?

The reason I asked the initial question is because historically we've made
our block storage service resilient to failure. Historically we also made
our compute environment resilient to failure, too, but over time, we've
seen users become more educated to cope with compute failure. As a result,
we've been able to become more lenient with regard to building resilient
compute environments.

We've been discussing how possible it would be to translate that same idea
to block storage. Rather than have a large HA storage cluster (whether
Ceph, Gluster, NetApp, etc), is it possible to offer simple single LVM
volume servers and push the failure handling on to the user?

Of course, this doesn't work for all types of use cases and environments.
We still have projects which require the cloud to own more of the
responsibility for failure than the users do.

But for environments where we offer general purpose / best effort compute
and storage, what methods are available to help the user be resilient to
block storage failures?

Joe

On Mon, Feb 8, 2016 at 12:09 PM, Robert Starmer  wrote:

> I've always recommended providing multiple underlying storage services to
> provide this rather than adding the overhead to the VM.  So, not in any of
> my systems or any I've worked with.
>
> R
>
>
>
> On Fri, Feb 5, 2016 at 5:56 PM, Joe Topjian  wrote:
>
>> Hello,
>>
>> Does anyone have users RAID'ing or striping multiple block storage
>> volumes from within an instance?
>>
>> If so, what was the experience? Good, bad, possible but with caveats?
>>
>> Thanks,
>> Joe
>>
>> ___
>> OpenStack-operators mailing list
>> OpenStack-operators@lists.openstack.org
>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators
>>
>>
>
___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


Re: [Openstack-operators] RAID / stripe block storage volumes

2016-02-08 Thread Robert Starmer
I've always recommended providing multiple underlying storage services to
provide this rather than adding the overhead to the VM.  So, not in any of
my systems or any I've worked with.

R



On Fri, Feb 5, 2016 at 5:56 PM, Joe Topjian  wrote:

> Hello,
>
> Does anyone have users RAID'ing or striping multiple block storage volumes
> from within an instance?
>
> If so, what was the experience? Good, bad, possible but with caveats?
>
> Thanks,
> Joe
>
> ___
> OpenStack-operators mailing list
> OpenStack-operators@lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators
>
>
___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


Re: [Openstack-operators] RAID / stripe block storage volumes

2016-02-08 Thread Joe Topjian
This is a great conversation and I really appreciate everyone's input.
Though, I agree, we wandered off the original question and that's my fault
for mentioning various storage backends.

For the sake of conversation, let's just say the user has no knowledge of
the underlying storage technology. They're presented with a Block Storage
service and the rest is up to them. What known, working options does the
user have to build their own block storage resilience? (Ignoring "obvious"
solutions where the application has native replication, such as Galera,
elasticsearch, etc)

I have seen references to Cinder supporting replication, but I'm not able
to find a lot of information about it. The support matrix[1] lists very few
drivers that actually implement replication -- is this true or is there a
trove of replication docs that I just haven't been able to find?

Amazon AWS publishes instructions on how to use mdadm with EBS[2]. One
might interpret that to mean mdadm is a supported solution within EC2 based
instances.
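
For the curious, the in-instance setup those instructions describe amounts to
something like the following (device names and mount point are examples and
will differ between EC2 and libvirt/KVM guests):

  # stripe two attached block storage volumes inside the instance
  mdadm --create /dev/md0 --level=0 --raid-devices=2 /dev/vdb /dev/vdc
  mkfs.ext4 /dev/md0
  mount /dev/md0 /mnt/data
  # for resilience rather than speed, --level=1 would mirror instead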

There are also references to DRBD and EC2, though I could not find anything
as "official" as mdadm and EC2.

Does anyone have experience (or know users) doing either? (specifically
with libvirt/KVM, but I'd be curious to know in general)

Or is it more advisable to create multiple instances where data is
replicated instance-to-instance rather than a single instance with multiple
volumes and have data replicated volume-to-volume (by way of a single
instance)? And if so, why? Is a lack of stable volume-to-volume replication
a limitation of certain hypervisors?

Or has this area just not been explored in depth within OpenStack
environments yet?

1: https://wiki.openstack.org/wiki/CinderSupportMatrix
2: http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/raid-config.html


On Mon, Feb 8, 2016 at 4:10 PM, Robert Starmer <rob...@kumul.us> wrote:

> I'm not against Ceph, but even 2 machines (and really 2 machines with
> enough storage to be meaningful, e.g. not the all blade environments I've
> built some o7k  systems on) may not be available for storage, so there are
> cases where that's not necessarily the solution. I built resiliency in one
> environment with a 2 node controller/Glance/db system with Gluster, which
> enabled enough middleware resiliency to meet the customer's recovery
> expectations. Regardless, even with a cattle application model, the
> infrastructure middleware still needs to be able to provide some level of
> resiliency.
>
> But we've kind-of wandered off of the original question. I think that to
> bring this back on topic, I think users can build resilience in their own
> storage construction, but I still think there are use cases where the
> middleware either needs to use its own resiliency layer, and/or may end up
> providing it for the end user.
>
> R
>
> On Mon, Feb 8, 2016 at 3:51 PM, Fox, Kevin M <kevin@pnnl.gov> wrote:
>
>> We've used ceph to address the storage requirement in small clouds pretty
>> well. it works pretty well with only two storage nodes with replication set
>> to 2, and because of the radosgw, you can share your small amount of
>> storage between the object store and the block store avoiding the need to
>> overprovision swift-only or cinder-only to handle usage unknowns. It's just
>> one pool of storage.
>>
>> You're right, using lvm is like telling your users, don't do pets, but then
>> having pets at the heart of your system. When you lose one, you lose a
>> lot. With a small ceph, you can take out one of the nodes, burn it to the
>> ground and put it back, and it just works. No pets.
>>
>> Do consider ceph for the small use case.
>>
>> Thanks,
>> Kevin
>>
>> ----------
>> *From:* Robert Starmer [rob...@kumul.us]
>> *Sent:* Monday, February 08, 2016 1:30 PM
>> *To:* Ned Rhudy
>> *Cc:* OpenStack Operators
>>
>> *Subject:* Re: [Openstack-operators] RAID / stripe block storage volumes
>>
>> Ned's model is the model I meant by "multiple underlying storage
>> services".  Most of the systems I've built are LV/LVM only,  a few added
>> Ceph as an alternative/live-migration option, and one where we used Gluster
>> due to size.  Note that the environments I have worked with in general are
>> small (~20 compute), so huge Ceph environments aren't common.  I am also
>> working on a project where the storage backend is entirely NFS...
>>
>> And I think users are more and more educated to assume that there is
>> nothing guaranteed.  There is the realization, at least for a good set of
>> the customers I've worked with (and I try to educate the non-believers),
>> that the way you get best effect from a system like OpenStack is to
&

Re: [Openstack-operators] RAID / stripe block storage volumes

2016-02-08 Thread Robert Starmer
I have not run into anyone replicating volumes or creating redundancy at
the VM level (beyond, as you point out, HDFS, etc.).

R

On Mon, Feb 8, 2016 at 6:54 PM, Joe Topjian <j...@topjian.net> wrote:

> This is a great conversation and I really appreciate everyone's input.
> Though, I agree, we wandered off the original question and that's my fault
> for mentioning various storage backends.
>
> For the sake of conversation, let's just say the user has no knowledge of
> the underlying storage technology. They're presented with a Block Storage
> service and the rest is up to them. What known, working options does the
> user have to build their own block storage resilience? (Ignoring "obvious"
> solutions where the application has native replication, such as Galera,
> elasticsearch, etc)
>
> I have seen references to Cinder supporting replication, but I'm not able
> to find a lot of information about it. The support matrix[1] lists very few
> drivers that actually implement replication -- is this true or is there a
> trove of replication docs that I just haven't been able to find?
>
> Amazon AWS publishes instructions on how to use mdadm with EBS[2]. One
> might interpret that to mean mdadm is a supported solution within EC2 based
> instances.
>
> There are also references to DRBD and EC2, though I could not find
> anything as "official" as mdadm and EC2.
>
> Does anyone have experience (or know users) doing either? (specifically
> with libvirt/KVM, but I'd be curious to know in general)
>
> Or is it more advisable to create multiple instances where data is
> replicated instance-to-instance rather than a single instance with multiple
> volumes and have data replicated volume-to-volume (by way of a single
> instance)? And if so, why? Is a lack of stable volume-to-volume replication
> a limitation of certain hypervisors?
>
> Or has this area just not been explored in depth within OpenStack
> environments yet?
>
> 1: https://wiki.openstack.org/wiki/CinderSupportMatrix
> 2: http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/raid-config.html
>
>
> On Mon, Feb 8, 2016 at 4:10 PM, Robert Starmer <rob...@kumul.us> wrote:
>
>> I'm not against Ceph, but even 2 machines (and really 2 machines with
>> enough storage to be meaningful, e.g. not the all blade environments I've
>> built some o7k  systems on) may not be available for storage, so there are
>> cases where that's not necessarily the solution. I built resiliency in one
>> environment with a 2 node controller/Glance/db system with Gluster, which
>> enabled enough middleware resiliency to meet the customer's recovery
>> expectations. Regardless, even with a cattle application model, the
>> infrastructure middleware still needs to be able to provide some level of
>> resiliency.
>>
>> But we've kind-of wandered off of the original question. I think that to
>> bring this back on topic, I think users can build resilience in their own
>> storage construction, but I still think there are use cases where the
>> middleware either needs to use its own resiliency layer, and/or may end up
>> providing it for the end user.
>>
>> R
>>
>> On Mon, Feb 8, 2016 at 3:51 PM, Fox, Kevin M <kevin@pnnl.gov> wrote:
>>
>>> We've used ceph to address the storage requirement in small clouds
>>> pretty well. it works pretty well with only two storage nodes with
>>> replication set to 2, and because of the radosgw, you can share your small
>>> amount of storage between the object store and the block store avoiding the
>>> need to overprovision swift-only or cinder-only to handle usage unknowns.
>>> It's just one pool of storage.
>>>
>>> You're right, using lvm is like telling your users, don't do pets, but
>>> then having pets at the heart of your system. When you lose one, you lose
>>> a lot. With a small ceph, you can take out one of the nodes, burn it to the
>>> ground and put it back, and it just works. No pets.
>>>
>>> Do consider ceph for the small use case.
>>>
>>> Thanks,
>>> Kevin
>>>
>>> --
>>> *From:* Robert Starmer [rob...@kumul.us]
>>> *Sent:* Monday, February 08, 2016 1:30 PM
>>> *To:* Ned Rhudy
>>> *Cc:* OpenStack Operators
>>>
>>> *Subject:* Re: [Openstack-operators] RAID / stripe block storage volumes
>>>
>>> Ned's model is the model I meant by "multiple underlying storage
>>> services".  Most of the systems I've built are LV/LVM only,  a few added
>>> Ceph as an alternative/live-migration option, and one wh

Re: [Openstack-operators] RAID / stripe block storage volumes

2016-02-08 Thread Robert Starmer
Besides, wouldn't it be better to actually do application-layer backup/restore,
or application-level distribution for replication? That architecture at least
lets the application determine and deal with corrupt data transmission rather
than the DRBD-like model where if you corrupt one data-set, you corrupt them
all...

Hence my comment about having some form of object storage (SWIFT is perhaps
even a good example of this architecture: the proxy replicates, checks
MD5, etc. to verify good data, rather than just replicating blocks of data).



On Mon, Feb 8, 2016 at 7:15 PM, Robert Starmer <rob...@kumul.us> wrote:

> I have not run into anyone replicating volumes or creating redundancy at
> the VM level (beyond, as you point out, HDFS, etc.).
>
> R
>
> On Mon, Feb 8, 2016 at 6:54 PM, Joe Topjian <j...@topjian.net> wrote:
>
>> This is a great conversation and I really appreciate everyone's input.
>> Though, I agree, we wandered off the original question and that's my fault
>> for mentioning various storage backends.
>>
>> For the sake of conversation, let's just say the user has no knowledge of
>> the underlying storage technology. They're presented with a Block Storage
>> service and the rest is up to them. What known, working options does the
>> user have to build their own block storage resilience? (Ignoring "obvious"
>> solutions where the application has native replication, such as Galera,
>> elasticsearch, etc)
>>
>> I have seen references to Cinder supporting replication, but I'm not able
>> to find a lot of information about it. The support matrix[1] lists very few
>> drivers that actually implement replication -- is this true or is there a
>> trove of replication docs that I just haven't been able to find?
>>
>> Amazon AWS publishes instructions on how to use mdadm with EBS[2]. One
>> might interpret that to mean mdadm is a supported solution within EC2 based
>> instances.
>>
>> There are also references to DRBD and EC2, though I could not find
>> anything as "official" as mdadm and EC2.
>>
>> Does anyone have experience (or know users) doing either? (specifically
>> with libvirt/KVM, but I'd be curious to know in general)
>>
>> Or is it more advisable to create multiple instances where data is
>> replicated instance-to-instance rather than a single instance with multiple
>> volumes and have data replicated volume-to-volume (by way of a single
>> instance)? And if so, why? Is a lack of stable volume-to-volume replication
>> a limitation of certain hypervisors?
>>
>> Or has this area just not been explored in depth within OpenStack
>> environments yet?
>>
>> 1: https://wiki.openstack.org/wiki/CinderSupportMatrix
>> 2: http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/raid-config.html
>>
>>
>> On Mon, Feb 8, 2016 at 4:10 PM, Robert Starmer <rob...@kumul.us> wrote:
>>
>>> I'm not against Ceph, but even 2 machines (and really 2 machines with
>>> enough storage to be meaningful, e.g. not the all blade environments I've
>>> built some o7k  systems on) may not be available for storage, so there are
>>> cases where that's not necessarily the solution. I built resiliency in one
>>> environment with a 2 node controller/Glance/db system with Gluster, which
>>> enabled enough middleware resiliency to meet the customer's recovery
>>> expectations. Regardless, even with a cattle application model, the
>>> infrastructure middleware still needs to be able to provide some level of
>>> resiliency.
>>>
>>> But we've kind-of wandered off of the original question. I think that to
>>> bring this back on topic, I think users can build resilience in their own
>>> storage construction, but I still think there are use cases where the
>>> middleware either needs to use its own resiliency layer, and/or may end up
>>> providing it for the end user.
>>>
>>> R
>>>
>>> On Mon, Feb 8, 2016 at 3:51 PM, Fox, Kevin M <kevin@pnnl.gov> wrote:
>>>
>>>> We've used ceph to address the storage requirement in small clouds
>>>> pretty well. it works pretty well with only two storage nodes with
>>>> replication set to 2, and because of the radosgw, you can share your small
>>>> amount of storage between the object store and the block store avoiding the
>>>> need to overprovision swift-only or cinder-only to handle usage unknowns.
>>>> It's just one pool of storage.
>>>>
>>>> You're right, using lvm is like telling your users, don't do pets, but
>>>> the