Yep. Don't get me wrong -- I agree 100% with everything you've said throughout this thread. Applications that have native replication are awesome. Swift is crazy awesome. :)
I understand that some may see the use of mdadm, Cinder-assisted replication, etc. as supporting "pet" environments, and I agree to some extent. But I do think there are applicable use cases where those services could be very helpful. As one example, I know of large cloud-based environments which handle very large data sets and are stood up entirely through configuration management systems. However, due to the sheer size of the data being handled, rebuilding or resyncing a portion of the environment could take hours. Failing over to a replicated volume is instant.

In addition, being able to both stripe and replicate goes a very long way toward making the most of commodity block storage environments (for example, avoiding packing problems and such).

Should these types of applications be reading / writing directly to Swift, HDFS, or handling replication themselves? Sure, in a perfect world. Does Gluster fill all the gaps I've mentioned? Kind of. I guess I'm just trying to survey the options available for applications and environments that would otherwise be very flexible and resilient if it weren't for their awkward use of storage. :)

On Mon, Feb 8, 2016 at 6:18 PM, Robert Starmer <rob...@kumul.us> wrote:

> Besides, wouldn't it be better to actually do application-layer backup/restore, or application-level distribution for replication? That architecture at least lets the application determine and deal with corrupt data transmission, rather than the DRBD-like model where you corrupt one data set, you corrupt them all...
>
> Hence my comment about having some form of object storage (SWIFT is perhaps even a good example of this architecture: the proxy replicates, checks MD5, etc. to verify good data, rather than just replicating blocks of data).
>
> On Mon, Feb 8, 2016 at 7:15 PM, Robert Starmer <rob...@kumul.us> wrote:
>
>> I have not run into anyone replicating volumes or creating redundancy at the VM level (beyond, as you point out, HDFS, etc.).
>>
>> R
>>
>> On Mon, Feb 8, 2016 at 6:54 PM, Joe Topjian <j...@topjian.net> wrote:
>>
>>> This is a great conversation and I really appreciate everyone's input. Though, I agree, we wandered off the original question, and that's my fault for mentioning various storage backends.
>>>
>>> For the sake of conversation, let's just say the user has no knowledge of the underlying storage technology. They're presented with a Block Storage service and the rest is up to them. What known, working options does the user have to build their own block storage resilience? (Ignoring "obvious" solutions where the application has native replication, such as Galera, elasticsearch, etc.)
>>>
>>> I have seen references to Cinder supporting replication, but I'm not able to find a lot of information about it. The support matrix[1] lists very few drivers that actually implement replication -- is this true, or is there a trove of replication docs that I just haven't been able to find?
>>>
>>> Amazon AWS publishes instructions on how to use mdadm with EBS[2]. One might interpret that to mean mdadm is a supported solution within EC2-based instances.
>>>
>>> There are also references to DRBD and EC2, though I could not find anything as "official" as mdadm and EC2.
>>>
>>> Does anyone have experience (or know users) doing either? (specifically with libvirt/KVM, but I'd be curious to know in general)
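For concreteness, the mdadm route from inside an instance looks roughly like the following. This is a minimal sketch only, not something I'm presenting as battle-tested: it assumes four volumes attached to the instance as /dev/vdb through /dev/vde, and those device names are hypothetical and vary by hypervisor:

    # Stripe + mirror (RAID 10) across four attached block storage volumes
    sudo mdadm --create /dev/md0 --level=10 --raid-devices=4 \
        /dev/vdb /dev/vdc /dev/vdd /dev/vde
    sudo mkfs.ext4 /dev/md0
    sudo mount /dev/md0 /mnt
    # Record the array so it reassembles on reboot
    # (the mdadm.conf path varies by distro)
    sudo mdadm --detail --scan | sudo tee -a /etc/mdadm/mdadm.conf

RAID 10 is the stripe-and-replicate combination I mentioned above; use --level=1 instead if you only want the mirror.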
>>> Or is it more advisable to create multiple instances where data is replicated instance-to-instance, rather than a single instance with multiple volumes where data is replicated volume-to-volume (by way of a single instance)? And if so, why? Is a lack of stable volume-to-volume replication a limitation of certain hypervisors?
>>>
>>> Or has this area just not been explored in depth within OpenStack environments yet?
>>>
>>> 1: https://wiki.openstack.org/wiki/CinderSupportMatrix
>>> 2: http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/raid-config.html
>>>
>>> On Mon, Feb 8, 2016 at 4:10 PM, Robert Starmer <rob...@kumul.us> wrote:
>>>
>>>> I'm not against Ceph, but even 2 machines (and really, 2 machines with enough storage to be meaningful, e.g. not the all-blade environments I've built some o7k systems on) may not be available for storage, so there are cases where that's not necessarily the solution. I built resiliency in one environment with a 2-node controller/Glance/db system with Gluster, which enabled enough middleware resiliency to meet the customers' recovery expectations. Regardless, even with a cattle application model, the infrastructure middleware still needs to be able to provide some level of resiliency.
>>>>
>>>> But we've kind of wandered off of the original question. To bring this back on topic, I think users can build resilience in their own storage construction, but I still think there are use cases where the middleware either needs its own resiliency layer or may end up providing one for the end user.
>>>>
>>>> R
>>>>
>>>> On Mon, Feb 8, 2016 at 3:51 PM, Fox, Kevin M <kevin....@pnnl.gov> wrote:
>>>>
>>>>> We've used ceph to address the storage requirement in small clouds pretty well. It works pretty well with only two storage nodes with replication set to 2, and because of the radosgw, you can share your small amount of storage between the object store and the block store, avoiding the need to overprovision swift-only or cinder-only to handle usage unknowns. It's just one pool of storage.
>>>>>
>>>>> You're right, using lvm is like telling your users "don't do pets", but then having pets at the heart of your system. When you lose one, you lose a lot. With a small ceph, you can take out one of the nodes, burn it to the ground and put it back, and it just works. No pets.
>>>>>
>>>>> Do consider ceph for the small use case.
>>>>>
>>>>> Thanks,
>>>>> Kevin
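For reference, the two-node setup Kevin describes mostly comes down to the pool replication settings. A minimal sketch, assuming a pool named "volumes" (the pool name and PG count here are hypothetical; the default CRUSH rule replicating across hosts is what puts the two copies on the two different nodes):

    ceph osd pool create volumes 128
    # Keep two copies of every object...
    ceph osd pool set volumes size 2
    # ...but keep serving I/O while one node is down
    ceph osd pool set volumes min_size 1

The usual caveat applies: with size=2 and one node out, you're running on a single copy of the data until the node is rebuilt.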
>>>>> ------------------------------
>>>>> From: Robert Starmer [rob...@kumul.us]
>>>>> Sent: Monday, February 08, 2016 1:30 PM
>>>>> To: Ned Rhudy
>>>>> Cc: OpenStack Operators
>>>>> Subject: Re: [Openstack-operators] RAID / stripe block storage volumes
>>>>>
>>>>> Ned's model is the model I meant by "multiple underlying storage services". Most of the systems I've built are LV/LVM only, a few added Ceph as an alternative/live-migration option, and one used Gluster due to size. Note that the environments I have worked with are in general small (~20 compute nodes), so huge Ceph environments aren't common. I am also working on a project where the storage backend is entirely NFS...
>>>>>
>>>>> And I think users are more and more educated to assume that there is nothing guaranteed. There is the realization, at least for a good set of the customers I've worked with (and I try to educate the non-believers), that the way you get the best effect from a system like OpenStack is to consider everything disposable. The one gap I've seen is that there are plenty of folks who don't deploy SWIFT, and without some form of object store, there's still the question of where you place your datasets so that they can be quickly recovered (and how you keep them up to date if you do have one). With VMs, there's the concept that you can recover quickly because the "dataset", e.g. your OS, is already there for you, and in plenty of small environments, that's only as true as the Glance repository (guess what's usually backing that when there's no SWIFT around...).
>>>>>
>>>>> So I see the issue as a holistic one. How do you show operators/users that they should consider everything disposable if we only look at the currently running instance as the "thing"? Somewhere you still likely need some form of distributed resilience (and yes, I can see using the distributed Canonical, CentOS, RedHat, Fedora, Debian, etc. mirrors as your distributed image backup, but what about the database content, etc.).
>>>>>
>>>>> Robert
>>>>>
>>>>> On Mon, Feb 8, 2016 at 1:44 PM, Ned Rhudy (BLOOMBERG/ 731 LEX) <erh...@bloomberg.net> wrote:
>>>>>
>>>>>> In our environments, we offer two types of storage. Tenants can either use Ceph/RBD and trade speed/latency for reliability and protection against physical disk failures, or they can launch instances that are realized as LVs on an LVM VG that we create on top of a RAID 0 spanning all but the OS disk on the hypervisor. This lets the users elect to go all-in on speed and sacrifice reliability for applications where replication/HA is handled at the app level, if the data on the instance is sourced from elsewhere, or if they just don't care much about the data.
>>>>>>
>>>>>> There are some further changes to our approach that we would like to make down the road, but in general our users seem to like the current system and being able to forgo reliability or speed as their circumstances demand.
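For the curious, the hypervisor-side layout Ned describes would look roughly like this -- again just a sketch, with hypothetical device names (/dev/sdb through /dev/sdd standing in for the non-OS disks):

    # Stripe the non-OS disks together (no redundancy, maximum speed)
    mdadm --create /dev/md0 --level=0 --raid-devices=3 \
        /dev/sdb /dev/sdc /dev/sdd
    # LVM volume group on top; instance disks get carved out of it as LVs
    pvcreate /dev/md0
    vgcreate instance-vg /dev/md0

With the libvirt driver, setting images_type = lvm and images_volume_group = instance-vg in the [libvirt] section of nova.conf is one way to have instances realized as LVs on that VG, though I don't know if that's exactly how Ned's environment wires it up.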
>>>>>> From: j...@topjian.net
>>>>>> Subject: Re: [Openstack-operators] RAID / stripe block storage volumes
>>>>>>
>>>>>> Hi Robert,
>>>>>>
>>>>>> Can you elaborate on "multiple underlying storage services"?
>>>>>>
>>>>>> The reason I asked the initial question is because historically we've made our block storage service resilient to failure. We made our compute environment resilient to failure too, but over time we've seen users become more educated about coping with compute failure. As a result, we've been able to become more lenient with regard to building resilient compute environments.
>>>>>>
>>>>>> We've been discussing how possible it would be to translate that same idea to block storage. Rather than have a large HA storage cluster (whether Ceph, Gluster, NetApp, etc.), is it possible to offer simple single-LVM volume servers and push the failure handling onto the user?
>>>>>>
>>>>>> Of course, this doesn't work for all types of use cases and environments. We still have projects which require the cloud to own more responsibility for failure than the users do.
>>>>>>
>>>>>> But for environments where we offer general purpose / best effort compute and storage, what methods are available to help the user be resilient to block storage failures?
>>>>>>
>>>>>> Joe
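As an aside, since simple single-LVM volume servers keep coming up: part of the appeal is that standing one up really is just a volume group plus a driver stanza. A minimal sketch, with hypothetical device names and config values:

    # Backing store for the Cinder LVM driver
    pvcreate /dev/sdb
    vgcreate cinder-volumes /dev/sdb
    # Then a single-backend stanza in cinder.conf (values hypothetical):
    #   [DEFAULT]
    #   enabled_backends = lvm
    #   [lvm]
    #   volume_driver = cinder.volume.drivers.lvm.LVMVolumeDriver
    #   volume_group = cinder-volumes
    #   volume_backend_name = lvm

Which brings things full circle: once users have a handful of these deliberately non-resilient volumes, something like mdadm inside the instance is about the only resilience knob left to them.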
>>>>>> On Mon, Feb 8, 2016 at 12:09 PM, Robert Starmer <rob...@kumul.us> wrote:
>>>>>>
>>>>>>> I've always recommended providing multiple underlying storage services to provide this, rather than adding the overhead to the VM. So, not in any of my systems or any I've worked with.
>>>>>>>
>>>>>>> R
>>>>>>>
>>>>>>> On Fri, Feb 5, 2016 at 5:56 PM, Joe Topjian <j...@topjian.net> wrote:
>>>>>>>
>>>>>>>> Hello,
>>>>>>>>
>>>>>>>> Does anyone have users RAID'ing or striping multiple block storage volumes from within an instance?
>>>>>>>>
>>>>>>>> If so, what was the experience? Good, bad, possible but with caveats?
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Joe

_______________________________________________
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators