I would love to hear insights from people running _base with no shared storage, keeping it locally on the compute nodes: upsides and downsides, experiences, and comments.
Having the base files on the same SATA disks where the VMs are running seems like the biggest concern when decoupling _base from shared storage.

Best regards,
Alejandro

On Thu, Apr 3, 2014 at 11:19 AM, Alejandro Comisario <[email protected]> wrote:
>
> Thanks to everyone for the prompt responses!
> It's clear that _base on NFS is not the way to go when the goal is avoiding disasters.
> So I believe it's good to start talking about not using _base backing files at all, and, if _base is used, about the concerns of having these files locally on the compute nodes, on the same disks where the VMs are running (in our case SATA disks).
>
> That, I think, is the most relevant discussion.
> What are the experiences of running the backing files locally on the same compute nodes where the VMs are running?
>
> best
> Alejandro Comisario
>
> On Thu, Apr 3, 2014 at 12:28 AM, Joe Topjian <[email protected]> wrote:
> > Is it Ceph live migration that you don't think is mature for production, or live migration in general? If the latter, I'd like to understand why you feel that way.
> >
> > Looping back to Alejandro's original message: I share his pain of _base issues. It's happened to me before and it sucks.
> >
> > We use shared storage for a production cloud of ours. The cloud has a 24x7 SLA, and shared storage with live migration helps us achieve that. It's not a silver bullet, but it has saved us so many hours of work.
> >
> > The remove_unused_base_images option is stable and works. I still disagree with the default value being "true", but I can vouch that it has worked without harm for the past year in an environment where it previously shot me in the foot.
> >
> > With that option enabled, you should not have to go into _base at all. Any work that we do in _base is manual audits and the rare time when the database might be inconsistent with what's really hosted.
> >
> > To mitigate potential _base issues, we just try to be as careful as possible -- measure five times before cutting. Our standard procedure is to move the files we plan on removing to a temporary directory and wait a few days to see if any users raise an alarm.
> >
> > Diego has a great point about not using qemu backing files: if your backend storage implements deduplication and/or compression, you should see the same savings as what _base is trying to achieve.
> >
> > We're in the process of building a new public cloud and made the decision not to implement shared storage. I have a queue of blog posts that I'd love to write, and the thinking behind this decision is one of them. Very briefly, the decision was based on the SLA that the public cloud will have, combined with our feeling that "cattle" instances are more acceptable to the average end-user nowadays.
> >
> > That's not to say that I'm "done" with shared storage. IMO, it all depends on the environment. One great thing about OpenStack is that it can be tailored to work in so many different environments.
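
[Editor's note: a minimal sketch of the "quarantine before deleting" procedure Joe describes above, not his actual tooling. The _base path is the stock nova location; the holding-directory name and the script itself are illustrative assumptions. A rename within the same filesystem keeps already-open file descriptors valid, so a quarantined image can simply be moved back if something turns out to still need it.]

    #!/usr/bin/env python
    # Sketch only: move _base images you plan to delete into a holding directory
    # instead of removing them, then wait a few days before really deleting them.
    import os
    import shutil
    import sys
    import time

    BASE_DIR = "/var/lib/nova/instances/_base"            # default nova _base location
    QUARANTINE = "/var/lib/nova/instances/_base_removed"  # assumed holding directory

    def quarantine(candidates):
        """Move candidate base images aside rather than deleting them outright."""
        if not os.path.isdir(QUARANTINE):
            os.makedirs(QUARANTINE)
        stamp = time.strftime("%Y%m%d")
        for name in candidates:
            src = os.path.join(BASE_DIR, name)
            dst = os.path.join(QUARANTINE, "%s.%s" % (name, stamp))
            # A rename on the same filesystem keeps the inode (and any open qemu
            # file descriptors) intact, so the image can be restored if needed.
            shutil.move(src, dst)
            print("quarantined %s -> %s" % (src, dst))

    if __name__ == "__main__":
        # usage: python quarantine_base.py <base-image-hash> [<base-image-hash> ...]
        quarantine(sys.argv[1:])
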
> > On Wed, Apr 2, 2014 at 5:48 PM, matt <[email protected]> wrote:
> >>
> >> There's shared storage on a centralized network filesystem... and then there's shared storage on a distributed network filesystem. Thus the age-old OpenAFS vs NFS war is reborn.
> >>
> >> I'd check out the Ceph block device for live migration... but that said, live migration has not reached a maturity level at which I'd even consider trying it in production.
> >>
> >> -matt
> >>
> >> On Wed, Apr 2, 2014 at 7:40 PM, Chris Friesen <[email protected]> wrote:
> >>>
> >>> So if you're recommending not using shared storage, what's your answer to people asking for live migration? (Given that block migration is supposed to be going away.)
> >>>
> >>> Chris
> >>>
> >>> On 04/02/2014 05:08 PM, George Shuklin wrote:
> >>>>
> >>>> Every time anyone starts to consolidate resources (shared storage, a virtual chassis for routers, etc.), they consolidate all the failures into one. One failure, and every system participating in the consolidation joins the festival.
> >>>>
> >>>> Then they start to increase the fault tolerance of the consolidated system, raising the administrative bar to the sky, requesting more and more hardware for clustering, requesting enterprise-grade everything: "no one ever got fired for buying enterprise <bullshit-brand-name-here>". As a result, the consolidated system ends up with the same MTBF as the non-consolidated one, "saving costs" compared to an even more enterprise-grade super-solution that costs a few percent of a country's GDP, and actually costs more than the non-consolidated solution.
> >>>>
> >>>> For x86, failure is ALWAYS an option. A processor cannot repeat instructions, there is no comparator between a few parallel processors, and so on. Compare that to mainframes. So, if failure is an option, the goal is to reduce the importance of that failure and to limit its scope.
> >>>>
> >>>> If one of 1,000 hosts goes down for three hours, that is sad. But it is much, much better than a central system that every one of those 1,000 hosts depends on going down for even 11 seconds (3 h x 3600 s / 1,000 hosts is about 11 s, the same aggregate downtime).
> >>>>
> >>>> So the answer is simple: do not aggregate. Put _base on slower drives if you want to save costs, but do not consolidate failures.
> >>>>
> >>>> On 04/02/2014 09:04 PM, Alejandro Comisario wrote:
> >>>>>
> >>>>> Hi guys...
> >>>>> We have a pretty big OpenStack environment and we use shared NFS to populate the backing-file directory (the famous _base directory located at /var/lib/nova/instances/_base). Due to a human error, the backing file used by thousands of guests was deleted, causing those guests to go read-only filesystem in a second.
> >>>>>
> >>>>> Until that moment we were convinced that keeping the _base directory on shared NFS was the way to go because:
> >>>>>
> >>>>> * spawning a new AMI gives immediate visibility to the whole cloud, so instances take almost no time to boot regardless of the nova region
> >>>>> * it eases the Glance workload
> >>>>> * it is the easiest to manage: no need to replicate files constantly or push bandwidth usage internally
> >>>>>
> >>>>> But after this really big issue, and after what it took us to recover from it, we started thinking about how to protect against this kind of "single point of failure".
> >>>>> Our first approach these days was to make the NFS share read-only, making it impossible for computes (and humans) to write to that directory, and giving write permission to just one compute, which is the one responsible for spawning an instance from a new AMI and writing the file to the directory. Still... the storage remains the SPOF.
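
[Editor's note: a small sketch of the read-only mitigation described in the paragraph above: every compute except the single designated writer mounts the _base share read-only, and a check like this can alert when a node ends up with a writable mount. The mount point is an assumption about the layout, not something prescribed by nova.]

    #!/usr/bin/env python
    # Sketch only: verify that the NFS-backed _base directory is mounted read-only
    # on this compute node (all nodes except the designated writer should be).

    MOUNT_POINT = "/var/lib/nova/instances/_base"   # assumed mount point for the share

    def base_mount_is_readonly(mount_point=MOUNT_POINT):
        """Return True if mount_point appears in /proc/mounts with the 'ro' option."""
        with open("/proc/mounts") as mounts:
            for line in mounts:
                fields = line.split()
                # /proc/mounts fields: device, mount point, fs type, options, dump, pass
                if len(fields) >= 4 and fields[1] == mount_point:
                    return "ro" in fields[3].split(",")
        return False  # not mounted at all, which is also worth alerting on

    if __name__ == "__main__":
        if base_mount_is_readonly():
            print("%s is mounted read-only" % MOUNT_POINT)
        else:
            print("WARNING: %s is writable or not mounted" % MOUNT_POINT)
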
> >>>>> So we are considering the possibility of keeping the backing files LOCAL on every compute (+1K hosts) to reduce the chances of failure to a minimum, obviously with a parallel discussion about which technology to use to keep the data replicated among computes when a new AMI is launched, launch times, performance concerns for compute nodes that have to store the backing files locally, etc.
> >>>>>
> >>>>> This made me realize that I have a huge community behind OpenStack, so I wanted to hear from it:
> >>>>>
> >>>>> * What are your thoughts about what happened and what we are thinking right now?
> >>>>> * How do other users manage the backing-file (_base) directory, with all these considerations, on big OpenStack deployments?
> >>>>>
> >>>>> I will be thrilled to read other users' experiences and thoughts.
> >>>>>
> >>>>> As always, best.
> >>>>> Alejandro
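
[Editor's note: whichever way _base ends up being hosted, the recurring theme in this thread is knowing which base images are still referenced before touching them. Below is a rough, hypothetical sketch of such an audit on a single compute node, relying on the JSON output of qemu-img info to read each instance disk's backing file; the paths are the usual nova defaults and the script is illustrative only.]

    #!/usr/bin/env python
    # Sketch only: report which images in _base are still referenced as backing
    # files by the instance disks on this compute node (default nova layout).
    import glob
    import json
    import os
    import subprocess

    INSTANCES_DIR = "/var/lib/nova/instances"       # per-instance disks live in <uuid>/disk
    BASE_DIR = os.path.join(INSTANCES_DIR, "_base")

    def referenced_base_images():
        """Return the set of base-image names that some instance disk points at."""
        referenced = set()
        for disk in glob.glob(os.path.join(INSTANCES_DIR, "*", "disk")):
            info = subprocess.check_output(["qemu-img", "info", "--output=json", disk])
            backing = json.loads(info.decode()).get("backing-filename")
            if backing:  # raw disks have no backing file and are skipped
                referenced.add(os.path.basename(backing))
        return referenced

    if __name__ == "__main__":
        used = referenced_base_images()
        for name in sorted(os.listdir(BASE_DIR)):
            status = "in use" if name in used else "unreferenced"
            print("%-12s %s" % (status, name))

Anything reported as unreferenced would then be a candidate for the quarantine step sketched earlier in the thread, rather than for immediate deletion.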
