Hi Alejandro,

On Thu, Apr 3, 2014 at 11:41 PM, Alejandro Comisario <[email protected]> wrote:
> I would love to have insights regarding people using _base with no
> shared storage but locally on the compute, up & down sides, experiences
> & comments.
We currently have a small cloud made of heterogeneous hardware. Whenever
we can, we try to use the local disk of the nodes. This is the deployment
scenario I would advise if possible, and if you *really* need to migrate
an instance off a node for maintenance you can still try a live migration
with --block-migration.

Unfortunately, we also have a couple of nodes with very little storage,
and for them we have a shared NFS filesystem mounted as
/var/lib/nova/instances. This lets us share the _base directory (improving
deployment speed) and also allows live migration. We don't store glance
images directly in _base: the first node that needs an image downloads it,
and the others reuse the already downloaded copy.

One important caveat: when you share _base among multiple compute nodes,
the node downloading a base image creates a file in
/var/lib/nova/instances/locks and calls fcntl() on it. If you use an NFSv3
filesystem you *will* have trouble, as locking in NFSv3 is very poorly
implemented, so you should definitely use NFSv4. If you can't (like we
couldn't), use a Linux box to export an NFSv4 filesystem and mount it on
/var/lib/nova/instances/locks.

Also, you need to disable the nova-compute periodic task that cleans up
the _base directory, since an image that is not used by one compute node
could still be in use by some other node! (See the rough sketches in the
P.S. below.)

In a couple of months we are going to deploy a few hundred nodes that have
very little internal storage, and for them we are planning to deploy
GlusterFS for /var/lib/nova/instances, but since we don't have much
experience with it yet I can't tell you whether this is actually
advisable. I can't say I would have avoided it had that been possible,
though :)

Final note: we run Folsom on Ubuntu 12.04.

.a.
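P.S. A rough sketch of the knobs mentioned above, in case it helps. Treat
these as hedged examples rather than copy-paste recipes: exact option
names, paths and flags depend on your release, and the hostname and export
path below are placeholders.

To keep a compute node from pruning base images that another node sharing
_base may still need, something like this in /etc/nova/nova.conf on every
compute node (verify the option name against your Folsom packages):

    [DEFAULT]
    # do not let this node delete "unused" base images; with a shared
    # _base another compute node may still be booted from them
    remove_unused_base_images = False

To get working fcntl() locking when the rest of /var/lib/nova/instances
has to stay on NFSv3, an /etc/fstab line along these lines mounts only the
locks directory over NFSv4 (filer.example.com and the export are made up):

    # NFSv3 locking is unreliable; export this one directory over NFSv4
    filer.example.com:/export/nova-locks  /var/lib/nova/instances/locks  nfs4  defaults,_netdev  0  0

And to empty a local-storage node for maintenance, a block migration looks
roughly like this (check "nova help live-migration" for the exact flag
name on your client version):

    nova live-migration --block-migrate <instance-uuid> <target-host>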
>> On Thu, Apr 3, 2014 at 12:28 AM, Joe Topjian <[email protected]> wrote:
>> > Is it Ceph live migration that you don't think is mature for production,
>> > or live migration in general? If the latter, I'd like to understand why
>> > you feel that way.
>> >
>> > Looping back to Alejandro's original message: I share his pain of _base
>> > issues. It's happened to me before and it sucks.
>> >
>> > We use shared storage for a production cloud of ours. The cloud has a
>> > 24x7 SLA, and shared storage with live migration helps us achieve that.
>> > It's not a silver bullet, but it has saved us so many hours of work.
>> >
>> > The remove_unused_base_images option is stable and works. I still
>> > disagree with the default value being "true", but I can vouch that it
>> > has worked without harm for the past year in an environment where it
>> > previously shot me in the foot.
>> >
>> > With that option enabled, you should not have to go into _base at all.
>> > The only work we do in _base is manual audits and the rare occasion when
>> > the database might be inconsistent with what's really hosted.
>> >
>> > To mitigate potential _base issues, we just try to be as careful as
>> > possible -- measure 5 times before cutting. Our standard procedure is to
>> > move the files we plan on removing to a temporary directory and wait a
>> > few days to see if any users raise an alarm.
>> >
>> > Diego has a great point about not using qemu backing files: if your
>> > backend storage implements deduplication and/or compression, you should
>> > see the same savings as what _base is trying to achieve.
>> >
>> > We're in the process of building a new public cloud and made the
>> > decision not to implement shared storage. I have a queue of blog posts
>> > that I'd love to write, and the thinking behind this decision is one of
>> > them. Very briefly, the decision was based on the SLA that the public
>> > cloud will have, combined with our feeling that "cattle" instances are
>> > more acceptable to the average end-user nowadays.
>> >
>> > That's not to say that I'm "done" with shared storage. IMO, it all
>> > depends on the environment. One great thing about OpenStack is that it
>> > can be tailored to work in so many different environments.
>> >
>> >
>> > On Wed, Apr 2, 2014 at 5:48 PM, matt <[email protected]> wrote:
>> >>
>> >> There's shared storage on a centralized network filesystem... then
>> >> there's shared storage on a distributed network filesystem. Thus the
>> >> age-old OpenAFS vs NFS war is reborn.
>> >>
>> >> I'd check out the Ceph block device for live migration... but that
>> >> said, live migration has not reached a maturity level at which I'd even
>> >> consider trying it in production.
>> >>
>> >> -matt
>> >>
>> >>
>> >> On Wed, Apr 2, 2014 at 7:40 PM, Chris Friesen
>> >> <[email protected]> wrote:
>> >>>
>> >>> So if you're recommending not using shared storage, what's your answer
>> >>> to people asking for live migration? (Given that block migration is
>> >>> supposed to be going away.)
>> >>>
>> >>> Chris
>> >>>
>> >>>
>> >>> On 04/02/2014 05:08 PM, George Shuklin wrote:
>> >>>>
>> >>>> Every time anyone starts to consolidate resources (shared storage,
>> >>>> virtual chassis for routers, etc.), they consolidate all failures
>> >>>> into one. One failure, and every system participating in the
>> >>>> consolidation joins the festival.
>> >>>>
>> >>>> Then they start to increase the fault tolerance of the consolidated
>> >>>> system, raising the administrative bar to the sky, requesting more
>> >>>> and more hardware for clustering, requesting enterprise-grade gear
>> >>>> because "no one was ever fired for buying enterprise
>> >>>> <bullshit-brand-name-here>". As a result, the consolidated system
>> >>>> ends up with the same MTBF as the non-consolidated one, "saving
>> >>>> costs" only compared to an even more enterprise-grade super-solution
>> >>>> costing a few percent of a country's GDP, and it actually costs more
>> >>>> than the non-consolidated solution.
>> >>>>
>> >>>> Failure on x86 is ALWAYS an option. The processor cannot repeat
>> >>>> instructions, there is no comparator between a few parallel
>> >>>> processors, and so on -- compare that to mainframes. So, if failure
>> >>>> is an option, that means reducing the importance of that failure and
>> >>>> its scope.
>> >>>>
>> >>>> If one of 1k hosts goes down for three hours, that is sad. But it is
>> >>>> much, much better than a central system that every one of those 1k
>> >>>> hosts depends on going down for just 11 seconds (3h*3600/1000).
>> >>>>
>> >>>> So the answer is simple: do not aggregate. Put _base on slower drives
>> >>>> if you want to save costs, but do not consolidate failures.
>> >>>>
>> >>>> On 04/02/2014 09:04 PM, Alejandro Comisario wrote:
>> >>>>>
>> >>>>> Hi guys ...
>> >>>>> We have a pretty big OpenStack environment and we use a shared NFS
>> >>>>> to populate the backing file directory (the famous _base directory
>> >>>>> located at /var/lib/nova/instances/_base). Due to a human error, the
>> >>>>> backing file used by thousands of guests was deleted, causing those
>> >>>>> guests to go read-only filesystem in a second.
>> >>>>>
>> >>>>> Until that moment we were convinced to use the _base directory as a
>> >>>>> shared NFS because:
>> >>>>>
>> >>>>> * spawning a new AMI gives total visibility to the whole cloud,
>> >>>>> making instances take almost no time to boot regardless of the nova
>> >>>>> region
>> >>>>> * it eases the glance workload
>> >>>>> * it is the easiest to manage: no replicating files constantly, no
>> >>>>> pushing bandwidth usage internally
>> >>>>>
>> >>>>> But after this really big issue, and after what it took us to
>> >>>>> recover from it, we started thinking about how to protect against
>> >>>>> this kind of "single point of failure".
>> >>>>> Our first approach these days was to make the NFS share read-only,
>> >>>>> making it impossible for computes (and humans) to write to that
>> >>>>> directory, and giving permission to just one compute, which is the
>> >>>>> one responsible for spawning an instance from a new AMI and writing
>> >>>>> the file to the directory. Still... the storage remains the SPOF.
>> >>>>>
>> >>>>> So, we are considering the possibility of keeping the used backing
>> >>>>> files LOCAL on every compute (+1K hosts) to reduce the failure
>> >>>>> chances to the minimum, obviously with a parallel discussion about
>> >>>>> what technology to use to keep data replicated among computes when a
>> >>>>> new AMI is launched, launch times, performance concerns on compute
>> >>>>> nodes having to store backing files locally, etc.
>> >>>>>
>> >>>>> This made me realize I have a huge community behind OpenStack, so I
>> >>>>> wanted to hear from it:
>> >>>>>
>> >>>>> * what are your thoughts about what happened / what we are thinking
>> >>>>> right now?
>> >>>>> * how do other users manage the backing file (_base) directory with
>> >>>>> all these considerations on big OpenStack deployments?
>> >>>>>
>> >>>>> I will be thrilled to read other users' experiences and thoughts.
>> >>>>>
>> >>>>> As always, best.
>> >>>>> Alejandro

--
[email protected]
[email protected]
+41 (0)44 635 42 22

GC3: Grid Computing Competence Center
http://www.gc3.uzh.ch/
University of Zurich
Winterthurerstrasse 190
CH-8057 Zurich Switzerland

_______________________________________________
Mailing list: http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack
Post to     : [email protected]
Unsubscribe : http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack
