On Wed, Oct 1, 2014 at 4:08 AM, Jesse Pretorius <[email protected]> wrote:
> I'd like to clarify a few things, specifically related to Ceph usage, in
> less of a rushed response. :)
>
> Note - my production experience has only been with Ceph Dumpling. Plenty
> of great patches which resolve many of the issues I've experienced have
> landed, so YMMV.
>
> On 30 September 2014 15:06, Jesse Pretorius <[email protected]>
> wrote:
>
>> I would recommend ensuring that:
>>
>> 1) ceph-mon's and ceph-osd's are not hosted on the same server - they
>> both demand plenty of cpu cycles
>
> The ceph-mon will generally not use much CPU. If a whole chassis is lost,
> you'll see it spike heavily, but it'll drop off again after the rebuild is
> complete. I would still recommend keeping at least one ceph-mon on a host
> that isn't hosting OSDs. The mons are where all clients get the data
> location details from, so at least one really needs to be available no
> matter what happens.

At the beginning, while things are small (just a few OSDs), I'm intending
to run the mons on the OSD nodes. As the cluster grows, my plan is to
deploy separate monitors and eventually disable the mons on the OSD nodes
entirely.

> And, FYI, I would definitely recommend implementing separate networks for
> client access and the storage back-end. This can allow you to ensure that
> your storage replication traffic is separated and you can tune the QoS for
> each differently.

I've got a dedicated, isolated 10 Gb network between the Ceph nodes used
purely for replication traffic. Another interface (also 10 Gb) will handle
traffic from OpenStack, and a third (1 Gb) will deal with RadosGW traffic
from the public side. (There's a rough ceph.conf sketch for this split at
the bottom of this mail.)

>> 5) instance storage on ceph doesn't work very well if you're trying to
>> use the kernel module or cephfs - make sure you're using ceph volumes as
>> the underlying storage (I believe this has been patched in for Juno)
>
> cephfs, certainly in Dumpling, is not production ready - our experiment
> with using it in production was quickly rolled back when one of the
> client servers lost its connection to the ceph-mds for some reason and
> the storage on it became inaccessible. The client connection to the mds
> in Dumpling isn't as resilient as the client connection for the block
> device.
>
> By 'use the kernel module' I mean creating an image, mounting it on the
> server through the ceph block device kernel module, building a file
> system on it, and using it like you would any network-based storage.
> We found that when using one image as shared storage between servers,
> updates from one server weren't always visible quickly enough (within a
> minute) on the other server. If you choose to use a single image per
> server, and only mount server2's image on server1 in a disaster recovery
> situation, then it should be just fine.
> We did find that mounting a file system using the kernel module would
> tend to cause a kernel panic when trying to disconnect the storage. Note
> that there have been several improvements in the releases after Dumpling,
> including some bug fixes for issues that look similar to what we
> experienced.
>
> By "make sure you're using ceph volumes as the underlying storage" I
> meant that each instance root disk should be stored as its own Ceph image
> in a storage pool. This can be facilitated directly from nova by using
> 'images_type=rbd' in nova.conf, which became available in OpenStack
> Havana.
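(For reference, the relevant nova.conf settings on the compute nodes look
roughly like the sketch below. This is paraphrased from the documentation
rather than copied from a config I'm running, and the pool, user and
secret values are placeholders, so double-check against your release - in
Juno these options live under the [libvirt] section:)

    # nova.conf on the compute nodes - rough sketch, placeholder values
    [libvirt]
    images_type = rbd                        # store instance disks as RBD images
    images_rbd_pool = vms                    # placeholder pool name
    images_rbd_ceph_conf = /etc/ceph/ceph.conf
    rbd_user = cinder                        # placeholder cephx user
    rbd_secret_uuid = <libvirt-secret-uuid>  # placeholder, must match the secret defined in libvirt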
> Support for using RBD for ephemeral disks as well finally landed in Juno
> (see https://bugs.launchpad.net/nova/+bug/1226351), as did support for
> copy-on-write cloning (see
> https://blueprints.launchpad.net/nova/+spec/rbd-clone-image-handler),
> which rounds out the feature set for using an RBD back-end quite
> nicely. :)

I was originally planning on using images_type=rbd as you describe, mainly
to get the ability to live-migrate instances off a compute node. I
discovered yesterday that block migration works just fine with kvm/libvirt
now, despite assertions to the contrary in the OpenStack documentation,
and I can live with that for now. The last time I tried the RBD back-end
was in Havana and it had some goofy behavior, so I think I'll let the idea
sit for a while and maybe try again in Kilo once the new copy-on-write
code has had a chance to age a bit ;).
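On the separate networks mentioned further up, the relevant bit of my
ceph.conf ends up looking roughly like the sketch below (the subnets are
placeholders rather than my real addressing, so treat it as an
illustration of the split, not a copy of a live config):

    # ceph.conf - rough sketch, placeholder subnets
    [global]
    public network = 192.168.10.0/24     # client / OpenStack-facing traffic
    cluster network = 192.168.20.0/24    # OSD replication and backfill traffic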
_______________________________________________
OpenStack-operators mailing list
[email protected]
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators
