I would avoid co-locating Ceph and compute processes. Memory on compute nodes is a scarce resource if you're not running with any overcommit (which you shouldn't be). Ceph requires a fair amount of guaranteed memory (2GB per OSD to be safe) to deal with recovery. You can certainly overload memory and reserve it, but that just makes things difficult to manage and troubleshoot.

I'll give an example. I have two Ceph clusters that were experiencing aggressive page scanning and page cache reclamation under a moderate workload, enough to drive the load on an OSD server to four digits. If that had occurred on a box also running compute resources, we would have had tickets rolling in. As it was, all we did was slow down some of the storage, so it largely went unnoticed.
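To put that sizing in concrete terms, here is a back-of-the-envelope sketch of the arithmetic. The ~2GB-per-OSD figure is from above; the function name and the 4GB host overhead are my own illustrative assumptions, not an official recommendation:

```python
# Rough headroom check for a hyper-converged node: reserve ~2GB of
# guaranteed memory per co-located OSD plus some host overhead, and see
# what is left for guests when running with no memory overcommit.
def guest_memory_gb(host_gb, n_osds, gb_per_osd=2, host_overhead_gb=4):
    """Memory left for VMs after carving out OSD and host reserves."""
    return host_gb - host_overhead_gb - n_osds * gb_per_osd

# A 128GB compute node also hosting 6 OSDs:
print(guest_memory_gb(128, 6))  # -> 112
```

The point is that every OSD you add permanently shrinks the guest footprint of the node, and that reserve has to be honored even under recovery load.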
There may also come a time when package dependencies cause conflicts that are difficult to reconcile: OVS, the kernel, Ceph, and so on. You can attempt to dedicate resources on a single host to the various processes, but I personally don't think it's worth the effort.

Warren

On Thu, Mar 19, 2015 at 12:33 PM, Fox, Kevin M <[email protected]> wrote:
> We've run it both ways. We have clouds with dedicated storage nodes,
> and clouds sharing storage/compute.
>
> The storage/compute solution with Ceph is working OK for us. But that
> particular cloud is 1 gigabit only and seems very slow compared to our
> other clouds. Because of the gigabit interconnect, while the others are
> 40 gigabit, it's not clear if it's slow because of the storage/compute
> together, or simply because of the slower interconnect. Could be some of
> both.
>
> I'd be very curious if anyone else has a feel for storage/compute
> together on a faster interconnect.
>
> Thanks,
> Kevin
>
> ________________________________________
> From: Jesse Keating [[email protected]]
> Sent: Thursday, March 19, 2015 9:20 AM
> To: [email protected]
> Subject: Re: [Openstack-operators] Hyper-converged OpenStack with Ceph
>
> On 3/19/15 9:08 AM, Jared Cook wrote:
> > Hi, I'm starting to see a number of vendors push hyper-converged
> > OpenStack solutions where compute and Ceph OSD nodes are one and the
> > same. In addition, Ceph monitors are placed on OpenStack controller
> > nodes in these architectures.
> >
> > Recommendations I have read in the past have been to keep these things
> > separate, but some vendors are now saying that this actually works out
> > OK in practice.
> >
> > The biggest concern I have is that the compute node functions will
> > compete with Ceph functions, and one over-utilized node will slow down
> > the entire Ceph cluster, which will slow down the entire cloud. Is this
> > an unfounded concern?
> >
> > Does anyone have experience running in this mode? Experience at scale?
> >
>
> Not Ceph related, but it's a known tradeoff that compute resource on
> control nodes can cause resource competition. This is a tradeoff for the
> total cost of the cluster and the expected use case. If the use case
> plans to scale out to many compute nodes, we suggest upgrading to
> dedicated control nodes. This is higher cost, but somewhat necessary for
> matching performance to capacity.
>
> We may start small, but we can scale up to match the (growing) needs.
>
> --
> -jlk
>
> _______________________________________________
> OpenStack-operators mailing list
> [email protected]
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators
