On 12/17/18 9:23 AM, Eneko Lacunza wrote:
Hi,

On 16/12/18 at 17:16, Frank Thommen wrote:
I understand that with the new PVE release PVE hosts (hypervisors) can be
used as Ceph servers.  But it's not clear to me if (or when) that makes
sense.  Do I really want to have Ceph MDS/OSD on the same hardware as my hypervisors?  Doesn't that a) accumulate multiple POFs on the same hardware and b) occupy computing resources (CPU, RAM) that I'd rather use for my VMs and containers?  Wouldn't I rather want to have a separate Ceph cluster?
The integration of Ceph services in PVE started with Proxmox VE 3.0.
With PVE 5.3 (current) we added CephFS services to PVE. So you can
run a hyper-converged Ceph cluster with RBD/CephFS on the same servers as
your VMs/CTs.

a) can you please be more specific about what you see as multiple points of
failure?

Not only do I run the hypervisor that controls containers and virtual machines on the server, but also the file service that is used to store the VM and container images.
I think you have fewer points of failure :-) because you'll have 3 points (nodes) of failure in a hyper-converged scenario and 6 in a separate virtualization/storage cluster scenario...  it depends how you look at it.

Right, but I look at it from the service side: one hardware failure -> one service affected vs. one hardware failure -> two services affected.



b) depends on the workload of your nodes. Modern server hardware has
enough power to run multiple services. It all comes down to
having enough resources for each domain (e.g. Ceph, KVM, CT, host).

I recommend starting with a simple calculation, just to get a rough
direction.

In principle:

==CPU==
core='CPU with HT on'

* reserve a core for each Ceph daemon
   (preferably on the same NUMA node as the network card; higher frequency is
   better)
* one core for the network card (higher frequency = lower latency)
* rest of the cores for OS (incl. monitoring, backup, ...), KVM/CT usage
* don't overcommit
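
To make the core budget concrete, here is a minimal sketch in Python (the
daemon counts and reserves are example assumptions, not PVE defaults, and the
helper name is mine):

    # rough per-node core budget, following the rules of thumb above
    def cores_left_for_guests(total_cores, osds, mons=1, mds=0):
        ceph = osds + mons + mds   # one core per Ceph daemon
        nic = 1                    # one core for the network card
        os_reserve = 2             # OS, monitoring, backup, ...
        return total_cores - ceph - nic - os_reserve

    # example: 16 physical cores with HT on = 32 "cores", 4 OSDs, 1 MON
    print(cores_left_for_guests(32, osds=4, mons=1))   # -> 24 cores for KVM/CT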

==Memory==
* 1 GB per TB of used disk space on an OSD (more during recovery)
Note this is no longer accurate with Bluestore, because you also have to take the cache space into account (1 GB for HDD and 3 GB for SSD OSDs, if I recall correctly), and OSD processes currently aren't that good at accounting for their RAM use... :)
* enough memory for KVM/CT
* free memory for OS, backup, monitoring, live migration
* don't overcommit
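
As a rough illustration of the rule of thumb plus the Bluestore cache note
above (the figures are just the numbers quoted in this thread, not exact OSD
memory usage):

    # rough per-node memory budget in GB, using the rules of thumb above
    def osd_memory_gb(used_tb_per_osd, n_osd, ssd=False):
        baseline = 1.0 * used_tb_per_osd        # ~1 GB per TB used, per OSD
        bluestore_cache = 3.0 if ssd else 1.0   # Bluestore cache per OSD (approx.)
        return n_osd * (baseline + bluestore_cache)

    # example: 4 HDD OSDs with ~4 TB used each
    osd_ram = osd_memory_gb(4, 4)     # ~20 GB for the OSD daemons
    guest_ram = 64                    # planned KVM/CT memory
    os_reserve = 8                    # OS, backup, monitoring, live migration
    print(osd_ram + guest_ram + os_reserve)   # keep this below physical RAM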

==Disk==
* one OSD daemon per disk, even disk sizes throughout the cluster
* more disks, more hosts, better distribution
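
A quick illustration of why even disk sizes matter (CRUSH places data roughly
in proportion to disk size/weight; the numbers below are made up):

    # data share per disk, assuming placement proportional to size (weight)
    disks_tb = [4, 4, 4, 12]          # one oversized disk in the node
    total = sum(disks_tb)
    for size in disks_tb:
        print(f"{size} TB disk -> ~{size / total:.0%} of the node's data")
    # the 12 TB disk ends up holding (and serving) ~50% of the data by itself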

==Network==
* at least 10 GbE for storage traffic (the more the better),
   see our benchmark paper:
https://forum.proxmox.com/threads/proxmox-ve-ceph-benchmark-2018-02.41761/
10 Gbit helps a lot with latency; small clusters can work perfectly well with 2x1 Gbit if they aren't latency-sensitive (we have been running a handful of those for some years now).
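
To put the latency difference in perspective, here is a back-of-the-envelope
transfer time for a single 4 MB RBD object (line rate only, ignoring protocol
overhead and replication traffic):

    # serialization delay for one 4 MB object at different link speeds
    object_mb = 4
    for gbit in (1, 10):
        seconds = object_mb * 8 / (gbit * 1000)   # MB -> Mbit, / link Mbit/s
        print(f"{gbit} Gbit/s: {seconds * 1000:.1f} ms per 4 MB object")
    # -> 32 ms on 1 Gbit vs 3.2 ms on 10 Gbit, before any replication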

I will keep the two points in mind.  Thank you.
frank
_______________________________________________
pve-user mailing list
pve-user@pve.proxmox.com
https://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user
