On 12/17/18 9:23 AM, Eneko Lacunza wrote:
Hi,

On 16/12/18 at 17:16, Frank Thommen wrote:
I understand that with the new PVE release PVE hosts (hypervisors) can be
used as Ceph servers.  But it's not clear to me if (or when) that makes
sense.  Do I really want to have Ceph MDS/OSD on the same hardware as my hypervisors?  Doesn't that a) accumulate multiple POFs on the same hardware and b) occupy computing resources (CPU, RAM) that I'd rather use for my VMs and containers?  Wouldn't I rather want to have a separate Ceph cluster?
The integration of Ceph services in PVE started with Proxmox VE 3.0.
With PVE 5.3 (current) we added CephFS services to PVE. So you can
run a hyper-converged Ceph cluster with RBD/CephFS on the same servers as
your VMs/CTs.

a) can you please be more specific about what you see as multiple points of
failure?

Not only do I run the hypervisor that controls containers and virtual machines on the server, but also the file service that is used to store the VM and container images.
I think you have fewer points of failure :-) because you'll have 3 points (nodes) of failure in a hyper-converged scenario and 6 in a separate virtualization/storage cluster scenario...  it depends how you look at it.

Right, but I look at it from the service side: one hardware failure -> one service affected vs. one hardware failure -> two services affected.



b) depends on the workload of your nodes. Modern server hardware has
enough power to run multiple services. It all comes down to
having enough resources for each domain (e.g. Ceph, KVM, CT, host).

I recommend starting with a simple calculation, just to get a rough
direction.

In principle:

==CPU==
core='CPU with HT on'

* reserve a core for each Ceph daemon
   (preferably on the same NUMA node as the network card; higher frequency is
   better)
* one core for the network card (higher frequency = lower latency)
* rest of the cores for OS (incl. monitoring, backup, ...), KVM/CT usage
* don't overcommit
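
To make the core budget concrete, here is a minimal sketch in Python (the
daemon counts and reserves are example assumptions, not PVE defaults, and the
helper name is mine):

    # rough per-node core budget, following the rules of thumb above
    def cores_left_for_guests(total_cores, osds, mons=1, mds=0):
        ceph = osds + mons + mds   # one core per Ceph daemon
        nic = 1                    # one core for the network card
        os_reserve = 2             # OS, monitoring, backup, ...
        return total_cores - ceph - nic - os_reserve

    # example: 16 physical cores with HT on = 32 "cores", 4 OSDs, 1 MON
    print(cores_left_for_guests(32, osds=4, mons=1))   # -> 24 cores for KVM/CT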

==Memory==
* 1 GB per TB of used disk space on an OSD (more during recovery)
Note this is no longer accurate with Bluestore, because you also have to take the cache space into account (1 GB for HDD and 3 GB for SSD OSDs, if I recall correctly), and OSD processes currently aren't that good at accounting for their RAM use... :)
* enough memory for KVM/CT
* free memory for OS, backup, monitoring, live migration
* don't overcommit
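
As a rough illustration of the rule of thumb plus the Bluestore cache note
above (the figures are just the numbers quoted in this thread, not exact OSD
memory usage):

    # rough per-node memory budget in GB, using the rules of thumb above
    def osd_memory_gb(used_tb_per_osd, n_osd, ssd=False):
        baseline = 1.0 * used_tb_per_osd        # ~1 GB per TB used, per OSD
        bluestore_cache = 3.0 if ssd else 1.0   # Bluestore cache per OSD (approx.)
        return n_osd * (baseline + bluestore_cache)

    # example: 4 HDD OSDs with ~4 TB used each
    osd_ram = osd_memory_gb(4, 4)     # ~20 GB for the OSD daemons
    guest_ram = 64                    # planned KVM/CT memory
    os_reserve = 8                    # OS, backup, monitoring, live migration
    print(osd_ram + guest_ram + os_reserve)   # keep this below physical RAM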

==Disk==
* one OSD daemon per disk, even disk sizes throughout the cluster
* more disks, more hosts, better distribution
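
A quick illustration of why even disk sizes matter (CRUSH places data roughly
in proportion to disk size/weight; the numbers below are made up):

    # data share per disk, assuming placement proportional to size (weight)
    disks_tb = [4, 4, 4, 12]          # one oversized disk in the node
    total = sum(disks_tb)
    for size in disks_tb:
        print(f"{size} TB disk -> ~{size / total:.0%} of the node's data")
    # the 12 TB disk ends up holding (and serving) ~50% of the data by itself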

==Network==
* at least 10 GbE for storage traffic (the more the better),
   see our benchmark paper:
https://forum.proxmox.com/threads/proxmox-ve-ceph-benchmark-2018-02.41761/
10 Gbit helps a lot with latency; small clusters can work perfectly well with 2x1 Gbit if they aren't latency-sensitive (we have been running a handful of those for some years now).
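
To put the latency difference in perspective, here is a back-of-the-envelope
transfer time for a single 4 MB RBD object (line rate only, ignoring protocol
overhead and replication traffic):

    # serialization delay for one 4 MB object at different link speeds
    object_mb = 4
    for gbit in (1, 10):
        seconds = object_mb * 8 / (gbit * 1000)   # MB -> Mbit, / link Mbit/s
        print(f"{gbit} Gbit/s: {seconds * 1000:.1f} ms per 4 MB object")
    # -> 32 ms on 1 Gbit vs 3.2 ms on 10 Gbit, before any replication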

I will keep the two points in mind.  Thank you.
frank
_______________________________________________
pve-user mailing list
pve-user@pve.proxmox.com
https://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user
