We've observed that if any node boots much faster or slower than the
others, it causes serious problems for both Ceph and PVE, mostly around
quorum.
I've just finished switching a 9-node cluster to NFS because Ceph was too
unreliable after repeated crashes caused by power failures.
It turns out that powering off the last few nodes is hard: by that point they
have lost quorum, and they hang during shutdown for longer than the UPSes last.
Unless you have redundant power (e.g. a generator), I'm not sure I would ever
recommend a large PVE+Ceph cluster again.
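
To put numbers on the shutdown problem, here is a rough sketch of the majority arithmetic. It assumes one vote per node and a plain strict-majority rule, which is my simplification; the function names are made up for illustration and are not part of any PVE or Ceph tool.

```python
# Illustrative only: assumes one vote per node and a strict-majority quorum rule.

def quorum_size(total_votes: int) -> int:
    """Smallest number of votes that forms a strict majority."""
    return total_votes // 2 + 1


def has_quorum(online: int, total_votes: int) -> bool:
    """True if the nodes still online hold a strict majority."""
    return online >= quorum_size(total_votes)


def nodes_stranded_at_shutdown(total_votes: int) -> int:
    """Nodes still running at the moment a one-by-one shutdown breaks quorum."""
    return quorum_size(total_votes) - 1


if __name__ == "__main__":
    for total in (2, 3, 9):
        print(
            f"{total} voters: quorum needs {quorum_size(total)}, "
            f"{nodes_stranded_at_shutdown(total)} node(s) left without quorum "
            f"during a sequential shutdown"
        )
```

Under those assumptions a 9-node cluster needs 5 votes, so the last 4 nodes are the ones that end up shutting down without quorum, and a two-voter setup loses quorum as soon as either voter is missing.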

On September 16, 2016 4:36:41 AM CDT, Marco Gaiarin <g...@sv.lnf.it> wrote:
>Hello! Fabian Grünbichler
>  On that day it was said...
>> two ceph nodes, two mons and two osds are all way too few for a
>> (production) ceph setup.
>I know, this is my 'test' ceph cluster as stated... ;-)
>> at least three nodes/mons (for quorum reasons),
>> and multiple osds per storage node (for performance and failure
>> reasons) are required.
>Production, as planned, will have 3 nodes/mons and 2 OSDs per node.
>I'm simply curious whether starting a ceph cluster from cold iron could be
>a common failure condition, or whether it is a consequence of my small setup...
