When NFS hangs, you practically need to bring in hostage negotiators to
talk a machine into rebooting (or use -f :)
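For the record, a hung hard NFS mount can often be detached without a reboot by escalating umount flags. A minimal sketch, assuming a Linux client with coreutils `timeout`; the probe helper and the paths are mine, not from this thread:

```shell
#!/bin/sh
# Probe whether a (possibly NFS-backed) path still answers I/O within a
# deadline. Hard NFS mounts block forever on a dead server, so even a
# plain `ls` can hang the shell -- hence the timeout.
nfs_probe() {
    if timeout 5 ls "$1" >/dev/null 2>&1; then
        echo responsive
    else
        echo hung
    fi
}

# If the probe reports "hung", escalate instead of rebooting:
#   umount    /mnt/nfs   # clean detach (will itself hang on a dead hard mount)
#   umount -f /mnt/nfs   # force: abort outstanding NFS RPCs
#   umount -l /mnt/nfs   # lazy: detach from the tree now, clean up later
nfs_probe /tmp    # -> responsive (a local path that always answers)
```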
If you are running critical production infra and suffering from power
failures, then it's not only CEPH that would have issues.
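Adam's 9-node story below is really just quorum arithmetic: a cluster stays quorate only while more than half the votes are present, so the last nodes to power off are inevitably inquorate. A sketch of the math (the helper name is mine; `pvecm expected` is the usual PVE escape hatch, to be used with care):

```shell
#!/bin/sh
# Votes needed for quorum among n single-vote nodes: floor(n/2) + 1.
quorum_needed() { echo $(( $1 / 2 + 1 )); }

quorum_needed 9   # -> 5: once 5 of 9 nodes are off, the survivors are
                  #    inquorate and hang waiting for quorum at shutdown
quorum_needed 3   # -> 2

# On the stragglers you can manually lower the expected vote count before
# powering off (only once the other nodes really are down):
#   pvecm expected 1
```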
On Fri, Sep 16, 2016 at 1:02 PM, Adam Thompson <athom...@athompso.net> wrote:
> We've observed that if any of the nodes boot much faster or slower than the
> other nodes, this causes big problems with both CEPH and PVE, particularly
> with quorum issues.
> I've just finished switching a 9-node cluster to NFS because CEPH was too
> unreliable after repeated power failure crashes.
> Turns out powering off the last few nodes is hard because they've lost quorum
> by that point and hang during shutdown for longer than the UPSes last.
> Unless you have redundant power (i.e. a generator), I'm not sure I would ever
> recommend a large PVE+CEPH cluster again.
> On September 16, 2016 4:36:41 AM CDT, Marco Gaiarin <g...@sv.lnf.it> wrote:
>>Hi! Fabian Grünbichler
>> On that day, you wrote...
>>> two ceph nodes, two mons and two osds are all way too few for a
>>> (production) ceph setup.
>>I know, this is my 'test' ceph cluster as stated... ;-)
>>> at least three nodes/mons (for quorum reasons), and multiple OSDs per
>>> storage node (for performance and failure tolerance) are required.
>>Production, as planned, will have 3 nodes/mons and 2 OSDs per node.
>>I'm simply curious whether starting a ceph cluster from cold iron could be a
>>common failure condition, or whether it's a consequence of my small setup...
>>dott. Marco Gaiarin GNUPG Key ID: 240A3D66
>>Associazione ``La Nostra Famiglia''
>>Polo FVG - Via della Bontà, 7 - 33078 - San Vito al Tagliamento
>>marco.gaiarin(at)lanostrafamiglia.it  tel +39-0434-842711
>>pve-user mailing list