When NFS hangs you practically need to get hostage negotiators in to
talk a machine into rebooting (or use -f :)

If you are running critical production infra and suffering from power
failures, then it's not only CEPH that would have issues.
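
On the quorum problems quoted below: Ceph monitors need a strict majority of
mons up to form quorum, and the PVE cluster stack by default needs a majority
of votes as well, which would explain why the last few nodes of a 9-node
cluster hang once the rest are already off. A rough sketch of that arithmetic
(illustrative only; the function names are mine, and this is not Ceph or PVE
code):

```python
def quorum_size(total_members: int) -> int:
    """Smallest strict majority of total_members (hypothetical helper)."""
    if total_members < 1:
        raise ValueError("need at least one member")
    return total_members // 2 + 1


def tolerated_failures(total_members: int) -> int:
    """How many members can be down while a majority is still up."""
    return total_members - quorum_size(total_members)


# Cluster sizes mentioned in this thread: 2-mon test setup, 3-mon plan, 9 nodes.
for n in (2, 3, 9):
    print(n, quorum_size(n), tolerated_failures(n))
```

For 2, 3 and 9 members this gives a quorum of 2, 2 and 5: a 2-mon test setup
tolerates no failures at all, and a 9-node cluster loses quorum once 5 nodes
are down.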


On Fri, Sep 16, 2016 at 1:02 PM, Adam Thompson <athom...@athompso.net> wrote:
> We've observed that if any of the nodes boot much faster or slower than the 
> other nodes, this causes big problems with both CEPH and PVE, particularly 
> with quorum issues.
> I've just finished switching a 9-node cluster to NFS because CEPH was too 
> unreliable after repeated power failure crashes.
> Turns out powering off the last few nodes is hard because they've lost quorum 
> by that point and hang during shutdown for longer than the UPSes last.
> Unless you have redundant power (i.e. a generator), I'm not sure I would ever 
> recommend a large PVE+CEPH cluster again.
> -Adam
>
> On September 16, 2016 4:36:41 AM CDT, Marco Gaiarin <g...@sv.lnf.it> wrote:
>>Hello! Fabian Grünbichler
>>  On that day it was said...
>>
>>> two ceph nodes, two mons and two osds are all way too few for a
>>> (production) ceph setup.
>>
>>I know, this is my 'test' ceph cluster as stated... ;-)
>>
>>
>>> at least three nodes/mons (for quorum reasons),
>>> and multiple osds per storage node (for performance and failure
>>> reasons) are required.
>>
>>Production, as planned, will have 3 nodes/mon, and 2 OSD per node.
>>
>>
>>I'm simply curious if starting a ceph cluster from cold iron could be a
>>common failure condition, or is a consequence of my little setup...
>>
>>--
>>dott. Marco Gaiarin                                    GNUPG Key ID: 240A3D66
>>Associazione ``La Nostra Famiglia''
>>http://www.lanostrafamiglia.it/
>>Polo FVG   -   Via della Bontà, 7 - 33078   -   San Vito al Tagliamento (PN)
>>marco.gaiarin(at)lanostrafamiglia.it   t +39-0434-842711   f +39-0434-842797
>>
>>               Donate your 5 PER MILLE to LA NOSTRA FAMIGLIA!
>>    http://www.lanostrafamiglia.it/25/index.php/component/k2/item/123
>>       (tax code 00307430132, category ONLUS or RICERCA SANITARIA)
>>_______________________________________________
>>pve-user mailing list
>>pve-user@pve.proxmox.com
>>http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-user
>