On Oct 5, 2016, at 9:38 AM, Ken Gaillot <[email protected]> wrote:
>
> On 10/05/2016 11:56 AM, Israel Brewster wrote
>>
>>>>>>>> I never did any specific configuring of CMAN, Perhaps that's the
>>>>>>>> problem? I missed some configuration steps on setup? I just
>>>>>>>> followed the directions here:
>>>>>>>> http://jensd.be/156/linux/building-a-high-available-failover-cluster-with-pacemaker-corosync-pcs,
>>>>>>>> which disabled stonith in pacemaker via the
>>>>>>>> "pcs property set stonith-enabled=false" command. Is there
>>>>>>>> separate CMAN configs I need to do to get everything copacetic?
>>>>>>>> If so, can you point me to some sort of guide/tutorial for that?
>
> If you ran "pcs cluster setup", it configured CMAN for you. Normally you
> don't need to modify those values, but you can see them in
> /etc/cluster/cluster.conf.

Good to know. So I'm probably OK on that front.

>>
>> So in any case, I guess the next step here is to figure out how to do
>> fencing properly, using controllable power strips or the like. Back to
>> the drawing board!
>
> It sounds like you're on the right track for fencing, but it may not be
> your best next step. Currently, your nodes are trying to fence each
> other endlessly, so if you get fencing working, one of them will
> succeed, and you just have a new problem. :-)
>
> Check the logs for the earliest occurrence (after starting the cluster)
> of the "Requesting Pacemaker fence" message. Look back from that time in
> /var/log/messages, /var/log/cluster/*, and /var/log/pacemaker.log (not
> necessarily all will be present on your system) to try to figure out why
> it wants to fence.
>
> One thing I noticed is that you're running CentOS 6.8, but your
> pacemaker version is 1.1.11. CentOS 6.8 shipped with 1.1.14, so maybe
> you partially upgraded your system from an earlier OS version? I'd try
> applying all updates (especially cman, libqb, corosync, and pacemaker).

I think what you're seeing is pacemaker on my primary DB server, which is
still at CentOS 6.7. I've managed to update the other servers, but I
haven't figured out a *good* HA solution for my DB server (PostgreSQL 9.4
running streaming replication with named replication slots). That is, I
can fail over *relatively* easily (touch a file on the secondary, move the
IP, and hope all the persistent DB connections reconnect without issue),
but getting the demoted primary back up and running is more of a chore
(the pg_rewind feature of PostgreSQL 9.5 looks like it would help with
this, but I'm not up to 9.5 yet). As such, I haven't updated the primary
DB server as much as some of the others. Proper integration of the DB with
pacemaker is something I need to look into again, but I took a stab at it
when I was first setting up the application cluster, and didn't have much
luck.
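
For the log check Ken suggests above, something along these lines might
save some scrolling. It is only a rough sketch: the function names
(first_fence_request, scan_logs) are made up for illustration, and it
assumes each log file is in chronological order, so the first matching
line in a file is the earliest one. It does not know when the cluster was
last started, so on a log that spans several restarts you would still need
to skip ahead to the most recent start yourself.

```python
import glob
from collections import deque

# Message text quoted in the thread; everything else here is illustrative.
MARKER = "Requesting Pacemaker fence"


def first_fence_request(lines, context=20):
    """Return (line_number, preceding_lines, matching_line) for the first
    line containing MARKER, or None if no line matches."""
    recent = deque(maxlen=context)  # rolling window of the lines before the match
    for number, line in enumerate(lines, start=1):
        text = line.rstrip("\n")
        if MARKER in text:
            return number, list(recent), text
        recent.append(text)
    return None


def scan_logs(patterns, context=20):
    """Apply first_fence_request to every file matching the glob patterns.
    Returns a dict mapping path -> result for files that contain MARKER."""
    results = {}
    for pattern in patterns:
        for path in sorted(glob.glob(pattern)):
            try:
                with open(path, errors="replace") as handle:
                    hit = first_fence_request(handle, context)
            except OSError:
                continue  # unreadable file or a directory; skip it
            if hit is not None:
                results[path] = hit
    return results


if __name__ == "__main__":
    # Log locations named in the thread; not all may exist on a given system.
    paths = ["/var/log/messages", "/var/log/cluster/*", "/var/log/pacemaker.log"]
    for path, (number, before, line) in scan_logs(paths).items():
        print(f"{path}:{number}: {line}")
        for earlier in before:
            print("    " + earlier)
```

Run as root (or whoever can read the logs), it prints the first fence
request found in each file followed by the lines leading up to it, which
is where the reason for the fence request would hopefully show up.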

>>>> Now if there is a version of fencing that simply
>>>> e-mails/texts/whatever me and says "Ummm... something is wrong with
>>>> that machine over there, you need to do something about it, because I
>>>> can't guarantee operation otherwise", I could go for that.
>
> As digimer mentioned elsewhere, one variation is to use "fabric"
> fencing, i.e. cutting off all external access (disk and/or network) to
> the node. That leaves it up but unable to cause any trouble, so you can
> investigate.
>
> If the disk is all local, or accessed over the network, then asking an
> intelligent switch to cut off network access is sufficient. If the disk
> is shared (e.g. iSCSI), then you need to cut it off, too.

All disks are local, which would simplify this option, especially
considering that I don't have any remote power control options available
at the moment. I mentioned getting switched PDUs to my boss, and he'll
look into it, but he thinks it might not fit into his budget. If I could
simply down the proper ports on the Cisco switch(es) the machines are
connected to, that could be a viable alternative without any additional
hardware needed.

Thanks!

-----------------------------------------------
Israel Brewster
Systems Analyst II
Ravn Alaska
5245 Airport Industrial Rd
Fairbanks, AK 99709
(907) 450-7293
-----------------------------------------------

>
>>> No, that is not fencing.
>>>
>>> --
>>> Digimer
>>> Papers and Projects: https://alteeve.ca/w/
>>> What if the cure for cancer is trapped in the mind of a person without
>>> access to education?
>
> _______________________________________________
> Users mailing list: [email protected]
> http://clusterlabs.org/mailman/listinfo/users
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org
