On 31/05/16 10:41 PM, Jay Scott wrote:
> hooray for me, but, how?
>
> I got about 3/4 of Digimer's list done and got stuck.
> I did a pcs cluster status, and, behold, the cluster was up.
> I pinged the ClusterIP and it answered. I didn't know what
> to do with the 'delay="x"' part, that's the thing I couldn't figure
> out. (I've been assuming the delay part is a big deal.)
Delay works like this: both nodes are up, but comms break (switch loop/broadcast storm, STP/stack renegotiation, iptables oops, whatever), so both nodes declare their peer lost.

The stonith device that fences node 1 has 'delay="15"' set. Node 1 looks up how to fence node 2 and calls the fence agent. Node 2 looks up how to fence node 1 and calls the fence agent, passing it the delay.

The fence agent running on node 1 executes without delay. The fence agent running on node 2 sees a delay of 15 seconds and sleeps. Node 1 kills node 2 before that sleep exits, so node 1 lives and node 2 dies. Assuming your services are on node 1, no recovery is needed.

Now assume that node 1 truly died. Node 2's fence agent would exit the sleep after 15 seconds and proceed to shoot node 1, and node 2 would then recover any resources that had been on node 1.

digimer

> However, there are more things for me to read and more experiments
> for me to try so I'm good for now.
>
> Thanks to everyone for the prompt help.
>
> j.
>
> On Tue, May 31, 2016 at 5:22 PM, Ken Gaillot <[email protected]
> <mailto:[email protected]>> wrote:
>
> On 05/31/2016 03:59 PM, Jay Scott wrote:
> > Greetings,
> >
> > Cluster newbie
> > Centos 7
> > trying to follow the "Clusters from Scratch" intro.
> > 2 nodes (yeah, I know, but I'm just learning)
> > [root@smoking ~]# pcs status
> > Cluster name:
> > Last updated: Tue May 31 15:32:18 2016  Last change: Tue May 31 15:02:21 2016 by root via cibadmin on smoking
> > Stack: unknown
>
> "Stack: unknown" is a big problem. The cluster isn't aware of the
> corosync configuration. Did you do the "pcs cluster setup" step?
>
> > Current DC: NONE
> > 2 nodes and 1 resource configured
> >
> > OFFLINE: [ mars smoking ]
> >
> > Full list of resources:
> >
> > ClusterIP (ocf::heartbeat:IPaddr2): Stopped
> >
> > PCSD Status:
> > smoking: Online
> > mars: Online
> >
> > Daemon Status:
> > corosync: active/enabled
> > pacemaker: active/enabled
> > pcsd: active/enabled
> >
> > What concerns me at the moment:
> > I did
> > pcs resource enable ClusterIP
> > while simultaneously doing
> > tail -f /var/log/cluster/corosync.log
> > (the only log in there)
>
> The system log (/var/log/messages or whatever your system has
> configured) is usually the best place to start. The cluster software
> sends messages of interest to end users there, and it includes messages
> from all components (corosync, pacemaker, resource agents, etc.).
>
> /var/log/cluster/corosync.log (and in some configurations,
> /var/log/pacemaker.log) have more detailed log information for
> debugging.
>
> > and nothing happens in the log, but the ClusterIP
> > stays "Stopped". Should I be able to ping that addr?
> > I can't.
> > It also says OFFLINE: and both of my machines are offline,
> > though the PCSD says they're online. Which do I trust?
>
> The first online/offline output is most important, and refers to the
> node's status in the actual cluster; the "PCSD" online/offline output
> simply tells whether the pcs daemon is running. Typically, the pcs
> daemon is enabled at boot and is always running. The pcs daemon is not
> part of the clustering itself; it's a front end to configuring and
> administering the cluster.
>
> > [root@smoking ~]# pcs property show stonith-enabled
> > Cluster Properties:
> > stonith-enabled: false
> >
> > yet I see entries in the corosync.log referring to stonith.
> > I'm guessing that's normal.
>
> Yes, you can enable stonith at any time, so the stonith daemon will
> still run, to stay aware of the cluster status.
>
> > My corosync.conf file says the quorum is off.
> >
> > I also don't know what to include in this for any of you to
> > help me debug.
> >
> > Ahh, also, is this considered "long", and if so, where would I post
> > to the web?
> >
> > thx.
> >
> > j.
>
> _______________________________________________
> Users mailing list: [email protected] <mailto:[email protected]>
> http://clusterlabs.org/mailman/listinfo/users
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org

-- 
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without
access to education?
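The delay race described in Digimer's reply can be sketched as a small simulation. This is a toy model under stated assumptions, not how Pacemaker or any fence agent is implemented: the function name, the node names, and the 5-second `fence_duration` are hypothetical, and times are seconds after both nodes declare their peer lost. `delay` holds the value set on the stonith device that fences each node, matching the 'delay="15"' example.

```python
"""Toy model of a two-node fence race with a stonith delay.

Hypothetical sketch: it does not call pcs or any real fence agent.
"""


def simulate_fence_race(alive, delay, fence_duration=5.0):
    """Return (survivors, fenced_at) after both nodes try to fence their peer.

    alive:          {"node1": True, "node2": True}; False means the node really died
    delay:          seconds of delay set on the stonith device that fences each node
    fence_duration: assumed time the fence action itself takes (hypothetical value)
    fenced_at:      {node: time it was powered off}
    """
    nodes = list(alive)
    if len(nodes) != 2:
        raise ValueError("this sketch only models a two-node cluster")
    peer = {nodes[0]: nodes[1], nodes[1]: nodes[0]}

    # Every running node starts fencing its peer at t=0. Its agent first sleeps
    # for the delay configured on the *target's* device, then does the fence.
    attempts = sorted(
        (delay[peer[node]] + fence_duration, node, peer[node])
        for node in nodes
        if alive[node]
    )

    fenced_at = {}
    for finish, shooter, target in attempts:
        # A shooter powered off strictly before its own attempt finishes never
        # completes it; in this model, fences landing at the same instant both succeed.
        if shooter in fenced_at and fenced_at[shooter] < finish:
            continue
        fenced_at.setdefault(target, finish)

    survivors = [node for node in nodes if alive[node] and node not in fenced_at]
    return survivors, fenced_at


if __name__ == "__main__":
    # Comms break, both nodes alive: node 1 wins because node 2's agent is sleeping.
    print(simulate_fence_race({"node1": True, "node2": True}, {"node1": 15, "node2": 0}))
    # → (['node1'], {'node2': 5.0})
    # Node 1 really died: node 2 fences it once its 15 s sleep is over.
    print(simulate_fence_race({"node1": False, "node2": True}, {"node1": 15, "node2": 0}))
    # → (['node2'], {'node1': 20.0})
    # No delay anywhere: both agents fire together and both nodes end up off.
    print(simulate_fence_race({"node1": True, "node2": True}, {"node1": 0, "node2": 0}))
    # → ([], {'node2': 5.0, 'node1': 5.0})
```

The third case shows why the delay matters in this model: without it, both fences land at the same moment and no node is left to run services.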
