Hi Klaus, thanks for your prompt and thoughtful feedback. Please see my answers nested below (sections titled "Scott's reply"). Thanks!
- Scott

Scott Greenlese ... IBM Solutions Test, Poughkeepsie, N.Y.
INTERNET: [email protected]
PHONE: 8/293-7301 (845-433-7301)
M/S: POK 42HA/P966

From: Klaus Wenninger <[email protected]>
To: [email protected]
Date: 09/08/2016 10:59 AM
Subject: Re: [ClusterLabs] Pacemaker quorum behavior

On 09/08/2016 03:55 PM, Scott Greenlese wrote:
>
> Hi all,
>
> I have a few very basic questions for the group.
>
> I have a 5-node (Linux on Z LPARs) pacemaker cluster with 100
> VirtualDomain pacemaker-remote nodes plus 100 "opaque" VirtualDomain
> resources. The cluster is configured to be 'symmetric' and I have no
> location constraints on the 200 VirtualDomain resources (other than to
> prevent the opaque guests from running on the pacemaker-remote node
> resources). My quorum is set as:
>
> quorum {
>     provider: corosync_votequorum
> }
>
> As an experiment, I powered down one LPAR in the cluster, leaving 4
> powered up, with the pcsd service up on the 4 survivors but
> corosync/pacemaker down (pcs cluster stop --all) on the 4 survivors.
> I then started pacemaker/corosync on a single cluster

"pcs cluster stop" shuts down pacemaker & corosync on my test cluster,
but did you check the status of the individual services?

Scott's reply:

No, I only assumed that pacemaker was down because I got this back from
my pcs status command on each cluster node:

[root@zs95kj VD]# date; for host in zs93KLpcs1 zs95KLpcs1 zs95kjpcs1 zs93kjpcs1 ; do ssh $host pcs status; done
Wed Sep 7 15:49:27 EDT 2016
Error: cluster is not currently running on this node
Error: cluster is not currently running on this node
Error: cluster is not currently running on this node
Error: cluster is not currently running on this node

What else should I check? The pcsd.service service was still up, since
I didn't stop that anywhere. Should I have run

ps -ef | grep -e pacemaker -e corosync

to check the state before assuming it was really down?
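A minimal sketch of such a check, runnable on each node (the process names are assumptions based on typical RHEL 7 pacemaker builds; adjust to whatever your ps output actually shows):

```shell
#!/bin/sh
# Sketch: confirm the cluster stack is actually stopped on this node.
# Assumed daemon names: corosync, pacemakerd, crmd (RHEL 7 / pacemaker 1.1-era).
for proc in corosync pacemakerd crmd; do
    if pgrep -x "$proc" >/dev/null 2>&1; then
        echo "$proc: still running (pid $(pgrep -x "$proc" | head -n 1))"
    else
        echo "$proc: not running"
    fi
done
```

On systemd hosts, `systemctl status corosync pacemaker` answers the same question with more detail, including why a unit last exited.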
> node (pcs cluster start), and this resulted in the 200 VirtualDomain
> resources activating on the single node. This was not what I was
> expecting. I assumed that no resources would activate / start on any
> cluster nodes until 3 out of the 5 total cluster nodes had
> pacemaker/corosync running.
>
> After starting pacemaker/corosync on the single host (zs95kjpcs1),
> this is what I see:
>
> [root@zs95kj VD]# date; pcs status | less
> Wed Sep 7 15:51:17 EDT 2016
> Cluster name: test_cluster_2
> Last updated: Wed Sep 7 15:51:18 2016
> Last change: Wed Sep 7 15:30:12 2016 by hacluster via crmd on zs93kjpcs1
> Stack: corosync
> Current DC: zs95kjpcs1 (version 1.1.13-10.el7_2.ibm.1-44eb2dd) -
> partition with quorum
> 106 nodes and 304 resources configured
>
> Node zs93KLpcs1: pending
> Node zs93kjpcs1: pending
> Node zs95KLpcs1: pending
> Online: [ zs95kjpcs1 ]
> OFFLINE: [ zs90kppcs1 ]
> .
> .
> .
> PCSD Status:
>   zs93kjpcs1: Online
>   zs95kjpcs1: Online
>   zs95KLpcs1: Online
>   zs90kppcs1: Offline
>   zs93KLpcs1: Online
>
> So, what exactly constitutes an "Online" vs. "Offline" cluster node
> w.r.t. quorum calculation? It seems like in my case it's "pending" on
> 3 nodes, so where does that fall? And why "pending"? What does that
> mean?
>
> Also, what exactly is the cluster's expected reaction to quorum loss?
> Will cluster resources be stopped, or something else?

Depends on how you configure it, using the cluster property
no-quorum-policy (default: stop).

Scott's reply:

This is how the policy is configured:

[root@zs95kj VD]# date; pcs config | grep quorum
Thu Sep 8 13:18:33 EDT 2016
no-quorum-policy: stop

What should I expect with the 'stop' setting?

> Where can I find this documentation?

http://clusterlabs.org/doc/en-US/Pacemaker/1.1/html/Pacemaker_Explained/

Scott's reply:

OK, I'll keep looking through this doc, but I don't easily find
no-quorum-policy explained. Thanks.

> Thanks!
>
> Scott Greenlese - IBM Solution Test Team.
>
> Scott Greenlese ...
> IBM Solutions Test, Poughkeepsie, N.Y.
> INTERNET: [email protected]
> PHONE: 8/293-7301 (845-433-7301)
> M/S: POK 42HA/P966
>
> _______________________________________________
> Users mailing list: [email protected]
> http://clusterlabs.org/mailman/listinfo/users
>
> Project Home: http://www.clusterlabs.org
> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
> Bugs: http://bugs.clusterlabs.org
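For reference, the votequorum arithmetic behind the expectation discussed above can be sketched as follows (a minimal sketch assuming default corosync_votequorum settings, one vote per full cluster node, and no votes for pacemaker-remote nodes, which do not run corosync):

```shell
#!/bin/sh
# Standard votequorum majority: a partition is quorate when it holds
# floor(expected_votes / 2) + 1 votes.
expected_votes=5                       # five full cluster nodes, one vote each (assumed)
quorum=$(( expected_votes / 2 + 1 ))   # floor(5/2) + 1 = 3
echo "votes needed for quorum: $quorum"

# A lone node holds 1 vote, so by this math it should NOT be quorate:
my_votes=1
if [ "$my_votes" -ge "$quorum" ]; then
    echo "partition is quorate"
else
    echo "partition is NOT quorate"
fi
```

With `no-quorum-policy: stop`, a partition that falls below this threshold stops any resources it is running; the live values actually in effect on a node can be inspected with `corosync-quorumtool -s`.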
