On 09/08/2016 07:31 PM, Scott Greenlese wrote:
>
> Hi Klaus, thanks for your prompt and thoughtful feedback...
>
> Please see my answers nested below (sections entitled "Scott's Reply"). Thanks!
>
> - Scott
>
> Scott Greenlese ... IBM Solutions Test, Poughkeepsie, N.Y.
> INTERNET: [email protected]
> PHONE: 8/293-7301 (845-433-7301) M/S: POK 42HA/P966
>
> From: Klaus Wenninger <[email protected]>
> To: [email protected]
> Date: 09/08/2016 10:59 AM
> Subject: Re: [ClusterLabs] Pacemaker quorum behavior
>
> ------------------------------------------------------------------------
>
> On 09/08/2016 03:55 PM, Scott Greenlese wrote:
> >
> > Hi all...
> >
> > I have a few very basic questions for the group.
> >
> > I have a 5 node (Linux on Z LPARs) pacemaker cluster with 100
> > VirtualDomain pacemaker-remote nodes plus 100 "opaque" VirtualDomain
> > resources. The cluster is configured to be 'symmetric' and I have no
> > location constraints on the 200 VirtualDomain resources (other than to
> > prevent the opaque guests from running on the pacemaker remote node
> > resources). My quorum is set as:
> >
> > quorum {
> >     provider: corosync_votequorum
> > }
> >
> > As an experiment, I powered down one LPAR in the cluster, leaving 4
> > powered up with the pcsd service up on the 4 survivors but
> > corosync/pacemaker down (pcs cluster stop --all) on the 4 survivors.
> > I then started pacemaker/corosync on a single cluster
>
> "pcs cluster stop" shuts down pacemaker & corosync on my test cluster, but
> did you check the status of the individual services?
>
> Scott's reply:
>
> No, I only assumed that pacemaker was down because I got this back on
> my pcs status command from each cluster node:
>
> [root@zs95kj VD]# date;for host in zs93KLpcs1 zs95KLpcs1 zs95kjpcs1 zs93kjpcs1 ; do ssh $host pcs status; done
> Wed Sep 7 15:49:27 EDT 2016
> Error: cluster is not currently running on this node
> Error: cluster is not currently running on this node
> Error: cluster is not currently running on this node
> Error: cluster is not currently running on this node
>
> What else should I check? The pcsd.service service was still up, since I
> didn't stop that anywhere. Should I have done ps -ef | grep -e pacemaker -e corosync
> to check the state before assuming it was really down?
>
> Guess the answer from Poki should guide you well here ...
>
> > node (pcs cluster start), and this resulted in the 200 VirtualDomain
> > resources activating on the single node.
> > This was not what I was expecting. I assumed that no resources would
> > activate / start on any cluster nodes until 3 out of the 5 total
> > cluster nodes had pacemaker/corosync running.
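As a quick sanity check before drawing conclusions from "pcs status" alone,
something like the following could be run on each node (a sketch only,
assuming RHEL 7 systemd hosts as in the output above; corosync-quorumtool
ships with the corosync package):

  # confirm whether the daemons really went away, not just the pcs view of them
  systemctl status corosync pacemaker pcsd
  ps -ef | grep -e pacemaker -e corosync

  # ask corosync itself how many votes it sees and whether this partition is quorate
  corosync-quorumtool -s

With corosync_votequorum and 5 configured votes, the default quorum is
5/2 + 1 = 3 votes, so a single freshly started node would normally not be
quorate on its own.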
> >
> > After starting pacemaker/corosync on the single host (zs95kjpcs1),
> > this is what I see:
> >
> > [root@zs95kj VD]# date;pcs status |less
> > Wed Sep 7 15:51:17 EDT 2016
> > Cluster name: test_cluster_2
> > Last updated: Wed Sep 7 15:51:18 2016  Last change: Wed Sep 7 15:30:12 2016 by hacluster via crmd on zs93kjpcs1
> > Stack: corosync
> > Current DC: zs95kjpcs1 (version 1.1.13-10.el7_2.ibm.1-44eb2dd) - partition with quorum
> > 106 nodes and 304 resources configured
> >
> > Node zs93KLpcs1: pending
> > Node zs93kjpcs1: pending
> > Node zs95KLpcs1: pending
> > Online: [ zs95kjpcs1 ]
> > OFFLINE: [ zs90kppcs1 ]
> >
> > .
> > .
> > .
> > PCSD Status:
> > zs93kjpcs1: Online
> > zs95kjpcs1: Online
> > zs95KLpcs1: Online
> > zs90kppcs1: Offline
> > zs93KLpcs1: Online
> >
> > So, what exactly constitutes an "Online" vs. "Offline" cluster node
> > w.r.t. quorum calculation? Seems like in my case, it's "pending" on 3
> > nodes, so where does that fall? And why "pending"? What does that mean?
> >
> > Also, what exactly is the cluster's expected reaction to quorum loss?
> > Will cluster resources be stopped, or something else?
>
> Depends on how you configure it using the cluster property no-quorum-policy
> (default: stop).
>
> Scott's reply:
>
> This is how the policy is configured:
>
> [root@zs95kj VD]# date;pcs config |grep quorum
> Thu Sep 8 13:18:33 EDT 2016
> no-quorum-policy: stop
>
> What should I expect with the 'stop' setting?
>
> > Where can I find this documentation?
>
> http://clusterlabs.org/doc/en-US/Pacemaker/1.1/html/Pacemaker_Explained/
>
> Scott's reply:
>
> OK, I'll keep looking through this doc, but I don't easily find the
> no-quorum-policy explained.

Well, the index leads you to:
http://clusterlabs.org/doc/en-US/Pacemaker/1.1/html/Pacemaker_Explained/s-cluster-options.html
where you find an exhaustive description of the option.
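For reference, no-quorum-policy accepts the values stop, ignore, freeze and
suicide in Pacemaker 1.1, and it can be inspected or changed with pcs along
these lines (a sketch; exact pcs subcommand names may differ slightly between
pcs versions):

  # show all cluster properties, including defaults, and pick out the policy
  pcs property list --all | grep no-quorum-policy

  # set it explicitly (stop is already the default)
  pcs property set no-quorum-policy=stop

Roughly: ignore keeps managing resources as if quorum were present (risky
without fencing), freeze keeps what is already running but starts nothing
new, and suicide fences the nodes in the quorum-less partition.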
In short: you are running the default, and that leads to all resources being
stopped in a partition without quorum.

> Thanks..
>
> > Thanks!
> >
> > Scott Greenlese - IBM Solution Test Team.
> >
> > Scott Greenlese ... IBM Solutions Test, Poughkeepsie, N.Y.
> > INTERNET: [email protected]
> > PHONE: 8/293-7301 (845-433-7301) M/S: POK 42HA/P966
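On the corosync side, the quorum behaviour of a freshly (re)started partition
can also be tuned in the quorum section of corosync.conf; for example,
wait_for_all keeps the cluster from becoming quorate for the first time until
all nodes have been seen at least once. A sketch only, using votequorum
options documented in votequorum(5); whether this fits your setup needs to be
checked against your own configuration:

  quorum {
      provider: corosync_votequorum
      # do not grant quorum after a fresh start until every node has shown up once
      wait_for_all: 1
  }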
_______________________________________________
Users mailing list: [email protected]
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
