Correction...

When I stopped pacemaker/corosync on the four (powered on / active) cluster node hosts, I ran into an issue with the gentle method of stopping the cluster (pcs cluster stop --all), so I ended up running 'pcs cluster kill <cluster_node>' individually on each of the four cluster nodes. I then had to stop the virtual domains manually with 'virsh destroy <guestname>' on each host (a rough sketch of the sequence follows). Perhaps some residual node status was affecting my quorum?
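Roughly, the sequence looked like the following. This is just a sketch, not my actual shell history; the node names are the four powered-on members from the status output quoted below, and it assumes you can ssh between the hosts:

    # Hard-stop the cluster daemons on each surviving node
    # (run on each host; shown here over ssh for brevity)
    for node in zs93kjpcs1 zs93KLpcs1 zs95kjpcs1 zs95KLpcs1; do
        ssh "$node" pcs cluster kill
    done

    # Then, on each host, force off any VirtualDomain guests still running
    for guest in $(virsh list --name --state-running); do
        virsh destroy "$guest"
    done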
Thanks...

Scott Greenlese ... IBM Solutions Test, Poughkeepsie, N.Y.
INTERNET: [email protected]
PHONE: 8/293-7301 (845-433-7301)
M/S: POK 42HA/P966


From: Scott Greenlese/Poughkeepsie/IBM@IBMUS
To: [email protected]
Cc: Si Bo Niu <[email protected]>, Scott Loveland/Poughkeepsie/IBM@IBMUS, Michael Tebolt/Poughkeepsie/IBM@IBMUS
Date: 09/08/2016 10:01 AM
Subject: [ClusterLabs] Pacemaker quorum behavior

Hi all...

I have a few very basic questions for the group.

I have a 5 node (Linux on Z LPARs) pacemaker cluster with 100 VirtualDomain pacemaker-remote nodes plus 100 "opaque" VirtualDomain resources. The cluster is configured to be 'symmetric' and I have no location constraints on the 200 VirtualDomain resources (other than to prevent the opaque guests from running on the pacemaker remote node resources).

My quorum is set as:

    quorum {
        provider: corosync_votequorum
    }

As an experiment, I powered down one LPAR in the cluster, leaving 4 powered up, with the pcsd service up on the 4 survivors but corosync/pacemaker down on them (pcs cluster stop --all).

I then started pacemaker/corosync on a single cluster node (pcs cluster start), and this resulted in the 200 VirtualDomain resources activating on that single node. This was not what I was expecting. I assumed that no resources would activate / start on any cluster node until 3 out of the 5 total cluster nodes had pacemaker/corosync running.

After starting pacemaker/corosync on the single host (zs95kjpcs1), this is what I see:

[root@zs95kj VD]# date;pcs status |less
Wed Sep 7 15:51:17 EDT 2016
Cluster name: test_cluster_2
Last updated: Wed Sep 7 15:51:18 2016
Last change: Wed Sep 7 15:30:12 2016 by hacluster via crmd on zs93kjpcs1
Stack: corosync
Current DC: zs95kjpcs1 (version 1.1.13-10.el7_2.ibm.1-44eb2dd) - partition with quorum
106 nodes and 304 resources configured

Node zs93KLpcs1: pending
Node zs93kjpcs1: pending
Node zs95KLpcs1: pending
Online: [ zs95kjpcs1 ]
OFFLINE: [ zs90kppcs1 ]
.
.
.
PCSD Status:
  zs93kjpcs1: Online
  zs95kjpcs1: Online
  zs95KLpcs1: Online
  zs90kppcs1: Offline
  zs93KLpcs1: Online

So, what exactly constitutes an "Online" vs. "Offline" cluster node w.r.t. quorum calculation? It seems like in my case it's "pending" on 3 nodes, so where does that fall? And why "pending"? What does that mean?

Also, what exactly is the cluster's expected reaction to quorum loss? Will cluster resources be stopped, or something else? Where can I find this documentation?

Thanks!

Scott Greenlese - IBM Solution Test Team.

Scott Greenlese ... IBM Solutions Test, Poughkeepsie, N.Y.
INTERNET: [email protected]
PHONE: 8/293-7301 (845-433-7301)
M/S: POK 42HA/P966
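For reference on the quorum questions above, the votequorum state and the cluster's configured reaction to losing quorum can be checked with something like this (a sketch assuming the stock corosync/pacemaker command-line tools; output details vary by version):

    # Corosync's view of membership and votes (Quorate, Expected votes, Total votes)
    corosync-quorumtool -s

    # Pacemaker's reaction to quorum loss is governed by the no-quorum-policy
    # cluster property; the default of "stop" means a partition without quorum
    # stops all of its resources and starts none
    pcs property list --all | grep no-quorum-policy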
_______________________________________________
Users mailing list: [email protected]
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
