Correction...

When I stopped pacemaker/corosync on the four (powered on / active) cluster node hosts, I ran into an issue with the gentle method of stopping the cluster (pcs cluster stop --all), so I ended up running 'pcs cluster kill <cluster_node>' individually on each of the four cluster nodes. I then had to stop the virtual domains manually with 'virsh destroy <guestname>' on each host (a rough sketch of the sequence follows). Perhaps some residual node status was affecting my quorum?
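Roughly, the sequence looked like the following. This is just a sketch, not my actual shell history; the node names are the four powered-on members from the status output quoted below, and it assumes you can ssh between the hosts:

    # Hard-stop the cluster daemons on each surviving node
    # (run on each host; shown here over ssh for brevity)
    for node in zs93kjpcs1 zs93KLpcs1 zs95kjpcs1 zs95KLpcs1; do
        ssh "$node" pcs cluster kill
    done

    # Then, on each host, force off any VirtualDomain guests still running
    for guest in $(virsh list --name --state-running); do
        virsh destroy "$guest"
    done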
Thanks...

Scott Greenlese ... IBM Solutions Test, Poughkeepsie, N.Y.
INTERNET: [email protected]
PHONE: 8/293-7301 (845-433-7301)
M/S: POK 42HA/P966


From: Scott Greenlese/Poughkeepsie/IBM@IBMUS
To: [email protected]
Cc: Si Bo Niu <[email protected]>, Scott Loveland/Poughkeepsie/IBM@IBMUS, Michael Tebolt/Poughkeepsie/IBM@IBMUS
Date: 09/08/2016 10:01 AM
Subject: [ClusterLabs] Pacemaker quorum behavior

Hi all...

I have a few very basic questions for the group.

I have a 5 node (Linux on Z LPARs) pacemaker cluster with 100 VirtualDomain pacemaker-remote nodes plus 100 "opaque" VirtualDomain resources. The cluster is configured to be 'symmetric' and I have no location constraints on the 200 VirtualDomain resources (other than to prevent the opaque guests from running on the pacemaker remote node resources).

My quorum is set as:

    quorum {
        provider: corosync_votequorum
    }

As an experiment, I powered down one LPAR in the cluster, leaving 4 powered up, with the pcsd service up on the 4 survivors but corosync/pacemaker down on them (pcs cluster stop --all).

I then started pacemaker/corosync on a single cluster node (pcs cluster start), and this resulted in the 200 VirtualDomain resources activating on that single node. This was not what I was expecting. I assumed that no resources would activate / start on any cluster node until 3 out of the 5 total cluster nodes had pacemaker/corosync running.

After starting pacemaker/corosync on the single host (zs95kjpcs1), this is what I see:

[root@zs95kj VD]# date;pcs status |less
Wed Sep 7 15:51:17 EDT 2016
Cluster name: test_cluster_2
Last updated: Wed Sep 7 15:51:18 2016
Last change: Wed Sep 7 15:30:12 2016 by hacluster via crmd on zs93kjpcs1
Stack: corosync
Current DC: zs95kjpcs1 (version 1.1.13-10.el7_2.ibm.1-44eb2dd) - partition with quorum
106 nodes and 304 resources configured

Node zs93KLpcs1: pending
Node zs93kjpcs1: pending
Node zs95KLpcs1: pending
Online: [ zs95kjpcs1 ]
OFFLINE: [ zs90kppcs1 ]
.
.
.
PCSD Status:
  zs93kjpcs1: Online
  zs95kjpcs1: Online
  zs95KLpcs1: Online
  zs90kppcs1: Offline
  zs93KLpcs1: Online

So, what exactly constitutes an "Online" vs. "Offline" cluster node w.r.t. quorum calculation? It seems like in my case it's "pending" on 3 nodes, so where does that fall? And why "pending"? What does that mean?

Also, what exactly is the cluster's expected reaction to quorum loss? Will cluster resources be stopped, or something else? Where can I find this documentation?

Thanks!

Scott Greenlese - IBM Solution Test Team.

Scott Greenlese ... IBM Solutions Test, Poughkeepsie, N.Y.
INTERNET: [email protected]
PHONE: 8/293-7301 (845-433-7301)
M/S: POK 42HA/P966
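For reference on the quorum questions above, the votequorum state and the cluster's configured reaction to losing quorum can be checked with something like this (a sketch assuming the stock corosync/pacemaker command-line tools; output details vary by version):

    # Corosync's view of membership and votes (Quorate, Expected votes, Total votes)
    corosync-quorumtool -s

    # Pacemaker's reaction to quorum loss is governed by the no-quorum-policy
    # cluster property; the default of "stop" means a partition without quorum
    # stops all of its resources and starts none
    pcs property list --all | grep no-quorum-policy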
_______________________________________________
Users mailing list: [email protected]
http://clusterlabs.org/mailman/listinfo/users

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org
