>>> Ken Gaillot <[email protected]> wrote on 02.05.2022 at 23:25 in message
<[email protected]>:
> On Mon, 2022-05-02 at 13:11 -0300, Salatiel Filho wrote:
>> Hi, Ken, here is the info you asked for.
>>
>> # pcs constraint
>> Location Constraints:
>>   Resource: fence-server1
>>     Disabled on:
>>       Node: server1 (score:-INFINITY)
>>   Resource: fence-server2
>>     Disabled on:
>>       Node: server2 (score:-INFINITY)
>> Ordering Constraints:
>>   promote DRBDData-clone then start nfs (kind:Mandatory)
>> Colocation Constraints:
>>   nfs with DRBDData-clone (score:INFINITY) (rsc-role:Started) (with-rsc-role:Master)
>> Ticket Constraints:
>>
>> # sudo crm_mon -1A
>> ...
>> Node Attributes:
>>   * Node: server2:
>>     * master-DRBDData : 10000
>
> In the scenario you described, only server1 is up. If there is no
> master score for server1, it cannot be master. It's up to the resource
> agent to set it. I'm not familiar enough with that agent to know why it
> might not.

Additional RA output (syslog) may be helpful as well.
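In case it helps narrow this down, something along these lines shows whether
the agent ever set a score on server1 at all (only a sketch; "r0" below is a
placeholder, since the DRBD resource name behind the DRBDData agent is not
shown in this thread):

  # crm_attribute --node server1 --name master-DRBDData --lifetime reboot --query

If that comes back empty or with an error, the monitor never set the transient
attribute on that node. Then, what DRBD itself thinks of the local resource:

  # drbdadm role r0
  # drbdadm status r0        (on DRBD 9)

If DRBD looks healthy locally but the attribute is missing, the agent's
monitor action and its syslog messages are the place to look.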
>
>>
>> Atenciosamente/Kind regards,
>> Salatiel
>>
>> On Mon, May 2, 2022 at 12:26 PM Ken Gaillot <[email protected]>
>> wrote:
>> > On Mon, 2022-05-02 at 09:58 -0300, Salatiel Filho wrote:
>> > > Hi, I am trying to understand the recovery process of a promotable
>> > > resource after "pcs cluster stop --all" and shutdown of both nodes.
>> > > I have a two-node + qdevice quorum setup with a DRBD resource.
>> > >
>> > > This is a summary of the resources before my test. Everything is
>> > > working just fine and server2 is the master of DRBD.
>> > >
>> > >   * fence-server1  (stonith:fence_vmware_rest):  Started server2
>> > >   * fence-server2  (stonith:fence_vmware_rest):  Started server1
>> > >   * Clone Set: DRBDData-clone [DRBDData] (promotable):
>> > >     * Masters: [ server2 ]
>> > >     * Slaves: [ server1 ]
>> > >   * Resource Group: nfs:
>> > >     * drbd_fs  (ocf::heartbeat:Filesystem):  Started server2
>> > >
>> > > Then I issue "pcs cluster stop --all". The cluster will be stopped
>> > > on both nodes as expected.
>> > > Now I restart server1 (previously the slave) and power off server2
>> > > (previously the master). When server1 restarts it will fence
>> > > server2, and I can see that server2 is starting on vCenter, but I
>> > > just pressed a key in GRUB to make sure server2 would not finish
>> > > booting; instead it would just sit "paused" at the GRUB screen.
>> > >
>> > > SSH'ing to server1 and running pcs status I get:
>> > >
>> > > Cluster name: cluster1
>> > > Cluster Summary:
>> > >   * Stack: corosync
>> > >   * Current DC: server1 (version 2.1.0-8.el8-7c3f660707) -
>> > > partition with quorum
>> > >   * Last updated: Mon May 2 09:52:03 2022
>> > >   * Last change: Mon May 2 09:39:22 2022 by root via cibadmin on
>> > > server1
>> > >   * 2 nodes configured
>> > >   * 11 resource instances configured
>> > >
>> > > Node List:
>> > >   * Online: [ server1 ]
>> > >   * OFFLINE: [ server2 ]
>> > >
>> > > Full List of Resources:
>> > >   * fence-server1  (stonith:fence_vmware_rest):  Stopped
>> > >   * fence-server2  (stonith:fence_vmware_rest):  Started server1
>> > >   * Clone Set: DRBDData-clone [DRBDData] (promotable):
>> > >     * Slaves: [ server1 ]
>> > >     * Stopped: [ server2 ]
>> > >   * Resource Group: nfs:
>> > >     * drbd_fs  (ocf::heartbeat:Filesystem):  Stopped
>> > >
>> > > So I can see there is quorum, but server1 is never promoted to
>> > > DRBD master, so the remaining resources will be stopped until
>> > > server2 is back.
>> > > 1) What do I need to do to force the promotion and recover without
>> > > restarting server2?
>> > > 2) Why, if instead of rebooting server1 and powering off server2 I
>> > > reboot server2 and power off server1, can the cluster recover by
>> > > itself?
>> > >
>> > > Thanks!
>> > >
>> >
>> > You shouldn't need to force promotion; that is the default behavior
>> > in that situation. There must be something else in the configuration
>> > that is preventing promotion.
>> >
>> > The DRBD resource agent should set a promotion score for the node.
>> > You can run "crm_mon -1A" to show all node attributes; there should
>> > be one like "master-DRBDData" for the active node.
>> >
>> > You can also show the constraints in the cluster to see if there is
>> > anything relevant to the master role.
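And to the last point, these commands show whether anything in the
configuration or in the placement scores is holding the master role back
(again just a sketch; the grep pattern simply narrows the output to the clone
in question):

  # pcs constraint --full
  # pcs resource config DRBDData-clone
  # crm_simulate -sL | grep -i -e promot -e DRBDData

The first shows constraint IDs (useful to spot a leftover
"drbd-fence-by-handler-..." rule from the DRBD fencing handler), the second
the clone's meta attributes, and the third the scheduler's allocation scores
for the promotable clone.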
>
> --
> Ken Gaillot <[email protected]>

_______________________________________________
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/
