Re: [ClusterLabs] [EXT] Prevent cluster transition when resource unavailable on both nodes

2023-12-11 Thread Alexander Eastwood
Hi,

Thanks Ken and Ulrich for your replies. With your suggestions I ended up 
finding out about ocf:heartbeat:ethmonitor and will try to set this up as an 
additional resource within our cluster.

I can share more information once (if!) I have it working the way I want to.
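
In case it is useful to anyone searching the archives later, here is a rough
sketch of the kind of setup ethmonitor enables. This is untested and the names
are placeholders (the monitored NIC is assumed to be eth0, the virtual IP
resource is assumed to be named VirtualIP, and the attribute name/values follow
the agent's documented defaults):

# Clone ocf:heartbeat:ethmonitor so every node publishes the link state of
# eth0 as a node attribute (named "ethmonitor-eth0" by default, 1 = link up):
pcs resource create nic-monitor ocf:heartbeat:ethmonitor interface=eth0 clone

# Ban the virtual IP from any node whose link is down or not yet reported, so
# a dead switch stops the IP cleanly instead of bouncing it between nodes:
pcs constraint location VirtualIP rule score=-INFINITY not_defined ethmonitor-eth0 or ethmonitor-eth0 ne 1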

Cheers,

Alex

> On 07.12.2023, at 08:59, Windl, Ulrich  wrote:
> 
> Hi!
> 
> What about this: Run a ping node for a remote resource to set up some score 
> value. If the remote is unreachable, the score will reflect that.
> Then add a rule checking that score to decide whether to run the virtual IP
> or not.
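
A rough sketch of this approach, with placeholder names (assuming the
gateway/switch address is 192.168.1.1 and the virtual IP resource is named
VirtualIP):

# Clone ocf:pacemaker:ping so every node records reachability of the gateway
# in the "pingd" node attribute:
pcs resource create ping-gw ocf:pacemaker:ping host_list=192.168.1.1 dampen=5s op monitor interval=10s clone

# Only allow the virtual IP on nodes that can currently reach the gateway:
pcs constraint location VirtualIP rule score=-INFINITY not_defined pingd or pingd lt 1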
> 
> Regards,
> Ulrich
> 
> -----Original Message-----
> From: Users  On Behalf Of Alexander Eastwood
> Sent: Wednesday, December 6, 2023 5:56 PM
> To: users@clusterlabs.org
> Subject: [EXT] [ClusterLabs] Prevent cluster transition when resource 
> unavailable on both nodes
> 
> Hello, 
> 
> I administer a Pacemaker cluster consisting of 2 nodes, which are connected
> to each other via an Ethernet cable to ensure that they are always able to
> communicate with each other. A network switch is also connected to each node
> via an Ethernet cable and provides external access.
> 
> One of the managed resources of the cluster is a virtual IP, which is 
> assigned to a physical network interface card and thus depends on the network 
> switch being available. The virtual IP is always hosted on the active node.
> 
> We had a situation where the network switch lost power or was rebooted; as a
> result, both servers reported `NIC Link is Down`. Recovery of the virtual IP
> resource then failed repeatedly on the active node, and a transition was
> initiated. Since the other node was also unable to start the resource, the
> cluster kept bouncing the resource between the 2 nodes until the NIC links
> were up again.
> 
> Is there a way to change this behaviour? I am thinking of the following 
> sequence of events, but have not been able to find a way to configure this:
> 
> 1. the active node detects NIC Link is Down, which affects a resource managed
> by the cluster (the monitor operation on the resource starts to fail)
> 2. the active node checks whether the other (passive) node in the cluster
> would be able to start the resource
> 3. if the passive node can start the resource, transition all resources to
> the passive node
> 4. if the passive node is unable to start the resource, then there is nothing
> to be gained by a transition, so no action should be taken
> 
> Any pointers or advice will be much appreciated!
> 
> Thank you and kind regards,
> 
> Alex Eastwood

___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] ocf:pacemaker:ping works strange

2023-12-11 Thread Ken Gaillot
On Fri, 2023-12-08 at 17:44 +0300, Artem wrote:
> Hello experts.
> 
> I use Pacemaker for a Lustre cluster, but for simplicity and exploration I
> use a Dummy resource. I didn't like how the resource performed failover and
> failback. When I shut down the VM running the remote agent, Pacemaker tries
> to restart it. According to pcs status it marks the resource (not the RA)
> Online for some time while the VM stays down.
> 
> OK, I wanted to improve its behavior and set up a ping monitor. I
> tuned the scores like this:
> pcs resource create FAKE3 ocf:pacemaker:Dummy
> pcs resource create FAKE4 ocf:pacemaker:Dummy
> pcs constraint location FAKE3 prefers lustre3=100
> pcs constraint location FAKE3 prefers lustre4=90
> pcs constraint location FAKE4 prefers lustre3=90
> pcs constraint location FAKE4 prefers lustre4=100
> pcs resource defaults update resource-stickiness=110
> pcs resource create ping ocf:pacemaker:ping dampen=5s host_list=local op monitor interval=3s timeout=7s clone meta target-role="started"
> for i in lustre{1..4}; do pcs constraint location ping-clone prefers $i; done
> pcs constraint location FAKE3 rule score=0 pingd lt 1 or not_defined pingd
> pcs constraint location FAKE4 rule score=0 pingd lt 1 or not_defined pingd
> pcs constraint location FAKE3 rule score=125 pingd gt 0 or defined pingd
> pcs constraint location FAKE4 rule score=125 pingd gt 0 or defined pingd

The gt 0 part is redundant since "defined pingd" matches *any* score.
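
For example, the last two constraints above could simply be (same effect,
without the redundant comparison):

pcs constraint location FAKE3 rule score=125 defined pingd
pcs constraint location FAKE4 rule score=125 defined pingd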

> 
> 
> Question #1) Why can't I see the accumulated score from pingd in the
> crm_simulate output? Only the location score and stickiness are shown.
> pcmk__primitive_assign: FAKE3 allocation score on lustre3: 210
> pcmk__primitive_assign: FAKE3 allocation score on lustre4: 90
> pcmk__primitive_assign: FAKE4 allocation score on lustre3: 90
> pcmk__primitive_assign: FAKE4 allocation score on lustre4: 210
> Whether all is OK or the VM is down, the score from pingd is not added to
> the total score of the RA.

ping scores aren't added to resource scores, they're just set as node
attribute values. Location constraint rules map those values to
resource scores (in this case any defined ping score gets mapped to
125).
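
If you want to see the raw values those rules act on, you can query the node
attribute directly, e.g. (assuming the default attribute name "pingd"):

crm_mon -1 -A                         # one-shot status including node attributes
attrd_updater -Q -n pingd -N lustre3  # query the pingd value recorded for lustre3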

> 
> 
> Question #2) I shut the lustre3 VM down and leave it like that. pcs
> status:

How did you shut it down? Outside cluster control, or with something
like pcs resource disable?

>   * FAKE3   (ocf::pacemaker:Dummy):  Stopped
>   * FAKE4   (ocf::pacemaker:Dummy):  Started lustre4
>   * Clone Set: ping-clone [ping]:
> * Started: [ lustre-mds1 lustre-mds2 lustre-mgs lustre1 lustre2
> lustre4 ] << lustre3 missing
> OK for now
> VM boots up. pcs status: 
>   * FAKE3   (ocf::pacemaker:Dummy):  FAILED (blocked) [ lustre3
> lustre4 ]  << what is it?
>   * Clone Set: ping-clone [ping]:
> * ping  (ocf::pacemaker:ping):   FAILED lustre3 (blocked)   
> << why not started?
> * Started: [ lustre-mds1 lustre-mds2 lustre-mgs lustre1 lustre2
> lustre4 ]
> I checked the server processes manually and found that lustre4 runs
> "/usr/lib/ocf/resource.d/pacemaker/ping monitor" while lustre3 doesn't.
> Everything is configured according to the documentation, but the results are
> strange.
> Then I tried adding meta target-role="started" to the pcs resource create
> ping command, and this time ping started after the node rebooted. Can I
> assume that this was just missing from the official setup documentation, and
> that now everything will work fine?
-- 
Ken Gaillot 

___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] ocf:pacemaker:ping works strange

2023-12-11 Thread Artem
Hi Ken,

On Mon, 11 Dec 2023 at 19:00, Ken Gaillot  wrote:

> > Question #2) I shut lustre3 VM down and leave it like that


> How did you shut it down? Outside cluster control, or with something
> like pcs resource disable?
>
I did it outside of the cluster to simulate a failure: I turned off the
VM from vCenter. The cluster is unaware of anything beyond the OS.


> >   * FAKE3   (ocf::pacemaker:Dummy):  Stopped
> >   * FAKE4   (ocf::pacemaker:Dummy):  Started lustre4
> >   * Clone Set: ping-clone [ping]:
> > * Started: [ lustre-mds1 lustre-mds2 lustre-mgs lustre1 lustre2
> > lustre4 ] << lustre3 missing
> > OK for now
> > VM boots up. pcs status:
> >   * FAKE3   (ocf::pacemaker:Dummy):  FAILED (blocked) [ lustre3
> > lustre4 ]  << what is it?
> >   * Clone Set: ping-clone [ping]:
> > * ping  (ocf::pacemaker:ping):   FAILED lustre3 (blocked)
> > << why not started?
> > * Started: [ lustre-mds1 lustre-mds2 lustre-mgs lustre1 lustre2
> > lustre4 ]
> > I checked server processes manually and found that lustre4 runs
> > "/usr/lib/ocf/resource.d/pacemaker/ping monitor" while lustre3
> > doesn't
> > All is according to documentation but results are strange.
> > Then I tried to add meta target-role="started" to pcs resource create
> > ping and this time ping started after node rebooted. Can I expect
> > that it was just missing from official setup documentation, and now
> > everything will work fine?
___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/