Re: [ClusterLabs] Remote nodes in an opt-in cluster

2024-04-23 Thread Andrei Borzenkov

On 23.04.2024 19:40, Jochen wrote:




On 23. Apr 2024, at 17:41, Andrei Borzenkov  wrote:

On 23.04.2024 10:02, Jochen wrote:

When trying to add a remote node to an opt-in cluster, the cluster does not 
start the remote resource. When I change the cluster to opt-out the remote 
resource is started.


It's not clear what you mean. Is "remote resource" the resource used to integrate the 
remote node (i.e. ocf:pacemaker:remote) or is the "remote resource" a resource you want 
to start on the remote node itself?


The "ocf:pacemaker:remote" resource to integrate the remote node.




I guess I have to add a location constraint to allow the cluster to schedule 
the resource. Is that correct?
And if yes, how do I create a location constraint to allow the cluster to start the remote resource anywhere on the cluster? 


Quoting documentation:

If most of your resources can run on most of the nodes, then an opt-out 
arrangement is likely to result in a simpler configuration.


If your resource can run anywhere, what exactly is the point of opt-in?
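For illustration only - a minimal sketch, assuming crmsh as used above and placeholder node names; either switch the whole cluster to opt-out, or keep it opt-in and explicitly allow the connection resource on each cluster node with score 0:

crm configure property symmetric-cluster=true
# or, staying opt-in, one constraint per cluster node (node1/node2 are placeholders):
crm configure location skylla-on-node1 skylla 0: node1
crm configure location skylla-on-node2 skylla 0: node2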


Since I don't want to name each node in the constraint, I looked for a rule that is 
always true, or an attribute that is defined by default, but did not 
find one. I then tried

crm configure location skylla-location skylla rule 
skylla-location-rule: defined '#uname'
But this did not work either. Any help would be greatly appreciated.
Regards
Jochen
___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] Remote nodes in an opt-in cluster

2024-04-23 Thread Andrei Borzenkov

On 23.04.2024 10:02, Jochen wrote:

When trying to add a remote node to an opt-in cluster, the cluster does not 
start the remote resource. When I change the cluster to opt-out the remote 
resource is started.



It's not clear what you mean. Is "remote resource" the resource used 
to integrate the remote node (i.e. ocf:pacemaker:remote) or is the 
"remote resource" a resource you want to start on the remote node itself?



I guess I have to add a location constraint to allow the cluster to schedule 
the resource. Is that correct?

And if yes, how do I create a location constraint to allow the cluster to start 
the remote resource anywhere on the cluster? Since I don't want to name each 
node in the constraint, I looked for a rule that is always true, or an 
attribute that is defined by default, but did not find one. I then tried

crm configure location skylla-location skylla rule 
skylla-location-rule: defined '#uname'

But this did not work either. Any help would be greatly appreciated.

Regards
Jochen





___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] colocation constraint - do I get it all wrong?

2024-02-05 Thread Andrei Borzenkov
On Mon, Feb 5, 2024 at 12:44 PM lejeczek via Users
 wrote:
>
>
>
> On 01/01/2024 18:28, Ken Gaillot wrote:
> > On Fri, 2023-12-22 at 17:02 +0100, lejeczek via Users wrote:
> >> hi guys.
> >>
> >> I have a colocation constraint:
> >>
> >> -> $ pcs constraint ref DHCPD
> >> Resource: DHCPD
> >>colocation-DHCPD-GATEWAY-NM-link-INFINITY
> >>
> >> and the trouble is... I thought DHCPD is to follow GATEWAY-NM-link,
> >> always!
> >> If that is true that I see very strange behavior, namely.
> >> When there is an issue with DHCPD resource, cannot be started, then
> >> GATEWAY-NM-link gets tossed around by the cluster.
> >>
> >> Is that normal & expected - is my understanding of _colocation_
> >> completely wrong - or my cluster is indeed "broken"?
> >> many thanks, L.
> >>
> > Pacemaker considers the preferences of colocated resources when
> > assigning a resource to a node, to ensure that as many resources as
> > possible can run. So if a colocated resource becomes unable to run on a
> > node, the primary resource might move to allow the colocated resource
> > to run.
> So what is the way to "fix" this - is it simply a lower score
> for such a constraint?
> In my case _dhcpd_ is important but it fails sometimes as
> it's often tampered with, so... make _dhcpd_ follow
> gateway_link but just fail _dhcpd_ (if it keeps failing) and
> leave _gateway_link_ alone if/where it's good.
> Or perhaps is there a global config/param for whole-cluster
> behaviour?
>

In the current pacemaker (since 2.1.0) you can set the "influence"
colocation attribute to avoid moving an already started resource:

However, if influence is set to false in the colocation constraint,
this will happen only if B is inactive and needing to be started. If B
is already active, A’s preferences will have no effect on placing B.
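A hedged sketch of one way to get that effect, assuming Pacemaker >= 2.1 and a pcs that passes meta attributes through: setting the critical meta-attribute to false on the dependent resource makes all of its colocations non-influential, which is equivalent to influence=false on each constraint.

pcs resource meta DHCPD critical=false

Alternatively, influence=false can be set directly on the rsc_colocation constraint in the CIB.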
___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] cluster doesn't do HA as expected, pingd doesn't help

2023-12-19 Thread Andrei Borzenkov

On 19.12.2023 21:42, Artem wrote:

Andrei and Klaus thanks for prompt reply and clarification!
As I understand, design and behavior of Pacemaker is tightly coupled with
the stonith concept. But isn't it too rigid?



If you insist on shooting yourself in the foot, pacemaker gives you the 
gun. It just does not load it by default and does not shoot itself.


Seriously, this topic has been beaten to death. Just do some research.

You can avoid fencing and rely on quorum in a shared-nothing case. The 
prime example that I have seen is NetApp C-Mode ONTAP, where the set of 
management processes goes read-only, preventing any modification when 
node(s) go(es) out of quorum. But as soon as you have a shared resource, 
ignoring fencing will lead to data corruption sooner or later.



Is there a way to leverage self-monitoring or pingd rules to trigger
isolated node to umount its FS? Like vSphere High Availability host
isolation response.
Can resource-stickiness=off (auto-failback) decrease risk of corruption by
unresponsive node coming back online?
Is there a quorum feature not for cluster but for resource start/stop? Got
lock - is welcome to mount, unable to refresh lease - force unmount.
Can on-fail=ignore break manual failover logic (stopped will be considered
as failed and thus ignored)?

best regards,
Artem

On Tue, 19 Dec 2023 at 17:03, Klaus Wenninger  wrote:




On Tue, Dec 19, 2023 at 10:00 AM Andrei Borzenkov 
wrote:


On Tue, Dec 19, 2023 at 10:41 AM Artem  wrote:
...

Dec 19 09:48:13 lustre-mds2.ntslab.ru pacemaker-schedulerd[785107]

(update_resource_action_runnable)warning: OST4_stop_0 on lustre4 is
unrunnable (node is offline)

Dec 19 09:48:13 lustre-mds2.ntslab.ru pacemaker-schedulerd[785107]

(recurring_op_for_active)info: Start 20s-interval monitor for OST4 on
lustre3

Dec 19 09:48:13 lustre-mds2.ntslab.ru pacemaker-schedulerd[785107]

(log_list_item)  notice: Actions: Stop   OST4( lustre4
)  blocked

This is the default for a failed stop operation. The only way
pacemaker can resolve failure to stop a resource is to fence the node
where this resource was active. If that is not possible (and IIRC you
refuse to use stonith), pacemaker has no other choice than to block it.
If you insist, you can of course set on-fail=ignore, but this means the
unreachable node will continue to run resources. Whether it can lead
to some corruption in your case I cannot guess.



Don't know if I'm reading that correctly, but I understand from what you had
written above that you try to trigger the failover by stopping the VM (lustre4)
without an ordered shutdown.
With fencing disabled, what we are seeing is exactly what we would expect:
the state of the resource is unknown - pacemaker tries to stop it - that
doesn't work as the node is offline - no fencing is configured - so all it
can do is wait until there is info on whether the resource is up or not.
I guess the strange output below is because fencing is disabled - quite an
unusual - also not recommended - configuration, and so this might not have
shown up too often in that way.

Klaus




Dec 19 09:48:13 lustre-mds2.ntslab.ru pacemaker-schedulerd[785107]

(pcmk__create_graph) crit: Cannot fence lustre4 because of OST4:
blocked (OST4_stop_0)

That is a rather strange phrase. The resource is blocked because
pacemaker could not fence the node, not the other way round.


___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] cluster doesn't do HA as expected, pingd doesn't help

2023-12-19 Thread Andrei Borzenkov
On Tue, Dec 19, 2023 at 10:41 AM Artem  wrote:
...
> Dec 19 09:48:13 lustre-mds2.ntslab.ru pacemaker-schedulerd[785107] 
> (update_resource_action_runnable)warning: OST4_stop_0 on lustre4 is 
> unrunnable (node is offline)
> Dec 19 09:48:13 lustre-mds2.ntslab.ru pacemaker-schedulerd[785107] 
> (recurring_op_for_active)info: Start 20s-interval monitor for OST4 on 
> lustre3
> Dec 19 09:48:13 lustre-mds2.ntslab.ru pacemaker-schedulerd[785107] 
> (log_list_item)  notice: Actions: Stop   OST4( lustre4 )  
> blocked

This is the default for a failed stop operation. The only way
pacemaker can resolve failure to stop a resource is to fence the node
where this resource was active. If that is not possible (and IIRC you
refuse to use stonith), pacemaker has no other choice than to block it.
If you insist, you can of course set on-fail=ignore, but this means the
unreachable node will continue to run resources. Whether it can lead
to some corruption in your case I cannot guess.
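If you do go that route, a hedged sketch with pcs (OST4 as in the log above; the interval/timeout values are placeholders and exact pcs syntax can differ between versions):

pcs resource update OST4 op stop interval=0s timeout=60s on-fail=ignore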

> Dec 19 09:48:13 lustre-mds2.ntslab.ru pacemaker-schedulerd[785107] 
> (pcmk__create_graph) crit: Cannot fence lustre4 because of OST4: 
> blocked (OST4_stop_0)

That is a rather strange phrase. The resource is blocked because
pacemaker could not fence the node, not the other way round.
___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] resource-agents and VMs

2023-12-15 Thread Andrei Borzenkov
On Fri, Dec 15, 2023 at 2:23 PM lejeczek via Users
 wrote:
>
> Hi guys.
>
> my resource-agents dependencies are like so:
>
> resource-agents-deps.target
> ○ ├─00\\x2dVMsy.mount
> ● └─virt-guest-shutdown.target
>

If this is the output of "systemctl list-dependencies" - it has a lot of
flags that completely reverse the meaning of the output.

> when I reboot a node, VMs seem to migrate off it live OK, but..
> when the node comes back on after a reboot, VMs fail to migrate back to it live.
> I see on such a node
>
> -> $ journalctl -lf -o cat -u virtqemud.service
> Starting Virtualization qemu daemon...
> Started Virtualization qemu daemon.
> libvirt version: 9.5.0, package: 6.el9 (buil...@centos.org, 
> 2023-08-25-08:53:56, )
> hostname: dzien.mine.priv
> Path '/00-VMsy/enc.podnode3.qcow2' is not accessible: No such file or 
> directory
>
> and I wonder if it's indeed the fact that the _path_ is absent at the moment 
> when the cluster, just after node start, tries to migrate the VM resource...

Isn't the message pretty obvious?

> Is it possible to somehow - seemingly my _resource-agents-deps.target_ does 
> not do it - ensure that the cluster, perhaps on a per-resource basis, will 
> wait for/check a path first?
> BTW, that path is available and is made sure to be available to the system; 
> it's a glusterfs mount.
>

It is available when you check it. That does not mean it was also
available when something tried to access it earlier.

The above output shows your mount unit as inactive. It is not enough
to order one unit after another unit - something also has to trigger
activation of that other unit.
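For reference, a minimal sketch of pulling the mount in via resource-agents-deps.target - the drop-in file name is an assumption, the mount unit name is taken from the listing above:

# /etc/systemd/system/resource-agents-deps.target.d/glusterfs-mount.conf
[Unit]
Requires=00\x2dVMsy.mount
After=00\x2dVMsy.mount

# then reload systemd
systemctl daemon-reload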
___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] ocf:pacemaker:ping works strange

2023-12-12 Thread Andrei Borzenkov
On Tue, Dec 12, 2023 at 4:47 PM Artem  wrote:
>
>
>
> On Tue, 12 Dec 2023 at 16:17, Andrei Borzenkov  wrote:
>>
>> On Fri, Dec 8, 2023 at 5:44 PM Artem  wrote:
>> > pcs constraint location FAKE3 rule score=0 pingd lt 1 or not_defined pingd
>> > pcs constraint location FAKE4 rule score=0 pingd lt 1 or not_defined pingd
>> > pcs constraint location FAKE3 rule score=125 pingd gt 0 or defined pingd
>> > pcs constraint location FAKE4 rule score=125 pingd gt 0 or defined pingd
>> >
>>
>> These rules are contradicting. You set the score to 125 if pingd is
>> defined and at the same time set it to 0 if the score is less than 1.
>> To be "less than 1" it must be defined to start with so both rules
>> will always apply. I do not know how the rules are ordered. Either you
>> get random behavior, or one pair of these rules is effectively
>> ignored.
>
>
> "pingd lt 1 or not_defined pingd" means to me ==0 or not_defined, that is 
> ping fails to ping GW or fails to report to corosync/pacemaker. Am I wrong?

That is correct (although I'd reverse the conditions out of habit. It is
meaningless to check for "less than 1" something that is not defined).

> "pingd gt 0 or defined pingd" means to me that ping gets reply from GW and 
> reports it to cluster.

No. As you were already told, this is true if pingd is defined. The value
does not matter.

> Are they really contradicting?

Yes. pingd == 0 will satisfy both rules. My use of "always" was
incorrect, it does not happen for all possible values of pingd, but it
does happen for some.
___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] resource fails manual failover

2023-12-12 Thread Andrei Borzenkov
On Tue, Dec 12, 2023 at 4:50 PM Artem  wrote:
>
> Is there a detailed explanation for resource monitor and start timeouts and 
> intervals with examples, for dummies?
>
> my resource is configured as follows:
> [root@lustre-mds1 ~]# pcs resource show MDT00
> Warning: This command is deprecated and will be removed. Please use 'pcs 
> resource config' instead.
> Resource: MDT00 (class=ocf provider=heartbeat type=Filesystem)
>   Attributes: MDT00-instance_attributes
> device=/dev/mapper/mds00
> directory=/lustre/mds00
> force_unmount=safe
> fstype=lustre
>   Operations:
> monitor: MDT00-monitor-interval-20s
>   interval=20s
>   timeout=40s
> start: MDT00-start-interval-0s
>   interval=0s
>   timeout=60s
> stop: MDT00-stop-interval-0s
>   interval=0s
>   timeout=60s
>
> I issued manual failover with the following commands:
> crm_resource --move -r MDT00 -H lustre-mds1
>
> resource tried but returned back with the entries in pacemaker.log like these:
> Dec 12 15:53:23  Filesystem(MDT00)[1886100]:INFO: Running start for 
> /dev/mapper/mds00 on /lustre/mds00
> Dec 12 15:53:45  Filesystem(MDT00)[1886100]:ERROR: Couldn't mount device 
> [/dev/mapper/mds00] as /lustre/mds00
>
> tried again with the same result:
> Dec 12 16:11:04  Filesystem(MDT00)[1891333]:INFO: Running start for 
> /dev/mapper/mds00 on /lustre/mds00
> Dec 12 16:11:26  Filesystem(MDT00)[1891333]:ERROR: Couldn't mount device 
> [/dev/mapper/mds00] as /lustre/mds00
>
> Why can it not move?
>

Because it failed to start this resource on the node selected to run
this resource. Maybe the device is missing, maybe the mount point is
missing, maybe something else.
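A quick manual check on the target node usually shows which of those it is - a sketch, assuming a root shell on lustre-mds1 and the device/mount point from the resource definition above:

ls -l /dev/mapper/mds00          # is the device present?
ls -ld /lustre/mds00             # is the mount point present?
mount -t lustre /dev/mapper/mds00 /lustre/mds00   # same mount the agent attempts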

> Does this 20 sec interval (between start and error) have anything to do with 
> monitor interval settings?
>
> [root@lustre-mgs ~]# pcs constraint show --full
> Location Constraints:
>   Resource: MDT00
> Enabled on:
>   Node: lustre-mds1 (score:100) (id:location-MDT00-lustre-mds1-100)
>   Node: lustre-mds2 (score:100) (id:location-MDT00-lustre-mds2-100)
> Disabled on:
>   Node: lustre-mgs (score:-INFINITY) 
> (id:location-MDT00-lustre-mgs--INFINITY)
>   Node: lustre1 (score:-INFINITY) (id:location-MDT00-lustre1--INFINITY)
>   Node: lustre2 (score:-INFINITY) (id:location-MDT00-lustre2--INFINITY)
>   Node: lustre3 (score:-INFINITY) (id:location-MDT00-lustre3--INFINITY)
>   Node: lustre4 (score:-INFINITY) (id:location-MDT00-lustre4--INFINITY)
> Ordering Constraints:
>   start MGT then start MDT00 (kind:Optional) (id:order-MGT-MDT00-Optional)
>   start MDT00 then start OST1 (kind:Optional) (id:order-MDT00-OST1-Optional)
>   start MDT00 then start OST2 (kind:Optional) (id:order-MDT00-OST2-Optional)
>
> with regards to ordering constraint: OST1 and OST2 are started now, while I'm 
> exercising MDT00 failover.
>
___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] ocf:pacemaker:ping works strange

2023-12-12 Thread Andrei Borzenkov
On Fri, Dec 8, 2023 at 5:44 PM Artem  wrote:
>
> Hello experts.
>
> I use pacemaker for a Lustre cluster. But for simplicity and exploration I 
> use a Dummy resource. I didn't like how resource performed failover and 
> failback. When I shut down VM with remote agent, pacemaker tries to restart 
> it. According to pcs status it marks the resource (not RA) Online for some 
> time while VM stays down.
>
> OK, I wanted to improve its behavior and set up a ping monitor. I tuned the 
> scores like this:
> pcs resource create FAKE3 ocf:pacemaker:Dummy
> pcs resource create FAKE4 ocf:pacemaker:Dummy
> pcs constraint location FAKE3 prefers lustre3=100
> pcs constraint location FAKE3 prefers lustre4=90
> pcs constraint location FAKE4 prefers lustre3=90
> pcs constraint location FAKE4 prefers lustre4=100
> pcs resource defaults update resource-stickiness=110
> pcs resource create ping ocf:pacemaker:ping dampen=5s host_list=local op 
> monitor interval=3s timeout=7s clone meta target-role="started"
> for i in lustre{1..4}; do pcs constraint location ping-clone prefers $i; done
> pcs constraint location FAKE3 rule score=0 pingd lt 1 or not_defined pingd
> pcs constraint location FAKE4 rule score=0 pingd lt 1 or not_defined pingd
> pcs constraint location FAKE3 rule score=125 pingd gt 0 or defined pingd
> pcs constraint location FAKE4 rule score=125 pingd gt 0 or defined pingd
>

These rules are contradicting. You set the score to 125 if pingd is
defined and at the same time set it to 0 if the score is less than 1.
To be "less than 1" it must be defined to start with so both rules
will always apply. I do not know how the rules are ordered. Either you
get random behavior, or one pair of these rules is effectively
ignored.
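One non-overlapping variant, as a sketch only (same resources and scores as above; pcs rule syntax is assumed to accept and/or as in recent versions):

pcs constraint location FAKE3 rule score=-INFINITY not_defined pingd or pingd lt 1
pcs constraint location FAKE3 rule score=125 defined pingd and pingd gt 0

With that pair a node only gets the +125 bonus when pingd is defined and non-zero, and is excluded otherwise, so the two rules can no longer both apply to the same node.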

>
> Question #1) Why I cannot see accumulated score from pingd in crm_simulate 
> output? Only location score and stickiness.
> pcmk__primitive_assign: FAKE3 allocation score on lustre3: 210
> pcmk__primitive_assign: FAKE3 allocation score on lustre4: 90
> pcmk__primitive_assign: FAKE4 allocation score on lustre3: 90
> pcmk__primitive_assign: FAKE4 allocation score on lustre4: 210
> Either when all is OK or when VM is down - score from pingd not added to 
> total score of RA
>
>
> Question #2) I shut lustre3 VM down and leave it like that. pcs status:
>   * FAKE3   (ocf::pacemaker:Dummy):  Stopped
>   * FAKE4   (ocf::pacemaker:Dummy):  Started lustre4
>   * Clone Set: ping-clone [ping]:
> * Started: [ lustre-mds1 lustre-mds2 lustre-mgs lustre1 lustre2 lustre4 ] 
> << lustre3 missing
> OK for now
> VM boots up. pcs status:
>   * FAKE3   (ocf::pacemaker:Dummy):  FAILED (blocked) [ lustre3 lustre4 ] 
>  << what is it?
>   * Clone Set: ping-clone [ping]:
> * ping  (ocf::pacemaker:ping):   FAILED lustre3 (blocked)<< why 
> not started?
> * Started: [ lustre-mds1 lustre-mds2 lustre-mgs lustre1 lustre2 lustre4 ]

If this is the full pcs status output, I do not see a stonith resource.

> I checked server processes manually and found that lustre4 runs 
> "/usr/lib/ocf/resource.d/pacemaker/ping monitor" while lustre3 doesn't
> All is according to documentation but results are strange.
> Then I tried to add meta target-role="started" to pcs resource create ping 
> and this time ping started after node rebooted. Can I expect that it was just 
> missing from official setup documentation, and now everything will work fine?
___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] make promoted follow promoted resource ?

2023-11-26 Thread Andrei Borzenkov

On 26.11.2023 12:32, lejeczek via Users wrote:

Hi guys.

With these:

-> $ pcs resource status REDIS-6381-clone
    * Clone Set: REDIS-6381-clone [REDIS-6381] (promotable):
      * Promoted: [ ubusrv2 ]
      * Unpromoted: [ ubusrv1 ubusrv3 ]

-> $ pcs resource status PGSQL-PAF-5433-clone
    * Clone Set: PGSQL-PAF-5433-clone [PGSQL-PAF-5433]
(promotable):
      * Promoted: [ ubusrv1 ]
      * Unpromoted: [ ubusrv2 ubusrv3 ]

-> $ pcs constraint ref REDIS-6381-clone
Resource: REDIS-6381-clone
    colocation-REDIS-6381-clone-PGSQL-PAF-5433-clone-INFINITY

basically the promoted Redis should follow the promoted pgSQL but
it's not happening; usually it does.
I presume pcs/the cluster does something internally which
results in disobeying/ignoring that _colocation_ constraint
for these resources.
I presume scoring might play a role:
    REDIS-6385-clone with PGSQL-PAF-5435-clone (score:1001)
(rsc-role:Master) (with-rsc-role:Master)
but usually, that scoring works, only "now" it does not.
Any comments I appreciate much.
thanks, L.

I looked at the pacemaker log - snippet below, after
REDIS-6381-clone was re-enabled - but cannot see an explanation for
this.
...
   notice: Calculated transition 110, saving inputs in
/var/lib/pacemaker/pengine/pe-input-3729.bz2
   notice: Transition 110 (Complete=0, Pending=0, Fired=0,
Skipped=0, Incomplete=0,
Source=/var/lib/pacemaker/pengine/pe-input-3729.bz2): Complete
   notice: State transition S_TRANSITION_ENGINE -> S_IDLE
   notice: State transition S_IDLE -> S_POLICY_ENGINE
   notice: Actions: Start  REDIS-6381:0
(    ubusrv2 )
   notice: Actions: Start  REDIS-6381:1
(    ubusrv3 )
   notice: Actions: Start  REDIS-6381:2
(    ubusrv1 )
   notice: Calculated transition 111, saving inputs in
/var/lib/pacemaker/pengine/pe-input-3730.bz2
   notice: Initiating start operation REDIS-6381_start_0
locally on ubusrv2
   notice: Requesting local execution of start operation for
REDIS-6381 on ubusrv2
(to redis) root on none
pam_unix(su:session): session opened for user redis(uid=127)
by (uid=0)
pam_sss(su:session): Request to sssd failed. Connection refused
pam_unix(su:session): session closed for user redis
pam_sss(su:session): Request to sssd failed. Connection refused
   notice: Setting master-REDIS-6381[ubusrv2]: (unset) -> 1000


This is the only line that sets a master score, so apparently ubusrv2 is 
the only node where your clone *can* be promoted. Whether pacemaker is 
expected to fail this operation because it violates the constraint, I do not 
know.



   notice: Transition 111 aborted by
status-2-master-REDIS-6381 doing create
master-REDIS-6381=1000: Transient attribute change
INFO: demote: Setting master to 'no-such-master'
   notice: Result of start operation for REDIS-6381 on
ubusrv2: ok
   notice: Transition 111 (Complete=4, Pending=0, Fired=0,
Skipped=1, Incomplete=14,
Source=/var/lib/pacemaker/pengine/pe-input-3730.bz2): Stopped
   notice: Actions: Promote    REDIS-6381:0 (
Unpromoted -> Promoted ubusrv2 )
   notice: Actions: Start  REDIS-6381:1
(    ubusrv1 )
   notice: Actions: Start  REDIS-6381:2
(    ubusrv3 )
   notice: Calculated transition 112, saving inputs in
/var/lib/pacemaker/pengine/pe-input-3731.bz2
   notice: Initiating notify operation
REDIS-6381_pre_notify_start_0 locally on ubusrv2
   notice: Requesting local execution of notify operation for
REDIS-6381 on ubusrv2
   notice: Result of notify operation for REDIS-6381 on
ubusrv2: ok
   notice: Initiating start operation REDIS-6381_start_0 on
ubusrv1
   notice: Initiating start operation REDIS-6381:2_start_0 on
ubusrv3
   notice: Initiating notify operation
REDIS-6381_post_notify_start_0 locally on ubusrv2
   notice: Requesting local execution of notify operation for
REDIS-6381 on ubusrv2
   notice: Initiating notify operation
REDIS-6381_post_notify_start_0 on ubusrv1
   notice: Initiating notify operation
REDIS-6381:2_post_notify_start_0 on ubusrv3
   notice: Result of notify operation for REDIS-6381 on
ubusrv2: ok
   notice: Initiating notify operation
REDIS-6381_pre_notify_promote_0 locally on ubusrv2
   notice: Requesting local execution of notify operation for
REDIS-6381 on ubusrv2
   notice: Initiating notify operation
REDIS-6381_pre_notify_promote_0 on ubusrv1
   notice: Initiating notify operation
REDIS-6381:2_pre_notify_promote_0 on ubusrv3
   notice: Result of notify operation for REDIS-6381 on
ubusrv2: ok
   notice: Initiating promote operation REDIS-6381_promote_0
locally on ubusrv2
   notice: Requesting local execution of promote operation
for REDIS-6381 on ubusrv2
   notice: Result of promote operation for REDIS-6381 on
ubusrv2: ok
   notice: Initiating notify operation
REDIS-6381_post_notify_promote_0 locally on ubusrv2
   notice: Requesting local execution of notify operation for
REDIS-6381 on ubusrv2
   notice: Initiating notify operation
REDIS-6381_post_notify_promote_0 on ubusrv1
   

Re: [ClusterLabs] Using cluster without fencing

2023-10-16 Thread Andrei Borzenkov
On Mon, Oct 16, 2023 at 9:28 AM Sergey Cherukhin
 wrote:
>
> Hello!
>
> I use Postgresql+Pacemaker+Corosync 3 nodes cluster with 2 Postgresql 
> instances in synchronous replication mode on two high performance nodes and 
> Pacemaker+Corosync on the third low performance node for quorum only. At the 
> same time a SCADA HMI software is running on the high performance nodes. This 
> SCADA  software uses its own redundancy technology.
>
> In this case I can't use fencing as usual to power off or reboot a failed 
> node, because the operator will be very surprised when his workstation is 
> shut down due to a database failure.
>

You can use the third node as a quorum device instead of the full
member, it will never be fenced.
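A rough sketch of that with pcs, assuming the corosync-qdevice/corosync-qnetd packages are installed and "arbiter" stands for the third, low-performance host (after removing it as a full cluster member):

# on the arbiter host
pcs qdevice setup model net --enable --start

# on one of the two database nodes
pcs quorum device add model net host=arbiter algorithm=ffsplit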

> What type of fencing should I use in this case?
>

Whatever is technically feasible. Your nodes may have a BMC with IPMI.
Another possibility is an iSCSI target on the third node and SBD. If you
are using HPC, you may have shared storage already.

> On the other hand,  Postgresql instances don't use any shared resources. Is 
> it possible to use cluster without fencing in this case?
>

This is a common misconception. Your replicated database *is* the
shared resource. Ask yourself - what happens if both instances decide
they are masters and start serving different clients? If you really do
not care, you do not need any failover cluster in the first place.
___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] Mutually exclusive resources ?

2023-09-27 Thread Andrei Borzenkov
On Wed, Sep 27, 2023 at 3:21 PM Adam Cecile  wrote:
>
> Hello,
>
>
> I'm struggling to understand if it's possible to create some kind of 
> constraint to avoid two different resources running on the same host.
>
> Basically, I'd like to have floating IP "1" and floating IP "2" always being 
> assigned to DIFFERENT nodes.
>
> Is that something possible ?

Sure, negative colocation constraint.

> Can you give me a hint ?
>

Using crmsh:

colocation IP1-no-with-IP2 -inf: IP1 IP2
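Roughly the same thing with pcs, in case you are not using crmsh (exact score syntax may vary slightly between pcs versions):

pcs constraint colocation add IP1 with IP2 -INFINITY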

>
> Thanks in advance, Adam.
>
___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] PAF / PGSQLMS on Ubuntu

2023-09-07 Thread Andrei Borzenkov
On Thu, Sep 7, 2023 at 5:01 PM lejeczek via Users  wrote:
>
> Hi guys.
>
> I'm trying to set ocf_heartbeat_pgsqlms agent but I get:
> ...
> Failed Resource Actions:
>   * PGSQL-PAF-5433 stop on ubusrv3 returned 'invalid parameter' because 
> 'Parameter "recovery_target_timeline" MUST be set to 'latest'. It is 
> currently set to ''' at Thu Sep  7 13:58:06 2023 after 54ms
>
> I'm new to Ubuntu and I see that Ubuntu has a bit different approach to paths 
> (in comparison to how CentOS does it).
> I see separation between config & data, eg.
>
> 14  paf 5433 down   postgres /var/lib/postgresql/14/paf 
> /var/log/postgresql/postgresql-14-paf.log
>
> I create the resource like here:
>
> -> $ pcs resource create PGSQL-PAF-5433 ocf:heartbeat:pgsqlms pgport=5433 
> bindir=/usr/bin pgdata=/etc/postgresql/14/paf 
> datadir=/var/lib/postgresql/14/paf meta failure-timeout=30s master-max=1 op 
> start timeout=60s op stop timeout=60s op promote timeout=30s op demote 
> timeout=120s op monitor interval=15s timeout=10s role="Promoted" op monitor 
> interval=16s timeout=10s role="Unpromoted" op notify timeout=60s promotable 
> notify=true failure-timeout=30s master-max=1 --disable
>
> Ubuntu 22.04.3 LTS
> What am I missing can you tell?

Exactly what the message tells you. You need to set recovery_target_timeline=latest.
___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] issue during Pacemaker failover testing

2023-09-04 Thread Andrei Borzenkov
On Mon, Sep 4, 2023 at 4:44 PM David Dolan  wrote:
>
> Thanks Klaus\Andrei,
>
> So if I understand correctly what I'm trying probably shouldn't work.

It is impossible to configure corosync (or any other cluster system
for that matter) to keep the *arbitrary* last node quorate. It is
possible to designate one node as "preferred" and to keep it quorate.
Returning to your example:

> I tried adding this line to corosync.conf and I could then bring down the 
> services on node 1 and 2 or node 2 and 3 but if I left node 2 until last, the 
> cluster failed
> auto_tie_breaker_node: 1  3
>

Correct. In your scenario the tie breaker is only relevant with two
nodes. When the first node is down, the remaining two nodes select the
tiebreaker. It can only be node 1 or 3.

> This line had the same outcome as using 1 3
> auto_tie_breaker_node: 1  2 3

If it really has the same outcome (i.e. cluster fails when node 2 is
left) it is a bug. This line makes nodes 1 or 2 a possible tiebreaker.
So the cluster must fail if node 3 is left, not node 2.
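For reference, a minimal corosync.conf quorum sketch for the "one preferred tiebreaker node" approach mentioned above (votequorum also accepts an explicit node id, or "highest", instead of "lowest"):

quorum {
    provider: corosync_votequorum
    auto_tie_breaker: 1
    auto_tie_breaker_node: lowest
}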

What most certainly *is* possible - no-quorum-policy=ignore + reliable
fencing. This worked just fine in two node clusters without two_node.
It does not make the last node quorate, but it allows pacemaker to
continue providing services on this node *and* taking over services
from other nodes if they were fenced successfully.

> And I should attempt setting auto_tie_breaker in corosync and remove 
> last_man_standing.
> Then, I should set up another server with qdevice and configure that using 
> the LMS algorithm.
>
> Thanks
> David
>
> On Mon, 4 Sept 2023 at 13:32, Klaus Wenninger  wrote:
>>
>>
>>
>> On Mon, Sep 4, 2023 at 1:50 PM Andrei Borzenkov  wrote:
>>>
>>> On Mon, Sep 4, 2023 at 2:18 PM Klaus Wenninger  wrote:
>>> >
>>> >
>>> >
>>> > On Mon, Sep 4, 2023 at 12:45 PM David Dolan  wrote:
>>> >>
>>> >> Hi Klaus,
>>> >>
>>> >> With default quorum options I've performed the following on my 3 node 
>>> >> cluster
>>> >>
>>> >> Bring down cluster services on one node - the running services migrate 
>>> >> to another node
>>> >> Wait 3 minutes
>>> >> Bring down cluster services on one of the two remaining nodes - the 
>>> >> surviving node in the cluster is then fenced
>>> >>
>>> >> Instead of the surviving node being fenced, I hoped that the services 
>>> >> would migrate and run on that remaining node.
>>> >>
>>> >> Just looking for confirmation that my understanding is ok and if I'm 
>>> >> missing something?
>>> >
>>> >
>>> > As said I've never used it ...
>>> > Well when down to 2 nodes LMS per definition is getting into trouble as 
>>> > after another
>>> > outage any of them is gonna be alone. In case of an ordered shutdown this 
>>> > could
>>> > possibly be circumvented though. So I guess your first attempt to enable 
>>> > auto-tie-breaker
>>> > was the right idea. Like this you will have further service at least on 
>>> > one of the nodes.
>>> > So I guess what you were seeing is the right - and unfortunately only 
>>> > possible - behavior.
>>>
>>> I still do not see where fencing comes from. Pacemaker requests
>>> fencing of the missing nodes. It also may request self-fencing, but
>>> not in the default settings. It is rather hard to tell what happens
>>> without logs from the last remaining node.
>>>
>>> That said, the default action is to stop all resources, so the end
>>> result is not very different :)
>>
>>
>> But you are of course right. The expected behaviour would be that
>> the leftover node stops the resources.
>> But maybe we're missing something here. Hard to tell without
>> the exact configuration including fencing.
>> Again, as already said, I don't know anything about the LMS
>> implementation with corosync. In theory there were both arguments
>> to either suicide (but that would have to be done by pacemaker) or
>> to automatically switch to some 2-node-mode once the remaining
>> partition is reduced to just 2 followed by a fence-race (when done
>> without the precautions otherwise used for 2-node-clusters).
>> But I guess in this case it is none of those 2.
>>
>> Klaus
>>>
___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] issue during Pacemaker failover testing

2023-09-04 Thread Andrei Borzenkov
On Mon, Sep 4, 2023 at 2:18 PM Klaus Wenninger  wrote:
>
>
>
> On Mon, Sep 4, 2023 at 12:45 PM David Dolan  wrote:
>>
>> Hi Klaus,
>>
>> With default quorum options I've performed the following on my 3 node cluster
>>
>> Bring down cluster services on one node - the running services migrate to 
>> another node
>> Wait 3 minutes
>> Bring down cluster services on one of the two remaining nodes - the 
>> surviving node in the cluster is then fenced
>>
>> Instead of the surviving node being fenced, I hoped that the services would 
>> migrate and run on that remaining node.
>>
>> Just looking for confirmation that my understanding is ok and if I'm missing 
>> something?
>
>
> As said I've never used it ...
> Well when down to 2 nodes LMS per definition is getting into trouble as after 
> another
> outage any of them is gonna be alone. In case of an ordered shutdown this 
> could
> possibly be circumvented though. So I guess your first attempt to enable 
> auto-tie-breaker
> was the right idea. Like this you will have further service at least on one 
> of the nodes.
> So I guess what you were seeing is the right - and unfortunately only 
> possible - behavior.

I still do not see where fencing comes from. Pacemaker requests
fencing of the missing nodes. It also may request self-fencing, but
not in the default settings. It is rather hard to tell what happens
without logs from the last remaining node.

That said, the default action is to stop all resources, so the end
result is not very different :)
___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] issue during Pacemaker failover testing

2023-09-04 Thread Andrei Borzenkov
On Mon, Sep 4, 2023 at 2:25 PM Klaus Wenninger  wrote:
>
>
> Or go for qdevice with LMS, where I would expect it to be able to really go 
> down to a single node left - any of the 2 last ones - as there is still qdevice.
> Sorry for the confusion btw.
>

According to documentation, "LMS is also incompatible with quorum
devices, if last_man_standing is specified in corosync.conf then the
quorum device will be disabled".
___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] issue during Pacemaker failover testing

2023-09-04 Thread Andrei Borzenkov
On Mon, Sep 4, 2023 at 1:45 PM David Dolan  wrote:
>
> Hi Klaus,
>
> With default quorum options I've performed the following on my 3 node cluster
>
> Bring down cluster services on one node - the running services migrate to 
> another node
> Wait 3 minutes
> Bring down cluster services on one of the two remaining nodes - the surviving 
> node in the cluster is then fenced
>

Is it fenced or is it reset? It is not the same.

The default for no-quorum-policy is "stop". So either you have
"no-quorum-policy" set to "suicide", or the node is reset by something
outside of pacemaker. This "something" may initiate fencing too.
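You can check which value is actually in effect with, for example (crm_attribute queries the cluster options by default):

crm_attribute --query --name no-quorum-policy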

> Instead of the surviving node being fenced, I hoped that the services would 
> migrate and run on that remaining node.
>
> Just looking for confirmation that my understanding is ok and if I'm missing 
> something?
>
> Thanks
> David
>
>
>
> On Thu, 31 Aug 2023 at 11:59, David Dolan  wrote:
>>
>> I just tried removing all the quorum options setting back to defaults so no 
>> last_man_standing or wait_for_all.
>> I still see the same behaviour where the third node is fenced if I bring 
>> down services on two nodes.
>> Thanks
>> David
>>
>> On Thu, 31 Aug 2023 at 11:44, Klaus Wenninger  wrote:
>>>
>>>
>>>
>>> On Thu, Aug 31, 2023 at 12:28 PM David Dolan  wrote:



 On Wed, 30 Aug 2023 at 17:35, David Dolan  wrote:
>
>
>
>> > Hi All,
>> >
>> > I'm running Pacemaker on Centos7
>> > Name: pcs
>> > Version : 0.9.169
>> > Release : 3.el7.centos.3
>> > Architecture: x86_64
>> >
>> >
>> Besides the pcs-version versions of the other cluster-stack-components
>> could be interesting. (pacemaker, corosync)
>
>  rpm -qa | egrep "pacemaker|pcs|corosync|fence-agents"
> fence-agents-vmware-rest-4.2.1-41.el7_9.6.x86_64
> corosynclib-2.4.5-7.el7_9.2.x86_64
> pacemaker-cluster-libs-1.1.23-1.el7_9.1.x86_64
> fence-agents-common-4.2.1-41.el7_9.6.x86_64
> corosync-2.4.5-7.el7_9.2.x86_64
> pacemaker-cli-1.1.23-1.el7_9.1.x86_64
> pacemaker-1.1.23-1.el7_9.1.x86_64
> pcs-0.9.169-3.el7.centos.3.x86_64
> pacemaker-libs-1.1.23-1.el7_9.1.x86_64
>>
>>
>>
>> > I'm performing some cluster failover tests in a 3 node cluster. We 
>> > have 3
>> > resources in the cluster.
>> > I was trying to see if I could get it working if 2 nodes fail at 
>> > different
>> > times. I'd like the 3 resources to then run on one node.
>> >
>> > The quorum options I've configured are as follows
>> > [root@node1 ~]# pcs quorum config
>> > Options:
>> >   auto_tie_breaker: 1
>> >   last_man_standing: 1
>> >   last_man_standing_window: 1
>> >   wait_for_all: 1
>> >
>> >
>> Not sure if the combination of auto_tie_breaker and last_man_standing 
>> makes
>> sense.
>> And as you have a cluster with an odd number of nodes auto_tie_breaker
>> should be
>> disabled anyway I guess.
>
> Ah ok I'll try removing auto_tie_breaker and leave last_man_standing
>>
>>
>>
>> > [root@node1 ~]# pcs quorum status
>> > Quorum information
>> > --
>> > Date: Wed Aug 30 11:20:04 2023
>> > Quorum provider:  corosync_votequorum
>> > Nodes:3
>> > Node ID:  1
>> > Ring ID:  1/1538
>> > Quorate:  Yes
>> >
>> > Votequorum information
>> > --
>> > Expected votes:   3
>> > Highest expected: 3
>> > Total votes:  3
>> > Quorum:   2
>> > Flags:Quorate WaitForAll LastManStanding AutoTieBreaker
>> >
>> > Membership information
>> > --
>> > Nodeid  VotesQdevice Name
>> >  1  1 NR node1 (local)
>> >  2  1 NR node2
>> >  3  1 NR node3
>> >
>> > If I stop the cluster services on node 2 and 3, the groups all 
>> > failover to
>> > node 1 since it is the node with the lowest ID
>> > But if I stop them on node1 and node 2 or node1 and node3, the cluster
>> > fails.
>> >
>> > I tried adding this line to corosync.conf and I could then bring down 
>> > the
>> > services on node 1 and 2 or node 2 and 3 but if I left node 2 until 
>> > last,
>> > the cluster failed
>> > auto_tie_breaker_node: 1  3
>> >
>> > This line had the same outcome as using 1 3
>> > auto_tie_breaker_node: 1  2 3
>> >
>> >
>> Giving multiple auto_tie_breaker-nodes doesn't make sense to me but 
>> rather
>> sounds dangerous if that configuration is possible at all.
>>
>> Maybe the misbehavior of last_man_standing is due to this (maybe not
>> recognized) misconfiguration.
>> Did you wait long enough between letting the 2 nodes fail?
>
> I've done it so many times so I 

Re: [ClusterLabs] issue during Pacemaker failover testing

2023-08-30 Thread Andrei Borzenkov

On 30.08.2023 19:23, David Dolan wrote:


Use fencing. Quorum is not a replacement for fencing. With (reliable)
fencing you can simply run pacemaker with no-quorum-policy=ignore.

The practical problem is that usually the last resort that will work
in all cases is SBD + suicide and SBD cannot work without quorum.

Ah I forgot to mention I do have fencing setup, which connects to Vmware

Virtualcenter.
Do you think it's safe to set that no-quorum-policy=ignore?


Fencing is always safe. Fencing guarantees that when nodes take over 
resources of a missing node, the missing node is actually not running 
any of these resources. Yes, if fencing fails the resource won't be taken 
over, but usually that is better than possible corruption. Quorum is 
entirely orthogonal to that. If your two nodes lose connection to the 
third node, they will happily take over resources whether the third node 
has already stopped them or not.


If you actually mean "is it guaranteed that the surviving node will 
always be able to take over resources from other nodes" - no, it depends 
on network connectivity. If the connection to VC is lost (or if anything bad 
happens during communication with VC, like somebody changed the password you 
use), fencing will fail and resources won't be taken over.

___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] issue during Pacemaker failover testing

2023-08-30 Thread Andrei Borzenkov
On Wed, Aug 30, 2023 at 3:34 PM David Dolan  wrote:
>
> Hi All,
>
> I'm running Pacemaker on Centos7
> Name: pcs
> Version : 0.9.169
> Release : 3.el7.centos.3
> Architecture: x86_64
>
>
> I'm performing some cluster failover tests in a 3 node cluster. We have 3 
> resources in the cluster.
> I was trying to see if I could get it working if 2 nodes fail at different 
> times. I'd like the 3 resources to then run on one node.
>
> The quorum options I've configured are as follows
> [root@node1 ~]# pcs quorum config
> Options:
>   auto_tie_breaker: 1
>   last_man_standing: 1
>   last_man_standing_window: 1
>   wait_for_all: 1
>
> [root@node1 ~]# pcs quorum status
> Quorum information
> --
> Date: Wed Aug 30 11:20:04 2023
> Quorum provider:  corosync_votequorum
> Nodes:3
> Node ID:  1
> Ring ID:  1/1538
> Quorate:  Yes
>
> Votequorum information
> --
> Expected votes:   3
> Highest expected: 3
> Total votes:  3
> Quorum:   2
> Flags:Quorate WaitForAll LastManStanding AutoTieBreaker
>
> Membership information
> --
> Nodeid  VotesQdevice Name
>  1  1 NR node1 (local)
>  2  1 NR node2
>  3  1 NR node3
>
> If I stop the cluster services on node 2 and 3, the groups all failover to 
> node 1 since it is the node with the lowest ID
> But if I stop them on node1 and node 2 or node1 and node3, the cluster fails.
>
> I tried adding this line to corosync.conf and I could then bring down the 
> services on node 1 and 2 or node 2 and 3 but if I left node 2 until last, the 
> cluster failed
> auto_tie_breaker_node: 1  3
>
> This line had the same outcome as using 1 3
> auto_tie_breaker_node: 1  2 3
>
> So I'd like it to failover when any combination of two nodes fail but I've 
> only had success when the middle node isn't last.
>

Use fencing. Quorum is not a replacement for fencing. With (reliable)
fencing you can simply run pacemaker with no-quorum-policy=ignore.

The practical problem is that usually the last resort that will work
in all cases is SBD + suicide and SBD cannot work without quorum.
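A rough sketch of that combination, with placeholder values for the VMware fencing agent shipped in the package list you posted (parameter names differ between fence-agents versions, so treat this as illustrative only):

pcs stonith create vmfence fence_vmware_rest \
    ip=vcenter.example.com username=fence-user password=secret \
    pcmk_host_map="node1:node1-vm;node2:node2-vm;node3:node3-vm"
pcs property set no-quorum-policy=ignore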
___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] pacemaker:start-delay

2023-08-18 Thread Andrei Borzenkov
On Fri, Aug 18, 2023 at 12:13 PM Mr.R via Users  wrote:
>
> Hi all,
>
> There is a problem with the start-delay of monitor during the process of
> configuring and starting resources.
>
> For example, there is the result of resource config.
>
> Resource: d1 (class=ocf provider=pacemaker type=Dummy)
>   Meta Attrs: target-role=Stopped
>   Operations: monitor interval=10s start-delay=100s timeout=20s 
> (d1-monitor-interval-10s)
> Resource: d2 (class=ocf provider=pacemaker type=Dummy)
>   Meta Attrs: target-role=Stopped
>   Operations: monitor interval=10s timeout=20s (d2-monitor-interval-10s)
>
> If resource d1 is started first and then resource d2 is started, resource d2 
> must
> wait for 100s to start. My understanding is  start-delay will not affect the 
> startup of resource d2.
> Is this phenomenon in line with the design expectations? Why design it this 
> way?
>

Each previous state transition must complete before the next state
transition can be initiated. I guess, it is the same as in

https://lists.clusterlabs.org/pipermail/users/2017-October/023060.html
___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] [EXT] Re: Fence Agents Format

2023-07-28 Thread Andrei Borzenkov

On 28.07.2023 09:46, Windl, Ulrich wrote:

Hi!

On " Manual fencing or meatware is when an administrator must manually power-cycle a 
machine (or unplug its storage cables) and follow up with the cluster, notifying the 
cluster that the machine has been fenced. This is never recommended.": Maybe also 
explain why: The cluster assumes that fencing is complete after some timeout.


This is wrong. Do not generalize SBD or self suicide to general fencing.

Meatware was deprecated years ago to my best knowledge, now there is 
"stonith_admin -C" which does the same.



___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] location constraint does not move promoted resource ?

2023-07-03 Thread Andrei Borzenkov

On 03.07.2023 19:39, Ken Gaillot wrote:

On Mon, 2023-07-03 at 19:22 +0300, Andrei Borzenkov wrote:

On 03.07.2023 18:07, Ken Gaillot wrote:

On Mon, 2023-07-03 at 12:20 +0200, lejeczek via Users wrote:

On 03/07/2023 11:16, Andrei Borzenkov wrote:

On 03.07.2023 12:05, lejeczek via Users wrote:

Hi guys.

I have pgsql which I constrain like so:

-> $ pcs constraint location PGSQL-clone rule role=Promoted
score=-1000 gateway-link ne 1

and I have a few more location constraints with that
ethmonitor & those work, but this one does not seem to.
When the constraint is created the cluster is silent, no errors nor
warnings, but relocation does not take place.
I can move the promoted resource manually just fine, to the
node where 'location' should move it.



The instance to promote is selected according to promotion
scores, which are normally set by the resource agent.
The documentation implies that standard location constraints
are also taken into account, but there is no explanation of how
promotion scores interoperate with location scores. It is
possible that the promotion score in this case takes precedence.

It seems to have kicked in with score=-1 but..
that was me just guessing.
Indeed it would be great to know how those are calculated,
in a way which would be admin friendly or just obvious.

thanks, L.


It's a longstanding goal to have some sort of tool for explaining
how
scores interact in a given situation. However it's a challenging
problem and there's never enough time ...

Basically, all scores are added together for each node, and the
node
with the highest score runs the resource, subject to any placement
strategy configured. These mainly include stickiness, location
constraints, colocation constraints, and node health. Nodes may be


And you omitted the promotion scores which was the main question.


Oh right -- first, the above is used to determine the nodes on which
clone instances will be placed. After that, an appropriate number of
nodes are selected for the promoted role, based on promotion scores and
location and colocation constraints for the promoted role.



I am sorry, but it does not really explain anything. Let's try concrete 
examples.

a) A master clone instance has a location score of -1000 for a node and a 
promotion score of 1000. Is this node eligible for promoting the clone instance 
(assuming no other scores are present)?

b) The promotion score is equal on two nodes A and B, but node A has a better 
location score than node B. Is it guaranteed that the clone will be promoted 
on A?



When colocations are considered, chained colocations are considered at
an attenuated score. If A is colocated with B, and B is colocated with
C, A's preferences are considered when assigning C to a node, but at
less than full strength. That's one of the reasons it gets complicated
to figure out a particular situation.


eliminated from consideration by resource migration thresholds,
standby/maintenance mode, etc.


___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] location constraint does not move promoted resource ?

2023-07-03 Thread Andrei Borzenkov

On 03.07.2023 18:07, Ken Gaillot wrote:

On Mon, 2023-07-03 at 12:20 +0200, lejeczek via Users wrote:


On 03/07/2023 11:16, Andrei Borzenkov wrote:

On 03.07.2023 12:05, lejeczek via Users wrote:

Hi guys.

I have pgsql which I constrain like so:

-> $ pcs constraint location PGSQL-clone rule role=Promoted
score=-1000 gateway-link ne 1

and I have a few more location constraints with that
ethmonitor & those work, but this one does not seem to.
When the constraint is created the cluster is silent, no errors nor
warnings, but relocation does not take place.
I can move the promoted resource manually just fine, to the
node where 'location' should move it.



The instance to promote is selected according to promotion
scores, which are normally set by the resource agent.
The documentation implies that standard location constraints
are also taken into account, but there is no explanation of how
promotion scores interoperate with location scores. It is
possible that the promotion score in this case takes precedence.

It seems to have kicked in with score=-1 but..
that was me just guessing.
Indeed it would be great to know how those are calculated,
in a way which would be admin friendly or just obvious.

thanks, L.


It's a longstanding goal to have some sort of tool for explaining how
scores interact in a given situation. However it's a challenging
problem and there's never enough time ...

Basically, all scores are added together for each node, and the node
with the highest score runs the resource, subject to any placement
strategy configured. These mainly include stickiness, location
constraints, colocation constraints, and node health. Nodes may be


And you omitted the promotion scores which was the main question.


eliminated from consideration by resource migration thresholds,
standby/maintenance mode, etc.


___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] location constraint does not move promoted resource ?

2023-07-03 Thread Andrei Borzenkov

On 03.07.2023 12:05, lejeczek via Users wrote:

Hi guys.

I have pgsql which I constrain like so:

-> $ pcs constraint location PGSQL-clone rule role=Promoted
score=-1000 gateway-link ne 1

and I have a few more location constraints with that
ethmonitor & those work, but this one does not seem to.
When the constraint is created the cluster is silent, no errors nor
warnings, but relocation does not take place.
I can move the promoted resource manually just fine, to the
node where 'location' should move it.



The instance to promote is selected according to promotion scores, which are 
normally set by the resource agent. The documentation implies that standard 
location constraints are also taken into account, but there is no 
explanation of how promotion scores interoperate with location scores. It 
is possible that the promotion score in this case takes precedence.
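One way to see the scores the scheduler actually uses is crm_simulate - a sketch, run on any cluster node (output format differs between Pacemaker versions):

crm_simulate --live-check --show-scores | grep -i -e promotion -e PGSQL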

___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] silence resource ? - PGSQL

2023-06-28 Thread Andrei Borzenkov

On 28.06.2023 14:11, lejeczek via Users wrote:

Hi guys.

Having 'pgsql' set up in what I'd say is a vanilla-default
config, pacemaker's journal log is flooded with:
...
pam_unix(runuser:session): session closed for user postgres
pam_unix(runuser:session): session opened for user
postgres(uid=26) by (uid=0)
pam_unix(runuser:session): session closed for user postgres
pam_unix(runuser:session): session opened for user
postgres(uid=26) by (uid=0)
pam_unix(runuser:session): session closed for user postgres
pam_unix(runuser:session): session opened for user
postgres(uid=26) by (uid=0)
pam_unix(runuser:session): session closed for user postgres
...

Would you have a working fix or even a suggestion on how to
silence those?



Did you try "man pam_unix"? May be one of options does what you need?

___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] no-quorum-policy=ignore is (Deprecated ) and replaced with other options but not an effective solution

2023-06-27 Thread Andrei Borzenkov

On 27.06.2023 07:21, Priyanka Balotra wrote:

Hi Andrei,
After this state the system went through some more fencings and we saw the
following state:

:~ # crm status
Cluster Summary:
   * Stack: corosync
   * Current DC: FILE-2 (version
2.1.2+20211124.ada5c3b36-150400.2.43-2.1.2+20211124.ada5c3b36) - partition
with quorum


It says "partition with quorum" so what exactly is the problem?


   * Last updated: Mon Jun 26 12:44:15 2023
   * Last change:  Mon Jun 26 12:41:12 2023 by root via cibadmin on FILE-2
   * 4 nodes configured
   * 11 resource instances configured

Node List:
   * Node FILE-1: UNCLEAN (offline)
   * Node FILE-4: UNCLEAN (offline)
   * Online: [ FILE-2 ]
   * Online: [ FILE-3 ]

At this stage FILE-1 and FILE-4 were continuously getting fenced (we have
device-based stonith configured but the resource was not up).
Two nodes were online and two were offline. So quorum wasn't attained
again.
1)  For such a scenario we need help to be able to keep one cluster live.
2)  And in cases where only one node of the cluster is up and the others are
down, we need the resources and the cluster to be up.

Thanks
Priyanka

On Tue, Jun 27, 2023 at 12:25 AM Andrei Borzenkov 
wrote:


On 26.06.2023 21:14, Priyanka Balotra wrote:

Hi All,
We are seeing an issue where we replaced no-quorum-policy=ignore with other
options in corosync.conf in order to simulate the same behaviour:

wait_for_all: 0
last_man_standing: 1
last_man_standing_window: 2

There was another property (auto-tie-breaker) that we tried, but we couldn't
configure it as crm did not recognise this property.

But even after using these options, we are seeing that the system is not
quorate if at least half of the nodes are not up.

Some properties from crm config are as follows:



primitive stonith-sbd stonith:external/sbd \
    params pcmk_delay_base=5s
...
property cib-bootstrap-options: \
    have-watchdog=true \
    dc-version="2.1.2+20211124.ada5c3b36-150400.2.43-2.1.2+20211124.ada5c3b36" \
    cluster-infrastructure=corosync \
    cluster-name=FILE \
    stonith-enabled=true \
    stonith-timeout=172 \
    stonith-action=reboot \
    stop-all-resources=false \
    no-quorum-policy=ignore
rsc_defaults build-resource-defaults: \
    resource-stickiness=1
rsc_defaults rsc-options: \
    resource-stickiness=100 \
    migration-threshold=3 \
    failure-timeout=1m \
    cluster-recheck-interval=10min
op_defaults op-options: \
    timeout=600 \
    record-pending=true

On a 4-node setup when the whole cluster is brought up together we see
error logs like:

2023-06-26T11:35:17.231104+00:00 FILE-1 pacemaker-schedulerd[26359]: warning: Fencing and resource management disabled due to lack of quorum

2023-06-26T11:35:17.231338+00:00 FILE-1 pacemaker-schedulerd[26359]: warning: Ignoring malformed node_state entry without uname

2023-06-26T11:35:17.233771+00:00 FILE-1 pacemaker-schedulerd[26359]: warning: Node FILE-2 is unclean!

2023-06-26T11:35:17.233857+00:00 FILE-1 pacemaker-schedulerd[26359]: warning: Node FILE-3 is unclean!

2023-06-26T11:35:17.233957+00:00 FILE-1 pacemaker-schedulerd[26359]: warning: Node FILE-4 is unclean!



According to this output FILE-1 lost connection to three other nodes, in
which case it cannot be quorate.



Kindly help correct the configuration to make the system function

normally

with all resources up, even if there is just one node up.

Please let me know if any more info is needed.

Thanks
Priyanka


___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] no-quorum-policy=ignore is (Deprecated ) and replaced with other options but not an effective solution

2023-06-26 Thread Andrei Borzenkov

On 26.06.2023 21:14, Priyanka Balotra wrote:

Hi All,
We are seeing an issue where we replaced no-quorum-policy=ignore with other
options in corosync.conf order to simulate the same behaviour :


wait_for_all: 0
last_man_standing: 1
last_man_standing_window: 2

There was another property (auto-tie-breaker) tried but couldn't configure
it as crm did not recognise this property.

But even after using these options, we are seeing that system is not
quorate if at least half of the nodes are not up.

Some properties from crm config are as follows:



primitive stonith-sbd stonith:external/sbd \
        params pcmk_delay_base=5s
...
property cib-bootstrap-options: \
        have-watchdog=true \
        dc-version="2.1.2+20211124.ada5c3b36-150400.2.43-2.1.2+20211124.ada5c3b36" \
        cluster-infrastructure=corosync \
        cluster-name=FILE \
        stonith-enabled=true \
        stonith-timeout=172 \
        stonith-action=reboot \
        stop-all-resources=false \
        no-quorum-policy=ignore
rsc_defaults build-resource-defaults: \
        resource-stickiness=1
rsc_defaults rsc-options: \
        resource-stickiness=100 \
        migration-threshold=3 \
        failure-timeout=1m \
        cluster-recheck-interval=10min
op_defaults op-options: \
        timeout=600 \
        record-pending=true

On a 4-node setup when the whole cluster is brought up together we see
error logs like:

2023-06-26T11:35:17.231104+00:00 FILE-1 pacemaker-schedulerd[26359]: warning: Fencing and resource management disabled due to lack of quorum

2023-06-26T11:35:17.231338+00:00 FILE-1 pacemaker-schedulerd[26359]: warning: Ignoring malformed node_state entry without uname

2023-06-26T11:35:17.233771+00:00 FILE-1 pacemaker-schedulerd[26359]: warning: Node FILE-2 is unclean!

2023-06-26T11:35:17.233857+00:00 FILE-1 pacemaker-schedulerd[26359]: warning: Node FILE-3 is unclean!

2023-06-26T11:35:17.233957+00:00 FILE-1 pacemaker-schedulerd[26359]: warning: Node FILE-4 is unclean!



According to this output FILE-1 lost connection to three other nodes, in 
which case it cannot be quorate.
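For reference, these votequorum options live in the quorum section of corosync.conf; a rough sketch follows (values are purely illustrative, not a recommendation, and note that last_man_standing only helps when nodes are lost gradually, not when the cluster boots with half of the nodes missing):

quorum {
    provider: corosync_votequorum
    expected_votes: 4
    wait_for_all: 1
    last_man_standing: 1
    last_man_standing_window: 20000
    auto_tie_breaker: 1
    auto_tie_breaker_node: lowest
}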




Kindly help correct the configuration to make the system function normally
with all resources up, even if there is just one node up.

Please let me know if any more info is needed.

Thanks
Priyanka


___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] host in standby causes havoc

2023-06-15 Thread Andrei Borzenkov

On 15.06.2023 13:58, Kadlecsik József wrote:

Hello,

We had a strange issue here: 7 node cluster, one node was put into standby
mode to test a new iscsi setting on it. During configuring the machine it
was rebooted and after the reboot the iscsi didn't come up. That caused a
malformed communication (atlas5 is the node in standby) with the cluster:

Jun 15 10:10:13 atlas0 pacemaker-schedulerd[7153]:  warning: Unexpected
result (error) was recorded for probe of ocsi on atlas5 at Jun 15 10:09:32 2023


It sounds like a resource agent problem. You need to investigate why the probe 
returned an error.



Jun 15 10:10:13 atlas0 pacemaker-schedulerd[7153]:  notice: If it is not
possible for ocsi to run on atlas5, see the resource-discovery option for
location constraints
Jun 15 10:10:13 atlas0 pacemaker-schedulerd[7153]:  error: Resource ocsi
is active on 2 nodes (attempting recovery)

The resource was definitely not active on 2 nodes. And that caused a storm
of killing all virtual machines as resources.

How could one prevent such cases to come up?



standby does not stop the cluster from running, it simply tells pacemaker to 
exclude this node from the possible candidates to run resources. To avoid 
any unwanted interaction (also due to possible resource agent or other 
software bugs) you could simply stop pacemaker and disable auto-startup.
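For example (standard crmsh and systemd commands; the node name is just the one from this thread):

crm node standby atlas5          # what was done here: node stays in the cluster
systemctl stop pacemaker         # stronger: take the node out of the cluster entirely
systemctl disable pacemaker      # and keep it out across the maintenance reboot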

___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] 99-VirtualDomain-libvirt.conf under control - ?

2023-05-05 Thread Andrei Borzenkov
On Fri, May 5, 2023 at 12:10 PM lejeczek via Users
 wrote:
>
>
>
> On 05/05/2023 10:08, Andrei Borzenkov wrote:
> > On Fri, May 5, 2023 at 11:03 AM lejeczek via Users
> >  wrote:
> >>
> >>
> >> On 29/04/2023 21:02, Reid Wahl wrote:
> >>> On Sat, Apr 29, 2023 at 3:34 AM lejeczek via Users
> >>>  wrote:
> >>>> Hi guys.
> >>>>
> >>>> I presume these are a consequence of having resource of VirtuaDomain 
> >>>> type set up(& enabled) - but where, how cab users control presence & 
> >>>> content of those?
> >>> Yep: 
> >>> https://github.com/ClusterLabs/resource-agents/blob/v4.12.0/heartbeat/VirtualDomain#L674-L680
> >>>
> >>> You can't really control the content, since it's set by the resource
> >>> agent. (You could change it after creation but that defeats the
> >>> purpose.) However, you can view it at
> >>> /run/systemd/system/resource-agents-deps.target.d/libvirt.conf.
> >>>
> >>> You can see the systemd_drop_in definition here:
> >>> https://github.com/ClusterLabs/resource-agents/blob/v4.12.0/heartbeat/ocf-shellfuncs.in#L654-L673
> >>>
> >> I wonder how much of an impact those bits have on the
> >> cluster(?)
> >> Take '99-VirtualDomain-libvirt.conf' - that one poses
> >> questions, with c9s 'libvirtd.service' is not really used or
> >> should not be, new modular approach is devised there.
> >> So, with 'resources-agents' having:
> >> After=libvirtd.service
> >> and users not being able to manage those bit - is that not
> >> asking for trouble?
> >>
> > it does no harm (missing units are simply ignored) but it certainly
> > does not do anything useful either. OTOH modular approach is also
> > optional, so you could still use monolithic libvirtd on cluster nodes.
> > So it is more a documentation issue.
> Not sure what you mean by 'missing unit' - unit is there
> only is not used, is disabled. What does
> 'resource-agents-deps' do with that?

systemd ordering dependencies are only relevant if two units are
started/stopped at the same time. Otherwise they do nothing. You can
compare them with the optional kind of ordering constraint in pacemaker.

> I don't suppose upstream, redhat & others made that effort,
> those changes with the suggestions to us consumers - do go
> back to "old" stuff.

Just because you can do something does not mean you must do it. If the
solution to this issue is to use libvirtd, you need very good
arguments why using libvirtd is not possible.

> I'd suggest, if devel/contributors read here - and I'd
> imagine other users would reckon as well - to enhance RAs,
> certainly VirtualDomain, with a parameter/attribute with
> which users could, at least to certain extent, control those
> "outside" of cluster, dependencies.
>

A manual parameter certainly sounds wrong here. Some sort of auto-detection
of whether a modular or monolithic installation is active may be useful -
*if* you cannot use libvirtd.
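For illustration only, a rough sketch of what such auto-detection could look like in the agent (virtqemud is the usual modular QEMU driver daemon; whether "enabled" is the right signal is an assumption, not what the agent currently does):

# prefer the modular daemon if it is enabled on this node
if systemctl is-enabled --quiet virtqemud.service 2>/dev/null; then
    libvirt_unit="virtqemud.service"
else
    libvirt_unit="libvirtd.service"
fi
# ... and generate "After=${libvirt_unit}" in the systemd drop-in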
___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] 99-VirtualDomain-libvirt.conf under control - ?

2023-05-05 Thread Andrei Borzenkov
On Fri, May 5, 2023 at 11:03 AM lejeczek via Users
 wrote:
>
>
>
> On 29/04/2023 21:02, Reid Wahl wrote:
> > On Sat, Apr 29, 2023 at 3:34 AM lejeczek via Users
> >  wrote:
> >> Hi guys.
> >>
> >> I presume these are a consequence of having resource of VirtuaDomain type 
> >> set up(& enabled) - but where, how cab users control presence & content of 
> >> those?
> > Yep: 
> > https://github.com/ClusterLabs/resource-agents/blob/v4.12.0/heartbeat/VirtualDomain#L674-L680
> >
> > You can't really control the content, since it's set by the resource
> > agent. (You could change it after creation but that defeats the
> > purpose.) However, you can view it at
> > /run/systemd/system/resource-agents-deps.target.d/libvirt.conf.
> >
> > You can see the systemd_drop_in definition here:
> > https://github.com/ClusterLabs/resource-agents/blob/v4.12.0/heartbeat/ocf-shellfuncs.in#L654-L673
> >
> I wonder how much of an impact those bits have on the
> cluster(?)
> Take '99-VirtualDomain-libvirt.conf' - that one poses
> questions, with c9s 'libvirtd.service' is not really used or
> should not be, new modular approach is devised there.
> So, with 'resources-agents' having:
> After=libvirtd.service
> and users not being able to manage those bit - is that not
> asking for trouble?
>

it does no harm (missing units are simply ignored) but it certainly
does not do anything useful either. OTOH the modular approach is also
optional, so you could still use the monolithic libvirtd on cluster nodes.
So it is more of a documentation issue.
___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] How to block/stop a resource from running twice?

2023-04-24 Thread Andrei Borzenkov
On Mon, Apr 24, 2023 at 11:52 AM Klaus Wenninger  wrote:
> The checking for a running resource that isn't expected to be running isn't 
> done periodically (at
> least not per default and I don't know a way to achieve that from the top of 
> my mind).

op monitor role=Stopped interval=20s
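For context, that is just an additional monitor operation on the resource; the cluster then periodically probes the nodes where the resource is supposed to be stopped. A minimal sketch in crm syntax (resource name and address are made up):

primitive vip IPaddr2 \
        params ip=192.0.2.10 cidr_netmask=24 \
        op monitor interval=10s \
        op monitor role=Stopped interval=20s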
___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] HA problem: No live migration when setting node on standby

2023-04-17 Thread Andrei Borzenkov
On Mon, Apr 17, 2023 at 10:48 AM Philip Schiller
 wrote:
>
> Hello Andrei,
>
> you wrote:
>
> >>As a workaround you could add dummy clone resource colocated with and
> >>ordered after your DRBD masters and order VM after this clone.
>
> Thanks for the idea. This looks like a good option to solve my problem.
>
> I have also researched a little more and came up with an option which seems 
> to work for my case.
> Would you be so kind to evaluate if i understand it correctly?
>
> As mentioned in the original thread
> >> Wed Apr 12 05:28:48 EDT 2023
>
> My system looks like this:
>
> >>I am using a simple two-nodes cluster with Zvol -> DRBD -> Virsh in
> >>primary/primary mode (necessary for live migration).
>
>
> Where drbd-resources and zvol are clones.
> So it is basically a chain of resources, first zvol then drbd then vm.
>
> From documentation i read that in those cases order constraints are not even 
> necessary. This can be done with colocations constraints only
> -> 
> https://access.redhat.com/documentation/de-de/red_hat_enterprise_linux/7/html/high_availability_add-on_reference/s1-orderconstraints-haar#s2-resourceorderlist-HAAR
>
> There is stated:
> >> A common situation is for an administrator to create a chain of ordered
> >> resources, where, for example, resource A starts before resource B which
> >> starts before resource C. If your configuration requires that you
> >> create a set of resources that is colocated and started in order, you
> >> can configure a resource group that contains those resources, as
> >> described in Section 6.5, “Resource Groups”.
>
> I can't create a Resource Group because apparently clone-resources are not 
> supported. So i have the following setup now:
>
> >> colocation 
> >> colocation-mas-drbd-alarmanlage-clo-pri-zfs-drbd_storage-INFINITY inf: 
> >> mas-drbd-alarmanlage clo-pri-zfs-drbd_storage
> >> colocation colocation-pri-vm-alarmanlage-mas-drbd-alarmanlage-INFINITY 
> >> inf: pri-vm-alarmanlage:Started mas-drbd-alarmanlage:Master
> >> location location-pri-vm-alarmanlage-s0-200 pri-vm-alarmanlage 200: s0
>
> Migration works flawless and also the startup is correct: zvol -> drbd -> vm
>

To the best of my knowledge there is no implied ordering for colocated
resources. So it may work in your case simply due to specific timings.
I would not rely on it. Any software or hardware change may change the
timings.

> I am little bit concerned though. Does corosync work like an interpeter and 
> knows the correct order when i do  before  drbd/vm>?
>

Colocation and ordering are entirely orthogonal. Colocation defines
where pacemaker will attempt to start resources, while ordering defines
in which order it does it. It is a bit more complicated in the case of
promotable clones, because the master is not static and is determined at
run time based on resource agent behavior. So pacemaker may delay
placement of dependent resources until the masters are known, which may
look like ordering.
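As an illustration (names are placeholders, not taken from your configuration), the usual pairing for a VM on top of a promotable DRBD clone is one constraint of each kind:

colocation vm_with_drbd_master inf: my_vm ms_drbd:Master
order vm_after_drbd_promote Mandatory: ms_drbd:promote my_vm:start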

> Another thing is the Multistate Constraint which i implanted -> 
> pri-vm-alarmanlage:Started mas-drbd-alarmanlage:Master
> Is this equivalent to the  order-mas-drbd-alarmanlage-pri-vm-alarmanlage-mandatory 
> mas-drbd-alarmanlage:promote pri-vm-alarmanlage:start> which i was trying to 
> achieve?
>
> Basically i just want to have zvol started then drbd stared and promoted to 
> master state and then finally vm started. All on the same node.
> Can you confirm that my cluster does this behavior permanently with this 
> configuration.
>

No, I cannot (but I am happy to be proved wrong).

> Note that I would like to avoid any order constraints and dummy resources if 
> possible. But if it is unavoidable let me know.
>
> Thanks for the replies.
>
> With kind regards
> Philip.
>
___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] VirtualDomain - node map - ?

2023-04-16 Thread Andrei Borzenkov

On 16.04.2023 16:29, lejeczek via Users wrote:



On 16/04/2023 12:54, Andrei Borzenkov wrote:

On 16.04.2023 13:40, lejeczek via Users wrote:

Hi guys

Some agents do employ that concept of node/host map which I
do not see in any manual/docs that this agent does - would
you suggest some technique or tips on how to achieve
similar?
I'm thinking specifically of 'migrate' here, as I understand
'migration' just uses OS' own resolver to call migrate_to
node.



No, pacemaker decides where to migrate the resource and
calls agents on the current source and then on the
intended target passing this information.



Yes pacemaker does that but - as I mentioned - some agents
do employ that "internal" nodes map "technique".
I see no mention of that/similar in the manual for
VirtualDomain so I asked if perhaps somebody had an idea of
how to archive such result by some other means.



What about showing an example of these "some agents", or better describing 
what you want to achieve?

___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] VirtualDomain - node map - ?

2023-04-16 Thread Andrei Borzenkov

On 16.04.2023 13:40, lejeczek via Users wrote:

Hi guys

Some agents do employ that concept of node/host map which I
do not see in any manual/docs that this agent does - would
you suggest some technique or tips on how to achieve similar?
I'm thinking specifically of 'migrate' here, as I understand
'migration' just uses OS' own resolver to call migrate_to node.



No, pacemaker decides where to migrate the resource and calls agents on 
the current source and then on the intended target passing this information.

___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] HA problem: No live migration when setting node on standby

2023-04-14 Thread Andrei Borzenkov

On 14.04.2023 14:35, Andrei Borzenkov wrote:

On Fri, Apr 14, 2023 at 11:45 AM Philip Schiller
 wrote:


I would like to know if the order constraint 
is equivalent to: "First promote ms-drbd_fs then start drbd_vm".



No, it is not. It is equivalent to

order drbd_vm_after_drbd_fs Mandatory: ms-drbd_fs:promote drbd_vm:promote

which is effectively ignored (as drbd_vm is never promoted).

As far as I can tell, pacemaker simply does not support migration
together with demote/promote actions. 


As a workaround you could add a dummy clone resource colocated with and 
ordered after your DRBD masters, and order the VM after this clone. Like


primitive drbd_fs ocf:pacemaker:Stateful \
op monitor role=Master interval=10s \
op monitor role=Slave interval=11s
primitive drbd_vm ocf:pacemaker:Dummy \
op monitor interval=10s \
meta allow-migrate=true
primitive dummy_drbd_fs ocf:pacemaker:Dummy \
op monitor interval=10s
primitive dummy_stonith stonith:external/_dummy \
op monitor interval=3600 timeout=20
clone cl-dummy_drbd_fs dummy_drbd_fs \
meta clone-max=2 clone-node-max=1 interleave=true
clone ms-drbd_fs drbd_fs \
	meta promotable=yes promoted-max=2 clone-max=2 clone-node-max=1 promoted-node-max=1 interleave=true

location drbd_fs_not_on_qnetd ms-drbd_fs -inf: qnetd
order drbd_vm_after_dummy_drbd_fs Mandatory: cl-dummy_drbd_fs drbd_vm
location drbd_vm_not_on_qnetd drbd_vm -inf: qnetd
order dummy_drbd_fs_after_drbd_fs Mandatory: ms-drbd_fs:promote cl-dummy_drbd_fs:start

location dummy_drbd_fs_not_on_qnetd cl-dummy_drbd_fs -inf: qnetd
colocation dummy_drbd_fs_with_drbd_fs inf: cl-dummy_drbd_fs ms-drbd_fs:Master


which results in

Transition Summary:
  * Stop   drbd_fs:1   ( Master ha2 )  due to node availability
  * Migratedrbd_vm ( ha2 -> ha1 )
  * Stop   dummy_drbd_fs:1 (ha2 )  due to node availability

___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] HA problem: No live migration when setting node on standby

2023-04-14 Thread Andrei Borzenkov
On Fri, Apr 14, 2023 at 2:35 PM Andrei Borzenkov  wrote:
>
> As far as I can tell, pacemaker simply does not support migration
> together with demote/promote actions. I don't really know the reasons.

Thinking about it - migrating a resource that depends on a master is
simply not possible in the general case. It requires the master to
be active on several nodes at the same time, which is a more or less
exceptional scenario. Stop/start, on the other hand, always works.
___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] HA problem: No live migration when setting node on standby

2023-04-14 Thread Andrei Borzenkov
On Fri, Apr 14, 2023 at 11:45 AM Philip Schiller
 wrote:
>
> I would like to know if the order constraint  Mandatory: ms-drbd_fs:promote drbd_vm>
> is equivalent to: "First promote ms-drbd_fs then start drbd_vm".
>

No, it is not. It is equivalent to

order drbd_vm_after_drbd_fs Mandatory: ms-drbd_fs:promote drbd_vm:promote

which is effectively ignored (as drbd_vm is never promoted).

As far as I can tell, pacemaker simply does not support migration
together with demote/promote actions. I don't really know the reasons.
___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] HA problem: No live migration when setting node on standby

2023-04-14 Thread Andrei Borzenkov
On Fri, Apr 14, 2023 at 10:25 AM Philip Schiller
 wrote:
>
> Thank you very much Andrei,
> this completely solved my problem.
>

I am not sure I understand that (and as you do not quote the previous
posts you are replying to it is not even clear what you are talking
about).


> To point out, when I use the command:
>
> pcs constraint order promote mas-drbd-jabber pri-vm-jabber
>
> it sets the pri-vm action to start as default -> order 
> order-mas-drbd-jabber-pri-vm-jabber-mandatory mas-drbd-jabber:promote 
> pri-vm-jabber:start
>

Which is the correct ordering constraint in this case.

> So people can quickly run into this behavior without realising it.
>
> From my point of view this topic is FIXED.
>

As you did not even bother to explain what you did to fix it, how is
it going to help anyone?

> I am new to this Mailing list, so I am not sure if I have to mark it as fixed.
> Also I hope that I used the Mailing List correctly as I didn't really reply 
> to answers. Instead I wrote new Mails to users@... with the topic in CC.
>
> Can you elaborate a little bit more on the behavior of the order constraint 
> in the case where it is not working? [@Andrei Borzenkov]
> I failed to completely understand your explanation.
>
> With kind regards
> Philip.
>
___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] HA problem: No live migration when setting node on standby

2023-04-13 Thread Andrei Borzenkov

On 13.04.2023 22:24, Andrei Borzenkov wrote:
...



order drbd_vm_after_drbd_fs Mandatory: ms-drbd_fs:promote drbd_vm:start

After I added it back I get the same failed "demote" action.

Transition Summary:
* Stop   zfs_drbd_storage:0 (ha1 )  due to node
availability
* Stop   drbd_fs:0  ( Master ha1 )  due to node
availability
* Migratejust_vm( ha1 -> ha2 )
* Move   drbd_vm( ha1 -> ha2 )  due to unrunnable
ms-drbd_fs demote

I was sure that "start" is default anyway.


Scratch it. The default then-action is the value of "first-action", so without 
the explicit ":start" it ordered two "promote" actions, which does nothing. So 
we are back at square one - if there is an ordering constraint against the master/slave.

___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] HA problem: No live migration when setting node on standby

2023-04-13 Thread Andrei Borzenkov

On 12.04.2023 15:44, Philip Schiller wrote:

Here are also some Additional some additional information for a failover with 
setting the node standby.

Apr 12 12:40:28 s1 pacemaker-controld[1611990]:  notice: State transition S_IDLE 
-> S_POLICY_ENGINE
Apr 12 12:40:28 s1 pacemaker-schedulerd[1611989]:  notice: On loss of quorum: 
Ignore
Apr 12 12:40:28 s1 pacemaker-schedulerd[1611989]:  notice: Actions: Stop   
sto-ipmi-s0    (    s1 )  due to node 
availability
Apr 12 12:40:28 s1 pacemaker-schedulerd[1611989]:  notice: Actions: Stop   
pri-zfs-drbd_storage:0 (    s1 )  due to node 
availability
Apr 12 12:40:28 s1 pacemaker-schedulerd[1611989]:  notice: Actions: Stop   
pri-drbd-pluto:0   (   Promoted s1 )  due to node 
availability
Apr 12 12:40:28 s1 pacemaker-schedulerd[1611989]:  notice: Actions: Stop   
pri-drbd-poserver:0    (   Promoted s1 )  due to node 
availability
Apr 12 12:40:28 s1 pacemaker-schedulerd[1611989]:  notice: Actions: Stop   
pri-drbd-webserver:0   (   Promoted s1 )  due to node 
availability
Apr 12 12:40:28 s1 pacemaker-schedulerd[1611989]:  notice: Actions: Stop   
pri-drbd-dhcp:0    (   Promoted s1 )  due to node 
availability
Apr 12 12:40:28 s1 pacemaker-schedulerd[1611989]:  notice: Actions: Stop   
pri-drbd-wawi:0    (   Promoted s1 )  due to node 
availability
Apr 12 12:40:28 s1 pacemaker-schedulerd[1611989]:  notice: Actions: Stop   
pri-drbd-wawius:0  (   Promoted s1 )  due to node 
availability
Apr 12 12:40:28 s1 pacemaker-schedulerd[1611989]:  notice: Actions: Stop   
pri-drbd-saturn:0  (   Promoted s1 )  due to node 
availability
Apr 12 12:40:28 s1 pacemaker-schedulerd[1611989]:  notice: Actions: Stop   
pri-drbd-openvpn:0 (   Promoted s1 )  due to node 
availability
Apr 12 12:40:28 s1 pacemaker-schedulerd[1611989]:  notice: Actions: Stop   
pri-drbd-asterisk:0    (   Promoted s1 )  due to node 
availability
Apr 12 12:40:28 s1 pacemaker-schedulerd[1611989]:  notice: Actions: Stop   
pri-drbd-alarmanlage:0 (   Promoted s1 )  due to node 
availability
Apr 12 12:40:28 s1 pacemaker-schedulerd[1611989]:  notice: Actions: Stop   
pri-drbd-jabber:0  (   Promoted s1 )  due to node 
availability
Apr 12 12:40:28 s1 pacemaker-schedulerd[1611989]:  notice: Actions: Stop   
pri-drbd-TESTOPTIXXX:0 (   Promoted s1 )  due to node 
availability
Apr 12 12:40:28 s1 pacemaker-schedulerd[1611989]:  notice: Actions: Move   
pri-vm-jabber  (  s1 -> s0 )  due to unrunnable 
mas-drbd-jabber demote
Apr 12 12:40:28 s1 pacemaker-schedulerd[1611989]:  notice: Actions: Move   
pri-vm-alarmanlage (  s1 -> s0 )  due to unrunnable 
mas-drbd-alarmanlage demote


I had the same "unrunnable demote" yesterday when I tried to reproduce 
it, but I cannot reproduce it anymore. After some CIB modifications it 
works as expected.


Using the original execution date of: 2023-04-13 18:35:11Z
Current cluster status:
  * Node List:
* Node ha1: standby (with active resources)
* Online: [ ha2 qnetd ]

  * Full List of Resources:
* dummy_stonith (stonith:external/_dummy):   Started ha1
* Clone Set: cl-zfs_drbd_storage [zfs_drbd_storage]:
  * Started: [ ha1 ha2 ]
* Clone Set: ms-drbd_fs [drbd_fs] (promotable):
  * Masters: [ ha1 ha2 ]
* just_vm   (ocf::pacemaker:Dummy):  Started ha2
* drbd_vm   (ocf::pacemaker:Dummy):  Started ha1

Transition Summary:
  * Move   dummy_stonith  ( ha1 -> ha2 )
  * Stop   zfs_drbd_storage:0 (ha1 )  due to node 
availability
  * Stop   drbd_fs:0  ( Master ha1 )  due to node 
availability

  * Migratedrbd_vm( ha1 -> ha2 )

Executing Cluster Transition:
  * Resource action: dummy_stonith   stop on ha1
  * Pseudo action:   ms-drbd_fs_demote_0
  * Resource action: drbd_vm migrate_to on ha1
  * Resource action: dummy_stonith   start on ha2
  * Resource action: drbd_fs demote on ha1
  * Pseudo action:   ms-drbd_fs_demoted_0
  * Pseudo action:   ms-drbd_fs_stop_0
  * Resource action: drbd_vm migrate_from on ha2
  * Resource action: drbd_vm stop on ha1
  * Resource action: dummy_stonith   monitor=360 on ha2
  * Pseudo action:   cl-zfs_drbd_storage_stop_0
  * Resource action: drbd_fs stop on ha1
  * Pseudo action:   ms-drbd_fs_stopped_0
  * Pseudo action:   drbd_vm_start_0
  * Resource action: zfs_drbd_storage stop on ha1
  * Pseudo action:   cl-zfs_drbd_storage_stopped_0
  * Resource action: drbd_vm monitor=1 on ha2
Using the original execution date of: 2023-04-13 18:35:11Z

Revised Cluster Status:
  * Node List:
* Node ha1: standby
* Online: [ 

Re: [ClusterLabs] HA problem: No live migration when setting node on standby

2023-04-12 Thread Andrei Borzenkov
On Wed, Apr 12, 2023 at 1:21 PM Vladislav Bogdanov  wrote:
>
> Hi,
>
> Just add a Master role for drbd resource in the colocation. Default is 
> Started (or Slave).
>

Could you elaborate on why it is needed? The problem is not leaving the
resource on the node with a demoted instance - when the node goes into
standby, all resources must be evacuated from it anyway. How does
colocating the VM with the master change that?

>
> Philip Schiller  12 апреля 2023 г. 11:28:57 написал:
>>
>> 
>>
>> Hi All,
>>
>> I am using a simple two-nodes cluster with Zvol -> DRBD -> Virsh in
>> primary/primary mode (necessary for live migration).  My configuration:
>>
>> primitive pri-vm-alarmanlage VirtualDomain \
>> params config="/etc/libvirt/qemu/alarmanlage.xml" 
>> hypervisor="qemu:///system" migration_transport=ssh \
>> meta allow-migrate=true target-role=Started is-managed=true \
>> op monitor interval=0 timeout=120 \
>> op start interval=0 timeout=120 \
>> op stop interval=0 timeout=1800 \
>> op migrate_to interval=0 timeout=1800 \
>> op migrate_from interval=0 timeout=1800 \
>> utilization cpu=2 hv_memory=4096
>> ms mas-drbd-alarmanlage pri-drbd-alarmanlage \
>> meta clone-max=2 promoted-max=2 notify=true promoted-node-max=1 
>> clone-node-max=1 interleave=true target-role=Started is-managed=true
>> colocation colo_mas_drbd_alarmanlage_with_clo_pri_zfs_drbd-storage inf: 
>> mas-drbd-alarmanlage clo-pri-zfs-drbd_storage
>> location location-pri-vm-alarmanlage-s0-200 pri-vm-alarmanlage 200: s1
>> order ord_pri-alarmanlage-after-mas-drbd-alarmanlage Mandatory: 
>> mas-drbd-alarmanlage:promote pri-vm-alarmanlage:start
>>
>> So to summerize:
>> - A  resource for Virsh
>> - A Master/Slave DRBD ressources for the VM filesystem .
>> - a "order" directive to start the VM after drbd has been promoted.
>>
>> Node startup is ok, the VM is started after DRBD is promoted.
>> Migration with virsh or over crm  
>> works fine.
>>
>> Node standby is problematic. Assuming the Virsh VM runs on node s1 :
>>
>> When puting node s1 in standby when node s0 is active, a live migration
>> is started, BUT in the same second, pacemaker tries to demote DRBD
>> volumes on s1 (while live migration is in progress).
>>
>> All this results in "stopping the vm" on s1 and starting the "vm on s0".
>>
>> I do not understand why pacemaker does demote/stop DRBD volumes before VM is 
>> migrated.
>> Do i need additional constraints?
>>
>> Setup is done with
>> - Corosync Cluster Engine, version '3.1.6'
>> - Pacemaker 2.1.2
>> - Ubuntu 22.04.2 LTS
>>
>> Thanks for your help,
>>
>> with kind regards Philip
>>
___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] Location not working [FIXED]

2023-04-12 Thread Andrei Borzenkov
On Tue, Apr 11, 2023 at 6:27 PM Ken Gaillot  wrote:
>
> On Tue, 2023-04-11 at 17:31 +0300, Miro Igov wrote:
> > I fixed the issue by changing location definition from:
> >
> > location intranet-ip_on_any_nginx intranet-ip \
> > rule -inf: opa-nginx_1_active eq 0 \
> > rule -inf: opa-nginx_2_active eq 0
> >
> > To:
> >
> > location intranet-ip_on_any_nginx intranet-ip \
> > rule opa-nginx_1_active eq 1 \
> >rule opa-nginx_2_active eq 1
> >
> > Now it works fine and shows the constraint with: crm res constraint
> > intranet-ip
>
> Ah, I suspect the issue was that the original constraint compared only
> against 0, when initially (before the resources ever start) the
> attribute is undefined.
>

This does not really explain the original question. Apparently the
attribute *was* defined but somehow ignored.

Apr 10 12:11:02 intranet-test2 pacemaker-attrd[1511]:  notice: Setting
opa-nginx_1_active[intranet-test1]: 1 -> 0
...
  * intranet-ip (ocf::heartbeat:IPaddr2):Started intranet-test1
___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] NFS mount fails to stop if NFS server is lost

2023-04-11 Thread Andrei Borzenkov

On 11.04.2023 17:35, Miro Igov wrote:

Hello,

I have a node nas-sync-test1 with NFS server and NFS export running and
another node intranet-test1 with data_1 fs mount:

  


primitive data_1 Filesystem \

 params device="nas-sync-test1:/home/pharmya/NAS" fstype=nfs
options=v4 directory="/data/synology/pharmya_office/NAS_Sync/NAS" \

 op monitor interval=10s

  


Disconnecting nas-sync-test1 from the network shows it's state as UNCLEAN
and pacemaker fences it.

Then it tries to stop data_1 but it shows timeout error.

  


I know unmounting of NFS mount  when NFS server is gone requires force. Is
there such option in Filesystem RA ?



It does it unconditionally from the very beginning

# For networked filesystems, there's merit in trying -f:
case "$FSTYPE" in
nfs4|nfs|efs|cifs|smbfs) umount_force="-f" ;;
esac

But IIRC it is not enough (at least, not always) - so the trick is to 
set up the same IP address the server had. It can be an alias on a local 
client interface, it does not matter. I suppose the client gets stuck somewhere 
below the application layer, and having the IP back resets the TCP connection, 
allowing it to proceed.


I certainly had to do it in the past, and a simple "umount -f" did not work.
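Roughly like this (address and interface are made up; the only point is that the kernel can briefly reach "the server" again so the unmount can complete):

ip addr add 192.0.2.5/32 dev eth0      # the address the dead NFS server had
umount -f /data/synology/pharmya_office/NAS_Sync/NAS
ip addr del 192.0.2.5/32 dev eth0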
___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] Location not working

2023-04-10 Thread Andrei Borzenkov
On Mon, Apr 10, 2023 at 4:26 PM Ken Gaillot  wrote:
>
> On Mon, 2023-04-10 at 14:18 +0300, Miro Igov wrote:
> > Hello,
> > I have a resource with location constraint set to:
> >
> > location intranet-ip_on_any_nginx intranet-ip \
> > rule -inf: opa-nginx_1_active eq 0 \
> > rule -inf: opa-nginx_2_active eq 0
>
> You haven't specified a score for the constraint, so it defaults to 0,
> meaning the resource is allowed on those nodes but has no preference
> for them.
>

But each rule has score -INFINITY?

  

  


  

  

This exactly matches the example in the documentation where the score is moved
from rsc_location to the rule.

> >
> > In syslog I see the attribute transition:
> > Apr 10 12:11:02 intranet-test2 pacemaker-attrd[1511]:  notice:
> > Setting opa-nginx_1_active[intranet-test1]: 1 -> 0
> >
> > Current cluster status is :
> >
> > Node List:
> >   * Online: [ intranet-test1 intranet-test2 nas-sync-test1 nas-sync-
> > test2 ]
> >
> > * stonith-sbd (stonith:external/sbd):  Started intranet-test2
> >   * admin-ip(ocf::heartbeat:IPaddr2):Started nas-sync-
> > test2
> >   * cron_symlink(ocf::heartbeat:symlink):Started
> > intranet-test1
> >   * intranet-ip (ocf::heartbeat:IPaddr2):Started intranet-
> > test1
> >   * mysql_1 (systemd:mariadb@intranet-test1):Started
> > intranet-test1
> >   * mysql_2 (systemd:mariadb@intranet-test2):Started
> > intranet-test2
> >   * nginx_1 (systemd:nginx@intranet-test1):  Stopped
> >   * nginx_1_active  (ocf::pacemaker:attribute):  Stopped
> >   * nginx_2 (systemd:nginx@intranet-test2):  Started intranet-
> > test2
> >   * nginx_2_active  (ocf::pacemaker:attribute):  Started
> > intranet-test2
> >   * php_1   (systemd:php5.6-fpm@intranet-test1): Started
> > intranet-test1
> >   * php_2   (systemd:php5.6-fpm@intranet-test2): Started
> > intranet-test2
> >   * data_1  (ocf::heartbeat:Filesystem): Stopped
> >   * data_2  (ocf::heartbeat:Filesystem): Started intranet-
> > test2
> >   * nfs_export_1(ocf::heartbeat:exportfs):   Stopped
> >   * nfs_export_2(ocf::heartbeat:exportfs):   Started nas-
> > sync-test2
> >   * nfs_server_1(systemd:nfs-server@nas-sync-test1):
> > Stopped
> >   * nfs_server_2(systemd:nfs-server@nas-sync-test2):
> > Started nas-sync-test2
> >
> > Failed Resource Actions:
> >   * nfs_server_1_start_0 on nas-sync-test1 'error' (1): call=95,
> > status='complete', exitreason='', last-rc-change='2023-04-10 12:35:12
> > +02:00', queued=0ms, exec=209ms
> >
> >
> > Why intranet-ip is located on intranet-test1 while nginx_1_active is
> > 0 ?
> >
> > # crm res constraint intranet-ip
> >
> > cron_symlink
> > (score=INFINITY, id=c_cron_symlink_on_intranet-ip)
> > * intranet-ip
> >   : Node nas-sync-
> > test2
> > (score=-INFINITY, id=intranet-ip_loc-rule)
> >   : Node nas-sync-
> > test1
> > (score=-INFINITY, id=intranet-ip_loc-rule)
> >
> > Why no constraint entry for intranet-ip_on_any_nginx location ?
> >
> >
> >  This message has been sent as a part of discussion between PHARMYA
> > and the addressee whose name is specified above. Should you receive
> > this message by mistake, we would be most grateful if you informed us
> > that the message has been sent to you. In this case, we also ask that
> > you delete this message from your mailbox, and do not forward it or
> > any part of it to anyone else.
> > Thank you for your cooperation and understanding.
> >
> --
> Ken Gaillot 
___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] Location not working

2023-04-10 Thread Andrei Borzenkov
On Mon, Apr 10, 2023 at 2:19 PM Miro Igov  wrote:

> Hello,
>
> I have a resource with location constraint set to:
>
>
>
> location intranet-ip_on_any_nginx intranet-ip \
>
> rule -inf: opa-nginx_1_active eq 0 \
>
> rule -inf: opa-nginx_2_active eq 0
>
>
>
> In syslog I see the attribute transition:
>
> Apr 10 12:11:02 intranet-test2 pacemaker-attrd[1511]:  notice: Setting
> opa-nginx_1_active[intranet-test1]: 1 -> 0
>
>
>
> Current cluster status is :
>
>
>
> Node List:
>
>   * Online: [ intranet-test1 intranet-test2 nas-sync-test1 nas-sync-test2 ]
>
>
>
> * stonith-sbd (stonith:external/sbd):  Started intranet-test2
>
>   * admin-ip(ocf::heartbeat:IPaddr2):Started nas-sync-test2
>
>   * cron_symlink(ocf::heartbeat:symlink):Started
> intranet-test1
>
>   * intranet-ip (ocf::heartbeat:IPaddr2):Started intranet-test1
>
>   * mysql_1 (systemd:mariadb@intranet-test1):Started
> intranet-test1
>
>   * mysql_2 (systemd:mariadb@intranet-test2):Started
> intranet-test2
>
>   * nginx_1 (systemd:nginx@intranet-test1):  Stopped
>
>   * nginx_1_active  (ocf::pacemaker:attribute):  Stopped
>
>   * nginx_2 (systemd:nginx@intranet-test2):  Started intranet-test2
>
>   * nginx_2_active  (ocf::pacemaker:attribute):  Started
> intranet-test2
>
>   * php_1   (systemd:php5.6-fpm@intranet-test1): Started
> intranet-test1
>
>   * php_2   (systemd:php5.6-fpm@intranet-test2): Started
> intranet-test2
>
>   * data_1  (ocf::heartbeat:Filesystem): Stopped
>
>   * data_2  (ocf::heartbeat:Filesystem): Started intranet-test2
>
>   * nfs_export_1(ocf::heartbeat:exportfs):   Stopped
>
>   * nfs_export_2(ocf::heartbeat:exportfs):   Started
> nas-sync-test2
>
>   * nfs_server_1(systemd:nfs-server@nas-sync-test1): Stopped
>
>   * nfs_server_2(systemd:nfs-server@nas-sync-test2): Started
> nas-sync-test2
>
>
>
> Failed Resource Actions:
>
>   * nfs_server_1_start_0 on nas-sync-test1 'error' (1): call=95,
> status='complete', exitreason='', last-rc-change='2023-04-10 12:35:12
> +02:00', queued=0ms, exec=209ms
>
>
>
>
>
> Why intranet-ip is located on intranet-test1 while nginx_1_active is 0 ?
>
>
>
> # crm res constraint intranet-ip
>
>
> cron_symlink
> (score=INFINITY, id=c_cron_symlink_on_intranet-ip)
>
> * intranet-ip
>
>   : Node
> nas-sync-test2
> (score=-INFINITY, id=intranet-ip_loc-rule)
>
>   : Node
> nas-sync-test1
> (score=-INFINITY, id=intranet-ip_loc-rule)
>
>
>
> Why no constraint entry for intranet-ip_on_any_nginx location ?
>
>
>

It is impossible to answer based on the fragments of information you
provided. Full output of "crm config show" (or, even better, the full CIB) may
give some hint.
___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] Help with tweaking an active/passive NFS cluster

2023-04-05 Thread Andrei Borzenkov
On Wed, Apr 5, 2023 at 10:36 AM Andrei Borzenkov  wrote:
> but in your case members of set are on the same node

Are NOT on the same node of course.
___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] Help with tweaking an active/passive NFS cluster

2023-04-05 Thread Andrei Borzenkov
On Fri, Mar 31, 2023 at 12:42 AM Ronny Adsetts
 wrote:
>
> Hi,
>
> I wonder if someone more familiar with the workings of pacemaker/corosync 
> would be able to assist in solving an issue.
>
> I have a 3-node NFS cluster which exports several iSCSI LUNs. The LUNs are 
> presented to the nodes via multipathd.
>
> This all works fine except that I can't stop just one export. Sometimes I 
> need to take a single filesystem offline for maintenance for example. Or if 
> there's an issue and a filesystem goes offline and can't come back.
>
> There's a trimmed down config below but essentially I want all the NFS 
> exports on one node but I don't want any of the exports to block. So it's OK 
> to stop (or fail) a single export.
>
> My config has a group for each export and filesystem and another group for 
> the NFS server and VIP. I then co-locate them together.
>
> Cut-down config to limit the number of exports:
>
> node 1: nfs-01
> node 2: nfs-02
> node 3: nfs-03
> primitive NFSExportAdminHomes exportfs \
> params clientspec="172.16.40.0/24" options="rw,async,no_root_squash" 
> directory="/srv/adminhomes" fsid=dcfd1bbb-c026-4d6d-8541-7fc29d6fef1a \
> op monitor timeout=20 interval=10 \
> op_params interval=10
> primitive NFSExportArchive exportfs \
> params clientspec="172.16.40.0/24" options="rw,async,no_root_squash" 
> directory="/srv/archive" fsid=3abb6e34-bff2-4896-b8ff-fc1123517359 \
> op monitor timeout=20 interval=10 \
> op_params interval=10 \
> meta target-role=Started
> primitive NFSExportDBBackups exportfs \
> params clientspec="172.16.40.0/24" options="rw,async,no_root_squash" 
> directory="/srv/dbbackups" fsid=df58b9c0-593b-45c0-9923-155b3d7d9483 \
> op monitor timeout=20 interval=10 \
> op_params interval=10
> primitive NFSFSAdminHomes Filesystem \
> params device="/dev/mapper/adminhomes-part1" 
> directory="/srv/adminhomes" fstype=xfs \
> op start interval=0 timeout=120 \
> op monitor interval=60 timeout=60 \
> op_params OCF_CHECK_LEVEL=20 \
> op stop interval=0 timeout=240
> primitive NFSFSArchive Filesystem \
> params device="/dev/mapper/archive-part1" directory="/srv/archive" 
> fstype=xfs \
> op start interval=0 timeout=120 \
> op monitor interval=60 timeout=60 \
> op_params OCF_CHECK_LEVEL=20 \
> op stop interval=0 timeout=240 \
> meta target-role=Started
> primitive NFSFSDBBackups Filesystem \
> params device="/dev/mapper/dbbackups-part1" 
> directory="/srv/dbbackups" fstype=xfs \
> op start timeout=60 interval=0 \
> op monitor interval=20 timeout=40 \
> op stop timeout=60 interval=0 \
> op_params OCF_CHECK_LEVEL=20
> primitive NFSIP-01 IPaddr2 \
> params ip=172.16.40.17 cidr_netmask=24 nic=ens14 \
> op monitor interval=30s
> group AdminHomes NFSFSAdminHomes NFSExportAdminHomes \
> meta target-role=Started
> group Archive NFSFSArchive NFSExportArchive \
> meta target-role=Started
> group DBBackups NFSFSDBBackups NFSExportDBBackups \
> meta target-role=Started
> group NFSServerIP NFSIP-01 NFSServer \
> meta target-role=Started
> colocation NFSMaster inf: NFSServerIP AdminHomes Archive DBBackups

This is entirely equivalent to defining a group and says that
resources must be started in strict order on the same node. Like with
a group, if an earlier resource cannot be started, all following
resources are not started either.

> property cib-bootstrap-options: \
> have-watchdog=false \
> dc-version=2.0.1-9e909a5bdd \
> cluster-infrastructure=corosync \
> cluster-name=nfs-cluster \
> stonith-enabled=false \
> last-lrm-refresh=1675344768
> rsc_defaults rsc-options: \
> resource-stickiness=200
>
>
> The problem is that if one export fails, none of the following exports will 
> be attempted. Reading the docs, that's to be expected as each item in the 
> colocation needs the preceding item to succeed.
>
> I tried changing the colocation line like so to remove the dependency:
>
> colocation NFSMaster inf: NFSServerIP ( AdminHomes Archive DBBackups )
>

1. The ( AdminHomes Archive DBBackups ) creates a set with
sequential=false. Now, the documentation for "sequential" is one of
the most obscure I have seen, but judging by "the individual members
within any one set may or may not be colocated relative to each other
(determined by the set’s sequential property)" and "A colocated set
with sequential="false" makes sense only if there is another set in
the constraint. Otherwise, the constraint has no effect" members of a
set with sequential=false are not colocated on the same node.

2. The condition is backward. You colocate NFSServerIP *with* set (
AdminHomes Archive DBBackups ), while you actually want to colocate
set ( AdminHomes Archive DBBackups ) *with* NFSServerIP.
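One way to express the intent without fighting the resource-set semantics at all is three plain colocations, so each group follows NFSServerIP but the groups do not depend on each other (crm syntax, untested against your configuration):

colocation AdminHomes_with_nfs inf: AdminHomes NFSServerIP
colocation Archive_with_nfs    inf: Archive    NFSServerIP
colocation DBBackups_with_nfs  inf: DBBackups  NFSServerIP

If start order also matters, separate per-group order constraints can be added in the same spirit.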

So the

colocation 

[ClusterLabs] The latest shim from Leap 15.4 disallows shim from Tumbleweed and possibly other distributions

2023-04-01 Thread Andrei Borzenkov

https://forums.opensuse.org/t/after-a-shim-update-yesterday-no-longer-able-to-boot-with-secure-boot-enabled/165382/16

https://bugzilla.opensuse.org/show_bug.cgi?id=1209985

To explain.

There is a relatively new standard, SBAT, which makes it possible to "mass 
blacklist" EFI binaries that support it. In rough overview, each binary 
carries a "generation" number which is increased when a vulnerability is 
fixed. The system has a SbatLevel EFI variable which lists the minimal accepted 
generations. Any binary with a smaller generation is rejected (by 
shim). What is important is that shim also verifies itself when starting.


That is the general theory. Now comes the implementation - there is no tool 
to manage the SbatLevel variable directly - it is written by shim itself! At 
best you can select between two different values ("previous" and 
"latest") or reset this variable to some initial value. What "previous" 
and "latest" contain is entirely up to the shim developers/maintainers. 
The initial value is empty.


What happened now was the following:

1. The Leap 15.4 shim fixed some CVE and increased its own generation.
2. The current Leap shim automatically enforces the "latest" content of 
SbatLevel during installation, which now requires at least shim generation 2.
3. The shim in Tumbleweed supports SBAT on one hand, but on the other hand it 
has too old a generation and so refuses to run.


It also means that once you have booted the current Leap 15.4 shim at least once, 
you can no longer boot any installation image that includes a shim new 
enough to understand SBAT but old enough to have generation 1 (like 
the current Ubuntu shim :) ) with Secure Boot enabled.  It will simply turn 
the system off after showing an error message.


It is possible to mitigate it by using

mokutil --set-sbat-policy previous

*before* updating to the new shim. In this case the new shim will set the 
embedded previous policy, which only lists grub2.


Initial policy after reset (empty, only header):

sbat,1,2021030218

Policy after rebooting into new shim with "previous" set:

sbat,1,2022052400
grub,2

Policy after rebooting with new shim after default installation 
(implicit "latest"):


sbat,1,2022111500
shim,2
grub,3

As you see, even "previous" adds a minimal grub requirement which may 
block grub from other distributions (all to protect you of course :) ).


What is *NOT* possible is to tell shim to leave SbatLevel the hell alone 
on *my* system.


Frankly the implementation looks like security theater to me, but I am 
not a security expert ...

___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] resource cloned group colocations

2023-03-02 Thread Andrei Borzenkov
On Thu, Mar 2, 2023 at 4:16 PM Gerald Vogt  wrote:
>
> On 02.03.23 13:51, Klaus Wenninger wrote:
> > Now if I stop pacemaker on one of those nodes, e.g. on node ha2, it's
> > fine. ip2 will be moved immediately to ha3. Good.
> >
> > However, if pacemaker on ha2 starts up again, it will immediately
> > remove
> > ip2 from ha3 and keep it offline, while the services in the group are
> > starting on ha2. As the services unfortunately take some time to come
> > up, ip2 is offline for more than a minute.
> >
> > It seems the colocations with the clone are already good once the clone
> > group begins to start services and thus allows the ip to be removed
> > from
> > the current node.
> >
> >
> > To achieve this you have to add orders on top of collocations.
>
> I don't understand that.
>
> "order" and "colocation" are constraints. They work on resources.
>
> I don't see how I could add an order on top of a colocation constraint...
>

You cannot, but an asymmetrical serializing constraint may do it:

first start clone, then stop ip

When the new node comes up, pacemaker builds a transition which starts
the clone on the new node and moves the ip (stops it on the old node and
starts it on the new node). These actions are (should be) part of the same
transition, so serializing constraints should apply.
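Something along these lines, as a sketch only (resource names invented; double-check the kind/symmetrical semantics against the documentation for your version):

order ip_waits_for_clone Mandatory: cl-webservice:start ip2:stop symmetrical=false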
___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] Systemd resource started on node after reboot before cluster is stable ?

2023-02-15 Thread Andrei Borzenkov
On Wed, Feb 15, 2023 at 12:49 PM Adam Cecile  wrote:
>
> Hello,
>
> Just had some issue with unexpected server behavior after reboot. This node 
> was powered off, so cluster was running fine with this tomcat9 resource 
> running on a different machine.
>
> After powering on this node again, it briefly started tomcat before joining 
> the cluster and decided to stop it again. I'm not sure why.
>
>
> Here is the systemctl status tomcat9 on this host:
>
> tomcat9.service - Apache Tomcat 9 Web Application Server
>  Loaded: loaded (/lib/systemd/system/tomcat9.service; disabled; vendor 
> preset: enabled)
> Drop-In: /etc/systemd/system/tomcat9.service.d
>  └─override.conf
>  Active: inactive (dead)
>Docs: https://tomcat.apache.org/tomcat-9.0-doc/index.html
>
> Feb 15 09:43:27 server tomcat9[1398]: Starting service [Catalina]
> Feb 15 09:43:27 server tomcat9[1398]: Starting Servlet engine: [Apache 
> Tomcat/9.0.43 (Debian)]
> Feb 15 09:43:27 server tomcat9[1398]: [...]
> Feb 15 09:43:29 server systemd[1]: Stopping Apache Tomcat 9 Web Application 
> Server...
> Feb 15 09:43:29 server systemd[1]: tomcat9.service: Succeeded.
> Feb 15 09:43:29 server systemd[1]: Stopped Apache Tomcat 9 Web Application 
> Server.
> Feb 15 09:43:29 server systemd[1]: tomcat9.service: Consumed 8.017s CPU time.
>
> You can see it is disabled and should NOT be started

"Disabled" in systemd just means that links in [Install] section are
not present. This unit may be started by explicit request, or by
explicit dependency like Wants or Requires in another unit. Check
"systemctl show -p WantedBy -p RequiredBy tomcat9.service".

> with the same, start/stop is under Corosync control
>
>
> The systemd resource is defined like this:
>
> primitive tomcat9 systemd:tomcat9.service \
> op start interval=0 timeout=120 \
> op stop interval=0 timeout=120 \
> op monitor interval=60 timeout=100
>
>
> Any idea why this happened ?
>
> Best regards, Adam.
>
___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] Load balancing, of a sort

2023-01-25 Thread Andrei Borzenkov
On Wed, Jan 25, 2023 at 3:49 PM Antony Stone
 wrote:
>
> Hi.
>
> I have a corosync / pacemaker 3-node cluster with a resource group which can
> run on any node in the cluster.
>
> Every night a cron job on the node which is running the resources performs
> "crm_standby -v on" followed a short while later by "crm_standby -v off" in
> order to force the resources to migrate to another node member.
>
> We do this partly to verify that all nodes are capable of running the
> resources, and partly because some of those resources generate significant log
> files, and if one machine just keeps running them day after day, we run out of
> disk space (which effectively means we just need to add more capacity to the
> machines, which can be done, but at a cost).
>
> So long as a machine gets a day when it's not running the resources, a
> combination of migrating the log files to a central server, plus standard
> logfile rotation, takes care of managing the disk space.
>
> What I notice, though, is that two of the machines tend to swap the resources
> between them, and the third machine hardly ever becomes the active node.
>

Pacemaker simply checks each eligible node to see whether it can run a
resource, and I believe the order of the node list does not change (at
least as long as there is no join/leave event). So effectively the
resource just oscillates between the first two nodes in the list.

> Is there some way of influencing the node selection mechanism when resources
> need to move away from the currently active node, so that, for example, the
> least recently used node could be favoured over the rest?
>

I do not think pacemaker even knows which node is "the least recently
used"; it does not keep this history. You can add a rule to define a
location constraint based on some node attribute(s) and set this
attribute in the same script where you call crm_standby. E.g. you
could set a timestamp on the node where the resource is currently
active before doing crm_standby and select the node with the oldest
timestamp (I do not think pacemaker supports such a computation in its
rules).
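A very rough sketch of what the nightly cron job could do instead (the attribute name, the group name and the selection of $target are all made up for illustration):

# on the node currently running the resources, record when it was last active
crm_attribute --node "$(uname -n)" --name last-active --update "$(date +%s)" --lifetime forever
# then pick the node with the oldest (or missing) last-active value as $target
crm_resource --move --resource my-group --node "$target"
sleep 300
crm_resource --clear --resource my-group     # remove the temporary constraint again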
___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] Failed 'virsh' call when test RA run by crm_resource (con't)

2023-01-12 Thread Andrei Borzenkov
On Thu, Jan 12, 2023 at 12:50 PM Keisuke MORI  wrote:
>
> Hi,
>
> Just a guess but could it be the same issue with this?
>
> https://serverfault.com/questions/1105733/virsh-command-hangs-when-script-runs-in-the-background
>

That is exactly the same issue. The reason for SIGTTOU is explained in
https://gitlab.com/libvirt/libvirt/-/issues/366#note_1102131966:

most likely due to pkttyagent wanting to change the terminal mode to
read the password

and pkttyagent gets SIGTTOU immediately after trying to set terminal mode.
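If that is indeed the cause, a possible mitigation (untested in this setup, commands shown only as an illustration) is to make sure virsh has no controlling terminal to fiddle with when called from the agent, or that polkit never needs to prompt at all, e.g.:

# detach the call from the controlling terminal
virsh --connect qemu:///system list --all </dev/null
# or run it in its own session
setsid virsh --connect qemu:///system list --all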



> Jan 12, 2023 (Thu) 15:36 Madison Kelly :
> >
> > On 2023-01-12 01:26, Reid Wahl wrote:
> > > On Wed, Jan 11, 2023 at 10:21 PM Madison Kelly  wrote:
> > >>
> > >> On 2023-01-12 01:12, Reid Wahl wrote:
> > >>> On Wed, Jan 11, 2023 at 8:11 PM Madison Kelly  
> > >>> wrote:
> > 
> >  Hi all,
> > 
> >   There was a lot of sub-threads, so I figured it's helpful to 
> >  start a
> >  new thread with a summary so far. For context; I have a super simple
> >  perl script that pretends to be an RA for the sake of debugging.
> > 
> >  https://pastebin.com/9z314TaB
> > 
> >   I've had variations log environment variables and confirmed that 
> >  all
> >  the variables in the direct call that work are in the crm_resource
> >  triggered call. There are no selinux issues logged in audit.log and
> >  selinux is permissive. The script logs the real and effective UID and
> >  GID and it's the same in both instances. Calling other shell programs
> >  (tested with 'hostname') run fine, this is specifically crm_resource ->
> >  test RA -> virsh call.
> > 
> >   I ran strace on the virsh call from inside my test script 
> >  (changing
> >  'virsh.good' to 'virsh.bad' between running directly and via
> >  crm_resource. The strace runs made six files each time. Below are
> >  pastebin links with the outputs of the six runs in one paste, but each
> >  file's output is in it's own block (search for file: to see the
> >  different file outputs)
> > 
> >  Good/direct run of the test RA:
> >  - https://pastebin.com/xtqe9NSG
> > 
> >  Bad/crm_resource triggered run of the test RA:
> >  - https://pastebin.com/vBiLVejW
> > 
> >  Still absolutely stumped.
> > >>>
> > >>> The strace outputs show that your bad runs are all getting stopped
> > >>> with SIGTTOU. If you've never heard of that, me either.
> > >>
> > >> The hell?! This is new to me also.
> > >>
> > >>> https://www.gnu.org/software/libc/manual/html_node/Job-Control-Signals.html
> > >>>
> > >>> Macro: int SIGTTOU
> > >>>
> > >>>   This is similar to SIGTTIN, but is generated when a process in a
> > >>> background job attempts to write to the terminal or set its modes.
> > >>> Again, the default action is to stop the process. SIGTTOU is only
> > >>> generated for an attempt to write to the terminal if the TOSTOP output
> > >>> mode is set; see Output Modes.
> > >>>
> > >>>
> > >>> Maybe this has something to do with the buffer settings in the perl
> > >>> script(?). It might be worth trying a version that doesn't fiddle with
> > >>> the outputs and buffer settings.
> > >>
> > >> I tried removing the $|, and then I changed the script to be entirely a
> > >> bash script, still hanging. I tried 'virsh --connect  list
> > >> --all' where method was qemu:///system, qemu:///session, and
> > >> ssh+qemu:///root@localhost/system, all hang. In bash or perl.
> > >>
> > >>> I don't know which difference between your environment and mine is
> > >>> relevant here, such that I can't reproduce the issue using your test
> > >>> script. It works perfectly fine for me.
> > >>>
> > >>> Can you run `stty -a | grep tostop`? If there's a minus sign
> > >>> ("-tostop"), it's disabled; if it's present without a minus sign
> > >>> ("tostop"), it's enabled, as best I can tell.
> > >>
> > >> -tostop is there
> > >>
> > >> 
> > >> [root@mk-a07n02 ~]# stty -a | grep tostop
> > >> isig icanon iexten echo echoe echok -echonl -noflsh -xcase -tostop 
> > >> -echoprt
> > >> [root@mk-a07n02 ~]#
> > >> 
> > >>
> > >>> I'm just spitballing here. It's disabled by default on my machine...
> > >>> but even when I enable it, crm_resource --validate works fine. It may
> > >>> be set differently when running under crm_resource.
> > >>
> > >> How do you enable it?
> > >
> > > With `stty tostop`
> > >
> > > It's 100% possible that this whole thing is a red herring by the way.
> > > I'm looking for anything that might explain the discrepancy. SIGTTOU
> > > may not be directly tied to the root cause.
> >
> > Appreciate the stab, didn't stop the hang though :(
> >
> > --
> > Madison Kelly
> > Alteeve's Niche!
> > Chief Technical Officer
> > c: +1-647-471-0951
> > https://alteeve.com/
> >
> > ___
> > Manage your subscription:
> > https://lists.clusterlabs.org/mailman/listinfo/users
> >
> > ClusterLabs home: 

Re: [ClusterLabs] Antw: [EXT] Re: Stonith

2022-12-20 Thread Andrei Borzenkov
On Tue, Dec 20, 2022 at 10:07 AM Ulrich Windl
 wrote:
> >
> > But keep in mind that if the whole site is down (or unaccessible) you
> > will not have access to IPMI/PDU/whatever on this site so your stonith
> > agents will fail ...
>
> But, considering the design, such site won't have a quorum and should commit 
> suicide, right?
>

Not by default.
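
If that behaviour is wanted, it has to be configured explicitly, roughly
like this (sketch; the value may be spelled differently on other versions):

  # make partitions without quorum fence themselves instead of the
  # default of just stopping resources
  pcs property set no-quorum-policy=suicide
  # or with crmsh:
  crm configure property no-quorum-policy=suicide

This only makes sense together with working self-fencing (e.g. a hardware
watchdog with SBD).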
___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] Stonith

2022-12-19 Thread Andrei Borzenkov
On Mon, Dec 19, 2022 at 4:01 PM Antony Stone
 wrote:
>
> On Monday 19 December 2022 at 13:55:45, Andrei Borzenkov wrote:
>
> > On Mon, Dec 19, 2022 at 3:44 PM Antony Stone
> >
> >  wrote:
> > > So, do I simply create one stonith resource for each server, and rely on
> > > some other random server to invoke it when needed?
> >
> > Yes, this is the most simple approach. You need to restrict this
> > stonith resource to only one cluster node (set pcmk_host_list).
>
> So, just to be clear, I create one stonith resource for each machine which
> needs to be able to be shut down by some other server?
>

Correct.

> I ask simply because the acronym stonith refers to "the other node", so it
> sounds to me more like something I need to define so that a working machine 
> can
> kill another one.
>

Yes, you define a stonith resource that can kill node A, and nodes B,
C, D, ... will use this resource to kill A when needed. As long as
your stonith resource can actually work from any node, it does not matter
which one does the killing. You can restrict which nodes can use
this stonith agent with the usual location constraints if necessary.
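
For example (sketch, resource and node names invented), if node A itself
should never be the one executing the device that kills it:

  pcs constraint location fence-nodeA avoids nodeA
  # or, equivalently, with crmsh:
  crm resource ban fence-nodeA nodeA

This is optional; by default any node may use any registered fencing device.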

But keep in mind that if the whole site is down (or unaccessible) you
will not have access to IPMI/PDU/whatever on this site so your stonith
agents will fail ...
___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] Stonith

2022-12-19 Thread Andrei Borzenkov
On Mon, Dec 19, 2022 at 3:44 PM Antony Stone
 wrote:
>
> So, do I simply create one stonith resource for each server, and rely on some
> other random server to invoke it when needed?
>

Yes, this is the simplest approach. You need to restrict this
stonith resource to only one cluster node (set pcmk_host_list).
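
A sketch of that layout with an IPMI agent (addresses and credentials are
invented; parameter names vary a bit between fence-agents versions):

  pcs stonith create fence-nodeA fence_ipmilan ip=10.0.0.11 \
      username=admin password=secret lanplus=1 pcmk_host_list=nodeA
  pcs stonith create fence-nodeB fence_ipmilan ip=10.0.0.12 \
      username=admin password=secret lanplus=1 pcmk_host_list=nodeB

pcmk_host_list tells pacemaker which node each device is able to fence;
any other node can then use that device when the target needs to be killed.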

> Or do I in fact create one stonith resource for each server, and that resource
> then means that this server can shut down any other server?
>

If a stonith agent supports mapping between cluster nodes and IP
addresses (or whatever is needed to identify the correct instance to
kill the selected cluster node), this would be an option. I do not
think either the ssh or the IPMI agents support it, though.
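
For agents that do support it, the mapping is expressed with
pcmk_host_map, roughly like this (sketch with an SNMP-controlled PDU,
values invented):

  pcs stonith create fence-pdu fence_apc_snmp ip=pdu.example.com \
      pcmk_host_map="nodeA:1;nodeB:2;nodeC:3"

i.e. one device can fence several nodes because it knows which outlet
(or port, or instance) belongs to which cluster node.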

> Or, do I need to create 6 x 7 = 42 stonith resources so that any machine can
> shut down any other?
>

No, that is not needed; by default any node can use any stonith agent.
___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] [External] : Re: Fence Agent tests

2022-11-09 Thread Andrei Borzenkov
On Mon, Nov 7, 2022 at 5:07 PM Robert Hayden  wrote:
>
>
> > -Original Message-
> > From: Users  On Behalf Of Valentin Vidic
> > via Users
> > Sent: Sunday, November 6, 2022 5:20 PM
> > To: users@clusterlabs.org
> > Cc: Valentin Vidić 
> > Subject: Re: [ClusterLabs] [External] : Re: Fence Agent tests
> >
> > On Sun, Nov 06, 2022 at 09:08:19PM +, Robert Hayden wrote:
> > > When SBD_PACEMAKER was set to "yes", the lack of network connectivity
> > to the node
> > > would be seen and acted upon by the remote nodes (evicts and takes
> > > over ownership of the resources).  But the impacted node would just
> > > sit logging IO errors.  Pacemaker would keep updating the /dev/watchdog
> > > device so SBD would not self evict.   Once I re-enabled the network, then
> > the
> >
> > Interesting, not sure if this is the expected behaviour based on:
> >
> > https://urldefense.com/v3/__https://lists.clusterlabs.org/pipermail/users/2
> > 017-
> > August/022699.html__;!!ACWV5N9M2RV99hQ!IvnnhGI1HtTBGTKr4VFabWA
> > LeMfBWNhcS0FHsPFHwwQ3Riu5R3pOYLaQPNia-
> > GaB38wRJ7Eq4Q3GyT5C3s8y7w$
> >
> > Does SBD log "Majority of devices lost - surviving on pacemaker" or
> > some other messages related to Pacemaker?
>
> Yes.
>
> >
> > Also what is the status of Pacemaker when the network is down? Does it
> > report no quorum or something else?
> >
>
> Pacemaker on the failing node shows quorum even though it has lost
> communication to the Quorum Device and to the other node in the cluster.
> The non-failing node of the cluster can see the Quorum Device system and
> thus correctly determines to fence the failing node and take over its
> resources.
>
> Only after I run firewall-cmd --panic-off, will the failing node start to log
> messages about loss of TOTEM and getting a new consensus with the
> now visible members.
>

Where exactly do you use firewalld panic mode? You have hosts, you
have a VM, you have a qnetd node ...

Have you verified that the network is blocked bidirectionally? I have had
rather mixed experience with asymmetric firewalls, which resembles
your description.

Also, it may depend on the corosync driver in use.
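
A quick way to check both directions would be something like (sketch,
placeholders to be filled in):

  # on the supposedly isolated node
  firewall-cmd --query-panic      # is panic mode really active here?
  corosync-cfgtool -s             # does corosync still consider any link up?
  # and from one of the surviving nodes
  ping -c1 <isolated-node>

If the isolated node still sees traffic in one direction, corosync may
never declare the membership change you expect.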

> I think all of that explains the lack of self-fencing when the sbd setting of
> SBD_PACEMAKER=yes is used.
>

Correct. This means that at least under some conditions
pacemaker/corosync fail to detect isolation.
___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] Fence Agent tests

2022-11-05 Thread Andrei Borzenkov

On 04.11.2022 23:46, Robert Hayden wrote:

I am working on a Fencing agent for the Oracle Cloud Infrastructure (OCI) 
environment to complete power fencing of compute instances.  The only fencing 
setups I have seen for OCI are using SBD, but that is insufficient with full 
network interruptions since OCI uses iSCSI to write/read to the SBD disk.



Out of curiosity - why is it insufficient? If a cluster node is completely 
isolated, it should commit suicide. If the host where the cluster node is 
running is completely isolated, then you cannot do anything with this 
host anyway.


I am not familiar with the OCI architecture, so I may be missing something 
obvious here.



___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] Cluster does not start resources

2022-08-23 Thread Andrei Borzenkov
On 24.08.2022 08:13, Lentes, Bernd wrote:
> 
> 
> - On 24 Aug, 2022, at 07:03, arvidjaar arvidj...@gmail.com wrote:
> 
>> On 24.08.2022 07:34, Lentes, Bernd wrote:
>>>
>>>
>>> - On 24 Aug, 2022, at 05:33, Reid Wahl nw...@redhat.com wrote:
>>>
>>>
 The stop-all-resources cluster property is set to true. Is that 
 intentional?
>>> OMG. Thanks Reid !
>>>
>>> But unfortunately not all virtual domains are running:
>>>
>>
>> what exactly is not clear in this output? All these resources are
>> explicitly disabled (target-role=stopped) and so will not be started.
>>
> That's clear. But a manual "crm resource start virtual_domain" should start 
> them,
> but it doesn't.
> 

There is no resource with the name "virtual_domain" in your list. All
non-active resources in your list are either disabled or unmanaged.
Without the actual commands that show the resource state before "crm
resource start", the "crm resource start" command itself, and the resource
state again after it, any answer will be just a wild guess.
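
Something along these lines (sketch, using one of the disabled VMs from
the list) would show what actually happens:

  crm resource status vm-geneious     # state before
  crm resource start vm-geneious      # should just set target-role=Started
  crm resource status vm-geneious     # state after
  # the "unmanaged" ones additionally need
  crm resource manage vm_crispor

plus the relevant part of the pacemaker log from the DC.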
___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] Cluster does not start resources

2022-08-23 Thread Andrei Borzenkov
On 24.08.2022 07:34, Lentes, Bernd wrote:
> 
> 
> - On 24 Aug, 2022, at 05:33, Reid Wahl nw...@redhat.com wrote:
> 
> 
>> The stop-all-resources cluster property is set to true. Is that intentional?
> OMG. Thanks Reid ! 
> 
> But unfortunately not all virtual domains are running:
> 

What exactly is not clear in this output? All these resources are
explicitly disabled (target-role=stopped) and so will not be started.

> Stack: corosync
> Current DC: ha-idg-2 (version 
> 1.1.24+20210811.f5abda0ee-3.21.9-1.1.24+20210811.f5abda0ee) - partition with 
> quorum
> Last updated: Wed Aug 24 06:14:37 2022
> Last change: Wed Aug 24 06:04:24 2022 by root via cibadmin on ha-idg-1
> 
> 2 nodes configured
> 40 resource instances configured (21 DISABLED)
> 
> Node ha-idg-1: online
> fence_ilo_ha-idg-2  (stonith:fence_ilo2):   Started fenct 
> ha-idg-2 mit ILO
> dlm (ocf::pacemaker:controld):  Started
> clvmd   (ocf::heartbeat:clvm):  Started
> vm-mausdb   (ocf::lentes:VirtualDomain):Started
> fs_ocfs2(ocf::lentes:Filesystem.new):   Started
> vm-nc-mcd   (ocf::lentes:VirtualDomain):Started
> fs_test_ocfs2   (ocf::lentes:Filesystem.new):   Started
> gfs2_snap   (ocf::heartbeat:Filesystem):Started
> gfs2_share  (ocf::heartbeat:Filesystem):Started
> Node ha-idg-2: online
> fence_ilo_ha-idg-1  (stonith:fence_ilo4):   Started fenct 
> ha-idg-1 mit ILO
> clvmd   (ocf::heartbeat:clvm):  Started
> dlm (ocf::pacemaker:controld):  Started
> vm-sim  (ocf::lentes:VirtualDomain):Started
> gfs2_snap   (ocf::heartbeat:Filesystem):Started
> fs_ocfs2(ocf::lentes:Filesystem.new):   Started
> gfs2_share  (ocf::heartbeat:Filesystem):Started
> vm-seneca   (ocf::lentes:VirtualDomain):Started
> vm-ssh  (ocf::lentes:VirtualDomain):Started
> 
> Inactive resources:
> 
>  Clone Set: ClusterMon-clone [ClusterMon-SMTP]
>  Stopped (disabled): [ ha-idg-1 ha-idg-2 ]
> vm-geneious (ocf::lentes:VirtualDomain):Stopped (disabled)
> vm-idcc-devel   (ocf::lentes:VirtualDomain):Stopped (disabled)
> vm-genetrap (ocf::lentes:VirtualDomain):Stopped (disabled)
> vm-mouseidgenes (ocf::lentes:VirtualDomain):Stopped (disabled)
> vm-greensql (ocf::lentes:VirtualDomain):Stopped (disabled)
> vm-severin  (ocf::lentes:VirtualDomain):Stopped (disabled)
> ping_19216810010(ocf::pacemaker:ping):  Stopped (disabled)
> ping_19216810020(ocf::pacemaker:ping):  Stopped (disabled)
> vm_crispor  (ocf::heartbeat:VirtualDomain): Stopped (unmanaged)
> vm-dietrich (ocf::lentes:VirtualDomain):Stopped (disabled)
> vm-pathway  (ocf::lentes:VirtualDomain):Stopped (disabled)
> vm-crispor-server   (ocf::lentes:VirtualDomain):Stopped (disabled)
> vm-geneious-license (ocf::lentes:VirtualDomain):Stopped (disabled)
> vm-amok (ocf::lentes:VirtualDomain):Stopped (disabled)
> vm-geneious-license-mcd (ocf::lentes:VirtualDomain):Stopped (disabled)
> vm-documents-oo (ocf::lentes:VirtualDomain):Stopped (disabled)
> vm_snipanalysis (ocf::lentes:VirtualDomain):Stopped (disabled, unmanaged)
> vm-photoshop(ocf::lentes:VirtualDomain):Stopped (disabled)
> vm-check-mk (ocf::lentes:VirtualDomain):Stopped (disabled)
> vm-encore   (ocf::lentes:VirtualDomain):Stopped (disabled)
> 
> Migration Summary:
> * Node ha-idg-1:
> * Node ha-idg-2:
> 
> Also a manual "crm resource start" wasn't successfull.
> 
> Bernd
> 
> 
> ___
> Manage your subscription:
> https://lists.clusterlabs.org/mailman/listinfo/users
> 
> ClusterLabs home: https://www.clusterlabs.org/

___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] Start resource only if another resource is stopped

2022-08-18 Thread Andrei Borzenkov
On 17.08.2022 16:58, Miro Igov wrote:
> As you guessed i am using crm res stop nfs_export_1. 
> I tried the solution with attribute and it does not work correct.
> 

It does what you asked for originally, but you are shifting the
goalposts ...

> When i stop nfs_export_1 it stops data_1 data_1_active, then it starts
> data_2_failover - so far so good.
> 
> When i start nfs_export_1 it starts data_1, starts data_1_active and then
> stops data_2_failover as result of order data_1_active_after_data_1 and
> location data_2_failover_if_data_1_inactive.
> 
> But stopping data_2_failover unmounts the mount and end result is having no
> NFS export mounted:
> 

Nowhere before did you mention that you have two resources managing the
same mount point.

...
> Aug 17 15:24:52 intranet-test1 Filesystem(data_1)[16382]: INFO: Running
> start for nas-sync-test1:/home/pharmya/NAS on
> /data/synology/pharmya_office/NAS_Sync/NAS
> Aug 17 15:24:52 intranet-test1 Filesystem(data_1)[16382]: INFO: Filesystem
> /data/synology/pharmya_office/NAS_Sync/NAS is already mounted.
...
> Aug 17 15:24:52 intranet-test1 Filesystem(data_2_failover)[16456]: INFO:
> Trying to unmount /data/synology/pharmya_office/NAS_Sync/NAS
> Aug 17 15:24:52 intranet-test1 systemd[1]:
> data-synology-pharmya_office-NAS_Sync-NAS.mount: Succeeded.

This configuration is wrong - period. The Filesystem agent's monitor action
checks whether the mountpoint is mounted, so pacemaker cannot determine
which of the two resources is actually started. You may get away with it
because by default pacemaker does not run a recurring monitor for inactive
resources, but any probe will give wrong results.

It is almost always wrong to have multiple independent pacemaker
resources managing the same underlying physical resource.

It looks like you are attempting to reimplement a highly available NFS
server on the client side. If you insist on this, the only solution I see
is a separate resource agent that monitors the state of the export/data
resources and sets a node attribute accordingly. But effectively you will
be duplicating pacemaker logic.
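
A very rough sketch of what that could look like (all names invented): the
helper agent maintains a node attribute, and the failover mount is only
allowed where the primary mount is known to be down:

  # set by the helper when data_1 stops / starts on this node
  attrd_updater -n data_1_up -U 0
  attrd_updater -n data_1_up -U 1

  # and a rule-based constraint for the failover resource
  crm configure location data_2_failover_rule data_2_failover \
      rule -inf: data_1_up eq 1

That is exactly the kind of logic pacemaker normally handles for you,
which is why I would not recommend it.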
___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] Start resource only if another resource is stopped

2022-08-11 Thread Andrei Borzenkov
On 11.08.2022 17:34, Miro Igov wrote:
> Hello,
> 
> I am trying to create failover resource that would start if another resource
> is stopped and stop when the resource is started back.
> 
> It is 4 node cluster (with qdevice) where nodes are virtual machines and two
> of them are hosted in a datacenter and the other 2 VMs in another
> datacenter.
> 
> Names of the nodes are:
> 
> nas-sync-test1
> 
> intranet-test1
> 
> nas-sync-test2
> 
> intranet-test2
> 
> The nodes ending with 1 are hosted in same datacenter and ending in 2 are in
> the other datacenter.
> 
>  
> 
> nas-sync-test* nodes are running NFS servers and exports:
> 
> nfs_server_1, nfs_export_1 (running on nas-sync-test1)
> 
> nfs_server_2, nfs_export_2 (running on nas-sync-test2)
> 
>  
> 
> intranet-test1 is running NFS mount data_1 (mounting the nfs_export_1),
> intranet-test2 is running data_2 (mounting nfs_export_2).
> 
> I created data_1_failover which is mounting the nfs_export_1 too and would
> like to be running on intranet-test2 ONLY if data_2 is down. So the idea is
> it mounts nfs_export_1 on intranet-test2 only when the local mount data_2 is
> stopped (note the nfs_server_1 runs on one datacenter and intranet-test2 in
> the another DC)
> 
> Also created data_2_failover with the same purpose as data_1_failover.
> 
>  
> 
> I would like to ask how to set the failover mounts automatically start when
> ordinary mounts stop?
> 
>  
> 
> Current configuration of the constraints:
> 
>  
> 
> tag all_mounts data_1 data_2 data_1_failover data_2_failover
> 
> tag sync_1 nfs_server_1 nfs_export_1
> 
> tag sync_2 nfs_server_2 nfs_export_2
> 
> location deny_data_1 data_1 -inf: intranet-test2
> 
> location deny_data_2 data_2 -inf: intranet-test1
> 
> location deny_failover_1 data_1_failover -inf: intranet-test1
> 
> location deny_failover_2 data_2_failover -inf: intranet-test2
> 
> location deny_sync_1 sync_1 \
> 
> rule -inf: #uname ne nas-sync-test1
> 
> location deny_sync_2 sync_2 \
> 
> rule -inf: #uname ne nas-sync-test2
> 
> location mount_on_intranet all_mounts \
> 
> rule -inf: #uname eq nas-sync-test1 or #uname eq nas-sync-test2
> 
>  
> 
> colocation nfs_1 inf: nfs_export_1 nfs_server_1
> 
> colocation nfs_2 inf: nfs_export_2 nfs_server_2
> 
>  
> 
> order nfs_server_export_1 Mandatory: nfs_server_1 nfs_export_1
> 
> order nfs_server_export_2 Mandatory: nfs_server_2 nfs_export_2
> 
> order mount_1 Mandatory: nfs_export_1 data_1
> 
> order mount_1_failover Mandatory: nfs_export_1 data_1_failover
> 
> order mount_2 Mandatory: nfs_export_2 data_2
> 
> order mount_2_failover Mandatory: nfs_export_2 data_2_failover
> 
>  
> 
>  
> 
> I tried adding following colocation:
> 
>colocation failover_1 -inf: data_2_failover data_1
> 

This colocation does not say "start data_2_failover when data_1 is
stopped". This colocation says "do not allocate data_2_failover to the
same node where data_1 is already allocated". There is a difference
between "resource A can run on node N" and "resource A is active on node N".

> and it is stopping data_2_failover when data_1 is started, also it starts
> data_2_failover when data_1 is stopped - exactly as needed!
> 
> Full List of Resources:
> 
>   * admin-ip(ocf::heartbeat:IPaddr2):Started intranet-test2
> 
>   * stonith-sbd (stonith:external/sbd):  Started intranet-test1
> 
>   * nfs_export_1(ocf::heartbeat:exportfs):   Started
> nas-sync-test1
> 
>   * nfs_server_1(systemd:nfs-server):Started nas-sync-test1
> 
>   * nfs_export_2(ocf::heartbeat:exportfs):   Started
> nas-sync-test2
> 
>   * nfs_server_2(systemd:nfs-server):Started nas-sync-test2
> 
>   * data_1_failover (ocf::heartbeat:Filesystem): Started
> intranet-test2
> 
>   * data_2_failover (ocf::heartbeat:Filesystem): Stopped
> 
>   * data_2  (ocf::heartbeat:Filesystem): Started intranet-test2
> 
>   * data_1  (ocf::heartbeat:Filesystem): Started intranet-test1
> 
>  
> 

For the future - it is much better to simply copy and paste the actual
commands you used together with their output. While we may guess that you
used "crm resource stop" or an equivalent command, it is just a guess. Any
conclusion based on this guess will be wrong if we guessed wrong.

>  
> 
> Full List of Resources:
> 
>   * admin-ip(ocf::heartbeat:IPaddr2):Started intranet-test2
> 
>   * stonith-sbd (stonith:external/sbd):  Started intranet-test1
> 
>   * nfs_export_1(ocf::heartbeat:exportfs):   Started
> nas-sync-test1
> 
>   * nfs_server_1(systemd:nfs-server):Started nas-sync-test1
> 
>   * nfs_export_2(ocf::heartbeat:exportfs):   Started
> nas-sync-test2
> 
>   * nfs_server_2(systemd:nfs-server):Started nas-sync-test2
> 
>   * data_1_failover (ocf::heartbeat:Filesystem): Started
> intranet-test2
> 
>   * data_2_failover (ocf::heartbeat:Filesystem): Started
> intranet-test1
> 
>   * data_2  

Re: [ClusterLabs] Antw: [EXT] node1 and node2 communication time question

2022-08-10 Thread Andrei Borzenkov
On 10.08.2022 09:37, Ulrich Windl wrote:
> Unfortunately the documentation for fencing agents leaves verymuch to be 
> desired:
> When I tried to write one myself, I just stopped due to lack of details.
> 

It is not about writing your own agent but about using existing ones. There
are enough fencing agents for most common use cases; there is no need to
write anything from scratch.

Do not confuse newbies even more ...
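
For example, what is already packaged can be listed and inspected without
reading any source:

  pcs stonith list                      # agents pacemaker knows about
  pcs stonith describe fence_ipmilan    # parameters of one agent

(or look at the fence_* binaries shipped by the fence-agents package).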
___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] 2-Node Cluster - fencing with just one node running ?

2022-08-04 Thread Andrei Borzenkov
On 04.08.2022 16:06, Lentes, Bernd wrote:
> 
> - On 4 Aug, 2022, at 00:27, Reid Wahl nw...@redhat.com wrote:
> 
>>
>> Such constraints are unnecessary.
>>
>> Let's say we have two stonith devices called "fence_dev1" and
>> "fence_dev2" that fence nodes 1 and 2, respectively. If node 2 needs
>> to be fenced, and fence_dev2 is running on node 2, node 1 will still
>> use fence_dev2 to fence node 2. The current location of the stonith
>> device only tells us which node is running the recurring monitor
>> operation for that stonith device. The device is available to ALL
>> nodes, unless it's disabled or it's banned from a given node. So these
>> constraints serve no purpose in most cases.
> 
> Would do you mean by "banned" ? "crm resource ban ..." ?
> Is that something different than a location constraint ?
> 

"crm resource ban" creates location constraint, but not every location
constraint is created by "crm resource ban".
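
For illustration, something like

  crm resource ban dummy node1

ends up in the CIB as an ordinary location constraint, roughly (the
generated id may differ):

  location cli-ban-dummy-on-node1 dummy -inf: node1

whereas hand-written location constraints can of course look completely
different.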

>> If you ban fence_dev2 from node 1, then node 1 won't be able to use
>> fence_dev2 to fence node 2. Likewise, if you ban fence_dev1 from node
>> 1, then node 1 won't be able to use fence_dev1 to fence itself.
>> Usually that's unnecessary anyway, but it may be preferable to power
>> ourselves off if we're the last remaining node and a stop operation
>> fails.
> So banning a fencing device from a node means that this node can't use the 
> fencing device ?
>  

Correct. A node where the fencing device is not allowed cannot be selected
to use this fencing device to perform fencing.

___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] Antw: [EXT] Re: Q: About a false negative of storage_mon

2022-08-03 Thread Andrei Borzenkov
On 03.08.2022 09:02, Ulrich Windl wrote:
 Ken Gaillot  schrieb am 02.08.2022 um 16:09 in
> Nachricht
> <0a2125a43bbfc09d2ca5bad1a693710f00e33731.ca...@redhat.com>:
>> On Tue, 2022-08-02 at 19:13 +0900, 井上和徳 wrote:
>>> Hi,
>>>
>>> Since O_DIRECT is not specified in open() [1], it reads the buffer
>>> cache and
>>> may result in a false negative. I fear that this possibility
>>> increases
>>> in environments with large buffer cache and running disk-reading
>>> applications
>>> such as database.
>>>
>>> So, I think it's better to specify O_RDONLY|O_DIRECT, but what about
>>> it?
>>> (in this case, lseek() processing is unnecessary.)
>>>
>>> # I am ready to create a patch that works with O_DIRECT. Also, I
>>> wouldn't mind
>>> # a "change to add a new mode of inspection with O_DIRECT
>>> # (add a option to storage_mon) while keeping the current inspection
>>> process".
>>>
>>> [1] 
>>>
>>
> https://github.com/ClusterLabs/resource-agents/blob/main/tools/storage_mon.c#
> 
>> L47-L90
>>>
>>> Best Regards,
>>> Kazunori INOUE
>>
>> I agree, it makes sense to use O_DIRECT when available. I don't think
>> an option is necessary.
>>
>> However, O_DIRECT is not available on all OSes, so the configure script
>> should detect support. Also, it is not supported by all filesystems, so
>> if the open fails, we should retry without O_DIRECT.
> 
> I just looked it up: It seems POSIX has O_RSYNC and O_SYNC and O_DSYNC)
> instead.

That is something entirely different. O_SYNC etc. are about the *file system
level*, while O_DIRECT is about the *device* level. O_DIRECT makes the
process talk directly to the device. It is unclear whether this is a side
effect of the implementation or intentional.

> The buffer cache handling may be different though.
> 

Synchronous operation does not actually imply media access.

O_RSYNC: "the operation has been completed or diagnosed if unsuccessful.
The read is complete only when an image of the data has been
successfully transferred to the requesting process". Returning buffered
data satisfies this definition. Besides, Linux does not support O_RSYNC.

O_DSYNC: "the operation has been completed or diagnosed if unsuccessful.
The write is complete only when the data specified in the write request
is successfully transferred and all file system information required to
retrieve the data is successfully transferred". Writing to a journal
located on an external device seems to comply with this definition.

O_SYNC simply adds filesystem metadata update completion.

So no, O_SYNC & Co cannot replace O_DIRECT.
___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] Fencing for quorum device?

2022-07-15 Thread Andrei Borzenkov
On 15.07.2022 09:24, Viet Nguyen wrote:
> Hi,
> 
> I just wonder that do we need to have fencing for a quorum device? I have 2
> node cluster with one quorum device. Both 2 nodes have fencing agents.
> 
> But I wonder that should i define the fencing agent for quorum device or
> not? 

You cannot. The quorum device is not part of the pacemaker cluster, so
there is no way to initiate fencing of the quorum device.

> Just in case it is laggy...
> 
> Thank you so much!
> 
> Regards,
> Viet
> 
> 
> ___
> Manage your subscription:
> https://lists.clusterlabs.org/mailman/listinfo/users
> 
> ClusterLabs home: https://www.clusterlabs.org/

___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] Question regarding the security of corosync

2022-06-21 Thread Andrei Borzenkov
On 22.06.2022 02:27, Antony Stone wrote:
> On Friday 17 June 2022 at 11:39:14, Mario Freytag wrote:
> 
>> I’d like to ask about the security of corosync. We’re using a Proxmox HA
>> setup in our testing environment and need to confirm it’s compliance with
>> PCI guidelines.
>>
>> We have a few questions:
>>
>> Is the communication encrypted?
>> What method of encryption is used?
>> What method of authentication is used?
>> What is the recommended way of separation for the corosync network? VLAN?
> 
> Your first three questions are probably well-answered by 
> https://github.com/fghaas/corosync/blob/master/SECURITY
> 

This is a thirteen-year-old file which is not present in the current
corosync sources. I hesitate to use it as the answer to anything
*today*. If it is still relevant, why was it removed?

> For the fourth, I agree with Jan Friesse - a dedicated physical network is 
> best; a dedicated VLAN is second best.
> 
> 
> Antony.
> 

___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] related to fencing in general , docker containers

2022-06-19 Thread Andrei Borzenkov
On 19.06.2022 18:24, Sridhar K wrote:
> Team,
> 
> Trying to test docker resource agent , getting below error , any issues
> w.r.t below command ?
> 
> [root@5e8851dac338 /]# pcs resource create test_docker2
> ocf:heartbeat:docker image=nginx reuse=true  op monitor interval=30s
> 
> Docker container nginx is running
> [root@5e8851dac338 /]# docker ps
> CONTAINER ID   IMAGE COMMAND  CREATED STATUS
>   PORTSNAMES
> 3adb20c8a3fc   nginx "/docker-entrypoint.…"   2 minutes ago   Up 2
> minutes   0.0.0.0:80->80/tcp   mynginx1
> [
> 
> pcs status command :
> Failed Resource action
> * test_docker2_stop_0 on 450196e74dc0 'unknown error' (1): call=42,
> status=complete, exitreason='Docker service is in error state while
> checking for test_docker2, based on image, nginx: Client: Docker Engine -
> Community',

This error string is returned by the resource agent when "docker version"
returns 1.

> last-rc-change='Sun Jun 19 20:51:58 2022', queued=0ms, exec=97ms
> 
> regards
> Sridhar
> 
> 
> 
> 
> 
> 
> On Sun, 19 Jun 2022 at 15:40, Sridhar K  wrote:
> 
>> Thank you Ken,Andrei for the response
>>
>> As of now, I have completed the below  (below is for POC purpose) Ideally
>> there will be 3 VM's and each VM will have one docker container
>> [pacemaker + corosync + sqlServer] same network
>> 1.Created a docker image with [pacemaker + corosync + sqlServer]
>> 2. Created a docker network and brought up 3 containers of the above image
>> which are part of the created network
>> 3. Created  a pacemaker cluster out of 3 containers
>> 4. Create AG group of sqlserver
>> 5. Tested failover of Sqlserver , by making a container(i,e pcs cluster
>> standby container id) to standby , failover was working.
>>
>> All above was in a VM , now need to think w.r.t fencing
>> As the pacemaker is running within the docker container anything w.r.t
>> fencing may not help
>> next was checking, how about doing something w.r.t fencing at VM level so
>> that fencing can kill/restart the docker container as required.
>>
>> considering 3 VMs
>> If 1 VM itself goes down , SqlServer will be still up as two VM's are fine
>> and within them, docker containers (pacemaker + corosync + sqlServer) are
>> running
>> If 2 VM goes down , then SqlServer will not be available.
>>
>>
>> Related to Bundles
>> I tried, and got an error posted the same  in the forum, also in Bundles
>> required number of replicas will be spawned, How can I configure out of 3
>> replicas which will be my master Sqlserver and the rest will be replicas
>>
>>
>> Hi Ken,
>> will check this, if am able to fence docker containers(pacemaker +
>> corosync + sqlServer), it should be of some help, and will check how to
>> handle VM failures
>>
>> You
>>> will probably need one fencing agent for each physical host where
>>> docker
>>> is running and map cluster nodes (containers) to the correct agent
>>> (i.e.
>>> physical host).
>>
>> Regards
>> Sridhar
>>
>>
>>
>>
>> On Fri, 17 Jun 2022 at 20:29, Andrei Borzenkov 
>> wrote:
>>
>>> On 17.06.2022 16:53, Sridhar K wrote:
>>>> Hi Team,
>>>>
>>>> Please share any pointers, references, example usage's w.r.t fencing in
>>>> general and its use w.r.t docker containers.
>>>>
>>>> referring as of now
>>>> https://clusterlabs.org/pacemaker/doc/crm_fencing.html
>>>>
>>>> need to check the feasibility of fencing w.r.t docker containers
>>>>
>>>
>>> There is fence_docker, you will need to configure docker on each
>>> physical host to accept remote connections from each cluster node. You
>>> will probably need one fencing agent for each physical host where docker
>>> is running and map cluster nodes (containers) to the correct agent (i.e.
>>> physical host).
>>>
>>> ___
>>> Manage your subscription:
>>> https://lists.clusterlabs.org/mailman/listinfo/users
>>>
>>> ClusterLabs home: https://www.clusterlabs.org/
>>>
>>
> 
> 
> ___
> Manage your subscription:
> https://lists.clusterlabs.org/mailman/listinfo/users
> 
> ClusterLabs home: https://www.clusterlabs.org/

___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] related to fencing in general , docker containers

2022-06-17 Thread Andrei Borzenkov
On 17.06.2022 16:53, Sridhar K wrote:
> Hi Team,
> 
> Please share any pointers, references, example usage's w.r.t fencing in
> general and its use w.r.t docker containers.
> 
> referring as of now
> https://clusterlabs.org/pacemaker/doc/crm_fencing.html
> 
> need to check the feasibility of fencing w.r.t docker containers
> 

There is fence_docker; you will need to configure docker on each
physical host to accept remote connections from each cluster node. You
will probably need one fencing agent for each physical host where docker
is running and map cluster nodes (containers) to the correct agent (i.e.
physical host).
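
The docker side of that is ordinary dockerd configuration, e.g. (sketch;
an unauthenticated TCP listener is shown only for brevity - in practice
this should be TLS-protected):

  # /etc/docker/daemon.json on each physical host
  {
    "hosts": ["unix:///var/run/docker.sock", "tcp://0.0.0.0:2375"]
  }

after which fence_docker on the cluster nodes can be pointed at each
host's API endpoint.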

___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] Required guidance w.r.t pacemaker

2022-06-08 Thread Andrei Borzenkov
On 08.06.2022 17:01, Ken Gaillot wrote:
> On Wed, 2022-06-08 at 18:31 +0530, Sridhar K wrote:
>> Hi Team,
>>
>> Required guidance w.r.t below problem statement
>>
>> Need to have a HA setup for SQLServer running as a docker container
>> and HA managed by the Pacemaker which is running as a separate docker
>> container.
>>
>> I have done a setup where pacemaker and SQL Server are running as a
>> single docker container, able to achieve HA.
>>
>> How to achieve the same when Pacemaker , and Sqlserver are running in
>> different containers.
> 
> I suspect it's not feasible.
> 
> At a minimum, the Pacemaker container needs to run corosync as well as
> pacemaker (implying a custom pid 1 script that starts both) and needs
> to be privileged. I'm not sure corosync has been successfully
> containerized before.
> 
> Bundles won't work because they can only run on Pacemaker nodes.
> 
> The db container could be configured as a remote node, basically
> reproducing how a bundle is created internally. The db container would
> be configured with pacemaker-remoted as its pid 1, and an IP given to
> it that both pacemaker containers can reach. Launching all the
> containers would need to be done by the OS at boot or manually.
> 
> An ocf:pacemaker:remote resource would be configured in the cluster to
> allow the pacemaker containers to manage the db via pacemaker-remoted
> in the db containers.
> 
> A custom fence agent would be needed to allow a pacemaker container to
> ask the VM to reboot (kill and relaunch) any other container. Each VM

Actually there is fence_docker which should work in this case.

> and VM host would become a single point of failure unless a pacemaker
> container could fence the VM and then the host as fallback fence
> mechanisms in a topology.
> 
> E.g. try to fence the container -> if that fails, try to fence the VM
> -> if that fails, try to fence the host. Without all of that working,
> something becomes a single point of failure.
> 
> A preferred setup would be to run corosync and pacemaker on the VMs,
> and configure bundles for the db containers.
> 
>>
>> Checked remote node,bundle  concepts in Pacemaker unable to make HA
>> setup work.
>>
>> Please let me know whether the above scenario can be handled, any
>> links, examples would be of great help.
>>
>> Attaching a picture that depicts the scenario.
>>
>> Please do the needful, Thank you
>>
>> Regards
>> Sridhar
> 

___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] Required guidance w.r.t pacemaker

2022-06-08 Thread Andrei Borzenkov
On 08.06.2022 16:57, Sridhar K wrote:
> Thank you Andrei, for the response
> 
> All the databases(1 primary, other will be replica) will be part of
> SqlServer Availability group which will have an Availability  listener IP ,
> this IP will be the Virtual IP resource added in PCS , which clients connect
> 
> When PCS or DB container in VM1 dies . the Virtual IP will route traffic to
> AG listener and PCS should ideally promote replica to act as primary. (this
> will handle by the sql ha resource agent)
> 

pacemaker cannot promote the replica until it knows that the primary is no
longer running.

> Screen Shot : Able to setup HA when both PCS and sqlserver are in same
> container.
> 
> PCS cluster process with sqlserver process in a docker container.
> the cluster is made of two docker containers
> 
> Not getting how to make HA working when PCS and sqlserver are running as
> different containers
> 
> Regards
> Sridharan
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> On Wed, 8 Jun 2022 at 18:52, Andrei Borzenkov  wrote:
> 
>> On Wed, Jun 8, 2022 at 4:01 PM Sridhar K  wrote:
>>>
>>> Hi Team,
>>>
>>> Required guidance w.r.t below problem statement
>>>
>>> Need to have a HA setup for SQLServer running as a docker container and
>> HA managed by the Pacemaker which is running as a separate docker container.
>>>
>>
>> It is very unlikely to be useful in a production environment (see
>> below). What is your actual use case?
>>
>>> I have done a setup where pacemaker and SQL Server are running as a
>> single docker container, able to achieve HA.
>>>
>>> How to achieve the same when Pacemaker , and Sqlserver are running in
>> different containers.
>>>
>>> Checked remote node,bundle  concepts in Pacemaker unable to make HA
>> setup work.
>>>
>>
>> "Unable to make it work" does not provide much information about what
>> you did and what problem you encountered. But at the very least,
>> pacemaker needs to control docker to manage containers. And docker
>> does not run inside of containers, does it? So how is the pacemaker in
>> the PCS container supposed to access docker that runs on host?
>>
>> Remote node should work as long as connectivity between containers is
>> available.
>>
>>> Please let me know whether the above scenario can be handled, any links,
>> examples would be of great help.
>>>
>>> Attaching a picture that depicts the scenario.
>>>
>>
>> The PCS container on VM1 dies. How exactly are you going to ensure
>> that the DB container on VM1 is stopped so that DB on VM2 can take
>> over?
>>
>>> Please do the needful, Thank you
>>>
>>> Regards
>>> Sridhar
>>> ___
>>> Manage your subscription:
>>> https://lists.clusterlabs.org/mailman/listinfo/users
>>>
>>> ClusterLabs home: https://www.clusterlabs.org/
>> ___
>> Manage your subscription:
>> https://lists.clusterlabs.org/mailman/listinfo/users
>>
>> ClusterLabs home: https://www.clusterlabs.org/
>>
> 
> 
> ___
> Manage your subscription:
> https://lists.clusterlabs.org/mailman/listinfo/users
> 
> ClusterLabs home: https://www.clusterlabs.org/

___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] Required guidance w.r.t pacemaker

2022-06-08 Thread Andrei Borzenkov
On Wed, Jun 8, 2022 at 4:01 PM Sridhar K  wrote:
>
> Hi Team,
>
> Required guidance w.r.t below problem statement
>
> Need to have a HA setup for SQLServer running as a docker container and HA 
> managed by the Pacemaker which is running as a separate docker container.
>

It is very unlikely to be useful in a production environment (see
below). What is your actual use case?

> I have done a setup where pacemaker and SQL Server are running as a single 
> docker container, able to achieve HA.
>
> How to achieve the same when Pacemaker , and Sqlserver are running in 
> different containers.
>
> Checked remote node,bundle  concepts in Pacemaker unable to make HA setup 
> work.
>

"Unable to make it work" does not provide much information about what
you did and what problem you encountered. But at the very least,
pacemaker needs to control docker to manage containers. And docker
does not run inside of containers, does it? So how is the pacemaker in
the PCS container supposed to access docker that runs on host?

Remote node should work as long as connectivity between containers is available.

> Please let me know whether the above scenario can be handled, any links, 
> examples would be of great help.
>
> Attaching a picture that depicts the scenario.
>

The PCS container on VM1 dies. How exactly are you going to ensure
that the DB container on VM1 is stopped so that DB on VM2 can take
over?

> Please do the needful, Thank you
>
> Regards
> Sridhar
> ___
> Manage your subscription:
> https://lists.clusterlabs.org/mailman/listinfo/users
>
> ClusterLabs home: https://www.clusterlabs.org/
___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] fencing configuration

2022-06-07 Thread Andrei Borzenkov
On 07.06.2022 11:50, Klaus Wenninger wrote:
>>
>> From the documentation is not clear to me whether this would be:
>> a) multiple fencing where ipmi would be first level and sbd would be a 
>> second level fencing (where sbd always succeeds)
>> b) or this is considered a single level fencing with a timeout
> 
> With b) falling back to watchdog-fencing wouldn't work properly
> although I remember
> some recent change that might make it fall back without issues.

b) works here:

Jun 07 17:35:50 ha2 pacemaker-controld[7069]:  notice: Requesting
fencing (reboot) of node qnetd

Jun 07 17:35:50 ha2 pacemaker-fenced[7065]:  notice: Client
pacemaker-controld.7069 wants to fence (reboot) qnetd using any device

Jun 07 17:35:50 ha2 pacemaker-fenced[7065]:  notice: Requesting peer
fencing (reboot) targeting qnetd

Jun 07 17:35:50 ha2 pacemaker-fenced[7065]:  notice: watchdog is not
eligible to fence (reboot) qnetd: static-list

Jun 07 17:35:50 ha2 pacemaker-schedulerd[7068]:  warning: Calculated
transition 14 (with warnings), saving inputs in
/var/lib/pacemaker/pengine/pe-warn-95.bz2

Jun 07 17:35:50 ha2 pacemaker-fenced[7065]:  notice: Requesting that ha1
perform 'reboot' action targeting qnetd

Jun 07 17:35:53 ha2 pacemaker-fenced[7065]:  notice: Requesting that ha2
perform 'reboot' action targeting qnetd

Jun 07 17:35:53 ha2 pacemaker-fenced[7065]:  notice: watchdog is not
eligible to fence (reboot) qnetd: static-list

Jun 07 17:35:55 ha2 stonith[11138]: external_reset_req: '_dummy reset'
for host qnetd failed with rc 1

Jun 07 17:35:57 ha2 stonith[11142]: external_reset_req: '_dummy reset'
for host qnetd failed with rc 1

Jun 07 17:35:57 ha2 pacemaker-fenced[7065]:  error: Operation 'reboot'
[11141] targeting qnetd using dummy_stonith returned 1

Jun 07 17:35:57 ha2 pacemaker-fenced[7065]:  warning:
dummy_stonith[11141] [ Performing: stonith -t external/_dummy -E -T
reset qnetd ]

Jun 07 17:35:57 ha2 pacemaker-fenced[7065]:  warning:
dummy_stonith[11141] [ failed: qnetd 5 ]

Jun 07 17:35:57 ha2 pacemaker-fenced[7065]:  notice: Couldn't find
anyone to fence (reboot) qnetd using any device

Jun 07 17:35:57 ha2 pacemaker-fenced[7065]:  notice: Waiting 10s for
qnetd to self-fence (reboot) for client pacemaker-controld.7069

Jun 07 17:36:07 ha2 pacemaker-fenced[7065]:  notice: Self-fencing
(reboot) by qnetd for pacemaker-controld.7069 assumed complete

Jun 07 17:36:07 ha2 pacemaker-fenced[7065]:  notice: Operation 'reboot'
targeting qnetd by ha2 for pacemaker-controld.7069@ha2: OK (complete)

Jun 07 17:36:07 ha2 pacemaker-controld[7069]:  notice: Fence operation 7
for qnetd passed

Jun 07 17:36:07 ha2 pacemaker-controld[7069]:  notice: Transition 14
(Complete=1, Pending=0, Fired=0, Skipped=0, Incomplete=0,
Source=/var/lib/pacemaker/pengine/pe-warn-95.bz2): Complete

Jun 07 17:36:07 ha2 pacemaker-controld[7069]:  notice: State transition
S_TRANSITION_ENGINE -> S_IDLE

Jun 07 17:36:07 ha2 pacemaker-controld[7069]:  notice: Peer qnetd was
terminated (reboot) by ha2 on behalf of pacemaker-controld.7069@ha2: OK



The only gotcha is this stray error after everything has already completed.


Jun 07 17:37:05 ha2 pacemaker-fenced[7065]:  notice: Peer's 'reboot'
action targeting qnetd for client pacemaker-controld.7069 timed out

Jun 07 17:37:05 ha2 pacemaker-fenced[7065]:  notice: Couldn't find
anyone to fence (reboot) qnetd using any device

Jun 07 17:37:05 ha2 pacemaker-fenced[7065]:  error:
request_peer_fencing: Triggered fatal assertion at fenced_remote.c:1799
: op->state < st_done


> I would try to go for a) as with a reasonably current
> pacemaker-version (iirc 2.1.0 and above)
> you should be able to make the watchdog-fencing-device visible as with
> other fencing-devices

Yep.

dummy_stonith

watchdog

2 fence devices found



> (just use fence_watchdog as the fence-agent - still implemented inside
> pacemaker
> fence-watchdog-binary actually just provides the meta-data).
> Like this you can limit watchdog-fencing to certain-nodes that do
> actually provide a proper
> hardware-watchdog and you can add it to a topology.
> 

Well, as can be seen above, even though "watchdog" is not
eligible, pacemaker is still using it. So I am not sure it will work.
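
For reference, the explicit-device variant looks roughly like this
(sketch, names invented):

  pcs stonith create watchdog-fence fence_watchdog \
      pcmk_host_list="node1 node2"
  # IPMI first, watchdog as fallback, per node
  pcs stonith level add 1 node1 ipmi-node1
  pcs stonith level add 2 node1 watchdog-fence

but given the behaviour above it would need testing whether the fallback
really fires only on the second level.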
___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] fencing configuration

2022-06-07 Thread Andrei Borzenkov
On 07.06.2022 11:26, Zoran Bošnjak wrote:
> 
> In the test scenario, the dummy resource is currently running on node1. I 
> have simulated node failure by unplugging the ipmi AND host network 
> interfaces from node1. The result was that node1 gets rebooted (by watchdog), 
> but the rest of the pacemaker cluster was unable to fence node1 (this is 
> expected, since node1's ipmi is not accessible). The problem is that dummy 
> resource remains stopped and node1 unclean. I was expecting that 
> stonith-watchdog-timeout kicks in, so that dummy resource gets restarted on 
> some other node which has quorum. 
> 

I cannot reproduce it; watchdog fencing works here as expected.

> Obviously there is something wrong with my configuration, since this seems to 
> be a reasonably simple scenario for the pacemaker. Appreciate your help.
> 

It is impossible to say anything without logs.
___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] normal reboot with active sbd does not work

2022-06-03 Thread Andrei Borzenkov
On 03.06.2022 16:51, Zoran Bošnjak wrote:
> Thanks for all your answers. Sorry, my mistake. The ipmi_watchdog is indeed 
> OK. I was first experimenting with "softdog", which is blacklisted. So the 
> reasonable question is how to properly start "softdog" on ubuntu.
> 

Blacklisting prevents autoloading of modules by alias during hardware
detection. Neither softdog nor ipmi_watchdog has an alias, so they
cannot be autoloaded, and blacklisting is irrelevant here.

> The reason to unload watchdog module (ipmi or softdog) is that there seems to 
> be a difference between normal reboot and watchdog reboot.
> In case of ipmi watchdog timer reboot:
> - the system hangs at the end of reboot cycle for some time
> - restart seems to be harder (like power off/on cycle), BIOS runs more 
> diagnostics at startup
> - it turns on HW diagnostic indication on the server front panel (dell 
> server) which stays on forever
> - it logs the event to IDRAC, which is unnecessary, because it was not a 
> hardware event, but just a normal reboot
> 
> In case of "sudo reboot" command, I would like to skip this... so the idea is 
> to fully stop the watchdog just before reboot. I am not sure how to do this 
> properly.
> 
> The "softdog" is better in this respect. It does not trigger nothing from the 
> list above, but I still get the message during reboot
> [ ... ] watchdog: watchdog0: watchdog did not stop!
> ... with some small timeout.
> 

The first obvious question: is there only one watchdog? Some watchdog
drivers *are* autoloaded.

Is there only one user of the watchdog? systemd, for example, may use it too.
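
Quick things to check (sketch):

  ls -l /dev/watchdog*                        # how many watchdog devices exist
  wdctl /dev/watchdog0                        # which driver sits behind it
  grep -i watchdog /etc/systemd/system.conf   # RuntimeWatchdogSec= etc.

If systemd (or anything else) also holds the watchdog open, the shutdown
behaviour gets much harder to reason about.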

> So after some additional testing, the situation is the following:
> 
> - without any watchdog and without sbd package, the server reboots normally
> - with "softdog" module loaded, I only get "watchdog did not stop message" at 
> reboot
> - with "softdog" loaded, but unloaded with "ExecStop=...rmmod", reboot is 
> normal again
> - same as above, but with "sbd" package loaded, I am getting "watchdog did 
> not stop message" again
> - switching from "softdog" to "ipmi_watchdog" gets me to the original list of 
> problems
> 
> It looks like the "sbd" is preventing the watchdog to close, so that watchdog 
> triggers always, even in the case of normal reboot. What am I missing here?

While the only way I can reproduce it on my QEMU VM is "reboot -f"
(without stopping all services), there is certainly a race condition in
sbd.service.

ExecStop=@bindir@/kill -TERM $MAINPID


systemd will continue as soon as "kill" completes, without waiting for
sbd to actually stop. This means systemd may complete the shutdown sequence
before sbd has had a chance to react to the signal, and then simply kill it,
which leaves the watchdog armed.

For test purposes, try using a script for ExecStop that loops until sbd
has actually stopped.

Note that systemd strongly recommends using a synchronous command for
ExecStop (we may argue that this should be handled by the service manager
itself, but well ...).
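
A minimal sketch of such a synchronous stop (paths are just an example):

  # /etc/systemd/system/sbd.service.d/sync-stop.conf
  [Service]
  ExecStop=
  ExecStop=/usr/local/sbin/sbd-sync-stop $MAINPID

  # /usr/local/sbin/sbd-sync-stop
  #!/bin/sh
  kill -TERM "$1"
  while kill -0 "$1" 2>/dev/null; do sleep 0.2; done

so that systemd does not proceed with the shutdown until sbd has really
exited and disarmed the watchdog.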

> 
> Zoran
> 
> - Original Message -
> From: "Andrei Borzenkov" 
> To: "users" 
> Sent: Friday, June 3, 2022 11:24:03 AM
> Subject: Re: [ClusterLabs] normal reboot with active sbd does not work
> 
> On 03.06.2022 11:18, Zoran Bošnjak wrote:
>> Hi all,
>> I would appreciate an advice about sbd fencing (without shared storage).
>>
>> I am using ubuntu 20.04., with default packages from the repository 
>> (pacemaker, corosync, fence-agents, ipmitool, pcs...).
>>
>> HW watchdog is present on servers. The first problem was to load/unload the 
>> watchdog module. For some reason the module is blacklisted on ubuntu,
> 
> What makes you think so?
> 
> bor@bor-Latitude-E5450:~$ lsb_release  -d
> 
> Description:  Ubuntu 20.04.4 LTS
> 
> bor@bor-Latitude-E5450:~$ modprobe -c | grep ipmi_watchdog
> 
> bor@bor-Latitude-E5450:~$
> 
> 
> 
> 
> 
>> so I've created a service for this purpose.
>>
> 
> man modules-load.d
> 
> 
>> --- file: /etc/systemd/system/watchdog.service
>> [Unit]
>> Description=Load watchdog timer module
>> After=syslog.target
>>
> 
> Without any explicit dependencies stop will be attempted as soon as
> possible.
> 
>> [Service]
>> Type=oneshot
>> RemainAfterExit=yes
>> ExecStart=/sbin/modprobe ipmi_watchdog
>> ExecStop=/sbin/rmmod ipmi_watchdog
>>
> 
> Why on earth do you need to unload kernel driver when system reboots?
> 
>> [Install]
>> WantedBy=multi-user.target
>> ---
>>
>> Is this a proper way to load watchdog module under ub

Re: [ClusterLabs] normal reboot with active sbd does not work

2022-06-03 Thread Andrei Borzenkov
On 03.06.2022 11:18, Zoran Bošnjak wrote:
> Hi all,
> I would appreciate an advice about sbd fencing (without shared storage).
> 
> I am using ubuntu 20.04., with default packages from the repository 
> (pacemaker, corosync, fence-agents, ipmitool, pcs...).
> 
> HW watchdog is present on servers. The first problem was to load/unload the 
> watchdog module. For some reason the module is blacklisted on ubuntu,

What makes you think so?

bor@bor-Latitude-E5450:~$ lsb_release  -d

Description:Ubuntu 20.04.4 LTS

bor@bor-Latitude-E5450:~$ modprobe -c | grep ipmi_watchdog

bor@bor-Latitude-E5450:~$





> so I've created a service for this purpose.
>

man modules-load.d


> --- file: /etc/systemd/system/watchdog.service
> [Unit]
> Description=Load watchdog timer module
> After=syslog.target
> 

Without any explicit dependencies stop will be attempted as soon as
possible.

> [Service]
> Type=oneshot
> RemainAfterExit=yes
> ExecStart=/sbin/modprobe ipmi_watchdog
> ExecStop=/sbin/rmmod ipmi_watchdog
> 

Why on earth do you need to unload kernel driver when system reboots?

> [Install]
> WantedBy=multi-user.target
> ---
> 
> Is this a proper way to load watchdog module under ubuntu?
> 

There is a standard way to load non-autoloaded drivers on *any*
systemd-based distribution: modules-load.d.
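
I.e. instead of the custom unit, a one-line config file is enough (sketch):

  # /etc/modules-load.d/watchdog.conf
  ipmi_watchdog

systemd-modules-load.service will then load it early during every boot.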

> Anyway, once the module is loaded, the /dev/watchdog (which is required by 
> 'sbd') is present.
> Next, the 'sbd' is installed by
> 
> sudo apt install sbd
> (followed by one reboot to get the sbd active)
> 
> The configuration of the 'sbd' is default. The sbd reacts to network failure 
> as expected (reboots the server). However, when the 'sbd' is active, the 
> server won't reboot normally any more. For example from the command line 
> "sudo reboot", it gets stuck at the end of the reboot sequence. There is a 
> message on the console:
> 
> ... reboot progress
> [ OK ] Finished Reboot.
> [ OK ] Reached target Reboot.
> [ ... ] IPMI Watchdog: Unexpected close, not stopping watchdog!
> [ ... ] IPMI Watchdog: Unexpected close, not stopping watchdog!
> ... it gets stuck at this point
> 
> After some long timeout, it looks like the watchdog timer expires and server 
> boots, but the failure indication remains on the front panel of the server. 
> If I uninstall the 'sbd' package, the "sudo reboot" works normally again.
> 
> My question is: How do I configure the system, to have the 'sbd' function 
> present, but still be able to reboot the system normally.
> 

As a first step, do not unload the watchdog driver on shutdown.
___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] More pacemaker oddities while stopping DC

2022-05-27 Thread Andrei Borzenkov
On 25.05.2022 09:47, Gao,Yan via Users wrote:
> On 2022/5/25 8:10, Ulrich Windl wrote:
>> Hi!
>>
>> We are still suffering from kernel RAM corruption on the Xen hypervisor when 
>> a VM or the hypervisor is doing I/O (three months since the bug report at 
>> SUSE, but no fix or workaround meaning the whole Xen cluster project was 
>> canceled after 20 years, but that's a different topic). All VMs will be 
>> migrated to VMware, dumping the whole SLES15 Xen cluster very soon.
>>
>> My script that detected RAM corruption tried to shutdown pacemaker, hoping 
>> for the best (i.e. VMs to be live-migrated away). However there are very 
>> strange decisions made 
>> (pacemaker-2.0.5+20201202.ba59be712-150300.4.21.1.x86_64):
>>
>> May 24 17:05:07 h16 VirtualDomain(prm_xen_test-jeos7)[24460]: INFO: 
>> test-jeos7: live migration to h19 succeeded.
>> May 24 17:05:07 h16 VirtualDomain(prm_xen_test-jeos9)[24463]: INFO: 
>> test-jeos9: live migration to h19 succeeded.
>> May 24 17:05:07 h16 pacemaker-execd[7504]:  notice: prm_xen_test-jeos7 
>> migrate_to (call 321, PID 24281) exited with status 0 (execution time 
>> 5500ms, queue time 0ms)
>> May 24 17:05:07 h16 pacemaker-controld[7509]:  notice: Result of migrate_to 
>> operation for prm_xen_test-jeos7 on h16: ok
>> May 24 17:05:07 h16 pacemaker-execd[7504]:  notice: prm_xen_test-jeos9 
>> migrate_to (call 323, PID 24283) exited with status 0 (execution time 
>> 5514ms, queue time 0ms)
>> May 24 17:05:07 h16 pacemaker-controld[7509]:  notice: Result of migrate_to 
>> operation for prm_xen_test-jeos9 on h16: ok
>>
>> Would you agree that the migration was successful? I'd say YES!
> 
> Maybe practically yes with what migrate_to has achieved with 
> VirtualDomain RA, but technically no from pacemaker's point of view.
> 
> Following the migrate_to on the source node, a migrate_from operation on 
> the target node and a stop operation on the source node will be needed 
> to eventually make a successful live-migration.
> 

I do not know if there is a formal state machine for performing live
migration in pacemaker, but speaking about the VirtualDomain RA:

a) a successful migrate_to means that the VM is running on the target node
b) migrate_from does not need anything from the source node and could be run
even if the source node becomes unavailable
c) successful fencing of the source node means that the resource is stopped
on the source node

So technically this could work, but it would need pacemaker to
recognize a "partially migrated" resource state.

>>
>> However this is what happened:
>>
>> May 24 17:05:19 h16 pacemaker-controld[7509]:  notice: Transition 2460 
>> (Complete=16, Pending=0, Fired=0, Skipped=7, Incomplete=57, 
>> Source=/var/lib/pacemaker/pengine/pe-input-89.bz2): Stopped
>> May 24 17:05:19 h16 pacemaker-schedulerd[7508]:  warning: Unexpected result 
>> (error) was recorded for stop of prm_ping_gw1:1 on h16 at May 24 17:05:02 
>> 2022
>> May 24 17:05:19 h16 pacemaker-schedulerd[7508]:  warning: Unexpected result 
>> (error) was recorded for stop of prm_ping_gw1:1 on h16 at May 24 17:05:02 
>> 2022
>> May 24 17:05:19 h16 pacemaker-schedulerd[7508]:  warning: Cluster node h16 
>> will be fenced: prm_ping_gw1:1 failed there
>> May 24 17:05:19 h16 pacemaker-schedulerd[7508]:  warning: Unexpected result 
>> (error) was recorded for stop of prm_iotw-md10:1 on h16 at May 24 17:05:02 
>> 2022
>> May 24 17:05:19 h16 pacemaker-schedulerd[7508]:  warning: Unexpected result 
>> (error) was recorded for stop of prm_iotw-md10:1 on h16 at May 24 17:05:02 
>> 2022
>> May 24 17:05:19 h16 pacemaker-schedulerd[7508]:  warning: Forcing 
>> cln_ping_gw1 away from h16 after 100 failures (max=100)
>> May 24 17:05:19 h16 pacemaker-schedulerd[7508]:  warning: Forcing 
>> cln_ping_gw1 away from h16 after 100 failures (max=100)
>> May 24 17:05:19 h16 pacemaker-schedulerd[7508]:  warning: Forcing 
>> cln_ping_gw1 away from h16 after 100 failures (max=100)
>> May 24 17:05:19 h16 pacemaker-schedulerd[7508]:  warning: Forcing 
>> cln_iotw-md10 away from h16 after 100 failures (max=100)
>> May 24 17:05:19 h16 pacemaker-schedulerd[7508]:  warning: Forcing 
>> cln_iotw-md10 away from h16 after 100 failures (max=100)
>> May 24 17:05:19 h16 pacemaker-schedulerd[7508]:  warning: Forcing 
>> cln_iotw-md10 away from h16 after 100 failures (max=100)
>> May 24 17:05:19 h16 pacemaker-schedulerd[7508]:  notice: Resource 
>> prm_xen_test-jeos7 can no longer migrate from h16 to h19 (will stop on both 
>> nodes)
>> May 24 17:05:19 h16 pacemaker-schedulerd[7508]:  notice: Resource 
>> prm_xen_test-jeos9 can no longer migrate from h16 to h19 (will stop on both 
>> nodes)
>> May 24 17:05:19 h16 pacemaker-schedulerd[7508]:  warning: Scheduling Node 
>> h16 for STONITH
>>
>> So the DC considers the migration to have failed, even though it was 
>> reported as success!
> 
> A so-called partial live-migration could no longer continue here.
> 
> Regards,
>Yan
> 
>> (The ping had 

Re: [ClusterLabs] how does the VirtualDomain RA know with which options it's called ?

2022-05-12 Thread Andrei Borzenkov
On 12.05.2022 21:03, Lentes, Bernd wrote:
> Hi,
> 
> from my understanding the resource agents in 
> /usr/lib/ocf/resource.d/heartbeat are quite similar
> to the old scripts in /etc/init.d started by init.
> Init starts these scripts with "script [start|stop|reload|restart|status]".
> Inside the script there is a case construct which checks the options the 
> script is started with, and calls the appropriate function.
> 
> Similar to the init scripts the cluster calls the RA with "script 
> [start|stop|monitor ...]"
> But i'm missing this construct in the VirtualDomain RA. From where does it 
> know how it is invoked ?
> I don't see any logic which checks the options the script is called with.
> 

It's the function ocf_rarun() that does it.
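
For readers unfamiliar with it: ocf_rarun() comes from the ocf-rarun shell
library shipped with resource-agents and sourced by the agent; it reads the
requested action from the command line and dispatches to the agent's
per-action functions. A purely illustrative sketch of that pattern (not the
actual library code):

    action="$1"
    case "$action" in
        start)          VirtualDomain_start ;;
        stop)           VirtualDomain_stop ;;
        monitor|status) VirtualDomain_monitor ;;
        *)              exit $OCF_ERR_UNIMPLEMENTED ;;
    esac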
___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] Antw: [EXT] Re: Help understanding recover of promotable resource after a "pcs cluster stop --all"

2022-05-03 Thread Andrei Borzenkov
On 03.05.2022 10:40, Ulrich Windl wrote:
> Hi!
> 
> I don't use DRBD, but I can imagine:
> If DRBD does asynchronous replication, it may make sense not to promote the
> slave as master after an interrupte dconnection (such when the master died) 
> (as
> this will cause some data loss).
> Probably it only wants to switch roles when both nodes are online to avoid
> that type of data loss (the master may have some newer data it wants to
> transfer first).
> 

Yes. See below.


 # sudo crm_mon ‑1A
 ...
 Node Attributes:
   * Node: server2:
 * master‑DRBDData : 1
>>>
>>> In the scenario you described, only server1 is up. If there is no
>>> master score for server1, it cannot be master. It's up the resource
>>> agent to set it. I'm not familiar enough with that agent to know why it
>>> might not.
>>>
>>
>> I can trivially reproduce it. When pacemaker with slave drbd instance is
>> stopped, DRBD disk state is set to "outdated". When it comes up, it will
>> not be selected for promoting. Setting master score does not work, it
>> just results in failure attempt to bring up the outdated replica. When
>> former master comes up, its disk state is "consistent" so it is selected
>> for promotion, becomes primary and synchronized with secondary.
>>
>> DRBD RA has an option to force outdated state on stop, but this option
>> is off by default as far as I can tell.
>>
>> This is probably something in DRBD configuration, but I am not familiar
>> with it on this deep level. Manually forcing primary on outdated replica
>> works and is reflected on pacemaker level (resource goes in promoted
> state).

Without any agent involved, doing "drbdadm down" on a secondary instance
with an active connection to the primary marks it "outdated". Which is
correct, as from then on we do not know anything about the state of the
primary. Doing "drbdadm down" on a single replica without active
connections leaves it in the "consistent" state.

When the DRBD connection is active, both replicas have the "consistent"
state, and when the cluster nodes reboot after a crash, either one can
assume the master role.

I guess it is the same operational issue as with pacemaker itself - can we
shut down both sides of DRBD leaving them in a consistent state? But even
if we can, it would not help at all, because pacemaker itself does not
provide any means to initiate such a cluster-wide shutdown.

OTOH it is not really a big problem. A cluster reboot is a manual action -
so the administrator will need to manually activate the remaining replica
IF THE ADMINISTRATOR IS SURE IT IS UP TO DATE. Rebooting individual nodes
sequentially should be OK.
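
For reference, the manual activation mentioned above might look roughly
like this (the DRBD resource name r0 is hypothetical; DRBDData-clone is the
pacemaker clone from this thread) - only do it if you are sure the
surviving replica's data is current:

    drbdadm dstate r0                          # e.g. shows Outdated/DUnknown
    drbdadm primary --force r0                 # force-promote the outdated replica
    crm_resource --cleanup -r DRBDData-clone   # let pacemaker re-probe the state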

>>



 Atenciosamente/Kind regards,
 Salatiel

 On Mon, May 2, 2022 at 12:26 PM Ken Gaillot 
 wrote:
> On Mon, 2022‑05‑02 at 09:58 ‑0300, Salatiel Filho wrote:
>> Hi, I am trying to understand the recovering process of a
>> promotable
>> resource after "pcs cluster stop ‑‑all" and shutdown of both
>> nodes.
>> I have a two nodes + qdevice quorum with a DRBD resource.
>>
>> This is a summary of the resources before my test. Everything is
>> working just fine and server2 is the master of DRBD.
>>
>>  * fence‑server1(stonith:fence_vmware_rest): Started
>> server2
>>  * fence‑server2(stonith:fence_vmware_rest): Started
>> server1
>>  * Clone Set: DRBDData‑clone [DRBDData] (promotable):
>>* Masters: [ server2 ]
>>* Slaves: [ server1 ]
>>  * Resource Group: nfs:
>>* drbd_fs(ocf::heartbeat:Filesystem): Started server2
>>
>>
>>
>> then I issue "pcs cluster stop ‑‑all". The cluster will be
>> stopped on
>> both nodes as expected.
>> Now I restart server1( previously the slave ) and poweroff
>> server2 (
>> previously the master ). When server1 restarts it will fence
>> server2
>> and I can see that server2 is starting on vcenter, but I just
>> pressed
>> any key on grub to make sure the server2 would not restart,
>> instead
>> it
>> would just be "paused" on grub screen.
>>
>> SSH'ing to server1 and running pcs status I get:
>>
>> Cluster name: cluster1
>> Cluster Summary:
>>   * Stack: corosync
>>   * Current DC: server1 (version 2.1.0‑8.el8‑7c3f660707) ‑
>> partition
>> with quorum
>>   * Last updated: Mon May  2 09:52:03 2022
>>   * Last change:  Mon May  2 09:39:22 2022 by root via cibadmin
>> on
>> server1
>>   * 2 nodes configured
>>   * 11 resource instances configured
>>
>> Node List:
>>   * Online: [ server1 ]
>>   * OFFLINE: [ server2 ]
>>
>> Full List of Resources:
>>   * fence‑server1(stonith:fence_vmware_rest): Stopped
>>   * fence‑server2(stonith:fence_vmware_rest): Started
>> server1
>>   * Clone Set: DRBDData‑clone [DRBDData] (promotable):
>> * Slaves: [ server1 ]
>> * Stopped: [ server2 ]

Re: [ClusterLabs] Help understanding recover of promotable resource after a "pcs cluster stop --all"

2022-05-03 Thread Andrei Borzenkov
On 03.05.2022 00:25, Ken Gaillot wrote:
> On Mon, 2022-05-02 at 13:11 -0300, Salatiel Filho wrote:
>> Hi, Ken, here is the info you asked for.
>>
>>
>> # pcs constraint
>> Location Constraints:
>>   Resource: fence-server1
>> Disabled on:
>>   Node: server1 (score:-INFINITY)
>>   Resource: fence-server2
>> Disabled on:
>>   Node: server2 (score:-INFINITY)
>> Ordering Constraints:
>>   promote DRBDData-clone then start nfs (kind:Mandatory)
>> Colocation Constraints:
>>   nfs with DRBDData-clone (score:INFINITY) (rsc-role:Started)
>> (with-rsc-role:Master)
>> Ticket Constraints:
>>
>> # sudo crm_mon -1A
>> ...
>> Node Attributes:
>>   * Node: server2:
>> * master-DRBDData : 1
> 
> In the scenario you described, only server1 is up. If there is no
> master score for server1, it cannot be master. It's up the resource
> agent to set it. I'm not familiar enough with that agent to know why it
> might not.
> 

I can trivially reproduce it. When pacemaker with a slave DRBD instance is
stopped, the DRBD disk state is set to "outdated". When it comes up, it
will not be selected for promotion. Setting the master score does not work;
it just results in a failed attempt to bring up the outdated replica. When
the former master comes up, its disk state is "consistent", so it is
selected for promotion, becomes primary and synchronizes with the secondary.

The DRBD RA has an option to force the outdated state on stop, but this
option is off by default as far as I can tell.

This is probably something in the DRBD configuration, but I am not familiar
with it on this deep a level. Manually forcing primary on the outdated
replica works and is reflected at the pacemaker level (the resource goes
into the promoted state).

>>
>>
>>
>> Atenciosamente/Kind regards,
>> Salatiel
>>
>> On Mon, May 2, 2022 at 12:26 PM Ken Gaillot 
>> wrote:
>>> On Mon, 2022-05-02 at 09:58 -0300, Salatiel Filho wrote:
 Hi, I am trying to understand the recovering process of a
 promotable
 resource after "pcs cluster stop --all" and shutdown of both
 nodes.
 I have a two nodes + qdevice quorum with a DRBD resource.

 This is a summary of the resources before my test. Everything is
 working just fine and server2 is the master of DRBD.

  * fence-server1(stonith:fence_vmware_rest): Started
 server2
  * fence-server2(stonith:fence_vmware_rest): Started
 server1
  * Clone Set: DRBDData-clone [DRBDData] (promotable):
* Masters: [ server2 ]
* Slaves: [ server1 ]
  * Resource Group: nfs:
* drbd_fs(ocf::heartbeat:Filesystem): Started server2



 then I issue "pcs cluster stop --all". The cluster will be
 stopped on
 both nodes as expected.
 Now I restart server1( previously the slave ) and poweroff
 server2 (
 previously the master ). When server1 restarts it will fence
 server2
 and I can see that server2 is starting on vcenter, but I just
 pressed
 any key on grub to make sure the server2 would not restart,
 instead
 it
 would just be "paused" on grub screen.

 SSH'ing to server1 and running pcs status I get:

 Cluster name: cluster1
 Cluster Summary:
   * Stack: corosync
   * Current DC: server1 (version 2.1.0-8.el8-7c3f660707) -
 partition
 with quorum
   * Last updated: Mon May  2 09:52:03 2022
   * Last change:  Mon May  2 09:39:22 2022 by root via cibadmin
 on
 server1
   * 2 nodes configured
   * 11 resource instances configured

 Node List:
   * Online: [ server1 ]
   * OFFLINE: [ server2 ]

 Full List of Resources:
   * fence-server1(stonith:fence_vmware_rest): Stopped
   * fence-server2(stonith:fence_vmware_rest): Started
 server1
   * Clone Set: DRBDData-clone [DRBDData] (promotable):
 * Slaves: [ server1 ]
 * Stopped: [ server2 ]
   * Resource Group: nfs:
 * drbd_fs(ocf::heartbeat:Filesystem): Stopped


 So I can see there is quorum, but the server1 is never promoted
 as
 DRBD master, so the remaining resources will be stopped until
 server2
 is back.
 1) What do I need to do to force the promotion and recover
 without
 restarting server2?
 2) Why if instead of rebooting server1 and power off server2 I
 reboot
 server2 and poweroff server1 the cluster can recover by itself?


 Thanks!

>>>
>>> You shouldn't need to force promotion, that is the default behavior
>>> in
>>> that situation. There must be something else in the configuration
>>> that
>>> is preventing promotion.
>>>
>>> The DRBD resource agent should set a promotion score for the node.
>>> You
>>> can run "crm_mon -1A" to show all node attributes; there should be
>>> one
>>> like "master-DRBDData" for the active node.
>>>
>>> You can also show the constraints in the cluster to see if there is
>>> anything 

Re: [ClusterLabs] Can a two node cluster start resources if only one node is booted?

2022-04-22 Thread Andrei Borzenkov
On 22.04.2022 16:01, john tillman wrote:
>> On Fri, Apr 22, 2022 at 12:05 PM Tomas Jelinek 
>> wrote:
>>>
>>> As discussed in other branches of this thread, you need to figure out
>>> why pacemaker is not starting. Even if one node is not running, corosync
>>> and pacemaker are expected to be able to start on the other node.
>>
>> Well, when trying to reproduce this behavior I configured SBD with a
>> non-existent device and enabled it. If enabled, pacemaker.service
>> Requires sbd.service. sbd.service failed to start and so
>> pacemaker.service was not started either. Just one example.
>>
>> But as long as the only information we have is "nothing suspicious in
>> logs", we can guess until the doomsday. Output of "journalctl -b"
>> immediately after boot would give at least some starting points.
> 
> 
> Thank you all for the responses.  I  shall try to answer your questions
> here.  But I found the problem, I think.  I mean, I made this change and
> now the cluster will start when only one node is booted!
> 
> The drbd service was enabled.  Once I disabled it the cluster would start
> at boot with only one node.
> 
> I took another look through the output of journalctl -b.  I saw this and
> it made me test it:
> 
> systemd[1]: Starting DRBD -- please disable. Unless you are NOT using a
> cluster manager
> 
> @Klaus - the systemd configuration files showed 2 dependencies before
> pacemaker could start.  one was corosync.  The other was a
> "resource-agents-deps.target", which I could not find.  It looked like an
> option an admin could add.  I'm the admin and I didn't add any.  However,
> this maybe related in someway to pacemaker's relationship to drbd???
> 

If you are an admin, you really should make yourself more familiar with
systemd.

> @Ulrich - the actual pacemaker service was not running.
> 
> @Tomas & Andrei - jounrnalctl did provide the clue.
> 
> Now, can anyone explain why drbd being enabled might keep pacemaker from
> starting? 

Upstream drbd.service has Before=pacemaker.service. If activation of
drbd.service is delayed, pacemaker's start is delayed as well.
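
A couple of generic systemd commands can confirm this on the affected node
(nothing here is pacemaker-specific):

    systemctl cat drbd.service | grep -i '^Before='
    systemd-analyze critical-chain pacemaker.service
    systemctl list-jobs    # while boot appears stuck, shows what systemd is still waiting for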


> Would you still care to see the journalctl output?  Maybe
> edited to where systemd starts its logging?
> 

That is really not a pacemaker problem and is better suited for a general
systemd troubleshooting resource. But as you have already been asked, show
at least "systemctl status pacemaker.service" and "systemctl status
drbd.service" when pacemaker does not start.
___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] Can a two node cluster start resources if only one node is booted?

2022-04-22 Thread Andrei Borzenkov
On Fri, Apr 22, 2022 at 12:05 PM Tomas Jelinek  wrote:
>
> As discussed in other branches of this thread, you need to figure out
> why pacemaker is not starting. Even if one node is not running, corosync
> and pacemaker are expected to be able to start on the other node.

Well, when trying to reproduce this behavior I configured SBD with a
non-existent device and enabled it. If enabled, pacemaker.service
Requires sbd.service. sbd.service failed to start and so
pacemaker.service was not started either. Just one example.

But as long as the only information we have is "nothing suspicious in
logs", we can guess until the doomsday. Output of "journalctl -b"
immediately after boot would give at least some starting points.
___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] Can a two node cluster start resources if only one node is booted?

2022-04-21 Thread Andrei Borzenkov
On 21.04.2022 18:26, john tillman wrote:
>> Dne 20. 04. 22 v 20:21 john tillman napsal(a):
 On 20.04.2022 19:53, john tillman wrote:
> I have a two node cluster that won't start any resources if only one
> node
> is booted; the pacemaker service does not start.
>
> Once the second node boots up, the first node will start pacemaker and
> the
> resources are started.  All is well.  But I would like the resources
> to
> start when the first node boots by itself.
>
> I thought the problem was with the wait_for_all option but I have it
> set
> to "0".
>
> On the node that is booted by itself, when I run "corosync-quorumtool"
> I
> see:
>
> [root@test00 ~]# corosync-quorumtool
> Quorum information
> --
> Date: Wed Apr 20 16:05:07 2022
> Quorum provider:  corosync_votequorum
> Nodes:1
> Node ID:  1
> Ring ID:  1.2f
> Quorate:  Yes
>
> Votequorum information
> --
> Expected votes:   2
> Highest expected: 2
> Total votes:  1
> Quorum:   1
> Flags:2Node Quorate
>
> Membership information
> --
> Nodeid  Votes Name
>  1  1 test00 (local)
>
>
> My config file look like this:
> totem {
> version: 2
> cluster_name: testha
> transport: knet
> crypto_cipher: aes256
> crypto_hash: sha256
> }
>
> nodelist {
> node {
> ring0_addr: test00
> name: test00
> nodeid: 1
> }
>
> node {
> ring0_addr: test01
> name: test01
> nodeid: 2
> }
> }
>
> quorum {
> provider: corosync_votequorum
> two_node: 1
> wait_for_all: 0
> }
>
> logging {
> to_logfile: yes
> logfile: /var/log/cluster/corosync.log
> to_syslog: yes
> timestamp: on
> debug: on
> syslog_priority: debug
> logfile_priority: debug
> }
>
> Fencing is disabled.
>

 That won't work.

> I've also looked in "corosync.log" but I don't know what to look for
> to
> diagnose this issue.  I mean there are many lines similar to:
> [QUORUM] This node is within the primary component and will provide
> service.
> and
> [VOTEQ ] Sending quorum callback, quorate = 1
> and
> [VOTEQ ] flags: quorate: Yes Leaving: No WFA Status: No First: Yes
> Qdevice: No QdeviceAlive: No QdeviceCastVote: No QdeviceMasterWins: No
>
> Is there something specific I should look for in the log?
>
> So can a two node cluster work after booting only one node?  Maybe it
> never will and I am wasting a lot of time, yours and mine.
>
> If it can, what else can I investigate further?
>

 Before node can start handling resources it needs to know status of
 other node. Without successful fencing there is no way to accomplish
 it.

 Yes, you can tell pacemaker to ignore unknown status. Depending on your
 resources this could simply prevent normal work or lead to data
 corruption.
>>>
>>>
>>> Makes sense.  Thank you.
>>>
>>> Perhaps some future enhancement could allow for this situation?  I mean,
>>> It might be desirable for some cases to allow for a single node to boot,
>>> determine quorum by two_node=1 and wait_for_all=0, and start resources
>>> without ever seeing the other node.  Sure, there are dangers of split
>>> brain but I can see special cases where I want the node to work alone
>>> for
>>> a period of time despite the danger.
>>>
>>
>> Hi John,
>>
>> How about 'pcs quorum unblock'?
>>
>> Regards,
>> Tomas
>>
> 
> 
> Tomas,
> 
> Thank you for the suggestion.  However it didn't work.  It returned:
> Error: unable to check quorum status
>   crm_mon: Error: cluster is not available on this node
> I checked pacemaker, just in case, and it still isn't running.
> 

Either pacemaker or some service it depends upon attempted to start and
failed, or systemd is still waiting for some service that is required
before pacemaker. Check the logs or provide "journalctl -b" output in this
state.
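
Concretely, something along these lines collected right after a failed boot
attempt would show where it stops (unit names besides pacemaker/corosync
are guesses):

    systemctl status pacemaker.service corosync.service
    systemctl list-jobs
    journalctl -b -u corosync -u pacemaker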


> I very curious how I could convince the cluster to start its resources on
> one node in the event that the other node is not able to boot.  But I'm
> afraid the answer is either to use fencing or add a third node to the
> cluster or both.
> 
> -John
> 
> 
>>> Thank you again.
>>>
>>>
 ___
 Manage your subscription:
 

Re: [ClusterLabs] Can a two node cluster start resources if only one node is booted?

2022-04-20 Thread Andrei Borzenkov
On 20.04.2022 19:53, john tillman wrote:
> I have a two node cluster that won't start any resources if only one node
> is booted; the pacemaker service does not start.
> 
> Once the second node boots up, the first node will start pacemaker and the
> resources are started.  All is well.  But I would like the resources to
> start when the first node boots by itself.
> 
> I thought the problem was with the wait_for_all option but I have it set
> to "0".
> 
> On the node that is booted by itself, when I run "corosync-quorumtool" I see:
> 
>[root@test00 ~]# corosync-quorumtool
>Quorum information
>--
>Date: Wed Apr 20 16:05:07 2022
>Quorum provider:  corosync_votequorum
>Nodes:1
>Node ID:  1
>Ring ID:  1.2f
>Quorate:  Yes
> 
>Votequorum information
>--
>Expected votes:   2
>Highest expected: 2
>Total votes:  1
>Quorum:   1
>Flags:2Node Quorate
> 
>Membership information
>--
>Nodeid  Votes Name
> 1  1 test00 (local)
> 
> 
> My config file look like this:
>totem {
>version: 2
>cluster_name: testha
>transport: knet
>crypto_cipher: aes256
>crypto_hash: sha256
>}
> 
>nodelist {
>node {
>ring0_addr: test00
>name: test00
>nodeid: 1
>}
> 
>node {
>ring0_addr: test01
>name: test01
>nodeid: 2
>}
>}
> 
>quorum {
>provider: corosync_votequorum
>two_node: 1
>wait_for_all: 0
>}
> 
>logging {
>to_logfile: yes
>logfile: /var/log/cluster/corosync.log
>to_syslog: yes
>timestamp: on
>debug: on
>syslog_priority: debug
>logfile_priority: debug
>}
> 
> Fencing is disabled.
> 

That won't work.

> I've also looked in "corosync.log" but I don't know what to look for to
> diagnose this issue.  I mean there are many lines similar to:
> [QUORUM] This node is within the primary component and will provide service.
> and
> [VOTEQ ] Sending quorum callback, quorate = 1
> and
> [VOTEQ ] flags: quorate: Yes Leaving: No WFA Status: No First: Yes
> Qdevice: No QdeviceAlive: No QdeviceCastVote: No QdeviceMasterWins: No
> 
> Is there something specific I should look for in the log?
> 
> So can a two node cluster work after booting only one node?  Maybe it
> never will and I am wasting a lot of time, yours and mine.
> 
> If it can, what else can I investigate further?
> 

Before a node can start handling resources it needs to know the status of
the other node. Without successful fencing there is no way to accomplish
that.

Yes, you can tell pacemaker to ignore the unknown status. Depending on your
resources this could simply prevent normal work or lead to data corruption.
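
For completeness, the knob usually meant by "ignore the unknown status" is
the startup-fencing cluster property (that this is what is meant here is my
reading, not something stated in the thread); the warning above about data
corruption applies in full:

    pcs property set startup-fencing=false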
___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] Coming in Pacemaker 2.1.3: multiple-active=stop_unexpected

2022-04-08 Thread Andrei Borzenkov
On 08.04.2022 20:16, Ken Gaillot wrote:
> Hi all,
> 
> I'm hoping to have the first release candidate for Pacemaker 2.1.3
> available in a couple of weeks.
> 
> One of the new features will be a new possible value for the "multiple-
> active" resource meta-attribute, which specifies how the cluster should
> react if multiple instances of a resource are detected to be active
> when only one should be.
> 
> The default behavior, "restart", stops all the instances and then
> starts one instance where it should be. This is the safest approach
> since some services become disrupted when multiple copies are started.
> 
> However if the user is confident that only the extra copies need to be
> stopped, they can now set multiple-active to "stop_unexpected". The
> instance that is active where it is supposed to be will not be stopped,
> but all other instances will be.
> 
> If any resources are ordered after the multiply active resource, those
> other resources will still need to be fully restarted. This is because
> any ordering constraint "start A then start B" implies "stop B then
> stop A", so we can't stop the wrongly active instances of A until B is
> stopped.

But in the case of multiple-active=stop_unexpected "the correct" A does
remain active. If any dependent resource needs to be restarted anyway, I
fail to see the intended use case. What is the difference from the default
option (except that it may be faster)?
___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] Antw: [EXT] Re: Failed migration causing fencing loop

2022-04-03 Thread Andrei Borzenkov
On 31.03.2022 14:02, Ulrich Windl wrote:
 "Gao,Yan"  schrieb am 31.03.2022 um 11:18 in Nachricht
> <67785c2f-f875-cb16-608b-77d63d9b0...@suse.com>:
>> On 2022/3/31 9:03, Ulrich Windl wrote:
>>> Hi!
>>>
>>> I just wanted to point out one thing that hit us with SLES15 SP3:
>>> Some failed live VM migration causing node fencing resulted in a fencing 
>> loop, because of two reasons:
>>>
>>> 1) Pacemaker thinks that even _after_ fencing there is some migration to 
>> "clean up". Pacemaker treats the situation as if the VM is running on both 
>> nodes, thus (50% chance?) trying to stop the VM on the node that just booted 
>> after fencing. That's supid but shouldn't be fatal IF there weren't...
>>>
>>> 2) The stop operation of the VM (that atually isn't running) fails,
>>
>> AFAICT it could not connect to the hypervisor, but the logic in the RA 
>> is kind of arguable that the probe (monitor) of the VM returned "not 
>> running", but the stop right after that returned failure...
>>
>> OTOH, the point about pacemaker is the stop of the resource on the 
>> fenced and rejoined node is not really necessary. There has been 
>> discussions about this here and we are trying to figure out a solution 
>> for it:
>>
>> https://github.com/ClusterLabs/pacemaker/pull/2146#discussion_r828204919 
>>
>> For now it requires administrator's intervene if the situation happens:
>> 1) Fix the access to hypervisor before the fenced node rejoins.
> 
> Thanks for the explanation!
> 
> Unfortunately this can be tricky if libvirtd is involved (as it is here):
> libvird uses locking (virtlockd), which in turn needs a cluster-wird 
> filesystem for locks across the nodes.
> When that filesystem is provided by the cluster, it's hard to delay node 
> joining until filesystem,  virtlockd and libvirtd are running.
> 

So do not use a filesystem provided by the same cluster. Use a separate
filesystem mounted outside of the cluster, such as a separate highly
available NFS.
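
As a sketch of that idea (server name, export path and mount point are
assumptions; check file_lockspace_dir in your virtlockd/qemu-lockd
configuration for the directory actually used), the lockspace could simply
be an NFS mount made on every hypervisor outside of pacemaker's control:

    # /etc/fstab on every hypervisor node
    nfs-locks.example.com:/export/virtlockd  /var/lib/libvirt/lockd  nfs  defaults,_netdev  0 0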

___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] Order constraint with a timeout?

2022-03-29 Thread Andrei Borzenkov
On 29.03.2022 15:38, john tillman wrote:
>> On 29.03.2022 00:26, john tillman wrote:
 On Mon, 2022-03-28 at 14:03 -0400, john tillman wrote:
> Greetings all,
>
> Is it possible to have an order constraint with a timeout?  I can't
> find
> one but perhaps I am using the wrong keywords in google.
>
> I have several Filesystem resource and one nfs service resource.  If
> I
> create 3 order constraints:
>pcs constraint order start fsRsc1 then start myNfsServiceRsc
>pcs constraint order start fsRsc2 then start myNfsServiceRsc
>pcs constraint order start fsRsc3 then start myNfsServiceRsc
>
> I would like to make sure that the nfs service will be started even
> if one
> of the Filesystem resources fails to start.  Is there a timeout that
> could
> be used?
...
>>
>> What exactly "failed to start after "X"" means? Start operation is still
>> running after "X" seconds? Then simply set timeout of start to "X".
>> Start operation failed before "X" seconds? Then what is the point to
>> wait additional time, how do you expect resource to become active if
>> start operation already failed?
>>
> 
> 
> 
> Andrei,
> 
> My understanding is that if one of the three Filesystem resources fails to
> start then the nfsservice would fail to start, even if the other
> Filesystem resources started successfully.  Am I wrong?
> 

That depends on your configuration.

> I suggested a timeout because I could not think of another cluster
> mechanism that fit.  But I see your point.  The result is known and there
> is no need to wait.  So a timeout is not what I need.
> 
> What can I use to start the nfs service only after all the Filesystem
> resources have been given the chance to start?
> 

A resource set with both sequential and require-all set to false.

primitive A1 ocf:_local:Dummy \
        op monitor interval=10

primitive A2 ocf:_local:Dummy \
        params delay=55 \
        op start timeout=60 interval=0 \
        op monitor interval=10

primitive B ocf:_local:Dummy \
        op monitor interval=10

order set_delay Mandatory: [ A1 A2 ] B



B will be started after the start requests for both A1 and A2 have
completed and at least one of them succeeded. If the start of both A1 and
A2 failed, B will not be started.
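
Since the original question used pcs, an untested sketch of the equivalent
constraint with the resource names from that question would be:

    pcs constraint order set fsRsc1 fsRsc2 fsRsc3 sequential=false require-all=false \
        set myNfsServiceRsc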
___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] Order constraint with a timeout?

2022-03-28 Thread Andrei Borzenkov
On 29.03.2022 00:26, john tillman wrote:
>> On Mon, 2022-03-28 at 14:03 -0400, john tillman wrote:
>>> Greetings all,
>>>
>>> Is it possible to have an order constraint with a timeout?  I can't
>>> find
>>> one but perhaps I am using the wrong keywords in google.
>>>
>>> I have several Filesystem resource and one nfs service resource.  If
>>> I
>>> create 3 order constraints:
>>>pcs constraint order start fsRsc1 then start myNfsServiceRsc
>>>pcs constraint order start fsRsc2 then start myNfsServiceRsc
>>>pcs constraint order start fsRsc3 then start myNfsServiceRsc
>>>
>>> I would like to make sure that the nfs service will be started even
>>> if one
>>> of the Filesystem resources fails to start.  Is there a timeout that
>>> could
>>> be used?
>>>
>>> There is the "kind=Optional" parameter but that looks like it will
>>> immediately start the second resource if the first failed to
>>> start.  There
>>> is no timeout option.
>>>
>>> Best regards,
>>> -John
>>>
>>
>> How do you envision the timeout working?
>>
>> You can add a timeout for the ordering itself using rules, where the
>> ordering no longer applies after a certain date/time, but it doesn't
>> sound like that's what you want.
>> --
>> Ken Gaillot 
>>
>> ___
>> Manage your subscription:
>> https://lists.clusterlabs.org/mailman/listinfo/users
>>
>> ClusterLabs home: https://www.clusterlabs.org/
>>
>>
> 
> 
> Thank you for the reply, Ken.
> 
> I was hoping that I could give the Filesystem resource "X" seconds to
> start.  If it failed to start after "X" then I would start the nfs service
> anyway.  So Those Filesystems that successfully started could be accessed,
> albeit with a bit of a delay before nfs is started.
> 

What exactly does "failed to start after X" mean? Is the start operation
still running after X seconds? Then simply set the start timeout to X. Did
the start operation fail before X seconds? Then what is the point of
waiting additional time - how do you expect the resource to become active
if the start operation already failed?

> Basically, I want to start the nfs service regardless of whether any or
> all of the Filesystem resources started.  But I want to give them all a
> chance start before starting nfs.
> 
> That said, it doesn't look like the rules suggestion you made is what I
> need.  Any other ideas?
> 
> Best Regards,
> -John
> 
> 
> 
> 
> ___
> Manage your subscription:
> https://lists.clusterlabs.org/mailman/listinfo/users
> 
> ClusterLabs home: https://www.clusterlabs.org/

___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] Resources too_active (active on all nodes of the cluster, instead of only 1 node)

2022-03-23 Thread Andrei Borzenkov
On 23.03.2022 08:30, Balotra, Priyanka wrote:
> Hi All,
> 
> We have a scenario on SLES 12 SP3 cluster.
> The scenario is explained as follows in the order of events:
> 
>   *   There is a 2-node cluster (FILE-1, FILE-2)
>   *   The cluster and the resources were up and running fine initially .
>   *   Then fencing request from pacemaker got issued on both nodes 
> simultaneously
> 
> Logs from 1st node:
> 2022-02-22T03:26:36.737075+00:00 FILE-1 corosync[12304]: [TOTEM ] Failed to 
> receive the leave message. failed: 2
> .
> .
> 2022-02-22T03:26:36.977888+00:00 FILE-1 pacemaker-fenced[12331]: notice: 
> Requesting that FILE-1 perform 'off' action targeting FILE-2
> 
> Logs from 2nd node:
> 2022-02-22T03:26:36.738080+00:00 FILE-2 corosync[4989]: [TOTEM ] Failed to 
> receive the leave message. failed: 1
> .
> .
> Feb 22 03:26:38 FILE-2 pacemaker-fenced [5015] (call_remote_stonith) notice: 
> Requesting that FILE-2 perform 'off' action targeting FILE-1
> 

This is normal behavior in case of split brain. Each node will try to
fence the other node so that it can take over resources from it.

> 
>   *   When the nodes came up after unfencing, the DC got set after election

What exactly does "came up" mean?

>   *   After that the resources which were expected to run on only one node 
> became active on both (all) nodes of the cluster.
> 

It sounds like both nodes believed fencing had been successful, and so
each node took over resources from the other node. It is impossible to
tell more without seeing the actual logs from both nodes and the actual
configuration.

> 27290 2022-02-22T04:16:31.699186+00:00 FILE-2 pacemaker-schedulerd[5018]: 
> error: Resource stonith-sbd is active on 2 nodes (attempting recovery)
> 27291 2022-02-22T04:16:31.699397+00:00 FILE-2 pacemaker-schedulerd[5018]: 
> notice: See https://wiki.clusterlabs.org/wiki/FAQ#Resource_ is_Too_Active for 
> more information
> 27292 2022-02-22T04:16:31.699590+00:00 FILE-2 pacemaker-schedulerd[5018]: 
> error: Resource FILE_Filesystem is active on 2 nodes (attem pting recovery)
> 27293 2022-02-22T04:16:31.699731+00:00 FILE-2 pacemaker-schedulerd[5018]: 
> notice: See https://wiki.clusterlabs.org/wiki/FAQ#Resource_ is_Too_Active for 
> more information
> 27294 2022-02-22T04:16:31.699878+00:00 FILE-2 pacemaker-schedulerd[5018]: 
> error: Resource IP_Floating is active on 2 nodes (attemptin g recovery)
> 27295 2022-02-22T04:16:31.700027+00:00 FILE-2 pacemaker-schedulerd[5018]: 
> notice: See https://wiki.clusterlabs.org/wiki/FAQ#Resource_ is_Too_Active for 
> more information
> 27296 2022-02-22T04:16:31.700203+00:00 FILE-2 pacemaker-schedulerd[5018]: 
> error: Resource Service_Postgresql is active on 2 nodes (at tempting recovery)
> 27297 2022-02-22T04:16:31.700354+00:00 FILE-2 pacemaker-schedulerd[5018]: 
> notice: See https://wiki.clusterlabs.org/wiki/FAQ#Resource_ is_Too_Active for 
> more information
> 27298 2022-02-22T04:16:31.700501+00:00 FILE-2 pacemaker-schedulerd[5018]: 
> error: Resource Service_Postgrest is active on 2 nodes (att empting recovery)
> 27299 2022-02-22T04:16:31.700648+00:00 FILE-2 pacemaker-schedulerd[5018]: 
> notice: See https://wiki.clusterlabs.org/wiki/FAQ#Resource_ is_Too_Active for 
> more information
> 27300 2022-02-22T04:16:31.700792+00:00 FILE-2 pacemaker-schedulerd[5018]: 
> error: Resource Service_esm_primary is active on 2 nodes (a ttempting 
> recovery)
> 27301 2022-02-22T04:16:31.700939+00:00 FILE-2 pacemaker-schedulerd[5018]: 
> notice: See https://wiki.clusterlabs.org/wiki/FAQ#Resource_ is_Too_Active for 
> more information
> 27302 2022-02-22T04:16:31.701086+00:00 FILE-2 pacemaker-schedulerd[5018]: 
> error: Resource Shared_Cluster_Backup is active on 2 nodes (attempting 
> recovery)
> 
> 
> Can you guys please help us understand if this is indeed a split-brain 
> scenario ? 

I do not understand this question, and I suspect you are using "split
brain" incorrectly. Split brain is the condition when corosync/pacemaker
on the two nodes cannot communicate. Split brain ends with a fencing
request.

> Under what circumstances can such a scenario be observed?

When the two nodes are unable to communicate with each other - if "such a
scenario" refers to split brain.

> We can have very serious impact if such a case can re-occur inspite of 
> stonith already configured. Hence the ask .
> In case this situation gets reproduced, how can it be handled?
> 

A stonith agent must never return success unless it can confirm that
fencing was successful.

> Note: We have stonith configured and it has been working fine so far. In this 
> case also, the initial fencing happened from stonith only.
> 
> Thanks in advance!
> 
> 
> 
> 
> 
> Internal Use - Confidential
> 
> 
> ___
> Manage your subscription:
> https://lists.clusterlabs.org/mailman/listinfo/users
> 
> ClusterLabs home: https://www.clusterlabs.org/

___
Manage your subscription:

Re: [ClusterLabs] What's wrong with IPsrcaddr?

2022-03-17 Thread Andrei Borzenkov
On 17.03.2022 14:14, ZZ Wave wrote:
>> Define "network connectivity to node2".
> 
> pacemaker instances can reach each other, I think.

This is called split brain; the only way to resolve it is fencing.

> In case of connectivity
> loss (turn off network interface manually, disconnect eth cable etc), it
> should turn off virtsrc and then virtip on active node, turn virtip on and
> then virtsrc on second node, and vice-versa. IPaddr2 alone works fine this
> way "out of a box", but IPsrcaddr doesn't :(
> 

According to the scarce logs you provided, the stop request for the
IPsrcaddr resource failed, which is fatal. You do not use fencing, so
pacemaker blocks any further change of the resource state.

I cannot say whether this is a resource agent bug or the agent legitimately
cannot perform the stop action. Personally I would argue that if the
corresponding routing entry is not present, the resource is stopped, so
failing the stop request because no route entry was found sounds like a bug.
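
In practice that means the blocked state persists until the failure is
cleared; once the underlying problem (or the agent) is fixed, something
like the following lets pacemaker move on (resource name taken from your
setup commands):

    pcs resource cleanup virtsrc
    ip route show default    # check whether the route the agent tried to remove is really there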

> Is my setup correct for this anyway?

You need to define "this". Your definition of "network connectivity"
("pacemaker instances can reach each other") does not match what you
describe later. Most likely you want failover if the current node loses
some *external* connectivity.
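
If external connectivity is indeed the criterion, the usual building block
is a cloned ocf:pacemaker:ping resource plus a location rule on the
attribute it maintains; an untested sketch using the gateway address from
your message:

    pcs resource create gw-ping ocf:pacemaker:ping host_list=192.168.80.1 \
        dampen=5s multiplier=1000 op monitor interval=10s clone
    pcs constraint location virtip rule score=-INFINITY pingd lt 1 or not_defined pingd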

> Howtos and google give me only "just
> add both resources to group or to colocation+order and that's all", but it
> definitely doesn't work the way I expect.
> 

So your expectations are wrong. You need to define more precisely what
network connectivity means in your case and how you check for it.

>> What are static IPs?
> 
> node1 192.168.80.21/24
> node2 192.168.80.22/24
> floating 192.168.80.23/24
> gw 192.168.80.1
> 

I did not ask for the IP addresses. I asked for your explanation of what
"static IP" means to you and how it is different from "floating IP".

>> I do not see anything wrong here.
> 
> Let me explain. After initial setup, virtip and virtsrc successfully apply
> on node1. There are both .23 alias and def route src. After a network
> failure, there is NO default route at all on both nodes and IPsrcaddr
> fails, as it requires default route.
> 

I already explained above why IPsrcaddr was not migrated.

> 
> ср, 16 мар. 2022 г. в 19:23, Andrei Borzenkov :
> 
>> On 16.03.2022 12:24, ZZ Wave wrote:
>>> Hello. I'm trying to implement floating IP with pacemaker but I can't
>>> get IPsrcaddr to work correctly. I want a following thing - floating
>>> IP and its route SRC is started on node1. If node1 loses network
>>> connectivity to node2, node1 should instantly remove floating IP and
>>> restore default route,
>>
>> Define "network connectivity to node2".
>>
>>> node2 brings these things up. And vice-versa when node1 returns.
>>> Static IPs should be intact in any way.
>>>
>>
>> What are static IPs?
>>
>>> What I've done:
>>>
>>> pcs host auth node1 node2
>>> pcs cluster setup my_cluster node1 node2 --force
>>> pcs cluster enable node1 node2
>>> pcs cluster start node1 node2
>>> pcs property set stonith-enabled=false
>>> pcs property set no-quorum-policy=ignore
>>> pcs resource create virtip ocf:heartbeat:IPaddr2 ip=192.168.80.23
>>> cidr_netmask=24 op monitor interval=30s
>>> pcs resource create virtsrc ocf:heartbeat:IPsrcaddr
>>> ipaddress=192.168.80.23 cidr_netmask=24 op monitor interval=30
>>> pcs constraint colocation add virtip with virtsrc
>>> pcs constraint order virtip then virtsrc
>>>
>>> It sets IP and src correctly on node1 one time after this setup, but
>>> in case of failover to node2 a havoc occurs -
>>
>> Havoc is not useful technical description. Explain what is wrong.
>>
>>> https://pastebin.com/GZMtG480
>>>
>>> What's wrong?
>>
>> You tell us. I do not see anything wrong here.
>>
>>> Help me please :)
>>>
>>>
>>> ___
>>> Manage your subscription:
>>> https://lists.clusterlabs.org/mailman/listinfo/users
>>>
>>> ClusterLabs home: https://www.clusterlabs.org/
>>
>> ___
>> Manage your subscription:
>> https://lists.clusterlabs.org/mailman/listinfo/users
>>
>> ClusterLabs home: https://www.clusterlabs.org/
>>
> 
> 
> ___
> Manage your subscription:
> https://lists.clusterlabs.org/mailman/listinfo/users
> 
> ClusterLabs home: https://www.clusterlabs.org/

___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] What's wrong with IPsrcaddr?

2022-03-16 Thread Andrei Borzenkov
On 16.03.2022 12:24, ZZ Wave wrote:
> Hello. I'm trying to implement floating IP with pacemaker but I can't
> get IPsrcaddr to work correctly. I want a following thing - floating
> IP and its route SRC is started on node1. If node1 loses network
> connectivity to node2, node1 should instantly remove floating IP and
> restore default route,

Define "network connectivity to node2".

> node2 brings these things up. And vice-versa when node1 returns.
> Static IPs should be intact in any way.
> 

What are static IPs?

> What I've done:
> 
> pcs host auth node1 node2
> pcs cluster setup my_cluster node1 node2 --force
> pcs cluster enable node1 node2
> pcs cluster start node1 node2
> pcs property set stonith-enabled=false
> pcs property set no-quorum-policy=ignore
> pcs resource create virtip ocf:heartbeat:IPaddr2 ip=192.168.80.23
> cidr_netmask=24 op monitor interval=30s
> pcs resource create virtsrc ocf:heartbeat:IPsrcaddr
> ipaddress=192.168.80.23 cidr_netmask=24 op monitor interval=30
> pcs constraint colocation add virtip with virtsrc
> pcs constraint order virtip then virtsrc
> 
> It sets IP and src correctly on node1 one time after this setup, but
> in case of failover to node2 a havoc occurs -

Havoc is not a useful technical description. Explain what is wrong.

> https://pastebin.com/GZMtG480
> 
> What's wrong?

You tell us. I do not see anything wrong here.

> Help me please :)
> 
> 
> ___
> Manage your subscription:
> https://lists.clusterlabs.org/mailman/listinfo/users
> 
> ClusterLabs home: https://www.clusterlabs.org/

___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] constraining multiple cloned resources to the same node

2022-03-15 Thread Andrei Borzenkov
On 15.03.2022 21:53, john tillman wrote:
>> On 15.03.2022 19:35, john tillman wrote:
>>> Hello,
>>>
>>> I'm trying to guarantee that all my cloned drbd resources start on the
>>> same node and I can't figure out the syntax of the constraint to do it.
>>>
>>> I could nominate one of the drbd resources as a "leader" and have all
>>> the
>>> others follow it.  But then if something happens to that leader the
>>> others
>>> are without constraint.
>>>
>>
>> Colocation is asymmetric. Resource B is colocated with resource A, so
>> pacemaker decides placement of resource A first. If resource A cannot
>> run anywhere (which is probably what you mean under "something happens
>> to that leader"), resource B cannot run anywhere. This is true also for
>> resources inside resource set.
>>
>> I do not think pacemaker supports "always run these resources together,
>> no matter how many resources can run".
>>

This can probably be emulated by assigning a high score less than INFINITY
to the colocation constraint.
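
A rough pcs sketch of that idea with the clone names from this thread
(5000 is just an arbitrary "high but finite" score; untested):

    pcs constraint colocation set drbdShare-clone drbdShareRead-clone drbdShareWrite-clone \
        role=Master setoptions score=5000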

> 
> 
> Huh, no way to get all the masters to start on the same node.  Interesting.
> 
> The set construct has a boolean field "require-all".  I'll try that before
> I give up.
> 

This applies to constraints external to the set, not to resources inside
the set.

> Could I create a resource (some systemd service) that all the masters are
> colocated with?  Feels like a hack but would it work?
> 

Yes, of course. You could also base the colocation on a node attribute and
use some other means to set that attribute.

> Thank you for the response.
> 
> -John
> 
> 
>>> I tried adding them to a group but got a syntax error from pcs saying
>>> that
>>> I wasn't allowed to add cloned resources to a group.
>>>
>>> If anyone is interested, it started from this example:
>>> https://edmondcck.medium.com/setup-a-highly-available-nfs-cluster-with-disk-encryption-using-luks-drbd-corosync-and-pacemaker-a96a5bdffcf8
>>> There's a DRBD partition that gets mounted onto a local directory.  The
>>> local directory is then mounted onto an exported directory (mount
>>> --bind).
>>>  Then the nfs service (samba too) get started and finally the VIP.
>>>
>>> Please note that while I have 3 DRBD resources currently, that number
>>> may
>>> increase after the initial configuration is performed.
>>>
>>> I would just like to know a mechanism to make sure all the DRBD
>>> resources
>>> are colocated.  Any suggestions welcome.
>>>
>>> [root@nas00 ansible]# pcs resource
>>>   * Clone Set: drbdShare-clone [drbdShare] (promotable):
>>> * Masters: [ nas00 ]
>>> * Slaves: [ nas01 ]
>>>   * Clone Set: drbdShareRead-clone [drbdShareRead] (promotable):
>>> * Masters: [ nas00 ]
>>> * Slaves: [ nas01 ]
>>>   * Clone Set: drbdShareWrite-clone [drbdShareWrite] (promotable):
>>> * Masters: [ nas00 ]
>>> * Slaves: [ nas01 ]
>>>   * localShare(ocf::heartbeat:Filesystem): Started nas00
>>>   * localShareRead(ocf::heartbeat:Filesystem): Started nas00
>>>   * localShareWrite   (ocf::heartbeat:Filesystem): Started nas00
>>>   * nfsShare  (ocf::heartbeat:Filesystem): Started nas00
>>>   * nfsShareRead  (ocf::heartbeat:Filesystem): Started nas00
>>>   * nfsShareWrite (ocf::heartbeat:Filesystem): Started nas00
>>>   * nfsService  (systemd:nfs-server):Started nas00
>>>   * smbService  (systemd:smb):   Started nas00
>>>   * vipN  (ocf::heartbeat:IPaddr2):Started nas00
>>>
>>> [root@nas00 ansible]# pcs constraint show --all
>>> Location Constraints:
>>> Ordering Constraints:
>>>   promote drbdShare-clone then start localShare (kind:Mandatory)
>>>   promote drbdShareRead-clone then start localShareRead (kind:Mandatory)
>>>   promote drbdShareWrite-clone then start localShareWrite
>>> (kind:Mandatory)
>>>   start localShare then start nfsShare (kind:Mandatory)
>>>   start localShareRead then start nfsShareRead (kind:Mandatory)
>>>   start localShareWrite then start nfsShareWrite (kind:Mandatory)
>>>   start nfsShare then start nfsService (kind:Mandatory)
>>>   start nfsShareRead then start nfsService (kind:Mandatory)
>>>   start nfsShareWrite then start nfsService (kind:Mandatory)
>>>   start nfsService then start smbService (kind:Mandatory)
>>>   start nfsService then start vipN (kind:Mandatory)
>>> Colocation Constraints:
>>>   localShare with drbdShare-clone (score:INFINITY)
>>> (with-rsc-role:Master)
>>>   localShareRead with drbdShareRead-clone (score:INFINITY)
>>> (with-rsc-role:Master)
>>>   localShareWrite with drbdShareWrite-clone (score:INFINITY)
>>> (with-rsc-role:Master)
>>>   nfsShare with localShare (score:INFINITY)
>>>   nfsShareRead with localShareRead (score:INFINITY)
>>>   nfsShareWrite with localShareWrite (score:INFINITY)
>>>   nfsService with nfsShare (score:INFINITY)
>>>   nfsService with nfsShareRead (score:INFINITY)
>>>   nfsService with nfsShareWrite (score:INFINITY)
>>>   smbService with nfsShare (score:INFINITY)
>>>   smbService with nfsShareRead (score:INFINITY)

Re: [ClusterLabs] constraining multiple cloned resources to the same node

2022-03-15 Thread Andrei Borzenkov
On 15.03.2022 19:35, john tillman wrote:
> Hello,
> 
> I'm trying to guarantee that all my cloned drbd resources start on the
> same node and I can't figure out the syntax of the constraint to do it.
> 
> I could nominate one of the drbd resources as a "leader" and have all the
> others follow it.  But then if something happens to that leader the others
> are without constraint.
> 

Colocation is asymmetric. Resource B is colocated with resource A, so
pacemaker decides the placement of resource A first. If resource A cannot
run anywhere (which is probably what you mean by "something happens to
that leader"), resource B cannot run anywhere either. This is also true
for resources inside a resource set.

I do not think pacemaker supports "always run these resources together,
no matter how many resources can run".

> I tried adding them to a group but got a syntax error from pcs saying that
> I wasn't allowed to add cloned resources to a group.
> 
> If anyone is interested, it started from this example:
> https://edmondcck.medium.com/setup-a-highly-available-nfs-cluster-with-disk-encryption-using-luks-drbd-corosync-and-pacemaker-a96a5bdffcf8
> There's a DRBD partition that gets mounted onto a local directory.  The
> local directory is then mounted onto an exported directory (mount --bind).
>  Then the nfs service (samba too) get started and finally the VIP.
> 
> Please note that while I have 3 DRBD resources currently, that number may
> increase after the initial configuration is performed.
> 
> I would just like to know a mechanism to make sure all the DRBD resources
> are colocated.  Any suggestions welcome.
> 
> [root@nas00 ansible]# pcs resource
>   * Clone Set: drbdShare-clone [drbdShare] (promotable):
> * Masters: [ nas00 ]
> * Slaves: [ nas01 ]
>   * Clone Set: drbdShareRead-clone [drbdShareRead] (promotable):
> * Masters: [ nas00 ]
> * Slaves: [ nas01 ]
>   * Clone Set: drbdShareWrite-clone [drbdShareWrite] (promotable):
> * Masters: [ nas00 ]
> * Slaves: [ nas01 ]
>   * localShare(ocf::heartbeat:Filesystem): Started nas00
>   * localShareRead(ocf::heartbeat:Filesystem): Started nas00
>   * localShareWrite   (ocf::heartbeat:Filesystem): Started nas00
>   * nfsShare  (ocf::heartbeat:Filesystem): Started nas00
>   * nfsShareRead  (ocf::heartbeat:Filesystem): Started nas00
>   * nfsShareWrite (ocf::heartbeat:Filesystem): Started nas00
>   * nfsService  (systemd:nfs-server):Started nas00
>   * smbService  (systemd:smb):   Started nas00
>   * vipN  (ocf::heartbeat:IPaddr2):Started nas00
> 
> [root@nas00 ansible]# pcs constraint show --all
> Location Constraints:
> Ordering Constraints:
>   promote drbdShare-clone then start localShare (kind:Mandatory)
>   promote drbdShareRead-clone then start localShareRead (kind:Mandatory)
>   promote drbdShareWrite-clone then start localShareWrite (kind:Mandatory)
>   start localShare then start nfsShare (kind:Mandatory)
>   start localShareRead then start nfsShareRead (kind:Mandatory)
>   start localShareWrite then start nfsShareWrite (kind:Mandatory)
>   start nfsShare then start nfsService (kind:Mandatory)
>   start nfsShareRead then start nfsService (kind:Mandatory)
>   start nfsShareWrite then start nfsService (kind:Mandatory)
>   start nfsService then start smbService (kind:Mandatory)
>   start nfsService then start vipN (kind:Mandatory)
> Colocation Constraints:
>   localShare with drbdShare-clone (score:INFINITY) (with-rsc-role:Master)
>   localShareRead with drbdShareRead-clone (score:INFINITY)
> (with-rsc-role:Master)
>   localShareWrite with drbdShareWrite-clone (score:INFINITY)
> (with-rsc-role:Master)
>   nfsShare with localShare (score:INFINITY)
>   nfsShareRead with localShareRead (score:INFINITY)
>   nfsShareWrite with localShareWrite (score:INFINITY)
>   nfsService with nfsShare (score:INFINITY)
>   nfsService with nfsShareRead (score:INFINITY)
>   nfsService with nfsShareWrite (score:INFINITY)
>   smbService with nfsShare (score:INFINITY)
>   smbService with nfsShareRead (score:INFINITY)
>   smbService with nfsShareWrite (score:INFINITY)
>   vipN with nfsService (score:INFINITY)
> Ticket Constraints:
> 
> Thank you for your time and attention.
> 
> -John
> 
> 
> ___
> Manage your subscription:
> https://lists.clusterlabs.org/mailman/listinfo/users
> 
> ClusterLabs home: https://www.clusterlabs.org/

___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] Filesystem resource agent w/ filesystem attribute 'noauto'

2022-03-09 Thread Andrei Borzenkov
On 09.03.2022 18:45, Asseel Sidique wrote:
> Hi Team,
> 
> My question is regarding the filesystem resource agent. In the filesystem 
> resource 
> agent
>  , there is a comment that states:
> 
> # Do not put this filesystem in /etc/fstab. This script manages all of
> # that for you.
> 
> From what I understand, this means that if you are using a filesystem that is 
> being managed by Pacemaker, you should remove it from /etc/fstab to avoid the 
> OS from also trying to automatically mount the filesystem.
> 
> Would it still be okay to add an entry for a filesystem in /etc/fstab if the 
> 'noauto' option is specified? (This will disable the automatic mount for the 
> filesystem).

Why? What exactly are you trying to achieve?

> 
> I did do a test with the 'noauto' specified for a filesystem that was managed 
> by Pacemaker and did not face any issues - but I still wanted to confirm if 
> this was okay.
> 

> Best,
> Asseel
> 
> 
> ___
> Manage your subscription:
> https://lists.clusterlabs.org/mailman/listinfo/users
> 
> ClusterLabs home: https://www.clusterlabs.org/

___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] Q: fence_kdump and fence_kdump_send

2022-02-25 Thread Andrei Borzenkov
On Fri, Feb 25, 2022 at 2:23 PM Reid Wahl  wrote:
>
> On Fri, Feb 25, 2022 at 3:22 AM Reid Wahl  wrote:
> >
...
> > >
> > > So what happens most likely is that the watchdog terminates the kdump.
> > > In that case all the mess with fence_kdump won't help, right?
> >
> > You can configure extra_modules in your /etc/kdump.conf file to
> > include the watchdog module, and then restart kdump.service. For
> > example:
> >
> > # grep ^extra_modules /etc/kdump.conf
> > extra_modules i6300esb
> >
> > If you're not sure of the name of your watchdog module, wdctl can help
> > you find it. sbd needs to be stopped first, because it keeps the
> > watchdog device timer busy.
> >
> > # pcs cluster stop --all
> > # wdctl | grep Identity
> > Identity:  i6300ESB timer [version 0]
> > # lsmod | grep -i i6300ESB
> > i6300esb   13566  0
> >
> >
> > If you're also using fence_sbd (poison-pill fencing via block device),
> > then you should be able to protect yourself from that during a dump by
> > configuring fencing levels so that fence_kdump is level 1 and
> > fence_sbd is level 2.
>
> RHKB, for anyone interested:
>   - sbd watchdog timeout causes node to reboot during crash kernel
> execution (https://access.redhat.com/solutions/3552201)

What is not clear from this KB (and the quotes from it above) is what
instance updates the watchdog. Quoting (emphasis mine):

--><--
With the module loaded, the timer *CAN* be updated so that it does not
expire and force a reboot in the middle of vmcore generation.
--><--

Sure it can, but what program exactly updates the watchdog during
kdump execution? I am pretty sure that sbd does not run at this point.
___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] Booth ticket multi-site and quorum /Pacemaker

2022-02-24 Thread Andrei Borzenkov
On Thu, Feb 24, 2022 at 1:17 PM Jan Friesse  wrote:
>
> On 24/02/2022 10:28, Viet Nguyen wrote:
> > Hi,
> >
> > Thank you so so much for your help. May i ask a following up question:
> >
> > For the option of having one big cluster with 4 nodes without booth, then,
> > if one site (having 2 nodes) is down, then the other site does not work as
> > it does not have quorum, am I right? Even if we have a quorum voter in
>
> Yup, you are right
>
> > either site A or B, then, if the site with quorum down, then, the other
> > site does not work.  So, how can we avoid this situation as I want
> > that if one site is down, the other site still services?
>
> probably only with qnetd - so basically yet again site C.
>

The problem with a multisite cluster is not quorum (which is not
actually needed to run Pacemaker) but fencing. One site cannot take
over resources until the other site is fenced, and if the other site is
completely down, fencing does not work.

So qnetd does not really help here (except for suicide self-fencing).
If network conditions allow, it is better to have an iSCSI target on the
third site and use SBD with disk heartbeat. Self-fencing requires SBD
anyway.
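
A rough sketch of that setup (the device path is a placeholder for the
iSCSI-backed LUN exported from the third site and visible on all cluster
nodes). Initialize the SBD header once, from any node:

# sbd -d /dev/disk/by-id/scsi-SBD_LUN create

Then in /etc/sysconfig/sbd on every node:

SBD_DEVICE="/dev/disk/by-id/scsi-SBD_LUN"
SBD_WATCHDOG_DEV="/dev/watchdog"

and enable sbd plus a poison-pill fencing resource:

# systemctl enable sbd
# pcs stonith create fence-sbd fence_sbd devices=/dev/disk/by-id/scsi-SBD_LUN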
___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] crm resource stop VirtualDomain - but VirtualDomain shutdown start some minutes later

2022-02-16 Thread Andrei Borzenkov
On 16.02.2022 20:48, Andrei Borzenkov wrote:
> 
> I guess the real question here is why "Transition aborted" is logged although
> the transition apparently continues. Transition 128 started at 20:54:30 and
> completed at 21:04:26, but there were multiple "Transition 128 aborted"
> messages in between (unfortunately one now needs to hunt for another mail to
> put them together).
> 
> It looks like "Transition aborted" really means "we will try to abort this
> transition if possible". My guess is that Pacemaker must wait for the
> currently running action(s), which can take quite some time when stopping a
> virtual domain. Transition 128 initiated the stop of vm_pathway, but we have
> no idea when it actually finished stopping.
> 

Yes, when the code logs "Transition aborted", nothing is really aborted. It
just tells Pacemaker not to start any further actions that are part of this
transition. But as far as I can tell it does not affect currently running
actions.
___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] crm resource stop VirtualDomain - but VirtualDomain shutdown start some minutes later

2022-02-16 Thread Andrei Borzenkov
On 16.02.2022 14:35, Lentes, Bernd wrote:
> 
> 
> - On Feb 16, 2022, at 12:52 AM, kgaillot kgail...@redhat.com wrote:
> 
> 
>>> Any idea?
>>> What about that transition 128, which was aborted?
>>
>> A transition is the set of actions that need to be taken in response to
>> current conditions. A transition is aborted any time conditions change
>> (here, the target-role being changed in the configuration), so that a
>> new set of actions can be calculated.
>>
>> Someone once defined a transition as an "action plan", and I'm tempted
>> to use that instead. Plus maybe replace "aborted" with "interrupted",
>> so then we'd have "Action plan interrupted" which is maybe a little
>> more understandable.
>>
>>>
>>> Transition 128 is finished:
>>> Feb 15 21:04:26 [15370] ha-idg-2   crmd:   notice:
>>> run_graph:   Transition 128 (Complete=1, Pending=0, Fired=0,
>>> Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-
>>> 3548.bz2): Complete
>>>
>>> And one second later the shutdown starts. Is that normal that there
>>> is such a big time gap ?
>>>
>>
>> No, there should be another transition calculated (with a "saving
>> input" message) immediately after the original transition is aborted.
>> What's the timestamp on that?
>> --
> 
> Hi Ken,
> 
> this is what i found:
> 
> Feb 15 20:54:30 [15369] ha-idg-2pengine:   notice: process_pe_message:
>   Calculated transition 128, saving inputs in 
> /var/lib/pacemaker/pengine/pe-input-3548.bz2
> Feb 15 20:54:30 [15370] ha-idg-2   crmd: info: do_state_transition:   
>   State transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE | 
> input=I_PE_SUCCESS cause=C_IPC_MESSAGE origin=handle_response
> Feb 15 20:54:30 [15370] ha-idg-2   crmd:   notice: do_te_invoke:
> Processing graph 128 (ref=pe_calc-dc-1644954870-403) derived from 
> /var/lib/pacemaker/pengine/pe-input-3548.bz2
> Feb 15 20:54:30 [15370] ha-idg-2   crmd:   notice: te_rsc_command:  
> Initiating stop operation vm_pathway_stop_0 locally on ha-idg-2 | action 76
> 
> Feb 15 21:04:26 [15369] ha-idg-2pengine:   notice: process_pe_message:
>   Calculated transition 129, saving inputs in 
> /var/lib/pacemaker/pengine/pe-input-3549.bz2
> Feb 15 21:04:26 [15370] ha-idg-2   crmd: info: do_state_transition:   
>   State transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE | 
> input=I_PE_SUCCESS cause=C_IPC_MESSAGE origin=handle_response
> Feb 15 21:04:26 [15370] ha-idg-2   crmd:   notice: do_te_invoke:
> Processing graph 129 (ref=pe_calc-dc-1644955466-405) derived from 
> /var/lib/pacemaker/pengine/pe-input-3549.bz2
> 


Splitting logs across different messages does not really help in interpreting them.

I guess the real question here is why "Transition aborted" is logged although
the transition apparently continues. Transition 128 started at 20:54:30 and
completed at 21:04:26, but there were multiple "Transition 128 aborted"
messages in between (unfortunately one now needs to hunt for another mail to
put them together).

It looks like "Transition aborted" really means "we will try to abort this
transition if possible". My guess is that Pacemaker must wait for the
currently running action(s), which can take quite some time when stopping a
virtual domain. Transition 128 initiated the stop of vm_pathway, but we have
no idea when it actually finished stopping.
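
If you want to see exactly which actions a given transition contained, the
saved input can be replayed, for example (path taken from the log messages
above):

# crm_simulate -S -x /var/lib/pacemaker/pengine/pe-input-3548.bz2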

___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] Antw: Antw: [EXT] Re: heads up: Possible VM data corruption upgrading to SLES15 SP3

2022-01-28 Thread Andrei Borzenkov
On Fri, Jan 28, 2022 at 11:00 AM Ulrich Windl
 wrote:
>
> >>> "Ulrich Windl"  schrieb am 28.01.2022
> um
> 08:51 in Nachricht <61f3a06602a100047...@gwsmtp.uni-regensburg.de>:
> >>>> Andrei Borzenkov  schrieb am 28.01.2022 um 06:38 in
> > Nachricht
> > :
> >> On Thu, Jan 27, 2022 at 5:10 PM Ulrich Windl
> >>  wrote:
> >>>
> >>> Any better ideas anyone?
> >>>
> >>
> >> Perform online upgrade. Any reason you need to do an offline upgrade
> >> in the first place?
> >
> > More than 25 years of experience with it ;-)
>
> And I forgot: if you have no Internet connection, or only a slow one, that's
> a nice method.
> It also avoids downloading the same packages when updating multiple
> servers.
>

That is what RMT is for. Or download the Full image and define it as a
local repository (you can export it to all your servers via
HTTP/FTP/NFS/...). There is more than one way to skin a cat.
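
For example (the ISO path and repository name are just placeholders):

# zypper addrepo "iso:/?iso=/srv/iso/SLE-15-SP3-Full-x86_64-Media1.iso" sle15sp3-full
# zypper refresh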

Anyway, this is off topic here. File an SLE bug report or raise a service
request with your SLE support.
___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] heads up: Possible VM data corruption upgrading to SLES15 SP3

2022-01-27 Thread Andrei Borzenkov
On Thu, Jan 27, 2022 at 5:10 PM Ulrich Windl
 wrote:
>
> Any better ideas anyone?
>

Perform online upgrade. Any reason you need to do an offline upgrade
in the first place?
___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] Question: Mount Monitoring for Non-shared File-system

2021-12-07 Thread Andrei Borzenkov
On 07.12.2021 21:35, Asseel Sidique wrote:
> Hi Everyone,
> I'm looking for some insight on the best way to configure mount
> monitoring for a cloned database resource.
> Consider the resource model below:
> * Clone Set: database_1-clone [database_1] (promotable):
>  * Masters: [ node-1 ]
>  * Slaves: [ node-2 ]
>   * db2fs_node-1_Filesystem (ocf::heartbeat:Filesystem): Started node-1
>   * db2fs_node-2_Filesystem (ocf::heartbeat:Filesystem): Started node-2
> 
> The database will exist on two hosts, and each database clone depends on
> the filesystem running on its corresponding host. The filesystem is not
> shared, so I've created the filesystems as separate resources
> (db2fs_node-1_Filesystem and db2fs_node-2_Filesystem).
> The behavior I'm attempting to create is:
> The cloned resource will only be able to start if the Filesystem on the
> same host is already started.
> i.e. if the filesystem on node-1 is down, the clone database resource on
> node-1 will be stopped, and if the filesystem on node-2 is up, the clone
> database resource on node-2 will be running.
> I've tried to set constraints for the cloned resource a few different ways:
> 1. Order constraints
> order fileSystem-node-1-then-db Mandatory: db2fs_node-1_Filesystem 
> db2_regress1_regress1_SAMPLE-clone
> order fileSystem-node-2-then-db Mandatory: db2fs_node-2_Filesystem 
> db2_regress1_regress1_SAMPLE-clone

The resource name in these constraints does not match the clone name you
showed earlier.

> Using these constraints, the clone database resource will stop if either 
> filesystem resource is stopped.

Try setting interleave=true for the clone.
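
For example, roughly (use whichever clone name is actually configured):

# pcs resource meta database_1-clone interleave=true

or, with crmsh:

# crm resource meta database_1-clone set interleave true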

> 2. Colocation constraints
> colocation order-rule-Filesystem-node-1-then-db inf: 
> db2_regress1_regress1_SAMPLE-clone:Master db2fs_node-1_Filesystem
> colocation order-rule-Filesystem-node-2-then-db inf: 
> db2_regress1_regress1_SAMPLE-clone:Slave db2fs_node-2_Filesystem
> With these colocation constraints, if one or both filesystem resources are
> stopped, the clone database resource looks like this:
> * Clone Set: database_1-clone [database_1] (promotable):
>  * Slaves: [ node-1 node-2 ]
> 
> Is there any advice on the best practice to achieve this behaviour?
> Best,
> Asseel
> 
> 
> 
> ___
> Manage your subscription:
> https://lists.clusterlabs.org/mailman/listinfo/users
> 
> ClusterLabs home: https://www.clusterlabs.org/
> 

___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/

