Re: [ClusterLabs] ClusterMon resource creation getting illegal option -- E in ClusterMon

2023-04-12 Thread Ken Gaillot
ClusterMon with -E has been superseded by Pacemaker's built-in alerts
functionality:

https://clusterlabs.org/pacemaker/doc/2.1/Pacemaker_Explained/singlehtml/index.html#document-alerts
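
For reference, a roughly equivalent setup via the alerts interface could look
like this (untested sketch; the alert id "pcsesa_alert" is made up, and the
script would need to read the CRM_alert_* environment variables that Pacemaker
sets for alert agents rather than take command-line options):

  # register the existing script as an alert handler
  pcs alert create path=/tmp/tools/PCSESA.sh id=pcsesa_alert
  # optionally attach a recipient value that is passed to the script
  pcs alert recipient add pcsesa_alert value=snmp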

On Wed, 2023-04-12 at 12:03 +, S Sathish S via Users wrote:
> Hi Team,
>  
> While creating a ClusterMon resource agent in ClusterLabs High
> Availability, we get "illegal option -- E" from ClusterMon.
>  
> [root@node1 tmp]# pcs resource create SNMP_test
> ocf:pacemaker:ClusterMon  extra_options="-E /tmp/tools/PCSESA.sh"
> Error: Validation result from agent (use --force to override):
>   /usr/lib/ocf/resource.d/pacemaker/ClusterMon: illegal option -- E
>   Apr 12 13:36:47 ERROR: Invalid options -E /tmp/tools/PCSESA.sh!
> Error: Errors have occurred, therefore pcs is unable to continue
> [root@node1 tmp]#
>  
> Following the above error, we used the --force option and the resource is
> now created, but we still get this error on the system. The ClusterMon
> resource functionality is working as expected. We need to understand
> whether the error below has any impact, and how to rectify the illegal
> option on ClusterMon.
>  
> [root@node1 tmp]# pcs resource create SNMP_test
> ocf:pacemaker:ClusterMon  extra_options="-E /tmp/tools/PCSESA.sh" --
> force
> Warning: Validation result from agent:
>   /usr/lib/ocf/resource.d/pacemaker/ClusterMon: illegal option -- E
>   Apr 12 13:49:43 ERROR: Invalid options -E /tmp/tools/PCSESA.sh!
> [root@node1 tmp]#
>  
> Please find the Clusterlab RPM version used:
> pacemaker-cluster-libs-2.1.4-1.2.1.4.git.el8.x86_64
> resource-agents-4.11.0-1.el8.x86_64
> pacemaker-cli-2.1.4-1.2.1.4.git.el8.x86_64
> pcs-0.10.14-1.el8.x86_64
> corosynclib-3.1.7-1.el8.x86_64
> corosync-3.1.7-1.el8.x86_64
> pacemaker-2.1.4-1.2.1.4.git.el8.x86_64
> pacemaker-libs-2.1.4-1.2.1.4.git.el8.x86_64
> pacemaker-schemas-2.1.4-1.2.1.4.git.el8.noarch
>  
> Thanks and Regards,
> S Sathish S
-- 
Ken Gaillot 



[ClusterLabs] HA problem: No live migration when setting node on standby

2023-04-12 Thread Philip Schiller

Here is also some additional information for a failover when setting the node
to standby.

Apr 12 12:40:28 s1 pacemaker-controld[1611990]:  notice: State transition S_IDLE -> S_POLICY_ENGINE
Apr 12 12:40:28 s1 pacemaker-schedulerd[1611989]:  notice: On loss of quorum: Ignore
Apr 12 12:40:28 s1 pacemaker-schedulerd[1611989]:  notice: Actions: Stop  sto-ipmi-s0 ( s1 )  due to node availability
Apr 12 12:40:28 s1 pacemaker-schedulerd[1611989]:  notice: Actions: Stop  pri-zfs-drbd_storage:0 ( s1 )  due to node availability
Apr 12 12:40:28 s1 pacemaker-schedulerd[1611989]:  notice: Actions: Stop  pri-drbd-pluto:0 ( Promoted s1 )  due to node availability
Apr 12 12:40:28 s1 pacemaker-schedulerd[1611989]:  notice: Actions: Stop  pri-drbd-poserver:0 ( Promoted s1 )  due to node availability
Apr 12 12:40:28 s1 pacemaker-schedulerd[1611989]:  notice: Actions: Stop  pri-drbd-webserver:0 ( Promoted s1 )  due to node availability
Apr 12 12:40:28 s1 pacemaker-schedulerd[1611989]:  notice: Actions: Stop  pri-drbd-dhcp:0 ( Promoted s1 )  due to node availability
Apr 12 12:40:28 s1 pacemaker-schedulerd[1611989]:  notice: Actions: Stop  pri-drbd-wawi:0 ( Promoted s1 )  due to node availability
Apr 12 12:40:28 s1 pacemaker-schedulerd[1611989]:  notice: Actions: Stop  pri-drbd-wawius:0 ( Promoted s1 )  due to node availability
Apr 12 12:40:28 s1 pacemaker-schedulerd[1611989]:  notice: Actions: Stop  pri-drbd-saturn:0 ( Promoted s1 )  due to node availability
Apr 12 12:40:28 s1 pacemaker-schedulerd[1611989]:  notice: Actions: Stop  pri-drbd-openvpn:0 ( Promoted s1 )  due to node availability
Apr 12 12:40:28 s1 pacemaker-schedulerd[1611989]:  notice: Actions: Stop  pri-drbd-asterisk:0 ( Promoted s1 )  due to node availability
Apr 12 12:40:28 s1 pacemaker-schedulerd[1611989]:  notice: Actions: Stop  pri-drbd-alarmanlage:0 ( Promoted s1 )  due to node availability
Apr 12 12:40:28 s1 pacemaker-schedulerd[1611989]:  notice: Actions: Stop  pri-drbd-jabber:0 ( Promoted s1 )  due to node availability
Apr 12 12:40:28 s1 pacemaker-schedulerd[1611989]:  notice: Actions: Stop  pri-drbd-TESTOPTIXXX:0 ( Promoted s1 )  due to node availability
Apr 12 12:40:28 s1 pacemaker-schedulerd[1611989]:  notice: Actions: Move  pri-vm-jabber ( s1 -> s0 )  due to unrunnable mas-drbd-jabber demote
Apr 12 12:40:28 s1 pacemaker-schedulerd[1611989]:  notice: Actions: Move  pri-vm-alarmanlage ( s1 -> s0 )  due to unrunnable mas-drbd-alarmanlage demote
Apr 12 12:40:28 s1 pacemaker-schedulerd[1611989]:  notice: Calculated transition 5, saving inputs in /var/lib/pacemaker/pengine/pe-input-2478.bz2
Apr 12 12:40:28 s1 pacemaker-controld[1611990]:  notice: Initiating stop operation sto-ipmi-s0_stop_0 locally on s1
Apr 12 12:40:28 s1 pacemaker-controld[1611990]:  notice: Requesting local execution of stop operation for sto-ipmi-s0 on s1
Apr 12 12:40:28 s1 pacemaker-controld[1611990]:  notice: Initiating stop operation pri-vm-jabber_stop_0 locally on s1
Apr 12 12:40:28 s1 pacemaker-controld[1611990]:  notice: Requesting local execution of stop operation for pri-vm-jabber on s1
Apr 12 12:40:28 s1 pacemaker-controld[1611990]:  notice: Initiating stop operation pri-vm-alarmanlage_stop_0 locally on s1
Apr 12 12:40:28 s1 pacemaker-controld[1611990]:  notice: Requesting local execution of stop operation for pri-vm-alarmanlage on s1
Apr 12 12:40:28 s1 pacemaker-controld[1611990]:  notice: Result of stop operation for sto-ipmi-s0 on s1: ok

Any help on this topic is very welcome, as I have not been able to find a
solution for this behaviour.

Kind regards Philip.


[ClusterLabs] ClusterMon resource creation getting illegal option -- E in ClusterMon

2023-04-12 Thread S Sathish S via Users
Hi Team,

While creating a ClusterMon resource agent in ClusterLabs High Availability, we
get "illegal option -- E" from ClusterMon.

[root@node1 tmp]# pcs resource create SNMP_test ocf:pacemaker:ClusterMon  
extra_options="-E /tmp/tools/PCSESA.sh"
Error: Validation result from agent (use --force to override):
  /usr/lib/ocf/resource.d/pacemaker/ClusterMon: illegal option -- E
  Apr 12 13:36:47 ERROR: Invalid options -E /tmp/tools/PCSESA.sh!
Error: Errors have occurred, therefore pcs is unable to continue
[root@node1 tmp]#

Following the above error, we used the --force option and the resource is now
created, but we still get this error on the system. The ClusterMon resource
functionality is working as expected. We need to understand whether the error
below has any impact, and how to rectify the illegal option on ClusterMon.

[root@node1 tmp]# pcs resource create SNMP_test ocf:pacemaker:ClusterMon  
extra_options="-E /tmp/tools/PCSESA.sh" --force
Warning: Validation result from agent:
  /usr/lib/ocf/resource.d/pacemaker/ClusterMon: illegal option -- E
  Apr 12 13:49:43 ERROR: Invalid options -E /tmp/tools/PCSESA.sh!
[root@node1 tmp]#

Please find the Clusterlab RPM version used:
pacemaker-cluster-libs-2.1.4-1.2.1.4.git.el8.x86_64
resource-agents-4.11.0-1.el8.x86_64
pacemaker-cli-2.1.4-1.2.1.4.git.el8.x86_64
pcs-0.10.14-1.el8.x86_64
corosynclib-3.1.7-1.el8.x86_64
corosync-3.1.7-1.el8.x86_64
pacemaker-2.1.4-1.2.1.4.git.el8.x86_64
pacemaker-libs-2.1.4-1.2.1.4.git.el8.x86_64
pacemaker-schemas-2.1.4-1.2.1.4.git.el8.noarch

Thanks and Regards,
S Sathish S


[ClusterLabs] HA problem: No live migration when setting node on standby

2023-04-12 Thread Philip Schiller

Thank you for the reply.

I added the following line to crm configure:

colocation colo_pri-vm-alarmanlage-drbd-jabber_master inf: 
pri-vm-jabber:Started mas-drbd-jabber:Master

This doesn't change anything though. Here is some output of crm_mon:

Cluster Summary:
  * Stack: corosync
  * Current DC: s0 (version 2.1.2-ada5c3b36e2) - partition with quorum
  * Last updated: Wed Apr 12 11:39:12 2023
  * Last change:  Wed Apr 12 11:38:56 2023 by root via crm_attribute on s0
  * 2 nodes configured
  * 30 resource instances configured

Node List:
  * Online: [ s0 s1 ]

Full List of Resources:
  * sto-ipmi-s0    (stonith:external/ipmi):     Started s1
  * sto-ipmi-s1    (stonith:external/ipmi):     Started s0
  * Clone Set: clo-pri-zfs-drbd_storage [pri-zfs-drbd_storage]:
    * Started: [ s0 s1 ]
  * Clone Set: mas-drbd-pluto [pri-drbd-pluto] (promotable):
    * Promoted: [ s0 s1 ]
  * Clone Set: mas-drbd-poserver [pri-drbd-poserver] (promotable):
    * Promoted: [ s0 s1 ]
  * Clone Set: mas-drbd-webserver [pri-drbd-webserver] (promotable):
    * Promoted: [ s0 s1 ]
  * Clone Set: mas-drbd-dhcp [pri-drbd-dhcp] (promotable):
    * Promoted: [ s0 s1 ]
  * Clone Set: mas-drbd-wawi [pri-drbd-wawi] (promotable):
    * Promoted: [ s0 s1 ]
  * Clone Set: mas-drbd-wawius [pri-drbd-wawius] (promotable):
    * Promoted: [ s0 s1 ]
  * Clone Set: mas-drbd-saturn [pri-drbd-saturn] (promotable):
    * Promoted: [ s0 s1 ]
  * Clone Set: mas-drbd-openvpn [pri-drbd-openvpn] (promotable):
    * Promoted: [ s0 s1 ]
  * Clone Set: mas-drbd-asterisk [pri-drbd-asterisk] (promotable):
    * Promoted: [ s0 s1 ]
  * Clone Set: mas-drbd-alarmanlage [pri-drbd-alarmanlage] (promotable):
    * Promoted: [ s0 s1 ]
  * Clone Set: mas-drbd-jabber [pri-drbd-jabber] (promotable):
    * Promoted: [ s0 s1 ]
  * Clone Set: mas-drbd-TESTOPTIXXX [pri-drbd-TESTOPTIXXX] (promotable):
    * Promoted: [ s0 s1 ]
  * pri-vm-jabber    (ocf:heartbeat:VirtualDomain):     Started s1
  * pri-vm-alarmanlage    (ocf:heartbeat:VirtualDomain):     Started s1

The weird thing is that if I put the node back online, the VM migrates back
perfectly. Do I need more constraints? If you need any more information about
the setup/configs, I am happy to provide it.

Philip.


Re: [ClusterLabs] HA problem: No live migration when setting node on standby

2023-04-12 Thread Vladislav Bogdanov
On Wed, 2023-04-12 at 14:04 +0300, Andrei Borzenkov wrote:
> On Wed, Apr 12, 2023 at 1:21 PM Vladislav Bogdanov  ok.com> wrote:
> > 
> > Hi,
> > 
> > Just add a Master role for drbd resource in the colocation. Default
> > is Started (or Slave).
> > 
> 
> Could you elaborate why it is needed? The problem is not leaving the
> resource on the node with a demoted instance - when the node goes into
> standby, all resources must be evacuated from it anyway. How does
> collocating the VM with the master change that?

Just experience. Having constraints that are inconsistent with each other
touches many corner cases in the code, especially in such extreme
circumstances as a node going to standby, which usually involves
several transitions.

For me that is just a rule of thumb:
colocate VM:Started with drbd:Master
order drbd:promote then VM:start
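
In crm shell syntax, applied to the resources from the original post, that
would look roughly like this (a sketch; the constraint ids are made up, and
the order constraint matches the one Philip already has):

colocation colo_vm-alarmanlage-with-drbd-master inf: pri-vm-alarmanlage:Started mas-drbd-alarmanlage:Master
order ord_drbd-promote-then-vm-start Mandatory: mas-drbd-alarmanlage:promote pri-vm-alarmanlage:start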



> 
> > 
> > Philip Schiller wrote on 12 April 2023 at 11:28:57:
> > > 
> > > 
> > > 
> > > Hi All,
> > > 
> > > I am using a simple two-node cluster with Zvol -> DRBD -> Virsh in
> > > primary/primary mode (necessary for live migration). My configuration:
> > > 
> > > primitive pri-vm-alarmanlage VirtualDomain \
> > >     params config="/etc/libvirt/qemu/alarmanlage.xml" hypervisor="qemu:///system" migration_transport=ssh \
> > >     meta allow-migrate=true target-role=Started is-managed=true \
> > >     op monitor interval=0 timeout=120 \
> > >     op start interval=0 timeout=120 \
> > >     op stop interval=0 timeout=1800 \
> > >     op migrate_to interval=0 timeout=1800 \
> > >     op migrate_from interval=0 timeout=1800 \
> > >     utilization cpu=2 hv_memory=4096
> > > ms mas-drbd-alarmanlage pri-drbd-alarmanlage \
> > >     meta clone-max=2 promoted-max=2 notify=true promoted-node-max=1 clone-node-max=1 interleave=true target-role=Started is-managed=true
> > > colocation colo_mas_drbd_alarmanlage_with_clo_pri_zfs_drbd-storage inf: mas-drbd-alarmanlage clo-pri-zfs-drbd_storage
> > > location location-pri-vm-alarmanlage-s0-200 pri-vm-alarmanlage 200: s1
> > > order ord_pri-alarmanlage-after-mas-drbd-alarmanlage Mandatory: mas-drbd-alarmanlage:promote pri-vm-alarmanlage:start
> > > 
> > > So to summarize:
> > > - A resource for Virsh
> > > - A Master/Slave DRBD resource for the VM filesystem.
> > > - An "order" directive to start the VM after DRBD has been promoted.
> > > 
> > > Node startup is ok, the VM is started after DRBD is promoted.
> > > Migration with virsh or over crm works fine.
> > > 
> > > Node standby is problematic. Assuming the Virsh VM runs on node s1:
> > > 
> > > When putting node s1 into standby while node s0 is active, a live
> > > migration is started, BUT in the same second, pacemaker tries to
> > > demote DRBD volumes on s1 (while live migration is in progress).
> > > 
> > > All this results in stopping the VM on s1 and starting the VM on s0.
> > > 
> > > I do not understand why pacemaker demotes/stops DRBD volumes
> > > before the VM is migrated.
> > > Do I need additional constraints?
> > > 
> > > Setup is done with
> > > - Corosync Cluster Engine, version '3.1.6'
> > > - Pacemaker 2.1.2
> > > - Ubuntu 22.04.2 LTS
> > > 
> > > Thanks for your help,
> > > 
> > > with kind regards Philip
> > > 



Re: [ClusterLabs] HA problem: No live migration when setting node on standby

2023-04-12 Thread Andrei Borzenkov
On Wed, Apr 12, 2023 at 1:21 PM Vladislav Bogdanov  wrote:
>
> Hi,
>
> Just add a Master role for drbd resource in the colocation. Default is 
> Started (or Slave).
>

Could you elaborate why it is needed? The problem is not leaving the
resource on the node with a demoted instance - when the node goes into
standby, all resources must be evacuated from it anyway. How does
collocating the VM with the master change that?

>
> Philip Schiller wrote on 12 April 2023 at 11:28:57:
>>
>> 
>>
>> Hi All,
>>
>> I am using a simple two-node cluster with Zvol -> DRBD -> Virsh in
>> primary/primary mode (necessary for live migration). My configuration:
>>
>> primitive pri-vm-alarmanlage VirtualDomain \
>> params config="/etc/libvirt/qemu/alarmanlage.xml" 
>> hypervisor="qemu:///system" migration_transport=ssh \
>> meta allow-migrate=true target-role=Started is-managed=true \
>> op monitor interval=0 timeout=120 \
>> op start interval=0 timeout=120 \
>> op stop interval=0 timeout=1800 \
>> op migrate_to interval=0 timeout=1800 \
>> op migrate_from interval=0 timeout=1800 \
>> utilization cpu=2 hv_memory=4096
>> ms mas-drbd-alarmanlage pri-drbd-alarmanlage \
>> meta clone-max=2 promoted-max=2 notify=true promoted-node-max=1 
>> clone-node-max=1 interleave=true target-role=Started is-managed=true
>> colocation colo_mas_drbd_alarmanlage_with_clo_pri_zfs_drbd-storage inf: 
>> mas-drbd-alarmanlage clo-pri-zfs-drbd_storage
>> location location-pri-vm-alarmanlage-s0-200 pri-vm-alarmanlage 200: s1
>> order ord_pri-alarmanlage-after-mas-drbd-alarmanlage Mandatory: 
>> mas-drbd-alarmanlage:promote pri-vm-alarmanlage:start
>>
>> So to summarize:
>> - A resource for Virsh
>> - A Master/Slave DRBD resource for the VM filesystem.
>> - An "order" directive to start the VM after DRBD has been promoted.
>>
>> Node startup is ok, the VM is started after DRBD is promoted.
>> Migration with virsh or over crm works fine.
>>
>> Node standby is problematic. Assuming the Virsh VM runs on node s1:
>>
>> When putting node s1 into standby while node s0 is active, a live migration
>> is started, BUT in the same second, pacemaker tries to demote DRBD
>> volumes on s1 (while live migration is in progress).
>>
>> All this results in stopping the VM on s1 and starting the VM on s0.
>>
>> I do not understand why pacemaker demotes/stops DRBD volumes before the VM
>> is migrated.
>> Do I need additional constraints?
>>
>> Setup is done with
>> - Corosync Cluster Engine, version '3.1.6'
>> - Pacemaker 2.1.2
>> - Ubuntu 22.04.2 LTS
>>
>> Thanks for your help,
>>
>> with kind regards Philip
>>


Re: [ClusterLabs] HA problem: No live migration when setting node on standby

2023-04-12 Thread Vladislav Bogdanov

Hi,

Just add a Master role for drbd resource in the colocation. Default is 
Started (or Slave).



Philip Schiller wrote on 12 April 2023 at 11:28:57:
Hi All,

I am using a simple two-node cluster with Zvol -> DRBD -> Virsh in
primary/primary mode (necessary for live migration). My configuration:

primitive pri-vm-alarmanlage VirtualDomain \
    params config="/etc/libvirt/qemu/alarmanlage.xml" hypervisor="qemu:///system" migration_transport=ssh \
    meta allow-migrate=true target-role=Started is-managed=true \
    op monitor interval=0 timeout=120 \
    op start interval=0 timeout=120 \
    op stop interval=0 timeout=1800 \
    op migrate_to interval=0 timeout=1800 \
    op migrate_from interval=0 timeout=1800 \
    utilization cpu=2 hv_memory=4096
ms mas-drbd-alarmanlage pri-drbd-alarmanlage \
    meta clone-max=2 promoted-max=2 notify=true promoted-node-max=1 clone-node-max=1 interleave=true target-role=Started is-managed=true
colocation colo_mas_drbd_alarmanlage_with_clo_pri_zfs_drbd-storage inf: mas-drbd-alarmanlage clo-pri-zfs-drbd_storage
location location-pri-vm-alarmanlage-s0-200 pri-vm-alarmanlage 200: s1
order ord_pri-alarmanlage-after-mas-drbd-alarmanlage Mandatory: mas-drbd-alarmanlage:promote pri-vm-alarmanlage:start

So to summarize:
- A resource for Virsh
- A Master/Slave DRBD resource for the VM filesystem.
- An "order" directive to start the VM after DRBD has been promoted.

Node startup is ok, the VM is started after DRBD is promoted.
Migration with virsh or over crm works fine.

Node standby is problematic. Assuming the Virsh VM runs on node s1:

When putting node s1 into standby while node s0 is active, a live migration
is started, BUT in the same second, pacemaker tries to demote DRBD volumes
on s1 (while live migration is in progress).

All this results in stopping the VM on s1 and starting the VM on s0.

I do not understand why pacemaker demotes/stops DRBD volumes before the VM
is migrated.
Do I need additional constraints?

Setup is done with
- Corosync Cluster Engine, version '3.1.6'
- Pacemaker 2.1.2
- Ubuntu 22.04.2 LTS

Thanks for your help,

with kind regards Philip





[ClusterLabs] HA problem: No live migration when setting node on standby

2023-04-12 Thread Philip Schiller



Hi All,

I am using a simple two-node cluster with Zvol -> DRBD -> Virsh in
primary/primary mode (necessary for live migration). My configuration:

primitive pri-vm-alarmanlage VirtualDomain \
params config="/etc/libvirt/qemu/alarmanlage.xml" 
hypervisor="qemu:///system" migration_transport=ssh \
meta allow-migrate=true target-role=Started is-managed=true \
op monitor interval=0 timeout=120 \
op start interval=0 timeout=120 \
op stop interval=0 timeout=1800 \
op migrate_to interval=0 timeout=1800 \
op migrate_from interval=0 timeout=1800 \
utilization cpu=2 hv_memory=4096
ms mas-drbd-alarmanlage pri-drbd-alarmanlage \
meta clone-max=2 promoted-max=2 notify=true promoted-node-max=1 
clone-node-max=1 interleave=true target-role=Started is-managed=true
colocation colo_mas_drbd_alarmanlage_with_clo_pri_zfs_drbd-storage inf: 
mas-drbd-alarmanlage clo-pri-zfs-drbd_storage
location location-pri-vm-alarmanlage-s0-200 pri-vm-alarmanlage 200: s1
order ord_pri-alarmanlage-after-mas-drbd-alarmanlage Mandatory: 
mas-drbd-alarmanlage:promote pri-vm-alarmanlage:start

So to summarize:
- A resource for Virsh
- A Master/Slave DRBD resource for the VM filesystem.
- An "order" directive to start the VM after DRBD has been promoted.

Node startup is ok, the VM is started after DRBD is promoted.
Migration with virsh or over crm works fine.

Node standby is problematic. Assuming the Virsh VM runs on node s1:

When putting node s1 into standby while node s0 is active, a live migration
is started, BUT in the same second, pacemaker tries to demote DRBD
volumes on s1 (while live migration is in progress).

All this results in stopping the VM on s1 and starting the VM on s0.

I do not understand why pacemaker demotes/stops DRBD volumes before the VM
is migrated.
Do I need additional constraints?

Setup is done with
- Corosync Cluster Engine, version '3.1.6'
- Pacemaker 2.1.2
- Ubuntu 22.04.2 LTS

Thanks for your help,

with kind regards Philip


Re: [ClusterLabs] Location not working [FIXED]

2023-04-12 Thread Klaus Wenninger
On Wed, Apr 12, 2023 at 9:27 AM Andrei Borzenkov wrote:

> On Tue, Apr 11, 2023 at 6:27 PM Ken Gaillot  wrote:
> >
> > On Tue, 2023-04-11 at 17:31 +0300, Miro Igov wrote:
> > > I fixed the issue by changing location definition from:
> > >
> > > location intranet-ip_on_any_nginx intranet-ip \
> > > rule -inf: opa-nginx_1_active eq 0 \
> > > rule -inf: opa-nginx_2_active eq 0
> > >
> > > To:
> > >
> > > location intranet-ip_on_any_nginx intranet-ip \
> > > rule opa-nginx_1_active eq 1 \
> > >rule opa-nginx_2_active eq 1
> > >
> > > Now it works fine and shows the constraint with: crm res constraint
> > > intranet-ip
> >
> > Ah, I suspect the issue was that the original constraint compared only
> > against 0, when initially (before the resources ever start) the
> > attribute is undefined.
> >
>
> This does not really explain the original question. Apparently the
> attribute *was* defined but somehow ignored.
>
> Apr 10 12:11:02 intranet-test2 pacemaker-attrd[1511]:  notice: Setting
> opa-nginx_1_active[intranet-test1]: 1 -> 0
> ...
>   * intranet-ip (ocf::heartbeat:IPaddr2):Started intranet-test1
>

But that log is from a different node, which sets the node attribute for
another node, and some time went by until the IP was detected to be running
where it should not (a failure of the nfs-service was recorded 14 min later -
so at least that much). A lot can have happened in between, like rejoins -
not saying that what happened (like merging the CIBs) did happen as it should
have - but the log line isn't necessarily proof that the attribute was still
at 0. Searching the logs for changes in the cluster topology in the time in
between may give some insights.
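
For example, something along these lines could help narrow it down (just a
sketch, assuming pacemaker and corosync log to the systemd journal; adjust the
time window to the incident):

  journalctl -u pacemaker -u corosync --since "2023-04-10 12:00" \
      --until "2023-04-10 12:30" | grep -Ei 'membership|join|leave|lost|elect'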

Klaus



Re: [ClusterLabs] Location not working [FIXED]

2023-04-12 Thread Andrei Borzenkov
On Tue, Apr 11, 2023 at 6:27 PM Ken Gaillot  wrote:
>
> On Tue, 2023-04-11 at 17:31 +0300, Miro Igov wrote:
> > I fixed the issue by changing location definition from:
> >
> > location intranet-ip_on_any_nginx intranet-ip \
> > rule -inf: opa-nginx_1_active eq 0 \
> > rule -inf: opa-nginx_2_active eq 0
> >
> > To:
> >
> > location intranet-ip_on_any_nginx intranet-ip \
> > rule opa-nginx_1_active eq 1 \
> >rule opa-nginx_2_active eq 1
> >
> > Now it works fine and shows the constraint with: crm res constraint
> > intranet-ip
>
> Ah, I suspect the issue was that the original constraint compared only
> against 0, when initially (before the resources ever start) the
> attribute is undefined.
>

This does not really explain the original question. Apparently the
attribute *was* defined but somehow ignored.

Apr 10 12:11:02 intranet-test2 pacemaker-attrd[1511]:  notice: Setting
opa-nginx_1_active[intranet-test1]: 1 -> 0
...
  * intranet-ip (ocf::heartbeat:IPaddr2):Started intranet-test1