Re: [ClusterLabs] Coming in Pacemaker 2.1.3: multiple-active=stop_unexpected

2022-04-08 Thread Andrei Borzenkov
On 08.04.2022 20:16, Ken Gaillot wrote:
> Hi all,
> 
> I'm hoping to have the first release candidate for Pacemaker 2.1.3
> available in a couple of weeks.
> 
> One of the new features will be a new possible value for the "multiple-
> active" resource meta-attribute, which specifies how the cluster should
> react if multiple instances of a resource are detected to be active
> when only one should be.
> 
> The default behavior, "restart", stops all the instances and then
> starts one instance where it should be. This is the safest approach
> since some services become disrupted when multiple copies are started.
> 
> However if the user is confident that only the extra copies need to be
> stopped, they can now set multiple-active to "stop_unexpected". The
> instance that is active where it is supposed to be will not be stopped,
> but all other instances will be.
> 
> If any resources are ordered after the multiply active resource, those
> other resources will still need to be fully restarted. This is because
> any ordering constraint "start A then start B" implies "stop B then
> stop A", so we can't stop the wrongly active instances of A until B is
> stopped.

But in the case of multiple-active=stop_unexpected "the correct" A does
remain active. If any dependent resource needs to be restarted anyway, I
don't see the intended use case. What is the difference from the default
option (except that it may be faster)?
___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] Antw: [EXT] SAP HANA monitor fails - Error performing operation: No such device or address

2022-04-08 Thread Aj Revelino
Hi Ulrich,
I set the cluster in maintenance mode because of the constant logging of the
error messages in the system log.

Pacemaker has attempted to execute the monitor operation of the resource
agent here. Is there a way to find out why Pacemaker says 'No such device
or address'?
hanapopdb1-rsc_SAPHana_HPN_HDB00_monitor_6:195 [ Error performing
operation: No such device or address]

Regards,
Aj

On Fri, Apr 8, 2022 at 8:23 PM Ulrich Windl <
ulrich.wi...@rz.uni-regensburg.de> wrote:

> "maintenance-mode=true"? Why?
>
>
> >>> Aj Revelino  wrote on 08.04.2022 at 11:17 in
> message
> :
> > Hello All,
> > I've a 2 node SAP Hana cluster (hanapodb1 and hanapodb2). Pacemaker
> > monitors the data replication between the primary and the secondary node.
> > The issue is that crm status shows that everything is okay but the system
> > log shows the following error log.
> >
> >
> > pacemaker-controld[3582]:  notice:
> > hanapopdb1-rsc_SAPHana_HPN_HDB00_monitor_6:195 [ Error performing
> > operation: No such device or address]
> > I am unable to identify the cause of the error message and resolve it
> >
> > And due to the above, the data replication between the 2 nodes is
> recorded
> > as failed (SFAIL) . Pls see the excerpt from the CIB below:
> >
> > <node_state crm-debug-origin="do_update_resource" uname="zhanapopdb2"
> > join="member" expected="member">
> >   ...
> >     <nvpair name="hana_hpn_clone_state" value="WAITING4PRIM"/>
> >     <nvpair name="hana_hpn_version" value="2.00.056.00.1624618329"/>
> >     <nvpair name="master-rsc_SAPHana_HPN_HDB00" value="-INFINITY"/>
> >     <nvpair name="hana_hpn_sync_state" value="SFAIL"/>
> >     <nvpair ... value="4:S:master1:master:worker:master"/>
> >   ...
> > </node_state>
> >
> > Pacemaker is able to failover the resources from the primary to the
> > secondary but they all fail back to the primary, the moment I clean up
> the
> > failure in the primary node.
> > I deleted and recreated the entire configuration and reconfigured the
> hana
> > data replication but it hasn't helped.
> >
> >
> > Cluster configuration:
> > hanapopdb1:~ # crm configure show
> > node 1: hanapopdb1 \
> > attributes hana_hpn_vhost=hanapopdb1 hana_hpn_site=SITE1PO
> > hana_hpn_op_mode=logreplay_readaccess hana_hpn_srmode=sync
> > lpa_hpn_lpt=1649393239 hana_hpn_remoteHost=hanapopdb2
> > node 2: hanapopdb2 \
> > attributes lpa_hpn_lpt=10 hana_hpn_op_mode=logreplay_readaccess
> > hana_hpn_vhost=hanapopdb2 hana_hpn_remoteHost=hanapopdb1
> > hana_hpn_site=SITE2PO hana_hpn_srmode=sync
> > primitive rsc_SAPHanaTopology_HPN_HDB00 ocf:suse:SAPHanaTopology \
> > operations $id=rsc_sap2_HPN_HDB00-operations \
> > op monitor interval=10 timeout=600 \
> > op start interval=0 timeout=600 \
> > op stop interval=0 timeout=300 \
> > params SID=HPN InstanceNumber=00
> > primitive rsc_SAPHana_HPN_HDB00 ocf:suse:SAPHana \
> > operations $id=rsc_sap_HPN_HDB00-operations \
> > op start interval=0 timeout=3600 \
> > op stop interval=0 timeout=3600 \
> > op promote interval=0 timeout=3600 \
> > op monitor interval=60 role=Master timeout=700 \
> > op monitor interval=61 role=Slave timeout=700 \
> > params SID=HPN InstanceNumber=00 PREFER_SITE_TAKEOVER=true
> > DUPLICATE_PRIMARY_TIMEOUT=7200 AUTOMATED_REGISTER=false
> > primitive rsc_ip_HPN_HDB00 IPaddr2 \
> > meta target-role=Started \
> > operations $id=rsc_ip_HPN_HDB00-operations \
> > op monitor interval=10s timeout=20s \
> > params ip=10.10.1.60
> > primitive rsc_nc_HPN_HDB00 azure-lb \
> > params port=62506
> > primitive stonith-sbd stonith:external/sbd \
> > params pcmk_delay_max=30 \
> > op monitor interval=30 timeout=30
> > group g_ip_HPN_HDB00 rsc_ip_HPN_HDB00 rsc_nc_HPN_HDB00
> > ms msl_SAPHana_HPN_HDB00 rsc_SAPHana_HPN_HDB00 \
> > meta is-managed=true notify=true clone-max=2 clone-node-max=1
> > target-role=Started interleave=true
> > clone cln_SAPHanaTopology_HPN_HDB00 rsc_SAPHanaTopology_HPN_HDB00 \
> > meta clone-node-max=1 target-role=Started interleave=true
> > colocation col_saphana_ip_HPN_HDB00 4000: g_ip_HPN_HDB00:Started
> > msl_SAPHana_HPN_HDB00:Master
> > order ord_SAPHana_HPN_HDB00 Optional: cln_SAPHanaTopology_HPN_HDB00
> > msl_SAPHana_HPN_HDB00
> > property cib-bootstrap-options: \
> > last-lrm-refresh=1649387935 \
> > maintenance-mode=true
> >
> > Regards,
> >
> > Aj
>
>
>
> ___
> Manage your subscription:
> https://lists.clusterlabs.org/mailman/listinfo/users
>
> ClusterLabs home: https://www.clusterlabs.org/
>
___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


[ClusterLabs] Coming in Pacemaker 2.1.3: multiple-active=stop_unexpected

2022-04-08 Thread Ken Gaillot
Hi all,

I'm hoping to have the first release candidate for Pacemaker 2.1.3
available in a couple of weeks.

One of the new features will be a new possible value for the "multiple-
active" resource meta-attribute, which specifies how the cluster should
react if multiple instances of a resource are detected to be active
when only one should be.

The default behavior, "restart", stops all the instances and then
starts one instance where it should be. This is the safest approach
since some services become disrupted when multiple copies are started.

However if the user is confident that only the extra copies need to be
stopped, they can now set multiple-active to "stop_unexpected". The
instance that is active where it is supposed to be will not be stopped,
but all other instances will be.

If any resources are ordered after the multiply active resource, those
other resources will still need to be fully restarted. This is because
any ordering constraint "start A then start B" implies "stop B then
stop A", so we can't stop the wrongly active instances of A until B is
stopped.
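
As a rough sketch only (the resource name "my_rsc" is made up, and the
exact shell syntax should be checked against your tool's documentation),
the new value would be set like any other resource meta-attribute once
2.1.3 is available:

  # with pcs
  pcs resource update my_rsc meta multiple-active=stop_unexpected

  # or with the crm shell
  crm resource meta my_rsc set multiple-active stop_unexpected
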
-- 
Ken Gaillot 

___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] Restarting parent of ordered clone resources on specific node causes restart of all resources in the ordering constraint on all nodes of the cluster

2022-04-08 Thread Strahil Nikolov via Users
You can use 'kind' and 'symmetrical' to control order constraints. The default
value for 'symmetrical' is 'true', which means that in order to stop dummy1,
the cluster also has to stop dummy2.
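
For example, a sketch only (using the resource names from the original post;
check the pcs documentation for the exact option syntax):

  pcs constraint order start test-1-clone then start test-2-clone symmetrical=false

With symmetrical=false the implied reverse stop ordering is dropped; with
kind=Optional the ordering is applied only when both actions happen in the
same transition.
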
Best Regards,
Strahil Nikolov
 
 
On Fri, Apr 8, 2022 at 15:29, ChittaNagaraj, Raghav wrote:
Hello Team,
 
  
 
Hope you are doing well.
 
  
 
I have a 4 node pacemaker cluster where I created clone dummy resources test-1, 
test-2 and test-3 below:
 
  
 
$ sudo pcs resource create test-1 ocf:heartbeat:Dummy op monitor timeout="20" 
interval="10" clone
 
$ sudo pcs resource create test-2 ocf:heartbeat:Dummy op monitor timeout="20" 
interval="10" clone
 
$ sudo pcs resource create test-3 ocf:heartbeat:Dummy op monitor timeout="20" 
interval="10" clone
 
  
 
Then I ordered them so test-2-clone starts after test-1-clone and test-3-clone 
starts after test-2-clone:
 
$ sudo pcs constraint order test-1-clone then test-2-clone
 
Adding test-1-clone test-2-clone (kind: Mandatory) (Options: first-action=start 
then-action=start)
 
$ sudo pcs constraint order test-2-clone then test-3-clone
 
Adding test-2-clone test-3-clone (kind: Mandatory) (Options: first-action=start 
then-action=start)
 
  
 
Here are my clone sets (snippet of "pcs status" output pasted below):
 
  * Clone Set: test-1-clone [test-1]:
 
    * Started: [ node2_a node2_b node1_a node1_b ]
 
  * Clone Set: test-2-clone [test-2]:
 
    * Started: [ node2_a node2_b node1_a node1_b ]
 
  * Clone Set: test-3-clone [test-3]:
 
    * Started: [ node2_a node2_b node1_a node1_b ]
 
  
 
Then I restart test-1 on just node1_a:
 
$ sudo pcs resource restart test-1 node1_a
 
Warning: using test-1-clone... (if a resource is a clone, master/slave or 
bundle you must use the clone, master/slave or bundle name)
 
test-1-clone successfully restarted
 
  
 
  
 
This causes test-2 and test-3 clones to restart on all pacemaker nodes when my 
intention is for them to restart on just node1_a.
 
Below is the log tracing seen on the Designated Controller NODE1-B:
 
Apr 07 20:25:01 NODE1-B pacemaker-schedulerd[95746]:  notice:  * Stop       test-1:1   ( node1_a )   due to node availability
Apr 07 20:25:03 NODE1-B pacemaker-schedulerd[95746]:  notice:  * Restart    test-2:0   ( node1_b )   due to required test-1-clone running
Apr 07 20:25:03 NODE1-B pacemaker-schedulerd[95746]:  notice:  * Restart    test-2:1   ( node1_a )   due to required test-1-clone running
Apr 07 20:25:03 NODE1-B pacemaker-schedulerd[95746]:  notice:  * Restart    test-2:2   ( node2_b )   due to required test-1-clone running
Apr 07 20:25:03 NODE1-B pacemaker-schedulerd[95746]:  notice:  * Restart    test-2:3   ( node2_a )   due to required test-1-clone running
 
  
 
Above is a representation of the observed behavior using dummy resources.
 
Is this the expected behavior of cloned resources?
 
  
 
My goal is to be able to restart test-2-clone and test-3-clone on just the node 
that experienced test-1 restart rather than all other nodes in the cluster.
 
  
 
Please let us know if any additional information will help for you to be able 
to provide feedback.
 
  
 
Thanks for your help!
 
  
 
- Raghav
 

Internal Use - Confidential
___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/
  
___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


Re: [ClusterLabs] SAP HANA monitor fails - Error performing operation: No such device or address

2022-04-08 Thread Ken Gaillot
On Fri, 2022-04-08 at 17:17 +0800, Aj Revelino wrote:
> Hello All,
> I've a 2 node SAP Hana cluster (hanapodb1 and hanapodb2). Pacemaker
> monitors the data replication between the primary and the secondary
> node. The issue is that crm status shows that everything is okay but
> the system log shows the following error log. 
> 
> pacemaker-controld[3582]:  notice: hanapopdb1-
> rsc_SAPHana_HPN_HDB00_monitor_6:195 [ Error performing operation:
> No such device or address]
> I am unable to identify the cause of the error message and resolve it
> 
> And due to the above, the data replication between the 2 nodes is
> recorded as failed (SFAIL) . Pls see the excerpt from the CIB below:
> 
> <node_state crm-debug-origin="do_update_resource" uname="zhanapopdb2"
> join="member" expected="member">
>   ...
>     <nvpair name="hana_hpn_clone_state" value="WAITING4PRIM"/>
>     <nvpair name="hana_hpn_version" value="2.00.056.00.1624618329"/>
>     <nvpair name="master-rsc_SAPHana_HPN_HDB00" value="-INFINITY"/>
>     <nvpair name="hana_hpn_sync_state" value="SFAIL"/>
>     <nvpair ... value="4:S:master1:master:worker:master"/>
>   ...
> </node_state>
> 
> Pacemaker is able to failover the resources from the primary to the
> secondary but they all fail back to the primary, the moment I clean
> up the failure in the primary node.

I'm not familiar enough with SAP to speak to that side of things, but
the behavior after clean-up is normal. If you don't want resources to
go back to their preferred node after a failure is cleaned up, set the
resource-stickiness meta-attribute to a positive number (either on the
resource itself, or in resource defaults if you want it to apply to
everything).
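
A minimal sketch with the crm shell used in this configuration (the value
1000 is only an example and has to be weighed against the other scores in
the cluster):

  crm configure rsc_defaults resource-stickiness=1000

or set resource-stickiness as a meta-attribute on msl_SAPHana_HPN_HDB00
alone if it should only affect the SAPHana resource.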

> I deleted and recreated the entire configuration and reconfigured the
> hana data replication but it hasn't helped. 
> 
> 
> Cluster configuration:
> hanapopdb1:~ # crm configure show
> node 1: hanapopdb1 \
> attributes hana_hpn_vhost=hanapopdb1 hana_hpn_site=SITE1PO
> hana_hpn_op_mode=logreplay_readaccess hana_hpn_srmode=sync
> lpa_hpn_lpt=1649393239 hana_hpn_remoteHost=hanapopdb2
> node 2: hanapopdb2 \
> attributes lpa_hpn_lpt=10
> hana_hpn_op_mode=logreplay_readaccess hana_hpn_vhost=hanapopdb2
> hana_hpn_remoteHost=hanapopdb1 hana_hpn_site=SITE2PO
> hana_hpn_srmode=sync
> primitive rsc_SAPHanaTopology_HPN_HDB00 ocf:suse:SAPHanaTopology \
> operations $id=rsc_sap2_HPN_HDB00-operations \
> op monitor interval=10 timeout=600 \
> op start interval=0 timeout=600 \
> op stop interval=0 timeout=300 \
> params SID=HPN InstanceNumber=00
> primitive rsc_SAPHana_HPN_HDB00 ocf:suse:SAPHana \
> operations $id=rsc_sap_HPN_HDB00-operations \
> op start interval=0 timeout=3600 \
> op stop interval=0 timeout=3600 \
> op promote interval=0 timeout=3600 \
> op monitor interval=60 role=Master timeout=700 \
> op monitor interval=61 role=Slave timeout=700 \
> params SID=HPN InstanceNumber=00 PREFER_SITE_TAKEOVER=true
> DUPLICATE_PRIMARY_TIMEOUT=7200 AUTOMATED_REGISTER=false
> primitive rsc_ip_HPN_HDB00 IPaddr2 \
> meta target-role=Started \
> operations $id=rsc_ip_HPN_HDB00-operations \
> op monitor interval=10s timeout=20s \
> params ip=10.10.1.60
> primitive rsc_nc_HPN_HDB00 azure-lb \
> params port=62506
> primitive stonith-sbd stonith:external/sbd \
> params pcmk_delay_max=30 \
> op monitor interval=30 timeout=30
> group g_ip_HPN_HDB00 rsc_ip_HPN_HDB00 rsc_nc_HPN_HDB00
> ms msl_SAPHana_HPN_HDB00 rsc_SAPHana_HPN_HDB00 \
> meta is-managed=true notify=true clone-max=2 clone-node-max=1 
> target-role=Started interleave=true
> clone cln_SAPHanaTopology_HPN_HDB00 rsc_SAPHanaTopology_HPN_HDB00 \
> meta clone-node-max=1 target-role=Started interleave=true
> colocation col_saphana_ip_HPN_HDB00 4000: g_ip_HPN_HDB00:Started
> msl_SAPHana_HPN_HDB00:Master
> order ord_SAPHana_HPN_HDB00 Optional: cln_SAPHanaTopology_HPN_HDB00
> msl_SAPHana_HPN_HDB00
> property cib-bootstrap-options: \
> last-lrm-refresh=1649387935 \
> maintenance-mode=true
> 
> Regards,
> 
> Aj
> ___
> Manage your subscription:
> https://lists.clusterlabs.org/mailman/listinfo/users
> 
> ClusterLabs home: https://www.clusterlabs.org/
-- 
Ken Gaillot 

___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


[ClusterLabs] Antw: [EXT] Restarting parent of ordered clone resources on specific node causes restart of all resources in the ordering constraint on all nodes of the cluster

2022-04-08 Thread Ulrich Windl
Hi!

Just two short notes:
1: I think "timeout > interval" does not make much sense.
2: I guess you configured the wrong type of clone ("meta interleave=true"? see the sketch below)

(3: I don't know the "pcs" command)
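
Regarding note 2, a minimal untested sketch (using pcs, which the original
post already uses, and the clone names from that post): interleaved clones
only depend on the copy of the other clone on the same node, so setting the
meta-attribute on the dependent clones should avoid the cluster-wide
restarts:

  pcs resource meta test-2-clone interleave=true
  pcs resource meta test-3-clone interleave=true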

Regards,
Ulrich

>>> "ChittaNagaraj, Raghav"  schrieb am
07.04.2022
um 22:48 in Nachricht


> Hello Team,
> 
> Hope you are doing well.
> 
> I have a 4 node pacemaker cluster where I created clone dummy resources
> test-1, test-2 and test-3 below:
> 
> $ sudo pcs resource create test-1 ocf:heartbeat:Dummy op monitor timeout="20"
> interval="10" clone
> $ sudo pcs resource create test-2 ocf:heartbeat:Dummy op monitor timeout="20"
> interval="10" clone
> $ sudo pcs resource create test-3 ocf:heartbeat:Dummy op monitor timeout="20"
> interval="10" clone
> 
> Then I ordered them so test-2-clone starts after test-1-clone and test-3-clone
> starts after test-2-clone:
> $ sudo pcs constraint order test-1-clone then test-2-clone
> Adding test-1-clone test-2-clone (kind: Mandatory) (Options: first-action=start
> then-action=start)
> $ sudo pcs constraint order test-2-clone then test-3-clone
> Adding test-2-clone test-3-clone (kind: Mandatory) (Options: first-action=start
> then-action=start)
> 
> Here are my clone sets (snippet of "pcs status" output pasted below):
>   * Clone Set: test-1-clone [test-1]:
> * Started: [ node2_a node2_b node1_a node1_b ]
>   * Clone Set: test-2-clone [test-2]:
> * Started: [ node2_a node2_b node1_a node1_b ]
>   * Clone Set: test-3-clone [test-3]:
> * Started: [ node2_a node2_b node1_a node1_b ]
> 
> Then I restart test-1 on just node1_a:
> $ sudo pcs resource restart test-1 node1_a
> Warning: using test-1-clone... (if a resource is a clone, master/slave or
> bundle you must use the clone, master/slave or bundle name)
> test-1-clone successfully restarted
> 
> 
> This causes test-2 and test-3 clones to restart on all pacemaker nodes when my
> intention is for them to restart on just node1_a.
> Below is the log tracing seen on the Designated Controller NODE1-B:
> Apr 07 20:25:01 NODE1-B pacemaker-schedulerd[95746]:  notice:  * Stop       test-1:1   ( node1_a )   due to node availability
> Apr 07 20:25:03 NODE1-B pacemaker-schedulerd[95746]:  notice:  * Restart    test-2:0   ( node1_b )   due to required test-1-clone running
> Apr 07 20:25:03 NODE1-B pacemaker-schedulerd[95746]:  notice:  * Restart    test-2:1   ( node1_a )   due to required test-1-clone running
> Apr 07 20:25:03 NODE1-B pacemaker-schedulerd[95746]:  notice:  * Restart    test-2:2   ( node2_b )   due to required test-1-clone running
> Apr 07 20:25:03 NODE1-B pacemaker-schedulerd[95746]:  notice:  * Restart    test-2:3   ( node2_a )   due to required test-1-clone running
> 
> Above is a representation of the observed behavior using dummy resources.
> Is this the expected behavior of cloned resources?
> 
> My goal is to be able to restart test-2-clone and test-3-clone on just the node
> that experienced test-1 restart rather than all other nodes in the cluster.
> 
> Please let us know if any additional information will help for you to be 
> able to provide feedback.
> 
> Thanks for your help!
> 
> - Raghav
> 
> 
> Internal Use - Confidential



___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


[ClusterLabs] Restarting parent of ordered clone resources on specific node causes restart of all resources in the ordering constraint on all nodes of the cluster

2022-04-08 Thread ChittaNagaraj, Raghav
Hello Team,

Hope you are doing well.

I have a 4 node pacemaker cluster where I created clone dummy resources test-1, 
test-2 and test-3 below:

$ sudo pcs resource create test-1 ocf:heartbeat:Dummy op monitor timeout="20" 
interval="10" clone
$ sudo pcs resource create test-2 ocf:heartbeat:Dummy op monitor timeout="20" 
interval="10" clone
$ sudo pcs resource create test-3 ocf:heartbeat:Dummy op monitor timeout="20" 
interval="10" clone

Then I ordered them so test-2-clone starts after test-1-clone and test-3-clone 
starts after test-2-clone:
$ sudo pcs constraint order test-1-clone then test-2-clone
Adding test-1-clone test-2-clone (kind: Mandatory) (Options: first-action=start 
then-action=start)
$ sudo pcs constraint order test-2-clone then test-3-clone
Adding test-2-clone test-3-clone (kind: Mandatory) (Options: first-action=start 
then-action=start)

Here are my clone sets (snippet of "pcs status" output pasted below):
  * Clone Set: test-1-clone [test-1]:
* Started: [ node2_a node2_b node1_a node1_b ]
  * Clone Set: test-2-clone [test-2]:
* Started: [ node2_a node2_b node1_a node1_b ]
  * Clone Set: test-3-clone [test-3]:
* Started: [ node2_a node2_b node1_a node1_b ]

Then I restart test-1 on just node1_a:
$ sudo pcs resource restart test-1 node1_a
Warning: using test-1-clone... (if a resource is a clone, master/slave or 
bundle you must use the clone, master/slave or bundle name)
test-1-clone successfully restarted


This causes test-2 and test-3 clones to restart on all pacemaker nodes when my 
intention is for them to restart on just node1_a.
Below is the log tracing seen on the Designated Controller NODE1-B:
Apr 07 20:25:01 NODE1-B pacemaker-schedulerd[95746]:  notice:  * Stop       test-1:1   ( node1_a )   due to node availability
Apr 07 20:25:03 NODE1-B pacemaker-schedulerd[95746]:  notice:  * Restart    test-2:0   ( node1_b )   due to required test-1-clone running
Apr 07 20:25:03 NODE1-B pacemaker-schedulerd[95746]:  notice:  * Restart    test-2:1   ( node1_a )   due to required test-1-clone running
Apr 07 20:25:03 NODE1-B pacemaker-schedulerd[95746]:  notice:  * Restart    test-2:2   ( node2_b )   due to required test-1-clone running
Apr 07 20:25:03 NODE1-B pacemaker-schedulerd[95746]:  notice:  * Restart    test-2:3   ( node2_a )   due to required test-1-clone running

Above is a representation of the observed behavior using dummy resources.
Is this the expected behavior of cloned resources?

My goal is to be able to restart test-2-clone and test-3-clone on just the node 
that experienced test-1 restart rather than all other nodes in the cluster.

Please let us know if any additional information will help for you to be able 
to provide feedback.

Thanks for your help!

- Raghav


Internal Use - Confidential
___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


[ClusterLabs] Antw: [EXT] SAP HANA monitor fails - Error performing operation: No such device or address

2022-04-08 Thread Ulrich Windl
"maintenance-mode=true"? Why?


>>> Aj Revelino  wrote on 08.04.2022 at 11:17 in
>>> message
:
> Hello All,
> I've a 2 node SAP Hana cluster (hanapodb1 and hanapodb2). Pacemaker
> monitors the data replication between the primary and the secondary node.
> The issue is that crm status shows that everything is okay but the system
> log shows the following error log.
> 
> 
> pacemaker-controld[3582]:  notice:
> hanapopdb1-rsc_SAPHana_HPN_HDB00_monitor_6:195 [ Error performing
> operation: No such device or address]
> I am unable to identify the cause of the error message and resolve it
> 
> And due to the above, the data replication between the 2 nodes is recorded
> as failed (SFAIL) . Pls see the excerpt from the CIB below:
> 
> <node_state crm-debug-origin="do_update_resource" uname="zhanapopdb2"
> join="member" expected="member">
>   ...
>     <nvpair name="hana_hpn_clone_state" value="WAITING4PRIM"/>
>     <nvpair name="hana_hpn_version" value="2.00.056.00.1624618329"/>
>     <nvpair name="master-rsc_SAPHana_HPN_HDB00" value="-INFINITY"/>
>     <nvpair name="hana_hpn_sync_state" value="SFAIL"/>
>     <nvpair ... value="4:S:master1:master:worker:master"/>
>   ...
> </node_state>
> 
> Pacemaker is able to failover the resources from the primary to the
> secondary but they all fail back to the primary, the moment I clean up the
> failure in the primary node.
> I deleted and recreated the entire configuration and reconfigured the hana
> data replication but it hasn't helped.
> 
> 
> Cluster configuration:
> hanapopdb1:~ # crm configure show
> node 1: hanapopdb1 \
> attributes hana_hpn_vhost=hanapopdb1 hana_hpn_site=SITE1PO
> hana_hpn_op_mode=logreplay_readaccess hana_hpn_srmode=sync
> lpa_hpn_lpt=1649393239 hana_hpn_remoteHost=hanapopdb2
> node 2: hanapopdb2 \
> attributes lpa_hpn_lpt=10 hana_hpn_op_mode=logreplay_readaccess
> hana_hpn_vhost=hanapopdb2 hana_hpn_remoteHost=hanapopdb1
> hana_hpn_site=SITE2PO hana_hpn_srmode=sync
> primitive rsc_SAPHanaTopology_HPN_HDB00 ocf:suse:SAPHanaTopology \
> operations $id=rsc_sap2_HPN_HDB00-operations \
> op monitor interval=10 timeout=600 \
> op start interval=0 timeout=600 \
> op stop interval=0 timeout=300 \
> params SID=HPN InstanceNumber=00
> primitive rsc_SAPHana_HPN_HDB00 ocf:suse:SAPHana \
> operations $id=rsc_sap_HPN_HDB00-operations \
> op start interval=0 timeout=3600 \
> op stop interval=0 timeout=3600 \
> op promote interval=0 timeout=3600 \
> op monitor interval=60 role=Master timeout=700 \
> op monitor interval=61 role=Slave timeout=700 \
> params SID=HPN InstanceNumber=00 PREFER_SITE_TAKEOVER=true
> DUPLICATE_PRIMARY_TIMEOUT=7200 AUTOMATED_REGISTER=false
> primitive rsc_ip_HPN_HDB00 IPaddr2 \
> meta target-role=Started \
> operations $id=rsc_ip_HPN_HDB00-operations \
> op monitor interval=10s timeout=20s \
> params ip=10.10.1.60
> primitive rsc_nc_HPN_HDB00 azure-lb \
> params port=62506
> primitive stonith-sbd stonith:external/sbd \
> params pcmk_delay_max=30 \
> op monitor interval=30 timeout=30
> group g_ip_HPN_HDB00 rsc_ip_HPN_HDB00 rsc_nc_HPN_HDB00
> ms msl_SAPHana_HPN_HDB00 rsc_SAPHana_HPN_HDB00 \
> meta is-managed=true notify=true clone-max=2 clone-node-max=1
> target-role=Started interleave=true
> clone cln_SAPHanaTopology_HPN_HDB00 rsc_SAPHanaTopology_HPN_HDB00 \
> meta clone-node-max=1 target-role=Started interleave=true
> colocation col_saphana_ip_HPN_HDB00 4000: g_ip_HPN_HDB00:Started
> msl_SAPHana_HPN_HDB00:Master
> order ord_SAPHana_HPN_HDB00 Optional: cln_SAPHanaTopology_HPN_HDB00
> msl_SAPHana_HPN_HDB00
> property cib-bootstrap-options: \
> last-lrm-refresh=1649387935 \
> maintenance-mode=true
> 
> Regards,
> 
> Aj



___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/


[ClusterLabs] SAP HANA monitor fails - Error performing operation: No such device or address

2022-04-08 Thread Aj Revelino
Hello All,
I have a 2-node SAP HANA cluster (hanapodb1 and hanapodb2). Pacemaker
monitors the data replication between the primary and the secondary node.
The issue is that crm status shows that everything is okay, but the system
log shows the following error:

pacemaker-controld[3582]:  notice:
hanapopdb1-rsc_SAPHana_HPN_HDB00_monitor_6:195 [ Error performing
operation: No such device or address]

I am unable to identify the cause of the error message or resolve it.

And due to the above, the data replication between the 2 nodes is recorded
as failed (SFAIL). Please see the excerpt from the CIB below:

 
  

<node_state crm-debug-origin="do_update_resource" uname="zhanapopdb2"
join="member" expected="member">
  ...
    <nvpair name="hana_hpn_clone_state" value="WAITING4PRIM"/>
    <nvpair name="hana_hpn_version" value="2.00.056.00.1624618329"/>
    <nvpair name="master-rsc_SAPHana_HPN_HDB00" value="-INFINITY"/>
    <nvpair name="hana_hpn_sync_state" value="SFAIL"/>
    <nvpair ... value="4:S:master1:master:worker:master"/>
  ...
</node_state>

Pacemaker is able to fail over the resources from the primary to the
secondary, but they all fail back to the primary the moment I clean up the
failure on the primary node.
I deleted and recreated the entire configuration and reconfigured the HANA
data replication, but it hasn't helped.


Cluster configuration:
hanapopdb1:~ # crm configure show
node 1: hanapopdb1 \
attributes hana_hpn_vhost=hanapopdb1 hana_hpn_site=SITE1PO
hana_hpn_op_mode=logreplay_readaccess hana_hpn_srmode=sync
lpa_hpn_lpt=1649393239 hana_hpn_remoteHost=hanapopdb2
node 2: hanapopdb2 \
attributes lpa_hpn_lpt=10 hana_hpn_op_mode=logreplay_readaccess
hana_hpn_vhost=hanapopdb2 hana_hpn_remoteHost=hanapopdb1
hana_hpn_site=SITE2PO hana_hpn_srmode=sync
primitive rsc_SAPHanaTopology_HPN_HDB00 ocf:suse:SAPHanaTopology \
operations $id=rsc_sap2_HPN_HDB00-operations \
op monitor interval=10 timeout=600 \
op start interval=0 timeout=600 \
op stop interval=0 timeout=300 \
params SID=HPN InstanceNumber=00
primitive rsc_SAPHana_HPN_HDB00 ocf:suse:SAPHana \
operations $id=rsc_sap_HPN_HDB00-operations \
op start interval=0 timeout=3600 \
op stop interval=0 timeout=3600 \
op promote interval=0 timeout=3600 \
op monitor interval=60 role=Master timeout=700 \
op monitor interval=61 role=Slave timeout=700 \
params SID=HPN InstanceNumber=00 PREFER_SITE_TAKEOVER=true
DUPLICATE_PRIMARY_TIMEOUT=7200 AUTOMATED_REGISTER=false
primitive rsc_ip_HPN_HDB00 IPaddr2 \
meta target-role=Started \
operations $id=rsc_ip_HPN_HDB00-operations \
op monitor interval=10s timeout=20s \
params ip=10.10.1.60
primitive rsc_nc_HPN_HDB00 azure-lb \
params port=62506
primitive stonith-sbd stonith:external/sbd \
params pcmk_delay_max=30 \
op monitor interval=30 timeout=30
group g_ip_HPN_HDB00 rsc_ip_HPN_HDB00 rsc_nc_HPN_HDB00
ms msl_SAPHana_HPN_HDB00 rsc_SAPHana_HPN_HDB00 \
meta is-managed=true notify=true clone-max=2 clone-node-max=1
target-role=Started interleave=true
clone cln_SAPHanaTopology_HPN_HDB00 rsc_SAPHanaTopology_HPN_HDB00 \
meta clone-node-max=1 target-role=Started interleave=true
colocation col_saphana_ip_HPN_HDB00 4000: g_ip_HPN_HDB00:Started
msl_SAPHana_HPN_HDB00:Master
order ord_SAPHana_HPN_HDB00 Optional: cln_SAPHanaTopology_HPN_HDB00
msl_SAPHana_HPN_HDB00
property cib-bootstrap-options: \
last-lrm-refresh=1649387935 \
maintenance-mode=true

Regards,

Aj
___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/