Re: [ClusterLabs] Coming in Pacemaker 2.1.3: multiple-active=stop_unexpected
On 08.04.2022 20:16, Ken Gaillot wrote:
> Hi all,
>
> I'm hoping to have the first release candidate for Pacemaker 2.1.3 available in a couple of weeks.
>
> One of the new features will be a new possible value for the "multiple-active" resource meta-attribute, which specifies how the cluster should react if multiple instances of a resource are detected to be active when only one should be.
>
> The default behavior, "restart", stops all the instances and then starts one instance where it should be. This is the safest approach, since some services become disrupted when multiple copies are started.
>
> However, if the user is confident that only the extra copies need to be stopped, they can now set multiple-active to "stop_unexpected". The instance that is active where it is supposed to be will not be stopped, but all other instances will be.
>
> If any resources are ordered after the multiply active resource, those other resources will still need to be fully restarted. This is because any ordering constraint "start A then start B" implies "stop B then stop A", so we can't stop the wrongly active instances of A until B is stopped.

But in the case of multiple-active=stop_unexpected, "the correct" A does remain active. If any dependent resource needs to be restarted anyway, I don't see the intended use case. What is the difference from the default option (except that it may be faster)?
Re: [ClusterLabs] Antw: [EXT] SAP HANA monitor fails - Error performing operation: No such device or address
Hi Ulrich,

I set the cluster in maintenance mode due to the consistent logging of the error messages in the system log. Pacemaker has attempted to execute the monitor operation of the resource agent here. Is there a way to find out why Pacemaker says 'No such device or address'?

hanapopdb1-rsc_SAPHana_HPN_HDB00_monitor_6:195 [ Error performing operation: No such device or address]

Regards,
Aj

On Fri, Apr 8, 2022 at 8:23 PM Ulrich Windl <ulrich.wi...@rz.uni-regensburg.de> wrote:
> "maintenance-mode=true"? Why?
>
> >>> Aj Revelino schrieb am 08.04.2022 um 11:17 in Nachricht:
> > Hello All,
> > I've a 2 node SAP Hana cluster (hanapodb1 and hanapodb2). Pacemaker monitors the data replication between the primary and the secondary node. The issue is that crm status shows that everything is okay but the system log shows the following error log.
> >
> > pacemaker-controld[3582]: notice: hanapopdb1-rsc_SAPHana_HPN_HDB00_monitor_6:195 [ Error performing operation: No such device or address]
> >
> > I am unable to identify the cause of the error message and resolve it.
> >
> > And due to the above, the data replication between the 2 nodes is recorded as failed (SFAIL). Pls see the excerpt from the CIB below:
> >
> > crm-debug-origin="do_update_resource" uname="zhanapopdb2" join="member" expected="member">
> > name="hana_hpn_clone_state" value="WAITING4PRIM"/>
> > name="hana_hpn_version" value="2.00.056.00.1624618329"/>
> > name="master-rsc_SAPHana_HPN_HDB00" value="-INFINITY"/>
> > name="hana_hpn_sync_state" value="SFAIL"/>
> > value="4:S:master1:master:worker:master"/>
> >
> > Pacemaker is able to fail over the resources from the primary to the secondary, but they all fail back to the primary the moment I clean up the failure on the primary node.
> > I deleted and recreated the entire configuration and reconfigured the hana data replication but it hasn't helped.
> >
> > Cluster configuration:
> > hanapopdb1:~ # crm configure show
> > node 1: hanapopdb1 \
> >     attributes hana_hpn_vhost=hanapopdb1 hana_hpn_site=SITE1PO hana_hpn_op_mode=logreplay_readaccess hana_hpn_srmode=sync lpa_hpn_lpt=1649393239 hana_hpn_remoteHost=hanapopdb2
> > node 2: hanapopdb2 \
> >     attributes lpa_hpn_lpt=10 hana_hpn_op_mode=logreplay_readaccess hana_hpn_vhost=hanapopdb2 hana_hpn_remoteHost=hanapopdb1 hana_hpn_site=SITE2PO hana_hpn_srmode=sync
> > primitive rsc_SAPHanaTopology_HPN_HDB00 ocf:suse:SAPHanaTopology \
> >     operations $id=rsc_sap2_HPN_HDB00-operations \
> >     op monitor interval=10 timeout=600 \
> >     op start interval=0 timeout=600 \
> >     op stop interval=0 timeout=300 \
> >     params SID=HPN InstanceNumber=00
> > primitive rsc_SAPHana_HPN_HDB00 ocf:suse:SAPHana \
> >     operations $id=rsc_sap_HPN_HDB00-operations \
> >     op start interval=0 timeout=3600 \
> >     op stop interval=0 timeout=3600 \
> >     op promote interval=0 timeout=3600 \
> >     op monitor interval=60 role=Master timeout=700 \
> >     op monitor interval=61 role=Slave timeout=700 \
> >     params SID=HPN InstanceNumber=00 PREFER_SITE_TAKEOVER=true DUPLICATE_PRIMARY_TIMEOUT=7200 AUTOMATED_REGISTER=false
> > primitive rsc_ip_HPN_HDB00 IPaddr2 \
> >     meta target-role=Started \
> >     operations $id=rsc_ip_HPN_HDB00-operations \
> >     op monitor interval=10s timeout=20s \
> >     params ip=10.10.1.60
> > primitive rsc_nc_HPN_HDB00 azure-lb \
> >     params port=62506
> > primitive stonith-sbd stonith:external/sbd \
> >     params pcmk_delay_max=30 \
> >     op monitor interval=30 timeout=30
> > group g_ip_HPN_HDB00 rsc_ip_HPN_HDB00 rsc_nc_HPN_HDB00
> > ms msl_SAPHana_HPN_HDB00 rsc_SAPHana_HPN_HDB00 \
> >     meta is-managed=true notify=true clone-max=2 clone-node-max=1 target-role=Started interleave=true
> > clone cln_SAPHanaTopology_HPN_HDB00 rsc_SAPHanaTopology_HPN_HDB00 \
> >     meta clone-node-max=1 target-role=Started interleave=true
> > colocation col_saphana_ip_HPN_HDB00 4000: g_ip_HPN_HDB00:Started msl_SAPHana_HPN_HDB00:Master
> > order ord_SAPHana_HPN_HDB00 Optional: cln_SAPHanaTopology_HPN_HDB00 msl_SAPHana_HPN_HDB00
> > property cib-bootstrap-options: \
> >     last-lrm-refresh=1649387935 \
> >     maintenance-mode=true
> >
> > Regards,
> >
> > Aj
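[Editor's note: not from the original thread, but one way to see what the agent itself reports for a message like this is to run the monitor action outside the scheduler. A minimal sketch using standard Pacemaker/SAPHanaSR tooling; exact output depends on the installed versions, and running the agent as root by hand may behave slightly differently than under the cluster:]

# Run the SAPHana agent's monitor action directly on this node and show
# the agent's own messages and exit code (bypasses the scheduler):
crm_resource --resource rsc_SAPHana_HPN_HDB00 --force-check --verbose

# If the SAPHanaSR package is installed, print the hana_* node attributes
# (clone state, sync state, roles) in a readable table:
SAPHanaSR-showAttr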
[ClusterLabs] Coming in Pacemaker 2.1.3: multiple-active=stop_unexpected
Hi all,

I'm hoping to have the first release candidate for Pacemaker 2.1.3 available in a couple of weeks.

One of the new features will be a new possible value for the "multiple-active" resource meta-attribute, which specifies how the cluster should react if multiple instances of a resource are detected to be active when only one should be.

The default behavior, "restart", stops all the instances and then starts one instance where it should be. This is the safest approach, since some services become disrupted when multiple copies are started.

However, if the user is confident that only the extra copies need to be stopped, they can now set multiple-active to "stop_unexpected". The instance that is active where it is supposed to be will not be stopped, but all other instances will be.

If any resources are ordered after the multiply active resource, those other resources will still need to be fully restarted. This is because any ordering constraint "start A then start B" implies "stop B then stop A", so we can't stop the wrongly active instances of A until B is stopped.

-- 
Ken Gaillot
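[Editor's note: for anyone who wants to try this once 2.1.3 is out, setting the meta-attribute is a one-liner. A minimal sketch; "my-ip" is just a placeholder resource name, and the attribute only exists from Pacemaker 2.1.3 onward:]

# pcs syntax
pcs resource meta my-ip multiple-active=stop_unexpected

# crmsh syntax
crm resource meta my-ip set multiple-active stop_unexpected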
Re: [ClusterLabs] Restarting parent of ordered clone resources on specific node causes restart of all resources in the ordering constraint on all nodes of the cluster
You can use 'kind' and 'symmetrical' to control order constraints. The default value for symmetrical is 'true', which means that in order to stop dummy1, the cluster has to stop dummy1 & dummy2.

Best Regards,
Strahil Nikolov

On Fri, Apr 8, 2022 at 15:29, ChittaNagaraj, Raghav wrote:
> Hello Team,
>
> Hope you are doing well.
>
> I have a 4 node pacemaker cluster where I created clone dummy resources test-1, test-2 and test-3 below:
>
> $ sudo pcs resource create test-1 ocf:heartbeat:Dummy op monitor timeout="20" interval="10" clone
> $ sudo pcs resource create test-2 ocf:heartbeat:Dummy op monitor timeout="20" interval="10" clone
> $ sudo pcs resource create test-3 ocf:heartbeat:Dummy op monitor timeout="20" interval="10" clone
>
> Then I ordered them so test-2-clone starts after test-1-clone and test-3-clone starts after test-2-clone:
>
> $ sudo pcs constraint order test-1-clone then test-2-clone
> Adding test-1-clone test-2-clone (kind: Mandatory) (Options: first-action=start then-action=start)
> $ sudo pcs constraint order test-2-clone then test-3-clone
> Adding test-2-clone test-3-clone (kind: Mandatory) (Options: first-action=start then-action=start)
>
> Here are my clone sets (snippet of "pcs status" output pasted below):
>   * Clone Set: test-1-clone [test-1]:
>     * Started: [ node2_a node2_b node1_a node1_b ]
>   * Clone Set: test-2-clone [test-2]:
>     * Started: [ node2_a node2_b node1_a node1_b ]
>   * Clone Set: test-3-clone [test-3]:
>     * Started: [ node2_a node2_b node1_a node1_b ]
>
> Then I restart test-1 on just node1_a:
>
> $ sudo pcs resource restart test-1 node1_a
> Warning: using test-1-clone... (if a resource is a clone, master/slave or bundle you must use the clone, master/slave or bundle name)
> test-1-clone successfully restarted
>
> This causes test-2 and test-3 clones to restart on all pacemaker nodes when my intention is for them to restart on just node1_a.
> Below is the log tracing seen on the Designated Controller NODE1-B:
>
> Apr 07 20:25:01 NODE1-B pacemaker-schedulerd[95746]: notice: * Stop    test-1:1 ( node1_a ) due to node availability
> Apr 07 20:25:03 NODE1-B pacemaker-schedulerd[95746]: notice: * Restart test-2:0 ( node1_b ) due to required test-1-clone running
> Apr 07 20:25:03 NODE1-B pacemaker-schedulerd[95746]: notice: * Restart test-2:1 ( node1_a ) due to required test-1-clone running
> Apr 07 20:25:03 NODE1-B pacemaker-schedulerd[95746]: notice: * Restart test-2:2 ( node2_b ) due to required test-1-clone running
> Apr 07 20:25:03 NODE1-B pacemaker-schedulerd[95746]: notice: * Restart test-2:3 ( node2_a ) due to required test-1-clone running
>
> Above is a representation of the observed behavior using dummy resources. Is this the expected behavior of cloned resources?
>
> My goal is to be able to restart test-2-clone and test-3-clone on just the node that experienced the test-1 restart, rather than on all other nodes in the cluster.
>
> Please let us know if any additional information will help for you to be able to provide feedback.
>
> Thanks for your help!
>
> - Raghav
>
> Internal Use - Confidential
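[Editor's note: to make the 'kind'/'symmetrical' suggestion concrete, here is a sketch (not from Strahil's mail) of how either option could be applied with pcs; resource names are the dummy clones from the quoted message:]

# Only enforce ordering at start time; do not impose the reverse order on stop.
$ sudo pcs constraint order start test-1-clone then start test-2-clone kind=Mandatory symmetrical=false

# Or make the ordering advisory only, so restarting test-1-clone does not
# mandate a restart of test-2-clone:
$ sudo pcs constraint order test-1-clone then test-2-clone kind=Optional

Note that with symmetrical=false the cluster will no longer stop test-2-clone before test-1-clone, which may or may not be safe for real services.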
Re: [ClusterLabs] SAP HANA monitor fails - Error performing operation: No such device or address
On Fri, 2022-04-08 at 17:17 +0800, Aj Revelino wrote:
> Hello All,
> I've a 2 node SAP Hana cluster (hanapodb1 and hanapodb2). Pacemaker monitors the data replication between the primary and the secondary node. The issue is that crm status shows that everything is okay but the system log shows the following error log.
>
> pacemaker-controld[3582]: notice: hanapopdb1-rsc_SAPHana_HPN_HDB00_monitor_6:195 [ Error performing operation: No such device or address]
>
> I am unable to identify the cause of the error message and resolve it
>
> And due to the above, the data replication between the 2 nodes is recorded as failed (SFAIL). Pls see the excerpt from the CIB below:
>
> crm-debug-origin="do_update_resource" uname="zhanapopdb2" join="member" expected="member">
> name="hana_hpn_clone_state" value="WAITING4PRIM"/>
> name="hana_hpn_version" value="2.00.056.00.1624618329"/>
> name="master-rsc_SAPHana_HPN_HDB00" value="-INFINITY"/>
> name="hana_hpn_sync_state" value="SFAIL"/>
> value="4:S:master1:master:worker:master"/>
>
> Pacemaker is able to failover the resources from the primary to the secondary but they all fail back to the primary, the moment I clean up the failure in the primary node.

I'm not familiar enough with SAP to speak to that side of things, but the behavior after clean-up is normal. If you don't want resources to go back to their preferred node after a failure is cleaned up, set the resource-stickiness meta-attribute to a positive number (either on the resource itself, or in resource defaults if you want it to apply to everything; see the sketch after the quoted configuration below).

> I deleted and recreated the entire configuration and reconfigured the hana data replication but it hasn't helped.
>
> Cluster configuration:
> hanapopdb1:~ # crm configure show
> node 1: hanapopdb1 \
>     attributes hana_hpn_vhost=hanapopdb1 hana_hpn_site=SITE1PO hana_hpn_op_mode=logreplay_readaccess hana_hpn_srmode=sync lpa_hpn_lpt=1649393239 hana_hpn_remoteHost=hanapopdb2
> node 2: hanapopdb2 \
>     attributes lpa_hpn_lpt=10 hana_hpn_op_mode=logreplay_readaccess hana_hpn_vhost=hanapopdb2 hana_hpn_remoteHost=hanapopdb1 hana_hpn_site=SITE2PO hana_hpn_srmode=sync
> primitive rsc_SAPHanaTopology_HPN_HDB00 ocf:suse:SAPHanaTopology \
>     operations $id=rsc_sap2_HPN_HDB00-operations \
>     op monitor interval=10 timeout=600 \
>     op start interval=0 timeout=600 \
>     op stop interval=0 timeout=300 \
>     params SID=HPN InstanceNumber=00
> primitive rsc_SAPHana_HPN_HDB00 ocf:suse:SAPHana \
>     operations $id=rsc_sap_HPN_HDB00-operations \
>     op start interval=0 timeout=3600 \
>     op stop interval=0 timeout=3600 \
>     op promote interval=0 timeout=3600 \
>     op monitor interval=60 role=Master timeout=700 \
>     op monitor interval=61 role=Slave timeout=700 \
>     params SID=HPN InstanceNumber=00 PREFER_SITE_TAKEOVER=true DUPLICATE_PRIMARY_TIMEOUT=7200 AUTOMATED_REGISTER=false
> primitive rsc_ip_HPN_HDB00 IPaddr2 \
>     meta target-role=Started \
>     operations $id=rsc_ip_HPN_HDB00-operations \
>     op monitor interval=10s timeout=20s \
>     params ip=10.10.1.60
> primitive rsc_nc_HPN_HDB00 azure-lb \
>     params port=62506
> primitive stonith-sbd stonith:external/sbd \
>     params pcmk_delay_max=30 \
>     op monitor interval=30 timeout=30
> group g_ip_HPN_HDB00 rsc_ip_HPN_HDB00 rsc_nc_HPN_HDB00
> ms msl_SAPHana_HPN_HDB00 rsc_SAPHana_HPN_HDB00 \
>     meta is-managed=true notify=true clone-max=2 clone-node-max=1 target-role=Started interleave=true
> clone cln_SAPHanaTopology_HPN_HDB00 rsc_SAPHanaTopology_HPN_HDB00 \
>     meta clone-node-max=1 target-role=Started interleave=true
> colocation col_saphana_ip_HPN_HDB00 4000: g_ip_HPN_HDB00:Started msl_SAPHana_HPN_HDB00:Master
> order ord_SAPHana_HPN_HDB00 Optional: cln_SAPHanaTopology_HPN_HDB00 msl_SAPHana_HPN_HDB00
> property cib-bootstrap-options: \
>     last-lrm-refresh=1649387935 \
>     maintenance-mode=true
>
> Regards,
>
> Aj

-- 
Ken Gaillot
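[Editor's note: a sketch of the stickiness suggestion in crmsh syntax, using the resource names from this thread; the values are illustrative placeholders, not a recommendation from Ken:]

# Cluster-wide default: a resource stays where it is unless another node
# scores more than 100 points higher for it.
crm configure rsc_defaults resource-stickiness=100

# Or set it only on the SAPHana master/slave resource:
crm resource meta msl_SAPHana_HPN_HDB00 set resource-stickiness 1000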
[ClusterLabs] Antw: [EXT] Restarting parent of ordered clone resources on specific node causes restart of all resources in the ordering constraint on all nodes of the cluster
Hi!

Just two short notes:
1: I think "timeout > interval" does not make much sense.
2: I guess you configured the wrong type of clone ("meta interleave=true"?)
(3: I don't know the "pcs" command)

Regards,
Ulrich

>>> "ChittaNagaraj, Raghav" schrieb am 07.04.2022 um 22:48 in Nachricht:
> Hello Team,
>
> Hope you are doing well.
>
> I have a 4 node pacemaker cluster where I created clone dummy resources test-1, test-2 and test-3 below:
>
> $ sudo pcs resource create test-1 ocf:heartbeat:Dummy op monitor timeout="20" interval="10" clone
> $ sudo pcs resource create test-2 ocf:heartbeat:Dummy op monitor timeout="20" interval="10" clone
> $ sudo pcs resource create test-3 ocf:heartbeat:Dummy op monitor timeout="20" interval="10" clone
>
> Then I ordered them so test-2-clone starts after test-1-clone and test-3-clone starts after test-2-clone:
> $ sudo pcs constraint order test-1-clone then test-2-clone
> Adding test-1-clone test-2-clone (kind: Mandatory) (Options: first-action=start then-action=start)
> $ sudo pcs constraint order test-2-clone then test-3-clone
> Adding test-2-clone test-3-clone (kind: Mandatory) (Options: first-action=start then-action=start)
>
> Here are my clone sets (snippet of "pcs status" output pasted below):
>   * Clone Set: test-1-clone [test-1]:
>     * Started: [ node2_a node2_b node1_a node1_b ]
>   * Clone Set: test-2-clone [test-2]:
>     * Started: [ node2_a node2_b node1_a node1_b ]
>   * Clone Set: test-3-clone [test-3]:
>     * Started: [ node2_a node2_b node1_a node1_b ]
>
> Then I restart test-1 on just node1_a:
> $ sudo pcs resource restart test-1 node1_a
> Warning: using test-1-clone... (if a resource is a clone, master/slave or bundle you must use the clone, master/slave or bundle name)
> test-1-clone successfully restarted
>
> This causes test-2 and test-3 clones to restart on all pacemaker nodes when my intention is for them to restart on just node1_a.
> Below is the log tracing seen on the Designated Controller NODE1-B:
> Apr 07 20:25:01 NODE1-B pacemaker-schedulerd[95746]: notice: * Stop    test-1:1 ( node1_a ) due to node availability
> Apr 07 20:25:03 NODE1-B pacemaker-schedulerd[95746]: notice: * Restart test-2:0 ( node1_b ) due to required test-1-clone running
> Apr 07 20:25:03 NODE1-B pacemaker-schedulerd[95746]: notice: * Restart test-2:1 ( node1_a ) due to required test-1-clone running
> Apr 07 20:25:03 NODE1-B pacemaker-schedulerd[95746]: notice: * Restart test-2:2 ( node2_b ) due to required test-1-clone running
> Apr 07 20:25:03 NODE1-B pacemaker-schedulerd[95746]: notice: * Restart test-2:3 ( node2_a ) due to required test-1-clone running
>
> Above is a representation of the observed behavior using dummy resources. Is this the expected behavior of cloned resources?
>
> My goal is to be able to restart test-2-clone and test-3-clone on just the node that experienced test-1 restart rather than all other nodes in the cluster.
>
> Please let us know if any additional information will help for you to be able to provide feedback.
>
> Thanks for your help!
>
> - Raghav
>
> Internal Use - Confidential
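[Editor's note: picking up Ulrich's second point (this is a sketch, not from his mail): with interleave=true on the clones, each test-2 instance only depends on the test-1 instance on its own node, so restarting test-1 on node1_a should only ripple to the dependents on node1_a rather than cluster-wide. Setting it on the existing clones with pcs could look like:]

$ sudo pcs resource meta test-1-clone interleave=true
$ sudo pcs resource meta test-2-clone interleave=true
$ sudo pcs resource meta test-3-clone interleave=true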
[ClusterLabs] Restarting parent of ordered clone resources on specific node causes restart of all resources in the ordering constraint on all nodes of the cluster
Hello Team,

Hope you are doing well.

I have a 4 node pacemaker cluster where I created clone dummy resources test-1, test-2 and test-3 below:

$ sudo pcs resource create test-1 ocf:heartbeat:Dummy op monitor timeout="20" interval="10" clone
$ sudo pcs resource create test-2 ocf:heartbeat:Dummy op monitor timeout="20" interval="10" clone
$ sudo pcs resource create test-3 ocf:heartbeat:Dummy op monitor timeout="20" interval="10" clone

Then I ordered them so test-2-clone starts after test-1-clone and test-3-clone starts after test-2-clone:

$ sudo pcs constraint order test-1-clone then test-2-clone
Adding test-1-clone test-2-clone (kind: Mandatory) (Options: first-action=start then-action=start)
$ sudo pcs constraint order test-2-clone then test-3-clone
Adding test-2-clone test-3-clone (kind: Mandatory) (Options: first-action=start then-action=start)

Here are my clone sets (snippet of "pcs status" output pasted below):
  * Clone Set: test-1-clone [test-1]:
    * Started: [ node2_a node2_b node1_a node1_b ]
  * Clone Set: test-2-clone [test-2]:
    * Started: [ node2_a node2_b node1_a node1_b ]
  * Clone Set: test-3-clone [test-3]:
    * Started: [ node2_a node2_b node1_a node1_b ]

Then I restart test-1 on just node1_a:

$ sudo pcs resource restart test-1 node1_a
Warning: using test-1-clone... (if a resource is a clone, master/slave or bundle you must use the clone, master/slave or bundle name)
test-1-clone successfully restarted

This causes test-2 and test-3 clones to restart on all pacemaker nodes when my intention is for them to restart on just node1_a.
Below is the log tracing seen on the Designated Controller NODE1-B:

Apr 07 20:25:01 NODE1-B pacemaker-schedulerd[95746]: notice: * Stop    test-1:1 ( node1_a ) due to node availability
Apr 07 20:25:03 NODE1-B pacemaker-schedulerd[95746]: notice: * Restart test-2:0 ( node1_b ) due to required test-1-clone running
Apr 07 20:25:03 NODE1-B pacemaker-schedulerd[95746]: notice: * Restart test-2:1 ( node1_a ) due to required test-1-clone running
Apr 07 20:25:03 NODE1-B pacemaker-schedulerd[95746]: notice: * Restart test-2:2 ( node2_b ) due to required test-1-clone running
Apr 07 20:25:03 NODE1-B pacemaker-schedulerd[95746]: notice: * Restart test-2:3 ( node2_a ) due to required test-1-clone running

Above is a representation of the observed behavior using dummy resources. Is this the expected behavior of cloned resources?

My goal is to be able to restart test-2-clone and test-3-clone on just the node that experienced the test-1 restart, rather than on all other nodes in the cluster.

Please let us know if any additional information will help for you to be able to provide feedback.

Thanks for your help!

- Raghav

Internal Use - Confidential
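[Editor's note: not an answer by itself, but for questions like this it can help to look at the scheduler's reasoning directly. A sketch using standard Pacemaker tooling, run on any cluster node; output format varies by version:]

# Re-run the scheduler against the live CIB and print allocation scores
# plus the actions it would schedule (the same "Stop"/"Restart" lines as above):
crm_simulate -sL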
[ClusterLabs] Antw: [EXT] SAP HANA monitor fails - Error performing operation: No such device or address
"maintenance-mode=true"? Why? >>> Aj Revelino schrieb am 08.04.2022 um 11:17 in >>> Nachricht : > Hello All, > I've a 2 node SAP Hana cluster (hanapodb1 and hanapodb2). Pacemaker > monitors the data replication between the primary and the secondary node. > The issue is that crm status shows that everything is okay but the system > log shows the following error log. > > > *pacemaker-controld[3582]: notice: > hanapopdb1-rsc_SAPHana_HPN_HDB00_monitor_6:195 [ Error performing > operation: No such device or address]* > I am unable to identify the cause of the error message and resolve it > > And due to the above, the data replication between the 2 nodes is recorded > as failed (SFAIL) . Pls see the excerpt from the CIB below: > > crm-debug-origin="do_update_resource" uname="zhanapopdb2" join="member" > expected="member"> > > > * name="hana_hpn_clone_state" value="WAITING4PRIM"/>* >value="2.00.056.00.1624618329"/> >name="master-rsc_SAPHana_HPN_HDB00" value="-INFINITY"/> > * name="hana_hpn_sync_state" value="SFAIL"/>* >value="4:S:master1:master:worker:master"/> > > > > Pacemaker is able to failover the resources from the primary to the > secondary but they all fail back to the primary, the moment I clean up the > failure in the primary node. > I deleted and recreated the entire configuration and reconfigured the hana > data replication but it hasn't helped. > > > *Cluster configuration:* > hanapopdb1:~ # crm configure show > node 1: hanapopdb1 \ > attributes hana_hpn_vhost=hanapopdb1 hana_hpn_site=SITE1PO > hana_hpn_op_mode=logreplay_readaccess hana_hpn_srmode=sync > lpa_hpn_lpt=1649393239 hana_hpn_remoteHost=hanapopdb2 > node 2: hanapopdb2 \ > attributes lpa_hpn_lpt=10 hana_hpn_op_mode=logreplay_readaccess > hana_hpn_vhost=hanapopdb2 hana_hpn_remoteHost=hanapopdb1 > hana_hpn_site=SITE2PO hana_hpn_srmode=sync > primitive rsc_SAPHanaTopology_HPN_HDB00 ocf:suse:SAPHanaTopology \ > operations $id=rsc_sap2_HPN_HDB00-operations \ > op monitor interval=10 timeout=600 \ > op start interval=0 timeout=600 \ > op stop interval=0 timeout=300 \ > params SID=HPN InstanceNumber=00 > primitive rsc_SAPHana_HPN_HDB00 ocf:suse:SAPHana \ > operations $id=rsc_sap_HPN_HDB00-operations \ > op start interval=0 timeout=3600 \ > op stop interval=0 timeout=3600 \ > op promote interval=0 timeout=3600 \ > op monitor interval=60 role=Master timeout=700 \ > op monitor interval=61 role=Slave timeout=700 \ > params SID=HPN InstanceNumber=00 PREFER_SITE_TAKEOVER=true > DUPLICATE_PRIMARY_TIMEOUT=7200 AUTOMATED_REGISTER=false > primitive rsc_ip_HPN_HDB00 IPaddr2 \ > meta target-role=Started \ > operations $id=rsc_ip_HPN_HDB00-operations \ > op monitor interval=10s timeout=20s \ > params ip=10.10.1.60 > primitive rsc_nc_HPN_HDB00 azure-lb \ > params port=62506 > primitive stonith-sbd stonith:external/sbd \ > params pcmk_delay_max=30 \ > op monitor interval=30 timeout=30 > group g_ip_HPN_HDB00 rsc_ip_HPN_HDB00 rsc_nc_HPN_HDB00 > ms msl_SAPHana_HPN_HDB00 rsc_SAPHana_HPN_HDB00 \ > meta is-managed=true notify=true clone-max=2 clone-node-max=1 > target-role=Started interleave=true > clone cln_SAPHanaTopology_HPN_HDB00 rsc_SAPHanaTopology_HPN_HDB00 \ > meta clone-node-max=1 target-role=Started interleave=true > colocation col_saphana_ip_HPN_HDB00 4000: g_ip_HPN_HDB00:Started > msl_SAPHana_HPN_HDB00:Master > order ord_SAPHana_HPN_HDB00 Optional: cln_SAPHanaTopology_HPN_HDB00 > msl_SAPHana_HPN_HDB00 > property cib-bootstrap-options: \ > last-lrm-refresh=1649387935 \ > maintenance-mode=true > > Regards, > > Aj ___ Manage your subscription: 
https://lists.clusterlabs.org/mailman/listinfo/users ClusterLabs home: https://www.clusterlabs.org/
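[Editor's note: while maintenance-mode=true, Pacemaker neither manages nor, by default, monitors resources, so the cluster should be taken out of maintenance once the agent problem is understood. A crmsh/Pacemaker-CLI sketch, not from this thread:]

# Check the current value of the cluster property:
crm_attribute --type crm_config --name maintenance-mode --query

# Take the cluster out of maintenance mode:
crm configure property maintenance-mode=false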
[ClusterLabs] SAP HANA monitor fails - Error performing operation: No such device or address
Hello All,

I've a 2 node SAP Hana cluster (hanapodb1 and hanapodb2). Pacemaker monitors the data replication between the primary and the secondary node. The issue is that crm status shows that everything is okay but the system log shows the following error log.

*pacemaker-controld[3582]: notice: hanapopdb1-rsc_SAPHana_HPN_HDB00_monitor_6:195 [ Error performing operation: No such device or address]*

I am unable to identify the cause of the error message and resolve it.

And due to the above, the data replication between the 2 nodes is recorded as failed (SFAIL). Pls see the excerpt from the CIB below:

crm-debug-origin="do_update_resource" uname="zhanapopdb2" join="member" expected="member">
name="hana_hpn_clone_state" value="WAITING4PRIM"/>
name="hana_hpn_version" value="2.00.056.00.1624618329"/>
name="master-rsc_SAPHana_HPN_HDB00" value="-INFINITY"/>
name="hana_hpn_sync_state" value="SFAIL"/>
value="4:S:master1:master:worker:master"/>

Pacemaker is able to failover the resources from the primary to the secondary but they all fail back to the primary, the moment I clean up the failure in the primary node.
I deleted and recreated the entire configuration and reconfigured the hana data replication but it hasn't helped.

*Cluster configuration:*
hanapopdb1:~ # crm configure show
node 1: hanapopdb1 \
    attributes hana_hpn_vhost=hanapopdb1 hana_hpn_site=SITE1PO hana_hpn_op_mode=logreplay_readaccess hana_hpn_srmode=sync lpa_hpn_lpt=1649393239 hana_hpn_remoteHost=hanapopdb2
node 2: hanapopdb2 \
    attributes lpa_hpn_lpt=10 hana_hpn_op_mode=logreplay_readaccess hana_hpn_vhost=hanapopdb2 hana_hpn_remoteHost=hanapopdb1 hana_hpn_site=SITE2PO hana_hpn_srmode=sync
primitive rsc_SAPHanaTopology_HPN_HDB00 ocf:suse:SAPHanaTopology \
    operations $id=rsc_sap2_HPN_HDB00-operations \
    op monitor interval=10 timeout=600 \
    op start interval=0 timeout=600 \
    op stop interval=0 timeout=300 \
    params SID=HPN InstanceNumber=00
primitive rsc_SAPHana_HPN_HDB00 ocf:suse:SAPHana \
    operations $id=rsc_sap_HPN_HDB00-operations \
    op start interval=0 timeout=3600 \
    op stop interval=0 timeout=3600 \
    op promote interval=0 timeout=3600 \
    op monitor interval=60 role=Master timeout=700 \
    op monitor interval=61 role=Slave timeout=700 \
    params SID=HPN InstanceNumber=00 PREFER_SITE_TAKEOVER=true DUPLICATE_PRIMARY_TIMEOUT=7200 AUTOMATED_REGISTER=false
primitive rsc_ip_HPN_HDB00 IPaddr2 \
    meta target-role=Started \
    operations $id=rsc_ip_HPN_HDB00-operations \
    op monitor interval=10s timeout=20s \
    params ip=10.10.1.60
primitive rsc_nc_HPN_HDB00 azure-lb \
    params port=62506
primitive stonith-sbd stonith:external/sbd \
    params pcmk_delay_max=30 \
    op monitor interval=30 timeout=30
group g_ip_HPN_HDB00 rsc_ip_HPN_HDB00 rsc_nc_HPN_HDB00
ms msl_SAPHana_HPN_HDB00 rsc_SAPHana_HPN_HDB00 \
    meta is-managed=true notify=true clone-max=2 clone-node-max=1 target-role=Started interleave=true
clone cln_SAPHanaTopology_HPN_HDB00 rsc_SAPHanaTopology_HPN_HDB00 \
    meta clone-node-max=1 target-role=Started interleave=true
colocation col_saphana_ip_HPN_HDB00 4000: g_ip_HPN_HDB00:Started msl_SAPHana_HPN_HDB00:Master
order ord_SAPHana_HPN_HDB00 Optional: cln_SAPHanaTopology_HPN_HDB00 msl_SAPHana_HPN_HDB00
property cib-bootstrap-options: \
    last-lrm-refresh=1649387935 \
    maintenance-mode=true

Regards,

Aj