Re: [ClusterLabs] Antw: [EXT] SAP HANA monitor fails ‑ Error performing operation: No such device or address

2022-04-10 Thread Strahil Nikolov via Users
debug-start does what is described in
https://wiki.clusterlabs.org/wiki/Debugging_Resource_Failures

Best Regards,
Strahil Nikolov
 
 
___
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users

ClusterLabs home: https://www.clusterlabs.org/
  

  


Re: [ClusterLabs] Antw: [EXT] SAP HANA monitor fails ‑ Error performing operation: No such device or address

2022-04-10 Thread Aj Revelino
Hi Strahil,

Yes, I went through the documentation from Azure. In fact, we have 6
production clusters running on SLES 15 sp, but none of them are using the
hook.
SAPHanaSR-showAttr shows SFAIL for the replication status, but this output
is read from the CIB, where the replication attribute is set to 'SFAIL' by
Pacemaker. I think, like you mentioned, the hook should be able to resolve
it. I'll try this weekend.
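
For reference, the value that SAPHanaSR-showAttr reports can also be read
directly with crm_attribute. This is a minimal sketch, assuming the value
lives in the transient node attribute hana_hpn_sync_state that appears in the
CIB excerpt from the original post; the function name is illustrative and not
part of any package.

```shell
# Print the replication sync state recorded for one node.
# Assumption: the attribute is transient (status section), hence
# --lifetime reboot.
# Usage (illustrative): sync_state <node-name>
sync_state() {
    crm_attribute --node "$1" --name hana_hpn_sync_state \
        --lifetime reboot --query --quiet
}
```

Calling it for each cluster node shows what the cluster currently has on
record, independent of what HANA itself reports.
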

What I fail to understand is this error message:

hanapopdb1-rsc_SAPHana_HPN_HDB00_monitor_6:195 [ Error performing
operation: No such device or address]

Did you mean to trace the resource? If so, I'll be doing it this weekend.

Regards,
Aj



Re: [ClusterLabs] Antw: [EXT] SAP HANA monitor fails ‑ Error performing operation: No such device or address

2022-04-09 Thread Strahil Nikolov via Users
You can use pcs resource debug-start, but you have to stop the resource
before that.

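A minimal sketch of that sequence, assuming pcs is installed (the cluster in
this thread is configured with crmsh, so the commands would need translating
there) and that stopping is done by disabling the resource through the
cluster. The function name and its argument are illustrative.

```shell
# Stop a resource through the cluster, then start it by hand so that
# resource agent errors show up on the terminal.
# Usage (illustrative): debug_start_resource <resource-id>
debug_start_resource() {
    rsc="$1"
    # Assumption: the resource is stopped with 'pcs resource disable'.
    pcs resource disable "$rsc" || return 1
    # --full makes the agent output more verbose.
    pcs resource debug-start "$rsc" --full
}
```

Afterwards, 'pcs resource enable <resource-id>' hands the resource back to
the cluster.
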
Did you follow any documentation for the setup? I usually refer to the
vendor's documentation. Go over it and check for a step that was not
implemented.

RH's latest version is:
https://access.redhat.com/sites/default/files/attachments/v10_ha_solution_for_sap_hana_scale_out_system_replication_0.pdf
https://access.redhat.com/articles/3004101

SLES:
https://documentation.suse.com/sbp/all/html/SLES4SAP-hana-scaleOut-PerfOpt-15/index.html
https://documentation.suse.com/sbp/all/single-html/SLES4SAP-hana-sr-guide-PerfOpt-15/index.html

Based on my experience, the most critical component is the hook setup, so the
cluster can properly identify the replication status.

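A quick way to check whether the hook is configured at all is to look for its
provider section in HANA's global.ini. This is only a sketch: the section
name is the one I recall from the SUSE scale-up guide, and the function name
and the file path in the usage comment are assumptions for an SID of HPN.

```shell
# Return 0 if the given global.ini declares the SAPHanaSR HA/DR provider
# section, non-zero otherwise. The match is case-insensitive.
has_srhook_section() {
    grep -qi '^\[ha_dr_provider_SAPHanaSR\]' "$1"
}

# Illustrative usage; the path is an assumption, adjust it to your system:
# has_srhook_section /hana/shared/HPN/global/hdb/custom/config/global.ini \
#     && echo "hook section present" || echo "hook section missing"
```

A missing section only shows that the hook was never configured; a present
section does not prove that HANA has loaded it.
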
Also, unmanaged resources do not probe for replication status, so the
cluster never notices that replication has been restored until the resource
is 'managed' again.

When removing maintenance, it's always nice to run 'crm_simulate' first. One
very good article is https://www.suse.com/support/kb/doc/?id=19158 .

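A small sketch of such a check, assuming crm_simulate prints the planned
actions under a "Transition Summary:" heading followed by a blank line; the
function name is illustrative.

```shell
# Print the actions the cluster would take against the live CIB.
# Empty output suggests that nothing would be moved, stopped or restarted.
pending_actions() {
    crm_simulate --live-check --simulate 2>/dev/null |
        awk '/^Transition Summary:/ {on=1; next}
             on && NF==0 {exit}
             on {print}'
}
```

If this prints nothing, the cluster is not planning to touch any resource.
Note that maintenance-mode itself affects what the simulation shows, so the
article above is the better guide for the exact procedure.
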
What is the output of SAPHanaSR-showAttr?

Best Regards,
Strahil Nikolov
 
 


Re: [ClusterLabs] Antw: [EXT] SAP HANA monitor fails ‑ Error performing operation: No such device or address

2022-04-08 Thread Aj Revelino
Hi Ulrich,
I set the cluster to maintenance mode because the error message was being
logged continuously in the system log.

Pacemaker attempted to execute the monitor operation of the resource
agent here. Is there a way to find out why Pacemaker says 'No such device
or address'?

hanapopdb1-rsc_SAPHana_HPN_HDB00_monitor_6:195 [ Error performing
operation: No such device or address]
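
One possible explanation, offered as an assumption rather than a diagnosis:
crm_attribute prints "Error performing operation: No such device or address"
when it is asked for an attribute that does not exist, and resource agents
call it internally. The sketch below lists which of the given node
attributes cannot be read. The function name is illustrative; the attribute
names in the usage comment are the ones from the CIB excerpt.

```shell
# Print every attribute name that cannot be queried for the given node.
# Usage (illustrative): missing_attrs <node> <lifetime> <attr>...
#   lifetime is 'reboot' for transient and 'forever' for permanent attributes.
missing_attrs() {
    node="$1"
    lifetime="$2"
    shift 2
    for attr in "$@"; do
        if ! crm_attribute --node "$node" --name "$attr" \
                --lifetime "$lifetime" --query --quiet >/dev/null 2>&1; then
            echo "$attr"
        fi
    done
}

# Example:
# missing_attrs hanapopdb1 reboot hana_hpn_clone_state hana_hpn_sync_state
```

An attribute reported here is only a candidate; tracing the resource agent
would show which query actually produces the message.
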

Regards,
Aj



[ClusterLabs] Antw: [EXT] SAP HANA monitor fails ‑ Error performing operation: No such device or address

2022-04-08 Thread Ulrich Windl
"maintenance-mode=true"? Why?


>>> Aj Revelino wrote on 08.04.2022 at 11:17:
> Hello All,
> I have a 2-node SAP HANA cluster (hanapopdb1 and hanapopdb2). Pacemaker
> monitors the data replication between the primary and the secondary node.
> The issue is that crm status shows that everything is okay, but the system
> log shows the following error:
>
> pacemaker-controld[3582]:  notice:
> hanapopdb1-rsc_SAPHana_HPN_HDB00_monitor_6:195 [ Error performing
> operation: No such device or address]
>
> I am unable to identify the cause of the error message and resolve it.
>
> And due to the above, the data replication between the 2 nodes is recorded
> as failed (SFAIL). Please see the excerpt from the CIB below:
> 
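> <node_state ... crm-debug-origin="do_update_resource" uname="zhanapopdb2" join="member" expected="member">
>   <transient_attributes ...>
>     <instance_attributes ...>
>       <nvpair ... name="hana_hpn_clone_state" value="WAITING4PRIM"/>
>       <nvpair ... value="2.00.056.00.1624618329"/>
>       <nvpair ... name="master-rsc_SAPHana_HPN_HDB00" value="-INFINITY"/>
>       <nvpair ... name="hana_hpn_sync_state" value="SFAIL"/>
>       <nvpair ... value="4:S:master1:master:worker:master"/>
>     </instance_attributes>
>   </transient_attributes>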
> 
> Pacemaker is able to fail over the resources from the primary to the
> secondary, but they all fail back to the primary the moment I clean up the
> failure on the primary node.
> I deleted and recreated the entire configuration and reconfigured the HANA
> data replication, but it hasn't helped.
> 
> 
> *Cluster configuration:*
> hanapopdb1:~ # crm configure show
> node 1: hanapopdb1 \
>         attributes hana_hpn_vhost=hanapopdb1 hana_hpn_site=SITE1PO \
>         hana_hpn_op_mode=logreplay_readaccess hana_hpn_srmode=sync \
>         lpa_hpn_lpt=1649393239 hana_hpn_remoteHost=hanapopdb2
> node 2: hanapopdb2 \
>         attributes lpa_hpn_lpt=10 hana_hpn_op_mode=logreplay_readaccess \
>         hana_hpn_vhost=hanapopdb2 hana_hpn_remoteHost=hanapopdb1 \
>         hana_hpn_site=SITE2PO hana_hpn_srmode=sync
> primitive rsc_SAPHanaTopology_HPN_HDB00 ocf:suse:SAPHanaTopology \
>         operations $id=rsc_sap2_HPN_HDB00-operations \
>         op monitor interval=10 timeout=600 \
>         op start interval=0 timeout=600 \
>         op stop interval=0 timeout=300 \
>         params SID=HPN InstanceNumber=00
> primitive rsc_SAPHana_HPN_HDB00 ocf:suse:SAPHana \
>         operations $id=rsc_sap_HPN_HDB00-operations \
>         op start interval=0 timeout=3600 \
>         op stop interval=0 timeout=3600 \
>         op promote interval=0 timeout=3600 \
>         op monitor interval=60 role=Master timeout=700 \
>         op monitor interval=61 role=Slave timeout=700 \
>         params SID=HPN InstanceNumber=00 PREFER_SITE_TAKEOVER=true \
>         DUPLICATE_PRIMARY_TIMEOUT=7200 AUTOMATED_REGISTER=false
> primitive rsc_ip_HPN_HDB00 IPaddr2 \
>         meta target-role=Started \
>         operations $id=rsc_ip_HPN_HDB00-operations \
>         op monitor interval=10s timeout=20s \
>         params ip=10.10.1.60
> primitive rsc_nc_HPN_HDB00 azure-lb \
>         params port=62506
> primitive stonith-sbd stonith:external/sbd \
>         params pcmk_delay_max=30 \
>         op monitor interval=30 timeout=30
> group g_ip_HPN_HDB00 rsc_ip_HPN_HDB00 rsc_nc_HPN_HDB00
> ms msl_SAPHana_HPN_HDB00 rsc_SAPHana_HPN_HDB00 \
>         meta is-managed=true notify=true clone-max=2 clone-node-max=1 \
>         target-role=Started interleave=true
> clone cln_SAPHanaTopology_HPN_HDB00 rsc_SAPHanaTopology_HPN_HDB00 \
>         meta clone-node-max=1 target-role=Started interleave=true
> colocation col_saphana_ip_HPN_HDB00 4000: g_ip_HPN_HDB00:Started \
>         msl_SAPHana_HPN_HDB00:Master
> order ord_SAPHana_HPN_HDB00 Optional: cln_SAPHanaTopology_HPN_HDB00 \
>         msl_SAPHana_HPN_HDB00
> property cib-bootstrap-options: \
>         last-lrm-refresh=1649387935 \
>         maintenance-mode=true
> 
> Regards,
> 
> Aj


