Re: [ClusterLabs] Pacemaker not issuing start command intermittently

Strahil Nikolov Fri, 28 May 2021 19:06:52 -0700

Most RA scripts are written in bash. Usually you can change the shebang to 
'#!/usr/bin/bash -x', or you can set trace_ra=1 via 'pcs resource update 
RESOURCE trace_ra=1 trace_file=/somepath'.
If you don't define trace_file, the trace files should be created in 
/var/lib/heartbeat/trace_ra (from memory, so use find/locate to confirm).
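
To give an idea of what the '-x' trace looks like, here is a rough sketch. 
The script demo_agent.sh, its monitor logic, the STATE_FILE variable and the 
list_traces helper are all made up for illustration; they are not taken from 
your resource agent. Running a script with 'bash -x' should behave the same 
as putting '-x' on the shebang.

```shell
#!/usr/bin/env bash
# Sketch only: demo_agent.sh and its monitor logic are hypothetical stand-ins
# for a real resource agent.

workdir=$(mktemp -d)
cat > "$workdir/demo_agent.sh" <<'EOF'
#!/usr/bin/bash
# Hypothetical monitor: report failure when the state file is missing.
monitor() {
    if [ -e "$STATE_FILE" ]; then
        return 0    # OCF_SUCCESS
    fi
    echo "resource is down"
    return 1        # OCF_ERR_GENERIC
}
monitor
EOF

# Run with xtrace enabled: every command is echoed to stderr with a leading
# "+", so the trace shows which branch the monitor took.
rc=0
STATE_FILE="$workdir/missing" bash -x "$workdir/demo_agent.sh" \
    > "$workdir/stdout.log" 2> "$workdir/trace.log" || rc=$?

echo "exit code: $rc"
cat "$workdir/trace.log"

# Hypothetical helper for a cluster node: list trace files written when
# trace_ra=1 is set. The default directory is the one recalled above and
# may differ on your distribution.
list_traces() {
    local dir="${1:-/var/lib/heartbeat/trace_ra}"
    if [ -d "$dir" ]; then
        find "$dir" -type f
    fi
}
```

With the real agent, comparing the trace from a failed iteration against one 
from a successful iteration may show where the two runs diverge.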
Best Regards,
Strahil Nikolov
 
On Fri, May 28, 2021 at 22:10, Abithan Kumarasamy <abithan.kumaras...@ibm.com> 
wrote:

Hello Team,

We have recently been running some tests on our Pacemaker clusters that 
involve two Pacemaker resources, one on each of two nodes. The test case in 
which we are experiencing intermittent problems is one in which we bring down 
the Pacemaker resources on both nodes simultaneously. Our expected behaviour 
is that the monitor function in our resource agent script detects the 
downtime, and that Pacemaker then issues a start command. This is what 
happens on most iterations of our test case. However, on some iterations 
(approximately 1 out of 30) we notice that Pacemaker issues the start command 
on only one of the hosts. On the troubled host, the monitor function logs 
that the resource is down, as expected, and exits with the OCF_ERR_GENERIC 
return code (1). According to the documentation, this should trigger a soft 
recovery, but when scanning the Pacemaker logs, there is no indication of the 
start command being issued or invoked. It works as expected on the other host.

To summarize the issue:
   - The resource’s monitor is running and returning OCF_ERR_GENERIC.
   - The constraints we have for the resources are satisfied.
   - There are no visible differences in the Pacemaker logs between the test 
     iteration that failed and the multiple successful iterations, other than 
     the fact that Pacemaker does not start the resource after the monitor 
     returns OCF_ERR_GENERIC.

Could you provide some more insight into why this may be happening and how we 
can further debug this issue? We are currently relying on the Pacemaker logs, 
but are there additional diagnostics we could use?

Thanks,
Abithan
_______________________________________________
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users


ClusterLabs home: https://www.clusterlabs.org/
